# Working with strings

This section covers some common string methods (functions attached to string objects) for manipulating strings.

For a full list of the attributes and methods of `str`, call `dir(str)`.

In [1]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

For a short description of a method, use the `help()` function:

In [36]:
help(str.isalpha)

Help on method_descriptor:

isalpha(self, /) unbound builtins.str method
    Return True if the string is an alphabetic string, False otherwise.

    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.



# Common methods

### count

Returns the number of occurrences of a substring in a string.

In [16]:
"ABCDABCAAABC".count("A")

5

In [17]:
"ABCDABCAAABC".count("ABC")

3

`count` counts only _non-overlapping_ occurrences: after a match, the search resumes at the end of that match.

In [18]:
"AAAAA".count("AA")

2
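
If overlapping occurrences matter, `count` alone is not enough. The following is a minimal sketch using a hypothetical helper, `count_overlapping` (not a built-in), which tests every start position with `str.startswith`:

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps (hypothetical helper)."""
    if not sub:
        return 0  # assumption: treat the empty substring as having no matches
    # text.startswith(sub, i) tests whether sub begins at index i of text.
    return sum(1 for i in range(len(text)) if text.startswith(sub, i))

print("AAAAA".count("AA"))               # 2 (non-overlapping)
print(count_overlapping("AAAAA", "AA"))  # 4 (overlapping)
```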

### find, rfind

`find` returns the index at which a substring first occurs, or -1 if the substring is not found.

In [19]:
s = "ABRACADABRA"
s.find("RA")

2

In [20]:
s.find("XYZ")

-1

To find later occurrences, pass the previous result plus one as the _start_ argument of `find`:

In [23]:
loc = s.find("RA") # first occurrence
loc = s.find("RA",loc+1) # second occurrence
loc

9
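
Repeating this step in a loop collects every occurrence. A minimal sketch, where `find_all` is a hypothetical helper name rather than a string method:

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub in text (hypothetical helper)."""
    positions = []
    loc = text.find(sub)
    while loc != -1:                   # find returns -1 when nothing is left
        positions.append(loc)
        loc = text.find(sub, loc + 1)  # resume just past the previous match
    return positions

print(find_all("ABRACADABRA", "RA"))   # [2, 9]
print(find_all("ABRACADABRA", "XYZ"))  # []
```

Because the search resumes at `loc + 1`, overlapping matches are reported as well.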

To find the _last_ occurrence (i.e., start searching at the end of the string), use the `rfind` method.

In [24]:
s.rfind("RA")

9

### join

Joins the string elements of a list or tuple (or any other iterable of strings), using the string on which the method is called as the separator.

In [29]:
"".join(["A","67","B","0.15"])    # concatenate elements.

'A67B0.15'

In [30]:
" ".join(["A","67","B","0.15"])   # join with a space between elements

'A 67 B 0.15'

In [31]:
"-*-".join(["A","67","B","0.15"]) # join with -*- between them

'A-*-67-*-B-*-0.15'
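
Every element passed to `join` must already be a string. When the data contains numbers, one common approach is to convert each element with `str` first. A small sketch with made-up values:

```python
values = ["A", 67, "B", 0.15]   # mixed types: str, int, float

# " ".join(values) would raise TypeError, because 67 is not a string.
joined = " ".join(str(v) for v in values)
print(joined)  # A 67 B 0.15
```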

`join` is useful when you want to process a string character by character: convert the string to a list, modify the list, and then join the list elements back into a string.

In [34]:
x = 12345
L = list(str(x))
L.reverse()
L

['5', '4', '3', '2', '1']

In [35]:
"".join(L)

'54321'

### strip, lstrip, rstrip

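These methods remove whitespace characters (spaces, tabs, newlines) from the ends of a string: `lstrip` from the left, `rstrip` from the right, and `strip` from both sides. Whitespace inside the string is left untouched.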

In [36]:
s = "\n\t   wide open spaces   \t\n"
print(s)


	   wide open spaces   	



In [37]:
s.lstrip()

'wide open spaces   \t\n'

In [38]:
s.rstrip()

'\n\t   wide open spaces'

In [39]:
s.strip()

'wide open spaces'
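
All three methods also accept an optional argument listing the characters to remove instead of whitespace. The sketch below uses made-up strings; note that the argument is treated as a set of characters, not as a prefix or suffix.

```python
s = "***hello, world!***"
print(s.strip("*"))    # hello, world!

# Every leading/trailing character that appears in "-=" is removed,
# regardless of the order in which the characters occur.
t = "=-=-title-=-="
print(t.strip("-="))   # title
print(t.lstrip("-="))  # title-=-=
```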

### replace

Replaces occurrences of a substring in a given string.

In [40]:
s = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"

In [41]:
s.replace("wood", "stone")

'How much stone would a stonechuck chuck if a stonechuck could chuck stone?'

The `count` parameter specifies how many replacements are to be made, starting from the left.

In [42]:
s.replace("wood", "stone", 2)

'How much stone would a stonechuck chuck if a woodchuck could chuck wood?'
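
Strings are immutable, so `replace` never changes the original string; it returns a new one. To keep the result, assign it to a name, as in this short sketch:

```python
s = "How much wood would a woodchuck chuck"
t = s.replace("wood", "stone")

print(s)  # the original string is unchanged
print(t)  # the new string carries the replacements

# To "update" s, rebind the name to the returned string.
s = s.replace("wood", "stone", 1)
print(s)
```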

### split, rsplit, splitlines

Without arguments, `split` breaks the string at whitespace and returns a list of substrings. Runs of consecutive whitespace count as a single separator, and leading and trailing whitespace is ignored.

In [43]:
"  cat dog   bird   \n".split()

['cat', 'dog', 'bird']

Another separator can be specified with the `sep` parameter.

In [44]:
"1,2,3,4,5,6".split(sep=",")

['1', '2', '3', '4', '5', '6']

In [45]:
"a<>b<>c<>d<>e".split(sep="<>")

['a', 'b', 'c', 'd', 'e']
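
With an explicit separator, consecutive separators are not merged: each one delimits a field, so empty strings can appear in the result. The comparison below, on made-up inputs, illustrates the difference from the no-argument form.

```python
print("a,,b,".split(","))  # ['a', '', 'b', '']

# Whitespace splitting (no argument) versus splitting on a single space:
print("a  b ".split())     # ['a', 'b']
print("a  b ".split(" "))  # ['a', '', 'b', '']
```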

One can limit the splits with the `maxsplit` parameter. In that case, remainders are kept in a single string.

In [46]:
"1 2 3 4 5 6".split(maxsplit=3)

['1', '2', '3', '4 5 6']

To split the string starting from the right, use the `rsplit` method.

In [47]:
"1,2,3,4,5,6".rsplit(",",3)

['1,2,3', '4', '5', '6']
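
A typical use of `rsplit` with `maxsplit=1` is to peel off the last field of a string. The file name below is a made-up example:

```python
filename = "archive.tar.gz"  # hypothetical file name

# Split once, starting from the right: everything before the last dot, and the rest.
stem, ext = filename.rsplit(".", 1)
print(stem, ext)               # archive.tar gz

# For comparison, split with the same limit works from the left.
print(filename.split(".", 1))  # ['archive', 'tar.gz']
```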

If a string contains line breaks, it can be split into separate lines with the `splitlines` method.

In [48]:
s = "abc def\nxyz jkl mno\r\nqwe rty\r"
s.splitlines()

['abc def', 'xyz jkl mno', 'qwe rty']

`splitlines` recognizes the different line-break conventions (`\n` on Unix, `\r\n` on Windows, as well as a bare `\r`), so for portability it is preferable to `split("\n")`.
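
The difference shows up on the same string as above: `split("\n")` leaves stray `\r` characters behind, while `splitlines` removes them. The optional `keepends` argument keeps the line breaks in the result.

```python
s = "abc def\nxyz jkl mno\r\nqwe rty\r"

print(s.split("\n"))                # ['abc def', 'xyz jkl mno\r', 'qwe rty\r']
print(s.splitlines())               # ['abc def', 'xyz jkl mno', 'qwe rty']
print(s.splitlines(keepends=True))  # ['abc def\n', 'xyz jkl mno\r\n', 'qwe rty\r']
```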

### Boolean methods

Methods whose names start with `is` test a property of the string and return `True` or `False`.

In [37]:
[n for n in dir(str) if "is" in n]

['isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper']

In [39]:
help(str.isdigit)

Help on method_descriptor:

isdigit(self, /) unbound builtins.str method
    Return True if the string is a digit string, False otherwise.

    A string is a digit string if all characters in the string are digits and there
    is at least one character in the string.



In [40]:
s1 = "12321"
s2 = "123abc"
s1.isdigit(), s2.isdigit()

(True, False)
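
Note that `isdigit` looks only at the characters, so a sign, a decimal point, or a space makes it return `False`. If the goal is to check whether a string can be converted to an integer, one possible approach is to attempt the conversion. In the sketch below, `is_int_literal` is a hypothetical helper, not part of `str`:

```python
for text in ["12321", "-5", "3.14", "1 000", ""]:
    print(repr(text), text.isdigit())  # only '12321' gives True

def is_int_literal(text):
    """Hypothetical helper: True if int(text) succeeds."""
    try:
        int(text)
    except ValueError:
        return False
    return True

print(is_int_literal("-5"))    # True
print(is_int_literal("3.14"))  # False
```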

In [43]:
s = "hello --pretty-- world!"
new_s = ""
for c in s:
    if c.isalpha() or c.isspace():
        new_s += c
new_s

'hello pretty world'
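
The same filtering can be written more compactly by combining the Boolean methods with `join`. This is an alternative sketch of the loop above, not a change in behavior:

```python
s = "hello --pretty-- world!"

# Keep only letters and whitespace, then glue the kept characters together.
new_s = "".join(c for c in s if c.isalpha() or c.isspace())
print(new_s)  # hello pretty world
```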

In [44]:
help(str.isidentifier)

Help on method_descriptor:

isidentifier(self, /) unbound builtins.str method
    Return True if the string is a valid Python identifier, False otherwise.

    Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
    such as "def" or "class".



In [46]:
"1xyz".isidentifier(), "xyz1".isidentifier()

(False, True)

In [47]:
import keyword
keyword.iskeyword("while")

True
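
Because keywords such as `while` pass the `isidentifier` test, the two checks are usually combined. A minimal sketch, where `is_usable_name` is a hypothetical helper:

```python
import keyword

def is_usable_name(name):
    """Hypothetical helper: valid identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["xyz1", "1xyz", "while", "my_var"]:
    print(name, is_usable_name(name))
```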

# String formatting

String formatting builds a new string by substituting values into a template string.

The f-string format uses *replacement fields* inside a string, delimited with curly braces.

In [3]:
name = "Kaan"
f"Hello {name}!"

'Hello Kaan!'

The f-string format allows computations inside replacement fields.

In [4]:
a = 5
b = 10
f"{a} plus {b} is {a+b}"

'5 plus 10 is 15'

You can control how a value is displayed with a format specifier, written after a colon inside the replacement field:

In [18]:
from math import pi
f"Pi to three decimal places is {pi:.3f}"

'Pi to three decimal places is 3.142'

In [13]:
n = 4
f"The number n is {n:03d}" # padding with zeros

'The number n is 004'
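
A few other format specifiers that come up often are sketched below. The variable names and values are made up for illustration.

```python
population = 1234567
ratio = 0.256
value = 255

print(f"{population:,}")  # 1,234,567  (thousands separator)
print(f"{ratio:.1%}")     # 25.6%      (percentage with one decimal)
print(f"{value:x}")       # ff         (hexadecimal)
print(f"{value:08b}")     # 11111111   (binary, zero-padded to width 8)
```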

Aligning the text and specifying a width:

In [23]:
text = "Hello World"
f"|{text:<30}|" # left aligned in a space of 30

'|Hello World                   |'

In [24]:
f"|{text:>30}|" # right aligned in a space of 30

'|                   Hello World|'

In [25]:
f"|{text:^30}|" # centered in a space of 30

'|         Hello World          |'

In [26]:
f"|{text:*^30}|" # use * as a fill character

'|*********Hello World**********|'
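
Widths and alignment are handy for printing data in columns. The following sketch lines up a small table; the rows are made-up sample data.

```python
rows = [("apple", 3, 0.5), ("watermelon", 12, 2.25)]  # made-up sample data

# Header: text left-aligned in 12 columns, numbers right-aligned in 4 and 8.
lines = [f"{'Item':<12}{'Qty':>4}{'Price':>8}"]
for name, qty, price in rows:
    lines.append(f"{name:<12}{qty:>4}{price:>8.2f}")

print("\n".join(lines))
```

Each line is 24 characters wide, so the columns stay aligned regardless of the length of the item names (up to 12 characters).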

The `str.format()` method can be used to substitute values in a template string.

In [27]:
text = "Name: {0}, Phone: {1}, email: {2}" # positional matching
text.format("Kaan", "216-333 33 33", "kaan@mail.com")

'Name: Kaan, Phone: 216-333 33 33, email: kaan@mail.com'

In [28]:
text = "Name: {name}, Phone: {phone}, email: {email}"  # keyword matching
text.format(name="Kaan", phone="216-333 33 33", email="kaan@mail.com")

'Name: Kaan, Phone: 216-333 33 33, email: kaan@mail.com'

If the data is stored in a dictionary, unpacking it with `**` passes its items as keyword arguments:

In [30]:
datadic = {"name":"Kaan", "phone":"216-333 33 33", "email":"kaan@mail.com"}
text.format(**datadic)

'Name: Kaan, Phone: 216-333 33 33, email: kaan@mail.com'
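
One advantage of `str.format()` over f-strings is that the template can be defined once and reused for many records. The sketch below uses a second, made-up contact for illustration; `str.format_map` accepts the dictionary directly, without unpacking.

```python
template = "Name: {name}, Phone: {phone}, email: {email}"

contacts = [
    {"name": "Kaan", "phone": "216-333 33 33", "email": "kaan@mail.com"},
    {"name": "Ada", "phone": "000-000 00 00", "email": "ada@example.com"},  # made-up record
]

for contact in contacts:
    print(template.format(**contact))

# format_map takes the mapping itself and gives the same result.
print(template.format_map(contacts[0]))
```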