# String Manipulation

One place where the Python language really shines is in the manipulation of strings. This section covers some of Python's built-in string methods and formatting operations, then moves on to a quick guide to the extremely useful subject of regular expressions. Such string manipulation patterns come up often in data science work and are one big perk of Python in this context.

### Basics

In [41]:
hello = "Hello, World!"
print(hello[0])
print(hello[-1])
print(hello[2:10])
print(hello[::-1])

H
!
llo, Wor
!dlroW ,olleH


### Adjusting case

In [9]:
fox = "tHe qUICk bROWn fOx."

print(fox.lower())
print(fox.upper())
print(fox.title())
print(fox.swapcase())
print(fox.capitalize())


the quick brown fox.
THE QUICK BROWN FOX.
The Quick Brown Fox.
ThE QuicK BrowN FoX.
The quick brown fox.


### Adding and removing spaces

In [13]:
line = '         this is the content         '

print(line.strip())
print(line.rstrip())
print(line.lstrip())



this is the content
         this is the content
this is the content         


In [14]:

num = "000000000000435"
num.strip('0')


'435'
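
The cells above only remove characters. For the opposite operation, here is a minimal sketch using the built-in `center`, `ljust`, `rjust`, and `zfill` methods. The sample values and the variable names (`content`, `padded`, and so on) are illustrative assumptions, not part of the original notebook.

```python
content = "this is the content"  # 19 characters

# Pad with spaces to a total width of 30 characters.
centered = content.center(30)
left = content.ljust(30)
right = content.rjust(30)
print(repr(centered))
print(repr(left))
print(repr(right))

# Pad a numeric string with leading zeros to a width of 10.
padded = "435".zfill(10)
print(padded)                # 0000000435
print("435".rjust(10, "0"))  # 0000000435
```

Note that `strip('0')` removes zeros from both ends of the string, so `lstrip('0')` is the safer choice when only leading zeros should go.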

### Splitting and partitioning strings

In [42]:
line = 'the quick brown fox jumped over a lazy dog'
line.partition('fox')

('the quick brown ', 'fox', ' jumped over a lazy dog')

In [43]:
line.split()

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

### Finding and replacing substrings

In [15]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [17]:
line.index('fox')

16
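
`find` and `index` return the same position when the substring is present. They differ when it is missing, which the following short sketch illustrates (the search term `'bear'` is just an example value):

```python
line = 'the quick brown fox jumped over a lazy dog'

# find() signals a missing substring by returning -1.
missing = line.find('bear')
print(missing)  # -1

# index() raises ValueError instead of returning a sentinel.
try:
    line.index('bear')
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Use `find` when a missing substring is an expected case, and `index` when it should be treated as an error.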

In [18]:
line.endswith('dog')

True

In [19]:
line.startswith('fox')

False

In [20]:
line.replace('brown', 'red')

'the quick red fox jumped over a lazy dog'

In [21]:
line.replace('o', '--')

'the quick br--wn f--x jumped --ver a lazy d--g'

In [22]:
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

haiku.splitlines()

['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']

In [23]:
'--'.join(['1', '2', '3'])

'1--2--3'

In [24]:
print("\n".join(['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']))

matsushima-ya
aah matsushima-ya
matsushima-ya


### Format Strings

In [25]:
pi = 3.14159
str(pi)

'3.14159'

In [26]:
"The value of pi is " + str(pi)

'The value of pi is 3.14159'

In [30]:
"The value of pi is {}".format(pi)

'The value of pi is 3.14159'

In [None]:
f"The value of pi is {pi}"

'The value of pi is 3.14159'

In [28]:
"""First letter: {0}. Last letter: {1}.""".format('A', 'Z')

'First letter: A. Last letter: Z.'

In [31]:
"""First letter: {first}. Last letter: {last}.""".format(last='Z', first='A')

'First letter: A. Last letter: Z.'

In [33]:
"pi = {0:.2f}".format(pi)

'pi = 3.14'
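
The same format specification can be used inside an f-string. This is a small sketch of that equivalence; the width and alignment values in the second example are arbitrary choices for illustration.

```python
pi = 3.14159

# The ".2f" format specification works the same way inside an f-string.
short = f"pi = {pi:.2f}"
print(short)  # pi = 3.14

# Width and alignment can be combined with precision:
# right-align in a field of 10 characters, 3 decimal places.
wide = f"[{pi:>10.3f}]"
print(wide)   # [     3.142]
```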

## Regular Expressions

Fundamentally, regular expressions are a means of flexible pattern matching in strings. If you frequently use the command line, you are probably familiar with this type of flexible matching with the "*" character, which acts as a wildcard.

For example, we can list all the IPython notebooks (i.e., files with extension .ipynb) by using the "*" wildcard to match any characters before the extension:

In [45]:
!ls *.ipynb

Strings.ipynb
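
Regular expressions generalize this wildcard idea. As a rough sketch, the shell pattern `*.ipynb` can be approximated by the regular expression `.*\.ipynb`. The file names below are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical file names, used only for illustration.
filenames = ["Strings.ipynb", "notes.txt", "Regex.ipynb", "ipynb_backup.zip"]

# Rough regex counterpart of the shell pattern *.ipynb:
# ".*" matches any run of characters, "\." matches a literal dot.
notebook = re.compile(r".*\.ipynb")

# fullmatch() requires the whole name to fit the pattern.
notebooks = [name for name in filenames if notebook.fullmatch(name)]
print(notebooks)  # ['Strings.ipynb', 'Regex.ipynb']
```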


In [48]:
import re
regex = re.compile(r'\s+')
regex.split(line)


['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

In [49]:
for s in ["     ", "abc  ", "  abc"]:
    if regex.match(s):
        print(repr(s), "matches")
    else:
        print(repr(s), "does not match")

'     ' matches
'abc  ' does not match
'  abc' matches
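
The middle string fails because `match` only looks for the pattern at the beginning of the string. A short sketch contrasting it with `search` and `fullmatch`, reusing the same whitespace pattern:

```python
import re

regex = re.compile(r'\s+')
s = "abc  "

# match() anchors at position 0, where there is no whitespace.
print(regex.match(s))      # None

# search() scans the whole string and finds the trailing spaces.
m = regex.search(s)
print(m.start(), m.end())  # 3 5

# fullmatch() succeeds only if the entire string fits the pattern.
only_spaces = regex.fullmatch("     ") is not None
print(only_spaces)         # True
```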


In [50]:
line = 'the quick brown fox jumped over a lazy dog'

regex = re.compile('fox')
match = regex.search(line)
match.start()


16

In [51]:
regex.sub('BEAR', line)

'the quick brown BEAR jumped over a lazy dog'

### A more sophisticated example

In [52]:
email = re.compile(r'\w+@\w+\.[a-z]{3}')

In [53]:
text = "To email Guido, try guido@python.org or the older address guido@google.com."
email.findall(text)

['guido@python.org', 'guido@google.com']

In [54]:
email.sub('--@--.--', text)

'To email Guido, try --@--.-- or the older address --@--.--.'

In [55]:
email.findall('barack.obama@whitehouse.gov')

['obama@whitehouse.gov']

This goes to show how unforgiving regular expressions can be if you're not careful! If you search around online, you can find some suggestions for regular expressions that will match all valid emails, but beware: they are much more involved than the simple expression used here!
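
One possible adjustment, shown here only as a sketch, is to allow periods in the part of the address before the "@". The name `email2` and the test addresses are assumptions made for this example, and the pattern is still far from matching every valid address.

```python
import re

# Assumption: allowing "." alongside word characters in the local part
# is enough for this example; it is not a complete email pattern.
email2 = re.compile(r'[\w.]+@\w+\.[a-z]{3}')

found = email2.findall('barack.obama@whitehouse.gov')
print(found)  # ['barack.obama@whitehouse.gov']

# The pattern still requires exactly three lowercase letters after the
# final dot, so a two-letter domain suffix is missed entirely.
missed = email2.findall('someone@example.io')
print(missed)  # []
```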

Finally, to explore regular expressions in more depth, see [Python's official regular expression HOWTO](https://docs.python.org/3/howto/regex.html) and Python's [`re` module documentation](https://docs.python.org/3/library/re.html).