*This notebook contains an excerpt from the [Whirlwind Tour of Python](http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/WhirlwindTourOfPython).*

# String Manipulation

Strings in Python can be defined using either single or double quotes (the two forms are functionally equivalent):

In [1]:
x = 'a string'
y = "a string"
x == y

True

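A quote character that matches the delimiter ends the string early, so an apostrophe inside a single-quoted string is a syntax error:

In [6]:
'It's a good day'

SyntaxError: EOL while scanning string literal (<ipython-input-6-b23198b62f1a>, line 1)

Delimiting the string with the other kind of quote avoids the problem:

In [5]: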
'He said "good bye"'

'He said "good bye"'
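
When a string has to contain the same kind of quote that delimits it, one option is to escape that quote with a backslash. The following is a minimal sketch; the variable names are illustrative and not part of the original text:

```python
# Escape the quote character that matches the delimiters
single = 'It\'s a good day'
double = "He said \"good bye\""

# Or choose the delimiter that avoids escaping altogether
same_single = "It's a good day"
same_double = 'He said "good bye"'

# Both spellings produce identical strings
print(single == same_single)
print(double == same_double)
```

Choosing the other delimiter is usually easier to read; escaping is mainly useful when a string contains both kinds of quote.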

In addition, it is possible to define multi-line strings using a triple-quote syntax:

In [7]:
multiline = """
one
two
three
"""

In [8]:
print(multiline)


one
two
three
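
The blank lines in this output come from the newline characters that directly follow the opening quotes and precede the closing ones. As a small sketch of how to inspect and clean this up (it borrows ``strip()`` and ``splitlines()``, which are covered later in this section):

```python
multiline = """
one
two
three
"""

# repr() makes the embedded newline characters visible
print(repr(multiline))

# strip() drops the leading and trailing newlines before splitting into lines
lines = multiline.strip().splitlines()
print(lines)
```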



### Formatting strings: Adjusting case

Python makes it quite easy to adjust the case of a string.
Here we'll look at the ``upper()``, ``lower()``, ``capitalize()``, ``title()``, and ``swapcase()`` methods, using the following messy string as an example:

In [9]:
fox = "tHe qUICk bROWn fOx."

In [19]:
fox.upper()

'THE QUICK BROWN FOX.'

In [11]:
fox.lower()

'the quick brown fox.'

In [12]:
fox.title()

'The Quick Brown Fox.'

In [7]:
fox.capitalize()

'The quick brown fox.'

In [13]:
fox.swapcase()

'ThE QuicK BrowN FoX.'
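
One common use of these methods is comparing strings without regard to case. The sketch below assumes a second, hypothetical string ``other`` that differs from ``fox`` only in capitalization:

```python
fox = "tHe qUICk bROWn fOx."
other = "The Quick Brown Fox."  # hypothetical second string for comparison

# The raw strings differ, but their lowercased forms match
same_raw = fox == other
same_lowered = fox.lower() == other.lower()

print(same_raw, same_lowered)
```

For text beyond plain ASCII, ``str.casefold()`` is a more aggressive alternative to ``lower()`` that is intended for caseless matching.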

### Formatting strings: Adding and removing spaces

Another common need is to remove spaces (or other characters) from the beginning or end of a string.
The basic tool for this is the ``strip()`` method, which by default removes whitespace from both ends of the string:

In [20]:
line = '         this is the content         '
line.strip()

'this is the content'

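To remove specific characters instead, pass them to ``strip()``; the ``lstrip()`` and ``rstrip()`` variants remove characters from only the left or only the right side:

In [21]: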
num = "000000000000435"
num.strip('0')

'435'

In [25]:
num = "0000000000004350"
num.lstrip('0')

'4350'
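
The opposite operation, adding spaces or other fill characters, can be sketched with the built-in ``center()``, ``ljust()``, ``rjust()``, and ``zfill()`` methods. The width of 30 and the variable names below are arbitrary choices for illustration:

```python
line = "this is the content"

# center(), ljust(), and rjust() pad with spaces up to the requested width
centered = line.center(30)
left = line.ljust(30)
right = line.rjust(30)

# A fill character can be given as the second argument
padded = "435".rjust(10, "0")

# zfill() is a shortcut for left-padding with zeros
zero_filled = "435".zfill(10)

print(repr(centered))
print(repr(left))
print(repr(right))
print(padded, zero_filled)
```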

### Finding and replacing substrings

``find()`` and ``index()`` are very similar: both search for the first occurrence of a character or substring within a string, and return the index at which it starts:

In [27]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [28]:
line.index('fox')

16

The only difference between ``find()`` and ``index()`` is their behavior when the search string is not found; ``find()`` returns ``-1``, while ``index()`` raises a ``ValueError``:

In [20]:
line.find('bear')

-1

In [29]:
line.index('bear')

ValueError: substring not found
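
In practice, the choice between these methods depends on how a missing substring should be handled. The following sketch shows three common patterns; the variable names are illustrative:

```python
line = 'the quick brown fox jumped over a lazy dog'

# Membership test: no index needed, no exception raised
has_bear = 'bear' in line

# find() signals "missing" with -1, so compare explicitly
# (a bare truth test would be wrong: index 0 is falsy, -1 is truthy)
pos = line.find('bear')
found = pos != -1

# index() signals "missing" by raising ValueError
try:
    idx = line.index('bear')
except ValueError:
    idx = None

print(has_bear, found, idx)
```

If only a yes/no answer is needed, the ``in`` operator is usually the clearest option.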

For the special case of checking for a substring at the beginning or end of a string, Python provides the ``startswith()`` and ``endswith()`` methods:

In [30]:
line.endswith('dog')

True

In [31]:
line.startswith('fox')

False

To go one step further and replace a given substring with a new string, you can use the ``replace()`` method.
Here, let's replace ``'brown'`` with ``'red'``:

In [33]:
line.replace('brown', 'red')

'the quick red fox jumped over a lazy dog'

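``replace()`` returns a new string in which every occurrence of the search string has been replaced:

In [34]: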
line.replace('o', '--')

'the quick br--wn f--x jumped --ver a lazy d--g'
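
Two details of ``replace()`` are worth sketching: the original string is left unchanged, and an optional third argument limits the number of replacements. The variable names here are illustrative:

```python
line = 'the quick brown fox jumped over a lazy dog'

# replace() returns a new string; `line` itself is not modified
replaced_all = line.replace('o', '--')

# An optional third argument limits how many occurrences are replaced
replaced_first = line.replace('o', '--', 1)

print(replaced_first)
print(line)
```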

### Splitting and partitioning strings



The ``split()`` method finds *all* instances of the split-point and returns the substrings in between.
The default is to split on any whitespace, returning a list of the individual words in a string:

In [35]:
line

'the quick brown fox jumped over a lazy dog'

In [38]:
line.split()

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

To split on something other than whitespace, pass the split-point as an argument:

In [42]:
line.split('o')

['the quick br', 'wn f', 'x jumped ', 'ver a lazy d', 'g']

In [43]:
'2020-04-09'.split('-')

['2020', '04', '09']
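
Where ``split()`` finds every instance of the split-point, the related ``partition()`` method stops at the first one and also returns the separator itself. A minimal sketch, with illustrative variable names:

```python
line = 'the quick brown fox jumped over a lazy dog'

# partition() splits at the first occurrence only and returns a 3-tuple:
# (text before, the separator itself, text after)
before, sep, after = line.partition('fox')

# rpartition() does the same, but searches from the right
r_before, r_sep, r_after = line.rpartition(' ')

# split() with a maxsplit argument stops after that many splits
first_two = line.split(' ', 2)

print((before, sep, after))
print(r_after)
print(first_two)
```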

A related method is ``splitlines()``, which splits on newline characters.
Let's do this with a haiku popularly attributed to the 17th-century poet Matsuo Bashō:

In [36]:
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

haiku.splitlines()

['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']

Note that if you would like to undo a ``split()``, you can use the ``join()`` method, which is called on the split-point string and returns a single string built by placing it between the items of an iterable:

In [30]:
'--'.join(['1', '2', '3'])

'1--2--3'

A common pattern is to use the special character ``"\n"`` (newline) to join together lines that have been previously split, and recover the input:

In [41]:
print("\n".join(['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']))

matsushima-ya
aah matsushima-ya
matsushima-ya
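
The round trip can be checked directly. The sketch below also covers one caveat: ``join()`` accepts only strings, so other values have to be converted first. The ``numbers`` list is a hypothetical example:

```python
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

# Splitting into lines and joining with "\n" recovers the original text
lines = haiku.splitlines()
rebuilt = "\n".join(lines)
print(rebuilt == haiku)

# join() requires string items, so convert other values with str() first
numbers = [1, 2, 3]
joined = "--".join(str(n) for n in numbers)
print(joined)
```

The round trip is exact here because the haiku has no trailing newline; ``splitlines()`` would discard one, and ``join()`` would not restore it.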


## Format Strings

In the preceding methods, we have learned how to extract values from strings, and to manipulate strings themselves into desired formats.
Another use of string methods is to manipulate string *representations* of values of other types.
Of course, string representations can always be found using the ``str()`` function; for example:

In [44]:
pi = 3.14159
str(pi)

'3.14159'

In [45]:
"The value of pi is " + str(pi)

'The value of pi is 3.14159'

A more flexible way to do this is to use *format strings*, which are strings with special markers (denoted by curly braces) into which string-formatted values will be inserted.
Here is a basic example:

In [4]:
"The value of pi is {0}".format(pi)

'The value of pi is 3.14159'

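In Python 3.6 and later, the same result can be written as an *f-string*: a string literal prefixed with ``f``, in which expressions inside the braces are evaluated in place:

In [46]: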
f"The value of pi is {pi}"

'The value of pi is 3.14159'

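A format specifier after a colon controls how the value is rendered. Here ``.3f`` requests a floating-point number with three digits after the decimal point, and the same specifier works with both ``format()`` and f-strings:

In [47]: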
"pi = {0:.3f}".format(pi)

'pi = 3.142'

In [48]:
f"pi = {pi:.3f}"

'pi = 3.142'
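
Format strings can also take several arguments, refer to them by name, and control field width and alignment. The following is a brief sketch of these options; the example sentences and variable names are illustrative only:

```python
pi = 3.14159

# Numbers in the braces are the positions of the arguments to format()
positional = "First letter: {0}. Last letter: {1}.".format('A', 'Z')

# Named markers are filled from keyword arguments, in any order
named = "First letter: {first}. Last letter: {last}.".format(last='Z', first='A')

# A width before the precision pads the field; '<' left-aligns it
right_aligned = f"[{pi:10.3f}]"
left_aligned = f"[{pi:<10.3f}]"

# A leading 0 in the width pads integers with zeros
zero_padded = f"{42:05d}"

print(positional)
print(named)
print(right_aligned, left_aligned, zero_padded)
```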