### Vectorized String Operations
One strength of Python is its relative ease in handling and manipulating string data.
Pandas builds on this and provides a comprehensive set of vectorized string operations that become an essential piece of the type of munging required when one is working with (read: cleaning up) real-world data.
In this section, we’ll walk through some of the Pandas string operations, and then take a look at using them to partially clean up a very messy dataset of recipes collected from the Internet.

### Introducing Pandas String Operations
We saw in previous sections how tools like NumPy and Pandas generalize arithmetic operations so that we can easily and quickly perform the same operation on many array elements. For example:

In [1]:
import numpy as np
import pandas as pd

In [2]:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x * 2

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

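For a list of strings, plain Python offers no such shorthand, so we fall back on a more verbose loop syntax:

In [3]: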
data = ['peter', 'Paul', 'MARY', 'gUIDO']
[s.capitalize() for s in data]

['Peter', 'Paul', 'Mary', 'Guido']

This is perhaps sufficient to work with some data, but it will break if there are any
missing values. For example:

In [6]:
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
[s.capitalize() for s in data]

AttributeError: 'NoneType' object has no attribute 'capitalize'
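
Without Pandas, one workaround is to guard each element by hand. The following is only a minimal sketch of that idea; the helper name `capitalize_or_none` is hypothetical and is not part of Python or Pandas:

```python
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

def capitalize_or_none(s):
    # Hypothetical helper: pass non-string (missing) values through untouched
    # instead of calling a str method on None
    return s.capitalize() if isinstance(s, str) else s

capitalized = [capitalize_or_none(s) for s in data]
print(capitalized)
```

This works, but the same guard has to be repeated for every string operation we want to apply.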

Pandas addresses both needs, vectorized string operations and correct handling of
missing data, through the str attribute of Pandas Series and Index objects
containing strings.

So, for example, suppose we create a Pandas Series with this data:

In [7]:
# Creating a pandas Series
series = pd.Series(data)
series

0    peter
1     Paul
2     None
3     MARY
4    gUIDO
dtype: object

We can now call a single method that will capitalize all the entries, while skipping over any missing values:

In [8]:
series.str.capitalize()

0    Peter
1     Paul
2     None
3     Mary
4    Guido
dtype: object
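
The same `str` attribute is also available on Index objects, as mentioned above. A small sketch, assuming a made-up index of labels used purely for illustration:

```python
import pandas as pd

# Hypothetical labels, not taken from the dataset used elsewhere in this section
idx = pd.Index(['peter', 'Paul', 'MARY'])

# The str accessor applies the method to every label and returns a new Index
upper_idx = idx.str.upper()
print(list(upper_idx))
```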

### Tables of Pandas String Methods
If you have a good understanding of string manipulation in Python, most of Pandas’
string syntax is intuitive enough that it’s probably sufficient to just list a table of
available methods; we will start with that here, before diving deeper into a few of the
subtleties.

The examples in this section use the following series of names:

In [9]:
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam', 'Eric Idle', 'Terry Jone', 'Michael Palin'])

In [11]:
monte.str.lower()

0    graham chapman
1       john cleese
2     terry gilliam
3         eric idle
4        terry jone
5     michael palin
dtype: object

In [12]:
monte.str.len()

0    14
1    11
2    13
3     9
4    10
5    13
dtype: int64

In [15]:
monte.str.startswith('T')

0    False
1    False
2     True
3    False
4     True
5    False
dtype: bool

In [20]:
monte.str.split()

0    [Graham, Chapman]
1       [John, Cleese]
2     [Terry, Gilliam]
3         [Eric, Idle]
4        [Terry, Jone]
5     [Michael, Palin]
dtype: object
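
Because each result is itself a Series, the `str` methods can be chained. As a sketch, assuming we want the last word of each name, `str.get` can pick an item out of the lists produced by `split()`. The variable name `last_names` is only illustrative:

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jone', 'Michael Palin'])

# split() yields a list per element; .str.get(-1) takes the last item of each list
last_names = monte.str.split().str.get(-1)
print(last_names.tolist())
```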

### Methods Using Regular Expressions
In addition, there are several methods that accept regular expressions to examine the
content of each string element, and follow some of the API conventions of Python’s
built-in re module.

For example, we can extract the first name from each element by asking for a contiguous group of characters at the beginning of each element:

In [21]:
monte.str.extract('([A-Za-z]+)')

         0
0   Graham
1     John
2    Terry
3     Eric
4    Terry
5  Michael
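
With a single capture group, `extract` can also return a Series rather than a one-column DataFrame if `expand=False` is passed. A minimal sketch of that variant, with `first_names` as an illustrative variable name:

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jone', 'Michael Palin'])

# expand=False with one capture group gives back a Series of the matched text
first_names = monte.str.extract(r'([A-Za-z]+)', expand=False)
print(first_names.tolist())
```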


Or we can do something more complicated, like finding all names that start and end with a consonant, making use of the start-of-string (^) and end-of-string ($) regular expression characters:

In [23]:
monte.str.findall(r'^([^AEIOU].*[^aeiou]$)')

0    [Graham Chapman]
1                  []
2     [Terry Gilliam]
3                  []
4                  []
5     [Michael Palin]
dtype: object
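
If the goal is to keep only the matching names rather than to list the matches, one option is to build a Boolean mask with `str.contains` and the same pattern. This is a sketch of one possible approach, not the only way to filter; `mask` and `matching` are illustrative names:

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jone', 'Michael Palin'])

# True where the name does not start with an uppercase vowel
# and does not end with a lowercase vowel
mask = monte.str.contains(r'^[^AEIOU].*[^aeiou]$')

# Boolean indexing keeps only the rows where the mask is True
matching = monte[mask]
print(matching.tolist())
```

Note that the character classes exclude vowels rather than require consonants, so a name that starts or ends with a digit or punctuation mark would also pass.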