---
title: Strings in Pandas
tags: [jupyter]
keywords: pandas
summary: "Manipulating strings in Pandas."
mlType: dataFrame
infoType: pandas
sidebar: pandas_sidebar
permalink: __AutoGenThis__
notebookfilename:  __AutoGenThis__
---

This is an overview of various [string](https://jakevdp.github.io/PythonDataScienceHandbook/03.10-working-with-strings.html) manipulations you can do in pandas.  It is from the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.10-working-with-strings.html).

In [1]:
import sys

sys.path.append("../")

In [2]:
import pandas as pd
from pprint import pprint

# Pandas Options

In [3]:
pd.set_option('display.max_rows', 5)

# Vectorizing Strings

## Common String Methods

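Series.**str.**{_any_method_below}

The ```.str``` accessor lives on a pandas Series (and Index), not on a whole DataFrame, so apply it to a single column at a time.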

![](https://drive.google.com/uc?id=1qh4qWXfxG85uCYfxndKIpUVeBBbkkr67)

![](https://drive.google.com/uc?id=1atMNiO_MtrnzoX5bJ-7QlN02tGbukzQP)
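As a quick illustration of the tables above, here is a minimal sketch that calls a few of the listed methods on a small sample Series (the variable name ```sample``` is just for this example). Each call runs the matching Python string method on every element and returns a new Series.

```python
import pandas as pd

# Small hypothetical sample, borrowed from the names used later in this notebook
sample = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

lower = sample.str.lower()               # element-wise str.lower()
lengths = sample.str.len()               # element-wise len()
starts_t = sample.str.startswith('T')    # element-wise str.startswith()
first_names = sample.str.split().str[0]  # split on whitespace, then take the first piece

print(lower.tolist())
print(lengths.tolist())
print(starts_t.tolist())
print(first_names.tolist())
```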

In [4]:
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

In [5]:
data

['peter', 'Paul', None, 'MARY', 'gUIDO']

In [6]:
names = pd.Series(data)
names

0    peter
1     Paul
2     None
3     MARY
4    gUIDO
dtype: object

In plain Python, to capitalize the names we would loop over the list and call ```.capitalize()``` on each one. This is intuitive, but it breaks on missing values, and pandas lets us vectorize the operation instead.

In [7]:
for s in data:
    print(s.capitalize())    

Peter
Paul


AttributeError: 'NoneType' object has no attribute 'capitalize'

The loop fails on the third element because it is ```None``` (a **NoneType**), which has no ```capitalize``` method. To get past it in plain Python you would need a ```try```/```except``` block or an explicit check for ```None```.
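For comparison, a minimal plain-Python sketch of both workarounds (no pandas involved). Either way you have to handle the missing value yourself, which is exactly the bookkeeping the ```.str``` accessor takes care of.

```python
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

# Workaround 1: try/except around the call that fails on None
via_try = []
for s in data:
    try:
        via_try.append(s.capitalize())
    except AttributeError:
        # None has no .capitalize(); keep the missing value as-is
        via_try.append(None)

# Workaround 2: skip None explicitly with a conditional expression
via_check = [s.capitalize() if s is not None else None for s in data]

print(via_try)
print(via_check)
```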

**Recall** that any pandas Series of strings exposes Python's string methods through the ```.str``` accessor: pick the method you want and pandas applies it to every element, skipping missing values such as ```None```.

In [8]:
names.str.capitalize()

0    Peter
1     Paul
2     None
3     Mary
4    Guido
dtype: object
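A short sketch of two related behaviors: missing values pass through ```.str``` methods instead of raising, and because each ```.str``` call returns a new Series you can chain them. The exact missing-value marker (```None``` or ```NaN```) can differ between pandas versions, so the checks below use ```isna()```.

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', None, 'MARY', 'gUIDO'])

# The missing entry becomes a missing value in the result instead of an error
lengths = names.str.len()
print(lengths)

# Chaining: uppercase every name, then slice the first three characters
upper_first3 = names.str.upper().str[:3]
print(upper_first3.tolist())
```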

In [9]:
Names = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

In [10]:
Names.head()

0    Graham Chapman
1       John Cleese
2     Terry Gilliam
3         Eric Idle
4       Terry Jones
dtype: object

In [11]:
Names.str.contains('Jo')

0    False
1     True
     ...  
4     True
5    False
Length: 6, dtype: bool

Almost every built-in Python string method has a ```.str``` counterpart, so code like this stays short and readable.
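A common next step is to use the boolean Series returned by ```str.contains``` as a mask to select rows. The sketch below also shows the ```case=False``` option for case-insensitive matching.

```python
import pandas as pd

Names = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

mask = Names.str.contains('Jo')  # boolean Series, one entry per name
print(Names[mask])               # keep only the rows where mask is True

# case=False ignores upper/lower case when matching
terrys = Names[Names.str.contains('terry', case=False)]
print(terrys.tolist())
```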

## Regular Expression Methods

### Cheat Sheets for Regex

![](https://i.pinimg.com/originals/8e/31/b3/8e31b3e0d907cd3a101f63a2a4330e21.png)

![](https://image.slidesharecdn.com/regex-cheatsheet-110519201612-phpapp01/95/regex-cheatsheet-1-728.jpg?cb=1305836207)

You can also apply **regular expressions** to every string in a Series with the methods below.

![](https://drive.google.com/uc?id=1QBcvpxeNYK22-KkcbjpIgZ_8VdzTp8Gr)
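A minimal sketch of a few of the regex-aware methods from the table above, run on the same six names used in this notebook. The patterns themselves are just examples.

```python
import pandas as pd

Names = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

starts_with_t = Names.str.match(r'T')                     # regex anchored at the start of each string
vowel_counts = Names.str.count(r'[aeiouAEIOU]')           # number of regex matches per string
underscored = Names.str.replace(r'\s+', '_', regex=True)  # regex substitution
capitals = Names.str.findall(r'[A-Z]')                    # list of every match per string

print(starts_with_t.tolist())
print(vowel_counts.tolist())
print(underscored.tolist())
print(capitals.tolist())
```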

In [13]:
Names.str.extract(r'([A-Za-z]+er[A-Za-z]+)', expand=False)

0      NaN
1      NaN
     ...  
4    Terry
5      NaN
Length: 6, dtype: object

Let's say we want to identify all names that start and end with a consonant.

We can use the ```^``` and ```$``` anchors to match the start and end of each string.

The regular expression would be

```
r'^[^AEIOU].*[^aeiou]$'
```

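Reading this regex from left to right:

- **r''** marks a Python raw string, so backslashes reach the regex engine unchanged
- **^** anchors the match at the start of the string
- **[^AEIOU]** any one character that is *not* an uppercase vowel (inside brackets, a leading ```^``` negates the character set)
- **.\*** zero or more characters of any kind in between
- **[^aeiou]** any one character that is *not* a lowercase vowel
- **$** anchors the match at the end of the string

Because every name here starts with a capital letter and ends with a lowercase one, these two character sets are enough to pick out names that start and end with a consonant.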
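To check the pattern piece by piece, here is a minimal sketch using Python's built-in ```re``` module on individual strings. The lowercase string ```'adam'``` is a made-up input that shows the pattern's built-in assumption: it only rejects *uppercase* vowels at the start, so a lowercase vowel slips through.

```python
import re

# Same pattern as in the text, compiled once for reuse
pattern = re.compile(r'^[^AEIOU].*[^aeiou]$')

# 'adam' is hypothetical: it starts with a vowel, but a lowercase one,
# so [^AEIOU] still matches it
samples = ['Graham Chapman', 'John Cleese', 'Eric Idle', 'adam']
results = {name: bool(pattern.search(name)) for name in samples}
print(results)
```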

In [14]:
Names.str.extract(r'(^[^AEIOU].*[^aeiou]$)',expand=False)

0    Graham Chapman
1               NaN
          ...      
4       Terry Jones
5     Michael Palin
Length: 6, dtype: object
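If you only need to know *which* names match rather than the matched text, the same pattern can be passed to ```str.contains``` to build a boolean mask and filter the Series. A minimal sketch:

```python
import pandas as pd

Names = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

pattern = r'^[^AEIOU].*[^aeiou]$'
mask = Names.str.contains(pattern)  # True where the regex matches (regex=True is the default)
consonant_names = Names[mask]
print(consonant_names.tolist())
```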