In [1]:
import numpy as np
import pandas as pd

## String Methods

In [None]:
s = "hello world"
s.replace(" ", "")  # Python's built-in str.replace: removes every space

- str.lower() - converts all characters to lowercase.
- str.upper() - converts all characters to uppercase.
- str.find() - returns the lowest index at which a substring occurs in each string of a Series (-1 if it is not found).
- str.rfind() - returns the highest index at which a substring occurs in each string, i.e. it searches from the right side.
- str.findall() - returns a list of all matches of a regular expression pattern in each string.
- str.replace() - replaces occurrences of a substring or pattern with a value that the user provides.
- str.extract() - extracts the capture groups of the first match of a regular expression pattern into columns.
- str.lstrip() - removes whitespace from the left side (beginning) of each string.
- str.rstrip() - removes whitespace from the right side (end) of each string.
- str.strip() - removes leading and trailing whitespace from each string.
- str.split() - splits each string around a user-specified separator and returns a list of pieces.
- str.join() - joins the elements of each list in a Series with the passed delimiter.
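
The search, extraction and joining methods in the list above can be tried on a small Series. This is a minimal sketch; the Series `s` and its fruit values are hypothetical sample data chosen only to exercise each method.

```python
import pandas as pd

# Hypothetical sample data, chosen only to exercise each method
s = pd.Series(["apple pie", "banana split", "cherry tart"])

first_a = s.str.find("a")                # lowest index of "a" per string (-1 if absent)
last_a = s.str.rfind("a")                # highest index of "a" per string
vowels = s.str.findall(r"[aeiou]")       # list of every regex match per string
words = s.str.extract(r"(\w+) (\w+)")    # groups of the first match -> columns 0 and 1
dashed = s.str.split(" ").str.join("-")  # split into lists, then join each list with "-"

print(first_a.tolist())   # [0, 1, 8]
print(last_a.tolist())    # [0, 5, 8]
print(vowels.tolist())
print(words)
print(dashed.tolist())    # ['apple-pie', 'banana-split', 'cherry-tart']
```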

In [2]:
# Define a dictionary containing employee data 
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

In [3]:
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data) 

In [4]:
df

,Name,Age,Address,Qualification
0,Jai,27,Delhi,Msc
1,Princi,24,Kanpur,MA
2,Gaurav,22,Allahabad,MCA
3,Anuj,32,Kannauj,Phd


In [7]:
df['Name']

0       Jai
1    Princi
2    Gaurav
3      Anuj
Name: Name, dtype: object

In [8]:
'Python'.lower()

'python'

In [13]:
# Manual approach: lowercase each name with Python's built-in str.lower()
l = []
for name in df['Name']:
    l.append(name.lower())

In [14]:
l

['jai', 'princi', 'gaurav', 'anuj']

In [16]:
df['Name'] = df['Name'].str.lower()

In [17]:
df.head(2)

,Name,Age,Address,Qualification
0,jai,27,Delhi,Msc
1,princi,24,Kanpur,MA


In [18]:
df['Name'].str.upper()

0       JAI
1    PRINCI
2    GAURAV
3      ANUJ
Name: Name, dtype: object

In [21]:
df['Name'].apply(lambda x : x.lower())

0       jai
1    princi
2    gaurav
3      anuj
Name: Name, dtype: object
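
One practical difference between `.str.lower()` and `apply(lambda x: x.lower())` shows up with missing values. The sketch below assumes a hypothetical `names` Series containing one `NaN`: the `.str` method passes the missing value through, while the lambda receives a float and fails.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
names = pd.Series(["Jai", np.nan, "Anuj"])

lowered = names.str.lower()  # missing values are passed through as NaN
print(lowered.tolist())

apply_failed = False
try:
    names.apply(lambda x: x.lower())  # NaN is a float, which has no .lower()
except AttributeError:
    apply_failed = True
print("apply raised AttributeError:", apply_failed)
```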

In [22]:
df['Name'].apply(len)

0    3
1    6
2    6
3    4
Name: Name, dtype: int64
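
The `.str` accessor also has a vectorized `len()` method that gives the same lengths as `apply(len)`. A short sketch, rebuilding the `Name` column from the data above:

```python
import pandas as pd

names = pd.Series(["Jai", "Princi", "Gaurav", "Anuj"], name="Name")

lengths = names.str.len()    # vectorized string length
by_apply = names.apply(len)  # the same result via Python's built-in len

print(lengths.tolist())      # [3, 6, 6, 4]
```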

## Splitting and Replacing Data

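- `Series.str.split()` splits each string in a Series around the given separator and returns a list of pieces per element. Python's built-in `str.split()` can only be applied to an individual string. The `n` parameter limits the number of splits, and `expand=True` returns a DataFrame with one column per piece instead of lists.
- `Series.str.replace()` works like Python's built-in `str.replace()` method, but applies it to every element of a Series.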
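
A minimal sketch of `Series.str.replace()` next to the built-in method, using a few of the city names from this notebook. The `regex` argument is passed explicitly because its default changed from `True` to `False` in pandas 2.0; the variable names are illustrative.

```python
import pandas as pd

addresses = pd.Series(["Nagpur", "Kanpur", "Allahabad"])

# Literal substring replacement, applied to every element
literal = addresses.str.replace("pur", "PUR", regex=False)

# Regular-expression replacement: every lowercase vowel becomes "*"
masked = addresses.str.replace(r"[aeiou]", "*", regex=True)

# Python's built-in method works on a single string only
single = "Nagpur".replace("pur", "PUR")

print(literal.tolist())  # ['NagPUR', 'KanPUR', 'Allahabad']
print(masked.tolist())   # ['N*gp*r', 'K*np*r', 'All*h*b*d']
print(single)            # NagPUR
```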

In [23]:
# Define a dictionary containing employee data 
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Knnuaj'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 

In [24]:
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data) 
df

,Name,Age,Address,Qualification
0,Jai,27,Nagpur,Msc
1,Princi,24,Kanpur,MA
2,Gaurav,22,Allahabad,MCA
3,Anuj,32,Knnuaj,Phd


In [25]:
df['Adress1'] = df['Address'].str.split("a")

In [26]:
df

,Name,Age,Address,Qualification,Adress1
0,Jai,27,Nagpur,Msc,"[N, gpur]"
1,Princi,24,Kanpur,MA,"[K, npur]"
2,Gaurav,22,Allahabad,MCA,"[All, h, b, d]"
3,Anuj,32,Knnuaj,Phd,"[Knnu, j]"
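
When `str.split()` produces a column of lists, the `.str` accessor can also index into those lists. A minimal sketch using the same addresses; `first` and `last` are illustrative names:

```python
import pandas as pd

pieces = pd.Series(["Nagpur", "Kanpur", "Allahabad", "Knnuaj"]).str.split("a")

first = pieces.str[0]   # element 0 of each list
last = pieces.str[-1]   # last element of each list

print(first.tolist())   # ['N', 'K', 'All', 'Knnu']
print(last.tolist())    # ['gpur', 'npur', 'd', 'j']
```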


In [30]:
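# Split at the first "a" only (n=1); expand=True returns one column per piece,
# so keep the first piece (column 0)
df['Adress1'] = df['Address'].str.split("a", n=1, expand=True)[0]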

In [29]:
df

,Name,Age,Address,Qualification,Adress1
0,Jai,27,Nagpur,Msc,N
1,Princi,24,Kanpur,MA,K
2,Gaurav,22,Allahabad,MCA,All
3,Anuj,32,Knnuaj,Phd,Knnu


In [27]:
s = 'Python'

In [None]:
s.split()

In [None]:
df

As the output shows, each address was split only at the first occurrence of "a" and not at later ones, because `n=1` allows at most one split per string. Note that the match is case-sensitive, so the capital "A" in "Allahabad" is not a split point.
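
To keep both pieces rather than only the first, the expanded result can be assigned to two columns. This is a sketch; the column names `Before` and `After` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Address": ["Nagpur", "Kanpur", "Allahabad", "Knnuaj"]})

# n=1 allows at most one split per string; expand=True returns a DataFrame
# with one column per piece (columns 0 and 1 here)
parts = df["Address"].str.split("a", n=1, expand=True)

# "Before" and "After" are hypothetical column names
df["Before"] = parts[0]
df["After"] = parts[1]
print(df)
```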

## Removing Whitespace

- str.lstrip() removes whitespace from the left side of each string.
- str.rstrip() removes whitespace from the right side of each string.
- str.strip() removes whitespace from both sides.
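
The three methods differ only in which end they trim, and "whitespace" includes tabs and newlines, not just spaces. A minimal sketch with hypothetical padded values; the last line shows that passing a string of characters strips those characters instead.

```python
import pandas as pd

s = pd.Series(["  Nagpur junction  ", "\tKanpur junction\n"])

left = s.str.lstrip()    # leading whitespace removed
right = s.str.rstrip()   # trailing whitespace removed
both = s.str.strip()     # both ends; tabs and newlines count as whitespace

# Passing characters strips those instead of whitespace
dots = pd.Series(["..Nagpur.."]).str.strip(".")

print(left.tolist())
print(right.tolist())
print(both.tolist())
print(dots.tolist())     # ['Nagpur']
```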

In [31]:
# Define a dictionary containing employee data 
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Nagpur junction', 'Kanpur junction', 
                   'Nagpur junction', 'Kannuaj junction'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

In [32]:
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data)
df

,Name,Age,Address,Qualification
0,Jai,27,Nagpur junction,Msc
1,Princi,24,Kanpur junction,MA
2,Gaurav,22,Nagpur junction,MCA
3,Anuj,32,Kannuaj junction,Phd


In [33]:
# Replace the whole value "Nagpur junction" with a version padded by two spaces on each side
new = df["Address"].replace("Nagpur junction", "  Nagpur junction  ")

In [34]:
new

0      Nagpur junction  
1        Kanpur junction
2      Nagpur junction  
3       Kannuaj junction
Name: Address, dtype: object

In [35]:
new.str.strip()

0     Nagpur junction
1     Kanpur junction
2     Nagpur junction
3    Kannuaj junction
Name: Address, dtype: object

### Handling DataFrame Columns

The string methods on Index are especially useful for cleaning up or transforming DataFrame columns. 

For instance, you may have columns with leading or trailing whitespace:

In [36]:
df = pd.DataFrame(np.random.randn(3, 2),columns=[' Column A ', ' Column B '])

In [37]:
df

,Column A,Column B
0,0.892927,0.826933
1,0.489901,0.309814
2,-0.179948,0.557014


In [38]:
df.query(' Column A > 0')  # fails: the label contains spaces, so it is not a valid name in the expression

SyntaxError: invalid syntax (<unknown>, line 1)

Since `df.columns` is an `Index` object, we can use the `.str` accessor on it as well:

In [39]:
df.columns

Index([' Column A ', ' Column B '], dtype='object')

In [50]:
df.columns = df.columns.str.strip().str.replace(" ", "")

In [51]:
df

,ColumnA,ColumnB
0,25369.0,0.826933
1,0.489901,0.309814
2,-0.179948,0.557014
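
An alternative to assigning to `df.columns` is `DataFrame.rename()` with a function, which is called on every column label and returns a new DataFrame instead of modifying the original. A sketch with small hypothetical values in place of the random data:

```python
import pandas as pd

df = pd.DataFrame({" Column A ": [1.5, -2.0], " Column B ": [3.0, 4.0]})

# rename() calls the function on every column label
renamed = df.rename(columns=lambda c: c.strip().replace(" ", ""))
print(list(renamed.columns))    # ['ColumnA', 'ColumnB']

# With the spaces gone, the label is a valid name inside query()
positive = renamed.query("ColumnA > 0")
print(positive)
```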


In [54]:
df.query('ColumnA > 200')

,ColumnA,ColumnB
0,25369.0,0.826933


These string methods can then be chained to clean up the column labels as needed. Here we remove leading and trailing whitespace and replace any remaining spaces with underscores:

In [55]:
df = pd.DataFrame(np.random.randn(3, 2),columns=[' Column A ', ' Column B '])

In [57]:
df.columns = df.columns.str.strip().str.replace(" ","_")

In [58]:
df

,Column_A,Column_B
0,-1.524901,-0.837779
1,-0.957134,0.010424
2,1.554338,2.149804
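
A common variant of this cleanup also lower-cases the labels, which adds one more step to the chain. A sketch using a zero-filled frame in place of random data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 2)), columns=[" Column A ", " Column B "])

# strip -> lower-case -> replace inner spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(df.columns))  # ['column_a', 'column_b']
```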


In [62]:
# dayfirst=True reads '1-12-2019' as 1 December 2019, so the difference is one day
pd.to_datetime('1-12-2019', dayfirst=True) - pd.to_datetime("30-11-2019", dayfirst=True)

Timedelta('1 days 00:00:00')
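
Instead of `dayfirst=True`, an explicit `format` string removes any day/month ambiguity when parsing. A minimal sketch reproducing the same one-day difference:

```python
import pandas as pd

# %d-%m-%Y: day, then month, then four-digit year
start = pd.to_datetime("30-11-2019", format="%d-%m-%Y")
end = pd.to_datetime("1-12-2019", format="%d-%m-%Y")

delta = end - start
print(delta)        # 1 days 00:00:00
print(delta.days)   # 1
```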