# **Data Analysis with Python - 10 (11 May 22)**

## **Pre-Class**

### **Text Methods**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
df = sns.load_dataset('titanic')
df.head(3)

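|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |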


#### **`lower()`**

Converts each string to lowercase.

In [4]:
df.embark_town = df.embark_town.str.lower()
df.embark_town.head(3)

0    southampton
1      cherbourg
2    southampton
Name: embark_town, dtype: object
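A minimal sketch on a hand-made Series (the values are illustrative, not rows of the Titanic data): `.str.lower()` works element by element, and a missing value passes through as missing instead of raising an error.

```python
import numpy as np
import pandas as pd

# Illustrative values, including one missing entry
towns = pd.Series(['Southampton', 'CHERBOURG', np.nan])

# Element-wise lowercase; the NaN entry stays missing
lowered = towns.str.lower()
print(lowered)
```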

#### **`upper()`**

Converts each string to uppercase.

In [5]:
df.sex = df.sex.str.upper()
df.sex.head(3)

0      MALE
1    FEMALE
2    FEMALE
Name: sex, dtype: object
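One common use, sketched on made-up labels: normalise case before comparing, so that differently capitalised spellings of the same value match.

```python
import pandas as pd

# Illustrative, inconsistently capitalised labels
labels = pd.Series(['male', 'Male', 'MALE', 'female'])

# Uppercase everything first, then compare against one canonical spelling
is_male = labels.str.upper() == 'MALE'
print(is_male)
```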

#### **`islower()`**

Checks whether all cased characters in each string are lowercase. Returns a Boolean Series.

In [6]:
df.sex.head(2).str.islower()

0    False
1    False
Name: sex, dtype: bool
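The check follows Python's `str.islower()`: digits, spaces, and punctuation are ignored, but a string must contain at least one cased letter to count. A small sketch with illustrative strings:

```python
import pandas as pd

samples = pd.Series(['third', 'Third', 'class 3', '123'])

# 'class 3' -> True  (the digit and space are ignored)
# '123'     -> False (no cased characters at all)
lower_flags = samples.str.islower()
print(lower_flags)
```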

#### **`isupper()`**

Checks whether all cased characters in each string are uppercase. Returns a Boolean Series.

In [7]:
df.sex.head(2).str.isupper()

0    True
1    True
Name: sex, dtype: bool
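`isupper()` mirrors `islower()`: a mixed-case string such as `'Male'` is neither all uppercase nor all lowercase. A sketch with illustrative values:

```python
import pandas as pd

samples = pd.Series(['MALE', 'Male', 'A1', '1'])

upper_flags = samples.str.isupper()   # 'A1' -> True, '1' -> False (no letters)
lower_flags = samples.str.islower()   # 'Male' is False here as well
print(pd.DataFrame({'isupper': upper_flags, 'islower': lower_flags}))
```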

#### **`isdigit()`**

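Checks whether all characters in each string are digits. The `.str` accessor only works on string values, so the integer `pclass` column is first cast with `astype('string')`.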

In [8]:
df.pclass.astype('string').str.isdigit().head(2)

0    True
1    True
Name: pclass, dtype: boolean
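`isdigit()` is stricter than "looks like a number": a decimal point, a minus sign, or an empty string all give `False`. A sketch with illustrative values:

```python
import pandas as pd

values = pd.Series(['3', '22.0', '-1', '']).astype('string')

# Only '3' consists purely of digit characters
digit_flags = values.str.isdigit()
print(digit_flags)
```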

#### **`replace()`**

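`replace(a, b)` replaces every value equal to `a` with `b`. Note that this is the general `Series.replace()` method, not a `.str` method: it matches whole values. Here it fills missing ages with the string `'UNKNOWN'`, which turns the `age` column into `object` dtype.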

In [9]:
df['age'] = df['age'].replace(np.nan, 'UNKNOWN')
df['age']

0         22.0
1         38.0
2         26.0
3         35.0
4         35.0
        ...   
886       27.0
887       19.0
888    UNKNOWN
889       26.0
890       32.0
Name: age, Length: 891, dtype: object
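To see the difference between the two `replace` methods, here is a sketch on illustrative values: `Series.replace()` only changes cells whose whole value matches, while `Series.str.replace()` substitutes inside each string.

```python
import pandas as pd

sex = pd.Series(['male', 'female'])

# Whole-value match: 'female' is left alone
whole = sex.replace('male', 'M')

# Substring match: the 'male' inside 'female' is replaced too
partial = sex.str.replace('male', 'M', regex=False)

print(whole.tolist())
print(partial.tolist())
```

For filling missing values specifically, `Series.fillna()` is the more direct tool, e.g. `df['age'].fillna('UNKNOWN')`.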

#### **`contains()`**

Returns `True` for each element that contains the given substring and `False` otherwise. By default the pattern is treated as a regular expression.

In [11]:
df.sibsp.astype('string').str.contains('1')

0       True
1       True
2      False
3       True
4      False
       ...  
886    False
887    False
888     True
889    False
890    False
Name: sibsp, Length: 891, dtype: boolean
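Because `contains()` does substring matching, a value like `'10'` would also match `'1'`. The sketch below (illustrative values, not the real `sibsp` column) compares it with an exact equality test and shows `regex=False` for literal characters such as `.`:

```python
import pandas as pd

counts = pd.Series([0, 1, 10, 2]).astype('string')

substring_match = counts.str.contains('1')   # True for '1' and '10'
exact_match = counts == '1'                  # True for '1' only

# Without regex=False, '.' would match any character
codes = pd.Series(['a.b', 'axb'])
literal_dot = codes.str.contains('.', regex=False)
print(substring_match.tolist(), exact_match.tolist(), literal_dot.tolist())
```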

In [10]:
df[df.sibsp.astype('string').str.contains('1')].head(2)

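|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | MALE | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | southampton | no | False |
| 1 | 1 | 1 | FEMALE | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | cherbourg | yes | False |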


#### **`split()`**

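Splits each string around the given separator and returns a list of the pieces. Splitting `'MALE'` on `'MALE'` leaves two empty strings, which is displayed as `[, ]`.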

In [12]:
df.sex.str.split('MALE').head(2)

0      [, ]
1    [FE, ]
Name: sex, dtype: object
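A sketch with illustrative strings: the default result is one list per row, while `expand=True` spreads the pieces into separate columns of a DataFrame.

```python
import pandas as pd

sex = pd.Series(['MALE', 'FEMALE'])
pieces = sex.str.split('MALE')          # lists: ['', ''] and ['FE', '']

# Illustrative "town, country" strings split into two columns
places = pd.Series(['Southampton, England', 'Cherbourg, France'])
columns = places.str.split(', ', expand=True)
print(columns)
```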

#### **`strip()`**

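Removes leading and trailing characters from each string. With no argument it strips whitespace (including newlines); given a string such as `'LE'`, it strips any combination of those characters from both ends, which is why `'FEMALE'` becomes `'FEMA'`.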

In [13]:
df.sex.str.strip('LE').head(2)

0      MA
1    FEMA
Name: sex, dtype: object
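A sketch with illustrative strings showing both modes: the default whitespace strip, and the character-set strip, where the argument is a set of characters rather than a suffix (so `'EEL'` is stripped to an empty string).

```python
import pandas as pd

padded = pd.Series(['  southampton\n', '\tcherbourg '])
clean = padded.str.strip()             # removes spaces, tabs, newlines at both ends

words = pd.Series(['MALE', 'FEMALE', 'EEL'])
trimmed = words.str.strip('LE')        # any run of 'L'/'E' at either end is removed
print(clean.tolist(), trimmed.tolist())
```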

#### **`findall()`**

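Returns, for each string, a list of all non-overlapping matches of the pattern (a regular expression); rows with no match get an empty list. To get the position of the first occurrence instead, use `str.find()`.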

In [14]:
df.embark_town.str.findall('southampton').head(2)

0    [southampton]
1               []
Name: embark_town, dtype: object
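The sketch below (illustrative town names) contrasts `findall()`, which collects every match, with `find()`, which returns the index of the first match or `-1` when there is none.

```python
import pandas as pd

towns = pd.Series(['southampton', 'cherbourg', 'queenstown'])

all_n = towns.str.findall('n')    # every match: ['n'], [], ['n', 'n']
first_n = towns.str.find('n')     # first index: 10, -1, 4
print(pd.DataFrame({'findall': all_n, 'find': first_n}))
```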

### **Time Methods**

In [15]:
df1 = sns.load_dataset('flights')
df1.head(3)

|   | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | Jan | 112 |
| 1 | 1949 | Feb | 118 |
| 2 | 1949 | Mar | 132 |


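`pd.to_datetime()` parses many different kinds of date representations. Applied to a Series it returns a `datetime64` Series; applied to a single value it returns a `Timestamp`. With `format='%Y'`, a bare year such as 1949 is parsed as 1 January of that year.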

In [16]:
df1['year'] = pd.to_datetime(df1['year'], format='%Y')
df1.year.head(3)

0   1949-01-01
1   1949-01-01
2   1949-01-01
Name: year, dtype: datetime64[ns]
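The flights data keeps year and month in separate columns. A minimal sketch, assuming a small hand-made frame with the same layout, combines them into one date by building a `'1949-Jan'`-style string and parsing it with `format='%Y-%b'`.

```python
import pandas as pd

# Hand-made rows with the same layout as the flights data
flights = pd.DataFrame({'year': [1949, 1949, 1950], 'month': ['Jan', 'Feb', 'Jan']})

# astype(str) on both columns also handles a categorical month column
dates = pd.to_datetime(flights['year'].astype(str) + '-' + flights['month'].astype(str),
                       format='%Y-%b')
print(dates)

single = pd.to_datetime('1949-03-01')   # scalar input
```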

`strftime()` converts a `datetime` object to a string according to a given format.

In [17]:
from datetime import datetime
current_date = datetime.now()
current_date

datetime.datetime(2022, 5, 11, 19, 50, 13, 996613)

In [18]:
date = current_date.strftime('%d %b %Y')
date

'11 May 2022'
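Because `datetime.now()` changes on every run, this sketch uses a fixed moment to show a few more format codes (`%Y-%m-%d`, `%H:%M`, `%A` for the weekday name, `%B` for the full month name); pandas offers the same codes element-wise through `.dt.strftime()`.

```python
from datetime import datetime
import pandas as pd

moment = datetime(2022, 5, 11, 19, 50)

iso_style = moment.strftime('%Y-%m-%d %H:%M')     # '2022-05-11 19:50'
long_style = moment.strftime('%A, %d %B %Y')      # 'Wednesday, 11 May 2022'

# Same codes applied to every element of a datetime Series
months = pd.Series(pd.to_datetime(['1949-01-01', '1949-02-01']))
labels = months.dt.strftime('%b %Y')              # 'Jan 1949', 'Feb 1949'
print(iso_style, long_style, labels.tolist())
```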

`strptime()` - parse a string into a `datetime` object given a corresponding format

In [19]:
datetime.strptime(date, '%d %b %Y')

datetime.datetime(2022, 5, 11, 0, 0)
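`strptime()` needs the format to match the string exactly. A sketch of what happens when it does not: a `ValueError` is raised.

```python
from datetime import datetime

parsed = datetime.strptime('11 May 2022', '%d %b %Y')   # matches the format

mismatch_error = None
try:
    datetime.strptime('2022-05-11', '%d %b %Y')          # ISO string, wrong format
except ValueError as err:
    mismatch_error = str(err)

print(parsed, mismatch_error)
```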

`timedelta()` creates a duration; subtracting it from a `datetime` gives an earlier date and time.

In [20]:
from datetime import timedelta
# datetime.now() reads the clock again, so this is two days before the moment the cell runs
two_days_before = datetime.now() - timedelta(days=2)
two_days_before

datetime.datetime(2022, 5, 9, 19, 54, 28, 835336)
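With a fixed starting point the arithmetic is reproducible. This sketch also shows that subtracting two `datetime` objects gives back a `timedelta`, whose length can be read with `.days` or `.total_seconds()`.

```python
from datetime import datetime, timedelta

start = datetime(2022, 5, 11, 19, 50)          # fixed instead of datetime.now()
two_days_before = start - timedelta(days=2)    # 2022-05-09 19:50
gap = start - two_days_before                  # datetime - datetime -> timedelta

print(two_days_before, gap.days, gap.total_seconds())
```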

## **In-Class (11 May 22)**