## What is Pandas?
  
Pandas is a software library written for the Python programming language
for data **manipulation and analysis.**

* Pandas is built on top of the NumPy package, so much of NumPy's structure is used or replicated in Pandas.
* Data in Pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
* The two primary components of Pandas are the **Series** and the **DataFrame**.

## Importing required libraries

In [37]:
import numpy as np
print('numpy version : ', np.__version__)

import pandas as pd
print('pandas version : ', pd.__version__)

import warnings
warnings.filterwarnings('ignore')

numpy version :  1.24.4
pandas version :  1.5.3


## 1. Working with text data

* 1.1 **dataframe['column_name'].str.lower():** converts each string to lowercase.
* 1.2 **dataframe['column_name'].str.upper():** converts each string to uppercase.
* 1.3 **dataframe['column_name'].str.title():** converts each string to title case (the first letter of each word is capitalized).
* 1.4 **dataframe['column_name'].str.split('delimiter'):** splits each string on the delimiter.
* 1.5 **dataframe.where(filter_data).dropna():** keeps only the rows that satisfy filter_data. where() sets the non-matching rows to NaN and dropna() then removes them.
* 1.6 **dataframe['column_name_2'].str.cat(dataframe['column_name_1'], sep = ", "):** appends column_name_1 to column_name_2, separated by the delimiter.
* 1.7 **dataframe['column_name'].str.replace('str_1', 'str_2'):** replaces occurrences of str_1 inside each string with str_2.

    * **str.rstrip():** removes whitespace from the right side of the string.
    * **str.lstrip():** removes whitespace from the left side of the string.
    * **str.strip():** removes whitespace from both sides of the string.

    * **Series.str.replace(pat, repl, n=-1, case=None, regex=True)**
       * **Parameters:**
       * **pat:** string or compiled regex to be replaced.
       * **repl:** string or callable used as the replacement for pat.
       * **n:** Number of replacements to make in a single string; the default of -1 means all.
       * **case:** Boolean that controls case sensitivity. Pass False for a case-insensitive match.
       * **regex:** Boolean value; if True, the pattern is treated as a regular expression, otherwise as a literal string. The default depends on the pandas version (it is False from pandas 2.0 onward), so passing it explicitly is safer.
       * **Return:** Series with replaced text values.

* 1.8 **dataframe['column_name'].replace('str_1', 'str_2'):** replaces values equal to str_1 with str_2. The whole value must match unless regex=True is passed.

    * **Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')**
       * **Parameters:**
       * **to_replace:** How to find the values that will be replaced.
       * **value:** Value to replace any values matching to_replace with.
       * **inplace:** If True, modifies the object in place instead of returning a new one.
       * **limit:** Maximum size gap to forward or backward fill.
       * **regex:** Whether to interpret to_replace and/or value as regular expressions.
       * **method:** The method to use for replacement when to_replace is a scalar, list or tuple and value is None.
       * **Return:** Object after replacement.

* 1.9 **dataframe['column_name'].str.extract('regular_expression_rule'):** extracts the capture groups of a regular expression from each string.
* 1.10 **pd.tseries.offsets.DateOffset:** creates a standard kind of date increment, such as those used for a date range.
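
Items 1.7 and 1.8 look alike but behave differently, so a small side-by-side sketch may help. The `cities` Series below is hypothetical sample data and not part of this notebook; `regex=False` is passed explicitly so the result should not depend on the pandas version.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
cities = pd.Series(["New Delhi", "Delhi", "Old Delhi"])

# Series.str.replace works on substrings inside each string.
substring_result = cities.str.replace("Delhi", "DEL", regex=False)

# Series.replace swaps whole values only (unless regex=True is passed).
whole_value_result = cities.replace("Delhi", "DEL")

print(substring_result.tolist())    # ['New DEL', 'DEL', 'Old DEL']
print(whole_value_result.tolist())  # ['New Delhi', 'DEL', 'Old Delhi']
```

In short, reach for `str.replace` to edit text inside a value and for `replace` to map one value to another.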

In [38]:
data = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
             'Age':[27, 24, 22, 32], 
             'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
             'Qualification':['MSC', 'M.A', 'MCA', 'PHD']} 
   
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


### Lowercase, Uppercase and Title Case Data

In [39]:
df['Name'] = df['Name'].str.lower()
df

Unnamed: 0,Name,Age,Address,Qualification
0,name_1,27,Nagpur,MSC
1,name_2,24,Delhi,M.A
2,name_3,22,Bangalore,MCA
3,name_4,32,Meerut,PHD


In [40]:
df['Name'] = df['Name'].str.upper()
df

Unnamed: 0,Name,Age,Address,Qualification
0,NAME_1,27,Nagpur,MSC
1,NAME_2,24,Delhi,M.A
2,NAME_3,22,Bangalore,MCA
3,NAME_4,32,Meerut,PHD


In [41]:
df['Name'] = df['Name'].str.title()
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


### Splitting and Replacing Data

In [42]:
# case=True (case-sensitive): 'nagpur' does not match 'Nagpur', so nothing changes
df["Address"] = df["Address"].str.replace('nagpur', 'nagpur', case = True)
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


In [43]:
# case=False (case-insensitive): 'Nagpur' matches and is replaced by 'nagpur'
df["Address"] = df["Address"].str.replace('nagpur', 'nagpur', case = False)
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


In [44]:
df["Address"] = df["Address"].replace(['nagpur', 'Meerut'], ['Noida', 'Gurugram'])
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Noida,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Gurugram,PHD


In [45]:
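# Replace 'MSC' (matched case-insensitively) with 'PHD'
df["Qualification"] = df["Qualification"].str.replace('msc', 'PHD', case = False)

# Keep only the rows whose Qualification is 'PHD'
edu_filter = df["Qualification"] == 'PHD'
df.where(edu_filter).dropna()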

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27.0,Noida,PHD
3,Name_4,32.0,Gurugram,PHD
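
Note that Age is shown as 27.0 and 32.0 in the output above: `where()` fills the non-matching rows with NaN, which turns the integer column into floats before `dropna()` removes those rows. A minimal sketch of an alternative, assuming plain boolean indexing is acceptable for the task, is shown below. The `people` frame is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df.
people = pd.DataFrame({
    "Name": ["Name_1", "Name_2"],
    "Age": [27, 24],
    "Qualification": ["PHD", "M.A"],
})
mask = people["Qualification"] == "PHD"

# where() + dropna(): non-matching rows become NaN first, then are dropped.
masked = people.where(mask).dropna()

# Boolean indexing: matching rows are selected directly.
indexed = people[mask]

# Age is converted to a float dtype by where(), but stays an integer dtype here.
print(pd.api.types.is_float_dtype(masked["Age"]))     # True
print(pd.api.types.is_integer_dtype(indexed["Age"]))  # True
```

Both approaches return the same rows; they differ only in the dtype of the numeric columns.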


In [46]:
df["Qualification"].str.split("A", expand = True)

Unnamed: 0,0,1
0,PHD,
1,M.,
2,MC,
3,PHD,


In [47]:
df["Qualification"] = df["Qualification"].str.split("A", expand = True)[0]
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Noida,PHD
1,Name_2,24,Delhi,M.
2,Name_3,22,Bangalore,MC
3,Name_4,32,Gurugram,PHD
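
When only one piece of the split is needed, `expand=True` followed by `[0]` is not the only option. The sketch below, using hypothetical sample Series, shows the default list output, the `.str[0]` accessor for picking the first piece, and the `n` parameter for limiting the number of splits.

```python
import pandas as pd

# Hypothetical sample data mirroring the Qualification column.
codes = pd.Series(["MCA", "M.A", "PHD"])

# Without expand=True, each row holds a list of pieces.
parts = codes.str.split("A")

# .str[0] picks the first piece of each list.
first_part = codes.str.split("A").str[0]

# n=1 stops after the first split.
limited = pd.Series(["a-b-c"]).str.split("-", n=1)

print(parts.tolist())       # [['MC', ''], ['M.', ''], ['PHD']]
print(first_part.tolist())  # ['MC', 'M.', 'PHD']
print(limited.tolist())     # [['a', 'b-c']]
```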


### Concatenation of Data

In [48]:
qua_copy = df["Qualification"].copy()
df["Name"] = df["Name"].str.cat(qua_copy, sep = ", ")
df

Unnamed: 0,Name,Age,Address,Qualification
0,"Name_1, PHD",27,Noida,PHD
1,"Name_2, M.",24,Delhi,M.
2,"Name_3, MC",22,Bangalore,MC
3,"Name_4, PHD",32,Gurugram,PHD
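
One thing to watch with `str.cat` is missing data: if either side of a row is missing, the concatenated result for that row is missing too. The sketch below uses hypothetical Series (`first` and `second` are not part of this notebook) and the `na_rep` parameter to substitute a placeholder instead.

```python
import pandas as pd

# Hypothetical sample data with one missing qualification.
first = pd.Series(["Name_1", "Name_2", "Name_3"])
second = pd.Series(["PHD", None, "MC"])

# Default: a missing value on either side makes the whole result missing.
default_cat = first.str.cat(second, sep=", ")

# na_rep replaces missing values with a placeholder before joining.
filled_cat = first.str.cat(second, sep=", ", na_rep="unknown")

print(default_cat.isna().tolist())  # [False, True, False]
print(filled_cat.tolist())          # ['Name_1, PHD', 'Name_2, unknown', 'Name_3, MC']
```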


### Removing Whitespace from Data

In [49]:
new_col = df["Address"].replace("Bangalore", "  Bangalore  ").copy()
new_col

0            Noida
1            Delhi
2      Bangalore  
3         Gurugram
Name: Address, dtype: object

In [50]:
print(new_col.str.strip() == " Bangalore")
print(new_col.str.strip() == "Bangalore ")
print(new_col.str.strip() == " Bangalore ")
print(new_col.str.strip() == "Bangalore")

0    False
1    False
2    False
3    False
Name: Address, dtype: bool
0    False
1    False
2    False
3    False
Name: Address, dtype: bool
0    False
1    False
2    False
3    False
Name: Address, dtype: bool
0    False
1    False
2     True
3    False
Name: Address, dtype: bool
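
The cells above only exercise `str.strip()`. As a small sketch of the one-sided variants from the list at the top, the example below uses hypothetical padded strings; the last line also shows that `strip` accepts the characters to remove as an argument.

```python
import pandas as pd

# Hypothetical padded value, similar to new_col above.
padded = pd.Series(["  Bangalore  "])

left_trimmed = padded.str.lstrip()   # leading whitespace removed
right_trimmed = padded.str.rstrip()  # trailing whitespace removed
both_trimmed = padded.str.strip()    # both sides removed

# strip() can also remove characters other than whitespace.
custom_trimmed = pd.Series(["--Delhi--"]).str.strip("-")

print(repr(left_trimmed[0]))    # 'Bangalore  '
print(repr(right_trimmed[0]))   # '  Bangalore'
print(repr(both_trimmed[0]))    # 'Bangalore'
print(repr(custom_trimmed[0]))  # 'Delhi'
```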


### Extracting Data

In [51]:
series = pd.Series(['a1', 'b2', 'c3'])
series

0    a1
1    b2
2    c3
dtype: object

In [52]:
ext_series = series.str.extract(r'([ab])(\d)')
ext_series

Unnamed: 0,0,1
0,a,1
1,b,2
2,,
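
The result columns above are numbered 0 and 1 because the capture groups are unnamed. A possible refinement, sketched here with the same three values, is to use named groups so the columns get readable labels. The group names `letter` and `digit` are chosen for illustration only. Extracted values are strings, so `pd.to_numeric` is used when numbers are needed.

```python
import pandas as pd

codes = pd.Series(["a1", "b2", "c3"])

# Named groups (?P<name>...) become the column names of the result.
named = codes.str.extract(r"(?P<letter>[ab])(?P<digit>\d)")

# Extracted values are strings; convert explicitly when numbers are needed.
digits = pd.to_numeric(named["digit"])

print(named.columns.tolist())      # ['letter', 'digit']
print(named.loc[1, "digit"])       # 2 (as a string)
print(named.iloc[2].isna().all())  # True, because 'c3' does not match [ab]
```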


### Pandas tseries.offsets.DateOffset

In [53]:
date_timestamp = pd.Timestamp('2019-10-10 07:15:11')
date_timestamp

Timestamp('2019-10-10 07:15:11')

In [54]:
date_offset = pd.tseries.offsets.DateOffset(n = 2)
date_offset

<2 * DateOffsets>

In [55]:
new_timestamp = date_timestamp + date_offset
new_timestamp

Timestamp('2019-10-12 07:15:11')

In [56]:
days_hours_offset = pd.tseries.offsets.DateOffset(days = 10, hours = 2)
days_hours_offset

<DateOffset: days=10, hours=2>

In [57]:
new_timestamp = date_timestamp + days_hours_offset
new_timestamp

Timestamp('2019-10-20 09:15:11')
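
Two further uses of `DateOffset` may be worth a quick sketch: calendar-aware units such as `months`, and passing an offset as the `freq` of `pd.date_range`. The dates below are hypothetical, and `pd.DateOffset` is the top-level alias of `pd.tseries.offsets.DateOffset`.

```python
import pandas as pd

# Calendar-aware arithmetic: one month after 31 January lands on the
# last valid day of February.
start = pd.Timestamp("2019-01-31 07:15:11")
one_month_later = start + pd.DateOffset(months=1)

# An offset can serve as the step (freq) of a date range.
dates = pd.date_range(start="2019-10-10", periods=3, freq=pd.DateOffset(days=10))

print(one_month_later)                     # 2019-02-28 07:15:11
print(list(dates.strftime("%Y-%m-%d")))    # ['2019-10-10', '2019-10-20', '2019-10-30']
```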

## Quick Recap

### 1. Working with text data in Pandas

* 1.1 **dataframe['column_name'].str.lower():** converts each string to lowercase.
* 1.2 **dataframe['column_name'].str.upper():** converts each string to uppercase.
* 1.3 **dataframe['column_name'].str.title():** converts each string to title case (the first letter of each word is capitalized).
* 1.4 **dataframe['column_name'].str.split('delimiter'):** splits each string on the delimiter.
* 1.5 **dataframe.where(filter_data).dropna():** keeps only the rows that satisfy filter_data. where() sets the non-matching rows to NaN and dropna() then removes them.
* 1.6 **dataframe['column_name_2'].str.cat(dataframe['column_name_1'], sep = ", "):** appends column_name_1 to column_name_2, separated by the delimiter.
* 1.7 **dataframe['column_name'].str.replace('str_1', 'str_2'):** replaces occurrences of str_1 inside each string with str_2.

    * **str.rstrip():** removes whitespace from the right side of the string.
    * **str.lstrip():** removes whitespace from the left side of the string.
    * **str.strip():** removes whitespace from both sides of the string.

    * **Series.str.replace(pat, repl, n=-1, case=None, regex=True)**
       * **Parameters:**
       * **pat:** string or compiled regex to be replaced.
       * **repl:** string or callable used as the replacement for pat.
       * **n:** Number of replacements to make in a single string; the default of -1 means all.
       * **case:** Boolean that controls case sensitivity. Pass False for a case-insensitive match.
       * **regex:** Boolean value; if True, the pattern is treated as a regular expression, otherwise as a literal string. The default depends on the pandas version (it is False from pandas 2.0 onward), so passing it explicitly is safer.
       * **Return:** Series with replaced text values.

* 1.8 **dataframe['column_name'].replace('str_1', 'str_2'):** replaces values equal to str_1 with str_2. The whole value must match unless regex=True is passed.

    * **Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')**
       * **Parameters:**
       * **to_replace:** How to find the values that will be replaced.
       * **value:** Value to replace any values matching to_replace with.
       * **inplace:** If True, modifies the object in place instead of returning a new one.
       * **limit:** Maximum size gap to forward or backward fill.
       * **regex:** Whether to interpret to_replace and/or value as regular expressions.
       * **method:** The method to use for replacement when to_replace is a scalar, list or tuple and value is None.
       * **Return:** Object after replacement.

* 1.9 **dataframe['column_name'].str.extract('regular_expression_rule'):** extracts the capture groups of a regular expression from each string.
* 1.10 **pd.tseries.offsets.DateOffset:** creates a standard kind of date increment, such as those used for a date range.