## What is Pandas?
  
Pandas is a software library written for the Python programming language
for data **manipulation and analysis.**

* Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas.
* Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
* The two primary components of pandas are the **Series** and the **DataFrame**.

## Loan dataset

https://www.kaggle.com/animeshparikshya/loan-dataset

## Importing required libraries

In [125]:
import numpy as np
print('numpy version : ', np.__version__)

import pandas as pd
print('pandas version : ', pd.__version__)

import warnings
warnings.filterwarnings('ignore')

numpy version :  1.24.4
pandas version :  1.5.3


## 1.Concatenating dataframe using .concat()

* 1.1 **pd.concat([dataframe_1, dataframe_2]):** appends the rows of one dataframe to the end of another.
* 1.2 **pd.concat([dataframe_1, dataframe_2], axis=1, join='inner'):** places the dataframes side by side and keeps only the index labels present in both (intersection).
* 1.3 **pd.concat([dataframe_1, dataframe_2], axis=1, join='outer'):** places the dataframes side by side and keeps every index label from either one (union), filling the gaps with NaN.
* 1.4 **pd.concat([dataframe_1, dataframe_2], ignore_index=True):** appends one dataframe to the end of another and replaces the original index with a fresh 0..n-1 index.
* 1.5 **pd.concat([dataframe_1, dataframe_2], keys = ['X', 'Y']):** appends one dataframe to the end of another and labels the rows of each input with the group keys X and Y.
* 1.6 **pd.concat([dataframe, series], axis = 1):** adds a series to a dataframe as a new column and returns a dataframe.

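**Note: join = inner/outer** applies to the axis that is *not* being concatenated along. With **axis = 1** it decides which index labels are kept; with **axis = 0** it decides which columns are kept. In the axis = 0 examples below, both dataframes have identical columns, so inner and outer return the same result.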
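
To see what `join` does when rows are stacked (axis = 0), the frames need different columns. The following is a minimal sketch under that assumption; `left`, `right`, and the `Salary` column are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical frames: "Name" and "Age" are shared, "Salary" exists only in `right`.
left = pd.DataFrame({"Name": ["Name_1", "Name_2"], "Age": [27, 24]})
right = pd.DataFrame({"Name": ["Name_5"], "Age": [17], "Salary": [3000]})

# axis=0 stacks rows; `join` decides which columns survive.
inner_rows = pd.concat([left, right], axis=0, join="inner")  # shared columns only
outer_rows = pd.concat([left, right], axis=0, join="outer")  # all columns, NaN where missing

print(list(inner_rows.columns))  # ['Name', 'Age']
print(list(outer_rows.columns))  # ['Name', 'Age', 'Salary']
```

With `join="outer"` the two rows that came from `left` get NaN in `Salary`, which also turns that column into floats.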

In [126]:
data_1 = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
                'Age':[27, 24, 22, 32], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['MSC', 'M.A', 'MCA', 'PHD']} 
   
df_1 = pd.DataFrame(data_1, index=[0, 1, 2, 3])
df_1

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


In [127]:
data_2 = {'Name':['Name_5', 'Name_6', 'Name_7', 'Name_8'], 
                'Age':[17, 14, 12, 52], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['B.Tech', 'B.A', 'B.Com', 'B.Hons']} 
    
df_2 = pd.DataFrame(data_2, index=[4, 5, 6, 7])
df_2

Unnamed: 0,Name,Age,Address,Qualification
4,Name_5,17,Nagpur,B.Tech
5,Name_6,14,Delhi,B.A
6,Name_7,12,Bangalore,B.Com
7,Name_8,52,Meerut,B.Hons


### Concatenating dataframe using .concat()

In [128]:
concat_df = pd.concat([df_1, df_2])
concat_df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD
4,Name_5,17,Nagpur,B.Tech
5,Name_6,14,Delhi,B.A
6,Name_7,12,Bangalore,B.Com
7,Name_8,52,Meerut,B.Hons


In [129]:
concat_df = pd.concat([df_1, df_2], axis = 1)
concat_df

Unnamed: 0,Name,Age,Address,Qualification,Name.1,Age.1,Address.1,Qualification.1
0,Name_1,27.0,Nagpur,MSC,,,,
1,Name_2,24.0,Delhi,M.A,,,,
2,Name_3,22.0,Bangalore,MCA,,,,
3,Name_4,32.0,Meerut,PHD,,,,
4,,,,,Name_5,17.0,Nagpur,B.Tech
5,,,,,Name_6,14.0,Delhi,B.A
6,,,,,Name_7,12.0,Bangalore,B.Com
7,,,,,Name_8,52.0,Meerut,B.Hons


### Concatenating dataframe using join='inner'

In [130]:
data_1 = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
                'Age':[27, 24, 22, 32], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['MSC', 'M.A', 'MCA', 'PHD']} 

df_1 = pd.DataFrame(data_1, index=[1, 2, 3, 4])
df_1

Unnamed: 0,Name,Age,Address,Qualification
1,Name_1,27,Nagpur,MSC
2,Name_2,24,Delhi,M.A
3,Name_3,22,Bangalore,MCA
4,Name_4,32,Meerut,PHD


In [131]:
data_2 = {'Name':['Name_1', 'Name_5', 'Name_3', 'Name_8'], 
                'Age':[27, 14, 22, 52], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['MSC', 'B.A', 'MCA', 'B.Hons']} 
    
df_2 = pd.DataFrame(data_2, index=[1, 5, 3, 7])
df_2

Unnamed: 0,Name,Age,Address,Qualification
1,Name_1,27,Nagpur,MSC
5,Name_5,14,Delhi,B.A
3,Name_3,22,Bangalore,MCA
7,Name_8,52,Meerut,B.Hons


In [132]:
concat_df = pd.concat([df_1, df_2], axis=1, join='inner')
concat_df

Unnamed: 0,Name,Age,Address,Qualification,Name.1,Age.1,Address.1,Qualification.1
1,Name_1,27,Nagpur,MSC,Name_1,27,Nagpur,MSC
3,Name_3,22,Bangalore,MCA,Name_3,22,Bangalore,MCA


In [133]:
concat_df = pd.concat([df_1, df_2], axis=0, join='inner')
concat_df

Unnamed: 0,Name,Age,Address,Qualification
1,Name_1,27,Nagpur,MSC
2,Name_2,24,Delhi,M.A
3,Name_3,22,Bangalore,MCA
4,Name_4,32,Meerut,PHD
1,Name_1,27,Nagpur,MSC
5,Name_5,14,Delhi,B.A
3,Name_3,22,Bangalore,MCA
7,Name_8,52,Meerut,B.Hons


### Concatenating dataframe using join='outer'

In [134]:
concat_df = pd.concat([df_1, df_2], axis=1, join='outer')
concat_df

Unnamed: 0,Name,Age,Address,Qualification,Name.1,Age.1,Address.1,Qualification.1
1,Name_1,27.0,Nagpur,MSC,Name_1,27.0,Nagpur,MSC
2,Name_2,24.0,Delhi,M.A,,,,
3,Name_3,22.0,Bangalore,MCA,Name_3,22.0,Bangalore,MCA
4,Name_4,32.0,Meerut,PHD,,,,
5,,,,,Name_5,14.0,Delhi,B.A
7,,,,,Name_8,52.0,Meerut,B.Hons


In [135]:
concat_df = pd.concat([df_1, df_2], axis=0, join='outer')
concat_df

Unnamed: 0,Name,Age,Address,Qualification
1,Name_1,27,Nagpur,MSC
2,Name_2,24,Delhi,M.A
3,Name_3,22,Bangalore,MCA
4,Name_4,32,Meerut,PHD
1,Name_1,27,Nagpur,MSC
5,Name_5,14,Delhi,B.A
3,Name_3,22,Bangalore,MCA
7,Name_8,52,Meerut,B.Hons


### Concatenating dataframe using ignore_index=True

In [136]:
concat_df = pd.concat([df_1, df_2], axis =1, ignore_index=True)
concat_df

Unnamed: 0,0,1,2,3,4,5,6,7
1,Name_1,27.0,Nagpur,MSC,Name_1,27.0,Nagpur,MSC
2,Name_2,24.0,Delhi,M.A,,,,
3,Name_3,22.0,Bangalore,MCA,Name_3,22.0,Bangalore,MCA
4,Name_4,32.0,Meerut,PHD,,,,
5,,,,,Name_5,14.0,Delhi,B.A
7,,,,,Name_8,52.0,Meerut,B.Hons


In [137]:
concat_df = pd.concat([df_1, df_2], axis =0, ignore_index=True)
concat_df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD
4,Name_1,27,Nagpur,MSC
5,Name_5,14,Delhi,B.A
6,Name_3,22,Bangalore,MCA
7,Name_8,52,Meerut,B.Hons


### Concatenating dataframe with group keys

In [138]:
concat_df = pd.concat([df_1, df_2], keys = ['X', 'Y'])
concat_df

Unnamed: 0,Unnamed: 1,Name,Age,Address,Qualification
X,1,Name_1,27,Nagpur,MSC
X,2,Name_2,24,Delhi,M.A
X,3,Name_3,22,Bangalore,MCA
X,4,Name_4,32,Meerut,PHD
Y,1,Name_1,27,Nagpur,MSC
Y,5,Name_5,14,Delhi,B.A
Y,3,Name_3,22,Bangalore,MCA
Y,7,Name_8,52,Meerut,B.Hons
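
The group keys become the outer level of a MultiIndex, which makes it possible to pull the rows of one input back out. A small sketch, using shortened hypothetical frames (`first`, `second`) in place of df_1 and df_2:

```python
import pandas as pd

# Shortened, hypothetical stand-ins for df_1 and df_2.
first = pd.DataFrame({"Name": ["Name_1", "Name_2"], "Age": [27, 24]}, index=[1, 2])
second = pd.DataFrame({"Name": ["Name_5", "Name_8"], "Age": [14, 52]}, index=[5, 7])

keyed = pd.concat([first, second], keys=["X", "Y"])

# The keys form the outer level of a two-level MultiIndex.
print(keyed.index.nlevels)

# Selecting by key returns the rows that came from that input frame.
print(keyed.loc["Y"])

# A (key, original index label) tuple selects a single row.
print(keyed.loc[("X", 2), "Name"])
```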


### Concatenating dataframe and series

In [139]:
data = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
                'Age':[27, 24, 22, 32], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['MSC', 'M.A', 'MCA', 'PHD']} 
   
df = pd.DataFrame(data, index=[0, 1, 2, 3])
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


In [140]:
ser = pd.Series([1000, 2000, 3000, 4000], name='Salary')
ser

0    1000
1    2000
2    3000
3    4000
Name: Salary, dtype: int64

In [141]:
result = pd.concat([df, ser], axis=1)
result

Unnamed: 0,Name,Age,Address,Qualification,Salary
0,Name_1,27,Nagpur,MSC,1000
1,Name_2,24,Delhi,M.A,2000
2,Name_3,22,Bangalore,MCA,3000
3,Name_4,32,Meerut,PHD,4000


## 2.Concatenating dataframe using .append()

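* 2.1 **dataframe_1.append(dataframe_2):** used to append one dataframe at the end of another. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; the cells below run on pandas 1.5.3 with warnings suppressed.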

In [142]:
data_1 = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
                'Age':[27, 24, 22, 32], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['MSC', 'M.A', 'MCA', 'PHD']} 
   
df_1 = pd.DataFrame(data_1, index=[0, 1, 2, 3])
df_1

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


In [143]:
data_2 = {'Name':['Name_5', 'Name_6', 'Name_7', 'Name_8'], 
                'Age':[17, 14, 12, 52], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['B.Tech', 'B.A', 'B.Com', 'B.Hons']} 
    
df_2 = pd.DataFrame(data_2, index=[4, 5, 6, 7])
df_2

Unnamed: 0,Name,Age,Address,Qualification
4,Name_5,17,Nagpur,B.Tech
5,Name_6,14,Delhi,B.A
6,Name_7,12,Bangalore,B.Com
7,Name_8,52,Meerut,B.Hons


### Concatenating dataframe using .append()

In [144]:
append_df = df_1.append(df_2)
append_df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD
4,Name_5,17,Nagpur,B.Tech
5,Name_6,14,Delhi,B.A
6,Name_7,12,Bangalore,B.Com
7,Name_8,52,Meerut,B.Hons


### Concatenating dataframe of different shape using .append()

In [145]:
data_2 = {'Name':['Name_5', 'Name_6', 'Name_7', 'Name_8'], 
                'Age':[17, 14, 12, 52], 
                'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['B.Tech', 'B.A', 'B.Com', 'B.Hons'],
                'Salary':[3000, 4000, 5000, 6000]} 
    
df_2 = pd.DataFrame(data_2, index=[4, 5, 6, 7])
df_2

Unnamed: 0,Name,Age,Address,Qualification,Salary
4,Name_5,17,Nagpur,B.Tech,3000
5,Name_6,14,Delhi,B.A,4000
6,Name_7,12,Bangalore,B.Com,5000
7,Name_8,52,Meerut,B.Hons,6000


In [146]:
append_df = df_1.append(df_2)
append_df

Unnamed: 0,Name,Age,Address,Qualification,Salary
0,Name_1,27,Nagpur,MSC,
1,Name_2,24,Delhi,M.A,
2,Name_3,22,Bangalore,MCA,
3,Name_4,32,Meerut,PHD,
4,Name_5,17,Nagpur,B.Tech,3000.0
5,Name_6,14,Delhi,B.A,4000.0
6,Name_7,12,Bangalore,B.Com,5000.0
7,Name_8,52,Meerut,B.Hons,6000.0
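
Because DataFrame.append was removed in pandas 2.0, the cells above only run on older versions such as the 1.5.3 used here. The sketch below shows how the same results could be obtained with pd.concat; the frames `df_a` and `df_b` are shortened, hypothetical versions of df_1 and df_2.

```python
import pandas as pd

# Hypothetical frames with different shapes: only df_b has a "Salary" column.
df_a = pd.DataFrame({"Name": ["Name_1", "Name_2"], "Age": [27, 24]}, index=[0, 1])
df_b = pd.DataFrame(
    {"Name": ["Name_5", "Name_6"], "Age": [17, 14], "Salary": [3000, 4000]},
    index=[4, 5],
)

# Counterpart of df_a.append(df_b): stack rows, keep the union of columns.
stacked = pd.concat([df_a, df_b])

# Counterpart of df_a.append(df_b, ignore_index=True): renumber the rows.
renumbered = pd.concat([df_a, df_b], ignore_index=True)

print(stacked)
print(renumbered)
```

As with append, rows from the frame that lacks `Salary` receive NaN in that column.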


## 3.Merging dataframe

* 3.1 **pd.merge(dataframe_1, dataframe_2, on='column_1', how='inner'):** performs an inner join on the given column, which must exist in both dataframes.
* 3.2 **pd.merge(dataframe_1, dataframe_2, left_on='column_1', right_on='column_2', how='inner'):** performs an inner join when the key column has a different name in each dataframe.
* 3.3 **pd.merge(dataframe_1, dataframe_2, left_on=['column_1', 'column_2'], right_on=['column_3', 'column_4'], how='inner'):** performs an inner join on several key columns whose names differ between the dataframes.

**Note:** **how = inner/outer/left/right** selects the type of join to perform.

In [147]:
product_dict = {
    'Product_ID':[101,102,103,104,105,106,107],
    'Product_name':['Watch','Bag','Shoes','Smartphone','Books','Oil','Laptop'],
    'Category':['Fashion','Fashion','Fashion','Electronics','Study','Grocery','Electronics'],
    'Price':[299.0,1350.50,2999.0,14999.0,145.0,110.0,79999.0],
    'Seller_City':['Delhi','Mumbai','Chennai','Kolkata','Delhi','Chennai','Bengalore']
}

product=pd.DataFrame(product_dict)

In [148]:
product

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City
0,101,Watch,Fashion,299.0,Delhi
1,102,Bag,Fashion,1350.5,Mumbai
2,103,Shoes,Fashion,2999.0,Chennai
3,104,Smartphone,Electronics,14999.0,Kolkata
4,105,Books,Study,145.0,Delhi
5,106,Oil,Grocery,110.0,Chennai
6,107,Laptop,Electronics,79999.0,Bengalore


In [149]:
customer_df = {
    'id':[1,2,3,4,5,6,7,8,9],
    'name':['Olivia','Aditya','Cory','Isabell','Dominic','Tyler','Samuel','Daniel','Jeremy'],
    'age':[20,25,15,10,30,65,35,18,23],
    'Product_ID':[101,0,106,0,103,104,0,0,107],
    'Purchased_Product':['Watch','NA','Oil','NA','Shoes','Smartphone','NA','NA','Laptop'],
    'City':['Mumbai','Delhi','Bangalore','Chennai','Chennai','Delhi','Kolkata','Delhi','Mumbai']
}

customer=pd.DataFrame(customer_df)

In [150]:
customer

Unnamed: 0,id,name,age,Product_ID,Purchased_Product,City
0,1,Olivia,20,101,Watch,Mumbai
1,2,Aditya,25,0,,Delhi
2,3,Cory,15,106,Oil,Bangalore
3,4,Isabell,10,0,,Chennai
4,5,Dominic,30,103,Shoes,Chennai
5,6,Tyler,65,104,Smartphone,Delhi
6,7,Samuel,35,0,,Kolkata
7,8,Daniel,18,0,,Delhi
8,9,Jeremy,23,107,Laptop,Mumbai


### Merging dataframe using inner method

In [151]:
pd.merge(product, customer, on='Product_ID')

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City,id,name,age,Purchased_Product,City
0,101,Watch,Fashion,299.0,Delhi,1,Olivia,20,Watch,Mumbai
1,103,Shoes,Fashion,2999.0,Chennai,5,Dominic,30,Shoes,Chennai
2,104,Smartphone,Electronics,14999.0,Kolkata,6,Tyler,65,Smartphone,Delhi
3,106,Oil,Grocery,110.0,Chennai,3,Cory,15,Oil,Bangalore
4,107,Laptop,Electronics,79999.0,Bengalore,9,Jeremy,23,Laptop,Mumbai


* Here, an **inner join** is performed on the product and customer dataframes using the **Product_ID** column. No **how** argument is passed because inner is the default.

* But what if the key column has a different name in each dataframe? Then both column names have to be given explicitly.

* **left_on** and **right_on** are the two arguments for this. **left_on** is the name of the key in the **left dataframe** and **right_on** is the name of the key in the **right dataframe.**

In [152]:
pd.merge(product, customer, left_on='Product_name', right_on='Purchased_Product')

Unnamed: 0,Product_ID_x,Product_name,Category,Price,Seller_City,id,name,age,Product_ID_y,Purchased_Product,City
0,101,Watch,Fashion,299.0,Delhi,1,Olivia,20,101,Watch,Mumbai
1,103,Shoes,Fashion,2999.0,Chennai,5,Dominic,30,103,Shoes,Chennai
2,104,Smartphone,Electronics,14999.0,Kolkata,6,Tyler,65,104,Smartphone,Delhi
3,106,Oil,Grocery,110.0,Chennai,3,Cory,15,106,Oil,Bangalore
4,107,Laptop,Electronics,79999.0,Bengalore,9,Jeremy,23,107,Laptop,Mumbai
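
The `Product_ID_x` and `Product_ID_y` columns above appear because both frames contain a non-key column with the same name, and merge disambiguates them with the default suffixes `_x` and `_y`. The `suffixes` argument allows more descriptive names. A minimal sketch, using trimmed-down hypothetical frames (`product_small`, `customer_small`):

```python
import pandas as pd

# Trimmed-down, hypothetical versions of the product and customer frames.
product_small = pd.DataFrame(
    {"Product_ID": [101, 106], "Product_name": ["Watch", "Oil"]}
)
customer_small = pd.DataFrame(
    {
        "Product_ID": [101, 106],
        "Purchased_Product": ["Watch", "Oil"],
        "name": ["Olivia", "Cory"],
    }
)

merged = pd.merge(
    product_small,
    customer_small,
    left_on="Product_name",
    right_on="Purchased_Product",
    suffixes=("_product", "_customer"),  # replaces the default ("_x", "_y")
)

print(list(merged.columns))
```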


* Now suppose we want more detail about the products sold: specifically, all the products where **the seller and the customer belong to the same city.**

* In this case, we have to perform an inner join on **Product_ID and Seller_City** of the product dataframe and **Product_ID and City** of the customer dataframe.

In [153]:
pd.merge(product, customer, how='inner', left_on=['Product_ID','Seller_City'], right_on=['Product_ID','City'])

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City,id,name,age,Purchased_Product,City
0,103,Shoes,Fashion,2999.0,Chennai,5,Dominic,30,Shoes,Chennai


### Merging dataframe using outer method

In [154]:
pd.merge(product, customer, on='Product_ID', how='outer')

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City,id,name,age,Purchased_Product,City
0,101,Watch,Fashion,299.0,Delhi,1.0,Olivia,20.0,Watch,Mumbai
1,102,Bag,Fashion,1350.5,Mumbai,,,,,
2,103,Shoes,Fashion,2999.0,Chennai,5.0,Dominic,30.0,Shoes,Chennai
3,104,Smartphone,Electronics,14999.0,Kolkata,6.0,Tyler,65.0,Smartphone,Delhi
4,105,Books,Study,145.0,Delhi,,,,,
5,106,Oil,Grocery,110.0,Chennai,3.0,Cory,15.0,Oil,Bangalore
6,107,Laptop,Electronics,79999.0,Bengalore,9.0,Jeremy,23.0,Laptop,Mumbai
7,0,,,,,2.0,Aditya,25.0,,Delhi
8,0,,,,,4.0,Isabell,10.0,,Chennai
9,0,,,,,7.0,Samuel,35.0,,Kolkata
10,0,,,,,8.0,Daniel,18.0,,Delhi


* All the non-matching rows of either dataframe have **NaN** in the columns that come from the other dataframe. However, the result alone **does not say which dataframe each row came from.**

* Pandas has an option for this. Passing **indicator=True** to the function adds a **new column named _merge** to the resulting dataframe, with the value left_only, right_only, or both for each row:

In [155]:
pd.merge(product,customer,on='Product_ID',how='outer', indicator=True)

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City,id,name,age,Purchased_Product,City,_merge
0,101,Watch,Fashion,299.0,Delhi,1.0,Olivia,20.0,Watch,Mumbai,both
1,102,Bag,Fashion,1350.5,Mumbai,,,,,,left_only
2,103,Shoes,Fashion,2999.0,Chennai,5.0,Dominic,30.0,Shoes,Chennai,both
3,104,Smartphone,Electronics,14999.0,Kolkata,6.0,Tyler,65.0,Smartphone,Delhi,both
4,105,Books,Study,145.0,Delhi,,,,,,left_only
5,106,Oil,Grocery,110.0,Chennai,3.0,Cory,15.0,Oil,Bangalore,both
6,107,Laptop,Electronics,79999.0,Bengalore,9.0,Jeremy,23.0,Laptop,Mumbai,both
7,0,,,,,2.0,Aditya,25.0,,Delhi,right_only
8,0,,,,,4.0,Isabell,10.0,,Chennai,right_only
9,0,,,,,7.0,Samuel,35.0,,Kolkata,right_only
10,0,,,,,8.0,Daniel,18.0,,Delhi,right_only
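
Once the `_merge` column exists, it can be used as an ordinary filter, for example to list the rows that found no partner. A short sketch with reduced, hypothetical frames (`product_small`, `customer_small`):

```python
import pandas as pd

# Reduced, hypothetical versions of the product and customer frames.
product_small = pd.DataFrame(
    {"Product_ID": [101, 102, 103], "Product_name": ["Watch", "Bag", "Shoes"]}
)
customer_small = pd.DataFrame(
    {"Product_ID": [101, 0, 103], "name": ["Olivia", "Aditya", "Dominic"]}
)

merged = pd.merge(
    product_small, customer_small, on="Product_ID", how="outer", indicator=True
)

# Products that no customer bought: present only in the left frame.
unsold = merged[merged["_merge"] == "left_only"]

# Customers whose Product_ID matches no product: present only in the right frame.
unmatched_customers = merged[merged["_merge"] == "right_only"]

print(unsold[["Product_ID", "Product_name"]])
print(unmatched_customers[["Product_ID", "name"]])
```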


### Merging dataframe using left method

In [156]:
pd.merge(product,customer,on='Product_ID',how='left')

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City,id,name,age,Purchased_Product,City
0,101,Watch,Fashion,299.0,Delhi,1.0,Olivia,20.0,Watch,Mumbai
1,102,Bag,Fashion,1350.5,Mumbai,,,,,
2,103,Shoes,Fashion,2999.0,Chennai,5.0,Dominic,30.0,Shoes,Chennai
3,104,Smartphone,Electronics,14999.0,Kolkata,6.0,Tyler,65.0,Smartphone,Delhi
4,105,Books,Study,145.0,Delhi,,,,,
5,106,Oil,Grocery,110.0,Chennai,3.0,Cory,15.0,Oil,Bangalore
6,107,Laptop,Electronics,79999.0,Bengalore,9.0,Jeremy,23.0,Laptop,Mumbai


### Merging dataframe using right method

In [157]:
pd.merge(product,customer,on='Product_ID',how='right')

Unnamed: 0,Product_ID,Product_name,Category,Price,Seller_City,id,name,age,Purchased_Product,City
0,101,Watch,Fashion,299.0,Delhi,1,Olivia,20,Watch,Mumbai
1,0,,,,,2,Aditya,25,,Delhi
2,106,Oil,Grocery,110.0,Chennai,3,Cory,15,Oil,Bangalore
3,0,,,,,4,Isabell,10,,Chennai
4,103,Shoes,Fashion,2999.0,Chennai,5,Dominic,30,Shoes,Chennai
5,104,Smartphone,Electronics,14999.0,Kolkata,6,Tyler,65,Smartphone,Delhi
6,0,,,,,7,Samuel,35,,Kolkata
7,0,,,,,8,Daniel,18,,Delhi
8,107,Laptop,Electronics,79999.0,Bengalore,9,Jeremy,23,Laptop,Mumbai


## 4.Joining dataframe

* 4.1 **dataframe_1.join(dataframe_2, how='inner'):** performs an inner join on the indexes of the two dataframes (set by set_index or index=[]).
* 4.2 **dataframe_1.join(dataframe_2, how='inner', on='key'):** performs an inner join between the **key** column of dataframe_1 and the index of dataframe_2 (set by set_index or index=[]).

**Note:** **how = inner/outer/left/right** selects the type of join to perform.

In [158]:
data_1 = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
                'Age':[27, 24, 22, 32]} 
     
df_1 = pd.DataFrame(data_1, index=['K0', 'K1', 'K2', 'K3'])

In [159]:
df_1

Unnamed: 0,Name,Age
K0,Name_1,27
K1,Name_2,24
K2,Name_3,22
K3,Name_4,32


In [160]:
data_2 = {'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
              'Qualification':['MSC', 'M.A', 'MCA', 'PHD']}

df_2 = pd.DataFrame(data_2, index=['K0', 'K2', 'K3', 'K4'])

In [161]:
df_2

Unnamed: 0,Address,Qualification
K0,Nagpur,MSC
K2,Delhi,M.A
K3,Bangalore,MCA
K4,Meerut,PHD


### Joining dataframe using inner method

In [162]:
result = df_1.join(df_2, how='inner')
result

Unnamed: 0,Name,Age,Address,Qualification
K0,Name_1,27,Nagpur,MSC
K2,Name_3,22,Delhi,M.A
K3,Name_4,32,Bangalore,MCA


### Joining dataframe using outer method

In [163]:
result = df_1.join(df_2, how='outer')
result

Unnamed: 0,Name,Age,Address,Qualification
K0,Name_1,27.0,Nagpur,MSC
K1,Name_2,24.0,,
K2,Name_3,22.0,Delhi,M.A
K3,Name_4,32.0,Bangalore,MCA
K4,,,Meerut,PHD


### Joining dataframe using inner + on method

In [164]:
data_1 = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 
                'Age':[27, 24, 22, 32],
                'Key':['K0', 'K1', 'K2', 'K3']} 
     
df_1 = pd.DataFrame(data_1)

In [165]:
data_2 = {'Address':['Nagpur', 'Delhi', 'Bangalore', 'Meerut'], 
                'Qualification':['MSC', 'M.A', 'MCA', 'PHD']}

df_2 = pd.DataFrame(data_2, index=['K0', 'K2', 'K3', 'K4'])

In [166]:
df_1

Unnamed: 0,Name,Age,Key
0,Name_1,27,K0
1,Name_2,24,K1
2,Name_3,22,K2
3,Name_4,32,K3


In [167]:
df_2

Unnamed: 0,Address,Qualification
K0,Nagpur,MSC
K2,Delhi,M.A
K3,Bangalore,MCA
K4,Meerut,PHD


In [168]:
df_1.join(df_2, how='inner', on='Key')

Unnamed: 0,Name,Age,Key,Address,Qualification
0,Name_1,27,K0,Nagpur,MSC
2,Name_3,22,K2,Delhi,M.A
3,Name_4,32,K3,Bangalore,MCA


## 5.Series.str.cat() to concatenate string

* 5.1 **data["column_1"].str.cat(column_2_copy, sep=", "):** concatenates two columns element by element, placing the given separator between the values.
* 5.2 **data["column_1"].str.cat(column_2_copy, sep=", ", na_rep='str-value'):** does the same, but replaces NaN values with the given string instead of returning NaN for that row.

In [169]:
data = pd.read_csv("dataset/loan.csv", index_col='Loan_ID')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


### Concatenating column with separator

In [170]:
gender_copy = data["Gender"].copy()
data["Gender"] = data["Married"].str.cat(gender_copy, sep ="_")
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,No_Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Yes_Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Yes_Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Yes_Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,No_Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,No_Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Yes_Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Yes_Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Yes_Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y
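
The cell above only exercises `sep`. The effect of `na_rep` from 5.2 shows up when a column contains missing values. A minimal sketch, assuming two small hypothetical series (`married`, `gender`) with one missing entry each:

```python
import pandas as pd

# Hypothetical columns with one missing value each.
married = pd.Series(["No", "Yes", None])
gender = pd.Series(["Male", None, "Female"])

# Without na_rep, a missing value on either side makes the whole result NaN.
without_fill = married.str.cat(gender, sep="_")

# With na_rep, missing values are replaced by the given string before joining.
with_fill = married.str.cat(gender, sep="_", na_rep="Unknown")

print(without_fill.tolist())
print(with_fill.tolist())  # ['No_Male', 'Yes_Unknown', 'Unknown_Female']
```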


## 6.Pandas series.append()

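* 6.1 **series_1.append(series_2):** used to append one series at the end of another. Series.append was deprecated in pandas 1.4 and removed in pandas 2.0; the cells below run on pandas 1.5.3 with warnings suppressed.
* 6.2 **series_1.append(series_2, ignore_index = True):** used to append one series at the end of another and replace the original index with a fresh 0..n-1 index.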

In [171]:
ser_1 = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio'])
index_1 = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5'] 
ser_1.index = index_1

In [172]:
ser_2 = pd.Series(['Chicage', 'Shanghai', 'Beijing', 'Jakarta', 'Seoul'])
index_2 = ['City 6', 'City 7', 'City 8', 'City 9', 'City 10']
ser_2.index = index_2

In [173]:
ser_1

City 1    New York
City 2     Chicago
City 3     Toronto
City 4      Lisbon
City 5         Rio
dtype: object

In [174]:
ser_2

City 6      Chicage
City 7     Shanghai
City 8      Beijing
City 9      Jakarta
City 10       Seoul
dtype: object

### Concatenating two series using series.append()

In [175]:
result = ser_1.append(ser_2)
result

City 1     New York
City 2      Chicago
City 3      Toronto
City 4       Lisbon
City 5          Rio
City 6      Chicage
City 7     Shanghai
City 8      Beijing
City 9      Jakarta
City 10       Seoul
dtype: object

### Concatenating two series using series.append() and ignoring the original index using ignore_index = True

In [176]:
result = ser_1.append(ser_2, ignore_index = True)
result

0    New York
1     Chicago
2     Toronto
3      Lisbon
4         Rio
5     Chicage
6    Shanghai
7     Beijing
8     Jakarta
9       Seoul
dtype: object
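
On pandas 2.0 and later, where Series.append no longer exists, pd.concat accepts a list of series and gives the same two results. A brief sketch with shortened, hypothetical series (`cities_a`, `cities_b`):

```python
import pandas as pd

# Shortened, hypothetical stand-ins for ser_1 and ser_2.
cities_a = pd.Series(["New York", "Chicago"], index=["City 1", "City 2"])
cities_b = pd.Series(["Shanghai", "Beijing"], index=["City 6", "City 7"])

# Counterpart of cities_a.append(cities_b): original labels are kept.
combined = pd.concat([cities_a, cities_b])

# Counterpart of cities_a.append(cities_b, ignore_index=True).
renumbered = pd.concat([cities_a, cities_b], ignore_index=True)

print(combined)
print(renumbered)
```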

## 7.Pandas index.append()

* 7.1 **index_1.append(index_2):** used to append one Index at the end of another.

In [177]:
df_1 = pd.Index([17, 69, 33, 5, 0, 74, 0])
df_1

Int64Index([17, 69, 33, 5, 0, 74, 0], dtype='int64')

In [178]:
df_2 = pd.Index([11, 16, 54, 58])
df_2

Int64Index([11, 16, 54, 58], dtype='int64')

### Concatenating two indexes using index.append()

In [179]:
df_1.append(df_2)

Int64Index([17, 69, 33, 5, 0, 74, 0, 11, 16, 54, 58], dtype='int64')

### Concatenating multiple indexes using index.append()

In [180]:
df_3 = pd.Index([101, 102, 103, 104])
df_3

Int64Index([101, 102, 103, 104], dtype='int64')

In [181]:
df_1.append([df_2, df_3])

Int64Index([17, 69, 33, 5, 0, 74, 0, 11, 16, 54, 58, 101, 102, 103, 104], dtype='int64')

## 8.Pandas str.join() to join string/list elements with passed delimiter

* 8.1 **data["column_1"].str.join("-"):** inserts the delimiter between the characters of each string, or between the elements of each list, in the given column.

In [182]:
data = pd.read_csv("dataset/loan.csv", index_col='Loan_ID')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


In [183]:
data["Gender"]= data["Gender"].str.join("-")
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,M-a-l-e,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,M-a-l-e,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,M-a-l-e,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,M-a-l-e,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,M-a-l-e,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,F-e-m-a-l-e,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,M-a-l-e,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,M-a-l-e,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,M-a-l-e,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y
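
The example above joins the characters of plain strings. When the column holds lists of strings, str.join joins the list elements instead. A small sketch with a hypothetical series `tags` that mixes both cases:

```python
import pandas as pd

# Hypothetical series: two entries are lists of strings, one is a plain string.
tags = pd.Series([["urban", "graduate"], ["rural"], "Male"])

# Lists are joined element by element; a plain string is joined character by character.
joined = tags.str.join("-")

print(joined.tolist())  # ['urban-graduate', 'rural', 'M-a-l-e']
```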


## 9.Join two text columns into a single column in pandas

* 9.1 **dataframe['column_1'].str.cat(dataframe['column_2'], sep ="_"):** concatenates two text columns element by element, placing the given delimiter between the values.

**Note:** The same result can be obtained with a **lambda** function.

In [184]:
data = pd.read_csv("dataset/loan.csv", index_col='Loan_ID')
data.head()

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y


In [185]:
data['New_Column'] = data['Married'].str.cat(data['Education'], sep ="_")
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status,New_Column
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y,No_Graduate
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N,Yes_Graduate
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y,Yes_Graduate
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y,Yes_Not Graduate
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y,No_Graduate
...,...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y,No_Graduate
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y,Yes_Graduate
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y,Yes_Graduate
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y,Yes_Graduate
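
The lambda variant mentioned in the note could look like the sketch below. It uses a small hypothetical frame `loans` with the same two column names rather than the full dataset, and also shows plain `+` concatenation for comparison.

```python
import pandas as pd

# Small hypothetical frame with the two text columns used above.
loans = pd.DataFrame(
    {
        "Married": ["No", "Yes", "Yes"],
        "Education": ["Graduate", "Graduate", "Not Graduate"],
    }
)

# Vectorised string concatenation, as in the cell above.
loans["Via_Cat"] = loans["Married"].str.cat(loans["Education"], sep="_")

# Row-wise lambda: builds the same string one row at a time.
loans["Via_Lambda"] = loans.apply(
    lambda row: f"{row['Married']}_{row['Education']}", axis=1
)

# Plain + on string columns is a third option.
loans["Via_Plus"] = loans["Married"] + "_" + loans["Education"]

print(loans)
```

The three approaches agree when no values are missing. With NaN present they differ: str.cat and `+` give NaN for that row, while the f-string in the lambda writes the text "nan" into the result.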


## Quick Recap

### 1.Concatenating dataframe using .concat()

* 1.1 **pd.concat([dataframe_1, dataframe_2]):** appends the rows of one dataframe to the end of another.
* 1.2 **pd.concat([dataframe_1, dataframe_2], axis=1, join='inner'):** places the dataframes side by side and keeps only the index labels present in both (intersection).
* 1.3 **pd.concat([dataframe_1, dataframe_2], axis=1, join='outer'):** places the dataframes side by side and keeps every index label from either one (union), filling the gaps with NaN.
* 1.4 **pd.concat([dataframe_1, dataframe_2], ignore_index=True):** appends one dataframe to the end of another and replaces the original index with a fresh 0..n-1 index.
* 1.5 **pd.concat([dataframe_1, dataframe_2], keys = ['X', 'Y']):** appends one dataframe to the end of another and labels the rows of each input with the group keys X and Y.
* 1.6 **pd.concat([dataframe, series], axis = 1):** adds a series to a dataframe as a new column and returns a dataframe.


### 2.Concatenating dataframe using .append()

* 2.1 **dataframe_1.append(dataframe_2):** used to append one dataframe at the end of another (deprecated in pandas 1.4 and removed in pandas 2.0).


### 3.Merging dataframe

* 3.1 **pd.merge(dataframe_1, dataframe_2, on='column_1', how='inner'):** performs an inner join on the given column, which must exist in both dataframes.
* 3.2 **pd.merge(dataframe_1, dataframe_2, left_on='column_1', right_on='column_2', how='inner'):** performs an inner join when the key column has a different name in each dataframe.
* 3.3 **pd.merge(dataframe_1, dataframe_2, left_on=['column_1', 'column_2'], right_on=['column_3', 'column_4'], how='inner'):** performs an inner join on several key columns whose names differ between the dataframes.

**Note:** **how = inner/outer/left/right** selects the type of join to perform.

### 4.Joining dataframe

* 4.1 **dataframe_1.join(dataframe_2, how='inner'):** performs an inner join on the indexes of the two dataframes (set by set_index or index=[]).
* 4.2 **dataframe_1.join(dataframe_2, how='inner', on='key'):** performs an inner join between the **key** column of dataframe_1 and the index of dataframe_2 (set by set_index or index=[]).

**Note:** **how = inner/outer/left/right** selects the type of join to perform.

### 5.Series.str.cat() to concatenate string

* 5.1 **data["column_1"].str.cat(column_2_copy, sep=", "):** concatenates two columns element by element, placing the given separator between the values.
* 5.2 **data["column_1"].str.cat(column_2_copy, sep=", ", na_rep='str-value'):** does the same, but replaces NaN values with the given string instead of returning NaN for that row.


### 6.Pandas series.append()

* 6.1 **series_1.append(series_2):** used to append one series at the end of another (deprecated in pandas 1.4 and removed in pandas 2.0).
* 6.2 **series_1.append(series_2, ignore_index = True):** used to append one series at the end of another and replace the original index with a fresh 0..n-1 index.

### 7.Pandas index.append()

* 7.1 **index_1.append(index_2):** used to append one Index at the end of another.

### 8.Pandas str.join() to join string/list elements with passed delimiter

* 8.1 **data["column_1"].str.join("-"):** inserts the delimiter between the characters of each string, or between the elements of each list, in the given column.

### 9.Join two text columns into a single column in pandas

* 9.1 **dataframe['column_1'].str.cat(dataframe['column_2'], sep ="_"):** concatenates two text columns element by element, placing the given delimiter between the values.

**Note:** The same result can be obtained with a **lambda** function.


## Pandas concat vs append vs join vs merge

* **concat** can combine dataframes along either axis (stacking rows or placing columns side by side) and aligns the other axis on its labels, using **join = 'inner' or 'outer'**.

* **append** is a special case of concat (axis=0, join='outer'). It is used to append one dataframe at the end of another.

* The **concat** function can combine dataframes along either rows or columns, while the **append** method only combines dataframes along rows.

* **join** matches rows on the indexes (set by set_index), with **how = 'left', 'right', 'inner', or 'outer'**. It does not offer a choice of axis.

* **join()** for combining data on a key column or an index

* **merge** matches rows on one or more columns of the two dataframes, with **how = 'left', 'right', 'inner', or 'outer'**. The key columns are given through **'on', 'left_on', and 'right_on'**. It does not offer a choice of axis.
* **merge()** for combining data on common columns or indices
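
The overlap between these functions can be checked directly: an index-based inner join can be written with join, merge, or concat. The following sketch uses two small hypothetical frames (`left`, `right`) and is meant as an illustration of the comparison, not as a statement about which function to prefer.

```python
import pandas as pd

# Hypothetical frames that share the index labels K0 and K2.
left = pd.DataFrame({"Name": ["Name_1", "Name_2", "Name_3"]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"Address": ["Nagpur", "Delhi"]}, index=["K0", "K2"])

# Three ways to express an inner join on the index.
via_join = left.join(right, how="inner")
via_merge = pd.merge(left, right, left_index=True, right_index=True, how="inner")
via_concat = pd.concat([left, right], axis=1, join="inner")

print(via_join)
print(via_merge)
print(via_concat)
```

merge is the most general of the three because it can also match on arbitrary columns, while concat is the only one that can stack rows.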