## What is Pandas?
  
Pandas is a software library written for the Python programming language
for data **manipulation and analysis**.

* Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas.
* Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
* The two primary components of pandas are the **Series** and the **DataFrame**.

## Loan dataset

https://www.kaggle.com/animeshparikshya/loan-dataset

## Importing required libraries

In [120]:
import numpy as np
print('numpy version : ', np.__version__)

import pandas as pd
print('pandas version : ', pd.__version__)

import warnings
warnings.filterwarnings('ignore')

numpy version :  1.19.5
pandas version :  1.1.5


## 1. Adding a new column to an existing DataFrame in Pandas

* 1.1 **dataframe['new_column_name'] = [values of new column]:** adds a new column at the end of the existing dataframe, modifying it in place.
* 1.2 **dataframe.insert()** adds a column at a particular position in the dataframe, modifying it in place.
* 1.3 **dataframe.assign()** returns a new dataframe with the column added at the end; the original dataframe is left unchanged.
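
The difference between the in-place approaches and `assign()` can be sketched side by side. This is a minimal illustration, assuming a small made-up frame called `people`; the `Height_rounded` column is hypothetical and only there to show that `assign()` accepts a callable.

```python
import pandas as pd

# Hypothetical sample frame, similar in shape to the one used in this section.
people = pd.DataFrame({'Name': ['Name_1', 'Name_2', 'Name_3'],
                       'Height': [5.1, 6.2, 5.1]})

# 1. Direct assignment appends the column at the end and modifies `people`.
people['Address'] = ['Delhi', 'Bangalore', 'Chennai']

# 2. insert() also modifies `people`, placing the column at position 1 (0-based).
people.insert(1, 'Age', [21, 23, 24])

# 3. assign() returns a new DataFrame and leaves `people` untouched.
#    A callable can derive the new column from existing ones.
with_rounded = people.assign(Height_rounded=lambda d: d['Height'].round())

print(list(people.columns))
print(list(with_rounded.columns))
```

Because `assign()` does not mutate its input, it is convenient in method chains, while direct assignment and `insert()` suit step-by-step notebook work.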

In [121]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, 6.2, 5.1, 5.2, 5.7],
             'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,6.2,MCA
2,Name_3,5.1,BSC
3,Name_4,5.2,MSC
4,Name_5,5.7,B.Tech


### Directly adding a column into DataFrame

In [122]:
address = ['Delhi', 'Bangalore', 'Chennai', 'Pune', 'Noida']
df['Address'] = address
df

Unnamed: 0,Name,Height,Qualification,Address
0,Name_1,5.1,BCA,Delhi
1,Name_2,6.2,MCA,Bangalore
2,Name_3,5.1,BSC,Chennai
3,Name_4,5.2,MSC,Pune
4,Name_5,5.7,B.Tech,Noida


### Using DataFrame.insert() function

In [123]:
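# Insert 'Age' at column position 2 (0-based); allow_duplicates=True permits a repeated column name
df.insert(2, "Age", [21, 23, 24, 21, 20], allow_duplicates=True)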

In [124]:
df

Unnamed: 0,Name,Height,Age,Qualification,Address
0,Name_1,5.1,21,BCA,Delhi
1,Name_2,6.2,23,MCA,Bangalore
2,Name_3,5.1,24,BSC,Chennai
3,Name_4,5.2,21,MSC,Pune
4,Name_5,5.7,20,B.Tech,Noida


### Using DataFrame.assign() function

In [125]:
df1 = df.assign(Address_2 = ['Delhi', 'Bangalore', 'Chennai', 'Pune', 'Noida'])
df1

Unnamed: 0,Name,Height,Age,Qualification,Address,Address_2
0,Name_1,5.1,21,BCA,Delhi,Delhi
1,Name_2,6.2,23,MCA,Bangalore,Bangalore
2,Name_3,5.1,24,BSC,Chennai,Chennai
3,Name_4,5.2,21,MSC,Pune,Pune
4,Name_5,5.7,20,B.Tech,Noida,Noida


In [126]:
df

Unnamed: 0,Name,Height,Age,Qualification,Address
0,Name_1,5.1,21,BCA,Delhi
1,Name_2,6.2,23,MCA,Bangalore
2,Name_3,5.1,24,BSC,Chennai
3,Name_4,5.2,21,MSC,Pune
4,Name_5,5.7,20,B.Tech,Noida


## 2. Deleting rows/columns from a DataFrame using DataFrame.drop()

* 2.1 **dataframe.drop(labels, axis = 0)** deletes rows by index label (axis = 0 is the default).
* 2.2 **dataframe.drop(labels, axis = 1)** deletes columns by column name.
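
As a complement, `drop()` can also be called without `inplace = True`, using the `index=` and `columns=` keywords instead of `axis`. The sketch below assumes a small made-up frame (`loans`, with invented `ID_n` labels) rather than the real loan dataset.

```python
import pandas as pd

# Hypothetical frame indexed by made-up ID labels.
loans = pd.DataFrame(
    {'Gender': ['Male', 'Female', 'Male'],
     'Education': ['Graduate', 'Graduate', 'Not Graduate'],
     'LoanAmount': [128.0, 66.0, 120.0]},
    index=['ID_1', 'ID_2', 'ID_3'])

# Drop rows by index label; without inplace=True a new frame is returned.
fewer_rows = loans.drop(index=['ID_1', 'ID_3'])

# Drop columns by name; columns=[...] is equivalent to passing axis=1.
fewer_cols = loans.drop(columns=['Education'])

# errors='ignore' skips labels that are absent instead of raising KeyError.
safe = loans.drop(columns=['Not_A_Column'], errors='ignore')

print(fewer_rows)
print(fewer_cols)
```

Returning a new frame keeps the original available for later cells, which avoids surprises when a notebook cell is re-run.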

### Dropping Rows by index label

In [127]:
data = pd.read_csv("dataset/loan.csv", index_col='Loan_ID')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


In [128]:
data.drop(["LP001002", "LP001003", "LP001005","LP001008"], inplace = True)
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001011,Male,Yes,2,Graduate,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y
LP001013,Male,Yes,0,Not Graduate,No,2333,1516.0,95.0,360.0,1.0,Urban,Y
LP001014,Male,Yes,3+,Graduate,No,3036,2504.0,158.0,360.0,0.0,Semiurban,N
LP001018,Male,Yes,2,Graduate,No,4006,1526.0,168.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


### Dropping columns with column name

In [129]:
data.drop(["Dependents", "Education"], axis = 1, inplace = True)
data

Loan_ID,Gender,Married,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001006,Male,Yes,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001011,Male,Yes,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y
LP001013,Male,Yes,No,2333,1516.0,95.0,360.0,1.0,Urban,Y
LP001014,Male,Yes,No,3036,2504.0,158.0,360.0,0.0,Semiurban,N
LP001018,Male,Yes,No,4006,1526.0,168.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,No,7583,0.0,187.0,360.0,1.0,Urban,Y


## 3. Pandas DataFrame.truncate()

* 3.1 **dataframe.truncate(before = ? , after = ?)** removes the rows before the `before` index label and after the `after` index label; both endpoints are kept.
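
Two details of `truncate()` are easy to miss: it can work along columns with `axis = 1`, and it expects a sorted index. The following is a small sketch on a made-up frame (`scores` and its column names are assumptions for illustration).

```python
import pandas as pd

# Hypothetical frame with sorted row and column labels.
scores = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]},
                      index=['Row_1', 'Row_2', 'Row_3', 'Row_4'])

# Keep rows from 'Row_2' through 'Row_3'; both endpoints are included.
middle_rows = scores.truncate(before='Row_2', after='Row_3')

# axis=1 truncates along the column labels instead of the row labels.
first_cols = scores.truncate(after='B', axis=1)

# truncate() requires a sorted index; an unsorted one raises ValueError.
shuffled = scores.loc[['Row_3', 'Row_1', 'Row_4', 'Row_2']]
try:
    shuffled.truncate(before='Row_2')
    raised = False
except ValueError:
    raised = True

print(middle_rows)
print(first_cols)
print('unsorted index rejected:', raised)
```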

In [130]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, 6.2, 5.1, 5.2, 5.7],
             'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,6.2,MCA
2,Name_3,5.1,BSC
3,Name_4,5.2,MSC
4,Name_5,5.7,B.Tech


In [131]:
index_ = ['Row_1', 'Row_2', 'Row_3', 'Row_4', 'Row_5']
df.index = index_
df

Unnamed: 0,Name,Height,Qualification
Row_1,Name_1,5.1,BCA
Row_2,Name_2,6.2,MCA
Row_3,Name_3,5.1,BSC
Row_4,Name_4,5.2,MSC
Row_5,Name_5,5.7,B.Tech


In [132]:
df.truncate(before = 'Row_3', after = 'Row_4')

Unnamed: 0,Name,Height,Qualification
Row_3,Name_3,5.1,BSC
Row_4,Name_4,5.2,MSC


## 4. Pandas Series.truncate()

* 4.1 **series.truncate(before = ? , after = ?)** removes the values before the `before` index label and after the `after` index label; both endpoints are kept.
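
For a Series with a sorted index, `truncate()` selects the same values as a label-based `.loc` slice. The sketch below illustrates this, assuming a made-up Series named `readings`.

```python
import pandas as pd

# Hypothetical Series with the default integer index 0..4.
readings = pd.Series([19.5, 16.8, 22.78, 20.124, 18.1002])

# Both endpoints (labels 1 and 3) are included in the result.
by_truncate = readings.truncate(before=1, after=3)

# Label-based slicing with .loc is also inclusive on both ends.
by_loc = readings.loc[1:3]

# Omitting `before` keeps everything from the start up to `after`.
head_part = readings.truncate(after=1)

print(by_truncate.equals(by_loc))
print(head_part)
```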

In [133]:
sr = pd.Series([19.5, 16.8, 22.78, 20.124, 18.1002])
sr

0    19.5000
1    16.8000
2    22.7800
3    20.1240
4    18.1002
dtype: float64

In [134]:
sr.truncate(before = 1, after = 3)

1    16.800
2    22.780
3    20.124
dtype: float64

## 5. Iterating over rows and columns in Pandas DataFrame

* 5.1 **iterrows():** iterates over the rows, yielding (index, row Series) pairs.
* 5.2 **iteritems():** iterates over the columns, yielding (column name, column Series) pairs. It was deprecated in pandas 1.5 and removed in 2.0 in favor of **items()**.
* 5.3 **itertuples():** iterates over the rows, yielding each row as a named tuple.
* 5.4 **list(dataframe):** returns the list of column names, which can then be looped over to access each column.
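
The four approaches can be compared on a tiny frame. This is a sketch under the assumption of a made-up two-row frame (`df_iter`); it uses `items()`, which is available in both older and newer pandas releases.

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
df_iter = pd.DataFrame({'Name': ['Name_1', 'Name_2'], 'Height': [5.1, 6.2]})

# iterrows(): yields (index label, row as a Series).
row_names = [row['Name'] for _, row in df_iter.iterrows()]

# itertuples(): yields named tuples; fields are reachable as attributes.
heights = [t.Height for t in df_iter.itertuples()]

# items(): yields (column name, column as a Series).
column_lengths = {name: len(col) for name, col in df_iter.items()}

# Iterating over the DataFrame itself yields the column labels.
labels = [label for label in df_iter]

print(row_names, heights, column_lengths, labels)
```

Row-wise loops are fine for small frames like this one; for large data, column-wise (vectorized) operations are usually preferable.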

In [135]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, 6.2, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,6.2,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,MSC
4,Name_5,5.4,B.Tech


### Iteration over rows using iterrows()

In [136]:
for index, value in df.iterrows():
    print('Index : ', index)
    print('Value : ', value)
    print('\n')

Index :  0
Value :  Name             Name_1
Height              5.1
Qualification       BCA
Name: 0, dtype: object


Index :  1
Value :  Name             Name_2
Height              6.2
Qualification       MCA
Name: 1, dtype: object


Index :  2
Value :  Name             Name_3
Height              5.2
Qualification       BSC
Name: 2, dtype: object


Index :  3
Value :  Name             Name_4
Height              5.3
Qualification       MSC
Name: 3, dtype: object


Index :  4
Value :  Name             Name_5
Height              5.4
Qualification    B.Tech
Name: 4, dtype: object




In [137]:
for index, value in df.iterrows():
    print('Index : ', index)
    print('Name : ', value['Name'])
    print('\n')

Index :  0
Name :  Name_1


Index :  1
Name :  Name_2


Index :  2
Name :  Name_3


Index :  3
Name :  Name_4


Index :  4
Name :  Name_5




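### Iteration over columns using iteritems()

In [138]:
# iteritems() yields (column name, column Series) pairs.
# It was deprecated in pandas 1.5 and removed in 2.0; df.items() is the equivalent there.
for col_name, column in df.iteritems():
    print(col_name)
    print(column)
    print('\n')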

Name
0    Name_1
1    Name_2
2    Name_3
3    Name_4
4    Name_5
Name: Name, dtype: object


Height
0    5.1
1    6.2
2    5.2
3    5.3
4    5.4
Name: Height, dtype: float64


Qualification
0       BCA
1       MCA
2       BSC
3       MSC
4    B.Tech
Name: Qualification, dtype: object




### Iteration over rows using itertuples()

In [139]:
for value in df.itertuples():
    print(value)

Pandas(Index=0, Name='Name_1', Height=5.1, Qualification='BCA')
Pandas(Index=1, Name='Name_2', Height=6.2, Qualification='MCA')
Pandas(Index=2, Name='Name_3', Height=5.2, Qualification='BSC')
Pandas(Index=3, Name='Name_4', Height=5.3, Qualification='MSC')
Pandas(Index=4, Name='Name_5', Height=5.4, Qualification='B.Tech')


### Iterating over columns in Pandas DataFrame

In [140]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, 6.2, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,6.2,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,MSC
4,Name_5,5.4,B.Tech


In [141]:
columns = list(df)
columns

['Name', 'Height', 'Qualification']

In [142]:
for col_name in columns:
    # .loc[row_label, column_label] reads one cell without chained indexing
    print(df.loc[2, col_name])

Name_3
5.2
BSC


## 6. Working with Missing Data in Pandas

* 6.1 Checking for missing values using **isnull():** it returns True where a value is null and False elsewhere.
* 6.2 Checking for missing values using **notnull():** it returns False where a value is null and True elsewhere.
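
Beyond the element-wise masks, a few summaries are often useful: missing counts per column, the total count, and the rows that contain any missing value. The sketch below assumes a made-up frame called `df_na`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing cells.
df_na = pd.DataFrame({'Name': ['Name_1', 'Name_2', 'Name_3'],
                      'Height': [5.1, np.nan, 5.2],
                      'Qualification': ['BCA', 'MCA', np.nan]})

# Number of missing values in each column.
missing_per_column = df_na.isnull().sum()

# Total number of missing values in the whole frame.
total_missing = int(df_na.isnull().sum().sum())

# Rows that contain at least one missing value.
rows_with_missing = df_na[df_na.isnull().any(axis=1)]

# isna() is an alias of isnull(), so both produce the same mask.
same_mask = df_na.isna().equals(df_na.isnull())

print(missing_per_column)
print(total_missing)
print(rows_with_missing)
```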

In [143]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


### Checking for missing values using isnull()

In [144]:
df.isnull()

Unnamed: 0,Name,Height,Qualification
0,False,False,False
1,False,True,False
2,False,False,False
3,False,False,True
4,False,False,False


In [145]:
df.isnull().sum()

Name             0
Height           1
Qualification    1
dtype: int64

### Checking for missing values using notnull()

In [146]:
df.notnull()

Unnamed: 0,Name,Height,Qualification
0,True,True,True
1,True,False,True
2,True,True,True
3,True,True,False
4,True,True,True


In [147]:
df.notnull().sum()

Name             5
Height           4
Qualification    4
dtype: int64

### Filling missing values using fillna(), replace() and interpolate()

* **fillna():** fills null values with a single value.
* **fillna(method ='pad'):** fills each null value with the previous non-null value in the column (forward fill).
* **fillna(method ='bfill'):** fills each null value with the next non-null value in the column (backward fill).
* **replace(to_replace = np.nan, value = 20):** replaces all the NaN values in the entire dataframe with a particular value.
* **interpolate(method ='linear', limit_direction ='forward'):** interpolates the missing values using the linear method, which ignores the index and treats the values as equally spaced.
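
A few variations on the filling methods above, sketched on a made-up frame (`df_fill`). Filling with a dict applies a different value per column, and `ffill()` / `bfill()` are the method forms of forward and backward fill. Newer pandas releases are understood to deprecate the `method=` argument of `fillna()` in favor of these, so treat that as something to verify against the installed version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df_fill = pd.DataFrame({'Height': [5.1, np.nan, 5.3],
                        'Qualification': ['BCA', np.nan, 'MSC']})

# A dict fills each column with its own value, so numeric columns stay numeric.
per_column = df_fill.fillna({'Height': 0.0, 'Qualification': 'Unknown'})

# Forward fill: take the previous non-null value in each column.
forward = df_fill.ffill()

# Backward fill: take the next non-null value in each column.
backward = df_fill.bfill()

# Linear interpolation only makes sense for numeric data,
# so it is applied to the 'Height' column alone here.
heights = df_fill['Height'].interpolate(method='linear')

print(per_column)
print(forward)
print(backward)
print(heights)
```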


### Filling missing values using fillna()

In [148]:
df.fillna(0)

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,0.0,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,0
4,Name_5,5.4,B.Tech


### Filling missing values using fillna(method ='pad')

In [149]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [150]:
df.fillna(method ='pad')

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,5.1,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,BSC
4,Name_5,5.4,B.Tech


### Filling missing values using fillna(method ='bfill')

In [151]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [152]:
df.fillna(method ='bfill')

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,5.2,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,B.Tech
4,Name_5,5.4,B.Tech


### Filling missing values using replace()

In [153]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [154]:
df.replace(to_replace = np.nan, value = 20)

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,20.0,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,20
4,Name_5,5.4,B.Tech


### Filling missing values using interpolate()

In [155]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [156]:
df.interpolate(method ='linear', limit_direction ='forward')

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,5.15,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


### Dropping missing values

* **dropna():** drops rows with at least one NaN value.
* **dropna(how='all'):** drops rows in which every value is missing (NaN).
* **dropna(axis = 1):** drops columns which have at least one missing value.
* **dropna(axis = 1, how='all'):** drops columns in which every value is missing (NaN).
* **dropna(axis = 0, how ='any'):** drops rows with at least one NaN value; these are the defaults, so it is equivalent to **dropna()**.
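
`dropna()` also accepts `subset` and `thresh`, which give finer control than `how`. The sketch below is a minimal illustration on a made-up frame (`df_drop`); it is not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 misses one value, row 2 is empty.
df_drop = pd.DataFrame({'Name': ['Name_1', 'Name_2', np.nan],
                        'Height': [5.1, np.nan, np.nan],
                        'Qualification': ['BCA', 'MCA', np.nan]})

# Default: drop rows with at least one missing value.
any_missing = df_drop.dropna()

# how='all': drop only rows where every value is missing.
all_missing = df_drop.dropna(how='all')

# subset: only look at the listed columns when deciding what to drop.
only_height = df_drop.dropna(subset=['Height'])

# thresh: keep rows that have at least this many non-missing values.
at_least_two = df_drop.dropna(thresh=2)

print(any_missing)
print(all_missing)
print(only_height)
print(at_least_two)
```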

In [157]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', 'MCA', 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,MCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


### Dropping missing values using dropna()

In [158]:
df.dropna()

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
2,Name_3,5.2,BSC
4,Name_5,5.4,B.Tech


### Dropping missing values using dropna(how='all')

In [159]:
data = {'Name': ['Name_1', np.nan, 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', np.nan, 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,,,
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [160]:
df.dropna(how='all')

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


### Dropping missing values using dropna(axis = 1)

In [161]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', np.nan, 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [162]:
df.dropna(axis = 1)

Unnamed: 0,Name
0,Name_1
1,Name_2
2,Name_3
3,Name_4
4,Name_5


### Dropping missing values using dropna(axis = 1, how='all')

In [163]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', np.nan, 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [164]:
df.dropna(axis = 1, how='all')

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


### Dropping missing values using dropna(axis = 0, how='any')

In [165]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Height': [5.1, np.nan, 5.2, 5.3, 5.4],
             'Qualification': ['BCA', np.nan, 'BSC', np.nan, 'B.Tech']}
  
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
1,Name_2,,
2,Name_3,5.2,BSC
3,Name_4,5.3,
4,Name_5,5.4,B.Tech


In [166]:
df.dropna(axis =0, how='any')

Unnamed: 0,Name,Height,Qualification
0,Name_1,5.1,BCA
2,Name_3,5.2,BSC
4,Name_5,5.4,B.Tech


## 7. Pandas DataFrame.sort_values()

* 7.1 **dataframe.sort_values():** sorts the dataframe based on the given column name.
* 7.2 **dataframe.sort_values("column-name", axis = 0, ascending = True, inplace = True, na_position ='last'):** sorts the dataframe based on the given column name; `na_position` controls whether NaN values are placed last or first.
* 7.3 **dataframe.sort_values(["column-name-1","column-name-2"], axis = 0, ascending = [True, False], inplace = True, na_position ='last'):** sorts the dataframe based on multiple column names, with one sort direction per column; `na_position` controls whether NaN values are placed last or first.
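
Since the loan dataset is too large to inspect by eye, the sorting rules can be checked on a tiny frame first. The sketch below assumes a made-up four-row frame (`df_sort`) that reuses a few column names from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with one missing 'Self_Employed' value.
df_sort = pd.DataFrame({'Education': ['Not Graduate', 'Graduate', 'Graduate', 'Graduate'],
                        'Self_Employed': ['No', 'Yes', np.nan, 'No'],
                        'ApplicantIncome': [2583, 3000, 2500, 5849]})

# Single column, descending; a new sorted frame is returned.
by_income = df_sort.sort_values('ApplicantIncome', ascending=False)

# Two columns: 'Education' ascending, then 'Self_Employed' descending,
# with NaN values placed first within each group.
multi = df_sort.sort_values(['Education', 'Self_Employed'],
                            ascending=[True, False], na_position='first')

print(by_income)
print(multi)
```

Without `inplace = True`, `df_sort` keeps its original order, and the original index labels travel with their rows in the sorted results.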


### Sort dataframe based on single column

In [167]:
data = pd.read_csv("dataset/loan.csv", index_col='Loan_ID')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


In [168]:
data.sort_values("Education", axis = 0, ascending = True, inplace = True, na_position ='last')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP002231,Female,No,0,Graduate,No,6000,0.0,156.0,360.0,1.0,Urban,Y
LP002229,Male,No,0,Graduate,No,5941,4232.0,296.0,360.0,1.0,Semiurban,Y
LP002226,Male,Yes,0,Graduate,,3333,2500.0,128.0,360.0,1.0,Semiurban,Y
LP002225,Male,Yes,2,Graduate,No,5391,0.0,130.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002637,Male,No,0,Not Graduate,No,3598,1287.0,100.0,360.0,1.0,Rural,N
LP001630,Male,No,0,Not Graduate,No,2333,1451.0,102.0,480.0,0.0,Urban,N
LP001250,Male,Yes,3+,Not Graduate,No,4755,0.0,95.0,,0.0,Semiurban,N
LP002367,Female,No,1,Not Graduate,No,4606,0.0,81.0,360.0,1.0,Rural,N


In [169]:
data.sort_values("Education", axis = 0, ascending = True, inplace = True, na_position ='first')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001448,,Yes,3+,Graduate,No,23803,0.0,370.0,360.0,1.0,Rural,Y
LP001449,Male,No,0,Graduate,No,3865,1640.0,,360.0,1.0,Rural,Y
LP001882,Male,Yes,3+,Graduate,No,4333,1811.0,160.0,360.0,0.0,Urban,Y
LP001465,Male,Yes,0,Graduate,No,6080,2569.0,182.0,360.0,,Rural,N
...,...,...,...,...,...,...,...,...,...,...,...,...
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP002288,Male,Yes,2,Not Graduate,No,2889,0.0,45.0,180.0,0.0,Urban,N
LP002964,Male,Yes,2,Not Graduate,No,3987,1411.0,157.0,360.0,1.0,Rural,Y
LP001086,Male,No,0,Not Graduate,No,1442,0.0,35.0,360.0,1.0,Urban,N


### Sort dataframe based on multiple columns

In [170]:
data = pd.read_csv("dataset/loan.csv", index_col='Loan_ID')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


In [171]:
data.sort_values(["Education","Self_Employed"], axis = 0, ascending = [True,True], inplace = True, na_position ='first')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001027,Male,Yes,2,Graduate,,2500,1840.0,109.0,360.0,1.0,Urban,Y
LP001041,Male,Yes,0,Graduate,,2600,3500.0,115.0,,1.0,Urban,Y
LP001052,Male,Yes,1,Graduate,,3717,2925.0,151.0,360.0,,Semiurban,N
LP001087,Female,No,2,Graduate,,3750,2083.0,120.0,360.0,1.0,Semiurban,Y
LP001091,Male,Yes,1,Graduate,,4166,3369.0,201.0,360.0,,Urban,N
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002444,Male,No,1,Not Graduate,Yes,2769,1542.0,190.0,360.0,,Semiurban,N
LP002582,Female,No,0,Not Graduate,Yes,17263,0.0,225.0,360.0,1.0,Semiurban,Y
LP002731,Female,No,0,Not Graduate,Yes,18165,0.0,125.0,360.0,1.0,Urban,Y
LP002821,Male,No,0,Not Graduate,Yes,5800,0.0,132.0,360.0,1.0,Semiurban,Y


## Quick Recap

### 1. Adding a new column to an existing DataFrame in Pandas

* 1.1 **dataframe['new_column_name'] = [values of new column]:** adds a new column at the end of the existing dataframe, modifying it in place.
* 1.2 **dataframe.insert()** adds a column at a particular position in the dataframe, modifying it in place.
* 1.3 **dataframe.assign()** returns a new dataframe with the column added at the end; the original dataframe is left unchanged.

### 2. Deleting rows/columns from a DataFrame using DataFrame.drop()
* 2.1 **dataframe.drop(labels, axis = 0)** deletes rows by index label (axis = 0 is the default).
* 2.2 **dataframe.drop(labels, axis = 1)** deletes columns by column name.

### 3. Pandas DataFrame.truncate()
* 3.1 **dataframe.truncate(before = ? , after = ?)** removes the rows before the `before` index label and after the `after` index label; both endpoints are kept.

### 4. Pandas Series.truncate()
* 4.1 **series.truncate(before = ? , after = ?)** removes the values before the `before` index label and after the `after` index label; both endpoints are kept.

### 5. Iterating over rows and columns in Pandas DataFrame
* 5.1 **iterrows():** iterates over the rows, yielding (index, row Series) pairs.
* 5.2 **iteritems():** iterates over the columns, yielding (column name, column Series) pairs. It was deprecated in pandas 1.5 and removed in 2.0 in favor of **items()**.
* 5.3 **itertuples():** iterates over the rows, yielding each row as a named tuple.
* 5.4 **list(dataframe):** returns the list of column names, which can then be looped over to access each column.

### 6. Working with Missing Data in Pandas
* 6.1 Checking for missing values using **isnull():** it returns True where a value is null and False elsewhere.
* 6.2 Checking for missing values using **notnull():** it returns False where a value is null and True elsewhere.
* 6.3 Filling missing values using fillna(), replace() and interpolate()
    * 6.3.1 **fillna():** fills null values with a single value.
    * 6.3.2 **fillna(method ='pad'):** fills each null value with the previous non-null value in the column (forward fill).
    * 6.3.3 **fillna(method ='bfill'):** fills each null value with the next non-null value in the column (backward fill).
    * 6.3.4 **replace(to_replace = np.nan, value = ?):** replaces all the NaN values in the entire dataframe with a particular value.
    * 6.3.5 **interpolate(method ='linear', limit_direction ='forward'):** interpolates the missing values using the linear method, which ignores the index and treats the values as equally spaced.
* 6.4 Dropping missing values
    * 6.4.1 **dropna():** drops rows with at least one NaN value.
    * 6.4.2 **dropna(how='all'):** drops rows in which every value is missing (NaN).
    * 6.4.3 **dropna(axis = 1):** drops columns which have at least one missing value.
    * 6.4.4 **dropna(axis = 1, how='all'):** drops columns in which every value is missing (NaN).
    * 6.4.5 **dropna(axis = 0, how ='any'):** drops rows with at least one NaN value; these are the defaults, so it is equivalent to **dropna()**.


### 7. Pandas DataFrame.sort_values()

* 7.1 **dataframe.sort_values():** sorts the dataframe based on the given column name.
* 7.2 **dataframe.sort_values("column-name", axis = 0, ascending = True, inplace = True, na_position ='last'):** sorts the dataframe based on the given column name; `na_position` controls whether NaN values are placed last or first.
* 7.3 **dataframe.sort_values(["column-name-1","column-name-2"], axis = 0, ascending = [True, False], inplace = True, na_position ='last'):** sorts the dataframe based on multiple column names, with one sort direction per column; `na_position` controls whether NaN values are placed last or first.


### Note: axis = 0 is the default value for the DataFrame methods above that accept an axis argument.