## What is Pandas?
  
Pandas is a software library written for the Python programming language
for data **manipulation and analysis.**

* Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas.
* Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
* The two primary components of pandas are the **Series** and the **DataFrame.**
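
As a minimal sketch of how these two components relate (the variable names and sample values below are illustrative, not part of the loan dataset), selecting one column of a DataFrame gives a Series, and that Series exposes its data as a NumPy array:

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labelled array.
ages = pd.Series([20, 21, 19], name="Age")

# A DataFrame is a two-dimensional table; each column is a Series.
people = pd.DataFrame({"Name": ["Name_1", "Name_2", "Name_3"],
                       "Age": [20, 21, 19]})

age_column = people["Age"]           # selecting one column gives a Series
age_values = age_column.to_numpy()   # the column's data as a NumPy array

print(type(age_column).__name__)
print(type(age_values).__name__)
```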

## Loan dataset

https://www.kaggle.com/animeshparikshya/loan-dataset

## Importing required libraries

In [75]:
import numpy as np
print('numpy version : ', np.__version__)

import pandas as pd
print('pandas version : ', pd.__version__)

import warnings
warnings.filterwarnings('ignore')

numpy version :  1.19.5
pandas version :  1.1.5


## 1. Series in Pandas
* 1.1 **pd.Series():** creates an empty series.
* 1.2 **pd.Series(np.array):** creates a series from a NumPy array.
* 1.3 **pd.Series(list):** creates a series from a list.
* 1.4 **series[:]:** accesses elements of a series by slicing.
* 1.5 **series[index_val]:** accesses an element of a series by its index label.
* 1.6 **series.loc[]:** accesses elements by index label; a slice includes its end label.
* 1.7 **series.iloc[]:** accesses elements by integer position; a slice excludes its end position.
* 1.8 **series_1.add(series_2):** adds two series element-wise, aligned on their index labels.
* 1.9 **series_1.sub(series_2):** subtracts one series from another element-wise, aligned on their index labels.

### Creating empty series

In [76]:
ser = pd.Series()
ser

Series([], dtype: float64)
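
The `float64` dtype shown above is the default in the pandas version used here; newer pandas releases default an empty Series to `object` instead. A small sketch, assuming you want the same result regardless of version, is to pass the dtype explicitly:

```python
import pandas as pd

# Stating the dtype makes the result independent of the pandas version.
empty = pd.Series(dtype="float64")

print(empty.empty)   # True when the series has no elements
print(empty.dtype)
```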

### Creating a series using an array

In [77]:
data = np.array(['g', 'e', 'e', 'k', 's'])

ser = pd.Series(data)
ser

0    g
1    e
2    e
3    k
4    s
dtype: object

### Creating a series from list

In [78]:
lst = ['Create', 'series', 'from', 'list', 'elements'] 
ser = pd.Series(lst)
ser

0      Create
1      series
2        from
3        list
4    elements
dtype: object

### Accessing elements of a Series

In [79]:
data = np.array(['Create', 'series', 'from', 'list', 'elements'])
ser = pd.Series(data)
ser[:3]

0    Create
1    series
2      from
dtype: object

### Accessing Element Using Label (index)

In [80]:
data = np.array(['Create', 'series', 'from', 'list', 'elements'])
ser = pd.Series(data,index=[10,11,12,13,14])

ser[13]

'list'
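
With a custom integer index like this one, it helps to be explicit about whether a number is meant as a label or as a position. The sketch below reuses the same series and fetches the same element both ways (the names `by_label` and `by_position` are illustrative):

```python
import pandas as pd

ser = pd.Series(['Create', 'series', 'from', 'list', 'elements'],
                index=[10, 11, 12, 13, 14])

by_label = ser.loc[13]      # the element whose index label is 13
by_position = ser.iloc[3]   # the fourth element (position 3)

print(by_label, by_position)
print(13 in ser)   # membership tests look at the index labels
print(3 in ser)
```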

### Indexing and Selecting Data in Series

In [81]:
df = pd.read_csv("dataset/loan.csv")
df

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 609 | LP002978 | Female | No | 0 | Graduate | No | 2900 | 0.0 | 71.0 | 360.0 | 1.0 | Rural | Y |
| 610 | LP002979 | Male | Yes | 3+ | Graduate | No | 4106 | 0.0 | 40.0 | 180.0 | 1.0 | Rural | Y |
| 611 | LP002983 | Male | Yes | 1 | Graduate | No | 8072 | 240.0 | 253.0 | 360.0 | 1.0 | Urban | Y |
| 612 | LP002984 | Male | Yes | 2 | Graduate | No | 7583 | 0.0 | 187.0 | 360.0 | 1.0 | Urban | Y |


### Indexing a Series using indexing operator []

In [82]:
ser = pd.Series(df['Loan_ID']) 
ser.head(10)

0    LP001002
1    LP001003
2    LP001005
3    LP001006
4    LP001008
5    LP001011
6    LP001013
7    LP001014
8    LP001018
9    LP001020
Name: Loan_ID, dtype: object

### Indexing a Series using .loc[]

In [83]:
data = ser.head(10)
data

0    LP001002
1    LP001003
2    LP001005
3    LP001006
4    LP001008
5    LP001011
6    LP001013
7    LP001014
8    LP001018
9    LP001020
Name: Loan_ID, dtype: object

In [84]:
data.loc[3:6]

3    LP001006
4    LP001008
5    LP001011
6    LP001013
Name: Loan_ID, dtype: object

### Indexing a Series using .iloc[]

In [85]:
data.iloc[2:7]  # positions 2 to 6; unlike .loc[], the end position is excluded

2    LP001005
3    LP001006
4    LP001008
5    LP001011
6    LP001013
Name: Loan_ID, dtype: object
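
The difference between label-based and position-based slicing is easiest to see on a series whose index does not start at 0. The sketch below uses a few loan IDs with a made-up index (100 to 104) chosen only for illustration:

```python
import pandas as pd

ids = pd.Series(["LP001002", "LP001003", "LP001005", "LP001006", "LP001008"],
                index=[100, 101, 102, 103, 104])

label_slice = ids.loc[101:103]    # labels 101, 102 and 103 (end label included)
position_slice = ids.iloc[1:3]    # positions 1 and 2 (end position excluded)

print(label_slice)
print(position_slice)
```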

### Binary Operation on Series

We can perform binary operations on series, such as addition and subtraction, using methods like **.add()** and **.sub()**. These methods align the two series on their index labels, and the fill_value argument supplies a stand-in value for labels that appear in only one of the series.

In [86]:
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
 
print(data, "\n\n", data1)

a    5
b    2
c    3
d    7
dtype: int64 

 a    1
b    6
d    4
e    9
dtype: int64


### add() Operation on Series

In [87]:
data.add(data1, fill_value=0)

a     6.0
b     8.0
c     3.0
d    11.0
e     9.0
dtype: float64

### sub() Operation on Series

In [88]:
data.sub(data1, fill_value=0)

a    4.0
b   -4.0
c    3.0
d    3.0
e   -9.0
dtype: float64
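
To see what fill_value changes, the sketch below compares the plain `+` operator with `.add()` on the same two series, and adds a `.mul()` call as one more example of the same method family (the result names are illustrative):

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Without a fill value, labels found in only one series ('c' and 'e') give NaN.
plain_sum = data + data1

# With fill_value=0 the missing side is treated as 0 before adding.
filled_sum = data.add(data1, fill_value=0)

# For multiplication, 1 is the neutral fill value.
product = data.mul(data1, fill_value=1)

print(plain_sum)
print(filled_sum)
print(product)
```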

## 2. DataFrame in Pandas
* 2.1 **pd.DataFrame():** creates an empty dataframe.
* 2.2 **pd.DataFrame(list):** creates a dataframe from a list.
* 2.3 **pd.DataFrame(dictionary):** creates a dataframe from a dictionary.
* 2.4 **dataframe['column_name']:** selects a single column from a dataframe.
* 2.5 **dataframe[['column_name_1', 'column_name_2']]:** selects multiple columns from a dataframe.
* 2.6 **pd.read_csv():** reads data from a CSV file.
* 2.7 **dataframe.loc['row_index_label']:** selects a single row by its index label.
* 2.8 **dataframe.iloc[row_position]:** selects a single row by its integer position.
* 2.9 **dataframe.isnull():** checks for NULL and NaN values in a dataframe.
* 2.10 **dataframe.fillna():** fills NULL and NaN values in a dataframe.
* 2.11 **dataframe.dropna():** drops rows (or columns) containing NULL and NaN values.
* 2.12 To iterate over a dataframe, we can use three functions: **iteritems(), iterrows(), itertuples()**
    * **iteritems():** iterates column-wise, yielding each column name with its Series (called **items()** in current pandas; iteritems() was removed in pandas 2.0).
    * **iterrows():** iterates row-wise, yielding each index label with the row as a Series.
    * **itertuples():** iterates row-wise, yielding each row as a named tuple.
* 2.13 To iterate over columns, get the column names with **dataframe.columns.values.tolist()** and walk through them in a for loop.

### Creating empty dataframe

In [89]:
df = pd.DataFrame() 
df

### Creating dataframe using list

In [90]:
lst = ['Create', 'dataframe', 'from', 'list', 'elements'] 
df = pd.DataFrame(lst) 
df

| | 0 |
|---|---|
| 0 | Create |
| 1 | dataframe |
| 2 | from |
| 3 | list |
| 4 | elements |
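
A flat list produces a single column named `0`. As a small sketch (the sample rows are illustrative), a list of lists gives one row per inner list, and the `columns` argument names the columns:

```python
import pandas as pd

rows = [['Name_1', 20], ['Name_2', 21], ['Name_3', 19]]

# Each inner list becomes one row; `columns` supplies the column names.
people = pd.DataFrame(rows, columns=['Name', 'Age'])

print(people)
```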


### Creating dataframe using dictionary

In [91]:
data = {'Name':['Name_1', 'Name_2', 'Name_3', 'Name_4'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
df

| | Name | Age |
|---|---|---|
| 0 | Name_1 | 20 |
| 1 | Name_2 | 21 |
| 2 | Name_3 | 19 |
| 3 | Name_4 | 18 |


### Dealing with Rows and Columns in dataframe

In [92]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
             'Age':[27, 24, 22, 32,26],
             'Address':['Delhi', 'Bangalore', 'Chennai', 'Pune', 'Noida'],
             'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}
  
df = pd.DataFrame(data)
df

| | Name | Age | Address | Qualification |
|---|---|---|---|---|
| 0 | Name_1 | 27 | Delhi | BCA |
| 1 | Name_2 | 24 | Bangalore | MCA |
| 2 | Name_3 | 22 | Chennai | BSC |
| 3 | Name_4 | 32 | Pune | MSC |
| 4 | Name_5 | 26 | Noida | B.Tech |


### Selecting a column of dataframe

In [93]:
first = df["Age"]  # select from the dataframe, not from the source dictionary
first

0    27
1    24
2    22
3    32
4    26
Name: Age, dtype: int64

### Selecting multiple columns of dataframe

In [94]:
df[['Name', 'Qualification']]

| | Name | Qualification |
|---|---|---|
| 0 | Name_1 | BCA |
| 1 | Name_2 | MCA |
| 2 | Name_3 | BSC |
| 3 | Name_4 | MSC |
| 4 | Name_5 | B.Tech |
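
The number of brackets decides what comes back. The sketch below, on a shortened version of the same data, shows that a single column name returns a Series while a list of names returns a DataFrame, even when the list holds only one name:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Name_1', 'Name_2', 'Name_3'],
                   'Age': [27, 24, 22],
                   'Qualification': ['BCA', 'MCA', 'BSC']})

as_series = df['Age']                     # one name  -> Series
as_frame = df[['Age']]                    # list of one name -> DataFrame
two_columns = df[['Name', 'Qualification']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(two_columns.shape)
```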


### Read data from CSV

In [95]:
data = pd.read_csv("dataset/loan.csv", index_col ="Loan_ID")
data

| Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| LP002978 | Female | No | 0 | Graduate | No | 2900 | 0.0 | 71.0 | 360.0 | 1.0 | Rural | Y |
| LP002979 | Male | Yes | 3+ | Graduate | No | 4106 | 0.0 | 40.0 | 180.0 | 1.0 | Rural | Y |
| LP002983 | Male | Yes | 1 | Graduate | No | 8072 | 240.0 | 253.0 | 360.0 | 1.0 | Urban | Y |
| LP002984 | Male | Yes | 2 | Graduate | No | 7583 | 0.0 | 187.0 | 360.0 | 1.0 | Urban | Y |
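
If the CSV file is not at hand, the same call can be tried on an in-memory string. This sketch assumes a three-row excerpt with only a few of the columns shown above, and uses `usecols` (a standard `read_csv` parameter) to load just the columns of interest:

```python
import io
import pandas as pd

# A small excerpt of the loan data; the first row has no LoanAmount.
csv_text = """Loan_ID,Gender,ApplicantIncome,LoanAmount
LP001002,Male,5849,
LP001003,Male,4583,128.0
LP001005,Male,3000,66.0
"""

loans = pd.read_csv(io.StringIO(csv_text),
                    index_col="Loan_ID",
                    usecols=["Loan_ID", "ApplicantIncome", "LoanAmount"])

print(loans)   # the empty LoanAmount field is read as NaN
```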


### Selecting a row of dataframe using .loc[]

In [96]:
first = data.loc["LP001002"]
first

Gender                   Male
Married                    No
Dependents                  0
Education            Graduate
Self_Employed              No
ApplicantIncome          5849
CoapplicantIncome           0
LoanAmount                NaN
Loan_Amount_Term          360
Credit_History              1
Property_Area           Urban
Loan_Status                 Y
Name: LP001002, dtype: object

In [97]:
second = data.loc["LP001003"]
second

Gender                   Male
Married                   Yes
Dependents                  1
Education            Graduate
Self_Employed              No
ApplicantIncome          4583
CoapplicantIncome        1508
LoanAmount                128
Loan_Amount_Term          360
Credit_History              1
Property_Area           Rural
Loan_Status                 N
Name: LP001003, dtype: object

### Selecting a row of dataframe using .iloc[]

In [98]:
row4 = data.iloc[3] 
row4

Gender                       Male
Married                       Yes
Dependents                      0
Education            Not Graduate
Self_Employed                  No
ApplicantIncome              2583
CoapplicantIncome            2358
LoanAmount                    120
Loan_Amount_Term              360
Credit_History                  1
Property_Area               Urban
Loan_Status                     Y
Name: LP001006, dtype: object
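
Both indexers also accept a second argument that picks columns, so a single cell or a block of cells can be selected in one step. A minimal sketch on a three-row excerpt of the loan data (the variable names are illustrative):

```python
import pandas as pd

loans = pd.DataFrame(
    {"Gender": ["Male", "Male", "Male"],
     "ApplicantIncome": [5849, 4583, 3000]},
    index=pd.Index(["LP001002", "LP001003", "LP001005"], name="Loan_ID"),
)

# One cell, addressed by labels and by positions.
income_by_label = loans.loc["LP001003", "ApplicantIncome"]
income_by_position = loans.iloc[1, 1]

# Several rows and columns: pass lists of labels.
subset = loans.loc[["LP001002", "LP001005"], ["ApplicantIncome"]]

print(income_by_label, income_by_position)
print(subset)
```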

### Working with missing data of dataframe

In [99]:
# Named `data` rather than `dict` to avoid shadowing the built-in dict type.
data = {'First_Score': [100, 90, np.nan, 95],
        'Second_Score': [30, 45, 56, np.nan],
        'Third_Score': [np.nan, 40, 80, 98]}

df = pd.DataFrame(data)
df

| | First_Score | Second_Score | Third_Score |
|---|---|---|---|
| 0 | 100.0 | 30.0 | NaN |
| 1 | 90.0 | 45.0 | 40.0 |
| 2 | NaN | 56.0 | 80.0 |
| 3 | 95.0 | NaN | 98.0 |


In [100]:
df.isnull()

| | First_Score | Second_Score | Third_Score |
|---|---|---|---|
| 0 | False | False | True |
| 1 | False | False | False |
| 2 | True | False | False |
| 3 | False | True | False |
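
On a larger dataframe the boolean table is hard to read, so it is usually summarised. The sketch below, on the same scores data, counts missing values per column and picks out the rows that contain at least one (the result names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First_Score': [100, 90, np.nan, 95],
                   'Second_Score': [30, 45, 56, np.nan],
                   'Third_Score': [np.nan, 40, 80, 98]})

missing_per_column = df.isnull().sum()            # True counts as 1
total_missing = int(missing_per_column.sum())
rows_with_missing = df[df.isnull().any(axis=1)]   # rows with any NaN

print(missing_per_column)
print(total_missing)
print(rows_with_missing)
```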


### Filling missing values using fillna(), replace() and interpolate() 

In [101]:
df.fillna(0)

| | First_Score | Second_Score | Third_Score |
|---|---|---|---|
| 0 | 100.0 | 30.0 | 0.0 |
| 1 | 90.0 | 45.0 | 40.0 |
| 2 | 0.0 | 56.0 | 80.0 |
| 3 | 95.0 | 0.0 | 98.0 |
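
The heading also names replace() and interpolate(). As a brief sketch on the same data (the sentinel value -1 is an arbitrary choice for illustration), replace() swaps NaN for a given value, while interpolate() estimates a missing value from its neighbours in the same column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First_Score': [100, 90, np.nan, 95],
                   'Second_Score': [30, 45, 56, np.nan],
                   'Third_Score': [np.nan, 40, 80, 98]})

# replace(): substitute every NaN with a chosen value.
replaced = df.replace(np.nan, -1)

# interpolate(): linear by default, applied column by column.
# The gap in First_Score between 90 and 95 is filled with their midpoint.
interpolated = df.interpolate()

print(replaced)
print(interpolated)
```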


### Dropping missing values using dropna()

In [102]:
# Named `data` rather than `dict` to avoid shadowing the built-in dict type.
data = {'First_Score': [100, 90, np.nan, 95],
        'Second_Score': [30, 45, 56, np.nan],
        'Third_Score': [np.nan, 40, 80, 98]}

df = pd.DataFrame(data)
df

| | First_Score | Second_Score | Third_Score |
|---|---|---|---|
| 0 | 100.0 | 30.0 | NaN |
| 1 | 90.0 | 45.0 | 40.0 |
| 2 | NaN | 56.0 | 80.0 |
| 3 | 95.0 | NaN | 98.0 |


In [103]:
df.dropna()

| | First_Score | Second_Score | Third_Score |
|---|---|---|---|
| 1 | 90.0 | 45.0 | 40.0 |
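
By default dropna() removes every row that has at least one missing value, which here leaves a single row. The sketch below shows a few standard parameters that make the drop less aggressive (the result names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First_Score': [100, 90, np.nan, 95],
                   'Second_Score': [30, 45, 56, np.nan],
                   'Third_Score': [np.nan, 40, 80, 98]})

only_if_all_missing = df.dropna(how='all')          # drop rows that are entirely NaN
by_one_column = df.dropna(subset=['First_Score'])   # only First_Score must be present
at_least_three = df.dropna(thresh=3)                # keep rows with >= 3 non-NaN values
drop_columns = df.dropna(axis=1)                    # drop columns instead of rows

print(by_one_column)
print(drop_columns.shape)
```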


### Iterating over rows

* To iterate over a dataframe, we can use three functions: **iteritems(), iterrows(), itertuples()**

    * **iteritems():** iterates column-wise, yielding each column name with its Series (called **items()** in current pandas; iteritems() was removed in pandas 2.0).
    * **iterrows():** iterates row-wise, yielding each index label with the row as a Series.
    * **itertuples():** iterates row-wise, yielding each row as a named tuple.

In [104]:
# Named `data` rather than `dict` to avoid shadowing the built-in dict type.
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
        'Height': [5.1, 6.2, 5.1, 5.2, 5.7],
        'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}

df = pd.DataFrame(data)
df

| | Name | Height | Qualification |
|---|---|---|---|
| 0 | Name_1 | 5.1 | BCA |
| 1 | Name_2 | 6.2 | MCA |
| 2 | Name_3 | 5.1 | BSC |
| 3 | Name_4 | 5.2 | MSC |
| 4 | Name_5 | 5.7 | B.Tech |


### iteritems() example

In [105]:
for key, value in df.items():  # items() is the current name; iteritems() was removed in pandas 2.0
    print(key, value)
    print()

Name 0    Name_1
1    Name_2
2    Name_3
3    Name_4
4    Name_5
Name: Name, dtype: object

Height 0    5.1
1    6.2
2    5.1
3    5.2
4    5.7
Name: Height, dtype: float64

Qualification 0       BCA
1       MCA
2       BSC
3       MSC
4    B.Tech
Name: Qualification, dtype: object



### iterrows() example

In [106]:
for index, row in df.iterrows():  # index label, then the row as a Series
    print(index, row)
    print()

0 Name             Name_1
Height              5.1
Qualification       BCA
Name: 0, dtype: object

1 Name             Name_2
Height              6.2
Qualification       MCA
Name: 1, dtype: object

2 Name             Name_3
Height              5.1
Qualification       BSC
Name: 2, dtype: object

3 Name             Name_4
Height              5.2
Qualification       MSC
Name: 3, dtype: object

4 Name             Name_5
Height              5.7
Qualification    B.Tech
Name: 4, dtype: object



### itertuples() example

In [107]:
for row in df.itertuples():  # each row arrives as a named tuple
    print(row)
    print()

Pandas(Index=0, Name='Name_1', Height=5.1, Qualification='BCA')

Pandas(Index=1, Name='Name_2', Height=6.2, Qualification='MCA')

Pandas(Index=2, Name='Name_3', Height=5.1, Qualification='BSC')

Pandas(Index=3, Name='Name_4', Height=5.2, Qualification='MSC')

Pandas(Index=4, Name='Name_5', Height=5.7, Qualification='B.Tech')
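
The fields of each named tuple can be read by attribute, which makes itertuples() convenient for simple row-wise logic. As a sketch, assuming an arbitrary height threshold of 5.5 chosen for illustration, the loop below collects the matching names and is compared with the equivalent boolean-mask selection, which avoids the explicit loop:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
                       'Height': [5.1, 6.2, 5.1, 5.2, 5.7],
                       'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']})

# Row-wise loop: index=False leaves the index out of each tuple.
tall_loop = [row.Name for row in people.itertuples(index=False)
             if row.Height > 5.5]

# Same result with a boolean mask over the whole column.
tall_mask = people.loc[people['Height'] > 5.5, 'Name'].tolist()

print(tall_loop)
print(tall_mask)
```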



### Iterating over dataframe columns

In [108]:
# Named `data` rather than `dict` to avoid shadowing the built-in dict type.
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
        'Height': [5.1, 6.2, 5.1, 5.2, 5.7],
        'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']}

df = pd.DataFrame(data)
df

| | Name | Height | Qualification |
|---|---|---|---|
| 0 | Name_1 | 5.1 | BCA |
| 1 | Name_2 | 6.2 | MCA |
| 2 | Name_3 | 5.1 | BSC |
| 3 | Name_4 | 5.2 | MSC |
| 4 | Name_5 | 5.7 | B.Tech |


In [109]:
df.columns.values.tolist()

['Name', 'Height', 'Qualification']

In [110]:
columns = df.columns.values.tolist()

for column in columns:
    print(df[column])

0    Name_1
1    Name_2
2    Name_3
3    Name_4
4    Name_5
Name: Name, dtype: object
0    5.1
1    6.2
2    5.1
3    5.2
4    5.7
Name: Height, dtype: float64
0       BCA
1       MCA
2       BSC
3       MSC
4    B.Tech
Name: Qualification, dtype: object


In [111]:
for column in columns:
    print(df.loc[2, column])  # the value at row label 2 in each column

Name_3
5.1
BSC
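
Building the list of names first is not strictly required. As a sketch of an alternative, the dataframe's `columns` attribute can be looped over directly, and `items()` yields each column name together with its Series:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'Name_5'],
                   'Height': [5.1, 6.2, 5.1, 5.2, 5.7],
                   'Qualification': ['BCA', 'MCA', 'BSC', 'MSC', 'B.Tech']})

# Looping over df.columns gives the column names.
names = [name for name in df.columns]

# items() gives (name, Series) pairs, so no second lookup is needed.
third_values = []
for name, column in df.items():
    third_values.append(column.iloc[2])   # third element of each column

print(names)
print(third_values)
```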


## Quick Recap

### 1. Series in Pandas
* 1.1 **pd.Series():** creates an empty series.
* 1.2 **pd.Series(np.array):** creates a series from a NumPy array.
* 1.3 **pd.Series(list):** creates a series from a list.
* 1.4 **series[:]:** accesses elements of a series by slicing.
* 1.5 **series[index_val]:** accesses an element of a series by its index label.
* 1.6 **series.loc[]:** accesses elements by index label; a slice includes its end label.
* 1.7 **series.iloc[]:** accesses elements by integer position; a slice excludes its end position.
* 1.8 **series_1.add(series_2):** adds two series element-wise, aligned on their index labels.
* 1.9 **series_1.sub(series_2):** subtracts one series from another element-wise, aligned on their index labels.


### 2. DataFrame in Pandas
* 2.1 **pd.DataFrame():** creates an empty dataframe.
* 2.2 **pd.DataFrame(list):** creates a dataframe from a list.
* 2.3 **pd.DataFrame(dictionary):** creates a dataframe from a dictionary.
* 2.4 **dataframe['column_name']:** selects a single column from a dataframe.
* 2.5 **dataframe[['column_name_1', 'column_name_2']]:** selects multiple columns from a dataframe.
* 2.6 **pd.read_csv():** reads data from a CSV file.
* 2.7 **dataframe.loc['row_index_label']:** selects a single row by its index label.
* 2.8 **dataframe.iloc[row_position]:** selects a single row by its integer position.
* 2.9 **dataframe.isnull():** checks for NULL and NaN values in a dataframe.
* 2.10 **dataframe.fillna():** fills NULL and NaN values in a dataframe.
* 2.11 **dataframe.dropna():** drops rows (or columns) containing NULL and NaN values.
* 2.12 To iterate over a dataframe, we can use three functions: **iteritems(), iterrows(), itertuples()**
    * **iteritems():** iterates column-wise, yielding each column name with its Series (called **items()** in current pandas; iteritems() was removed in pandas 2.0).
    * **iterrows():** iterates row-wise, yielding each index label with the row as a Series.
    * **itertuples():** iterates row-wise, yielding each row as a named tuple.
* 2.13 To iterate over columns, get the column names with **dataframe.columns.values.tolist()** and walk through them in a for loop.