## What is Pandas?
  
Pandas is a software library written for the Python programming language
for data **manipulation and analysis**.

* Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas.
* Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
* The two primary components of pandas are the **Series** and the **DataFrame**.
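
As a minimal sketch of those two components (the sample names and ages are made up for illustration), a Series is a one-dimensional labeled array and a DataFrame is a table whose columns are each a Series:

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([27, 24, 22], name="Age")

# A DataFrame is a two-dimensional table; each column is a Series.
people = pd.DataFrame({"Name": ["Name_1", "Name_2", "Name_3"],
                       "Age": [27, 24, 22]})

# Selecting a single column from a DataFrame returns a Series.
age_column = people["Age"]
print(type(age_column).__name__)  # Series
print(people.shape)               # (3, 2)
```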

## Loan dataset

https://www.kaggle.com/animeshparikshya/loan-dataset

## Importing required libraries

In [1]:
import numpy as np
print('numpy version : ', np.__version__)

import pandas as pd
print('pandas version : ', pd.__version__)

import warnings
warnings.filterwarnings('ignore')

numpy version :  1.19.5
pandas version :  1.1.5


## 1. Read CSV using pandas.read_csv()

* 1.1 **pd.read_csv():** reads data from a CSV file into a DataFrame.
* 1.2 **header = [1, 2]:** uses the given row numbers (0-indexed) as the column header. A list produces a multi-level header, and rows before the first listed row are skipped.
* 1.3 **index_col = 'column_name':** uses the given column as the index (row labels).
* 1.4 **usecols = ['column_name']:** reads only the specified columns from the CSV.
* 1.5 **skiprows = [1, 2]:** skips the given line numbers (0-indexed, where line 0 is the header row) while reading the CSV.

### Read data from CSV

In [2]:
data = pd.read_csv("dataset/loan.csv")
data

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
609,LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
610,LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
611,LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
612,LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y


### Use the first two data rows as the header.

In [3]:
data = pd.read_csv("dataset/loan.csv", header =[1, 2])
data

Unnamed: 0_level_0,LP001002,Male,No,0,Graduate,No,5849,0,Unnamed: 8_level_0,360,1,Urban,Y
Unnamed: 0_level_1,LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
0,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
1,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
2,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
3,LP001011,Male,Yes,2,Graduate,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y
4,LP001013,Male,Yes,0,Not Graduate,No,2333,1516.0,95.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
607,LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
608,LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
609,LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
610,LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y
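
A small sketch of what `header=[1, 2]` does, assuming a three-column in-memory CSV built with `io.StringIO` (the `csv_text` sample is a cut-down stand-in for `loan.csv`, not the real file). Rows 1 and 2 become a two-level header, the original header on row 0 is dropped, and two fewer data rows remain:

```python
import io
import pandas as pd

# Hypothetical cut-down sample; line 0 is the original header.
csv_text = """Loan_ID,Gender,ApplicantIncome
LP001002,Male,5849
LP001003,Male,4583
LP001005,Male,3000
LP001006,Male,2583
"""

# Rows 1 and 2 are stacked into a two-level (MultiIndex) column header.
multi = pd.read_csv(io.StringIO(csv_text), header=[1, 2])
print(multi.columns.nlevels)  # 2
print(len(multi))             # 2

# The default, header=0, keeps the original header and all four data rows.
plain = pd.read_csv(io.StringIO(csv_text), header=0)
print(list(plain.columns))    # ['Loan_ID', 'Gender', 'ApplicantIncome']
print(len(plain))             # 4
```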


### Use a column as the index.

In [4]:
data = pd.read_csv("dataset/loan.csv", index_col = 'Loan_ID')
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y
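
One reason to set `index_col` is label-based lookup. The sketch below assumes a cut-down in-memory sample (`csv_text` is illustrative, not the real file) and uses `.loc` to fetch a value by its `Loan_ID`:

```python
import io
import pandas as pd

# Hypothetical cut-down sample of the loan data.
csv_text = """Loan_ID,Gender,ApplicantIncome
LP001002,Male,5849
LP001003,Male,4583
LP001005,Male,3000
"""

# Loan_ID becomes the row labels instead of a regular column.
loans = pd.read_csv(io.StringIO(csv_text), index_col="Loan_ID")
print(loans.index.name)     # Loan_ID
print(list(loans.columns))  # ['Gender', 'ApplicantIncome']

# Rows can now be looked up by loan ID.
income = loans.loc["LP001003", "ApplicantIncome"]
print(income)               # 4583
```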


### Use only selected columns from the CSV file.

In [5]:
data = pd.read_csv("dataset/loan.csv", usecols =["Education"])
data

Unnamed: 0,Education
0,Graduate
1,Graduate
2,Graduate
3,Not Graduate
4,Graduate
...,...
609,Graduate
610,Graduate
611,Graduate
612,Graduate


In [6]:
data = pd.read_csv("dataset/loan.csv", usecols =["Gender", "Education"])
data

Unnamed: 0,Gender,Education
0,Male,Graduate
1,Male,Graduate
2,Male,Graduate
3,Male,Not Graduate
4,Male,Graduate
...,...,...
609,Female,Graduate
610,Male,Graduate
611,Male,Graduate
612,Male,Graduate
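
One detail worth knowing: the order of names in `usecols` does not affect the column order of the result, which follows the file. A minimal sketch, assuming an illustrative in-memory sample (`csv_text` is not the real file):

```python
import io
import pandas as pd

# Hypothetical cut-down sample of the loan data.
csv_text = """Loan_ID,Gender,Education
LP001002,Male,Graduate
LP001006,Male,Not Graduate
"""

# usecols selects columns; the result keeps the file's column order.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Education", "Gender"])
print(list(subset.columns))     # ['Gender', 'Education']

# To get a specific order, reindex the columns after reading.
reordered = subset[["Education", "Gender"]]
print(list(reordered.columns))  # ['Education', 'Gender']
```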


### Skip rows while reading data from CSV file.

In [7]:
data = pd.read_csv("dataset/loan.csv", skiprows = [1, 2, 3, 4])
data

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
1,LP001011,Male,Yes,2,Graduate,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y
2,LP001013,Male,Yes,0,Not Graduate,No,2333,1516.0,95.0,360.0,1.0,Urban,Y
3,LP001014,Male,Yes,3+,Graduate,No,3036,2504.0,158.0,360.0,0.0,Semiurban,N
4,LP001018,Male,Yes,2,Graduate,No,4006,1526.0,168.0,360.0,1.0,Urban,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
605,LP002978,Female,No,0,Graduate,No,2900,0.0,71.0,360.0,1.0,Rural,Y
606,LP002979,Male,Yes,3+,Graduate,No,4106,0.0,40.0,180.0,1.0,Rural,Y
607,LP002983,Male,Yes,1,Graduate,No,8072,240.0,253.0,360.0,1.0,Urban,Y
608,LP002984,Male,Yes,2,Graduate,No,7583,0.0,187.0,360.0,1.0,Urban,Y
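
`skiprows` behaves differently for a list and for an integer. The sketch below, which assumes an illustrative in-memory sample (`csv_text` is not the real file), contrasts the two:

```python
import io
import pandas as pd

# Hypothetical sample; line 0 is the header, lines 1-4 are data.
csv_text = """Loan_ID,Gender,ApplicantIncome
LP001002,Male,5849
LP001003,Male,4583
LP001005,Male,3000
LP001006,Male,2583
"""

# A list skips exactly those line numbers; the header on line 0 is kept.
by_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(list(by_list.columns))        # ['Loan_ID', 'Gender', 'ApplicantIncome']
print(by_list["Loan_ID"].tolist())  # ['LP001005', 'LP001006']

# An integer skips that many lines from the top, header included,
# so the next remaining line is parsed as the header.
by_count = pd.read_csv(io.StringIO(csv_text), skiprows=2)
print(list(by_count.columns))       # ['LP001003', 'Male', '4583']
```

With the integer form, the original column names are lost unless they are supplied again through `names`.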


## 2. Saving a pandas DataFrame as a CSV

* 2.1 **df.to_csv():** DataFrame method used to write the DataFrame into a CSV file.
* 2.2 **header = False:** param used to omit the column header row from the CSV file.
* 2.3 **index = False:** param used to omit the index column from the CSV file.


### Create a dataframe.

In [8]:
data = {'Name': ['Name_1', 'Name_2', 'Name_3', 'Name_4'],
        'Age': [27, 24, 22, 32],
        'Address': ['Nagpur', 'Delhi', 'Bangalore', 'Meerut'],
        'Qualification': ['MSC', 'M.A', 'MCA', 'PHD']}
   
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,Address,Qualification
0,Name_1,27,Nagpur,MSC
1,Name_2,24,Delhi,M.A
2,Name_3,22,Bangalore,MCA
3,Name_4,32,Meerut,PHD


### Save dataframe data into CSV file

In [9]:
df.to_csv('dataset/csv_file.csv')

In [10]:
data = pd.read_csv("dataset/csv_file.csv")
data

Unnamed: 0.1,Unnamed: 0,Name,Age,Address,Qualification
0,0,Name_1,27,Nagpur,MSC
1,1,Name_2,24,Delhi,M.A
2,2,Name_3,22,Bangalore,MCA
3,3,Name_4,32,Meerut,PHD


### Save dataframe data into CSV file without header and index

In [11]:
df.to_csv('dataset/csv_file.csv', header=False, index=False)

In [12]:
data = pd.read_csv("dataset/csv_file.csv")
data

Unnamed: 0,Name_1,27,Nagpur,MSC
0,Name_2,24,Delhi,M.A
1,Name_3,22,Bangalore,MCA
2,Name_4,32,Meerut,PHD
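
In the output above, the first record was consumed as the header because the file no longer has one. A sketch of reading such a file with `header=None` and explicit `names`, assuming a three-row frame and an in-memory round trip rather than a file on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Name_1", "Name_2", "Name_3"],
                   "Age": [27, 24, 22]})

# Write without header and index; with no path, the CSV text is returned.
csv_text = df.to_csv(header=False, index=False)

# Default read: the first record is mistaken for the header.
wrong = pd.read_csv(io.StringIO(csv_text))
print(list(wrong.columns), len(wrong))  # ['Name_1', '27'] 2

# header=None keeps every line as data; names supplies the column labels.
right = pd.read_csv(io.StringIO(csv_text), header=None, names=["Name", "Age"])
print(list(right.columns), len(right))  # ['Name', 'Age'] 3
```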


## Quick Recap

### 1. Read CSV using pandas.read_csv()

* 1.1 **pd.read_csv():** reads data from a CSV file into a DataFrame.
* 1.2 **header = [1, 2]:** uses the given row numbers (0-indexed) as the column header. A list produces a multi-level header, and rows before the first listed row are skipped.
* 1.3 **index_col = 'column_name':** uses the given column as the index (row labels).
* 1.4 **usecols = ['column_name']:** reads only the specified columns from the CSV.
* 1.5 **skiprows = [1, 2]:** skips the given line numbers (0-indexed, where line 0 is the header row) while reading the CSV.


### 2. Saving a pandas DataFrame as a CSV

* 2.1 **df.to_csv():** DataFrame method used to write the DataFrame into a CSV file.
* 2.2 **header = False:** param used to omit the column header row from the CSV file.
* 2.3 **index = False:** param used to omit the index column from the CSV file.
