## Pandas
---

### This notebook covers the most commonly used functions of _pandas_ (version 1.1.1)

* Install pandas with pip using the following command

```!pip install pandas```

* Load the library as ```pd``` after installation.

In [1]:
# Loading library
import pandas as pd

# Display the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### 1. ```pd.DataFrame()```
* Make a DataFrame from a list of lists (each inner list becomes one row)

In [2]:
# Declaring lists
names = ["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"]
age = [23,12,45,87,4,52]
height = [180, 120, 167, 170, 94, 160]
weight = [100, 80, 75, 60, 10, 55]

# Making a DataFrame: each list becomes one row
df = pd.DataFrame([names, age, height, weight])
df

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | Harry | John | Sean | Paul | Stacey | Hannah |
| 1 | 23 | 12 | 45 | 87 | 4 | 52 |
| 2 | 180 | 120 | 167 | 170 | 94 | 160 |
| 3 | 100 | 80 | 75 | 60 | 10 | 55 |
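
The list of lists above is read row by row, which is why the names end up in row 0. As a minimal sketch reusing the same four lists, passing a dictionary instead maps each key to a column, so no transpose is needed and the numeric columns keep an integer dtype. The names ```df_rows``` and ```df_cols``` are only for illustration.

```python
import pandas as pd

# Same lists as in the cell above
names = ["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"]
age = [23, 12, 45, 87, 4, 52]
height = [180, 120, 167, 170, 94, 160]
weight = [100, 80, 75, 60, 10, 55]

# List of lists: each inner list becomes one row -> 4 rows x 6 columns
df_rows = pd.DataFrame([names, age, height, weight])
print(df_rows.shape)

# Dict of lists: each key becomes a column label -> 6 rows x 4 columns
df_cols = pd.DataFrame({"Name": names, "Age": age, "Height": height, "Weight": weight})
print(df_cols.shape)
print(df_cols.dtypes)
```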


### 2. Transposing DataFrames
* A DataFrame can be transposed (rows swapped with columns) using its ```T``` property, a shorthand for the ```transpose()``` method
* The syntax is:
```df.T```

In [3]:
# Transposing the previous dataFrame
df = df.T
df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Harry | 23 | 180 | 100 |
| 1 | John | 12 | 120 | 80 |
| 2 | Sean | 45 | 167 | 75 |
| 3 | Paul | 87 | 170 | 60 |
| 4 | Stacey | 4 | 94 | 10 |
| 5 | Hannah | 52 | 160 | 55 |
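
One side effect of building the frame row-wise and then transposing is worth checking: every original column mixed a string with numbers, so all columns are stored with ```object``` dtype, and they stay ```object``` after transposing. This is one reason the numeric sections below work on ```df2``` (read back from CSV) instead. The sketch below, on the same data, uses ```infer_objects()``` to re-infer an integer dtype for the numeric columns; ```df_t``` is a hypothetical name.

```python
import pandas as pd

names = ["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"]
age = [23, 12, 45, 87, 4, 52]
height = [180, 120, 167, 170, 94, 160]
weight = [100, 80, 75, 60, 10, 55]
df = pd.DataFrame([names, age, height, weight])

# .T and .transpose() give the same result
df_t = df.T
print(df_t.equals(df.transpose()))

# Columns 1-3 hold only numbers but still carry the object dtype
print(df_t.dtypes)

# infer_objects() converts object columns to a more specific dtype where possible
df_t = df_t.infer_objects()
print(df_t.dtypes)
```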


### 3. Renaming the columns
* There exist many methods to rename the columns of the dataFrame
* ```df.rename()``` is one of them: it accepts a dictionary that maps old labels to new labels
* The old names are the keys and the new names are the values of the dictionary
* ```axis = 1``` applies the mapping to the column labels (the default, ```axis = 0```, targets the row index)

In [4]:
# Renaming the columns
df = df.rename({0 : 'Name', 1 : 'Age', 2 : 'Height', 3 : 'Weight'}, axis=1)
df

|   | Name | Age | Height | Weight |
|---|---|---|---|---|
| 0 | Harry | 23 | 180 | 100 |
| 1 | John | 12 | 120 | 80 |
| 2 | Sean | 45 | 167 | 75 |
| 3 | Paul | 87 | 170 | 60 |
| 4 | Stacey | 4 | 94 | 10 |
| 5 | Hannah | 52 | 160 | 55 |
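
A minimal sketch of two equivalent spellings, on a small made-up frame: ```rename(columns=...)``` does the same as passing ```axis = 1```, and assigning a list to ```df.columns``` replaces every label at once. By default, ```rename()``` silently ignores keys that are not existing labels.

```python
import pandas as pd

# Small frame with default integer column labels 0 and 1
df = pd.DataFrame([["Harry", 23], ["John", 12]])

# rename() with axis=1: dictionary keys are old labels, values are new labels
a = df.rename({0: "Name", 1: "Age"}, axis=1)

# Equivalent spelling with the columns= keyword (no axis needed)
b = df.rename(columns={0: "Name", 1: "Age"})

# Alternative: replace all labels at once; the list length must match
c = df.copy()
c.columns = ["Name", "Age"]

# Keys that are not existing labels are ignored by default (errors="ignore")
d = df.rename(columns={0: "Name", 99: "Unused"})
print(list(d.columns))
```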


### 4. Writing and Reading dataFrames
* ```pd.read_csv()``` reads a CSV file into a DataFrame. The optional argument ```sep``` sets the delimiter (default ```","```)
* ```to_csv()``` is a DataFrame method that writes the DataFrame to a file
* To write without the row index, pass the optional argument ```index = False```

In [5]:
# Writing without the row index
df.to_csv("df_without_index.csv", index = False)

# Writing with the row index (the default)
df.to_csv("df_with_index.csv")

# Reading the file as it is
df1 = pd.read_csv("df_without_index.csv")
df1

# Reading the file with its first column (Name) used as the row index
df2 = pd.read_csv("df_without_index.csv", index_col=0)
df2

|   | Name | Age | Height | Weight |
|---|---|---|---|---|
| 0 | Harry | 23 | 180 | 100 |
| 1 | John | 12 | 120 | 80 |
| 2 | Sean | 45 | 167 | 75 |
| 3 | Paul | 87 | 170 | 60 |
| 4 | Stacey | 4 | 94 | 10 |
| 5 | Hannah | 52 | 160 | 55 |


| Name | Age | Height | Weight |
|---|---|---|---|
| Harry | 23 | 180 | 100 |
| John | 12 | 120 | 80 |
| Sean | 45 | 167 | 75 |
| Paul | 87 | 170 | 60 |
| Stacey | 4 | 94 | 10 |
| Hannah | 52 | 160 | 55 |
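
To see what the ```index``` and ```sep``` arguments do without creating files, the sketch below writes a small made-up frame to a string and reads it back through ```io.StringIO```. It relies on ```to_csv()``` returning the CSV text when no path is given.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Harry", "John"], "Age": [23, 12]})

# With no path argument, to_csv() returns the CSV text as a string.
# sep=";" switches the delimiter; the row index is written as an unnamed first column.
text = df.to_csv(sep=";")
print(text)

# Reading it back without index_col keeps the old index as a column named "Unnamed: 0"
plain = pd.read_csv(io.StringIO(text), sep=";")
print(plain.columns.tolist())

# index_col=0 turns that first column back into the row index
restored = pd.read_csv(io.StringIO(text), sep=";", index_col=0)
print(restored.equals(df))
```

The ```"Unnamed: 0"``` column is the usual sign that a file was written with its index but read without ```index_col```.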


### 5. Subsetting by column names
* To select a single column, use the indexing operator ```[]``` with the column name as a string
* Selecting a single column returns a ```pandas.Series```; to convert it back to a DataFrame, use ```pd.DataFrame()```
* To select multiple columns, pass a list of column names

In [6]:
# Subsetting single column
dfSingle = df['Name']
dfSingle = pd.DataFrame(dfSingle)
dfSingle

# Subsetting multiple columns
dfMultiple = df[['Name','Age']]
dfMultiple

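|   | Name |
|---|---|
| 0 | Harry |
| 1 | John |
| 2 | Sean |
| 3 | Paul |
| 4 | Stacey |
| 5 | Hannah |


|   | Name | Age |
|---|---|---|
| 0 | Harry | 23 |
| 1 | John | 12 |
| 2 | Sean | 45 |
| 3 | Paul | 87 |
| 4 | Stacey | 4 |
| 5 | Hannah | 52 |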
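
A short sketch of the difference between the two forms, on a made-up two-row frame: a single label returns a Series, while a one-element list returns a DataFrame directly, which avoids the extra ```pd.DataFrame()``` call. ```Series.to_frame()``` is another way back.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Harry", "John"], "Age": [23, 12]})

# A single label returns a Series
s = df["Name"]
print(type(s).__name__)  # Series

# A list with one label returns a one-column DataFrame directly
one_col = df[["Name"]]
print(type(one_col).__name__)  # DataFrame

# A Series can also be turned back into a DataFrame with to_frame()
print(s.to_frame().equals(one_col))  # True
```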


### 6. Boolean Subsetting
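* It is similar to ```NumPy```'s boolean indexing
* A comparison such as ```df2 > 50``` returns a DataFrame of booleans, which can then be used to subset the DataFrame
* Unlike NumPy, the result keeps the original shape: cells where the condition is ```False``` become NaN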

In [7]:
# Using df2 as all of its columns are integers
df2_bool = df2 > 50
df2_bool

# Subsetting df2 with the boolean DataFrame
df2_subset = df2[df2_bool]
df2_subset

| Name | Age | Height | Weight |
|---|---|---|---|
| Harry | False | True | True |
| John | False | True | True |
| Sean | False | True | True |
| Paul | True | True | True |
| Stacey | False | True | False |
| Hannah | True | True | True |


| Name | Age | Height | Weight |
|---|---|---|---|
| Harry | NaN | 180 | 100.0 |
| John | NaN | 120 | 80.0 |
| Sean | NaN | 167 | 75.0 |
| Paul | 87.0 | 170 | 60.0 |
| Stacey | NaN | 94 | NaN |
| Hannah | 52.0 | 160 | 55.0 |
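
Indexing with a boolean DataFrame behaves like ```DataFrame.where()```, masking cells rather than dropping rows. To keep or drop whole rows, index with a boolean Series built from one column instead. The sketch below rebuilds ```df2``` by hand from the values shown above.

```python
import pandas as pd

# Same numeric data as df2, with Name as the row index
df2 = pd.DataFrame(
    {
        "Age": [23, 12, 45, 87, 4, 52],
        "Height": [180, 120, 167, 170, 94, 160],
        "Weight": [100, 80, 75, 60, 10, 55],
    },
    index=pd.Index(["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"], name="Name"),
)

mask = df2 > 50

# Indexing with a boolean DataFrame masks cells (False -> NaN), exactly like where()
print(df2[mask].equals(df2.where(mask)))  # True

# Indexing with a boolean Series keeps or drops whole rows
older = df2[df2["Age"] > 50]
print(older.index.tolist())  # ['Paul', 'Hannah']

# Combine conditions with & (and) / | (or); each condition needs parentheses
selected = df2[(df2["Age"] > 20) & (df2["Weight"] < 80)]
print(selected.index.tolist())  # ['Sean', 'Paul', 'Hannah']
```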


### 7. NaN Removal
* NaN (Not a Number) marks a missing value in a DataFrame cell
* Rows or columns containing NaNs can be dropped with the ```dropna()``` method
* To drop every row that contains at least one NaN, use ```axis = 0``` (the default)
* To drop every column that contains at least one NaN, use ```axis = 1```

In [8]:
# Removing rows that contain NaNs
df_row = df2_subset.dropna(axis = 0)
df_row

# Removing columns that contain NaNs
df_col = df2_subset.dropna(axis = 1)
df_col

| Name | Age | Height | Weight |
|---|---|---|---|
| Paul | 87.0 | 170 | 60.0 |
| Hannah | 52.0 | 160 | 55.0 |


| Name | Height |
|---|---|
| Harry | 180 |
| John | 120 |
| Sean | 167 |
| Paul | 170 |
| Stacey | 94 |
| Hannah | 160 |
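
Dropping every row with any NaN leaves only two rows here. As a sketch of finer control, ```dropna()``` also accepts ```how```, ```subset``` and ```thresh```; the frame below rebuilds ```df2_subset``` by hand from the table above.

```python
import numpy as np
import pandas as pd

# Same values as df2_subset above (NaN where the value was <= 50)
df2_subset = pd.DataFrame(
    {
        "Age": [np.nan, np.nan, np.nan, 87, np.nan, 52],
        "Height": [180, 120, 167, 170, 94, 160],
        "Weight": [100, 80, 75, 60, np.nan, 55],
    },
    index=pd.Index(["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"], name="Name"),
)

# Count the NaNs in each column before deciding what to drop
print(df2_subset.isna().sum())

# how="all": drop a row only if every value in it is NaN (none here)
print(df2_subset.dropna(how="all").shape)

# subset=: only look for NaNs in the listed columns
print(df2_subset.dropna(subset=["Weight"]).index.tolist())

# thresh=: keep rows that have at least this many non-NaN values
print(df2_subset.dropna(thresh=2).index.tolist())
```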


### 8. DataFrame Operations (Visual)
* To view the first ```n``` rows, use the ```head(n)``` method
* To view the column names, use the ```columns``` attribute

In [9]:
# Viewing first 2 rows 
df.head(2)

# Viewing column names
df.columns

|   | Name | Age | Height | Weight |
|---|---|---|---|---|
| 0 | Harry | 23 | 180 | 100 |
| 1 | John | 12 | 120 | 80 |


Index(['Name', 'Age', 'Height', 'Weight'], dtype='object')
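
A few related attributes and methods for a quick look at a DataFrame, sketched on a made-up two-column frame: ```tail(n)``` mirrors ```head(n)``` from the end, ```shape``` gives the dimensions, and ```dtypes``` lists each column's type.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"],
    "Age": [23, 12, 45, 87, 4, 52],
})

# tail(n) shows the last n rows (n defaults to 5, as for head)
print(df.tail(2))

# shape is a (rows, columns) tuple; index holds the row labels
print(df.shape)  # (6, 2)
print(df.index)

# dtypes lists the data type of every column
print(df.dtypes)
```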

### 9. DataFrame Operations (Numerical)
* To generate descriptive statistics, use the ```describe()``` method
* To find the maximum value of each column, use the ```max()``` method
* To find the minimum value of each column, use the ```min()``` method
* To find the column-wise mean, use the ```mean()``` method
* To find the column-wise standard deviation, use the ```std()``` method

In [10]:
# Using df2 as it holds numerical values

# Descriptive statistics
df2.describe()

# Column-wise maximum values
df2.max()

# Column-wise minimum values
df2.min()

# Column-wise mean
df2.mean()

# Column-wise standard deviation
df2.std()

|   | Age | Height | Weight |
|---|---|---|---|
| count | 6.0 | 6.0 | 6.0 |
| mean | 37.166667 | 148.5 | 63.333333 |
| std | 30.655614 | 33.797929 | 30.60501 |
| min | 4.0 | 94.0 | 10.0 |
| 25% | 14.75 | 130.0 | 56.25 |
| 50% | 34.0 | 163.5 | 67.5 |
| 75% | 50.25 | 169.25 | 78.75 |
| max | 87.0 | 180.0 | 100.0 |


Age        87
Height    180
Weight    100
dtype: int64

Age        4
Height    94
Weight    10
dtype: int64

Age        37.166667
Height    148.500000
Weight     63.333333
dtype: float64

Age       30.655614
Height    33.797929
Weight    30.605010
dtype: float64
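
Two details that are easy to miss, sketched on ```df2``` rebuilt by hand: pandas ```std()``` computes the sample standard deviation (dividing by n - 1, ```ddof=1```), while NumPy's ```np.std()``` defaults to the population version (```ddof=0```), so the two disagree on the same data. Passing ```axis=1``` to an aggregation works across columns, giving one value per row.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "Age": [23, 12, 45, 87, 4, 52],
        "Height": [180, 120, 167, 170, 94, 160],
        "Weight": [100, 80, 75, 60, 10, 55],
    },
    index=pd.Index(["Harry", "John", "Sean", "Paul", "Stacey", "Hannah"], name="Name"),
)

# pandas std() is the sample standard deviation (divides by n - 1, i.e. ddof=1)
sample_std = df2["Age"].std()

# NumPy's np.std() defaults to the population version (divides by n, ddof=0)
population_std = np.std(df2["Age"].to_numpy())
print(sample_std, population_std)

# axis=1 aggregates across the columns, giving one value per row
row_means = df2.mean(axis=1)
print(row_means)
```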

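### 10. Typecasting to NumPy array
* Use ```to_numpy()``` to convert a DataFrame to a ```numpy``` array. When the columns have different types, as in ```df``` (strings and numbers), the array falls back to ```dtype=object```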

In [11]:
# Typecasting to array
dfArray = df.to_numpy()
dfArray

array([['Harry', 23, 180, 100],
       ['John', 12, 120, 80],
       ['Sean', 45, 167, 75],
       ['Paul', 87, 170, 60],
       ['Stacey', 4, 94, 10],
       ['Hannah', 52, 160, 55]], dtype=object)
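
To get a numeric array instead of ```dtype=object```, one option is to select only the numeric columns before converting, optionally requesting a dtype explicitly. A minimal sketch on a made-up three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Harry", "John", "Sean"],
    "Age": [23, 12, 45],
    "Height": [180, 120, 167],
})

# Mixed column types: the array falls back to dtype=object
mixed = df.to_numpy()
print(mixed.dtype)  # object

# Only numeric columns: the array gets a numeric dtype
numeric = df[["Age", "Height"]].to_numpy()
print(numeric.dtype)

# The dtype argument requests a specific dtype for the result
as_float = df[["Age", "Height"]].to_numpy(dtype=float)
print(as_float.dtype)  # float64
```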

___