## Data Manipulation using Pandas


#### Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures.

##### Note: 
1. First clean the environment (go to the "Kernel" menu --> "Restart & Clear Output").
2. To execute the code --> click on a cell and press Ctrl + Enter.


## Key Features of Pandas
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.


## Working with Pandas

## 1. Import pandas library

In [1]:
# Import the pandas library under its conventional alias, pd.

import pandas as pd


## 2 Let's start with Series

#### Series is a one-dimensional labeled array

### 2.1 Creating a Series from a list of the numbers 1 to 9

In [2]:
#import pandas as pd

a = [1, 3, 5, 7, 9, 2, 4, 6, 8]
a1 = pd.Series(a)

print(a1)


0    1
1    3
2    5
3    7
4    9
5    2
6    4
7    6
8    8
dtype: int64


### 2.2 Creating a Series with data along with its index

In [3]:
import pandas as pd

a1 = [1,3,5,7,9,2,4,6,8]
a2 = ['a','arun','b','c','d','e','f','g','h']
a3 = pd.Series(a1,a2)

# print(a3)
#a3[0]

print(a3)
print(a3['g'])
print(a3.iloc[-2])  # positional access; a3[-2] on a string index relies on a deprecated fallback




a       1
arun    3
b       5
c       7
d       9
e       2
f       4
g       6
h       8
dtype: int64
6
6
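
The two lookups above (by label and by position) can be made explicit with `.loc` and `.iloc`. A minimal sketch, reusing the same values and labels; the variable names are only illustrative:

```python
import pandas as pd

values = [1, 3, 5, 7, 9, 2, 4, 6, 8]
labels = ['a', 'arun', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
s = pd.Series(values, index=labels)

by_label = s.loc['g']          # look up by index label
by_position = s.iloc[-2]       # look up by integer position (second from the end)
label_slice = s.loc['b':'d']   # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Using `.loc` for labels and `.iloc` for positions avoids any ambiguity about what an integer inside the brackets means.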


### 2.3 Creating a series with the help of a dictionary

In [4]:
import pandas as pd

dict1 = {'Oranges':3, 'Apples':4, 'Mangoes':2, 'Banana':12}
dict2 = pd.Series(dict1)

print (dict2)
print (type(dict1))
print(dict2['Apples'])

Oranges     3
Apples      4
Mangoes     2
Banana     12
dtype: int64
<class 'dict'>
4


### 2.4 Creating a series with the help of Nested List

In [5]:
import pandas as pd

Array1 = [[1,3,5],[2,4,6]]
Array2 = pd.Series(Array1)

print (Array2)
type(Array2)


0    [1, 3, 5]
1    [2, 4, 6]
dtype: object


pandas.core.series.Series

## 2 DataFrames
#### A DataFrame is a two-dimensional data structure defined in pandas, with rows and columns.

### 2.1 Creating a data frame with dictionary

In [6]:
import pandas as pd

Data = {'Age':[23,33,12],'Name':['Rahul','John','Robert']}

Data1 = pd.DataFrame(Data)

print(Data1)


   Age    Name
0   23   Rahul
1   33    John
2   12  Robert


### 2.2 Creating a data frame with lists

In [7]:
# import pandas as pd

Data2 = [[4,1900],[3,1600],[2,1100],[1,850]]
Data3 = pd.DataFrame(Data2)#, columns = ['No_of_Bedrooms','Square_Feet'])
print(Data2)
print("")
print (Data3)


[[4, 1900], [3, 1600], [2, 1100], [1, 850]]

   0     1
0  4  1900
1  3  1600
2  2  1100
3  1   850
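
Without column names, pandas labels the columns 0 and 1. A short sketch of passing names through the `columns` argument, using the two names from the commented-out part of the cell above:

```python
import pandas as pd

rows = [[4, 1900], [3, 1600], [2, 1100], [1, 850]]

# Each inner list is one row; columns= supplies a label for each position.
houses = pd.DataFrame(rows, columns=['No_of_Bedrooms', 'Square_Feet'])

print(houses)
print(houses['Square_Feet'].tolist())
```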


### 2.3 Assigning indexes within a data frame

In [8]:
import pandas as pd

Data4 = {'Name':['Ankit','Rishitha','Karthik','Vishnu'],'Marks':[78,67,98,56]}
Data5 = pd.DataFrame(Data4,index = ['Rank 2','Rank 3','Rank 1','Rank 4'])

print (Data5)


            Name  Marks
Rank 2     Ankit     78
Rank 3  Rishitha     67
Rank 1   Karthik     98
Rank 4    Vishnu     56


### 2.4 Creating dataframes from list of dictionaries

In [9]:
import pandas as pd

Data6 = [{'A':65,'B':66,'C':67},{'A':97,'B':98,'D':99}]
Data7 = pd.DataFrame(Data6)
a = Data7.iloc[:,1:3]
print(a)
print (Data7)
print(type(Data6))
print(type(Data7))
print(Data6[1])

    B     C
0  66  67.0
1  98   NaN
    A   B     C     D
0  65  66  67.0   NaN
1  97  98   NaN  99.0
<class 'list'>
<class 'pandas.core.frame.DataFrame'>
{'A': 97, 'B': 98, 'D': 99}
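
In the output above, keys missing from a dictionary show up as `NaN`, and the affected columns print as floats (`67.0`, `99.0`). A small sketch that inspects this on the same data; the variable names are illustrative:

```python
import pandas as pd

records = [{'A': 65, 'B': 66, 'C': 67}, {'A': 97, 'B': 98, 'D': 99}]
df = pd.DataFrame(records)

# A and B are complete, so they stay integer; C and D hold NaN, so they become float.
print(df.dtypes)

# Number of missing cells in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```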


### 2.5 Creating a dataframe with the help of timestamp and categorical.

In [10]:
import numpy as np
import pandas as pd

Data8 = pd.DataFrame({'A':[1,2,3,'',5], 'B':pd.Timestamp('20190305'),'C':np.array([3]*5)
                     , 'D' : pd.Categorical(["Test","Train","Car","Bike", "Bus"])
                     , 'E':'Hello, Welcome!'})
print(Data8)


   A          B  C      D                E
0  1 2019-03-05  3   Test  Hello, Welcome!
1  2 2019-03-05  3  Train  Hello, Welcome!
2  3 2019-03-05  3    Car  Hello, Welcome!
3    2019-03-05  3   Bike  Hello, Welcome!
4  5 2019-03-05  3    Bus  Hello, Welcome!


## 3 Working with data file (csv)


### 3.1 Read csv file

In [11]:
import pandas as pd

LOL = pd.read_csv('League_of_Legends.csv')
LOL


FileNotFoundError: [Errno 2] No such file or directory: 'League_of_Legends.csv'
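
The `FileNotFoundError` above means the CSV file was not in the working directory when the cell ran. To try the calls in this section without the original file, a small stand-in can be read from an in-memory string. A minimal sketch: the column names `gameId`, `blueKills` and `redKills` are taken from section 3.9, and all values are made up.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'League_of_Legends.csv'; the numbers are invented.
csv_text = """gameId,blueKills,redKills
1001,9,6
1002,5,5
1003,7,11
"""

# read_csv accepts a file path or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.head())
```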

### 3.2 Get the dimensions of the dataset

In [None]:

LOL.shape


### 3.3 Top 10 rows of the Data Set

In [None]:

LOL.head(10)


### 3.4 Bottom 5 rows of the Data Set

In [None]:

LOL.tail()


### 3.5 Get all column names of the Data Set

In [None]:

LOL.columns


### 3.6 Get the statistical summary of the data

In [None]:

LOL.describe()


### 3.7 Get the information related to the Data Frame

In [None]:

LOL.info()


### 3.8 Transposing the Dataframe

In [None]:

LOL.T.head()


### 3.9 Get columns using column names

In [None]:

LOL.loc[:,['gameId','redKills','blueKills']].tail(10)


### 3.10 Get columns using position

In [None]:

LOL.iloc[:,-1]
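
The difference between selecting by name (`.loc`) and by position (`.iloc`) can be checked on a tiny frame. A sketch with made-up values; only the column names come from the cells above:

```python
import pandas as pd

# Made-up values for illustration.
games = pd.DataFrame({'gameId': [1001, 1002, 1003],
                      'blueKills': [9, 5, 7],
                      'redKills': [6, 5, 11]})

by_name = games.loc[:, ['gameId', 'redKills']]   # all rows, columns chosen by label
last_col = games.iloc[:, -1]                     # all rows, last column by position
top_left = games.iloc[:2, :2]                    # first two rows, first two columns

print(by_name)
print(last_col)
print(top_left)
```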


### 3.11 Get the mean of all the columns present in the dataset

In [None]:

LOL.mean()


### 3.12 Get the correlation between all the columns present in the dataset

In [None]:

LOL.corr()
 

### 3.13 Get the maximum value of each column in the dataset

In [None]:

LOL.max().head(10)


### 3.14 Get the minimum value of each column in the dataset

In [None]:

LOL.min().tail(10)
#print(type(LOL))


### 3.15 Get the median of the Dataset

In [None]:

LOL.median().head(13)


### 3.16 Get the standard deviation of the dataset

In [None]:

LOL.std().head(10)


### 3.17 Append the dataset to itself

In [None]:

print(LOL.shape)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
LOL_temp = pd.concat([LOL, LOL])

print(LOL_temp.shape)
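
Stacking a frame on top of itself keeps the original index labels, so they repeat. A small sketch with `pd.concat` on a made-up frame, showing the optional `ignore_index=True` that renumbers the rows:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})

stacked = pd.concat([df, df])                         # index labels repeat: 0, 1, 0, 1
renumbered = pd.concat([df, df], ignore_index=True)   # fresh index: 0, 1, 2, 3

print(stacked)
print(renumbered)
```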


### 3.18 Drop the duplicates present in the dataset.

In [None]:

print(LOL_temp.shape)

LOL_temp = LOL_temp.drop_duplicates()

print(LOL_temp.shape)
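
By default `drop_duplicates()` removes rows that are identical in every column. A sketch on a made-up frame that also shows the `subset` and `keep` arguments, which restrict the comparison to chosen columns and pick which copy survives:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'score': [10, 10, 20, 25]})

full_row = df.drop_duplicates()                              # rows identical in all columns
by_id_first = df.drop_duplicates(subset='id')                # first row for each id
by_id_last = df.drop_duplicates(subset='id', keep='last')    # last row for each id

print(full_row)
print(by_id_first)
print(by_id_last)
```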


### 3.19 isnull(): returns True for each cell that is missing and False otherwise

In [None]:
#import pandas as pd

#LOL = pd.read_csv('League_of_Legends.csv')
LOL.isnull()


### 3.20 Count the null values in each column

In [None]:

LOL.isnull().sum()


### 3.21 Drop NA values (delete rows)

In [None]:
import pandas as pd
import numpy as np

Data9 = pd.DataFrame({"Name":["Iron-Man","Wonder-Woman","Avengers", "Abc"],
                     "House":["Marvel","DC Comics","Marvel", np.nan],
                     "Start":[pd.NaT,pd.Timestamp("2017-05-15"),pd.NaT,pd.NaT]})

Data9


In [None]:

Data9.dropna()


### 3.22 Drop the columns where there are null values

In [None]:

Data9.dropna(axis = 'columns')


### 3.23 Drop a row only if ALL of its values are null

In [None]:

Data9.dropna(how = 'all')


### 3.24 Drop a row if ANY of its values is null

In [None]:

Data9.dropna(how = 'any')
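
The `dropna` variants in sections 3.21 to 3.24 can be compared side by side. A sketch on the same data as `Data9`; the `thresh` call is an extra option not used above, included only for contrast:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Iron-Man', 'Wonder-Woman', 'Avengers', 'Abc'],
                   'House': ['Marvel', 'DC Comics', 'Marvel', np.nan],
                   'Start': [pd.NaT, pd.Timestamp('2017-05-15'), pd.NaT, pd.NaT]})

any_rows = df.dropna(how='any')            # keep only rows with no nulls (the default)
all_rows = df.dropna(how='all')            # drop a row only if every value is null
no_null_cols = df.dropna(axis='columns')   # keep only columns with no nulls
at_least_two = df.dropna(thresh=2)         # keep rows with at least 2 non-null values

print(len(any_rows), len(all_rows), list(no_null_cols.columns), len(at_least_two))
```

None of these calls changes `df`; each returns a new DataFrame.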


### 3.25 Fill the null values with 0

In [None]:
import pandas as pd
import numpy as np

Data10 = pd.DataFrame([[3,np.nan,4,2],[5,2,np.nan,9],
                       [np.nan,np.nan,7,np.nan],[4,np.nan,5,np.nan]]
                      ,columns=list('PQRS'))
Data10

In [None]:

Data10.fillna(0)


### 3.26 Fill the null values with a different value for each column

In [None]:

Replace_Values = {'P':10,'Q':11,'R':12,'S':13}

Data10.fillna(Replace_Values)


### 3.27 Fill only the first null value in each column (limit = 1)

In [None]:

Data11=Data10.fillna(Replace_Values, limit = 1)
Data11
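
The effect of `limit` is easiest to see by counting the nulls that remain. A sketch on the same data as `Data10`; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, np.nan, 4, 2], [5, 2, np.nan, 9],
                   [np.nan, np.nan, 7, np.nan], [4, np.nan, 5, np.nan]],
                  columns=list('PQRS'))
replacements = {'P': 10, 'Q': 11, 'R': 12, 'S': 13}

fill_all = df.fillna(replacements)             # every null gets its column's value
fill_once = df.fillna(replacements, limit=1)   # at most one null per column is filled

print(fill_once)
print(fill_once.isna().sum())   # nulls left over in each column
```

Column `Q` starts with three nulls, so two remain after the limited fill.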

### 3.28 Calculate the mean of a column (NA values are ignored)

In [None]:

Mean1 = Data10['R'].mean()
Mean1


### 3.29 Fill the missing values with the calculated mean (returns a new Series; Data10 is unchanged)

In [None]:

Data10['R'].fillna(Mean1,inplace= False)


In [None]:
Data10


### 3.30 Describe the dataset

In [None]:

Data10.describe()


### 3.31 Describe a single column of the dataset

In [None]:

Data10['P'].describe()


### 3.32 Fill the missing values with each column's mean

In [None]:
Mean1 = Data10['P'].mean()
Mean2 = Data10['Q'].mean()
Mean3 = Data10['R'].mean()
Mean4 = Data10['S'].mean()


In [None]:
# Assign the result back; fillna(inplace=True) on a selected column may not update Data10
Data10['P'] = Data10['P'].fillna(Mean1)

Data10['Q'] = Data10['Q'].fillna(Mean2)
Data10['R'] = Data10['R'].fillna(Mean3)
Data10['S'] = Data10['S'].fillna(Mean4)


In [None]:

Data10


In [None]:
Data10['P'].mean()
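
The column-by-column fill in section 3.32 can also be written as a single call, because `fillna` accepts a Series of per-column values. A sketch on a fresh copy of the same data; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, np.nan, 4, 2], [5, 2, np.nan, 9],
                   [np.nan, np.nan, 7, np.nan], [4, np.nan, 5, np.nan]],
                  columns=list('PQRS'))

col_means = df.mean()           # one mean per column; NaN values are skipped
filled = df.fillna(col_means)   # each column's nulls get that column's mean

print(col_means)
print(filled)
```

Filling with the mean leaves each column's mean unchanged, which is why the final `Data10['P'].mean()` matches the value computed before the fill.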