# Objectives

At the end of the experiment you will be able to:

* Understand the importance of Pandas
* Perform data cleaning and manipulation using Pandas

## Features of Pandas

* Fast and efficient DataFrame object with default and customized indexing.
* Tools for loading data into in-memory data objects from different file formats.
* Data alignment and integrated handling of missing data.
* Reshaping and pivoting of data sets.
* Label-based slicing, indexing and subsetting of large data sets.
* Columns from a data structure can be deleted or inserted.
* Group by data for aggregation and transformations.
* High performance merging and joining of data.
* Time Series functionality.
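
As a small illustration of two of the features listed above (data alignment and handling of missing data), here is a minimal sketch. The Series values and labels are made up for the example.

```python
import pandas as pd

# Two Series with partly different labels (values are made up for illustration).
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0], index=['y', 'z'])

# Addition aligns on index labels; 'x' has no partner in b, so the result is NaN there.
total = a + b

# fillna replaces the missing value with a chosen default.
filled = total.fillna(0.0)
print(filled)
```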

Now it is time to work on the practicals. The exercises are given below:




### Exercise 1: How to import pandas and check the version?

In [None]:
import pandas as pd
pd.__version__

### Exercise 2: Create a Series from a dictionary.

In [None]:
dic = {'Items': ['Biscuits', 'Chocolate']}
pd.Series(dic)
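
Note that a dictionary whose only value is a list yields a Series with a single entry holding the whole list. For comparison, here is a minimal sketch in which each dictionary key becomes its own index label. The item counts are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical item counts, used only for illustration.
stock = {'Biscuits': 10, 'Chocolate': 25}

# Each dictionary key becomes an index label; each value becomes an entry.
items = pd.Series(stock)
print(items)
```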


### Exercise 3: Create a DataFrame from lists; the column headings should be 'Name' and 'Age'.

In [None]:
Name = ['Varsha','Thanvi','Yashu']
Age = [18,20,22]
pd.DataFrame({'Name': Name, 'Age': Age})

### Exercise 4: Create a DataFrame from List of Dictionaries.

In [None]:
list_of_dict = [{'Name': 'Varshi', 'Age': '20', 'Marks': 98},
                {'Name': 'Thanu', 'Age': '24', 'Marks': 90},
                {'Name': 'Vyshu', 'Age': '23', 'Marks': 92},
                {'Name': 'Sai', 'Age': '21', 'Marks': 89}]
df = pd.DataFrame(list_of_dict)
print(df)

### Exercise 5: Frame a dataset using the following data

In [None]:
ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
df

### Exercise 6: In ipl_data, group the data by Year

In [None]:
print(df.groupby('Year').groups)

### Exercise 7: In ipl_data, group the data by Year and Team

In [None]:
print(df.groupby(['Year','Team']).groups)

### Exercise 8: Iterate through the groups by Year

In [None]:
grouped = df.groupby('Year')
for year, data in grouped:
    print(year)
    print(data)

### Exercise 9: Group the data and get only the year 2014

In [None]:
grouped = df.groupby('Year')
print(grouped.get_group(2014))

### Exercise 10: Compute the mean of Points per Year using .agg (aggregate)

In [None]:
grouped = df.groupby('Year')
# Recent pandas versions emit a FutureWarning for agg(np.mean); the string 'mean' is the recommended form.
print(grouped['Points'].agg('mean'))

### Exercise 11: Find the size of each group using **.agg**, grouped by Team

In [None]:
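grouped = df.groupby('Team')
# The string 'size' avoids the FutureWarning that recent pandas versions raise for agg(np.size);
# the result is a Series holding the number of rows in each group.
print(grouped.agg('size'))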
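
`.agg` also accepts a list of function names, which returns one column per aggregation. The sketch below repeats a small subset of the ipl_data values so that it runs on its own; the variable names `scores` and `summary` are chosen only for this example.

```python
import pandas as pd

# A small subset of the ipl_data values, repeated here so the block runs on its own.
scores = pd.DataFrame({
    'Year': [2014, 2014, 2015, 2015],
    'Points': [876, 863, 789, 673],
})

# Passing a list of function names returns one column per aggregation.
summary = scores.groupby('Year')['Points'].agg(['mean', 'sum', 'count'])
print(summary)
```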

### Exercise 12: Create two DataFrames named 'left' and 'right' using the following data

In [None]:
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]})

right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]})

print(left)
print()
print(right)

### Exercise 13: Merge the left and right based on 'id'

In [None]:
print(pd.merge(left,right,on="id"))

### Exercise 14: Merge the left and right based on 'id' and 'subject_id'

In [None]:
print(pd.merge(left,right,on=["id","subject_id"]))

### Exercise 15: Merge left and right on 'subject_id' using a left join

In [None]:
print(pd.merge(left, right, on="subject_id", how="left"))

### Exercise 16: Merge left and right on 'subject_id' using a right join

In [None]:
print(pd.merge(left,right,on="subject_id",how="right"))

### Exercise 17: Merge left and right on 'subject_id' using an outer join

In [None]:
print(pd.merge(left, right, on="subject_id", how="outer"))

### Exercise 18: Merge left and right on 'subject_id' using an inner join

In [None]:
print(pd.merge(left, right, on="subject_id", how="inner"))
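
To see which rows each join type keeps, `pd.merge` accepts `indicator=True`, which adds a `_merge` column marking every row as `left_only`, `right_only`, or `both`. The sketch below repeats only the subject and marks columns from Exercise 12 so that it runs on its own; the names `left_subjects`, `right_subjects`, and `merged` are chosen only for this example.

```python
import pandas as pd

# Subject and marks columns from Exercise 12, repeated so the block is self-contained.
left_subjects = pd.DataFrame({
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]})
right_subjects = pd.DataFrame({
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]})

# An outer join keeps every subject_id; _merge records where each row came from.
merged = pd.merge(left_subjects, right_subjects,
                  on='subject_id', how='outer', indicator=True)
print(merged[['subject_id', '_merge']])
```

Rows marked `both` are the ones an inner join would return, while a left join would also keep the `left_only` rows.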

### Exercise 19: Concatenate two DataFrames using the following data; the index should not repeat

In [None]:
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# ignore_index=True discards the original labels and numbers the rows 0..n-1
pd.concat([one, two], ignore_index=True)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Exercise 20: Change the header names of the given data using read_csv

In [None]:
cols = ['Name','sex','Age','Height','Weight']
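# header=0 tells read_csv to discard the file's existing header row and use `cols` instead
# (this assumes biostats.csv has a header row; without header=0 that row would be read as data).
df = pd.read_csv('/biostats.csv', names=cols, header=0)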
df
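
The effect of `header=0` can be checked without the real file. The sketch below uses made-up CSV text (not the contents of biostats.csv) read through `io.StringIO`, and compares passing `names` alone with passing `names` together with `header=0`.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file that already has a header row.
csv_text = "name,sex,age\nAlex,M,41\nBert,M,42\n"
cols = ['Name', 'Sex', 'Age']

# names alone: the original header line is read as the first data row.
kept = pd.read_csv(io.StringIO(csv_text), names=cols)

# names plus header=0: the original header line is discarded and replaced.
replaced = pd.read_csv(io.StringIO(csv_text), names=cols, header=0)

print(len(kept), len(replaced))
```

With `names` alone, the old header survives as a data row, which also forces the numeric column to be read as text.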