<h1 style="color:red">What is pandas</h1>
<p>Pandas is an open source Python library used mainly for data analysis, data manipulation, and data exploration.</p>

<p><i>[pandas] is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. — Wikipedia</i></p>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcShYvd6EGuSC3rbnEC0S-uyyVJdeuBDJnB8oUoKzeVXgj_Rx34A"/>

<p>The README in the official pandas GitHub repository describes pandas as “a Python package providing <b>fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive</b>. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.”</p>

<h3>When can I use pandas?</h3>
<p>
    a. Calculate statistics and answer questions about the data, such as:<br>
        ------- What is the average, median, max, or min of each column?<br>
        ------- Does column A correlate with column B?<br>
        ------- What does the distribution of data in column C look like?<br>
    b. Clean the data, for example by removing missing values and filtering rows or columns by some criteria.<br>
    c. Visualize the data with help from Matplotlib: plot bars, lines, histograms, bubbles, and more.<br>
    d. Store the cleaned, transformed data back into a CSV file, another file format, or a database.<br>
</p>
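The tasks listed above can be sketched in a few lines. This is only an illustration: the DataFrame below is hypothetical toy data (the column names A, B and C simply echo the list), and plotting is left out so that the snippet runs without a display.

```python
import pandas as pd

# Hypothetical toy data; not one of the datasets used in this notebook.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, None],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "C": [5, 3, 8, 1, 7],
})

# a. Statistics: per-column mean (missing values are skipped by default)
col_means = df.mean()
# Pearson correlation between A and B, computed on rows where both are present
corr_ab = df["A"].corr(df["B"])

# b. Cleaning: drop rows with missing values, then keep rows where C > 2
clean = df.dropna()
clean = clean[clean["C"] > 2]

# d. Storing: with no path given, to_csv() returns the CSV text as a string
csv_text = clean.to_csv(index=False)

print(col_means)
print(corr_ab)
print(csv_text)
```

Passing a file name to `to_csv()` instead would write the same text to disk.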

<h3>What is so great about Pandas?</h3>
<p>
    1. It offers a broad set of functions that cover most common data analysis tasks.<br>
    2. Thorough, well-maintained documentation.<br>
    3. Open source, with an active community and active development.<br>
    4. Works well with other libraries such as NumPy and scikit-learn.<br>
</p>


<h3>Pandas Popularity</h3>
<img src="https://storage.googleapis.com/lds-media/images/the-rise-in-popularity-of-pandas.width-1200.png"/>

<h1 style="color:red">Importing Stuff</h1>

#### 1. Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### 2. Import Datasets

In [3]:
match=pd.read_csv('matches.csv')
delivery=pd.read_csv('deliveries.csv')
company=pd.read_csv('Fortune501.csv')
titanic=pd.read_csv('titanic.csv')
food=pd.read_csv('food.csv')

In [9]:
type(food)

pandas.core.frame.DataFrame

In [8]:
food

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
0,Nitish,M,Kolkata,Weekly,Burger,11
1,Anu,F,Gurgaon,Daily,Sandwich,14
2,Mukku,M,Kolkata,Once,Vada,25
3,Suri,M,Kolkata,Monthly,Pizza,56
4,Rajiv,M,Patna,Never,Paneer,34
5,Vandanda,F,Patna,Once,Chicken,23
6,Piyush,M,Ranchi,Never,Chicken,67
7,Radhika,F,Mumbai,Monthly,Pizza,43
8,Sunil,M,Mumbai,Monthly,Vada,34
9,Madhuri,F,Pune,Daily,Paneer,66


In [10]:
titanic.shape

(891, 12)

In [11]:
delivery.shape

(150460, 21)

<h1 style="color:red">Series and Dataframes</h1>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRK5vl7PcWTN02CXdNczGUYxwtuJRwuAueqfhzzca4Jq6RjH2CZ"/>

#### 1. The shape attribute

In [13]:
match.shape

(636, 18)

In [14]:
titanic.shape

(891, 12)

In [15]:
delivery.shape

(150460, 21)

#### 2. The columns attribute

In [16]:
titanic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

#### 3. The head() and tail() methods

In [17]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [18]:
titanic.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [19]:
titanic.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S


In [20]:
titanic.tail(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


#### 4. The info() method

In [21]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


#### 5. The describe() method
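A minimal sketch of `describe()`, assuming a small DataFrame rebuilt by hand from the `food` output shown earlier (only the Name and Spends columns are kept, so the snippet does not need the CSV file). By default `describe()` summarises numeric columns only.

```python
import pandas as pd

# Rebuilt by hand from the food table displayed earlier (two columns only).
food = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv",
             "Vandanda", "Piyush", "Radhika", "Sunil", "Madhuri"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

# count, mean, std, min, quartiles and max for each numeric column
summary = food.describe()
print(summary)
```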

#### 6. The nunique() and unique() methods
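As a rough illustration, `unique()` returns the distinct values of a column in order of first appearance, while `nunique()` returns how many there are. The City values below are copied by hand from the `food` output shown earlier.

```python
import pandas as pd

# City column copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
})

cities = food["City"].unique()     # distinct values, in order of first appearance
n_cities = food["City"].nunique()  # number of distinct values

print(cities)
print(n_cities)
```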

#### 7. The astype() method
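A small sketch of `astype()`, again assuming two columns copied by hand from the `food` output shown earlier. The target types chosen here (`float64` and `category`) are just examples of common conversions.

```python
import pandas as pd

# Two columns copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

# astype() returns a converted copy, so assign it back to keep the change
food["Spends"] = food["Spends"].astype("float64")
food["City"] = food["City"].astype("category")

print(food.dtypes)
```

Converting a repetitive text column to `category` usually reduces memory usage, which can matter for larger tables.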

#### 8. Extracting one column

#### 9. Extracting multiple columns

#### 10. Creating a new column

#### 11. Extracting one row

#### 12. Extracting multiple rows

#### 13. Extracting both rows and columns
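Items 8 to 13 can be sketched together. The snippet below assumes a DataFrame rebuilt by hand from the `food` output shown earlier; the new column name `Spends_x2` is hypothetical and only there to show the syntax.

```python
import pandas as pd

# Rebuilt by hand from the food table displayed earlier (three columns only).
food = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv",
             "Vandanda", "Piyush", "Radhika", "Sunil", "Madhuri"],
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

one_col = food["Spends"]               # one column -> Series
two_cols = food[["Name", "City"]]      # list of columns -> DataFrame

food["Spends_x2"] = food["Spends"] * 2  # new column (hypothetical name)

one_row = food.iloc[0]                 # one row by position -> Series
some_rows = food.iloc[2:5]             # rows at positions 2, 3 and 4

# Rows and columns together; label slices with loc include the end label
block = food.loc[0:2, ["Name", "Spends"]]

print(block)
```

Note the difference between the two slices: `iloc[2:5]` excludes position 5, whereas `loc[0:2]` includes label 2.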

#### 14. The value_counts() method
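A minimal sketch of `value_counts()`, using the City values copied by hand from the `food` output shown earlier. The result is a Series of counts, sorted from most to least frequent.

```python
import pandas as pd

# City column copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
})

# How many rows fall in each city, most frequent first
counts = food["City"].value_counts()
print(counts)
```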

#### 15. Filtering data based on a condition

#### 16. Filtering data based on multiple conditions
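Filtering on one or several conditions is usually done with boolean masks. The sketch below assumes a DataFrame rebuilt by hand from the `food` output shown earlier; the threshold of 30 is an arbitrary value picked for illustration.

```python
import pandas as pd

# Rebuilt by hand from the food table displayed earlier (four columns only).
food = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv",
             "Vandanda", "Piyush", "Radhika", "Sunil", "Madhuri"],
    "Gender": ["M", "F", "M", "M", "M", "F", "M", "F", "M", "F"],
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

# One condition: the comparison yields a boolean Series used as a row mask
kolkata = food[food["City"] == "Kolkata"]

# Several conditions: combine masks with & (and) or | (or);
# each condition needs its own parentheses
men_over_30 = food[(food["Gender"] == "M") & (food["Spends"] > 30)]

print(kolkata)
print(men_over_30)
```

Python's `and` / `or` keywords do not work element-wise on Series, which is why `&` and `|` are used here.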

<h3 style="color:#00a65a">Exercise 1 : Find the total number of matches that have been played in the IPL</h3>

<h3 style="color:#00a65a">Exercise 2 : Find the top 5 teams in terms of number of matches won</h3>

<h3 style="color:#00a65a">Exercise 3 : At which venue have the most matches been played?</h3>

#### 17. The plot() method

<h3 style="color:#00a65a">Exercise 4 : Find the top 5 teams that have played the most matches</h3>

<h3 style="color:#00a65a">Exercise 5 : Find the player who has won the most player-of-the-match awards in Chennai</h3>

<h3 style="color:#00a65a">Exercise 6 : What percentage of teams opt to bat first after winning the toss?</h3>

#### 18. The sort_values() method
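A short sketch of `sort_values()`, assuming two columns copied by hand from the `food` output shown earlier. It returns a sorted copy and leaves the original DataFrame untouched.

```python
import pandas as pd

# Two columns copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv",
             "Vandanda", "Piyush", "Radhika", "Sunil", "Madhuri"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

# Highest spenders first; ascending=True (the default) would reverse this
by_spend = food.sort_values("Spends", ascending=False)
top3 = by_spend.head(3)

print(top3)
```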

#### 19. The set_index() method

#### 20. The inplace parameter

#### 21. The sort_index() method

#### 22. The reset_index() method
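Items 19 to 22 fit naturally into one sketch. It assumes two columns copied by hand from the `food` output shown earlier, and uses Name as the index purely for illustration.

```python
import pandas as pd

# Two columns copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv",
             "Vandanda", "Piyush", "Radhika", "Sunil", "Madhuri"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

# set_index() returns a new DataFrame; food itself is unchanged here
by_name = food.set_index("Name")

# sort_index() orders rows by index label (alphabetically for text labels)
by_name_sorted = by_name.sort_index()

# reset_index() turns the index back into a regular column
restored = by_name_sorted.reset_index()

# With inplace=True the DataFrame is modified directly and None is returned
result = food.set_index("Name", inplace=True)

print(by_name_sorted.head())
print(result)
```

Because `inplace=True` returns `None`, writing `food = food.set_index("Name", inplace=True)` would replace the DataFrame with `None`; use either assignment or `inplace`, not both.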

#### 23. Maths functions

#### 24. The drop_duplicates() method
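A minimal sketch of `drop_duplicates()`, assuming two columns copied by hand from the `food` output shown earlier. With `subset="City"` only the City column is compared, and by default the first row for each value is kept.

```python
import pandas as pd

# Two columns copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv",
             "Vandanda", "Piyush", "Radhika", "Sunil", "Madhuri"],
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
})

# Keep the first row seen for each city
first_per_city = food.drop_duplicates(subset="City")

# keep="last" keeps the last row seen for each city instead
last_per_city = food.drop_duplicates(subset="City", keep="last")

print(first_per_city)
```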

<h3 style="color:#00a65a">Exercise 7 : List all the IPL-winning teams year by year</h3>

#### 25. The groupby() method
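A rough sketch of `groupby()`, assuming two columns copied by hand from the `food` output shown earlier. The pattern is split (by City), apply (sum of Spends), combine (one value per city).

```python
import pandas as pd

# Two columns copied by hand from the food table displayed earlier.
food = pd.DataFrame({
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna",
             "Patna", "Ranchi", "Mumbai", "Mumbai", "Pune"],
    "Spends": [11, 14, 25, 56, 34, 23, 67, 43, 34, 66],
})

# Total spend per city, largest first
totals = food.groupby("City")["Spends"].sum().sort_values(ascending=False)

print(totals)
```

Other aggregations such as `mean()`, `count()` or `max()` can be used in place of `sum()` in the same position.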

<h3 style="color:#00a65a">Exercise 8 : Find the top 5 most successful batsmen in the history of the IPL</h3>

<h3 style="color:#00a65a">Exercise 9 : Find the top 5 batsmen who have hit the most sixes</h3>

<h3 style="color:#00a65a">Exercise 10 : Find the top 5 bowlers</h3>

<h3 style="color:#00a65a">Exercise 11 : Against which team has Virat Kohli scored the most runs?</h3>

<h3 style="color:#00a65a">Exercise 12 : Against which bowler has R?</h3>