![](https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcThdcd1qChQgrNWFj2CoNrlpQGM_6bqVP07d2ZvSfWnKaKiUdwr)

- <a href='#1'>1. Introduction</a>  
- <a href='#2'>2. Loading libraries and retrieving data</a>
- <a href='#3'>3.  Data Visualization</a>

# <a id='1'>1. Introduction</a>


**Overview**

The Indian Premier League (IPL) is a professional Twenty20 cricket league in India, contested during April and May of every year by teams representing Indian cities and some states. The league was founded by the Board of Control for Cricket in India (BCCI) in 2008. The IPL is the most-attended cricket league in the world and in 2014 ranked sixth by average attendance among all sports leagues. There have been ten seasons of the IPL tournament.


**Data**

We have 5 different files in the dataset:

**1) DIM_PLAYER.csv:** Details of all the players who have played in the IPL, along with their country, date of birth, and batting/bowling style.

**2) DIM_PLAYER_MATCH.csv:**  Various stats of players like team name, captaincy, keeper etc. 

**3) DIM_TEAM.csv:** IPL team names and IDs

**4) FACT_BALL_BY_BALL.csv:** Ball by ball details

**5) DIM_MATCH.csv:** Match details  


# <a id='2'>2. Loading libraries and retrieving data</a>


In [None]:
#Importing libraries

import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

warnings.filterwarnings('ignore')  # silence library warnings in the notebook output
%matplotlib inline
init_notebook_mode(connected=True)  # enable Plotly's offline rendering in the notebook


#Importing the datasets 

players = pd.read_csv("../input/Player.csv", encoding='ISO-8859-1' )
player_match = pd.read_csv("../input/Player_match.csv", encoding='ISO-8859-1' )
team = pd.read_csv("../input/Team.csv", encoding='ISO-8859-1' )
ball_fact = pd.read_csv ("../input/Ball_By_Ball.csv", encoding='ISO-8859-1' )
match = pd.read_csv ("../input/Match.csv", encoding='ISO-8859-1' )




**PLAYER_MATCH**

Let's start by looking at the player_match dataset.

In [None]:
player_match.head(2)

In [None]:
player_match.describe(include='all')

In [None]:
captain = player_match[player_match['Role_Desc'] == 'Captain']
captain.Player_Name.unique()

# <a id='3'>3. Data Visualization</a>

In [None]:
plt.figure(figsize=(14,6))
sns.countplot(x='Age_As_on_match',data=player_match)

The age distribution is roughly bell-shaped. There are some very young players, probably talented enough to start playing early, and some older players, well into their 40s, still playing in the IPL.
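
A quick numeric check can back up what the count plot suggests about the shape of the age distribution. The sketch below uses a small made-up frame (`toy_player_match` and its ages are invented for illustration); on the real data the same two calls would be applied to `player_match['Age_As_on_match']`.

```python
import pandas as pd

# Toy stand-in for player_match; these ages are invented for illustration.
toy_player_match = pd.DataFrame(
    {"Age_As_on_match": [19, 22, 24, 25, 26, 27, 27, 28, 29, 30, 31, 33, 36, 41]}
)

ages = toy_player_match["Age_As_on_match"]
age_summary = ages.describe()  # count, mean, std, min, quartiles, max
age_skew = ages.skew()         # > 0 means a longer tail toward older ages

print(age_summary)
print("skewness:", round(age_skew, 2))
```

A skewness close to zero is consistent with a symmetric, bell-like shape; a clearly positive value points to a tail of older players.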

**MATCH**

Now let's analyze the second dataset, 'Match'.

In [None]:
match.head(2)

In [None]:
match.describe(include='all')

In [None]:
match.isnull().sum(axis=0)

In [None]:
#Number of teams
print("Number of unique teams:", match.Team1.nunique())

In [None]:
#Players with the most Man of the Match awards
ManofMatch = match.groupby(['ManOfMach']).count()['match_winner']
ManOfMatch_count = ManofMatch.sort_values(axis=0, ascending=False)
ManOfMatch_count.head()

In [None]:
#number of matches per season
plt.figure(figsize=(8,6))
sns.countplot(x='Season_Year', data=match) 


The number of games increased during 2011-2013.
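
The counts behind the season plot can also be read off as a table. This is a minimal sketch on an invented frame (`toy_match` and its rows are hypothetical); with the real data, the same expression would use `match['Season_Year']`.

```python
import pandas as pd

# Toy stand-in for match; the rows are invented for illustration.
toy_match = pd.DataFrame({"Season_Year": [2008, 2008, 2009, 2011, 2011, 2011]})

# Count matches per season and order the result chronologically.
matches_per_season = toy_match["Season_Year"].value_counts().sort_index()
print(matches_per_season)
```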

In [None]:
#Number of matches per venue 
plt.figure(figsize=(14,6))
sns.countplot(x='Venue_Name', data=match, order=match['Venue_Name'].value_counts().index)
plt.xticks(rotation='vertical')
plt.show()

Big cities with a home team have hosted the most matches: M Chinnaswamy Stadium leads through 2017, followed by Eden Gardens and Feroz Shah Kotla.

In [None]:
#Wins per team
plt.figure(figsize=(8,6))
ax=sns.countplot(x='match_winner', data=match, order=match['match_winner'].value_counts().index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()

Mumbai has the most wins, followed by Chennai and then Kolkata. Now, let's see who wins the toss most often.

In [None]:
#Toss wins per team
plt.figure(figsize=(8,6))
ax=sns.countplot(x='Toss_Winner', data=match, order=match['Toss_Winner'].value_counts().index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()

Again, it's Mumbai that wins the toss most often. Let's see what teams have chosen to do after winning the toss over the years.

In [None]:
match.replace(to_replace='Field', value = 'field', inplace=True) #Replace 'Field' with 'field'

In [None]:
match.replace(to_replace='Bat', value = 'bat', inplace=True) #Replace 'Bat' with 'bat'

In [None]:
plt.figure(figsize=(12,6))
sns.countplot(x='Season_Year', hue='Toss_Name', data=match)

In the early years of the IPL, teams tended to bat first after winning the toss. That pattern has clearly changed, especially in the last couple of seasons.
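
To put numbers on the change in toss decisions, the per-season counts can be turned into shares. The sketch below assumes an invented frame (`toy_match` and its rows are hypothetical) and uses `pd.crosstab` with `normalize="index"` so that each season's row sums to 1.

```python
import pandas as pd

# Toy stand-in for match; the rows are invented for illustration.
toy_match = pd.DataFrame({
    "Season_Year": [2008, 2008, 2008, 2008, 2017, 2017, 2017, 2017],
    "Toss_Name": ["bat", "bat", "bat", "field", "field", "field", "field", "bat"],
})

# Share of each toss decision within a season (rows sum to 1).
toss_share = pd.crosstab(
    toy_match["Season_Year"], toy_match["Toss_Name"], normalize="index"
)
print(toss_share)
```

Shares are easier to compare across seasons than raw counts, because the number of matches per season is not constant.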

In [None]:
match.head()

**PLAYER**

Now let's look at the third dataset 'Player'

In [None]:
players.head(2)

Let's look at the batting and bowling styles of IPL Players

In [None]:
players.replace(to_replace=' Right-hand bat', value = 'Right-hand bat', inplace=True) #Clean the data 

In [None]:
players.replace(to_replace=' Left-hand bat', value = 'Left-hand bat', inplace=True) #Clean the data 

In [None]:

players.replace(to_replace='Right-handed', value = 'Right-hand bat', inplace=True) #Clean the data 
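
The three replacements above can also be expressed as one column-level step. This is a sketch on an invented frame (`toy_players` is hypothetical): `str.strip()` removes the stray leading spaces, and a small mapping folds 'Right-handed' into 'Right-hand bat'. Unlike `DataFrame.replace`, it only touches the `Batting_hand` column.

```python
import pandas as pd

# Toy stand-in for players; the rows are invented for illustration.
toy_players = pd.DataFrame(
    {"Batting_hand": [" Right-hand bat", "Left-hand bat", "Right-handed", " Left-hand bat"]}
)

# Trim whitespace first, then merge the alternative spelling into one label.
cleaned_hand = (
    toy_players["Batting_hand"]
    .str.strip()
    .replace({"Right-handed": "Right-hand bat"})
)
print(cleaned_hand.value_counts())
```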

In [None]:
temp = players["Batting_hand"].value_counts()
fig = {
  "data": [
    {
      "values": temp.values,
      "labels": temp.index,
      "domain": {"x": [0, 1]},
      "hole": .6,
      "type": "pie"
    },
    
    ],
  "layout": {
        "title":"Batting Style",
        "annotations": [
            {
                "font": {
                    "size": 17
                },
                "showarrow": False,
                "text": "Batting Style",
                "x": 0.5,
                "y": 0.5
            }
            
        ]
    }
}
iplot(fig, filename='donut')

In [None]:
plt.figure(figsize=(12,6))
ax=sns.countplot(x='Bowling_skill', data=players, order=players['Bowling_skill'].value_counts().iloc[:10].index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.show()

Right-hand batting and right-arm medium bowling are clearly the most common styles.

Now, let's look at the countries.

In [None]:
agg = players['Country_Name'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

trace1 = go.Pie(labels=labels, values=values, marker=dict(colors=['red']))
layout = dict(title='Top Countries', legend=dict(orientation="h"));


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')


Most players are from India followed by Australia and South Africa

**BALL FACT**

I will now quickly dive into the next dataset, Ball Fact. It has a lot of information about each ball bowled in the IPL.

In [None]:
ball_fact.describe(include='all')

An extra is a run scored by a means other than a batsman hitting the ball. Other than runs scored off the bat from a no-ball, a batsman is not given credit for extras and the extras are tallied separately on the scorecard and count only towards the team's score. Let's see how many extras, aka free runs, the teams are giving:

In [None]:
ball_fact.Extra_Type.unique()

The extra types in our data are not clean. For example, 'wides' also appears as 'Wides', and 'noballs' as 'Noballs'. Let's clean this up.

In [None]:
ball_fact.replace(to_replace='Wides', value = 'wides', inplace=True) #Replace Wides with wides

In [None]:
ball_fact.replace(to_replace='Legbyes', value = 'legbyes', inplace=True) #Legbyes with legbyes

In [None]:
ball_fact.replace(to_replace='Noballs', value = 'noballs', inplace=True) #Noballs with noballs

In [None]:
ball_fact.replace(to_replace='Byes', value = 'byes', inplace=True) #Byes with byes
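
The four replacements above can be collapsed into a single mapping applied to the `Extra_Type` column only. The sketch uses an invented frame (`toy_ball_fact` and the name `extra_type_fixes` are hypothetical). A mapping is used rather than lowercasing everything so that the 'No Extras' label keeps its original spelling.

```python
import pandas as pd

# Toy stand-in for ball_fact; the rows are invented for illustration.
toy_ball_fact = pd.DataFrame(
    {"Extra_Type": ["No Extras", "Wides", "wides", "Legbyes", "Noballs", "Byes", "byes"]}
)

# Map each capitalised variant to its lowercase spelling in one pass.
extra_type_fixes = {
    "Wides": "wides",
    "Legbyes": "legbyes",
    "Noballs": "noballs",
    "Byes": "byes",
}
toy_ball_fact["Extra_Type"] = toy_ball_fact["Extra_Type"].replace(extra_type_fixes)

cleaned_types = sorted(toy_ball_fact["Extra_Type"].unique())
print(cleaned_types)
```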

In [None]:
ball_df = ball_fact[ball_fact['Extra_Type']  != 'No Extras']
agg = ball_df['Extra_Type'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

# Pie chart showing the share of each extra type
trace1 = go.Pie(labels=labels, values=values)
layout = dict(title='Extras given', legend=dict(orientation="h"))


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

More than half of the extras conceded are wides.
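
The share of each extra type can be computed directly instead of being read off the chart. This is a minimal sketch on an invented frame (`toy_ball_fact` and its rows are hypothetical). Like the pie chart, it counts deliveries that carried an extra, not the runs conceded on them.

```python
import pandas as pd

# Toy stand-in for ball_fact; the rows are invented for illustration.
toy_ball_fact = pd.DataFrame(
    {"Extra_Type": ["No Extras"] * 6 + ["wides"] * 3 + ["legbyes", "noballs"]}
)

# Keep only deliveries with an extra, then compute each type's share.
extras_only = toy_ball_fact[toy_ball_fact["Extra_Type"] != "No Extras"]
extra_share = extras_only["Extra_Type"].value_counts(normalize=True)
print(extra_share)
```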

Dismissal:  Dismissal occurs when the batsman is out (also known as the fielding side taking a wicket and/or the batting side losing a wicket). At this point, a batsman must discontinue batting and leave the field permanently for the innings.  Let's see how the players are getting dismissed:

In [None]:
ball_df = ball_fact[ball_fact['Out_type']  != 'Not Applicable']
agg = ball_df['Out_type'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

# Pie chart showing the share of each dismissal type
trace1 = go.Pie(labels=labels, values=values)
layout = dict(title='Out Type', legend=dict(orientation="h"))


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

Most players are dismissed by being caught, followed by being bowled.


*Thanks for reading the Kernel. I will continue updating this. **Please leave a comment** for any suggestions*