# IPL Matches Data Analysis

IPL Season 13 finally started on Sept. 19, 2020, and the cricket mood is on. While watching the first match, I had the idea of analyzing an IPL dataset, and luckily I found one on Kaggle that covers the matches held between 2008 and 2019. That is the dataset I analyze here. I hope you like my work.

As a first step, let's upload our Jupyter notebook to [Jovian.ml](https://jovian.ml).

In [1]:
project_name = "ipl-data-analysis"

In [2]:
!pip install jovian --upgrade -q

In [3]:
import jovian

In [4]:
jovian.commit(project=project_name)


## Importing Libraries
Let us first import all the libraries which we'll be using in the entire project.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

## Data Preparation and Cleaning

Let's first load our dataset and take a look at it to get an overview. We will also discard a few columns that won't help with our data visualization.

In [2]:
ipl_df = pd.read_csv('dataset/matches.csv')
ipl_df.head(5)

Unnamed: 0,id,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,IPL-2017,Hyderabad,05-04-2017,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,2,IPL-2017,Pune,06-04-2017,Mumbai Indians,Rising Pune Supergiants,Rising Pune Supergiants,field,normal,0,Rising Pune Supergiants,0,7,SPD Smith,Maharashtra Cricket Association Stadium,A Nand Kishore,S Ravi,
2,3,IPL-2017,Rajkot,07-04-2017,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium,Nitin Menon,CK Nandan,
3,4,IPL-2017,Indore,08-04-2017,Rising Pune Supergiants,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium,AK Chaudhary,C Shamshuddin,
4,5,IPL-2017,Bangalore,08-04-2017,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium,,,


Let us explain the dataset. It includes the **Season**, the **City** and **Venue** in which each match was held, the **Date** on which it was played, the two **teams** that played it, and information about the **toss**, the **winner** and the **umpires**.

In [7]:
ipl_df.shape

(756, 18)

So, we have 756 rows and 18 columns in total.

In [8]:
ipl_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 756 entries, 0 to 755
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id               756 non-null    int64 
 1   Season           756 non-null    object
 2   city             749 non-null    object
 3   date             756 non-null    object
 4   team1            756 non-null    object
 5   team2            756 non-null    object
 6   toss_winner      756 non-null    object
 7   toss_decision    756 non-null    object
 8   result           756 non-null    object
 9   dl_applied       756 non-null    int64 
 10  winner           752 non-null    object
 11  win_by_runs      756 non-null    int64 
 12  win_by_wickets   756 non-null    int64 
 13  player_of_match  752 non-null    object
 14  venue            756 non-null    object
 15  umpire1          754 non-null    object
 16  umpire2          754 non-null    object
 17  umpire3          119 non-null    ob

We see that the **umpire3** column has only 119 non-null values out of 756, so we can discard it without any issue. We will also discard the **umpire1** and **umpire2** columns since they won't be useful in our analysis.
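
One way to back a decision like this with numbers is to look at the share of missing values in each column. The sketch below uses a tiny made-up frame (`toy_df` and its values are hypothetical, not rows from `matches.csv`), and the 50% cut-off is only an illustrative choice. On the real data the same two steps would be applied to `ipl_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the matches table; values are invented for illustration
toy_df = pd.DataFrame({
    'winner':  ['A', 'B', 'A', np.nan],
    'umpire1': ['X', 'Y', 'X', 'Z'],
    'umpire3': [np.nan, np.nan, 'W', np.nan],
})

# Fraction of missing values in each column, largest first
missing_share = toy_df.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns where more than half of the entries are missing (illustrative threshold)
mostly_empty = missing_share[missing_share > 0.5].index.tolist()
print(mostly_empty)
```

In this toy frame only `umpire3` crosses the threshold, which mirrors the situation in the real table.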

We have another dataset that contains data for every ball bowled in the last 12 seasons of the IPL. Let's see what it looks like.

In [3]:
deliveries_df = pd.read_csv('dataset/deliveries.csv')
deliveries_df.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,...,bye_runs,legbye_runs,noball_runs,penalty_runs,batsman_runs,extra_runs,total_runs,player_dismissed,dismissal_kind,fielder
0,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,1,DA Warner,S Dhawan,TS Mills,0,...,0,0,0,0,0,0,0,,,
1,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,2,DA Warner,S Dhawan,TS Mills,0,...,0,0,0,0,0,0,0,,,
2,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,3,DA Warner,S Dhawan,TS Mills,0,...,0,0,0,0,4,0,4,,,
3,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,4,DA Warner,S Dhawan,TS Mills,0,...,0,0,0,0,0,0,0,,,
4,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,5,DA Warner,S Dhawan,TS Mills,0,...,0,0,0,0,0,2,2,,,


In [10]:
deliveries_df.shape

(179078, 21)

So we have 179078 rows and 21 columns in this dataset. The number of rows is this high because there is one row for every ball bowled.

In [11]:
deliveries_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179078 entries, 0 to 179077
Data columns (total 21 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   match_id          179078 non-null  int64 
 1   inning            179078 non-null  int64 
 2   batting_team      179078 non-null  object
 3   bowling_team      179078 non-null  object
 4   over              179078 non-null  int64 
 5   ball              179078 non-null  int64 
 6   batsman           179078 non-null  object
 7   non_striker       179078 non-null  object
 8   bowler            179078 non-null  object
 9   is_super_over     179078 non-null  int64 
 10  wide_runs         179078 non-null  int64 
 11  bye_runs          179078 non-null  int64 
 12  legbye_runs       179078 non-null  int64 
 13  noball_runs       179078 non-null  int64 
 14  penalty_runs      179078 non-null  int64 
 15  batsman_runs      179078 non-null  int64 
 16  extra_runs        179078 non-null  int

In [4]:
# for ipl_df dataframe
discard_columns = ['umpire1','umpire2','umpire3']

In [5]:
ipl_df = ipl_df.drop(discard_columns, axis=1)

In [6]:
ipl_df.head()

Unnamed: 0,id,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue
0,1,IPL-2017,Hyderabad,05-04-2017,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal"
1,2,IPL-2017,Pune,06-04-2017,Mumbai Indians,Rising Pune Supergiants,Rising Pune Supergiants,field,normal,0,Rising Pune Supergiants,0,7,SPD Smith,Maharashtra Cricket Association Stadium
2,3,IPL-2017,Rajkot,07-04-2017,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium
3,4,IPL-2017,Indore,08-04-2017,Rising Pune Supergiants,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium
4,5,IPL-2017,Bangalore,08-04-2017,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium


Earlier we saw three columns called **umpire1**, **umpire2** and **umpire3**. We do not need them in our analysis, and **umpire3** is almost entirely NaN. So, we have discarded them and our dataset now contains 15 columns.

In [7]:
ipl_df.team1.value_counts()

Mumbai Indians                 101
Kings XI Punjab                 91
Chennai Super Kings             89
Royal Challengers Bangalore     85
Kolkata Knight Riders           83
Delhi Daredevils                72
Rajasthan Royals                67
Sunrisers Hyderabad             63
Deccan Chargers                 43
Pune Warriors                   20
Rising Pune Supergiants         15
Gujarat Lions                   14
Kochi Tuskers Kerala             7
Delhi Capitals                   6
Name: team1, dtype: int64

These are all the teams that have played in the last 12 seasons of the IPL, counted by how often each appears as **team1**. A few of them, like Gujarat Lions and Kochi Tuskers Kerala, played only one or two seasons, and Delhi Capitals is the new name of Delhi Daredevils. That's why their numbers are so low.

In [16]:
ipl_df.result.value_counts()

normal       743
tie            9
no result      4
Name: result, dtype: int64

The result column specifies whether the match ended normally, ended in a tie, or had no result, for example because it was abandoned due to rain or another unavoidable reason.

In [17]:
import jovian

In [18]:
jovian.commit()


## Exploratory Analysis and Visualization

Now that our dataset is good to go, we can analyze it using plots, pie charts and graphs.

Let's first look at a random sample of rows from our cleaned dataset.

In [19]:
ipl_df.sample(10)

Unnamed: 0,id,Season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue
271,272,IPL-2011,Chennai,01-05-2011,Chennai Super Kings,Deccan Chargers,Chennai Super Kings,bat,normal,0,Chennai Super Kings,19,0,JA Morkel,"MA Chidambaram Stadium, Chepauk"
611,612,IPL-2016,Chandigarh,07-05-2016,Kings XI Punjab,Delhi Daredevils,Delhi Daredevils,field,normal,0,Kings XI Punjab,9,0,MP Stoinis,"Punjab Cricket Association IS Bindra Stadium, ..."
365,366,IPL-2012,Chandigarh,13-05-2012,Deccan Chargers,Kings XI Punjab,Deccan Chargers,bat,normal,0,Kings XI Punjab,0,4,DJ Hussey,"Punjab Cricket Association Stadium, Mohali"
159,160,IPL-2009,Durban,14-05-2009,Rajasthan Royals,Mumbai Indians,Rajasthan Royals,bat,normal,0,Rajasthan Royals,2,0,SK Warne,Kingsmead
14,15,IPL-2017,Delhi,15-04-2017,Delhi Daredevils,Kings XI Punjab,Delhi Daredevils,bat,normal,0,Delhi Daredevils,51,0,CJ Anderson,Feroz Shah Kotla
413,414,IPL-2013,Chennai,25-04-2013,Sunrisers Hyderabad,Chennai Super Kings,Sunrisers Hyderabad,bat,normal,0,Chennai Super Kings,0,5,MS Dhoni,"MA Chidambaram Stadium, Chepauk"
299,300,IPL-2011,Dharamsala,21-05-2011,Deccan Chargers,Kings XI Punjab,Kings XI Punjab,field,normal,0,Deccan Chargers,82,0,S Dhawan,Himachal Pradesh Cricket Association Stadium
25,26,IPL-2017,Rajkot,23-04-2017,Kings XI Punjab,Gujarat Lions,Gujarat Lions,field,normal,0,Kings XI Punjab,26,0,HM Amla,Saurashtra Cricket Association Stadium
232,233,IPL-2010,Mumbai,24-04-2010,Deccan Chargers,Royal Challengers Bangalore,Deccan Chargers,bat,normal,0,Royal Challengers Bangalore,0,9,A Kumble,Dr DY Patil Sports Academy
572,573,IPL-2015,Mumbai,19-05-2015,Mumbai Indians,Chennai Super Kings,Mumbai Indians,bat,normal,0,Mumbai Indians,25,0,KA Pollard,Wankhede Stadium


### Number of matches hosted in different cities

In [8]:
city_host = ipl_df.city.value_counts()
city_host

Mumbai            101
Kolkata            77
Delhi              74
Bangalore          66
Hyderabad          64
Chennai            57
Jaipur             47
Chandigarh         46
Pune               38
Durban             15
Bengaluru          14
Visakhapatnam      13
Centurion          12
Ahmedabad          12
Rajkot             10
Mohali             10
Indore              9
Dharamsala          9
Johannesburg        8
Cuttack             7
Ranchi              7
Port Elizabeth      7
Cape Town           7
Abu Dhabi           7
Sharjah             6
Raipur              6
Kochi               5
Kanpur              4
Nagpur              3
Kimberley           3
East London         3
Bloemfontein        2
Name: city, dtype: int64

In [None]:
plt.figure(figsize=(15,6))
plt.xticks(rotation=90)
plt.title('Number of matches hosted')
# Write each city's count just above its bar
for count, i in enumerate(city_host):
    plt.text(count-0.4, i+1, str(i), color='black')
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=city_host.index, y=city_host.values);

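So, we can see in the barplot that Mumbai has hosted the most IPL matches, followed by Kolkata and Delhi. Note, however, that the dataset lists Bangalore (66) and Bengaluru (14) separately; counted as one city they total 80, which would put it second.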
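
The two spellings of the same city can be merged before ranking. The sketch below is one possible way to do it, using only a handful of counts copied from the output above. The mapping `{'Bengaluru': 'Bangalore'}` is my own choice of canonical name, not something the dataset prescribes.

```python
import pandas as pd

# Counts copied from the value_counts output above (top entries plus Bengaluru)
city_host = pd.Series({'Mumbai': 101, 'Kolkata': 77, 'Delhi': 74,
                       'Bangalore': 66, 'Bengaluru': 14})

# Treat the two spellings as one city, then add up their counts
canonical = city_host.rename(index={'Bengaluru': 'Bangalore'})
merged = canonical.groupby(level=0).sum().sort_values(ascending=False)
print(merged)
```

On the full data, the same idea could be applied earlier by replacing the value in the `city` column itself, so that every later count uses the merged name.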

### Maximum Runs Scored By Different Players In An Inning

In [None]:
# Runs scored by each batsman in each match
batting_ings = deliveries_df.groupby(['match_id','batsman'])['batsman_runs'].sum().reset_index(name='Innings Runs')
sorted_batting_ings = batting_ings.sort_values(by='Innings Runs',ascending=False)
top_batsmen_scores = sorted_batting_ings[:10]
# Balls faced per batsman per match; wides are left out because they do not count as balls faced
legal_balls = deliveries_df[deliveries_df['wide_runs']==0]
ball_faced = legal_balls.groupby(['match_id','batsman']).size().reset_index(name='Balls Faced')
batsmen_performance = pd.merge(top_batsmen_scores, ball_faced, how='inner', on=['match_id','batsman'])
batsmen_performance['Strike Rate for Match'] = batsmen_performance['Innings Runs']*100 / batsmen_performance['Balls Faced']
# Attach the innings and team columns from the ball-by-ball data
batsmen_innings = pd.merge(batsmen_performance, deliveries_df, how='inner', on=['match_id','batsman'])
# Keep batsman, runs, balls, strike rate, inning and the two team columns
batsmen_innings_table = batsmen_innings.iloc[:,1:8]
batsmen_innings_table2 = batsmen_innings_table.drop_duplicates()
batsmen_innings_table2

In [None]:
x=batsmen_innings_table2['batsman']
y1=batsmen_innings_table2['Innings Runs']
plt.figure(figsize=(12,6))
plt.scatter(x,y1)
plt.xlabel('Batsmen',size=15)
plt.ylabel('Innings Score',size=15)
plt.title('IPL Best batting performances in an Inning')
plt.xticks(rotation=60)
plt.legend(['Runs']);

We can see that Chris Gayle has scored the most runs (175*) in a single innings, followed by Brendon McCullum and AB de Villiers.

If you are an IPL fan, you will remember the Gayle Storm that hit the Chinnaswamy Stadium on April 23, 2013. Gayle scored an unbeaten 175 against Pune Warriors off just 66 balls, with 17 sixes and 13 fours, at a strike rate of 265.15. Isn't that something only the Universe Boss can do?
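
As a quick check of the figures quoted for that innings, the strike rate is simply runs per 100 balls faced. The helper name `strike_rate` below is my own and is not part of the notebook.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return runs * 100 / balls

# Gayle's innings: 175 runs off 66 balls
gayle_sr = round(strike_rate(175, 66), 2)
print(gayle_sr)  # 265.15

# Runs that came from boundaries: 17 sixes and 13 fours
boundary_runs = 17 * 6 + 13 * 4
print(boundary_runs)  # 154
```

So 154 of the 175 runs came in sixes and fours.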

In the above graph, two players, Chris Gayle and AB de Villiers, appear twice, and both played for Royal Challengers Bangalore. But here's a fun fact: despite having world-class players like Gayle, Kohli and ABD, RCB did not win a single title in the 12 seasons covered by this dataset.

### Maximum Wickets taken by bowlers in last 12 seasons

In [None]:
# Run outs are not credited to the bowler, so leave them out
bowling_wickets = deliveries_df[deliveries_df['dismissal_kind']!='run out']
# count() ignores NaN, so only deliveries on which a wicket fell are counted
bowling_wicket_count = bowling_wickets.groupby('bowler')['dismissal_kind'].count().reset_index(name='Wickets')
bowling_top = bowling_wicket_count.sort_values(by='Wickets',ascending=False)
top_bowlers = bowling_top.loc[:,['bowler','Wickets']][0:10]
top_bowlers

Since run outs are not credited to the bowler, we discard all such dismissals.

Let's plot it.

In [None]:
plt.figure(figsize=(12,6))
plt.scatter(top_bowlers['bowler'],top_bowlers['Wickets'],color='r');
plt.plot(top_bowlers['bowler'],top_bowlers['Wickets'],color='g');
plt.xticks(rotation=60)
plt.xlabel('Top 10 Bowlers')
plt.ylabel('Wickets Taken')
plt.title('Top 10 Bowlers in last 12 seasons');

Lasith Malinga, the Sri Lankan fast bowler, has taken the most wickets (170) in the last 12 seasons of the IPL, followed by Amit Mishra and Harbhajan Singh.

### Matches Played vs Matches Won

In [None]:
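matches_played=pd.concat([ipl_df['team1'],ipl_df['team2']])
matches_played=matches_played.value_counts().reset_index()
matches_played.columns=['Team','Total Matches']
# Look up each team's wins by name; assigning the counts by row position would pair them with the wrong teams
matches_played['wins']=matches_played['Team'].map(ipl_df['winner'].value_counts()).fillna(0).astype(int)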

matches_played.set_index('Team',inplace=True)

In [None]:
matches_played.reset_index()

In [None]:
win_percentage = round(matches_played['wins']/matches_played['Total Matches'],3)*100
win_percentage

We can see the win percentage of each team. Looking closely, Delhi Capitals has the best stats. But the number of matches played by Delhi Capitals is quite low compared with the other teams, because the team previously played under the name Delhi Daredevils and was later renamed Delhi Capitals.
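
A win percentage is only meaningful if each team's wins are divided by that same team's matches. The sketch below, with invented results for three hypothetical teams (`A`, `B`, `C`), shows pandas aligning the two counts by team name rather than by row position. It illustrates the idea and is not the notebook's own code.

```python
import pandas as pd

# Invented results for three hypothetical teams
toy_matches = pd.DataFrame({
    'team1':  ['A', 'A', 'B', 'C'],
    'team2':  ['B', 'C', 'C', 'A'],
    'winner': ['A', 'C', 'B', 'A'],
})

# Matches played counts appearances as either team1 or team2
played = pd.concat([toy_matches['team1'], toy_matches['team2']]).value_counts()
won = toy_matches['winner'].value_counts()

# Division aligns on the team name; a team with no wins gets 0 instead of NaN
win_pct = (won.reindex(played.index, fill_value=0) / played * 100).round(1)
print(win_pct.sort_values(ascending=False))
```

Because both Series are indexed by team name, the order in which `value_counts` happens to sort them does not affect the result.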

We can see the same data in a bar plot more easily, so let's plot it.

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=90)
plt.title('Matches Played vs. Matches Won')
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=win_percentage.index, y=win_percentage.values);

In this graph, Delhi Capitals appears to have the best record. Given the reason above, though, that figure covers very few matches, so among the long-running teams we can say that, going by these numbers, Mumbai Indians have one of the best winning percentages.

### Maximum Runs Scored By A Team

In [None]:
high_scores = deliveries_df.groupby(['match_id', 'inning','batting_team','bowling_team'])['total_runs'].sum().reset_index() 
high_scores = high_scores[high_scores['total_runs']>=200]
high_scores.nlargest(10,'total_runs')

Royal Challengers Bangalore has scored the maximum runs in one innings against Pune Warriors India, followed by Kolkata Knight Riders against Kings XI Punjab and then again Royal Challengers Bangalore against Gujarat Lions.

The match in which RCB scored 263 runs is the same match in which Chris Gayle scored his unbeaten 175.

### Biggest win by runs each season

First of all, the dataset has a column called Season with values such as 'IPL-2017', but we want just the year, 2017. So, we will add a new column called Year that contains the year in which each match was held.

In [None]:
# Take the four year digits after the 'IPL-' prefix and convert them to integers
ipl_df['Year'] = ipl_df['Season'].str[4:8].astype(int)
ipl_df

Now that we have a column called Year, we can find the largest margin of runs by which a team won in each year.

In [None]:
y = 2008
season=[]
win_by_runs_max=[]
while y<2020:
    season.append(y)
    win_by_runs_max.append(ipl_df[ipl_df['Year']==y]['win_by_runs'].max())
    y += 1
win_by_runs_max

We now have a list of the largest winning margins, one per season. We can plot it right away.

In [None]:
plt.plot(season,win_by_runs_max)
plt.scatter(season,win_by_runs_max)
count=0
while count < 12:
    plt.text(season[count]+0.1,win_by_runs_max[count],str(win_by_runs_max[count]),size=14)
    count+=1
plt.xticks(range(2008,2020),fontsize=14)
plt.xlabel('Season')
plt.ylabel('Runs')
plt.title('Biggest win by runs per season',fontsize=14)
plt.show()
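
The per-season maxima collected with the `while` loop above can also be obtained with a single `groupby`. A minimal sketch on invented margins follows; the frame `toy` and its numbers are hypothetical, and the real frame would be `ipl_df` with its `Year` column.

```python
import pandas as pd

# Invented margins for two seasons
toy = pd.DataFrame({
    'Year': [2008, 2008, 2009, 2009],
    'win_by_runs': [140, 33, 0, 92],
})

# Largest winning margin in each season, indexed by year
max_margin = toy.groupby('Year')['win_by_runs'].max()
print(max_margin)
```

The result is a Series indexed by year, so `max_margin.index` and `max_margin.values` could be passed to `plt.plot` in place of the two lists.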

In [None]:
import jovian

In [None]:
jovian.commit()

## Asking and Answering Questions

Now, let us answer some basic questions that one might want to know about the IPL seasons.

### Q. Which team won the maximum number of matches in all seasons?

In [None]:
winner = ipl_df.winner.value_counts()
winner

We can see that **Mumbai Indians** have won the most matches in the last 12 seasons of the IPL, followed by **Chennai Super Kings** and **Kolkata Knight Riders**. So, by total wins, Mumbai Indians has been the most successful team in the IPL.
We can also see that a few teams like Kochi Tuskers Kerala, Delhi Capitals and Pune Warriors have won very few matches. The reason is that they appear in the data for only a few seasons.

Let's plot the above data using a barplot.

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Which team won the maximum number of matches in all seasons?')
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=winner.index, y=winner.values);

### Q. Which player has become the Man of the Match most number of times?

In [None]:
m_o_m = ipl_df.player_of_match.value_counts()
m_o_m

Whoa! The GayleStorm **Chris Gayle** has been Man of the Match the most times, followed by Mr. 360 **AB de Villiers**, our very own Hitman **Rohit Sharma**, Thalaiva **MS Dhoni** and Reverend **David Warner**.

But here's a quick fun fact: _Despite his outstanding performances in the IPL, Chris Gayle went unsold twice in the IPL 2018 auction before being bought by Kings XI Punjab at his base price of ₹2 crore_.

Now, let's again plot a barplot to see the same data more clearly.

In [None]:
m_o_m_10 = m_o_m[:10]
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Top 10 players with most number of Man of the Match title')
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=m_o_m_10.index, y=m_o_m_10.values);

### Q. Does winning the toss increase the chances of winning the match?

In [None]:
win = ipl_df.winner == ipl_df.toss_winner
win_counts = win.value_counts()
win_counts

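Out of 756 matches in the last 12 seasons, the toss-winning team won the match 393 times (about 52%) and did not win 363 times. That second figure includes the 4 matches with no winner recorded, since a missing winner never equals the toss winner. The difference is not that large.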
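
To keep matches without a winner from being counted as losses for the toss winner, the comparison can be restricted to decided matches. The following is a sketch on invented rows, where a missing `winner` stands for a match with no result. The frame `toy` is hypothetical; the column names match the matches table.

```python
import numpy as np
import pandas as pd

# Invented rows; NaN in 'winner' marks a match with no result
toy = pd.DataFrame({
    'toss_winner': ['A', 'B', 'A', 'C', 'B'],
    'winner':      ['A', 'A', np.nan, 'C', 'B'],
})

# Keep only matches that produced a winner
decided = toy.dropna(subset=['winner'])
toss_and_match = decided['winner'] == decided['toss_winner']

# Share of decided matches won by the toss winner, in percent
share = round(toss_and_match.mean() * 100, 2)
print(share)
```

With only 4 such matches out of 756 in the real data, the correction is small, but it keeps the two categories honest.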

Let's see the data in percentage first and then plot a pie chart for it.

In [None]:
round((win_counts/win.count())*100,2)

In [None]:
plt.figure(figsize=(12,6))
plt.title('Does winning the toss increase the chances of winning the match?')
plt.pie(win_counts,labels=win_counts.index,autopct='%1.1f%%', startangle=180);

### Q. Who are the Top 5 scoring batsmen?

In [None]:
# Total runs scored by each batsman across all matches
batsmen = deliveries_df.groupby('batsman')['batsman_runs'].sum().reset_index(name='Runs')
batsmen_sorted=batsmen.sort_values(by='Runs',ascending=False)
top_batsmen=batsmen_sorted[:5]
top_batsmen

In [None]:
plt.figure(figsize=(12,6))
plt.title('Top 5 Batsmen')
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=top_batsmen['batsman'], y=top_batsmen['Runs']);

This shows why Virat Kohli is called the Run Machine. He has scored 5434 runs in total, followed by Suresh Raina with slightly fewer, 5415 runs, and Rohit Sharma with 4914 runs.

### Q. What is the largest margin of runs by which a team won?

We know that a team can win by runs only if it bats first. So, to answer this question, we will first separate the matches that were won by runs, that is, by the team batting first.

In [None]:
batting_first_team = ipl_df[ipl_df.win_by_runs != 0]
batting_first_team.head()

Now that we have a separate dataframe of matches won by the team batting first, we can plot a histogram of the winning margins.

In [None]:
plt.figure(figsize=(12,6))
plt.hist(batting_first_team['win_by_runs'])
plt.xlabel('Runs')
plt.ylabel('Number of matches')
plt.show()

We can see that in around 140 matches, teams have won by a margin of 0-20 runs, whereas only one or two matches were won by a margin of 120-140 runs. One such match is RCB vs PWI, which we have already mentioned.
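
The premise of this section, that a win by runs means the winner batted first, could be cross-checked from the toss columns. The sketch below derives the side that batted first from `toss_winner` and `toss_decision` on invented rows. The helper `batting_first` and the column `bat_first` are my own names, and the logic assumes `toss_decision` only takes the values `'bat'` and `'field'`, as in the rows shown earlier.

```python
import pandas as pd

# Invented rows using the same column names as the matches table
toy = pd.DataFrame({
    'team1': ['A', 'C', 'E'],
    'team2': ['B', 'D', 'F'],
    'toss_winner': ['B', 'C', 'F'],
    'toss_decision': ['field', 'field', 'bat'],
})

def batting_first(row):
    """Return the side that batted first, given who won the toss and what they chose."""
    other = row['team2'] if row['toss_winner'] == row['team1'] else row['team1']
    return row['toss_winner'] if row['toss_decision'] == 'bat' else other

toy['bat_first'] = toy.apply(batting_first, axis=1)
print(toy[['toss_winner', 'toss_decision', 'bat_first']])
```

On the real data, comparing such a column with `winner` for the rows where `win_by_runs` is non-zero would show whether the two always agree.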

### Q. What is the largest margin of wickets by which a team won?

Similarly, to answer this question, we separate out the matches that were won by wickets, that is, by the team that bowled first and then chased.

In [None]:
bowling_first_team = ipl_df[ipl_df.win_by_wickets != 0]
bowling_first_team.head()

In [None]:
plt.figure(figsize=(12,6))
plt.hist(bowling_first_team['win_by_wickets'])
plt.xlabel('Wickets')
plt.ylabel('Number of matches')
plt.show()

We can see that around 85 matches have been won by 6 wickets, about 80 by 7 wickets and about 70 by 5 wickets. We can also see that around 10 matches have been won by 10 wickets, meaning the opening batsmen chased down the target on their own.
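
Because wicket margins are whole numbers from 1 to 10, exact counts per margin can be easier to read than histogram bins. The following is a small sketch on invented margins. The Series `margins` is hypothetical; the real values would come from `bowling_first_team['win_by_wickets']`.

```python
import pandas as pd

# Invented wicket margins for seven matches
margins = pd.Series([6, 7, 6, 10, 5, 6, 7])

# One entry per margin, ordered by number of wickets
counts = margins.value_counts().sort_index()
print(counts)
```

The resulting Series could be passed to a bar plot so that each bar corresponds to exactly one margin.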

In [None]:
import jovian

In [None]:
jovian.commit()

## Inferences and Conclusion

These are a few conclusions that I can draw from the above analysis.

* Several players perform very well in these private tournaments, yet they are sometimes left out of the playing 11 or even remain unsold. The reason can be inconsistent performance, which makes them a burden for their team. Consider Chris Gayle: he performs outstandingly in some matches, while in others he becomes a burden for the team.

* During the analysis, we found which team can be considered the most successful over the last 12 seasons, which batsman is the highest scorer and which bowler has taken the most wickets.

This analysis can be improved further by adding more visualizations, as the dataset is quite interesting and clean in itself.

In [None]:
import jovian

In [None]:
jovian.commit()

## References and Future Work

- Kaggle (https://www.kaggle.com/lazycoder00/ipl-dataset-20082019) 

In [None]:
import jovian

In [None]:
jovian.commit()