## **Exploratory Data Analysis - Sports**


* "Exploratory Data Analysis" was performed on the "Indian Premier League" dataset.
### Problems
* Perform ‘Exploratory Data Analysis’ on dataset ‘Indian Premier League’
* As a sports analyst, find out the most successful teams and players, and the factors contributing to a team's win or loss.
* Suggest teams or players a company should endorse for its products. 


# Author: Muhammet Varlı

In [None]:
from warnings import filterwarnings
filterwarnings('ignore')

In [None]:
# Some Libraries Imported
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# **1. The Story of the Dataset**
### **Information about some variables used in the Data Set.**
#### **1st "matches" Dataset**
* id: The IPL match id.
* season: The IPL season.
* city: The city where the IPL match was held.
* date: The date on which the match was held.
* team1: One of the teams of the IPL match.
* team2: The other team of the IPL match.
* toss_winner: The team that won the toss.
* toss_decision: The decision taken by the team that won the toss to ‘bat’ or ‘field’.
* result: The result (‘normal’, ‘tie’, ‘no result’) of the match.
* dl_applied: Indicates (1 or 0) whether the Duckworth-Lewis rule was applied.
* winner: The winner of the match.
* win_by_runs: The number of runs by which the team batting first won.
* win_by_wickets: The number of wickets by which the team batting second won.
* player_of_match: The outstanding player of the match.
* venue: The venue where the match was hosted.
* umpire1: One of the two on-field umpires who officiate the match.
* umpire2: One of the two on-field umpires who officiate the match.
* umpire3: The off-field umpire who officiates the match.
#### **2nd "deliveries" Dataset**
* match_id: Unique Identifier for a match
* inning: Match innings - 1st innings/2nd innings
* batting_team: Name of the batting team
* bowling_team: Name of the bowling team
* over: Current over
* ball: Current ball of the over
* batsman: Name of the batsman on strike
* non_striker: Name of the batsman at the non-striker's end
* bowler: Name of the bowler
* is_super_over: Is this a super-over (0 or 1)
* wide_runs: Runs given as wide
* bye_runs: Runs given as bye
* legbye_runs: Runs given as leg-bye
* noball_runs: Runs given as no-ball
* penalty_runs: Runs given as penalty
* batsman_runs: Runs scored by the batsman
* extra_runs: Total extra runs (wide, bye, leg-bye, no-ball, penalty)
* total_runs: Total runs from the ball (extra_runs + batsman_runs)
* player_dismissed: Name of the player dismissed (If out)
* dismissal_kind: How the player was dismissed (If out)
* fielder: Fielder involved in the dismissal (If any)

# **2. Reading the Data**

* Read First Dataset

In [None]:
matches = pd.read_csv("../input/tsf-datasets/matches.csv")
matches.head()

In [None]:
matches.info()

* Read Second Dataset

In [None]:
deliveries = pd.read_csv("../input/tsf-datasets/deliveries.csv")
deliveries.head()

In [None]:
deliveries.info()

# **3. Data Cleaning**

In [None]:
matches.isnull().sum()

In [None]:
# !pip install missingno

In [None]:
# Visualize the missing values in the matches dataset
import missingno as msno

msno.matrix(matches);

In [None]:
# Percentage of NaN values in each column
per_Nan = [(c, matches[c].isna().mean()*100) for c in matches]
per_Nan = pd.DataFrame(per_Nan, columns=["column_name", "Percentage"])

In [None]:
per_Nan

* The variable 'umpire3' was dropped because it contains too much missing data.

In [None]:
matches.drop('umpire3',axis = 1, inplace=True)
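
The same rule can be written more generally: drop every column whose share of missing values exceeds a threshold. The sketch below is illustrative. The 50% cutoff (`max_missing_share`) and the toy frame are assumptions. The notebook itself simply drops `umpire3` by name.

```python
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, max_missing_share: float = 0.5) -> pd.DataFrame:
    """Return `frame` without the columns whose share of missing values exceeds the cutoff."""
    missing_share = frame.isna().mean()  # fraction of NaN/None per column
    keep = missing_share[missing_share <= max_missing_share].index
    return frame[keep]


# Toy frame (hypothetical values): 'umpire3' is mostly empty
toy = pd.DataFrame({
    "city": ["Mumbai", "Kolkata", None, "Delhi"],
    "umpire3": [None, None, None, "X"],
})
cleaned = drop_sparse_columns(toy)
print(cleaned.columns.tolist())  # ['city']
```

The cutoff is a judgment call, not a rule taken from the dataset.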

In [None]:
deliveries.isnull().sum()

* **player_dismissed:** Name of the player dismissed (if out)
* **dismissal_kind:** How the player was dismissed (if out)
* **fielder:** Fielder involved in the dismissal (if any)
* It is normal for these columns to contain missing values. They are only filled in on deliveries where a dismissal occurred, and `fielder` is only filled in when a fielder was involved.
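
This claim can be checked directly. If a missing value only means "no dismissal on this delivery", then `player_dismissed` and `dismissal_kind` should be missing on exactly the same rows. Likewise, `fielder` should never be filled in when nobody was dismissed. The sketch below runs these checks on hypothetical toy rows, using only `isna()` and `notna()`.

```python
import pandas as pd

toy = pd.DataFrame({
    "player_dismissed": [None, "A", None, "B"],
    "dismissal_kind":   [None, "caught", None, "bowled"],
    "fielder":          [None, "C", None, None],
})

# Rows where exactly one of the two dismissal columns is missing would signal bad data
mismatch = toy["player_dismissed"].isna() != toy["dismissal_kind"].isna()
print("inconsistent rows:", int(mismatch.sum()))

# A fielder should only appear on deliveries where someone was dismissed
fielder_without_dismissal = toy["fielder"].notna() & toy["player_dismissed"].isna()
print("fielder without dismissal:", int(fielder_without_dismissal.sum()))
```

On the real data, the same two expressions can be applied to `deliveries`.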

In [None]:
matches['team1'].unique()

In [None]:
matches['team2'].unique()

In [None]:
deliveries['batting_team'].unique()

In [None]:
deliveries['bowling_team'].unique()

* 'Rising Pune Supergiant' and 'Rising Pune Supergiants' refer to the same team, so the spelling is unified.

In [None]:
matches.replace('Rising Pune Supergiant','Rising Pune Supergiants', inplace = True)

In [None]:
deliveries.replace('Rising Pune Supergiant','Rising Pune Supergiants', inplace = True)
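
`DataFrame.replace` rewrites the value in every column. That works here, but it could also touch an unrelated column that happens to contain the same string. A narrower variant applies a mapping dictionary to the team columns only. The `TEAM_ALIASES` dict and the toy frame are illustrative assumptions.

```python
import pandas as pd

# Hypothetical alias table: variant spelling -> canonical team name
TEAM_ALIASES = {"Rising Pune Supergiant": "Rising Pune Supergiants"}


def normalize_team_names(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of `frame` with team aliases replaced in the given columns only."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].replace(TEAM_ALIASES)
    return out


toy = pd.DataFrame({
    "team1": ["Rising Pune Supergiant", "Mumbai Indians"],
    "winner": ["Rising Pune Supergiant", None],
})
fixed = normalize_team_names(toy, ["team1", "winner"])
print(fixed["team1"].tolist())
```

For the real data, the column lists would be `['team1', 'team2', 'toss_winner', 'winner']` for `matches` and `['batting_team', 'bowling_team']` for `deliveries`.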

In [None]:
matches.loc[matches['city'].isnull()]

In [None]:
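# The rows listed above are filled with 'Dubai', based on the venue shown in those rows.
# Assigning the result (instead of inplace=True on a column) also works under pandas Copy-on-Write.
matches['city'] = matches['city'].fillna('Dubai')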
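
Hard-coding `'Dubai'` relies on inspecting the rows above. A more general approach infers a missing city from the `venue` column. It learns a venue-to-city mapping from the rows where the city is known. For venues that never have a recorded city, it falls back to a manual dictionary (hypothetical here). The toy data is illustrative.

```python
import pandas as pd


def fill_city_from_venue(frame: pd.DataFrame, manual: dict[str, str]) -> pd.Series:
    """Fill missing 'city' values using the venue of each match."""
    known = frame.dropna(subset=["city"])
    # Most frequent city recorded for each venue
    learned = known.groupby("venue")["city"].agg(lambda s: s.mode().iloc[0])
    # Manual entries override learned ones
    venue_to_city = {**learned.to_dict(), **manual}
    return frame["city"].fillna(frame["venue"].map(venue_to_city))


toy = pd.DataFrame({
    "venue": ["Wankhede Stadium", "Wankhede Stadium", "Dubai International Cricket Stadium"],
    "city": ["Mumbai", None, None],
})
filled = fill_city_from_venue(toy, {"Dubai International Cricket Stadium": "Dubai"})
print(filled.tolist())  # ['Mumbai', 'Mumbai', 'Dubai']
```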

In [None]:
matches['city'].unique()

* 'Bangalore' and 'Bengaluru' refer to the same city, so the spelling is unified.

In [None]:
matches.replace('Bengaluru','Bangalore', inplace = True)

In [None]:
deliveries.replace('Bengaluru','Bangalore', inplace = True)

# **4. Data Visualization**

* **1. Number of matches in each season.**

In [None]:
# Let's see how many matches were played in each season
plt.subplots(figsize=(15,5))
sns.countplot(x = 'season', data = matches, palette = 'inferno')
plt.xlabel('Season',fontsize=15)
plt.ylabel('Number of Matches Played',fontsize=15)
plt.title('Number of Matches in Each Season',fontsize=20)
plt.show()

* The most matches were played in 2013 and the least in 2009.

* **2. Number of matches played by each team.**

In [None]:
num_matches = pd.concat([matches['team1'], matches['team2']])

num_matches = num_matches.value_counts()

plt.figure(figsize=(15,7))
plt.bar(x=num_matches.index, height=num_matches.values,color='yellow')
plt.title('Number of Matches played by each team',fontsize=20)
plt.xlabel('Team',fontsize=15)
plt.ylabel('Number of Matches',fontsize=15)
plt.xticks(rotation=90,fontsize=15)

for i,v in enumerate(num_matches.values):
    plt.text(x=i, y=v+2, s=v)
    
plt.show()    

* Mumbai Indians, Royal Challengers Bangalore and Kolkata Knight Riders played the most matches.

* **3. Total number of wins by each team**

In [None]:
sns.set(style='darkgrid')
fig=plt.gcf()
fig.set_size_inches(18.5,10.5)
# value_counts() returns a Series: team name -> number of wins
# (indexing it directly avoids relying on column names that changed in pandas 2.0)
wins = matches['winner'].value_counts()
plt.xticks(rotation=90,fontsize=15)
plt.yticks(fontsize=16)
plt.bar(wins.index,
        wins.values,
        color=['#15244C','#FFFF48','#292734','#EF2920','#CD202D','#ECC5F2',
               '#294A73','#D4480B','#242307','#FD511F','#158EA6','#E82865',
               '#005DB7','#C23E25','#E82865']
        ,alpha=0.8)
# Write each team's win count inside its bar
for i, v in enumerate(wins.values):
    plt.text(i-0.15, v-4, str(v), size=15, color='black', rotation=90)
plt.title('Total Wins by Each Team',fontsize=20)
plt.xlabel('Teams',fontsize=15)
plt.ylabel('Total Number of Matches Won (2008-2019)',fontsize=14)
plt.show()

* Mumbai Indians, Chennai Super Kings and Kolkata Knight Riders have won the most matches.
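
Raw win counts favour teams that played more matches, since some franchises took part in only a few seasons. Dividing wins by matches played gives a win percentage, which may be a fairer measure of success. The sketch below is minimal. It assumes the `team1`, `team2` and `winner` columns described earlier, and the toy rows are made up.

```python
import pandas as pd


def win_percentage(frame: pd.DataFrame) -> pd.DataFrame:
    """Matches played, matches won and win percentage for every team."""
    played = pd.concat([frame["team1"], frame["team2"]]).value_counts()
    won = frame["winner"].value_counts()  # matches without a winner are ignored
    table = pd.DataFrame({"played": played, "won": won}).fillna(0)
    table["win_pct"] = 100 * table["won"] / table["played"]
    return table.sort_values("win_pct", ascending=False)


toy = pd.DataFrame({
    "team1": ["A", "A", "B"],
    "team2": ["B", "C", "C"],
    "winner": ["A", "A", None],  # None = no result
})
result = win_percentage(toy)
print(result)
```

Applied to the `finals` frame built in section 6, the same function would give each team's conversion rate in finals.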

* **4. Number of matches won by each team in every season.**

In [None]:
winner_by_season = matches.groupby('season')['winner'].value_counts()

In [None]:
groups = winner_by_season.groupby('season')
fig = plt.figure()
count = 1

for year, group in groups:
    ax = fig.add_subplot(4,3,count)
    ax.set_title(year)
    # group[year] drops the season level, leaving the wins indexed by team
    ax = group[year].plot.bar(figsize = (15,20), width = 0.8,color='yellow')
    count += 1

    plt.xlabel('')
    plt.yticks([])
    plt.ylabel('No. of matches won')

    # Write the number of wins on each bar
    for i in ax.patches:
        ax.text(i.get_x()+0.2, i.get_height()-1.5, s=int(i.get_height()), color="black", fontweight='bold')
plt.tight_layout()
plt.show()

* **5. Champion in each season**

In [None]:
# Assumes the last match of each season (in file order) is the final
season_winner = matches.drop_duplicates('season', keep='last')
season_winner = season_winner[['season', 'winner']]
season_winner.sort_values('season',inplace=True)
season_winner.reset_index(inplace=True, drop=True)
season_winner

In [None]:
plt.subplots(figsize=(13,8))
sns.countplot(x='winner', data = season_winner, palette = 'inferno')
plt.xlabel('Teams',fontsize=15)
plt.xticks(rotation=90,fontsize=15)
plt.ylabel('Number of seasons won by any team',fontsize=15)
plt.title('Total Championship Numbers',fontsize=20)
plt.show()

* Mumbai Indians won the most championships in the 2008-2019 seasons.

* **6. Number of finals each team played and how many of them it won.**

In [None]:
finals = matches.drop_duplicates('season', keep='last')
finals = finals[['season', 'team1', 'team2', 'winner']]

# Number of finals each team reached
finals_played = pd.concat([finals['team1'], finals['team2']]).value_counts()
# Number of finals each team won
finals_won = finals['winner'].value_counts()

# Align both counts on team name; teams that never won a final get 0
# (this avoids the 'index' column, which value_counts().reset_index() no longer creates in pandas 2.0)
most_finals = pd.DataFrame({'Number of Times Finals played': finals_played,
                            'Number of Times Finals won': finals_won}).fillna(0)
most_finals.index.name = 'Team'
most_finals.sort_values('Number of Times Finals played', ascending=False, inplace=True)
most_finals.plot(kind='bar', figsize=(13,7),fontsize=15, title='How many Finals Teams Played and How Many Finals Won')
plt.show()

* Chennai Super Kings reached many finals but converted a comparatively small share of them into titles.
* Mumbai Indians won the most finals.

* **7. Visualizing the number of matches held in each city.**

In [None]:
# Visualizing how many matches were played in which city.
plt.subplots(figsize=(18.5,10.5))
sns.countplot(x = 'city', data = matches, palette = 'tab20', order=matches['city'].value_counts().index)
plt.ylabel('Number of Matches Held',fontsize=15)
plt.title('Number of matches held in each city',fontsize=20)
plt.xlabel('Cities',fontsize=15)
plt.xticks(rotation=90,fontsize=15)
plt.show()

* Mumbai hosted the most matches.
* Other frequent host cities are Bangalore, Kolkata and Delhi.

In [None]:
# Top 10 players by number of Man of the Match (MOM) awards won
mom=matches['player_of_match'].value_counts()[:10]
mom

* **8. Plot to visualise the top 10 players based on the number of MOM awards won**

In [None]:
# Plot to visualise the top 10 players based on the number of MOM awards won
plt.subplots(figsize=(18.5,10.5))
ax = sns.barplot(x = mom.index, y = mom,orient='v', palette = 'tab10')
plt.ylabel('Number of awards won',fontsize=15)
plt.title('Top 10 players based on number of awards won',fontsize=20)
plt.xlabel('Players',fontsize=15)
plt.xticks(rotation=90,fontsize=15)
# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.2, p.get_height()+0.2))       
plt.show()

* CH Gayle has won the most Man of the Match awards.
* He is followed by AB de Villiers.

* **9. Number of matches held at each venue.**

In [None]:
# Visualization of how many matches were played in which venue.
plt.subplots(figsize=(18.5,10.5))
sns.countplot(x = 'venue', data = matches, palette = 'tab20', order=matches['venue'].value_counts().index)
plt.ylabel('Number of matches played',fontsize=15)
plt.title('Number of Matches in each Venue',fontsize=20)
plt.xlabel('Stadium',fontsize=15)
plt.xticks(rotation=90,fontsize=15)
plt.show()

* Eden Gardens hosted the most matches.
* Other frequently used stadiums are Wankhede Stadium and M. Chinnaswamy Stadium.

* **10. Effect of the venue on winning, for the 10 venues that hosted the most matches**

In [None]:
plt.figure(figsize = (20,15))
ax = sns.countplot(x='venue', data = matches, hue = 'winner',order=matches['venue'].value_counts().iloc[:10].index,palette='tab10')
plt.xticks(rotation=30, ha = 'right',fontsize=15)
plt.ylabel('Number of Matches',fontsize=15)
plt.xlabel('Venues',fontsize=15)
plt.title('Stadium Effect On Win',fontsize=20)
plt.legend(loc='upper right')
plt.show()

* Particularly striking is that Mumbai Indians, the team with the most wins overall, won only 10 matches at Eden Gardens, the venue that hosted the most matches.
* This suggests that Mumbai Indians do not perform as well at Eden Gardens, possibly because of the disadvantage of playing away from home.
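
The away-disadvantage hypothesis can be tested more directly. For each team, compare its win rate in its home city with its win rate elsewhere. The sketch below assumes a hand-made `HOME_CITY` mapping that covers only three teams, for illustration, and uses toy rows. A real analysis would need the full mapping for every team.

```python
import pandas as pd

# Hypothetical, partial mapping of teams to home cities
HOME_CITY = {
    "Mumbai Indians": "Mumbai",
    "Kolkata Knight Riders": "Kolkata",
    "Chennai Super Kings": "Chennai",
}


def home_away_win_rate(frame: pd.DataFrame, team: str) -> pd.Series:
    """Win rate of `team` in its home city versus all other cities."""
    games = frame[(frame["team1"] == team) | (frame["team2"] == team)]
    at_home = games["city"] == HOME_CITY[team]
    won = games["winner"] == team
    return won.groupby(at_home.map({True: "home", False: "away"})).mean()


toy = pd.DataFrame({
    "team1":  ["Mumbai Indians", "Kolkata Knight Riders", "Mumbai Indians", "Chennai Super Kings"],
    "team2":  ["Chennai Super Kings", "Mumbai Indians", "Kolkata Knight Riders", "Mumbai Indians"],
    "city":   ["Mumbai", "Kolkata", "Mumbai", "Chennai"],
    "winner": ["Mumbai Indians", "Kolkata Knight Riders", "Mumbai Indians", "Mumbai Indians"],
})
rates = home_away_win_rate(toy, "Mumbai Indians")
print(rates)
```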

* **11. Effect of the city on winning, for the 10 cities that hosted the most matches**

In [None]:
plt.figure(figsize = (20,15))
ax = sns.countplot(x='city', data = matches, hue = 'winner',order=matches['city'].value_counts().iloc[:10].index,palette='tab10')
plt.xticks(rotation=30, ha = 'right',fontsize=15)
plt.ylabel('Number of Matches',fontsize=15)
plt.xlabel('Cities',fontsize=15)
plt.title('City Effect On Win',fontsize=20)
plt.legend(loc='upper right')
plt.show()

* Mumbai, the city of the Mumbai Indians team, stands out as the city with the most wins.

* **12. Number of wins according to Toss Decision**

In [None]:
plt.subplots(figsize=(18.5,10.5))
# Keep only the matches won by the team that also won the toss
toss=matches[matches['toss_winner']==matches['winner']]
ax = sns.countplot(x="winner", data = toss, hue = 'toss_decision',order = toss['toss_winner'].value_counts().index,palette='Set1')
plt.title("Number of winning teams according to the toss decision",fontsize=20)
plt.xticks(rotation=90, ha = 'right',fontsize=15)
plt.ylabel('Number of Matches',fontsize=15)
plt.xlabel('Winner',fontsize=15)

# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.05, p.get_height()+0.5))        
        
    
plt.show()

* Among matches won by the toss winner, most teams won more often after choosing to field.
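
The chart counts wins only among matches the toss winner went on to win, so it partly reflects how often each decision was chosen. A rate is more informative: for each `toss_decision`, the share of matches that the toss winner won. The sketch below is minimal and uses toy rows. It assumes that matches without a winner should be excluded.

```python
import pandas as pd


def toss_win_rate_by_decision(frame: pd.DataFrame) -> pd.Series:
    """Share of decided matches won by the toss winner, per toss decision."""
    decided = frame.dropna(subset=["winner"])
    toss_winner_won = decided["toss_winner"] == decided["winner"]
    return toss_winner_won.groupby(decided["toss_decision"]).mean()


toy = pd.DataFrame({
    "toss_winner":   ["A", "B", "A", "C", "B"],
    "toss_decision": ["field", "field", "bat", "bat", "field"],
    "winner":        ["A", "C", "A", "C", None],
})
rates = toss_win_rate_by_decision(toy)
print(rates)
```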

* **13. Toss Winner and Match Winner**

In [None]:
team = matches['team1'].unique()

toss_match_winner = []
for var in team:
    count = matches[(matches['toss_winner'] == var) & (matches['winner'] == var)]['id'].count()
    toss_match_winner.append(count)

    
    
plt.figure(figsize=(18.5,10.5))
plt.bar(x=team, height=toss_match_winner,color='yellow')
plt.xticks(rotation=90,fontsize=15)
plt.title('Toss Winner and Match Winner',fontsize=20)
plt.xlabel('Teams',fontsize=15)
plt.ylabel('Number of times match win with toss win',fontsize=15)

for i,v in enumerate(toss_match_winner):   
    plt.text(x=i, y=v+1, s=v)
plt.show()

* Chennai Super Kings won the most matches in which they also won the toss.
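
Raw counts again favour teams that won more tosses. A per-team conversion rate divides "won toss and match" by "won toss". The sketch below uses toy data and assumes the `toss_winner` and `winner` columns described earlier.

```python
import pandas as pd


def toss_conversion(frame: pd.DataFrame) -> pd.Series:
    """Share of each team's toss wins that also ended in a match win."""
    toss_wins = frame["toss_winner"].value_counts()
    both = frame.loc[frame["toss_winner"] == frame["winner"], "toss_winner"].value_counts()
    # Teams that never converted a toss win are missing from `both`, hence fillna(0)
    return (both / toss_wins).fillna(0).sort_values(ascending=False)


toy = pd.DataFrame({
    "toss_winner": ["A", "A", "B", "B", "B"],
    "winner":      ["A", "B", "B", "A", "A"],
})
conv = toss_conversion(toy)
print(conv)
```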

* Merge the two datasets

In [None]:
df = pd.merge(matches, deliveries, left_on='id', right_on='match_id')

In [None]:
df.head()
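
Each match has many deliveries, so this is a one-to-many join. `pd.merge` can check that relationship with `validate`. With `indicator=True`, it also flags matches that have no deliveries. The default inner join used above silently drops such matches. A sketch on toy frames (the values are made up):

```python
import pandas as pd

matches_toy = pd.DataFrame({"id": [1, 2, 3], "season": [2017, 2017, 2018]})
deliveries_toy = pd.DataFrame({"match_id": [1, 1, 2], "total_runs": [4, 1, 6]})

merged = pd.merge(
    matches_toy, deliveries_toy,
    left_on="id", right_on="match_id",
    how="left",
    validate="one_to_many",  # raises MergeError if 'id' is not unique in the left frame
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)
missing = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print("matches without deliveries:", missing)  # [3]
```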

* **14. Total runs across each season**

In [None]:
# Getting total runs from each season
total_run = df.groupby(['season','match_id'])['total_runs'].sum().reset_index()
total_run = total_run.groupby(['season'])['total_runs'].sum().reset_index()
total_run

In [None]:
plt.subplots(figsize=(15,5))
ax = sns.barplot(x ='season', y='total_runs',data = total_run, palette = 'tab10')
plt.xlabel('Seasons',fontsize=15)
plt.ylabel('Runs',fontsize=15)
plt.title('Runs in Each Season',fontsize=20)

# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+0.5))   


plt.show()



* In 2011, 2012 and 2013, more than 20,000 runs were scored.
* Every season had more than 15,000 runs.
* The consistently high run totals show that IPL matches are high-scoring.
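
A season's total also grows with the number of matches played in it. Dividing by the match count separates scoring from season length. The sketch below is minimal. It assumes the merged `df` columns `season`, `match_id` and `total_runs`, and the toy rows are made up.

```python
import pandas as pd


def runs_per_match(frame: pd.DataFrame) -> pd.DataFrame:
    """Total runs, number of matches and average runs per match for each season."""
    per_match = frame.groupby(["season", "match_id"])["total_runs"].sum()
    summary = per_match.groupby(level="season").agg(total_runs="sum", matches="count")
    summary["runs_per_match"] = summary["total_runs"] / summary["matches"]
    return summary


toy = pd.DataFrame({
    "season":     [2012, 2012, 2012, 2013],
    "match_id":   [1, 1, 2, 3],
    "total_runs": [150, 160, 300, 320],
})
summary = runs_per_match(toy)
print(summary)
```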

* **15. Total runs scored by each team**

In [None]:
# Getting total runs from each teams
# Total runs scored by each batting team
team_runs = df.groupby('batting_team')['total_runs'].sum().reset_index()

In [None]:
team_runs

In [None]:
plt.subplots(figsize=(15,5))
ax = sns.barplot(x ='batting_team', y='total_runs',data = team_runs, palette = 'tab10')
plt.xlabel('Teams',fontsize=15)
plt.ylabel('Runs',fontsize=15)
plt.title('Runs in Each Teams',fontsize=20)
plt.xticks(rotation=90,fontsize=15)

# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.03, p.get_height()+0.5))   

plt.show()



* Mumbai Indians is the team with the most total runs.

* **16. Team runs in the 1st and 2nd innings**

In [None]:
# Runs Distribution By each Team in 1st Inning and 2nd Inning
inning_run = df.groupby(['batting_team','match_id','inning', 'over'])['total_runs'].sum().reset_index()
inning_run = inning_run.groupby(['batting_team','inning'])['total_runs'].sum().reset_index()
# Keep only the regular innings; innings 3 and above are super overs
inning_run = inning_run[inning_run['inning'] <= 2]
inning_run[0:5]

In [None]:
plt.subplots(figsize=(15,5))
ax = sns.barplot(x ='batting_team', y='total_runs',data = inning_run,hue='inning')
plt.xlabel('Teams',fontsize=15)
plt.ylabel('Runs',fontsize=15)
plt.title('Runs Distribution By each Team in Innings',fontsize=20)
plt.xticks(rotation=90,fontsize=15)
plt.show()


* Among the teams with many wins, all except Kolkata Knight Riders scored more runs in the 1st innings than in the 2nd.
* For example, Mumbai Indians, Chennai Super Kings and Royal Challengers Bangalore all scored more runs in the 1st innings.
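
These bars are totals, so a team that batted first more often will naturally show more 1st-innings runs. Averaging per innings removes that effect. The sketch below assumes the merged `df` columns `batting_team`, `match_id`, `inning` and `total_runs`, and uses toy rows. As in the cell above, innings 3 and above are treated as super overs and excluded.

```python
import pandas as pd


def average_runs_per_innings(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean runs per innings for each team, split into 1st and 2nd innings."""
    regular = frame[frame["inning"] <= 2]  # drop super overs
    per_innings = (regular.groupby(["batting_team", "match_id", "inning"])["total_runs"]
                   .sum()
                   .reset_index())
    return per_innings.pivot_table(index="batting_team", columns="inning",
                                   values="total_runs", aggfunc="mean")


toy = pd.DataFrame({
    "batting_team": ["A", "A", "A", "B", "A"],
    "match_id":     [1, 1, 2, 1, 3],
    "inning":       [1, 1, 2, 2, 1],
    "total_runs":   [100, 60, 150, 140, 180],
})
avg = average_runs_per_innings(toy)
print(avg)
```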

* **17. Top 10 batsmen with the most fours**

In [None]:
batsman = df['batsman'].unique()
batsman[:5]

In [None]:
def check_fours(x): # Counting number of fours
    global count
    if x==4:
        count+=1

In [None]:
count=0
batsman_fours = []       # Number of fours hit by each batsman
for i in batsman:
    temp_df = df[df['batsman']==i]
    temp_df['batsman_runs'].apply(check_fours)
    batsman_fours.append(count)
    count=0

In [None]:
df_fours = pd.DataFrame(data={'Batsman':batsman, 'Fours':batsman_fours})
df_fours.sort_values('Fours', inplace=True,ascending=False,)
df_fours.reset_index(drop=True, inplace=True)
df_fours = df_fours[:10]

df_fours

In [None]:
# Ranking of players by number of fours
df_fours.T

In [None]:
plt.subplots(figsize=(15,5))
ax = sns.barplot(x ='Batsman', y='Fours',data = df_fours, palette = 'tab10')
plt.xlabel('BATSMAN',fontsize=15)
plt.ylabel('FOURS',fontsize=15)
plt.title('Top 10 Batsman with most number of FOURS',fontsize=20)
plt.xticks(rotation=90,fontsize=15)

# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.2, p.get_height()+0.5))   

plt.show()

* G Gambhir is first with 484 fours, followed by SK Raina.
* V Kohli is in 4th place with 384 fours.
* CH Gayle, who has the most Man of the Match awards, is in 9th place among the top 10 players for fours.
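
The counting cells in this section filter the whole frame once per batsman and update a global counter. That approach is slow and easy to get wrong. The same counts of fours, sixes and dot balls can be computed in one pass with a boolean-sum `groupby`. The sketch below uses toy deliveries. It keeps the notebook's definitions, so a dot ball is any delivery with `batsman_runs == 0`.

```python
import pandas as pd


def batting_counts(frame: pd.DataFrame) -> pd.DataFrame:
    """Fours, sixes and dot balls per batsman, computed in a single groupby."""
    runs = frame["batsman_runs"]
    flags = pd.DataFrame({
        "batsman": frame["batsman"],
        "Fours": runs.eq(4),
        "Sixes": runs.eq(6),
        "Dots": runs.eq(0),
    })
    # Summing boolean columns counts the True values
    return flags.groupby("batsman").sum()


toy = pd.DataFrame({
    "batsman": ["X", "X", "Y", "Y", "Y"],
    "batsman_runs": [4, 0, 6, 6, 4],
})
counts = batting_counts(toy)
print(counts.nlargest(10, "Fours"))
```

On the merged data, `batting_counts(df).nlargest(10, 'Sixes')` would give the ranking used in the next chart.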

* **18. Top 10 batsmen with the most sixes**

In [None]:
def check_sixes(x): # Counting number of sixes
    global count
    if x==6:
        count+=1

In [None]:
count=0
batsman_sixes = []       # Number of sixes hit by each batsman
for i in batsman:
    temp_df = df[df['batsman']==i]
    temp_df['batsman_runs'].apply(check_sixes)
    batsman_sixes.append(count)
    count=0

In [None]:
df_sixes = pd.DataFrame(data={'Batsman':batsman, 'Sixes':batsman_sixes})
df_sixes.sort_values('Sixes', inplace=True,ascending=False,)
df_sixes.reset_index(drop=True, inplace=True)
df_sixes = df_sixes[:10]

df_sixes

In [None]:
# Ranking of players by number of sixes
df_sixes.T

In [None]:
plt.subplots(figsize=(15,5))
ax = sns.barplot(x ='Batsman', y='Sixes',data = df_sixes, palette = 'tab10')
plt.xlabel('BATSMAN',fontsize=15)
plt.ylabel('SIXES',fontsize=15)
plt.title('Top 10 Batsman with most number of SIXES',fontsize=20)
plt.xticks(rotation=90,fontsize=15)

# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.2, p.get_height()+0.5))   

plt.show()

* CH Gayle tops the list, as expected.
* AB de Villiers is not in the top 10 for fours, but he is 2nd for sixes.
* MS Dhoni is not in the top 10 for fours, but he is 3rd for sixes.

* **19. Top 10 batsmen with the most dot balls**

In [None]:
def check_dots(x): # Counting number of dot balls (no runs off the bat)
    global count
    if x==0:
        count+=1

In [None]:
count=0
batsman_dots = []       # Number of dot balls played by each batsman
for i in batsman:
    temp_df = df[df['batsman']==i]
    temp_df['batsman_runs'].apply(check_dots)
    batsman_dots.append(count)
    count=0

In [None]:
df_dots = pd.DataFrame(data={'Batsman':batsman, 'Dots':batsman_dots})
df_dots.sort_values('Dots', inplace=True,ascending=False,)
df_dots.reset_index(drop=True, inplace=True)
df_dots = df_dots[:10]

df_dots

In [None]:
# Ranking of players by number of dot balls
df_dots.T

In [None]:
plt.subplots(figsize=(15,5))
ax = sns.barplot(x ='Batsman', y='Dots',data = df_dots, palette = 'tab10')
plt.xlabel('BATSMAN',fontsize=15)
plt.ylabel('DOTS',fontsize=15)
plt.title('Top 10 Batsman with most number of DOTS',fontsize=20)
plt.xticks(rotation=90,fontsize=15)

# Write the value count
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
for p in ax.patches:
    ax.annotate('{0:.1f}'.format(p.get_height()), (p.get_x()+0.2, p.get_height()+0.5))   

plt.show()

* V Kohli tops the list, followed by S Dhawan and CH Gayle.
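
One caveat: `batsman_runs == 0` is also true on wides, which the batsman does not face as a legal ball, so the dot-ball counts above may be slightly inflated. The sketch below applies a stricter definition. It assumes the `wide_runs` column described earlier and uses toy rows.

```python
import pandas as pd


def dot_balls_faced(frame: pd.DataFrame) -> pd.Series:
    """Dot balls per batsman, ignoring wides."""
    is_dot = frame["batsman_runs"].eq(0) & frame["wide_runs"].eq(0)
    return is_dot.groupby(frame["batsman"]).sum().sort_values(ascending=False)


toy = pd.DataFrame({
    "batsman":      ["X", "X", "X", "Y"],
    "batsman_runs": [0, 0, 1, 0],
    "wide_runs":    [0, 1, 0, 0],
})
dots = dot_balls_faced(toy)
print(dots)
```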

* **20. Top 10 individual scores**

In [None]:
individual  = df.groupby(['batsman','match_id'])['batsman_runs'].sum().reset_index()
individual.sort_values('batsman_runs',axis=0, inplace=True,ascending=False)
individual.drop('match_id',inplace=True,axis=1)
individual = individual[:10]

In [None]:
individual

In [None]:
individual.plot(x='batsman', kind='bar', figsize=(12,6),color='green')
plt.xlabel('Batsman',fontsize=15)
plt.ylabel('Runs',fontsize=15)
plt.title('Top 10 Individual Scores',fontsize=20)
plt.xticks(rotation=90,fontsize=15)
plt.show()

* The highest individual score is CH Gayle's 175.

* **21. Top 10 bowlers who bowled the most balls in the IPL**

In [None]:
bowler = df['bowler'].value_counts()[:10]

plt.figure(figsize=(15,7))
plt.bar(x=bowler.index, height=bowler.values,color='orange')

plt.title('Bowlers who bowled maximum balls', fontsize=20)
plt.xlabel('BOWLER',fontsize=15)
plt.ylabel('BALLS',fontsize=15)

for i,v in enumerate(bowler.values):
    plt.text(x=i, y=v+1, s=v)
    
plt.show() 

* Harbhajan Singh bowled the most balls.
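
`value_counts()` on `bowler` counts every delivery, including wides and no-balls, which do not count towards the over. The sketch below counts only legal deliveries. It assumes the `wide_runs` and `noball_runs` columns described earlier and uses toy rows.

```python
import pandas as pd


def legal_balls_and_overs(frame: pd.DataFrame) -> pd.DataFrame:
    """Legal deliveries and completed six-ball overs per bowler."""
    legal = frame[(frame["wide_runs"] == 0) & (frame["noball_runs"] == 0)]
    balls = legal["bowler"].value_counts()
    return pd.DataFrame({"legal_balls": balls, "overs": balls // 6})


toy = pd.DataFrame({
    "bowler":      ["P"] * 7 + ["Q"] * 2,
    "wide_runs":   [0, 0, 0, 1, 0, 0, 0, 0, 0],
    "noball_runs": [0, 0, 0, 0, 0, 0, 0, 0, 1],
})
stats = legal_balls_and_overs(toy)
print(stats)
```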

* **22. Top 10 Bowlers with maximum number of Dot Balls**

In [None]:
dot_ball = df[df['total_runs']==0]
dot_ball = dot_ball['bowler'].value_counts()[:10]

plt.figure(figsize=(15,7))
plt.bar(x=dot_ball.index, height=dot_ball.values,color='lightgreen')

plt.title('Bowlers who have maximum number of Dot balls', fontsize=20)
plt.xlabel('BOWLER',fontsize=15)
plt.ylabel('BALLS',fontsize=15)

for i,v in enumerate(dot_ball.values):
    plt.text(x=i, y=v+1, s=v)
    
plt.show() 


* Harbhajan Singh also bowled the most dot balls.

* **23. Top 10 bowlers with the most extras**

In [None]:
extra_runs = df[df['extra_runs']!=0]
extra_runs = extra_runs['bowler'].value_counts()[:10]

plt.figure(figsize=(15,7))
plt.bar(x=extra_runs.index, height=extra_runs.values,color='lightblue')

plt.title('Bowlers who have bowled maximum number of Extra balls', fontsize=20)
plt.xlabel('BOWLER',fontsize=15)
plt.ylabel('BALLS',fontsize=15)

for i,v in enumerate(extra_runs.values):
    plt.text(x=i, y=v+1, s=v)
    
plt.show() 


* SL Malinga bowled the most deliveries that conceded extras.
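
The `extra_runs != 0` filter also picks up byes and leg-byes, which are not charged to the bowler. Restricting the count to wides and no-balls gives a measure closer to the bowler's own discipline. The sketch below is minimal. It assumes the `wide_runs` and `noball_runs` columns and uses toy rows.

```python
import pandas as pd


def bowler_extras(frame: pd.DataFrame) -> pd.DataFrame:
    """Wides and no-balls bowled by each bowler (byes and leg-byes excluded)."""
    flags = pd.DataFrame({
        "bowler": frame["bowler"],
        "wides": frame["wide_runs"].gt(0),
        "no_balls": frame["noball_runs"].gt(0),
    })
    table = flags.groupby("bowler").sum()
    table["total"] = table["wides"] + table["no_balls"]
    return table.sort_values("total", ascending=False)


toy = pd.DataFrame({
    "bowler":      ["M", "M", "M", "N"],
    "wide_runs":   [1, 0, 0, 0],
    "noball_runs": [0, 1, 0, 0],
    "bye_runs":    [0, 0, 4, 1],  # byes are ignored by this measure
})
table = bowler_extras(toy)
print(table)
```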

**Results**
* Some teams have a low win rate in finals, so reaching the final is not enough: to win the championship, a team also has to win the final match itself.
* In general, the number of wins is higher when the toss winner chooses to field.
* The teams that most often won both the toss and the match are also among the teams with the most wins.
* The venue appears to affect the result; some teams may perform worse away from home.
* Having effective players, that is, players with strong statistics, is very important.
* Based on their number of wins and championships in recent years, the teams that can be recommended to companies are Mumbai Indians, Chennai Super Kings and Kolkata Knight Riders.
* The bowlers that could be recommended to companies are SL Malinga, Harbhajan Singh, B Kumar and A Mishra.
* The batsmen that could be recommended to companies are CH Gayle, BB McCullum, AB de Villiers, V Kohli, S Dhawan and SK Raina.