# **Exploratory Data Analysis on ICC Cricket World Cup 2023**

Exploratory Data Analysis (EDA) on the ICC Cricket World Cup 2023 explores batting dynamics through a detailed ball-by-ball dataset. This analysis covers performance metrics, scoring patterns, and team batting strengths across matches and venues, offering insights into the key players and moments that defined the tournament.

# Main Task


This batting analysis of the ICC Cricket World Cup 2023 ball-by-ball dataset looks at player performance, team strategies, and tournament-long trends. It uncovers the top run-scorers, the players who hit the most boundaries, and the standout performers in individual matches. We'll also explore team batting totals and scoring patterns, painting a comprehensive picture of batting prowess throughout the tournament. It’s a data-driven journey through every boundary, milestone, and match-defining moment!

In [None]:
# For data manipulation
import numpy as np 
import pandas as pd 

# For data visualization
import matplotlib.pyplot as plt 
import seaborn as sns

# Ignore the warnings
import warnings
warnings.filterwarnings('ignore')

# 2. Loading the Dataset

In [None]:
# Loading the dataset
df = pd.read_csv('/kaggle/input/icc-mens-world-cup-2023/deliveries.csv')

# 3. Basic EDA

In [None]:
# Printing the first 5 rows of the dataset
df.head()

In [None]:
# Take a look at the structure of the data (column dtypes and non-null counts)
df.info()

In [None]:
# Let's see the shape of the data
df.shape

In [None]:
# Let's see which columns are there in the data
df.columns
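
The analysis that follows reads a fixed set of columns from `df`. A quick sanity check can confirm they are all present before going further. The sketch below is a minimal, hedged example: `REQUIRED_COLUMNS`, `missing_columns` and `toy_df` are hypothetical names, and the toy frame stands in for the real file so the snippet runs on its own.

```python
import pandas as pd

# Columns that the cells in this notebook read from the deliveries data.
REQUIRED_COLUMNS = [
    'match_id', 'innings', 'venue', 'batting_team', 'bowling_team',
    'striker', 'runs_off_bat', 'extras', 'player_dismissed',
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Hypothetical stand-in for df; it deliberately lacks the 'extras' column.
toy_df = pd.DataFrame(columns=[c for c in REQUIRED_COLUMNS if c != 'extras'])
print(missing_columns(toy_df))
```

On the real data, `missing_columns(df)` returning an empty list would mean every column used below is available.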

# 4. Let's start Analysing the data

**4.1.1. Total Number of Sixes hit in ICC CWC 2023**

In [None]:
tournament_6 = df[df['runs_off_bat'] == 6]['runs_off_bat'].count()
tournament_6

**4.1.2. Total Number of Fours hit in ICC CWC 2023**

In [None]:
tournament_4 = df[df['runs_off_bat'] == 4]['runs_off_bat'].count()
tournament_4

**4.1.3. Sixes VS Fours Distribution**

In [None]:
# Let's graph a pie chart to visualize the number of sixes and the number of fours hit in ICC CWC 2023
plt.pie([tournament_6, tournament_4], labels = ['Sixes', 'Fours'], autopct = '%1.1f%%')
plt.title('Number of Sixes and Fours hit in ICC CWC 2023')
plt.show()
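
The two counts can also be obtained in a single pass with `value_counts`, which makes it easy to derive the share of bat runs that came from fours and sixes. This is a minimal sketch on hypothetical toy data (`toy_df` is not the real dataset), and like the cells above it treats every 4 and 6 off the bat as a boundary.

```python
import pandas as pd

# Hypothetical toy deliveries; only the column used above is included.
toy_df = pd.DataFrame({'runs_off_bat': [0, 1, 4, 6, 4, 2, 6, 4, 0, 1]})

# Count how often each run value occurs, then pick out the 4s and 6s.
run_counts = toy_df['runs_off_bat'].value_counts()
fours = int(run_counts.get(4, 0))
sixes = int(run_counts.get(6, 0))

# Share of all runs off the bat that were scored in fours and sixes.
boundary_runs = 4 * fours + 6 * sixes
total_runs = int(toy_df['runs_off_bat'].sum())
boundary_share = boundary_runs / total_runs

print(fours, sixes, round(boundary_share, 3))
```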

**4.1.4. Teams that played in the tournament**

In [None]:
# Combine batting and bowling teams so that every side is covered
teams = sorted(set(df['batting_team']).union(df['bowling_team']))

print("Teams that played in the tournament")
for i, team in enumerate(teams, start=1):
    print(f"{i} : {team}")

**4.2. Overall Batting Analysis**

**4.2.1. Top 10 highest run scorers of the season**

In [None]:
df.groupby('striker')['runs_off_bat'].sum().sort_values(ascending=False).head(10)

In [None]:
# Let's plot a bar chart of the top 10 run scorers of ICC CWC 2023 with vertical names
top_scorers = df.groupby('striker')['runs_off_bat'].sum().sort_values(ascending=False).head(10)

plt.bar(top_scorers.index, top_scorers.values)
plt.title('Top 10 highest run scorers of ICC CWC 2023')
plt.xticks(rotation='vertical')
plt.show()

**4.2.2. Top 10 players who hit the most sixes in the season**

In [None]:
sixes_df = df[df['runs_off_bat'] == 6]

sixes_df.groupby('striker')['runs_off_bat'].count().sort_values(ascending=False).head(10)

In [None]:
# Let's plot a bar chart of the top 10 players who hit the most sixes in ICC CWC 2023
top_six_hitters = sixes_df.groupby('striker')['runs_off_bat'].count().sort_values(ascending=False).head(10)

plt.bar(top_six_hitters.index, top_six_hitters.values)
plt.title('Top 10 players who hit the most sixes in ICC CWC 2023')
plt.xticks(rotation='vertical')
plt.show()

**4.2.3. Top 10 batsmen who hit the most sixes in a single match**

In [None]:
result1 = sixes_df.groupby(['match_id','striker'])['runs_off_bat'].count().sort_values(ascending=False).head(10)
result1

In [None]:
# Reset the index so that match_id and striker become regular columns.
# reset_index() on a Series already returns a DataFrame, so no further conversion is needed
result1 = result1.reset_index()
result1


In [None]:
# Let's plot a bar chart of result1 using striker and runs_off_bat (the number of sixes)

plt.bar(result1['striker'], result1['runs_off_bat'])
plt.title('Most sixes in a single match')
plt.xticks(rotation='vertical')
plt.show()
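
The chart above uses `striker` alone as the x label, so a player who appears more than once in the top 10 would have both bars drawn at the same position. One possible workaround is to build a label from the striker and the match. The sketch below assumes a frame shaped like `result1`; `toy_result` and the `label` column are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical per-match six counts; player 'A' appears in two matches.
toy_result = pd.DataFrame({
    'match_id': [1, 2, 3],
    'striker': ['A', 'A', 'B'],
    'runs_off_bat': [9, 7, 6],
})

# Combine striker and match_id so that each bar gets a distinct x label.
toy_result['label'] = (
    toy_result['striker'] + ' (match ' + toy_result['match_id'].astype(str) + ')'
)
print(toy_result['label'].tolist())
```

Passing the `label` column to `plt.bar` instead of `striker` would then keep the two innings by the same player apart.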

**4.2.4. Most Sixes hit in a Venue**

In [None]:
sixes_df['venue'].value_counts().head(10)

In [None]:
# Let's plot a bar chart of the number of sixes hit at each venue
sixes_by_venue = sixes_df['venue'].value_counts().head(10)

plt.bar(sixes_by_venue.index, sixes_by_venue.values)
plt.title('Number of sixes hit in a venue')
plt.xticks(rotation='vertical')
plt.show()

**4.2.5. Top 10 players who hit most number of fours in the season**

In [None]:
fours_df = df[df['runs_off_bat'] == 4]

fours_df.groupby('striker')['runs_off_bat'].count().sort_values(ascending=False).head(10)

In [None]:
# Let's plot the top 10 players who hit the most fours in ICC CWC 2023
top_four_hitters = fours_df.groupby('striker')['runs_off_bat'].count().sort_values(ascending=False).head(10)

plt.bar(top_four_hitters.index, top_four_hitters.values)
plt.title('Top 10 players who hit the most fours in ICC CWC 2023')
plt.xticks(rotation='vertical')
plt.show()

**4.2.6. Top 10 batsmen who hit the most fours in a single match**

In [None]:
result2 =  fours_df.groupby(['match_id','striker'])['runs_off_bat'].count().sort_values(ascending=False).head(10)
result2

In [None]:
# Reset the index so that match_id and striker become regular columns.
# reset_index() on a Series already returns a DataFrame, so no further conversion is needed
result2 = result2.reset_index()
result2

In [None]:
# Let's plot a bar chart of result2 using striker and runs_off_bat (the number of fours)

plt.bar(result2['striker'], result2['runs_off_bat'])
plt.title('Most Fours in single match')
plt.xticks(rotation='vertical')
plt.show()

**4.2.7. Most fours hit in a Venue**

In [None]:
fours_df['venue'].value_counts().head(10)

In [None]:
# Let's plot the number of fours hit at each venue
fours_by_venue = fours_df['venue'].value_counts().head(10)

plt.bar(fours_by_venue.index, fours_by_venue.values)
plt.title('Number of fours hit in a venue')
plt.xticks(rotation='vertical')
plt.show()

**4.2.8 Top 10 Teams which scored most number of runs in a match**

In [None]:
# Note: the innings == 1 filter means only teams batting first are ranked here
first_innings = df[df['innings'] == 1].groupby(['match_id', 'batting_team'])
result3 = (first_innings['runs_off_bat'].sum() + first_innings['extras'].sum()).sort_values(ascending=False).head(10)
result3
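
The cell above filters on `innings == 1`, so it ranks first-innings totals only. If totals for chasing teams are wanted as well, one option is to group on the innings instead of filtering on it. The following is a minimal sketch on hypothetical toy data; `toy_df`, `total_runs` and `team_totals` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy deliveries for one match with two innings.
toy_df = pd.DataFrame({
    'match_id':     [1, 1, 1, 1, 1, 1],
    'innings':      [1, 1, 1, 2, 2, 2],
    'batting_team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'runs_off_bat': [4, 0, 1, 6, 6, 0],
    'extras':       [0, 1, 0, 0, 0, 2],
})

# A team's total is runs off the bat plus extras, so add them per ball first.
toy_df['total_runs'] = toy_df['runs_off_bat'] + toy_df['extras']

# One groupby covers every innings; no innings filter is applied here.
team_totals = (
    toy_df.groupby(['match_id', 'innings', 'batting_team'])['total_runs']
    .sum()
    .sort_values(ascending=False)
)
print(team_totals)
```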

**4.2.9 Top 10 Batsmen who scored most number of runs in a match**

In [None]:
result3 = df.groupby(['match_id','striker'])['runs_off_bat'].sum().sort_values(ascending=False).head(10)
result3

In [None]:
# Reset the index so that match_id and striker become regular columns.
# reset_index() on a Series already returns a DataFrame, so no further conversion is needed
result3 = result3.reset_index()
result3

In [None]:
# Lets plot a bar plot of the result3

plt.bar(result3['striker'], result3['runs_off_bat'])
plt.title('Players who scored most runs in a single match')
plt.xticks(rotation='vertical')
plt.show()

**4.2.10 Overall Team Score in the tournament**

In [None]:
(df.groupby('batting_team')['runs_off_bat'].sum() + df.groupby('batting_team')['extras'].sum()).sort_values(ascending = False)

In [None]:
# Let's plot a bar chart of the overall team score in the tournament
team_scores = (df.groupby('batting_team')['runs_off_bat'].sum() + df.groupby('batting_team')['extras'].sum()).sort_values(ascending = False)

plt.bar(team_scores.index, team_scores.values)
plt.title('Overall Team Score in ICC WC_23')
plt.xticks(rotation='vertical')
plt.show()

**4.3.1 Overall Batsman Performance**

The first function, `OverallBatsmanStats`, calculates a player's overall tournament stats:

1. Player: Name of the player
2. Matches: Total number of matches played
3. Score: Total number of runs scored
4. Avg: Average
5. SR: Strike rate
6. Balls Faced: Total number of balls played
7. HS: Highest score
8. 50s: Total number of 50's scored
9. 100s: Total number of 100's scored
10. Dismissed: In how many matches the player got out
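
Two of these statistics are simple ratios: the strike rate is runs per 100 balls faced, and the batting average is runs per dismissal. The sketch below works through both with made-up numbers. `strike_rate` and `batting_average` are hypothetical helper names, and returning `None` for a player who was never dismissed is one possible convention, not the only one.

```python
def strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced."""
    return runs / balls_faced * 100

def batting_average(runs, dismissals):
    """Runs scored per dismissal; None when the player was never dismissed."""
    if dismissals == 0:
        return None
    return runs / dismissals

# Hypothetical numbers: 300 runs from 320 balls, dismissed 4 times.
print(strike_rate(300, 320))      # 93.75
print(batting_average(300, 4))    # 75.0
```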

In [None]:
def OverallBatsmanStats(p_name):
    player = p_name

    playerList = df['striker'].unique().tolist()

    # Check that the player faced at least one ball in ICC CWC 2023
    if player in playerList:
        batsman = df[df['striker'] == player]

        # Number of matches in which the player faced a ball
        matches = batsman['match_id'].nunique()

        # Total runs scored by the player
        score = batsman['runs_off_bat'].sum()

        # Runs per match, used for the 50s, the 100s and the highest score
        runs = batsman.groupby('match_id')['runs_off_bat'].sum()

        # A score of 50 to 99 counts as a fifty, 100 or more as a hundred
        fifties = int(((runs >= 50) & (runs < 100)).sum())
        hundreds = int((runs >= 100).sum())

        # Highest score in a single match
        highestScore = runs.max()

        # Number of times the player was dismissed. Match on player_dismissed
        # across all deliveries: the dismissed batsman is not always the
        # striker, for example in a run out at the non-striker's end
        dismissed = int((df['player_dismissed'] == player).sum())

        # Number of deliveries faced as striker (any wides recorded as
        # deliveries are included in this count)
        ballsFaced = batsman['runs_off_bat'].count()

        # Strike rate: runs per 100 balls
        sr = (score / ballsFaced) * 100

        # Batting average: runs per dismissal, undefined if never dismissed
        avg = score / dismissed if dismissed > 0 else np.nan

        data = {
            'PlayerName': [player],
            'Matches' : [matches],
            'Score' : [score],
            'Avg' : [round(avg,3)],
            'SR' : [round(sr,3)],
            'BallsFaced': [ballsFaced],
            'HS' : [highestScore],
            '50s' : [fifties],
            '100s' : [hundreds],
            'Dismissed': [dismissed]
        }

        Stats_df = pd.DataFrame(data)

        return Stats_df

    else:
        return 'Invalid Player Name, Please Re-Enter the name of Player or Check the spelling of the name'

In [None]:
OverallBatsmanStats('V Kohli') # You can enter the name of the player of your choice
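
`ballsFaced` in the function above counts every row on which the player is the striker. A wide does not count as a ball faced, so if the data stores wides as rows, the count (and with it the strike rate) is slightly off. The sketch below shows one way to exclude them. It assumes a `wides` column that is empty or zero on non-wide deliveries; that column is an assumption about the dataset layout and is not used anywhere else in this notebook. `legal_balls_faced` and `toy_df` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy deliveries. The 'wides' column is an assumed layout:
# NaN on ordinary balls, the number of wide runs on a wide.
toy_df = pd.DataFrame({
    'striker':      ['A', 'A', 'A', 'B', 'B'],
    'runs_off_bat': [4, 0, 1, 2, 1],
    'wides':        [float('nan'), 1.0, float('nan'), float('nan'), float('nan')],
})

def legal_balls_faced(frame, player):
    """Count deliveries faced by `player` as striker, leaving out wides."""
    faced = frame[frame['striker'] == player]
    # Treat a missing value as "not a wide", then keep rows with no wide runs.
    not_wide = faced['wides'].fillna(0) == 0
    return int(not_wide.sum())

print(legal_balls_faced(toy_df, 'A'))
```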

The second function, `AgainstEachTeamBatsmanAnalysis`, calculates a player's stats for each single match:

1. Scores: Total number of runs scored in the match
2. Balls: Total number of balls faced in the match
3. Opponent: Which was the opponent team that played
4. Venue: Where the match was played



In [None]:
def AgainstEachTeamBatsmanAnalysis(p_name):
    player = p_name

    playerList = df['striker'].unique().tolist()

    # Check that the player faced at least one ball in ICC CWC 2023
    if player in playerList:

        batsman = df[df['striker'] == player]

        matches_played = sorted(batsman['match_id'].unique().tolist())

        scores = []
        balls = []
        opponent = []
        venue = []

        for i in matches_played:
            # Deliveries faced by the player in this match
            match_df = batsman[batsman['match_id'] == i]

            scores.append(match_df['runs_off_bat'].sum())
            balls.append(match_df['runs_off_bat'].count())
            opponent.append(match_df['bowling_team'].unique()[0])
            # Keep the part of the venue after the last comma, without the leading space
            venue.append(match_df['venue'].unique()[0].split(',')[-1].strip())


        teamWise = {
            'Scores' : scores,
            'Balls' : balls,
            'Opponent' : opponent,
            'Venue' : venue
        }


        TeamWiseStats = pd.DataFrame(teamWise)

        return TeamWiseStats

    else:
        return 'Invalid Player Name, Please Re-Enter the name of Player or Check the spelling of the name'

In [None]:
AgainstEachTeamBatsmanAnalysis('V Kohli') # You can enter the name of the player of your choice
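
The same per-match breakdown can be written without an explicit loop by using `groupby` with named aggregation. This is a sketch of an alternative, not a replacement for the function above; `per_match_breakdown` and `toy_df` are hypothetical names, and the toy rows are made up so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical toy deliveries for one striker across two matches.
toy_df = pd.DataFrame({
    'match_id':     [1, 1, 1, 2, 2],
    'striker':      ['A', 'A', 'A', 'A', 'A'],
    'runs_off_bat': [4, 0, 6, 1, 1],
    'bowling_team': ['X', 'X', 'X', 'Y', 'Y'],
    'venue':        ['Ground One, City P', 'Ground One, City P',
                     'Ground One, City P', 'Ground Two, City Q',
                     'Ground Two, City Q'],
})

def per_match_breakdown(frame, player):
    """Runs, balls, opponent and venue per match for one striker."""
    batsman = frame[frame['striker'] == player]
    summary = batsman.groupby('match_id').agg(
        Scores=('runs_off_bat', 'sum'),
        Balls=('runs_off_bat', 'count'),
        Opponent=('bowling_team', 'first'),
        Venue=('venue', 'first'),
    )
    # Keep only the part after the last comma and trim surrounding spaces.
    summary['Venue'] = summary['Venue'].str.split(',').str[-1].str.strip()
    return summary.reset_index(drop=True)

breakdown = per_match_breakdown(toy_df, 'A')
print(breakdown)
```

Because `groupby` sorts by `match_id` by default, the rows come out in the same order as the sorted loop.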

---

# Thanks