# Exploratory Data Analysis (EDA) on twb.csv and twbo.csv
This section explores the batting (twb.csv) and bowling (twbo.csv) datasets: missing values, the distributions of the main rate statistics, category counts, and correlations between numeric columns.

In [None]:
# Load libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load batting and bowling data
twb = pd.read_csv("dataset/twb.csv")
twbo = pd.read_csv("dataset/twbo.csv")

## Batting Data Overview (twb.csv)

In [None]:
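# Batting data: schema, first rows, and summary statistics
twb.info()  # prints its report and returns None, so call it on its own line
display(twb.head())
display(twb.describe())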

In [None]:
# Check for missing values in batting data
twb.isnull().sum()

In [None]:
# Batting: Distribution of Batting Average and Strike Rate
plt.figure(figsize=(12,5))
sns.histplot(twb['Ave'].dropna(), kde=True, bins=30)
plt.title('Distribution of Batting Average')
plt.xlabel('Average')
plt.show()

plt.figure(figsize=(12,5))
sns.histplot(twb['SR'].dropna(), kde=True, bins=30)
plt.title('Distribution of Batting Strike Rate')
plt.xlabel('Strike Rate')
plt.show()

In [None]:
# Batting: Role counts
role_counts = twb['Role'].value_counts()
role_counts.plot(kind='bar', figsize=(8,4), title='Player Role Counts')
plt.ylabel('Count')
plt.show()

In [None]:
# Batting: Correlation matrix
corr = twb[['Inns','Ave','SR']].corr()
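# Fix the colour scale to [-1, 1] so that 0 maps to the neutral midpoint
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)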
plt.title('Batting Correlation Matrix')
plt.show()

## Bowling Data Overview (twbo.csv)

In [None]:
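# Bowling data: schema, first rows, and summary statistics
twbo.info()  # prints its report and returns None, so call it on its own line
display(twbo.head())
display(twbo.describe())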

In [None]:
# Check for missing values in bowling data
twbo.isnull().sum()

In [None]:
# Bowling: Distribution of Economy and Strike Rate
plt.figure(figsize=(12,5))
sns.histplot(twbo['Econ'].dropna(), kde=True, bins=30)
plt.title('Distribution of Bowling Economy')
plt.xlabel('Economy')
plt.show()

plt.figure(figsize=(12,5))
sns.histplot(twbo['SR'].dropna(), kde=True, bins=30)
plt.title('Distribution of Bowling Strike Rate')
plt.xlabel('Strike Rate')
plt.show()

In [None]:
# Bowling: Bowling_Type counts
bowling_type_counts = twbo['Bowling_Type'].value_counts()
bowling_type_counts.plot(kind='bar', figsize=(8,4), title='Bowling Type Counts')
plt.ylabel('Count')
plt.show()

In [None]:
# Bowling: Correlation matrix
corr_bowl = twbo[['Inns','Econ','SR']].corr()
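# Fix the colour scale to [-1, 1] so that 0 maps to the neutral midpoint
sns.heatmap(corr_bowl, annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)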
plt.title('Bowling Correlation Matrix')
plt.show()

## Key Insights
- Batting averages and strike rates both show right-skewed distributions: most players cluster at moderate values, and a few outliers post very high figures.
- Player roles are unevenly distributed, with some roles (e.g., openers) more common.
- Batting average and strike rate are positively correlated, but not perfectly, indicating some high-average players may not have the highest strike rates and vice versa.
- Bowling economy and strike rate are also right-skewed: most bowlers have moderate values, and the long right tail is made up of a few bowlers with very high (expensive) economy rates or very high strike rates (many balls per wicket).
- Fast bowlers are the most common type, followed by off spinners and leg spinners.
- Bowling economy and strike rate are moderately positively correlated, suggesting bowlers who concede fewer runs per over also tend to take wickets more frequently (a lower strike rate means fewer balls per wicket).
- There are some missing values in both datasets, especially for players who are specialists in only one discipline.
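
The insights above are read off the plots. A minimal sketch of how they could be checked numerically follows. The helper names (`to_numeric_frame`, `skew_and_corr`, `missing_share_by_group`) and the toy `demo` frame are hypothetical, and the `"-"` placeholder for an undefined average is an assumption about how such data might be encoded, not something confirmed for twb.csv or twbo.csv.

```python
import pandas as pd


def to_numeric_frame(df, cols):
    """Return df[cols] with every column coerced to numeric.

    Values that cannot be parsed (for example a "-" placeholder) become NaN.
    """
    return df[cols].apply(pd.to_numeric, errors="coerce")


def skew_and_corr(df, cols):
    """Return (sample skewness per column, Pearson correlation matrix)."""
    numeric = to_numeric_frame(df, cols)
    return numeric.skew(), numeric.corr()


def missing_share_by_group(df, group_col, value_cols):
    """Return the fraction of missing values per column within each group."""
    numeric = to_numeric_frame(df, value_cols)
    return numeric.isna().groupby(df[group_col]).mean()


# Hypothetical toy data standing in for the real batting table.
demo = pd.DataFrame({
    "Role": ["Batter", "Batter", "Bowler", "All-rounder", "Bowler"],
    "Ave": [30.0, 45.0, None, 25.0, "-"],
    "SR": [120.0, 135.0, None, 110.0, 90.0],
})

skew, corr = skew_and_corr(demo, ["Ave", "SR"])
missing = missing_share_by_group(demo, "Role", ["Ave", "SR"])

print(skew)     # positive values indicate a right-skewed column
print(corr)     # off-diagonal entry is the Ave/SR correlation
print(missing)  # share of missing Ave and SR values per role
```

On the real data, the same helpers could be called as `skew_and_corr(twb, ['Inns', 'Ave', 'SR'])` or `missing_share_by_group(twbo, 'Bowling_Type', ['Econ', 'SR'])`, assuming those columns exist as used in the cells above. Skewness well above zero would support the right-skew reading, and the sign of the off-diagonal correlation would confirm or contradict the direction of the relationships described in the list.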