# ***Major League Cricket Data Analysis***
### by [***Anthahkarana***](https://anthahkarana.tech)

### **TRANSFORMING AND OBTAINING DATA**

The sources for this project are:
1. CricSheet MLC Database
2. CricSheet People Database (Comprehensive Database with all the stakeholders - including players, referees and umpires - of a cricket match)
3. ESPNCricinfo

In [1]:
folder_path='/content/mlc_json'

The first step is to gather basic information about every player participating in MLC. The code below scrapes these details from ESPNcricinfo.

In [2]:
import requests
from bs4 import BeautifulSoup

def get_player_details(player_id):
    """
    Fetches the details of a player from ESPN Cricinfo using the player ID.

    Parameters:
    player_id (str): The ESPN Cricinfo player ID.

    Returns:
    dict: A dictionary containing player details like Full Name, Born, Age, Batting Style, Bowling Style, and Playing Role.
    """

    # Construct the URL dynamically using the player ID
    url = f"https://www.espncricinfo.com/cricketers/player-name-{player_id}"

    # Send a GET request to fetch the page content (the timeout stops a stalled request from hanging the notebook)
    response = requests.get(url, timeout=30)

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Failed to retrieve data for player ID {player_id}. Status code: {response.status_code}")
        return None

    # Parse the page content using BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Labels shown on the player page, in the order they are collected
    labels = ['Full Name', 'Born', 'Age', 'Batting Style', 'Bowling Style', 'Playing Role']
    label_class = 'ds-text-tight-m ds-font-regular ds-uppercase ds-text-typo-mid3'

    # Each label sits in a <p>; its value is the <p> inside the next <span>
    player_details = {}
    for label in labels:
        label_tag = soup.find('p', class_=label_class, string=label)
        if label_tag:
            value_span = label_tag.find_next('span')
            value_tag = value_span.find('p') if value_span else None
            # Skip the field instead of crashing if the page layout differs
            if value_tag:
                player_details[label] = value_tag.text.strip()

    # Return the player details as a dictionary
    return player_details


# Example usage: Pass a player ID and get the details
player_id = input("Enter the player ID ")
player_info = get_player_details(player_id)

# Print the details
if player_info:
    print("\nPlayer Details:")
    for key, value in player_info.items():
        print(f"{key}: {value}")
else:
    print("No player details found. Please check the player ID.")


Enter the player ID 277916

Player Details:
Full Name: Ajinkya Madhukar Rahane
Born: June 06, 1988, Ashwi-KD, Maharashtra
Age: 36y 184d
Batting Style: Right hand Bat
Bowling Style: Right arm Medium
Playing Role: Top order Batter


Getting this data involved inspecting the source of the ESPNcricinfo web page and scraping by CSS class, which is why the class names in the code look so technical. There is a library named 'python-espncricinfo' that does this, but it did not work for me, so I used this manual approach.
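To make the page layout behind this approach easier to picture, here is a minimal offline sketch. It assumes a simplified version of the markup (a label in a `<p>`, followed by the value in a `<p>` inside the next `<span>`) and leaves out the real class names. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without any network access or extra packages; `LabelValueParser` and `SAMPLE_HTML` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collects pairs laid out as <p>Label</p> <span><p>Value</p></span>."""

    def __init__(self, labels):
        super().__init__()
        self.labels = set(labels)
        self.details = {}
        self._in_p = False
        self._span_depth = 0
        self._pending = None  # label that is still waiting for its value

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
        elif tag == "span":
            self._span_depth += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
        elif tag == "span" and self._span_depth:
            self._span_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._in_p:
            return
        if self._pending and self._span_depth:
            # First <p> text inside a <span> after a label is its value
            self.details[self._pending] = text
            self._pending = None
        elif text in self.labels:
            self._pending = text


# Simplified, assumed markup; the values are taken from the output above
SAMPLE_HTML = """
<div>
  <p class="label">Full Name</p>
  <span><p>Ajinkya Madhukar Rahane</p></span>
</div>
<div>
  <p class="label">Batting Style</p>
  <span><p>Right hand Bat</p></span>
</div>
"""

parser = LabelValueParser(["Full Name", "Batting Style", "Bowling Style"])
parser.feed(SAMPLE_HTML)
print(parser.details)
```

Labels that are missing from the page (here, 'Bowling Style') are simply absent from the result, which mirrors how the scraper skips fields it cannot find.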

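Now that we have defined the function **get_player_details**, the next step is to combine it with the CricSheet MLC match JSON files and CricSheet's large register of people (**people.csv**). Each match file lists its players and maps their names to CricSheet identifiers; people.csv maps those identifiers to ESPNcricinfo IDs, which are then passed to the function. This is done using pandas, and the result is stored in a new CSV file named **comprehensive.csv**.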
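As a rough illustration of the two parts of a match file that this step relies on, the sketch below uses a tiny in-memory sample instead of a real file. The team names, player names and identifiers are placeholders, and `player_identifiers` is a hypothetical helper; only the `info` / `players` / `registry` / `people` keys are taken from the notebook's own code.

```python
# Placeholder data shaped like the parts of a match file used in this notebook
sample_match = {
    "info": {
        "players": {
            "Team A": ["Player One", "Player Two"],
            "Team B": ["Player Three"],
        },
        "registry": {
            "people": {
                "Player One": "id-0001",
                "Player Two": "id-0002",
                "Player Three": "id-0003",
                "Umpire One": "id-0099",  # in the registry, but not a player
            }
        },
    }
}


def player_identifiers(match):
    """Return the registry identifiers of everyone listed under 'players'."""
    info = match.get("info", {})
    registry = info.get("registry", {}).get("people", {})
    identifiers = set()
    for squad in info.get("players", {}).values():
        for name in squad:
            if name in registry:
                identifiers.add(registry[name])
    return identifiers


print(sorted(player_identifiers(sample_match)))
```

Starting from the `players` lists rather than from the registry keeps anyone who is only in the registry (the umpire in this sample) out of the player set.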

This takes a while to run (about 2 minutes for me), since it fetches one ESPNcricinfo page for every player found in the match files.
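Since most of that time goes into web requests, one optional way to make re-runs cheaper is to cache each player's details on disk. The sketch below is not part of the original workflow: `cached_lookup`, the cache file name and `fake_fetch` are hypothetical, and in the notebook the `fetch` argument would be `get_player_details`.

```python
import json
import os
import tempfile


def cached_lookup(player_id, fetch, cache_path):
    """Return details for player_id, calling fetch only on a cache miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    key = str(player_id)  # JSON object keys are always strings
    if key in cache:
        return cache[key]

    details = fetch(player_id)
    if details:  # failed lookups are not cached, so they can be retried
        cache[key] = details
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return details


# Demo with a stand-in fetcher so that nothing is downloaded
calls = []


def fake_fetch(player_id):
    calls.append(player_id)
    return {"Batting Style": "Right hand Bat"}


demo_cache = os.path.join(tempfile.mkdtemp(), "player_cache.json")
first = cached_lookup(277916, fake_fetch, demo_cache)
second = cached_lookup(277916, fake_fetch, demo_cache)
print(first == second, len(calls))
```

The second lookup is served from the cache file, so the stand-in fetcher is called only once.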

In [5]:
import pandas as pd
import glob
import os
import json

# Load the people.csv file
people_df = pd.read_csv('/content/people.csv')

all_files = glob.glob(os.path.join(folder_path, "*.json"))

identifiers_from_json = set()

for file in all_files:
    with open(file, 'r') as f:
        match_data = json.load(f)

    # Accessing players and registry from 'info'
    players_data = match_data.get('info', {}).get('players', {})
    registry_data = match_data.get('info', {}).get('registry', {}).get('people', {})

    # Get identifiers from players_data
    for team_players in players_data.values():
        for player_name in team_players:
            # Find identifier in registry_data using player name as key
            identifier = registry_data.get(player_name)
            if identifier:
                identifiers_from_json.add(identifier)

# Match identifiers with people.csv and get playing styles
comprehensive_data = []
for identifier in identifiers_from_json:
    matching_row = people_df[people_df['identifier'] == identifier]
    if not matching_row.empty:
        cricinfo_id = matching_row['key_cricinfo'].iloc[0]
        player_details = get_player_details(cricinfo_id)
        if player_details:
            comprehensive_data.append({
                'identifier': identifier,
                'key_cricinfo': cricinfo_id,
                **player_details
            })

# Create and save the comprehensive.csv file
comprehensive_df = pd.DataFrame(comprehensive_data)
comprehensive_df.to_csv('comprehensive.csv', index=False)
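
One thing worth checking, depending on how pandas reads **people.csv**: when a column of integer IDs contains missing values, pandas loads it as floats, so an ID can come through as `277916.0` or `NaN`. Whether that applies to `key_cricinfo` in your copy of the file is an assumption to verify. The hypothetical helper below sketches how such a value could be turned into a clean string before it is placed in the URL.

```python
import math


def normalize_cricinfo_key(value):
    """Return the key as a clean string, or None when it is missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None  # missing key, nothing to look up
        if value.is_integer():
            return str(int(value))  # 277916.0 -> "277916"
    text = str(value).strip()
    return text or None


for raw in (277916.0, 277916, " 277916 ", float("nan")):
    print(repr(raw), "->", normalize_cricinfo_key(raw))
```

A `None` result would then be a signal to skip the player instead of requesting a page that cannot exist.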

Next, I divide the bowlers into two broad categories, 'fast' (pace) and 'spin', to make the analysis easier. There is also a 'none' category, which acts as a fallback. The first step is to list the unique bowling styles in the data.

In [6]:
import pandas as pd

# Read the CSV file
comprehensive_df = pd.read_csv('comprehensive.csv')

# Specify the field name
field_name = 'Bowling Style'

# Get unique entries
unique_entries = comprehensive_df[field_name].unique()

# Print the unique entries
print(f"Unique entries in '{field_name}':")
for entry in unique_entries:
    print(entry)

Unique entries in 'Bowling Style':
Legbreak
Right arm Fast
Right arm Offbreak
Slow Left arm Orthodox
Left arm Medium fast
nan
Right arm Fast medium
Left arm Fast medium
Right arm Medium fast
Legbreak Googly
Right arm Medium
Left arm Wrist spin
Left arm Fast


Now that I have all the unique bowling styles, I can map each one to a category.

In [10]:
import pandas as pd

def categorize_bowling_type(bowling_style):
    if pd.isnull(bowling_style):
        return 'none'
    elif bowling_style in ['Slow Left arm Orthodox', 'Right arm Offbreak', 'Legbreak Googly', 'Legbreak', 'Left arm Wrist spin']:
        return 'spin'
    elif bowling_style in ['Left arm Medium fast', 'Left arm Fast medium', 'Right arm Medium fast', 'Right arm Medium', 'Right arm Fast medium', 'Right arm Fast', 'Left arm Fast']:
        return 'fast'
    else:
        return 'none'  # Handle cases not specified in my conditions

comprehensive_df['Bowling Type'] = comprehensive_df['Bowling Style'].apply(categorize_bowling_type)
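
The explicit lists above cover every style found in this dataset. As an alternative sketch (not the approach used in this notebook), the same mapping can be expressed with keywords, which would also catch style strings that were not in the original list. The keyword tuples and `categorize_by_keyword` are assumptions chosen to reproduce the mapping above.

```python
SPIN_KEYWORDS = ("break", "orthodox", "spin", "googly")
PACE_KEYWORDS = ("fast", "medium")


def categorize_by_keyword(bowling_style):
    """Classify a bowling style string as 'spin', 'fast' or 'none'."""
    if not isinstance(bowling_style, str):
        return "none"  # covers NaN and other missing values
    style = bowling_style.lower()
    # Spin is checked first, so a mixed description counts as spin
    if any(word in style for word in SPIN_KEYWORDS):
        return "spin"
    if any(word in style for word in PACE_KEYWORDS):
        return "fast"
    return "none"


for style in ("Legbreak Googly", "Slow Left arm Orthodox", "Right arm Medium", float("nan")):
    print(style, "->", categorize_by_keyword(style))
```

It can be applied in the same way as the original function, for example with `comprehensive_df['Bowling Style'].apply(categorize_by_keyword)`.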

The **Bowling Type** column now exists in the pandas DataFrame. To make the change visible in the CSV file, run the following command.

In [9]:
comprehensive_df.to_csv('comprehensive.csv', index=False)

Data transformation is done for now. The analysis continues in the next notebook file.