# Code for Data Preprocessing and Model Setup

This cell imports the libraries used throughout the notebook: NumPy and pandas for data handling, scikit-learn's `preprocessing` module (for `LabelEncoder`) and `StandardScaler` for encoding and standardizing features, `GridSearchCV` for hyperparameter tuning, and the two classifiers compared later, `RandomForestClassifier` and the Support Vector Machine classifier `SVC`.
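As a quick illustration of how the encoding and scaling classes are used later, the sketch below fits a `LabelEncoder` and a `StandardScaler` on a few hypothetical rows (the venue names and values are made up) and then transforms a query row with the already-fitted scaler. Reusing the fitted scaler keeps training rows and query rows on the same scale. Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), whereas pandas' `.std()` defaults to the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical training rows standing in for per-match features
train = pd.DataFrame({
    'venue': ['Venue B', 'Venue C', 'Venue B', 'Venue A'],
    'previous_average': [20.0, 30.0, 40.0, 50.0],
})

# LabelEncoder maps each sorted unique category to an integer code
le = LabelEncoder().fit(train['venue'])
venue_codes = le.transform(train['venue'])

# StandardScaler learns the mean and population std (ddof=0) from the training rows
sc = StandardScaler().fit(train[['previous_average']])
train_scaled = sc.transform(train[['previous_average']])

# A query row is transformed with the same fitted statistics (the scaler is not refitted)
query = pd.DataFrame({'previous_average': [35.0]})
query_scaled = sc.transform(query)
print(query_scaled)  # 35.0 is the training mean, so it maps to 0.0
```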


In [1]:
import numpy as np
import pandas as pd
from sklearn import preprocessing # for scaling, encoding and transforming
from sklearn.preprocessing import StandardScaler # Standardizing features
from sklearn.model_selection import GridSearchCV # hyperparameter tuning
from sklearn.ensemble import RandomForestClassifier 
from sklearn.svm import SVC

# Loading Player Details Data

The code below reads the player data from four Excel files in the `playerDetails` directory into pandas DataFrames: overall (aggregate) batting statistics (`overall_batsman_details`), match-by-match batting records (`match_batsman_details`), overall bowling statistics (`overall_bowler_details`), and match-by-match bowling records (`match_bowler_details`).


In [2]:
# Reading the datasets
overall_batsman_details = pd.read_excel('./playerDetails/overall_batsman_details.xlsx', header=0)
match_batsman_details = pd.read_excel('./playerDetails/match_batsman_details.xlsx', header=0)
overall_bowler_details = pd.read_excel('./playerDetails/overall_bowler_details.xlsx', header=0)
match_bowler_details = pd.read_excel('./playerDetails/match_bowler_details.xlsx', header=0)

# Ignoring XGBoost and SciKit-Learn Warnings

The cell below suppresses warnings in two ways. First, it replaces `warnings.warn` with a no-op function `warn`, so warnings raised through `warnings.warn` (intended here for XGBoost) are silently dropped. Second, it adds filters that ignore `DeprecationWarning` and scikit-learn's `DataConversionWarning`, so neither is displayed while the notebook runs.


In [3]:
import warnings
from sklearn.exceptions import DataConversionWarning

# Ignoring XGBoost warnings: replace warnings.warn with a no-op,
# so calls made through the warnings module are silently dropped
def warn(*args, **kwargs):
    pass
warnings.warn = warn

# Ignoring scikit-learn warnings by category
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=DataConversionWarning)
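Replacing `warnings.warn` hides every warning for the rest of the session. If a narrower effect is preferred, the standard-library `warnings.catch_warnings()` context manager restores the previous filters when the block exits. The sketch below uses a hypothetical `noisy_call` function in place of a real library call.

```python
import warnings

def noisy_call():
    # Hypothetical stand-in for a library call that emits a DeprecationWarning
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42

# Inside this block DeprecationWarning is ignored, so nothing is recorded
with warnings.catch_warnings(record=True) as suppressed:
    warnings.simplefilter("ignore", category=DeprecationWarning)
    value = noisy_call()

# After the block the old filters are back; here every warning is recorded
with warnings.catch_warnings(record=True) as visible:
    warnings.simplefilter("always")
    noisy_call()

print(value, len(suppressed), len(visible))  # 42 0 1
```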

# Extracting Best Hyperparameters from GridSearchCV Results

The helper `gridsearchcv_results` takes the `cv_results_` dictionary of a fitted two-fold `GridSearchCV` and extracts the best test score together with its hyperparameters. It takes the maximum test score from split 0 and from split 1, keeps the split with the higher value (split 0 on a tie), and returns a tuple of that score (`flag`) and the corresponding hyperparameters (`params`). Because this is the best score of a single fold rather than the mean across folds, it tends to overstate accuracy; `GridSearchCV` itself ranks candidates by `mean_test_score`.
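For comparison, `GridSearchCV` already exposes the mean cross-validation score of its best candidate (`best_score_`) and that candidate's hyperparameters (`best_params_`). The sketch below runs a small two-fold search on synthetic data from `make_classification` (a stand-in for the player data, with a reduced grid) and shows that the single-fold maximum used by `gridsearchcv_results` is never lower than `best_score_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic three-class data in place of the cricket features
X, y = make_classification(n_samples=120, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid={'n_estimators': [25, 50], 'min_samples_leaf': [1, 3]},
                      scoring='accuracy', cv=2)
search.fit(X, y)
results = search.cv_results_

# Best single fold across all candidates (what gridsearchcv_results reports)
single_split_best = max(results['split0_test_score'].max(),
                        results['split1_test_score'].max())

# Mean over both folds for the best-ranked candidate
print(search.best_params_, search.best_score_, single_split_best)
```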


In [4]:
def gridsearchcv_results(results):
    # Initialize the index of the best candidate (arg), its parameters (params), and the maximum test score (flag)
    arg, params, flag = None, None, None
    
    # Get the maximum test scores from split 0 and split 1
    split0 = results['split0_test_score'].max()
    split1 = results['split1_test_score'].max()
    
    # Compare the test scores and determine the best
    if split0 >= split1:
        arg = results['split0_test_score'].argmax()
        flag = split0
    else:
        arg = results['split1_test_score'].argmax()
        flag = split1
        
    # Retrieve the best parameters corresponding to the best test score
    params = results['params'][arg]
    
    # Return the maximum test score and corresponding parameters
    return flag, params

# Batting Performance Prediction Function

The function `bat_perform` predicts a batsman's run range for a given player, opposition, and venue. It proceeds as follows:

1. **Data Preprocessing:**
   - Forward-fills missing dates, then keeps only the player's matches against the chosen opposition and the player's aggregate record from `overall_batsman_details`.

2. **Binning Runs into Categories:**
   - Divides the runs scored into six classes using the bin edges `[0, 10, 30, 50, 80, 120, 250]`. If the player's history covers two or fewer of these classes, no model is trained (see step 8).

3. **Label Encoding and Scaling:**
   - Encodes opposition and venue as integers with `LabelEncoder` and standardizes the numerical features with `StandardScaler`. The query row, built from the player's aggregate statistics, is standardized using the training features' means and standard deviations.

4. **Model Setup and Hyperparameter Tuning:**
   - Runs a two-fold `GridSearchCV` over a `RandomForestClassifier` grid and an `SVC` grid, and keeps whichever model scores higher according to `gridsearchcv_results`.

5. **Reporting Accuracy and Classifier Type:**
   - Prints the best cross-validation accuracy and the selected classifier (`RFC` or `SVC`).

6. **Model Training and Prediction:**
   - Retrains the selected classifier on all of the player's matches with the best hyperparameters and predicts the class of the query row.

7. **Mapping the Prediction to a Run Range:**
   - Converts the predicted class to a run range such as `31-50` and stores it in the result dictionary under `bat_prediction`.

8. **Return Value:**
   - Returns the result dictionary when a model was trained; otherwise, returns the player's raw runs against that opposition as a DataFrame.
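The run binning and the final label lookup can be checked in isolation. The sketch below reuses the bin edges, labels, and `bat_runs` mapping from `bat_perform` on a few hypothetical scores. With pandas' default right-closed bins, a score of exactly 10 falls in `0-10` and 11 falls in `11-30`.

```python
import pandas as pd

# Bin edges, labels, and label-to-range mapping as used in bat_perform
bins = [0, 10, 30, 50, 80, 120, 250]
labels = ["0", "1", "2", "3", "4", "5"]
bat_runs = {'0': '0-10', '1': '11-30', '2': '31-50', '3': '51-80', '4': '81-120', '5': '121-250'}

# Hypothetical scores; include_lowest=True keeps a score of 0 in the first bin
runs = pd.Series([0, 10, 11, 45, 99, 130])
categories = pd.cut(runs, bins, labels=labels, include_lowest=True)

ranges = [bat_runs[c] for c in categories]
print(ranges)  # ['0-10', '0-10', '11-30', '31-50', '81-120', '121-250']

# bat_perform trains a model only when more than two classes occur
print(categories.nunique())  # 5
```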

In [5]:
def bat_perform(player_name, opposition, venue):

    res = {}

    # Data preprocessing: forward-fill missing dates (assigned back rather than a chained inplace call)
    match_batsman_details['date'] = match_batsman_details['date'].ffill()
    bat_match_details = match_batsman_details[match_batsman_details['name'] == player_name]
    bat_match_details = bat_match_details[bat_match_details['opposition'] == opposition]
    bat_overall_details = overall_batsman_details[overall_batsman_details['player_name'] == player_name][['player_name', 'team', 'innings', 'runs', 'average', 'strike_rate', 'centuries', 'fifties', 'zeros']]
    num_cols = ['innings_played', 'previous_average', 'previous_strike_rate', 'previous_centuries', 'previous_fifties', 'previous_zeros']
    # .copy() so columns can be overwritten without a SettingWithCopyWarning
    bat_features = bat_match_details.loc[:, ['opposition', 'venue'] + num_cols].copy()
    bat_targets = bat_match_details.loc[:, ['runs']]
    temp = bat_targets

    le = preprocessing.LabelEncoder()  # converts categorical values to integer codes
    sc = StandardScaler()  # standardizes numerical features to zero mean and unit variance

    # Bin runs into six classes: 0-10, 11-30, 31-50, 51-80, 81-120, 121-250
    bins = [0, 10, 30, 50, 80, 120, 250]
    labels = ["0", "1", "2", "3", "4", "5"]
    bat_targets = pd.cut(bat_targets['runs'], bins, labels=labels, include_lowest=True)
    classes_bat = len(bat_targets.unique())

    if classes_bat > 2:

        # Encode opposition and venue; the venue encoder is fitted on every venue in the dataset
        le.fit(bat_features['opposition'])
        opp_bat = le.transform([opposition])
        bat_features['opposition'] = le.transform(bat_features['opposition'])

        le.fit(match_batsman_details['venue'])
        ven_bat = le.transform([venue])
        bat_features['venue'] = le.transform(bat_features['venue'])

        # Query row from the player's aggregate statistics, in the same order as num_cols
        predict_bat = bat_overall_details[['innings', 'average', 'strike_rate', 'centuries', 'fifties', 'zeros']].values[0]

        # Scale the training features, then the query row with the same fitted statistics
        bat_features[num_cols] = sc.fit_transform(bat_features[num_cols])
        predict_bat = sc.transform(pd.DataFrame([predict_bat], columns=num_cols))[0].tolist()

        # Prepend the encoded opposition and venue so the order matches bat_features
        predict_bat = [opp_bat[0], ven_bat[0]] + predict_bat

        # Convert to arrays; the query becomes a single row of shape (1, n_features)
        bat_features = bat_features.values
        bat_targets = bat_targets.values
        predict_bat_features = np.nan_to_num(np.array(predict_bat, dtype=float).reshape(1, -1))

        # RandomForestClassifier grid
        bat_rfc = RandomForestClassifier(random_state=42)
        bat_parameters_rfc = {'n_estimators': [75, 100, 125], 'criterion': ['gini', 'entropy'], 'min_samples_leaf': [1, 2, 3]}

        # Support Vector Machine grid
        bat_svc = SVC()
        bat_parameters_svc = {'C': [1, 5, 10], 'kernel': ['rbf', 'linear', 'sigmoid'], 'gamma': ['auto', 'scale']}

        # Hyperparameter tuning: random forest first
        bat_gridsearch_rfc = GridSearchCV(estimator=bat_rfc, param_grid=bat_parameters_rfc, scoring='accuracy', cv=2)
        bat_gridresult_rfc = bat_gridsearch_rfc.fit(bat_features, bat_targets)
        bat_score, bat_params = gridsearchcv_results(bat_gridresult_rfc.cv_results_)
        bat_best_score, bat_best_params = [bat_score, 'rfc'], bat_params

        # Then the SVM; it replaces the random forest only if it scores strictly higher
        bat_gridsearch_svc = GridSearchCV(estimator=bat_svc, param_grid=bat_parameters_svc, scoring='accuracy', cv=2)
        bat_gridresult_svc = bat_gridsearch_svc.fit(bat_features, bat_targets)
        bat_score, bat_params = gridsearchcv_results(bat_gridresult_svc.cv_results_)

        if bat_score > bat_best_score[0]:
            bat_best_score, bat_best_params = [bat_score, 'svc'], bat_params

        print(f'Batting Prediction accuracy={bat_best_score[0]*100} with classifier={bat_best_score[1].upper()}')

        print('Batting Prediction begins...')

        # Retrain the selected model on all rows with its best hyperparameters
        if bat_best_score[1] == 'rfc':
            bat_classifier = RandomForestClassifier(n_estimators=bat_best_params['n_estimators'], criterion=bat_best_params['criterion'], random_state=42, min_samples_leaf=bat_best_params['min_samples_leaf'])
        else:
            bat_classifier = SVC(C=bat_best_params['C'], kernel=bat_best_params['kernel'], gamma=bat_best_params['gamma'])
        bat_classifier = bat_classifier.fit(bat_features, bat_targets)
        res['bat_prediction'] = bat_classifier.predict(predict_bat_features)

        # Map the predicted class to its run range; .get avoids a KeyError on unexpected labels
        bat_runs = {'0': '0-10', '1': '11-30', '2': '31-50', '3': '51-80', '4': '81-120', '5': '121-250'}
        predicted_value = str(res['bat_prediction'][0])
        res['bat_prediction'] = bat_runs.get(predicted_value, 'Unknown')

        return res
    else:
        # Too few run classes to train a classifier: return the raw runs instead
        return temp
    

In [6]:
def bowl_perform(player_name, opposition, venue):
    res = {}

    # Data preprocessing: forward-fill missing dates, then keep the player's matches against the opposition
    match_bowler_details['date'] = match_bowler_details['date'].ffill()
    bowl_match_details = match_bowler_details[match_bowler_details['name'] == player_name]
    bowl_match_details = bowl_match_details[bowl_match_details['opposition'] == opposition]
    bowl_overall_details = overall_bowler_details[overall_bowler_details['player_name'] == player_name][['player_name', 'team', 'innings', 'wickets', 'average', 'strike_rate', 'economy', 'wicket_hauls']]
    num_cols = ['innings_played', 'previous_average', 'previous_strike_rate', 'previous_economy', 'previous_wicket_hauls']
    # .copy() so columns can be overwritten without a SettingWithCopyWarning
    bowl_features = bowl_match_details.loc[:, ['opposition', 'venue'] + num_cols].copy()
    bowl_targets = bowl_match_details.loc[:, ['wickets']]
    temp = bowl_targets

    le = preprocessing.LabelEncoder()  # converts categorical values to integer codes
    sc = StandardScaler()  # standardizes numerical features to zero mean and unit variance

    # Bin wickets with left-closed bins: 0, 1-2, 3-4, 5-6, 7-9, 10
    bins = [0, 1, 3, 5, 7, 10, 11]
    labels = ['0', '1', '2', '3', '4', '5']
    bowl_targets = pd.cut(bowl_targets['wickets'], bins, right=False, labels=labels, include_lowest=True)
    bowl_targets = bowl_targets.astype(int)

    # Train a classifier only if the history covers more than two classes
    classes_bowl = len(bowl_targets.unique())
    if classes_bowl > 2:

        # Encode opposition and venue; the venue encoder is fitted on every venue in the dataset
        le.fit(bowl_features['opposition'])
        opp_bowl = le.transform([opposition])
        bowl_features['opposition'] = le.transform(bowl_features['opposition'])

        le.fit(match_bowler_details['venue'])
        ven_bowl = le.transform([venue])
        bowl_features['venue'] = le.transform(bowl_features['venue'])

        # Query row from the player's aggregate statistics, in the same order as num_cols
        predict_bowl = bowl_overall_details[['innings', 'average', 'strike_rate', 'economy', 'wicket_hauls']].values[0]

        # Scale the training features, then the query row with the same fitted statistics
        bowl_features[num_cols] = sc.fit_transform(bowl_features[num_cols])
        predict_bowl = sc.transform(pd.DataFrame([predict_bowl], columns=num_cols))[0].tolist()

        # Prepend the encoded opposition and venue so the order matches bowl_features
        predict_bowl = [opp_bowl[0], ven_bowl[0]] + predict_bowl

        # Convert to arrays; the query becomes a single row of shape (1, n_features)
        bowl_features = bowl_features.values
        bowl_targets = bowl_targets.values
        predict_bowl_features = np.nan_to_num(np.array(predict_bowl, dtype=float).reshape(1, -1))

        print('\nBowling Parameter Tuning begins...')

        bowl_rfc = RandomForestClassifier(random_state=42)
        bowl_parameters_rfc = {'n_estimators': [75, 100, 125], 'criterion': ['gini', 'entropy'], 'min_samples_leaf': [1, 2, 3]}

        bowl_svc = SVC()
        bowl_parameters_svc = {'C': [1, 5, 10], 'kernel': ['rbf', 'linear', 'sigmoid'], 'gamma': ['auto', 'scale']}

        # Hyperparameter tuning: random forest first
        bowl_gridsearch_rfc = GridSearchCV(estimator=bowl_rfc, param_grid=bowl_parameters_rfc, scoring='accuracy', cv=2)
        bowl_gridresult_rfc = bowl_gridsearch_rfc.fit(bowl_features, bowl_targets)
        bowl_score, bowl_params = gridsearchcv_results(bowl_gridresult_rfc.cv_results_)
        bowl_best_score, bowl_best_params = [bowl_score, 'rfc'], bowl_params

        # Then the SVM, evaluated on its own grid-search results
        bowl_gridsearch_svc = GridSearchCV(estimator=bowl_svc, param_grid=bowl_parameters_svc, scoring='accuracy', cv=2)
        bowl_gridresult_svc = bowl_gridsearch_svc.fit(bowl_features, bowl_targets)
        bowl_score, bowl_params = gridsearchcv_results(bowl_gridresult_svc.cv_results_)

        if bowl_score > bowl_best_score[0]:
            bowl_best_score, bowl_best_params = [bowl_score, 'svc'], bowl_params

        print(f'The bowling prediction accuracy={bowl_best_score[0]*100} with classifier={bowl_best_score[1].upper()}')

        print('Bowling Prediction begins...')

        # Retrain the selected model on all rows with its best hyperparameters
        if bowl_best_score[1] == 'rfc':
            classifier = RandomForestClassifier(n_estimators=bowl_best_params['n_estimators'], criterion=bowl_best_params['criterion'], random_state=42, min_samples_leaf=bowl_best_params['min_samples_leaf'])
        else:
            classifier = SVC(C=bowl_best_params['C'], kernel=bowl_best_params['kernel'], gamma=bowl_best_params['gamma'])
        classifier = classifier.fit(bowl_features, bowl_targets)
        res['bowl_prediction'] = classifier.predict(predict_bowl_features)

        # Map the predicted class to its wicket range
        bowl_wickets = {'0': '0', '1': '1-2', '2': '3-4', '3': '5-6', '4': '7-9', '5': '10'}
        res['bowl_prediction'] = bowl_wickets[str(res['bowl_prediction'][0])]

        return res
    else:
        # Too few wicket classes to train a classifier: return the raw wickets instead
        return temp


# User Input Function for Batting Performance Prediction

The provided function `get_inputs_batting` is designed to interactively collect user inputs for predicting batting performance. It includes the following steps:

1. **Selecting Team:**
   - Displays available teams from the dataset and prompts the user to enter the desired team.

2. **Selecting Player:**
   - Filters players from the selected team and prompts the user to enter the desired player name.

3. **Selecting Opposition Team:**
   - Displays available opposition teams for the selected player and prompts the user to enter the desired opposition team.

4. **Selecting Venue:**
   - Displays available venues where the player has played against the selected opposition team and prompts the user to enter the desired venue.

5. **Returning Input Dictionary:**
   - Returns a dictionary (`d`) containing the selected player's name, opposition team, and venue.

Each prompt repeats until the entry matches one of the listed options, so the returned dictionary only contains values that exist in the dataset.
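The re-prompting loops and the venue lookup can be sketched as two small helpers. This is an illustration rather than the notebook's code: `prompt_choice` and `venues_against` are hypothetical names, the `read` parameter lets scripted answers stand in for `input`, `.strip()` (which the notebook does not apply) tolerates stray spaces, and the match rows and venue names are made up. The scripted replies mirror the transcript below, where `Jersey` is rejected as a player name before `D Birrell` is accepted.

```python
import pandas as pd

def prompt_choice(prompt, options, read=input):
    """Keep asking until the reply is one of `options`."""
    options = set(options)
    while True:
        answer = read(prompt).strip()
        if answer in options:
            return answer

def venues_against(matches, player, opposition):
    """Venues where `player` has faced `opposition` in a match-level DataFrame."""
    mask = (matches['name'] == player) & (matches['opposition'] == opposition)
    return sorted(set(matches.loc[mask, 'venue']))

# Hypothetical match rows; column names follow match_batsman_details
matches = pd.DataFrame({
    'name': ['D Birrell', 'D Birrell', 'B Ward'],
    'opposition': ['United Arab Emirates', 'Oman', 'United Arab Emirates'],
    'venue': ['Venue A', 'Venue B', 'Venue C'],
})

# Scripted replies replace the keyboard; the first one is rejected
replies = iter(['Jersey', 'D Birrell'])
player = prompt_choice('Enter the desired player name :\n', ['D Birrell', 'B Ward'],
                       read=lambda _prompt: next(replies))
print(player, venues_against(matches, player, 'United Arab Emirates'))  # D Birrell ['Venue A']
```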


In [7]:
def get_inputs_batting():
    d={}
    print("Available Teams :\n")
    print("--------------------------------------------")
    for i in list(set(overall_batsman_details['team']).intersection(set(match_batsman_details['team']))):
        print(i)
    print("---------------------------------------------")
    while True:
        team_name = input("\nEnter the desired team :\n")
        if team_name in list(set(overall_batsman_details['team']).intersection(set(match_batsman_details['team']))): break
    # Players listed under team_name in both files: 'player_name' in overall_batsman_details, 'name' in match_batsman_details
    players_of_specific_team = list(set(overall_batsman_details.loc[overall_batsman_details['team'] == team_name, 'player_name']).intersection(set(match_batsman_details.loc[match_batsman_details['team'] == team_name, 'name'])))
    print("\nPlayer names :\n")
    print("----------------------------------------------")
    for i in players_of_specific_team:
        print(i)
    print("-----------------------------------------------")
    while True:
        d['player_name'] = input("\nEnter the desired player name :\n")
        if d['player_name'] in players_of_specific_team: break
    opposition_team = list(set(match_batsman_details.loc[match_batsman_details['name']==d['player_name'],'opposition']))
    print("\nOpposition teams available :\n")
    print("-----------------------------------------------")
    for i in opposition_team:
        print(i)
    print("-----------------------------------------------")
    while True:
        d['opposition'] = input("\nEnter the desired opposition team :\n")
        if d['opposition'] in opposition_team: break
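    # Venues where the player has played against the selected opposition
    venues_played_against_team = set(match_batsman_details.loc[(match_batsman_details['name'] == d['player_name']) & (match_batsman_details['opposition'] == d['opposition']), 'venue'].tolist())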
    print("\nAvailable Venues :\n")
    print("-----------------------------------------------")
    for i in venues_played_against_team:
        print(i)
    print("-----------------------------------------------")
    while True:
        d['venue'] = input("\nEnter the desired venue :\n")
        if d['venue'] in venues_played_against_team: break
    return d

In [8]:
def get_inputs_bowling():
    d={}
    print("Available Teams :\n")
    print("--------------------------------------------")
    for i in list(set(overall_bowler_details['team']).intersection(set(match_bowler_details['team']))):
        print(i)
    print("---------------------------------------------")
    while True:
        team_name = input("\nEnter the desired team :\n")
        if team_name in list(set(overall_bowler_details['team']).intersection(set(match_bowler_details['team']))): break
    players_of_specific_team = list(set(overall_bowler_details.loc[overall_bowler_details['team'] == team_name, 'player_name']).intersection(set(match_bowler_details.loc[match_bowler_details['team'] == team_name, 'name'])))
    print("\nPlayer names :\n")
    print("----------------------------------------------")
    for i in players_of_specific_team:
        print(i)
    print("-----------------------------------------------")
    while True:
        d['player_name'] = input("\nEnter the desired player name :\n")
        if d['player_name'] in players_of_specific_team: break
    opposition_team = list(set(match_bowler_details.loc[match_bowler_details['name']==d['player_name'],'opposition']))
    print("\nOpposition teams available :\n")
    print("-----------------------------------------------")
    for i in opposition_team:
        print(i)
    print("-----------------------------------------------")
    while True:
        d['opposition'] = input("\nEnter the desired opposition team :\n")
        if d['opposition'] in opposition_team: break
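    # Venues where the player has played against the selected opposition
    venues_played_against_team = set(match_bowler_details.loc[(match_bowler_details['name'] == d['player_name']) & (match_bowler_details['opposition'] == d['opposition']), 'venue'].tolist())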
    print("\nAvailable Venues :\n")
    print("-----------------------------------------------")
    for i in venues_played_against_team:
        print(i)
    print("-----------------------------------------------")
    while True:
        d['venue'] = input("\nEnter the desired venue :\n")
        if d['venue'] in venues_played_against_team: break
    return d

# User Prediction Selection

The final cell asks the user to choose between bowling and batting prediction and then calls the corresponding functions (`bowl_perform` or `bat_perform`). The steps are:

1. **User Choice:**
   - Displays a menu with options for bowling prediction and batting prediction.
   - Prompts the user to enter the number of the desired prediction.

2. **Prediction and Output:**
   - If the user chooses bowling prediction (`c=="1"`):
      - Collects user inputs for bowling prediction using `get_inputs_bowling`.
      - Calls the `bowl_perform` function with the collected inputs.
      - Prints the predicted wicket range (for example `1-2`).

   - For any other input (`c!="1"`), it runs batting prediction:
      - Collects user inputs for batting prediction using `get_inputs_batting`.
      - Calls the `bat_perform` function with the collected inputs.
      - Prints the predicted run range (for example `31-50`).

   - If no model was trained and the function returned the player's raw history instead, the cell prints a default (`1` wicket or `0-10` runs) when that history is empty, and otherwise the rounded mean of the historical wickets or runs.

This code structure provides a clear and interactive way for users to select the type of cricket performance prediction they want and displays the corresponding prediction output.
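The branching on the return value can be isolated in a small function, sketched below under the assumption that it receives either the result dictionary or the raw-history DataFrame returned by `bat_perform` or `bowl_perform`. `format_result` is a hypothetical helper name; the defaults and the rounded mean follow the notebook cell.

```python
import pandas as pd

def format_result(result, key, column, empty_default):
    """Turn the value returned by bat_perform/bowl_perform into printable text."""
    if isinstance(result, dict) and key in result:
        return result[key]                              # model prediction, e.g. '31-50'
    if result.empty:
        return empty_default                            # no history against this opposition
    return str(round(float(result[column].mean())))    # fallback: rounded historical mean

print(format_result({'bat_prediction': '31-50'}, 'bat_prediction', 'runs', '0-10'))   # 31-50
print(format_result(pd.DataFrame({'runs': []}), 'bat_prediction', 'runs', '0-10'))    # 0-10
print(format_result(pd.DataFrame({'runs': [12, 40]}), 'bat_prediction', 'runs', '0-10'))  # 26
```

For bowling, the same helper would be called with `'bowl_prediction'`, `'wickets'`, and `'1'`.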


In [12]:
print("1. BOWLING PREDICTION \t 2. BATTING PREDICTION \n")
c = input("Enter the desired prediction no\n")
if c=="1":
    d=get_inputs_bowling()
    r = bowl_perform(d['player_name'],d['opposition'],d['venue'])
    print("Number of wickets taken by ",d['player_name'])
    if 'bowl_prediction' in r: 
        print(r['bowl_prediction'])
    elif r.empty: print("1")
    else: print(round(r['wickets'].mean()))
else:
    d=get_inputs_batting()
    r = bat_perform(d['player_name'],d['opposition'],d['venue'])
    print("Run Prediction for ",d['player_name'])
    if 'bat_prediction' in r: 
        print(r['bat_prediction'])
    elif r.empty: print("0-10")
    else: print(round(r['runs'].mean()))

1. BOWLING PREDICTION 	 2. BATTING PREDICTION 

Enter the desired prediction no
2
Available Teams :

--------------------------------------------
Africa XI
Asia XI
Australia
Oman
New Zealand
United Arab Emirates
Thailand
Kenya
Netherlands
Ireland
Nepal
South Africa
Zimbabwe
West Indies
United States of America
Bermuda
Afghanistan
Namibia
Scotland
Jersey
Sri Lanka
Hong Kong
Pakistan
England
Bangladesh
Canada
India
Papua New Guinea
---------------------------------------------

Enter the desired team :
Jersey

Player names :

----------------------------------------------
JW Jenner
J Sumerauer
CW Perchard
H Carlyon
EJB Miles
DG Blampied
AM Tribe
NA Greenwood
JAD Lawrenson
D Birrell
JE Dunford
B Ward
BDH Stevens
-----------------------------------------------

Enter the desired player name :
Jersey

Enter the desired player name :
D Birrell

Opposition teams available :

-----------------------------------------------
United Arab Emirates
-----------------------------------------------

E