## Mixed-Integer Linear Program (MILP) Using the CBC Solver (COIN-OR Branch and Cut)

### Interpreting the Output
Each squad member receives a performance score derived from pairwise batsman-bowler matchups against the opposition. Every matchup score is a per-ball rate multiplied by 100, which keeps the objective coefficients at a convenient magnitude for the solver, and a player's score is the mean of their matchup scores. The final objective value in the output is the sum of the selected players' scores.
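
As a worked check of the scoring described above, the sketch below recomputes two of Rohit Sharma's matchup scores from the figures printed in the log and then averages all four. The helper name `matchup_score` is introduced here for illustration only; the formula and the default `boundary_value=0.5` mirror `batsman_score` in the code cell.

```python
from statistics import mean

def matchup_score(runs, balls, boundaries, boundary_value=0.5):
    """Runs per ball plus a boundary bonus per ball, scaled by 100."""
    return ((runs / balls) + (boundaries / balls) * boundary_value) * 100

# Matchups with at least min_balls=20 deliveries use the raw formula.
vs_chahal = matchup_score(runs=34, balls=29, boundaries=2)
vs_boult = matchup_score(runs=19, balls=20, boundaries=1)

# The two sparse matchups (<20 balls) are smoothed, so their scores are
# copied from the log rather than recomputed here.
vs_bumrah = 143.56382978723403
vs_arshdeep = 143.5296191819464

# The player's score is the mean over the opposition bowlers.
player_score = mean([vs_bumrah, vs_chahal, vs_arshdeep, vs_boult])
print(round(vs_chahal, 2), round(vs_boult, 2), round(player_score, 2))
```

The rounded values agree with the logged lines for Yuzvendra Chahal (120.69), Trent Boult (97.5), and the overall player score (126.32).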

Although the absolute magnitude of the score may appear lower than expected, the solve itself was clean: CBC reached an optimal solution without invoking any branching, cutting planes, or node expansion. This means:

* The problem was well-constrained and small enough to be solved by the solver's heuristics alone.

* The feasibility pump found a valid integer solution immediately.

* No time was spent exploring alternate solutions via branch-and-bound.
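
Because a 15-player squad admits only C(15, 11) = 1365 candidate elevens, the solver's optimum can also be cross-checked by plain enumeration. The following is a minimal sketch of that idea, not the notebook's method: the role bounds are copied from the model's constraints, while the squad, the player names `P0`..`P14`, and the scores are hypothetical placeholders.

```python
from itertools import combinations

# (min, max) players per role, as in the MILP constraints.
ROLE_BOUNDS = {
    "batsman": (3, 6),
    "bowler": (3, 6),
    "all-rounder": (1, 4),
    "wicket-keeper": (1, 1),
}

def is_feasible(team):
    """True if the team's role counts satisfy every bound."""
    counts = {role: 0 for role in ROLE_BOUNDS}
    for _, role, _ in team:
        counts[role] += 1
    return all(lo <= counts[r] <= hi for r, (lo, hi) in ROLE_BOUNDS.items())

def best_eleven(squad):
    """Enumerate every 11-player subset and keep the best feasible one."""
    feasible = (t for t in combinations(squad, 11) if is_feasible(t))
    return max(feasible, key=lambda t: sum(score for _, _, score in t))

# Hypothetical squad of (name, role, score) tuples with made-up scores.
roles = (["batsman"] * 5 + ["bowler"] * 5
         + ["all-rounder"] * 3 + ["wicket-keeper"] * 2)
squad = [(f"P{i}", role, float(100 + i)) for i, role in enumerate(roles)]

team = best_eleven(squad)
total = sum(score for _, _, score in team)
print([name for name, _, _ in team], total)
```

With real scores, `total` should equal the objective value reported by CBC; a mismatch would point to a modelling or data error rather than a solver issue.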

The solver output consists of two key components:

* Objective Function Value:
  The cumulative performance score for the selected team. This is the value the model maximizes, subject to all constraints.

* Selected Playing XI:
  The optimal team composition as determined by the solver. It satisfies the role requirements (batsmen, bowlers, all-rounders, wicket-keeper) and maximizes the team's overall performance score under the given model.

Note: the warning printed after the output is raised by the pandas `groupby(...).apply(...)` call and can be ignored.
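
Both output components can be recomputed from a list of selected players, which is a quick way to sanity-check a result. The sketch below assumes a `player_scores` dict is available to the caller (in the notebook it is local to `select_playing_eleven`); the function name `summarize_selection` and the three-player data are hypothetical.

```python
from collections import Counter

def summarize_selection(selected_players, player_scores):
    """Return (objective value, number of players per role) for a selection."""
    objective = sum(player_scores[p["name"]] for p in selected_players)
    role_counts = Counter(p["role"] for p in selected_players)
    return objective, role_counts

# Hypothetical three-player excerpt, only to show the data shapes involved.
selected = [
    {"name": "A", "role": "batsman"},
    {"name": "B", "role": "bowler"},
    {"name": "C", "role": "batsman"},
]
scores = {"A": 120.0, "B": 80.5, "C": 99.5}

objective, role_counts = summarize_selection(selected, scores)
print(objective, dict(role_counts))
```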

In [None]:
!pip install pulp

In [5]:
import pandas as pd
from pulp import LpMaximize, LpProblem, LpVariable, lpSum

# Function to compute bowler performance score with smoothing
def bowler_score(bowler, opposition_batsmen, stats, min_balls=20, wicket_value=20, dot_value=5, boundary_penalty=10, alpha=0.5):
    scores = []
    overall_stats = stats[stats['bowler'] == bowler]
    n_o = overall_stats['balls'].sum() if not overall_stats.empty else 0
    w_o = overall_stats['is_wicket'].sum() if not overall_stats.empty else 0
    r_o = overall_stats['Runs'].sum() if not overall_stats.empty else 0
    d_o = overall_stats['derived_dot_balls'].sum() if not overall_stats.empty else 0
    b_o = overall_stats['derived_boundaries'].sum() if not overall_stats.empty else 0

    for t in opposition_batsmen:
        pair_stats = stats[(stats['bowler'] == bowler) & (stats['batsman'] == t)]
        if not pair_stats.empty:
            n = pair_stats['balls'].iloc[0]
            w = pair_stats['is_wicket'].iloc[0]
            r = pair_stats['Runs'].iloc[0]
            d = pair_stats['derived_dot_balls'].iloc[0]
            b = pair_stats['derived_boundaries'].iloc[0]
            if n >= min_balls:
                e = ((w / n) * wicket_value - (r / n) + (d / n) * dot_value - (b / n) * boundary_penalty) * 100
            else:
                n_smooth = n + alpha * n_o
                w_smooth = w + alpha * w_o
                r_smooth = r + alpha * r_o
                d_smooth = d + alpha * d_o
                b_smooth = b + alpha * b_o
                e = ((w_smooth / n_smooth) * wicket_value - (r_smooth / n_smooth) + (d_smooth / n_smooth) * dot_value - (b_smooth / n_smooth) * boundary_penalty) * 100
            print(f"Bowler {bowler} vs Batsman {t}: {n} balls, {w} wickets, {r} runs, {d} dot balls, {b} boundaries, score = {e}")
        else:
            if n_o > 0:
                e = ((w_o / n_o) * wicket_value - (r_o / n_o) + (d_o / n_o) * dot_value - (b_o / n_o) * boundary_penalty) * 100
                print(f"Bowler {bowler} vs Batsman {t}: Using overall stats, score = {e}")
            else:
                e = 10  # Default score
                print(f"Bowler {bowler} vs Batsman {t}: No data, default score = {e}")
        scores.append(e)
    return sum(scores) / len(opposition_batsmen) if scores else 0

# Function to compute batsman performance score with smoothing
def batsman_score(batsman, opposition_bowlers, stats, min_balls=20, boundary_value=0.5, beta=0.5):
    scores = []
    overall_stats = stats[stats['batsman'] == batsman]
    n_o = overall_stats['balls'].sum() if not overall_stats.empty else 0
    r_o = overall_stats['Runs'].sum() if not overall_stats.empty else 0
    b_o = overall_stats['derived_boundaries'].sum() if not overall_stats.empty else 0

    for o in opposition_bowlers:
        pair_stats = stats[(stats['batsman'] == batsman) & (stats['bowler'] == o)]
        if not pair_stats.empty:
            n = pair_stats['balls'].iloc[0]
            r = pair_stats['Runs'].iloc[0]
            b = pair_stats['derived_boundaries'].iloc[0]
            if n >= min_balls:
                e = ((r / n) + (b / n) * boundary_value) * 100
            else:
                n_smooth = n + beta * n_o
                r_smooth = r + beta * r_o
                b_smooth = b + beta * b_o
                e = ((r_smooth / n_smooth) + (b_smooth / n_smooth) * boundary_value) * 100
            print(f"Batsman {batsman} vs Bowler {o}: {n} balls, {r} runs, {b} boundaries, score = {e}")
        else:
            if n_o > 0:
                e = ((r_o / n_o) + (b_o / n_o) * boundary_value) * 100
                print(f"Batsman {batsman} vs Bowler {o}: Using overall stats, score = {e}")
            else:
                e = 10  # Default score
                print(f"Batsman {batsman} vs Bowler {o}: No data, default score = {e}")
        scores.append(e)
    return sum(scores) / len(opposition_bowlers) if scores else 0

# Main function to select the playing eleven
def select_playing_eleven(opposition_team, squad, data_file='final.csv'):
    try:
        df = pd.read_csv(data_file)
        df['Stadium_Type'] = df['Stadium_Type'].fillna('Unknown')
        stadium_order = {'Small': 1, 'Medium': 2, 'Large': 3, 'Unknown': 4}
        df['Stadium_Rank'] = df['Stadium_Type'].map(stadium_order)
        df = df.sort_values(by='Stadium_Rank').drop(columns=['Stadium_Rank'])

        if df.empty:
            raise ValueError("Dataset is empty.")
        print(f"Total dataset size: {len(df)} rows")

        # Compute historical statistics with derived metrics
        stats = df.groupby(['bowler', 'batsman']).apply(
            lambda g: pd.Series({
                'Runs': g['Runs'].sum(),
                'is_wicket': g['is_wicket'].sum(),
                'balls': g['ball'].count(),
                'derived_dot_balls': ((g['Runs'] == 0) & (g['is_wicket'] == False)).sum(),
                'derived_boundaries': (g['Runs'] >= 4).sum()
            })
        ).reset_index()

        if stats.empty:
            raise ValueError("No bowler-batsman pairs found after aggregation.")
        print(f"Number of bowler-batsman pairs: {len(stats)}")
        print(f"Pairs with <20 balls: {len(stats[stats['balls'] < 20])}")
        print(f"Sparsity percentage (<20 balls): {len(stats[stats['balls'] < 20]) / len(stats) * 100:.2f}%")

        # Check player data availability
        squad_names = [p['name'] for p in squad]
        opposition_names = [p['name'] for p in opposition_team]
        stats_bowlers = set(stats['bowler'])
        stats_batsmen = set(stats['batsman'])
        missing_squad = [name for name in squad_names if name not in stats_bowlers and name not in stats_batsmen]
        missing_opposition = [name for name in opposition_names if name not in stats_bowlers and name not in stats_batsmen]
        print(f"Missing squad players in stats: {missing_squad}")
        print(f"Missing opposition players in stats: {missing_opposition}")

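        # Separate opposition batsmen and bowlers.
        # Note: opposition all-rounders match neither filter, so they are
        # left out of both lists (see the printed lists in the output).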
        opposition_batsmen = [p['name'] for p in opposition_team if p['role'] in ['batsman', 'wicket-keeper']]
        opposition_bowlers = [p['name'] for p in opposition_team if p['role'] == 'bowler']
        print(f"Opposition Batsmen: {opposition_batsmen}")
        print(f"Opposition Bowlers: {opposition_bowlers}")

        # Compute performance scores
        player_scores = {}
        for player in squad:
            name = player['name']
            role = player['role']
            if role == 'bowler':
                score = bowler_score(name, opposition_batsmen, stats)
            elif role in ['batsman', 'wicket-keeper']:
                score = batsman_score(name, opposition_bowlers, stats)
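            elif role == 'all-rounder':
                bat_score = batsman_score(name, opposition_bowlers, stats)
                bowl_score = bowler_score(name, opposition_batsmen, stats)
                score = (bat_score + bowl_score) / 2  # Average for all-rounders
            else:
                # Fail loudly instead of reusing a stale or undefined score
                raise ValueError(f"Unknown role '{role}' for player {name}")
            player_scores[name] = score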
            print(f"Player: {name} ({role}) score is {score}")

        # Set up the optimization problem
        prob = LpProblem("Playing_Eleven_Selection", LpMaximize)
        x = {p['name']: LpVariable(f"x_{p['name']}", 0, 1, 'Binary') for p in squad}

        # Objective
        prob += lpSum([x[p['name']] * player_scores[p['name']] for p in squad])

        # Constraints
        prob += lpSum([x[p['name']] for p in squad]) == 11
        batsmen = [p['name'] for p in squad if p['role'] == 'batsman']
        bowlers = [p['name'] for p in squad if p['role'] == 'bowler']
        all_rounders = [p['name'] for p in squad if p['role'] == 'all-rounder']
        wicket_keepers = [p['name'] for p in squad if p['role'] == 'wicket-keeper']
        prob += lpSum([x[p] for p in batsmen]) >= 3
        prob += lpSum([x[p] for p in batsmen]) <= 6
        prob += lpSum([x[p] for p in bowlers]) >= 3
        prob += lpSum([x[p] for p in bowlers]) <= 6
        prob += lpSum([x[p] for p in all_rounders]) >= 1
        prob += lpSum([x[p] for p in all_rounders]) <= 4
        prob += lpSum([x[p] for p in wicket_keepers]) == 1

        prob.solve()

        # Check solver status
        status = prob.status
        print(f"Solver Status: {status} (1=Optimal, 0=Not Solved, -1=Infeasible)")
        if status != 1:
            raise ValueError("Optimization failed to find an optimal solution.")

        # Extract selected players (threshold guards against floating-point
        # values such as 0.9999999 returned by the solver)
        selected_players = [p for p in squad if x[p['name']].value() > 0.5]
        return selected_players

    except Exception as e:
        print(f"Error: {e}")
        return None

if __name__ == "__main__":
    # Opposition team (11 players)
    opposition_team = [
        {'name': 'Virat Kohli', 'role': 'batsman'},
        {'name': 'Shubman Gill', 'role': 'batsman'},
        {'name': 'Suryakumar Yadav', 'role': 'batsman'},
        {'name': 'Tilak Varma', 'role': 'batsman'},
        {'name': 'Heinrich Klaasen', 'role': 'batsman'},
        {'name': 'Ravindra Jadeja', 'role': 'all-rounder'},
        {'name': 'Jasprit Bumrah', 'role': 'bowler'},
        {'name': 'Yuzvendra Chahal', 'role': 'bowler'},
        {'name': 'Arshdeep Singh', 'role': 'bowler'},
        {'name': 'Trent Boult', 'role': 'bowler'},
        {'name': 'KL Rahul', 'role': 'wicket-keeper'}
    ]

    # Squad (15 players)
    squad = [
        {'name': 'Rohit Sharma', 'role': 'batsman'},
        {'name': 'Faf du Plessis', 'role': 'batsman'},
        {'name': 'Ruturaj Gaikwad', 'role': 'batsman'},
        {'name': 'Sanju Samson', 'role': 'batsman'},
        {'name': 'Jos Buttler', 'role': 'batsman'},
        {'name': 'Pat Cummins', 'role': 'bowler'},
        {'name': 'Kagiso Rabada', 'role': 'bowler'},
        {'name': 'Rashid Khan', 'role': 'bowler'},
        {'name': 'Sunil Narine', 'role': 'bowler'},
        {'name': 'Mohammed Shami', 'role': 'bowler'},
        {'name': 'Andre Russell', 'role': 'all-rounder'},
        {'name': 'Glenn Maxwell', 'role': 'all-rounder'},
        {'name': 'Hardik Pandya', 'role': 'all-rounder'},
        {'name': 'MS Dhoni', 'role': 'wicket-keeper'},
        {'name': 'Rishabh Pant', 'role': 'wicket-keeper'}
    ]

    selected_team = select_playing_eleven(opposition_team, squad)
    if selected_team:
        print("\nSelected Playing Eleven:")
        for player in selected_team:
            print(f"{player['name']} ({player['role']})")

Total dataset size: 143823 rows
Number of bowler-batsman pairs: 15991
Pairs with <20 balls: 14300
Sparsity percentage (<20 balls): 89.43%
Missing squad players in stats: []
Missing opposition players in stats: []
Opposition Batsmen: ['Virat Kohli', 'Shubman Gill', 'Suryakumar Yadav', 'Tilak Varma', 'Heinrich Klaasen', 'KL Rahul']
Opposition Bowlers: ['Jasprit Bumrah', 'Yuzvendra Chahal', 'Arshdeep Singh', 'Trent Boult']
Batsman Rohit Sharma vs Bowler Jasprit Bumrah: 6 balls, 7 runs, 0 boundaries, score = 143.56382978723403
Batsman Rohit Sharma vs Bowler Yuzvendra Chahal: 29 balls, 34 runs, 2 boundaries, score = 120.6896551724138
Batsman Rohit Sharma vs Bowler Arshdeep Singh: 14 balls, 17 runs, 2 boundaries, score = 143.5296191819464
Batsman Rohit Sharma vs Bowler Trent Boult: 20 balls, 19 runs, 1 boundaries, score = 97.5
Player: Rohit Sharma (batsman) score is 126.32077603539855
Batsman Faf du Plessis vs Bowler Jasprit Bumrah: 41 balls, 61 runs, 8 boundaries, score = 158.53658536585365

  stats = df.groupby(['bowler', 'batsman']).apply(
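
The log above reports 143.56 for Rohit Sharma vs Jasprit Bumrah even though 7 runs off 6 balls is a raw rate of about 116.67. The difference comes from the smoothing branch: matchups below `min_balls` are blended with the batsman's overall totals, weighted by `beta`. The sketch below restates that branch of `batsman_score` as a standalone function (the name `smoothed_batting_score` is introduced here). The overall totals are hypothetical, since the notebook does not print them, so the blended value is not expected to reproduce the logged score.

```python
def smoothed_batting_score(n, r, b, n_o, r_o, b_o, beta=0.5, boundary_value=0.5):
    """Blend pair counts (n, r, b) with overall counts (n_o, r_o, b_o)."""
    n_s = n + beta * n_o   # smoothed balls faced
    r_s = r + beta * r_o   # smoothed runs
    b_s = b + beta * b_o   # smoothed boundaries
    return ((r_s / n_s) + (b_s / n_s) * boundary_value) * 100

# Pair counts taken from the log: 6 balls, 7 runs, 0 boundaries.
raw = smoothed_batting_score(6, 7, 0, n_o=0, r_o=0, b_o=0)

# Hypothetical overall record: 1000 balls, 1300 runs, 200 boundaries.
blended = smoothed_batting_score(6, 7, 0, n_o=1000, r_o=1300, b_o=200)

print(round(raw, 2), round(blended, 2))
```

With only 6 balls in the pair, the overall record dominates the blend, so a sparse matchup mostly reflects the batsman's general form rather than the specific bowler.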
