
1. Import Libraries

In [17]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

2. Load the Dataset

This notebook uses the Fantasy Premier League 2024-2025 dataset from Kaggle.

Link: https://www.kaggle.com/datasets/meraxes10/fantasy-premier-league-dataset-2024-2025/data

In [67]:
# Load the dataset
data = pd.read_csv('players.csv')

# Data Cleaning: Handle missing values and drop irrelevant columns
data.fillna(0, inplace=True)
data = data.drop(columns=['id', 'news', 'news_added'])
print(data.head())


                           name  now_cost position     team  \
0         Fábio Ferreira Vieira        55      MID  Arsenal   
1     Gabriel Fernando de Jesus        70      FWD  Arsenal   
2  Gabriel dos Santos Magalhães        60      DEF  Arsenal   
3                   Kai Havertz        80      FWD  Arsenal   
4                     Karl Hein        40      GKP  Arsenal   

   direct_freekicks_order  saves  now_cost_rank  selected_rank_type  \
0                     0.0      0            143                 153   
1                     0.0      0             29                  26   
2                     0.0      0             81                  12   
3                     0.0      0             12                   5   
4                     0.0      0            564                  53   

   ict_index_rank  creativity_rank  ...  expected_goals_per_90  bps  \
0             322              268  ...                   0.39   67   
1              71               84  ...             
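
Before filling every missing value with 0, it can help to see which columns actually contain gaps. The sketch below is only an illustration on a made-up three-row frame (the `toy` frame and its values are assumptions, not rows from `players.csv`); it counts missing values per column and fills only the numeric ones, so that text columns are not overwritten with a number.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for players.csv (values are made up for illustration)
toy = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C"],
    "minutes": [900, np.nan, 450],
    "news": [None, "Knock", None],
})

# Count missing values per column before deciding how to fill them
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Fill only the numeric columns with 0; text columns keep their missing markers
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(0)
print(toy)
```

Whether 0 is a sensible fill value depends on the column, so this is a starting point rather than a rule.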

3. Data Pre-processing

In [68]:
# Feature Selection: Select relevant features for goals and assists prediction
features_goals = ['starts', 'expected_goal_involvements', 'expected_goals', 'minutes', 'now_cost']
features_assists = ['starts', 'expected_assists', 'minutes', 'now_cost']

For Goals

In [69]:
# Data Preparation: Split the dataset into training and testing sets for goals prediction
X_goals = data[features_goals]
y_goals = data['goals_scored']
X_train_goals, X_test_goals, y_train_goals, y_test_goals = train_test_split(X_goals, y_goals, test_size=0.2, random_state=42)

For Assists

In [70]:
# Data Preparation: Split the dataset into training and testing sets for assists prediction
X_assists = data[features_assists]
y_assists = data['assists']
X_train_assists, X_test_assists, y_train_assists, y_test_assists = train_test_split(X_assists, y_assists, test_size=0.2, random_state=42)

4. Train Model

For Goals

In [71]:
# Model Building: Train a model for goals prediction
model_goals = RandomForestRegressor()
model_goals.fit(X_train_goals, y_train_goals)

For Assists

In [72]:
# Model Building: Train a model for assists prediction
model_assists = RandomForestRegressor()
model_assists.fit(X_train_assists, y_train_assists)
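
`RandomForestRegressor()` is created here without a seed, so the trees, and therefore the scores and rankings further down, can change slightly from run to run. A minimal sketch of making the fit repeatable with `random_state` follows; the synthetic `X_demo` and `y_demo` arrays and the choice of 50 trees are assumptions for illustration only, and in the notebook the training splits would be used instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; the notebook would use X_train_goals / y_train_goals
rng = np.random.default_rng(0)
X_demo = rng.random((60, 5))
y_demo = X_demo[:, 0] * 10 + rng.normal(0, 0.5, size=60)

# Two forests built with the same seed grow the same trees
model_a = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)
model_b = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# Their predictions should therefore match
same_predictions = np.allclose(model_a.predict(X_demo), model_b.predict(X_demo))
print("Identical predictions with a fixed seed:", same_predictions)
```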

5. Evaluation

For Goals

In [73]:
# Evaluation: Evaluate the model for goals prediction
y_pred_goals = model_goals.predict(X_test_goals)
print("Goals Prediction Mean Squared Error:", mean_squared_error(y_test_goals, y_pred_goals))

Goals Prediction Mean Squared Error: 1.8985820512820517


For Assists

In [74]:
# Evaluation: Evaluate the model for assists prediction
y_pred_assists = model_assists.predict(X_test_assists)
print("Assists Prediction Mean Squared Error:", mean_squared_error(y_test_assists, y_pred_assists))

Assists Prediction Mean Squared Error: 1.2977008547008548
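
A mean squared error is easier to read once it is converted to a root mean squared error, which is in the same unit as the target (goals or assists). For the two values printed above, that works out to roughly 1.38 goals and 1.14 assists. The sketch below does that conversion and also shows, on made-up numbers, how a "predict the training mean" baseline could be scored for comparison; the `*_demo` arrays are assumptions, not data from the notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# MSE values copied from the notebook output above
mse_goals = 1.8985820512820517
mse_assists = 1.2977008547008548

# RMSE is expressed in goals / assists rather than squared goals / assists
rmse_goals = np.sqrt(mse_goals)
rmse_assists = np.sqrt(mse_assists)
print(f"Goals RMSE: {rmse_goals:.2f}, Assists RMSE: {rmse_assists:.2f}")

# Baseline on made-up targets: always predict the mean of the training targets
y_train_demo = np.array([0, 0, 1, 2, 5, 0, 3, 1])
y_test_demo = np.array([0, 4, 1, 0])
X_train_demo = np.zeros((len(y_train_demo), 1))  # features are ignored by the baseline
X_test_demo = np.zeros((len(y_test_demo), 1))

baseline = DummyRegressor(strategy="mean").fit(X_train_demo, y_train_demo)
baseline_mse = mean_squared_error(y_test_demo, baseline.predict(X_test_demo))
print("Baseline MSE on the made-up data:", baseline_mse)
```

In the notebook, the same baseline could be fitted on `y_train_goals` and scored on `y_test_goals`; a model is only useful if its error is clearly below that figure.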


6. Prediction

In [75]:
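# Prediction: Score every player with both models, then take the top 10 by predicted goals and by predicted assists
# Note: this scores all rows in data, including the rows the models were trained on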
data['predicted_goals'] = model_goals.predict(data[features_goals])
data['predicted_assists'] = model_assists.predict(data[features_assists])

top_10_goals = data.sort_values(by='predicted_goals', ascending=False).head(10)
top_10_assists = data.sort_values(by='predicted_assists', ascending=False).head(10)
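
Because the cell above calls `predict` on the full `data` frame, about 80% of the players being ranked were also used to fit the models, so their predictions will tend to sit close to their actual totals. One possible alternative, sketched below under assumptions, is out-of-fold prediction with `cross_val_predict`, where each player is scored by a model that never saw that player. The `demo` frame, its synthetic values, and the 5-fold setup are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in for the players table (column names mirror the notebook)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "minutes": rng.integers(0, 3000, size=100),
    "expected_goals": rng.random(100) * 15,
})
demo["goals_scored"] = (demo["expected_goals"] + rng.normal(0, 1, 100)).round().clip(lower=0)

features = ["minutes", "expected_goals"]

# Each row is predicted by a forest trained on the other four folds
cv = KFold(n_splits=5, shuffle=True, random_state=42)
demo["predicted_goals_oof"] = cross_val_predict(
    RandomForestRegressor(n_estimators=50, random_state=42),
    demo[features],
    demo["goals_scored"],
    cv=cv,
)

top_10 = demo.sort_values(by="predicted_goals_oof", ascending=False).head(10)
print(top_10[["expected_goals", "goals_scored", "predicted_goals_oof"]])
```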

For Goals

In [76]:
print("Top 10 Players with Most Goals:")
print(top_10_goals[['name', 'predicted_goals', 'now_cost']])

Top 10 Players with Most Goals:
                         name  predicted_goals  now_cost
185               Cole Palmer            20.24       105
362            Erling Haaland            18.90       150
339             Mohamed Salah            18.75       125
414            Alexander Isak            18.29        85
58              Ollie Watkins            17.79        90
520             Son Heung-min            17.04       100
16                Bukayo Saka            16.79       100
359                Phil Foden            16.40        95
84   Dominic Solanke-Mitchell            14.82        75
531              Jarrod Bowen            14.77        75


For Assists

In [77]:
print("Top 10 Players with Most Assists:")
print(top_10_assists[['name', 'predicted_assists', 'now_cost']])

Top 10 Players with Most Assists:
                   name  predicted_assists  now_cost
58        Ollie Watkins              13.84        90
411      Anthony Gordon              13.20        75
185         Cole Palmer              11.74       105
339       Mohamed Salah              11.74       125
16          Bukayo Saka              11.19       100
447  Morgan Gibbs-White              11.00        65
508     Brennan Johnson              10.74        65
511      James Maddison              10.63        75
12      Martin Ødegaard              10.62        85
128         Pascal Groß              10.55        65


In [78]:
# Calculate value metrics (predicted goals/cost and predicted assists/cost)
data['value_goals'] = data['predicted_goals'] / data['now_cost']
data['value_assists'] = data['predicted_assists'] / data['now_cost']

top_10_value_goals = data.sort_values(by='value_goals', ascending=False).head(10)
top_10_value_assists = data.sort_values(by='value_assists', ascending=False).head(10)

print("Top 10 Players with Best Goals Value:")
print(top_10_value_goals[['name', 'predicted_goals', 'now_cost', 'value_goals']])
print()
print("Top 10 Players with Best Assists Value:")
print(top_10_value_assists[['name', 'predicted_assists', 'now_cost', 'value_assists']])

Top 10 Players with Best Goals Value:
                              name  predicted_goals  now_cost  value_goals
461                     Chris Wood            12.94        60     0.215667
414                 Alexander Isak            18.29        85     0.215176
132  João Pedro Junqueira de Jesus            11.74        55     0.213455
58                   Ollie Watkins            17.79        90     0.197667
84        Dominic Solanke-Mitchell            14.82        75     0.197600
531                   Jarrod Bowen            14.77        75     0.196933
183                Nicolas Jackson            14.62        75     0.194933
185                    Cole Palmer            20.24       105     0.192762
338                      Luis Díaz            14.01        75     0.186800
212           Jean-Philippe Mateta            13.82        75     0.184267

Top 10 Players with Best Assists Value:
                                   name  predicted_assists  now_cost  \
411                   
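
The value metric divides a prediction by `now_cost`. Assuming `now_cost` follows the FPL convention of tenths of £1m (so 105 means £10.5m), the same ratio can be expressed as predicted goals per £1m, which is easier to read. The sketch below applies this to three rows taken from the goals-value table above; the `cost_millions` and `goals_per_million` column names are introduced here for illustration.

```python
import pandas as pd

# Three rows copied from the goals-value output above
players = pd.DataFrame({
    "name": ["Chris Wood", "Alexander Isak", "Cole Palmer"],
    "predicted_goals": [12.94, 18.29, 20.24],
    "now_cost": [60, 85, 105],
})

# Assumption: now_cost is in tenths of £1m, so divide by 10 to get £m
players["cost_millions"] = players["now_cost"] / 10

# Predicted goals per £1m spent
players["goals_per_million"] = players["predicted_goals"] / players["cost_millions"]

ranked = players.sort_values(by="goals_per_million", ascending=False)
print(ranked[["name", "cost_millions", "goals_per_million"]].round(3))
```

The ordering is the same as with `value_goals`, since the two metrics differ only by a factor of 10.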

For Final Team Predictions

In [80]:
# Create team-level features
team_features = data.groupby('team').agg({
    'goals_scored': 'sum',
    'assists': 'sum',
    'clean_sheets': 'sum',
    'goals_conceded': 'mean'
}).reset_index()

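# Note: goals_scored is a team total (sum over players) while goals_conceded is a per-player mean,
# so this score combines two different scales
team_features['performance_score'] = team_features['goals_scored'] - team_features['goals_conceded']

# Feature Engineering: Create features for team performance prediction
X = team_features[['goals_scored', 'assists', 'clean_sheets', 'goals_conceded']]
# Proxy for team strength or performance; it is computed directly from two of the columns in X
y = team_features['performance_score']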

# Data Preparation: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model Building: Train a model for team performance prediction
model_team_performance = RandomForestRegressor()
model_team_performance.fit(X_train, y_train)

# Evaluation: Evaluate the model
y_pred = model_team_performance.predict(X_test)
print("Team Performance Prediction Mean Squared Error:", mean_squared_error(y_test, y_pred))

# Prediction: Predict team performance based on the latest data
latest_team_data = team_features  # Assume this is your latest data
predicted_performance = model_team_performance.predict(latest_team_data[['goals_scored', 'assists', 'clean_sheets', 'goals_conceded']])

# Add predictions to the latest team data
latest_team_data['predicted_performance'] = predicted_performance

# Sort by predicted performance
sorted_team_data = latest_team_data.sort_values(by='predicted_performance', ascending=False)

print("Predicted Team Performance:")
print(sorted_team_data[['team', 'predicted_performance']])


Team Performance Prediction Mean Squared Error: 37.24668378817361
Predicted Team Performance:
              team  predicted_performance
12        Man City              73.002709
0          Arsenal              68.032680
11       Liverpool              65.199924
5          Chelsea              63.091813
1      Aston Villa              62.777362
14       Newcastle              62.584085
17           Spurs              52.412669
13         Man Utd              30.296475
4         Brighton              29.311284
19          Wolves              27.624995
15   Nott'm Forest              26.553773
2      Bournemouth              25.695155
6   Crystal Palace              25.603664
18        West Ham              22.589912
3        Brentford              22.522481
7          Everton              16.901005
8           Fulham              16.320279
16     Southampton               4.192197
10       Leicester               4.177736
9          Ipswich               1.151234
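
Since `performance_score` is defined as `goals_scored - goals_conceded`, and both columns are also model inputs, the forest is effectively learning a formula that is already known. With 20 teams and `test_size=0.2`, the reported error is also measured on only 4 teams. As a point of comparison, the sketch below ranks teams by computing the score directly, with no model; the `toy_players` rows are made up for illustration.

```python
import pandas as pd

# Made-up player rows; column names mirror the notebook's
toy_players = pd.DataFrame({
    "team": ["A", "A", "B", "B", "C", "C"],
    "goals_scored": [10, 5, 3, 2, 8, 1],
    "goals_conceded": [20, 30, 40, 50, 25, 35],
})

# Same aggregation as the notebook: total goals scored, mean goals conceded
team_toy = toy_players.groupby("team").agg(
    goals_scored=("goals_scored", "sum"),
    goals_conceded=("goals_conceded", "mean"),
).reset_index()

# The score is a direct calculation, so no regressor is needed to rank teams by it
team_toy["performance_score"] = team_toy["goals_scored"] - team_toy["goals_conceded"]
ranking = team_toy.sort_values(by="performance_score", ascending=False)
print(ranking[["team", "performance_score"]])
```

A model becomes more informative here when the target is something not derivable from its inputs, for example points from a previous season, if such a column were available.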


This notebook is only an example of how one might approach predicting the season's top individual goal scorers and assist providers, the best defensive side, and so on, and from those, the likely winners. The final results will also depend on factors not captured here, such as players' current form, injury status, and new signings.