# What are the chances that Sweden wins its qualification group?

## Relevant information: 
* Group teams: Sweden, Switzerland, Kosovo, Slovenia
* Each team plays 6 games in total, 2 against each opponent (one home, one away).

## Strategy:

### Step 1: Match result probability (MRP) method:
* Find a method for determining the probability of each team scoring k goals (e.g. a Poisson distribution)
* Find an evaluation method, then evaluate the model on historical data

#### Elo and Poisson method:
* Calculate Elo ratings following https://www.kaggle.com/code/thomasstokes/custom-football-elo-rating
* Fit two linear regression models that map the Elo rating difference to the goals a team scores: one model for the home team and one for the away team (this way the model incorporates home advantage)
* Then use each model's prediction as the expected goals (the mean) of a Poisson distribution
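The Poisson step can be sketched as follows. This is a minimal illustration, not the `MRP_Poisson_Dist` implementation from `util` (which is not shown in this notebook). The function names, the example expected-goal values, and the assumption that the two teams' goal counts are independent are all hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_home: float, lam_away: float, max_goals: int = 10):
    """Return (P(home win), P(draw), P(away win)).

    Assumption: home and away goals are independent Poisson variables.
    Scorelines above max_goals are ignored and the result is renormalised.
    """
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    total = p_home + p_draw + p_away
    return p_home / total, p_draw / total, p_away / total


# Hypothetical expected goals, e.g. as predicted by the two regressions
probs = outcome_probabilities(lam_home=1.6, lam_away=1.1)
print(probs)
```

Truncating at `max_goals` keeps the double loop small; with expected goals in the usual football range the discarded probability mass is negligible.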

#### Evaluation method:


### Step 2: Monte Carlo (MC) simulation of the qualification round:
* Run a Monte Carlo simulation of the group stage and count how often Sweden finishes on top.
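Deciding who "finishes on top" requires ranking the group table after each simulated run. A rough sketch of that ranking is below. `rank_group` is a hypothetical helper, not the `get_placement` function from `util`, and the tie-break order (points, then goal difference, then goals scored) is an assumption; any remaining tie is left in first-appearance order.

```python
from collections import defaultdict


def rank_group(results):
    """Rank teams from a list of (home_team, away_team, home_goals, away_goals).

    Assumed rules: 3 points for a win, 1 for a draw, 0 for a loss;
    ties broken by goal difference, then goals scored.
    """
    table = defaultdict(lambda: {"points": 0, "goal_diff": 0, "goals_for": 0})
    for home, away, home_goals, away_goals in results:
        table[home]["goals_for"] += home_goals
        table[away]["goals_for"] += away_goals
        table[home]["goal_diff"] += home_goals - away_goals
        table[away]["goal_diff"] += away_goals - home_goals
        if home_goals > away_goals:
            table[home]["points"] += 3
        elif home_goals < away_goals:
            table[away]["points"] += 3
        else:
            table[home]["points"] += 1
            table[away]["points"] += 1
    return sorted(
        table,
        key=lambda t: (table[t]["points"], table[t]["goal_diff"], table[t]["goals_for"]),
        reverse=True,
    )


# Made-up results, purely for illustration
example_results = [("A", "B", 2, 0), ("B", "C", 1, 0), ("C", "A", 1, 1)]
print(rank_group(example_results))
```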


## Load data

In [1]:
import kagglehub
import pandas as pd

# Download the latest version of the dataset
path = kagglehub.dataset_download("martj42/international-football-results-from-1872-to-2017")
df = pd.read_csv(path+"/results.csv")

# The most recent match date shows how current the data is
print("Last updated: " + df['date'].max())

Last updated: 2025-06-10


## Make elo columns

New columns info:
* The new columns "home_elo" and "away_elo" hold each team's Elo rating after the match
* The new column "elo_diff" holds the Elo difference before the match (home Elo - away Elo)
* The new column "score_diff" holds the score difference (home_score - away_score)
* The new column "importance" quantifies the importance of the match

In [2]:
from util import game_importance_score, update_elo

### --- Append importance column --- ###
df['importance'] = df.apply(game_importance_score, axis = 1)

### --- Append elo columns --- ###
all_countries = pd.unique(pd.concat([df['home_team'], df['away_team']]))
elo_home = []
elo_away = []
elo_diff = []

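# Every team starts at the same rating
current_elo = {country: 1500 for country in all_countries}

# Ratings are updated match by match, so rows are assumed to be in chronological order
for _, row in df.iterrows():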
    home_team = row.home_team
    away_team = row.away_team
    home_score = row.home_score
    away_score = row.away_score
    i = row.importance
    
    home_current_elo = current_elo[home_team]
    away_current_elo = current_elo[away_team]
    current_elo_diff = home_current_elo-away_current_elo

    (new_home_elo, new_away_elo) = update_elo(home_current_elo, away_current_elo, (home_score-away_score), i)

    # Update elo
    current_elo[home_team] = new_home_elo
    current_elo[away_team] = new_away_elo
    elo_home.append(new_home_elo)
    elo_away.append(new_away_elo)
    elo_diff.append(current_elo_diff)


df['home_elo'] = elo_home
df['away_elo'] = elo_away
df['elo_diff'] = elo_diff
df['score_diff'] = df['home_score']-df['away_score']
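
The `update_elo` helper is imported from `util` and its body is not shown here. For orientation, a textbook Elo update looks roughly like the sketch below. The function name `elo_update_sketch`, the base factor `k=20`, and the use of `importance` as a plain multiplier are assumptions for illustration, and the sketch uses only the sign of the score difference, which may differ from what `update_elo` does.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update_sketch(home_elo, away_elo, score_diff, importance, k=20.0):
    """Return (new_home_elo, new_away_elo) after one match.

    Assumptions: k is a hypothetical base factor, importance scales it
    linearly, and only win/draw/loss (not the margin) is used.
    """
    if score_diff > 0:
        actual_home = 1.0
    elif score_diff == 0:
        actual_home = 0.5
    else:
        actual_home = 0.0
    delta = k * importance * (actual_home - expected_score(home_elo, away_elo))
    # Zero-sum update: what one team gains, the other loses
    return home_elo + delta, away_elo - delta


new_home, new_away = elo_update_sketch(1500, 1500, score_diff=2, importance=1.0)
print(new_home, new_away)
```

Because the update is zero-sum, the sum of the two ratings is unchanged by a match.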


#### List current elo in descending order

In [3]:
countries = list(current_elo.keys())
elo = list(current_elo.values())
df_current_elo = pd.DataFrame(data={'Country': countries, 'Elo': elo})
print(df_current_elo.sort_values('Elo', ascending=False).head(5))

      Country          Elo
8   Argentina  2130.339355
35      Spain  2121.628003
10     France  2110.234101
31     Brazil  2045.682411
1     England  2022.801725


## Step 1:

### Fit model:

In [4]:
from util import MRP_Poisson_Dist

mrp_model = MRP_Poisson_Dist()
mrp_model.fit(elo_diff=df[['elo_diff']], home_score=df['home_score'], away_score=df['away_score'])
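
`MRP_Poisson_Dist` lives in `util` and is not shown here. The strategy describes it as two linear regressions on the Elo difference, so a minimal stand-in could look like the sketch below. The class name `TwoLineGoalModel`, its methods, the lower clamp on expected goals, and the toy data are all hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


class TwoLineGoalModel:
    """Hypothetical stand-in: one regression line per side (home, away)."""

    def fit(self, elo_diff, home_score, away_score):
        self.home_line = fit_line(elo_diff, home_score)
        self.away_line = fit_line(elo_diff, away_score)
        return self

    def expected_goals(self, elo_diff, floor=0.01):
        home_slope, home_intercept = self.home_line
        away_slope, away_intercept = self.away_line
        # A Poisson mean must be positive, so clamp the linear prediction
        lam_home = max(home_slope * elo_diff + home_intercept, floor)
        lam_away = max(away_slope * elo_diff + away_intercept, floor)
        return lam_home, lam_away


# Toy data, made up for illustration only
toy_elo_diff = [-200, -100, 0, 100, 200]
toy_home_goals = [0, 1, 1, 2, 3]
toy_away_goals = [2, 2, 1, 1, 0]

toy_model = TwoLineGoalModel().fit(toy_elo_diff, toy_home_goals, toy_away_goals)
print(toy_model.expected_goals(100))
```

The clamp matters because a straight line can predict zero or negative goals for large rating gaps, which is not a valid Poisson mean.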

## Step 2:

### Group definition

In [5]:
group_b_matches = {
    'home_team': [
        'Slovenia', 'Switzerland', 'Kosovo', 'Switzerland', 'Kosovo', 'Sweden',
        'Slovenia', 'Sweden', 'Slovenia', 'Switzerland', 'Kosovo', 'Sweden',
    ],
    'away_team': [
        'Sweden', 'Kosovo', 'Sweden', 'Slovenia', 'Slovenia', 'Switzerland',
        'Switzerland', 'Kosovo', 'Kosovo', 'Sweden', 'Switzerland', 'Slovenia',
    ],
}

df_group_b = pd.DataFrame(data=group_b_matches)
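
Since the fixture list is typed by hand, a quick sanity check that it really is a double round-robin (every team hosts every other team exactly once) can catch typos. The helper `is_double_round_robin` below is a hypothetical addition, and the fixture lists are repeated so the snippet runs on its own.

```python
from itertools import permutations


def is_double_round_robin(home_teams, away_teams):
    """True if every ordered (home, away) pair of distinct teams occurs exactly once."""
    fixtures = list(zip(home_teams, away_teams))
    teams = set(home_teams) | set(away_teams)
    expected = set(permutations(teams, 2))
    return len(fixtures) == len(expected) and set(fixtures) == expected


home = ['Slovenia', 'Switzerland', 'Kosovo', 'Switzerland', 'Kosovo', 'Sweden',
        'Slovenia', 'Sweden', 'Slovenia', 'Switzerland', 'Kosovo', 'Sweden']
away = ['Sweden', 'Kosovo', 'Sweden', 'Slovenia', 'Slovenia', 'Switzerland',
        'Switzerland', 'Kosovo', 'Kosovo', 'Sweden', 'Switzerland', 'Slovenia']

fixtures_ok = is_double_round_robin(home, away)
print(fixtures_ok)
```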

### MC simulation

In [6]:
from util import simulate_group_play, get_placement
N = 100  # number of simulated group stages
number_of_qualifications = 0  # simulations in which Sweden finishes first

for _ in range(N):
    match_res = simulate_group_play(df_group_b, current_elo, mrp_model)
    placement = get_placement(match_res)

    if placement[0] == 'Sweden':
        number_of_qualifications += 1


print(f'Chance of qualifying: {number_of_qualifications/N}')

Chance of qualifying: 0.4
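
With only N = 100 runs the estimate is fairly noisy. Treating each run as an independent win-or-not trial (an assumption), the binomial standard error gives a rough idea of the uncertainty and of how many runs a tighter estimate would need. The helper names below are hypothetical.

```python
import math


def binomial_standard_error(p_hat: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


def normal_interval(p_hat: float, n: int, z: float = 1.96):
    """Approximate 95% interval using the normal approximation."""
    se = binomial_standard_error(p_hat, n)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


def runs_needed(p_hat: float, target_se: float) -> int:
    """Number of runs needed to reach a given standard error."""
    return math.ceil(p_hat * (1.0 - p_hat) / target_se ** 2)


se = binomial_standard_error(0.4, 100)   # about 0.049
low, high = normal_interval(0.4, 100)    # roughly 0.30 to 0.50
print(se, low, high, runs_needed(0.4, 0.01))
```

So the 0.4 above is compatible with anything from about 0.3 to 0.5, and a standard error of one percentage point would take a few thousand runs.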
