# What are the chances that Sweden wins its qualification group?

## Relevant information: 
* Group teams: Sweden, Switzerland, Kosovo, Slovenia
* Each team plays 6 games in total, 2 against each of the other teams (one home, one away).
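
As a quick check of the fixture count, the double round-robin can be enumerated directly. This is only a sketch using the standard library; the variable names are illustrative and not part of the notebook's own code.

```python
from itertools import permutations

teams = ["Sweden", "Switzerland", "Kosovo", "Slovenia"]

# Every ordered (home, away) pair occurs once, so each pairing is played twice.
fixtures = list(permutations(teams, 2))

# Count how many fixtures each team appears in, home or away.
games_per_team = {t: sum(t in match for match in fixtures) for t in teams}

print(len(fixtures))      # 12 matches in the group
print(games_per_team)     # 6 games for every team
```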

## Strategy:

### Step 1: Match result probability (MRP) method
* Find a method for determining the probability that each team scores k goals (e.g. a Poisson distribution)
* Find an evaluation method, then evaluate it on historical data
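
To make the first bullet concrete, here is a minimal sketch of how Poisson goal probabilities turn into win/draw/loss probabilities. It assumes the two goal counts are independent, and the means 1.6 and 1.1 are purely illustrative, not fitted values. The function names are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probabilities(lam_home: float, lam_away: float, max_goals: int = 10):
    """Return (P(home win), P(draw), P(away win)) for independent Poisson goal counts.

    Scores above max_goals are ignored, so the three values sum to slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Illustrative expected goals only (assumption, not model output).
print(outcome_probabilities(1.6, 1.1))
```

Probabilities of this kind are also what an evaluation on historical data could score against the actual results.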

#### Elo and Poisson method:
* Calculate Elo ratings in accordance with https://www.kaggle.com/code/thomasstokes/custom-football-elo-rating
* Fit two linear regression models that map the Elo rating difference to the goals scored by a team: one model for the home team and one for the away team (this way home advantage is incorporated in the model)
* Then use each model's predicted value as the expected goals (the mean) of a Poisson distribution
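
The steps above can be sketched with the standard library alone. This is not the notebook's `MRP_Poisson_Dist` implementation, which lives in `util` and may differ. The helper names, the toy data and the `min_rate` floor are all assumptions made for illustration.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x. Returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_match(elo_diff, home_fit, away_fit, rng, min_rate=0.05):
    """Sample (home_goals, away_goals) for one match.

    A linear fit can predict a negative mean for large rating gaps, so the
    rate is floored at min_rate (an assumption of this sketch).
    """
    lam_home = max(min_rate, home_fit[0] + home_fit[1] * elo_diff)
    lam_away = max(min_rate, away_fit[0] + away_fit[1] * elo_diff)
    return sample_poisson(lam_home, rng), sample_poisson(lam_away, rng)

# Toy data for illustration only: (elo_diff, home goals, away goals).
toy = [(-300, 0, 3), (-100, 1, 2), (0, 1, 1), (100, 2, 1), (300, 3, 0)]
elo_diffs = [row[0] for row in toy]
home_fit = fit_line(elo_diffs, [row[1] for row in toy])
away_fit = fit_line(elo_diffs, [row[2] for row in toy])

rng = random.Random(0)
print(simulate_match(50, home_fit, away_fit, rng))
```

Fitting separate home and away lines means the two intercepts can differ, which is where home advantage would show up on real data.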

#### Evaluation method:


### Step 2 (Monte Carlo simulation of the qualification round):
* Run a Monte Carlo simulation of the group stage and count how often Sweden finishes on top.
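
A minimal sketch of such a simulation is shown below. The match model `toy_result` is a placeholder that ignores team strength, the function names are hypothetical, and the tiebreak order (points, goal difference, goals scored, then a random draw) is an assumption that may not match the real competition rules. Unlike the notebook's `get_placement`, this sketch does not update Elo ratings between matches.

```python
import random
from itertools import permutations

def simulate_group(teams, sample_result, rng):
    """Play a double round-robin once and return the group winner."""
    table = {t: {"points": 0, "gd": 0, "gf": 0} for t in teams}
    for home, away in permutations(teams, 2):
        hg, ag = sample_result(home, away, rng)
        table[home]["gd"] += hg - ag
        table[away]["gd"] += ag - hg
        table[home]["gf"] += hg
        table[away]["gf"] += ag
        if hg > ag:
            table[home]["points"] += 3
        elif hg < ag:
            table[away]["points"] += 3
        else:
            table[home]["points"] += 1
            table[away]["points"] += 1
    # Rank by points, goal difference, goals scored, then a random draw (assumed tiebreaks).
    return max(
        teams,
        key=lambda t: (table[t]["points"], table[t]["gd"], table[t]["gf"], rng.random()),
    )

def win_probability(team, teams, sample_result, n_sims=10_000, seed=0):
    """Estimate P(team wins the group) as the fraction of simulated groups it wins."""
    rng = random.Random(seed)
    wins = sum(simulate_group(teams, sample_result, rng) == team for _ in range(n_sims))
    return wins / n_sims

def toy_result(home, away, rng):
    """Placeholder match model: both sides score 0-3 goals uniformly at random."""
    return rng.randint(0, 3), rng.randint(0, 3)

teams = ["Sweden", "Switzerland", "Kosovo", "Slovenia"]
print(win_probability("Sweden", teams, toy_result, n_sims=2_000))
```

With a real match model, `sample_result` would look up the teams' ratings and draw the score from the fitted Poisson distributions.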


## Load data

In [1]:
import kagglehub
import pandas as pd

# Download latest version
path = kagglehub.dataset_download("martj42/international-football-results-from-1872-to-2017")
df = pd.read_csv(path+"/results.csv")
df['tournament'].str.lower().unique()

print("Last updated: " + df['date'].max())

Last updated: 2025-06-10


## Make elo columns

New columns info:
* The new columns `home_elo` and `away_elo` hold the Elo rating each team has after the match
* The new column `elo_diff` holds the Elo difference before the match (home Elo minus away Elo)
* The new column `score_diff` holds the score difference (`home_score` minus `away_score`)
* The new column `importance` quantifies the importance of the match
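
For orientation, a standard Elo update looks roughly like the sketch below. The notebook's own `update_elo_win` and `update_elo_draw` live in `util` and are not shown, so this is not their implementation. The function names, `k=20` and the multiplicative use of `importance` are assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, score_a, importance=1.0, k=20.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1 for a win by A, 0.5 for a draw and 0 for a loss.
    The step size is k * importance (assumed form of the importance weighting).
    """
    delta = k * importance * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

print(update_elo(1500, 1500, 1.0))   # equal ratings, A wins -> (1510.0, 1490.0)
print(update_elo(1500, 1500, 0.5))   # equal ratings, draw -> ratings unchanged
```

Because the update is zero-sum, whatever one team gains the other loses, and a higher `importance` simply scales the size of the exchange.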

In [16]:
from util import game_importance_score, update_elo_draw, update_elo_win

### --- Append importance column --- ###
df['importance'] = df.apply(game_importance_score, axis = 1)

### --- Append elo columns --- ###
all_countries = pd.unique(pd.concat([df['home_team'], df['away_team']]))
elo_home = []
elo_away = []
elo_diff = []

current_elo = {}
for country in all_countries:
    current_elo[country] = 1500

for index,row in df.iterrows():
    home_team = row.home_team
    away_team = row.away_team
    home_score = row.home_score
    away_score = row.away_score
    i = row.importance
    score_diff = home_score-away_score
    
    home_current_elo = current_elo[home_team]
    away_current_elo = current_elo[away_team]
    current_elo_diff = home_current_elo-away_current_elo

    # As called here, update_elo_win takes (winner_elo, loser_elo, importance)
    # and returns the new ratings in the same (winner, loser) order.
    if score_diff > 0:
        new_home_elo, new_away_elo = update_elo_win(home_current_elo, away_current_elo, i)
    elif score_diff == 0:
        new_home_elo, new_away_elo = update_elo_draw(home_current_elo, away_current_elo, i)
    else:
        new_away_elo, new_home_elo = update_elo_win(away_current_elo, home_current_elo, i)

    # Update elo
    current_elo[home_team] = new_home_elo
    current_elo[away_team] = new_away_elo
    elo_home.append(new_home_elo)
    elo_away.append(new_away_elo)
    elo_diff.append(current_elo_diff)


df['home_elo'] = elo_home
df['away_elo'] = elo_away
df['elo_diff'] = elo_diff
df['score_diff'] = df['home_score']-df['away_score']


#### List current elo in descending order

In [17]:
countries = list(current_elo.keys())
elo = list(current_elo.values())
df_current_elo = pd.DataFrame(data={'Country': countries, 'Elo': elo})
print(df_current_elo.sort_values('Elo', ascending=False).head(5))

      Country          Elo
8   Argentina  2130.339355
35      Spain  2121.628003
10     France  2110.234101
31     Brazil  2045.682411
1     England  2022.801725


## Step 1:

### Fit model:

In [18]:
from util import MRP_Poisson_Dist

mrp_model = MRP_Poisson_Dist()
mrp_model.fit(elo_diff=df[['elo_diff']], home_score=df['home_score'], away_score=df['away_score'])

#### Match result function

In [34]:
test_match = df[df['home_team']=='Sweden'].iloc[528]
print(f'Home: {test_match.home_team}, Away: {test_match.away_team}, Date: {test_match.date}')
print(f'Home elo: {test_match.home_elo}, Away elo: {test_match.away_elo}')
print('')

(home_goals, away_goals) = mrp_model.random_res([[test_match.elo_diff]])

print(f"Simulated result: Home {home_goals} - {away_goals} Away")

Home: Sweden, Away: Algeria, Date: 2025-06-10
Home elo: 1790.3248350290453, Away elo: 1783.7272618705604

Simulated result: Home [2] - [3] Away




## Step 2:

### Simulate group play

In [None]:
group_b_matches = {
    'Home Team': [
        'Slovenia', 'Switzerland', 'Kosovo', 'Switzerland', 'Kosovo', 'Sweden',
        'Slovenia', 'Sweden', 'Slovenia', 'Switzerland', 'Kosovo', 'Sweden',
    ],
    'Away Team': [
        'Sweden', 'Kosovo', 'Sweden', 'Slovenia', 'Slovenia', 'Switzerland',
        'Switzerland', 'Kosovo', 'Kosovo', 'Sweden', 'Switzerland', 'Slovenia',
    ],
}

df_group_b = pd.DataFrame(data=group_b_matches)

print(df_group_b)

      Home Team    Away Team
0      Slovenia       Sweden
1   Switzerland       Kosovo
2        Kosovo       Sweden
3   Switzerland     Slovenia
4        Kosovo     Slovenia
5        Sweden  Switzerland
6      Slovenia  Switzerland
7        Sweden       Kosovo
8      Slovenia       Kosovo
9   Switzerland       Sweden
10       Kosovo  Switzerland
11       Sweden     Slovenia


In [None]:
# Takes a DataFrame with all group matches in playing order
def get_placement(df: pd.DataFrame, starting_elo: dict):
    current_elo = starting_elo.copy()

    for row in df.iterrows():
        
