# **World Cup Qatar 2022 Final and Champion Prediction**

### *First step:*  
Import the necessary libraries: pandas (to handle the data), pickle (to load the serialized `dict_table` object), and the `poisson` distribution from SciPy (for the probability calculations).

In [1]:
import pandas as pd
import pickle
from scipy.stats import poisson

### *Second step:*  
Import the tables that we are going to use. The files are in the GitHub repository.

In [2]:
with open('dict_table', 'rb') as f:
    dict_table = pickle.load(f)
df_historical_data = pd.read_csv('clean_fifa_worldcup_matches.csv')
df_fixture = pd.read_csv('clean_fifa_worldcup_fixture.csv')

### *Third step:*  
Calculate the strength of each team. Here we measure it with two averages taken over the team's historical World Cup matches: the goals it scores per game (90 minutes) and the goals it concedes per game.

In [3]:
df_home = df_historical_data[['HomeTeam', 'HomeGoals', 'AwayGoals']]
df_away = df_historical_data[['AwayTeam', 'HomeGoals', 'AwayGoals']]

df_home = df_home.rename(columns={'HomeTeam': 'Team', 'HomeGoals': 'GoalsScored', 'AwayGoals': 'GoalsConceded'})
df_away = df_away.rename(columns={'AwayTeam': 'Team', 'HomeGoals': 'GoalsConceded', 'AwayGoals': 'GoalsScored'})

df_team_strength = pd.concat([df_home, df_away], ignore_index=True).groupby('Team').mean()
df_team_strength

Team,GoalsScored,GoalsConceded
Algeria,1.000000,1.461538
Angola,0.333333,0.666667
Argentina,1.691358,1.148148
Australia,0.812500,1.937500
Austria,1.482759,1.620690
...,...,...
Uruguay,1.553571,1.321429
Wales,0.800000,0.800000
West Germany,2.112903,1.241935
Yugoslavia,1.666667,1.272727
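
To make the reshaping concrete, here is a minimal sketch on a made-up table of three matches. The teams `A`, `B` and `C` and their scores are hypothetical and are not taken from the dataset. Each match contributes one row per team, so the group mean is the per-game average.

```python
import pandas as pd

# Hypothetical mini-dataset: three matches between made-up teams A, B and C.
toy_matches = pd.DataFrame({
    'HomeTeam': ['A', 'B', 'A'],
    'AwayTeam': ['B', 'C', 'C'],
    'HomeGoals': [2, 1, 0],
    'AwayGoals': [0, 1, 3],
})

# One row per (team, match): first seen from the home side, then from the away side.
toy_home = toy_matches[['HomeTeam', 'HomeGoals', 'AwayGoals']].rename(
    columns={'HomeTeam': 'Team', 'HomeGoals': 'GoalsScored', 'AwayGoals': 'GoalsConceded'})
toy_away = toy_matches[['AwayTeam', 'HomeGoals', 'AwayGoals']].rename(
    columns={'AwayTeam': 'Team', 'HomeGoals': 'GoalsConceded', 'AwayGoals': 'GoalsScored'})

# Stack both views and average per team.
toy_strength = pd.concat([toy_home, toy_away], ignore_index=True).groupby('Team').mean()
print(toy_strength)
```

In this toy table, team `A` scores 2 and 0 goals in its two games, so its `GoalsScored` average is 1.0, and it concedes 0 and 3, so its `GoalsConceded` average is 1.5.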


### *Fourth step:*  
We create the functions that tell us which team wins a match and that update the fixture tables. To work out the match for third place, the functions also record the loser of each match.

The first function estimates the points each team is expected to take from a match, based on the probability that one team beats the other.
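
In formula form, the function below can be summarized as follows. The symbols $\lambda_h$, $\lambda_a$, $X$ and $Y$ are notation introduced here for readability; the definitions are read directly off the code, including the cut-off at 10 goals per team.

$$
\lambda_h = \text{GoalsScored}_{\text{home}} \cdot \text{GoalsConceded}_{\text{away}}, \qquad
\lambda_a = \text{GoalsScored}_{\text{away}} \cdot \text{GoalsConceded}_{\text{home}}
$$

$$
P(X = x,\, Y = y) = \frac{\lambda_h^{x}\, e^{-\lambda_h}}{x!} \cdot \frac{\lambda_a^{y}\, e^{-\lambda_a}}{y!}, \qquad x, y \in \{0, 1, \dots, 10\}
$$

$$
\text{points}_{\text{home}} = 3\,P(X > Y) + P(X = Y), \qquad
\text{points}_{\text{away}} = 3\,P(X < Y) + P(X = Y)
$$

Here $X$ and $Y$ are the goals of the home and away team, treated as independent Poisson variables.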

In [4]:
def predict_points(home, away):
    if home in df_team_strength.index and away in df_team_strength.index:
        #lambda = goals_scored * goals_conceded
        lamb_home = df_team_strength.at[home, 'GoalsScored'] * df_team_strength.at[away, 'GoalsConceded']
        lamb_away = df_team_strength.at[away, 'GoalsScored'] * df_team_strength.at[home, 'GoalsConceded']

        prob_home, prob_away, prob_draw = 0, 0, 0

        for x in range(0, 11): #goals made by home team
            for y in range(0, 11): #goals made by away team
                p = poisson.pmf(x, lamb_home) * poisson.pmf(y, lamb_away)

                if x == y:
                    prob_draw += p
                elif x > y:
                    prob_home += p
                else:
                    prob_away += p

        points_home = 3 * prob_home + prob_draw
        points_away = 3 * prob_away + prob_draw

        return (points_home, points_away)
    
    # If either team has never played in a World Cup, there is no historical
    # data for it, so we cannot estimate its strength and return no points.
    else:
        return (0, 0)
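
As a quick sanity check of the double loop above, here is a minimal sketch that runs the same calculation on two made-up scoring rates. It uses only the standard library (the Poisson probability is written out by hand instead of calling `poisson.pmf`) so that it runs without the dataset. The helper names `poisson_pmf` and `match_probabilities` and the rates 1.5 and 1.0 are assumptions made for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals follow a Poisson(lam) distribution."""
    return lam ** k * exp(-lam) / factorial(k)

def match_probabilities(lamb_home, lamb_away, max_goals=10):
    """Return (P(home win), P(draw), P(away win)) over all scores 0..max_goals."""
    prob_home = prob_draw = prob_away = 0.0
    for x in range(max_goals + 1):        # goals made by the home team
        for y in range(max_goals + 1):    # goals made by the away team
            p = poisson_pmf(x, lamb_home) * poisson_pmf(y, lamb_away)
            if x > y:
                prob_home += p
            elif x == y:
                prob_draw += p
            else:
                prob_away += p
    return prob_home, prob_draw, prob_away

# Hypothetical expected-goal rates, chosen only for illustration.
prob_home, prob_draw, prob_away = match_probabilities(1.5, 1.0)
points_home = 3 * prob_home + prob_draw
points_away = 3 * prob_away + prob_draw

print(prob_home + prob_draw + prob_away)  # very close to 1
print(points_home > points_away)          # the side with the higher rate gets more points
```

The three probabilities add up to slightly less than 1 because scores above 10 goals are ignored. For rates in the range seen in the strength table, that truncation is negligible.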

The second function updates the fixture table of the next round with the teams coming out of the previous round.

In [5]:
def update_table(df_fixture_round_1, df_fixture_round_2):
    # Iterate over each match of the previous round and read its winner, its loser
    # and its match label (stored in the 'score' column, e.g. 'Match 61').
    for index, row in df_fixture_round_1.iterrows():
        winner = df_fixture_round_1.loc[index, 'winner']
        loser = df_fixture_round_1.loc[index, 'loser']
        match = df_fixture_round_1.loc[index, 'score']

        # Replace the 'Winners <match>' and 'Losers <match>' placeholders
        # in the next round's table with the actual teams.
        df_fixture_round_2.replace({f'Winners {match}': winner}, inplace=True)
        df_fixture_round_2.replace({f'Losers {match}': loser}, inplace=True)
    
    # Create the 'winner' and 'loser' columns filled with '?' so they are easy to find later.
    df_fixture_round_2['winner'] = '?'
    df_fixture_round_2['loser'] = '?'

    return df_fixture_round_2
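
The placeholder substitution can be seen in isolation in the following minimal sketch. The two small tables and the team names `Team A` and `Team B` are hypothetical; only the `Winners Match N` / `Losers Match N` naming follows the fixture file. `DataFrame.replace` with a dictionary swaps whole cell values, so the `Match 63` label in the `score` column is left untouched.

```python
import pandas as pd

# Hypothetical previous round with its result already filled in.
toy_semi = pd.DataFrame({
    'home': ['Team A'], 'score': ['Match 61'], 'away': ['Team B'],
    'winner': ['Team A'], 'loser': ['Team B'],
})

# Hypothetical next round that still refers to the semi-finals through placeholders.
toy_final = pd.DataFrame({
    'home': ['Losers Match 61', 'Winners Match 61'],
    'score': ['Match 63', 'Match 64'],
    'away': ['Losers Match 62', 'Winners Match 62'],
})

for _, row in toy_semi.iterrows():
    # Only cells that are exactly equal to a placeholder are changed.
    toy_final = toy_final.replace({
        f"Winners {row['score']}": row['winner'],
        f"Losers {row['score']}": row['loser'],
    })

print(toy_final)
```

The placeholders for match 62 stay in place because that match is not part of the toy previous round.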

The third function determines the winner and the loser of each knockout match.

In [6]:
def get_winner(df_fixture_updated):
    #In this case we iterate row by row since each row is a match.
    for index, row in df_fixture_updated.iterrows():
        # We get the name of the home and away team. 
        home, away = row['home'], row['away']
        
        # We use the function that we use in the group stage 
        # to see the probability that one or the other wins.
        points_home, points_away = predict_points(home, away)
        
        # As we are not interested in the points we see which one is 
        # higher and we give it the label 'winner'
        if points_home > points_away:
            winner = home
            loser = away
        else:
            winner = away
            loser = home
        
        # We replace the winner column created earlier and assign to it
        # the team that will win the match.
        df_fixture_updated.loc[index, 'winner'] = winner
        df_fixture_updated.loc[index, 'loser'] = loser

    return df_fixture_updated
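
One detail of the comparison above is worth spelling out: the home team is labeled the winner only if it has strictly more expected points. When both values are equal, which includes the `(0, 0)` returned for teams without World Cup history, the `else` branch labels the away team as the winner. A minimal sketch isolating that comparison (`pick_winner` is a hypothetical helper, not part of the notebook):

```python
def pick_winner(home, away, points_home, points_away):
    """Same comparison as in get_winner: returns (winner, loser)."""
    if points_home > points_away:
        return home, away
    # Equal points (including the 0-0 fallback) end up here.
    return away, home

print(pick_winner('Team A', 'Team B', 1.8, 0.9))  # ('Team A', 'Team B')
print(pick_winner('Team A', 'Team B', 0, 0))      # ('Team B', 'Team A')
```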

### *Fifth step:*  
We prepare the tables for the semi-finals (whose teams we already know) and for the final round.

In [7]:
df_fixture_semi = df_fixture[60:62].copy()
df_fixture_final = df_fixture[62:].copy()

df_fixture_semi

,home,score,away,year
60,Winners Match 57,Match 61,Winners Match 58,2022
61,Winners Match 59,Match 62,Winners Match 60,2022


We edit the table so that it shows the matches that we have to predict.

In [8]:
df_semi_updated = df_fixture_semi.replace({'Winners Match 57': 'Argentina', 'Winners Match 58': 'Croatia', 'Winners Match 59': 'France', 'Winners Match 60': 'Morocco'})
df_semi_updated['winner'] = '?'
df_semi_updated['loser'] = '?'

df_semi_updated

,home,score,away,year,winner,loser
60,Argentina,Match 61,Croatia,2022,?,?
61,France,Match 62,Morocco,2022,?,?


We use the function to know the winner and loser of each match.

In [9]:
get_winner(df_semi_updated)

,home,score,away,year,winner,loser
60,Argentina,Match 61,Croatia,2022,Argentina,Croatia
61,France,Match 62,Morocco,2022,France,Morocco


We update the table with the teams of the third-place match and the final (respectively).

In [10]:
update_table(df_semi_updated, df_fixture_final)

,home,score,away,year,winner,loser
62,Croatia,Match 63,Morocco,2022,?,?
63,Argentina,Match 64,France,2022,?,?


We predict the matches with the function created above.

In [11]:
get_winner(df_fixture_final)

,home,score,away,year,winner,loser
62,Croatia,Match 63,Morocco,2022,Croatia,Morocco
63,Argentina,Match 64,France,2022,France,Argentina


We edit the table to see more clearly which match is the final and which match is for third place.

In [12]:
df_final_and_third = df_fixture_final.replace({'Match 63': '3rd place match', 'Match 64': 'Final Match'})
df_final_and_third


,home,score,away,year,winner,loser
62,Croatia,3rd place match,Morocco,2022,Croatia,Morocco
63,Argentina,Final Match,France,2022,France,Argentina


As the final table shows, the World Cup standings would look like this:  

**1st Place --> France**  
**2nd Place --> Argentina**  
**3rd Place --> Croatia**  
**4th Place --> Morocco**  

Unfortunately, according to our model, Messi will not be able to lift the World Cup in what is supposed to be his last appearance in the most important football tournament.