# Predicting the Qatar 2022 World Cup

In this exercise we build a model to predict the results of the World Cup, from the group stage through to the final. The process is laid out step by step in this notebook.

### *First step:*  
Import the required libraries: pandas (to handle the data), pickle (to load the serialized group tables), and `poisson` from `scipy.stats` (for the probability calculations).

In [29]:
import pandas as pd
import pickle
from scipy.stats import poisson

### *Second step:*  
Load the tables we are going to use. The files are available in the GitHub repository.

In [30]:
# Use a context manager so the file handle is closed after loading
with open('dict_table', 'rb') as f:
    dict_table = pickle.load(f)
df_historical_data = pd.read_csv('clean_fifa_worldcup_matches.csv')
df_fixture = pd.read_csv('clean_fifa_worldcup_fixture.csv')
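
For readers unfamiliar with pickle, here is a minimal sketch of how a dictionary of group tables like `dict_table` could be serialized and loaded back. The group name, the teams, and the in-memory buffer are illustrative assumptions; the actual `dict_table` file comes from the repository.

```python
import io
import pickle

import pandas as pd

# Hypothetical group table; the real file is assumed to hold one DataFrame per group.
example_tables = {
    'Group X': pd.DataFrame({'Team': ['Team A', 'Team B'], 'Pts': [0, 0]}),
}

# Serialize to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(example_tables, buffer)

# Rewind and load it back, just as pickle.load does with an opened file.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored['Group X'])
```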

### *Third step:*  
Calculate each team's strength, measured as its average goals scored and conceded per World Cup match.

In [31]:
df_home = df_historical_data[['HomeTeam', 'HomeGoals', 'AwayGoals']]
df_away = df_historical_data[['AwayTeam', 'HomeGoals', 'AwayGoals']]
df_away

,AwayTeam,HomeGoals,AwayGoals
0,Mexico,4,1
1,Argentina,4,2
2,Yugoslavia,6,1
3,United States,6,1
4,Belgium,1,0
...,...,...,...
895,Costa Rica,2,0
896,Switzerland,1,2
897,Brazil,0,2
898,Peru,1,0


In [32]:
df_home = df_home.rename(columns={'HomeTeam': 'Team', 'HomeGoals': 'GoalsScored', 'AwayGoals': 'GoalsConceded'})
df_away = df_away.rename(columns={'AwayTeam': 'Team', 'HomeGoals': 'GoalsConceded', 'AwayGoals': 'GoalsScored'})

In [33]:
df_team_strength = pd.concat([df_home, df_away], ignore_index=True).groupby('Team').mean()
df_team_strength

Team,GoalsScored,GoalsConceded
Algeria,1.000000,1.461538
Angola,0.333333,0.666667
Argentina,1.691358,1.148148
Australia,0.812500,1.937500
Austria,1.482759,1.620690
...,...,...
Uruguay,1.553571,1.321429
Wales,0.800000,0.800000
West Germany,2.112903,1.241935
Yugoslavia,1.666667,1.272727
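
To see what the concatenate-and-average step does, here is a small sketch on made-up results (the teams and scores are invented for illustration and are not from the dataset). Each match contributes one row per team, so the mean is taken over every match a team played, whether listed as home or away.

```python
import pandas as pd

# Invented results: Team A 2-1 Team B, then Team B 0-0 Team A.
matches = pd.DataFrame({
    'HomeTeam': ['Team A', 'Team B'],
    'AwayTeam': ['Team B', 'Team A'],
    'HomeGoals': [2, 0],
    'AwayGoals': [1, 0],
})

home = matches[['HomeTeam', 'HomeGoals', 'AwayGoals']].rename(
    columns={'HomeTeam': 'Team', 'HomeGoals': 'GoalsScored', 'AwayGoals': 'GoalsConceded'})
away = matches[['AwayTeam', 'HomeGoals', 'AwayGoals']].rename(
    columns={'AwayTeam': 'Team', 'HomeGoals': 'GoalsConceded', 'AwayGoals': 'GoalsScored'})

# One row per team per match, then the average per team.
strength = pd.concat([home, away], ignore_index=True).groupby('Team').mean()
print(strength)
```

In this toy data Team A scored 2 and 0 goals, so its `GoalsScored` average is 1.0, and it conceded 1 and 0, giving a `GoalsConceded` average of 0.5.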


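### *Fourth step:*  
Define a function that predicts how many points each team is expected to earn from a match, modelling the goals of each side with a Poisson distribution.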


In [40]:
def predict_points(home, away):
    if home in df_team_strength.index and away in df_team_strength.index:
        # Expected goals (lambda) = own average goals scored * opponent's average goals conceded
        lamb_home = df_team_strength.at[home, 'GoalsScored'] * df_team_strength.at[away, 'GoalsConceded']
        lamb_away = df_team_strength.at[away, 'GoalsScored'] * df_team_strength.at[home, 'GoalsConceded']

        prob_home, prob_away, prob_draw = 0, 0, 0

        for x in range(0, 11):  # goals scored by the home team
            for y in range(0, 11):  # goals scored by the away team
                p = poisson.pmf(x, lamb_home) * poisson.pmf(y, lamb_away)

                if x == y:
                    prob_draw += p
                elif x > y:
                    prob_home += p
                else:
                    prob_away += p

        points_home = 3 * prob_home + prob_draw
        points_away = 3 * prob_away + prob_draw

        return (points_home, points_away)
    
    # If either team has no World Cup history (or its name does not match
    # the historical data), there is nothing to base a prediction on.
    else:
        return (0, 0)
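
Written out, the calculation in `predict_points` appears to be the following. The symbols $S_t$ and $C_t$ (team $t$'s average goals scored and conceded) are introduced here for illustration, and the two goal counts are treated as independent Poisson variables:

$$
\lambda_{\text{home}} = S_{\text{home}} \cdot C_{\text{away}}, \qquad \lambda_{\text{away}} = S_{\text{away}} \cdot C_{\text{home}}
$$

$$
P(x, y) = \frac{\lambda_{\text{home}}^{x} e^{-\lambda_{\text{home}}}}{x!} \cdot \frac{\lambda_{\text{away}}^{y} e^{-\lambda_{\text{away}}}}{y!}, \qquad 0 \le x, y \le 10
$$

$$
\text{points}_{\text{home}} = 3 \sum_{x > y} P(x, y) + \sum_{x = y} P(x, y), \qquad \text{points}_{\text{away}} = 3 \sum_{x < y} P(x, y) + \sum_{x = y} P(x, y)
$$

Scorelines with more than 10 goals for either side are left out of the sums, so the three outcome probabilities add up to slightly less than 1.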

Let's test the function.

In [43]:
print(predict_points('Argentina', 'Mexico'))
print(predict_points('England', 'United States'))
print(predict_points('Qatar (H)', 'Ecuador'))

(2.3129151525530505, 0.5378377125059863)
(2.2356147635326007, 0.5922397535606193)
(0, 0)
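
The same sums can also be computed without nested loops. The sketch below is an alternative formulation, not the notebook's code: it builds the grid of scoreline probabilities with NumPy (an extra dependency assumed here) and takes the lambda values directly as arguments. The function name `expected_points` and the sample lambdas are made up for illustration.

```python
import numpy as np
from scipy.stats import poisson


def expected_points(lamb_home, lamb_away, max_goals=10):
    """Expected points for both sides under independent Poisson goal counts."""
    goals = np.arange(max_goals + 1)
    # grid[x, y] = P(home scores x) * P(away scores y)
    grid = np.outer(poisson.pmf(goals, lamb_home), poisson.pmf(goals, lamb_away))

    prob_draw = np.trace(grid)            # x == y (the diagonal)
    prob_home = np.tril(grid, -1).sum()   # x > y (below the diagonal)
    prob_away = np.triu(grid, 1).sum()    # x < y (above the diagonal)

    return 3 * prob_home + prob_draw, 3 * prob_away + prob_draw


# Illustrative lambdas, not taken from the dataset.
print(expected_points(1.5, 1.0))
```

With equal lambdas the two values coincide, and the side with the larger lambda gets the larger expected points, which is a quick sanity check on the model.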


### *Fifth step:*  
Split the fixture into the phases of the tournament.


In [44]:
# Split the World Cup fixture into its phases
df_fixture_group_48 = df_fixture[:48].copy()
df_fixture_knockout = df_fixture[48:56].copy()
df_fixture_quarter = df_fixture[56:60].copy()
df_fixture_semi = df_fixture[60:62].copy()
df_fixture_final = df_fixture[62:].copy()

Predict the group stage.

In [45]:
for group in dict_table:
    teams_in_group = dict_table[group]['Team'].values
    # Group-stage matches whose home side belongs to this group
    df_fixture_group_6 = df_fixture_group_48[df_fixture_group_48['home'].isin(teams_in_group)]

    for index, row in df_fixture_group_6.iterrows():
        home, away = row['home'], row['away']
        points_home, points_away = predict_points(home, away)
        # Add each side's expected points to its row in the group table
        dict_table[group].loc[dict_table[group]['Team'] == home, 'Pts'] += points_home
        dict_table[group].loc[dict_table[group]['Team'] == away, 'Pts'] += points_away

    # Rank by unrounded points, keep the relevant columns, then round for display
    dict_table[group] = dict_table[group].sort_values('Pts', ascending=False).reset_index()
    dict_table[group] = dict_table[group][['Team', 'Pts']]
    dict_table[group] = dict_table[group].round(0)


In [46]:
dict_table['Group A']

,Team,Pts
0,Netherlands,4.0
1,Senegal,2.0
2,Ecuador,2.0
3,Qatar (H),0.0
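
The accumulation step in the loop above can be tried on a toy table. This is a sketch under stated assumptions: the teams and the expected points per match are invented, and `reset_index(drop=True)` is used here as a shortcut instead of selecting the columns afterwards.

```python
import pandas as pd

# Hypothetical group table and expected points per match (all values invented).
table = pd.DataFrame({'Team': ['Team A', 'Team B', 'Team C'], 'Pts': [0.0, 0.0, 0.0]})
predicted = [
    ('Team A', 'Team B', 2.1, 0.6),
    ('Team B', 'Team C', 1.4, 1.3),
    ('Team A', 'Team C', 1.9, 0.8),
]

for home, away, points_home, points_away in predicted:
    # Boolean masks select each team's row; += adds the expected points.
    table.loc[table['Team'] == home, 'Pts'] += points_home
    table.loc[table['Team'] == away, 'Pts'] += points_away

# Sort on the unrounded totals first, then round for display.
table = table.sort_values('Pts', ascending=False).reset_index(drop=True)
table['Pts'] = table['Pts'].round(0)
print(table)
```

Because sorting happens before rounding, teams that show the same rounded total keep the order given by their unrounded totals. In the toy table, Team C (2.1 before rounding) stays ahead of Team B (2.0).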
