# Ultra Naive Mundial Predictor
#### Author: Mikołaj Zabiegliński
#### https://github.com/m-zabieglinski/
The goal of the assignment was only to predict a win, loss, or draw for every match. Probabilities for each outcome were not required, so no advanced model was necessary: a single hard prediction per match is enough.

<br/><br/> 

## Predicting model

The Ultra Naive Mundial Predictor uses the latest official FIFA ranking (October 6th, 2022). It uses 2 functions to predict a result for every match:
#### pick_winner(Team_1, Team_2, ranking)
Always picks the team rated higher in the ranking to win.
#### pick_winner_or_draw(Team_1, Team_2, delta, ranking)
Picks the team rated higher in the ranking to win, OR predicts a draw if the rating difference between the teams is below a certain threshold (delta).
For the purpose of this project, delta was set so that the 9 group-stage matches with the smallest rating difference are predicted as draws (based on the ultra naive observation that 9 of the 48 group-stage matches at the previous World Cup were drawn).
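
The two rules can be illustrated with a minimal sketch on made-up data. The team names, the ratings, and the helper `predict` are hypothetical and only stand in for the notebook's pandas-based functions defined further down.

```python
# Toy ratings; the values are made up for illustration only.
toy_ratings = {"Alpha": 1800, "Beta": 1750, "Gamma": 1500}

def predict(team_1, team_2, ratings, delta=None):
    """Return the higher-rated team, or "Draw" if the gap is below delta."""
    diff = ratings[team_1] - ratings[team_2]
    # Group-stage rule: a gap smaller than delta counts as a draw.
    if delta is not None and abs(diff) < delta:
        return "Draw"
    # Playoff rule (delta=None): the higher-rated team always wins.
    return team_1 if diff > 0 else team_2

print(predict("Alpha", "Beta", toy_ratings))             # Alpha
print(predict("Alpha", "Beta", toy_ratings, delta=73))   # Draw
print(predict("Gamma", "Alpha", toy_ratings, delta=73))  # Alpha
```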

<br/><br/> 

## Results
The model does not adjust itself after any match. It uses only the latest FIFA ranking for every game, up to and including the final. Hence, it predictably picks Brazil (the #1 team) as the overall winner.
The results for each stage (8 groups and brackets) are in properly described cells in this notebook, which also runs the entire code necessary to get the predictions.

## Other comments
Each code cell contains comments describing the thought process behind each step as well as its execution. The functions are likewise commented.

In [3]:
"""
FIFA worldcup ranking for 06.10.2022 was manually copied from https://www.11v11.com .
This was much easier and faster than trying to scrape either that site or the FIFA website.
Example link that could be scraped off: https://www.11v11.com/teams/indonesia/option/ranking/
"""
import pandas as pd

fifa_ranking = pd.read_csv("worldcup_ranking.csv", sep = "\t", header = None)
fifa_ranking

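|     | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | Brazil | 1841 | 0 | 4 |
| 1 | 2 | Belgium | 1817 | 0 | -5 |
| 2 | 3 | Argentina | 1774 | 0 | 3 |
| 3 | 4 | France | 1760 | 0 | -5 |
| 4 | 5 | England | 1728 | 0 | -9 |
| ... | ... | ... | ... | ... | ... |
| 206 | 207 | Sri Lanka | 825 | 0 | 0 |
| 207 | 208 | US Virgin Islands | 824 | 0 | 0 |
| 208 | 209 | British Virgin Islands | 809 | 0 | 0 |
| 209 | 210 | Anguilla | 791 | 0 | 0 |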


In [4]:
"""
Cleaning the dataframe.
"""
fifa_ranking.drop(columns = [0, 3, 4], inplace = True)
fifa_ranking.rename(columns = {1: "Team", 2: "Rating"}, inplace = True)
fifa_ranking.Team = fifa_ranking.Team.apply(lambda x: x.strip())
fifa_ranking

|     | Team | Rating |
| --- | --- | --- |
| 0 | Brazil | 1841 |
| 1 | Belgium | 1817 |
| 2 | Argentina | 1774 |
| 3 | France | 1760 |
| 4 | England | 1728 |
| ... | ... | ... |
| 206 | Sri Lanka | 825 |
| 207 | US Virgin Islands | 824 |
| 208 | British Virgin Islands | 809 |
| 209 | Anguilla | 791 |


In [5]:
"""
Got the rankings. Now the easy part: predictions.
I will manually enter every group/playoffs match in every cell.

I will use the most trivial system:
-never update the rankings based on the worldcup results
-in playoffs:
    -always pick the higher ranked team to win
-in groups:
    -only pick a draw if the rating difference between the teams is smaller than the specified threshold
    -otherwise, always pick the higher ranked team to win
"""

def pick_winner(Team_1, Team_2, ranking):
    """
    The function picks the winner between 2 input teams, does not allow draws. 
    It always picks the team rated higher in the fifa_ranking dataframe.
    It is assumed that no teams have the exact same rating.
    Example:
        >> pick_winner("Brazil", "San Marino", fifa_ranking)
        >> "Brazil"
    """
    rating_1 = ranking[ranking.Team == Team_1].Rating.iloc[0]
    rating_2 = ranking[ranking.Team == Team_2].Rating.iloc[0]
    
    if rating_1 > rating_2:
        return Team_1
    else:
        return Team_2
    
def pick_winner_or_draw(Team_1, Team_2, delta, ranking):
    """
    The function picks the winner between 2 input teams or determines a draw.
    If the absolute rating difference between the 2 teams is lower than delta, a draw is predicted.
    Otherwise, it works like pick_winner().
    Example:
        >> pick_winner_or_draw("Brazil", "San Marino", 10000, fifa_ranking)
        >> "Draw"
        >> pick_winner_or_draw("Brazil", "San Marino", 1, fifa_ranking)
        >> "Brazil"
    """
    rating_1 = ranking[ranking.Team == Team_1].Rating.iloc[0]
    rating_2 = ranking[ranking.Team == Team_2].Rating.iloc[0]
    rating_difference = rating_1 - rating_2
    
    if abs(rating_difference) < delta:
        return "Draw"
    elif rating_difference > 0:
        return Team_1
    else:
        return Team_2

In [6]:
"""
Setting the groups up:
The groups were set manually as Python sets with country names.
get_pairs() creates all 2-team subsets of a group's set of teams,
and get_teams_from_pair() splits such a subset into the 2 individual teams of a match.
"""
import itertools

def get_pairs(teams):
    """
    Creates a list of all 2-element subsets of the set teams.
    Example:
    >> get_pairs(set(["Qatar", "Ecuador", "Senegal", "Netherlands"]))
    >> [{'Ecuador', 'Senegal'},
        {'Ecuador', 'Netherlands'},
        {'Ecuador', 'Qatar'},
        {'Netherlands', 'Senegal'},
        {'Qatar', 'Senegal'},
        {'Netherlands', 'Qatar'}]
    """
    return list(map(set, itertools.combinations(teams, 2)))

def get_teams_from_pair(pair):
    """
    Returns the 2 items of a subset. Meant to be used only on subsets of 2 items.
    Note that the items are popped, so the input set is empty afterwards.
    Example:
    >> get_teams_from_pair({'Ecuador', 'Senegal'})
    >> ('Ecuador', 'Senegal')
    """
    team_1 = pair.pop()
    team_2 = pair.pop()
    return team_1, team_2
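
Because the pairs above are Python sets, which team ends up as "Team 1" depends on set iteration order, and for strings that order can differ between interpreter runs (string hashing is randomized by default). The predictions themselves are unaffected. A minimal sketch of a reproducible alternative follows; `get_ordered_pairs` is a hypothetical helper, not part of the notebook.

```python
from itertools import combinations

def get_ordered_pairs(teams):
    """Return every 2-team pairing as a tuple, in a reproducible order."""
    # Sorting first removes the dependence on set iteration order.
    return list(combinations(sorted(teams), 2))

pairs = get_ordered_pairs({"Qatar", "Ecuador", "Senegal", "Netherlands"})
for team_1, team_2 in pairs:
    # Tuples unpack directly, so no pop() is needed and nothing is mutated.
    print(team_1, "vs", team_2)
```

With tuples, the pairs can also be iterated more than once, since unpacking does not empty them.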

In [7]:
"""
Setting up the teams in groups, manually.
"""
group_A_teams = set(["Qatar", "Ecuador", "Senegal", "Netherlands"])
group_B_teams = set(["England", "Iran", "USA", "Wales"])
group_C_teams = set(["Argentina", "Saudi Arabia", "Mexico", "Poland"])
group_D_teams = set(["France", "Denmark", "Australia", "Tunisia"])
group_E_teams = set(["Spain", "Costa Rica", "Germany", "Japan"])
group_F_teams = set(["Belgium", "Canada", "Morocco", "Croatia"])
group_G_teams = set(["Brazil", "Serbia", "Switzerland", "Cameroon"])
group_H_teams = set(["Portugal", "Ghana", "Uruguay", "Korea Republic"])

In [8]:
"""
Determining the delta.

A draw can only be predicted once the delta argument of pick_winner_or_draw() is specified.
It cannot be too small, or pick_winner_or_draw() would behave like pick_winner().
It cannot be too high either, as draws should remain a small minority of all matches.
Last worldcup, 9 out of 48 matches were drawn.
The approach chosen is:
    -determine the difference in ratings for all 48 group matches to be played
    -pick the 9 with the smallest difference in ratings, expecting to get 9 draws this year as well
    -set the delta equal to the largest rating difference among those 9 matches, plus 1
This deterministically ensures that the algorithm will pick 9 draws (more only if several matches tie at the cutoff difference),
between the most closely matched teams.
This is recognized as neither a good nor a proper method, but it fits the predictor's philosophy of employing the simplest solution,
based strictly on FIFA rankings before the tournament.
"""
rating_differences = []
groups = [
    group_A_teams,
    group_B_teams,
    group_C_teams,
    group_D_teams,
    group_E_teams,
    group_F_teams,
    group_G_teams,
    group_H_teams,
]
ranking = fifa_ranking
for group in groups:
    for pair in get_pairs(group):
        Team_1, Team_2 = get_teams_from_pair(pair)
        rating_1 = ranking[ranking.Team == Team_1].Rating.iloc[0]
        rating_2 = ranking[ranking.Team == Team_2].Rating.iloc[0]
        rating_difference = abs(rating_1 - rating_2)
        rating_differences.append(rating_difference)
rating_differences.sort()
delta = rating_differences[8] + 1  # 9th smallest difference, plus 1
print(f"Delta is {delta}")

Delta is 73
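
The selection rule can be checked on a small made-up list of rating gaps. This is only a sketch: `choose_delta` and the numbers are hypothetical. It also shows the one case where the rule yields more draws than requested, namely a tie at the cutoff.

```python
def choose_delta(differences, n_draws):
    """Threshold that makes at least n_draws of the integer gaps count as draws."""
    ordered = sorted(differences)
    # The n-th smallest gap must be strictly below delta, hence the + 1.
    return ordered[n_draws - 1] + 1

toy_differences = [5, 120, 40, 40, 300, 12]  # made-up rating gaps
delta = choose_delta(toy_differences, n_draws=3)
draws = [d for d in toy_differences if d < delta]
print(delta)       # 41
print(len(draws))  # 4, because two gaps tie at 40
```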


In [9]:
"""
Building the predicting model for the groups

The model used is extremely simplified:
since the predicting function considers only wins, losses, and draws (no score), predicting which teams advance
from the groups (which in the world cup also depends on goals scored, and not just on wins/losses) must be simplified.
In the model used, every victory awards the team 1 point. Every loss and every draw awards 0 points.
The group standings are ordered by points total.
This also means that 2 or more teams may end up tied in a group
when only 1 of them can advance. The predictor ignores this and forcibly selects 1 of them
(the sort in predict() breaks such ties by team name).
"""
class WorldCupGroup:
    """
    WorldCupGroup is a class that's initialized with a set of the teams in the group.
    Methods:
        -predict - predicts the group results, using an input ranking
    Attributes:
        -team_scores - all teams in the group and their predicted points, as a dictionary
        -match_scores - the predicted results of all matches in the group (winning team or draw),
         as a list of dictionaries
        -winner, runnerup - the 2 teams predicted to advance (available after predict() is called)
    """
    def __init__(self, teams):
        self.teams = teams
        self.pairs = get_pairs(self.teams)
        self.team_scores = dict(zip(list(self.teams), [0, 0, 0, 0]))
        self.match_scores = []
        for pair in self.pairs:
            Team_1, Team_2 = get_teams_from_pair(pair)
            match_score = {
                "Team 1": Team_1,
                "Team 2": Team_2,
                "Result": "Nothing-yet",
            }
            self.match_scores.append(match_score)
            
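    def predict(self, ranking):
        # NOTE: delta is read from the notebook's global scope (set in the previous cell).
        for match_score in self.match_scores:
            match_result = pick_winner_or_draw(match_score["Team 1"], match_score["Team 2"], delta, ranking)
            match_score["Result"] = match_result
            # A win is worth 1 point; "Draw" is not a key of team_scores, so it awards nothing.
            if match_result in self.team_scores:
                self.team_scores[match_result] += 1

        # Sort by (points, team name); ties on points are therefore broken by name.
        standings = sorted((v, k) for k, v in self.team_scores.items())
        self.advanced = [standings[-1][1], standings[-2][1]]
        self.winner = self.advanced[0]
        self.runnerup = self.advanced[1]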
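
In the class above, teams tied on points are separated only by the order of their names. As a sketch of one possible alternative (an assumption, not the method used in this notebook), ties could instead be broken by rating. The helper `rank_group` and the toy data are hypothetical.

```python
from collections import Counter

def rank_group(teams, results, ratings):
    """Order teams by predicted wins, then by rating as an assumed tie-break."""
    wins = Counter({team: 0 for team in teams})
    for result in results:
        # Only wins score; a "Draw" result awards nothing, as in the notebook.
        if result != "Draw":
            wins[result] += 1
    return sorted(teams, key=lambda t: (wins[t], ratings[t]), reverse=True)

# Made-up group: one clear favourite and three closely rated teams.
toy_ratings = {"Alpha": 1800, "Beta": 1600, "Gamma": 1590, "Delta": 1580}
toy_results = ["Alpha", "Alpha", "Alpha", "Draw", "Draw", "Draw"]
standings = rank_group(list(toy_ratings), toy_results, toy_ratings)
print(standings[:2])  # ['Alpha', 'Beta']
```

Switching to such a tie-break may change which team advances from a group where several teams finish level on points.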
        

# Group A predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [10]:
groupA = WorldCupGroup(group_A_teams)
groupA.predict(fifa_ranking)
groupA.match_scores

[{'Team 1': 'Netherlands', 'Team 2': 'Qatar', 'Result': 'Netherlands'},
 {'Team 1': 'Netherlands', 'Team 2': 'Ecuador', 'Result': 'Netherlands'},
 {'Team 1': 'Netherlands', 'Team 2': 'Senegal', 'Result': 'Netherlands'},
 {'Team 1': 'Qatar', 'Team 2': 'Ecuador', 'Result': 'Draw'},
 {'Team 1': 'Qatar', 'Team 2': 'Senegal', 'Result': 'Senegal'},
 {'Team 1': 'Ecuador', 'Team 2': 'Senegal', 'Result': 'Senegal'}]

# Group B predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [11]:
groupB = WorldCupGroup(group_B_teams)
groupB.predict(fifa_ranking)
groupB.match_scores

[{'Team 1': 'USA', 'Team 2': 'Iran', 'Result': 'Draw'},
 {'Team 1': 'USA', 'Team 2': 'England', 'Result': 'England'},
 {'Team 1': 'USA', 'Team 2': 'Wales', 'Result': 'Draw'},
 {'Team 1': 'Iran', 'Team 2': 'England', 'Result': 'England'},
 {'Team 1': 'Iran', 'Team 2': 'Wales', 'Result': 'Draw'},
 {'Team 1': 'England', 'Team 2': 'Wales', 'Result': 'England'}]

# Group C predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [12]:
groupC = WorldCupGroup(group_C_teams)
groupC.predict(fifa_ranking)
groupC.match_scores

[{'Team 1': 'Saudi Arabia', 'Team 2': 'Mexico', 'Result': 'Mexico'},
 {'Team 1': 'Saudi Arabia', 'Team 2': 'Poland', 'Result': 'Poland'},
 {'Team 1': 'Saudi Arabia', 'Team 2': 'Argentina', 'Result': 'Argentina'},
 {'Team 1': 'Mexico', 'Team 2': 'Poland', 'Result': 'Mexico'},
 {'Team 1': 'Mexico', 'Team 2': 'Argentina', 'Result': 'Argentina'},
 {'Team 1': 'Poland', 'Team 2': 'Argentina', 'Result': 'Argentina'}]

# Group D predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [13]:
groupD = WorldCupGroup(group_D_teams)
groupD.predict(fifa_ranking)
groupD.match_scores

[{'Team 1': 'Denmark', 'Team 2': 'Australia', 'Result': 'Denmark'},
 {'Team 1': 'Denmark', 'Team 2': 'France', 'Result': 'France'},
 {'Team 1': 'Denmark', 'Team 2': 'Tunisia', 'Result': 'Denmark'},
 {'Team 1': 'Australia', 'Team 2': 'France', 'Result': 'France'},
 {'Team 1': 'Australia', 'Team 2': 'Tunisia', 'Result': 'Draw'},
 {'Team 1': 'France', 'Team 2': 'Tunisia', 'Result': 'France'}]

# Group E predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [14]:
groupE = WorldCupGroup(group_E_teams)
groupE.predict(fifa_ranking)
groupE.match_scores

[{'Team 1': 'Spain', 'Team 2': 'Germany', 'Result': 'Draw'},
 {'Team 1': 'Costa Rica', 'Team 2': 'Germany', 'Result': 'Germany'},
 {'Team 1': 'Germany', 'Team 2': 'Japan', 'Result': 'Germany'},
 {'Team 1': 'Spain', 'Team 2': 'Costa Rica', 'Result': 'Spain'},
 {'Team 1': 'Spain', 'Team 2': 'Japan', 'Result': 'Spain'},
 {'Team 1': 'Costa Rica', 'Team 2': 'Japan', 'Result': 'Draw'}]

# Group F predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [15]:
groupF = WorldCupGroup(group_F_teams)
groupF.predict(fifa_ranking)
groupF.match_scores

[{'Team 1': 'Canada', 'Team 2': 'Croatia', 'Result': 'Croatia'},
 {'Team 1': 'Croatia', 'Team 2': 'Belgium', 'Result': 'Belgium'},
 {'Team 1': 'Croatia', 'Team 2': 'Morocco', 'Result': 'Croatia'},
 {'Team 1': 'Canada', 'Team 2': 'Belgium', 'Result': 'Belgium'},
 {'Team 1': 'Canada', 'Team 2': 'Morocco', 'Result': 'Morocco'},
 {'Team 1': 'Belgium', 'Team 2': 'Morocco', 'Result': 'Belgium'}]

# Group G predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [16]:
groupG = WorldCupGroup(group_G_teams)
groupG.predict(fifa_ranking)
groupG.match_scores

[{'Team 1': 'Serbia', 'Team 2': 'Switzerland', 'Result': 'Draw'},
 {'Team 1': 'Serbia', 'Team 2': 'Brazil', 'Result': 'Brazil'},
 {'Team 1': 'Serbia', 'Team 2': 'Cameroon', 'Result': 'Serbia'},
 {'Team 1': 'Switzerland', 'Team 2': 'Brazil', 'Result': 'Brazil'},
 {'Team 1': 'Cameroon', 'Team 2': 'Switzerland', 'Result': 'Switzerland'},
 {'Team 1': 'Cameroon', 'Team 2': 'Brazil', 'Result': 'Brazil'}]

# Group H predictions
## Each line is a separate match, presented as a dictionary of 2 teams and the result.

In [17]:
groupH = WorldCupGroup(group_H_teams)
groupH.predict(fifa_ranking)
groupH.match_scores

[{'Team 1': 'Korea Republic', 'Team 2': 'Uruguay', 'Result': 'Uruguay'},
 {'Team 1': 'Korea Republic', 'Team 2': 'Portugal', 'Result': 'Portugal'},
 {'Team 1': 'Korea Republic', 'Team 2': 'Ghana', 'Result': 'Korea Republic'},
 {'Team 1': 'Portugal', 'Team 2': 'Uruguay', 'Result': 'Draw'},
 {'Team 1': 'Ghana', 'Team 2': 'Uruguay', 'Result': 'Uruguay'},
 {'Team 1': 'Portugal', 'Team 2': 'Ghana', 'Result': 'Portugal'}]

# Group advancement predictions
## Each line gives teams advancing from each group, in the 1st and 2nd place respectively.

In [18]:
group_names = ["A", "B", "C", "D", "E", "F", "G", "H"]
for group, name in zip([groupA, groupB, groupC, groupD, groupE, groupF, groupG, groupH], group_names):
    print(f"Predicted to advance from Group {name}: 1.{group.winner}, 2.{group.runnerup}")

Predicted to advance from Group A: 1.Netherlands, 2.Senegal
Predicted to advance from Group B: 1.England, 2.Wales
Predicted to advance from Group C: 1.Argentina, 2.Mexico
Predicted to advance from Group D: 1.France, 2.Denmark
Predicted to advance from Group E: 1.Spain, 2.Germany
Predicted to advance from Group F: 1.Belgium, 2.Croatia
Predicted to advance from Group G: 1.Brazil, 2.Switzerland
Predicted to advance from Group H: 1.Uruguay, 2.Portugal


# Bracket results predictions
## Each block of text gives the prediction for the teams advancing in each round of the bracket

In [19]:
"""
Proceeding to the bracket:
Every cell will have matches of teams that advanced from the previous stage, using the world cup brackets as reference
"""

#ROUND OF 16
ro16_match_1_winner = pick_winner(groupA.winner, groupB.runnerup, fifa_ranking)
ro16_match_2_winner = pick_winner(groupC.winner, groupD.runnerup, fifa_ranking)
ro16_match_3_winner = pick_winner(groupE.winner, groupF.runnerup, fifa_ranking)
ro16_match_4_winner = pick_winner(groupG.winner, groupH.runnerup, fifa_ranking)
ro16_match_5_winner = pick_winner(groupB.winner, groupA.runnerup, fifa_ranking)
ro16_match_6_winner = pick_winner(groupD.winner, groupC.runnerup, fifa_ranking)
ro16_match_7_winner = pick_winner(groupF.winner, groupE.runnerup, fifa_ranking)
ro16_match_8_winner = pick_winner(groupH.winner, groupG.runnerup, fifa_ranking)

print("Ro16 winners:")
for i in range(8):
    print(globals()[f"ro16_match_{i+1}_winner"])
    
print("\n")    
#ROUND OF 8
ro8_match_1_winner = pick_winner(ro16_match_1_winner, ro16_match_2_winner, fifa_ranking)
ro8_match_2_winner = pick_winner(ro16_match_3_winner, ro16_match_4_winner, fifa_ranking)
ro8_match_3_winner = pick_winner(ro16_match_5_winner, ro16_match_6_winner, fifa_ranking)
ro8_match_4_winner = pick_winner(ro16_match_7_winner, ro16_match_8_winner, fifa_ranking)

print("Ro8 winners:")
for i in range(4):
    print(globals()[f"ro8_match_{i+1}_winner"])

print("\n") 
#ROUND OF 4
ro4_match_1_winner = pick_winner(ro8_match_1_winner, ro8_match_2_winner, fifa_ranking)
ro4_match_2_winner = pick_winner(ro8_match_3_winner, ro8_match_4_winner, fifa_ranking)

print("Ro4 winners:")
for i in range(2):
    print(globals()[f"ro4_match_{i+1}_winner"])

print("\n") 
#ROUND OF 2
overall_winner = pick_winner(ro4_match_1_winner, ro4_match_2_winner, fifa_ranking)
print("Worldcup winner:")
print(overall_winner)


Ro16 winners:
Netherlands
Argentina
Spain
Brazil
England
France
Belgium
Uruguay


Ro8 winners:
Argentina
Brazil
France
Belgium


Ro4 winners:
Brazil
Belgium


Worldcup winner:
Brazil
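
The round-by-round logic above can also be written as a loop over a list of teams in bracket order, which avoids one variable per match and the `globals()` lookups. This is a minimal sketch under stated assumptions: `play_round`, the placeholder team names, and the ratings are all hypothetical.

```python
def play_round(teams, ratings):
    """Pair adjacent teams and return the higher-rated team of each pair."""
    winners = []
    for i in range(0, len(teams), 2):
        team_1, team_2 = teams[i], teams[i + 1]
        # Same rule as pick_winner(): the higher rating advances.
        winners.append(team_1 if ratings[team_1] > ratings[team_2] else team_2)
    return winners

# Made-up ratings for eight placeholder teams, listed in bracket order.
toy_ratings = {"T1": 1700, "T2": 1500, "T3": 1650, "T4": 1800,
               "T5": 1400, "T6": 1450, "T7": 1600, "T8": 1550}
remaining = list(toy_ratings)
while len(remaining) > 1:
    remaining = play_round(remaining, toy_ratings)
    print(remaining)
```

The loop only works if the initial list follows the bracket order, so that adjacent entries are actual opponents in every round.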


# Manual bracket results predictions

## The model of course did not correctly predict all group advancements, so from here on the teams in each playoff round are entered manually

# Round of 16

In [20]:
#ROUND OF 16
ro16_match_1_winner = pick_winner("Netherlands", "USA", fifa_ranking)
ro16_match_2_winner = pick_winner("Argentina", "Australia", fifa_ranking)
ro16_match_3_winner = pick_winner("Japan", "Croatia", fifa_ranking)
ro16_match_4_winner = pick_winner("Brazil", "Korea Republic", fifa_ranking)
ro16_match_5_winner = pick_winner("England", "Senegal", fifa_ranking)
ro16_match_6_winner = pick_winner("France", "Poland", fifa_ranking)
ro16_match_7_winner = pick_winner("Morocco", "Spain", fifa_ranking)
ro16_match_8_winner = pick_winner("Portugal", "Switzerland", fifa_ranking)

print("Ro16 winners:")
for i in range(8):
    print(globals()[f"ro16_match_{i+1}_winner"])

Ro16 winners:
Netherlands
Argentina
Croatia
Brazil
England
France
Spain
Portugal


# Round of 8

In [23]:
#ROUND OF 8
ro8_match_1_winner = pick_winner("Netherlands", "Argentina", fifa_ranking)
ro8_match_2_winner = pick_winner("Croatia", "Brazil", fifa_ranking)
ro8_match_3_winner = pick_winner("England", "France", fifa_ranking)
ro8_match_4_winner = pick_winner("Morocco", "Portugal", fifa_ranking)

print("Ro8 winners:")
for i in range(4):
    print(globals()[f"ro8_match_{i+1}_winner"])

Ro8 winners:
Argentina
Brazil
France
Portugal
