# 2018 World Cup draw procedure and probability estimations

## Draw procedure

The draw for the 2018 FIFA World Cup will take place on 1 December 2017 at the State Kremlin Palace in Moscow, Russia. It will determine the group in which each of the 32 qualified national teams will play at the start of the tournament. **The teams will be divided into four pots of eight, with one team selected from each pot to form a group**.

Unlike previous editions of the World Cup, all pots will be determined by each national team's [October 2017 FIFA World Ranking](http://www.fifa.com/fifa-world-ranking/ranking-table/men/index.html), with Pot 1 containing the highest-ranked teams, Pot 2 containing the next highest-ranked teams, and so on. In previous editions, only the pot containing the highest-ranked teams was determined by rank, while the other three pots were determined by continental confederation. **The hosts will continue to be placed in Pot 1 and treated as a seeded team**. Pot 1 will therefore consist of hosts Russia and the seven highest-ranked teams that qualify for the tournament.

As with previous editions, **no group may have more than one team from any continental confederation with the exception of UEFA, which may have no more than two in a group. Eight groups of four teams will be labelled A to H: the four pots will be emptied completely by allocating one of their eight teams to each of the eight groups**.
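
The confederation constraint can be expressed as a small check on the confederations present in a group. The snippet below is only an illustrative sketch in Python 3: the names `MAX_PER_CONFEDERATION` and `group_is_valid` are hypothetical and are not part of the simulator shown later.

```python
from collections import Counter

# Maximum number of teams per confederation in one group.
# Assumption for this sketch: any confederation not listed defaults to 1.
MAX_PER_CONFEDERATION = {"UEFA": 2}

def group_is_valid(confederations):
    """Return True if the list of confederations in a group respects the limits."""
    counts = Counter(confederations)
    return all(n <= MAX_PER_CONFEDERATION.get(conf, 1) for conf, n in counts.items())

print(group_is_valid(["UEFA", "CONMEBOL", "UEFA", "AFC"]))     # two UEFA teams
print(group_is_valid(["UEFA", "UEFA", "UEFA", "CAF"]))         # three UEFA teams
print(group_is_valid(["CONMEBOL", "CONMEBOL", "CAF", "AFC"]))  # two CONMEBOL teams
```

Only the first of the three example groups satisfies the rule.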

The table below shows the composition of each pot and the continental confederation to which each national team belongs:

|      **POT 1**       |      **POT 2**      |        **POT 3**      |     **POT 4**      | 
| :------------------: | :-----------------: | :-------------------: | :----------------: |
| Russia (Hosts, UEFA) | Spain (UEFA)        | Denmark (UEFA)        | Serbia (UEFA)      |
| Germany (UEFA)       | Peru (CONMEBOL)     | Iceland (UEFA)        | Nigeria (CAF)      |
| Brazil (CONMEBOL)    | Switzerland (UEFA)  | Costa Rica (CONCACAF) | Australia (AFC)    |
| Portugal (UEFA)      | England (UEFA)      | Sweden (UEFA)         | Japan (AFC)        |
| Argentina (CONMEBOL) | Colombia (CONMEBOL) | Tunisia (CAF)         | Morocco (CAF)      |
| Belgium (UEFA)       | Mexico (CONCACAF)   | Egypt (CAF)           | Panama (CONCACAF)  |
| Poland (UEFA)        | Uruguay (CONMEBOL)  | Senegal (CAF)         | South Korea (AFC)  |
| France (UEFA)        | Croatia (UEFA)      | Iran (AFC)            | Saudi Arabia (AFC) |

The four pots will be emptied by drawing the eight teams they each contain one by one and placing them in the eight groups of four teams (Groups A to H). Hosts Russia will occupy the top position in Group A, while the seven other seeds will occupy the top spots in Groups B to H. The positions of all the other teams (from pots 2, 3 and 4) will be decided when they are drawn.   

As is customary at Final Draws, a ball will be drawn from the team pots and then another from the group pots to determine the position (not relevant for probability calculation) in which the team in question will play.

### Sources

* [2018 FIFA World Cup seeding](https://en.wikipedia.org/wiki/2018_FIFA_World_Cup_seeding)
* [FIFA continental confederations](http://www.fifa.com/associations/index.html)
* [Group formation](http://www.fifa.com/about-fifa/news/y=2017/m=9/news=oc-for-fifa-competitions-approves-procedures-for-the-final-draw-of-the-2907924.html)
* [Draw procedure](http://www.fifa.com/worldcup/news/y=2017/m=11/news=the-final-draw-how-it-works-2921565.html)
* [Draw procedure video](https://www.youtube.com/watch?v=jDkn83FwioA)

# Simulator

The draw is not completely random, because it must satisfy the following constraint: _no group may have more than one team from any continental confederation with the exception of UEFA, which may have no more than two in a group_. As a result, an exact calculation of the probability that two given teams end up in the same group is not straightforward. These probabilities can, however, be estimated with a simulator. The idea is simple: run a large number of simulated draws that satisfy the required constraints. The estimated probability of two teams belonging to the same group is then the number of simulated draws that place both teams in the same group divided by the total number of simulated draws. The simulator, implemented in Python, can be found below.
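
To illustrate the counting idea on a case where the answer is known, the following Python 3 sketch estimates the same kind of probability for a fully random draw without constraints. In that case two teams from different pots are assigned to groups independently and uniformly, so the true value is 1/8. The function name `estimate_pair_probability` and its parameters are hypothetical and used only for this illustration.

```python
import random

def estimate_pair_probability(n_draws, n_groups=8, seed=0):
    """Estimate P(two teams from different pots share a group) in an unconstrained draw."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    hits = 0
    for _ in range(n_draws):
        # Without constraints, each team lands in a uniformly random group
        group_a = rng.randrange(n_groups)
        group_b = rng.randrange(n_groups)
        if group_a == group_b:
            hits += 1
    # Estimated probability = favourable simulated draws / total simulated draws
    return hits / n_draws

estimate = estimate_pair_probability(100000)
print(estimate)  # should be close to 1/8 = 0.125
```

The constrained simulator below follows the same principle, except that each simulated draw must also respect the confederation limits.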

## Imports, constants and functions

In [1]:
import numpy as np
from IPython.display import display_html

In [2]:
TEAMS = ["Russia", "Germany", "Brazil", "Portugal", "Argentina", "Belgium", "Poland", "France", 
         "Spain", "Peru", "Switzerland", "England", "Colombia", "Mexico", "Uruguay", "Croatia", 
         "Denmark", "Iceland", "Costa Rica", "Sweden", "Tunisia", "Egypt", "Senegal", "Iran", 
         "Serbia", "Nigeria", "Australia", "Japan", "Morocco", "Panama", "South Korea", "Saudi Arabia"]

CONFEDERATIONS = ["UEFA", "CONMEBOL", "CONCACAF", "CAF", "AFC"]

TEAM_POTS = [1, 1, 1, 1, 1, 1, 1, 1, 
             2, 2, 2, 2, 2, 2, 2, 2, 
             3, 3, 3, 3, 3, 3, 3, 3, 
             4, 4, 4, 4, 4, 4, 4, 4]

TEAM_CONFEDERATIONS = [0, 0, 1, 0, 1, 0, 0, 0, 
                       0, 1, 0, 0, 1, 2, 1, 0, 
                       0, 0, 2, 0, 3, 3, 3, 4, 
                       0, 3, 4, 4, 3, 2, 4, 4]

CONSTRAINT_CONFEDERATIONS = [2, 1, 1, 1, 1]

GROUPS = 8

In [3]:
def print_html(string):
    display_html(string, raw=True)

def team(country_name):
    return TEAMS.index(country_name)

def confederation(country_name):
    return TEAM_CONFEDERATIONS[team(country_name)]

def confederation_name(country_name):
    return CONFEDERATIONS[confederation(country_name)]

def check_confederation_multiplicity(team, confederation_multiplicity):
    confederation = TEAM_CONFEDERATIONS[team]
    return (confederation_multiplicity.count(confederation) < CONSTRAINT_CONFEDERATIONS[confederation])

def check_group(team, group_composition):
    # Keep only the slots already filled (empty slots are marked with -1)
    filled_slots = [x for x in group_composition if x > -1]
    # Confederations already present in the group
    confederation_multiplicity = [TEAM_CONFEDERATIONS[x] for x in filled_slots]
    return check_confederation_multiplicity(team, confederation_multiplicity)

def filter_teams(teams, confederation_multiplicity):
    # Lists (rather than filter/map objects) keep this working on both Python 2 and 3
    return [t for t in teams if check_confederation_multiplicity(t, confederation_multiplicity)]

## Draw simulator

In [4]:
simulations = 1000000  # number of simulated draws
draws = np.full((simulations, GROUPS, 4), -1)  # array storing the results of each simulation

simulation = 0
while simulation < simulations:
    # Teams are coded as integers from 0 to 31
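    # Pots composition (Russia, team 0, is excluded from pot 1 because it is pre-assigned to Group A)
    TEAM_POTS = [list(range(1, 8)), list(range(8, 16)), list(range(16, 24)), list(range(24, 32))]

    #print("Simulation: %d" % simulation)
    failed = False  # if a simulated draw doesn't fulfill the constraints, restart the draw procedure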

    draws[simulation, 0, 0] = 0  # Russia to the first position in Group A

    # Draw teams from pot 1, then teams from pot 2 and so on
    for pot, teams in enumerate(TEAM_POTS):
        #print("\tPot %d" % (pot+1))
        GROUP_POT = list(range(0 if pot > 0 else 1, GROUPS))  # Groups available (Group A is already taken in pot 1)

        while len(teams) > 0:
            # Draw a Team
            chosen = np.random.choice(teams)
            #print("\t\tTeam %s" % TEAMS[chosen])

            # Keep only the available groups in which the chosen team satisfies the constraints
            groups = [x for x in GROUP_POT if check_group(chosen, draws[simulation, x, :])]
            if len(groups) < 1:  # If no group is available, restart the draw procedure
                failed = True
                break
            # Take the first group available
            group = groups[0]

            GROUP_POT.remove(group)  # remove the group from the list of available groups
            teams.remove(chosen)  # remove the team from the list of teams to draw
            draws[simulation, group, pot] = chosen  # store the draw result
        if failed:
            draws[simulation, :, :] = -1  # cancel the current simulated draw and restart it
            break
    if not failed:
        simulation += 1  # correct simulated draw, process the next simulation

## Probability estimation

In [5]:
total_events = float(simulations)  # total number of events
estimations = np.full((len(TEAMS), len(TEAMS)), 0,  dtype=np.float32)  # store the probability estimations

# For each pair of teams, calculate the probability of belonging to the same group
for team_idx in range(32):  # team_idx avoids shadowing the team() helper
    pot = team_idx // 8  # Pot to which the current team belongs
    # Composition of the current team's group in each simulated draw
    rivals = np.array([draws[i, np.where(draws[i, :, :] == team_idx)[0][0], :] for i in range(simulations)])
    estimations[team_idx, team_idx] = 1  # each team has a prob=1 of belonging to its own group
    for other_pot in [x for x in range(4) if x != pot]:
        # Fraction of simulated draws in which each team of the other pot shares the group
        probabilities = [np.sum(rivals[:, other_pot] == x) / total_events
                         for x in range(8 * other_pot, 8 * other_pot + 8)]
        for x in range(8):
            estimations[team_idx, x + 8 * other_pot] = probabilities[x]
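
The per-team loop above visits every simulated draw once per team, which can be slow for 1,000,000 simulations. As a possible alternative, the sketch below counts all same-group pairs at once with `np.bincount`. It assumes an array with the same layout as `draws` (simulations × groups × pots, holding non-negative team indices). The function name `pair_frequencies` is hypothetical, and the approach has not been benchmarked here.

```python
import numpy as np

def pair_frequencies(draws, n_teams):
    """Return an (n_teams, n_teams) matrix with the same-group frequency of each pair."""
    n_simulations, n_groups, n_pots = draws.shape
    counts = np.zeros((n_teams, n_teams), dtype=np.int64)
    for i in range(n_pots):
        for j in range(n_pots):
            # Encode each (team in slot i, team in slot j) pair of a group as one integer
            flat = draws[:, :, i].ravel() * n_teams + draws[:, :, j].ravel()
            counts += np.bincount(flat, minlength=n_teams * n_teams).reshape(n_teams, n_teams)
    return counts / float(n_simulations)

# Toy input: 2 simulated draws, 2 groups, 2 pots (teams 0-1 in pot 1, teams 2-3 in pot 2)
toy_draws = np.array([[[0, 2], [1, 3]],
                      [[0, 3], [1, 2]]])
toy = pair_frequencies(toy_draws, 4)
print(toy)
```

In the toy input, team 0 meets team 2 in one of the two draws and team 3 in the other, so both entries are 0.5, while the diagonal is 1 and same-pot pairs are 0.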

## Results after 1,000,000 simulations

After simulating 1,000,000 draws, we can accurately estimate the likelihood of two teams belonging to the same group. 
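
As a rough gauge of that accuracy, and assuming the simulated draws are independent, each estimate behaves like a binomial proportion. With $k$ the number of simulated draws in which two teams share a group and $N$ the total number of simulated draws (symbols introduced here only for illustration):

$$
\hat{p} = \frac{k}{N}, \qquad \mathrm{SE}(\hat{p}) \approx \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}}
$$

For $\hat{p} = 0.125$ and $N = 10^6$ this gives $\mathrm{SE}(\hat{p}) \approx \sqrt{0.125 \times 0.875 / 10^6} \approx 0.00033$, so differences of about 0.001 between otherwise symmetric teams can plausibly be attributed to sampling noise and rounding.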

**How to interpret the numbers in the table below?** The number in each cell represents the estimated probability that its row team and its column team belong to the same group. Probabilities are values from 0 (impossibility) to 1 (certainty). In order to be interpreted as a percentage, the probability value should be multiplied by 100.

Obviously, the diagonal just contains 1.0, because each team has a probability equal to 1 (100%) of belonging to its own group. Likewise, the probability of two teams from the same pot playing in the same group is 0 (0%).

Spain, for instance, has six European prospective rivals and two South American ones in Pot 1. In a fully random draw, all eight prospective rivals would have the same probability (0.125 or 12.5%) of facing Spain in the group stage of the tournament. However, the World Cup draw constraints give a fixture between Spain and a South American team a probability of 0.41 (41%), whereas in a fully random draw this likelihood would be 0.25 (25%).
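
The arithmetic behind these two figures can be reproduced from Spain's row in the results table. The snippet below is a small Python 3 sketch; the variable names are hypothetical, and the values are the rounded estimates shown in the table.

```python
# Estimated probabilities of Spain facing each Pot 1 team (rounded values from the table)
spain_vs_pot1 = {"Russia": 0.129, "Germany": 0.093, "Brazil": 0.204, "Portugal": 0.092,
                 "Argentina": 0.204, "Belgium": 0.092, "Poland": 0.092, "France": 0.093}
south_american = ["Brazil", "Argentina"]

# Constrained draw: add up the estimates for the South American seeds
constrained = sum(spain_vs_pot1[name] for name in south_american)
# Fully random draw: each of the eight seeds would be equally likely
unconstrained = len(south_american) / 8

print(round(constrained, 3), unconstrained)
```

The first value is the sum of the Brazil and Argentina estimates, and the second is 2/8.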

Some probabilities that may look surprising (or not, given that Iran is the only AFC team in Pot 3 and Serbia is the only UEFA team in Pot 4):

- Iran and Nigeria belonging to the same group: 29%.
- Iran and Morocco belonging to the same group: 29%.
- Mexico and Serbia belonging to the same group: 27%.
- Serbia joining Denmark, Iceland or Sweden: 9.6%.

In [6]:
# HTML output
html = "<table>"
html += "<tr><td>&nbsp;</td><td><b>%s</b></td></tr>" % ("</b></td><td><b>".join(TEAMS))
for team_idx in range(len(TEAMS)):  # team_idx avoids shadowing the team() helper
    html += "<tr><td><b>%s</b></td><td>%s</td></tr>" % (TEAMS[team_idx], 
                                                        "</td><td>".join([str(round(x, 3)) \
                                                                          for x in estimations[team_idx,:]]))
html += "</table>"
print_html(html)

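| | **Russia** | **Germany** | **Brazil** | **Portugal** | **Argentina** | **Belgium** | **Poland** | **France** | **Spain** | **Peru** | **Switzerland** | **England** | **Colombia** | **Mexico** | **Uruguay** | **Croatia** | **Denmark** | **Iceland** | **Costa Rica** | **Sweden** | **Tunisia** | **Egypt** | **Senegal** | **Iran** | **Serbia** | **Nigeria** | **Australia** | **Japan** | **Morocco** | **Panama** | **South Korea** | **Saudi Arabia** |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Russia** | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.129 | 0.121 | 0.13 | 0.13 | 0.121 | 0.118 | 0.121 | 0.13 | 0.068 | 0.067 | 0.143 | 0.068 | 0.163 | 0.162 | 0.163 | 0.166 | 0.061 | 0.108 | 0.145 | 0.144 | 0.107 | 0.145 | 0.145 | 0.145 |
| **Germany** | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.093 | 0.176 | 0.092 | 0.092 | 0.175 | 0.103 | 0.176 | 0.093 | 0.11 | 0.11 | 0.138 | 0.11 | 0.133 | 0.133 | 0.133 | 0.132 | 0.103 | 0.127 | 0.128 | 0.129 | 0.127 | 0.13 | 0.128 | 0.129 |
| **Brazil** | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.204 | 0.0 | 0.204 | 0.204 | 0.0 | 0.184 | 0.0 | 0.204 | 0.191 | 0.191 | 0.083 | 0.191 | 0.086 | 0.086 | 0.086 | 0.086 | 0.213 | 0.128 | 0.106 | 0.107 | 0.129 | 0.103 | 0.106 | 0.107 |
| **Portugal** | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.092 | 0.176 | 0.092 | 0.092 | 0.176 | 0.102 | 0.176 | 0.092 | 0.11 | 0.11 | 0.138 | 0.11 | 0.133 | 0.133 | 0.133 | 0.132 | 0.103 | 0.127 | 0.129 | 0.128 | 0.127 | 0.13 | 0.128 | 0.128 |
| **Argentina** | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.204 | 0.0 | 0.204 | 0.204 | 0.0 | 0.184 | 0.0 | 0.204 | 0.191 | 0.19 | 0.083 | 0.191 | 0.086 | 0.086 | 0.086 | 0.087 | 0.212 | 0.129 | 0.107 | 0.107 | 0.129 | 0.103 | 0.107 | 0.106 |
| **Belgium** | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.092 | 0.175 | 0.092 | 0.093 | 0.176 | 0.103 | 0.176 | 0.093 | 0.11 | 0.111 | 0.138 | 0.11 | 0.133 | 0.133 | 0.132 | 0.132 | 0.102 | 0.127 | 0.128 | 0.128 | 0.127 | 0.13 | 0.128 | 0.128 |
| **Poland** | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.092 | 0.176 | 0.092 | 0.093 | 0.175 | 0.103 | 0.176 | 0.093 | 0.11 | 0.11 | 0.138 | 0.11 | 0.133 | 0.134 | 0.133 | 0.132 | 0.102 | 0.127 | 0.128 | 0.128 | 0.127 | 0.13 | 0.129 | 0.128 |
| **France** | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.093 | 0.175 | 0.092 | 0.092 | 0.176 | 0.103 | 0.176 | 0.092 | 0.11 | 0.11 | 0.138 | 0.11 | 0.133 | 0.133 | 0.133 | 0.132 | 0.103 | 0.127 | 0.128 | 0.128 | 0.127 | 0.13 | 0.128 | 0.128 |
| **Spain** | 0.129 | 0.093 | 0.204 | 0.092 | 0.204 | 0.092 | 0.092 | 0.093 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.074 | 0.074 | 0.169 | 0.074 | 0.153 | 0.152 | 0.152 | 0.152 | 0.066 | 0.123 | 0.134 | 0.134 | 0.123 | 0.155 | 0.133 | 0.134 |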


## Most likely rivals for each team and from each pot

The table below shows, for each team, the most likely rival from each of the other pots. For Brazil, the most likely rival from Pot 2 is Spain (with the same probability for England, Switzerland and Croatia), from Pot 3 it is Sweden (with the same odds for Denmark and Iceland) and from Pot 4 it is Serbia. Note that this is not a valid group composition: Spain, Sweden and Serbia are all UEFA teams, and at most two UEFA teams are allowed in the same group.

In [7]:
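# Picking the most likely rival from each pot independently can lead to invalid group compositions
TEAM_POTS = [list(range(8)), list(range(8, 16)), list(range(16, 24)), list(range(24, 32))]
html = "<table><tr><td><b>TEAM</b></td><td colspan='3'><b>RIVALS</b></td></tr>"
for team_idx in range(32):  # team_idx avoids shadowing the team() helper
    rivals = []
    for pot_idx, group_range in enumerate(TEAM_POTS):
        # Position, within the pot, of the team with the highest estimated probability
        relative_idx = np.argmax(estimations[team_idx, group_range])
        rivals.append(TEAMS[relative_idx + pot_idx * 8])
    rivals.remove(TEAMS[team_idx])  # in its own pot, the maximum (1.0) is the team itself
    html += "<tr><td><b>%s</b></td><td>%s</td><td>%s</td><td>%s</td></tr>" % (TEAMS[team_idx], 
                                                                              rivals[0], 
                                                                              rivals[1], 
                                                                              rivals[2])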
html += "</table>"
print_html(html)

| **TEAM** | **RIVALS** | | |
| :---: | :---: | :---: | :---: |
| **Russia** | Croatia | Iran | Saudi Arabia |
| **Germany** | Peru | Costa Rica | Panama |
| **Brazil** | Spain | Sweden | Serbia |
| **Portugal** | Peru | Costa Rica | Panama |
| **Argentina** | Switzerland | Denmark | Serbia |
| **Belgium** | Colombia | Costa Rica | Panama |
| **Poland** | Peru | Costa Rica | Panama |
| **France** | Colombia | Costa Rica | Panama |
| **Spain** | Brazil | Costa Rica | Panama |
