# Europa League group stage draw for 2018-2019 season

The goal of this notebook is to estimate the probability that each pair of clubs is drawn into the same group in the 2018-2019 Europa League group stage. The actual groups will be decided by the draw on Friday 31 August 2018 in Monaco.

These probabilities are hard to calculate exactly because the draw must satisfy several UEFA constraints. Therefore, we estimate each probability by Monte Carlo simulation. We assume that the draw procedure and constraints used for the 2017-2018 season also apply to this draw.
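As a rough illustration of the Monte Carlo idea, the sketch below drops every UEFA constraint and estimates how often two given clubs from different pots land in the same group. With no constraints the exact answer is 1/12 (about 8.33%), so the estimate can be checked against it. The function name and the simplified, unconstrained draw are assumptions made for this example only; the notebook's real simulator is defined further down.

```python
import numpy as np

def estimate_unconstrained_pair_probability(simulations, groups=12, seed=0):
    """Hypothetical toy estimator: share-a-group frequency with no draw constraints.

    Each pot is spread over the groups by a uniformly random permutation, so the
    groups of one specific club in pot 1 and one specific club in pot 2 are
    independent and uniform over the groups.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(simulations):
        group_of_pot1_club = rng.permutation(groups)[0]  # group of club 0 in pot 1
        group_of_pot2_club = rng.permutation(groups)[0]  # group of club 0 in pot 2
        hits += int(group_of_pot1_club == group_of_pot2_club)
    return hits / simulations

estimate = estimate_unconstrained_pair_probability(20000)
print(f"estimated {estimate:.4f} vs exact {1 / 12:.4f}")
```

The notebook applies the same counting to every pair of clubs, but each simulated draw is produced under the real constraints, which pushes some probabilities above or below 1/12 and sets others to zero.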

# Regulations

According to the [2018/2019 UEFA Europa League regulations](https://www.uefa.com/MultimediaFiles/Download/Regulations/uefaorg/Regulations/02/55/82/82/2558282_DOWNLOAD.pdf):

* [13.08] For the group stage draw, the 48 clubs are seeded into four groups of 12 in accordance with the club coefficient rankings established at the beginning of the season and with the principles set by the Club Competitions Committee.

* [15.01] Once the play-offs have been completed, the 48 remaining clubs are drawn into __12 groups of four__ in accordance with Paragraph 13.08. __Clubs from the same association cannot be drawn into the same group__.
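Paragraph 15.01 reduces to a simple test on each group: no association code may appear twice. A minimal sketch, with a hypothetical helper name and hand-picked example groups built from the pot table below:

```python
def has_distinct_associations(associations_in_group):
    """Hypothetical helper: True if no association appears twice in the group."""
    return len(set(associations_in_group)) == len(associations_in_group)

# Sevilla, Sporting CP, Qarabağ, Apollon: four different associations
print(has_distinct_associations(["ESP", "POR", "AZE", "CYP"]))  # True
# Sevilla, Sporting CP, Real Betis, Apollon: two Spanish clubs, not allowed
print(has_distinct_associations(["ESP", "POR", "ESP", "CYP"]))  # False
```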


The following instructions about the technical procedure have been extracted from the video of the [UEFA Europa League 2017/18 group stage draw](https://www.youtube.com/watch?v=mt9I0sPORWM):
* The 48 teams have been allocated to 4 pots in accordance with the seeding principles based on the UEFA Club coefficients.
* Pot 1 will comprise the 12 top clubs in the club coefficient rankings, Pot 2 the following 12 clubs in the rankings, and so on for pots 3 and 4.
* __For TV coverage reasons, clubs from the same association are paired in order to split their kick-off times, that is, one early and one late__.
* For this reason the 12 groups will be distinguished by colors: groups A to F will be red and groups G to L will be blue.
* __When a paired club is drawn, for example into a red group A to F [Arsenal in Pot 1], the other paired club, once it has been drawn, will automatically be assigned to one of the 6 blue groups G to L [Everton in Pot 2]__.
* Please note that based on a decision taken by the UEFA Executive Committee __teams from Russia and Ukraine shall not be drawn in the same group__.
* A ball is drawn at random from Pot 1; the team drawn is placed in the first available group in alphabetical order from A to L, as indicated by the computer. For example, if the team drawn has all 12 options from A to L available, it will be automatically allocated to Group A; if the team drawn has four options available, for example G to J, it will be automatically allocated to Group G, and so on.
* It must be noted that the number of options available to a team depends not only on the team's own attributes (for example, its winter venue) and on the teams already drawn, but also on the attributes of the other teams still to be drawn. This is due to the computer calculations needed to anticipate all possible scenarios and prevent any deadlock situation.
* This procedure will allocate all teams to the various groups. Then it will be repeated for the teams in Pots 2, 3, and 4, in that order.
* At the end of the draw a computer will assign the final positions of the teams within their group. These positions will determine the order of the home and away matches.
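The pairing rule and the "first available group" rule can be sketched as follows. This is a simplified illustration under stated assumptions: groups are indexed 0 to 11 (A to L), indexes 0 to 5 are the red groups and 6 to 11 the blue ones, and the deadlock look-ahead described in the procedure is deliberately left out. The function names are hypothetical.

```python
GROUP_LETTERS = "ABCDEFGHIJKL"

def colour(group):
    """Red for groups A-F (indexes 0-5), blue for groups G-L (indexes 6-11)."""
    return "red" if group < 6 else "blue"

def allocate(free_groups, partner_group=None):
    """Return the letter of the first free group allowed for a club.

    If the club's paired club already sits in partner_group, only groups of the
    other colour are allowed. Returns None when no group is allowed.
    No deadlock look-ahead is performed here.
    """
    options = sorted(g for g in free_groups
                     if partner_group is None or colour(g) != colour(partner_group))
    return GROUP_LETTERS[options[0]] if options else None

# First club of a pair (e.g. Arsenal in the 2017/18 video), all groups free
print(allocate(range(12)))                   # A
# Its paired club (e.g. Everton), drawn later with all groups free: first blue group
print(allocate(range(12), partner_group=0))  # G
```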

<table>
    <tr>
        <th colspan=4><a target="_blank" rel="noopener noreferrer" href="https://www.uefa.com/uefaeuropaleague/news/newsid=2568277.html?iv=true">2018-2019 Season</a></th>
    </tr>
    <tr>
        <th>Pot 1</th>
        <th>Pot 2</th>
        <th>Pot 3</th>
        <th>Pot 4</th>
    </tr>
    <tr>
        <td>Sevilla (ESP)</td>
        <td>Sporting CP (POR)</td>
        <td>Real Betis (ESP)</td>
        <td>Apollon (CYP)</td>
    </tr>
    <tr>
        <td>Arsenal (ENG)</td>
        <td>Ludogorets (BUL)</td>
        <td>Qarabağ (AZE)</td>
        <td>Rosenborg (NOR)</td>
    </tr>
    <tr>
        <td>Chelsea (ENG)</td>
        <td>København (DEN)</td>
        <td>BATE Borisov (BLR)</td>
        <td>Vorskla Poltava (UKR)</td>
    </tr>
    <tr>
        <td>Zenit (RUS)</td>
        <td>Marseille (FRA)</td>
        <td>Dinamo Zagreb (CRO)</td>
        <td>Slavia Praha (CZE)</td>
    </tr>
    <tr>
        <td>Bayer Leverkusen (GER)</td>
        <td>Celtic (SCO)</td>
        <td>RB Leipzig (GER)</td>
        <td>Akhisar Belediyespor (TUR)</td>
    </tr>
    <tr>
        <td>Dynamo Kyiv (UKR)</td>
        <td>PAOK (GRE)</td>
        <td>Eintracht Frankfurt (GER)</td>
        <td>Jablonec (CZE)</td>
    </tr>
    <tr>
        <td>Beşiktaş (TUR)</td>
        <td>AC Milan (ITA)</td>
        <td>Malmö (SWE)</td>
        <td>AEK Larnaca (CYP)</td>
    </tr>
    <tr>
        <td>Salzburg (AUT)</td>
        <td>Genk (BEL)</td>
        <td>Spartak Moskva (RUS)</td>
        <td>Vidi (HUN)</td>
    </tr>
    <tr>
        <td>Olympiacos (GRE)</td>
        <td>Fenerbahçe (TUR)</td>
        <td>Standard Liège (BEL)</td>
        <td>Rangers (SCO)</td>
    </tr>
    <tr>
        <td>Villarreal (ESP)</td>
        <td>Krasnodar (RUS)</td>
        <td>Zürich (SUI)</td>
        <td>Dudelange (LUX)</td>
    </tr>
    <tr>
        <td>Anderlecht (BEL)</td>
        <td>Astana (KAZ)</td>
        <td>Bordeaux (FRA)</td>
        <td>Spartak Trnava (SVK)</td>
    </tr>
    <tr>
        <td>Lazio (ITA)</td>
        <td>Rapid Wien (AUT)</td>
        <td>Rennes (FRA)</td>
        <td>Sarpsborg (NOR)</td>
    </tr>
</table>

<!--
* Season 2018-2019
<table>
    <tr>
        <th colspan=4>2018-2019 Season</th>
    </tr>
    <tr>
        <th>Pot 1</th>
        <th>Pot 2</th>
        <th>Pot 3</th>
        <th>Pot 4</th>
    </tr>
    <tr>
        <td>Sevilla (ESP)</td>
        <td>Sporting CP (POR)</td>
        <td>Fenerbahçe (TUR)</td>
        <td>Standard Liège (BEL)</td>
    </tr>
    <tr>
        <td>Arsenal (ENG)</td>
        <td>Ludogorets (BUL)</td>
        <td>Krasnodar (RUS)</td>
        <td>Zurich (SUI)</td>
    </tr>
    <tr>
        <td>Chelsea (ENG)</td>
        <td>Kobenhavn (DEN)</td>
        <td>Maribor (SLO)</td>
        <td>Midtjylland (DEN)</td>
    </tr>
    <tr>
        <td>Zenith (RUS)</td>
        <td>Olympique Marseille (FRA)</td>
        <td>Maccabi Tel-Aviv (ISR)</td>
        <td>Stade Rennais (FRA)</td>
    </tr>
    <tr>
        <td>Basel (SUI)</td>
        <td>Sporting Braga (POR)</td>
        <td>Feyenoord (NED)</td>
        <td>Red Star Belgrade (SRB)</td>
    </tr>
    <tr>
        <td>Bayer Leverkusen (GER)</td>
        <td>PAOK (GRE)</td>
        <td>Real Betis (ESP)</td>
        <td>AEK (GRE)</td>
    </tr>
    <tr>
        <td>Besiktas (TUR)</td>
        <td>Milan (ITA)</td>
        <td>Garabag (AZB)</td>
        <td>Rosenborg (NOR)</td>
    </tr>
    <tr>
        <td>Olympiakos (GRE)</td>
        <td>Steaua (ROM)</td>
        <td>Young Boys (SUI)</td>
        <td>Vorskla Poltava (UKR)</td>
    </tr>
    <tr>
        <td>Ajax (NED)</td>
        <td>APOEL (CYP)</td>
        <td>BATE (BLS)</td>
        <td>HJK Helsinki (FIN)</td>
    </tr>
    <tr>
        <td>Villarreal (ESP)</td>
        <td>Gent (BEL)</td>
        <td>Eintracht Frankfurt (GER)</td>
        <td>Slavia Praha (CZE)</td>
    </tr>
    <tr>
        <td>Anderlecht (BEL)</td>
        <td>Racing Genk (BEL)</td>
        <td>Malmoe (SWE)</td>
        <td>Akhisarspor (TUR)</td>
    </tr>
    <tr>
        <td>Lazio (ITA)</td>
        <td>Legia Warsaw (POL)</td>
        <td>Spartak Moscow (RUS)</td>
        <td>Jablonec (CZE)</td>
    </tr>
</table>
-->

# Simulators

A draw simulator has been implemented in order to estimate the probabilities for each pair of clubs to be in the same group. The estimation is based on the [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method).

It is assumed that fixture kick-offs will be at 19h for groups A to F and at 21h for groups G to L.

## Imports and constants

In [1]:
import numpy as np
import networkx as nx
import collections
import time

In [2]:
# Clubs ordered by association and UEFA ranking
CLUBS = ["Sevilla", "Arsenal", "Chelsea", "Zenit", "Bayer Leverkusen", "Dynamo Kyiv", 
         "Beşiktaş", "Salzburg", "Olympiacos", "Villarreal", "Anderlecht", "Lazio", 
         "Sporting CP", "Ludogorets", "København", "Marseille", "Celtic", "PAOK", 
         "AC Milan", "Genk", "Fenerbahçe", "Krasnodar", "Astana", "Rapid Wien", 
         "Real Betis", "Qarabağ", "BATE Borisov", "Dínamo Zagreb", "RB Leipzig", "Eintracht Frankfurt", 
         "Malmö", "Spartak Moskva", "Standard Liège", "Zürich", "Bordeaux", "Rennes", 
         "Apollon", "Rosenborg", "Vorskla Poltava", "Slavia Praha", "Akhisar Belediyespor", "Jablonec", 
         "AEK Larnaca", "Vidi", "Rangers", "Dudelange", "Spartak Trnava", "Sarpsborg"]
# Association of each club
ASSOCIATIONS = ["ESP", "ENG", "ENG", "RUS", "GER", "UKR", 
                "TUR", "AUT", "GRE", "ESP", "BEL", "ITA", 
                "POR", "BUL", "DEN", "FRA", "SCO", "GRE", 
                "ITA", "BEL", "TUR", "RUS", "KAZ", "AUT", 
                "ESP", "AZE", "BLR", "CRO", "GER", "GER", 
                "SWE", "RUS", "BEL", "SUI", "FRA", "FRA", 
                "CYP", "NOR", "UKR", "CZE", "TUR", "CZE", 
                "CYP", "HUN", "SCO", "LUX", "SVK", "NOR"]
# Pot number of each club
CLUB_POTS = [1, 1, 1, 1, 1, 1, 
             1, 1, 1, 1, 1, 1, 
             2, 2, 2, 2, 2, 2, 
             2, 2, 2, 2, 2, 2, 
             3, 3, 3, 3, 3, 3, 
             3, 3, 3, 3, 3, 3, 
             4, 4, 4, 4, 4, 4, 
             4, 4, 4, 4, 4, 4]
# Number of pots in the draw
NUMBER_OF_POTS = len(set(CLUB_POTS))
# Number of clubs in each pot
CLUBS_PER_POT = len(CLUBS) // NUMBER_OF_POTS
# Groups in first timetable (19h, groups A-F)
GROUPS_IN_FIRST_TIMETABLE = list(range(CLUBS_PER_POT // 2))
# Groups in second timetable (21h, groups G-L)
GROUPS_IN_SECOND_TIMETABLE = list(range(CLUBS_PER_POT // 2, CLUBS_PER_POT))

In [3]:
# Build the PAIRED_CLUBS mapping (clubs that must be in different TV timetables) automatically
# The first two clubs in each association have to be in different TV groups: A-F (19h) or G-L (21h)
# It is required that the list CLUBS is firstly ordered by association and then by UEFA ranking
PAIRED_CLUBS = {}
for association in set(ASSOCIATIONS):
    club_indexes = [idx for idx, c in enumerate(ASSOCIATIONS) if c == association]
    if len(club_indexes) > 1:
        pairing = club_indexes[0:2]
        PAIRED_CLUBS[pairing[0]] = pairing[1]
        PAIRED_CLUBS[pairing[1]] = pairing[0]
        print("Pairing for association %s: %s, %s" % (association, CLUBS[pairing[0]], CLUBS[pairing[1]]))

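# Force Russian and Ukrainian clubs into different groups by relabelling UKR as RUS.
# This runs after PAIRED_CLUBS is built, so the UKR and RUS pairings are kept.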
for ukrainian in [idx for idx, c in enumerate(ASSOCIATIONS) if c == "UKR"]:
    ASSOCIATIONS[ukrainian] = "RUS"

Pairing for association BEL: Anderlecht, Genk
Pairing for association GER: Bayer Leverkusen, RB Leipzig
Pairing for association SCO: Celtic, Rangers
Pairing for association CYP: Apollon, AEK Larnaca
Pairing for association ESP: Sevilla, Villarreal
Pairing for association FRA: Marseille, Bordeaux
Pairing for association ENG: Arsenal, Chelsea
Pairing for association UKR: Dynamo Kyiv, Vorskla Poltava
Pairing for association GRE: Olympiacos, PAOK
Pairing for association AUT: Salzburg, Rapid Wien
Pairing for association TUR: Beşiktaş, Fenerbahçe
Pairing for association ITA: Lazio, AC Milan
Pairing for association RUS: Zenit, Krasnodar
Pairing for association NOR: Rosenborg, Sarpsborg
Pairing for association CZE: Slavia Praha, Jablonec
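Cell 3 reuses the same-association check to enforce the Russia/Ukraine rule by relabelling every UKR club as RUS. A minimal sketch of that trick on a small hypothetical subset of clubs (the helper names are assumptions, not notebook functions):

```python
ASSOCIATIONS_SUBSET = {"Zenit": "RUS", "Dynamo Kyiv": "UKR", "Krasnodar": "RUS",
                       "Vorskla Poltava": "UKR", "Sevilla": "ESP"}

def relabel(associations, merged=("UKR",), into="RUS"):
    """Return a copy where every association in `merged` is renamed to `into`."""
    return {club: (into if code in merged else code) for club, code in associations.items()}

def group_ok(clubs, associations):
    """True if no two clubs in the group share an (effective) association."""
    codes = [associations[club] for club in clubs]
    return len(set(codes)) == len(codes)

group = ["Dynamo Kyiv", "Krasnodar", "Sevilla"]
print(group_ok(group, ASSOCIATIONS_SUBSET))           # True: UKR and RUS differ
print(group_ok(group, relabel(ASSOCIATIONS_SUBSET)))  # False: both now count as RUS
```

Order matters here: in the notebook the relabelling runs after PAIRED_CLUBS is built. If it ran first, the pairing loop would see a single merged RUS association whose first two clubs by ranking are Zenit and Dynamo Kyiv, so the separate pairings (Zenit, Krasnodar) and (Dynamo Kyiv, Vorskla Poltava) would be lost.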


## Functions

In [4]:
def copy_list_and_remove_element(element, list_of_elements):
    """
    Return a copy of the list_of_elements having removed element from it.

    Args:
        element: element to be removed from list_of_elements
        list_of_elements: list containing element

    Returns:
        a fresh copy of the list with the element removed
    """
    copied_list = list_of_elements[:]
    copied_list.remove(element)
    return copied_list

def filter_groups(club, groups, updated_draw):
    """
    Valid groups for the club are those having clubs from different associations and 
    satisfying the TV constraints given the current state of the draw.

    Args:
        club: club index for which the list of remaining groups will be filtered
        groups: list of groups that have not been assigned yet
        updated_draw: 2-D numpy array containing the current state of the draw

    Returns:
        a list of the groups that remain valid for the club
    """
    association = ASSOCIATIONS[club]
    return [g for g in groups
            if has_no_same_association_club_in_group(updated_draw[g, :], association)
            and is_tv_constraint_satisfied(updated_draw, club, g)]

def exist_maximum_matching(remaining_clubs, remaining_groups, updated_draw):
    """
    A bipartite graph is built using remaining_clubs (first class) and 
    remaining_groups (second class). For each club, eligible groups are calculated, 
    and for each of these pairs of nodes (1st class, 2nd class) an edge is built. 
    If the maximum matching covers every remaining club (networkx returns it as a
    dict holding both directions, so it then has len(remaining_clubs) +
    len(remaining_groups) entries), there is no dead end yet.

    Args:
        remaining_clubs: list of indexes of the remaining clubs
        remaining_groups: list of groups that have not been assigned yet
        updated_draw: 2-D numpy array containing the current state of the draw

    Returns:
        a boolean
    """
    G = nx.Graph()
    size = len(remaining_clubs)
    G.add_nodes_from(range(size), bipartite=0)
    G.add_nodes_from(range(size, 2*size), bipartite=1)
    for idx, c in enumerate(remaining_clubs):
        for fg in filter_groups(c, remaining_groups, updated_draw):
            g_idx = remaining_groups.index(fg)
            G.add_edge(idx, g_idx + size)
    # networkx >= 2.0 needs top_nodes when the graph is empty or disconnected
    max_size = len(nx.algorithms.bipartite.maximum_matching(G, top_nodes=range(size)))
    return max_size == 2 * size

def is_group_in_timetable(group, first_timetable=True):
    """
    Check if a group belongs to one of the two timetables.

    Args:
        group: group index
        first_timetable: True for the first timetable and False otherwise

    Returns:
        a boolean
    """
    groups = GROUPS_IN_FIRST_TIMETABLE if first_timetable else GROUPS_IN_SECOND_TIMETABLE
    return group in groups

def are_groups_in_same_timetable(group1, group2):
    """
    Check if both groups belong to the same timetable.

    Args:
        group1: first group index
        group2: second group index

    Returns:
        a boolean
    """
    return (is_group_in_timetable(group1) and is_group_in_timetable(group2)) or \
           (is_group_in_timetable(group1, False) and is_group_in_timetable(group2, False))

def has_no_same_association_club_in_group(clubs_in_group, association):
    """
    Check whether there is no club in clubs_in_group belonging to the association.

    Args:
        clubs_in_group: list of club indexes
        association: association code to be checked

    Returns:
        a boolean
    """
    already_drawn_clubs_in_group = [club for club in clubs_in_group if club > -1]
    return all([ASSOCIATIONS[club] != association for club in already_drawn_clubs_in_group])

def is_tv_constraint_satisfied(draw, drawn_club_index, group_candidate):
    """
    Check if after assigning the drawn_club into the group_candidate 
    the TV constraints about paired clubs are satisfied.

    Args:
        draw: 2-D numpy array containing the current state of the draw
        drawn_club_index: index of the drawn club
        group_candidate: group candidate to be assigned to the current drawn club

    Returns:
        a boolean
    """
    if drawn_club_index in PAIRED_CLUBS:
        i,j = np.where(draw == PAIRED_CLUBS[drawn_club_index])
        if (len(i) > 0) and are_groups_in_same_timetable(i[0], group_candidate):
            return False
    return True

def get_numbers_of_pending_clubs_tv_constrained(draw, remaining_clubs):
    """
    Count the clubs in remaining_clubs whose paired club has already been drawn,
    split by the timetable they are forced into: a club whose partner sits in a
    19h group (A to F) must go to a 21h group, and vice versa.

    Args:
        draw: 2-D numpy array containing the current state of the draw
        remaining_clubs: list of indexes of the remaining clubs

    Returns:
        a tuple of integers
    """
    clubs_in_19h = 0
    clubs_in_21h = 0
    for c in remaining_clubs:
        if c in PAIRED_CLUBS:
            g, p = np.where(draw == PAIRED_CLUBS[c])
            if len(g) > 0:
                if is_group_in_timetable(g[0]):
                    clubs_in_21h += 1
                else:
                    clubs_in_19h += 1
    return clubs_in_19h, clubs_in_21h

def has_no_dead_ends(draw, drawn_club_index, group_candidate, remaining_clubs, groups_available, pot):
    """
    Check that, after assigning drawn_club_index to group_candidate, the
    remaining_clubs can still be placed in groups_available without a dead end.

    Args:
        draw: 2-D numpy array containing the current state of the draw
        drawn_club_index: index of the drawn club
        group_candidate: group candidate to be assigned to the current drawn club
        remaining_clubs: list of indexes of the remaining clubs
        groups_available: groups that have not been assigned yet
        pot: pot from which the current club has been drawn

    Returns:
        a boolean
    """
    updated_draw = np.copy(draw)
    updated_draw[group_candidate, pot] = drawn_club_index
    remaining_groups = copy_list_and_remove_element(group_candidate, groups_available)
    # Each pair whose two clubs are both still in the pot needs one 19h and one 21h group
    paired_clubs_in_pot = sum([1 if c in PAIRED_CLUBS and PAIRED_CLUBS[c] in remaining_clubs else 0
                               for c in remaining_clubs])
    pairs = paired_clubs_in_pot // 2
    # Clubs whose partner is already drawn are forced into the opposite timetable
    clubs_forced_in_19h, clubs_forced_in_21h = get_numbers_of_pending_clubs_tv_constrained(updated_draw,
                                                                                           remaining_clubs)
    groups_in_19h = len([g for g in remaining_groups if is_group_in_timetable(g)])
    groups_in_21h = len([g for g in remaining_groups if is_group_in_timetable(g, False)])
    if (groups_in_19h < pairs + clubs_forced_in_19h) or (groups_in_21h < pairs + clubs_forced_in_21h):
        return False
    return exist_maximum_matching(remaining_clubs, remaining_groups, updated_draw)

def get_feasible_groups(draw, drawn_club_index, remaining_clubs, groups_available, pot):
    """
    Return a list holding the first feasible group (in alphabetical order, A to L)
    for the drawn club that satisfies the draw constraints about TV timetables,
    same-association clubs, and dead ends.
    Returning a list makes it easy to collect every feasible group
    instead of only the first one.

    Args:
        draw: 2-D numpy array containing the current state of the draw
        drawn_club_index: index of the drawn club
        remaining_clubs: list of indexes of the remaining clubs
        groups_available: groups that have not been assigned yet
        pot: pot from which the current club has been drawn

    Returns:
        a list with the index of the first feasible group, or an empty list if there is none
    """
    feasible_groups = []
    association = ASSOCIATIONS[drawn_club_index]
    for group in groups_available:
        if is_tv_constraint_satisfied(draw, drawn_club_index, group):
            if has_no_same_association_club_in_group(draw[group,:], association):
                if has_no_dead_ends(draw, drawn_club_index, group, 
                                    remaining_clubs, groups_available, pot):
                    feasible_groups.append(group)
                    return feasible_groups
    return feasible_groups

def check_draw_validity(draw):
    """
    Check whether or not the draw satisfies all the constraints about 
    TV timetables and same association clubs.

    Args:
        draw: a [CLUBS_PER_POT]x[NUMBER_OF_POTS] numpy 2D-array containing club indexes

    Returns:
        a boolean
    """
    # Checking TV constraints
    for c1, c2 in PAIRED_CLUBS.items():
        g1, _ = np.where(draw == c1)
        g2, _ = np.where(draw == c2)
        if are_groups_in_same_timetable(g1[0], g2[0]):
            timetable = '19h' if is_group_in_timetable(g1[0]) else '21h'
            print("Teams %s, %s must be in different timetables, but they are in %s side" % (CLUBS[c1], 
                                                                                             CLUBS[c2], 
                                                                                             timetable))
            return False
    # Checking association constraint
    for group in range(CLUBS_PER_POT):
        valid = len(set([ASSOCIATIONS[club] for club in draw[group,:]])) == NUMBER_OF_POTS
        if not valid:
            print("Group %s (%s): %s" % (chr(65+group), 
                                         valid, 
                                         ", ".join([CLUBS[club] for club in draw[group,:]])))
            return False
    return True

def show_draw_result(draw):
    """
    Print the result of a draw in the form of group composition.

    Args:
        draw: a [CLUBS_PER_POT]x[NUMBER_OF_POTS] numpy 2D-array containing club indexes
    """
    for group in range(CLUBS_PER_POT):
        print("Group %s: %s" % (chr(65+group), ", ".join([CLUBS[club] for club in draw[group,:]])))

def simulate_draw(simulations, verbose=False, show_errors=True):
    """
    Simulate draws until the required number of valid draws has been obtained.

    Args:
        simulations: The number of draws to be simulated
        verbose: Trace the draw development printing pot compositions and clubs drawn
        show_errors: Print an error message where a club doesn't have any feasible group

    Returns:
        a [simulations]x[CLUBS_PER_POT]x[NUMBER_OF_POTS] numpy 3D-array containing club indexes
    """
    draws = np.full((simulations, CLUBS_PER_POT, NUMBER_OF_POTS), -1)
    simulation = 0
    while simulation < simulations:
        feasible = True
        draws[simulation] = np.full((CLUBS_PER_POT, NUMBER_OF_POTS), -1)
        draw = draws[simulation]
        for pot_idx in range(NUMBER_OF_POTS):
            clubs_in_pot = [idx for idx, pot in enumerate(CLUB_POTS) if pot == pot_idx+1]
            if verbose:
                print("\nPot #%d:%s" % (pot_idx+1, ', '.join([CLUBS[idx] for idx in clubs_in_pot])))
            groups_available = list(range(CLUBS_PER_POT))
            while len(clubs_in_pot) > 0:
                drawn_club = np.random.choice(clubs_in_pot)
                clubs_in_pot.remove(drawn_club)
                feasible_groups = get_feasible_groups(draw, drawn_club, 
                                                      clubs_in_pot, groups_available, pot_idx)
                if len(feasible_groups) == 0:
                    if show_errors:
                        print("No group available for club: %s, %s" % (CLUBS[drawn_club],
                                                                        groups_available))
                    feasible = False
                    break
                assigned_group = feasible_groups[0]
                groups_available.remove(assigned_group)
                if verbose:
                    print("\t -> %s to group %d\t%s" % (CLUBS[drawn_club], 
                                                        assigned_group, 
                                                        feasible_groups))
                draw[assigned_group, pot_idx] = drawn_club
            if not feasible:
                break
        if feasible and check_draw_validity(draw):
            simulation += 1
    return draws

def estimate_probabilities(draws):
    """
    Using all the simulated draws, estimate the probability of each pair of clubs
    being in the same group.
    Probabilities are real numbers in the interval [0, 1].

    Args:
        draws: [simulations]x[CLUBS_PER_POT]x[NUMBER_OF_POTS] numpy 3D-array 
               containing the simulated draws

    Returns:
        a 48x48 numpy 2D-array containing the probability for each pair of clubs 
        belonging to the same group
    """
    simulations = draws.shape[0]
    total_events = float(simulations)  # total number of events
    estimations = np.full((len(CLUBS), len(CLUBS)), 0,  dtype=np.float32)  # probability estimations

    # For each pair of teams, calculate the probability of belonging to the same group
    for club in range(len(CLUBS)):
        pot = CLUB_POTS[club]-1  # Pot to which the current team belongs
        rivals = np.array([draws[i,np.where(draws[i,:,:] == club)[0][0],:] for i in range(simulations)])
        estimations[club, club] = 1  # each team has a prob=1 of belonging to its own group
        rivals = rivals.flatten()
        counts = collections.Counter(rivals)
        for rival, counter in counts.items():
            estimations[club, rival] = float(counter)/total_events

    return estimations
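The dead-end test in exist_maximum_matching is a perfect-matching question: every club still in the pot must get its own eligible group. The notebook answers it with networkx's bipartite maximum_matching; the sketch below swaps in Kuhn's augmenting-path algorithm in plain Python, which should give the same yes/no answer without the networkx dependency. The input format (a list of eligible group indexes per club) and the function name are assumptions for illustration. As in the notebook, this only looks at the current pot; the separate timetable counting in has_no_dead_ends adds a necessary condition for pairs whose two clubs are both still to be drawn, which the matching alone does not see.

```python
def has_perfect_matching(eligible):
    """Return True if every club can be given a distinct eligible group.

    eligible[i] lists the group indexes club i may still join. Uses Kuhn's
    augmenting-path algorithm for maximum bipartite matching.
    """
    owner = {}  # group index -> club index currently matched to it

    def try_assign(club, visited):
        for group in eligible[club]:
            if group in visited:
                continue
            visited.add(group)
            # Use a free group, or re-route its current owner to another group.
            if group not in owner or try_assign(owner[group], visited):
                owner[group] = club
                return True
        return False

    # If some club cannot be matched, no perfect matching exists.
    return all(try_assign(club, set()) for club in range(len(eligible)))

# Clubs 1 and 2 can both only go to group 0: a dead end.
print(has_perfect_matching([[0, 1], [0], [0]]))     # False
# Club 0 can move to group 1, freeing group 0 for club 1.
print(has_perfect_matching([[0, 1], [0], [1, 2]]))  # True
```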

## Simulations

In [5]:
start = time.time()
simulations = 100000
draws = simulate_draw(simulations, verbose=False)
end = time.time()
print("Elapsed time for %d simulated draws: %.3f" % (simulations, end-start))

Elapsed time for 100000 simulated draws: 3846.076


In [6]:
print("Showing one of the simulated draws:\n")
show_draw_result(draws[0])

Showing one of the simulated draws:

Group A: Arsenal, Genk, BATE Borisov, Jablonec
Group B: Sevilla, København, Zürich, Akhisar Belediyespor
Group C: Lazio, Marseille, Real Betis, AEK Larnaca
Group D: Zenit, PAOK, Malmö, Rangers
Group E: Beşiktaş, Rapid Wien, Spartak Moskva, Rosenborg
Group F: Bayer Leverkusen, Astana, Rennes, Vorskla Poltava
Group G: Anderlecht, AC Milan, RB Leipzig, Dudelange
Group H: Villarreal, Krasnodar, Standard Liège, Vidi
Group I: Olympiacos, Fenerbahçe, Bordeaux, Sarpsborg
Group J: Chelsea, Celtic, Qarabağ, Apollon
Group K: Dynamo Kyiv, Sporting CP, Dínamo Zagreb, Slavia Praha
Group L: Salzburg, Ludogorets, Eintracht Frankfurt, Spartak Trnava


## Probability estimations

In [7]:
# Point probability estimations using the whole sample
estimations = estimate_probabilities(draws)

# Split the sample into number_of_subsamples subsamples
# in order to estimate the standard error of each probability (batch means)
number_of_subsamples = 100
subsamples_estimations = np.full((number_of_subsamples, estimations.shape[0], estimations.shape[1]), 
                                 0,  
                                 dtype=np.float32)
for idx, subsample in enumerate(np.array_split(draws, number_of_subsamples)):
    subsamples_estimations[idx, :] = estimate_probabilities(subsample)

# The standard error is the square root of (variance of the subsample estimates / number of subsamples)
error_estimations = np.sqrt(np.var(subsamples_estimations, axis=0)/number_of_subsamples)
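The subsample (batch-means) error in cell 7 can be sanity-checked on synthetic data. The sketch below assumes independent Bernoulli outcomes with a made-up probability p = 0.09, applies the same formula as the notebook, and compares it with the binomial standard error sqrt(p(1 - p)/N). For p near 9% and N = 100000 both come out near 0.09%, roughly consistent with the ±0.18% (two standard errors) shown in the results below for probabilities of about 9%.

```python
import numpy as np

# Synthetic stand-in for one entry of the probability matrix: each simulated draw
# either puts two given clubs together (1.0) or not (0.0). p_true is made up.
rng = np.random.default_rng(42)
p_true, n_draws, n_subsamples = 0.09, 100_000, 100
outcomes = (rng.random(n_draws) < p_true).astype(float)

point_estimate = outcomes.mean()
subsample_estimates = np.array([chunk.mean() for chunk in np.array_split(outcomes, n_subsamples)])
# Same formula as cell 7: sqrt(variance of the subsample estimates / number of subsamples)
batch_se = np.sqrt(np.var(subsample_estimates) / n_subsamples)
# For independent draws, the binomial standard error should be close to it
binomial_se = np.sqrt(point_estimate * (1 - point_estimate) / n_draws)
print(f"p = {point_estimate:.4f}, batch SE = {batch_se:.5f}, binomial SE = {binomial_se:.5f}")
```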

## Results

A point estimate and an approximate 95% confidence interval (± two standard errors) are shown for the probability of each pair of clubs being in the same group, whenever that probability is greater than zero.
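In formulas, with $N$ simulated draws split into $K$ subsamples of equal size (here $N = 100000$ and $K = 100$), the quantities computed in cells 7 and 8 are as follows, where $\hat p_{ij}^{(k)}$ is the estimate from the $k$-th subsample and $\widehat{\operatorname{Var}}$ is the population variance computed by `np.var` (ddof = 0). The factor 2 rounds the normal quantile 1.96, so the intervals are approximate and assume the subsample estimates are roughly normal.

$$
\hat p_{ij} = \frac{1}{N}\sum_{s=1}^{N} \mathbf{1}\left[\text{clubs } i \text{ and } j \text{ share a group in draw } s\right],
\qquad
\widehat{\mathrm{SE}}_{ij} = \sqrt{\frac{1}{K}\,\widehat{\operatorname{Var}}\left(\hat p_{ij}^{(1)}, \ldots, \hat p_{ij}^{(K)}\right)},
\qquad
\hat p_{ij} \pm 2\,\widehat{\mathrm{SE}}_{ij}
$$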

In [8]:
for pot in range(1, NUMBER_OF_POTS+1):
    print("Clubs in pot %d" % pot)
    for club in np.where(np.array(CLUB_POTS) == pot)[0]:
        print("\t%s" % CLUBS[club])
        for other_pot in filter(lambda x: x != pot-1, range(NUMBER_OF_POTS)):
            rivals = np.where(np.array(CLUB_POTS) == other_pot+1)[0]
            print("\t\tRivals in pot %d" % (other_pot+1))
            probabilities = filter(lambda x: x[1] > 0, zip(rivals, estimations[club,rivals]))
            probabilities = sorted(probabilities, key=lambda x: -x[1])
            for rival, p in probabilities:
                print("\t\t\t%s: %.2f%% ± %.2f%%" % (CLUBS[rival], 
                                                             p*100, 
                                                             2*error_estimations[club,rival]*100))
            if not np.isclose([sum(estimations[club,rivals])], [1]):
                print("\t\t\tThe sum of all the probabilities must be equal to 100%")

Clubs in pot 1
	Sevilla
		Rivals in pot 2
			Krasnodar: 9.79% ± 0.18%
			PAOK: 9.01% ± 0.19%
			Rapid Wien: 8.90% ± 0.17%
			Genk: 8.88% ± 0.16%
			AC Milan: 8.85% ± 0.15%
			Fenerbahçe: 8.77% ± 0.16%
			Astana: 7.72% ± 0.15%
			Marseille: 7.67% ± 0.16%
			Celtic: 7.64% ± 0.16%
			København: 7.64% ± 0.15%
			Ludogorets: 7.58% ± 0.16%
			Sporting CP: 7.56% ± 0.18%
		Rivals in pot 3
			Bordeaux: 10.49% ± 0.19%
			RB Leipzig: 10.46% ± 0.20%
			Spartak Moskva: 10.20% ± 0.20%
			Eintracht Frankfurt: 9.44% ± 0.19%
			Standard Liège: 9.08% ± 0.18%
			Qarabağ: 8.54% ± 0.20%
			BATE Borisov: 8.42% ± 0.16%
			Malmö: 8.40% ± 0.17%
			Rennes: 8.37% ± 0.18%
			Dínamo Zagreb: 8.36% ± 0.19%
			Zürich: 8.26% ± 0.13%
		Rivals in pot 4
			Vorskla Poltava: 9.73% ± 0.19%
			Rangers: 9.18% ± 0.18%
			Akhisar Belediyespor: 8.71% ± 0.16%
			Jablonec: 8.26% ± 0.18%
			Rosenborg: 8.24% ± 0.17%
			AEK Larnaca: 8.08% ± 0.16%
			Sarpsborg: 8.06% ± 0.17%
			Slavia Praha: 8.05% ± 0.18%
			Dudelange: 7.96% ± 0.16%
	