### Math stochastics: simulating the World Cup
#### Student : Hussin Almoustafa
#### Studentnummer : 1776495

In [14]:
# Import the required libraries
import random
import pandas as pd
import numpy as np
import math
from concurrent.futures import ThreadPoolExecutor

In [2]:
# Load the data from the Excel file
world_cup_data = pd.read_excel('fifa-ranking.xlsx')
world_cup_data

Unnamed: 0,rank,country,points
0,1,Belgium,1780
1,2,France,1755
2,3,Brazil,1743
3,4,England,1670
4,5,Portugal,1662
...,...,...,...
205,206,Sri Lanka,853
206,207,US Virgin Islands,844
207,208,British Virgin Islands,842
208,209,Anguilla,821


#### The Poisson random number function
**poisson_random_number** is a custom-built random number generator for the Poisson distribution, a probability distribution that models the number of occurrences of an event within a fixed interval of time or space. It is commonly used in fields such as physics, biology, and finance.

The function takes a single parameter, $mean$, which is the mean of the Poisson distribution. It first computes a threshold $L = e^{-mean}$. It then repeatedly draws a random number $u$ between 0 and 1 and multiplies it into a running product $p$, which starts at 1. The number of draws needed for $p$ to drop to $L$ or below, minus one, is a random number that follows a Poisson distribution with the specified mean.

In [3]:
# Define the custom-built random number generator for the Poisson distribution
def poisson_random_number(mean):
    L = math.exp(-mean)
    k = 0
    p = 1
    while p > L:
        k += 1
        u = random.uniform(0, 1)
        p *= u
    return k - 1
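
As a quick sanity check on the sampler above, the sketch below repeats its definition in a self-contained form and compares the sample mean with the requested mean. The helper name `sample_mean`, the seed, and the sample size are assumptions chosen for illustration. The last lines also show that for a very large mean the threshold $L$ underflows to exactly 0.0 in double precision, which matters when the mean is taken from ranking points in the thousands.

```python
import math
import random

def poisson_random_number(mean):
    # Knuth's multiplication method, same logic as the cell above
    L = math.exp(-mean)
    k = 0
    p = 1
    while p > L:
        k += 1
        p *= random.uniform(0, 1)
    return k - 1

def sample_mean(mean, n=20000):
    # Hypothetical helper: average of n independent draws
    return sum(poisson_random_number(mean) for _ in range(n)) / n

random.seed(0)  # arbitrary seed, only for reproducibility
estimate = sample_mean(2.5)
print(f"requested mean 2.5, sample mean {estimate:.3f}")

# For a small mean the threshold is an ordinary positive number,
# for a mean in the thousands it underflows to 0.0
threshold_small = math.exp(-2.5)
threshold_large = math.exp(-1780)
print(threshold_small, threshold_large)
```

When the threshold is 0.0, the loop only stops once the running product itself underflows to zero, so the returned value no longer reflects the requested mean.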


#### The generate rankings and matches function 


This function takes in a pandas DataFrame, **data**, containing a list of countries with their ranking and points. It first sorts the DataFrame by rank and then builds a rankings dictionary whose keys are the country names and whose values are the corresponding points.

It then builds a matches list in which each element is a tuple of two team names. The list is generated by looping over every unordered pair of teams in the DataFrame, so each pair of teams meets exactly once. The function returns the rankings dictionary together with the matches list.

In [15]:
def generate_rankings_and_matches(data):
    # Sort the data by rank
    data = data.sort_values(by="rank")
    
    # Generate the rankings dictionary
    rankings = {}
    for index, row in data.iterrows():
        rankings[row["country"]] = row["points"]
    
    # Generate the matches list
    matches = []
    for i in range(0, len(data)-1):
        for j in range(i+1, len(data)):
            team1 = data.iloc[i]["country"]
            team2 = data.iloc[j]["country"]
            matches.append((team1, team2))
    
    return rankings, matches
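
The pairing step can also be written with `itertools.combinations` from the standard library. The following is a minimal sketch under the assumption that only the team names are needed; the helper name `all_pairings` is hypothetical and not part of the notebook.

```python
from itertools import combinations

def all_pairings(teams):
    # Hypothetical helper: every unordered pair of teams, each pair once
    return list(combinations(teams, 2))

teams = ["Belgium", "France", "Brazil", "England"]
pairings = all_pairings(teams)
print(pairings)

# n teams give n * (n - 1) / 2 matches
matches_for_32 = len(all_pairings(range(32)))
print(matches_for_32)
```

For the 32 teams used later, this gives 32 * 31 / 2 = 496 matches per simulation, and each team plays 31 of them.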

#### The Simulate matches function


This function simulates matches between teams using the Poisson distribution and returns the number of wins for each team. It takes two arguments, rankings and matches, which are generated by the **generate_rankings_and_matches** function.

The function first creates a dictionary **win_counts** that stores the number of **wins** for each team. It then shuffles the order of the matches using the **random.shuffle** function. It then loops through each match and uses the Poisson distribution to simulate the number of goals scored by each team, with the team's ranking points as the mean. The team with the most goals is declared the winner, and its win count is incremented in the win_counts dictionary. A draw adds nothing to either team.

Finally, the function returns the win_counts dictionary, which contains the number of wins for each team after simulating all the matches.

In [5]:
def simulate_matches(rankings, matches):
    win_counts = {country: 0 for country in rankings.keys()}
    # Shuffle a copy of the matches for each simulation, so the list that is
    # shared between worker threads is never modified in place
    shuffled_matches = list(matches)
    random.shuffle(shuffled_matches)
    # Simulate each match and update the win counts
    for team1, team2 in shuffled_matches:
        # The ranking points are used directly as the Poisson means
        mean1, mean2 = rankings[team1], rankings[team2]
        goals1 = poisson_random_number(mean1)
        goals2 = poisson_random_number(mean2)
        if goals1 > goals2:
            win_counts[team1] += 1
        elif goals1 < goals2:
            win_counts[team2] += 1
        # A draw leaves both win counts unchanged
    return win_counts
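
Because the ranking points are in the thousands, one option is to rescale them to a realistic number of goals before sampling. The sketch below is only an illustration under stated assumptions: the helper `expected_goals`, the proportional scaling, and the base rate of 1.4 goals per match are hypothetical choices, not something the notebook or the FIFA data prescribe. The four point values are taken from the table at the top.

```python
import math
import random

def poisson_random_number(mean):
    # Knuth's multiplication method, same logic as the notebook's sampler
    L = math.exp(-mean)
    k = 0
    p = 1
    while p > L:
        k += 1
        p *= random.uniform(0, 1)
    return k - 1

def expected_goals(points, average_points, base_goals=1.4):
    # Assumption: scoring rate is proportional to the team's points
    # relative to the average of the field; base_goals is an arbitrary choice
    return base_goals * points / average_points

rankings = {"Belgium": 1780, "France": 1755, "Brazil": 1743, "England": 1670}
average_points = sum(rankings.values()) / len(rankings)
means = {team: expected_goals(pts, average_points) for team, pts in rankings.items()}

random.seed(1)  # arbitrary seed, only for reproducibility
goals_belgium = poisson_random_number(means["Belgium"])
goals_england = poisson_random_number(means["England"])
print(means)
print(f"Belgium {goals_belgium} - {goals_england} England")
```

With means of this size the threshold stays well above zero, so a stronger team gets a slightly higher expected score and each draw needs only a few loop iterations.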

In [12]:
top_32 = world_cup_data.sort_values('rank').head(32)
rankings, matches = generate_rankings_and_matches(top_32)
num_simulations = 1000

In [13]:
# Create a thread pool to run the simulations concurrently
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(simulate_matches, rankings, matches) for _ in range(num_simulations)]

win_counts = {country: 0 for country in rankings.keys()}
# Combine the results from all the simulations
for future in futures:
    result = future.result()
    for country, count in result.items():
        win_counts[country] += count

# Print the results
# Note: count / num_simulations is the average number of wins per simulation
# (each team plays 31 matches per simulation), so the printed value can exceed 100%
for country, count in sorted(win_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{country}: {count} ({count / num_simulations * 100:.2f}%)")

England: 15612 (1561.20%)
Switzerland: 15547 (1554.70%)
Tunisia: 15547 (1554.70%)
Wales: 15537 (1553.70%)
Peru: 15505 (1550.50%)
Serbia: 15491 (1549.10%)
Belgium: 15451 (1545.10%)
Japan: 15432 (1543.20%)
Argentina: 15422 (1542.20%)
USA: 15419 (1541.90%)
Brazil: 15395 (1539.50%)
Mexico: 15363 (1536.30%)
Netherlands: 15358 (1535.80%)
Chile: 15357 (1535.70%)
Ukraine: 15353 (1535.30%)
Colombia: 15344 (1534.40%)
Turkey: 15336 (1533.60%)
Denmark: 15332 (1533.20%)
Austria: 15329 (1532.90%)
Algeria: 15327 (1532.70%)
Sweden: 15305 (1530.50%)
IR Iran: 15300 (1530.00%)
Uruguay: 15272 (1527.20%)
France: 15269 (1526.90%)
Portugal: 15252 (1525.20%)
Italy: 15244 (1524.40%)
Spain: 15211 (1521.10%)
Senegal: 15186 (1518.60%)
Poland: 15153 (1515.30%)
Venezuela: 15133 (1513.30%)
Germany: 15129 (1512.90%)
Croatia: 15008 (1500.80%)
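
The values in parentheses above are wins per simulation multiplied by 100, which is why they exceed 100%. A share of matches won can be obtained by also dividing by the number of matches each team plays. The sketch below assumes 1000 simulations and 32 teams, as in the run above; the helper name `win_rate` is hypothetical.

```python
def win_rate(wins, num_simulations, num_teams):
    # Each team plays every other team once per simulation
    matches_played = num_simulations * (num_teams - 1)
    return wins / matches_played

# Win counts taken from the first and last line of the output above
rate_england = win_rate(15612, 1000, 32)
rate_croatia = win_rate(15008, 1000, 32)
print(f"England: {rate_england:.2%}")
print(f"Croatia: {rate_croatia:.2%}")
```

Measured this way, the top and bottom of the list both sit close to one half, which suggests that the simulated results depend only weakly on the ranking points.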


Below is the World Cup simulation using the Mersenne Twister algorithm.

**Something is still missing in simulating the match using the Poisson distribution**

I really have no idea why it takes so long :(

In [20]:
def _twist(state):
    """
    Regenerate all 624 state words in place (the twist step of MT19937).
    This helper is required by mt19937 below.
    """
    for i in range(624):
        # Combine the top bit of state[i] with the lower 31 bits of state[i+1]
        y = (state[i] & 0x80000000) | (state[(i + 1) % 624] & 0x7fffffff)
        state[i] = state[(i + 397) % 624] ^ (y >> 1)
        if y & 1:
            state[i] ^= 0x9908b0df
    return state


def mt19937(seed):
    """
    Function: mt19937

    This function generates random numbers using the Mersenne Twister algorithm.

    Parameters:

    seed : int : the seed value to initialize the random number generator.

    Returns:

    A generator object that yields a sequence of 32-bit unsigned integer random numbers.

    Algorithm:

    The Mersenne Twister algorithm is a pseudorandom number generator that produces a sequence of numbers that are uniformly distributed.
    The algorithm is based on a matrix linear recurrence over a finite field. The matrix is designed to have a long period and good statistical properties.
    The algorithm keeps a state of 624 32-bit unsigned integers, which is initialized from the seed value.
    The algorithm uses a twist function to generate a new set of 624 state words based on the previous 624.
    The algorithm uses a tempering function on each state word to improve the statistical properties of the output.

    Usage:

    The mt19937 function can be used to generate random numbers for applications such as simulations and statistical analysis.
    It is not suitable for cryptography.
    Example usage: rng = mt19937(1234), random_number = next(rng)
    """
    # Initialize the state array with a seed value
    # (plain Python integers are used, masked to 32 bits)
    state = [0] * 624
    state[0] = seed & 0xffffffff
    for i in range(1, 624):
        state[i] = (1812433253 * (state[i-1] ^ (state[i-1] >> 30)) + i) & 0xffffffff

    # Yield tempered state words, regenerating the state every 624 numbers
    index = 0
    while True:
        if index == 0:
            state = _twist(state)
        y = state[index]
        y = y ^ (y >> 11)
        y = y ^ ((y << 7) & 0x9d2c5680)
        y = y ^ ((y << 15) & 0xefc60000)
        y = y ^ (y >> 18)
        index = (index + 1) % 624
        yield y
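
The generator above yields 32-bit integers, while the Poisson sampler needs uniform numbers between 0 and 1. A minimal sketch of that conversion follows. The helper name `to_unit_interval` is hypothetical, and `random.getrandbits(32)` is used only as a stand-in source of 32-bit words so that the example is self-contained.

```python
import random

def to_unit_interval(word):
    # Map a 32-bit unsigned integer to a float in [0, 1)
    return word / 2**32

random.seed(42)  # arbitrary seed, only for reproducibility
# Stand-in for next(rng) on a 32-bit generator
words = [random.getrandbits(32) for _ in range(5)]
uniforms = [to_unit_interval(w) for w in words]
print(uniforms)

# The smallest and largest 32-bit words stay inside [0, 1)
lowest = to_unit_interval(0)
highest = to_unit_interval(2**32 - 1)
print(lowest, highest)
```

Dividing by 2**32 rather than 2**32 - 1 keeps the upper end strictly below 1.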

In [17]:
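def poisson(lambda_, rng=None):
    """
    Generate a Poisson-distributed random variable using the algorithm by Knuth.

    rng is an optional zero-argument callable that returns a uniform float
    in [0, 1). When it is omitted, np.random.random is used.
    """
    uniform = np.random.random if rng is None else rng
    L = np.exp(-lambda_)
    p = 1.0
    k = 0
    while p > L:
        k += 1
        p *= uniform()
    return k - 1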

In [18]:
def generate_matches(df):
    seed = 42
    # Generate a list of teams and look up each team's points once
    teams = df['country'].tolist()
    points = dict(zip(df['country'], df['points']))
    # Define the Mersenne Twister random number generator and scale its
    # 32-bit integer outputs to uniform floats in [0, 1)
    rng = mt19937(seed)
    uniform = lambda: next(rng) / 2**32
    # Loop over every team combination and generate a match
    matches = []
    for i, home_team in enumerate(teams):
        for away_team in teams[i+1:]:
            # Simulate the match using the Poisson distribution, with each
            # team's own points as its lambda parameter.
            # Note: with lambda in the thousands, exp(-lambda) underflows to 0.0,
            # so every draw needs many iterations and no longer depends on lambda
            home_score = poisson(points[home_team], rng=uniform)
            away_score = poisson(points[away_team], rng=uniform)
            # Add the match to the list of matches
            matches.append({'Home Team': home_team, 'Away Team': away_team, 'Home Score': home_score, 'Away Score': away_score})
    # Convert the list of matches to a pandas DataFrame
    matches = pd.DataFrame(matches)
    return matches

In [19]:
def simulate_world_cup(df: pd.DataFrame, num_simulations: int, seed: int) -> dict:
    """
    Simulates the World Cup tournament for a given dataframe of teams and their rankings.

    Args:
        df (pd.DataFrame): A dataframe containing the teams and their rankings.
        num_simulations (int): The number of times to simulate the tournament.
        seed (int): The seed for the random number generator.

    Returns:
        dict: A dictionary containing the number of times each team won the tournament.
    """
    rng = np.random.default_rng(seed)
    matches = generate_matches(df)
    # Look up each team's rank once instead of filtering the dataframe per match
    ranks = dict(zip(df['country'], df['rank']))
    results = {team: 0 for team in df['country']}
    for i in range(num_simulations):
        tournament_results = df.copy()
        tournament_results['points'] = 0
        for j, match in matches.iterrows():
            home_team, away_team = match[['Home Team', 'Away Team']]
            # Note: the rank is used as the Poisson mean, so a team with a larger
            # rank number (a weaker team) gets a higher expected score
            home_score = poisson(ranks[home_team], rng.random)
            away_score = poisson(ranks[away_team], rng.random)
            if home_score > away_score:
                tournament_results.loc[tournament_results['country'] == home_team, 'points'] += 3
            elif away_score > home_score:
                tournament_results.loc[tournament_results['country'] == away_team, 'points'] += 3
            else:
                tournament_results.loc[tournament_results['country'].isin([home_team, away_team]), 'points'] += 1
        # Most points first; ties are broken in favour of the better (lower) rank
        tournament_results = tournament_results.sort_values(by=['points', 'rank'], ascending=[False, True])
        winner = tournament_results['country'].iloc[0]
        results[winner] += 1
    return results
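
The points bookkeeping in the loop above (3 for a win, 1 each for a draw) can also be kept in a plain dictionary, which avoids filtering a DataFrame for every match. The sketch below is a simplified illustration, not the notebook's method: the helper `play_round_robin` and the placeholder score model `rng.randint(0, 3)` are assumptions chosen only to keep the example self-contained.

```python
import random
from itertools import combinations

def play_round_robin(teams, score_fn):
    # Hypothetical helper: every pair plays once; score_fn(team) returns goals
    points = {team: 0 for team in teams}
    for home, away in combinations(teams, 2):
        home_score, away_score = score_fn(home), score_fn(away)
        if home_score > away_score:
            points[home] += 3
        elif away_score > home_score:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points

teams = ["Belgium", "France", "Brazil", "England"]
rng = random.Random(12345)  # arbitrary seed, only for reproducibility
# Placeholder score model: a uniform number of goals between 0 and 3
table = play_round_robin(teams, lambda team: rng.randint(0, 3))
winner = max(table, key=table.get)
print(table, winner)

# If every match ends in a draw, each team gets 1 point per match played
all_draws = play_round_robin(teams, lambda team: 1)
print(all_draws)
```

A Poisson-based score function can be passed as `score_fn` in place of the placeholder.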

In [None]:
matches = generate_matches(world_cup_data[:4])

In [None]:
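# simulate_world_cup expects the teams dataframe (with 'country' and 'rank'
# columns) and generates the matches itself
simulate_world_cup(world_cup_data[:4], 1000, 12345)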