# Introduction

In this project, we will be looking into the hypotheses that goals scored can be modelled using a Possion Distribution

# Bernoulli Trials 

The first step into our Poisson Distribution analysis should be a study of Bernoulli Trials. A Bernoulli Trial is a random experiment with exactly two possible outcomes, "success" and "failure". For the purpose of this exercise, we define "success" as a goal being scored by a given player. Mathematically, the calculation is very simple:

\begin{equation*}
P(Goals = k)   = {n \choose k} p^k (1-p)^{ n-k}
\end{equation*}

where $n$ is the number of Bernoulli trials (or in our case, number of shot attempts), $k$ is number of successes (goals scored), ${n \choose k}$ is the Binomial Coefficient, $p$ is the probability of the trial ending in a success and $q = 1 - p$.


In [2]:

# We import the necessary packages to run our code

import getData
import binomialCoefficient
import pandas as pd
from IPython.display import display

For this initial analysis, we look at Wolverhampton Wanderers players from the 2024/25 season, using stats taken from fbref.com. We also apply a few data cleaning processes so the data is more easily manipulated, remove empty cells etc...

In [3]:
URL ='https://fbref.com/en/squads/8cec06e1/Wolverhampton-Wanderers-Stats' 

playersTable = getData.fetchDataFromUrl(URL)

listOfPlayers = playersTable['Player']

playersTable.set_index("Player", inplace = True)

playersTable.dropna(inplace= True)

## Start of Bernoulli Trial

We will use stats per 90 as this data is normalised. We will also use xG per shot as the probability of a player scoring with each shot.
This is labelled as 'npxG/Sh' in our Data Frame.
We can also use the 'Sh/90' value to determine how many shots a player takes per game, this can be our 'n' in our trial. (We can round up or down depending on the data)

In [5]:
numberOfShots = [1, 2, 3, 4, 5]

goalProbabilityTable = pd.DataFrame(columns = numberOfShots, index = listOfPlayers) # create our table with number of shots as columns and players as rows (index)
goalProbabilityTable.rename_axis('Attempts on goal', axis = 'columns')

# Populate our Table with values
for player in listOfPlayers:
    for number in numberOfShots:
        playerTrial = binomialCoefficient.BinomialCoefficient(number, 1)
        goalProbabilityTable.loc[player, number] = playerTrial.bernoulliTrial(playersTable.at[player, 'npxG/Sh'])

# The table will give us the values of P(Goals = 1) given the amount of shots taken for each player

goalProbabilityTable.style.highlight_max(color='green')

print(goalProbabilityTable)


KeyError: 'Toti Gomes'

The table above gives us the values of $P(Goals = 1)$ given the amount of shots taken for each player. For example, $P_{João Gomes}(Goals = 1)_{1} = 0.07$ if João Gomes only takes 1 shot during a match. $P_{João Gomes}(Goals = 1)_{2} = 0.1302$ etc...