# Bayesian Updating

We assume that the runs scored by each team are random variables that follow a Weibull distribution. We follow the research literature in the field to fix the shape parameter of the underlying Weibull distribution. With that parameter fixed, in the first simulation we assume that the scale parameter of each team's Weibull distribution equals the average number of runs the team has scored through its last game. In the second simulation, we use an alternative approach in which the scale parameter for each team is determined through a continuous `Bayesian` updating process. The general idea is that we form a `prior` belief about the expected runs that a team can score at the beginning of the season. Then, as the season progresses, we observe the number of runs the team scores in each game and `update` our belief about its future performance. This updated belief is then used to simulate and predict the team's next game.



We rely on statistical theory to form this Bayesian updating algorithm. The theory tells us that if a random variable $x$ has a Weibull distribution with a *known* shape, then an `Inverse Gamma (IG)` distribution is a `conjugate` prior distribution for its scale parameter. In other words, consider a Weibull random variable with the following PDF:

$$
f(x|k, \theta) = \frac{k}{\theta} x^{k-1} e^{-\frac{x^k}{\theta}}
$$

where $k$ is the **known** shape and $\theta$ is the **unknown** scale parameter. If we assume that the scale parameter $\theta$ is a random variable with an IG distribution with parameters $a$ and $b$, we can formulate the problem in such a way that after each observation only the parameters of the IG distribution are updated. In this framework, let's assume that we start with a prior distribution with parameters $a_0$ and $b_0$ ($k$ is the known shape parameter), *i.e.,*

$$\theta|k \sim IG(a_0, b_0) $$

Then, if we observe $n$ games in which the team scores $rs_i$ runs in game $i \in \{1,2, \ldots, n\}$, we can form our updated `posterior` belief as

$$\theta|k, rs_1, \ldots, rs_n \sim IG(a_n, b_n) $$

where

$$a_n = a_0 + n $$
and
$$ b_n = b_0 + \sum\limits_{i=1}^{n} rs_i^{k}$$
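
For completeness, these two updates follow from multiplying the prior density by the likelihood of the $n$ observed games. The sketch below uses the standard Inverse Gamma density and drops every factor that does not depend on $\theta$:

$$
p(\theta \mid a_0, b_0) \propto \theta^{-(a_0+1)} e^{-\frac{b_0}{\theta}}, \hspace{5mm} \prod_{i=1}^{n} f(rs_i \mid k, \theta) \propto \theta^{-n} e^{-\frac{1}{\theta}\sum\limits_{i=1}^{n} rs_i^{k}}
$$

$$
p(\theta \mid k, rs_1, \ldots, rs_n) \propto \theta^{-(a_0+n+1)} e^{-\frac{1}{\theta}\big(b_0 + \sum\limits_{i=1}^{n} rs_i^{k}\big)}
$$

The last expression is the kernel of an $IG(a_0 + n, \; b_0 + \sum_{i=1}^{n} rs_i^{k})$ density, which matches the expressions for $a_n$ and $b_n$.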

This updating rule is the algorithm we follow in our simulation. The only issue is in the implementation, because in Python the Weibull distribution is defined with a slightly different notation. Specifically, in Python a Weibull random variable has the following PDF:

$$
f(x|k,\lambda) = \frac{k}{\lambda} (\frac{x}{\lambda})^{(k-1)} e^{-(\frac{x}{\lambda})^k}
$$


We should note that the two notations are identical iff $\theta = \lambda^k$. Using this relation, we formulate our Bayesian updating algorithm.
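
The relation $\theta = \lambda^k$ can be checked numerically. The following is a minimal sketch using only the standard library; the function names and the values of `k` and `lam` are illustrative assumptions, not values taken from the simulation.

```python
import math


def weibull_pdf_theta(x, k, theta):
    """PDF in the first notation: (k / theta) * x**(k-1) * exp(-x**k / theta)."""
    return (k / theta) * x ** (k - 1) * math.exp(-(x ** k) / theta)


def weibull_pdf_lambda(x, k, lam):
    """PDF in the shape/scale notation: (k / lam) * (x / lam)**(k-1) * exp(-(x / lam)**k)."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def lambda_to_theta(lam, k):
    """Convert the scale of the second notation to the first: theta = lam**k."""
    return lam ** k


def theta_to_lambda(theta, k):
    """Inverse conversion: lam = theta**(1/k)."""
    return theta ** (1.0 / k)


k = 1.8    # illustrative shape value (assumption)
lam = 5.0  # illustrative scale value (assumption)
theta = lambda_to_theta(lam, k)

# Largest absolute difference between the two PDFs over a few test points.
max_gap = max(
    abs(weibull_pdf_theta(x, k, theta) - weibull_pdf_lambda(x, k, lam))
    for x in (0.5, 1.0, 3.0, 7.0, 12.0)
)
print(max_gap)
```

The printed gap should be zero up to floating-point rounding, so the posterior can be tracked in terms of $\theta$ and converted to $\lambda$ only when a scale parameter is needed.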

##  Hyperparameters of the Prior

The above discussion completes the algorithm provided that we have a known prior, *i.e.,* $a_0$ and $b_0$. To form our prior we make two assumptions. The first is a standard assumption, $a_0 = 2$. Note that, in the first notation,

$$\theta|k \sim IG(a_0, b_0) \hspace{3mm} \Rightarrow \hspace{3mm} \mathbb{E}\theta = \frac{b_0}{a_0-1} $$

This normalizing assumption implies that $\mathbb{E}\theta = b_0$.

To determine $b_0$, we can then rely on the observed data. The initial values of the hyperparameters are defined such that, before the first game, all teams are expected to score `rs_0` runs in their first game. This is an *uninformed prior*. Since the runs scored have a Weibull distribution:

$$
\mathbb{E} runs = \lambda \Gamma(1+1/k)  \hspace{3mm} \Rightarrow \hspace{3mm} \lambda_0 = \frac{rs_0}{\Gamma(1+1/k)}
$$

Thus,

$$\mathbb{E} \lambda_0^k = \frac{b_0}{a_0-1}  \hspace{3mm} \Rightarrow \hspace{3mm}\Big[\frac{rs_0}{\Gamma(1+1/k)} \Big]^k = \frac{b_0}{2-1}$$


This gives us the initial hyperparameters for all teams:

$$ b_0 = \Big[\frac{rs_0}{\Gamma(1+1/k)} \Big]^k  \hspace{10mm} \text{and} \hspace{10mm} a_0 = 2.0$$
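
As a minimal sketch, the prior hyperparameters can be computed with the standard library's gamma function. The function name `prior_hyperparameters` and the values passed for `rs_0` and `k` are illustrative assumptions.

```python
import math


def prior_hyperparameters(rs_0, k, a_0=2.0):
    """Return (a_0, b_0) such that the prior mean of theta equals lambda_0**k.

    lambda_0 is the Weibull scale whose mean is rs_0:
        lambda_0 = rs_0 / Gamma(1 + 1/k)
    and the IG prior mean b_0 / (a_0 - 1) is set equal to lambda_0**k.
    """
    lam_0 = rs_0 / math.gamma(1.0 + 1.0 / k)
    b_0 = (a_0 - 1.0) * lam_0 ** k
    return a_0, b_0


# Illustrative inputs (assumptions): 4.5 expected runs and shape k = 1.8.
a_0, b_0 = prior_hyperparameters(rs_0=4.5, k=1.8)
print(a_0, b_0)
```

With $a_0 = 2$ the factor $(a_0 - 1)$ equals one, so `b_0` reduces to the expression above.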


After one game, if a team scores $rs_1$ runs, we update our belief about its expected future runs as follows:

$$ \lambda^k \sim IG(a_0 +1, b_0 + rs_1^{k})$$

Considering that, 

$$\mathbb{E}\lambda^k = \frac{ b_0 + rs_1^{k}}{a_0 +1 -1} $$

we have:

$$\lambda_1 = \Big[\frac{ b_0 + rs_1^{k}}{a_0 +1 -1} \Big]^{\frac{1}{k}}$$

This is the scale parameter that we use in our simulation. Similarly, after $n$ games:

$$\lambda_n = \Big[\frac{ b_0 + \sum\limits_{i=1}^{n} rs_i^{k}}{a_0 + n -1} \Big]^{\frac{1}{k}}$$

This completes our second simulation algorithm.
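
The updating steps above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code itself: the function name `posterior_scale`, the shape `k`, the prior expectation `rs_0`, and the list of runs are all made up for the example. The draw uses the standard library's `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` is the shape.

```python
import math
import random


def posterior_scale(runs, k, a_0, b_0):
    """Return lambda_n = [(b_0 + sum(rs_i**k)) / (a_0 + n - 1)]**(1/k)."""
    a_n = a_0 + len(runs)                  # a_n = a_0 + n
    b_n = b_0 + sum(r ** k for r in runs)  # b_n = b_0 + sum of rs_i**k
    return (b_n / (a_n - 1.0)) ** (1.0 / k)


k = 1.8     # illustrative shape value (assumption)
rs_0 = 4.5  # illustrative prior expected runs (assumption)
a_0 = 2.0
b_0 = (rs_0 / math.gamma(1.0 + 1.0 / k)) ** k

runs = [3, 7, 0, 5]  # made-up runs scored in the first four games

# Scale parameter before any game (n = 0) and after each observed game.
for n in range(len(runs) + 1):
    lam_n = posterior_scale(runs[:n], k, a_0, b_0)
    print(n, round(lam_n, 3))

# Simulate the runs scored in the next game from the latest scale parameter.
rng = random.Random(0)
simulated_runs = rng.weibullvariate(lam_n, k)
print(simulated_runs)
```

With no games observed, `posterior_scale` returns $\lambda_0$, so the first simulated game is driven entirely by the prior.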

### Change in PE 

We assume that the PE should not change by a large margin after each game:

$$
|PE_{n+1} - PE_{n}| \leq \frac{\epsilon_1}{n^{\epsilon_2}}
$$

Taking the natural logarithm of both sides,

$$
\ln \big\{|PE_{n+1} - PE_{n}|\big\} \leq \ln \big\{\frac{\epsilon_1}{n^{\epsilon_2}}\big\} = \ln \epsilon_1 - \epsilon_2 \ln (n) 
$$

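One way to determine both parameters is to regress $\ln |PE_{n+1} - PE_{n}|$ on $\ln (n)$ for each team, so that the intercept estimates $\ln \epsilon_1$ and the slope estimates $-\epsilon_2$, and then average the estimates across all teams.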
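
A minimal sketch of such a regression is given below, assuming an ordinary least-squares fit of the log change on the log game number. The function name `fit_log_log`, the team labels, and the synthetic data are illustrative assumptions. Note that a least-squares fit describes the typical size of the change rather than a strict upper bound, and that games with a zero change would have to be skipped before taking logarithms.

```python
import math


def fit_log_log(ns, abs_changes):
    """Least-squares fit of ln|PE_{n+1} - PE_n| on ln(n).

    Returns (eps_1, eps_2), where the intercept is ln(eps_1)
    and the slope is -eps_2.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(d) for d in abs_changes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope


# Synthetic data (assumption): each team follows eps_1 / n**eps_2 exactly.
ns = list(range(1, 51))
team_changes = {
    "team_a": [0.5 / n ** 1.0 for n in ns],
    "team_b": [0.3 / n ** 0.8 for n in ns],
}

# One regression per team, then the average across teams.
fits = [fit_log_log(ns, changes) for changes in team_changes.values()]
eps_1 = sum(f[0] for f in fits) / len(fits)
eps_2 = sum(f[1] for f in fits) / len(fits)
print(eps_1, eps_2)
```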