**Exercise:** In Major League Baseball, most players have a batting average between .200 and .330, which means that their probability of getting a hit is between 0.2 and 0.33.

Suppose a player appearing in their first game gets 3 hits out of 3 attempts.  What is the posterior distribution for their probability of getting a hit?

For this exercise, I'll construct the prior distribution by starting with a uniform distribution and updating it with imaginary data until it has a shape that reflects my background knowledge of batting averages.

The cell below sets up a grid of hypotheses and updates it with imaginary data to build that prior:

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import binom

hypos = np.linspace(.1, .4, 101)

# Goal: a prior that puts roughly 80% of its probability between .2 and .33.
# Approach:
#   1. take the midpoint of the interval as the imaginary batting average
#   2. update with imaginary data (hits out of at-bats) at that average
#   3. check how much posterior probability falls in the interval
#   4. change the number of at-bats and repeat until it is close to 80%

m = (.2+.33)/2  #.265

# Results for different amounts of imaginary data:
# 265 hits in 1000 at-bats: 99% of the probability is in the interval
# 53 hits in 200 at-bats:   96% of the probability is in the interval
# 26 hits in 100 at-bats:   86% of the probability is in the interval <- chosen
# 13 hits in 50 at-bats:    70% of the probability is in the interval
likelihood_fn = binom.pmf(
    26, 100, hypos
)

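# Build a Bayes table for the imaginary data. The b_ prefix stands for "base":
# b_posterior is used as the prior for the player's update in a later cell.
# Note: b_prior holds the hypothesis values themselves, so multiplying by it
# weights each hypothesis in proportion to its value rather than uniformly.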
bt = pd.DataFrame({'b_prior': hypos})
bt['b_likelihood'] = likelihood_fn
bt['b_conjuction'] = bt.b_prior * bt.b_likelihood
bt['b_posterior'] = bt.b_conjuction / bt.b_conjuction.sum()

interval = (
    (bt.b_prior >= .2) & (bt.b_prior <= .33)
)
print(
    'Probability that the hit rate is in the interval:',
    f'{bt[interval].b_posterior.sum():.0%}'
)

sns.lineplot(x=hypos, y=bt.b_posterior);
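
The trial-and-error search described in the comments can also be written as a loop. The following is a minimal sketch, not the cell's exact computation: `mass_in_interval` is a hypothetical helper, and it assumes a flat prior over the grid, so its percentages may differ slightly from the ones quoted in the comments.

```python
import numpy as np
from scipy.stats import binom

hypos = np.linspace(.1, .4, 101)

def mass_in_interval(hits, at_bats, low=.2, high=.33):
    """Posterior probability between low and high after updating a
    flat prior with imaginary data (hypothetical helper)."""
    prior = np.ones_like(hypos)                       # assumption: flat prior
    posterior = prior * binom.pmf(hits, at_bats, hypos)
    posterior /= posterior.sum()                      # normalize to sum to 1
    inside = (hypos >= low) & (hypos <= high)
    return posterior[inside].sum()

# Same imaginary data sets as in the comments, all with an average near .265
candidates = [(265, 1000), (53, 200), (26, 100), (13, 50)]
masses = {n: mass_in_interval(k, n) for k, n in candidates}
for n, mass in masses.items():
    print(f'{n:>4} at-bats: {mass:.0%} of the probability is in the interval')
```

Less imaginary data gives a wider prior, so the share of probability inside the interval shrinks as the number of at-bats goes down.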

In [None]:
# Update for the player.
# The likelihood is now the probability of getting 3 hits in 3 at-bats
# under each hypothesis.
likelihood_fn2 = binom.pmf(3, 3, hypos)
p = sns.lineplot(x=hypos, y=likelihood_fn, label='initial');
sns.lineplot(x=hypos, y=likelihood_fn2, label='update');
p.set_title('Likelihood functions');

In [None]:
# We update our beliefs with the new likelihood function
bt['conjuction'] = bt.b_posterior * likelihood_fn2
bt['posterior'] = bt.conjuction / bt.conjuction.sum()

interval = (
    (bt.b_prior >= .2) & (bt.b_prior <= .33)
)
print(
    'Probability that the hit rate is in the interval:',
    f'{bt[interval].posterior.sum():.0%}'
)

sns.lineplot(x=hypos, y=bt.b_posterior, label='initial');
sns.lineplot(x=hypos, y=bt.posterior, label='updated');

After the update, about 79% of the posterior probability lies between .2 and .33.
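
One way to sanity-check a two-step update like this is to compare it with a single update on the pooled data. The sketch below assumes a flat starting prior and a hypothetical `normalize` helper. It updates with 26 hits in 100 at-bats and then 3 hits in 3, and compares the result with one update on 29 hits in 103 at-bats. The binomial coefficients differ, but they cancel when the table is normalized.

```python
import numpy as np
from scipy.stats import binom

hypos = np.linspace(.1, .4, 101)

def normalize(weights):
    """Scale weights so they sum to 1 (hypothetical helper)."""
    return weights / weights.sum()

prior = normalize(np.ones_like(hypos))               # assumption: flat prior

# Two updates in sequence: imaginary data first, then the player's 3 for 3
after_imaginary = normalize(prior * binom.pmf(26, 100, hypos))
sequential = normalize(after_imaginary * binom.pmf(3, 3, hypos))

# One update with all the data pooled together
batch = normalize(prior * binom.pmf(29, 103, hypos))

print('Same posterior:', np.allclose(sequential, batch))
```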

Let's see how the most likely value (the posterior mode) has shifted:

In [None]:
# Locate the most probable hypothesis (the mode) before and after the update
i0 = bt.b_posterior.idxmax()
i1 = bt.posterior.idxmax()
mode0 = bt.b_prior[i0]  # b_prior holds the hypothesis values
mode1 = bt.b_prior[i1]
print(f'Most likely hit probability in the initial table: {mode0:.2f}')
print(f'Probability assigned to that single grid value: {bt.b_posterior[i0]:.0%}')
print(f'Most likely hit probability in the updated table: {mode1:.2f}')
print(f'Probability assigned to that single grid value: {bt.posterior[i1]:.0%}')

After 3 hits in a row, the most likely hit probability goes up by about 2 percentage points. However, it is important to realize that the probability assigned to any single value, such as a player getting 290 hits in 1000 at-bats, is quite low, about 3%. That figure also depends on how finely the grid of hypotheses is spaced.
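
Because any single grid value carries little probability, it can help to report the posterior mean alongside the mode. The following sketch rebuilds the updated posterior with the same arithmetic as the cells above, including the use of the hypothesis values as starting weights, and computes both summaries. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import binom

hypos = np.linspace(.1, .4, 101)

# Same steps as the cells above: starting weights equal to the hypothesis
# values, then the imaginary data (26 for 100), then the player (3 for 3)
weights = hypos * binom.pmf(26, 100, hypos) * binom.pmf(3, 3, hypos)
posterior = weights / weights.sum()

mode = hypos[np.argmax(posterior)]        # most probable grid value
mean = np.sum(hypos * posterior)          # probability-weighted average

print(f'Posterior mode: {mode:.3f}')
print(f'Posterior mean: {mean:.3f}')
```

The mean averages over the whole distribution, so it is less sensitive to the grid spacing than the probability of the single most likely value.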