In [2]:
from empiricaldist import Cdf, Pmf
import pandas as pd
from utils import make_die
import numpy as np

In Dungeons & Dragons, each character has six attributes: strength, intelligence, wisdom, dexterity, constitution, and charisma.

To generate a new character, players roll four 6-sided dice for each attribute and add up the best three. For example, if I roll for strength and get 1, 2, 3, 4 on the dice, my character’s strength would be the sum of 2, 3, and 4, which is 9.

When you generate a D&D character, instead of rolling dice, you can use the “standard array” of attributes, which is 15, 14, 13, 12, 10, and 8. Do you think you are better off using the standard array or (literally) rolling the dice?

Compare the distribution of the values in the standard array to the distribution we computed for the best three out of four:

    Which distribution has higher mean? Use the mean method.

    Which distribution has higher standard deviation? Use the std method.

    The lowest value in the standard array is 8. For each attribute, what is the probability of getting a value less than 8? If you roll the dice six times, what’s the probability that at least one of your attributes is less than 8?

    The highest value in the standard array is 15. For each attribute, what is the probability of getting a value greater than 15? If you roll the dice six times, what’s the probability that at least one of your attributes is greater than 15?


In [3]:
standard_array = [15, 14, 13, 12, 10, 8]


In [11]:
samples = 10_000

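# Simulate `samples` attribute rolls; each row holds four six-sided dice.
# np.random.randint excludes the upper bound, so (1, 7) yields values 1 to 6.
rolls = np.random.randint(1, 7, size=(samples, 4))

# Sort each row in ascending order so the lowest die sits in column 0.
rolls.sort(axis=1)

# Drop the lowest die and add up the best three.
summed_best3 = rolls[:, 1:].sum(axis=1)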


In [None]:
pmf_best3 = Pmf.from_seq(summed_best3)
cdf_best3 = pmf_best3.make_cdf()  # CDF used for the tail-probability questions below

In [23]:
print(f'Mean of standard array {np.mean(standard_array)}, mean of rolling the dice {pmf_best3.mean()}')
print(f'Std of standard array {np.std(standard_array)}, std of rolling the dice {pmf_best3.std()}')

Mean of standard array 12.0, mean of rolling the dice 12.2185
Std of standard array 2.3804761428476167, std of rolling the dice 2.8679884501162136
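
One detail worth checking is which standard deviation `np.std` reports. By default it divides by N (the population form, `ddof=0`), not by N-1. The sketch below reproduces the standard-array figures with the standard library only and shows the sample form for contrast; the variable names are illustrative.

```python
import statistics

standard_array = [15, 14, 13, 12, 10, 8]

# Population form: divide by N. This matches np.std's default (ddof=0).
pop_std = statistics.pstdev(standard_array)

# Sample form: divide by N - 1. Shown only for contrast.
sample_std = statistics.stdev(standard_array)

print(f"mean            {statistics.mean(standard_array)}")
print(f"population std  {pop_std:.4f}")
print(f"sample std      {sample_std:.4f}")
```

Since the six values are the whole distribution rather than a sample from a larger one, the population form is the sensible choice here.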


In [28]:
# The standard array's minimum is 8: probability of rolling less than 8.
print(f'Prob less than 8 {cdf_best3.prob_lt(8)}')

# The standard array's maximum is 15: probability of rolling greater than 15.
print(f'Prob greater than 15 {cdf_best3.prob_gt(15)}')


Prob less than 8 0.061200000000000004
Prob greater than 15 0.13019999999999987
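
These estimates come from 10,000 simulated rolls, so they carry some sampling noise. Because four dice have only 6**4 = 1296 equally likely outcomes, the distribution can also be enumerated exactly. The following is a minimal sketch using only the standard library; the names `exact_pmf`, `exact_lt_8`, and `exact_gt_15` are introduced here for illustration and are not part of the notebook.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 6**4 = 1296 equally likely outcomes of four six-sided dice
# and record the sum of the best three for each outcome.
counts = Counter(
    sum(sorted(roll)[1:])  # drop the lowest die, add the remaining three
    for roll in product(range(1, 7), repeat=4)
)
total = sum(counts.values())

# Exact PMF as fractions, keyed by attribute value (3 through 18).
exact_pmf = {value: Fraction(n, total) for value, n in sorted(counts.items())}

exact_mean = float(sum(value * p for value, p in exact_pmf.items()))
exact_lt_8 = float(sum(p for value, p in exact_pmf.items() if value < 8))
exact_gt_15 = float(sum(p for value, p in exact_pmf.items() if value > 15))

print(f"exact mean     {exact_mean:.4f}")
print(f"P(value < 8)   {exact_lt_8:.4f}")
print(f"P(value > 15)  {exact_gt_15:.4f}")
```

The exact values should land close to the simulated ones, differing only by the noise expected from 10,000 samples.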


In [31]:
# Probability that at least one of six attributes is less than 8.

# cdf_best3**6 would give the distribution of the maximum of six rolls, not
# the minimum. 1 - (1 - cdf_best3)**6 gives the distribution of the minimum
# over all values, but only the single threshold 8 is needed here.

# For one roll, p = P(value < 8). The chance that none of six independent
# rolls falls below 8 is (1 - p)**6, so the complement is the answer.
1-(1-cdf_best3.lt_dist(8))**6

np.float64(0.3153974924052577)

In [32]:
#probability at least one attribute is greater than 15
1-(1-cdf_best3.gt_dist(15))**6

np.float64(0.566971560462616)
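
Both results follow the same complement rule: if one roll has probability p of the event, then six independent rolls produce it at least once with probability 1 - (1 - p)**6. As a quick sanity check, the sketch below applies that rule to the single-roll probabilities printed earlier (0.0612 and 0.1302). The helper name `prob_at_least_one` is hypothetical, and independence of the six rolls is assumed.

```python
def prob_at_least_one(p, n=6):
    """Probability that an event of probability p occurs at least once
    in n independent trials (complement of 'never occurs')."""
    return 1 - (1 - p) ** n


# Single-roll probabilities taken from the simulated output above.
p_low = 0.0612   # P(attribute < 8)
p_high = 0.1302  # P(attribute > 15)

print(prob_at_least_one(p_low))
print(prob_at_least_one(p_high))
```

So rolling gives a bit more than a one-in-two chance of beating the standard array's best value somewhere, at the cost of roughly a one-in-three chance of falling below its worst.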

Exercise: Suppose you are fighting three monsters:

    One is armed with a short sword that causes one 6-sided die of damage,

    One is armed with a battle axe that causes one 8-sided die of damage, and

    One is armed with a bastard sword that causes one 10-sided die of damage.

One of the monsters, chosen at random, attacks you and does 1 point of damage.

Which monster do you think it was? Compute the posterior probability that each monster was the attacker.

In [None]:
# We observed 1 point of damage; which monster is the most likely attacker?
# Hypotheses: the number of sides on the attacker's die (6, 8, or 10), with a uniform prior.
# Likelihood: an n-sided die shows 1 with probability 1/n, so the 6-sided die is favored.
hypos = [6, 8, 10]
counts = [1, 1, 1]

pmf_monster = Pmf(counts, hypos)
likelihood = 1/pmf_monster.qs
posterior = pmf_monster * likelihood
posterior.normalize()
posterior

    probs
6   0.425532
8   0.319149
10  0.255319
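
The same update can be done by hand with exact fractions, which makes the arithmetic easy to audit. This is a sketch with the standard library only; `posterior_exact` and the other names are introduced here for illustration.

```python
from fractions import Fraction

sides = [6, 8, 10]

# Uniform prior: each monster is equally likely to be the attacker.
prior = {s: Fraction(1, 3) for s in sides}

# Likelihood of the data (1 point of damage) under each hypothesis.
likelihood = {s: Fraction(1, s) for s in sides}

# Bayes's rule: multiply prior by likelihood, then normalize.
unnormalized = {s: prior[s] * likelihood[s] for s in sides}
evidence = sum(unnormalized.values())
posterior_exact = {s: unnormalized[s] / evidence for s in sides}

for s, p in posterior_exact.items():
    print(s, p, float(p))
```

The fractions come out as 20/47, 15/47, and 12/47, which agree with the decimal posterior above.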


If the same monster attacks you again, what is the probability that you suffer 6 points of damage?

In [52]:
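# Build the damage distribution for each die: after the transpose, rows are
# damage values 1 to 10 and columns are the 6-, 8-, and 10-sided dice.
dice = [make_die(side) for side in hypos]
df = pd.DataFrame(dice).fillna(0).transpose()

# Weight each column by the posterior probability of its monster. The posterior
# replaces the uniform prior because we have already seen 1 point of damage.
df = df * posterior.ps
# Sum across dice for the mixture; the entry at 6 answers the question.
df.sum(axis=1)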

1     0.136348
2     0.136348
3     0.136348
4     0.136348
5     0.136348
6     0.136348
7     0.065426
8     0.065426
9     0.025532
10    0.025532
dtype: float64
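
The entry at 6 (about 0.136) is the answer. It can be cross-checked with the law of total probability: for each monster, multiply its posterior probability by the chance its die shows the given damage, then add. The sketch below hard-codes the posterior as 20/47, 15/47, and 12/47, which are the exact forms of the decimals 0.425532, 0.319149, and 0.255319; the helper `prob_damage` is hypothetical.

```python
from fractions import Fraction

# Posterior over the attacker's die after seeing 1 point of damage
# (exact forms of 0.425532, 0.319149, 0.255319).
posterior_exact = {6: Fraction(20, 47), 8: Fraction(15, 47), 10: Fraction(12, 47)}


def prob_damage(d):
    """P(next damage == d): each die that can show d contributes
    posterior * 1/sides; dice with fewer than d sides contribute 0."""
    return sum(p * Fraction(1, s) for s, p in posterior_exact.items() if d <= s)


for d in range(1, 11):
    print(d, float(prob_damage(d)))
```

The probabilities step down after 6 and again after 8 because the smaller dice can no longer produce those values.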

Henri Poincaré was a French mathematician who taught at the Sorbonne around 1900. The following anecdote about him is probably fiction, but it makes an interesting probability problem.

Supposedly Poincaré suspected that his local bakery was selling loaves of bread that were lighter than the advertised weight of 1 kg, so every day for a year he bought a loaf of bread, brought it home and weighed it. At the end of the year, he plotted the distribution of his measurements and showed that it fit a normal distribution with mean 950 g and standard deviation 50 g. He brought this evidence to the bread police, who gave the baker a warning.

For the next year, Poincaré continued to weigh his bread every day. At the end of the year, he found that the average weight was 1000 g, just as it should be, but again he complained to the bread police, and this time they fined the baker.

Why? Because the shape of the new distribution was asymmetric. Unlike the normal distribution, it was skewed to the right, which is consistent with the hypothesis that the baker was still making 950 g loaves, but deliberately giving Poincaré the heavier ones.

To see whether this anecdote is plausible, let’s suppose that when the baker sees Poincaré coming, he hefts `n` loaves of bread and gives Poincaré the heaviest one. How many loaves would the baker have to heft to make the average of the maximum 1000 g?

In [53]:
mean = 950
std = 50

np.random.seed(17)
sample = np.random.normal(mean, std, size=365)

In [None]:
# Start with a trial value, n = 10.
n = 10

# The CDF is the natural tool here. If F(x) is the probability that one loaf
# weighs at most x, the heaviest of n independent loaves weighs at most x only
# when all n do, which has probability F(x)**n. max_dist(n) returns that
# distribution of the maximum.
# A light loaf is rarely the one handed over: it is chosen only if all n
# hefted loaves are at least as light.
# With a PMF we would first have to accumulate the point probabilities up to
# x before raising them to the power n; the CDF already stores those sums.
cdf = Cdf.from_seq(sample)
max_10_cdf = cdf.max_dist(n)
max_10_pmf = max_10_cdf.make_pmf()

# Mean of the maximum: sum of p_i * q_i, a weighted average since the p_i sum to 1.
max_10_pmf.mean()


np.float64(1028.2523518388548)
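
To make the F(x)**n idea concrete, here is a sketch of the same calculation with plain NumPy. It is not how `empiricaldist` implements `max_dist`; it only illustrates the technique. It also uses `np.random.default_rng(17)`, which produces a different sample than `np.random.seed(17)`, so the numbers will not match the cell above exactly. The function name `mean_of_max` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(17)            # assumption: not the notebook's sample
weights = rng.normal(950, 50, size=365)    # one simulated year of loaves

# Empirical CDF: after sorting, the k-th smallest weight has F = k / N.
qs = np.sort(weights)
ps = np.arange(1, len(qs) + 1) / len(qs)


def mean_of_max(qs, ps, n):
    """Mean of the maximum of n independent draws from the empirical CDF."""
    cdf_max = ps ** n                          # P(all n draws <= q)
    pmf_max = np.diff(cdf_max, prepend=0.0)    # point probabilities of the max
    return float(np.sum(qs * pmf_max))


for n in (1, 4, 10):
    print(n, mean_of_max(qs, ps, n))
```

With n = 1 the function simply returns the sample mean, which is a convenient check that the CDF-to-PMF step is right.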

In [None]:
# n = 10 overshoots (mean maximum about 1028 g), so scan smaller values of n.

for n in range(1, 10):
    print(n, cdf.max_dist(n).mean())
# n = 4 is the smallest n whose mean maximum exceeds 1000 g.

1 949.7832346541664
2 978.4666876067706
3 992.7589004318227
4 1002.0372868686195
5 1008.8226939493089
6 1014.142390301017
7 1018.507694202546
8 1022.2066965318894
9 1025.416321307913
