# Probability 2: Loaded dice 

In this assignment you will reinforce your intuition about the concepts covered in the lectures by taking the dice example to the next level.

This assignment will not evaluate your coding skills but rather your intuition and analytical skills. You can answer the exercise questions by any means you like: take the analytical route and compute the exact values, or write code that simulates the situation at hand and gives approximate values (grading has some tolerance to allow approximate solutions). It is up to you which route you take!

Note that every exercise has a blank cell that you can use for your calculations. This cell is there for your convenience but **will not be graded**, so you can leave it empty if you want to.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import utils

## Some concept clarifications 🎲🎲🎲

During this assignment you will be presented with various scenarios that involve dice. Usually dice can have different numbers of sides and can be either fair or loaded.

- A fair dice has equal probability of landing on every side.
- A loaded dice does not have equal probability of landing on every side. Usually one (or more) sides have a greater probability of showing up than the rest.

Let's get started!

## Exercise 1:



Given a 6-sided fair dice (all of the sides have equal probability of showing up), compute the mean and variance of the probability distribution that models this dice. The next figure shows a visual representation of that distribution:

<img src="./images/fair_dice.png" style="height: 300px;"/>

**Submission considerations:**
- Submit your answers as floating point numbers with three digits after the decimal point
- Example: To submit the value of 1/4 enter 0.250

Hints: 
- You can use [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to simulate a fair dice.
- You can use [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html) and [np.var](https://numpy.org/doc/stable/reference/generated/numpy.var.html) to compute the mean and variance of a numpy array.
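
If you prefer the analytical route, the definitions of mean and variance for a discrete distribution can be applied directly. The following is a minimal sketch, assuming a fair dice with sides 1 to 6; the variable names are only illustrative.

```python
import numpy as np

# Sides of the dice and their probabilities (fair: 1/6 each)
sides = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# E[X] = sum of x * P(x)
exact_mean = np.sum(sides * probs)
# Var(X) = sum of (x - E[X])^2 * P(x)
exact_variance = np.sum((sides - exact_mean) ** 2 * probs)

print(round(exact_mean, 3), round(exact_variance, 3))
```

The exact values are 7/2 for the mean and 35/12 for the variance, so a simulation with many rolls should land close to them.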

In [2]:
# You can use this cell for your calculations (not graded)
rolls = np.random.choice([1, 2, 3, 4, 5, 6], size=1000000)

mean = np.mean(rolls)
variance = np.var(rolls)

mean, variance

(3.500241, 2.9165399419189995)

In [3]:
# Run this cell to submit your answer
utils.exercise_1()

FloatText(value=0.0, description='Mean:')

FloatText(value=0.0, description='Variance:')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 2:

Now suppose you throw the dice (the same dice as in the previous exercise) two times and record the sum of the two throws. Which of the following `probability mass functions` should you get?

<table><tr>
<td> <img src="./images/hist_sum_6_side.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_5_side.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_6_uf.png" style="height: 300px;"/> </td>
</tr></table>


Hints: 
- You can use numpy arrays to hold the results of many throws.
- You can sum two numpy arrays by using the `+` operator like this: `sum = first_throw + second_throw`
- To simulate multiple throws of a dice you can use a list comprehension or a for loop
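
As an alternative to simulating, the exact probabilities can be obtained by enumerating every possible pair of throws. This is only a sketch using the standard library; the name `exact_pmf` is illustrative and not part of the assignment.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sides = range(1, 7)

# Count how many of the 36 equally likely (first, second) pairs give each sum
counts = Counter(a + b for a, b in product(sides, repeat=2))

# Divide by the number of outcomes to get exact probabilities
exact_pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in exact_pmf.items():
    print(total, p, round(float(p), 3))
```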

In [4]:
# You can use this cell for your calculations (not graded)

num_rolls = 1000000
first_throw = np.random.choice([1, 2, 3, 4, 5, 6], size=num_rolls)
second_throw = np.random.choice([1, 2, 3, 4, 5, 6], size=num_rolls)
sum_of_throws = first_throw + second_throw

values, counts = np.unique(sum_of_throws, return_counts=True)
probabilities = counts / num_rolls

pmf = dict(zip(values, probabilities))

pmf

{2: 0.027911,
 3: 0.055461,
 4: 0.083433,
 5: 0.111112,
 6: 0.138867,
 7: 0.167062,
 8: 0.139211,
 9: 0.11128,
 10: 0.083133,
 11: 0.054796,
 12: 0.027734}

In [5]:
# Run this cell to submit your answer
utils.exercise_2()

ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 3:

Given a fair 4-sided dice, you throw it two times and record the sum. The figure on the left shows the probabilities of the dice landing on each side, and the figure on the right shows the histogram of the sum. Fill in the probability of each sum (notice that the distribution of the sum is symmetrical, so you only need to input 4 values in total):

<img src="./images/4_side_hists.png" style="height: 300px;"/>

**Submission considerations:**
- Submit your answers as floating point numbers with three digits after the decimal point
- Example: To submit the value of 1/4 enter 0.250

In [6]:
# You can use this cell for your calculations (not graded)

probabilities_4_sided_dice = []

for sum_value in range(2, 6):  # Only go up to 5 because of symmetry
    # Count how many ways to get this sum
    count_ways = len([(a, b) for a in range(1, 5) for b in range(1, 5) if a + b == sum_value])
    # The total number of outcomes is 4 (sides) times 4
    probabilities_4_sided_dice.append(count_ways / 16.0)

probabilities_4_sided_dice

[0.0625, 0.125, 0.1875, 0.25]

In [7]:
# Run this cell to submit your answer
utils.exercise_3()

FloatText(value=0.0, description='P for sum=2|8', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=3|7:', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=4|6:', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=5:', style=DescriptionStyle(description_width='initial'))

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 4:

Using the same scenario as in the previous exercise, compute the mean and variance of the sum of the two throws and the covariance between the first and the second throw:

<img src="./images/4_sided_hist_no_prob.png" style="height: 300px;"/>


Hints:
- You can use [np.cov](https://numpy.org/doc/stable/reference/generated/numpy.cov.html) to compute the covariance of two numpy arrays (this may not be needed for this particular exercise).
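
If you would like to check the covariance by simulation, `np.cov` can be applied to the two arrays of throws. The sketch below assumes a fair 4-sided dice and uses `np.random.default_rng` with an arbitrary seed so the run is reproducible; these choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
num_rolls = 1_000_000

# The upper bound of rng.integers is exclusive, so the values are 1 to 4
first = rng.integers(1, 5, size=num_rolls)
second = rng.integers(1, 5, size=num_rolls)

# np.cov returns a 2x2 matrix: variances on the diagonal, covariance off the diagonal
cov_matrix = np.cov(first, second)
sample_covariance = cov_matrix[0, 1]

total = first + second
print(total.mean(), total.var(), sample_covariance)
```

Because the throws are independent, the sample covariance should be close to 0, up to simulation noise.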

In [8]:
# You can use this cell for your calculations (not graded)

sides = np.array([1, 2, 3, 4])
probabilities = np.array([1/4] * 4)  # Equal probability for each side

# Calculate the mean and variance for one throw
mean_one_throw = np.sum(sides * probabilities)
variance_one_throw = np.sum((sides - mean_one_throw)**2 * probabilities)

# For two throws, the sums range from 2 to 8
sums = np.array(range(2, 9))

# Calculate the probabilities for the sums of two throws
# For a fair 4-sided die, the sums and their probabilities are symmetric, so we can use the previous results
probabilities_sums = np.array([1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16])

# Calculate the mean and variance for the sum of two throws
mean_sum_two_throws = np.sum(sums * probabilities_sums)
variance_sum_two_throws = np.sum((sums - mean_sum_two_throws)**2 * probabilities_sums)

# Since the throws are independent, the covariance between them is 0
covariance = 0

mean_sum_two_throws, variance_sum_two_throws, covariance



(5.0, 2.5, 0)

In [9]:
# Run this cell to submit your answer
utils.exercise_4()

FloatText(value=0.0, description='Mean:')

FloatText(value=0.0, description='Variance:')

FloatText(value=0.0, description='Covariance:')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 5:


Now suppose you have a loaded 4-sided dice (it is loaded so that it lands twice as often on side 2 compared to the other sides):


<img src="./images/4_side_uf.png" style="height: 300px;"/>

You throw it two times and record the sum of the two throws. Which of the following `probability mass functions` should you get?

<table><tr>
<td> <img src="./images/hist_sum_4_4l.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_4_3l.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_4_uf.png" style="height: 300px;"/> </td>
</tr></table>

Hints: 
- You can use the `p` parameter of [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to simulate a loaded dice.
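
The exact distribution of the sum can also be computed without simulation: for independent throws, the PMF of the sum is the convolution of the single-throw PMF with itself. This is a sketch using `np.convolve`, with illustrative variable names.

```python
import numpy as np

# Side 2 is twice as likely as each of the other sides
weights = np.array([1, 2, 1, 1])
side_probs = weights / weights.sum()

# PMF of the sum of two independent throws: convolve the single-throw PMF with itself
sum_probs = np.convolve(side_probs, side_probs)
sums = np.arange(2, 9)  # possible sums: 2 to 8

exact_pmf_loaded = dict(zip(sums.tolist(), np.round(sum_probs, 3).tolist()))
print(exact_pmf_loaded)
```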

In [10]:
# You can use this cell for your calculations (not graded)

# Define the probabilities for the loaded die
# Side 2 has double the probability compared to the other sides
loaded_probabilities = [1, 2, 1, 1]
loaded_probabilities = np.array(loaded_probabilities) / np.sum(loaded_probabilities)

# Simulate rolling two loaded dice
num_rolls = 1000000
first_throw = np.random.choice([1, 2, 3, 4], size=num_rolls, p=loaded_probabilities)
second_throw = np.random.choice([1, 2, 3, 4], size=num_rolls, p=loaded_probabilities)
sum_of_throws = first_throw + second_throw

# Calculate the frequency of each sum to estimate the PMF
values, counts = np.unique(sum_of_throws, return_counts=True)
probabilities = counts / num_rolls

# Prepare the data in the form of a dictionary for easier understanding
pmf_loaded_dice = dict(zip(values, probabilities))

pmf_loaded_dice



{2: 0.040101,
 3: 0.160229,
 4: 0.240456,
 5: 0.23963,
 6: 0.199373,
 7: 0.08004,
 8: 0.040171}

In [11]:
# Run this cell to submit your answer
utils.exercise_5()

ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 6:

You have a 6-sided dice that is loaded so that it lands twice as often on side 3 compared to the other sides:

<img src="./images/loaded_6_side.png" style="height: 300px;"/>

You record the sum of throwing it twice. What is the highest value (of the sum) that will yield a cumulative probability lower than or equal to 0.5?

<img src="./images/loaded_6_cdf.png" style="height: 300px;"/>

Hints:
- The probability of side 3 is equal to $\frac{2}{7}$

In [12]:
# You can use this cell for your calculations (not graded)

# Define the probabilities for the loaded die, with side 3 having double the probability
loaded_probabilities_6_sided = [1, 1, 2, 1, 1, 1]
loaded_probabilities_6_sided = np.array(loaded_probabilities_6_sided) / np.sum(loaded_probabilities_6_sided)

# Calculate the PMF for the sum of two throws
sums = np.arange(2, 13)  # Possible sums of two dice
pmf_sums = np.zeros_like(sums, dtype=float)  # np.float was removed in NumPy 1.24; use the built-in float

# Calculate PMF by considering all possible combinations of dice throws
for i in range(1, 7):
    for j in range(1, 7):
        pmf_sums[i + j - 2] += loaded_probabilities_6_sided[i - 1] * loaded_probabilities_6_sided[j - 1]

# Calculate the CDF from the PMF
cdf_sums = np.cumsum(pmf_sums)

# Find the highest sum with a CDF of 0.5 or less
highest_sum = sums[cdf_sums <= 0.5][-1] if any(cdf_sums <= 0.5) else None

highest_sum, cdf_sums

(6,
 array([0.02040816, 0.06122449, 0.16326531, 0.28571429, 0.44897959,
        0.6122449 , 0.75510204, 0.87755102, 0.93877551, 0.97959184,
        1.        ]))

In [13]:
# Run this cell to submit your answer
utils.exercise_6()

IntSlider(value=2, continuous_update=False, description='Sum:', max=12, min=2)

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 7:

Given a 6-sided fair dice, you try a new game. You only throw the dice a second time if the result of the first throw is **lower** than or equal to 3. Which of the following `probability mass functions` should you get given this new constraint?

<table><tr>
<td> <img src="./images/6_sided_cond_green.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_blue.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_red.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_brown.png" style="height: 250px;"/> </td>

</tr></table>

Hints:
- You can simulate the second throws as a numpy array and then set the values that meet a certain criterion to 0 by using [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html)

In [14]:
# You can use this cell for your calculations (not graded)

num_rolls = 1000000
first_throw = np.random.choice([1, 2, 3, 4, 5, 6], size=num_rolls)

# Simulate the second throw with the condition
# If the first throw is greater than 3, the second throw is effectively 0
second_throw = np.where(first_throw <= 3, np.random.choice([1, 2, 3, 4, 5, 6], size=num_rolls), 0)

# Calculate the sum of throws
sum_of_throws = first_throw + second_throw

# Calculate the PMF
values, counts = np.unique(sum_of_throws, return_counts=True)
probabilities = counts / num_rolls

# Prepare the data in the form of a dictionary for easier understanding
pmf_new_game = dict(zip(values, probabilities))
pmf_new_game

{2: 0.02791,
 3: 0.055495,
 4: 0.249521,
 5: 0.250545,
 6: 0.250128,
 7: 0.083203,
 8: 0.055463,
 9: 0.027735}

In [15]:
# Run this cell to submit your answer
utils.exercise_7()

ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 8:

Given the same scenario as in the previous exercise, but with the twist that you only throw the dice a second time if the result of the first throw is **greater** than or equal to 3. Which of the following `probability mass functions` should you get given this new constraint?

<table><tr>
<td> <img src="./images/6_sided_cond_green2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_blue2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_red2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_brown2.png" style="height: 250px;"/> </td>

</tr></table>


In [16]:
# You can use this cell for your calculations (not graded)

num_rolls = 1000000
first_throw = np.random.choice([1, 2, 3, 4, 5, 6], size=num_rolls)

# Simulate the second throw with the new condition
# If the first throw is less than 3, the second throw is effectively 0
second_throw = np.where(first_throw >= 3, np.random.choice([1, 2, 3, 4, 5, 6], size=num_rolls), 0)

# Calculate the sum of throws
sum_of_throws = first_throw + second_throw

# Calculate the PMF
values, counts = np.unique(sum_of_throws, return_counts=True)
probabilities = counts / num_rolls

# Prepare the data in the form of a dictionary for easier understanding
pmf_new_constraint = dict(zip(values, probabilities))

pmf_new_constraint

{1: 0.167217,
 2: 0.165981,
 4: 0.02787,
 5: 0.055229,
 6: 0.083411,
 7: 0.11093,
 8: 0.111465,
 9: 0.110886,
 10: 0.083241,
 11: 0.055716,
 12: 0.028054}

In [17]:
# Run this cell to submit your answer
utils.exercise_8()

ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 9:

Given an n-sided fair dice, you throw it twice and record the sum. How does increasing the number of sides `n` of the dice impact the mean and variance of the sum and the covariance of the joint distribution?
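
One way to explore this is to compute the exact mean and variance of the sum for a few values of `n`. The sketch below assumes independent throws, so the means and the variances of the two throws simply add; the helper `sum_stats` and the chosen values of `n` are illustrative.

```python
import numpy as np

def sum_stats(n):
    """Exact mean and variance of the sum of two independent throws of a fair n-sided dice."""
    sides = np.arange(1, n + 1)
    probs = np.full(n, 1 / n)
    mean_one = np.sum(sides * probs)
    var_one = np.sum((sides - mean_one) ** 2 * probs)
    # For independent throws the means add and the variances add
    return 2 * mean_one, 2 * var_one

for n in (4, 6, 8, 12, 20):
    mean_sum, var_sum = sum_stats(n)
    print(f"n={n:2d}  mean={mean_sum:.3f}  variance={var_sum:.3f}")
```

The covariance between the two throws is not computed here because independent throws have a covariance of 0 regardless of `n`.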

In [18]:
# You can use this cell for your calculations (not graded)



In [19]:
# Run this cell to submit your answer
utils.exercise_9()

As the number of sides in the die increases:


ToggleButtons(description='The mean of the sum:', options=('stays the same', 'increases', 'decreases'), value=…

ToggleButtons(description='The variance of the sum:', options=('stays the same', 'increases', 'decreases'), va…

ToggleButtons(description='The covariance of the joint distribution:', options=('stays the same', 'increases',…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 10:

Given a 6-sided loaded dice, you throw it twice and record the sum. Which of the following statements is true?
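
To build some intuition, you could compare the mean and variance of the sum for different loaded sides. The sketch below makes an assumption that the exercise does not state: the loaded side is twice as likely as each of the other sides, as in the earlier exercises. The helper `loaded_sum_stats` is illustrative.

```python
import numpy as np

sides = np.arange(1, 7)

def loaded_sum_stats(loaded_side):
    """Mean and variance of the sum of two independent throws of a loaded 6-sided dice."""
    # Assumption: the loaded side is twice as likely as each other side
    weights = np.ones(6)
    weights[loaded_side - 1] = 2
    probs = weights / weights.sum()
    mean_one = np.sum(sides * probs)
    var_one = np.sum((sides - mean_one) ** 2 * probs)
    # Independent throws: means add and variances add
    return 2 * mean_one, 2 * var_one

for side in range(1, 7):
    mean_sum, var_sum = loaded_sum_stats(side)
    print(f"loaded side={side}  mean={mean_sum:.3f}  variance={var_sum:.3f}")
```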

In [20]:
# You can use this cell for your calculations (not graded)



In [21]:
# Run this cell to submit your answer
utils.exercise_10()

RadioButtons(layout=Layout(width='max-content'), options=('the mean and variance is the same regardless of whi…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 11:

Given an n-sided dice (which could be fair or not), you throw it twice and record the sum (there is no dependence between the throws). If you are only given the histogram of the sums, can you use it to find the probabilities of the dice landing on each side?

In other words, if you are provided with only the histogram of the sums like this one:
<img src="./images/hist_sum_6_side.png" style="height: 300px;"/>

Could you use it to find the probabilities of the dice landing on each side? This would be equivalent to finding this histogram:
<img src="./images/fair_dice.png" style="height: 300px;"/>
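
One way to experiment with this question is to try to undo the sum numerically. Since the throws are independent, the PMF of the sum is the single-throw PMF convolved with itself, so you can attempt to solve for the side probabilities one at a time. The following is only a sketch under two assumptions: the throws are independent and side 1 has a non-zero probability. The helper `recover_side_probs` is hypothetical.

```python
import numpy as np

def recover_side_probs(sum_probs):
    """Try to recover single-throw probabilities from the PMF of the sum of two throws.

    Assumes independent throws and a non-zero probability for side 1.
    """
    n = (len(sum_probs) + 1) // 2  # sums range over 2..2n, so there are 2n - 1 values
    p = np.zeros(n)
    p[0] = np.sqrt(sum_probs[0])  # P(sum = 2) = p1 * p1
    for k in range(1, n):
        # P(sum = k + 2) = 2 * p[0] * p[k] + terms that only involve p[1..k-1]
        known = sum(p[i] * p[k - i] for i in range(1, k))
        p[k] = (sum_probs[k] - known) / (2 * p[0])
    return p

# Try it on a fair dice and on a dice loaded on side 3
fair = np.full(6, 1 / 6)
loaded = np.array([1, 1, 2, 1, 1, 1]) / 7

print(recover_side_probs(np.convolve(fair, fair)))
print(recover_side_probs(np.convolve(loaded, loaded)))
```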


In [22]:
# You can use this cell for your calculations (not graded)



In [23]:
# Run this cell to submit your answer
utils.exercise_11()

RadioButtons(layout=Layout(width='max-content'), options=('yes, but only if one of the sides is loaded', 'no, …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Before Submitting Your Assignment

Run the next cell to check that you have answered all of the exercises.

In [24]:
utils.check_submissions()

All answers saved, you can submit the assignment for grading!


**Congratulations on finishing this assignment!**

During this assignment you tested your knowledge of probability distributions, descriptive statistics and the visual interpretation of these concepts. You had the choice to compute everything analytically or to create simulations to help you get the right answer. You probably also realized that some exercises could be answered without any computations, just by looking at certain hidden cues that the visualizations revealed.

**Keep up the good work!**
