# Notebook for Topic 1 - Statistics

<hr style="border-top: 1px solid #001a79;" />

## Exercise 1

### Calculate the minimum number of cups of tea required to ensure the probability of randomly selecting the correct cups is less than or equal to 1%
Assuming that half of the cups are tea first and the other half are milk first.

In [1]:
import itertools
import random
import seaborn as sns
import numpy as np
import math

***
#### Calculate probability of selecting all 4 cups correctly by chance calculated from fractional probabilities in each step

Probability of selecting all 4 cups out of 8 correctly:

| Step | Number of cups in the lot matching criteria | Total number of cups available | Probability |
| --- | --- | --- |---|
|Select 1st milk first cup | 4 | 8 | 4/8|
|Select 2nd milk first cup  | 3 | 7 | 3/7|
|Select 3rd milk first cup  | 2 | 6 | 2/6|
|Select 4th milk first cup  | 1 | 5 | 1/5|

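These events are not independent: the probability at each step is conditional on all earlier selections being correct. By the multiplication rule, the final probability of selecting all 4 cups correctly is the product of the probabilities from each step: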

$$P_{all} = \frac{4}{8}*\frac{3}{7}*\frac{2}{6}*\frac{1}{5}$$

$$P_{all} = \frac{4*3*2*1}{8*7*6*5}$$

$$P_{all} = \frac{24}{1680} = \frac{1}{70} \approx 0.014 = 1.4\%$$

The probability of selecting all 4 cups correctly out of 8 by chance alone is 1 in 70, or about 1.4%.
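
The step-by-step product above can be checked with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the variable names `steps` and `p_all` are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# Probability of drawing a 'milk first' cup at each of the 4 steps,
# given that every earlier draw was also correct
steps = [Fraction(4, 8), Fraction(3, 7), Fraction(2, 6), Fraction(1, 5)]

# Multiply the step probabilities together
p_all = Fraction(1)
for p in steps:
    p_all *= p

print(p_all)         # 1/70
print(float(p_all))  # roughly 0.014
```

Working with `Fraction` avoids rounding, so the result can be compared directly with the hand calculation.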

#### Calculate probability of selecting all 4 cups correctly by chance using combinatorics

__[From Wikipedia](https://en.wikipedia.org/wiki/Combination)__

How many ways can we select 4 cups out of 8?

To answer this question, we first need to count the unique outcomes of this experiment, where the order of selection doesn't matter (combinations). The total number of combinations is given by the equation:

$$C^n_k = \frac{n!}{k!*(n-k)!}$$

where<br>
$n$ - size of the set,<br>
$k$ - size of the subset

In our example of selecting 4 cups out of 8:

$$C^8_4 = \frac{8!}{4!*4!}=\frac{1*2*3*4*5*6*7*8}{1*2*3*4*1*2*3*4}=\frac{5*6*7*8}{1*2*3*4}=70$$

There are 70 different ways of selecting 4 cups out of a lot of 8. Since only one selection is correct, the probability of selecting the 4 correct cups out of 8 is the reciprocal of the total number of combinations: $P_{all}=\frac{1}{70}$

The probability of selecting all 4 cups correctly out of 8 by chance alone is 1 in 70. This matches the result calculated using fractional probabilities above.

***
#### Define functions used for calculating number of Combinations and Factorial

In [2]:
# defining a custom function to calculate factorials
def factorial(number, result=1):
    if number==0:
        return result
    else:
        return factorial(number-1, result*number)

In [3]:
# Defining a custom function to calculate the number of Combinations
def combinations(n, k):
    return factorial(n)/(factorial(k)*factorial(n-k))

In [4]:
combinations(8,4)

70.0
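
As a cross-check on the custom functions, the standard library offers `math.comb` (available from Python 3.8). The sketch below is not part of the original notebook; it shows that the built-in returns an exact integer, whereas the division in the custom `combinations` function yields the float `70.0`.

```python
import math

# math.comb(n, k) counts the ways to choose k items from n, ignoring order
n_ways = math.comb(8, 4)
print(n_ways)  # 70
```

An integer result also stays exact for large `n`, where floating-point division could lose precision.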

#### Solve the question using combinatorics

To calculate the probability of any event, we need to know the total number of possible configurations.

In [5]:
# Initialize variables
prob=1
# number of 'milk first cups'
k=1
# Set the probability threshold
min_prob = 0.01

# Execute this loop until the calculated probability drops to or below the min_prob threshold
while prob>min_prob:
    
    # total number of cups in the Experiment
    n=2*k

    # calculate number of possible combinations
    Cnk = combinations(n, k)
    
    # calculate the probability
    prob = 1./Cnk
    
    # print the calculated values
    print("Number of possible ways to select {} cups from total of {} is {:.0f}".format(k, n, Cnk ))
    print("Only 1 out of {:.0f} possibilities is the correct one, so the probability of correctly selecting it by chance is {:.2f}%".format(Cnk, prob*100 ) )
    print()
    
    # increase the number of 'milk first cups'
    k=k+1

print("")    
print("The minimum number of cups of tea required to ensure the probability of randomly selecting the correct cups is less than or equal to 1% is {}".format(n))

Number of possible ways to select 1 cups from total of 2 is 2
Only 1 out of 2 possibilities is the correct one, so the probability of correctly selecting it by chance is 50.00%

Number of possible ways to select 2 cups from total of 4 is 6
Only 1 out of 6 possibilities is the correct one, so the probability of correctly selecting it by chance is 16.67%

Number of possible ways to select 3 cups from total of 6 is 20
Only 1 out of 20 possibilities is the correct one, so the probability of correctly selecting it by chance is 5.00%

Number of possible ways to select 4 cups from total of 8 is 70
Only 1 out of 70 possibilities is the correct one, so the probability of correctly selecting it by chance is 1.43%

Number of possible ways to select 5 cups from total of 10 is 252
Only 1 out of 252 possibilities is the correct one, so the probability of correctly selecting it by chance is 0.40%


The minimum number of cups of tea required to ensure the probability of randomly selecting the correct cups is less than or equal to 1% is 10

<div class="alert alert-block alert-success">
<b>Answer:</b> 10 is the minimum number of cups of tea required to ensure the probability of randomly selecting the correct cups is less than or equal to 1%


</div>
<hr style="border-top: 1px solid #001a79;" />

<br>

### *Bonus:* How many would be required if you were to let the taster get one cup wrong while maintaining the 1% threshold?

***
#### Calculate probability of selecting at least 3 cups correctly by chance calculated from fractional probabilities in each step

##### Probability of selecting first cup incorrectly and then all 3 following cups correctly:

| Step | Number of cups in the lot matching criteria | Total number of cups available | Probability |
| --- | --- | --- |---|
|Select 1st tea first cup | 4 | 8 | 4/8|
|Select 1st milk first cup  | 4 | 7 | 4/7|
|Select 2nd milk first cup  | 3 | 6 | 3/6|
|Select 3rd milk first cup  | 2 | 5 | 2/5|

These events are not independent: the probability at each step is conditional on the earlier selections. By the multiplication rule, the probability of this particular sequence (first cup wrong, then 3 cups correct) is the product of the probabilities from each step:

$$P_{wrong\_first} = \frac{4}{8}*\frac{4}{7}*\frac{3}{6}*\frac{2}{5} = \frac{4*4*3*2}{8*7*6*5}$$


$$P_{wrong\_first} = \frac{96}{1680} = \frac{4}{70}$$



##### Probability of selecting one incorrect cup at any step:

The probability of selecting exactly one incorrect cup at any step is the sum of the probabilities of selecting the incorrect cup only at the first step (calculated above), only at the second step, only at the third step and only at the fourth step. Because these four cases are mutually exclusive and equally likely, the probability of selecting exactly one cup incorrectly is:

$$P_{3} = \frac{96}{1680} * 4 = \frac{16}{70}$$



##### Probability of selecting at least 3 cups correctly:

Allowing the subject to select at least 3 cups correctly (or at most 1 cup incorrectly) covers two scenarios: exactly 3 cups correct and all 4 cups correct. Therefore, the probability of selecting at least 3 cups correctly is the sum of both probabilities:

$$P_{at\_least\_3} = P_{3} + P_{all} = \frac{16}{70} + \frac{1}{70} = \frac{17}{70} \approx 24.29\%$$

The probability of selecting at least 3 cups correctly (allowing the taster to get one cup wrong) is 17/70, a little over 24%.
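
The arithmetic above can be reproduced with exact fractions. This is a small sketch using the standard `fractions` module; the names `p_wrong_first`, `p_exactly_3` and `p_at_least_3` are illustrative and do not appear in the original notebook.

```python
from fractions import Fraction

# One specific order: a 'tea first' cup on the first draw, then three 'milk first' cups
p_wrong_first = Fraction(4, 8) * Fraction(4, 7) * Fraction(3, 6) * Fraction(2, 5)

# The single wrong cup can fall on any of the 4 draws, each case being equally likely
p_exactly_3 = 4 * p_wrong_first

# All four draws correct
p_all = Fraction(4, 8) * Fraction(3, 7) * Fraction(2, 6) * Fraction(1, 5)

p_at_least_3 = p_exactly_3 + p_all
print(p_exactly_3, p_at_least_3)  # 8/35 17/70
```

Note that `Fraction` reduces 16/70 to 8/35 automatically.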

***
#### Calculate probability of selecting at least 3 cups correctly by chance using combinatorics

The number of possible outcomes is still the same ($C^8_4 = 70$), but the number of outcomes matching the description (at least 3 'milk first' cups selected correctly) is greater:
* With exactly one mistake, one of the 4 'milk first' cups is left out and one of the 4 'tea first' cups is selected in its place. Therefore, there are 4*4=16 possible outcomes with exactly 3 cups correctly selected.
* There is still only one possible configuration where all the cups are selected correctly.
* The total number of outcomes matching the description is 16+1=17.
* The probability of selecting at least 3 cups correctly out of 8 cups is $$P_{at\_least\_3} = \frac{16 + 1}{70}  = \frac{17}{70} \approx 24.29\%$$
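
The counting argument in the list above can be sketched with `math.comb`. The helper `ways_with_exactly` is a hypothetical name introduced for illustration; it assumes an experiment with `k` 'milk first' cups and `k` 'tea first' cups.

```python
import math

def ways_with_exactly(correct, k):
    """Number of k-cup selections that contain exactly `correct` of the k 'milk first' cups."""
    # Choose which correct cups are picked, then fill the rest from the k 'tea first' cups
    return math.comb(k, correct) * math.comb(k, k - correct)

k = 4
favourable = ways_with_exactly(3, k) + ways_with_exactly(4, k)
total = math.comb(2 * k, k)
print(favourable, total)  # 17 70
print("{:.2f}%".format(100 * favourable / total))
```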

***
#### Confirm above results by explicitly listing all possible combinations

In [6]:
# define and list placeholders for 8 cups
cups = list(range(8))
cups

[0, 1, 2, 3, 4, 5, 6, 7]

In [7]:
# define and list all possible combinations of choosing 4 cups out of 8
poss = list(itertools.combinations(cups, 4))
poss

[(0, 1, 2, 3),
 (0, 1, 2, 4),
 (0, 1, 2, 5),
 (0, 1, 2, 6),
 (0, 1, 2, 7),
 (0, 1, 3, 4),
 (0, 1, 3, 5),
 (0, 1, 3, 6),
 (0, 1, 3, 7),
 (0, 1, 4, 5),
 (0, 1, 4, 6),
 (0, 1, 4, 7),
 (0, 1, 5, 6),
 (0, 1, 5, 7),
 (0, 1, 6, 7),
 (0, 2, 3, 4),
 (0, 2, 3, 5),
 (0, 2, 3, 6),
 (0, 2, 3, 7),
 (0, 2, 4, 5),
 (0, 2, 4, 6),
 (0, 2, 4, 7),
 (0, 2, 5, 6),
 (0, 2, 5, 7),
 (0, 2, 6, 7),
 (0, 3, 4, 5),
 (0, 3, 4, 6),
 (0, 3, 4, 7),
 (0, 3, 5, 6),
 (0, 3, 5, 7),
 (0, 3, 6, 7),
 (0, 4, 5, 6),
 (0, 4, 5, 7),
 (0, 4, 6, 7),
 (0, 5, 6, 7),
 (1, 2, 3, 4),
 (1, 2, 3, 5),
 (1, 2, 3, 6),
 (1, 2, 3, 7),
 (1, 2, 4, 5),
 (1, 2, 4, 6),
 (1, 2, 4, 7),
 (1, 2, 5, 6),
 (1, 2, 5, 7),
 (1, 2, 6, 7),
 (1, 3, 4, 5),
 (1, 3, 4, 6),
 (1, 3, 4, 7),
 (1, 3, 5, 6),
 (1, 3, 5, 7),
 (1, 3, 6, 7),
 (1, 4, 5, 6),
 (1, 4, 5, 7),
 (1, 4, 6, 7),
 (1, 5, 6, 7),
 (2, 3, 4, 5),
 (2, 3, 4, 6),
 (2, 3, 4, 7),
 (2, 3, 5, 6),
 (2, 3, 5, 7),
 (2, 3, 6, 7),
 (2, 4, 5, 6),
 (2, 4, 5, 7),
 (2, 4, 6, 7),
 (2, 5, 6, 7),
 (3, 4, 5, 6),
 (3, 4, 5, 7),
 (3, 4, 6, 7),
 (3, 5, 6, 7),
 (4, 5, 6, 7)]

In [8]:
# Define randomly which cups will have milk added first
milkfirst = set(random.choice(poss))
print("These cups had milk added before the tea: {}".format(milkfirst))

These cups had milk added before the tea: {0, 3, 5, 6}


<br>

##### List explicitly all the possible cup selections where at least 3 cups are selected correctly:

In [9]:
at_least_3 = []
for c in itertools.combinations(cups, 4):
    correct = milkfirst & set(c)
    if (len(correct)>=3):
        at_least_3.append(correct)
        print(correct)

print()
print("There are {} possible cup selections where the subject selected at least 3 out of 4 cups correctly".format(len(at_least_3)))

{0, 3, 5}
{0, 3, 6}
{0, 5, 6}
{0, 3, 5}
{0, 3, 6}
{0, 5, 6}
{0, 3, 5}
{0, 3, 6}
{0, 3, 5, 6}
{0, 3, 5}
{0, 3, 6}
{0, 5, 6}
{0, 5, 6}
{3, 5, 6}
{3, 5, 6}
{3, 5, 6}
{3, 5, 6}

There are 17 possible cup selections where the subject selected at least 3 out of 4 cups correctly


The above result confirms the calculations made using both the fractional probabilities and the combinatorics.
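
The listing above prints only the overlap with the 'milk first' cups, so several rows look identical. A variant that keeps the full selections makes it easier to see that all 17 are distinct. This is a sketch, with `milkfirst` fixed to the set shown in the run above rather than drawn at random; any other set of 4 cups gives the same count.

```python
import itertools

cups = range(8)
milkfirst = {0, 3, 5, 6}  # fixed here to match the run shown above

# Keep the full 4-cup selection whenever it shares at least 3 cups with milkfirst
at_least_3 = [
    c for c in itertools.combinations(cups, 4)
    if len(milkfirst & set(c)) >= 3
]

print(len(at_least_3))       # 17
print(len(set(at_least_3)))  # 17, so every selection is distinct
```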

***
#### Solve the bonus question using combinatorics

In [10]:
# Initialize variables
prob=1
# number of 'milk first cups'
k=1
# Set the probability threshold
min_prob = 0.01

# Execute this loop until the calculated probability drops to or below the min_prob threshold
while prob>min_prob:
    
    # total number of cups in the Experiment
    n=2*k

    # calculate number of possible combinations
    Cnk = combinations(n, k)
    
    # calculate the probability
    prob = (1 + k*k)/Cnk
    
    # print the calculated values
    print("The total number of possible ways to select {} cups from total of {} is {:.0f}".format(k, n, Cnk ))
    print("There are {} outcomes matching description (out of {:.0f} total possibilities), so the probability of correctly selecting at least {:.0f} cups by chance is {:.2f}%".format((1+k*k), Cnk, (k-1), prob*100 ) )
    print()
    
    # increase the number of 'milk first cups'
    k=k+1
    
print("")    
print("{} is the minimum number of cups of tea required to maintain the 1% probability threshold while allowing the taster one mistake".format(n))

The total number of possible ways to select 1 cups from total of 2 is 2
There are 2 outcomes matching description (out of 2 total possibilities), so the probability of correctly selecting at least 0 cups by chance is 100.00%

The total number of possible ways to select 2 cups from total of 4 is 6
There are 5 outcomes matching description (out of 6 total possibilities), so the probability of correctly selecting at least 1 cups by chance is 83.33%

The total number of possible ways to select 3 cups from total of 6 is 20
There are 10 outcomes matching description (out of 20 total possibilities), so the probability of correctly selecting at least 2 cups by chance is 50.00%

The total number of possible ways to select 4 cups from total of 8 is 70
There are 17 outcomes matching description (out of 70 total possibilities), so the probability of correctly selecting at least 3 cups by chance is 24.29%

The total number of possible ways to select 5 cups from total of 10 is 252
There are 26 outcomes 

<div class="alert alert-block alert-success">
<b>Answer:</b> 16 is the minimum number of cups of tea required to maintain the 1% probability threshold while allowing the taster one mistake


</div>
<hr style="border-top: 1px solid #001a79;" />
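
Both answers from Exercise 1 can be reproduced with one small function. The sketch below is not the notebook's own code: `chance_probability` and `min_cups` are hypothetical helper names, and the formula assumes `k` 'milk first' cups among `2k` cups, where exactly `w` wrong picks can happen in $C^k_w * C^k_w$ ways.

```python
import math

def chance_probability(k, max_wrong):
    """Probability of at most `max_wrong` wrong picks when guessing k cups out of 2k at random."""
    # w wrong picks: leave out w of the k correct cups and pick w of the k incorrect ones
    favourable = sum(math.comb(k, w) ** 2 for w in range(max_wrong + 1))
    return favourable / math.comb(2 * k, k)

def min_cups(max_wrong, threshold=0.01):
    """Smallest total number of cups (2k) for which the chance probability is <= threshold."""
    k = 1
    while chance_probability(k, max_wrong) > threshold:
        k += 1
    return 2 * k

print(min_cups(0))  # 10
print(min_cups(1))  # 16
```

With `max_wrong=1` the numerator reduces to `1 + k*k`, which is the expression used in the loop above.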

<hr style="border-top: 1px solid #001a79;" />

## Exercise 2

### Use scipy's version of Fisher's exact test to simulate the Lady Tasting Tea problem.

In [11]:
from scipy.stats import fisher_exact

First, let's define the contingency table for the Lady Tasting Tea problem. The rows represent whether or not the lady guessed correctly, and the columns represent how the cup was actually prepared:


|               | Milk poured first | Tea poured first |
|---------------|-------------------|------------------------|
| Lady guessed correctly | 10 | 2 |
| Lady guessed incorrectly | 4 | 8 |

In this example, the contingency table has 2 rows and 2 columns. The first row holds the cups that the lady identified correctly: 10 with milk poured first and 2 with tea poured first. The second row holds the cups that she identified incorrectly: 4 with milk poured first and 8 with tea poured first. The first column therefore counts the cups that actually had milk poured first, and the second column counts the cups that actually had tea poured first.

In [25]:
contingency_table = np.array([[10, 2], [4, 8]])
oddsr, p = fisher_exact(contingency_table, alternative='greater')

# Print calculated p-value
print("{0:.4f}".format(p))

0.0180
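
To see what the one-sided test computes, the p-value can be rebuilt from hypergeometric counts using only the standard library. This is a sketch, not scipy's implementation: `fisher_exact_greater` is a hypothetical helper, and the second table (`[[4, 0], [0, 4]]`, rows as the lady's guess and columns as the true preparation) is an assumed encoding of the classic 8-cup experiment with every cup identified correctly.

```python
import math

def fisher_exact_greater(table):
    """One-sided ('greater') Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    # With the margins fixed, the top-left cell follows a hypergeometric distribution
    denom = math.comb(total, row1)
    upper = min(row1, col1)
    # Sum the probabilities of top-left counts at least as large as the observed one
    numer = sum(
        math.comb(col1, x) * math.comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return numer / denom

p_example = fisher_exact_greater([[10, 2], [4, 8]])
p_classic = fisher_exact_greater([[4, 0], [0, 4]])
print("{0:.4f}".format(p_example))  # 0.0180
print("{0:.4f}".format(p_classic))  # 0.0143
```

The second value is 1/70, the same probability derived in Exercise 1 for selecting all 4 cups correctly by chance.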


<hr style="border-top: 1px solid #001a79;" />

## Exercise 3

### Take the code from the Examples section of the scipy stats documentation for independent samples t-tests, add it to your own notebook and explain how it works using Markdown cells and code comments. Improve it in any way you think it could be improved.

In [37]:
import numpy as np
from scipy import stats

# Define the significance threshold. The null hypothesis will be rejected for p-values smaller than the threshold
threshold = 0.05

# Seed the random number generator so that the results are reproducible
np.random.seed(398317)
# Generate two sets of normally distributed random data (200 values each, standard deviation 10)
# with different means: one 100 and the other 90
data1 = np.random.normal(100, 10, 200)
data2 = np.random.normal(90, 10, 200)

# Perform the independent samples t-test to test the null hypothesis that the two population means are equal
t_statistic, p_value = stats.ttest_ind(data1, data2)

# Print the t-statistic and p-value
print("t-statistic: {0:.4}".format(t_statistic))
print("p-value: {0:.4}".format(p_value))

# Interpret the results
if p_value < threshold:
    print("\n The null hypothesis can be rejected, and it can be concluded that the means of the two samples are significantly different.")
else:
    print("\n The null hypothesis cannot be rejected, and it cannot be concluded that the means of the two samples are significantly different.")


t-statistic: 10.48
p-value: 7.204e-23

 The null hypothesis can be rejected, and it can be concluded that the means of the two samples are significantly different.
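
By default `stats.ttest_ind` assumes equal population variances (`equal_var=True`) and uses a pooled variance estimate. The sketch below shows how that t-statistic is built, using only the standard library; `pooled_t_statistic` is a hypothetical helper, and the two short lists are made-up illustration data rather than the samples generated above.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Student's t statistic and degrees of freedom for two independent samples,
    assuming equal population variances."""
    n1, n2 = len(a), len(b)
    # Sample variances (n - 1 in the denominator)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Weighted average of the two variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, n1 + n2 - 2

t, df = pooled_t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print(t, df)  # t is approximately -1.0, with 8 degrees of freedom
```

The larger the difference between the sample means relative to the standard error, the larger the magnitude of the t-statistic and the smaller the resulting p-value.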
