# Applied Statistics Winter 2024 Tasks

**by Nur Bujang**

tasks.ipynb
***

# Task 1: Permutations and Combinations

Suppose we alter the Lady Tasting Tea experiment to involve twelve cups of tea.
Six have the milk in first and the other six have the tea in first.
A person claims they have the special power of being able to tell whether the tea or the milk went into a cup first upon tasting it.

You agree to accept their claim if they can tell which of the six cups in your experiment had the **milk in first**.

Calculate, using Python, the probability that they select the **correct six cups**.
Here you should assume that they have no special powers in figuring it out, that they are just guessing.
Remember to show and justify your workings in code and MarkDown cells.

Suppose, now, you are willing to accept one error.
Once they select the six cups they think had the milk in first, you will give them the benefit of the doubt should they have selected at least five of the correct cups.
Calculate the probability, assuming they have no special powers, that the person **makes at most one error**.

Would you accept two errors? Explain.

## Plan

- Set null hypothesis and alternative hypothesis
- import libraries
- instantiate variables - totalcups, milkfirst, teafirst, allways
- Get all 6 correct - 1/all possibilities
    - math.comb - no repetition, no order
    - sixcorr, probsixcorr
- Get 5/6 (1 error) and 6/6 correct
    - fivecorronein, fivesixcorr, probfivesixcorr
- Get 4/6 correct (2 errors)
    - fourcorrtwoin, fourfivesixcorr, probfourfivesixcorr

## Methods and Implementation

Null Hypothesis: The lady cannot tell the difference between tea first or milk first.

Alternative Hypothesis: The lady can tell the difference between tea first or milk first.

In [1]:
# import library
import math

# instantiate variables
totalcups = 12
milkfirst = 6
teafirst = 6
sixcorr = math.comb(6,6)
sixcorr

1

I use math.comb <a href="https://docs.python.org/3/library/math.html#math.comb">(Python Software Foundation, 2024)</a> to count the ways of picking k items from n without repetition and without regard to order, as also shown by <a href="https://github.com/ianmcloughlin/2425_applied_statistics">McLoughlin (2024)</a>.
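To make "without repetition and without order" concrete, the sketch below enumerates every possible selection with `itertools.combinations` and checks that the count agrees with `math.comb`. The cup labels 0 to 11, and the choice of cups 0 to 5 as the milk-first cups, are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical labelling: cups 0-11, with cups 0-5 assumed to be milk-first.
cups = range(12)
true_milk_first = set(range(6))

# Every unordered selection of 6 distinct cups out of 12.
selections = list(combinations(cups, 6))

# Tally the selections by how many truly milk-first cups they contain.
overlap_counts = Counter(
    len(true_milk_first.intersection(selection)) for selection in selections
)

print(len(selections), math.comb(12, 6))
print(sorted(overlap_counts.items()))
```

Brute-force enumeration is only practical for small experiments like this one, but it gives an independent check on the counts obtained from `math.comb`.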

In [2]:
allways = math.comb(totalcups, milkfirst) # where k are picked from n
allways

# there are 924 possibilities

924

In [3]:
# The probability that she randomly selects the correct 6 cups
probsixcorr = sixcorr / allways
probsixcorr

0.0010822510822510823

For guessing 6/6 cups correctly, the probability is 0.001 or 0.1%. 

In [4]:
# Ways that she randomly selects 5 correct cups, 1 error
fivecorronein = math.comb(6, 5) * math.comb(6, 1)
fivecorronein

36

In [5]:
# Ways that she makes 1 error + no error
fivesixcorr = fivecorronein + sixcorr
fivesixcorr


37

In [6]:
# The probability that she makes 1 error + no error
probfivesixcorr = fivesixcorr / allways
probfivesixcorr 

0.04004329004329004

The probability of making at most 1 error by guessing alone is about 0.04. Assuming a significance level <a href="https://en.wikipedia.org/wiki/Statistical_significance">(Wikipedia Contributors, 2019)</a> of alpha=0.05, p < alpha, so we can reject the null hypothesis if the lady selects at least five of the correct cups.

In [7]:
# Ways that she randomly selects 4 correct cups, 2 errors
fourcorrtwoin = math.comb(6, 4) * math.comb(6, 2)
# Ways that she makes 2 errors + 1 error + no error
fourfivesixcorr = fourcorrtwoin + fivesixcorr

# The probability that she makes 2 errors + 1 error + no error
probfourfivesixcorr = fourfivesixcorr / allways
probfourfivesixcorr

0.28354978354978355

For at most 2 errors, the probability is about 0.28. At a significance level of 0.05, p > alpha, so we cannot reject the null hypothesis, and two errors should not be accepted.
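The three cases above follow one pattern: to make exactly e errors, the guesser picks 6 - e of the milk-first cups and e of the tea-first cups. A minimal sketch that wraps this in a single function is shown below; the function name `prob_at_most_errors` and its parameters are hypothetical and do not appear in the cells above.

```python
import math


def prob_at_most_errors(max_errors, milk_first=6, tea_first=6):
    """Probability that a pure guesser makes at most `max_errors` wrong picks."""
    picks = milk_first  # the guesser selects as many cups as are milk-first
    total = math.comb(milk_first + tea_first, picks)
    # Sum the ways to pick (picks - e) correct cups and e incorrect cups.
    favourable = sum(
        math.comb(milk_first, picks - e) * math.comb(tea_first, e)
        for e in range(max_errors + 1)
    )
    return favourable / total


for errors in range(3):
    print(errors, round(prob_at_most_errors(errors), 4))
```

With the default arguments this gives roughly 0.001, 0.040 and 0.284 for zero, one and two permitted errors, matching the cell outputs above.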

## Conclusion

The probability of guessing all 6 milk first cups correctly is about 0.001, and the probability of making at most 1 error is about 0.04. At significance level alpha=0.05, p < alpha in both cases, so we can reject the null hypothesis if the lady guesses at least 5 of the milk first cups correctly.

If the lady is allowed up to 2 errors, the probability of passing by guessing alone is about 0.28. Because p > alpha, we cannot reject the null hypothesis. Therefore, with two errors we cannot accept the claim that she has special powers to differentiate between the milk first and tea first cups.

## References

McLoughlin, I. (2024). GitHub - ianmcloughlin/2425_applied_statistics. [online] GitHub. Available at: https://github.com/ianmcloughlin/2425_applied_statistics [Accessed 21 Oct. 2024].

Python Software Foundation (2024). math — Mathematical functions. [online] Python documentation. Available at: https://docs.python.org/3/library/math.html#math.comb [Accessed 21 Oct. 2024].

Wikipedia Contributors (2019). Statistical significance. [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Statistical_significance [Accessed 21 Oct. 2024].

***

# Task 2: numpy's Normal Distribution

In this task you will assess whether `numpy.random.normal()` properly generates normal values.
To begin, generate a sample of one hundred thousand values using the function with mean `10.0` and standard deviation `3.0`.

Use the `scipy.stats.shapiro()` function to test whether your sample came from a normal distribution.
Explain the results and output.

Plot a histogram of your values and plot the corresponding normal distribution probability density function on top of it.

## Plan

- import libraries - numpy, scipy.stats shapiro, normal, matplotlib
- instantiate variables - samplesize, mean, stddev, randsample
- generate 100000 values with numpy.random.normal <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html">(NumPy Developer, n.d.)</a>
- test for normality with scipy.stats.shapiro <a href="https://scipy.github.io/devdocs/reference/generated/scipy.stats.shapiro.html">(The SciPy community, 2024)</a>
- plot a histogram with matplotlib <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html">(Hunter et al., n.d.)</a>
    - on top, plot probability density function  - numpy.linspace <a href="https://numpy.org/doc/stable/reference/generated/numpy.linspace.html">(NumPy Developers, n.d.)</a> as also shown by <a href="https://github.com/ianmcloughlin/2425_applied_statistics">McLoughlin (2024)</a>

## Methods and Implementation



In [11]:
# import libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# instantiate variables
samplesize = 100000
mean = 10.0
stddev = 3.0

# generate 100000 values with numpy.random.normal
randsample = np.random.normal(loc=mean, scale=stddev, size=samplesize) # where loc = mean, scale = standard deviation, size = sample size
randsample # an array of sample data

array([13.69058431,  8.61683123, 10.96930386, ...,  9.61572866,
        7.50236049,  8.71457443])

In [12]:
# test for normality with Shapiro-Wilk test (The SciPy community, 2024)
shapiro_test = stats.shapiro(randsample)
shapiro_test, shapiro_test.statistic, shapiro_test.pvalue

# warning: p-value may not be accurate for N > 5000.




(ShapiroResult(statistic=0.9999794960021973, pvalue=0.9044300317764282),
 0.9999794960021973,
 0.9044300317764282)
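The remaining step in the plan, a histogram with the normal probability density function drawn on top, could be sketched as follows. This is one possible approach rather than a finished implementation: the bin count of 100, the plotting range of four standard deviations either side of the mean, and the use of `scipy.stats.norm.pdf` for the curve are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mean = 10.0
stddev = 3.0
samplesize = 100000
randsample = np.random.normal(loc=mean, scale=stddev, size=samplesize)

fig, ax = plt.subplots(figsize=(8, 5))

# density=True rescales the bars so the total area is 1, like a PDF.
ax.hist(randsample, bins=100, density=True, alpha=0.6, label="sample")

# Evaluate the theoretical normal PDF on an evenly spaced grid (assumed range).
x = np.linspace(mean - 4 * stddev, mean + 4 * stddev, 500)
pdf = stats.norm.pdf(x, loc=mean, scale=stddev)
ax.plot(x, pdf, color="red", label="normal PDF")

ax.set_xlabel("value")
ax.set_ylabel("density")
ax.legend()
```

In a notebook the figure renders at the end of the cell. If the generator behaves as expected, the bars should follow the red curve closely.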

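## Conclusion

The Shapiro-Wilk test returned a statistic of about 1.000 and a p-value of about 0.90. Because p > 0.05, we cannot reject the null hypothesis that the sample came from a normal distribution. SciPy warns that the p-value may not be accurate for N > 5000, so this result should be read with caution for a sample of 100000 values.
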

## References

McLoughlin, I. (2024). GitHub - ianmcloughlin/2425_applied_statistics. [online] GitHub. Available at: https://github.com/ianmcloughlin/2425_applied_statistics [Accessed 22 Oct. 2024].

NumPy Developer (n.d.). numpy.random.normal — NumPy v1.21 Manual. [online] numpy.org. Available at: https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html [Accessed 22 Oct. 2024].

NumPy Developers (n.d.). numpy.linspace — NumPy v1.23 Manual. [online] numpy.org. Available at: https://numpy.org/doc/stable/reference/generated/numpy.linspace.html [Accessed 22 Oct. 2024].

The SciPy community (2024). shapiro — SciPy v1.15.0.dev Manual. [online] Github.io. Available at: https://scipy.github.io/devdocs/reference/generated/scipy.stats.shapiro.html [Accessed 22 Oct. 2024].


***

# Task 3: t-Test Calculation

Consider the following dataset containing resting heart rates for patients before and after embarking on a two-week exercise program.

| Patient ID |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |
|:-----------|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Before     | 63 | 68 | 70 | 64 | 74 | 67 | 70 | 57 | 66 | 65 |
| After      | 64 | 64 | 68 | 64 | 73 | 70 | 72 | 54 | 61 | 63 |

Calculate the t-statistic based on this data set, using Python.
Compare it to the value given by `scipy.stats`.
Explain your work and list any sources used.

## Plan

- Set null hypothesis and alternative hypothesis
- import libraries - scipy.stats <a href="https://docs.scipy.org/doc/scipy/reference/stats.html">(The SciPy community, 2019)</a> as also shown by <a href="https://github.com/ianmcloughlin/2425_applied_statistics">McLoughlin (2024)</a>
- instantiate variables - before, after
- calculate t-statistic manually with Python
- calculate t-statistic with scipy.stats
- compare 


## Methods and Implementation
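
One possible way to carry out the plan is sketched below. It assumes a paired (dependent-samples) t-test is the intended test, since the same ten patients are measured before and after the program. The manual calculation divides the mean of the paired differences by its standard error, and `scipy.stats.ttest_rel` is used for comparison. The variable names `diffs`, `mean_diff`, `sd_diff` and `t_manual` are hypothetical.

```python
import math
import statistics

from scipy import stats

before = [63, 68, 70, 64, 74, 67, 70, 57, 66, 65]
after = [64, 64, 68, 64, 73, 70, 72, 54, 61, 63]

# Paired differences, one per patient.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)

# t = mean difference / standard error of the mean difference
t_manual = mean_diff / (sd_diff / math.sqrt(n))

# The same test via SciPy (paired samples assumed).
result = stats.ttest_rel(before, after)

print(t_manual, result.statistic, result.pvalue)
```

The manual value and the SciPy statistic should agree to within floating-point error, at roughly 1.337 for this data.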

## Conclusion



## References



***

# Task 4: ANOVA

In this test we will estimate the probability of committing a type II error in specific circumstances.
To begin, create a variable called `no_type_ii` and set it to `0`.

Now use a loop to perform the following test 10,000 times.

1. Use `numpy.random.normal` to generate three samples with 100 values each. Give each a standard deviation of `0.1`. Give the first sample a mean of `4.9`, the second a mean of `5.0`, and the third a mean of `5.1`. 

2. Perform one-way anova on the three samples and add `1` to `no_type_ii` whenever a type II error occurs.

Summarize and explain your results.

## Plan

- import libraries - numpy.random.normal <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html">(NumPy Developer, n.d.)</a> as also shown by <a href="https://github.com/ianmcloughlin/2425_applied_statistics">McLoughlin (2024)</a>
- instantiate variables - no_type_ii=0, numtest=10000, splsize=100, stddev=0.1, splone, spltwo, splthree, maybe put mean in a list means = [4.9, 5.0, 5.1], maybe splonemean, spltwomean, splthreemean
- for loop - <a href="https://stackoverflow.com/questions/71625642/python-range-and-for-loop-understanding">(Stack Exchange Inc, 2022)</a>, <a href="https://pynative.com/python-range-function/">(Hule, 2019)</a>
- one-way anova - <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html">(The SciPy community, 2014)</a>

## Methods and Implementation
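
A minimal sketch of the simulation described in the plan follows. The three population means genuinely differ, so the null hypothesis of equal means is false, and a type II error is counted whenever the test fails to reject it. The significance level `alpha = 0.05` is an assumption, since the task does not state one.

```python
import numpy as np
from scipy import stats

no_type_ii = 0
numtest = 10000
splsize = 100
stddev = 0.1
means = [4.9, 5.0, 5.1]
alpha = 0.05  # assumed significance level, not specified in the task

for _ in range(numtest):
    # Three independent samples, one per population mean.
    samples = [np.random.normal(loc=m, scale=stddev, size=splsize) for m in means]

    # One-way ANOVA across the three samples.
    _, pvalue = stats.f_oneway(*samples)

    # Failing to reject a false null hypothesis is a type II error.
    if pvalue >= alpha:
        no_type_ii += 1

print(no_type_ii, no_type_ii / numtest)
```

Because the means are a full standard deviation apart and each sample has 100 values, the test has very high power, so the count of type II errors should be at or very near zero.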

## Conclusion



## References



***

## End of tasks.ipynb