# Tutorial Assignment

- You can add new cells if you need (with the "+" button above); but, deleting cells could very likely cause your notebook to fail MarkUs autotesting (and you'd have to start over and re-enter your answers into a completely fresh version of the notebook to get things to work again...)

> Your mark will be based on the TA checking ***MarkUs*** autotests and then manually reviewing your answers to `Q1` and `Q8`

> - The following questions "automatically fail" during automated testing so that MarkUs exposes example answers for student review and consideration for these problems.  These "failed MarkUs tests" are not counted against the student: `...`



## Fisher's Tea Experiment

A most beloved piece of [statistical lore](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2012.00620.x) about the (most famous) statistician Ronald Fisher involves cups of tea with milk. Fisher and his friend and colleague, Dr. Muriel Bristol, worked at Cambridge in the 1920s and regularly had tea together. During one of their afternoon tea times, Bristol refused a cup of tea from Fisher because he put the milk in first, BEFORE pouring in the tea. Bristol said she could taste the difference, and much preferred the taste of tea when the milk was poured in after the tea. Fisher didn't think that there could be a difference and proposed a hypothesis test to examine the situation.

Fisher made 8 cups of tea, 4 with milk added in first and 4 with tea added in first, and gave them to Dr. Bristol without her seeing how they were made; for each cup, she would say whether she thought the tea or the milk was poured first.  As it turned out, Bristol correctly identified whether the tea or milk was poured first for all 8 of the cups. Fisher, being a skeptical statistician, wanted to test if this could be happening by chance with Bristol just randomly guessing (or whether there was evidence against an assumption of Bristol just randomly guessing).

Suppose you run an experiment like this with students in STA130. You get a random sample of 80 STA130 students to each taste one cup of tea and tell you whether they think the milk or tea was poured first. Suppose 49 students are able to correctly state which was poured first. Provide a statistical analysis of this experiment as guided by the following set of questions.

### Q0: Indicate what the population, parameter $p$, sample, and test statistic $\hat p$ under consideration here are; and, indicate what the observed value of the test statistic is.

> - Hint: putting a "hat" on a parameter symbol, such as $\hat p$, is the preferred statistical notation to refer to a test statistic that could estimate a parameter $p$; and, assigning a value such as $\hat p=0.33$ indicates the observed value of the test statistic for the sample in question.

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...

- Population:
- Parameter $p$:
- Sample:
- Test Statistic $\hat p$:
- Observed Test Statistic $\hat p = \cdots$: 

### Q1: Ignoring the question of sample size, what is the difference between the experiment with STA130 students and the original context of the experiment with Fisher and Bristol?

> - Hint 1: the parameter is more personalized in the original experiment; whereas, the parameter in the context of STA130 students is more abstract
> - Hint 2: this could be discussed from the perspective of differences in what the population is; or, what the sample is comprised of; or, what the parameter means, etc.

- This question will be manually graded by TAs. They are looking for a well-written and reasonably sensible answer as opposed to a specific answer.

> Answer here...



### Q2: For the experiment with STA130 students, state a formal *null hypothesis* $H_0$ in terms of parameter $p$, give a written statement specifying the claim of the *null hypothesis* in informal everyday language, and provide an *alternative hypothesis* $H_1$ in terms of $H_0$.

> - Hint: Since the population, parameter, sample, test statistic, and observed test statistic are clear from the previous questions, don't worry about specifying those explicitly again here; however, it's not bad practice to include these details in conjunction with a *null hypothesis* statement so the experiment and context are always clear.

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...



### Q3: Examine the analysis below and describe what each single dot in the plot represents.

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...



In [5]:
import numpy as np
import pandas as pd
import plotly.express as px

np.random.seed(201)
reps = 100
num_observations = 80
truth = np.array(["Milk first", "Tea first"]*int(num_observations/2))
observed_test_stat = 49/num_observations
print(observed_test_stat)
sim_results = np.array([0.]*reps)

for i in range(reps):
    sim = np.random.choice(["Milk first", "Tea first"], size=num_observations)
    sim_results[i] = (sim == truth).sum() / num_observations
    
# Code below to output a dotplot
encountered_so_far = {}
sim_results_height = []
for entry in sim_results:
    if entry not in encountered_so_far:
        encountered_so_far[entry] = 1
    else:
        encountered_so_far[entry] += 1
    sim_results_height.append(encountered_so_far[entry])

# Each simulated value is the proportion of the 80 random guesses that match the truth
dataframe = pd.DataFrame({'Proportion Correct': sim_results, 'Number': sim_results_height, 'size': 1})
fig = px.scatter(dataframe, x='Proportion Correct', y='Number', size='size', size_max= 15)
fig.add_vline(x=observed_test_stat, line_dash="dash", line_color="black",
              annotation_text="observed_test_stat (0.6125)")
fig.add_vline(x=1-observed_test_stat, line_dash="dash", line_color="black",
              annotation_text="(1-0.6125)")
fig.update_layout(xaxis_title="Proportion answering correctly assuming no ability to distinguish (i.e., just guessing)")
fig.update_yaxes(visible=False)
fig.show()

0.6125


### Q4: Describe what the distribution given by all the dots in the plot above represents.

> - Hint: you should be discussing the "simulation" of a "sampling distribution" for a specific "test statistic" under the assumptions of a *null hypothesis*...

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...



### Q5: Report the simulated "two-sided" p-value found in the analysis above

> - Hint 1: A "two-sided" p-value includes simulated test statistics that are "as or more extreme" (further away) than the observed test statistic relative to the parameter value in the *null hypothesis* in both a "larger" or "smaller" sense.
> - Hint 2: These simulated p-values are not necessarily particularly precise since they are based on a quite small (and hence a quite roughly approximated) simulation of the sampling distribution; nonetheless, treat the observed p-value resolution as sufficiently refined and accurate for the purposes of the question
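
The counting described in Hint 1 can be sketched on a small made-up example. The counts below are hypothetical (10 simulated experiments of 20 guesses each) and are not the values from the simulation above; the names `toy_sim_counts` and `toy_observed_count` are likewise only illustrative. Working with counts of correct answers rather than proportions keeps the "as or more extreme" comparison exact.

```python
import numpy as np

# Hypothetical toy data (NOT the simulation above): number correct out of 20 guesses
toy_n = 20
toy_sim_counts = np.array([8, 9, 10, 10, 11, 11, 12, 7, 13, 10])
toy_observed_count = 12       # hypothetical observed number correct
null_count = toy_n * 0.5      # expected number correct under H0: p = 0.5

# Distance from the null value for each simulated statistic and for the observed one
sim_distance = np.abs(toy_sim_counts - null_count)
obs_distance = abs(toy_observed_count - null_count)

# "As or more extreme" in EITHER direction (larger or smaller)
as_or_more_extreme = sim_distance >= obs_distance
toy_two_sided_p = as_or_more_extreme.sum() / len(toy_sim_counts)
print(as_or_more_extreme.sum(), "of", len(toy_sim_counts), "simulated statistics")
```

Here the simulated counts 8, 12, 7, and 13 are at least as far from 10 as the observed count of 12, so the toy two-sided p-value is a ratio of integers, 4/10.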

In [42]:
# Your answer will be autotested with MarkUs
Q5 = None # Your answer should be a ratio of integers
# E.g., Q5 = 1/100

### Q6: Report the simulated "one-sided" p-value found in the analysis above

> - Hint 1: A "one-sided" p-value only includes simulated test statistics that are "as or more extreme" than the observed test statistic relative to the parameter value in the *null hypothesis* in the same direction as the observed test statistic 
> - Hint 2: These simulated p-values are not necessarily particularly precise since they are based on a quite small (and hence a quite roughly approximated) simulation of the sampling distribution; nonetheless, treat the observed p-value resolution as sufficiently refined and accurate for the purposes of the question
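
As a minimal sketch of the one-sided counting in Hint 1, the example below uses hypothetical counts (10 simulated experiments of 20 guesses each, not the simulation above). The names `toy_sim_counts` and `toy_observed_count` are illustrative only.

```python
import numpy as np

# Hypothetical toy data (NOT the simulation above): number correct out of 20 guesses
toy_sim_counts = np.array([8, 9, 10, 10, 11, 11, 12, 7, 13, 10])
toy_observed_count = 12   # hypothetical observed number correct (above the null value of 10)

# The observed statistic is ABOVE the null value, so "as or more extreme
# in the same direction" means simulated statistics at least as LARGE
same_direction_extreme = toy_sim_counts >= toy_observed_count
toy_one_sided_p = same_direction_extreme.sum() / len(toy_sim_counts)
print(same_direction_extreme.sum(), "of", len(toy_sim_counts), "simulated statistics")
```

Only the simulated counts 12 and 13 qualify, so the toy one-sided p-value is 2/10; the counts 7 and 8 are far from the null value but in the opposite direction, so they are not included.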

In [49]:
# Your answer will be autotested with MarkUs
Q6 = None # Your answer should be a ratio of integers
# E.g., Q6 = 1/100

<p align="center">
  <img src="https://www.jcpcarchives.org/userfiles/values-of-p-Inference.jpg" />
</p>

### Q7: Using the table above, state the strength of evidence against the *null hypothesis* for both the "one-sided" and "two-sided" p-values calculated above

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...

- For the null hypothesis that... and a "one-sided" p-value of... simulated based on... we have XYZ evidence against...
- For the null hypothesis that... and a "two-sided" p-value of... simulated based on... we have XYZ evidence against...


### Q8: Explain the difference between $p$, $\hat p$, and p-value

- This question will be manually graded by TAs.

> Answer here...



### Q9: Change the `reps` value in the code for Q3 to `1000`, `10000`, and `100000` and compare the results to the following two figures

- Compare your response to the answer given in the ***MarkUs*** output

#### Comment on the emerging similarities between the figure in Q3 and the two figures below

> Answer here...



In [6]:
from scipy import stats
x = np.arange(0, 81)  # possible numbers of correct answers: 0, 1, ..., 80
prob = stats.binom(n=80, p=0.5).pmf(x)
fig = px.bar(pd.DataFrame({'x':x/80,'probability':prob}), x='x', y='probability',
             title='Theoretically Exact (Binomial) Sampling Distribution of p-hat assuming "H0: Random Guessing"')
fig.add_vline(x=observed_test_stat, line_dash="dash", line_color="black",
              annotation_text="observed_test_stat (0.6125)")

In [7]:
x = np.linspace(0,1,200)
observed_sample = np.array([0]*31+[1]*49)
n = len(observed_sample)
dens = stats.t(loc=0.5, df=n-1, 
               scale=np.std(observed_sample, ddof=1)/n**0.5)
# Another possible approximation (on the proportion scale used here) could be based on
# `dens=stats.norm(loc=0.5, scale=0.5/80**0.5)`
fig = px.line(pd.DataFrame({'x':x, 'density':dens.pdf(x)}), x='x', y='density',
             title='A Continuous Approximation to the Theoretical (Binomial) Sampling Distribution (under H0) above')
fig.add_vline(x=observed_test_stat, line_dash="dash", line_color="black",
              annotation_text="observed_test_stat (0.6125)")

> The Binomial sampling distribution of $\hat p$ given above is theoretically derived (and indeed exactly correct) based on only\* assuming 
> 
> $$H_0: p=0.5 \text{ (Random Guessing on $n=80$ attempts)}$$
> 
> > \**and, actually, that the guessed answers are independent of each other; so, answers don't change based on previous answers or affect future answers...*
> 
> Thus, while this entails an assumption of the parameter $p=0.5$, there are actually no distributional assumptions made about the data itself (and the independence assumption, noted above, is also not a distributional assumption). Because there are no distributional assumptions made about the data, a p-value calculated based on the theoretical binomial distribution is a (theoretical) **nonparametric** p-value (despite the assumption on the parameter $p=0.5$).  
> 
> The continuous approximation to the theoretical Binomial sampling distribution (under $H_0$) given above actually does entail an assumption about the distribution of the data; namely, that each individual observation is an independent sample from *the same normally distributed population*. The "less accurate" this assumption is, the worse the approximation will be.  Regardless, based on this assumption, the calculations used above are executed and produce the continuous approximation seen above.  And since there are indeed distributional assumptions made about the data in this case, a p-value calculated based on the continuous approximation of the theoretical binomial distribution is an (approximate, theoretical) **parametric** p-value.  Of course, if the assumption that each data point was an independent sample from the same normally distributed population were true, then this wouldn't be an approximate p-value at all: it would be a theoretical **parametric** p-value. 
> 
> The theoretical nature of the two approaches distinguishes them from the simulation-based p-value initially calculated using the dot plot. That p-value was based on a simulated (sampling-based) approximation of the true (theoretical Binomial) sampling distribution, so it is a (simulated approximate) **nonparametric** p-value.  Thus, we've seen the following: the binomial distribution based p-value is a theoretically exact **nonparametric** p-value, for which we have both a (theoretical) **parametric** approximation and a simulation-based (**nonparametric**) approximation.

### Q10: Based on the p-values below, does the normality assumption of the t-test appear reasonable?

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...



In [18]:
# Theoretically Exact (Binomial) Sampling Distribution "two sided" p-value
(1-stats.binom(n=80, p=0.5).cdf(49-1))*2
# This calculates "as or more extreme" as the sum of all the probabilities (bin heights) that are located at 
# 49/80, 50/80, 51/80, ..., up to 80/80 (and then multiplies this sum by two since this distribution is symmetric)

0.05666442634512103
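
To see why `49-1` appears in the calculation above, the following sketch computes the same upper tail probability three ways: by summing the bar heights at 49, 50, ..., 80 directly, by `1 - cdf(48)`, and by the survival function `sf(48)`. The variable names are illustrative and not part of the notebook.

```python
import numpy as np
from scipy import stats

null_dist = stats.binom(n=80, p=0.5)   # number correct under "H0: Random Guessing"
k = np.arange(49, 81)                  # 49, 50, ..., 80 correct answers

tail_by_summing = null_dist.pmf(k).sum()   # P(X >= 49) as a sum of bar heights
tail_by_cdf = 1 - null_dist.cdf(49 - 1)    # 1 - P(X <= 48), as in the cell above
tail_by_sf = null_dist.sf(49 - 1)          # survival function: P(X > 48)

print(tail_by_summing, tail_by_cdf, tail_by_sf)
print(2 * tail_by_cdf)                     # two-sided p-value, using symmetry at p = 0.5
```

Because the distribution is discrete, `cdf(49)` would include the bar at 49 and so `1 - cdf(49)` would leave it out of the tail; using `cdf(48)` keeps the "as extreme" value 49 in the p-value.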

In [19]:
# Area under the curve to the right of 49/80 for the
# continuous approximation to the binomial distribution
(1-dens.cdf(49/80))*2
# This calculates "as or more extreme" as the area under the curve 
# to the right of 49/80 (and multiplies this area by two since this distribution is symmetric)

0.043435270855124886

In [20]:
# which is what the t-test computes on the basis of the 
# continuous approximation to the binomial distribution
stats.ttest_1samp(observed_sample, .5) 

TtestResult(statistic=2.0524716550217077, pvalue=0.04343527085512493, df=79)
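
The t-test result above can be reproduced by hand, which may help connect it to the `dens` object defined earlier. The sketch below assumes the standard one-sample t statistic, $(\hat p - 0.5)/(s/\sqrt{n})$ with $s$ the sample standard deviation (`ddof=1`); the variable names are illustrative.

```python
import numpy as np
from scipy import stats

observed_sample = np.array([0]*31 + [1]*49)
n = len(observed_sample)

p_hat = observed_sample.mean()             # sample proportion (a sample average of 0/1 values)
s = observed_sample.std(ddof=1)            # sample standard deviation
standard_error = s / np.sqrt(n)            # same `scale` used for `dens` earlier

t_stat = (p_hat - 0.5) / standard_error    # standardized distance from the H0 value
p_two_sided = 2 * stats.t(df=n - 1).sf(abs(t_stat))

print(t_stat, p_two_sided)
print(stats.ttest_1samp(observed_sample, 0.5))
```

Both printed lines should agree with the `TtestResult` shown above, since shifting and scaling a t distribution by `loc=0.5` and `scale=standard_error` is equivalent to standardizing the observed statistic.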

In [21]:
# though we might alternatively consider the following "continuity correction" 
# that's more analogous to the way `49-1` is used for the theoretical p-value
(1-dens.cdf((49-0.5)/80))*2

0.05614082274072851

### Q11: How would you make your simulated p-value closer to the true theoretical p-value given by the binomial distribution and what is the downside of doing so? 

> - Hint: The simulated sampling distribution examined in `Q3` and `Q4` is a simulation of the true theoretical binomial distribution in `Q9`.  This simulated sampling distribution would "converge" to the true theoretical binomial distribution if we kept running the simulation and adding simulated test statistics to the dot plot by continuing to increase the number of `reps` used. The continuous approximation in `Q9` is not quite right because it has continuous rather than discrete values; so, e.g., it could produce a proportion correct of 48.5/80 (which is not actually possible in the real experiment); but, the continuous approximation is still a pretty good approximation as judged by the similarity in the p-value calculation and the general shape of the two distributions shown in `Q9` (with the main difference of course just being that one is continuous and the other is discrete). 
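
The convergence mentioned in the hint can be sketched as follows. This is not the notebook's simulation: as a shortcut it draws the number of correct guesses directly with `rng.binomial` (equivalent to counting matches among 80 independent random guesses), and it uses an arbitrary seed, so its `reps = 100` value will not match the dot plot in `Q3`.

```python
import numpy as np
from scipy import stats

n, observed_count = 80, 49
null_count = n * 0.5

# Theoretically exact two-sided p-value from the binomial distribution
exact_p = 2 * stats.binom(n=n, p=0.5).sf(observed_count - 1)

rng = np.random.default_rng(130)   # arbitrary seed, chosen only for reproducibility
sim_p_values = {}
for reps in [100, 1000, 10000, 100000]:
    # Each draw is the number correct in one simulated experiment of 80 random guesses
    sim_counts = rng.binomial(n=n, p=0.5, size=reps)
    extreme = np.abs(sim_counts - null_count) >= abs(observed_count - null_count)
    sim_p_values[reps] = extreme.mean()
    print(reps, sim_p_values[reps], abs(sim_p_values[reps] - exact_p))
```

The last printed column is the gap between each simulated p-value and the exact one, which can be compared across the different values of `reps`.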

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...



### Q12: Comment on the appropriateness of the normality assumption of the t-test again based on the figures below.

- Compare your response to the answer given in the ***MarkUs*** output

> Answer here...


In [None]:
observed_sample = np.array([0]*31+[1]*49)
observed_sample

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [None]:
# If the assumption of the null hypothesis that random guessing $p=0.5$ is correct, 
# then each of the 80 observed answers above is a sample from the following distribution
x=[0,1]; prob=[0.5,0.5]
px.bar(pd.DataFrame({'x':x,'probability':prob}), x='x', y='probability',
       title="Distribution of Wrong/Right (0/1) Under 'H0: Random Guessing'")

In [None]:
# The mean of the "single random guess" distribution above is 0.5
# Here's an overlayed normal approximation of this distribution 
import plotly.graph_objects as go
fig = px.bar(pd.DataFrame({'x':x,'PMF/PDF':prob}), x='x', y='PMF/PDF',
       title="Normal Distribution Approximation")
x_ = np.linspace(-1.5,1.5,200)
fig.add_trace(go.Scatter(x=x_, y=stats.norm(0.5,0.5).pdf(x_), mode='lines', name='Normal Density'))
# So rather than getting 0/1 (incorrect/correct) data points 
# a t-test assumes the data points are numeric values sampled from the overlayed normal distribution
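
The overlay above uses `stats.norm(0.5, 0.5)`. As a brief check of where those two numbers come from, the sketch below computes the mean and standard deviation of the 0/1 "single random guess" distribution under $H_0$ directly from its probabilities (variable names are illustrative).

```python
import numpy as np

x = np.array([0, 1])          # wrong / right
prob = np.array([0.5, 0.5])   # probabilities under "H0: Random Guessing"

mean = (x * prob).sum()                    # expected value of a single guess
variance = ((x - mean) ** 2 * prob).sum()  # average squared distance from the mean
sd = np.sqrt(variance)

print(mean, sd)   # the `loc` and `scale` of the overlayed normal density
```

So the normal curve matches the 0/1 distribution in its mean and standard deviation, while the figure shows how different the two shapes otherwise are.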

> The analyses in this assignment are based on constructing (through simulation approximation, theoretical derivation, or continuous approximation) the sampling distribution of the sample proportion (average) under the assumption of the null hypothesis that $p=0.5$. The sampling distribution of a sample average (which a sample proportion is...) can also be (differently) theoretically derived with an additional assumption that each data point is an independent sample from a normal population. Indeed, this was how the continuous approximation to the Binomial sampling distribution was derived; and, this kind of analysis is called a t-test. The t-test is a parametric test since it requires an assumption of data normality; and, it's a theoretical test because once this assumption is made, the sampling distribution of the sample average is theoretically derived.
> 
> The p-value for the random guessing binomial distribution is theoretically derived based only on the assumption of "random guessing"; but, this is not a **parametric** analysis because the use of a binomial distribution is not itself an assumption but rather is derived only from the assumption of "random guessing". The statement of "random guessing" is enshrined in the assumption that the parameter $p=0.5$; however, this assumption alone is still viewed as **nonparametric** since it does not entail any specifications of the distribution of the data, which is what the term **parametric** refers to (even though it seems like it should be referring to parameters...).
>
> In summary, the p-value based on the binomial distribution is a theoretical **nonparametric** (truly correct) p-value; the original p-value from `Q3/Q4` is a simulated version of this **nonparametric p-value**; the t-test p-value from `Q10` is a theoretical **parametric** p-value (whose accuracy depends on the accuracy of the normality assumption entailed in the t-test).