# Question 1

What is the key factor that makes the difference between ideas that can and cannot be examined and tested statistically?
The ability to measure or observe outcomes in an unbiased, repeatable way.

What would you describe as the key "criterion" defining a good null hypothesis?
The null hypothesis should represent the "nothing special" (no effect, no difference) case of whatever you are analyzing.

And what is the difference between a null hypothesis and an alternative hypothesis in the context of hypothesis testing? 
The null hypothesis represents the "nothing special" (no effect, no difference) case of whatever you are analyzing, whereas the alternative hypothesis is the effect or difference that the statistician is looking for evidence of.


# Question 2

Statistical tests are used to learn about characteristics of a population (called population parameters), such as the population mean, denoted $\mu$. The first step is to gather a sample, which is a set of values $x_i$ with $1 \le i \le n$, where $n$ is the number of data points. We then use these values to calculate statistics, such as the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. To set up a hypothesis test, we also need null parameters such as $\mu_0$ (the value of the population mean if the null hypothesis is true), which give the test a concrete reference point (discussed in detail in Q3). After constructing the test, we can run the calculations and use the results to draw conclusions about the population.
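A minimal sketch of this notation in Python. The sample values in `x` and the null mean `mu_0 = 50` are hypothetical, made up only to show how each symbol maps to a computation:

```python
import numpy as np

# Hypothetical sample x_1, ..., x_n (illustrative values, not real data)
x = np.array([52.1, 48.3, 55.0, 49.7, 51.2, 53.8])
n = len(x)  # number of data points

# Sample statistic: the sample mean x-bar, our estimate of the population mean mu
x_bar = x.sum() / n

# Null parameter: the population mean we would expect if H0 were true (hypothetical value)
mu_0 = 50.0

print(f"n = {n}, sample mean = {x_bar:.3f}, null mean mu_0 = {mu_0}")
# A hypothesis test then asks how surprising this gap would be if H0 were true
print(f"observed gap x_bar - mu_0 = {x_bar - mu_0:.3f}")
```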

# Question 3

Statistical tests are grounded in comparisons to a set of standard distributions. Assuming the null hypothesis is true lets us construct a specific model to test the data against. For example, the normal distribution requires a center parameter, and the null hypothesis provides this center concretely (e.g. $\mu_0$). Similarly, when constructing a confidence interval, we need a concrete null value, such as 0, to check whether it lies inside the interval. Without assuming the null hypothesis is true, the model would be underspecified and we would have nothing concrete to compare the data against.
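To make this concrete, here is a sketch assuming a normal model for the sample mean with a hypothetical null mean `mu_0 = 0`, an assumed known standard deviation `sigma = 2`, `n = 25` observations and an observed mean of `0.9` (all made-up values). The null hypothesis supplies the center of the model, which is what makes a tail probability computable at all; the same null value is what we look for inside a confidence interval:

```python
from scipy import stats

# Hypothetical setup (illustrative numbers only)
mu_0 = 0.0     # center of the model, supplied by H0
sigma = 2.0    # assumed known population standard deviation
n = 25
x_bar = 0.9    # hypothetical observed sample mean

# Standard deviation of the sample mean under the model
se = sigma / n ** 0.5

# Under H0, the sample mean ~ Normal(mu_0, se); without mu_0 this model is not defined
p_upper = stats.norm.sf(x_bar, loc=mu_0, scale=se)  # P(sample mean >= x_bar | H0)
print(f"P(sample mean >= {x_bar} | H0) = {p_upper:.4f}")

# Confidence-interval view: is the null value inside a 95% interval around x_bar?
z_crit = stats.norm.ppf(0.975)
ci = (x_bar - z_crit * se, x_bar + z_crit * se)
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}); contains mu_0? {ci[0] <= mu_0 <= ci[1]}")
```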

# Question 4

A p-value is the probability, assuming the null hypothesis is true, of obtaining a sample statistic at least as extreme as the one actually observed. As the p-value decreases, it becomes more and more unlikely (more "ridiculous", if you will) that a sample drawn under the null hypothesis would produce a statistic this extreme. At low enough p-values, it becomes absurd to believe that the data were collected from a population where the null hypothesis is true.
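The phrase "at least as extreme" matters. As a sketch, take the binomial setting from Question 5 ($n = 124$ trials, $p = 0.5$ under $H_0$, observed count 80): the probability of getting exactly 80 is smaller than the p-value, because the p-value also includes every count above 80. This uses `scipy.stats.binom`, with `pmf` for a single value and `sf` (survival function) for the upper tail:

```python
from scipy import stats

n, p0, observed = 124, 0.5, 80

# Probability of the exact observed count under H0 (NOT the p-value)
prob_exact = stats.binom.pmf(observed, n, p0)

# p-value: probability of a count at least as extreme (>= 80) under H0
# sf(k) = P(X > k), so sf(observed - 1) = P(X >= observed)
p_value = stats.binom.sf(observed - 1, n, p0)

print(f"P(X = {observed})  = {prob_exact:.6f}")
print(f"P(X >= {observed}) = {p_value:.6f}")
```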

# Question 5

We can use the cumulative distribution function (CDF) of a binomial distribution with $n = 124$ and $p = 0.5$ to get the p-value: since we want the probability of 80 or more, we compute $P(X \ge 80) = 1 - P(X \le 79)$. Here is the code (written myself):

In [14]:
import scipy.stats

# P(X >= 80) = 1 - P(X <= 79), where X ~ Binomial(n=124, p=0.5) under H0
p_value = 1 - scipy.stats.binom.cdf(79, 124, .5)
print("The p-value is " + str(p_value))

The p-value is 0.0007823670130848726


The p-value is lower than all generally accepted significance (alpha) levels, so there is strong evidence against the null hypothesis. In other words, we have sufficient evidence to reject the null hypothesis.
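As a cross-check, a sketch that estimates the same one-sided p-value by simulation (repeating 124 trials with a 50/50 chance each, many times, assuming $H_0$) and compares it with the exact `scipy.stats.binomtest` using `alternative="greater"` (available in SciPy 1.7 and later). The seed and the number of simulations are arbitrary choices:

```python
import numpy as np
from scipy import stats

n, p0, observed = 124, 0.5, 80
rng = np.random.default_rng(130)   # arbitrary seed for reproducibility
number_of_simulations = 100_000

# Simulate the number of "successes" in 124 trials, assuming H0 (p = 0.5) is true
simulated_counts = rng.binomial(n, p0, size=number_of_simulations)

# One-sided p-value: proportion of simulated counts at least as large as 80
p_value_sim = (simulated_counts >= observed).mean()

# Exact one-sided binomial test for comparison
p_value_exact = stats.binomtest(observed, n, p0, alternative="greater").pvalue

print(f"simulated p-value: {p_value_sim:.5f}")
print(f"exact p-value:     {p_value_exact:.5f}")
```

The simulated value fluctuates with the seed, while the exact value should agree with the `1 - binom.cdf(79, 124, .5)` computation.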

# Question 6

Taking the null hypothesis to be that Fido is innocent (innocent until proven guilty), a p-value measures the strength of the evidence against his innocence; it can never confirm either hypothesis with certainty. No matter how low the p-value is, we cannot prove Fido guilty, because there is always some chance that an innocent Fido would produce evidence this extreme. Likewise, no matter how high the p-value is, we cannot prove Fido innocent: a high p-value only means the evidence is consistent with innocence, not that guilt has been ruled out. A p-value of exactly 0 would require evidence that is impossible for an innocent Fido, which real data essentially never provide, and even a p-value of 1 would not prove innocence.
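A small simulation illustrates why even a very low p-value is not proof: when the null hypothesis is true, low p-values still occur by chance. This sketch assumes a simple hypothetical test (50 fair coin flips, one-sided binomial p-value as in Question 5) and counts how often $p < 0.05$ when $H_0$ holds by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)      # arbitrary seed
n, p0 = 50, 0.5                     # hypothetical test size; H0 is TRUE in every repetition
number_of_experiments = 10_000

# Each experiment: flip a fair coin n times and record the number of heads
heads = rng.binomial(n, p0, size=number_of_experiments)

# One-sided p-value for each experiment: P(X >= heads | H0)
p_values = stats.binom.sf(heads - 1, n, p0)

# Fraction of experiments where H0 is true yet the p-value falls below 0.05
false_alarm_rate = (p_values < 0.05).mean()
print(f"smallest p-value seen: {p_values.min():.5f}")
print(f"share of p-values below 0.05 despite H0 being true: {false_alarm_rate:.3f}")
```

In the Fido analogy, each of these experiments is an innocent suspect whose evidence happens to look suspicious.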

# Question 7

I took the code from Demo II of Tutorial 5 and realized it is rather simple to switch it to a one-sided test: it is only a matter of removing the absolute value signs. More specifically, all you have to do is change the criterion from

$|\text{simulated statistic} - \text{population parameter}| \ge |\text{observed statistic} - \text{population parameter}|$

to

$\text{simulated statistic} - \text{population parameter} \ge \text{observed statistic} - \text{population parameter}$.

This restricts the count to simulated statistics that are as or more extreme in the positive direction, which makes it a one-sided test. Our hypotheses change from $H_0 : S_f - S_I = 0,\ H_A : S_f - S_I \neq 0$ to $H_0 : S_f - S_I = 0,\ H_A : S_f - S_I > 0$, where $S_I$ and $S_f$ are the initial and final health scores. Originally the p-value was approximately .068, but now it is .0565, so the p-value decreased. This makes sense: the one-sided criterion only counts simulated statistics in one tail, so fewer simulations qualify as "as or more extreme".
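Because the null model here amounts to 10 fair coin flips (each patient improves with probability 0.5), the criterion can also be checked exactly rather than by simulation. A sketch that enumerates every possible number of improvers with `scipy.stats.binom` and applies both versions of the criterion, with and without absolute values. The exact one-sided value is $56/1024 \approx 0.0547$, which should be close to the simulated .0565:

```python
import numpy as np
from scipy import stats

n = 10                 # patients
p0 = 0.5               # population parameter under H0
observed_count = 8     # 8 of 10 patients improved (observed statistic 0.8)

counts = np.arange(n + 1)                       # possible numbers of improvers: 0, 1, ..., 10
probabilities = stats.binom.pmf(counts, n, p0)  # exact null distribution

# Compare integer counts against n * p0 = 5 rather than float proportions against 0.5:
# as floats, 0.8 - 0.5 and 0.5 - 0.2 are not exactly equal, which can silently
# drop a tail from the two-sided comparison.
expected_count = n * p0

# Two-sided criterion (with absolute values)
two_sided = np.abs(counts - expected_count) >= np.abs(observed_count - expected_count)
# One-sided criterion (absolute values removed)
one_sided = counts - expected_count >= observed_count - expected_count

p_two_sided = probabilities[two_sided].sum()
p_one_sided = probabilities[one_sided].sum()
print(f"exact two-sided p-value: {p_two_sided:.4f}")
print(f"exact one-sided p-value: {p_one_sided:.4f}")
```

Because this null distribution is symmetric about 5, the two-sided criterion counts both tails equally, so its exact p-value is twice the one-sided one.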

This is the finalized code (edited by me without the help of ChatGPT):

In [9]:
import pandas as pd
import numpy as np

patient_data = pd.DataFrame({
    "PatientID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [45, 34, 29, 52, 37, 41, 33, 48, 26, 39],
    "Gender": ["M", "F", "M", "F", "M", "F", "M", "F", "M", "F"],
    "InitialHealthScore": [84, 78, 83, 81, 81, 80, 79, 85, 76, 83],
    "FinalHealthScore": [86, 86, 80, 86, 84, 86, 86, 82, 83, 84]
})

patient_data['HealthScoreChange'] = patient_data.FinalHealthScore-patient_data.InitialHealthScore

#print(pd.DataFrame({'HealthScoreChange': patient_data['HealthScoreChange'],
                    #'> 0 ?': patient_data['HealthScoreChange']>0}))

# Illustration of H0: each change is equally likely to be positive or negative
# (this table is only displayed when it is the last expression in a cell)
random_difference_sign = np.random.choice([-1, 1], size=len(patient_data))
randomized_change = random_difference_sign*patient_data['HealthScoreChange'].abs()
pd.DataFrame({'HealthScoreChange': randomized_change,
              '> 0 ?': randomized_change>0})

np.random.seed(1)  # make the simulation reproducible
number_of_simulations = 10000  # more simulations give a more precise p-value estimate
n_size = len(patient_data)  # 10 patients
population_parameter_value_under_H0 = 0.5  # under H0, each patient improves with probability 0.5
IncreaseProportionSimulations_underH0random = np.zeros(number_of_simulations)

# Generate "random improvement" proportions assuming H0 (the vaccine has no average effect) is true,
# meaning that the "before and after" differences are positive or negative at random
for i in range(number_of_simulations):

    # Each patient independently "improves" (1) or not (0) with equal chance, like a fair coin flip
    random_improvement = np.random.choice([0, 1], size=n_size, replace=True)

    # The mean of 0/1 values is the proportion of patients who improved
    IncreaseProportionSimulations_underH0random[i] = random_improvement.mean()

observed_statistic = (patient_data.HealthScoreChange > 0).mean()
simulated_statistics = IncreaseProportionSimulations_underH0random

print("Observed statistic (proportion of patients who improved): ", observed_statistic, sep="")

# Calculate the p-value: how many statistics simulated under H0
# are "as or more extreme" than the observed statistic
# (relative to the hypothesized population parameter)?
# One-sided: no absolute values, so only the positive direction counts
SimStats_as_or_more_extreme_than_ObsStat = \
    simulated_statistics - population_parameter_value_under_H0 >= \
    observed_statistic - population_parameter_value_under_H0

p_value = SimStats_as_or_more_extreme_than_ObsStat.sum() / number_of_simulations
print("Number of Simulations: ", number_of_simulations, "\n\n",
      "Number of simulated statistics (under H0)\n",
      'that are "as or more extreme" than the observed statistic: ',
      SimStats_as_or_more_extreme_than_ObsStat.sum(), "\n\n",
      'p-value\n(= simulations "as or more extreme" / total simulations): ', p_value, sep="")

Observed statistic (proportion of patients who improved): 0.8
Number of Simulations: 10000

Number of simulated statistics (under H0)
that are "as or more extreme" than the observed statistic: 565

p-value
(= simulations "as or more extreme" / total simulations): 0.0565


# Question 8 

Our experiment is similar to Fisher's original, but we are targeting a different population and asking each participant only once. In this experiment, we want to draw conclusions about whether STA130 students (the population) can tell the difference between tea with the milk poured first and tea with the tea poured first. Unlike the original, we are interested in a group of people rather than one person, so we take a sample: 80 students (presumably selected at random), whose answers we recorded.

Statistic: the proportion of sampled students who correctly identify which was poured first (denoted $\hat{p}$).

Parameter: the proportion of all STA130 students who would correctly identify which was poured first (denoted $p$).

$H_0 :$ students cannot tell the difference and are just guessing $(p = 0.5)$

$H_A :$ the null hypothesis is false $(p \neq 0.5)$

Calculation setup:
Because $n$ is large enough, we can approximate the sampling distribution of $\hat{p}$ with a normal model, with mean $p = 0.5$ from the null hypothesis and standard deviation $\sqrt{\frac{p\,q}{n}} = \sqrt{\frac{0.5 \times 0.5}{80}} \approx 0.056$, where $q = 1 - p$. Then we calculate the z-score $z = (\hat{p} - p) / \text{S.D.} = 0.1125 / 0.056 \approx 2$ (with $\hat{p} = 49/80 = 0.6125$) and use the CDF of the standard normal distribution to calculate a two-sided p-value.


In [1]:
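import scipy.stats

# Given values
p = .5          # proportion under H0
q = 1 - p
n = 80          # number of students sampled
p_hat = 49/80   # sample proportion of correct answers
# Standard deviation of p_hat under H0
s_d = ((p*q)/n)**.5
# Z-score, computed as (p - p_hat) so that it is negative and norm.cdf gives the lower-tail area
z_s = (p - p_hat)/s_d
# Because this is a two-sided test, we double the one-tail area
p_value = (scipy.stats.norm.cdf(z_s))*2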
print("The p-value is " + str(p_value))

The p-value is 0.0441713449084425


Because the p-value is between 0.01 and 0.05, we have moderate evidence against the null hypothesis. In other words, there is sufficient evidence (at the 0.05 significance level) to suggest that STA130 students can tell in which order the tea was mixed.
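An equivalent way to compute the two-sided p-value that does not depend on choosing "the negative" z-score is to double the upper-tail area beyond $|z|$. A sketch wrapped in a small helper; the function name `one_proportion_z_test` is hypothetical, not part of SciPy or the original notebook:

```python
from scipy import stats

def one_proportion_z_test(successes, n, p0=0.5):
    """Two-sided one-proportion z-test using the normal approximation.

    Hypothetical helper for illustration: returns (z, p_value).
    """
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5       # standard deviation of p_hat under H0
    z = (p_hat - p0) / se                  # positive when p_hat > p0
    p_value = 2 * stats.norm.sf(abs(z))    # both tails, regardless of the sign of z
    return z, p_value

z, p_value = one_proportion_z_test(49, 80)
print(f"z = {z:.4f}, two-sided p-value = {p_value:.4f}")
```

Using `abs(z)` with the survival function `sf` gives the same result as `2 * norm.cdf(-abs(z))`, whichever direction the sample proportion falls.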

# Question 9

Yes