# STA130 Homework 05 

# 1. 

An idea can be tested statistically only if it is quantifiable and measurable: it must be expressible through data and open to evaluation with statistical methods. A good null hypothesis is specific, testable, and falsifiable. It states clearly that there is no effect or no difference, so that the data can lead us either to reject it or to fail to reject it. The null hypothesis (H₀) is the default or baseline assumption that there is no effect or no relationship between variables. The alternative hypothesis (H₁) states that there is an effect or a relationship, challenging the null hypothesis. It is the claim the evidence favours when the null is rejected.

# 2. 

The hypothesis test uses the sample data, that is, the individual observations (𝑥𝑖) and the sample mean (𝑥¯), to draw conclusions about the population mean (𝜇). Even though the test is computed from the sample, its outcome is a statement about the population: whether it is plausible that the population mean equals 𝜇₀ (the value we hypothesized). The population mean is the unknown quantity we are trying to learn about, and we judge the hypothesized value by how close the sample mean is to it. The sample mean calculations do not give the probability that the population mean equals the hypothesized value; they show how compatible the observed data are with that hypothesized value.



# 3. 
The p-value measures how likely a result at least as extreme as our sample result would be if the null hypothesis were correct. The null hypothesis assumes there is no effect, meaning the population mean equals the hypothesized value, so any gap between the sample mean and the hypothesized mean is due to sampling variability alone. When we collect sample data, we are testing whether the sample result fits with this assumption. To do this, we ask: if the null hypothesis were true, how unusual or extreme is the sample result we got?

# 4. 
The p-value tells us the probability of getting a test statistic as extreme or more extreme than the one we observed, assuming the null hypothesis is true. A standardized test statistic measures how far the sample mean is from the hypothesized mean in units of the standard error (the standard deviation of the sampling distribution). When we assume the null hypothesis is true, we can use the sampling distribution of the test statistic to see what values we would expect to see. This distribution shows how sample means would be spread out if we took many samples from a population in which the null hypothesis holds. The distance from the hypothesized mean gives a raw measurement of how far apart the two means are, while the p-value quantifies how unlikely that distance (or an even greater distance) is, given the variability in the data and the sample size. 
<br><br>
A small p-value means that the test statistic we got is far out in a tail of the sampling distribution (either tail, for a two-sided test), meaning it is rare or unlikely under the null hypothesis. This makes the null hypothesis look ridiculous, since the probability of getting a test statistic as extreme as the one we observed is very low if the null is true. A small p-value suggests that the sample mean is far enough from the hypothesized mean that it is unlikely to have occurred by random chance under the null hypothesis, so the null hypothesis may be rejected.
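
As a minimal sketch of the ideas in this answer, the snippet below computes a raw distance, a standardized test statistic, and a simulated two-sided p-value for a mean. The sample values, the hypothesized mean `mu_0 = 50`, and the choice to mimic H₀ by shifting the sample so that its mean equals `mu_0` before resampling are all assumptions made for illustration; none of them come from the homework data.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility

# Hypothetical sample and hypothesized population mean (illustrative only)
sample = np.array([52, 55, 49, 58, 53, 51, 56, 54], dtype=float)
mu_0 = 50.0

n = sample.size
x_bar = sample.mean()

# Raw distance vs. standardized distance (in standard-error units)
raw_distance = x_bar - mu_0
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_obs = raw_distance / standard_error

# Mimic H0: shift the sample so its mean is exactly mu_0, then resample from it
shifted = sample - x_bar + mu_0
n_sims = 10_000
sim_means = rng.choice(shifted, size=(n_sims, n), replace=True).mean(axis=1)

# Two-sided p-value: share of simulated means at least as far from mu_0 as x_bar
p_value = np.mean(np.abs(sim_means - mu_0) >= abs(raw_distance))

print("sample mean:", x_bar)
print("standardized statistic:", round(t_obs, 2))
print("simulated two-sided p-value:", p_value)
```

The raw distance alone says nothing about rarity; only after comparing it with the spread of `sim_means` does it turn into a p-value.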

# 5. 
Null Hypothesis: There is no head tilt preference when kissing (50% tilt right, 50% tilt left).
<br><br>
Alternative Hypothesis: There is a head tilt preference when kissing (not 50%/50%).
<br>

Simulate the Coin Flips:

- Simulate the situation under the null hypothesis (50% chance for left or right tilt) using many trials to create a distribution of the number of couples tilting to the right.

Calculate the Test Statistic:

- The test statistic in this case will be the observed proportion of couples tilting to the right compared to the expected proportion under the null hypothesis.

Calculate the P-Value:

- The p-value will be calculated as the proportion of simulated results that are as extreme as or more extreme than the observed result.

In [7]:
import numpy as np
import plotly.figure_factory as ff
import plotly.graph_objects as go

# Parameters
population_parameter_value_under_H0 = 0.5  # Null hypothesis value (50/50 coin flip)
observed_statistic = 80 / 124  # Proportion of couples tilting right

print('The null hypothesis (H0) assumes no head tilt preference (p = 0.5)')
print('Observed statistic (tilting right proportion):', observed_statistic)

np.random.seed(1)  # Set seed for reproducibility
number_of_simulations = 10000  # Total simulations
n_size = 124  # Total number of couples

# Simulations of head tilt proportions assuming H0 is true
simulated_proportions = np.zeros(number_of_simulations)

for i in range(number_of_simulations):
    # Simulate 124 couples, 0 for left tilt, 1 for right tilt
    simulated_tilts = np.random.choice([0, 1], size=n_size, replace=True)  
    simulated_proportions[i] = simulated_tilts.mean()  # Calculate the mean (proportion)

# Calculate p-value: proportion of simulated results as extreme or more extreme than the observed statistic
p_value = np.mean(np.abs(simulated_proportions - population_parameter_value_under_H0) >= 
                  np.abs(observed_statistic - population_parameter_value_under_H0))

print('P-value:', p_value)

# Plotting the results: use the proportions simulated in this cell (n = 124)
# Jitter by half the spacing between possible proportions to smooth the discrete histogram
hist_data = [simulated_proportions + np.random.uniform(-0.5/n_size, 0.5/n_size, size=number_of_simulations)]
group_labels = ['Simulated<br>Sampling<br>Distribution<br>of the<br>Sample<br>Proportion<br><br>assuming<br>that the<br>H0 null<br>hypothesis<br>IS TRUE']
fig = ff.create_distplot(hist_data, group_labels, curve_type='normal',
                         show_hist=True, show_rug=False, bin_size=0.02)
pv_y = 10  # marker height, chosen to sit above the peak density (roughly 9 for n = 124)
pv_y_ = 1
fig.add_shape(type="line", x0=observed_statistic, y0=0, 
              x1=observed_statistic, y1=pv_y,
              line=dict(color="Green", width=4), name="Observed Statistic")
fig.add_trace(go.Scatter(x=[observed_statistic], y=[pv_y+pv_y_], 
                         text=["Observed<br>Statistic<br>^"], mode="text", showlegend=False))
# "as or more extreme" also include the "symmetric" observed statistic...
symmetric_statistic = population_parameter_value_under_H0 -\
                      abs(observed_statistic-population_parameter_value_under_H0)
fig.add_shape(type="line", x0=symmetric_statistic, y0=0, 
              x1=symmetric_statistic, y1=pv_y,
              line=dict(color="Green", width=4), name="Observed Statistic")
fig.add_trace(go.Scatter(x=[symmetric_statistic], y=[pv_y+pv_y_], 
                         text=['"Symmetric" Observed Statistic<br>addressing for "as or more extreme"<br>^'], mode="text", showlegend=False))

# Add a transparent rectangle for the lower extreme region
fig.add_shape(type="rect", x0=-0.25, y0=0, x1=symmetric_statistic, y1=pv_y,
              fillcolor="LightCoral", opacity=0.5, line_width=0)
# Add a transparent rectangle for the upper extreme region
fig.add_shape(type="rect", x0=observed_statistic, y0=0, x1=1.25, y1=pv_y,
              fillcolor="LightCoral", opacity=0.5, line_width=0)

# Update layout
fig.update_layout(
    title="Simulated Sampling Distribution<br>under H0 with p-value regions",
    xaxis_title="Proportion of Couples Tilting Right", yaxis_title="Density", yaxis=dict(range=[0, pv_y+2*pv_y_]))
fig.show() # USE `fig.show(renderer="png")` FOR ALL GitHub and MarkUs SUBMISSIONS

The null hypothesis (H0) assumes no head tilt preference (p = 0.5)
Observed statistic (tilting right proportion): 0.6451612903225806
P-value: 0.0015


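The p-value is 0.0015, which is less than the pre-set significance level (alpha) of 0.05 (0.0015 < 0.05). Therefore, we reject the null hypothesis: the data provide strong evidence that couples have a head tilt preference when kissing. Since 80 of the 124 couples (about 64.5%) tilted right, the observed preference is toward the right.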
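
The simulation loop above can also be sketched in vectorized form. This version is an illustrative alternative, not the code that produced the reported output: it assumes NumPy's `default_rng` generator and draws the number of right tilts per simulated study directly from a binomial distribution, so its p-value will differ slightly from 0.0015. It compares integer counts rather than proportions, which avoids depending on exact floating-point equality.

```python
import numpy as np

rng = np.random.default_rng(1)   # assumption: any seed works; results vary slightly
n_size = 124                     # couples observed
k_obs = 80                       # couples tilting right
n_sims = 10_000

# One binomial draw = number of right tilts among 124 couples when p = 0.5
sim_counts = rng.binomial(n_size, 0.5, size=n_sims)

# Distance from n/2, doubled so that everything stays an integer
observed_gap = abs(2 * k_obs - n_size)
sim_gaps = np.abs(2 * sim_counts - n_size)

# Two-sided p-value: share of simulated studies at least as lopsided as the data
p_value = np.mean(sim_gaps >= observed_gap)
print("Simulated two-sided p-value:", p_value)
```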

# 6. 
A smaller p-value cannot definitively prove that the null hypothesis is false. Instead, it provides stronger evidence against the null hypothesis. A small p-value suggests that the observed results are unlikely under the assumption of the null hypothesis, leading us to consider rejecting it. However, it does not provide absolute proof. A p-value alone cannot establish that Fido is innocent or guilty. While a very low p-value is evidence against Fido's innocence, it does not guarantee that he is guilty, and a high p-value does not prove that he is innocent. 

# 7.
A one-sided test is used when we want to determine if a parameter (like a mean or proportion) is either greater than or less than a certain value, but not both.
A two-sided test is used when we want to determine if a parameter is simply different from a specific value, without specifying a direction. In a one-sided test, the p-value is calculated for one tail of the distribution. In a two-sided test, the p-value is calculated for both tails.
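
To make the difference concrete, here is a small sketch that computes both p-values exactly for 8 improvements out of 10 patients under H₀: p = 0.5 (the counts match the data used in the cell below). Using the exact binomial distribution via `math.comb` in place of simulation is a choice made here for illustration.

```python
from math import comb

n, k_obs = 10, 8   # 8 of the 10 patients improved

# Under H0 (p = 0.5) every sequence of n outcomes is equally likely,
# so P(X = k) = C(n, k) / 2**n.
upper_tail = sum(comb(n, k) for k in range(k_obs, n + 1)) / 2 ** n      # X >= 8
lower_tail = sum(comb(n, k) for k in range(0, n - k_obs + 1)) / 2 ** n  # X <= 2

p_one_sided = upper_tail               # H1: p > 0.5
p_two_sided = upper_tail + lower_tail  # H1: p != 0.5

print(p_one_sided)  # 0.0546875
print(p_two_sided)  # 0.109375
```

Because the null distribution is symmetric here, the two-sided p-value is exactly twice the one-sided one.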

In [12]:
import numpy as np
import pandas as pd
import plotly.figure_factory as ff
import plotly.graph_objects as go

# Build the patient data
patient_data = pd.DataFrame({
    "PatientID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [45, 34, 29, 52, 37, 41, 33, 48, 26, 39],
    "Sex": ["M", "F", "M", "F", "M", "F", "M", "F", "M", "F"],
    "InitialHealthScore": [84, 78, 83, 81, 81, 80, 79, 85, 76, 83],
    "FinalHealthScore": [86, 86, 80, 86, 84, 86, 86, 82, 83, 84]
})
patient_data

patient_data['HealthScoreChange'] = patient_data.FinalHealthScore-patient_data.InitialHealthScore


# Set the null hypothesis population parameter
population_parameter_value_under_H0 = 0.5  # Hypothesized proportion of patients improving (H0: no effect)

# Calculate the observed test statistic (mean proportion of positive changes)
observed_test_statistic = (patient_data['HealthScoreChange'] > 0).mean()

# Number of simulations
number_of_simulations = 10000  
n_size = len(patient_data)  

# Array to hold simulated statistics
IncreaseProportionSimulations_underH0random = np.zeros(number_of_simulations)

# Generate "random improvement" proportions assuming H0 (no average effect)
for i in range(number_of_simulations):
    # Randomly assign improvements (0 or 1) assuming the null hypothesis is true
    random_improvement = np.random.choice([0, 1], size=n_size, replace=True)  
    IncreaseProportionSimulations_underH0random[i] = random_improvement.mean()

# Simulated test statistics
simulated_test_statistics = IncreaseProportionSimulations_underH0random

# For one-tailed test, check if simulated statistics are greater than or equal to the observed statistic
SimTestStats_as_extreme_than_ObsTestStat = simulated_test_statistics >= observed_test_statistic

# Calculate the one-tailed p-value
p_value_one_tailed = (SimTestStats_as_extreme_than_ObsTestStat).sum() / number_of_simulations

# Output the results
print("Number of Simulations: ", number_of_simulations)
print("Observed Test Statistic: ", observed_test_statistic)
print("Number of simulated test statistics (under H0)\n",
      'that are "as or more extreme" than the observed test statistic: ',
      SimTestStats_as_extreme_than_ObsTestStat.sum())
print('One-tailed p-value\n(= simulations "as or more extreme" / total simulations): ', p_value_one_tailed)

# Interpretation
if p_value_one_tailed < 0.05:  # Adjust the alpha level as needed
    print("We reject the null hypothesis. There is evidence to suggest that the vaccine increases health scores.")
else:
    print("We fail to reject the null hypothesis. There is no sufficient evidence to suggest that the vaccine increases health scores.")

# Create a histogram for the simulation results
hist_data = [IncreaseProportionSimulations_underH0random+np.random.uniform(-0.05,0.05,size=len(IncreaseProportionSimulations_underH0random))]
group_labels = ['Simulated<br>Sampling<br>Distribution<br>of the<br>Sample<br>Proportion<br><br>assuming<br>that the<br>H0 null<br>hypothesis<br>IS TRUE']
fig = ff.create_distplot(hist_data, group_labels, curve_type='normal',
                         show_hist=True, show_rug=False, bin_size=0.1)
pv_y = 2.5
pv_y_ = .25
fig.add_shape(type="line", x0=observed_test_statistic, y0=0, 
              x1=observed_test_statistic, y1=pv_y,
              line=dict(color="Green", width=4), name="Observed Test Statistic")
fig.add_trace(go.Scatter(x=[observed_test_statistic], y=[pv_y+pv_y_], 
                         text=["Observed<br>Test Statistic<br>^"], mode="text", showlegend=False))
# The "symmetric" statistic marks the opposite tail; it counts as "as or more extreme" only in a two-tailed test
symmetric_test_statistic = population_parameter_value_under_H0 -\
                           abs(observed_test_statistic-population_parameter_value_under_H0)
fig.add_shape(type="line", x0=symmetric_test_statistic, y0=0, 
              x1=symmetric_test_statistic, y1=pv_y,
              line=dict(color="Green", width=4), name="Observed Test Statistic")
fig.add_trace(go.Scatter(x=[symmetric_test_statistic], y=[pv_y+pv_y_], 
                         text=['"Symmetric" Observed Test Statistic<br>addressing for "as or more extreme"<br>^'], mode="text", showlegend=False))

# Add a transparent rectangle for the lower region (reference only; not part of the one-tailed p-value)
fig.add_shape(type="rect", x0=-0.25, y0=0, x1=symmetric_test_statistic, y1=pv_y,
              fillcolor="LightCoral", opacity=0.5, line_width=0)
# Add a transparent rectangle for the upper extreme region
fig.add_shape(type="rect", x0=observed_test_statistic, y0=0, x1=1.25, y1=pv_y,
              fillcolor="LightCoral", opacity=0.5, line_width=0)

# Update layout
fig.update_layout(
    title="Simulated Sampling Distribution<br>under H0 with p-value regions",
    xaxis_title="Proportion of Patients with Improved Health Score", yaxis_title="Density", yaxis=dict(range=[0, pv_y+2*pv_y_]))
fig.show() # USE `fig.show(renderer="png")` FOR ALL GitHub and MarkUs SUBMISSIONS

Number of Simulations:  10000
Observed Test Statistic:  0.8
Number of simulated test statistics (under H0)
 that are "as or more extreme" than the observed test statistic:  544
One-tailed p-value
(= simulations "as or more extreme" / total simulations):  0.0544
We fail to reject the null hypothesis. There is no sufficient evidence to suggest that the vaccine increases health scores.


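The one-tailed p-value corresponds to the upper red region only (simulated proportions at or above the observed 0.8); the lower red region would be added only in a two-tailed test. Here p = 0.0544, which is larger than the pre-set significance level (alpha) of 0.05 (0.05 < 0.0544). So we fail to reject the null hypothesis: the data do not provide sufficient evidence that the vaccine increases health scores.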

# 8.
**Problem Introduction**
<br>
In this analysis, we aim to investigate whether STA130 students can correctly identify the order in which milk and tea are poured, similar to the original experiment conducted by Ronald Fisher and Dr. Muriel Bristol in the 1920s. We will assess the results of an experiment involving 80 students, of whom 49 correctly identified the pouring order.
<br><br>
The original experiment by Fisher involved only 8 cups of tea, where Bristol demonstrated a perfect identification rate. In our experiment, the sample size is much larger, with 80 students, which gives a more precise estimate of the proportion who answer correctly. Bristol's case concerned one person's claimed ability, whereas here the parameter is the proportion of the STA130 student population who would identify the order correctly. The students' ability may be influenced by various factors, such as prior experience or knowledge of tea preparation, which makes this a more abstract parameter.
<br><br>
Null Hypothesis: The proportion of students who can correctly identify the pouring order is equal to 0.5 (random guessing).
<br>
Alternative Hypothesis: The proportion of students who can correctly identify the pouring order is not equal to 0.5 (indicating some influence other than random guessing).
<br><br>

**Quantitative Analysis**
<br>
The observed proportion of students who correctly identified the pouring order is 49/80 = 0.6125 or 61.25%.

In [18]:
import numpy as np
import plotly.figure_factory as ff
import plotly.graph_objects as go

# Parameters
population_parameter_value_under_H0 = 0.5  # Null hypothesis value (50/50 coin flip)
observed_statistic = 49 / 80  # Proportion of students guessing correctly

print('The null hypothesis (H0) assumes there is no difference in the ability of students to identify the order of pouring milk and tea. (p = 0.5)')
print('The observed statistic is the proportion of students correctly identifying the pouring order: 49/80', observed_statistic)

np.random.seed(1)  # Set seed for reproducibility
number_of_simulations = 10000  # Total simulations
n_size = 80  # Total number of students

# Simulations of correct guess proportions assuming H0 is true
simulated_proportions = np.zeros(number_of_simulations)

for i in range(number_of_simulations):
    # Simulate 80 students, 0 for incorrect guess, 1 for correct guess
    simulated_guesses = np.random.choice([0, 1], size=n_size, replace=True)  
    simulated_proportions[i] = simulated_guesses.mean()  # Calculate the mean (proportion)

# Calculate p-value: proportion of simulated results as extreme or more extreme than the observed statistic
p_value = np.mean(np.abs(simulated_proportions - population_parameter_value_under_H0) >= 
                  np.abs(observed_statistic - population_parameter_value_under_H0))

print('P-value:', p_value)

# Plotting the results: use the proportions simulated in this cell (n = 80)
# Jitter by half the spacing between possible proportions to smooth the discrete histogram
hist_data = [simulated_proportions + np.random.uniform(-0.5/n_size, 0.5/n_size, size=number_of_simulations)]
group_labels = ['Simulated<br>Sampling<br>Distribution<br>of the<br>Sample<br>Proportion<br><br>assuming<br>that the<br>H0 null<br>hypothesis<br>IS TRUE']
fig = ff.create_distplot(hist_data, group_labels, curve_type='normal',
                         show_hist=True, show_rug=False, bin_size=0.02)
pv_y = 8  # marker height, chosen to sit above the peak density (roughly 7 for n = 80)
pv_y_ = 0.8
fig.add_shape(type="line", x0=observed_statistic, y0=0, 
              x1=observed_statistic, y1=pv_y,
              line=dict(color="Green", width=4), name="Observed Statistic")
fig.add_trace(go.Scatter(x=[observed_statistic], y=[pv_y+pv_y_], 
                         text=["Observed<br>Statistic<br>^"], mode="text", showlegend=False))
# "as or more extreme" also include the "symmetric" observed statistic...
symmetric_statistic = population_parameter_value_under_H0 -\
                      abs(observed_statistic-population_parameter_value_under_H0)
fig.add_shape(type="line", x0=symmetric_statistic, y0=0, 
              x1=symmetric_statistic, y1=pv_y,
              line=dict(color="Green", width=4), name="Observed Statistic")
fig.add_trace(go.Scatter(x=[symmetric_statistic], y=[pv_y+pv_y_], 
                         text=['"Symmetric" Observed Statistic<br>addressing for "as or more extreme"<br>^'], mode="text", showlegend=False))

# Add a transparent rectangle for the lower extreme region
fig.add_shape(type="rect", x0=-0.25, y0=0, x1=symmetric_statistic, y1=pv_y,
              fillcolor="LightCoral", opacity=0.5, line_width=0)
# Add a transparent rectangle for the upper extreme region
fig.add_shape(type="rect", x0=observed_statistic, y0=0, x1=1.25, y1=pv_y,
              fillcolor="LightCoral", opacity=0.5, line_width=0)

# Update layout
fig.update_layout(
    title="Simulated Sampling Distribution<br>under H0 with p-value regions",
    xaxis_title="Proportion of Students Identifying the Order Correctly", yaxis_title="Density", yaxis=dict(range=[0, pv_y+2*pv_y_]))
fig.show() # USE `fig.show(renderer="png")` FOR ALL GitHub and MarkUs SUBMISSIONS

The null hypothesis (H0) assumes there is no difference in the ability of students to identify the order of pouring milk and tea. (p = 0.5)
The observed statistic is the proportion of students correctly identifying the pouring order: 49/80 0.6125
P-value: 0.0439


**Findings and Discussion**
<br>The p-value is 0.0439, which is slightly less than the pre-set significance level (alpha) of 0.05 (0.0439 < 0.05). Therefore, we reject the null hypothesis at the 5% level: there is statistically significant evidence against pure random guessing, suggesting that students may have some genuine ability to discern the pouring order. Because the p-value is close to 0.05, this evidence is modest rather than overwhelming.
<br>
The proportion of STA130 students who identified the pouring order correctly (61.25%) is significantly different from the 50% that would be expected by chance. Further investigation would be needed into the factors influencing their guesses and how they are able to taste the difference in pouring order.
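
Since the simulated p-value sits close to alpha, a cross-check is worthwhile. The sketch below is an illustrative addition rather than part of the original analysis: it computes the exact two-sided binomial p-value for 49 correct out of 80 under H₀: p = 0.5, working with integer counts. Comparisons of the form `abs(prop - 0.5) >= abs(obs - 0.5)` rely on floating-point values such as 31/80 and 49/80 being exactly equidistant from 0.5, which rounding does not guarantee, so a count-based calculation removes that dependence.

```python
from math import comb

n, k_obs = 80, 49   # students, and students who answered correctly

# Distance from n/2, doubled so that all comparisons use integers
observed_gap = abs(2 * k_obs - n)

# Exact two-sided p-value under H0: p = 0.5
# (sum the probabilities of every count at least as far from n/2 as k_obs)
extreme_outcomes = sum(comb(n, k) for k in range(n + 1)
                       if abs(2 * k - n) >= observed_gap)
p_exact = extreme_outcomes / 2 ** n

print("Exact two-sided p-value:", p_exact)
```

The exact value need not agree with the simulated one to the last digit, and with a p-value this close to 0.05 the reject or fail-to-reject decision can be sensitive to such details.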

# 9.
Yes.

# Chatbot Summaries

Here's a brief summary of our session:

1. **P-value Simulation**: We discussed simulating a p-value to test the null hypothesis (\(H_0\): \(p = 0.5\)) for a study on head tilting while kissing.

2. **Hypothesis Statements**: You clarified your null hypothesis and alternative hypothesis regarding guessing the order of milk and tea.

3. **Understanding P-values and Alpha**: We distinguished between p-values (the probability, assuming \(H_0\) is true, of observing data at least as extreme as what was observed) and alpha (\(\alpha\): threshold for significance, typically 0.05). You learned that p-values should be compared to \(\alpha\), not directly to the null hypothesis value.

4. **Fisher's Tea Experiment**: We connected Fisher's experiment to your project involving STA130 students and their guesses about pouring tea and milk.

5. **Conclusion**: You confirmed your understanding of the observed statistic and the role of p-values in hypothesis testing.

Link: https://chatgpt.com/share/670e9617-21f8-8007-b349-029dc84dd31d