# What is the key factor that makes the difference between ideas that can, and cannot be examined and tested statistically? What would you describe is the key "criteria" defining what a good null hypothesis is? And what is the difference between a null hypothesis and an alternative hypothesis in the context of hypothesis testing?

The key factor differentiating testable ideas from non-testable ones in statistics is the ability to collect data that can be analyzed to provide evidence for or against an idea.
<p>
A good null hypothesis should be:
<p>
Testable: We need data that can potentially contradict it.<p>
Default/Uninteresting: It often represents the status quo or a lack of effect.<p>
A "Straw Man": something we might not actually believe, but it sets up a framework to potentially disprove and move towards a more interesting conclusion.<p>
The null hypothesis (H<sub>0</sub>) is our initial assumption, often representing "no effect" or the default belief. The alternative hypothesis (H<sub>A</sub>) simply states that the null hypothesis is false. It doesn't specify how it's false, just that it is.

# "It is important to note that outcomes of tests refer to the population parameter, rather than the sample statistic! As such, the result that we get is for the population." In terms of the distinctions between the concepts of [...] how would you describe what the sentence above means?

Observed Values (x<sub>i</sub>): These are the individual data points we collect in our sample. For example, if we're measuring the height of students, each student's height is an x<sub>i</sub>.
<p>
Sample Average (x̄): This is the average of all the observed values in our sample. It gives us an estimate of the average height of students in our sample.
<p>
Population Average (μ): This is the true average value we're interested in, but we usually can't measure it directly (e.g., the average height of all students, not just those in our sample).
<p>
Hypothesized Value (μ<sub>0</sub>): This is the specific value of the population average we're testing in our null hypothesis (H<sub>0</sub>). For example, we might hypothesize that the average height of all students (μ) is 5 feet 8 inches (μ<sub>0</sub>).
<p>
The sentence means that while we use data from our sample to perform a hypothesis test, the conclusions we draw apply to the population parameter (μ), not just the sample statistic (x̄).
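
To make the distinction concrete, here is a minimal sketch using a made-up population of student heights. The population mean of 68 inches, the spread, and the sample size of 50 are all assumptions chosen only for illustration. In practice μ is unknown, and x̄ is all we get to see.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical population of heights in inches (assumed values, not real data)
population = rng.normal(loc=68, scale=3, size=100_000)
mu = population.mean()  # population parameter: usually unknown in practice

# The observed values x_i: one random sample of 50 students
sample = rng.choice(population, size=50, replace=False)
x_bar = sample.mean()   # sample statistic: our estimate of mu

print(f"population average (mu): {mu:.2f}")
print(f"sample average (x-bar):  {x_bar:.2f}")
```

The two numbers are close but not equal. A hypothesis test uses x̄ to draw a conclusion about μ.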

# "Imagine a world where the null hypothesis is true" when calculating a p-value? Explain why this is.

We "imagine a world where the null hypothesis is true" when calculating a p-value because the p-value is specifically defined as the probability of observing our data (or more extreme data) if the null hypothesis were actually true.
<p>
Sampling Distribution Under H<sub>0</sub>: To calculate a p-value, we need to consider the sampling distribution of our test statistic under the null hypothesis. This distribution shows us the range of values we'd expect for our statistic due to random chance alone if the null hypothesis were true.
<p>
Comparing Our Observed Statistic: We then compare our observed statistic to this sampling distribution. If our observed statistic falls in an extreme tail of the distribution (meaning it's unlikely to occur by chance if H<sub>0</sub> were true), we get a small p-value, providing evidence against the null hypothesis.

# A smaller p-value makes the null hypothesis look more ridiculous. Explain why this is.

A smaller p-value makes the null hypothesis look more "ridiculous" because it means our observed data is increasingly unlikely to have occurred by chance if the null hypothesis were actually true. A small p-value indicates that our observed test statistic falls far out in the tail of the sampling distribution under the null hypothesis. This suggests that our data is so unusual under the assumption of H<sub>0</sub> that it's more plausible that H<sub>0</sub> is actually false. The smaller the p-value, the stronger the evidence against H<sub>0</sub>.

# 5. Gunturkun kissing couples experiment

We are testing the null hypothesis H<sub>0</sub> that humans have no preference for left or right head tilt when kissing, meaning the probability of tilting either left or right is 50/50. <p>
The data from the study shows that 80 out of 124 couples tilted their heads to the right.
We assume a binomial distribution for this problem since each couple’s head tilt can be considered a Bernoulli trial (tilt right or not).
Under H<sub>0</sub>, the probability of tilting right is 0.5 (50%).<p>
Number of right tilts (successes): 80<p>
Total number of couples (trials): 124<p>
I used the binomial test to compute the probability of observing 80 or more couples tilting their heads to the right, given the null hypothesis H<sub>0</sub> (50% chance of either tilt).
The alternative hypothesis is one-sided (greater), since we are interested in whether more couples tilt to the right than expected by random chance.
Using the scipy.stats.binomtest() function in Python (the replacement for scipy.stats.binom_test(), which newer SciPy releases no longer provide), I computed the p-value, which turned out to be 0.00078.
<pre>
import scipy.stats as stats

# Number of successes (tilt right), total observations, and the null hypothesis probability (50%)
successes = 80
n = 124
null_hypothesis_prob = 0.5

# Perform a one-sided binomial test; binomtest returns a result object whose .pvalue attribute holds the p-value
p_value = stats.binomtest(successes, n, null_hypothesis_prob, alternative='greater').pvalue
print(p_value)
</pre>
This p-value is very small, indicating that it’s highly unlikely to observe this many couples tilting their heads to the right if the true probability were 50%. Therefore, we reject the null hypothesis and conclude that there is very strong evidence against it.
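
As a cross-check on what the one-sided binomial test computes, the tail probability P(X ≥ 80) for X ~ Binomial(124, 0.5) can be added up directly. This is a minimal sketch using only the standard library. The variable names are my own, and it is meant as a sanity check rather than a replacement for the SciPy call.

```python
from math import comb

successes = 80  # couples observed tilting right
n = 124         # total couples observed
p0 = 0.5        # probability of a right tilt under the null hypothesis

# Add up the binomial probability of every count from 80 to 124 (the "greater" tail)
p_value = sum(
    comb(n, k) * p0**k * (1 - p0) ** (n - k)
    for k in range(successes, n + 1)
)
print(p_value)
```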

# 6. Can a smaller p-value definitively prove that the null hypothesis is false? Is it possible to definitively prove that Fido (from the "second pre-lecture video") is innocent using a p-value? Is it possible to definitively prove that Fido is guilty using a p-value? How low or high does a p-value have to be to definitely prove one or the other?

It is not possible to definitively prove that Fido is innocent or guilty using a p-value. The p-value only tells us whether our evidence makes the null hypothesis look ridiculous or not. A smaller p-value does not definitively prove the null hypothesis false, since there is always some chance of seeing such data even when it is true, and a larger p-value does not prove it true; it just means there is not enough evidence to reject it. No threshold, however low or high, turns a p-value into definitive proof.

# Question 5: Güntürkün (2003) and Head Tilting While Kissing

1. Problem Context:
   - Observed Statistic: 64.5% (80 out of 124) of couples tilted their heads to the right. This is our sample proportion (equivalent to the sample average, x̄, in other contexts).
   - Population Parameter: The true proportion of all couples who tilt their heads to the right while kissing (denoted as p).
   - Hypothesized Value: Under the null hypothesis, we assume no head-tilting preference (p<sub>0</sub> = 0.5).
2. Simulating a World Where H<sub>0</sub> is True:
   - Coin-Flipping Model: We can simulate a world where the null hypothesis (no head-tilting preference) is true by using a "50/50 coin-flipping" model. Each coin flip represents a couple, with heads indicating a right tilt and tails a left tilt.
   - Simulating Many Samples: We repeatedly simulate samples of 124 couples (the same size as our observed sample) using the coin-flipping model. For each simulated sample, we calculate the proportion of "right tilts" (heads). This creates a sampling distribution of the proportion under the null hypothesis.
3. Calculating the p-value:
   - "As or More Extreme": We determine how many of the simulated proportions are as extreme as, or more extreme than, our observed proportion of 64.5%. "More extreme" in this case means further away from the hypothesized value of 50%.
   - Proportion: The p-value is the proportion of simulated proportions that meet this "as or more extreme" criterion.
4. Interpreting the p-value:
   - Let's assume our simulation resulted in a p-value of 0.005. According to the table, this indicates very strong evidence against the null hypothesis.

Conclusion: The very small p-value suggests that it's highly unlikely to observe 64.5% right-tilting couples by random chance alone if there were truly no head-tilting preference in the population. Therefore, we have strong evidence to reject the null hypothesis and conclude that there is likely a preference for rightward head tilting while kissing.
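
The coin-flipping steps above can be sketched roughly as follows. The seed and the number of simulated samples (10,000) are my own assumptions, and the p-value printed will vary slightly with both. The comparison is done on counts rather than proportions to avoid floating-point ties at the "as extreme" boundary.

```python
import numpy as np

rng = np.random.default_rng(2003)  # arbitrary seed, only for reproducibility

n_couples = 124
observed_right = 80
p0 = 0.5                # hypothesized proportion under H0
n_simulations = 10_000  # assumed number of simulated samples

# Each entry is the number of "heads" (right tilts) in one simulated sample of 124 couples
simulated_right = rng.binomial(n=n_couples, p=p0, size=n_simulations)

# Two-sided "as or more extreme": at least as far from the expected count (62) as 80 is
expected_right = n_couples * p0
as_or_more_extreme = (
    np.abs(simulated_right - expected_right) >= abs(observed_right - expected_right)
)

# The p-value estimate is the fraction of simulated samples meeting that criterion
p_value = as_or_more_extreme.mean()
print(p_value)
```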
# Question 6: Can a p-value Prove Anything Definitively?

No, a p-value cannot definitively prove or disprove a null hypothesis.

- Probability, Not Proof: P-values represent the probability of observing our data (or more extreme data) if the null hypothesis were true. They do not provide absolute proof.
- Possibility of Error: There's always a chance of making a wrong decision in hypothesis testing (Type I or Type II errors).

In the example of Fido from the video, no p-value can definitively prove his innocence or guilt. A small p-value only provides evidence that supports the prosecution's case (against the null hypothesis of innocence), but it doesn't guarantee guilt. Similarly, a large p-value doesn't prove innocence; it just suggests a lack of strong evidence for guilt.
# Question 7: One-Sided vs. Two-Sided Tests

The "Demo II" code in the Week 5 Tutorial simulates a two-sided (or two-tailed) hypothesis test to see if the vaccine has any effect (positive or negative) on average. To adjust this code for a one-sided (one-tailed) test, you'd need to change how you calculate the p-value.

Here's what needs to change:

- Directionality: A one-sided test is directional. You need to specify whether you're interested in an increase or a decrease in the outcome variable due to the vaccine. Let's say we're interested in whether the vaccine improves health scores.
- "More Extreme": When calculating the p-value, "as or more extreme" now only means values that are greater than or equal to the observed statistic (since we're only interested in improvements).
- Code Adjustment: You'd modify the part of the code that counts how many simulated statistics are "as or more extreme" to only count those that are greater than or equal to the observed statistic.

Interpretation:

- The one-sided test specifically addresses whether the vaccine improves health scores.
- The two-sided test addresses whether the vaccine has any effect (improvement or reduction).

Expected p-value:

- Generally smaller in one-tailed tests: When the observed effect lies in the hypothesized direction, the one-sided p-value is smaller than the two-sided one, because only one tail of the sampling distribution counts as "as or more extreme" (roughly half the two-sided p-value when that distribution is symmetric). If the observed effect lies in the opposite direction, the one-sided p-value is large instead.

Remember to ask your ChatBot for a code demonstration and explanation specific to the "Demo II" code.
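
Since the "Demo II" code is not reproduced here, the following is only a generic sketch of the change in how the p-value is counted. Every name and number in it is a hypothetical stand-in: a hypothesized mean change of 0, an assumed observed mean change of 1.2, and normally distributed draws standing in for the simulated statistics under H<sub>0</sub>.

```python
import numpy as np

rng = np.random.default_rng(130)  # arbitrary seed, only for reproducibility

# Hypothetical stand-ins, NOT taken from "Demo II"
hypothesized_value = 0.0  # "no effect on average" under H0
observed_statistic = 1.2  # assumed observed average improvement
simulated_statistics = rng.normal(loc=hypothesized_value, scale=0.5, size=10_000)

# Two-sided: count simulated statistics at least as far from H0 in either direction
two_sided_p = np.mean(
    np.abs(simulated_statistics - hypothesized_value)
    >= abs(observed_statistic - hypothesized_value)
)

# One-sided (improvement only): count simulated statistics at least as large as the observed one
one_sided_p = np.mean(simulated_statistics >= observed_statistic)

print(two_sided_p, one_sided_p)
```

Because the assumed observed statistic lies above the hypothesized value, every simulated statistic counted by the one-sided rule is also counted by the two-sided rule, so the one-sided p-value cannot be larger here.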
# Question 8: Fisher's Tea Experiment

Here's a structured analysis of the experiment:

Problem Introduction

- Relationship to Fisher's Original Experiment: This experiment replicates Fisher's classic tea-tasting experiment but with a larger sample size (80 STA130 students vs. Dr. Bristol). The population of interest has also shifted from a specific individual (Dr. Bristol) to a more general group (STA130 students), potentially making the parameter of interest less personal and more abstract.
- Hypotheses:
  - Null Hypothesis (H<sub>0</sub>):
    - Formal: p = 0.5 (where p is the population proportion of STA130 students who can correctly identify milk-first vs. tea-first).
    - Informal: STA130 students have no ability to distinguish between tea made with milk first and tea made with tea first. They are essentially guessing randomly.
  - Alternative Hypothesis (H<sub>A</sub>): H<sub>0</sub> is false. This implies that STA130 students have some ability to distinguish between the two types of tea, suggesting they are not just guessing randomly.

Quantitative Analysis

- Methodology: We'll use simulation to estimate a p-value for the null hypothesis.
  1. Simulate Data Under H<sub>0</sub>: We'll use a coin-flipping model (like in Question 5) to simulate a world where the null hypothesis is true (students are guessing randomly). Each "coin flip" represents a student, with heads indicating a correct guess and tails an incorrect guess.
  2. Sampling Distribution: We'll repeatedly simulate samples of 80 students (our sample size) using the coin-flipping model. For each simulated sample, we'll calculate the proportion of correct guesses. This will create a sampling distribution of the proportion of correct guesses under the null hypothesis.
  3. Calculate the p-value: We'll compare our observed proportion (49/80 = 0.6125) to this sampling distribution. The p-value will be the proportion of simulated proportions that are as extreme as, or more extreme than, our observed proportion. In this case, "more extreme" means further away from the hypothesized proportion of 0.5.
- Code: (Example Python code, assuming you have NumPy imported as np)
- Supporting Visualizations (Optional): You could create a histogram of the simulated proportions and mark the observed proportion on the plot. This would visually show how unusual our observed data is compared to what we'd expect under the null hypothesis.
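
A minimal sketch of the three methodology steps might look like the following. The helper name `simulate_two_sided_p_value`, the seed, and the choice of 10,000 simulated samples are my own assumptions rather than part of the assignment, and the estimate will shift a little from run to run with a different seed.

```python
import numpy as np


def simulate_two_sided_p_value(observed_count, n, p0, n_simulations, rng):
    """Estimate a two-sided p-value for a proportion with a coin-flipping model.

    Hypothetical helper for illustration: simulates counts under H0 and returns
    the fraction at least as far from the expected count as the observed count.
    """
    # Step 1 and 2: simulate many samples of size n under H0 (each student guesses at random)
    simulated_counts = rng.binomial(n=n, p=p0, size=n_simulations)
    # Step 3: compare distances from the expected count (counts avoid floating-point ties)
    expected_count = n * p0
    as_or_more_extreme = (
        np.abs(simulated_counts - expected_count) >= abs(observed_count - expected_count)
    )
    return as_or_more_extreme.mean(), simulated_counts / n


rng = np.random.default_rng(130)  # arbitrary seed, only for reproducibility
p_value, simulated_proportions = simulate_two_sided_p_value(
    observed_count=49, n=80, p0=0.5, n_simulations=10_000, rng=rng
)
print(p_value)
```

The returned `simulated_proportions` array is what the optional histogram would be drawn from, with the observed proportion 0.6125 marked on it.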

Findings and Discussion

- Interpreting the p-value: The p-value from the simulation will tell us the probability of observing a proportion of correct guesses as extreme as 0.6125 (or more extreme) if students were truly guessing randomly.
- Conclusion: Based on the p-value and the table provided in Question 5, we'll make a conclusion about the strength of evidence against the null hypothesis. For example, if the p-value is less than 0.05, we would reject the null hypothesis at the 5% significance level, concluding that STA130 students likely have some ability to distinguish between the tea types.

# Question 9: Reviewing Course Material
