# Q1
  For an idea to be testable, the key factor is measurability: its claims must be expressible as measurable quantities, so that data can be collected and analyzed. Conversely, if an idea cannot be quantified into measurable variables, it cannot be tested.
  A good null hypothesis must be specific, testable, and falsifiable. It should clearly state that there is no significant effect or difference between the subjects being studied, be checkable against data, and be possible to disprove with evidence. A clearly stated null hypothesis makes it straightforward for the analysis to assess how strongly the data contradict it.
  In hypothesis testing, the null hypothesis (H₀) is the default assumption, stating that there is no difference or effect between the groups or conditions being compared. It represents the status quo. The alternative hypothesis (H₁), on the other hand, contradicts the null hypothesis, stating that there is a significant difference or effect. Simply put, the null hypothesis assumes no change, while the alternative hypothesis suggests there is a change or impact.
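For example, if the question were whether two groups A and B differ in their average outcome (μ_A and μ_B are hypothetical population means used only for illustration), the hypotheses could be written as below; a one-directional claim would instead use H₁: μ_A > μ_B.

$$
\begin{aligned}
H_0 &: \mu_A = \mu_B \quad \text{(no difference; the status quo)} \\
H_1 &: \mu_A \neq \mu_B \quad \text{(a difference exists)}
\end{aligned}
$$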

# Q2
When we conduct statistical tests, the ultimate goal is to infer characteristics of the entire population rather than to describe only the sample.
$x_i$ represents each individual observation in the sample, such as the height of each person in a randomly selected group.
$\bar{x}$ is the sample mean, the average value calculated from the sample data.
$\mu$ is the population mean, the true average of the entire population, which we aim to estimate from the sample.
$\mu_0$ is the value of $\mu$ hypothesized under the null hypothesis, $H_0: \mu = \mu_0$. We compare the sample mean $\bar{x}$ with $\mu_0$ to judge whether the data are consistent with the null hypothesis.
In summary, the goal of statistical testing is to use sample data to infer characteristics of the whole population. The hypotheses concern the population parameter ($\mu$), not the sample statistic ($\bar{x}$), so conclusions drawn from the sample are really statements about the population.
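A minimal simulation sketch of these symbols, assuming a hypothetical population of heights that is normally distributed with mean 170 cm and standard deviation 10 cm (values chosen only for illustration). The sample mean x̄ changes from sample to sample while μ stays fixed; μ₀ = 172 is an arbitrary hypothesized value, also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of heights in cm (in a real study, mu is unknown)
population = rng.normal(loc=170, scale=10, size=100_000)
mu = population.mean()            # population mean: the parameter

# One random sample of n = 50 people; each element is an observation x_i
sample = rng.choice(population, size=50, replace=False)
x_bar = sample.mean()             # sample mean: the statistic

mu_0 = 172                        # hypothetical value of mu under H0: mu = mu_0

print(f"population mean mu      = {mu:.2f}")
print(f"sample mean x_bar       = {x_bar:.2f}")
print(f"difference x_bar - mu_0 = {x_bar - mu_0:.2f}")
```

Rerunning with a different seed changes x̄ but not μ, which is why hypotheses are stated about μ rather than x̄.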

# Q3
The p-value is calculated under the assumption that the null hypothesis is true; this assumption is what lets us assess how likely the observed data would be. The null hypothesis usually states that there is no significant difference or effect. Assuming it is true, we calculate the probability of obtaining results at least as extreme as the ones we actually observed. If the p-value is very small, such results would be very unlikely under the null hypothesis, which may lead us to reject it. In simple terms, the p-value measures how compatible the data are with the null hypothesis, and therefore whether there is enough evidence to support a different conclusion.
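As a small illustration (a hypothetical coin example, not part of the assignment): suppose H₀ says a coin is fair and we observe 9 heads in 10 flips. Under H₀ all 2^10 head/tail sequences are equally likely, so the p-value can be found by listing every outcome and counting those at least as extreme as ours (9 or more heads, for a one-sided test).

```python
from itertools import product

n_flips = 10          # hypothetical number of coin flips
observed_heads = 9    # hypothetical observed result

# Under H0 (fair coin), every sequence of heads (1) and tails (0) is equally likely
all_outcomes = list(product([0, 1], repeat=n_flips))

# Count outcomes at least as extreme as the observed one (one-sided: >= 9 heads)
as_or_more_extreme = sum(1 for outcome in all_outcomes if sum(outcome) >= observed_heads)

p_value = as_or_more_extreme / len(all_outcomes)
print(f"{as_or_more_extreme} of {len(all_outcomes)} outcomes -> p-value = {p_value:.4f}")
# prints: 11 of 1024 outcomes -> p-value = 0.0107
```

Enumeration only works for tiny experiments; for realistic sample sizes the same probability is approximated by simulation, as in the later questions.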

# Q4
Suppose we are investigating whether a new drug lowers blood pressure, and the null hypothesis is that "the drug has no effect". We conduct an experiment measuring the change in patients' blood pressure after taking the drug and find a substantial drop. We then calculate a p-value of, say, 0.01, which means that if the drug had no effect, there would be only a 1% chance of seeing a drop at least this large.

Because this probability is so small, we begin to doubt that the null hypothesis (that the drug is ineffective) is plausible. In other words, a p-value this small means the data are far from what the null hypothesis predicts, which makes the null hypothesis look implausible. We then have more reason to believe that the drug does work, and we may reject the null hypothesis.

Thus, in statistical analysis, the smaller the p-value, the more inconsistent the data are with the null hypothesis, and the more inclined we are to conclude that the null hypothesis does not hold and to support the alternative hypothesis, namely that the drug may actually work.
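One way to make this concrete is a sign-flip randomization test. This is a sketch with made-up blood-pressure changes for 10 hypothetical patients (negative = drop, in mmHg); the numbers are invented for illustration. If the drug has no effect, each patient's change is equally likely to be positive or negative, so randomly flipping the signs simulates the null distribution of the mean change.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical blood-pressure changes (mmHg) after taking the drug; negative = drop
changes = np.array([-12, -8, -15, -5, -9, -11, -7, -14, -6, -10])
observed_mean = changes.mean()

# Under H0 (no effect), each change's sign is equally likely to be + or -
n_simulations = 10_000
signs = rng.choice([-1, 1], size=(n_simulations, len(changes)))
simulated_means = (signs * np.abs(changes)).mean(axis=1)

# One-sided p-value: share of simulated mean changes at least as low as the observed one
p_value = np.mean(simulated_means <= observed_mean)
print(f"observed mean change = {observed_mean:.1f} mmHg, p-value = {p_value:.4f}")
```

Because every hypothetical patient shows a drop, only the all-negative sign pattern (probability 1/1024 under H₀) is as extreme as the data, so the simulated p-value comes out near 0.001: the data sit far from what "no effect" predicts.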

In [2]:
# Q5
import numpy as np

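np.random.seed(42)  # fix the seed so the simulation is reproducible

n_couples = 124            # number of kissing couples observed
right_tilt_observed = 80   # couples who tilted their heads to the right
n_simulations = 10000      # number of simulated studies under H0
p_null = 0.5               # probability of a right tilt if there is no preference (H0)

# Each simulated study: number of right tilts among 124 couples when p = 0.5
simulated_right_tilts = np.random.binomial(n_couples, p_null, n_simulations)

# One-sided p-value: proportion of simulated studies with at least 80 right tilts
p_value = np.mean(simulated_right_tilts >= right_tilt_observed)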

print(f"Simulated p-value: {p_value}")

Simulated p-value: 0.001


In [None]:
# Q5
Hypotheses:
Null hypothesis H₀: Humans have no left-right preference when tilting their heads to kiss; that is, the probability of tilting to the right is 0.5.
Alternative hypothesis H₁: Humans prefer to tilt their heads to the right when kissing; that is, the probability of tilting to the right is greater than 0.5.
Sample data:
124 couples were observed, and 80 of them tilted their heads to the right.
Assuming the null hypothesis holds (a probability of 0.5 of tilting right), we run 10,000 simulations, each simulating 124 couples. In each simulation we record how many couples tilt their heads to the right; the p-value is the proportion of simulations in which this number is greater than or equal to 80 (code above).
Conclusion:
Since the simulated p-value (0.001 in the output above) is less than 0.01, based on the given table we have very strong evidence against the null hypothesis H₀.
Thus, the data support the conclusion that humans prefer to tilt their heads to the right when kissing, rather than having no left-right preference.
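A simulated p-value is itself an estimate, so it varies slightly between runs and seeds. As a cross-check (a sketch, not part of the original assignment), the exact one-sided binomial tail P(X ≥ 80) for X ~ Binomial(124, 0.5) can be computed directly, together with the approximate Monte Carlo standard error of a p-value estimated from 10,000 simulations.

```python
from math import comb, sqrt

n_couples = 124
right_tilt_observed = 80
n_simulations = 10_000

# Exact P(X >= 80) for X ~ Binomial(124, 0.5): each outcome sequence has probability 0.5**124
exact_p_value = sum(comb(n_couples, k)
                    for k in range(right_tilt_observed, n_couples + 1)) / 2**n_couples

# Approximate standard error of a simulated proportion whose true value is exact_p_value
mc_standard_error = sqrt(exact_p_value * (1 - exact_p_value) / n_simulations)

print(f"exact p-value: {exact_p_value:.5f}")
print(f"Monte Carlo standard error ({n_simulations} simulations): {mc_standard_error:.5f}")
```

The exact value comes out a little below 0.001, and the standard error (roughly 0.0003) shows why a simulation can report 0.001; either way, the p-value is far below 0.01.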

# Q6
A small p-value cannot completely prove that the null hypothesis is wrong. It only means that if the null hypothesis were true, the chance of getting data at least as extreme as ours would be very low. A low p-value is strong evidence against the null hypothesis, but it never gives 100% certainty.

Just like Fido in the video, a p-value cannot prove whether he is innocent or guilty. A very low p-value means there is strong evidence against him, but it does not mean he is definitely guilty. Similarly, a high p-value only suggests there is not enough evidence to conclude he is guilty; it does not confirm that he is innocent.

No specific p-value can absolutely prove anything. By convention, a p-value below 0.05 is often treated as evidence against the null hypothesis, with smaller values indicating stronger evidence, but no threshold is definitive.
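A short simulation makes the "not 100% certain" point concrete. It is a sketch assuming a hypothetical fair-coin experiment (100 flips, H₀: p = 0.5, one-sided test for too many heads): every simulated experiment is generated with H₀ true, yet a small fraction of them still produce p < 0.05.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)

n_flips = 100           # hypothetical flips per experiment
n_experiments = 10_000  # every experiment is generated with H0 true (fair coin)

# Exact one-sided p-value P(X >= k) under H0, precomputed for every possible head count k
upper_tail = [sum(comb(n_flips, j) for j in range(k, n_flips + 1)) / 2**n_flips
              for k in range(n_flips + 1)]

heads = rng.binomial(n_flips, 0.5, size=n_experiments)
p_values = np.array([upper_tail[h] for h in heads])

# Fraction of true-H0 experiments that nonetheless give a "small" p-value
false_alarm_rate = np.mean(p_values < 0.05)
print(f"share of experiments with p < 0.05 although H0 is true: {false_alarm_rate:.3f}")
```

The rate lands a little under 5% because the binomial distribution is discrete; the key point is that it is not zero, so a small p-value alone cannot prove H₀ false.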

In [None]:
# Q7
# New (Single-tailed test):
# For a one-tailed test, compare the simulated statistics with the observed statistic in the direction of H1:
SimStats_as_or_more_extreme_than_ObsStat = \
    simulated_statistics >= observed_statistic  # H1: parameter greater than its H0 value
#or
SimStats_as_or_more_extreme_than_ObsStat = \
    simulated_statistics <= observed_statistic  # H1: parameter less than its H0 value

# Significance: In a one-tailed test, you are interested in deviations in only one direction. For a greater-than alternative, a simulated statistic counts as "as or more extreme" if it is at least as large as the observed statistic; for a less-than alternative, if it is no larger than the observed statistic. This matters because it changes both the hypothesis being tested and the interpretation of the p-value:
# In a two-tailed test, the null hypothesis is tested against deviations in both directions.
# In a one-tailed test, you are only concerned with whether the statistic is significantly larger (or smaller), not both.
# This puts the rejection region in one tail of the distribution, making a one-tailed test more sensitive to extreme values in that direction, at the cost of not detecting extreme values in the opposite direction.
# Summary: In this conversation, I asked how to modify the code from a two-tailed test to a one-tailed test. The original code checks for deviations in both directions by comparing the absolute differences between the simulated statistics and the hypothesized parameter value with the absolute difference between the observed statistic and that value. I learned that a one-tailed test focuses on deviations in only one direction (simulated statistics either at least as large as, or no larger than, the observed statistic). This makes the test more sensitive in that direction but unable to capture deviations in the opposite direction, which affects the interpretation of the p-value and the hypothesis test results.
# link: https://chatgpt.com/c/6707114e-515c-8003-9c89-41f7f31fdbe4

In [None]:
# Q7
# Single-tailed test (greater-than)
# Assumes patient_data and IncreaseProportionSimulations_underH0random were created
# earlier in the notebook (they are not defined in this cell)
import pandas as pd
from IPython.display import display

# Hypothesized proportion under H0 (kept for reference; the one-tailed comparisons
# below only need the observed and simulated statistics)
population_parameter_value_under_H0 = 0.5

# Observed statistic from the real data: proportion of patients whose health score increased
observed_statistic = (patient_data.HealthScoreChange > 0).mean()

# Simulated statistics under the null hypothesis
simulated_statistics = IncreaseProportionSimulations_underH0random

# Greater-than test: "as or more extreme" means at least as large as the observed statistic
SimStats_as_or_more_extreme_than_ObsStat = simulated_statistics >= observed_statistic

print('''Which simulated statistics are "as or more extreme"
than the observed statistic? (of ''', observed_statistic, ')', sep="")

# display() is used because a notebook cell only shows its last expression automatically
display(pd.DataFrame({'(Simulated) Statistic': simulated_statistics,
                      '>= '+str(observed_statistic)+" ?": ['>= '+str(observed_statistic)+" ?"]*len(simulated_statistics),
                      '"as or more extreme"?': SimStats_as_or_more_extreme_than_ObsStat}))
print("One-tailed (greater-than) p-value:", SimStats_as_or_more_extreme_than_ObsStat.mean())

# Single-tailed test (less-than), reusing observed_statistic and simulated_statistics from above
# Less-than test: "as or more extreme" means no larger than the observed statistic
SimStats_as_or_more_extreme_than_ObsStat = simulated_statistics <= observed_statistic

print('''Which simulated statistics are "as or more extreme"
than the observed statistic? (of ''', observed_statistic, ')', sep="")

display(pd.DataFrame({'(Simulated) Statistic': simulated_statistics,
                      '<= '+str(observed_statistic)+" ?": ['<= '+str(observed_statistic)+" ?"]*len(simulated_statistics),
                      '"as or more extreme"?': SimStats_as_or_more_extreme_than_ObsStat}))
print("One-tailed (less-than) p-value:", SimStats_as_or_more_extreme_than_ObsStat.mean())
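The Q7 code relies on data defined elsewhere in the notebook, so here is a self-contained sketch that computes two-tailed and one-tailed p-values side by side. It assumes a hypothetical study of 50 patients in which 32 improved (an observed proportion of 0.64) and H₀: p = 0.5; these numbers are invented for illustration. Counts of improvements are compared instead of proportions, which is equivalent and avoids floating-point ties.

```python
import numpy as np

rng = np.random.default_rng(3)

n_patients = 50              # hypothetical sample size
observed_improved = 32       # hypothetical number of patients who improved
p_under_H0 = 0.5
expected_under_H0 = n_patients * p_under_H0

# Simulated numbers of improvements when H0 is true
simulated_improved = rng.binomial(n_patients, p_under_H0, size=10_000)

# Two-tailed: deviations from the H0 expectation in either direction count as extreme
two_tailed = (np.abs(simulated_improved - expected_under_H0)
              >= abs(observed_improved - expected_under_H0))
# One-tailed, H1: p > 0.5 -> only values at least as large as observed count
greater_than = simulated_improved >= observed_improved
# One-tailed, H1: p < 0.5 -> only values no larger than observed count
less_than = simulated_improved <= observed_improved

print(f"two-tailed p-value:   {two_tailed.mean():.4f}")
print(f"greater-than p-value: {greater_than.mean():.4f}")
print(f"less-than p-value:    {less_than.mean():.4f}")
```

Because this null distribution is symmetric, the greater-than p-value is about half the two-tailed one, while the less-than p-value is large: the data deviate in the direction opposite to that alternative, which a less-than test cannot detect.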

In [3]:
# Q8
# Step 1: Hypotheses
# Null Hypothesis (H₀):
# The students are guessing randomly whether milk or tea was poured first, meaning their probability of correctly identifying which was poured first is 50%.

# Formal Version:
# H₀: p = 0.5 (where p is the probability that a student correctly identifies which was poured first).

# Alternative Hypothesis (H₁):
# The students are performing better than random guessing.

# Formal Version:
# H₁: p > 0.5.

# Step 2: Simulation
# To test this hypothesis, we can run a simulation assuming the null hypothesis is true (p = 0.5). Here are the steps:

# Use a random number generator to simulate students' responses, setting the probability of correctly identifying which was poured first to 50%.
# Run multiple simulations (e.g., 10,000 times) to generate a distribution of possible outcomes under the null hypothesis.
# Record how many students identified the order correctly in the actual experiment (49 out of 80) and compare that number with the results of the simulations.
# Calculate the p-value, which represents the proportion of simulations that resulted in 49 or more correct identifications.
import numpy as np

np.random.seed(42)

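num_students = 80        # students in the experiment
num_correct = 49         # students who identified the order correctly
num_simulations = 10000  # simulated experiments under H0

# Number of correct answers in each simulated experiment if every student guesses (p = 0.5)
simulations = np.random.binomial(n=num_students, p=0.5, size=num_simulations)

# One-sided p-value: proportion of simulations with 49 or more correct answers
p_value = np.mean(simulations >= num_correct)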
print(f'P-value: {p_value}')

# Conclusion
# Based on the simulation results, the calculated p-value is about 0.0294.

# Since this p-value is less than the commonly used significance level of 0.05, we reject the null hypothesis (H₀). This suggests there is sufficient evidence that the students do better than random guessing at identifying which was poured first.
# This result supports the idea that the students may indeed be able to tell how the tea was prepared, similar to Dr. Bristol's claim in the original experiment.

P-value: 0.0294
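The steps listed in Q8 describe simulating each student's response with a random number generator; `np.random.binomial` does this in a single call. The sketch below spells out the per-student version (each of the 80 simulated students is correct with probability 0.5) to show that it implements the same null model. It uses NumPy's `default_rng` generator with an arbitrary seed, so its p-value will only approximately match 0.0294.

```python
import numpy as np

rng = np.random.default_rng(4)

num_students = 80
num_correct = 49          # observed correct identifications
num_simulations = 10_000

# Each row is one simulated class; True means that student guessed correctly (probability 0.5 under H0)
guesses_correct = rng.random((num_simulations, num_students)) < 0.5

# Number of correct identifications in each simulated class
simulated_correct = guesses_correct.sum(axis=1)

p_value = np.mean(simulated_correct >= num_correct)
print(f"P-value (per-student simulation): {p_value:.4f}")
```

Summing 80 independent 50/50 outcomes gives a Binomial(80, 0.5) count, which is why the one-line `np.random.binomial` version in the cell above is equivalent and faster.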


In [None]:
# Q9
Mostly