
# Hypothesis Testing

Hypothesis testing is a statistical method for making inferences about a population parameter (e.g., a population mean or proportion) from sample data. The goal is to decide whether the sample provides enough evidence to reject the null hypothesis (H0) in favor of the alternative hypothesis (Ha); if it does not, we fail to reject H0.

### Normal Data

When dealing with normal data, we typically use the one-sample z-test if the population standard deviation is known, or the one-sample t-test if it is unknown.
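
For the unknown-standard-deviation case, a minimal sketch of a one-sample t-test with `scipy.stats.ttest_1samp` might look like the following. The `heights` array is made up purely for illustration and is not data from this notebook; the hand-computed statistic is included only to show what the function is estimating.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample of heights in inches (illustrative values only)
heights = np.array([66.1, 68.4, 67.2, 69.0, 65.8, 67.9, 68.3, 66.7])
claimed_mean = 68  # hypothesized population mean
alpha = 0.05

# Two-sided one-sample t-test; the standard deviation is estimated from the sample
t_statistic, p_value = ttest_1samp(heights, popmean=claimed_mean)

# Same statistic by hand: (xbar - mu0) / (s / sqrt(n)), using ddof=1 for s
manual_t = (heights.mean() - claimed_mean) / (heights.std(ddof=1) / np.sqrt(len(heights)))

print(f"t-statistic: {t_statistic:.4f} (by hand: {manual_t:.4f})")
print(f"p-value: {p_value:.4f}")
```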

**Example 1: One-Sample z-test**

Suppose we want to test if the mean height of a population is different from a claimed value (e.g., 68 inches). We collect a random sample of 50 individuals and calculate the sample mean height as 67.5 inches, with a known population standard deviation of 3 inches. Confidence level: 95% (significance level α = 0.05)
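
As a quick check on the arithmetic, the z-statistic for a known population standard deviation can be worked out by hand from the numbers stated above (the symbols $\bar{x}$, $\mu_0$, $\sigma$, and $n$ are notation introduced here for the sample mean, claimed mean, population standard deviation, and sample size):

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{67.5 - 68}{3 / \sqrt{50}} \approx \frac{-0.5}{0.4243} \approx -1.18
$$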

In [None]:
import numpy as np
from scipy.stats import norm

# Given
sample_mean = 67.5
population_mean = 68
population_std = 3
sample_size = 50
alpha = 0.05

# Null Hypothesis:
# Alternative Hypothesis:

# Left, Right, or Two-tailed test?

# Calculate the z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Calculate the p-value
# Uncomment the type of test you want to perform

#p_value = norm.cdf(z_score)  # For a left-tailed test
#p_value = 1 - norm.cdf(z_score)  # For a right-tailed test
p_value = 2 * (1 - norm.cdf(abs(z_score)))  # For a two-tailed test

# Print the results
if p_value < alpha:
    print(f"Reject the null hypothesis: p-value {p_value:.4f} < alpha {alpha:.4f}")
else:
    print(f"Fail to reject the null hypothesis: p-value {p_value:.4f} >= alpha {alpha:.4f}")
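
The same decision can also be reached by comparing the z-score with a critical value instead of computing a p-value. The sketch below assumes a two-tailed test on the Example 1 numbers; `z_critical` and `reject` are names introduced here for illustration.

```python
import numpy as np
from scipy.stats import norm

# Values from Example 1
sample_mean = 67.5
population_mean = 68
population_std = 3
sample_size = 50
alpha = 0.05

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Two-tailed critical value: alpha / 2 of the probability lies in each tail
z_critical = norm.ppf(1 - alpha / 2)

# Reject only if the z-score falls in either rejection region
reject = abs(z_score) > z_critical

print(f"z-score: {z_score:.4f}, critical value: +/-{z_critical:.4f}")
print("Reject the null hypothesis" if reject else "Fail to reject the null hypothesis")
```

For a one-tailed test, the critical value would instead be `norm.ppf(alpha)` (left tail) or `norm.ppf(1 - alpha)` (right tail).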



**Exercise 1**

ETS, the company that administers the SAT exam, reports that the mean SAT mathematics score is 519. But some people think that this score overestimates the ability of typical high school seniors because only those who plan to attend college take the SAT. To check if this is true, a test was given to a simple random sample (SRS) of 500 seniors from California. These students had an average score of x̄ = 504. Is this enough evidence to say that the mean for all California seniors is lower than 519? Use a level of significance equal to α = 0.05. (Assume that σ = 100.)

In [None]:
#Try it yourself


### Data Using the t-test

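When the population standard deviation is unknown, we estimate it from the sample and use the t-test, which relies on the t-distribution instead of the normal distribution. The t-test also has an independent two-sample form, which compares the means of two groups.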
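
To see how the t-distribution differs from the normal distribution in practice, the sketch below compares two-tailed critical values at α = 0.05. The degrees of freedom listed are arbitrary illustrative choices.

```python
from scipy.stats import norm, t

alpha = 0.05

# Two-tailed critical value under the standard normal distribution
z_critical = norm.ppf(1 - alpha / 2)

# Two-tailed critical values under the t-distribution for several degrees of freedom
t_criticals = {df: t.ppf(1 - alpha / 2, df) for df in (5, 10, 30, 100)}

print(f"normal: {z_critical:.4f}")
for df, t_critical in t_criticals.items():
    print(f"t (df={df}): {t_critical:.4f}")
```

The t critical values are larger than the normal one for small samples and approach it as the degrees of freedom grow, which reflects the extra uncertainty from estimating the standard deviation.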

**Example 2: Independent Two-Sample t-test**

Suppose we want to test the effectiveness of a new drug on blood pressure. We have two groups: a control group (no drug) and an experimental group (with the new drug). We measure the blood pressure of individuals in both groups and want to determine if there is a significant difference between the two groups. Confidence level: 95%

In [None]:
import numpy as np
from scipy.stats import ttest_ind
alpha = 0.05

# Control group data
control_data = np.array([125, 132, 128, 119, 133, 126, 130, 124, 129, 121])

# Experimental group data
experimental_data = np.array([118, 120, 122, 125, 117, 123, 119, 121, 124, 120])

# Null Hypothesis:
# Alternative Hypothesis:

# Left, Right, or Two-tailed test?

# Perform the independent t-test

#t_statistic, p_value = ttest_ind(experimental_data, control_data, alternative='less')  # Left-tailed test
#t_statistic, p_value = ttest_ind(experimental_data, control_data, alternative='greater')  # Right-tailed test
t_statistic, p_value = ttest_ind(experimental_data, control_data)  # Two-tailed test

# Print the summary statistics
print("Control group mean:", np.mean(control_data))
print("Experimental group mean:", np.mean(experimental_data))
print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Print the decision
if p_value < alpha:
    print(f"Reject the null hypothesis: p-value {p_value:.4f} < alpha {alpha:.4f}")
else:
    print(f"Fail to reject the null hypothesis: p-value {p_value:.4f} >= alpha {alpha:.4f}")
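
To make the two-tailed `ttest_ind` call less of a black box, the sketch below recomputes its result by hand. It relies on the fact that `ttest_ind` assumes equal population variances by default and therefore uses a pooled variance estimate; the variable names (`pooled_var`, `manual_t`, and so on) are introduced here for illustration.

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Data from Example 2
control_data = np.array([125, 132, 128, 119, 133, 126, 130, 124, 129, 121])
experimental_data = np.array([118, 120, 122, 125, 117, 123, 119, 121, 124, 120])

n1, n2 = len(experimental_data), len(control_data)

# Sample variances (ddof=1) combined into a pooled variance estimate
var1 = experimental_data.var(ddof=1)
var2 = control_data.var(ddof=1)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t-statistic
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
manual_t = (experimental_data.mean() - control_data.mean()) / standard_error

# Two-tailed p-value from the t-distribution with n1 + n2 - 2 degrees of freedom
df = n1 + n2 - 2
manual_p = 2 * t.sf(abs(manual_t), df)

# Reference values from SciPy (default equal_var=True, two-sided)
scipy_t, scipy_p = ttest_ind(experimental_data, control_data)

print(f"By hand: t = {manual_t:.4f}, p = {manual_p:.4f}")
print(f"SciPy:   t = {scipy_t:.4f}, p = {scipy_p:.4f}")
```

If the equal-variance assumption is doubtful, `ttest_ind` also accepts `equal_var=False`, which performs Welch's t-test instead.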

**Exercise 2**

A university professor gave online lectures instead of face-to-face classes due to Covid-19. He later uploaded the recorded lectures to the cloud for students who followed the course asynchronously (those who did not attend the live sessions but watched the recordings afterwards). However, he believes that students who attend at the scheduled class time and participate are more successful. To check this, he recorded the students' average grades at the end of the semester. The data are given below.

In [10]:
import numpy as np
from scipy.stats import ttest_ind
alpha = 0.05

# Experimental data (students who attended synchronously)
sync = np.array([94.0, 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2,
                 87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6])

# Control data (students who followed asynchronously)
asyncr = np.array([77.1, 71.7, 91.0, 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65.7, 72.6, 71.5, 78.2])


