**Hypothesis Testing**

**Introduction to Hypothesis Testing**

Hypothesis testing is a statistical method used to make decisions or inferences about a population based on sample data. It helps us determine whether the sample provides enough evidence to reject a null hypothesis (H₀) in favor of an alternative hypothesis (H₁).

**Steps in Hypothesis Testing**

In [1]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/19.png", width=800, height=300)
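The usual steps (state H₀ and H₁, choose α, compute a test statistic, find the p-value, decide) can be sketched in code. This is a minimal sketch assuming a two-sided one-sample t-test; the helper name `one_sample_t_test` is hypothetical, and the statistic is computed from its textbook formula t = (x̄ − μ₀) / (s / √n) rather than by a single library call.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(sample, mu0, alpha=0.05):
    """Hypothetical helper that walks through the steps of a two-sided one-sample t-test."""
    x = np.asarray(sample, dtype=float)
    n = x.size

    # Step 1: hypotheses. H0: population mean == mu0, H1: population mean != mu0.
    # Step 2: significance level alpha (0.05 by default).

    # Step 3: test statistic, using the sample standard deviation (ddof=1).
    t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

    # Step 4: two-sided p-value from the t distribution with n - 1 degrees of freedom.
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

    # Step 5: decision.
    reject = p_value < alpha
    return t_stat, p_value, reject

t_stat, p_value, reject = one_sample_t_test([498, 502, 495, 501, 499, 497, 503, 500, 496, 504], 500)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}, reject H0: {reject}")
```

Applied to the product weights used in Example 1, this gives the same t-statistic and p-value that `stats.ttest_1samp` reports there.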

**Types of Hypothesis Tests**

In [2]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/20.png", width=800, height=300)
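One common way to classify tests is by the direction of the alternative hypothesis: two-tailed (H₁: μ ≠ μ₀) or one-tailed (H₁: μ < μ₀ or H₁: μ > μ₀). Assuming that distinction is among the types listed, SciPy's `ttest_1samp` exposes it through its `alternative` parameter (SciPy 1.6 and later). The sketch reuses the product weights from Example 1.

```python
from scipy import stats

weights = [498, 502, 495, 501, 499, 497, 503, 500, 496, 504]
mu0 = 500

# Two-tailed: H1 is "mean != 500"
p_two = stats.ttest_1samp(weights, mu0, alternative="two-sided").pvalue
# Left-tailed: H1 is "mean < 500"
p_less = stats.ttest_1samp(weights, mu0, alternative="less").pvalue
# Right-tailed: H1 is "mean > 500"
p_greater = stats.ttest_1samp(weights, mu0, alternative="greater").pvalue

print(f"two-sided: {p_two:.4f}, less: {p_less:.4f}, greater: {p_greater:.4f}")
```

Because the t distribution is symmetric and the t-statistic here is negative, the left-tailed p-value is exactly half the two-sided one, and the two one-tailed p-values sum to 1.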

**Example 1: One-Sample t-Test (Comparing Sample Mean to Population Mean)**

**Scenario**

A company claims that the average weight of its product is 500g. We collect a sample of 10 products and test whether the mean weight is different from 500g at a 5% significance level.

In [3]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/21.png", width=800, height=300)

In [4]:
import numpy as np
import scipy.stats as stats

# Sample data (weights of 10 products)
sample_data = [498, 502, 495, 501, 499, 497, 503, 500, 496, 504]

# Population mean
mu = 500

# Perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_data, mu)

# Print results
print(f"T-Statistic: {t_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The mean weight is significantly different from 500g.")
else:
    print("Fail to reject the null hypothesis: No significant difference from 500g.")


T-Statistic: -0.5222
P-Value: 0.6141
Fail to reject the null hypothesis: No significant difference from 500g.
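A two-sided test at level α and a (1 − α) confidence interval for the mean lead to the same decision: H₀ is rejected exactly when μ₀ falls outside the interval. This minimal sketch builds the 95% interval with `scipy.stats.t.interval`, using the standard error s/√n, and checks where 500 g falls.

```python
import numpy as np
from scipy import stats

sample_data = np.array([498, 502, 495, 501, 499, 497, 503, 500, 496, 504], dtype=float)
mu = 500
n = sample_data.size

mean = sample_data.mean()
se = sample_data.std(ddof=1) / np.sqrt(n)   # standard error of the mean

# 95% confidence interval from the t distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=mean, scale=se)

print(f"Sample mean: {mean:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print("500 g lies inside the interval" if ci_low <= mu <= ci_high else "500 g lies outside the interval")
```

Since 500 g lies inside the interval, this agrees with the p-value of 0.6141: there is no evidence that the mean differs from 500 g.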


**Example 2: Two-Sample t-Test (Comparing Two Groups)**

**Scenario**

A researcher wants to compare the test scores of two different classes to determine whether their mean scores differ significantly.

In [5]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/22.png", width=800, height=300)

In [6]:
# Sample data (Test scores of two classes)
class_A = [75, 78, 80, 85, 88, 92, 95, 98, 100, 102]
class_B = [70, 72, 76, 79, 83, 87, 90, 91, 94, 99]

# Perform an independent two-sample t-test (Student's t-test: assumes equal variances by default)
t_stat, p_value = stats.ttest_ind(class_A, class_B)

# Print results
print(f"T-Statistic: {t_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

# Decision
if p_value < alpha:
    print("Reject the null hypothesis: The two classes have significantly different scores.")
else:
    print("Fail to reject the null hypothesis: No significant difference between classes.")


T-Statistic: 1.2039
P-Value: 0.2442
Fail to reject the null hypothesis: No significant difference between classes.
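By default `ttest_ind` assumes both populations have the same variance and uses a pooled variance estimate. The sketch below recomputes that pooled statistic by hand and, for comparison, runs Welch's t-test (`equal_var=False`), which drops the equal-variance assumption.

```python
import numpy as np
from scipy import stats

class_A = np.array([75, 78, 80, 85, 88, 92, 95, 98, 100, 102], dtype=float)
class_B = np.array([70, 72, 76, 79, 83, 87, 90, 91, 94, 99], dtype=float)
n_a, n_b = class_A.size, class_B.size

# Pooled variance: weighted average of the two sample variances (ddof=1)
sp2 = ((n_a - 1) * class_A.var(ddof=1) + (n_b - 1) * class_B.var(ddof=1)) / (n_a + n_b - 2)

# Student's t-statistic with n_a + n_b - 2 degrees of freedom
t_pooled = (class_A.mean() - class_B.mean()) / np.sqrt(sp2 * (1 / n_a + 1 / n_b))
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=n_a + n_b - 2)

# Welch's t-test: no equal-variance assumption
welch = stats.ttest_ind(class_A, class_B, equal_var=False)

print(f"Pooled: t = {t_pooled:.4f}, p = {p_pooled:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

With equal group sizes the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ slightly. With unequal sizes and variances, Welch's version is the safer default.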


**Example 3: Chi-Square Test (Categorical Data)**

**Scenario**

A company wants to test whether customer satisfaction is independent of the type of product purchased.

In [7]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/23.png", width=800, height=300)

In [8]:
import numpy as np
import scipy.stats as stats

# Observed frequency table: rows = product type (first, second), columns = [Satisfied, Not Satisfied]
observed = np.array([[30, 10], [20, 40]])

# Perform the chi-square test of independence (Yates' continuity correction is applied by default for 2x2 tables)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Print results
print(f"Chi-Square Statistic: {chi2_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

# Decision
if p_value < alpha:
    print("Reject the null hypothesis: Customer satisfaction depends on product type.")
else:
    print("Fail to reject the null hypothesis: No evidence that satisfaction depends on product type.")


Chi-Square Statistic: 15.0417
P-Value: 0.0001
Reject the null hypothesis: Customer satisfaction depends on product type.
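The expected counts and the statistic can be reproduced by hand. Under independence, each expected count is (row total × column total) / grand total. For a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which shrinks each |O − E| by 0.5 before squaring; the sketch computes the statistic with and without it.

```python
import numpy as np
from scipy import stats

observed = np.array([[30, 10], [20, 40]], dtype=float)

# Expected count under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
expected = row_totals * col_totals / observed.sum()

# Yates' correction: subtract 0.5 from each |O - E|
# (valid as written here because every |O - E| exceeds 0.5)
chi2_yates = (((np.abs(observed - expected) - 0.5) ** 2) / expected).sum()

# Uncorrected Pearson chi-square statistic
chi2_plain = (((observed - expected) ** 2) / expected).sum()

print(expected)
print(f"With Yates correction: {chi2_yates:.4f}")
print(f"Without correction:    {chi2_plain:.4f}")
```

The corrected value matches the 15.0417 reported above; passing `correction=False` to `chi2_contingency` gives the uncorrected statistic instead.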


**Summary of Decision Making**

In [9]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/24.png", width=800, height=300)
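An equivalent decision rule compares the test statistic with a critical value instead of comparing the p-value with α. For a two-sided t-test, H₀ is rejected when |t| exceeds t₁₋α/₂ with the test's degrees of freedom. The sketch checks that both rules agree for the Example 1 result (t = −0.5222, 9 degrees of freedom).

```python
from scipy import stats

alpha = 0.05
t_stat, df = -0.5222, 9   # values from Example 1

# Rule 1: compare the two-sided p-value with alpha
p_value = 2 * stats.t.sf(abs(t_stat), df)
reject_by_p = p_value < alpha

# Rule 2: compare |t| with the critical value t_(1 - alpha/2)
t_crit = stats.t.ppf(1 - alpha / 2, df)
reject_by_crit = abs(t_stat) > t_crit

print(f"p = {p_value:.4f}, critical value = {t_crit:.4f}")
print(f"Reject by p-value: {reject_by_p}, reject by critical value: {reject_by_crit}")
```

Both rules always agree because the p-value falls below α exactly when the statistic lands beyond the critical value.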

**Conclusion**

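1. Hypothesis testing is essential for making data-driven decisions.

2. Different tests suit different types of data: t-tests compare means, while the chi-square test examines associations between categorical variables.

3. Comparing the p-value with the significance level α determines whether to reject the null hypothesis.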

**ANOVA (Analysis of Variance)**

**ANOVA (Analysis of Variance)** is a statistical test used to **compare the means** of **three or more groups** to determine if there is a statistically significant difference between them.

In [11]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/25.png", width=800, height=300)

**Example Scenario**
  
A teacher wants to compare the exam scores of students from three different classes to see whether there is a significant difference in their mean performance.

In [12]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/256.png", width=800, height=300)

In [13]:
import numpy as np
import scipy.stats as stats

# Exam scores of students from three different classes
class_A = [85, 90, 88, 92, 86, 89, 91, 87, 93, 95]
class_B = [78, 82, 80, 79, 85, 83, 84, 81, 88, 86]
class_C = [72, 75, 78, 74, 76, 77, 79, 73, 80, 81]

# Perform One-Way ANOVA
f_stat, p_value = stats.f_oneway(class_A, class_B, class_C)

# Print results
print(f"F-Statistic: {f_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the class scores.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the class scores.")


F-Statistic: 43.4040
P-Value: 0.0000
Reject the null hypothesis: There is a significant difference between the class scores.
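The F-statistic is the ratio of between-group variability to within-group variability: F = MSB / MSW, where MSB = SSB / (k − 1) and MSW = SSW / (N − k) for k groups and N observations. This sketch recomputes it from those definitions for the three classes.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([85, 90, 88, 92, 86, 89, 91, 87, 93, 95], dtype=float),  # class A
    np.array([78, 82, 80, 79, 85, 83, 84, 81, 88, 86], dtype=float),  # class B
    np.array([72, 75, 78, 74, 76, 77, 79, 73, 80, 81], dtype=float),  # class C
]
k = len(groups)
N = sum(g.size for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of each observation around its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, k - 1, N - k)   # upper tail of F(k - 1, N - k)

print(f"SSB = {ss_between:.1f}, SSW = {ss_within:.1f}")
print(f"F({k - 1}, {N - k}) = {f_stat:.4f}, p = {p_value:.2e}")
```

Here SSB = 859.4 and SSW = 267.3, so F = (859.4 / 2) / (267.3 / 27) ≈ 43.40, matching `stats.f_oneway`.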


**Interpreting the Output**

In [14]:
# import image module 
from IPython.display import Image 
  
# display the local image (filename= embeds the file; url= expects a web address the browser can fetch)
Image(filename="C:/Users/Suhel/Desktop/Statistics/27.png", width=800, height=300)
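Two points help when reading the ANOVA output above. A p-value printed as `0.0000` is not zero; it is only below 0.00005 after rounding, so scientific notation reports it more honestly. And a significant F-test says at least one class mean differs, not which one. Looking at the group means is a first step, and a post-hoc procedure such as Tukey's HSD (`scipy.stats.tukey_hsd`, SciPy 1.8 and later) is the usual follow-up for pairwise comparisons. This is a minimal sketch of the first two points only.

```python
import numpy as np
from scipy import stats

class_A = [85, 90, 88, 92, 86, 89, 91, 87, 93, 95]
class_B = [78, 82, 80, 79, 85, 83, 84, 81, 88, 86]
class_C = [72, 75, 78, 74, 76, 77, 79, 73, 80, 81]

f_stat, p_value = stats.f_oneway(class_A, class_B, class_C)

# Scientific notation shows how small p actually is
print(f"F = {f_stat:.4f}, p = {p_value:.2e}")

# ANOVA says at least one mean differs, not which one: inspect the group means
means = {name: np.mean(scores) for name, scores in
         [("A", class_A), ("B", class_B), ("C", class_C)]}
for name, m in means.items():
    print(f"Class {name}: mean = {m:.1f}")
```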

**When to Use ANOVA?**

1. Comparing the test scores of students in multiple classes.

2. Analyzing the effectiveness of three different teaching methods.

3. Comparing the mean customer satisfaction across multiple products.