
# Hypothesis Testing: t-tests, Chi-Square Tests, and ANOVA

## Overview
Hypothesis testing is a statistical method used to make inferences or draw conclusions about a population based on sample data. This notebook covers:

- **t-tests**: Test whether a sample mean differs from a known value, or whether the means of two groups differ significantly.
- **Chi-Square Tests**: Test for independence between categorical variables.
- **ANOVA (Analysis of Variance)**: Test for differences in means across three or more groups.



## 1. t-Tests

A t-test assesses whether a sample mean differs significantly from a reference value or from the mean of another group. Types of t-tests include:

- **One-sample t-test**: Tests if the sample mean is significantly different from a known value.
- **Independent two-sample t-test**: Tests if the means of two independent groups are different.
- **Paired t-test**: Tests if the means of two related groups are different.

Let's conduct an independent two-sample t-test in Python.


In [1]:

import numpy as np
from scipy import stats

# Generate sample data for two groups
np.random.seed(0)
group1 = np.random.normal(10, 2, 30)  # Mean 10, SD 2, n=30
group2 = np.random.normal(12, 2, 30)  # Mean 12, SD 2, n=30

# Independent two-sample t-test (assumes equal variances by default;
# pass equal_var=False for Welch's t-test)
t_stat, p_value = stats.ttest_ind(group1, group2)
print("t-statistic:", t_stat)
print("p-value:", p_value)

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The means are significantly different.")
else:
    print("Fail to reject the null hypothesis: The means are not significantly different.")


t-statistic: -1.0246375585969996
p-value: 0.30978886450466103
Fail to reject the null hypothesis: The means are not significantly different.
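
The one-sample and paired variants listed earlier follow the same pattern. The sketch below is illustrative only: the reference value of 10, the `sample`, `before`, and `after` arrays, and the simulated shift of +1 are assumptions made for this example, not data from the analysis above. It uses a local `np.random.default_rng` generator so that the global random state seeded in the first cell is left untouched.

```python
import numpy as np
from scipy import stats

# Hypothetical data, generated only for illustration
rng = np.random.default_rng(42)
sample = rng.normal(10.5, 2, 30)        # one group, compared against a reference mean
before = rng.normal(10, 2, 30)          # paired measurements on the same 30 subjects
after = before + rng.normal(1, 1, 30)   # assumed average shift of +1 per subject

# One-sample t-test: does the sample mean differ from the assumed reference value 10?
t_one, p_one = stats.ttest_1samp(sample, popmean=10)
print("One-sample t-statistic:", t_one, "p-value:", p_one)

# Paired t-test: is the mean of the within-subject differences zero?
t_pair, p_pair = stats.ttest_rel(before, after)
print("Paired t-statistic:", t_pair, "p-value:", p_pair)

# A paired t-test is equivalent to a one-sample t-test on the differences
t_diff, p_diff = stats.ttest_1samp(before - after, popmean=0)
print("One-sample t-statistic on differences:", t_diff, "p-value:", p_diff)
```

The last two tests report the same statistic and p-value, which is a useful way to remember what the paired test actually measures.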



## 2. Chi-Square Test

The chi-square test is used to determine if there is a significant association between categorical variables.

For example, suppose we want to test if there is an association between gender (male/female) and preference for a product (like/dislike). We can use a chi-square test for independence.

Let's perform a chi-square test on a contingency table.


In [2]:
# Contingency table
# Example: Gender (Male/Female) vs Preference (Like/Dislike)
observed = np.array([[30, 10], [15, 25]])  # Format: [[Male_like, Male_dislike], [Female_like, Female_dislike]]

# Chi-square test of independence
# (for a 2x2 table, SciPy applies Yates' continuity correction by default)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)
print("Chi-square statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of freedom:", dof)
print("Expected frequencies:\n", expected)

# Interpretation
if p_value < alpha:
    print("Reject the null hypothesis: There is an association between gender and preference.")
else:
    print("Fail to reject the null hypothesis: No significant evidence of an association between gender and preference.")


Chi-square statistic: 9.955555555555556
p-value: 0.001603647262414105
Degrees of freedom: 1
Expected frequencies:
 [[22.5 17.5]
 [22.5 17.5]]
Reject the null hypothesis: There is an association between gender and preference.
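
To see where the expected frequencies and the statistic come from, the same table can be worked through by hand. This is a minimal sketch; the names `expected_manual`, `chi2_yates`, and `chi2_pearson` are introduced here for illustration. It assumes the default behaviour of `chi2_contingency`, which applies Yates' continuity correction when the table has one degree of freedom.

```python
import numpy as np
from scipy import stats

observed = np.array([[30, 10], [15, 25]])

# Expected count per cell under independence:
# row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
expected_manual = row_totals * col_totals / observed.sum()

# Statistic with Yates' continuity correction: |O - E| is reduced by 0.5
chi2_yates = (((np.abs(observed - expected_manual) - 0.5) ** 2) / expected_manual).sum()

# Uncorrected Pearson statistic, for comparison
chi2_pearson = (((observed - expected_manual) ** 2) / expected_manual).sum()

print("Expected frequencies:\n", expected_manual)
print("Chi-square (Yates-corrected):", chi2_yates)
print("Chi-square (uncorrected):", chi2_pearson)

# Cross-check the uncorrected value against SciPy with the correction switched off
chi2_scipy, p_scipy, dof_scipy, _ = stats.chi2_contingency(observed, correction=False)
print("SciPy (correction=False):", chi2_scipy)
```

The corrected value matches the statistic reported above, and the uncorrected value is somewhat larger, since the correction shrinks each deviation before squaring.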



## 3. ANOVA (Analysis of Variance)

ANOVA is used to test whether the means of three or more groups are significantly different. It is an extension of the t-test for more than two groups.

The null hypothesis for ANOVA is that all group means are equal. Rejecting it indicates that at least one group mean differs from the others, but not which one.
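
As a sketch of what the test computes, the one-way ANOVA F-statistic is the ratio of between-group variability to within-group variability. The notation is introduced here for illustration and does not appear elsewhere in this notebook: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, group means $\bar{x}_i$, and grand mean $\bar{x}$.

$$
F = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \,/\, (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \,/\, (N - k)}
$$

Under the null hypothesis and the usual assumptions (independent observations, normally distributed groups, equal variances), $F$ follows an F-distribution with $k - 1$ and $N - k$ degrees of freedom, so a large value of $F$ is evidence against equal means.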

Let's conduct a one-way ANOVA test.


In [3]:

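# Generate sample data for three groups
# (draws continue from the seed set in the first cell; run the cells in order to reproduce the output)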
group1 = np.random.normal(10, 2, 30)  # Mean 10, SD 2, n=30
group2 = np.random.normal(15, 2, 30)  # Mean 15, SD 2, n=30
group3 = np.random.normal(20, 2, 30)  # Mean 20, SD 2, n=30

# One-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print("F-statistic:", f_stat)
print("p-value:", p_value)

# Interpretation
if p_value < alpha:
    print("Reject the null hypothesis: At least one group mean is different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in group means.")


F-statistic: 220.363105625877
p-value: 8.796512926879192e-35
Reject the null hypothesis: At least one group mean is different.
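
A significant ANOVA result does not say which groups differ. One simple follow-up, sketched below, is to run pairwise t-tests with a Bonferroni correction; this is a swapped-in illustration, not part of the analysis above, and dedicated post-hoc procedures such as Tukey's HSD are a common alternative. The `groups` dictionary and the local random generator are assumptions made for this example, so the numbers will not match the groups drawn in the cell above.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical groups mirroring the setup above (means 10, 15, 20; SD 2; n=30)
rng = np.random.default_rng(1)
groups = {
    "group1": rng.normal(10, 2, 30),
    "group2": rng.normal(15, 2, 30),
    "group3": rng.normal(20, 2, 30),
}

alpha = 0.05
pairs = list(combinations(groups, 2))
alpha_adjusted = alpha / len(pairs)   # Bonferroni: split alpha across the 3 comparisons

results = {}
for name_a, name_b in pairs:
    t_stat, p_value = stats.ttest_ind(groups[name_a], groups[name_b])
    results[(name_a, name_b)] = p_value
    verdict = "different" if p_value < alpha_adjusted else "not significantly different"
    print(f"{name_a} vs {name_b}: p = {p_value:.3g} -> {verdict}")
```

Comparing each p-value against `alpha_adjusted` rather than `alpha` keeps the overall chance of a false positive across all three comparisons at or below 0.05.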



## Summary

In this notebook, we explored:

- **t-tests**: Test whether a sample mean differs from a known value, or whether the means of two groups differ.
- **Chi-Square Test**: Tests for independence between categorical variables.
- **ANOVA (Analysis of Variance)**: Tests for differences in means across three or more groups.
