# Different Types of Hypothesis Tests

## Goals:
* Understand various types of hypothesis tests
* Learn when to use each type of test
* Implement these tests using Python

We will cover the following types of tests:
1. One-sample t-test
2. Two-sample t-test
3. Paired t-test
4. One-way ANOVA
5. Chi-square test of independence

In [None]:
# Import necessary libraries
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.stats.multicomp import pairwise_tukeyhsd

## 1. One-sample t-test

The one-sample t-test is used to determine whether a sample mean significantly differs from a hypothesized population mean.
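As a rough illustration of what the test computes, the sketch below reproduces the t-statistic by hand as $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ and checks it against `stats.ttest_1samp`. The `sample` values and `mu_0` are made-up numbers used only for illustration; they are not data from this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized mean, chosen only for illustration
sample = np.array([24.1, 26.3, 25.8, 27.0, 25.2, 26.6, 24.9, 26.1])
mu_0 = 25.0

n = sample.size
sample_mean = sample.mean()
sample_sd = sample.std(ddof=1)          # ddof=1 gives the sample standard deviation
standard_error = sample_sd / np.sqrt(n)

# t-statistic: distance of the sample mean from mu_0 in standard-error units
t_manual = (sample_mean - mu_0) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Library result for comparison
t_scipy, p_scipy = stats.ttest_1samp(sample, mu_0)

print(f"By hand: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:   t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

The two lines should agree, which shows that the library call is a convenience wrapper around this formula rather than a different method.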

### Example: Plant Growth
A botanist claims that plants treated with a new fertilizer reach an average height of 25 cm. You measure the heights of 30 treated plants and test the null hypothesis that the true mean height is 25 cm against the two-sided alternative that it is not.

In [None]:
# Generate sample data
np.random.seed(42)
plant_heights = np.random.normal(26, 2, 30)  # Mean 26, SD 2, 30 samples

# Perform a two-sided one-sample t-test of H0: mean height = 25 cm
t_statistic, p_value = stats.ttest_1samp(plant_heights, 25)

print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The mean plant height differs significantly from 25 cm.")
else:
    print("Fail to reject the null hypothesis. There's not enough evidence to conclude the mean plant height differs from 25 cm.")

## 2. Two-sample t-test

The two-sample t-test is used to determine whether the means of two independent groups differ significantly. In its standard (Student's) form, it assumes that the two groups have equal variances.
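To make the calculation concrete, here is a minimal sketch that computes the pooled-variance t-statistic by hand and compares it with `stats.ttest_ind`. It also calls the same function with `equal_var=False`, which runs Welch's t-test and does not assume equal variances. The two small arrays are hypothetical values invented for this illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups (illustration only)
group_1 = np.array([26.1, 24.8, 27.3, 25.9, 26.7, 25.2])
group_2 = np.array([23.9, 24.5, 22.8, 25.1, 23.4])

n1, n2 = group_1.size, group_2.size
var1, var2 = group_1.var(ddof=1), group_2.var(ddof=1)  # sample variances

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_manual = (group_1.mean() - group_2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Student's t-test (equal variances assumed, the SciPy default)
t_student, p_student = stats.ttest_ind(group_1, group_2)
# Welch's t-test (no equal-variance assumption)
t_welch, p_welch = stats.ttest_ind(group_1, group_2, equal_var=False)

print(f"By hand (pooled): t = {t_manual:.4f}")
print(f"Student's t-test: t = {t_student:.4f}, p = {p_student:.4f}")
print(f"Welch's t-test:   t = {t_welch:.4f}, p = {p_welch:.4f}")
```

Welch's variant is often a reasonable default when there is no strong reason to believe the group variances are equal.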

### Example: Comparing Two Fertilizers
Compare the effectiveness of two different fertilizers on plant height.

In [None]:
# Generate sample data
fertilizer_A = np.random.normal(26, 2, 30)
fertilizer_B = np.random.normal(24, 2, 30)

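# Perform two-sample t-test (Student's t-test; assumes equal variances by default).
# Pass equal_var=False for Welch's t-test if the group variances may differ.
t_statistic, p_value = stats.ttest_ind(fertilizer_A, fertilizer_B)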

print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the two fertilizers.")
else:
    print("Fail to reject the null hypothesis. There's not enough evidence to conclude a difference between the fertilizers.")

## 3. Paired t-test

The paired t-test is used when you have two related samples, such as repeated measurements on the same subjects, and want to determine whether the mean of the paired differences differs significantly from zero.
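One way to see what this test does: it is equivalent to a one-sample t-test on the pairwise differences with a hypothesized mean of zero. The sketch below checks that equivalence on a small set of hypothetical before/after values made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same six plants (illustration only)
before = np.array([20.1, 19.4, 21.0, 18.7, 20.5, 19.9])
after = np.array([22.0, 21.1, 23.4, 20.2, 22.9, 21.5])

# Paired t-test on the two related samples
t_paired, p_paired = stats.ttest_rel(after, before)

# Same test expressed as a one-sample t-test on the differences
differences = after - before
t_diff, p_diff = stats.ttest_1samp(differences, 0.0)

print(f"Paired t-test:             t = {t_paired:.4f}, p = {p_paired:.4f}")
print(f"One-sample on differences: t = {t_diff:.4f}, p = {p_diff:.4f}")
```

Because only the differences matter, the normality assumption applies to the differences rather than to the raw before and after values.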

### Example: Before and After Treatment
Measure plant heights before and after applying a fertilizer to the same plants.

In [None]:
# Generate sample data
before_treatment = np.random.normal(20, 2, 30)
after_treatment = before_treatment + np.random.normal(2, 1, 30)  # Add some growth

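# Perform paired t-test on the (before - after) differences.
# The t-statistic is negative when heights increase after treatment.
t_statistic, p_value = stats.ttest_rel(before_treatment, after_treatment)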

print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The treatment has a significant effect on plant height.")
else:
    print("Fail to reject the null hypothesis. There's not enough evidence to conclude the treatment affects plant height.")

## 4. One-way ANOVA

One-way ANOVA (Analysis of Variance) is used to determine whether there are any statistically significant differences between the means of three or more independent groups.
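The F-statistic behind ANOVA is the ratio of between-group variability to within-group variability. The following sketch computes it by hand and compares the result with `stats.f_oneway`. The three groups are hypothetical values invented for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups (illustration only)
groups = [
    np.array([26.1, 24.8, 27.3, 25.9, 26.7]),
    np.array([23.9, 24.5, 22.8, 25.1, 23.4]),
    np.array([25.0, 25.6, 24.2, 26.0, 24.9]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)              # number of groups
n_total = all_values.size    # total number of observations

# Between-group sum of squares: how far group means are from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"By hand: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:   F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

A large F means the group means are spread out relative to the noise within each group. It does not say which groups differ, which is why a post-hoc comparison is usually run after a significant result.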

### Example: Comparing Multiple Fertilizers
Compare the effectiveness of four different fertilizers on plant height.

In [None]:
# Generate sample data
fertilizer_A = np.random.normal(26, 2, 30)
fertilizer_B = np.random.normal(24, 2, 30)
fertilizer_C = np.random.normal(25, 2, 30)
fertilizer_D = np.random.normal(23, 2, 30)

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(fertilizer_A, fertilizer_B, fertilizer_C, fertilizer_D)

print(f"F-statistic: {f_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There are significant differences among the fertilizers.")
else:
    print("Fail to reject the null hypothesis. There's not enough evidence to conclude differences among the fertilizers.")

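# Post-hoc test (Tukey's HSD): ANOVA only says that some means differ,
# so compare every pair of groups to see which ones.
if p_value < alpha:
    data = np.concatenate([fertilizer_A, fertilizer_B, fertilizer_C, fertilizer_D])
    labels = np.repeat(['A', 'B', 'C', 'D'], 30)  # one label per measurement
    tukey_results = pairwise_tukeyhsd(data, labels, alpha=alpha)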
    print("\nTukey's HSD Results:")
    print(tukey_results)

## 5. Chi-square test of independence

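The chi-square test of independence is used to determine whether there is a statistically significant association between two categorical variables. It compares the observed counts in a contingency table with the counts expected if the variables were independent.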
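As a sketch of the underlying arithmetic, the expected count for each cell under independence is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The code below does this by hand and compares it with `stats.chi2_contingency`. The 3×2 table of counts is hypothetical and used only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical 3x2 contingency table of counts (illustration only)
observed = np.array([
    [30, 10],
    [25, 15],
    [20, 20],
])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# Expected counts if the row and column variables were independent
expected_manual = row_totals * col_totals / grand_total

chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)

print(f"By hand: chi2 = {chi2_manual:.4f}, dof = {dof_manual}, p = {p_manual:.4f}")
print(f"SciPy:   chi2 = {chi2_scipy:.4f}, dof = {dof_scipy}, p = {p_scipy:.4f}")
```

Note that for 2×2 tables `chi2_contingency` applies Yates' continuity correction by default, so a hand calculation like this one would only match there if `correction=False` were passed.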

### Example: Plant Survival and Fertilizer Type
Investigate if there's a relationship between the type of fertilizer used and whether plants survive or die.

In [None]:
# Create a contingency table
observed = pd.DataFrame({
    'Survived': [43, 53, 38],
    'Died': [9, 5, 11]
}, index=['Fertilizer A', 'Fertilizer B', 'Fertilizer C'])

print("Contingency Table:")
print(observed)

# Perform Chi-square test; `expected` holds the counts expected under independence
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"\nChi-square statistic: {chi2:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant relationship between fertilizer type and plant survival.")
else:
    print("Fail to reject the null hypothesis. There's not enough evidence to conclude a relationship between fertilizer type and plant survival.")

## Conclusion

We've covered five different types of hypothesis tests:
1. One-sample t-test: Compare a sample mean to a hypothesized population mean.
2. Two-sample t-test: Compare means of two independent groups.
3. Paired t-test: Compare means of two related groups.
4. One-way ANOVA: Compare means of three or more independent groups.
5. Chi-square test of independence: Test for an association between two categorical variables.

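Each test has its own use case and assumptions. The t-tests and ANOVA assume independent observations and approximately normally distributed data (or paired differences), the standard two-sample t-test and ANOVA additionally assume equal group variances, and the chi-square test requires expected counts that are not too small. Checking these assumptions and choosing the matching test is essential for sound statistical analysis in fields such as machine learning and data science.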