# Probability and Statistics for Machine Learning: Hypothesis Testing

## 6. Hypothesis Testing


### What is Hypothesis Testing?

Hypothesis testing is a statistical method used to determine whether there is enough evidence in a sample of data to support a particular claim (hypothesis) about a population parameter.

There are two main hypotheses in hypothesis testing:
1. **Null Hypothesis (\( H_0 \))**: The hypothesis that there is no effect or no difference.
2. **Alternative Hypothesis (\( H_1 \))**: The hypothesis that there is an effect or a difference.

The goal of hypothesis testing is to either reject the null hypothesis or fail to reject it, based on the data.

### Steps in Hypothesis Testing
1. **State the Hypotheses**: Define the null and alternative hypotheses.
2. **Choose a Significance Level (\( \alpha \))**: The probability of rejecting the null hypothesis when it is actually true (common values are 0.05 and 0.01).
3. **Collect Data**: Obtain a sample and calculate the test statistic.
4. **Make a Decision**: Compare the test statistic to a critical value or calculate a p-value.
5. **Interpret the Results**: Decide whether to reject or fail to reject the null hypothesis.

### Example: Testing the Mean of a Population

Let's say we want to test whether the average height of a population is 170 cm. The null and alternative hypotheses are:

\[
H_0: \mu = 170
\]
\[
H_1: \mu \neq 170
\]

We can use a one-sample t-test, which compares a sample mean to a hypothesized population mean when the population standard deviation is unknown.
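
For reference, the one-sample t-test is based on the statistic below. The symbols are introduced here for illustration: \( \bar{x} \) is the sample mean, \( s \) the sample standard deviation (computed with \( n - 1 \) in the denominator), \( n \) the sample size, and \( \mu_0 = 170 \) the hypothesized mean.

\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
\]

Under \( H_0 \), and assuming the data are approximately normally distributed, this statistic follows a t-distribution with \( n - 1 \) degrees of freedom.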
    

In [None]:

import numpy as np
from scipy import stats

# Example: One-sample t-test on seven height measurements (cm)
sample_data = np.array([168, 172, 169, 171, 170, 173, 174])
population_mean = 170  # hypothesized mean under H0

# Perform a two-sided one-sample t-test
t_statistic, p_value = stats.ttest_1samp(sample_data, population_mean)
t_statistic, p_value
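
To see what the library call is doing, the same test can be worked through by hand using the critical-value approach from step 4. This is a minimal sketch rather than a reference implementation; the names `mu_0`, `t_manual`, `t_critical`, and `p_manual` are chosen here for illustration, and \( \alpha = 0.05 \) is an assumed significance level.

```python
import numpy as np
from scipy import stats

sample_data = np.array([168, 172, 169, 171, 170, 173, 174])
mu_0 = 170      # hypothesized mean under H0
alpha = 0.05    # significance level (assumed for this sketch)

n = sample_data.size
sample_mean = sample_data.mean()
sample_std = sample_data.std(ddof=1)        # sample standard deviation (n - 1 denominator)
standard_error = sample_std / np.sqrt(n)

# Test statistic: distance between the sample mean and mu_0 in standard-error units
t_manual = (sample_mean - mu_0) / standard_error

# Two-sided critical value from the t-distribution with n - 1 degrees of freedom
t_critical = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Two-sided p-value from the survival function of the same distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

reject_h0 = abs(t_manual) > t_critical
print(f"t = {t_manual:.3f}, critical value = {t_critical:.3f}, p = {p_manual:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The hand-computed statistic and p-value should agree with the values returned by `stats.ttest_1samp`, and the critical-value and p-value rules always lead to the same decision.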
    


### Interpreting the Results

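- If the p-value is less than the significance level \( \alpha \) (e.g., 0.05), we reject the null hypothesis.
- If the p-value is greater than or equal to \( \alpha \), we fail to reject the null hypothesis. This does not show that the null hypothesis is true, only that the sample does not provide enough evidence against it.
- For the height sample above, the test gives \( t \approx 1.22 \) and a p-value of about 0.27, so at \( \alpha = 0.05 \) we fail to reject \( H_0: \mu = 170 \).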
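
The decision rule above can be written as a small helper. This is an illustrative sketch: the function `decide` is hypothetical (not part of SciPy), and \( \alpha = 0.05 \) is assumed.

```python
import numpy as np
from scipy import stats

def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

sample_data = np.array([168, 172, 169, 171, 170, 173, 174])
result = stats.ttest_1samp(sample_data, 170)

decision = decide(result.pvalue, alpha=0.05)
print(f"p-value = {result.pvalue:.3f} -> {decision}")
```

Keeping \( \alpha \) as an explicit parameter makes it clear that the threshold is chosen before looking at the data, not adjusted afterwards.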

### Types of Errors

In hypothesis testing, two types of errors can occur:
1. **Type I Error**: Rejecting the null hypothesis when it is actually true (false positive). The probability of making a Type I error is the significance level \( \alpha \).
2. **Type II Error**: Failing to reject the null hypothesis when it is actually false (false negative). The probability of a Type II error is usually denoted \( \beta \), and \( 1 - \beta \) is called the power of the test.
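
Both error rates can be estimated by simulation. The sketch below draws many samples from a normal population and counts how often a one-sample t-test rejects \( H_0: \mu = 170 \). All numeric settings (sample size, standard deviation, the shifted mean of 173, number of trials, and the seed) are illustrative assumptions, not values from this section.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
alpha = 0.05
n_trials, n = 5000, 20           # illustrative number of simulated samples and sample size
mu_0, sigma = 170.0, 5.0         # hypothesized mean and an assumed population std. dev.

def rejection_rate(true_mean):
    """Fraction of simulated samples for which H0: mu = mu_0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=sigma, size=(n_trials, n))
    p_values = stats.ttest_1samp(samples, mu_0, axis=1).pvalue
    return np.mean(p_values < alpha)

# H0 is true here, so every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=mu_0)

# H0 is false here (true mean is 173), so every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(true_mean=mu_0 + 3)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I rate should land close to \( \alpha \), while the Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.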

### Applications in Machine Learning

Hypothesis testing is used in machine learning to:
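- Compare model performance, e.g., test whether the difference in accuracy between two models is larger than would be expected by chance.
- Test assumptions about the data (e.g., normality).
- Perform feature selection by testing whether individual features are significantly associated with the target.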
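
As one possible illustration of the first use case, a paired t-test can compare two models scored on the same cross-validation folds. The per-fold accuracies below are made-up numbers for demonstration only, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracies for two models evaluated on the same 5 folds
model_a_scores = np.array([0.81, 0.79, 0.84, 0.80, 0.83])
model_b_scores = np.array([0.78, 0.77, 0.80, 0.79, 0.80])

# Paired t-test: H0 says the mean per-fold difference between the models is zero
t_statistic, p_value = stats.ttest_rel(model_a_scores, model_b_scores)
mean_difference = np.mean(model_a_scores - model_b_scores)

print(f"Mean accuracy difference (A - B): {mean_difference:.3f}")
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
```

The test is paired because both models are scored on the same folds. Cross-validation folds share training data, so the independence assumption holds only approximately and the resulting p-value is best treated as a rough guide.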

    