## Hypothesis Testing

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about a population based on a sample drawn from it.

### Steps involved in hypothesis testing
1. Formulate the null and alternative hypotheses.
2. Select a significance level (α).
3. Check the assumptions of the test.
4. Decide which test is appropriate (z-test, t-test, chi-square test, ANOVA).
5. State the relevant test statistic.
6. Conduct the test.
7. Reject or fail to reject the null hypothesis.
8. Interpret the result in the context of the problem.

## Z-TEST
A z-test is a statistical hypothesis test used to determine whether the mean of a sample differs significantly from a known (hypothesized) population mean, when the population standard deviation is known and the sample size is sufficiently large.



### Assumptions for a Z-Test
1. Random Sampling: The data should be collected through a random sampling process to ensure that it is representative of the population.

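2. Normality: The population from which the sample is drawn should be approximately normally distributed. If the sample size is sufficiently large (a common rule of thumb is n ≥ 30), this assumption can be relaxed, because by the central limit theorem the sample mean is then approximately normally distributed.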

3. Known Population Standard Deviation: You should know the population standard deviation (σ). If you don't know σ and have a small sample size, you should use a t-test instead.

### How to Perform a Z-Test
#### 1. Formulate the Hypotheses:

Null Hypothesis (H0): This is the hypothesis of no effect or no difference.

Alternative Hypothesis (Ha or H1): This is the hypothesis that you want to test, which proposes a specific effect or difference.

#### 2. Select Significance Level (α):

Choose a significance level (α), typically 0.05 (5%). This is the probability of making a Type I error (rejecting the null hypothesis when it is actually true).
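To make the link between α and the Type I error rate concrete, here is a minimal simulation sketch. It assumes normally distributed data whose true mean really equals the hypothesized mean (so H0 is true), with illustrative values μ = 50, σ = 5, n = 30 that are not from the text. Because H0 holds, every rejection is a Type I error, so the rejection rate should land close to α.

```python
import numpy as np
from scipy import stats

# Illustrative setup (assumed values): H0 is true and sigma is known.
rng = np.random.default_rng(seed=0)
mu0, sigma, n, alpha = 50.0, 5.0, 30, 0.05
n_trials = 20_000

# Draw n_trials independent samples of size n from the null population (one per row).
samples = rng.normal(loc=mu0, scale=sigma, size=(n_trials, n))
sample_means = samples.mean(axis=1)

# Two-tailed z-test applied to every simulated sample.
z = (sample_means - mu0) / (sigma / np.sqrt(n))
p_values = 2 * stats.norm.sf(np.abs(z))

# Fraction of samples in which a true H0 is (wrongly) rejected.
type_i_rate = np.mean(p_values < alpha)
print(f"Empirical Type I error rate: {type_i_rate:.3f}")
```

With 20,000 trials the printed rate typically falls within a few thousandths of 0.05; changing `alpha` shifts it accordingly.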

#### 3. Calculate the Test Statistic (Z-Score):
   
$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
$$

Sample Mean (x̄): The mean of your sample.

Population Mean (μ): The value specified in the null hypothesis (often a known population mean).

Population Standard Deviation (σ): Known population standard deviation.

Sample Size (n): The size of your sample.
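The formula translates directly into a small helper. This is a sketch; `z_statistic` is a hypothetical name, and the check on `sigma` and `n` is an added safeguard rather than part of the formula.

```python
import math


def z_statistic(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """One-sample z statistic: (x_bar - mu) / (sigma / sqrt(n))."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    # Standard error of the sample mean when sigma is known.
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error


# Values from the productivity example in this notebook.
print(round(z_statistic(x_bar=53, mu=50, sigma=5, n=30), 2))  # 3.29
```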

#### 4. Determine the Critical Value(s):

For a two-tailed test at a 5% significance level, the critical z-value is approximately ±1.96.

For a one-tailed test (either left-tailed or right-tailed) at a 5% significance level, the critical z-value is approximately -1.645 (left-tailed) or +1.645 (right-tailed).
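Instead of reading critical values from a z-table, they can be computed from the inverse CDF of the standard normal distribution. A minimal sketch using `scipy.stats.norm.ppf`:

```python
from scipy import stats

alpha = 0.05

# Two-tailed: split alpha between both tails; the lower critical value is the negative.
two_tailed = stats.norm.ppf(1 - alpha / 2)
# One-tailed: all of alpha sits in a single tail.
right_tailed = stats.norm.ppf(1 - alpha)
left_tailed = stats.norm.ppf(alpha)

print(f"two-tailed:   ±{two_tailed:.3f}")   # ±1.960
print(f"right-tailed: {right_tailed:.3f}")  # 1.645
print(f"left-tailed:  {left_tailed:.3f}")   # -1.645
```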

#### 5. Compare the Test Statistic and Critical Value(s):

For a two-tailed test, compare the absolute value of your calculated z-score with the positive critical value (1.96 at α = 0.05). If |z| exceeds it, reject the null hypothesis; otherwise, fail to reject it.

For a one-tailed test, compare your calculated z-score to the critical value for the specified tail. If z < critical value (left-tailed) or z > critical value (right-tailed), reject the null hypothesis. Otherwise, fail to reject it.
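The decision rules for the three kinds of test fit into one function. This is a sketch under the critical-value approach described here; `z_test_decision` and its `tail` values (`"two"`, `"left"`, `"right"`) are hypothetical names chosen for illustration.

```python
from scipy import stats


def z_test_decision(z: float, alpha: float = 0.05, tail: str = "two") -> bool:
    """Return True if H0 is rejected at level alpha (critical-value rule)."""
    if tail == "two":
        # Reject when |z| exceeds the upper critical value z_(1 - alpha/2).
        return bool(abs(z) > stats.norm.ppf(1 - alpha / 2))
    if tail == "left":
        # Reject when z falls below the lower critical value z_alpha.
        return bool(z < stats.norm.ppf(alpha))
    if tail == "right":
        # Reject when z exceeds the upper critical value z_(1 - alpha).
        return bool(z > stats.norm.ppf(1 - alpha))
    raise ValueError("tail must be 'two', 'left', or 'right'")


print(z_test_decision(3.29, tail="right"))  # True
print(z_test_decision(-1.58, tail="two"))   # False
```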

### Example 1: A hypothetical z-test

Suppose a company is evaluating the impact of a new training program on the productivity of its employees. The company has data on the average productivity of its employees before the training program was introduced: 50 units per day, with a known population standard deviation of 5 units. After implementing the training program, the company measures the productivity of a random sample of 30 employees. The sample has an average productivity of 53 units per day. The company wants to know whether the new training program has significantly increased productivity.

#### Solution

Here μ = 50, x̄ = 53, σ = 5, n = 30, and α = 0.05.

NULL HYPOTHESIS (H0): Average productivity is equal to 50 units per day (μ = 50)

ALTERNATIVE HYPOTHESIS (H1): Average productivity is greater than 50 units per day (μ > 50)

$$
Z_{\text{calculated}} = \frac{53 - 50}{5/\sqrt{30}} = \frac{3\sqrt{30}}{5} \approx 3.29
$$


Z(tabulated) = 1.645 (right-tailed critical value at α = 0.05)

Since Z(calculated) > Z(tabulated), we reject the null hypothesis.

This means the training program has significantly increased average productivity at the 5% significance level.
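The hand calculation for Example 1 can be double-checked in a few lines. This is a minimal sketch using `scipy.stats.norm`; it also reports the right-tailed p-value, P(Z > z), which the worked solution does not compute but which leads to the same decision.

```python
import math

from scipy import stats

# Example 1 inputs.
mu, x_bar, sigma, n, alpha = 50, 53, 5, 30, 0.05

z = (x_bar - mu) / (sigma / math.sqrt(n))
z_critical = stats.norm.ppf(1 - alpha)  # right-tailed critical value
p_value = stats.norm.sf(z)              # right-tailed p-value: P(Z > z)

print(f"z = {z:.2f}, critical = {z_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if z > z_critical else "Fail to reject H0")
```

The p-value comes out well below 0.05, consistent with rejecting H0 by the critical-value rule.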

### Example 2

Suppose a snack food company claims that its Lays wafer packets contain an average of 50 grams per packet. To verify this claim, a consumer watchdog organization decides to test a random sample of Lays wafer packets. The organization wants to determine whether the actual average weight differs significantly from the claimed 50 grams. It collects a random sample of 40 Lays packets and measures their weights. The sample has an average weight of 49 grams, and the population standard deviation is known to be 4 grams.

#### Solution

Here μ = 50, x̄ = 49, σ = 4, n = 40, and α = 0.05.

NULL HYPOTHESIS (H0): Average weight is equal to 50 grams (μ = 50)

ALTERNATIVE HYPOTHESIS (H1): Average weight is not equal to 50 grams (μ ≠ 50)

$$
Z_{\text{calculated}} = \frac{49 - 50}{4/\sqrt{40}} = -\frac{\sqrt{40}}{4} \approx -1.58
$$

Z(tabulated) = ±1.96 (two-tailed critical values at α = 0.05)

Since the calculated z-score (-1.58) is not more extreme than -1.96 (lower tail) or 1.96 (upper tail), we do not reject the null hypothesis.

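This means the sample does not provide significant evidence that the average weight differs from 50 grams. Failing to reject H0 does not prove that the average weight is exactly 50 grams; it only means the data are consistent with the claim.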
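Example 2 can be checked the same way. This is a minimal sketch; because the test is two-tailed, the p-value doubles the tail area beyond |z|.

```python
import math

from scipy import stats

# Example 2 inputs.
mu, x_bar, sigma, n, alpha = 50, 49, 4, 40, 0.05

z = (x_bar - mu) / (sigma / math.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z))  # two-tailed p-value

print(f"z = {z:.2f}, p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value is roughly 0.11, above α = 0.05, which agrees with the critical-value comparison against ±1.96.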

```python
import numpy as np
from scipy import stats

# Suppose we have a sample of 50 data points from a population with a known mean (μ = 100) and standard deviation (σ = 15).
# We want to test whether our sample mean is significantly different from the population mean.
population_mean = 100
population_std = 15

# Example data
sample_data = np.array([110, 95, 105, 100, 98, 102, 97, 110, 101, 99, 105, 107, 100, 96, 104, 103, 98, 110, 111, 107,
                        108, 95, 94, 96, 99, 102, 103, 108, 110, 100, 105, 98, 97, 101, 102, 106, 109, 111, 107, 95,
                        92, 100, 99, 97, 95, 100, 101, 105, 107, 106])
sample_mean = np.mean(sample_data)
sample_size = len(sample_data)

# Calculate the standard error of the mean (population sigma is known)
standard_error = population_std / np.sqrt(sample_size)
# Calculate the z-score
z_score = (sample_mean - population_mean) / standard_error
# Calculate the p-value; multiply the tail area by 2 for a two-tailed test
p_value = 2 * stats.norm.sf(np.abs(z_score))

# Significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: The sample mean is not significantly different from the population mean.")
```

Fail to reject the null hypothesis: The sample mean is not significantly different from the population mean.
