# Chapter 01: Hypothesis Testing

In [1]:
# import libraries

import pandas as pd
import numpy as np
import scipy.stats as stats
import scipy as sp

# Notebook and plotting configuration
from IPython.display import Latex
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
def jupyter_settings():
    plt.style.use('bmh')
    plt.rcParams['figure.figsize'] = [25, 12]
    plt.rcParams['font.size'] = 24
    # display(HTML('<style>.container { width:100% !important; }</style>'))
    sns.set()

jupyter_settings()

## 1 - Hypothesis

### 1.1 - What is a hypothesis?

A hypothesis is an assumption that has been neither proven nor disproven. In the research process, a hypothesis is formulated at the very beginning, and the goal is to either reject or not reject it. To decide whether to reject a hypothesis, we evaluate it using a *hypothesis test*.

An example of a hypothesis is: "Men earn more than women in the same job in Brazil."

<div align='center'>

|A hypothesis is an assumption  <br> about an expected association. | Men earn more than women  <br> in the same job in Brazil. |
|---|---|
| Your goal is to either **reject** <br> or **retain** this hypothesis. | **Yes** or **No**. |
| You can test your hypothesis  <br> based on your data. | You conduct a survey among 1000  <br> employed people in Brazil. |
| The analysis of the data is done  <br> with a **hypothesis test**. | We can use a z-test or a  <br> t-test to test the hypothesis. |

<div align='left'>

In the next sections we will see how to test a hypothesis using a z-test and a p-value. But first, let's understand how to formulate a hypothesis.

### 1.2 - How do I formulate a hypothesis?

In order to formulate a hypothesis, **a research question must first be formulated**. A precisely formulated hypothesis about the population can then be derived from the research question, e.g. men earn more than women in the same job in Brazil. We can summarize this process as follows:

1. Subject
2. Question 
3. Hypothesis
4. Hypothesis test (using the data)
5. Conclusion

It is important to notice that hypotheses are not just simple statements: they must be formulated in such a way that they can be tested. Moreover, to test a hypothesis, we must define exactly which variables are involved and how they are related. Hypotheses, then, are assumptions about cause-and-effect relationships or associations between variables.

### 1.3 - What is the null and alternative hypothesis?

When we formulate a hypothesis, there are always two hypotheses that are exact opposites of each other. These opposing hypotheses are called the **null** and the **alternative** hypothesis, and are abbreviated as $H_0$ and $H_a$, respectively.

- **Null Hypothesis $H_0$:** The null hypothesis assumes that there is no difference between two or more groups with respect to a characteristic. Ex.: The salaries of men and women do not differ in Brazil.

- **Alternative Hypothesis $H_a$:** The alternative hypothesis is the opposite of the null hypothesis; it assumes that there is a difference between two or more groups with respect to a characteristic. Ex.: The salaries of men and women differ in Brazil.

### 1.4 - Types of hypothesis

What types of hypotheses are there? The most common distinctions are between **difference** and **correlation** hypotheses, and between **directional** and **non-directional** hypotheses.

Difference hypotheses are used when groups are to be compared, for example: men earn more than women.

In this case, one variable is always categorical, e.g., gender (male, female), smoking status (smoker, non-smoker), or country (Brazil, USA, Canada); the other variable is measured on at least an ordinal scale, e.g., salary, percent risk of heart attack, or hours worked per week.

Correlation hypotheses are used when the relationship (correlation) between variables is to be tested, for example: the taller a person is, the heavier he is; the more horsepower a car has, the higher its fuel consumption; or the better the math grade, the higher the future salary.

As these examples show, correlation hypotheses often take the form "The more..., the higher/lower...". Thus, at least two variables measured on at least an ordinal scale are examined.

### 1.5 - Directional and non-directional hypotheses

Another distinction divides hypotheses into *directional* and *non-directional* (or *one-sided* and *two-sided*) hypotheses. If the hypothesis contains words like "better than" or "worse than", it is usually directional. 
A non-directional hypothesis typically contains phrases such as "there is a difference between", but it does not state in which direction the difference lies.

In other words, with a **non-directional hypothesis**, the only thing of interest is whether the value differs between the groups under consideration.

Examples: 
- There is a difference between the salary of men and women in Brazil (but it is not said who earns more!).
- There is a difference in the risk of heart attack between smokers and non-smokers (but it is not said who has the higher risk!).

In regard to a correlation hypothesis, this means there is a relationship or correlation between two variables, but it is not said whether this relationship is positive or negative.

Examples:
- There is a correlation between height and weight.
- There is a correlation between horsepower and fuel consumption.

In both cases it is not said whether this correlation is positive or negative!

With a **directional hypothesis**, the direction of the difference or relationship is stated. For a difference hypothesis, the statement says which group has the higher or lower value.

Examples:
- Men earn **more** than women.
- Smokers have a **higher** risk of heart attack than non-smokers.

For a correlation hypothesis, the statement says whether the correlation is positive or negative.

Examples:
- **The taller** a person is, **the heavier** he is.
- **The more** horsepower a car has, **the higher** its fuel consumption.

We will understand the idea of directional and non-directional hypotheses better when we work through some examples and applications. But first, let's understand how a hypothesis test works and how to decide whether to reject a hypothesis.

## 2 - Hypothesis test

In the previous section, we saw what a hypothesis is, how to formulate one, the types of hypotheses, and the difference between directional and non-directional hypotheses. Now we want to understand how to test a hypothesis and which errors can occur in a hypothesis test.

When we want to test a claim about the data, we formulate a hypothesis about the population and test it with a sample. So whenever we want to make a statement about a population based on a sample, we use a hypothesis test.

We also saw that there is always a null and an alternative hypothesis. In inferential statistics, the hypothesis test is always applied to the null hypothesis. Strictly speaking, a hypothesis test can only ever reject or fail to reject $H_0$: not rejecting $H_0$ is not sufficient reason to conclude that $H_0$ is true.

### 2.1 - Why is there a probability of error in a hypothesis test?

Whether a hypothesis about the population is rejected or not by a hypothesis test can only ever be determined with a certain probability of error. But why does this probability of error exist?

A simple answer: each time we take a sample, we get a different one, which means that the results differ every time. In the worst case, the sample happens to deviate strongly from the population and the wrong conclusion is drawn. Therefore, there is always a probability of error attached to every statement or hypothesis. In other words, every sample is subject to sampling error: results vary from sample to sample, and there is no guarantee that a random sample will be representative of the population.

### 2.2 - Level of significance

The probability of rejecting the null hypothesis even though it is actually true is called the **significance level** or **level of significance** and is denoted by $\alpha$. 

The level of significance is used to decide whether the null hypothesis should be rejected. As we will see, if the p-value is less than the level of significance, we reject the null hypothesis; otherwise, we do not reject it, because we don't have enough evidence against it.

Usually, the level of significance is set at 0.05. A level of significance of 5% means that there is a 5% probability of rejecting the null hypothesis even though it is actually true.

It is important to note that the level of significance is always set before the test and may not be changed afterwards in order to obtain the "desired" statement after all. To ensure a certain degree of comparability, the level of significance is usually 5% or 1%.

When the p-value is compared with these common thresholds, results are often labeled as follows:

- $p \leq 0.01$: highly significant
- $p \leq 0.05$: significant
- $p > 0.05$: not significant

We compare the level of significance with the p-value, which we will see in the next sections. But first, let's talk about the types of errors.

### 2.3 - Types of errors

Because a hypothesis can only be rejected with a certain probability of error, different types of errors can occur. As discussed earlier, errors arise because we take a random sample from a population, and samples are subject to sampling error: there is no guarantee that a random sample will be representative of the population. As a result, two kinds of errors can occur in a hypothesis test:

- **Type I error:** Rejecting the null hypothesis when it is actually true.
- **Type II error:** Failure to reject the null hypothesis when it is actually false.

Let's look at these two errors in more detail and understand how they differ.

**Type I error**

A type I error occurs when a hypothesis test leads us to *reject the null hypothesis even though the null hypothesis is actually true*. For example, say we have a distribution of data with a population mean of 30. We state our null hypothesis as $H_0: \mu = 30$. We take a random sample for our test, but the values in the sample happen to come from the higher end of the distribution. Thus, the test result suggests that we should reject the null hypothesis. In this case, we have made a type I error. This type of error is also called a **false positive**.

Whenever we perform a statistical test, we may reach an incorrect conclusion because of the particular data sampled from the target population. *The probability of making a type I error is specified by $\alpha$*. Said another way, $\alpha$ represents how often we expect to make this error (the expected error rate) when $H_0$ is true. It is a free parameter that we select for our test: the level of significance.
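A quick way to see that $\alpha$ is the long-run type I error rate is to simulate many tests in a world where $H_0$ is true. The sketch below assumes a normal population with $\mu = 30$ (as in the example above), a hypothetical known $\sigma = 5$ and a hypothetical sample size $n = 25$; it runs a two-sided z-test on each simulated sample and counts how often the true $H_0$ is rejected. The rejection rate should come out close to $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility

mu0 = 30        # population mean; H0: mu = 30 is TRUE in this simulation
sigma = 5       # hypothetical known population standard deviation (assumption)
n = 25          # hypothetical sample size (assumption)
alpha = 0.05    # level of significance
n_tests = 20_000

# Draw n_tests samples of size n at once (one sample per row)
samples = rng.normal(loc=mu0, scale=sigma, size=(n_tests, n))
sample_means = samples.mean(axis=1)

# z statistic for each sample, then its two-sided p-value
z = (sample_means - mu0) / (sigma / np.sqrt(n))
p_values = 2 * stats.norm.sf(np.abs(z))

# Fraction of tests that reject a true H0 = empirical type I error rate
type_I_rate = np.mean(p_values < alpha)
print(f"Empirical type I error rate: {type_I_rate:.4f} (alpha = {alpha})")
```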

**Type II error**

The other type of error we can make is called a type II error. In this case, *we fail to reject the null hypothesis when it is actually false*. Let's consider another example. Say we have a distribution of data and we want to test whether the mean of the distribution is 30 or not. We take a random sample for the test, and the test suggests that we should not reject the null hypothesis. However, the true population mean is 35. In this case, we have made a type II error. This type of error is also called a **false negative**.

As mentioned previously, a statistical test can always lead to an erroneous conclusion, so we want to control the probability of making an error. However, unlike $\alpha$, the type II error rate $\beta$ is not simply a free parameter that we can select: it depends on, among other things, the true value of the parameter, the sample size, and $\alpha$. To understand the probability of making a type II error, we generally conduct a power analysis.
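For a z-test, $\beta$ can be computed directly once the true mean is fixed. The sketch below uses the numbers from the example above ($\mu_0 = 30$, true mean 35) plus two hypothetical values that the example does not give: a known $\sigma = 10$ and a sample size $n = 25$. For a two-sided test, the power is $1 - \beta = \Phi(\delta - z_{1-\alpha/2}) + \Phi(-\delta - z_{1-\alpha/2})$, where $\delta = (\mu_{true} - \mu_0)/(\sigma/\sqrt{n})$ and $\Phi$ is the standard normal CDF. The helper `z_test_power` is our own illustration.

```python
import numpy as np
from scipy import stats

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma."""
    z_crit = stats.norm.ppf(1 - alpha / 2)          # two-sided critical value
    delta = (mu_true - mu0) / (sigma / np.sqrt(n))  # mean of the z statistic under the true mean
    # Probability that the z statistic lands in either rejection region
    return stats.norm.cdf(delta - z_crit) + stats.norm.cdf(-delta - z_crit)

# Hypothetical setting: sigma = 10 and n = 25 are assumptions, not from the text
power = z_test_power(mu0=30, mu_true=35, sigma=10, n=25)
beta = 1 - power  # probability of a type II error
print(f"Power = {power:.4f}, beta = {beta:.4f}")
```

Increasing the sample size (or a larger true difference) increases $\delta$ and therefore the power, which is exactly the trade-off a power analysis explores.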

In the end, we can summarize the possible results of a hypothesis test in a table as follows:

<div align='center'>

<table>
    <tr>
        <th rowspan="2">Decision about Null Hypothesis</th>
        <th colspan="2">Null Hypothesis is:</th>
    </tr>
    <tr>
        <th>True</th>
        <th>False</th>
    </tr>
    <tr>
        <td><b>Don't Reject</b></td>
        <td>Correct Inference <br> (1 - α)</td>
        <td>Type II Error <br> (β)</td>
    </tr>
    <tr>
        <td><b>Reject</b></td>
        <td>Type I Error <br> (α)</td>
        <td>Correct Inference <br> (1 - β)</td>
    </tr>
</table>
<div align='left'>


## 3 - P-value

In the previous section we discussed hypothesis tests and the level of significance. To decide whether or not to reject the null hypothesis, we compare the p-value with the level of significance. But what is the p-value?

The p-value is the probability of observing the obtained result, or an even more extreme one, if the null hypothesis is true. 

The p-value is used to decide whether the null hypothesis should be rejected. If the p-value is smaller than the chosen level of significance (often 5%), the null hypothesis is rejected; otherwise, it is not rejected. 
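This definition can be checked numerically. Assuming the test statistic follows a standard normal distribution under $H_0$, the sketch below simulates many statistics under $H_0$ and counts how often one is at least as extreme as a hypothetical observed value $z = 2.0$ (two-sided). That fraction approximates the exact p-value from the normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

z_observed = 2.0  # hypothetical observed test statistic

# Simulate the test statistic many times in a world where H0 is true
z_null = rng.standard_normal(200_000)

# Two-sided p-value: share of simulated statistics at least as extreme as z_observed
p_simulated = np.mean(np.abs(z_null) >= abs(z_observed))

# Exact two-sided p-value from the standard normal distribution
p_exact = 2 * stats.norm.sf(abs(z_observed))

print(f"simulated p = {p_simulated:.4f}, exact p = {p_exact:.4f}")
```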

In the next section we will start performing some tests and see how to calculate the p-value for each of them.

## 4 - Basics of the z-test

In this section, we discuss a type of hypothesis test called the z-test. It is a statistical procedure that uses sample data, assumed to be normally distributed, to determine whether a statement about the value of a population parameter should be rejected or not.

First, let's review the common population parameters:
<div align='center'>

| Definition | Population parameter |
| --- | --- |
| Mean | $\mu$ |
| Variance | $\sigma^2$ |
| Standard deviation | $\sigma$ |
| Difference in two means | $\mu_1 - \mu_2$ |
| Proportion | $p$ |
| Difference in two proportions | $p_1 - p_2$ |
| Correlation | $\rho$ |
| Slope (simple linear regression) | $\beta$ |
<div align='left'>


To apply a z-test, we need to know the population standard deviation $\sigma$ (and the hypothesized value of the population parameter). If $\sigma$ is unknown, we need another type of hypothesis test, which will be discussed in the next chapter. Assuming $\sigma$ is known, the z-test can be performed in the following settings:

- One sample (a left-tailed z-test, right-tailed z-test, or two-tailed z-test);
- Two samples (a two-sample z-test)
- Proportions (a one-proportion z-test or two-proportion z-test)

Before going into the different types of z-tests, we will discuss the z-score and the z-statistic.

### 4.1 - The z-score and z-statistic

To measure how far a particular value is from the mean, we can use the z-score (or z-statistic), which combines the mean and the standard deviation to determine the value's relative location.

A z-score is computed with the following formula:

$$z_i = \frac{x_i - \bar{x}}{\sigma}$$
where $z_i$ is the z-score of $x_i$, $\bar{x}$ is the sample mean, and $\sigma$ is the standard deviation. The z-score is also known as the z-value, and it indicates how many standard deviations a measurement lies from the mean. For example, if $z_i = 1.8$, the point is 1.8 standard deviations above the mean; similarly, $z_i = -1.5$ indicates that the point is 1.5 standard deviations below the mean. The sign of the z-score tells us whether the value is greater or less than the mean.
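The formula translates directly into NumPy. The sketch below computes the z-scores of the 10 IQ scores used later in this section by hand, using the sample mean and the standard deviation with `ddof=0` (dividing by $n$), and checks that the result matches `scipy.stats.zscore`, which uses the same convention by default.

```python
import numpy as np
from scipy import stats

scores = np.array([90, 78, 110, 110, 99, 115, 130, 100, 95, 93])

# z_i = (x_i - mean) / sd, with sd computed by dividing by n (ddof=0)
z_manual = (scores - scores.mean()) / scores.std(ddof=0)

# scipy.stats.zscore uses ddof=0 by default, so the two results should agree
z_scipy = stats.zscore(scores)

print(np.round(z_manual, 4))
print(np.allclose(z_manual, z_scipy))
```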

Let's look at an example.

Suppose the IQ scores of the population have a mean $\mu = 98$ and a standard deviation $\sigma = 12$. A particular student took an IQ test and scored 110. His score is above the mean, but he wants to know whether he is in the top 5%. We can calculate his z-score as:

$$z_{student} = \frac{110-98}{12} = \frac{12}{12} = 1$$

If the IQ scores are normally distributed, we can use the z-score to find what percentage of students have an IQ score above 110.

To do this, we convert our data to the standard normal distribution, which is the normal distribution with mean 0 and standard deviation 1.

Since the distribution is now standardized, all we need is a table that gives, for as many z-values as possible, the percentage of values below each z-value.

The student can look up $z = 1$ in a z-table (for example, on the website `http://www.z-table.com`) and find the value 0.8413. So 84.13% of students score below 110, and the student is in the top $1 - 0.8413 = 0.1587$, or 15.87%, of IQ scores at his school. He is therefore **not** in the top 5%.

In Python, we can use the **cumulative distribution function (CDF)** of the standard normal distribution, `stats.norm.cdf`, to find the percentage of values below a given z-score.

In [2]:
mu = 98  # population mean
sd = 12  # population standard deviation
x = 110  # IQ score of the student

# calculate the z-score
z_score = (x - mu) / sd

# fraction of the standard normal distribution below z_score
percentagem = round(stats.norm.cdf(z_score), 4)

print(f"The z-score is {z_score}, which corresponds to {percentagem*100:.2f}% of scores below IQ = 110.")
print(f"So the student is in the top {(1-percentagem)*100:.2f}% of students.")

The z-score is 1.0, which corresponds to 84.13% of scores below IQ = 110.
So the student is in the top 15.87% of students.
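To answer the student's original question directly (is he in the top 5%?), we can invert the CDF instead. Assuming IQ scores follow a normal distribution with $\mu = 98$ and $\sigma = 12$, `stats.norm.ppf(0.95, loc=98, scale=12)` returns the score above which the top 5% lie.

```python
from scipy import stats

mu, sigma = 98, 12   # population mean and standard deviation of IQ scores
student_score = 110

# 95th percentile: the score that separates the top 5% from the rest
cutoff_top5 = stats.norm.ppf(0.95, loc=mu, scale=sigma)
in_top5 = student_score >= cutoff_top5

print(f"Top-5% cutoff: {cutoff_top5:.2f}; student in top 5%: {in_top5}")
```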


As another example, here is a simple random sample of 10 scores taken from the IQ survey:

sample = [90, 78, 110, 110, 99, 115, 130, 100, 95, 93]

We can compute the z-score for each IQ score in the sample.

In [3]:
IQ = np.sort(np.array([90, 78, 110, 110, 99, 115, 130, 100, 95, 93]))[::-1]  # sample of IQ scores, descending order

z_score = stats.zscore(IQ)  # z-scores using the sample mean and standard deviation (ddof=0)

percentagem = []
for z in z_score:
    value = round(stats.norm.cdf(z),4)
    percentagem.append(float(value))

# Create dataframe

data_zscore = {
    "IQ score": IQ,
    "z-score": z_score,
    "percentagem":percentagem
}

IQ_zscore = pd.DataFrame(data_zscore)
IQ_zscore


| | IQ score | z-score | percentagem |
|---|---|---|---|
| 0 | 130 | 2.008214 | 0.9777 |
| 1 | 115 | 0.932385 | 0.8244 |
| 2 | 110 | 0.573775 | 0.7169 |
| 3 | 110 | 0.573775 | 0.7169 |
| 4 | 100 | -0.143444 | 0.443 |
| 5 | 99 | -0.215166 | 0.4148 |
| 6 | 95 | -0.502053 | 0.3078 |
| 7 | 93 | -0.645497 | 0.2593 |
| 8 | 90 | -0.860663 | 0.1947 |
| 9 | 78 | -1.721326 | 0.0426 |


Before discussing the z-statistic, we introduce the notion of a sampling distribution. Going back to the last example, suppose we repeatedly select a simple random sample of 35 IQ scores from the pool of IQ scores of the high school, as many times as needed for the study, and compute the mean score $\bar{x}$ of each sample. Because every sample is different, $\bar{x}$ takes different values from sample to sample. However, over many samples, the expected value of $\bar{x}$ is:

$$E(\bar{x}) = \mu$$

Here, $\mu$ is the population mean. The standard deviation of $\bar{x}$, denoted $\sigma_{\bar{x}}$, is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the population standard deviation and $n$ is the sample size. This holds exactly for an infinite population (or sampling with replacement) and approximately for a finite population when the sample is small relative to the population. $\sigma_{\bar{x}}$ is also called the **standard error of the mean**; it helps us determine how far the sample mean typically is from the population mean.

The **sample size and the standard error are inversely related**: as the sample size increases, the standard error decreases (in proportion to $1/\sqrt{n}$).
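The relationship $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ can be checked by simulation. The sketch below assumes a normally distributed population with $\mu = 98$ and $\sigma = 12$, draws many samples of size $n = 35$ (as in the thought experiment above), and compares the standard deviation of the sample means with $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

mu, sigma = 98, 12   # assumed population parameters
n = 35               # sample size
n_samples = 50_000   # number of repeated samples

# Each row is one simple random sample of n IQ scores
samples = rng.normal(mu, sigma, size=(n_samples, n))
sample_means = samples.mean(axis=1)

se_simulated = sample_means.std()   # spread of the sample means
se_theory = sigma / np.sqrt(n)      # standard error of the mean

print(f"mean of sample means: {sample_means.mean():.3f}")
print(f"simulated SE: {se_simulated:.4f}, sigma/sqrt(n): {se_theory:.4f}")
```

Rerunning with a larger `n` shrinks both numbers together, which is the inverse relationship described above.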

In a hypothesis test about a population mean, we use the following test statistic:

$$z_{test} = z_{\bar{x}} = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$

Here, $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.

Let's consider the IQ test example again. The IQ scores have a mean $\mu=98$ and a standard deviation $\sigma=12$. Suppose the data are normally distributed, and let $x$ be a score taken at random from the IQ data. 

What is the probability that $x$ is between 95 and 104? We compute the z-scores for $x=95$ and $x=104$ as follows:

$$ z_{95} = \frac{95-98}{12} = -0.25$$

$$ z_{104} = \frac{104-98}{12} = 0.5$$

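Note: since we draw a single score, the sample size is $n=1$, so $\sigma/\sqrt{n} = \sigma$.

The probability that the selected score is between 95 and 104 is

$$P(95 < x < 104) = P(-0.25 < z < 0.5) = \Phi(0.5) - \Phi(-0.25) = 0.6915 - 0.4013 = 0.2902$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

So there is a probability of about 29.02% that an IQ score taken at random from the data lies between 95 and 104.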

In Python, we can implement the code as follows:

In [4]:
# calculate the z-scores at x = 95 and x = 104

zscore_95 = round((95 - 98) / 12, 2)
zscore_104 = round((104 - 98) / 12, 2)


# standard normal CDF at each z-score; their difference is the probability of the interval

cdf_95 = stats.norm.cdf(zscore_95)
cdf_104 = stats.norm.cdf(zscore_104)

prob = cdf_104 - cdf_95

print(f"The probability of randomly taking a score between 95 and 104 is {prob*100:.2f}%.")

The probability of randomly taking a score between 95 and 104 is 29.02%.
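SciPy can also skip the manual standardization: `stats.norm.cdf` accepts `loc` (mean) and `scale` (standard deviation) arguments, so the same probability can be computed directly on the IQ scale, assuming the same normal model with $\mu = 98$ and $\sigma = 12$.

```python
from scipy import stats

mu, sigma = 98, 12  # population mean and standard deviation of IQ scores

# P(95 < x < 104) = F(104) - F(95), where F is the normal CDF on the original scale
prob = stats.norm.cdf(104, loc=mu, scale=sigma) - stats.norm.cdf(95, loc=mu, scale=sigma)
print(f"P(95 < x < 104) = {prob:.4f}")
```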


## 5 - A z-test for means

In this part, we discuss one-sample and two-sample z-tests related to a population mean or to the means of two populations.

Along the way, we will see, for example, how to compute the p-value that determines whether $H_0$ should be rejected when a left-tailed test yields a z-score of $-2.67$.

### 5.1 - One-sample z-test

To construct the test, we first select a sample from a population that is assumed to be normally distributed and whose standard deviation is known. 

Next, we need to state the null and alternative hypotheses. We discussed earlier that hypotheses can be split into two groups: directional and non-directional. Directional hypotheses translate into left-tailed or right-tailed z-tests, and non-directional hypotheses into two-tailed z-tests. The figure below shows how this works.


<div align='center'>
<img src="../imagens/hypothesis_test_direc_and_non_direct.png" width="1000"/>
</div align='left'>

For a population mean with hypothesized value $\mu_0$, this translates into the following null and alternative hypotheses:


<div align='center'>

|Left-tailed       |Right-tailed      | Two-tailed    |
|------------------|------------------|---------------|
|$H_0:\mu\geq\mu_0$|$H_0:\mu\leq\mu_0$|$H_0:\mu=\mu_0$|
|$H_a:\mu<\mu_0$   |$H_a:\mu>\mu_0$   |$H_a:\mu\neq\mu_0$|

</div align='left'>

Next, we need to specify the level of significance, $\alpha$, which is the probability of rejecting the null hypothesis when it is true, in other words, the probability of a type I error.

Then we compute the value of the test statistic and make a decision using one of two approaches:

- the p-value approach;
- the critical value approach.

To reject the null hypothesis, the p-value should be less than or equal to the specified level of significance $\alpha$.

In order to find the p-value based on the value of the test statistic in Python, we use the following syntax:

`scipy.stats.norm.sf(abs(x))`

where `x` is the z-score. For example, suppose we want to find the p-value associated with a z-score of -2.67 in a left-tailed test.


In [5]:
# find p_values for left-tailed

print(round(sp.stats.norm.sf(abs(-2.67)),4))

0.0038


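Because the standard normal distribution is symmetric, the left-tail probability below $z = -2.67$ equals the right-tail probability above $z = 2.67$, which is why `abs(x)` works for both one-tailed tests (as long as the z-score lies on the side named by the alternative hypothesis). 

For a two-tailed test, we multiply this value by 2.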

In [6]:
print(round(2 * sp.stats.norm.sf(abs(2.67)), 4))

0.0076
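A helper that handles all three cases explicitly avoids relying on `abs()`. The sketch below (the name `z_p_value` is our own; the `alternative` labels mirror those used by `statsmodels`) uses the CDF for left-tailed tests, the survival function for right-tailed tests, and doubles the upper tail of $|z|$ for two-tailed tests.

```python
from scipy import stats

def z_p_value(z, alternative="two-sided"):
    """p-value of a z statistic; alternative is 'smaller', 'larger' or 'two-sided'."""
    if alternative == "smaller":       # left-tailed: P(Z <= z)
        return stats.norm.cdf(z)
    if alternative == "larger":        # right-tailed: P(Z >= z)
        return stats.norm.sf(z)
    if alternative == "two-sided":     # both tails: 2 * P(Z >= |z|)
        return 2 * stats.norm.sf(abs(z))
    raise ValueError("alternative must be 'smaller', 'larger' or 'two-sided'")

print(round(z_p_value(-2.67, "smaller"), 4))   # left-tailed, as above
print(round(z_p_value(2.67, "two-sided"), 4))  # two-tailed, as above
print(round(z_p_value(2.67, "smaller"), 4))    # z on the "wrong" side of a left-tailed test: large p
```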


The following figure illustrates how the p-value is computed for each type of test.

<div align='center'>
<img src="../imagens/hypothesis_test_each_side.png" width="1000"/>
</div align='left'>

In the critical value approach, we will need to compute a critical value for the test statistic by using the level of significance. Critical values are the boundaries of the critical region where we can reject the null hypothesis.

<div align='center'>
<img src="../imagens/critical_region.png" width="1000"/>
</div align='left'>

To compute a critical value in Python, we use the inverse of the CDF (the percent point function). For a left-tailed test, the critical value is:
`scipy.stats.norm.ppf(alpha)`

Let's see how this works by computing the critical values of the test statistic for each type of test at a level of significance $\alpha = 0.05$.

In [7]:
alpha = 0.05  # level of significance

# Z critical value for a left-tailed test
print(f"The critical value for the left-tailed test is {round(sp.stats.norm.ppf(alpha), 4)}")

# Z critical value for a right-tailed test
print(f"The critical value for the right-tailed test is {round(sp.stats.norm.ppf(1 - alpha), 4)}")

# Z critical values for a two-tailed test (alpha is split between both tails)
print(f"The critical values for the two-tailed test are {round(-sp.stats.norm.ppf(1 - alpha/2), 4)} and {round(sp.stats.norm.ppf(1 - alpha/2), 4)}")

The critical value for the left-tailed test is -1.6449
The critical value for the right-tailed test is 1.6449
The critical values for the two-tailed test are -1.96 and 1.96


At the level of significance $\alpha=0.05$:

- For the left-tailed test, the critical value is -1.6449. If the test statistic is less than or equal to this critical value, we reject the null hypothesis.

- For the right-tailed test, if the test statistic is greater than or equal to 1.6449, we reject the null hypothesis.

- For the two-tailed test, we reject the null hypothesis if the test statistic is greater than or equal to 1.95996 or less than or equal to -1.95996.


Let's put this into practice with an example.

Consider the students again. The IQ score data have a mean of $\mu=98$ and a standard deviation of $\sigma=12$. A researcher wants to know whether IQ scores are affected by IQ training. 

He recruits 30 students, trains them to answer IQ questions 2 hours per day for 30 days, and records their IQ scores after the training period. 

The IQ scores of the 30 students after the training period are:

[95, 110, 105, 120, 125, 110, 98, 90, 99, 100,110, 112, 106, 92, 108, 97, 95, 99, 100, 100, 103, 125, 122, 110, 112, 102, 92, 97, 89, 102].

Their average score is $\bar{x} = 104.17$. 

We can easily implement the calculation in Python as follows:

In [8]:
IQ_scores = np.array([95, 110, 105, 120, 125, 110, 98, 90, 99, 100,
            110, 112, 106, 92, 108, 97, 95, 99, 100, 100,
            103, 125, 122, 110, 112, 102, 92, 97, 89, 102])
IQmean = round(IQ_scores.mean(),2)
print(f'The IQ score mean is {IQmean}!')

The IQ score mean is 104.17!


We define the null hypothesis and the alternative hypothesis:

$$ H_0:\mu_{after} = \mu = 98$$
$$ H_a:\mu_{after} > \mu = 98$$

The level of significance chosen is $\alpha=0.05$. As computed above, the critical value for the right-tailed test is 1.6449.

Now we calculate the test statistic on the problem:
$$ z = \frac{\bar{x}- \mu}{\sigma /\sqrt{n}} = \frac{104.17- 98}{12 /\sqrt{30}} = 2.8162$$

Because the test statistic (2.8162) is greater than the critical value (1.6449), it lies in the critical region, and we reject the null hypothesis. The data therefore provide evidence that the training increases the IQ scores of these students.

We can calculate this with the code below:

In [9]:
n = 30       # number of students
sigma = 12   # population standard deviation
mu = 98      # population mean
# one-sample z statistic with known sigma
z = (IQmean - mu) / (sigma / np.sqrt(n))

print(f"The test statistic z value is {round(z, 2)}!")

The test statistic z value is 2.82!
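Putting the pieces together, the sketch below runs the whole right-tailed one-sample z-test with the known $\sigma = 12$ and applies both the p-value approach and the critical-value approach. The helper `one_sample_z_test` is our own illustration, not a library function. Because it uses the unrounded sample mean (104.1667), its statistic is slightly smaller than the 2.82 obtained from the rounded mean.

```python
import numpy as np
from scipy import stats

def one_sample_z_test(x, mu0, sigma, alpha=0.05):
    """Right-tailed one-sample z-test with a known population standard deviation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = (x.mean() - mu0) / (sigma / np.sqrt(n))  # test statistic
    p_value = stats.norm.sf(z)                    # right tail: P(Z >= z)
    z_crit = stats.norm.ppf(1 - alpha)            # right-tailed critical value
    reject_by_p = p_value <= alpha                # p-value approach
    reject_by_crit = z >= z_crit                  # critical-value approach
    return z, p_value, z_crit, reject_by_p, reject_by_crit

IQ_scores = [95, 110, 105, 120, 125, 110, 98, 90, 99, 100,
             110, 112, 106, 92, 108, 97, 95, 99, 100, 100,
             103, 125, 122, 110, 112, 102, 92, 97, 89, 102]

z, p, z_crit, reject_by_p, reject_by_crit = one_sample_z_test(IQ_scores, mu0=98, sigma=12)
print(f"z = {z:.4f}, p-value = {p:.5f}, critical value = {z_crit:.4f}")
print(f"reject H0 (p-value approach): {reject_by_p}; (critical-value approach): {reject_by_crit}")
```

Both approaches lead to the same decision, because $p \leq \alpha$ exactly when $z$ reaches the critical value.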


We can use the `ztest()` function from the `statsmodels` package to perform a z-test:

```python
statsmodels.stats.weightstats.ztest(x1, x2=None, value=0, alternative='two-sided')
```

where 
- `x1`: the first of the two independent samples (or the only sample in a one-sample test);
- `x2`: the second of the two independent samples (only for a two-sample test);
- `value`: in the one-sample case, `value` is the mean of `x1` under the null hypothesis. In the two-sample case, `value` is the difference between the mean of `x1` and the mean of `x2` under the null hypothesis. The test statistic is based on `x1_mean - x2_mean - value`;
- `alternative`: the alternative hypothesis:
    - `'two-sided'`: two-tailed test,
    - `'larger'`: right-tailed test,
    - `'smaller'`: left-tailed test.

Note that `ztest()` has no argument for a known population standard deviation; it estimates the standard deviation from the sample(s).

Let's apply it to the previous example, as a right-tailed test:

In [10]:
from statsmodels.stats.weightstats import ztest as ztest

# IQ scores after the training period
IQ_scores = np.array([95, 110, 105, 120, 125, 110, 98, 90, 99, 100,
            110, 112, 106, 92, 108, 97, 95, 99, 100, 100,
            103, 125, 122, 110, 112, 102, 92, 97, 89, 102])

# perform one sample z-test

z_statistic, p_value = ztest(IQ_scores, value = 98, alternative='larger')

print(f"The test statistic is {z_statistic:.2f} and the corresponding p-value is {p_value:.5f}.")

The test statistic is 3.40 and the corresponding p-value is 0.00034.


Interpretation: 

The test statistic is $z = 3.40$, which means that the sample mean lies 3.40 standard errors above the hypothesized population mean of 98. The larger the value of z, the less likely it is that the observed difference is just the result of chance.

The p-value is 0.00034: if $H_0$ is true, the probability of obtaining a sample mean at least this high by chance is 0.034%, which is very low. Since the p-value is much less than the level of significance (0.05), we reject the null hypothesis. In other words, the data provide evidence that the training improves the IQ scores of these students.
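Why does `ztest()` report 3.40 when the hand calculation with $\sigma = 12$ gave about 2.82? `ztest()` does not take a known population standard deviation; it estimates it from the sample (with `ddof=1` by default), and the sample standard deviation of these scores is about 9.94 rather than 12. The sketch below reproduces both statistics with NumPy so the difference is visible.

```python
import numpy as np

IQ_scores = np.array([95, 110, 105, 120, 125, 110, 98, 90, 99, 100,
                      110, 112, 106, 92, 108, 97, 95, 99, 100, 100,
                      103, 125, 122, 110, 112, 102, 92, 97, 89, 102])
mu0 = 98
n = IQ_scores.size

# Known population standard deviation (the classic z-test formula)
z_known_sigma = (IQ_scores.mean() - mu0) / (12 / np.sqrt(n))

# Sample standard deviation with ddof=1, as statsmodels' ztest() uses by default
s = IQ_scores.std(ddof=1)
z_sample_sd = (IQ_scores.mean() - mu0) / (s / np.sqrt(n))

print(f"sample SD = {s:.2f}")
print(f"z with sigma = 12: {z_known_sigma:.2f}; z with sample SD: {z_sample_sd:.2f}")
```

When $\sigma$ is genuinely known, the first value is the one the z-test formula calls for; when the standard deviation has to be estimated from a small sample, the test discussed in the next chapter is usually the more appropriate choice.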


### 5.2 - Two-sample z-test

Consider now two normally distributed and independent populations. Let $\Omega$ be the hypothesis difference between the two means $\mu_1$ and $\mu_2$. We can have three forms for the null and alternative hypothesis:

$$ H_0: \mu_1 - \mu_2 \geq \Omega \hspace{1cm} H_0: \mu_1 - \mu_2 \leq \Omega \hspace{1cm} H_0: \mu_1 - \mu_2 = \Omega0 $$
$$ H_a: \mu_1 - \mu_2 < \Omega \hspace{1cm} H_a: \mu_1 - \mu_2 > \Omega \hspace{1cm} H_a: \mu_1 - \mu_2 \neq \Omega0 $$

It is common that $\Omega = 0$. That means that for the case of the two-said test, the null hypothesis is zero, or in other words, $\mu_1$ and $\mu_2$ are equal. 

The test statistic is computed as follows:

$$ z = \frac{(\bar{x}_1-\bar{x}_2)- \Omega}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$$

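Here, $\bar{x}_1$ and $\bar{x}_2$ are the means of random samples of sizes $n_1$ and $n_2$ drawn from the two populations with means $\mu_1$ and $\mu_2$. The terms $\sigma_1$ and $\sigma_2$ are the standard deviations of these two populations. When $\sigma_1$ and $\sigma_2$ are unknown, they are commonly replaced by the sample standard deviations, which is reasonable for large samples. The `ztest()` function used below does this.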


Let's consider an example. We study the IQ scores of students from two schools, A and B, and want to know whether the mean IQ levels of these two schools are different. We draw a simple random sample of 30 students from each school and record their scores.

The null and alternative hypotheses are:

- $H_0: \mu_1 - \mu_2 = 0$
- $H_a: \mu_1 - \mu_2 \neq 0$

We choose the level of significance $\alpha = 0.05$. In Python, we can perform the test with the `ztest()` function from the `statsmodels` package:

In [12]:
from statsmodels.stats.weightstats import ztest

# IQ scores from two schools

A = [ 95, 110, 105, 120, 125, 110, 98, 90, 99, 100,
     110, 112, 106, 92, 108, 97, 95, 99, 100, 100,
     103, 125, 122, 110, 112, 102, 92, 97, 89, 102]
B = [98, 90, 100, 93, 91, 79, 90, 100, 121, 89,
     101, 98, 75, 90, 95, 99, 100, 120, 121, 95,
     96, 89, 115, 99, 95, 121, 122, 98, 97, 97]

# perform two-sample z-test

z_statistic, p_value = ztest(A, B, value=0, alternative="two-sided")

print(f"The test statistic is {z_statistic:.4f} and the corresponding p-value is {p_value:.4f}")

The test statistic is 1.7573 and the corresponding p-value is 0.0789


The results show that the p-value (0.0789) is larger than the level of significance $\alpha=0.05$. Therefore, we fail to reject the null hypothesis. In other words, we do not have enough evidence to show that the mean IQ scores of the students from the two schools are different.
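The z statistic reported by `ztest()` can be reproduced directly from the formula for the test statistic. The population standard deviations are unknown, so this sketch plugs in the sample standard deviations (`ddof=1`) in their place. With equal sample sizes, as here, this gives the same standard error as the pooled estimate that `ztest()` uses by default (`usevar='pooled'`). With unequal sample sizes the two can differ slightly.

```python
import numpy as np
from scipy import stats

# IQ scores from the two schools (same data as the cells above)
A = np.array([95, 110, 105, 120, 125, 110, 98, 90, 99, 100,
              110, 112, 106, 92, 108, 97, 95, 99, 100, 100,
              103, 125, 122, 110, 112, 102, 92, 97, 89, 102])
B = np.array([98, 90, 100, 93, 91, 79, 90, 100, 121, 89,
              101, 98, 75, 90, 95, 99, 100, 120, 121, 95,
              96, 89, 115, 99, 95, 121, 122, 98, 97, 97])

omega = 0.0  # hypothesized difference mu_A - mu_B under H_0

# Assumption: sample standard deviations stand in for the unknown sigmas
se = np.sqrt(A.var(ddof=1) / A.size + B.var(ddof=1) / B.size)

z = ((A.mean() - B.mean()) - omega) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided alternative

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```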

## 6 - Z-test for proportions

The z-test for proportions is used to compare a sample proportion with a hypothesized value, or to compare two sample proportions with each other.

### 6.1 - A one-proportion z-test

We start with the one-proportion z-test, which compares a sample proportion $\bar{p}$ with a hypothesized population proportion $p_0$. The three forms of the hypotheses are:

$$ H_0: \bar{p} \geq p_0 \hspace{1cm} H_0: \bar{p} \leq p_0 \hspace{1cm} H_0: \bar{p} = p_0 $$
$$ H_a: \bar{p} < p_0 \hspace{1cm} H_a: \bar{p} > p_0 \hspace{1cm} H_a: \bar{p} \neq p_0 $$

and the test statistic is computed as follows:
$$z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
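This formula translates directly into code. The sketch below is a minimal two-sided version that, as in the formula, computes the standard error from the hypothesized proportion $p_0$. The function name `one_proportion_ztest` and the data (110 successes out of 200, tested against $p_0 = 0.5$) are hypothetical and used only for illustration.

```python
import math

from scipy import stats


def one_proportion_ztest(successes: int, n: int, p0: float):
    """Two-sided one-proportion z-test with the SE computed from p0."""
    p_hat = successes / n                 # sample proportion (p-bar in the text)
    se = math.sqrt(p0 * (1 - p0) / n)     # standard error under H_0
    z = (p_hat - p0) / se
    p_value = 2 * stats.norm.sf(abs(z))   # two-tailed p-value
    return z, p_value


# Hypothetical data: 110 successes out of 200 trials, tested against p0 = 0.5
z, p_value = one_proportion_ztest(110, 200, 0.5)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```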

Let us consider an example.

At a community college, a researcher wants to know whether the proportion of students who support some proposed changes is equal to 84%. He will use a one-proportion z-test at the level of significance $\alpha=0.05$.

In Python, the test is available as:

`statsmodels.stats.proportion.proportions_ztest(count, nobs, value=None, alternative='two-sided', prop_var=False)`
 - `count`: the number of successes in the sample.
 - `nobs`: the number of observations in the sample.
 - `value`: the hypothesized value of the proportion ($p_0$).
 - `alternative`: the alternative hypothesis:
    - `'two-sided'`: two-tailed test,
    - `'larger'`: right-tailed test,
    - `'smaller'`: left-tailed test.
 - `prop_var`: if `False` (the default), the standard error is computed from the sample proportion. If it is set to a number, that proportion is used instead.

In our case, the researcher gathers a sample of size $n=500$ with an observed sample proportion $\bar{p} = 0.8$, and the hypothesized population proportion is $p_0=0.84$. The null and alternative hypotheses are:

$$ H_0: \bar{p} = p_0 \hspace{1cm} H_a: \bar{p} \neq p_0 $$

We will implement a one-proportion two-tailed z-test in Python.

In [13]:
from statsmodels.stats.proportion import proportions_ztest

count = 0.8 * 500   # number of successes: 80% of the sample
nobs = 500          # sample size
value = 0.84        # hypothesized proportion p0

# perform one proportion two-tailed z-test

z_statistic, p_value = proportions_ztest(count, nobs, value, alternative='two-sided')

print(f"The test statistic is {z_statistic:.2f} and the corresponding p-value is {p_value:.4f}")

The test statistic is -2.24 and the corresponding p-value is 0.0253


The test statistic is -2.24, which indicates that the sample proportion lies 2.24 standard errors below the hypothesized proportion. The corresponding p-value is 0.0253. Because it is below 0.05, we reject the null hypothesis. There is enough evidence to suggest that the proportion of students who support the changes is different from 0.84.

However, this result only tells us that the proportion is different from 0.84. To estimate a range of plausible values for the proportion, we need a confidence interval (CI).
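The `proportions_ztest` cell above uses the default `prop_var=False`. In that case, statsmodels computes the standard error from the sample proportion rather than from $p_0$, which is what the formula for $z$ given earlier uses. The sketch below reproduces both versions by hand with the numbers from that cell (count = 0.8 × 500, nobs = 500, value = 0.84). Passing `prop_var=value` to `proportions_ztest` should correspond to the second version. The helper `two_sided` is a hypothetical name used only here.

```python
import math

from scipy import stats

count, nobs, p0 = 400, 500, 0.84   # values used in In [13]
p_hat = count / nobs               # sample proportion = 0.8


def two_sided(z):
    """Two-tailed p-value of a z statistic."""
    return 2 * stats.norm.sf(abs(z))


# (a) SE from the sample proportion: statsmodels' default (prop_var=False)
se_sample = math.sqrt(p_hat * (1 - p_hat) / nobs)
z_sample = (p_hat - p0) / se_sample

# (b) SE from the hypothesized proportion p0: the formula for z given earlier
se_null = math.sqrt(p0 * (1 - p0) / nobs)
z_null = (p_hat - p0) / se_null

print(f"SE from p_hat: z = {z_sample:.2f}, p-value = {two_sided(z_sample):.4f}")
print(f"SE from p0:    z = {z_null:.2f}, p-value = {two_sided(z_null):.4f}")
```

Here both versions lead to the same decision (p < 0.05). They can disagree when the p-value is close to $\alpha$.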

How does this work?

- The confidence interval is built around the sample proportion ($p$) and takes into account the standard error of the sample proportion.

- If the confidence interval does not include the hypothesized population proportion ($p_0$), we reject the null hypothesis.

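$$CI = p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$

where $z_{\alpha/2}$ is the critical value of the standard normal distribution that leaves an area of $\alpha/2$ in the upper tail (about 1.96 for $\alpha = 0.05$), $\alpha$ is the level of significance, $p$ is the sample proportion, and $n$ is the sample size.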


In our case:
- sample proportion: $p=0.8$
- hypothesized population proportion: $p_0=0.84$
- sample size: $n=500$
- level of significance: $\alpha=0.05$

In [14]:
import numpy as np
p = 0.8
n = 500
alpha = 0.05

# calculate the standard error
se = np.sqrt(p*(1-p)/n)

# z-critical value 
z_critical = sp.stats.norm.ppf(1-alpha/2)

# calculate the margin of error
margin_of_error = z_critical * se

# calculate the confidence interval
lower_bound = p - margin_of_error
upper_bound = p + margin_of_error

print(f"The confidence interval is ({lower_bound:.4f}, {upper_bound:.4f})")

The confidence interval is (0.7649, 0.8351)


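The results show that the 95% confidence interval for the proportion is $CI = 0.8 \pm 0.0351$, that is, (0.7649, 0.8351). The hypothesized proportion $p_0 = 0.84$ lies outside this interval. We therefore reject the null hypothesis, which agrees with the z-test above.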
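The agreement between the interval and the test is not a coincidence. When both use the standard error computed from the sample proportion, the two-sided test rejects $H_0: p = p_0$ at level $\alpha$ exactly when $p_0$ falls outside the $(1-\alpha)$ interval. The sketch below checks this for a few candidate values of $p_0$ using the numbers from In [14]. The helper `rejects` and the list of candidate $p_0$ values are hypothetical illustrations.

```python
import math

from scipy import stats

p_hat, n, alpha = 0.8, 500, 0.05          # values from In [14]
se = math.sqrt(p_hat * (1 - p_hat) / n)   # SE from the sample proportion
z_crit = stats.norm.ppf(1 - alpha / 2)
lower, upper = p_hat - z_crit * se, p_hat + z_crit * se


def rejects(p0):
    """Two-sided z-test of H_0: p = p0, with the SE computed from p_hat."""
    z = (p_hat - p0) / se
    return 2 * stats.norm.sf(abs(z)) < alpha


for p0 in (0.78, 0.80, 0.82, 0.84, 0.86):
    outside = not (lower <= p0 <= upper)
    print(f"p0 = {p0:.2f}: outside CI = {outside}, test rejects = {rejects(p0)}")
```

If the test instead computes the standard error from $p_0$, the interval that matches it is the Wilson score interval rather than the interval above.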

### 6.2 - A two-proportion z-test


The two-proportion z-test is used to test the difference between two population proportions. The test statistic is computed as follows:

$$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1-\bar{p})(\frac{1}{n_1}+\frac{1}{n_2})}}$$

Here, $\bar{p}_1$ and $\bar{p}_2$ are the sample proportions, $n_1$ and $n_2$ are the sample sizes from population 1 and population 2, and $\bar{p}$ is the total pooled proportion, calculated as follows:

$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1+n_2}$$

We consider a similar example:

We want to know whether the proportion of students from school A who support the changes differs from the proportion of students from school B who do.

Assume $n_1=100$, $\bar{p}_1= 0.8$, $n_2=100$, and $\bar{p}_2=0.7$.

The null hypothesis and the alternative hypothesis are:

$$ H_0: \bar{p}_1 = \bar{p}_2 $$
$$ H_a: \bar{p}_1 \neq \bar{p}_2 $$

And the level of significance is $\alpha=0.05$.

In [15]:
p1bar = 0.8
p2bar = 0.7
n1 = 100
n2 = 100
alpha = 0.05

p = (p1bar *n1 + p2bar * n2) / (n1 + n2)

z = (p1bar - p2bar) / np.sqrt(p*(1-p)*(1/n1 + 1/n2))

p_value = sp.stats.norm.sf(abs(z)) * 2

print(f"The test statistic is {z:.4f} and the corresponding p-value is {p_value:.4f}.")

The test statistic is 1.6330 and the corresponding p-value is 0.1025.


Since the p-value (0.1025) is greater than the level of significance of 0.05, we fail to reject the null hypothesis. In other words, we do not have enough evidence to conclude that the proportions of supporters in the two schools are different.
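As in the one-proportion case, a confidence interval for $p_1 - p_2$ can complement the test. This interval is an addition, not part of the original example. The sketch below uses the common Wald interval. Because the interval is not built under $H_0$, it uses the unpooled standard error $\sqrt{\bar{p}_1(1-\bar{p}_1)/n_1 + \bar{p}_2(1-\bar{p}_2)/n_2}$ rather than the pooled $\bar{p}$. The numbers are those from In [15].

```python
import math

from scipy import stats

p1bar, p2bar, n1, n2, alpha = 0.8, 0.7, 100, 100, 0.05   # values from In [15]

# Unpooled standard error of the difference (H_0 is not assumed here)
se_diff = math.sqrt(p1bar * (1 - p1bar) / n1 + p2bar * (1 - p2bar) / n2)
z_crit = stats.norm.ppf(1 - alpha / 2)

diff = p1bar - p2bar
lower, upper = diff - z_crit * se_diff, diff + z_crit * se_diff
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

The interval contains 0, which is consistent with failing to reject $H_0$. The test uses the pooled standard error and the interval uses the unpooled one, so the two can occasionally disagree when the p-value is close to $\alpha$.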

## 7 - Summary

When to Use Each Z-Test for Means and Proportions

| **Test**                          | **Number of Groups** | **Sample Type**       | **Assumptions**                                      | **When to Use** |
|------------------------------------|----------------------|----------------------|-----------------------------------------------------|-----------------|
| **One-Sample Z-Test (Mean)**       | 1                    | Independent          | Normality or large $n$; known population variance ($\sigma^2$) | Compare the sample mean to a known population mean when **population variance is known** |
| **Two-Sample Z-Test (Means)**      | 2                    | Independent          | Normality or large $n$; known population variances ($\sigma_1^2$, $\sigma_2^2$) | Compare the means of two independent groups when **population variances are known** |
| **One-Sample Z-Test (Proportion)** | 1                    | Independent          | Large sample size: $np_0 \geq 5$ and $n(1-p_0) \geq 5$, where $n$ is the sample size and $p_0$ is the hypothesized proportion | Compare a sample proportion to a hypothesized population proportion |
| **Two-Sample Z-Test (Proportions)**| 2                    | Independent          | Large sample size in each group | Compare proportions between two independent groups |

