# Z-Test

`statsmodels.stats.weightstats.ztest(x1, x2=None, value=0)`
- x1: values for the first sample
- x2: values for the second sample (if performing a two sample z-test)
- value: mean under the null (in one sample case) or mean difference (in two sample case)

## One Sample Z-Test

- Suppose the IQ in a certain population is normally distributed with a mean of μ = 100 and standard deviation of σ = 15.
- A researcher wants to know if a new drug affects IQ levels, so he recruits 20 patients to try it and records their IQ levels.
- The following code shows how to perform a one sample z-test in Python to determine if the new drug causes a significant difference in IQ levels:

In [2]:
from statsmodels.stats.weightstats import ztest

In [3]:
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115]

In [4]:
ztest(data, value=100)

(1.5976240527147705, 0.11012667014384257)

- For the one sample z-test, the <b>test statistic is 1.5976 and the corresponding p-value is 0.1101</b>.
- Since this p-value is not less than .05, we do not have sufficient evidence to reject the null hypothesis.
- In other words, we do not have sufficient evidence to say that the new drug affects IQ levels.
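One detail is worth checking here: the statistic reported above is consistent with a standard deviation estimated from the sample, not with the stated σ = 15. The sketch below uses only the standard library to reproduce that statistic and then to compute what the z statistic would be if the known σ = 15 were used instead. The helper name `one_sample_z` is hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
        105, 109, 109, 109, 110, 112, 112, 113, 114, 115]


def one_sample_z(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: mean == mu0, given a standard deviation."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Standard deviation estimated from the sample (divides by n - 1)
z_est, p_est = one_sample_z(data, mu0=100, sigma=stdev(data))

# Known population standard deviation from the problem statement (sigma = 15)
z_known, p_known = one_sample_z(data, mu0=100, sigma=15)

print(z_est, p_est)
print(z_known, p_known)
```

With σ = 15 the statistic is smaller (about 0.91) because 15 is larger than the sample standard deviation. The p-value stays above .05 either way, so the conclusion does not change for this data.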

## Two Sample Z-Test

- Suppose the IQ levels among individuals in two different cities are known to be normally distributed with known standard deviations.
- A researcher wants to know if the mean IQ level differs between individuals in city A and city B, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.
- The following code shows how to perform a two sample z-test in Python to determine if the mean IQ level is different between the two cities:

In [8]:
from statsmodels.stats.weightstats import ztest

cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,109, 114, 115, 116, 117, 117, 128, 129, 130, 133]

ztest(cityA, cityB, value=0)

(-1.9953236073282115, 0.046007596761332065)

- For the two sample z-test, the <b>test statistic is -1.9953 and the corresponding p-value is 0.0460</b>.
- Since this p-value is less than .05, we have sufficient evidence to reject the null hypothesis. 
- In other words, the mean IQ level is significantly different between the two cities.
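The reported statistic can be reproduced by hand from the two samples. The sketch below is a minimal standard-library version that assumes a pooled variance estimate (each group variance divides by n - 1), not known population standard deviations. It illustrates the arithmetic and is not the library's implementation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
         105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
         109, 114, 115, 116, 117, 117, 128, 129, 130, 133]

n_a, n_b = len(cityA), len(cityB)

# Pooled sample variance across both groups
pooled_var = ((n_a - 1) * variance(cityA) + (n_b - 1) * variance(cityB)) / (n_a + n_b - 2)

# Standard error of the difference in means
se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Hypothesized mean difference under the null (the `value` argument)
null_diff = 0

z = (mean(cityA) - mean(cityB) - null_diff) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, p_value)
```

The negative sign only reflects the order of subtraction (city A minus city B); the two-sided p-value depends on the absolute value of the statistic.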

# T-Test

## One Sample T-Test

- Suppose a botanist wants to know if the mean height of a certain species of plant is equal to 15 inches. 
- She collects a random sample of 12 plants and records each of their heights in inches.
- Use the following steps to conduct a one sample t-test to determine if the mean height for this species of plant is actually equal to 15 inches.

In [11]:
import scipy.stats as stats

data = [14, 14, 16, 13, 12, 17, 15, 14, 15, 13, 15, 14]

stats.ttest_1samp(a=data, popmean=15)

TtestResult(statistic=-1.6848470783484626, pvalue=0.12014460742498101, df=11)

- The <b>t test statistic is -1.6848 and the corresponding two-sided p-value is 0.1201</b>.
- The two hypotheses for this particular one sample t-test are as follows:
    - H0: µ = 15 (the mean height for this species of plant is 15 inches)
    - HA: µ ≠ 15 (the mean height is not 15 inches)
- Because the p-value of our test (0.1201) is greater than alpha = 0.05, we fail to reject the null hypothesis of the test. 
- We do not have sufficient evidence to say that the mean height for this particular species of plant is different from 15 inches.
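To make the calculation behind this result explicit, the sketch below computes the t statistic directly as (sample mean - hypothesized mean) / (s / √n) using only the standard library. The standard library has no t distribution, so instead of a p-value it compares the statistic with the two-sided critical value for alpha = 0.05 and 11 degrees of freedom, which is assumed here from a standard t-table (2.201).

```python
from math import sqrt
from statistics import mean, stdev

data = [14, 14, 16, 13, 12, 17, 15, 14, 15, 13, 15, 14]
mu0 = 15  # hypothesized population mean

n = len(data)
standard_error = stdev(data) / sqrt(n)  # stdev divides by n - 1
t_stat = (mean(data) - mu0) / standard_error
df = n - 1

# Two-sided critical value for alpha = 0.05, df = 11 (taken from a t-table)
t_critical = 2.201

reject_null = abs(t_stat) > t_critical
print(t_stat, df, reject_null)
```

Because the absolute value of the statistic is below the critical value, this check agrees with the p-value comparison: the null hypothesis is not rejected.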

## Two Sample T-Test

`ttest_ind(a, b, equal_var=True)`
- a: an array of sample observations for group 1
- b: an array of sample observations for group 2
- equal_var: if True (the default), perform a standard independent two sample t-test that assumes equal population variances; if False, perform Welch’s t-test, which does not assume equal population variances.

- Researchers want to know whether or not two different species of plants have the same mean height. 
- To test this, they collect a simple random sample of 20 plants from each species.
- Use the following steps to conduct a two sample t-test to determine if the two species of plants have the same mean height.
- Before we perform the test, we need to decide if we’ll assume the two populations have equal variances or not. 
- As a rule of thumb, we can assume the populations have equal variances if the ratio of the larger sample variance to the smaller sample variance is less than 4:1. 
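- The code below prints variances of 12.26 and 7.73. These come from `np.var`, which divides by n by default; with equal group sizes the ratio is the same as for the sample variances. The ratio is 12.26 / 7.73 = 1.586, which is less than 4.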
- This means we can assume that the population variances are equal.
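The rule of thumb above can be checked in a few lines. This sketch uses `statistics.variance`, which computes the sample variance (dividing by n - 1), and derives a boolean that could be passed as `equal_var`. The variable name `assume_equal_var` is hypothetical, and the 4:1 threshold is the rule of thumb stated above, not a formal test.

```python
from statistics import variance

group1 = [14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12]
group2 = [15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13]

# Sample variances (divide by n - 1)
var1 = variance(group1)
var2 = variance(group2)

# Ratio of the larger variance to the smaller one
ratio = max(var1, var2) / min(var1, var2)

# Rule of thumb: treat the variances as equal when the ratio is below 4
assume_equal_var = ratio < 4

print(var1, var2, ratio, assume_equal_var)
```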

In [14]:
import numpy as np
import scipy.stats as stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

print(np.var(group1), np.var(group2))

stats.ttest_ind(a=group1, b=group2, equal_var=True)

7.727500000000001 12.260000000000002


Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)

- The <b>t test statistic is -0.6337 and the corresponding two-sided p-value is 0.5300</b>.
- The two hypotheses for this particular two sample t-test are as follows:
    - H0: µ1 = µ2 (the two population means are equal)
    - HA: µ1 ≠ µ2 (the two population means are not equal)
- Because the p-value of our test (0.5300) is greater than alpha = 0.05, we fail to reject the null hypothesis of the test.
- We do not have sufficient evidence to say that the mean height of plants between the two populations is different.
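For comparison, the sketch below computes both the pooled statistic (the `equal_var=True` case) and the Welch statistic (the `equal_var=False` case) by hand with the standard library. It assumes the usual textbook formulas, including the Welch-Satterthwaite approximation for the degrees of freedom. It does not compute p-values.

```python
from math import sqrt
from statistics import mean, variance

group1 = [14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12]
group2 = [15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13]

n1, n2 = len(group1), len(group2)
v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)
mean_diff = mean(group1) - mean(group2)

# Pooled t statistic and degrees of freedom (equal variances assumed)
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Welch t statistic and Welch-Satterthwaite degrees of freedom
se1_sq, se2_sq = v1 / n1, v2 / n2
t_welch = mean_diff / sqrt(se1_sq + se2_sq)
df_welch = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))

print(t_pooled, df_pooled)
print(t_welch, df_welch)
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, so the choice of `equal_var` matters most when the group sizes or variances are clearly unequal.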

## Paired T-Test

- Suppose we want to know whether a certain study program significantly impacts student performance on a particular exam. 
- To test this, we have 15 students in a class take a pre-test. 
- Then, we have each of the students participate in the study program for two weeks. 
- Then, the students retake a test of similar difficulty.
- To compare the difference between the mean scores on the first and second test, we use a paired samples t-test because for each student their first test score can be paired with their second test score.
- Perform the following steps to conduct a paired samples t-test in Python.

In [17]:
import scipy.stats as stats

pre = [88, 82, 84, 93, 75, 78, 84, 87, 95, 91, 83, 89, 77, 68, 91]
post = [91, 84, 88, 90, 79, 80, 88, 90, 90, 96, 88, 89, 81, 74, 92]

stats.ttest_rel(a=pre, b=post)

TtestResult(statistic=-2.9732484231168796, pvalue=0.01007144862643272, df=14)

- The <b>test statistic is -2.9732 and the corresponding two-sided p-value is 0.0101</b>.
- In this example, the paired samples t-test uses the following null and alternative hypotheses:
    - H0: The mean pre-test and post-test scores are equal
    - HA: The mean pre-test and post-test scores are not equal
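A paired samples t-test is equivalent to a one sample t-test on the per-student differences, with a hypothesized mean difference of 0. The standard-library sketch below computes the statistic that way; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

pre = [88, 82, 84, 93, 75, 78, 84, 87, 95, 91, 83, 89, 77, 68, 91]
post = [91, 84, 88, 90, 79, 80, 88, 90, 90, 96, 88, 89, 81, 74, 92]

# Per-student difference, in the same order as ttest_rel(a=pre, b=post)
diffs = [a - b for a, b in zip(pre, post)]

n = len(diffs)
standard_error = stdev(diffs) / sqrt(n)  # stdev divides by n - 1
t_stat = mean(diffs) / standard_error
df = n - 1

print(t_stat, df)
```

The statistic and degrees of freedom agree with the `ttest_rel` output above. The negative sign indicates that post-test scores were higher on average than pre-test scores.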