# Hypothesis testing 

In [1]:
import numpy as np
import scipy.stats as stats 
import pandas as pd 

Suppose that African elephants have weights distributed normally around a mean of 9000 lbs with a standard deviation of 900 lbs. 

Pachyderm Adventures has recently measured the weights of 35 Gabonese elephants and has calculated their average weight at 8637 lbs. 

Is the weight of Gabonese elephants different to the weight of African elephants? 

You will now perform a statistical test at significance level $\alpha = 0.05$. 

1. State the null and alternative hypotheses for this scenario. 

    * Null hypothesis: The weight of Gabonese elephants is the same as the weight of African elephants.

    * Alternative hypothesis: The weight of Gabonese elephants is different than the weight of African elephants.


2. What kind of statistical test do we need to run? Why? 

    * Two-tailed one-sample z-test. We assume the data is normally distributed, the sample size is $n \geq 30$, and we know the population mean and standard deviation.

3. What's the critical test statistic value we should use in this case? 

Since we are dealing with a two-tailed one-sample z-test: 

In [2]:
stats.norm.ppf(0.025), stats.norm.ppf(1-0.025)

(-1.9599639845400545, 1.959963984540054)

Two-tailed test: 
1. If the z-score we compute is greater than 1.96 or smaller than -1.96, then we can reject the null hypothesis at significance level 0.05 in favor of the alternative hypothesis. 

In [3]:
n = 35
sigma = 900

x_bar = 8637
mu = 9000

se = sigma/np.sqrt(n)
z = (x_bar - mu)/se

$\alpha = 0.05$ 

In [4]:
z

-2.386152179183512

z is smaller than -1.96, thus we can reject the null hypothesis in favor of the alternative hypothesis at significance level $\alpha = 0.05$. 

Another way of getting to the same answer: 

In [5]:
stats.norm.sf(z)

0.9914871479192084

In [6]:
stats.norm.cdf(z)

0.008512852080791552

The area of the tail corresponding to this z-score is 0.0085. This is below 0.025. Thus we reject the null hypothesis in favor of the alternative at significance level $\alpha = 0.05$. 
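The same decision can also be framed as a single two-tailed p-value compared directly against $\alpha$, instead of comparing one tail against $\alpha/2$. The following is a minimal sketch using the elephant numbers from above; the variable name `p_two_tailed` is introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Values from the elephant example above
n, sigma = 35, 900
x_bar, mu = 8637, 9000

z = (x_bar - mu) / (sigma / np.sqrt(n))

# Two-tailed p-value: probability of a z-score at least this far from 0,
# in either direction, assuming the null hypothesis is true.
# This is twice the single-tail area computed with stats.norm.cdf(z).
p_two_tailed = 2 * stats.norm.sf(abs(z))

print(round(p_two_tailed, 4))   # about 0.017
print(p_two_tailed < 0.05)      # True: reject at alpha = 0.05
```

Comparing the doubled tail area against $\alpha$ is equivalent to comparing the single tail area against $\alpha/2$.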

**Would we be able to reject the null hypothesis if our significance threshold was $\alpha = 0.01$?**

To reject the null hypothesis at $\alpha = 0.01$, the areas of the two tails need to add up to 0.01. 

In [7]:
alpha = 0.01
print(alpha/2)

0.005


The area of the tail corresponding to z = -2.386 is `stats.norm.cdf(z) = 0.0085`. 

In [8]:
0.0085 < 0.005

False

> Since the area of the tail corresponding to the z-score we obtained is 0.0085, which is greater than 0.005, we cannot reject the null hypothesis in favor of the alternative at a significance level of $\alpha = 0.01$. 

**What if we wanted to test if the weight of Gabonese elephants was smaller than the weight of African elephants at a significance level of 0.05?**

State the null and alternative hypotheses and perform the corresponding statistical hypothesis test.  

* Null hypothesis: The weight of Gabonese elephants is greater than or equal to the weight of African elephants.

* Alternative hypothesis: The weight of Gabonese elephants is less than the weight of African elephants.

What kind of test do we need to run?

> We need to run a one-tailed one-sample z-test. 

What's the critical test statistic value we should use? 

In [9]:
z_critical = stats.norm.ppf(0.05)

Compute the value of the test statistic we need to compare against the critical test statistic value. 

In [10]:
n = 35
sigma = 900

x_bar = 8637
mu = 9000

se = sigma/np.sqrt(n)
z = (x_bar - mu)/se
print(z)

-2.386152179183512


> Our z-statistic is smaller than the critical z-statistic, thus we can reject the null hypothesis in favor of the alternative, at a significance level of 0.05. 

Alternatively:

In [11]:
stats.norm.cdf(z)

0.008512852080791552

> The area under the tail corresponding to this z-score is below 0.05. Thus we can reject the null in favor of the alternative hypothesis at $\alpha = 0.05$. 

Would we be able to reject the null hypothesis at a significance level of $\alpha = 0.01$ in this case? 
* Yes, since the area under the tail is below 0.01. 

# Hypothesis testing 

Regardless of the type of statistical hypothesis test you're performing, there are five main steps to executing them:

1. Set up a null and alternative hypothesis 

2. Choose a significance level $\alpha$ (or use the one assigned). 

3. Calculate the test statistic. 

4. Determine the critical test statistic value or p-value. Find the rejection region. 

5. Compare the test statistic value to the critical test statistic value (or the p-value to $\alpha$) to reject or fail to reject the null hypothesis. 
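The steps above can be sketched as one small function for the one-sample z-test. This is only an illustration under stated assumptions: `one_sample_z_test` and its parameter names are hypothetical, not part of `scipy`, and the p-value route is used for steps 4 and 5.

```python
import numpy as np
from scipy import stats


def one_sample_z_test(x_bar, mu, sigma, n, alpha=0.05, alternative="two-sided"):
    """Hypothetical helper: one-sample z-test from summary statistics."""
    # Step 3: calculate the test statistic
    z = (x_bar - mu) / (sigma / np.sqrt(n))

    # Step 4: p-value for the chosen alternative hypothesis
    if alternative == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z))
    elif alternative == "less":
        p_value = stats.norm.cdf(z)
    elif alternative == "greater":
        p_value = stats.norm.sf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

    # Step 5: reject the null hypothesis only if p < alpha
    return z, p_value, bool(p_value < alpha)


# Steps 1 and 2 (the hypotheses and alpha) are chosen by the analyst.
# Elephant example from the first section:
print(one_sample_z_test(8637, 9000, 900, 35, alpha=0.05))                      # rejects
print(one_sample_z_test(8637, 9000, 900, 35, alpha=0.01))                      # fails to reject
print(one_sample_z_test(8637, 9000, 900, 35, alpha=0.01, alternative="less"))  # rejects
```

Each call returns the z-statistic, the p-value, and whether the null hypothesis is rejected, which matches the three conclusions reached by hand for the elephant data.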

# Language of Hypothesis testing 

**Significance Level $\alpha$**

The significance level $\alpha$ is the marginal threshold at which you're okay with rejecting the null hypothesis. It is the probability of rejecting the null hypothesis when it is true. 

The most commonly used $\alpha$ in science is $\alpha = 0.05$. When you set $\alpha = 0.05$, you're saying "I'm okay with rejecting the null hypothesis if, assuming the null hypothesis is true, there is less than a 5% chance of seeing results at least as extreme as mine due to randomness alone". 

**p-values**

The p-value is the probability of observing a test statistic at least as extreme as the one observed, by random chance, assuming that the null hypothesis is true. 

If $p \lt \alpha$, we reject the null hypothesis. 

If $p \geq \alpha$, we fail to reject the null hypothesis.

> **We do not accept the alternative hypothesis; we only reject, or fail to reject, the null hypothesis in favor of the alternative.**


**What if the experiment we perform fails to reject the null hypothesis?**

* We do not throw out failed experiments! 
* We say "this methodology, with this data, does not produce significant results" 
    * Maybe we need more data!

# z-tests vs t-tests

The z-test is used when you want to know if your sample comes from a particular population, your sample is large enough (size $\geq$ 30), and you know the population standard deviation. 

t-tests are used when you do not know the population standard deviation. 

In both cases, it is assumed that the samples are normally distributed. 

A t-test is like a modified z-test:
1. Penalize for small sample size; use "degrees of freedom" 
2. Use the _sample_ standard deviation $s$ to estimate the population standard deviation $\sigma$. 

t-distributions have more probability in the tails than the standard normal distribution. As the sample size increases, this extra tail probability shrinks and the t-distribution more closely resembles the z, or standard normal, distribution. By sample size n = 1000 they are virtually indistinguishable from each other. 
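One way to see this convergence is to compare two-tailed critical values at $\alpha = 0.05$ for increasing degrees of freedom. A short sketch follows; the particular list of degrees of freedom is an arbitrary choice for illustration.

```python
from scipy import stats

# Critical value for a two-tailed test at alpha = 0.05 under the standard normal
z_crit = stats.norm.ppf(0.975)

# The matching t critical values for a range of degrees of freedom
dfs = [2, 5, 10, 30, 100, 1000]
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}: t critical = {t_crit:.4f}, z critical = {z_crit:.4f}")
```

The t critical value is always larger than the z critical value, which is the "penalty" for small samples, and the gap narrows as the degrees of freedom grow.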

# Examples

Let's work on another example. 

1. Given the following data, we want to know if the sample is different from the population: 

In [12]:
population_mean = 85
sample = [90, 100, 110]

State the null and alternative hypotheses: 

> Null hypothesis: $H_0$: The sample mean is the same as the population mean. 

> Alternative hypothesis: $H_1$: The sample mean is not the same as the population mean. 

What type of hypothesis test do we want to perform and why? 

> We want to perform a two-sided one-sample t-test. We do not know the population standard deviation, so we need to estimate it, and our sample size is small (n = 3). 

In [13]:
# Using scipy
stats.ttest_1samp(a=sample, popmean=population_mean)

Ttest_1sampResult(statistic=2.5980762113533156, pvalue=0.12168993434632014)

In [14]:
# By "hand"
mu = 85 
x_bar = np.mean(sample)
n = len(sample)
s = np.std(sample, ddof=1)
df = n-1

t = (x_bar - mu)/(s/n**0.5)
print(t)
print(df)

2.5980762113533156
2
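To finish the by-hand version, the t-statistic and degrees of freedom can be turned into a p-value and a critical value. The sketch below assumes $\alpha = 0.05$, which this example does not state explicitly; the names `p_value` and `t_critical` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

sample = [90, 100, 110]
mu = 85

n = len(sample)
df = n - 1
t = (np.mean(sample) - mu) / (np.std(sample, ddof=1) / np.sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t), df)

# Two-sided critical value, assuming alpha = 0.05 (an assumption, see above)
t_critical = stats.t.ppf(0.975, df)

print(p_value)               # should agree with the pvalue from stats.ttest_1samp
print(abs(t) > t_critical)   # False: fail to reject at the assumed alpha
```

With only three observations the critical value is large, so even a sample mean 15 units above the population mean is not enough to reject the null hypothesis at that level.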


Another example: 

2. I'm buying jeans from store A and store B. I know nothing about their inventory other than prices. 

``` python
store1 = [20,30,30,50,75,25,30,30,40,80]
store2 = [60,30,70,90,60,40,70,40]
```

Should I go just to one store for a less expensive pair of jeans? I'm pretty apprehensive about my decision, so $\alpha = 0.1$. It's okay to assume the samples have equal variances, for now. 

State the null and alternative hypotheses: 

> Null: Store A and B have the same jean prices. 

> Alternative: Store A and B do not have the same jean prices. 

What kind of test should we run? Why? Use `scipy` to perform your test. 

> Run a two-tailed independent two-sample t-test. Sample sizes are small and the population standard deviations are unknown. 

In [15]:
store1 = [20,30,30,50,75,25,30,30,40,80]
store2 = [60,30,70,90,60,40,70,40]

stats.ttest_ind(store1, store2)

Ttest_indResult(statistic=-1.70113828065953, pvalue=0.10826653002468378)

We fail to reject the null hypothesis at a significance level of $\alpha = 0.1$. We do not have evidence to support that jean prices are different in store A and store B. 
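The test above relies on the equal-variance assumption, which is the default in `stats.ttest_ind`. If that assumption were dropped, one option is Welch's t-test, which `scipy` exposes through the `equal_var=False` argument. The following sketch simply runs both variants side by side; it is a comparison for illustration, not a recommendation to change the analysis above.

```python
from scipy import stats

store1 = [20, 30, 30, 50, 75, 25, 30, 30, 40, 80]
store2 = [60, 30, 70, 90, 60, 40, 70, 40]

# Pooled-variance t-test (equal_var=True is the default), as used above
pooled = stats.ttest_ind(store1, store2)

# Welch's t-test: does not assume the two samples share a variance
welch = stats.ttest_ind(store1, store2, equal_var=False)

alpha = 0.1
print("pooled:", pooled.statistic, pooled.pvalue, pooled.pvalue < alpha)
print("welch: ", welch.statistic, welch.pvalue, welch.pvalue < alpha)
```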

**3. More practice:**

>A rental car company claims the mean time to rent a car on their website is 60 seconds with a standard deviation of 30 seconds. A random sample of 36 customers attempted to rent a car on the website. The mean time to rent was 75 seconds. Is this enough evidence to contradict the company's claim at a significance level of $\alpha = 0.05$? 

We know the population standard deviation and our sample size n $\geq$ 30. 
* We are going to perform a two-tailed one-sample z-test at a significance level of $\alpha=0.05$. 

Null hypothesis: There is no difference in the mean time to rent of the sample and the claim by the rental company. 

Alternative hypothesis: There is a difference in the mean time to rent of the sample and the claim by the rental company. 

For a two-sided one-sample test and $\alpha = 0.05$, the critical z-scores are -1.96 and 1.96. That is, if our computed z-statistic is below -1.96 or above 1.96, we have enough evidence to reject the null hypothesis in favor of the alternative.



In [16]:
# one-sample z-test 
z = (75 - 60)/(30/np.sqrt(36))
print(z)

3.0


The z-statistic is greater than 1.96, thus we can reject the null hypothesis in favor of the alternative hypothesis, at $\alpha = 0.05$.  

**4. Another example:** 

> A coffee shop relocates from Manhattan to Brooklyn and wants to make sure that all lattes are consistent. They believe each latte has 4 oz of espresso. A random sample of 25 lattes shows a mean of 4.6 oz and standard deviation of 0.22 oz. Are their lattes different now that they've relocated to Brooklyn? Use alpha = 0.05. 

State the null and alternative hypotheses:
1. Null: the amount of espresso in the lattes is the same as before 
2. Alternative: the amount of espresso in the lattes is different 

What kind of test? 
* two-tailed one-sample t-test
    * small sample size
    * unknown population standard deviation 
    * two-tailed because we want to know if amounts are same or different 

In [17]:
x_bar = 4.6 
mu = 4 
s = 0.22 
n = 25 

df = n-1

t = (x_bar - mu)/(s/n**0.5)
t

13.63636363636363

Is this value in a rejection region? 

In [18]:
# critical values
stats.t.ppf(1-0.975, df), stats.t.ppf(0.975, df)

(-2.0638985616280205, 2.0638985616280205)

Yes. $|t| > t_{critical}$ (13.64 > 2.06), so we can reject the null hypothesis in favor of the alternative at $\alpha = 0.05$. 
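Alternatively, the same conclusion can be reached with a p-value computed from the summary statistics. This is a minimal sketch; `p_value` is a name introduced here for illustration.

```python
from scipy import stats

# Summary statistics from the latte example
x_bar, mu, s, n = 4.6, 4, 0.22, 25
df = n - 1

t = (x_bar - mu) / (s / n**0.5)

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t), df)

print(p_value)          # a very small number
print(p_value < 0.05)   # True: reject the null hypothesis
```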

**5. Another example**

You measure the delivery times of two different restaurants in your neighborhood, ten times each. You want to know if the restaurants have different delivery times.  It's okay to assume both samples have equal variances. Set your significance threshold to 0.05. 

``` python
delivery_times_A = [28.4, 23.3, 30.4, 28.1, 29.4, 30.6, 27.8, 30.9, 27.0, 32.8]
delivery_times_B = [26.4, 26.3, 27.4, 30.4, 25.1, 28.4, 23.3, 24.7, 31.8, 24.3]
```

State the null and alternative hypotheses:

> Null hypothesis: The delivery times for restaurant A are equal to delivery times for restaurant B. 

> Alternative hypothesis: Delivery times for restaurant A are not equal to delivery times for restaurant B. 

Type of test? 

> Two-sided unpaired two-sample t-test

In [26]:
delivery_times_A = [28.4, 23.3, 30.4, 28.1, 29.4, 30.6, 27.8, 30.9, 27.0, 32.8]
delivery_times_B = [26.4, 26.3, 27.4, 30.4, 25.1, 28.4, 23.3, 24.7, 31.8, 24.3]

In [27]:
stats.ttest_ind(delivery_times_A, delivery_times_B)

Ttest_indResult(statistic=1.7223240113288751, pvalue=0.10214880648482656)

We cannot reject the null hypothesis that restaurant A and B have equal delivery times. pvalue > 0.05. 
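The critical-value approach gives the same answer. A short sketch, assuming the pooled-variance test used above with $n_A + n_B - 2$ degrees of freedom; `t_critical` is a name introduced here for illustration.

```python
from scipy import stats

delivery_times_A = [28.4, 23.3, 30.4, 28.1, 29.4, 30.6, 27.8, 30.9, 27.0, 32.8]
delivery_times_B = [26.4, 26.3, 27.4, 30.4, 25.1, 28.4, 23.3, 24.7, 31.8, 24.3]

result = stats.ttest_ind(delivery_times_A, delivery_times_B)

# Degrees of freedom for the pooled two-sample t-test: 10 + 10 - 2 = 18
df = len(delivery_times_A) + len(delivery_times_B) - 2

# Two-sided critical value at alpha = 0.05
t_critical = stats.t.ppf(0.975, df)

print(abs(result.statistic) > t_critical)   # False: fail to reject
```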

**6. Example, again**

Consider the gain in weight of 19 female rats between 28 and 84 days after birth. 

12 were fed on a high protein diet and 7 on a low protein diet. 

In [28]:
high_protein = [134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123]
low_protein = [70, 118, 101, 85, 107, 132, 94]

Is there any difference in the weight gain of rats fed on high protein diet vs low protein diet? It's OK to assume equal sample variances. 

Null and alternative hypotheses? 

> null: there is no difference in the weight gain of rats who were fed a high protein diet vs a low protein diet 

> alternative: weight gains differ by kind of diet 

Kind of test and why?

> Two-sided unpaired two-sample t-test. Low sample size. Unknown population stdev. 

In [29]:
stats.ttest_ind(high_protein, low_protein)

Ttest_indResult(statistic=1.89143639744233, pvalue=0.07573012895667763)

In [30]:
def sample_variance(sample):
    sample_mean = np.mean(sample)
    return np.sum((sample - sample_mean)**2)/(len(sample) - 1)

def pooled_variance(sample1, sample2):
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = sample_variance(sample1), sample_variance(sample2)
    return ((n1 -1)*var1 + (n2-1)*var2)/(n1+n2-2)

def twosample_tstatistic(sample1, sample2):
    mean_1, mean_2 = np.mean(sample1), np.mean(sample2)
    pool_var = pooled_variance(sample1, sample2)
    n_1, n_2 = len(sample1), len(sample2)
    num = mean_1 - mean_2
    denom = np.sqrt(pool_var* ((1/n_1) + (1/n_2)))
    return num/denom

In [31]:
twosample_tstatistic(high_protein, low_protein)

1.89143639744233

In [32]:
# Two-sided critical t-statistic; df = 12 + 7 - 2 = 17 for the rat samples
df = len(high_protein) + len(low_protein) - 2
print(round(stats.t.ppf(q=0.975, df=df), 4))

2.1098

> Cannot reject null hypothesis at alpha = 0.05 level (two-sided test).

What if we wanted to test if the rats who ate a high protein diet gained more weight than those who ate a low-protein diet? 

Null: weight gain by rats who ate high protein diet less than or equal to weight gain of low protein diet rats 

alternative: weight gain by rats who ate high protein diet greater than weight gain of low protein diet rats 

Kind of test? One-sided unpaired two-sample test 

In [33]:
# One-sided critical t-statistic; df = 12 + 7 - 2 = 17 for the rat samples
print(round(stats.t.ppf(q=0.95, df=len(high_protein) + len(low_protein) - 2), 4))

1.7396

Can reject the null hypothesis in favor of the alternative at alpha = 0.05, one-sided test. 
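The one-sided conclusion can also be checked with a p-value. The sketch below takes the t-statistic from the pooled test and computes the upper-tail area by hand; `p_one_sided` is a name introduced here for illustration. Newer SciPy releases also accept an `alternative` argument to `stats.ttest_ind`, but the by-hand version avoids depending on the installed version.

```python
from scipy import stats

high_protein = [134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123]
low_protein = [70, 118, 101, 85, 107, 132, 94]

# Pooled two-sample t-test (two-sided by default)
result = stats.ttest_ind(high_protein, low_protein)
df = len(high_protein) + len(low_protein) - 2

# Upper-tail area: probability of a t-statistic this large or larger
# if the high protein diet does not lead to greater weight gain
p_one_sided = stats.t.sf(result.statistic, df)

print(p_one_sided)          # half of the two-sided p-value, since t > 0
print(p_one_sided < 0.05)   # True: reject at alpha = 0.05
```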

