**Table of contents**<a id='toc0_'></a>    
- [Hypothesis setting](#toc1_)    
  - [Check reality](#toc1_1_)    
  - [Formulate the hypothesis](#toc1_2_)    
    - [Two-tailed hypothesis](#toc1_2_1_)    
    - [Right-tailed hypothesis - **more than**](#toc1_2_2_)    
    - [Left-tailed hypothesis - **less than**](#toc1_2_3_)    
- [Hypothesis testing](#toc2_)    
  - [Choose significance/confidence level](#toc2_1_)    
  - [Sampling](#toc2_2_)    
    - [Intuition Building](#toc2_2_1_)    
  - [Running statistical test](#toc2_3_)    
    - [T-test](#toc2_3_1_)    
      - [💡 Do it yourself](#toc2_3_1_1_)    
    - [Matched-pair test - for **2 dependent samples**](#toc2_3_2_)    
    - [A/B Test - for **2 independent samples**](#toc2_3_3_)    
    - [ANOVA test - for **>2 independent samples**](#toc2_3_4_)    
- [📖 Self-guided: Hypothesis testing on proportions](#toc3_)    
- [Acknowledgements](#toc4_)    
- [References](#toc5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [None]:
import pandas as pd
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/data-bootcamp-v4/data/main/titanic_train.csv')
data.head()

# <a id='toc1_'></a>[Hypothesis setting](#toc0_)

![](https://media.giphy.com/media/1jaJf7liv7Dl5SrrTU/giphy.gif)

> You have seen that the prices in first class were on average 85 dollars and someone told you that prices in 3rd class were usually a fifth of prices in first class.

## <a id='toc1_1_'></a>[Check reality](#toc0_)

⚠️**Hypotheses are statements about the population, not about the sample**⚠️

Here we treat the full Titanic dataset as the population (a census), so we can check directly whether the average ticket price for the 3rd class is a fifth of the average price for the 1st class:

In [None]:
# Get the first and third class means
first_class = data[data.Pclass == 1]
first_class_price = first_class.Fare.mean()
third_class = data[data.Pclass == 3]
third_class_price = third_class.Fare.mean()

In [None]:
# Check the different means
print("A fifth of the first class price is:", round(first_class_price / 5, 2))
print("The third class price is:", round(third_class_price, 2))

Because we have the whole population, we can compare the two numbers directly and no statistical test is needed. But what about ships for which we only have a sample of passengers?

## <a id='toc1_2_'></a>[Formulate the hypothesis](#toc0_)

### <a id='toc1_2_1_'></a>[Two-tailed hypothesis](#toc0_)

![](https://i0.wp.com/statisticsbyjim.com/wp-content/uploads/2018/11/t-test_two_tails_05.png?w=576&ssl=1)  
(Source: [Statistics by Jim](https://statisticsbyjim.com/hypothesis-testing/one-tailed-two-tailed-hypothesis-tests/))

**Null hypothesis ($H_0$):** 3rd class price mean = 17  

**Alternative hypothesis ($H_1$):** 3rd class price mean != 17

### <a id='toc1_2_2_'></a>[Right-tailed hypothesis - **more than**](#toc0_)

**Null hypothesis ($H_0$):** 3rd class price mean <= 17  

**Alternative hypothesis ($H_1$):** 3rd class price mean > 17

![](https://i0.wp.com/statisticsbyjim.com/wp-content/uploads/2018/11/t-test_right_tail_05.png?w=576&ssl=1)  
(Source: [Statistics by Jim](https://statisticsbyjim.com/hypothesis-testing/one-tailed-two-tailed-hypothesis-tests/))

### <a id='toc1_2_3_'></a>[Left-tailed hypothesis - **less than**](#toc0_)

**Null hypothesis ($H_0$):** 3rd class price mean >= 17  

**Alternative hypothesis ($H_1$):** 3rd class price mean < 17

![](https://i0.wp.com/statisticsbyjim.com/wp-content/uploads/2018/11/t-test_left_tail_05.png?w=576&ssl=1)  
(Source: [Statistics by Jim](https://statisticsbyjim.com/hypothesis-testing/one-tailed-two-tailed-hypothesis-tests/))

# <a id='toc2_'></a>[Hypothesis testing](#toc0_)

## <a id='toc2_1_'></a>[Choose significance/confidence level](#toc0_)

> The significance level, also known as alpha or α, is a measure of the strength of the evidence that must be present in your sample before you will reject the null hypothesis and conclude that the effect is statistically significant. The researcher determines the significance level before conducting the experiment.[$^{[1]}$](https://statisticsbyjim.com/glossary/significance-level/)

The significance level is chosen depending on how grave the consequences of a Type I error would be, i.e. rejecting the null hypothesis when it is actually true. The level typically chosen (especially in research papers) is 5%, but in more serious cases (e.g. in the medical or legal field) you would go for a significance level of 1% or much lower.

Examples of high-stakes scenarios:
- Diagnosing someone with a very grave disease, e.g. cancer
- Convicting someone of murder

**Note:** A significance level of 5% is equivalent to a confidence level of 95%.
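
To make the meaning of α concrete, here is a minimal simulation sketch. It assumes synthetic, normally distributed data whose true mean really is 17 (the standard deviation of 5, the sample size of 30 and the number of experiments are illustrative choices, not values from the Titanic data). Since $H_0$ is true in every experiment, each rejection is a Type I error, and the share of rejections should land close to α.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 2000

# Each row is one experiment: 30 observations from a population whose
# true mean is 17, so H0 (mean = 17) holds in every experiment
samples = rng.normal(loc=17, scale=5, size=(n_experiments, 30))

# One two-tailed t-test per row
p_values = st.ttest_1samp(samples, 17, axis=1).pvalue

# Every rejection here is a false alarm (Type I error)
type_1_error_rate = (p_values < alpha).mean()
print("Share of false rejections:", type_1_error_rate)
```

Lowering `alpha` to 0.01 should shrink the share of false rejections accordingly, which is why stricter levels are used when a false alarm is costly.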

## <a id='toc2_2_'></a>[Sampling](#toc0_)

In [None]:
# Sample 30 passengers from 3rd class
c3_sample = data[data['Pclass']==3]['Fare'].sample(30)
c3_sample

In [None]:
# Check the mean of my sample
print("Third class price for the sample:", round(c3_sample.mean(), 2))

### <a id='toc2_2_1_'></a>[Intuition Building](#toc0_)

Let's take the following samples:

In [None]:
samp1 = np.array([10, 12, 9, 11, 9, 10, 10, 12, 13, 11])
samp2 = np.array([10, 12, 9, 11, 9, 10, 10, 12, 9, 8])

In [None]:
# Show sample means
print("samp1 mean:", samp1.mean())
print("samp2 mean:", samp2.mean())

Our null hypothesis ($H_0$) is that the population mean is 10. Which of the two samples is more "consistent" with the $H_0$?

Let's have a look at 2 other samples:

In [None]:
samp3 = np.array([10, 12, 9, 11, 9, 10, 10, 11, 12, 9])
samp4 = np.array([21, 2, 0, 21, 5, 15, 3, 17, 16, 2])

In [None]:
# Show sample means
print("samp3 mean:", samp3.mean())
print("samp4 mean:", samp4.mean())

Which of the two new samples is more "consistent" with the $H_0$?

Finally, let's have a look at 2 more samples:

In [None]:
samp6 = np.array([19, 2, 3, 17, 5, 15, 3, 17, 16, 3, 12])
samp7 = np.array([19, 2, 3, 17, 5, 15, 3, 17, 16, 3, 12, 19, 2, 3, 17, 5, 15, 3, 17, 16, 3, 12, 19, 2, 3, 17, 5, 15, 3, 17, 16, 3, 12,19, 2, 3, 17, 5, 15, 3, 17, 16, 3, 12])

In [None]:
# Check the mean and std
print("samp6 mean:", samp6.mean())
print("samp7 mean:", samp7.mean())
print("samp6 std:", samp6.std())
print("samp7 std:", samp7.std())

Which of the two new samples is more "consistent" with the $H_0$?
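
The three comparisons above point at the three ingredients of the one-sample t statistic: how far the sample mean is from the hypothesized mean, how spread out the sample is, and how many observations it has. The sketch below computes $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ by hand. The helper `t_statistic` is a hypothetical name introduced for illustration, and `samp7` is rebuilt with `np.tile` because it is `samp6` repeated four times. Note that the t-test uses the sample standard deviation (`ddof=1`), whereas `.std()` on a NumPy array defaults to `ddof=0`.

```python
import numpy as np
import scipy.stats as st

def t_statistic(sample, mu_0):
    """Distance between the sample mean and mu_0, in standard-error units."""
    sample = np.asarray(sample, dtype=float)
    # ddof=1 gives the sample standard deviation used by the t-test
    standard_error = sample.std(ddof=1) / np.sqrt(len(sample))
    return (sample.mean() - mu_0) / standard_error

samp3 = np.array([10, 12, 9, 11, 9, 10, 10, 11, 12, 9])
samp4 = np.array([21, 2, 0, 21, 5, 15, 3, 17, 16, 2])
samp6 = np.array([19, 2, 3, 17, 5, 15, 3, 17, 16, 3, 12])
samp7 = np.tile(samp6, 4)  # same values as samp6, four times over

samples = {"samp3": samp3, "samp4": samp4, "samp6": samp6, "samp7": samp7}
for name, samp in samples.items():
    t = t_statistic(samp, 10)
    p = st.ttest_1samp(samp, 10).pvalue
    print(f"{name}: n={len(samp)}, mean={samp.mean():.2f}, t={t:.3f}, p={p:.3f}")
```

Comparing `samp6` with `samp7` shows the effect of sample size in isolation: the mean is identical, but the larger sample has a smaller standard error and therefore a larger t statistic in absolute value.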

## <a id='toc2_3_'></a>[Running statistical test](#toc0_)

### <a id='toc2_3_1_'></a>[T-test](#toc0_)

Two-tailed test

In [None]:
# Apply test statistic
st.ttest_1samp(c3_sample, 17)

What p-value do you get?

Left-tailed test - **Less**

In [None]:
# ...and for a one-tailed test (the `alternative` argument requires scipy >= 1.6.0)
st.ttest_1samp(c3_sample, 17, alternative='less')
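
To see how the one-tailed and two-tailed p-values relate, here is a small sketch on a synthetic sample. The normal distribution with mean 14 and standard deviation 10 is an assumption standing in for `c3_sample`, not a property of the real fares. For a t-test, the two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of the two.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
# Synthetic stand-in for a sample of 30 third-class fares (assumption)
sample = rng.normal(loc=14, scale=10, size=30)

two_sided = st.ttest_1samp(sample, 17)
less = st.ttest_1samp(sample, 17, alternative='less')
greater = st.ttest_1samp(sample, 17, alternative='greater')

print("t statistic:      ", round(two_sided.statistic, 3))
print("p (two-sided):    ", round(two_sided.pvalue, 4))
print("p (less than 17): ", round(less.pvalue, 4))
print("p (greater than): ", round(greater.pvalue, 4))

# Decision rule at the 5% level for the left-tailed test
alpha = 0.05
reject_h0 = less.pvalue < alpha
print("Reject H0 in favour of 'mean < 17':", reject_h0)
```

The t statistic is the same in all three calls; only the tail area that counts as "extreme" changes.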

#### <a id='toc2_3_1_1_'></a>[💡 Do it yourself](#toc0_)

Now repeat this exercise for the following hypothesis:

> The 1st class prices are different from $85.

Use the same sample size (30) and significance level (5%). What do you see?

**Tip:** Make sure to properly state your null and alternative hypothesis!

In [None]:
# Your code here

### <a id='toc2_3_2_'></a>[Matched-pair test - for **2 dependent samples**](#toc0_)

We measure the blood pressure of patients before and after applying a blood pressure treatment. The data in the two samples is dependent, as we're looking at the blood pressure for the same patient before and after the treatment. 

In [None]:
blood_pressure = pd.read_csv('https://raw.githubusercontent.com/sabinagio/data-analytics/main/data/blood_pressure.csv')
blood_pressure.head()

**Null hypothesis ($H_0$):** blood_pressure_before = blood_pressure_after  

**Alternative hypothesis ($H_1$):** blood_pressure_before != blood_pressure_after

In [None]:
st.ttest_rel(blood_pressure['after'], blood_pressure['before'])

In [None]:
st.ttest_rel(blood_pressure['before'], blood_pressure['after'])

**Null hypothesis ($H_0$):** blood_pressure_before <= blood_pressure_after  

**Alternative hypothesis ($H_1$):** blood_pressure_before > blood_pressure_after

In [None]:
st.ttest_rel(blood_pressure['before'], blood_pressure['after'], alternative='greater')
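
A matched-pair test is a one-sample t-test on the per-patient differences, tested against 0. The sketch below illustrates this on synthetic data: the `before` and `after` arrays are made up for the example and are not the values in `blood_pressure.csv`. It also shows why swapping the argument order only flips the sign of the statistic and leaves the two-tailed p-value unchanged.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
# Synthetic measurements for 40 patients (illustrative values only)
before = rng.normal(loc=140, scale=12, size=40)
after = before - rng.normal(loc=4, scale=6, size=40)

paired = st.ttest_rel(before, after)
on_differences = st.ttest_1samp(before - after, 0)
swapped = st.ttest_rel(after, before)

print("ttest_rel(before, after):   ", paired.statistic, paired.pvalue)
print("ttest_1samp(differences, 0):", on_differences.statistic, on_differences.pvalue)
print("ttest_rel(after, before):   ", swapped.statistic, swapped.pvalue)
```

For a one-tailed test the argument order does matter, because `alternative='greater'` refers to the first argument minus the second.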

How would you control for a placebo effect?

### <a id='toc2_3_3_'></a>[A/B Test - for **2 independent samples**](#toc0_)

We created two different designs for our website - one has a CTA (Call-To-Action) in the middle of the page (A) and the other one has the CTA in the bottom right of the page (B). We want to see whether the change in the CTA position resulted in a change in time spent on the website. We say that the two samples are independent because we are looking at completely different groups of people.

In [None]:
ab_test = pd.read_csv('https://raw.githubusercontent.com/sabinagio/data-analytics/main/data/ab_test.csv')
ab_test.head()

**Null hypothesis ($H_0$):** retention_a = retention_b  

**Alternative hypothesis ($H_1$):** retention_a != retention_b

In [None]:
st.ttest_ind(ab_test['a'], ab_test['b'], equal_var=False)

In [None]:
sns.boxplot(data=ab_test)

**Null hypothesis ($H_0$):** retention_a <= retention_b  

**Alternative hypothesis ($H_1$):** retention_a > retention_b

In [None]:
# alternative='greater' tests whether the mean of 'a' is greater than the mean of 'b'
st.ttest_ind(ab_test['a'], ab_test['b'], equal_var=False, alternative='greater')
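
Passing `equal_var=False` selects Welch's t-test, which does not assume that the two groups have the same variance. As a rough sketch of what happens under the hood, the statistic and its degrees of freedom (Welch-Satterthwaite approximation) can be computed by hand. The groups `a` and `b` below are synthetic, with deliberately different sizes and spreads, and are not the contents of `ab_test.csv`.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(2)
# Synthetic time-on-site values for two independent groups (assumption)
a = rng.normal(loc=60, scale=10, size=50)
b = rng.normal(loc=55, scale=20, size=80)

# Squared standard error of each group mean
se2_a = a.var(ddof=1) / len(a)
se2_b = b.var(ddof=1) / len(b)

# Welch's t statistic and Welch-Satterthwaite degrees of freedom
t_welch = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
df_welch = (se2_a + se2_b) ** 2 / (
    se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
)
p_two_sided = 2 * st.t.sf(abs(t_welch), df_welch)

result = st.ttest_ind(a, b, equal_var=False)
print("By hand:", t_welch, p_two_sided)
print("scipy:  ", result.statistic, result.pvalue)
```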

### <a id='toc2_3_4_'></a>[ANOVA test - for **>2 independent samples**](#toc0_)

We're a bank that is giving out loans and has different policies on interest rate setting depending on location. Now, the bank wants to see whether they should adjust the rate at a state level so that all city interest rates are the same. However, before doing this, the bank needs to check whether or not there is a difference in the first place!

In [None]:
interest_r = pd.read_csv('https://raw.githubusercontent.com/sabinagio/data-analytics/main/data/rate_by_city.csv')
interest_r.sample(7)

In [None]:
# Reshape the dataframe so it shows each sample separately

# Create new index by counting the individuals in each group (cumcount)
interest_r['city_count'] = interest_r.groupby('City').cumcount() ##is the new index

# Create pivot table
interest_r_pivot = interest_r.pivot(index='city_count', columns='City', values='Rate')

# Change titles to display "City_n"
interest_r_pivot.columns = ['City_' + str(x) for x in interest_r_pivot.columns.values]
interest_r_pivot.head()

**Null hypothesis ($H_0$):** interest_1 = interest_2 = interest_3 = interest_4 = interest_5 = interest_6  

**Alternative hypothesis ($H_1$):** at least one city's mean interest rate differs from the others

In [None]:
# Use Fisher's one-way ANOVA (F-test)
st.f_oneway(interest_r_pivot['City_1'], interest_r_pivot['City_2'], interest_r_pivot['City_3'], interest_r_pivot['City_4'], interest_r_pivot['City_5'], interest_r_pivot['City_6'])
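
An alternative sketch that avoids the pivot step: group the long-format table by city and pass one array per group to `st.f_oneway`. This also sidesteps the `NaN` padding a pivot would introduce if the cities had different numbers of rows (whether that is the case for `rate_by_city.csv` is not checked here). The data below is synthetic and the column names `City` and `Rate` are reused from the notebook. The F statistic is also computed by hand as the ratio of between-group to within-group variance.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

rng = np.random.default_rng(3)
# Synthetic long-format data with unequal group sizes (illustrative only)
rates = pd.DataFrame({
    "City": np.repeat([1, 2, 3], [8, 10, 12]),
    "Rate": np.concatenate([
        rng.normal(13.0, 1.0, 8),
        rng.normal(13.5, 1.0, 10),
        rng.normal(15.0, 1.0, 12),
    ]),
})

# One array of rates per city, no pivot needed
groups = [g["Rate"].to_numpy() for _, g in rates.groupby("City")]
result = st.f_oneway(*groups)

# F by hand: between-group mean square over within-group mean square
grand_mean = rates["Rate"].mean()
k, n = len(groups), len(rates)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_by_hand = (ss_between / (k - 1)) / (ss_within / (n - k))

print("scipy:  ", result.statistic, result.pvalue)
print("By hand:", f_by_hand)
```

A small p-value only says that at least one group mean differs; it does not say which one.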

# <a id='toc3_'></a>[📖 Self-guided: Hypothesis testing on proportions](#toc0_)

Confidence intervals and hypothesis tests for proportions rest on different assumptions than those for the mean of a continuous variable.

For example, if we look at the distribution of height for people in the US, we assume the population has a normal distribution and we base our statistical tests on that.

However, if we're looking at the proportion of citizens that are taller than 1.65m, instead of looking at a normal distribution, we're looking at a **Bernoulli distribution**:

![](https://imgs.search.brave.com/ZZbB8TH305JKKIEAj7oFVMyUTEBBa4Skk7a7vE4KR5g/rs:fit:860:0:0/g:ce/aHR0cHM6Ly93d3cu/bWF0aHdvcmtzLmNv/bS9oZWxwL2V4YW1w/bGVzL3N0YXRzL3dp/bjY0L0NvbXB1dGVC/ZXJub3VsbGlEaXN0/cmlidXRpb25QZGZF/eGFtcGxlXzAxLnBu/Zw)  
(Source: [Mathworks](https://www.mathworks.com/help/stats/bernoulli-distribution.html))

Because each observation is now a yes/no outcome rather than a continuous measurement, the sample is summarised by a single proportion, and the standard error used for confidence intervals and p-values is derived from that proportion instead of a separately estimated standard deviation. We work with the probabilities of the 2 possible events:
- **p** (probability that height > 1.65m)
- **1-p** (probability that height <= 1.65m)

Suppose we draw a sample of 100 people, 64 of whom are taller than 1.65m, and we hypothesize that 60% of the population is taller than 1.65m:

In [None]:
#import proportions_ztest function
from statsmodels.stats.proportion import proportions_ztest

#perform one proportion z-test
# count = successes observed in the sample, value = hypothesized proportion
proportions_ztest(count=64, nobs=100, value=0.60)

For a significance level of 5%, we cannot reject the null hypothesis.
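
The same test can be sketched by hand for the sample described above (64 out of 100, hypothesized proportion 60%). This version uses the sample proportion in the standard error, which is my understanding of the `proportions_ztest` default (`prop_var=False`). Many textbooks use the hypothesized proportion instead, which gives a slightly different z value.

```python
import numpy as np
import scipy.stats as st

count, nobs, p_0 = 64, 100, 0.60

# Sample proportion and its standard error (based on the sample proportion)
p_hat = count / nobs
standard_error = np.sqrt(p_hat * (1 - p_hat) / nobs)

# z statistic and two-tailed p-value from the standard normal distribution
z = (p_hat - p_0) / standard_error
p_value = 2 * st.norm.sf(abs(z))

# Textbook variant: standard error based on the hypothesized proportion
z_textbook = (p_hat - p_0) / np.sqrt(p_0 * (1 - p_0) / nobs)

print("z:", round(z, 4), "p-value:", round(p_value, 4))
print("z (textbook standard error):", round(z_textbook, 4))
```

Either way the p-value is well above 0.05, so the conclusion at the 5% level is the same.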

For a more in-depth overview of statistical tests for proportions, you can refer to this [series of tutorials from Khan Academy](https://www.khanacademy.org/math/statistics-probability/significance-tests-confidence-intervals-two-samples/comparing-two-proportions/v/comparing-population-proportions-1).

# <a id='toc4_'></a>[Acknowledgements](#toc0_)

Thank you, David Henriques, for your awesome lesson structure and content.

# <a id='toc5_'></a>[References](#toc0_)

[1] [Significance Level, Statistics by Jim](https://statisticsbyjim.com/glossary/significance-level/)