# HYPOTHESIS TESTING

Hypothesis testing is a statistical method for making decisions about a population using experimental (sample) data. A hypothesis is an assumption that we make about a population parameter, and the test tells us whether the observed data are consistent with that assumption.

# Key Terms and Concepts

* **Null hypothesis:** A statistical hypothesis that assumes the observed effect is due to chance alone. It is denoted by H0. For example, H0: μ1 = μ2 states that there is no difference between the two population means.

* **Alternative hypothesis:** Contrary to the null hypothesis, the alternative hypothesis shows that observations are the result of a real effect.

* **Level of significance:** The probability threshold at which we reject the null hypothesis. Since we can never be 100% certain when rejecting or retaining a hypothesis, we select a level of significance in advance, usually 5%.

* **Type I error:** Rejecting the null hypothesis when it is actually true. The probability of a Type I error is denoted by alpha. On the normal curve, the critical (rejection) region is called the alpha region.

* **Type II error:** Failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted by beta. On the normal curve, the acceptance region is called the beta region.

* **Power:** The probability of correctly rejecting a false null hypothesis. It is equal to 1 - beta.

* **One-tailed test:** When the alternative hypothesis specifies a direction, such as H1: μ1 > μ2 or H1: μ1 < μ2, the test is called a one-tailed test.

* **Two-tailed test:** When the alternative hypothesis only states that there is a difference, such as H1: μ1 ≠ μ2, without specifying a direction, the test is called a two-tailed test.
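
To make the distinction between one-tailed and two-tailed tests concrete, here is a minimal sketch that converts a z statistic into p-values. The helper name `p_values_from_z` and the value z = 1.8 are illustrative assumptions, not part of the original material.

```python
from scipy import stats


def p_values_from_z(z):
    """Return (two-tailed, upper-tailed, lower-tailed) p-values for a z statistic."""
    upper = stats.norm.sf(z)                 # P(Z >= z): H1 says the parameter is larger
    lower = stats.norm.cdf(z)                # P(Z <= z): H1 says the parameter is smaller
    two_tailed = 2 * stats.norm.sf(abs(z))   # H1 only says the parameter is different
    return two_tailed, upper, lower


alpha = 0.05
two_tailed, upper, lower = p_values_from_z(1.8)  # illustrative z value
print(f"two-tailed p   = {two_tailed:.4f}, reject H0: {two_tailed < alpha}")
print(f"upper-tailed p = {upper:.4f}, reject H0: {upper < alpha}")
```

With the same data, a directional test can reject the null hypothesis at a given significance level while the two-tailed test does not, which is why the direction must be chosen before looking at the data.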

![](https://www.statisticssolutions.com/wp-content/uploads/2017/12/rachnovblog-768x310.jpg)

![](https://www.simplypsychology.org/type-1-and-2-errors.jpg?ezimgfmt=rs:555x410/rscb20/ng:webp/ngcb20)

# Hypothesis Testing

## One Sample Significance Tests

The purpose of One Sample Significance Tests is to check if a sample of observations could have been generated by a process with a specific mean or proportion.

Some questions that can be answered by one sample significance tests are:
* Is there equal representation of men and women in a particular industry?
* Is the normal human body temperature 98.6 F?

We will try and apply this test to a few real world problems in this notebook.

The Suicide dataset was obtained from Kaggle courtesy Rajanand Illangovan. You can download it here: https://www.kaggle.com/rajanand/suicides-in-india

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from scipy import stats

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Analyzing Suicides in India by Gender

Are men as likely to commit suicide as women?

This is the question we will attempt to answer in this section. To do so, we will use suicide statistics shared by the National Crime Records Bureau (NCRB), Govt of India. To perform this analysis, we need to know the sex ratio in India. The Census 2011 report states that there are 940 females for every 1000 males in India.

Let P denote the fraction of women in India.

If there is no correlation between gender and suicide, then the sex ratio of people committing suicides should closely reflect that of the general population. 
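
The fraction of women implied by the census sex ratio can be worked out directly. The variable names below are assumptions chosen for this sketch.

```python
# Census 2011 figure quoted above: 940 females for every 1000 males.
females_per_1000_males = 940

# Fraction of women in the population, i.e. the proportion under the null hypothesis.
P = females_per_1000_males / (females_per_1000_males + 1000)
print(round(P, 4))  # 0.4845
```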

Let us now get our data into a Pandas dataframe for analysis.

In [None]:
#df = pd.read_csv('https://raw.githubusercontent.com/anntenna/Suicides-In-India/master/Suicides_in_India_2001-2012.csv')

df = pd.read_csv('/content/drive/My Drive/FDP/data/Suicides_in_India_2001-2012.zip')
df.head()

We can see that the number of female suicides is slightly lower than the number of male suicides. There are also fewer females than males in the population. How do we determine whether females are as likely to commit suicide as males? This can be answered through hypothesis testing.

#### Step 1: Formulate the hypothesis and decide on significance level

The null hypothesis, as stated in the slides, is the default state. Therefore, I will state my null and alternate hypotheses as follows.

* **Null Hypothesis (H0)**: Men and women are equally likely to commit suicide.
* **Alternate Hypothesis (H1)**: Men and women are not equally likely to commit suicide.

If the null hypothesis is true, it would mean that the fraction of women committing suicide would be the same as the fraction of women in the general population. We now need to use a suitable statistical test to find out if this is indeed the case.

Our statistical test will generate a p-value, which has to be compared to a significance level ($\alpha$). If the p-value is less than $\alpha$, then data at least this extreme would be very unlikely if the null hypothesis were true, and we would be reasonable in rejecting the null hypothesis. Conversely, if the p-value is higher than $\alpha$, we will not be in a position to reject the null hypothesis.

Let us assume $\alpha$ = 0.05.

#### Step 2: Decide on the Statistical Test

We will be using the One Sample Z-Test here. How to decide upon a test will be discussed in another notebook.

#### Step 3: Compute the p-value

**Standard deviation.** Compute the standard deviation ($\sigma$) of the sampling distribution:

$$\sigma = \sqrt{\frac{P (1 - P)}{n}}$$

**Test statistic.** The test statistic is a z-score ($z$) defined by the following equation:

$$z = \frac{p - P}{\sigma}$$

where $P$ is the hypothesized value of the population proportion in the null hypothesis, $p$ is the sample proportion, $n$ is the sample size, and $\sigma$ is the standard deviation of the sampling distribution.
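
The two formulas can be sketched in code as follows. The function name `one_sample_proportion_ztest` and the counts are hypothetical; they are not taken from the NCRB data and only show the mechanics of the test.

```python
import math

from scipy import stats


def one_sample_proportion_ztest(successes, n, P):
    """Two-tailed one-sample z-test for a proportion.

    successes: observed count in the category of interest
    n:         sample size
    P:         hypothesized population proportion under H0
    """
    p_hat = successes / n                   # sample proportion (p in the formula)
    sigma = math.sqrt(P * (1 - P) / n)      # std. dev. of the sampling distribution
    z = (p_hat - P) / sigma                 # test statistic
    p_value = 2 * stats.norm.sf(abs(z))     # two-tailed p-value
    return z, p_value


# Hypothetical counts for illustration only (NOT the real dataset totals).
female_count, total_count = 360, 1000
P = 940 / (940 + 1000)                      # fraction of women from the census ratio

z, p_value = one_sample_proportion_ztest(female_count, total_count, P)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

With the real dataset the sample size is far larger, so the standard deviation of the sampling distribution shrinks and even a modest gap between the sample proportion and P gives a very large |z|.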

The p value is so small that Python has effectively rounded it to zero.

#### Step 4: Comparison and Decision

The p-value obtained is much lower than our significance level $\alpha$, which is extremely strong evidence against the null hypothesis. We can thus reject the null hypothesis and accept the alternate hypothesis (since it is the negation of the null hypothesis).

**Men and women are not equally likely to commit suicide.**

Note that this test says nothing about whether men are more likely than women to commit suicide or vice versa. It just states that they are not equally likely. The reader is encouraged to form their own hypothesis tests to check these results.

### Analyzing the average heights of NBA Players

I was interested in knowing the average height of NBA players. A quick Google search tells me that the average height of players between 1985-2006 was **6'7"** or 200.66 cm. Is this still the case?

To answer this question, we will be using the NBA Players Stats - 2014-2015 dataset on Kaggle courtesy DrGuillermo. The dataset can be downloaded here: https://www.kaggle.com/drgilermo/nba-players-stats-20142015

In [None]:
df2 = pd.read_csv('/content/drive/My Drive/FDP/data/NBA_player.csv')
df2.head()

#### Hypothesis Testing

One Sample Significance Test for Mean is extremely similar to that for Proportion. We will go through almost an identical process.

The hypotheses are defined as follows:
* **Null Hypothesis**: The average height of an NBA player is 200.66 cm.
* **Alternate Hypothesis**: The average height of an NBA player is not 200.66 cm.

The significance level $\alpha$ is set at 0.05. We assume the null hypothesis to be true.
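
A minimal sketch of a one-sample test for a mean is shown below. It assumes a z-test that uses the sample standard deviation (reasonable for large samples) and runs on synthetic heights rather than the Kaggle file; the function name `one_sample_mean_ztest` and the generated data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def one_sample_mean_ztest(sample, mu0):
    """Two-tailed one-sample z-test for a mean, using the sample std. deviation."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    z = (sample.mean() - mu0) / standard_error
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Synthetic heights in cm, generated for illustration only (NOT the NBA data).
rng = np.random.default_rng(0)
heights_cm = rng.normal(loc=198.0, scale=8.5, size=400)

z, p_value = one_sample_mean_ztest(heights_cm, mu0=200.66)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

For small samples, `scipy.stats.ttest_1samp` computes the same statistic but uses Student's t distribution for the p-value.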

The p-value obtained is much lower than the significance level $\alpha$. We therefore reject the null hypothesis and accept the alternate hypothesis (the negation). We can therefore arrive at the following conclusion from this analysis:

**The average height of NBA Players is NOT 6'7"**.

## Two Sample Significance Tests

In the last section, we saw how one sample significance tests could be used to test if the proportion or the mean of a certain feature of a population is equal to a predefined proportion or mean respectively. In other words, we were comparing a sample with a predefined value.

Two sample significance tests, on the other hand, allow us to compare two different populations and check if there is any meaningful difference in their means or proportions. The steps involved and the tools used are almost identical to the one sample significance test, with one critical difference: the quantity being tested is the difference between the means or proportions of the two populations, and under the null hypothesis this difference is assumed to be zero.
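
For two sample means, one common form of the test statistic is sketched below. The symbols $\bar{x}_i$, $s_i$ and $n_i$ (sample mean, sample standard deviation and size of group $i$) are introduced here for illustration, and the unequal-variance form is an assumption:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The 0 in the numerator is the hypothesized difference between the two population means under the null hypothesis.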

Using two sample significance tests, we can answer questions such as:
* Is there racial discrimination when it comes to recruitment for white collar jobs?
* Is there a pay gap between men and women in the industry? Are women, on average, paid less?
* Do some universities engage in conscious racial discrimination? That is, are they more inclined to accept a student of a particular race as compared to another?

### Analyzing Literacy Rates

In this section, we will try and compare the literacy rates in the major areas of Punjab and Delhi ICT and discern if there is any meaningful difference between the two aforementioned quantities.

To answer this question, we will be using the 'Top 500 Indian Cities' dataset made available on Kaggle courtesy Arijit Mukherjee. The dataset can be found here: https://www.kaggle.com/zed9941/top-500-indian-cities

The equations used in this section are described here: https://www.statsdirect.co.uk/help/parametric_methods/utt.htm

In [None]:
df3 = pd.read_csv('/content/drive/My Drive/FDP/data/cities.csv')
df3.head()

From the above calculations, it can be seen that the mean and the standard deviations of Punjab and Delhi literacy rates differ slightly. The next step is to determine if this difference is a statistically significant one.

For hypothesis testing, the following are defined:

* **Null Hypothesis:** The true mean literacy rates for Punjab and Delhi are the same.
* **Alternate Hypothesis:** The true mean literacy rates for Punjab and Delhi are not the same.

The threshold value of $\alpha$ is assumed to be 0.05. We assume the null hypothesis is true.

Since we are dealing with sample sizes less than 30, using the t-statistic will be more appropriate. To use Student's t though, we need to calculate the degrees of freedom. This is done as follows:
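
A minimal sketch of that calculation, assuming the unequal-variance (Welch) form of the unpaired t-test; the notebook's own calculation may have used a pooled variance instead. The function name `welch_t_test` and the literacy rates below are made up for illustration and are not taken from the cities dataset.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Two-tailed unpaired t-test with Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    var_a = a.var(ddof=1) / a.size          # squared standard error of mean(a)
    var_b = b.var(ddof=1) / b.size          # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (a.size - 1) + var_b ** 2 / (b.size - 1)
    )
    p_value = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p_value


# Hypothetical literacy rates (%) for illustration only.
punjab_rates = [83.4, 79.2, 85.1, 81.7, 80.3, 84.6, 82.0]
delhi_rates = [86.3, 80.9, 88.2, 84.4, 79.8]

t, dof, p_value = welch_t_test(punjab_rates, delhi_rates)
print(f"t = {t:.3f}, degrees of freedom = {dof:.2f}, p-value = {p_value:.3f}")

# Cross-check against SciPy's built-in implementation of the same test.
reference = stats.ttest_ind(punjab_rates, delhi_rates, equal_var=False)
print(reference.statistic, reference.pvalue)
```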

The value of p obtained here is much higher than the significance level $\alpha$. Therefore, we cannot reject the null hypothesis.

**We do not have sufficient evidence that the true mean literacy rates of Punjab and Delhi differ.**

## Chi Squared Significance Test



Now, let us check out the Chi Squared Significance Test. The Chi Square test is used to check if there is a dependency between two ordinal or categorical variables. 

For instance, let's say we want to know the preference of ice cream between men and women. We give them three choices of ice creams and ask them to choose their favorite. Is there a gender preference for a certain type of ice cream? This is something that this test can answer for us.

### Analysing Airbnb Booking Trends by Gender

Do men and women have certain preferences when it comes to traveling to a certain country? To answer this question, we will be using the Airbnb Bookings dataset available on Kaggle. It can be downloaded here: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings

The steps to be followed are the same as above. The only difference is in the statistical method that we are going to use.

For the hypothesis testing, we define the following:

* **Null Hypothesis:** There is no relationship between country preference and the sex of the customer.
* **Alternate Hypothesis:** There is a relationship between country preference and the sex of the customer.

We will assume our significance level, $\alpha$ to be 0.05. Let's begin!

In [None]:
df4 = pd.read_csv('/content/drive/My Drive/FDP/data/train_users_2.csv.zip')
df4.head()

To keep our calculations simple, we'll just be looking at France, Italy and Great Britain. However, please note that this test can theoretically be used for an arbitrarily large number of categories.

Let us now construct the contingency table.
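
One way to build the table and run the test is sketched below on a small synthetic frame. The column names `gender` and `country_destination`, the category codes, and all counts are assumptions for illustration; they should be checked against the real file before reuse.

```python
import pandas as pd
from scipy import stats

# Synthetic bookings for illustration only (NOT the Airbnb data).
cell_counts = {
    ("FEMALE", "FR"): 30, ("FEMALE", "GB"): 20, ("FEMALE", "IT"): 25,
    ("MALE", "FR"): 22, ("MALE", "GB"): 28, ("MALE", "IT"): 15,
}
rows = [
    {"gender": gender, "country_destination": country}
    for (gender, country), count in cell_counts.items()
    for _ in range(count)
]
toy_bookings = pd.DataFrame(rows)

# Contingency table: one row per gender, one column per destination.
table = pd.crosstab(toy_bookings["gender"], toy_bookings["country_destination"])
print(table)

# Chi-squared test of independence on the observed counts.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.3f}")
```

The degrees of freedom equal (rows - 1) x (columns - 1), so a 2 x 3 table gives 2.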

The p-value that we have obtained is less than our chosen significance level. Therefore, we reject the null hypothesis and accept the alternate hypothesis.

**There is a relationship between country preference and the sex of the customer.**

Note that had we chosen our $\alpha$ to be 0.01, then we wouldn't have been able to reject our null hypothesis. Choosing the value of $\alpha$ depends on your relative tolerance of Type I and Type II error. We will not discuss this in this talk.

## Using Different Tests to arrive at the same result

### Racial Discrimination in Industry Recruitments

In this section, we are going to check for racial discrimination using two different tests: The test of two proportions (the two sample significance test) and the Chi Square significance test. We will also compare the statistical results arrived at by both these tests (specifically, the p-value).

The question we are trying to answer is if there is racial discrimination when it comes to giving callbacks to candidates. The first step, as usual, is to formulate our hypothesis.

* **Null Hypothesis:** There is no relation between race and callbacks
* **Alternate Hypothesis:** There is a relationship between race and callbacks

We are going to assume that the null hypothesis is true. Also, the significance level $\alpha$ is assumed to be 10% or 0.1.

In [None]:
df5 = pd.io.stata.read_stata('/content/drive/My Drive/FDP/data/us_job_market_discrimination.dta')
df5.head()

#### Chi Square Test

The p-value obtained is much lower than our significance level. Therefore, we reject the null hypothesis and accept the alternate hypothesis.

**There is a relationship between race and callbacks.**

#### Two Sample Significance Test

Let us now do the same hypothesis testing using the two sample significance test for proportions.
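
The sketch below runs a pooled two-proportion z-test next to a chi-squared test on the same 2 x 2 table. The callback counts are hypothetical and are not taken from the `.dta` file; `two_proportion_ztest` is a helper name introduced here. The chi-squared call assumes `correction=False`, because SciPy applies Yates' continuity correction to 2 x 2 tables by default, which would change the p-value.

```python
import math

import numpy as np
from scipy import stats


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-tailed z-test for the difference of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion under H0
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error          # H0: difference of proportions is 0
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Hypothetical callback counts for two groups of applicants (illustration only).
callbacks_a, applicants_a = 120, 1000
callbacks_b, applicants_b = 90, 1000

z, p_z = two_proportion_ztest(callbacks_a, applicants_a, callbacks_b, applicants_b)

# Same data as a 2 x 2 table: rows are groups, columns are callback / no callback.
table = np.array([
    [callbacks_a, applicants_a - callbacks_a],
    [callbacks_b, applicants_b - callbacks_b],
])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table, correction=False)

print(f"z^2 = {z ** 2:.6f}, chi2 = {chi2:.6f}")
print(f"p (z-test) = {p_z:.6f}, p (chi-squared) = {p_chi2:.6f}")
```

For a 2 x 2 table the squared z statistic equals the uncorrected chi-squared statistic, which is why the two p-values agree.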

We notice that the p-value arrived at is **identical** to what we got with the Chi Squared Independence test. Here again, obviously, we reject the null hypothesis and accept the alternate hypothesis. We can conclude that at a 0.1 significance level, there is a relationship between race and callbacks. 

When we are dealing with binary categorical variables, we have a choice between using the two aforementioned tests. We will arrive at (near) identical results.