## Hypothesis testing

A hypothesis is an assumption about the nature or the values of the parameters of a population. We assume something to be true and then check that assumption with a statistical test. For example, if we have a group of people and we think that the average weight of the group is 70 kg, we would conduct a test to see whether the data are consistent with this assumption.

The basic question that hypothesis testing tries to answer is: "Did our sample come from the population assumed under the null hypothesis?"

## Parametric Vs Non-parametric tests

**Parametric** tests make assumptions about the distribution of the underlying population, while **non-parametric** tests do not rely on the assumption that the data are drawn from a given probability distribution. The Z-test, t-test and F-test are all parametric, while the chi-square test is non-parametric.

## Basics of hypothesis testing

### Null and alternative hypothesis

A null hypothesis is the status quo or the assumption that we want to check. Even though the information is only available from a sample, the hypothesis is stated in terms of population parameters. It is denoted by $H_{0}$. It is usually associated with terms like **status quo/assumption/given** and always contains an **equality sign (equal to, greater than or equal to, or less than or equal to)**.

The alternative hypothesis is always specified along with the null hypothesis and is denoted by $H_{1}$. It represents the conclusion reached by rejecting the null hypothesis. It is usually associated with terms like **unknown/claim/assertion** and does not contain the **equality sign (not equal to, greater than, or less than)**.

In hypothesis testing, failure to reject the null hypothesis is not proof that it is true. We can never prove that the null hypothesis is correct because the decision is based only on the sample information, not on the entire population. The term used is "we fail to reject the null hypothesis" and not "we accept the null hypothesis".

If the sample statistic is close enough to the population parameter, we have insufficient evidence to reject the null hypothesis.

### Critical value of test-statistic

In our example of the weights of a group of people, if we pick a sample and calculate its average weight, we should expect the sample average to be close to the average weight of the entire group. In that case, we would have insufficient evidence to reject the null hypothesis, as such a sample is likely to come from our population. However, if there is a large difference between the sample statistic and the hypothesized value of the population parameter, we might want to reject the null hypothesis.

The critical value of the test statistic helps us differentiate between the two situations: it is the value that separates the rejection region from the non-rejection region. To find it, we determine the sampling distribution of the sample statistic under the null hypothesis, which often follows a well-known statistical distribution, and use it to calculate the probability of getting a particular sample result if the null hypothesis is true.
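As a minimal sketch of how a critical value is used, the snippet below applies a two-tailed z-test to the weight example. The standard deviation, sample size and sample mean are hypothetical values chosen for illustration; only the hypothesized mean of 70 kg comes from the text.

```python
import numpy as np
from scipy import stats

# Hypothetical numbers for the weight example (only mu_0 comes from the text)
mu_0 = 70.0      # hypothesized population mean weight in kg
sigma = 10.0     # assumed known population standard deviation
n = 40           # assumed sample size
x_bar = 73.5     # assumed observed sample mean

alpha = 0.05
# Two-tailed critical value: cuts off alpha/2 in each tail of the standard normal
z_crit = stats.norm.ppf(1 - alpha / 2)

# Standardized distance between the sample mean and the hypothesized mean
z_stat = (x_bar - mu_0) / (sigma / np.sqrt(n))

# Reject H0 when the statistic falls outside [-z_crit, +z_crit]
reject_null = abs(z_stat) > z_crit
print(z_crit, z_stat, reject_null)
```

With these assumed numbers the statistic lies beyond the critical value, so the sample would fall in the rejection region.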

### Regions of rejection(or critical region) and non-rejection

![rejectionregion.png](attachment:rejectionregion.png)

The sampling distribution of the test statistic can be divided into two regions: the region of rejection and the region of non-rejection. The critical value is used to make this division. If the test statistic falls into the region of rejection, we reject the null hypothesis. In other words, the rejection region consists of the values of the test statistic that are unlikely to occur if the null hypothesis is true.

The size of the critical region depends on the amount of risk of wrongly rejecting the null hypothesis that we are willing to take.

### Type 1 and type 2 errors

**Type 1 error** – you reject the null hypothesis when it should not be rejected. Probability of type 1 error is called $\alpha$ or level of significance. The complement of probability of type 1 error or (1 – $\alpha$) is called confidence coefficient. This is the probability that you will not reject the null hypothesis when it is true and should not be rejected.


**Type 2 error** – you do not reject the null hypothesis when it should have been rejected. Probability of type 2 error is called $\beta$. The complement of probability of type 2 error or (1 – $\beta$) is called power of a statistical test. This is the probability of rejecting the null hypothesis when it is false and should be rejected.

As $\alpha$ decreases, the chance of a type 1 error decreases and the critical values move outwards. However, as the critical values move outwards, the non-rejection region may also capture a sample mean that actually comes from a different population, so the type 2 error increases.

The type 1 error can be directly controlled by choosing the value of $\alpha$, but for a fixed sample size, reducing the type 1 error by taking a smaller $\alpha$ increases the type 2 error. The choice of reasonable values of $\alpha$ and $\beta$ depends on the cost and impact associated with each type of error.
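The trade-off between the two errors can be sketched numerically. The example below assumes a right-tailed z-test with a hypothetical true mean, standard deviation and sample size (none of these numbers come from the text) and computes $\beta$ for a few values of $\alpha$.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: right-tailed z-test of H0: mu = 70 when the true mean is 73
mu_0, mu_true = 70.0, 73.0
sigma, n = 10.0, 40
se = sigma / np.sqrt(n)   # standard error of the sample mean


def type2_error(alpha):
    """Probability of not rejecting H0 when the true mean is mu_true."""
    # Sample means above this cutoff fall in the rejection region
    cutoff = mu_0 + stats.norm.ppf(1 - alpha) * se
    # beta = P(sample mean <= cutoff) under the true distribution
    return stats.norm.cdf(cutoff, loc=mu_true, scale=se)


betas = {alpha: type2_error(alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Under these assumptions, each reduction in $\alpha$ pushes the cutoff further out and raises $\beta$, which lowers the power of the test.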

![type1vstype2.jpg](attachment:type1vstype2.jpg)

### One tailed test Vs two-tailed test

If the null hypothesis uses the = sign and the alternative hypothesis uses the ≠ sign, it is a two-tailed test, and the rejection region is split equally between the two tails. If the alternative hypothesis uses < or >, it is a one-tailed test: left-tailed for < and right-tailed for >.
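As an illustration, the critical values for the three cases can be computed from the standard normal distribution. The 5% significance level here is an assumed value for the sketch.

```python
from scipy import stats

alpha = 0.05  # assumed significance level

# Two-tailed: alpha is split equally between both tails
two_tailed = (stats.norm.ppf(alpha / 2), stats.norm.ppf(1 - alpha / 2))

# Left-tailed (H1 uses "<"): reject when the statistic is below this value
left_tailed = stats.norm.ppf(alpha)

# Right-tailed (H1 uses ">"): reject when the statistic is above this value
right_tailed = stats.norm.ppf(1 - alpha)

print(two_tailed, left_tailed, right_tailed)
```

At the same $\alpha$, the one-tailed critical value is closer to zero than the two-tailed one because the whole rejection area sits in a single tail.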

![1tailedvs2tailed.png](attachment:1tailedvs2tailed.png)


### p-value

Sampling (or statistics) can never be perfect. We will always come across some unusual samples that give an incorrect impression, e.g. one random sample with a mean weight much lower than the actual mean weight does not by itself mean that there is a problem with the overall weights.

The p-value is the probability of getting a test statistic equal to or more extreme than the one actually observed, given that the null hypothesis is true.

If the p-value is less than the chosen level of significance, we reject the null hypothesis.

Note that a p-value is not the probability that the null hypothesis is true, nor the probability that the results occurred by chance alone. It measures how surprising the sample data would be if the null hypothesis were true.
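A short sketch of how a p-value is obtained from a z statistic is shown below. The value of `z` is hypothetical; `norm.sf` is the survival function, i.e. 1 - CDF.

```python
from scipy import stats

z = 2.1        # hypothetical observed z statistic
alpha = 0.05   # assumed significance level

p_right = stats.norm.sf(z)               # P(Z >= z), right-tailed test
p_left = stats.norm.cdf(z)               # P(Z <= z), left-tailed test
p_two_sided = 2 * stats.norm.sf(abs(z))  # both tails, two-tailed test

reject_two_sided = p_two_sided < alpha
print(p_right, p_left, p_two_sided, reject_two_sided)
```

Which of the three values applies depends on the alternative hypothesis, so the tail should be fixed before looking at the data.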

### PDF Vs CDF Vs PPF

1. *Probability Density Function (PDF)*: Returns the **probability density** at a **specific** value of the distribution. For a continuous distribution this is the height of the density curve, not a probability.

2. *Cumulative Distribution Function (CDF)*: Returns the **probability** of an observation being **less than or equal to** a specific value from the distribution.

3. *Percent Point Function (PPF)*: The inverse of the CDF. Returns the **observation value** at or below which the provided cumulative probability lies.
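The three functions can be compared on a standard normal distribution using `scipy.stats`. This is a small illustrative sketch; the evaluation points are arbitrary.

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)   # standard normal distribution

density = dist.pdf(0.0)          # height of the density curve at 0, not a probability
prob_below = dist.cdf(1.96)      # P(X <= 1.96)
quantile = dist.ppf(prob_below)  # inverse of the CDF, recovers 1.96

print(density, prob_below, quantile)
```

Because `ppf` inverts `cdf`, it is the function used to look up critical values for a given significance level.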

### PMF vs PDF

**PMF** - Probability mass function refers to **discrete** probabilities. Specifically p(x) is the probability the random variable equals x.

**PDF** - Probability density function refers to **continuous** probabilities. Since the probability of any single specific value occurring is 0, the probability that the random variable lies between a and b is the integral of the pdf from a to b.
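The difference can be illustrated with a binomial (discrete) and a normal (continuous) distribution. The distributions and values below are arbitrary choices for the sketch.

```python
from scipy import integrate, stats

# Discrete: probability of exactly 3 heads in 10 tosses of a fair coin
p_exact = stats.binom.pmf(3, n=10, p=0.5)

# Continuous: integrate the pdf over an interval to get a probability
area, _ = integrate.quad(stats.norm.pdf, -1.0, 1.0)

# The same probability obtained from the CDF
p_interval = stats.norm.cdf(1.0) - stats.norm.cdf(-1.0)

print(p_exact, area, p_interval)
```

The integral of the pdf and the difference of CDF values agree, which is why the CDF is normally used in practice instead of numerical integration.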

## Z-test for the mean

### What is a z-score?

A z-score tells us how many standard deviations a specific point is from the mean. It can be looked up in a Z-table, which gives the cumulative probability of a standard normal distribution up to a specific z-score.
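As a quick sketch, the z-score and the corresponding Z-table lookup can be computed as follows. The mean, standard deviation and observation are hypothetical values.

```python
from scipy import stats

mu, sigma = 70.0, 10.0   # hypothetical population mean and standard deviation
x = 85.0                 # hypothetical observation

z = (x - mu) / sigma             # number of standard deviations from the mean
cum_prob = stats.norm.cdf(z)     # what a Z-table would report for this z-score

print(z, cum_prob)
```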

### Z-test

The Z-test for the mean is used when we know the population standard deviation $\sigma$ and the population is normally distributed. If the population is not normally distributed, we can still use the z-test if the sample size is large enough (Central Limit Theorem); a common rule of thumb is a sample size greater than 30. It is not used extensively in industry because the population standard deviation is usually not known.

Below is the equation for the test statistic for the z-test

![ztest.png](attachment:ztest.png)
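For reference, the standard textbook form of the one-sample z statistic is written out here; it is assumed to match the figure. $\bar{X}$ is the sample mean, $\mu_{0}$ the hypothesized population mean, $\sigma$ the known population standard deviation and $n$ the sample size.

$$
Z = \frac{\bar{X} - \mu_{0}}{\sigma / \sqrt{n}}
$$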

Look for examples in a different notebook. 

## t-test for the mean

**The t-test is used when the population variance is not known. Some literature also says that the t-test should be used when n < 30. As the sample size increases, the t-distribution approaches the z-distribution or standard normal distribution.**

The t score is a ratio between the difference between two groups and the difference within the groups. The larger the t score, the more difference there is between groups. The smaller the t score, the more similarity there is between groups.

A t-distribution is similar to the z-distribution except that it has heavier tails and hence a larger standard deviation. As the sample size increases, it approaches the normal distribution.

Below is the equation for the test statistic

![t-test.png](attachment:t-test.png)![tdistribution.png](attachment:tdistribution.png)

### One sample t-test

A one-sample t-test checks whether a sample mean differs significantly from a known or hypothesized population mean. In this case the null hypothesis is that the population from which the sample is drawn has a mean equal to the hypothesized value.
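To make the mechanics concrete, the sketch below computes the one-sample t statistic by hand and compares it with `scipy.stats.ttest_1samp`. The sample is randomly generated and the hypothesized mean is an assumed value, so the numbers are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=52.0, scale=5.0, size=25)  # hypothetical sample
mu_0 = 50.0                                        # hypothesized population mean

n = sample.size
s = sample.std(ddof=1)   # sample standard deviation (n - 1 in the denominator)

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample.mean() - mu_0) / (s / np.sqrt(n))
# Two-tailed p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(sample, popmean=mu_0)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```

The manual values and the SciPy result should agree, since `ttest_1samp` performs a two-tailed test by default.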

#### Example 1

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math

In [30]:
np.random.seed(10)

x_pop = stats.poisson.rvs(loc=20, mu=30, size=100000)

x_sample = stats.poisson.rvs(loc=20, mu=30, size = 25)

print( x_pop.mean() )
print( x_sample.mean() )

49.99248
51.2


In [31]:
# One-sample t-test; compare the p-value against a 5% significance level
stats.ttest_1samp(a= x_sample,               # Sample data
                 popmean= x_pop.mean())      # Pop mean

Ttest_1sampResult(statistic=1.3046331164585383, pvalue=0.20439015819094758)

The p-value (about 0.20) is higher than the 5% significance level, i.e. the test statistic lies in the non-rejection region, so we fail to reject the null hypothesis in this case.

#### Example 2

A company that manufactures chocolate bars is particularly concerned that the mean weight of a chocolate bar is not greater than 6.03 ounces. A sample of 50 chocolate bars is selected. The sample mean is 6.034 ounces and the sample standard deviation is 0.02 ounces. Using $\alpha$ = 0.01 level of significance, is there evidence that the population mean weight of the chocolate bars is greater than 6.03 ounces? 

Null hypothesis($H_{0}$) - mean <= 6.03               
Alternative hypothesis($H_{1}$) - mean > 6.03

In [45]:
x_bar = 6.034
mu_0 = 6.03
s = .02
n = 50

test_stat = (x_bar - mu_0)/(s/np.sqrt(n))
test_stat

1.4142135623729393

In [46]:
# Critical value
stats.t.ppf(q=0.99,  # Quantile to check
            df=49)  # Degrees of freedom

2.4048917596601207

Since the test statistic is less than the critical value and this being a right tailed test, the point lies in the non-rejection region and thus we fail to reject the null hypothesis.
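The same conclusion can be reached through the p-value instead of the critical value. The sketch below reuses the numbers from the example and takes the right-tail probability of the t-distribution with 49 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Values from the chocolate bar example
x_bar, mu_0, s, n = 6.034, 6.03, 0.02, 50
alpha = 0.01

t_stat = (x_bar - mu_0) / (s / np.sqrt(n))

# Right-tailed test: probability of a statistic at least this large under H0
p_value = stats.t.sf(t_stat, df=n - 1)

reject_null = p_value < alpha
print(t_stat, p_value, reject_null)
```

The p-value is larger than $\alpha$ = 0.01, which matches the critical value approach: we fail to reject the null hypothesis.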

### Two sample t-test

A two sample test compares the means of two independent samples. The null hypothesis states that the mean for two groups is the same.

In [39]:
np.random.seed(10)

x_s1 = stats.poisson.rvs(loc=20, mu=30, size=10000)

x_s2 = stats.poisson.rvs(loc=20, mu=25, size=25)

# the two sample means differ because the samples were drawn with different mu values
print( x_s1.mean() )
print( x_s2.mean() )

49.9609
44.44


In [41]:
stats.ttest_ind(a= x_s1,
                b= x_s2)

Ttest_indResult(statistic=5.036463908474603, pvalue=4.824696564657397e-07)

With such a small p-value we can go ahead and reject the null hypothesis or we can infer that the two samples have different means.
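By default `ttest_ind` assumes that both groups have the same variance. The two samples here have very different sizes and were generated with different `mu` values, so a reasonable cross-check is Welch's t-test, which drops the equal-variance assumption via `equal_var=False`. This is a suggested variation, not part of the original example.

```python
import numpy as np
from scipy import stats

np.random.seed(10)
x_s1 = stats.poisson.rvs(loc=20, mu=30, size=10000)
x_s2 = stats.poisson.rvs(loc=20, mu=25, size=25)

# Standard two-sample t-test: assumes equal population variances
pooled = stats.ttest_ind(x_s1, x_s2)

# Welch's t-test: does not assume equal population variances
welch = stats.ttest_ind(x_s1, x_s2, equal_var=False)

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

If both versions lead to the same decision, the conclusion does not hinge on the equal-variance assumption.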

### Paired t-test

This is used for related populations, e.g. measurements taken on the same subjects before and after a treatment. If we assume that the difference scores are randomly and independently selected from a population that is normally distributed, we can use the paired t-test for the mean difference to determine whether there is a significant population mean difference. The null hypothesis in this case is that the mean difference between the paired observations is zero.

Below is the equation for the test statistic

![pairedt-test.jpg](attachment:pairedt-test.jpg)

In [43]:
np.random.seed(1)

before= stats.norm.rvs(scale=30, loc=20, size=25)

after = before + stats.norm.rvs(scale=10, loc=18, size=25)

df = pd.DataFrame({"before":before,
                          "after":after,
                          "change":after-before})

df.describe()             # Check a summary of the data

| | after | before | change |
|---|---|---|---|
| count | 25.0 | 25.0 | 25.0 |
| mean | 37.235105 | 19.618103 | 17.617002 |
| std | 37.046429 | 32.944143 | 8.673528 |
| min | -37.962768 | -49.046161 | 6.826897 |
| 25% | 8.292066 | -2.836207 | 11.128273 |
| 50% | 39.119116 | 14.827154 | 16.081644 |
| 75% | 71.19771 | 47.025678 | 20.344157 |
| max | 86.376818 | 72.344353 | 39.002551 |


In [44]:
stats.ttest_rel(a = before,
                b = after)

Ttest_relResult(statistic=-10.155614836901899, pvalue=3.6326356791646245e-10)

The p-value is much smaller than our significance level, so we can reject the null hypothesis and infer that the mean difference between the paired observations is not zero. The statistic is negative because the differences are computed as `before - after`. This is a two-tailed test; for a one-tailed test at the same significance level, the two-tailed p-value can be halved, provided the test statistic points in the direction of the alternative hypothesis.
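A paired t-test is equivalent to a one-sample t-test on the differences, which the sketch below checks using the same simulated data. It also derives a one-tailed p-value for the alternative that `after` is larger than `before` on average; that choice of alternative is an assumption made for illustration.

```python
import numpy as np
from scipy import stats

np.random.seed(1)
before = stats.norm.rvs(scale=30, loc=20, size=25)
after = before + stats.norm.rvs(scale=10, loc=18, size=25)

paired = stats.ttest_rel(before, after)

# Equivalent formulation: one-sample t-test of the differences against 0
diff_test = stats.ttest_1samp(before - after, popmean=0)

# One-tailed p-value for H1: mean(after - before) > 0.
# Halving is valid only when the statistic points in the hypothesized
# direction (negative here, because the difference is before - after).
if paired.statistic < 0:
    p_one_tailed = paired.pvalue / 2
else:
    p_one_tailed = 1 - paired.pvalue / 2

print(paired.statistic, diff_test.statistic, p_one_tailed)
```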

## ANOVA

ANOVA or analysis of variance is used to compare three or more groups or samples using a single test. With a one-way ANOVA, you have one independent variable affecting a dependent variable.

The F-statistic is the test statistic used for ANOVA. It is the ratio of the between-group variance to the within-group variance.

It essentially answers the question: do any of the group means differ from one another?

![anova.jpg](attachment:anova.jpg)

Null hypothesis ($H_{0}$): Group means are equal (no variation in the means of the groups, or variation only because of random sampling)  
Alternative hypothesis ($H_{1}$): At least one group mean is different from the other groups

### Example 1

In [40]:
df = pd.read_csv('anova_data.csv')
df

| | A | B | C | D |
|---|---|---|---|---|
| 0 | 25 | 45 | 30 | 54 |
| 1 | 30 | 55 | 29 | 60 |
| 2 | 28 | 29 | 33 | 51 |
| 3 | 36 | 56 | 37 | 62 |
| 4 | 29 | 40 | 27 | 73 |


In [41]:
df['A'].mean(),df['B'].mean(),df['C'].mean(),df['D'].mean()

(29.6, 45.0, 31.2, 60.0)

In [42]:
df['A'].var(),df['B'].var(),df['C'].var(),df['D'].var()

(16.3, 125.5, 15.2, 72.5)

In [43]:
stats.f_oneway(df['A'], df['B'], df['C'], df['D'])

#fvalue, pvalue = stats.f_oneway(df['A'], df['B'], df['C'], df['D'])
#print(fvalue, pvalue)

F_onewayResult(statistic=17.492810457516338, pvalue=2.639241146210922e-05)

Given that we are getting such a low p-value and a high F-statistic, we can safely reject the null hypothesis or infer that at least one group mean is different from other groups.
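The F-statistic can also be reproduced by hand as the ratio of the between-group to the within-group mean squares. The sketch below hard-codes the values from the table above instead of reading `anova_data.csv`, so it runs on its own.

```python
import numpy as np
from scipy import stats

# Values copied from the data table in this example
groups = {
    "A": [25, 30, 28, 36, 29],
    "B": [45, 55, 29, 56, 40],
    "C": [30, 29, 33, 37, 27],
    "D": [54, 60, 51, 62, 73],
}
data = [np.array(values, dtype=float) for values in groups.values()]

k = len(data)                           # number of groups
n_total = sum(g.size for g in data)     # total number of observations
grand_mean = np.concatenate(data).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in data)
# Within-group sum of squares: spread of observations around their group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in data)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)
print(f_manual, p_manual)
```

The result matches the output of `stats.f_oneway`, with 3 and 16 degrees of freedom for the numerator and denominator.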

## Proportion hypothesis test

This is used when we want to test the proportion of events of interest in the population, $p_{0}$, rather than the mean. We collect a random sample and calculate the sample proportion. This value can then be compared against the hypothesized value for the population.

If the number of events of interest and the number of events not of interest are both at least 5, we can assume that the sampling distribution of the proportion approximately follows a normal distribution.

Below is the equation for the test statistic

![proportionz-test.png](attachment:proportionz-test.png)

### Example 1

In a large sample of 500 items manufactured by a company, the number of defective items was observed to be 20. The purchaser claims that 5% of the items are defective. Is the claim justified? Test it with 5% level of significance.

In [44]:
import numpy as np
p_pop = 0.05
n = 500
p_sample = 20/500

$H_{0}$ can be stated as p_pop is greater than or equal to 0.05        
$H_{1}$ can be stated as p_pop is less than 0.05

In [45]:
se = np.sqrt(p_pop*(1-p_pop)/n)
se

0.009746794344808964

In [46]:
test_stat = (p_sample - p_pop)/se
test_stat

-1.0259783520851542

In [47]:
z_crit = stats.norm.ppf(0.05)
z_crit

-1.6448536269514729

Given that the test statistic (-1.03) is greater than the critical value (-1.64) in this left-tailed test, the point lies in the non-rejection region, so we cannot reject the null hypothesis.

### Example 2

Suppose 470 heads were obtained in 1000 throws of a coin. Test the hypothesis that the coin is fair. Use a 5% level of significance.

In [1]:
p_pop = 0.5
n = 1000
p_sample = 470/1000

$H_{0}$ can be stated as p_pop is equal to 0.5        
$H_{1}$ can be stated as p_pop is not equal to 0.5

In [9]:
import numpy as np
import scipy.stats
se = np.sqrt(p_pop*(1-p_pop)/n)
se

0.015811388300841896

In [10]:
test_stat = (p_sample - p_pop)/se
test_stat

-1.8973665961010293

In [12]:
z_crit_lower = scipy.stats.norm.ppf(0.025)
z_crit_lower

-1.9599639845400545

In [14]:
z_crit_upper = scipy.stats.norm.ppf(0.975)
z_crit_upper

1.959963984540054

With 5% level of significance, the non-rejection region would lie between -1.96 and +1.96. Our test statistic has a value of -1.9 which is inside the non-rejection region. Thus we fail to reject the null hypothesis.
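Both proportion examples follow the same steps, so they can be wrapped in a small helper that also returns a p-value. `proportion_ztest` is a hypothetical function written for this sketch, not a SciPy API.

```python
import numpy as np
from scipy import stats


def proportion_ztest(successes, n, p_0, tail="two-sided"):
    """One-sample z-test for a proportion (hypothetical helper)."""
    p_hat = successes / n
    # Standard error under H0 uses the hypothesized proportion p_0
    se = np.sqrt(p_0 * (1 - p_0) / n)
    z = (p_hat - p_0) / se
    if tail == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z))
    elif tail == "left":
        p_value = stats.norm.cdf(z)
    elif tail == "right":
        p_value = stats.norm.sf(z)
    else:
        raise ValueError("tail must be 'two-sided', 'left' or 'right'")
    return z, p_value


# Example 1: 20 defective items out of 500, H1: p < 0.05
z_defects, p_defects = proportion_ztest(20, 500, 0.05, tail="left")
# Example 2: 470 heads out of 1000 throws, H1: p != 0.5
z_coin, p_coin = proportion_ztest(470, 1000, 0.5)

print(z_defects, p_defects)
print(z_coin, p_coin)
```

Both p-values are above 0.05, which agrees with the critical value approach used in the two examples.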

## Confidence intervals

A point estimate gives a good idea about the population parameter, but since estimates are prone to error, a confidence interval gives a range of plausible values for the true population parameter at some predetermined confidence level.

A confidence interval is created by adding and subtracting a margin of error from the point estimate to create a range. The margin of error depends on the confidence level selected. For a 95% confidence level, the interpretation is that if we repeatedly draw samples and build an interval from each one, about 95% of those intervals would contain the population mean, e.g. out of 100 samples, roughly 95 of the resulting intervals would contain it.

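The margin of error is defined as

$z \cdot \frac{s}{\sqrt{n}}$

where $z$ is the critical value for the chosen confidence level, $s$ is the standard deviation and $n$ is the sample size.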
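The interpretation of the confidence level can be checked with a small simulation: draw many samples, build a 95% interval from each, and count how often the interval contains the true mean. The population parameters, sample size and number of trials below are assumed values for this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n = 50.0, 8.0, 100   # hypothetical population mean, std and sample size

z_crit = stats.norm.ppf(0.975)            # critical value for a 95% interval
margin = z_crit * sigma / np.sqrt(n)      # margin of error with known sigma

n_trials = 10_000
# Each row is one sample of size n; take the mean of every sample
sample_means = rng.normal(mu, sigma, size=(n_trials, n)).mean(axis=1)

# An interval (mean - margin, mean + margin) covers mu when |mean - mu| <= margin
covered = np.abs(sample_means - mu) <= margin
coverage = covered.mean()
print(coverage)
```

The observed coverage should be close to 0.95, with small deviations due to the finite number of trials.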

### Example 1

In [53]:
np.random.seed(10)
x = stats.poisson.rvs(loc=20, mu=35, size=100000)

sample_size = 100
sample = np.random.choice(a = x,size = sample_size)
sample_mean = sample.mean()

z_crit = stats.norm.ppf(0.975)

s = x.std()

margin_of_error = z_crit * s/np.sqrt(sample_size)
conf_int = (sample_mean - margin_of_error,sample_mean + margin_of_error)
conf_int

(53.46093327866088, 55.77906672133911)

### Example 2

Let's take a smaller sample size and create a confidence interval using the t-distribution. When using the t-distribution, we have to supply the degrees of freedom, which here is simply the number of observations - 1.

In [54]:
sample_size = 25
sample = np.random.choice(a = x,size = sample_size)
sample_mean = sample.mean()

t_crit = stats.t.ppf(q = 0.975,df = sample_size -1)


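# Note: np.std defaults to ddof=0; sample.std(ddof=1) gives the usual
# sample standard deviation (n - 1 in the denominator)
sample_std = sample.std()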
sigma = sample_std/np.sqrt(sample_size)

margin_of_error = t_crit * sigma
conf_int = (sample_mean - margin_of_error,sample_mean + margin_of_error)
conf_int

(52.23332667018051, 58.006673329819485)

In [55]:
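# Confidence interval using scipy's built-in helper.
# The confidence level is passed positionally because the keyword was
# renamed from `alpha` to `confidence` in newer SciPy releases.
stats.t.interval(0.95, df = sample_size - 1, loc = sample_mean,scale = sigma)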

(52.23332667018051, 58.006673329819485)