## T-test, Chi-Square-test, ANOVA test
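### p-value and significance level
* We draw a conclusion about a population from a sample dataset. To do so, we state two competing claims: the Null Hypothesis and the Alternate Hypothesis.
* The p-value is the probability of seeing a result at least as extreme as the observed one if the Null Hypothesis is true. We compare it with a chosen significance level (alpha, commonly 0.05).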
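
To make the role of the significance level concrete, here is a minimal simulation sketch. The population (normal, mean 50, standard deviation 10), the sample size, and the seed are all assumptions chosen for illustration. Every sample is drawn from a population where the Null Hypothesis is true, so the share of p-values below 0.05 should land near 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# 2000 samples of size 30, all drawn from a population whose true mean is 50,
# so the null hypothesis "mean == 50" holds for every sample.
samples = rng.normal(loc=50, scale=10, size=(2000, 30))

# One one-sample t-test per row (axis=1), each against the true mean.
_, p_values = stats.ttest_1samp(samples, popmean=50, axis=1)

# Fraction of tests that would (wrongly) reject at alpha = 0.05.
false_positive_rate = np.mean(p_values < 0.05)
print(f"Share of p-values below 0.05: {false_positive_rate:.3f}")
```

In other words, alpha is the rate of false rejections we agree to tolerate when the Null Hypothesis is actually true.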

* Chi-square test: comparing 2 categorical features
* T-test: one numerical feature and one categorical feature with 2 categories
* ANOVA test: one numerical feature and one categorical feature with more than 2 categories
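
The rest of this notebook only works through the t-test, so here is a small sketch of the other two tests from the list. The contingency table and the three groups are made-up numbers, used only to show the shape of the calls to `scipy.stats.chi2_contingency` and `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Chi-square test: two categorical features summarized as a contingency table.
# Rows and columns stand for hypothetical categories; the counts are made up.
observed = np.array([[30, 10],
                     [20, 40]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square p-value: {chi2_p}, degrees of freedom: {dof}")

# One-way ANOVA: one numerical feature measured in three categories (made-up values).
group_a = [20, 21, 19, 22, 20]
group_b = [30, 31, 29, 32, 30]
group_c = [40, 41, 39, 42, 40]
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA p-value: {anova_p}")
```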

## T Test

A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups, which may be related in certain features.

The t-test has 2 main types: 1. one-sample t-test 2. two-sample t-test.

#### One-sample t-test

In [2]:
ages=[10,20,35,50,28,40,55,18,16,55,30,25,43,18,30,28,14,
      24,16,17,32,35,26,27,65,18,43,23,21,20,19,70]
len(ages)

32

In [3]:
import numpy as np
ages_mean=np.mean(ages)
print(ages_mean)

30.34375


In [6]:
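# Let's take a sample. np.random.choice draws with replacement by default,
# and no seed is set here, so the values differ from run to run.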
sample_size=10
age_sample=np.random.choice(ages,sample_size)
print(age_sample)

[21 26 55 40 24 65 30 14 30 55]
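
The sample above changes on every run because no seed is set. As a sketch of one way to make the draw reproducible, the snippet below uses NumPy's `default_rng` with an arbitrarily chosen seed (an assumption, not part of the original cell) and shows both sampling with and without replacement.

```python
import numpy as np

ages = [10,20,35,50,28,40,55,18,16,55,30,25,43,18,30,28,14,
        24,16,17,32,35,26,27,65,18,43,23,21,20,19,70]

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility

# With replacement (the default): the same list entry can be drawn more than once.
sample_with = rng.choice(ages, size=10)

# Without replacement: each list entry is used at most once.
sample_without = rng.choice(ages, size=10, replace=False)

print(sample_with)
print(sample_without)
```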


#### Null hypothesis: The mean of the population the sample was drawn from is equal to the hypothesized mean (30)
#### Alternate hypothesis: The mean of the population the sample was drawn from is not equal to the hypothesized mean (30)

In [9]:
from scipy.stats import ttest_1samp

# Test against a hypothesized population mean of 30 (the actual mean of ages is 30.34375)
t_test, p_value = ttest_1samp(age_sample, 30)
print(f"p-value: {p_value}")
print(f"t-test value: {t_test}")

p-value: 0.2935524176524259
t-test value: 1.11545489883563
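
To see where the t-test value comes from, the statistic can be recomputed by hand as t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)). The sketch below uses only the standard library and the ten values printed for `age_sample` above; the variable names are illustrative.

```python
import math
import statistics

age_sample = [21, 26, 55, 40, 24, 65, 30, 14, 30, 55]  # the sample printed above
hypothesized_mean = 30

n = len(age_sample)
sample_mean = statistics.mean(age_sample)
sample_sd = statistics.stdev(age_sample)   # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)

t_manual = (sample_mean - hypothesized_mean) / standard_error
print(f"manual t statistic: {t_manual}")
```

The result should agree with the value reported by `ttest_1samp` for this sample, up to floating-point rounding.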


In [11]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We are rejecting null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We fail to reject the null hypothesis
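
The same comparison against alpha is repeated for every test in this notebook, so it could be wrapped in a small function. `decide` is a hypothetical helper name, not something from SciPy. Note that a large p-value means we fail to reject the null hypothesis, which is not the same as proving it true.

```python
def decide(p_value, alpha=0.05):
    """Hypothetical helper: turn a p-value into a decision at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# p-value from the one-sample test above
print(decide(0.2935524176524259))
# a stricter significance level can be passed explicitly
print(decide(0.03, alpha=0.01))
```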


### Some More Examples
Consider the ages of all students in a college (the population) and the ages of the students in Class A.

In [13]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import math
np.random.seed(6)

# Poisson distribution
# loc=18 shifts every age up by 18, so the population mean is about 18+35=53; population size is 1500
school_ages=stats.poisson.rvs(loc=18,mu=35,size=1500)

# Class A: 60 ages drawn with mu=30, so the mean is about 18+30=48
classA_ages=stats.poisson.rvs(loc=18,mu=30,size=60)

In [15]:
t_test, p_value=stats.ttest_1samp(a=classA_ages,popmean=school_ages.mean())
print(f"p-value: {p_value}")
print(f"t-test value: {t_test}")

p-value: 1.139027071016194e-13
t-test value: -9.604796510704091


In [16]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We are rejecting null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We are rejecting null hypothesis
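
A confidence interval gives another view of the same result. The sketch below regenerates the data with the same seed and calls as above, then builds a 95% interval for the Class A mean from the t distribution. The choice of 95% simply mirrors alpha = 0.05, and the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(6)
school_ages = stats.poisson.rvs(loc=18, mu=35, size=1500)
classA_ages = stats.poisson.rvs(loc=18, mu=30, size=60)

n = len(classA_ages)
sample_mean = classA_ages.mean()
standard_error = stats.sem(classA_ages)  # sample standard deviation / sqrt(n)

# 95% confidence interval for the Class A mean, t distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=standard_error)

print(f"Class A mean: {sample_mean}")
print(f"95% confidence interval: ({ci_low}, {ci_high})")
print(f"School mean: {school_ages.mean()}")
```

If the school mean falls outside the interval, that matches the decision to reject the null hypothesis at the 5% level.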


## Two-sample T-test With Python
The independent samples t-test (also called the two-sample t-test or independent t-test) compares the means of two independent groups to determine whether there is statistical evidence that the associated population means differ. It is a parametric test.

In [17]:
np.random.seed(12)
ClassB_ages=stats.poisson.rvs(loc=18,mu=33,size=60)
ClassB_ages.mean()

50.63333333333333

In [19]:
t_test, p_value=stats.ttest_ind(a=classA_ages, b=ClassB_ages,equal_var=False)
print(f"p-value: {p_value}")
print(f"t-test value: {t_test}")

p-value: 0.00039942095100859375
t-test value: -3.647123483685195
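
Passing `equal_var=False` selects Welch's version of the test, which does not assume the two groups share the same variance. Its statistic is t = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2). The sketch below checks that formula against SciPy on two small made-up groups (the numbers are assumptions for illustration, not the class data).

```python
import math
import statistics
from scipy import stats

# Two small made-up groups, used only to illustrate the formula.
group_1 = [48, 52, 45, 50, 47, 51]
group_2 = [55, 58, 49, 60, 57, 62]

mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)
var_1, var_2 = statistics.variance(group_1), statistics.variance(group_2)  # sample variances
n_1, n_2 = len(group_1), len(group_2)

# Welch's t statistic: difference in means over the combined standard error.
standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
t_manual = (mean_1 - mean_2) / standard_error

t_scipy, p_value = stats.ttest_ind(a=group_1, b=group_2, equal_var=False)
print(f"manual t: {t_manual}")
print(f"scipy t:  {t_scipy}")
```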


In [20]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We are rejecting null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We are rejecting null hypothesis


### Paired T-test With Python
A paired t-test compares two sets of measurements taken on the same subjects, for example the weights of the same individuals measured at two different times.

In [23]:
weight1=[25,30,28,35,28,34,26,29,30,26,28,32,31,30,45]
weight2=weight1+stats.norm.rvs(scale=5,loc=-1.25,size=15)
print(weight1)
print(weight2)

[25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45]
[10.65253339 32.44523171 30.08750529 33.98201318 30.08038798 39.78974242
 25.00574593 23.0701257  19.55445346 25.06889364 23.18015951 27.94057668
 24.08765575 30.12145502 47.42956055]


In [25]:
weight_df=pd.DataFrame({"weight_10":np.array(weight1),
                        "weight_20":np.array(weight2),
                        "weight_change":np.array(weight2)-np.array(weight1)})

weight_df

| | weight_10 | weight_20 | weight_change |
|---|---|---|---|
| 0 | 25 | 10.652533 | -14.347467 |
| 1 | 30 | 32.445232 | 2.445232 |
| 2 | 28 | 30.087505 | 2.087505 |
| 3 | 35 | 33.982013 | -1.017987 |
| 4 | 28 | 30.080388 | 2.080388 |
| 5 | 34 | 39.789742 | 5.789742 |
| 6 | 26 | 25.005746 | -0.994254 |
| 7 | 29 | 23.070126 | -5.929874 |
| 8 | 30 | 19.554453 | -10.445547 |
| 9 | 26 | 25.068894 | -0.931106 |


In [26]:
# Store the statistic in t_test so the print below reports this test,
# not the stale value left over from the two-sample cell.
t_test, p_value=stats.ttest_rel(a=weight1,b=weight2)
print(f"p-value: {p_value}")
print(f"t-test value: {t_test:.3f}")

if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We are rejecting null hypothesis")
else:
    print("We fail to reject the null hypothesis")

p-value: 0.12415067833129617
t-test value: 1.636
We fail to reject the null hypothesis
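
A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against a mean of zero. The sketch below checks this with the weights printed above (copied at the precision shown, so the last digits may differ slightly from the original run).

```python
import numpy as np
import scipy.stats as stats

weight1 = np.array([25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45])
weight2 = np.array([10.65253339, 32.44523171, 30.08750529, 33.98201318, 30.08038798,
                    39.78974242, 25.00574593, 23.0701257, 19.55445346, 25.06889364,
                    23.18015951, 27.94057668, 24.08765575, 30.12145502, 47.42956055])

# Paired test on the two measurements
t_paired, p_paired = stats.ttest_rel(a=weight1, b=weight2)

# One-sample test on the differences, against a hypothesized mean difference of 0
t_diff, p_diff = stats.ttest_1samp(weight1 - weight2, popmean=0)

print(f"paired:      t={t_paired}, p={p_paired}")
print(f"differences: t={t_diff}, p={p_diff}")
```

Both calls should report the same statistic and p-value, which is why the pairing matters: the test looks only at how each subject changed, not at the spread between subjects.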
