## T test

A t-test is an inferential statistical test used to determine whether there is a significant difference between means: either between a sample mean and a hypothesized population mean, or between the means of two groups, which may be related in certain features.

#### 1. One-sample t-test
#### 2. Two-sample t-test

### One Sample t-test
(One group)
It tests whether the mean of a sample differs from a known or hypothesized population mean.

$$t = \frac{\bar{x} - \mu}{SE(\bar{x})}, \qquad SE(\bar{x}) = \frac{s}{\sqrt{n}}$$

where

- $\mu$ = proposed constant for the population mean
- $\bar{x}$ = sample mean
- $n$ = sample size
- $s$ = sample standard deviation
- $SE(\bar{x})$ = estimated standard error of the mean, $s/\sqrt{n}$
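As a minimal sketch of the one-sample formula, the statistic can be computed by hand and compared with `scipy.stats.ttest_1samp`. The sample `x` and the hypothesized mean `mu` are made-up illustrative values, not data from this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized population mean (illustrative values only)
x = np.array([22, 31, 27, 35, 29, 24, 38, 26])
mu = 30

n = x.size
s = x.std(ddof=1)        # sample SD, with n - 1 in the denominator
se = s / np.sqrt(n)      # estimated standard error of the mean
t_manual = (x.mean() - mu) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy should agree with the hand computation
t_scipy, p_scipy = stats.ttest_1samp(x, mu)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

Note the `ddof=1` argument: NumPy's default `std` divides by `n`, which would not match the sample SD used in the t statistic.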

In [1]:
ages = [10,20,35,50,28,40,55,18,16,55,30,25,43,18,30,28,14,24,16,17,32,35,26,27,65,18,43,23,21,20,19,70]

In [2]:
import numpy as np
ages_mean = np.mean(ages)
ages_mean

30.34375

In [3]:
# Take a random sample of the ages (np.random.choice samples with replacement by default)
sample_size = 10
age_sample = np.random.choice(ages,sample_size)
age_sample

array([25, 19, 23, 18, 24, 10, 65, 18, 24, 21])
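Because `np.random.choice` is unseeded here and samples with replacement by default, the sample (and therefore the p-value) changes on every run. A hedged sketch of a reproducible alternative uses NumPy's `Generator` API; the seed value and the shortened `ages` list are arbitrary choices for illustration.

```python
import numpy as np

# First few ages from the list above, shortened for illustration
ages = [10, 20, 35, 50, 28, 40, 55, 18, 16, 55]

rng = np.random.default_rng(seed=0)   # the seed value is an arbitrary choice

# Default behaviour: sampling with replacement (the same position can be drawn twice)
sample_with = rng.choice(ages, size=5)

# Sampling without replacement: each position in `ages` is drawn at most once
sample_without = rng.choice(ages, size=5, replace=False)

print(sample_with)
print(sample_without)
```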

In [4]:
import scipy
from scipy.stats import ttest_1samp

In [5]:
ttest, p_value = ttest_1samp(age_sample,30.34)
p_value

0.2598796401589057

In [6]:
# H0: The sample mean does not differ from the population mean
# H1: The sample mean differs from the population mean

if p_value < 0.05:        # alpha (significance level) is 0.05, i.e. 5%
    print("We reject H0.")
else:
    print("We fail to reject H0.")

We fail to reject H0.


### Two Sample T test (Independent T test)

(Two Independent groups)
It compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different.

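$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $s^2$ is the pooled sample variance:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Here $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1, s_2$ the sample standard deviations, and $n_1, n_2$ the sample sizes of the two groups. This pooled form assumes the two populations have equal variances. The code below passes `equal_var=False`, which runs Welch's t-test and does not pool the variances.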
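A minimal sketch of the pooled-variance statistic, where the pooled variance is `((n1-1)*s1**2 + (n2-1)*s2**2) / (n1+n2-2)`, checked against `stats.ttest_ind` with `equal_var=True`. The two groups are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical groups (illustrative values only)
g1 = np.array([47, 52, 49, 55, 44, 50, 48])
g2 = np.array([53, 51, 58, 49, 56, 54, 60, 52])
n1, n2 = g1.size, g2.size

# Pooled variance: weighted average of the two sample variances
s2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (g1.mean() - g2.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))

# equal_var=True is the pooled (Student) test and should match t_manual
t_pooled, p_pooled = stats.ttest_ind(g1, g2, equal_var=True)

# equal_var=False is Welch's test, which does not assume equal variances
t_welch, p_welch = stats.ttest_ind(g1, g2, equal_var=False)

print(t_manual, t_pooled, t_welch)
```

Welch's test is often the safer default when there is no reason to believe the group variances are equal.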

In [7]:
import scipy.stats as stats
import math
np.random.seed(12)
ClassB_ages = stats.poisson.rvs(loc=18,mu=33,size =60)
ClassA_ages = stats.poisson.rvs(loc=18,mu=30,size=60)
print(ClassA_ages.mean(),ClassB_ages.mean())

47.4 50.63333333333333


In [8]:
_,p_value = stats.ttest_ind(a=ClassA_ages,b=ClassB_ages,equal_var=False)
p_value

0.00148827761873106

In [9]:
if p_value < 0.05:
    print("We reject H0.")
else:
    print("We fail to reject H0.")

We reject H0.


### Paired T test
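A paired t-test compares two sets of measurements taken on the same group, for example before and after a time interval or under two different conditions, and tests whether the mean of the paired differences is zero.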
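One way to see what the paired test does: it is equivalent to a one-sample t-test on the per-subject differences against a mean of zero. The sketch below illustrates this with hypothetical before/after measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same 8 subjects
before = np.array([25, 30, 28, 35, 28, 34, 26, 29])
after = np.array([27, 29, 31, 36, 27, 37, 25, 32])

# Paired t-test on the two related samples
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test on the differences, testing mean difference == 0
diff = before - after
t_one, p_one = stats.ttest_1samp(diff, 0.0)

print(t_rel, t_one)
print(p_rel, p_one)
```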

In [10]:
weight1 = [25,30,28,35,28,34,26,29,30,26,28,32,31,30,45]
weight2 = weight1+stats.norm.rvs(scale=5,loc=1.25,size=15)
weight1

[25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45]

In [11]:
weight2

array([28.15056793, 26.46663484, 35.59929495, 37.00259688, 25.4243432 ,
       32.56489822, 19.1935255 , 31.07494424, 26.19611891, 26.67300315,
       34.9501934 , 34.94001222, 34.62757231, 44.4452304 , 49.70553787])

In [12]:
import pandas as pd

weight_df = pd.DataFrame({"weight_10": np.array(weight1), "weight_20": np.array(weight2), "weight_change": np.array(weight2) - np.array(weight1)})
weight_df

| | weight_10 | weight_20 | weight_change |
|---|---|---|---|
| 0 | 25 | 28.150568 | 3.150568 |
| 1 | 30 | 26.466635 | -3.533365 |
| 2 | 28 | 35.599295 | 7.599295 |
| 3 | 35 | 37.002597 | 2.002597 |
| 4 | 28 | 25.424343 | -2.575657 |
| 5 | 34 | 32.564898 | -1.435102 |
| 6 | 26 | 19.193525 | -6.806475 |
| 7 | 29 | 31.074944 | 2.074944 |
| 8 | 30 | 26.196119 | -3.803881 |
| 9 | 26 | 26.673003 | 0.673003 |


In [13]:
_,p_value = stats.ttest_rel(a=weight1,b=weight2)
p_value

0.1687755106412083

In [14]:
if p_value < 0.05:
    print("We reject H0.")
else:
    print("We fail to reject H0.")

We fail to reject H0.


## Chi Square Test

- Applies to two categorical variables from a single population
- Used to determine whether there is a significant association between the two categorical variables
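Under the null hypothesis of no association, the expected count in each cell is (row total × column total) / grand total. A minimal sketch of that calculation, using the sex-by-smoker counts from the `tips` crosstab in this section:

```python
import numpy as np

# Observed counts: rows = sex (Male, Female), columns = smoker (Yes, No)
observed = np.array([[60, 97],
                     [33, 54]])

row_totals = observed.sum(axis=1)   # totals per sex
col_totals = observed.sum(axis=0)   # totals per smoker status
grand_total = observed.sum()

# Expected count for cell (i, j) under independence
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```

The chi-square statistic then measures how far the observed counts are from these expected counts.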

In [15]:
import seaborn as sns
data = sns.load_dataset('tips')
data.head()

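| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |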


In [16]:
data_table = pd.crosstab(data['sex'],data['smoker'])
data_table

| sex / smoker | Yes | No |
|---|---|---|
| Male | 60 | 97 |
| Female | 33 | 54 |


In [17]:
#Observed Values
obs = data_table.values
obs

array([[60, 97],
       [33, 54]], dtype=int64)

In [18]:
val = stats.chi2_contingency(data_table)

In [19]:
val

(0.008763290531773594,
 0.925417020494423,
 1,
 array([[59.84016393, 97.15983607],
        [33.15983607, 53.84016393]]))
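The returned values are, in order, the test statistic, the p-value, the degrees of freedom, and the expected counts. For a table with one degree of freedom (such as this 2×2 table), `chi2_contingency` applies Yates' continuity correction by default, so its statistic will generally not equal a hand-computed Pearson statistic. The sketch below, which assumes the same observed counts, shows that `correction=False` recovers the uncorrected value. The exact corrected statistic may depend on the SciPy version, so no value is asserted for it here.

```python
import numpy as np
from scipy import stats

# Observed sex-by-smoker counts from the crosstab above
obs = np.array([[60, 97],
                [33, 54]])

# Default call: Yates' continuity correction is applied when dof == 1
chi2_yates, p_yates, dof, expected = stats.chi2_contingency(obs)

# Uncorrected Pearson chi-square test
chi2_plain, p_plain, _, _ = stats.chi2_contingency(obs, correction=False)

# Hand-computed Pearson statistic: sum over cells of (O - E)^2 / E
pearson = ((obs - expected) ** 2 / expected).sum()

print(chi2_yates, chi2_plain, pearson)
```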

In [20]:
Expected = val[3]
Expected

array([[59.84016393, 97.15983607],
       [33.15983607, 53.84016393]])

In [21]:
no_of_rows, no_of_col = data_table.shape
ddof = (no_of_rows-1)*(no_of_col-1)
print("Degree of Freedom", ddof)
alpha = 0.05

Degree of Freedom 1


In [26]:
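from scipy.stats import chi2
# Pearson chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
# zip pairs up the rows of obs and Expected, so chiSq holds one partial sum per column.
# No continuity correction is applied here, so the value differs from val[0] above.
chiSq = sum([(o-e)**2/e for o,e in zip(obs,Expected)])
chiSq_stat = chiSq[0] + chiSq[1]
print("Chi-square statistic:",chiSq_stat)

Chi-square statistic: 0.001934818536627623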


#### Calculate the Critical Value and P-Value

In [28]:
critical_value = chi2.ppf(q=1-alpha,df=ddof)
critical_value

3.841458820694124

In [32]:
p_value = 1-chi2.cdf(x=chiSq_stat,df=ddof)
print(p_value,alpha,ddof)

0.964915107315732 0.05 1


In [34]:
if chiSq_stat >= critical_value:
    print("Reject H0.")
else:
    print("Fail to reject H0.")

if p_value <= alpha:
    print("Reject H0.")
else:
    print("Fail to reject H0.")

Fail to reject H0.
Fail to reject H0.
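The two decision rules above always agree, because the p-value is at most alpha exactly when the statistic is at least the critical value. A minimal sketch using the statistic computed earlier, with `chi2.sf` and `chi2.isf` as alternatives to `1 - cdf` and `ppf(1 - alpha)` (the survival-function forms tend to be more accurate far in the tail):

```python
from scipy.stats import chi2

alpha = 0.05
dof = 1
stat = 0.001934818536627623   # Pearson statistic computed earlier in this section

# Critical value: same as chi2.ppf(1 - alpha, dof)
critical_value = chi2.isf(alpha, dof)

# P-value: same as 1 - chi2.cdf(stat, dof)
p_value = chi2.sf(stat, dof)

reject_by_stat = stat >= critical_value
reject_by_p = p_value <= alpha
print(critical_value, p_value, reject_by_stat, reject_by_p)
```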
