# P-value, T-test and Correlation in Python

## T-test
A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups (or between a sample mean and a reference value), where the groups may be related in certain features.

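It has two main types:

- One-sample t-test: compares the mean of a single sample with a known or hypothesised value
- Two-sample t-test: compares the means of two groups (either independent groups, or paired measurements taken on the same subjects)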


## One-sample t-test

```python
ages = [10, 20, 45, 67, 56, 34, 23, 31, 52, 12, 23, 28, 35, 44, 15, 16, 29, 39, 59, 48, 80, 70, 50, 26]
len(ages)
```

```
24
```

```python
# now we calculate the mean
import numpy as np

ages_mean = np.mean(ages)
print(ages_mean)
```

```
38.0
```


Now we draw a sample from this population.

```python
# we take a random sample of 10 ages, without replacement (each person is picked at most once)
sample_size = 10
ages_sample = np.random.choice(ages, sample_size, replace=False)
ages_sample
```

```
array([80, 52, 45, 59, 26, 44, 31, 10, 56, 23])
```

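We now state two hypotheses: H0 (the null hypothesis) says there is no difference between the sample mean and the hypothesised mean, and H1 (the alternative) says there is a difference.

We calculate the p-value and draw a conclusion: if it is below the significance level of 0.05, we reject H0.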

```python
# to perform the one-sample t-test we import the following
from scipy.stats import ttest_1samp
```

```python
# compare the sample mean with a hypothesised population mean of 30
# (note: the actual population mean computed above is 38.0)
ttest, p_value = ttest_1samp(ages_sample, 30)
```

```python
print(p_value)
if p_value < 0.05:
    print("we reject H0")
else:
    print("we fail to reject H0")
```

```
0.08444511263412664
we fail to reject H0
```
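As a rough sketch of what `ttest_1samp` computes, the snippet below recomputes the one-sample t statistic, t = (sample mean − 30) / (s / √n), and its two-sided p-value by hand, reusing the ten ages printed above and the hypothesised mean of 30. It assumes SciPy's default two-sided alternative; the names `mu0`, `t_manual` and `p_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

# the sample printed above and the hypothesised population mean used in the test
sample = np.array([80, 52, 45, 59, 26, 44, 31, 10, 56, 23])
mu0 = 30

n = len(sample)
# sample standard deviation with Bessel's correction (ddof=1)
s = sample.std(ddof=1)
# t statistic: distance of the sample mean from mu0, in units of the standard error
t_manual = (sample.mean() - mu0) / (s / np.sqrt(n))
# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# the same test done by SciPy, for comparison
t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both lines should print the same statistic and p-value, which makes the decision rule concrete: the p-value is the probability, under H0, of a t statistic at least this far from 0.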


## More complex examples
Consider the ages of all students in a college, and the ages of the students in one class, class A.

```python
import pandas as pd
import scipy.stats as stats
import math
```


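```python
np.random.seed(6)
# Poisson with mu=35 shifted by loc=18: smallest possible value is 18, mean is 18 + 35 = 53
school_ages = stats.poisson.rvs(loc=18, mu=35, size=1500)
# class A: shifted Poisson with mean 18 + 30 = 48
classA_ages = stats.poisson.rvs(loc=18, mu=30, size=60)
```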
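A small check of how these simulated ages are shaped, assuming SciPy's usual `loc` convention: `loc` shifts the whole Poisson distribution, so no value below 18 can be drawn and the mean is `loc + mu`. The variable names below are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

# `loc` shifts the whole distribution, so the theoretical mean is loc + mu
school_mean = stats.poisson.mean(mu=35, loc=18)
class_a_mean = stats.poisson.mean(mu=30, loc=18)

# no probability mass below loc: 18 is the smallest value that can be drawn
pmf_below = stats.poisson.pmf(17, mu=35, loc=18)
pmf_at_loc = stats.poisson.pmf(18, mu=35, loc=18)

# a large simulated sample should land close to the theoretical mean of 53
draws = stats.poisson.rvs(mu=35, loc=18, size=100_000, random_state=0)

print(school_mean, class_a_mean)  # 53.0 48.0
print(pmf_below, draws.min(), draws.mean())
```

This is why the school mean printed below is about 53 rather than 35.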

```python
# see the mean
classA_ages.mean()
```

```
46.9
```

```python
# test whether the mean age of class A differs from the mean age of the whole school
_, p_value = stats.ttest_1samp(a=classA_ages, popmean=school_ages.mean())
```

```python
school_ages.mean()
```

```
53.303333333333335
```

```python
p_value
```

```
1.139027071016194e-13
```

```python
# this is far below 0.05, so we reject the null hypothesis
if p_value < 0.05:
    print("we reject H0")
else:
    print("we fail to reject H0")
```

```
we reject H0
```


## Two-sample t-test
The two-sample (independent) t-test compares the means of two independent groups to determine whether they differ significantly.
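The comparison of class A and class B uses `equal_var=False`, which selects Welch's t-test: the two groups are not assumed to share the same variance. The sketch below is a hand-written version of Welch's test compared against SciPy's result; the helper `welch_t_test` and the simulated groups `group_a`/`group_b` are hypothetical illustrations, not part of the original notebook.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Two-sided Welch's t-test (unequal variances), computed by hand."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # squared standard error of each group mean
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


# two simulated groups, built like the class examples (shifted Poisson ages)
group_a = stats.poisson.rvs(loc=18, mu=30, size=60, random_state=1)
group_b = stats.poisson.rvs(loc=18, mu=33, size=60, random_state=2)

t_manual, p_manual = welch_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

Unlike Student's t-test (`equal_var=True`), Welch's version pools nothing: each group contributes its own variance, and the degrees of freedom are estimated rather than fixed at n1 + n2 − 2.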



```python
# we initialise another class, B
np.random.seed(12)
ClassB_ages = stats.poisson.rvs(loc=18, mu=33, size=60)
ClassB_ages.mean()
```

```
50.63333333333333
```

```python
# Welch's t-test: equal_var=False means the two classes are not assumed to have equal variances
_, p_value = stats.ttest_ind(a=classA_ages, b=ClassB_ages, equal_var=False)
```

```python
print(p_value)
# this is far below 0.05, so we reject the null hypothesis
if p_value < 0.05:
    print("we reject H0")
else:
    print("we fail to reject H0")
```

```
0.00039942095100859375
we reject H0
```


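## Paired t-test
The paired t-test is used when both samples come from the same group, for example the same subjects measured twice, and we want to check whether the two measurements differ on average.

Let's say we have the weights of some students, and the weights of the same students some years later.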

```python
weight1 = [45, 47, 49, 50, 56, 57, 58, 59, 62, 73, 75, 79, 80, 89, 88, 64, 65, 77]
# simulate the later weights: add normally distributed noise (mean 1.25, standard deviation 5) to each weight
weight2 = weight1 + stats.norm.rvs(scale=5, loc=1.25, size=18)
```

```python
print(weight1)
print(weight2)
```

```
[45, 47, 49, 50, 56, 57, 58, 59, 62, 73, 75, 79, 80, 89, 88, 64, 65, 77]
[46.48201318 51.58038798 57.28974242 51.50574593 52.5701257  49.05445346
 59.56889364 56.68015951 60.44057668 68.58765575 77.62145502 83.92956055
 83.42159375 84.64979448 93.69547305 66.82253473 53.80997768 81.22954305]
```


```python
# now we state the two hypotheses
# H0: there is no statistically significant difference (the mean weight change is zero)
# H1: there is a statistically significant difference
weight_df = pd.DataFrame({"weight_10": np.array(weight1),
                          "weight_20": np.array(weight2),
                          "weight_change": np.array(weight2) - np.array(weight1)})
weight_df
```

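|   | weight_10 | weight_20 | weight_change |
|---|-----------|-----------|---------------|
| 0 | 45 | 46.482013 | 1.482013 |
| 1 | 47 | 51.580388 | 4.580388 |
| 2 | 49 | 57.289742 | 8.289742 |
| 3 | 50 | 51.505746 | 1.505746 |
| 4 | 56 | 52.570126 | -3.429874 |
| 5 | 57 | 49.054453 | -7.945547 |
| 6 | 58 | 59.568894 | 1.568894 |
| 7 | 59 | 56.680160 | -2.319840 |
| 8 | 62 | 60.440577 | -1.559423 |
| 9 | 73 | 68.587656 | -4.412344 |

(first 10 of 18 rows shown)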


```python
_, p_value = stats.ttest_rel(a=weight1, b=weight2)
```

```python
print(p_value)
if p_value < 0.05:
    print("we reject H0")
else:
    print("we fail to reject H0")
```

```
0.7869983688783727
we fail to reject H0
```
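A paired t-test is equivalent to a one-sample t-test on the per-student differences, tested against 0, which is why the `weight_change` column is the quantity that matters here. The sketch below checks this using the weights printed earlier (the later weights rounded to the 8 decimals shown), so its p-value should agree closely with the one above; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# the weights printed above; weight2 is rounded to the 8 decimals shown in the output
weight1 = np.array([45, 47, 49, 50, 56, 57, 58, 59, 62, 73, 75, 79, 80, 89, 88, 64, 65, 77],
                   dtype=float)
weight2 = np.array([46.48201318, 51.58038798, 57.28974242, 51.50574593, 52.5701257,
                    49.05445346, 59.56889364, 56.68015951, 60.44057668, 68.58765575,
                    77.62145502, 83.92956055, 83.42159375, 84.64979448, 93.69547305,
                    66.82253473, 53.80997768, 81.22954305])

# paired test on the two measurements of the same students
_, p_paired = stats.ttest_rel(weight1, weight2)
# one-sample test: is the mean of the per-student differences zero?
_, p_diff = stats.ttest_1samp(weight1 - weight2, 0)
print(p_paired, p_diff)
```

Seen this way, the large p-value simply says that the average weight change (about 0.33 here) is small compared with its standard error.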
