# Paired t-test

### Dataset creation
We need two sets of heights: before and after.

We’ll use SciPy’s truncnorm distribution (scipy.stats.truncnorm) for this. It models a truncated normal distribution: a normal distribution whose x-values are cut off from below, from above, or at both ends.

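A plain normal distribution can produce any value between negative infinity and infinity, which makes no sense for a height. Setting lower and upper bounds with a truncated normal distribution keeps every sampled height in a plausible range. Note that truncnorm expects those bounds in standard-deviation units relative to the mean, which is why the code passes (lower - mu) / sigma and (upper - mu) / sigma rather than the raw heights.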
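A quick way to see how truncnorm's bounds work is to freeze the same distribution and inspect it. This is a minimal sketch, not part of the original notebook: it reuses the "before" bounds, calls the frozen distribution's support() method to confirm that the standardized clip points map back to the raw bounds, and passes random_state (an argument the notebook's cells do not use) so the draw is reproducible.

```python
import numpy as np
import scipy.stats as stats

lower, upper = 152.0, 173.0
mu, sigma = 162.0, 2.40

# truncnorm's a and b are measured in standard deviations from loc
a, b = (lower - mu) / sigma, (upper - mu) / sigma
dist = stats.truncnorm(a, b, loc=mu, scale=sigma)

# the frozen distribution's support maps back to the original height bounds
lo, hi = dist.support()
print(np.isclose(lo, lower), np.isclose(hi, upper))  # True True

# random_state makes the draw reproducible (the notebook's rvs calls omit it)
samples = dist.rvs(size=10_000, random_state=0)
print(samples.min() >= lower, samples.max() <= upper)
```

Without random_state, each run of the notebook draws a different sample, so the means, standard deviations and p-value shown below will change on re-execution.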

In [33]:
import scipy.stats as stats

lower, upper = 152.0, 173.0 # set the lower and upper bounds 
mu, sigma = 162.0, 2.40 # set the mean and standard deviation
dist = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma) # freeze a truncated normal; truncnorm takes its bounds in standard-deviation units

values_before = dist.rvs(25) # get 25 random values from our truncated normal distribution

In [34]:
values_before.mean()

161.27381008209488

In [35]:
values_before.std()

1.9235249545654398

In [36]:
values_before

array([161.76374741, 162.59386722, 161.30987032, 157.98545025,
       158.80108749, 160.57628889, 163.15444162, 158.24739908,
       159.4923962 , 162.05427581, 162.07892074, 162.2121437 ,
       166.45367854, 161.60598575, 162.34047273, 159.86493067,
       161.03803666, 161.83416595, 163.17439602, 157.82573149,
       160.0332295 , 162.63287606, 163.09324963, 159.81846224,
       161.8601481 ])

In [57]:
import scipy.stats as stats

lower, upper = 157.0, 179.0 # set the lower and upper bounds 
mu, sigma = 163.0, 2.40 # set the mean and standard deviation
dist = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma) # freeze a truncated normal; truncnorm takes its bounds in standard-deviation units

values_after = dist.rvs(25) # get 25 random values from our truncated normal distribution

In [58]:
values_after.mean()

162.71094480854129

In [59]:
values_after.std()

2.8331946996840816

In [60]:
values_after

array([158.74014853, 165.92404909, 162.23634905, 162.44495908,
       163.96287528, 159.81694236, 165.6083967 , 164.56746343,
       159.58459113, 163.65555083, 169.26600781, 162.84473321,
       160.38675086, 167.43716999, 161.70510557, 163.57406141,
       163.24227802, 164.26797128, 157.45431875, 166.81909286,
       159.33212755, 160.32409765, 160.91927371, 161.46793773,
       162.19136833])

In [41]:
import random
ages = [random.randint(19, 23) for _ in range(25)] # randint includes both endpoints, so ages run from 19 to 23

In [42]:
ages

[20, 19, 20, 19, 20, 22, 19, 22, 23, 19, 21, 20, 21, 20, 22, 22, 19, 19, 21, 22, 19, 21, 20, 21, 19]

In [61]:
import pandas as pd
df = pd.DataFrame({'age':ages, 'height_before':values_before, 'height_after':values_after})
df.to_csv("paired-t.csv", index=False)
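Passing index=False matters when the file is read back. The sketch below writes to an in-memory io.StringIO buffer instead of paired-t.csv and uses a two-row frame of illustrative values; roundtrip_columns is a hypothetical helper for this demonstration. It shows that the default index=True writes the row labels as an extra column with an empty header, which read_csv then names 'Unnamed: 0'.

```python
import io
import pandas as pd

# two illustrative rows in the notebook's column layout
df = pd.DataFrame({'age': [20, 19],
                   'height_before': [161.76, 162.59],
                   'height_after': [158.74, 165.92]})

def roundtrip_columns(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV and return the column names read back."""
    buf = io.StringIO()
    frame.to_csv(buf, **to_csv_kwargs)
    buf.seek(0)
    return list(pd.read_csv(buf).columns)

print(roundtrip_columns(df, index=False))  # ['age', 'height_before', 'height_after']
print(roundtrip_columns(df))               # ['Unnamed: 0', 'age', 'height_before', 'height_after']
```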

In [68]:
data = pd.read_csv("paired-t.csv")
data.head()

|   | age | height_before | height_after |
|---|-----|---------------|--------------|
| 0 | 20 | 161.763747 | 158.740149 |
| 1 | 19 | 162.593867 | 165.924049 |
| 2 | 20 | 161.30987 | 162.236349 |
| 3 | 19 | 157.98545 | 162.444959 |
| 4 | 20 | 158.801087 | 163.962875 |


In [74]:
data.describe()

|       | age | height_before | height_after |
|-------|-----|---------------|--------------|
| count | 25.0 | 25.0 | 25.0 |
| mean | 20.4 | 161.27381 | 162.710945 |
| std | 1.258306 | 1.963189 | 2.891617 |
| min | 19.0 | 157.825731 | 157.454319 |
| 25% | 19.0 | 159.864931 | 160.386751 |
| 50% | 20.0 | 161.763747 | 162.444959 |
| 75% | 21.0 | 162.340473 | 164.267971 |
| max | 23.0 | 166.453679 | 169.266008 |
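The std row here (1.963189 and 2.891617) does not match the values_before.std() and values_after.std() results earlier (about 1.923525 and 2.833195). NumPy's ndarray.std defaults to ddof=0 (divide by n), while pandas' describe() reports the sample standard deviation with ddof=1 (divide by n - 1), so the two differ by a factor of sqrt(n / (n - 1)); with n = 25, 1.923525 × sqrt(25/24) ≈ 1.963189. A short check on a few illustrative values:

```python
import numpy as np
import pandas as pd

x = np.array([161.8, 162.6, 161.3, 158.0, 158.8])  # illustrative heights
n = x.size

numpy_sd = x.std()              # ndarray.std default: ddof=0, divides by n
pandas_sd = pd.Series(x).std()  # Series.std default: ddof=1, divides by n - 1

# the ddof=1 value is the ddof=0 value scaled by sqrt(n / (n - 1))
print(np.isclose(pandas_sd, numpy_sd * np.sqrt(n / (n - 1))))  # True
print(np.isclose(pandas_sd, x.std(ddof=1)))                    # True
```

The paired t-test below uses the ddof=1 standard deviation, computed on the before/after differences.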


In [73]:
stats.ttest_rel(data.height_before, data.height_after, alternative='greater')

Ttest_relResult(statistic=-1.9094338173992416, pvalue=0.9658844528005113)
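ttest_rel works on the differences d = before - after and computes t = mean(d) / (s_d / sqrt(n)), where s_d is the ddof=1 standard deviation of d, with n - 1 degrees of freedom. With alternative='greater', the alternative hypothesis is that the mean of the first sample (height_before) exceeds the mean of the second, i.e. that heights went down. The statistic here is negative because the after-heights averaged higher, so that one-sided p-value is large (about 0.966). If the question is instead whether heights went up, the matching call would be alternative='less'; for the continuous t distribution its p-value is 1 minus the 'greater' one, about 0.034 here. The sketch below recomputes both one-sided p-values by hand on a few illustrative pairs and checks them against ttest_rel; paired_t is a hypothetical helper, not part of SciPy.

```python
import numpy as np
import scipy.stats as stats

def paired_t(before, after):
    """Paired t statistic on d = before - after (the order ttest_rel uses)."""
    d = np.asarray(before, dtype=float) - np.asarray(after, dtype=float)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)   # standard error of the mean difference
    t = d.mean() / se
    df = n - 1
    p_greater = stats.t.sf(t, df)     # H1: mean(before - after) > 0
    p_less = stats.t.cdf(t, df)       # H1: mean(before - after) < 0
    return t, p_greater, p_less

# illustrative values only, not the notebook's full data
before = [161.8, 162.6, 161.3, 158.0, 158.8, 160.6]
after = [158.7, 165.9, 162.2, 162.4, 164.0, 159.8]

t, p_greater, p_less = paired_t(before, after)
res_greater = stats.ttest_rel(before, after, alternative='greater')
res_less = stats.ttest_rel(before, after, alternative='less')

print(t, res_greater.statistic)
print(p_greater, res_greater.pvalue)
print(p_less, res_less.pvalue)
```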