# Paired Sample T-test

Researchers conducted a randomized trial on the treatment of [fibromyalgia](http://www.nejm.org/doi/pdf/10.1056/NEJMoa0912611). Fibromyalgia is a condition in which the body suffers from chronic, widespread pain, which in turn affects the sufferer's psyche, among many other symptoms. There is no cure for fibromyalgia; treatment consists of managing the pain.

We obtained a dataset with 50 observations and 11 variables. The variables include sex, BMI, duration of treatment, age, treatment, coexists (i.e., other health issues), and the FIQ_baseline, FIQ_12W, and FIQ_24W scores. The effect of the treatment is measured by the severity of the symptoms immediately before the treatment started (FIQ_baseline), after 12 weeks of treatment (FIQ_12W), and after 24 weeks of treatment (FIQ_24W). Let us see whether the tai chi treatment relieves pain after 12 weeks. That is, we want to determine whether pain severity differs between baseline and 12 weeks for the participants given the tai chi treatment.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats

In [None]:
%matplotlib inline

In [None]:
data = pd.read_csv('taiChiData.csv', sep=',')

In [None]:
data

We can split our observations into two groups according to the `treatment` column, namely a control group and a tai chi group. First, though, we clean the data by removing rows with missing values and duplicate rows.

In [None]:
# enter code to drop rows with missing values 
data.drop_duplicates(inplace=True)  # drop duplicates
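
The cleaning step can be sketched on a small made-up dataframe. The values below are hypothetical and are not taken from `taiChiData.csv`; only the column names mirror the ones used in this notebook. `dropna` removes rows containing missing values and `drop_duplicates` removes repeated rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT the tai chi dataset), used only to illustrate cleaning.
toy = pd.DataFrame({
    "treatment": ["taichi", "taichi", "control", "control", "control"],
    "FIQ_baseline": [60.0, 60.0, 70.0, np.nan, 55.0],
    "FIQ_12W": [45.0, 45.0, 68.0, 50.0, 52.0],
})

# Drop rows with any missing value, then drop exact duplicate rows.
cleaned = toy.dropna().drop_duplicates()

print(len(toy), len(cleaned))
```

Chaining the two calls returns a new dataframe and leaves `toy` untouched, which makes it easy to compare row counts before and after cleaning.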

In [None]:
treatment = data.groupby(['treatment'])

See this discussion for more on how to use the [groupby function](https://www.datacamp.com/community/tutorials/pandas-split-apply-combine-groupby).

In [None]:
type(treatment)

Converting the `pandas.core.groupby.DataFrameGroupBy` object to a list yields one `(label, DataFrame)` pair per group. We extract the tai chi group from that list and place it in a new dataframe.

In [None]:
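# take the second group's dataframe; copy it so that adding a column later
# does not trigger a SettingWithCopyWarning
taichi = list(treatment)[1][1].copy()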

In [None]:
print(type(taichi), len(taichi))
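
Indexing the list by position relies on knowing the order of the groups. As an alternative sketch, a group can be selected by its label with `get_group`. The labels `"control"` and `"taichi"` below are assumptions made for illustration; the actual labels in the dataset should be checked first, for example by listing the keys of `.groups`.

```python
import pandas as pd

# Hypothetical toy data; the group labels are assumed, not taken from the dataset.
toy = pd.DataFrame({
    "treatment": ["control", "taichi", "control", "taichi"],
    "FIQ_baseline": [70.0, 60.0, 55.0, 65.0],
})

groups = toy.groupby("treatment")

# Inspect the available group labels before selecting one.
print(list(groups.groups))

# Select a group by label and copy it so it can be modified independently.
taichi_toy = groups.get_group("taichi").copy()
print(len(taichi_toy))
```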

### Visualizing the differences in treatment

Let's add the difference between the baseline scores and the scores after twelve weeks of treatment to the dataframe.

In [None]:
taichi['diff'] = taichi['FIQ_baseline'] - taichi['FIQ_12W']

In [None]:
taichi

In [None]:
sns.boxplot(data=taichi.loc[:,['FIQ_baseline','FIQ_12W']])

The box plots show the decrease in pain severity after the 12-week treatment.

In [None]:
print(abs(taichi['FIQ_baseline'].mean() - taichi['FIQ_12W'].mean()))

On average, pain severity before and after treatment differs by 10.23. Is this difference statistically significant?

### Formulate the hypothesis

We can now state our hypotheses. We want to know whether the severity of pain symptoms differs between the start of the tai chi treatment and 12 weeks into it. Let $\mu_d$ denote the mean of the paired differences (baseline minus 12 weeks). The null and alternative hypotheses are:

$H_0$: $\mu_d = 0$, i.e., the mean difference is zero.

$H_1$: $\mu_d \neq 0$, i.e., the mean difference is not zero (a two-tailed test). Pain severity differs before and after the tai chi treatment.


In [None]:
stats.ttest_rel(taichi['FIQ_baseline'], 
                taichi['FIQ_12W'])
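
To see what `ttest_rel` computes, the paired t statistic can be worked out by hand: $t = \bar{d} / (s_d / \sqrt{n})$, where $\bar{d}$ and $s_d$ are the mean and sample standard deviation of the paired differences, with $n - 1$ degrees of freedom. The sketch below uses made-up scores, not the tai chi data, and checks the manual result against SciPy.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical before/after scores for six subjects (NOT the tai chi data).
before = np.array([60.0, 72.0, 55.0, 68.0, 64.0, 70.0])
after = np.array([48.0, 65.0, 50.0, 60.0, 59.0, 61.0])

d = before - after                      # paired differences
n = len(d)

# t statistic: mean difference divided by its standard error
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))

# two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_rel(before, after)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The two printed pairs should agree up to floating-point error, which shows that the paired test is a one-sample t-test on the differences.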

### Interpreting results

We obtain $p \approx 0.000011$, which is less than the significance level $\alpha = 0.05$. A low p-value means that data like ours would be unlikely if the null hypothesis were true, so we reject the null hypothesis. This means that the average pain score before treatment is significantly different from the average pain score after 12 weeks of treatment with tai chi. However, we cannot rule out the possibility that the result we observed is a rare one.

### Exercise

Use the same steps as above to determine whether pain severity differs between baseline and 12 weeks in the control group.