# Module 2: Hypothesis testing


## Statistics of pulse wave velocity

In lab 3, you will investigate the effect of exercise on pulse wave velocity (the velocity of the pressure wave that is created by your heart and moves through your blood vessels). You are asking the question, "Does exercise affect pulse wave velocity?" You have decided that a type I error rate of 0.05 is acceptable for reaching your final statistical conclusion. Each student's pulse wave velocity was measured immediately before and immediately after a 3-minute stair-stepping exercise routine. The collected data were stored in a .csv file.

In [None]:
# Import relevant packages
import scipy.stats as stats
import numpy as np
import plotly.graph_objects as go
import pandas as pd

# Import data as pandas dataframe
df = pd.read_csv("../data/pwv_data.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   subject  12 non-null     int64  
 1   before   12 non-null     float64
 2   after    12 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 416.0 bytes


## Visualizing the data

Create two overlaid histograms displaying the two distributions of data. What preliminary observations can you make from these histograms?

In [None]:
# The graph_objects package from plotly allows us to overlay traces
fig = go.Figure()
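# Naming each trace labels it in the legend (otherwise plotly shows "trace 0", "trace 1")
fig.add_trace(go.Histogram(x=df['before'],
                           name='before',
                           nbinsx=10))
fig.add_trace(go.Histogram(x=df['after'],
                           name='after',
                           nbinsx=10))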

fig.update_layout(barmode='overlay')
fig.update_traces(opacity=0.6) # You can change this to improve visualization
fig.show()


Another way to view the data is with a bar graph. Create a bar graph that effectively displays the descriptive statistics of the data provided (mean and standard error of the mean, SEM).

In [None]:
mean_before = df['before'].mean()
sem_before = df['before'].std()/np.sqrt(df['before'].count())
mean_after = df['after'].mean()
sem_after = df['after'].std()/np.sqrt(df['after'].count())

# Graphing can get a little complicated. We've included the template here to save you time.
fig = go.Figure()
fig.add_trace(go.Bar(x=("before","after"),
                     y=(mean_before,mean_after),
                     error_y=dict(type='data',
                                  array=(sem_before,sem_after),
                                  visible=True)))
fig.show()
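
As a quick sanity check on the SEM formula used above, the manual calculation can be compared against `scipy.stats.sem`. This is a minimal sketch on made-up numbers (the `example` series is hypothetical and is not the lab data); both pandas `.std()` and `stats.sem` default to the sample standard deviation (`ddof=1`), so the two values should agree.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical values for illustration only; these are NOT the lab data
example = pd.Series([5.8, 6.4, 7.1, 5.2, 6.9, 6.0])

# Manual SEM: sample standard deviation divided by sqrt(n)
sem_manual = example.std() / np.sqrt(example.count())

# scipy equivalent; also uses ddof=1 by default
sem_scipy = stats.sem(example)

print('manual SEM: %.4f, scipy SEM: %.4f' % (sem_manual, sem_scipy))
```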

## Performing statistical testing

What kind of statistical test do you think is appropriate in this situation? Be as specific as you can.

Assume that the assumptions required by your chosen test hold. Perform the test and determine whether the null hypothesis can be rejected.

In [None]:
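# Each subject was measured twice, so work with the per-subject differences
difference = df['after']-df['before']
# t = mean difference / standard error of the differences
t = difference.mean()/(difference.std()/np.sqrt(difference.count()))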
print('t-statistic: %.2f' % t)

# Be mindful of the type of test you've chosen here! Hint: how many tails?
alpha = 0.05
t_crit = stats.t.ppf(1-alpha/2,difference.count()-1)
print('t-critical: %.2f' % t_crit)

if abs(t) > t_crit:
  print('|t-stat| > t-crit, therefore we reject the null hypothesis.')
else:
  print('|t-stat| <= t-crit, therefore we fail to reject the null hypothesis.')

difference.mean()

t-statistic: 2.48
t-critical: 2.20
|t-stat| > t-crit, therefore we reject the null hypothesis.


0.20000000000000007
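
The manual calculation above can be cross-checked with `scipy.stats.ttest_rel`, which runs a paired (related-samples) t-test and is two-sided by default. The sketch below uses made-up arrays (`before` and `after` here are hypothetical values, not the lab data) and compares the p-value to alpha instead of comparing the t-statistic to a critical value; the two decision rules should always agree.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical paired measurements for illustration only; NOT the lab data
before = np.array([5.8, 6.4, 7.1, 5.2, 6.9, 6.0])
after = np.array([6.1, 6.5, 7.6, 5.1, 7.3, 6.4])

# Manual paired t-statistic, computed from the per-subject differences
diff = after - before
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))

# Same test via scipy (two-sided by default)
t_scipy, p_value = stats.ttest_rel(after, before)

print('manual t: %.3f, scipy t: %.3f, p-value: %.4f' % (t_manual, t_scipy, p_value))

# Decision rule based on the p-value
alpha = 0.05
if p_value < alpha:
    print('p < alpha, therefore we reject the null hypothesis.')
else:
    print('p >= alpha, therefore we fail to reject the null hypothesis.')
```

Note that numpy's `.std()` defaults to `ddof=0`, unlike pandas, so `ddof=1` is passed explicitly here.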

## Confidence intervals

Based on the sample data, determine the 95% confidence interval for the resting (before exercise) pulse wave velocity. What does this interval indicate?

In [None]:
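# t_crit is reused from the previous cell: two-tailed, alpha = 0.05, df = 12 - 1 = 11
# (the 'before' column has the same n as the paired differences)
ci_lower = mean_before-t_crit*sem_before
ci_upper = mean_before+t_crit*sem_before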
ci = (ci_lower,ci_upper)

# You can print using tuples! Here's an example.
print('We are 95%% confident that the population mean is within the interval (%.2f,%.2f).' % ci)

We are 95% confident that the population mean is within the interval (5.15,7.33).
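
The same interval can also be obtained from `scipy.stats.t.interval`, which takes the confidence level, the degrees of freedom, the sample mean (`loc`), and the SEM (`scale`). This is a minimal sketch on made-up numbers (the `sample` array is hypothetical, not the lab data) showing that it matches the manual mean ± t-critical × SEM calculation.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical resting measurements for illustration only; NOT the lab data
sample = np.array([5.8, 6.4, 7.1, 5.2, 6.9, 6.0])

n = sample.size
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)

# Manual 95% confidence interval using the two-tailed critical t value
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 1)
ci_manual = (mean - t_crit * sem, mean + t_crit * sem)

# scipy equivalent: confidence level, degrees of freedom, center, and scale
ci_scipy = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

print('manual CI: (%.2f,%.2f)' % ci_manual)
print('scipy CI:  (%.2f,%.2f)' % tuple(ci_scipy))
```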
