Hypothesis testing is one of the most common applications of statistical analysis. We construct hypotheses in an attempt to find evidence that justifies an action or a change in belief. We do this by stating the position we are prepared to hold in the absence of data (the null hypothesis) and then looking for evidence that would support changing that position.

For example, we might consider launching a new product, but we would only do so if there were sufficient demand.  We would assume there was not sufficient demand and then look for evidence that there was.  If we found sufficient evidence, we would conclude that it was worth launching the new product; if we did not, we would conclude that there was not sufficient evidence to justify launching the new product.  

**Hypothesis tests can be simple or complicated; they can have multiple simultaneous components.  Python makes it simple to run either type!**

# Import data and run regression

The data can be found in the structural change data tab in the Cheat_Sheet_Hypothesis_Testing_V1_0 spreadsheet.

In [None]:
#Import the required libraries
import pandas as pd
import os.path as osp
import statsmodels.api as sm
from statsmodels.formula.api import ols

#Build the path for the data file
data_path = osp.join(
    osp.curdir,'Data','Cheat_Sheet_Hypothesis_Testing_V1_0.xlsx')

#Use the read_excel function to pull data from the first sheet (sheet_name=0) of the workbook
data = pd.read_excel(
    data_path,sheet_name=0,index_col='Obs')

#Build our model using statsmodels
model = ols('Y ~ X1 + X2 + X3 + X4 + X5',data).fit()
model.summary()

# Performing Hypothesis Testing using the Wald Test

Using the following arguments, we will test whether the coefficients on $X_1$, $X_2$, and $X_3$ are all zero. In other words, we are testing whether these three variables, taken together, have any linear relationship with $Y$ once the other regressors are accounted for.
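Stated formally, and assuming the fitted model is the linear regression with an intercept that the formula `Y ~ X1 + X2 + X3 + X4 + X5` implies, the joint hypothesis can be written as below. The $\beta$ and $\varepsilon$ symbols are notation introduced here for illustration; they do not come from the spreadsheet.

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon
$$

$$
H_0: \beta_1 = \beta_2 = \beta_3 = 0 \qquad \text{versus} \qquad H_1: \text{at least one of } \beta_1, \beta_2, \beta_3 \text{ is nonzero}
$$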

In [None]:
'''
We put our null hypothesis in the following string format. Here we are going
to test whether X1, X2, and X3 have a 0 coefficient in front of them.
'''
hypothesis = '(X1=0,X2=0,X3=0)'

In [None]:
#Pass the hypothesis to the model's wald_test method
print(model.wald_test(hypothesis))

The p-value above is what we are most concerned with. You should have gotten a value of 8.27306050734528e-14, where the 'e' means "times 10 to the power of": 8.27e-14 is $8.27\times 10^{-14}$ (very small!). If our p-value is less than $0.05$, which it clearly is here, then we **reject the null hypothesis**. This gives us strong evidence that at least one of the three variables has a nonzero coefficient, that is, some relationship with our $Y$ value. Let's try another example where we test just $X_1$ and $X_5$.

In [None]:
hypothesis_2 = '(X1=0,X5=0)'
print(model.wald_test(hypothesis_2))

Here, our p-value is much higher, about $0.09$. This p-value is not less than $0.05$, so in this case we **do not reject the null hypothesis**. Note that this does not confirm that $X_1$ and $X_5$ are unrelated to our $Y$ value; we simply did not find enough evidence to conclude that they are related.
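The decision rule used in both examples can also be applied in code rather than by reading the printed output. The sketch below is only an illustration: it uses synthetic stand-in data (not the spreadsheet), and the names `demo`, `demo_model`, `joint_p_value`, and `decide` are hypothetical helpers introduced here. It also shows that the same restrictions can be passed as a matrix with one row per restriction, assuming the parameter order Intercept, X1, X2, X3.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (NOT the spreadsheet): Y depends on X1 and X2 only.
rng = np.random.default_rng(0)
n_obs = 200
demo = pd.DataFrame(rng.normal(size=(n_obs, 3)), columns=['X1', 'X2', 'X3'])
demo['Y'] = 1.0 + 2.0 * demo['X1'] - 1.5 * demo['X2'] + rng.normal(size=n_obs)

demo_model = ols('Y ~ X1 + X2 + X3', data=demo).fit()


def joint_p_value(fitted_model, hypothesis):
    """Run a Wald test and return its p-value as a plain float."""
    result = fitted_model.wald_test(hypothesis)
    # The p-value may come back as an array depending on the statsmodels version.
    return float(np.squeeze(result.pvalue))


def decide(p_value, alpha=0.05):
    """Apply the usual rule: reject the null when the p-value is below alpha."""
    return 'reject the null' if p_value < alpha else 'do not reject the null'


# Joint test written as a string, as in the cells above.
p_joint = joint_p_value(demo_model, 'X1 = 0, X2 = 0')
print(f'p-value = {p_joint:.3g}: {decide(p_joint)}')

# The same two restrictions written as a matrix, one row per restriction.
# Columns follow the assumed parameter order: Intercept, X1, X2, X3.
restrictions = np.array([[0, 1, 0, 0],
                         [0, 0, 1, 0]])
p_matrix = joint_p_value(demo_model, restrictions)
print(f'matrix form p-value = {p_matrix:.3g}')
```

Because the synthetic $Y$ is built from $X_1$ and $X_2$, the joint test should reject the null here; the string and matrix forms describe the same hypothesis and should give the same p-value.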

# General Notes

- All of these restrictions are tested simultaneously as part of the null hypothesis.
- You can test many null hypotheses using the `wald_test` method.
- There are other tests (for heteroskedasticity, specification error, etc.) that cannot be performed using this approach.
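As an illustration of the last note, a heteroskedasticity check needs its own function rather than a restriction on the coefficients. The sketch below uses the Breusch-Pagan test from statsmodels, which is one possible choice picked here for illustration and not named in the notes above. The data are synthetic, with the error spread deliberately growing with `X1`, and the names `demo` and `demo_model` are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic data (NOT the spreadsheet) whose error spread grows with X1.
rng = np.random.default_rng(1)
n_obs = 500
demo = pd.DataFrame({'X1': rng.uniform(1.0, 5.0, size=n_obs)})
demo['Y'] = 2.0 + 0.5 * demo['X1'] + demo['X1'] * rng.normal(size=n_obs)

demo_model = ols('Y ~ X1', data=demo).fit()

# Null hypothesis of the Breusch-Pagan test: the error variance is constant.
# The second argument is the design matrix, which already includes the intercept.
lm_stat, lm_p_value, f_stat, f_p_value = het_breuschpagan(
    demo_model.resid, demo_model.model.exog)

print(f'Breusch-Pagan LM p-value: {lm_p_value:.3g}')
```

A p-value below $0.05$ here would lead us to reject the null hypothesis of constant error variance, following the same decision rule used for the Wald tests.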