A Chow test is not a distinct kind of test so much as a way of setting up a joint hypothesis test that lets you determine whether:
 - Structural change has taken place over time.
 - Differences exist between subgroups within a population.
 
For example, one could test whether the effectiveness of advertising budgets has changed over time, or whether the US and Canadian markets behave differently.

If it turns out that two data sets behave the same, it is normally better to combine the observations and estimate a single set of parameters. If they do not, you typically need to allow for two or more sets of relationships. Finally, once you understand the basics of running a Chow test manually, you can run a modified version in which only a subset of the relationships is suspected to have changed.

Chow tests involve testing several coefficients simultaneously and, when implemented manually, often require generating new dummy and interaction variables (i.e., a dummy multiplied by a continuous variable, like the ever-popular Male_Ad_Budget example).
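
For reference, the classic textbook form of the Chow statistic fits the model once to the pooled data and once to each group separately. In the notation introduced here (not used elsewhere in this notebook), $RSS_P$ is the residual sum of squares from the pooled regression, $RSS_1$ and $RSS_2$ are those from the two group regressions, $k$ is the number of parameters per regression including the intercept, and $n$ is the total number of observations. Assuming the usual OLS conditions, including equal error variance in both groups, it follows an $F(k,\, n-2k)$ distribution under the null of no structural change:

$$F = \frac{\left(RSS_P - (RSS_1 + RSS_2)\right)/k}{(RSS_1 + RSS_2)/(n - 2k)}$$

The dummy-and-interaction approach used below produces the same $F$ statistic from a single regression.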


# Task

In this case we will use the data in the workbook *Cheat_Sheet_Chow_Test_V1_0*. We will interpret *Obs* as the week in which each observation was recorded, and we will determine whether the relationship between $y$ and $x_1$, $x_2$, and $x_3$ changed after week 50 in the model:

$$y = B_0 + B_1 x_1 + B_2 x_2+ B_3 x_3$$
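
Written out, the manual Chow test estimates an unrestricted model in which every coefficient may shift after week 50, and then tests whether all of the shifts are zero. The coefficient labels $B_4$ through $B_7$ are introduced here for illustration; $C_0$ is 1 after week 50 and 0 otherwise, and $C_j = C_0 x_j$ for $j = 1, 2, 3$:

$$y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3 + B_4 C_0 + B_5 C_1 + B_6 C_2 + B_7 C_3$$

$$H_0: B_4 = B_5 = B_6 = B_7 = 0$$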

## Import the data into Pandas

The data can be found in the *Structural_Change_Data* or *Modified_Structural_Change_Data* tab of *Cheat_Sheet_Chow_Test_V1_0*. If you use *Modified_Structural_Change_Data*, you will not need queries to do the recoding shown in the *Write a query* step below. You should still learn to write the queries yourself, as the exam may not be as kind.

In [7]:
import pandas as pd
import os.path as osp

data_path = osp.join(
    osp.curdir,'Data','Cheat_Sheet_Chow_Test_V1_0.xlsx')

data = pd.read_excel(
    data_path,sheet_name='Structural_Change_Data')
data.head()

| | Obs | Y | X1 | X2 | X3 |
|---|---|---|---|---|---|
| 0 | 1 | 570 | 1 | 22844 | 958 |
| 1 | 2 | 541 | 1 | 21820 | 866 |
| 2 | 3 | 499 | 1 | 8037 | 713 |
| 3 | 4 | 331 | 1 | 3293 | 190 |
| 4 | 5 | 519 | 1 | 18459 | 121 |


## Run the Regression
Perform a quick assessment of the results.  Does anything stand out as unusual?

In [8]:
from statsmodels.formula.api import ols
model = ols('Y ~ X1 + X2 + X3',data).fit()
model.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.086 |
| Model: | OLS | Adj. R-squared: | 0.072 |
| Method: | Least Squares | F-statistic: | 6.122 |
| Date: | Thu, 21 Mar 2024 | Prob (F-statistic): | 0.000531 |
| Time: | 18:27:32 | Log-Likelihood: | -1221.6 |
| No. Observations: | 200 | AIC: | 2451.0 |
| Df Residuals: | 196 | BIC: | 2464.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 319.0316 | 21.196 | 15.051 | 0.000 | 277.230 | 360.833 |
| X1 | 20.4202 | 15.569 | 1.312 | 0.191 | -10.285 | 51.125 |
| X2 | 0.0042 | 0.001 | 3.490 | 0.001 | 0.002 | 0.007 |
| X3 | 0.0494 | 0.028 | 1.790 | 0.075 | -0.005 | 0.104 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 40.85 | Durbin-Watson: | 1.809 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 17.593 |
| Skew: | -0.534 | Prob(JB): | 0.000151 |
| Kurtosis: | 2.015 | Cond. No. | 35500.0 |


In [None]:
# Overall significance: are all three slope coefficients jointly zero?
initial_hypothesis = 'X1=0, X2=0, X3=0'
model.wald_test(initial_hypothesis)
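
The joint test above should reproduce the overall F-statistic that `summary()` reports. The sketch below checks this on synthetic data (an assumption; it is not the workbook data), using the same column names as the notebook. `np.squeeze` is used because, depending on the statsmodels version, `wald_test` may return the statistic as a 1x1 array.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data (assumed, not the workbook) with the notebook's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "X1": rng.integers(0, 2, n),
    "X2": rng.uniform(0, 25000, n),
    "X3": rng.uniform(0, 1000, n),
})
df["Y"] = (320 + 20 * df["X1"] + 0.004 * df["X2"] + 0.05 * df["X3"]
           + rng.normal(0, 100, n))

fit = ols("Y ~ X1 + X2 + X3", df).fit()

# Joint test that every slope is zero, reported as an F test
res = fit.wald_test("X1=0, X2=0, X3=0", use_f=True)
f_stat = float(np.squeeze(res.statistic))
p_val = float(np.squeeze(res.pvalue))

# Both pairs should match: Wald F vs. the summary's F-statistic
print(f_stat, fit.fvalue)
print(p_val, fit.f_pvalue)
```

Newer statsmodels releases may emit a `FutureWarning` about the `scalar` argument; it does not affect the numbers.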

## Write a query
Create variants of X1 through X3, called C1 through C3, that are 0 for observations 1 through 50 and equal to their respective X values for observations 51 and beyond. Using similar logic, you will also need a C0 variable that is 0 for observations 1 through 50 and 1 for the rest. The NumPy *where* function handles this, and you will apply it to each of C1 through C3.
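
Because each C variable is just the dummy multiplied by the original regressor, an equivalent way to build them is $C_j = C_0 \times X_j$. The sketch below uses the six rows around the week-50 transition from this notebook's data and confirms that the multiplicative construction matches the `np.where` version.

```python
import numpy as np
import pandas as pd

# Rows for weeks 48 to 53, taken from the notebook's data
df = pd.DataFrame({
    "Obs": [48, 49, 50, 51, 52, 53],
    "X1": [0, 1, 0, 0, 0, 0],
    "X2": [14771, 24439, 8414, 10736, 7671, 6645],
    "X3": [24, 387, 222, 300, 386, 480],
})

# Dummy: 0 through week 50, 1 afterwards
df["C0"] = (df["Obs"] > 50).astype(int)

# Interaction terms: dummy times the original regressor
for i in range(1, 4):
    df[f"C{i}"] = df["C0"] * df[f"X{i}"]

# Same result as the np.where approach
for i in range(1, 4):
    assert (df[f"C{i}"] == np.where(df["Obs"] <= 50, 0, df[f"X{i}"])).all()

print(df)
```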


In [9]:
import numpy as np

# C0 is 0 where 'Obs' is less than or equal to 50 and 1 otherwise.
data['C0'] = np.where(data['Obs'] <= 50, 0, 1)

# Do the same for C1...C3, keeping the value of X1...X3 after week 50.
# The loop avoids writing the same line three times. The savings are
# small here but grow when you have many variables.
for i in range(1, 4):
    data['C' + str(i)] = np.where(data['Obs'] <= 50, 0, data['X' + str(i)])
data.head()

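| | Obs | Y | X1 | X2 | X3 | C0 | C1 | C2 | C3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 570 | 1 | 22844 | 958 | 0 | 0 | 0 | 0 |
| 1 | 2 | 541 | 1 | 21820 | 866 | 0 | 0 | 0 | 0 |
| 2 | 3 | 499 | 1 | 8037 | 713 | 0 | 0 | 0 | 0 |
| 3 | 4 | 331 | 1 | 3293 | 190 | 0 | 0 | 0 | 0 |
| 4 | 5 | 519 | 1 | 18459 | 121 | 0 | 0 | 0 | 0 |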


In [10]:
# Note the transition at Obs 51 (row index 50)
data.iloc[47:53]

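| | Obs | Y | X1 | X2 | X3 | C0 | C1 | C2 | C3 |
|---|---|---|---|---|---|---|---|---|---|
| 47 | 48 | 552 | 0 | 14771 | 24 | 0 | 0 | 0 | 0 |
| 48 | 49 | 522 | 1 | 24439 | 387 | 0 | 0 | 0 | 0 |
| 49 | 50 | 416 | 0 | 8414 | 222 | 0 | 0 | 0 | 0 |
| 50 | 51 | 335 | 0 | 10736 | 300 | 1 | 0 | 10736 | 300 |
| 51 | 52 | 470 | 0 | 7671 | 386 | 1 | 0 | 7671 | 386 |
| 52 | 53 | 435 | 0 | 6645 | 480 | 1 | 0 | 6645 | 480 |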


In [None]:
data.tail()

## Running the Regression on the Query

In [11]:
# Unrestricted model: original regressors plus the post-week-50 shift terms
new_model = ols('Y ~ X1 + X2 + X3 + C0 + C1 + C2 + C3',data).fit()
new_model.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.161 |
| Model: | OLS | Adj. R-squared: | 0.13 |
| Method: | Least Squares | F-statistic: | 5.267 |
| Date: | Thu, 21 Mar 2024 | Prob (F-statistic): | 1.61e-05 |
| Time: | 18:30:49 | Log-Likelihood: | -1213.0 |
| No. Observations: | 200 | AIC: | 2442.0 |
| Df Residuals: | 192 | BIC: | 2468.0 |
| Df Model: | 7 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 384.3656 | 39.504 | 9.730 | 0.000 | 306.448 | 462.283 |
| X1 | 26.9682 | 30.886 | 0.873 | 0.384 | -33.951 | 87.888 |
| X2 | 0.0032 | 0.002 | 1.477 | 0.141 | -0.001 | 0.007 |
| X3 | 0.0411 | 0.052 | 0.795 | 0.428 | -0.061 | 0.143 |
| C0 | -80.7824 | 46.302 | -1.745 | 0.083 | -172.108 | 10.543 |
| C1 | -10.6216 | 35.443 | -0.300 | 0.765 | -80.530 | 59.286 |
| C2 | 0.0009 | 0.003 | 0.355 | 0.723 | -0.004 | 0.006 |
| C3 | 0.0095 | 0.060 | 0.158 | 0.875 | -0.110 | 0.129 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 37.387 | Durbin-Watson: | 1.974 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 17.255 |
| Skew: | -0.536 | Prob(JB): | 0.000179 |
| Kurtosis: | 2.041 | Cond. No. | 127000.0 |
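
As an alternative to building C1 through C3 by hand, the formula language statsmodels uses (patsy) can generate the interactions: `(X1 + X2 + X3) * C0` expands to the main effects plus `X1:C0`, `X2:C0`, and `X3:C0`. The sketch below uses synthetic data (an assumption, not the workbook) to show that both specifications give the same fit; only the coefficient names differ.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data (assumed) with an intercept shift after week 50
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Obs": np.arange(1, n + 1),
    "X1": rng.integers(0, 2, n),
    "X2": rng.uniform(0, 25000, n),
    "X3": rng.uniform(0, 1000, n),
})
df["C0"] = np.where(df["Obs"] <= 50, 0, 1)
for i in range(1, 4):
    df[f"C{i}"] = np.where(df["Obs"] <= 50, 0, df[f"X{i}"])
df["Y"] = (380 - 80 * df["C0"] + 25 * df["X1"] + 0.003 * df["X2"]
           + 0.04 * df["X3"] + rng.normal(0, 100, n))

# Explicit shift variables, as in the notebook
explicit = ols("Y ~ X1 + X2 + X3 + C0 + C1 + C2 + C3", df).fit()

# The * operator builds the main effects and the X:C0 interactions itself
compact = ols("Y ~ (X1 + X2 + X3) * C0", df).fit()

print(np.isclose(explicit.ssr, compact.ssr))
print(compact.params.index.tolist())
```

Building the variables explicitly, as the notebook does, makes the hypothesis string easier to read; the formula shortcut saves typing when there are many regressors.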


# Performing a Hypothesis Test

In [12]:
# The Chow test is just a joint hypothesis test that all shift terms are zero
hypothesis = '(C0=0,C1=0,C2=0,C3=0)'
new_model.wald_test(hypothesis)



<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[4.31428187]]), p=0.0023005453054455414, df_denom=192, df_num=4>

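Your output should contain the null hypothesis (all four shift coefficients equal zero), the unrestricted model, and the statistics we care about. Here $F \approx 4.31$ with 4 and 192 degrees of freedom, and the p-value (about 0.0023) is very low.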
 - What does this mean?  
 - How does this compare with the individual t-test results?
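
The classic split-sample Chow statistic and the dummy-variable Wald test are two routes to the same number. The sketch below, on synthetic data (an assumption, not the workbook), computes the F statistic from the pooled and per-period residual sums of squares, checks it against `wald_test` on the fully interacted model, and then runs a modified test in which only the intercept and the X2 slope are allowed to shift (that choice of subset is purely illustrative).

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data (assumed) with an intercept shift after week 50
rng = np.random.default_rng(2)
n, k, split = 200, 4, 50  # k = parameters per regime (intercept + 3 slopes)
df = pd.DataFrame({
    "Obs": np.arange(1, n + 1),
    "X1": rng.integers(0, 2, n),
    "X2": rng.uniform(0, 25000, n),
    "X3": rng.uniform(0, 1000, n),
})
shift = np.where(df["Obs"] <= split, 0, -80)
df["Y"] = (380 + shift + 25 * df["X1"] + 0.003 * df["X2"]
           + 0.04 * df["X3"] + rng.normal(0, 100, n))

formula = "Y ~ X1 + X2 + X3"

# Classic Chow statistic from pooled and per-period regressions
rss_pooled = ols(formula, df).fit().ssr
rss_before = ols(formula, df[df["Obs"] <= split]).fit().ssr
rss_after = ols(formula, df[df["Obs"] > split]).fit().ssr
rss_split = rss_before + rss_after
chow_f = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Same test via dummy and interaction terms in one regression
df["C0"] = (df["Obs"] > split).astype(int)
for i in range(1, 4):
    df[f"C{i}"] = df["C0"] * df[f"X{i}"]
full = ols(formula + " + C0 + C1 + C2 + C3", df).fit()
wald = full.wald_test("C0=0, C1=0, C2=0, C3=0", use_f=True)
wald_f = float(np.squeeze(wald.statistic))
print(chow_f, wald_f)

# Modified Chow test: only the intercept and the X2 slope may shift
subset = ols(formula + " + C0 + C2", df).fit()
partial = subset.wald_test("C0=0, C2=0", use_f=True)
partial_f = float(np.squeeze(partial.statistic))
partial_p = float(np.squeeze(partial.pvalue))
print(partial_f, partial_p)
```

The fully interacted regression has $n - 2k = 192$ residual degrees of freedom, matching the `df_denom` in the notebook's output.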
