<table style="width: 100%;">
    <tr style="background-color: transparent;"><td>
        <img src="https://d8a-88.github.io/econ-fa19/assets/images/blue_text.png" width="250px" style="margin-left: 0;" />
    </td><td>
        <p style="text-align: right; font-size: 10pt;"><strong>Economic Models</strong>, Spring 2020<br>
            Dr. Eric Van Dusen<br>
            Alan Liang<br>
        Umar Maniku<br>
        </p></td></tr>
</table>

# Lab 11: Consumer Spending and Economic Stimulus Payments

In [1]:
import warnings
warnings.simplefilter("ignore")
import statsmodels.api as sm
import numpy as np
import pandas as pd
import statsmodels.sandbox.regression.gmm as sm_gmm

def get_dummies(tbl, col, drop = True, drop_first = True):
    """Creates dummy variables for a column of a table"""
    values = np.unique(tbl[col])
    if drop_first:
        values = values[1:]
    for val in values:
        encoding = tbl[col].apply(lambda s: int(s == val))
        tbl[col + "_" + str(val)] = encoding
    if drop:
        tbl = tbl.drop(columns = col)
    return tbl

Welcome to lab 11! 

In this lab, you will investigate how economic stimulus payments in the form of tax rebates affect household consumption. This lab is based on *Household Expenditure and the Income Tax Rebates of 2001* by David S. Johnson, Jonathan A. Parker, and Nicholas S. Souleles (which we will refer to as the JPS study for short).

In 2001, the Bush administration passed the *Economic Growth and Tax Relief Reconciliation Act of 2001*, which mainly reduced the rates of individual income taxes. In addition, the bill initiated a series of one-time rebates for all taxpayers who filed a tax return for 2000. The payment of these rebates was broadly announced, so most households were aware of an incoming stimulus payment (much like the recent stimulus check). The rebate was as follows:
- Up to a maximum of \$300 for single filers with no dependents
- Up to a maximum of \$500 for single parents
- Up to a maximum of \$600 for married couples

We are interested in determining how individuals altered their consumption patterns due to the economic stimulus payments, and by extension in seeing whether the permanent income hypothesis holds. The permanent income hypothesis states that consumers attempt to smooth their consumption across their lifetime, so that "changes in permanent income, rather than changes in temporary income, are what drive the changes in a consumer's consumption patterns." Intuitively, if the permanent income hypothesis were to hold, we would expect households to start smoothing out the spending of the rebate even before the rebate arrived. Thus, their consumption would not change much between periods, since they had known in the previous period that they would be receiving a sizable increase in income in the near future.
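To make the intuition concrete, here is a minimal sketch comparing a household that smooths an announced rebate with one that spends it on arrival. The function name, the zero interest rate, and all dollar figures are illustrative assumptions, not part of the JPS study.

```python
def consumption_paths(income, rebate, n_periods, announce, arrive):
    """Return (smoothing, hand_to_mouth) consumption paths.

    Assumes a zero interest rate and no discounting, so a smoothing
    household spreads the rebate evenly over all periods from the
    announcement onward.
    """
    remaining = n_periods - announce
    smoothing, hand_to_mouth = [], []
    for t in range(n_periods):
        bump = rebate / remaining if t >= announce else 0.0
        smoothing.append(income + bump)
        hand_to_mouth.append(income + (rebate if t == arrive else 0.0))
    return smoothing, hand_to_mouth


# Hypothetical household: rebate announced in period 2, paid in period 3.
smooth, h2m = consumption_paths(
    income=1000.0, rebate=600.0, n_periods=6, announce=2, arrive=3
)

# Change in consumption between the period before and the period of arrival.
delta_smooth = smooth[3] - smooth[2]
delta_h2m = h2m[3] - h2m[2]
print("smoothing household:", delta_smooth)
print("hand-to-mouth household:", delta_h2m)
```

Both households spend the same total over the six periods, but only the hand-to-mouth household shows a jump in consumption when the payment arrives. That jump is what the regression in this lab tries to detect.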

Notably, these stimulus payments were assigned to households at random periods in time, which allows us to better conclude a causal effect of a one time cash payment on changes in consumption.

To determine the true causal effect, we will use least squares regression. JPS propose the following regression for any household relating the stimulus payment to change in consumption:

$$C_{t+1} - C_t = \sum_s \beta_{0,s} \text{month}_s + \beta_1 \text{age} + \beta_2 \text{$\Delta$ children} + \beta_3 \text{$\Delta$ adults} + \beta_4 \text{Stimulus Payment}_{t+1} + u$$

Here, we control for seasonal effects by creating dummy variables for each period (measured in months), and also control for changes in the number of children and adults in a household. Let's consider a few scenarios in context of the regression to gain some intuition:
- If a household received a stimulus in period $t+1$, then the change in consumption ($C_{t+1} - C_t$) due to the rebate should be captured by $\beta_4$ if we have sufficiently controlled for all potential factors of change in consumption between the 2 periods. 
- If a household did not receive a stimulus payment in $t+1$, then the stimulus payment will be 0. Thus, the change in consumption will only be explained by our control variables: age, changes in family members, and seasonal variation. 
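The regression above can be sketched on simulated data to see how $\beta_4$ is recovered. Everything in this block is an assumption made for illustration: the data are synthetic, the "true" coefficients are made up, and `numpy.linalg.lstsq` stands in for the `statsmodels` call used later in the lab.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
n_months = 4

# One dummy column per month; together they play the role of the intercept.
month = rng.integers(0, n_months, n)
month_dummies = np.eye(n_months)[month]

age = rng.uniform(20, 80, n)
d_children = rng.integers(-1, 2, n)
d_adults = rng.integers(-1, 2, n)

# Roughly 30% of observations receive a rebate of a made-up size.
got_rebate = rng.random(n) < 0.3
rebate = np.where(got_rebate, rng.choice([300.0, 500.0, 600.0], n), 0.0)

# Hypothetical data-generating process with beta_4 = 0.25.
true_month_effects = np.array([10.0, -20.0, 5.0, 30.0])
true_beta4 = 0.25
delta_c = (
    true_month_effects[month]
    + 0.5 * age
    + 40.0 * d_children
    + 100.0 * d_adults
    + true_beta4 * rebate
    + rng.normal(0, 50, n)
)

# Least squares on the same set of regressors as the equation above.
X = np.column_stack([month_dummies, age, d_children, d_adults, rebate])
beta_hat, *_ = np.linalg.lstsq(X, delta_c, rcond=None)
beta4_hat = beta_hat[-1]
print("estimated beta_4:", round(float(beta4_hat), 3))
```

Because the simulated rebate is unrelated to the error term, the estimate lands close to the assumed value of 0.25. With real data, how close we get depends on whether the controls are sufficient.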



Let's read in the table. The column labels are:

| Label | Description |
|------|------|
| `newid` | household ID |
| `year_month` | month when data was collected |
| `dcf` | change in food expenditures |
| `dcs` | change in strictly nondurable expenditures |
| `dcn` | change in nondurable expenditures |
| `dlcf` | change in log food expenditures |
| `dlcs` | change in log strictly nondurable expenditures |
| `dlcn` | change in log nondurable expenditures |
| `dnumadult` | change in number of adults |
| `dnumkids` | change in number of kids |
| `age` | average age of head & spouse (if exists) |
| `taxreb` | total rebates received in reference period |
| `ltaxreb` | rebates received in prior reference period (-1) |
| `l2taxreb` | rebates received in twice prior reference period (-2) |

In [2]:
rebates = pd.read_csv("JPS.csv")
rebates.head()

Unnamed: 0,newid,year_month,dcf,dcs,dcn,dlcf,dlcs,dlcn,dnumadult,dnumkids,age,taxreb,ltaxreb,l2taxreb
0,113314,200103,281,343.0,352.0,0.618805,0.502042,0.284406,0,0,85.0,0,0.0,0.0
1,113314,200106,-129,-176.0,427.0,-0.238032,-0.226313,0.262581,0,0,85.0,0,0.0,0.0
2,113314,200109,-90,-42.0,169.0,-0.207639,-0.06252,0.087462,0,0,85.0,0,0.0,0.0
3,113318,200103,131,820.0,1241.0,0.067395,0.267209,0.344537,0,0,51.0,0,0.0,0.0
4,113318,200106,302,3147.0,3256.0,0.139978,0.641809,0.567968,0,0,51.0,0,0.0,0.0


One very important thing to note is that the unit of observation is not per household, but rather per time period per household. If a household were observed at 3 different time periods, then it would make up 3 rows and hence "contribute 3 times to the regression". This kind of setup is most often referred to as a *panel data study*.
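As a small illustration of the panel structure, the sketch below rebuilds the first five rows shown in the preview above and counts how many rows each household contributes. The frame is typed in by hand from that preview (the full file may hold more rows for these households), and the variable names are hypothetical.

```python
import pandas as pd

# First five rows of the preview above, re-entered by hand.
panel = pd.DataFrame({
    "newid": [113314, 113314, 113314, 113318, 113318],
    "year_month": [200103, 200106, 200109, 200103, 200106],
    "dcf": [281, -129, -90, 131, 302],
})

# Each row is one household in one period, so households repeat.
rows_per_household = panel.groupby("newid").size()
print(rows_per_household)
print("rows:", len(panel), "households:", panel["newid"].nunique())
```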

Let's visualize the data. Below is a household that received a stimulus payment in August 2001.

In [3]:
rebates[rebates["newid"] == 116249]

Unnamed: 0,newid,year_month,dcf,dcs,dcn,dlcf,dlcs,dlcn,dnumadult,dnumkids,age,taxreb,ltaxreb,l2taxreb
1869,116249,200105,-469,-582.0,-465.0,-0.274464,-0.206017,-0.145046,0,0,38.0,0,0.0,0.0
1870,116249,200108,-1226,-1359.0,-1494.0,-1.746342,-0.763995,-0.696173,0,0,39.0,120,0.0,0.0
1871,116249,200111,555,646.0,471.0,1.145132,0.435119,0.275487,0,0,39.0,0,120.0,0.0


Thus for the data point in which $t+1$ refers to August (so that $t$ refers to May), $\text{Stimulus Payment}_{t+1} = 120$. For the data point in which $t+1$ refers to November (so that $t$ refers to August), $\text{Stimulus Payment}_{t+1} = 0$ and $\text{Stimulus Payment}_t = 120$. 

In general, we will use `taxreb` as the $\text{Stimulus Payment}_{t+1}$ variable.
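The relationship between `taxreb` and `ltaxreb` for household 116249 can be reproduced with a within-household lag. This is only a sketch of one way such a column could be built: the dataset ships with `ltaxreb` already computed, and filling the first period with 0 is an assumption made here, not something the source states.

```python
import pandas as pd

# Rebate columns for household 116249, copied from the output above.
household = pd.DataFrame({
    "newid": [116249, 116249, 116249],
    "year_month": [200105, 200108, 200111],
    "taxreb": [0, 120, 0],
})

# Lag the rebate by one reference period within each household.
household = household.sort_values(["newid", "year_month"])
household["ltaxreb_sketch"] = (
    household.groupby("newid")["taxreb"].shift(1).fillna(0.0)
)
print(household)
```

The lagged column equals 120 in the November row, which matches the `ltaxreb` value displayed for that household.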

## Part 1 - OLS on Rebate as a Dollar Value
Let's try to recreate JPS' regression. We have selected the relevant columns for the independent variables:

In [4]:
X_q1 = rebates[["year_month", "dnumadult", "dnumkids", "age", "taxreb"]]
X_q1.head()

Unnamed: 0,year_month,dnumadult,dnumkids,age,taxreb
0,200103,0,0,85.0,0
1,200106,0,0,85.0,0
2,200109,0,0,85.0,0
3,200103,0,0,51.0,0
4,200106,0,0,51.0,0


**Question 1.1:** Create dummy variables to represent the different months. Augment the `X_q1` table with dummy variables for `year_month`, and assign it to `X_q1_dummies`.

In [None]:
X_q1_dummies = ...
X_q1_dummies.head()

In [5]:
## Solution ##
X_q1_dummies = get_dummies(X_q1, "year_month")
X_q1_dummies.head()

Unnamed: 0,dnumadult,dnumkids,age,taxreb,year_month_200103,year_month_200104,year_month_200105,year_month_200106,year_month_200107,year_month_200108,year_month_200109,year_month_200110,year_month_200111,year_month_200112,year_month_200201,year_month_200202,year_month_200203,year_month_200204,year_month_200205,year_month_200206
0,0,0,85.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,85.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,85.0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,0,51.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,51.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
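For comparison, pandas has a built-in `pd.get_dummies` that does a similar job to the helper defined at the top of this lab. The sketch below applies it to a small hand-made table. The table and its variable names are hypothetical, and the lab itself uses the custom helper.

```python
import pandas as pd

# A tiny stand-in table with the same kind of columns as X_q1.
toy = pd.DataFrame({
    "year_month": [200103, 200106, 200109, 200103],
    "age": [85.0, 85.0, 85.0, 51.0],
    "taxreb": [0, 0, 0, 0],
})

# drop_first=True omits the earliest month, which becomes the baseline.
toy_dummies = pd.get_dummies(
    toy, columns=["year_month"], drop_first=True, dtype=int
)
print(toy_dummies.columns.tolist())
```

Unlike the lab's helper, `pd.get_dummies` returns a new table and leaves its input unchanged.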


**Question 1.2:** Conduct an OLS regression of change in food consumption using `statsmodels` replicating JPS' setup. Interpret the coefficient on `taxreb`.

In [None]:
q1_2_X = ...
q1_2_y = ...
model_q1_2 = sm.OLS(..., ...).fit()
model_q1_2.summary()

In [6]:
## Solution ##
q1_2_X = X_q1_dummies
q1_2_y = rebates["dcf"]
model_q1_2 = sm.OLS(q1_2_y, q1_2_X).fit()
model_q1_2.summary()

0,1,2,3
Dep. Variable:,dcf,R-squared (uncentered):,0.006
Model:,OLS,Adj. R-squared (uncentered):,0.005
Method:,Least Squares,F-statistic:,4.383
Date:,"Thu, 21 May 2020",Prob (F-statistic):,2.06e-10
Time:,13:40:15,Log-Likelihood:,-123710.0
No. Observations:,14960,AIC:,247500.0
Df Residuals:,14940,BIC:,247600.0
Df Model:,20,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,155.6932,30.700,5.071,0.000,95.517,215.869
dnumkids,53.1387,39.832,1.334,0.182,-24.938,131.215
age,0.6227,0.463,1.344,0.179,-0.285,1.531
taxreb,0.1044,0.050,2.093,0.036,0.007,0.202
year_month_200103,-5.5566,55.166,-0.101,0.920,-113.688,102.575
year_month_200104,-5.6264,52.532,-0.107,0.915,-108.595,97.343
year_month_200105,-50.4889,51.347,-0.983,0.325,-151.136,50.158
year_month_200106,-28.7730,39.826,-0.722,0.470,-106.836,49.290
year_month_200107,3.7976,40.657,0.093,0.926,-75.895,83.490

0,1,2,3
Omnibus:,5669.566,Durbin-Watson:,2.568
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3743853.964
Skew:,0.232,Prob(JB):,0.0
Kurtosis:,80.498,Cond. No.,2650.0


**Question 1.3:** Conduct an OLS regression of change in consumption for strictly non-durable goods using `statsmodels` replicating JPS' setup. How does the coefficient on `taxreb` compare with that for 1.2?

In [None]:
q1_3_X = ...
q1_3_y = ...
model_q1_3 = sm.OLS(..., ...).fit()
model_q1_3.summary()

In [7]:
## Solution ##
q1_3_X = q1_2_X
q1_3_y = rebates["dcs"]
model_q1_3 = sm.OLS(q1_3_y, q1_3_X).fit()
model_q1_3.summary()

0,1,2,3
Dep. Variable:,dcs,R-squared (uncentered):,0.007
Model:,OLS,Adj. R-squared (uncentered):,0.006
Method:,Least Squares,F-statistic:,5.475
Date:,"Thu, 21 May 2020",Prob (F-statistic):,2.79e-14
Time:,13:40:18,Log-Likelihood:,-132260.0
No. Observations:,14960,AIC:,264600.0
Df Residuals:,14940,BIC:,264700.0
Df Model:,20,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,353.0230,54.334,6.497,0.000,246.521,459.525
dnumkids,100.8134,70.497,1.430,0.153,-37.368,238.995
age,0.6068,0.820,0.740,0.459,-1.000,2.214
taxreb,0.2501,0.088,2.833,0.005,0.077,0.423
year_month_200103,86.3582,97.634,0.885,0.376,-105.016,277.733
year_month_200104,200.3884,92.973,2.155,0.031,18.151,382.626
year_month_200105,99.3944,90.876,1.094,0.274,-78.734,277.523
year_month_200106,71.6821,70.485,1.017,0.309,-66.477,209.841
year_month_200107,-41.7049,71.956,-0.580,0.562,-182.748,99.338

0,1,2,3
Omnibus:,14750.825,Durbin-Watson:,2.437
Prob(Omnibus):,0.0,Jarque-Bera (JB):,10235888.527
Skew:,3.927,Prob(JB):,0.0
Kurtosis:,130.904,Cond. No.,2650.0


## Part 2 - OLS on Rebate as a Binary Value

For the second part, we will treat the variable $\text{Stimulus Payment}$ as a binary variable, in which $\text{Stimulus Payment}_{t+1} = 1$ if the household received a stimulus payment in period $t+1$, and $\text{Stimulus Payment}_{t+1} = 0$ if not. 

In [9]:
X_q2 = rebates[["year_month", "dnumadult", "dnumkids", "age", "taxreb"]]
X_q2.head()

Unnamed: 0,year_month,dnumadult,dnumkids,age,taxreb
0,200103,0,0,85.0,0
1,200106,0,0,85.0,0
2,200109,0,0,85.0,0
3,200103,0,0,51.0,0
4,200106,0,0,51.0,0


**Question 2.1:** Create a binary variable to represent whether a stimulus payment was received and add it to `X_q2` as a column called `itaxreb`. Make sure to drop `taxreb`.

In [None]:
X_q2_1 = X_q2
X_q2_1["itaxreb"] = ...
X_q2_1 = ...
X_q2_1.head()

In [11]:
## Solution ##
X_q2_1 = X_q2
X_q2_1["itaxreb"] = (X_q2["taxreb"] > 0).astype(int)
X_q2_1 = X_q2_1.drop(columns = "taxreb")
X_q2_1.head()

Unnamed: 0,year_month,dnumadult,dnumkids,age,itaxreb
0,200103,0,0,85.0,0
1,200106,0,0,85.0,0
2,200109,0,0,85.0,0
3,200103,0,0,51.0,0
4,200106,0,0,51.0,0


**Question 2.2:** Similar to 1.1, create dummy variables to represent the different months. Augment the `X_q2_1` table with dummy variables for `year_month`, and assign it to `X_q2_dummies`.

In [None]:
X_q2_dummies = ...
X_q2_dummies.head()

In [12]:
## Solution ##
X_q2_dummies = get_dummies(X_q2_1, "year_month")
X_q2_dummies.head()

Unnamed: 0,dnumadult,dnumkids,age,itaxreb,year_month_200103,year_month_200104,year_month_200105,year_month_200106,year_month_200107,year_month_200108,year_month_200109,year_month_200110,year_month_200111,year_month_200112,year_month_200201,year_month_200202,year_month_200203,year_month_200204,year_month_200205,year_month_200206
0,0,0,85.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,85.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,85.0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,0,51.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,51.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0


**Question 2.3:** Conduct an OLS regression of change in food consumption using `statsmodels` replicating JPS' setup. Interpret the coefficient on `itaxreb`, and compare this with your results in 1.2.

In [None]:
q2_3_X = ...
q2_3_y = ...
model_q2_3 = sm.OLS(..., ...).fit()
model_q2_3.summary()

In [13]:
## Solution ##
q2_3_X = X_q2_dummies
q2_3_y = rebates["dcf"]
model_q2_3 = sm.OLS(q2_3_y, q2_3_X).fit()
model_q2_3.summary()

0,1,2,3
Dep. Variable:,dcf,R-squared (uncentered):,0.006
Model:,OLS,Adj. R-squared (uncentered):,0.004
Method:,Least Squares,F-statistic:,4.328
Date:,"Thu, 21 May 2020",Prob (F-statistic):,3.18e-10
Time:,13:41:08,Log-Likelihood:,-123720.0
No. Observations:,14960,AIC:,247500.0
Df Residuals:,14940,BIC:,247600.0
Df Model:,20,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,156.4078,30.698,5.095,0.000,96.237,216.579
dnumkids,53.3483,39.834,1.339,0.181,-24.732,131.428
age,0.6069,0.463,1.311,0.190,-0.301,1.514
itaxreb,48.2649,26.576,1.816,0.069,-3.828,100.358
year_month_200103,-4.7893,55.163,-0.087,0.931,-112.915,103.337
year_month_200104,-4.8074,52.528,-0.092,0.927,-107.769,98.154
year_month_200105,-49.6792,51.343,-0.968,0.333,-150.319,50.960
year_month_200106,-27.9922,39.820,-0.703,0.482,-106.044,50.060
year_month_200107,4.5889,40.652,0.113,0.910,-75.093,84.271

0,1,2,3
Omnibus:,5671.471,Durbin-Watson:,2.568
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3744917.927
Skew:,0.233,Prob(JB):,0.0
Kurtosis:,80.509,Cond. No.,691.0


**Question 2.4:** Conduct a similar OLS regression of change in strictly non-durable consumption. How does the coefficient on `itaxreb` compare with your results in 1.3?

In [None]:
q2_4_X = ...
q2_4_y = ...
model_q2_4 = sm.OLS(..., ...).fit()
model_q2_4.summary()

In [14]:
## Solution ##

q2_4_X = q2_3_X
q2_4_y = rebates["dcs"]
model_q2_4 = sm.OLS(q2_4_y, q2_4_X).fit()
model_q2_4.summary()

0,1,2,3
Dep. Variable:,dcs,R-squared (uncentered):,0.007
Model:,OLS,Adj. R-squared (uncentered):,0.006
Method:,Least Squares,F-statistic:,5.307
Date:,"Thu, 21 May 2020",Prob (F-statistic):,1.13e-13
Time:,13:41:13,Log-Likelihood:,-132260.0
No. Observations:,14960,AIC:,264600.0
Df Residuals:,14940,BIC:,264700.0
Df Model:,20,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,354.8399,54.334,6.531,0.000,248.339,461.341
dnumkids,101.2076,70.505,1.435,0.151,-36.991,239.406
age,0.5567,0.820,0.679,0.497,-1.050,2.163
itaxreb,101.8670,47.039,2.166,0.030,9.665,194.069
year_month_200103,88.7778,97.636,0.909,0.363,-102.601,280.157
year_month_200104,202.9728,92.973,2.183,0.029,20.735,385.211
year_month_200105,101.9516,90.876,1.122,0.262,-76.176,280.080
year_month_200106,74.1539,70.480,1.052,0.293,-63.996,212.304
year_month_200107,-39.1984,71.952,-0.545,0.586,-180.233,101.836

0,1,2,3
Omnibus:,14768.797,Durbin-Watson:,2.437
Prob(Omnibus):,0.0,Jarque-Bera (JB):,10258341.76
Skew:,3.936,Prob(JB):,0.0
Kurtosis:,131.044,Cond. No.,691.0


**Question 2.5:** What are the differences in consumption changes between food and strictly non-durables?

*Type your answer here, replacing this text*

<div class="alert alert-danger">
<Strong>Solution:</Strong> 
From parts 1 and 2, we observe that households typically spend a bit less than half as much of the rebate on food as on strictly non-durables. This makes sense, since food is a subset of strictly non-durables.
</div>

**Question 2.6:** Looking at non-durables is more relevant in the context of the permanent income hypothesis. Strictly non-durable goods like food do not last between time periods, so that households consume the good in 1 period only. Thus, we can attribute a change in non-durable consumption to consumption that was actually carried out in the corresponding period.

What would we expect $\beta_4$ to be if the permanent income hypothesis were to hold? Is this result in line with your OLS results from parts 1 and 2?

*Type your answer here, replacing this text*

<div class="alert alert-danger">
<Strong>Solution:</Strong> 
If the permanent income hypothesis were true, we would expect a stimulus payment to not increase consumption between these 2 periods, i.e. $\beta_4 = 0$. This is because in the previous period $t$, households anticipate a change in incomes in the near future at $t+1$, causing them to increase their consumption in $t$. Since we are statistically confident that $\beta_4$ is not 0, we can reject the permanent income hypothesis.
</div>

## Part 3 - Instrumental Variables

One concern with the regression in part 1 is that there may be confounding variables between one's rebate amount and the change in consumption; for example, the size of a household will affect both how much the household receives in the rebate and how much its consumption changes across periods. JPS address this by conducting an instrumental variable regression to better determine causality. They use the binary variable for whether a household received a rebate as an instrument for the rebate amount, and construct a 2 stage least squares regression. Note that all control variables are added in both the first stage and structural models.

For this part, you do not have to understand how the IV regression is conducted in python. You simply have to interpret the results from constructing this model.
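For the curious, the idea behind 2 stage least squares can be sketched by hand on simulated data. All names and numbers below are assumptions for illustration: household size acts as an unobserved confounder, the true effect of a rebate dollar is set to 0.2, and plain `numpy` least squares replaces the `IV2SLS` class used in this lab.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Household size raises both the rebate amount and the consumption change.
size = rng.integers(1, 5, n)
# Whether a rebate arrives this period is random, independent of size.
received = (rng.random(n) < 0.3).astype(float)
amount = received * (200 + 100 * size)
delta_c = 0.2 * amount + 200 * size + rng.normal(0, 100, n)


def ols(y, X):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# Naive OLS that omits household size: biased upward here.
beta_ols = ols(delta_c, np.column_stack([const, amount]))[1]

# Stage 1: predict the rebate amount from the receipt indicator.
Z = np.column_stack([const, received])
amount_hat = Z @ ols(amount, Z)

# Stage 2: regress the consumption change on the predicted amount.
beta_iv = ols(delta_c, np.column_stack([const, amount_hat]))[1]

print("OLS estimate: ", round(float(beta_ols), 3))
print("2SLS estimate:", round(float(beta_iv), 3))
```

Running the two stages by hand gives the same point estimate as a dedicated 2SLS routine, but the standard errors from the second-stage regression would not be valid, which is one reason to use a purpose-built class in practice.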

**Question 3.1:** Does the instrument satisfy the conditions of *exogeneity* and *relevance*?

*Type your answer here, replacing this text*

<div class="alert alert-danger">
<Strong>Solution:</Strong> 
The instrument is exogenous to all potentially confounding variables since the timing of a rebate was based on one's social security number and was thus essentially random.
The instrument is clearly relevant, as whether one receives a rebate is positively correlated with how much one receives in the rebate.
</div>

**Question 3.2:** We have constructed a 2 stage least squares model below for the change in food consumption. Do the results differ much from that of part 1?

In [15]:
Z_q3 = X_q2_dummies
X_q3 = X_q1_dummies
model_q3_2 = sm_gmm.IV2SLS(rebates['dcf'], X_q3, instrument = Z_q3).fit()
model_q3_2.summary()

0,1,2,3
Dep. Variable:,dcf,R-squared:,0.006
Model:,IV2SLS,Adj. R-squared:,0.005
Method:,Two Stage,F-statistic:,
,Least Squares,Prob (F-statistic):,
Date:,"Thu, 21 May 2020",,
Time:,13:42:46,,
No. Observations:,14960,,
Df Residuals:,14940,,
Df Model:,20,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,155.7284,30.701,5.072,0.000,95.550,215.907
dnumkids,53.1333,39.832,1.334,0.182,-24.943,131.210
age,0.6208,0.463,1.340,0.180,-0.288,1.529
taxreb,0.1010,0.056,1.816,0.069,-0.008,0.210
year_month_200103,-5.4654,55.170,-0.099,0.921,-113.605,102.674
year_month_200104,-5.5289,52.537,-0.105,0.916,-108.507,97.450
year_month_200105,-50.3922,51.352,-0.981,0.326,-151.049,50.264
year_month_200106,-28.6791,39.831,-0.720,0.472,-106.754,49.396
year_month_200107,3.8929,40.663,0.096,0.924,-75.811,83.597

0,1,2,3
Omnibus:,5670.022,Durbin-Watson:,2.568
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3744309.268
Skew:,0.232,Prob(JB):,0.0
Kurtosis:,80.503,Cond. No.,2650.0


**Question 3.3:** We have constructed a 2 stage least squares model below for the change in strictly non-durable consumption. Do the results differ much from that of part 1?

In [16]:
model_q3_3 = sm_gmm.IV2SLS(rebates['dcs'], X_q3, instrument = Z_q3).fit()
model_q3_3.summary()

0,1,2,3
Dep. Variable:,dcs,R-squared:,0.007
Model:,IV2SLS,Adj. R-squared:,0.006
Method:,Two Stage,F-statistic:,
,Least Squares,Prob (F-statistic):,
Date:,"Thu, 21 May 2020",,
Time:,13:42:48,,
No. Observations:,14960,,
Df Residuals:,14940,,
Df Model:,20,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,353.4061,54.336,6.504,0.000,246.900,459.912
dnumkids,100.7537,70.497,1.429,0.153,-37.429,238.936
age,0.5862,0.820,0.715,0.475,-1.022,2.194
taxreb,0.2131,0.098,2.166,0.030,0.020,0.406
year_month_200103,87.3509,97.641,0.895,0.371,-104.038,278.740
year_month_200104,201.4500,92.982,2.167,0.030,19.195,383.705
year_month_200105,100.4468,90.885,1.105,0.269,-77.699,278.593
year_month_200106,72.7041,70.495,1.031,0.302,-65.476,210.884
year_month_200107,-40.6672,71.967,-0.565,0.572,-181.731,100.397

0,1,2,3
Omnibus:,14757.884,Durbin-Watson:,2.437
Prob(Omnibus):,0.0,Jarque-Bera (JB):,10247084.549
Skew:,3.931,Prob(JB):,0.0
Kurtosis:,130.974,Cond. No.,2650.0


## Optional Part 4 - Incorporating Previous Period Payments into the OLS

One potential issue with the JPS study is that households who received a payment in period $t$ but not in $t+1$ will have the same $\text{Stimulus Payment}_{t+1}$ value as households who did not receive a stimulus payment at all. Treating these two groups alike may be misleading: intuitively, we might expect consumption to decrease if a payment was issued in the previous period but not in the current period. Instead, we will add a variable for whether a household received a stimulus payment in period $t$ to better isolate the causal effect of a stimulus payment on changes in consumption.

We will augment the JPS setup with an added variable $\text{Stimulus Payment}_{t}$:
$$C_{t+1} - C_t = \sum_s \beta_{0,s} \text{month}_s + \beta_1 \text{age} + \beta_2 \text{$\Delta$ children} + \beta_3 \text{$\Delta$ adults} + \beta_4 \text{Stimulus Payment}_{t+1} + \beta_5 \text{Stimulus Payment}_{t} + u$$

Thus:
- If a household received a stimulus in period $t+1$ only, then the change in consumption ($C_{t+1} - C_t$) due to the rebate should be captured by $\beta_4$ if we have sufficiently controlled for all potential factors of change between the 2 periods. 
- If a household did not receive a stimulus payment in $t+1$ or $t$, then both stimulus payment variables will be 0. Thus, the change in consumption will only be explained by our control variables: age, changes in family members, and seasonal variation. 
- If a household received a stimulus in period $t$ only, then the change in consumption ($C_{t+1} - C_t$) due to the rebate should be captured by $\beta_5$ if we have sufficiently controlled for all potential factors of change between the 2 periods. 

Notably, interpreting the coefficient $\beta_5$ for $\text{Stimulus Payment}_{t}$ will allow us to determine how much consumption will change in the period after a household receives the stimulus check.
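A short simulation can show what $\beta_4$ and $\beta_5$ pick up. The behaviour assumed here is hypothetical: each household spends 30 cents per rebate dollar in the period the payment arrives and then returns to its baseline. The data are synthetic and `numpy` least squares stands in for `statsmodels`.

```python
import numpy as np

rng = np.random.default_rng(1)
n_households, n_periods = 4_000, 4

# Each household receives one rebate in a randomly chosen period.
rebate = np.zeros((n_households, n_periods))
when = rng.integers(0, n_periods, n_households)
rebate[np.arange(n_households), when] = rng.choice(
    [300.0, 500.0, 600.0], n_households
)

# Assumed behaviour: consumption rises only in the arrival period.
consumption = 1000.0 + 0.3 * rebate + rng.normal(0, 20, rebate.shape)

# One observation per household per pair of consecutive periods.
delta_c = (consumption[:, 1:] - consumption[:, :-1]).ravel()
pay_now = rebate[:, 1:].ravel()    # Stimulus Payment_{t+1}
pay_prev = rebate[:, :-1].ravel()  # Stimulus Payment_t

X = np.column_stack([np.ones_like(delta_c), pay_now, pay_prev])
_, beta4_hat, beta5_hat = np.linalg.lstsq(X, delta_c, rcond=None)[0]
print("beta_4:", round(float(beta4_hat), 3))
print("beta_5:", round(float(beta5_hat), 3))
```

Under this assumed behaviour the two coefficients have opposite signs: consumption rises when the payment arrives and falls back by the same amount in the following period.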


The column `ltaxreb` reflects the stimulus payment received in the previous period.


In [17]:
X_q4 = rebates[["year_month", "dnumadult", "dnumkids", "age", "taxreb", "ltaxreb"]]
X_q4.head()

Unnamed: 0,year_month,dnumadult,dnumkids,age,taxreb,ltaxreb
0,200103,0,0,85.0,0,0.0
1,200106,0,0,85.0,0,0.0
2,200109,0,0,85.0,0,0.0
3,200103,0,0,51.0,0,0.0
4,200106,0,0,51.0,0,0.0


**Question 4.1:** Conduct a new regression of change in food consumption using the new regression model proposed above. Interpret $\beta_4$ and $\beta_5$. Do your results differ much from that of part 1?

In [None]:
q4_1_X = ...
q4_1_y = ...
model_q4_1 = ...
...

In [18]:
## Solution ##
q4_1_X = get_dummies(X_q4, "year_month")
q4_1_y = rebates["dcf"]
model_q4_1 = sm.OLS(q4_1_y, q4_1_X).fit()
model_q4_1.summary()

0,1,2,3
Dep. Variable:,dcf,R-squared (uncentered):,0.006
Model:,OLS,Adj. R-squared (uncentered):,0.004
Method:,Least Squares,F-statistic:,4.182
Date:,"Thu, 21 May 2020",Prob (F-statistic):,4.13e-10
Time:,13:42:58,Log-Likelihood:,-123710.0
No. Observations:,14960,AIC:,247500.0
Df Residuals:,14939,BIC:,247600.0
Df Model:,21,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,155.5102,30.704,5.065,0.000,95.326,215.694
dnumkids,52.9902,39.835,1.330,0.183,-25.091,131.072
age,0.6104,0.464,1.315,0.189,-0.299,1.520
taxreb,0.1039,0.050,2.084,0.037,0.006,0.202
ltaxreb,-0.0214,0.051,-0.421,0.673,-0.121,0.078
year_month_200103,-4.9604,55.185,-0.090,0.928,-113.130,103.210
year_month_200104,-4.9892,52.555,-0.095,0.924,-108.004,98.025
year_month_200105,-49.8546,51.371,-0.970,0.332,-150.548,50.839
year_month_200106,-28.1522,39.854,-0.706,0.480,-106.271,49.967

0,1,2,3
Omnibus:,5673.117,Durbin-Watson:,2.568
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3748573.516
Skew:,0.234,Prob(JB):,0.0
Kurtosis:,80.547,Cond. No.,2730.0


**Question 4.2:** Conduct a similar regression as 4.1 on change in strictly non-durable consumption. Interpret $\beta_4$ and $\beta_5$. Do your results differ much from that of part 1?

In [None]:
q4_2_X = ...
q4_2_y = ...
model_q4_2 = ...
...

In [19]:
## Solution ##
q4_2_X = get_dummies(X_q4, "year_month")
q4_2_y = rebates["dcs"]
model_q4_2 = sm.OLS(q4_2_y, q4_2_X).fit()
model_q4_2.summary()

0,1,2,3
Dep. Variable:,dcs,R-squared (uncentered):,0.007
Model:,OLS,Adj. R-squared (uncentered):,0.006
Method:,Least Squares,F-statistic:,5.375
Date:,"Thu, 21 May 2020",Prob (F-statistic):,1.64e-14
Time:,13:43:06,Log-Likelihood:,-132250.0
No. Observations:,14960,AIC:,264500.0
Df Residuals:,14939,BIC:,264700.0
Df Model:,21,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,351.6118,54.335,6.471,0.000,245.108,458.116
dnumkids,99.6685,70.494,1.414,0.157,-38.508,237.845
age,0.5119,0.821,0.623,0.533,-1.098,2.122
taxreb,0.2467,0.088,2.795,0.005,0.074,0.420
ltaxreb,-0.1652,0.090,-1.836,0.066,-0.342,0.011
year_month_200103,90.9539,97.658,0.931,0.352,-100.468,282.376
year_month_200104,205.3009,93.004,2.207,0.027,23.002,387.600
year_month_200105,104.2837,90.908,1.147,0.251,-73.907,282.474
year_month_200106,76.4679,70.527,1.084,0.278,-61.774,214.710

0,1,2,3
Omnibus:,14763.087,Durbin-Watson:,2.437
Prob(Omnibus):,0.0,Jarque-Bera (JB):,10229816.482
Skew:,3.934,Prob(JB):,0.0
Kurtosis:,130.865,Cond. No.,2730.0


**Question 4.3:**  We have repeated the regression of change in food consumption, but with the stimulus payment as a binary variable, like in part 2. Interpret $\beta_4$ and $\beta_5$. Do your results differ much from that of part 2?

In [20]:
X_q4_3 = X_q4
X_q4_3["itaxreb"] = (X_q4["taxreb"] > 0).astype(int)
X_q4_3["iltaxreb"] = (X_q4["ltaxreb"] > 0).astype(int)
X_q4_3.head()

Unnamed: 0,year_month,dnumadult,dnumkids,age,taxreb,ltaxreb,year_month_200103,year_month_200104,year_month_200105,year_month_200106,...,year_month_200111,year_month_200112,year_month_200201,year_month_200202,year_month_200203,year_month_200204,year_month_200205,year_month_200206,itaxreb,iltaxreb
0,200103,0,0,85.0,0,0.0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,200106,0,0,85.0,0,0.0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,200109,0,0,85.0,0,0.0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,200103,0,0,51.0,0,0.0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,200106,0,0,51.0,0,0.0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [21]:
q4_3_X = get_dummies(X_q4_3, "year_month")
q4_3_y = rebates["dcf"]
model_q4_3 = sm.OLS(q4_3_y, q4_3_X).fit()
model_q4_3.summary()

0,1,2,3
Dep. Variable:,dcf,R-squared (uncentered):,0.006
Model:,OLS,Adj. R-squared (uncentered):,0.004
Method:,Least Squares,F-statistic:,3.825
Date:,"Thu, 21 May 2020",Prob (F-statistic):,1.68e-09
Time:,13:43:24,Log-Likelihood:,-123710.0
No. Observations:,14960,AIC:,247500.0
Df Residuals:,14937,BIC:,247600.0
Df Model:,23,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,155.4279,30.712,5.061,0.000,95.229,215.627
dnumkids,53.0338,39.839,1.331,0.183,-25.056,131.124
age,0.6142,0.464,1.323,0.186,-0.296,1.524
taxreb,0.1167,0.113,1.034,0.301,-0.105,0.338
ltaxreb,0.0160,0.115,0.139,0.889,-0.209,0.241
year_month_200103,-5.1419,55.191,-0.093,0.926,-113.322,103.039
year_month_200104,-5.1837,52.561,-0.099,0.921,-108.209,97.842
year_month_200105,-50.0473,51.376,-0.974,0.330,-150.751,50.657
year_month_200106,-28.3393,39.859,-0.711,0.477,-106.468,49.790

0,1,2,3
Omnibus:,5673.466,Durbin-Watson:,2.568
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3746101.951
Skew:,0.235,Prob(JB):,0.0
Kurtosis:,80.521,Cond. No.,2740.0


**Question 4.4:** Lastly we have also repeated the regression of change in consumption of strictly non-durables on `Stimulus Payment` as a binary variable. Interpret $\beta_4$ and $\beta_5$. Do your results differ much from that of part 2?

In [22]:
q4_4_X = q4_3_X
q4_4_y = rebates["dcs"]
model_q4_4 = sm.OLS(q4_4_y, q4_4_X).fit()
model_q4_4.summary()

0,1,2,3
Dep. Variable:,dcs,R-squared (uncentered):,0.008
Model:,OLS,Adj. R-squared (uncentered):,0.006
Method:,Least Squares,F-statistic:,4.969
Date:,"Thu, 21 May 2020",Prob (F-statistic):,5.06e-14
Time:,13:43:27,Log-Likelihood:,-132250.0
No. Observations:,14960,AIC:,264600.0
Df Residuals:,14937,BIC:,264700.0
Df Model:,23,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
dnumadult,350.6928,54.347,6.453,0.000,244.167,457.219
dnumkids,99.5177,70.499,1.412,0.158,-38.668,237.704
age,0.5295,0.822,0.644,0.519,-1.081,2.140
taxreb,0.3975,0.200,1.990,0.047,0.006,0.789
ltaxreb,-0.0233,0.203,-0.115,0.909,-0.421,0.375
year_month_200103,90.1086,97.664,0.923,0.356,-101.324,281.541
year_month_200104,204.3969,93.010,2.198,0.028,22.086,386.707
year_month_200105,103.3911,90.914,1.137,0.255,-74.811,281.593
year_month_200106,75.6083,70.534,1.072,0.284,-62.646,213.863

0,1,2,3
Omnibus:,14756.315,Durbin-Watson:,2.437
Prob(Omnibus):,0.0,Jarque-Bera (JB):,10222619.64
Skew:,3.931,Prob(JB):,0.0
Kurtosis:,130.821,Cond. No.,2740.0


### An Afterword

Johnson, Parker, and Souleles' results from their paper can be seen below. Part 1 corresponds to the leftmost set of regressions, part 2 matches the second set from the left, and part 3 is the rightmost set.
![](jps_results.png)

In this lab, you conducted 4 types of regressions to 'pin down' the causal effect of rebates on changes in non-durable consumption. We made multiple design choices in each model and could have made many other adjustments as well. For example, we could have determined changes in log consumption, controlled for more variables, or only regressed on households that received a stimulus payment. 

As you can see, doing econometrics often relies on many value judgments. Each subtle decision you make on your data or model may lead to large changes in your regression outcomes, and could be the difference between statistical significance and insignificance.

Something to keep in mind as we fit many models to try out different adjustments is the risk of p-hacking. At a significance level of 5%, we would expect to see a statistically significant result about 1 time in 20 even when the null hypothesis is true. Thus, if we run enough variations of a model, one of them may well produce a statistically significant result purely by chance.
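The 1-in-20 figure can be checked with a small simulation in which the null hypothesis is true by construction. This is a sketch under simplifying assumptions: each "model" is an independent z-test on fresh standard normal draws, whereas variations of a model fit to the same data would be correlated. The function and variable names are hypothetical.

```python
import math
import random


def two_sided_p_value(sample):
    """Z-test of H0: mean = 0 for draws with known unit variance."""
    z = sum(sample) / math.sqrt(len(sample))
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))


rng = random.Random(0)
n_tests, n_obs, alpha = 2_000, 50, 0.05

# Every sample has a true mean of 0, so any rejection is a false positive.
p_values = [
    two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
    for _ in range(n_tests)
]
false_positive_rate = sum(p < alpha for p in p_values) / n_tests

# Chance that at least one of 20 independent tests rejects at the 5% level.
prob_any_of_20 = 1.0 - (1.0 - alpha) ** 20

print("share of tests with p < 0.05:", false_positive_rate)
print("P(at least one of 20 significant):", round(prob_any_of_20, 3))
```

With 20 independent tries, the chance of at least one spurious "significant" result is well above one half, which is why reporting only the specification that worked can be misleading.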