In [1]:
import pandas as pd

import statsmodels.api as sm
from IPython.display import display

# Homework 1
## Step 1: Review the Literature and Develop the Theoretical Model

### A: Marginal Propensity to Consume

- $ MPC = \frac{dC}{dY} $
- $C$ = consumption
- $Y$ = income

### B: Two Academic Articles that include MPC
- Stigler, George J. “The Economics of Minimum Wage Legislation.” The American Economic Review, vol. 36, no. 3, 1946, pp. 358–365. JSTOR, www.jstor.org/stable/1801842. Accessed 2 Feb. 2021.
- Ronn, Ehud I. “Nonadditive Preferences and the Marginal Propensity to Consume.” The American Economic Review, vol. 78, no. 1, 1988, pp. 216–223. JSTOR, www.jstor.org/stable/1814709. Accessed 2 Feb. 2021.

## Step 2: Specify the Model: Select the Independent Variables and the Functional Form

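$$ CON = \beta_0 + \beta_1 \cdot PYD + \beta_2 \cdot AAA + \epsilon $$

Here $CON$ is consumption, $PYD$ is income, $AAA$ is the interest rate, and $\epsilon$ is the error term.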


## Step 3: Hypothesize the Expected Signs of the Coefficients

- $\beta_0 > 0$ because even at zero income and a zero interest rate, people still need to spend on necessities (food, water, shelter, etc.)
- $\beta_1 > 0$ because people spend more when their income is higher
- $\beta_2 < 0$ because higher interest rates encourage saving and discourage spending


## Step 4: Collect the Data. Inspect and Clean the Data

In [2]:
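# Load the tab-separated data file
df = pd.read_csv("lab3.txt", sep="\t")

print("All data:")
display(df)

# Print summary statistics (count, mean, std, quartiles) for each column
for col in df:
    print(f"Summary of {col}:")
    print("----------------")
    print(df[col].describe())
    print()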

All data:


| | year | CON | PYD | AAA |
|---|---|---|---|---|
| 0 | 1945 | 1061.5 | 1383.1 | 0.32 |
| 1 | 1946 | 1193.6 | 1368.6 | -5.97 |
| 2 | 1947 | 1216.4 | 1312.9 | -11.79 |
| 3 | 1948 | 1243.9 | 1382.1 | -4.88 |
| 4 | 1949 | 1278.5 | 1393.3 | 3.66 |
| ... | ... | ... | ... | ... |
| 65 | 2010 | 10036.3 | 11055.1 | 3.34 |
| 66 | 2011 | 10263.5 | 11331.2 | 1.44 |
| 67 | 2012 | 10449.7 | 11676.2 | 1.57 |
| 68 | 2013 | 10699.7 | 11650.8 | 2.74 |


Summary of year:
----------------
count      70.000000
mean     1979.500000
std        20.351085
min      1945.000000
25%      1962.250000
50%      1979.500000
75%      1996.750000
max      2014.000000
Name: year, dtype: float64

Summary of CON:
----------------
count       70.000000
mean      4808.328574
std       3074.049797
min       1061.500000
25%       2046.250000
50%       3997.800050
75%       6946.324950
max      10969.000000
Name: CON, dtype: float64

Summary of PYD:
----------------
count       70.000000
mean      5376.949986
std       3321.814939
min       1312.900000
25%       2350.799925
50%       4574.149900
75%       7682.499875
max      11939.400000
Name: PYD, dtype: float64

Summary of AAA:
----------------
count    70.000000
mean      2.608857
std       3.268162
min     -11.790000
25%       1.540000
50%       2.920000
75%       4.415000
max       8.840000
Name: AAA, dtype: float64
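
Every column above reports a count of 70, which suggests there are no missing values. A few further inspection checks could look like the minimal sketch below. It assumes the same column names, and the inline sample holds only the first five rows shown in the output above rather than the full `lab3.txt`. The names `sample`, `check_df`, `missing_per_column`, `duplicate_years`, and `years_consecutive` are illustrative.

```python
import io

import pandas as pd

# First five rows from the output above, tab-separated like lab3.txt.
# This inline sample stands in for the real file (an assumption for illustration).
sample = io.StringIO(
    "year\tCON\tPYD\tAAA\n"
    "1945\t1061.5\t1383.1\t0.32\n"
    "1946\t1193.6\t1368.6\t-5.97\n"
    "1947\t1216.4\t1312.9\t-11.79\n"
    "1948\t1243.9\t1382.1\t-4.88\n"
    "1949\t1278.5\t1393.3\t3.66\n"
)
check_df = pd.read_csv(sample, sep="\t")

# Number of missing (NaN) values in each column
missing_per_column = check_df.isna().sum()

# Number of years that appear more than once
duplicate_years = check_df["year"].duplicated().sum()

# True when every year follows the previous one by exactly 1 (no gaps)
years_consecutive = (check_df["year"].diff().dropna() == 1).all()

print(missing_per_column)
print("Duplicate years:", duplicate_years)
print("Years consecutive:", years_consecutive)
```

Running the same three checks on the full `df` would cover all 70 observations.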



## Step 5: Estimate and Evaluate the Equation

In [3]:
x = df[["PYD", "AAA"]]
y = df["CON"]

x = sm.add_constant(x) # Include intercept term

reg = sm.OLS(y, x).fit()
print(reg.summary())


                            OLS Regression Results                            
Dep. Variable:                    CON   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 3.623e+04
Date:                Fri, 05 Feb 2021   Prob (F-statistic):          2.22e-102
Time:                        10:11:55   Log-Likelihood:                -416.43
No. Observations:                  70   AIC:                             838.9
Df Residuals:                      67   BIC:                             845.6
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       -156.3649     21.858     -7.154      0.0

### A: Signs
The estimated signs of $\beta_1$ and $\beta_2$ match my hypotheses, but the estimated intercept is negative (about -156.36), whereas I had expected it to be positive.
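
The comparison of hypothesized and estimated signs can be sketched as follows. The estimates are the values reported in this write-up. The dictionary names and the `sign` helper are illustrative, not part of the original notebook.

```python
# Hypothesized signs from Step 3: +1 means "> 0", -1 means "< 0".
expected_sign = {"const": 1, "PYD": 1, "AAA": -1}

# Coefficient estimates as reported in this write-up.
estimate = {"const": -156.3649, "PYD": 0.9288, "AAA": -11.3581}


def sign(value):
    """Return +1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


# True where the estimated sign agrees with the hypothesis
sign_matches = {
    name: sign(estimate[name]) == expected_sign[name] for name in expected_sign
}

for name, ok in sign_matches.items():
    print(f"{name}: {'matches' if ok else 'differs from'} the hypothesized sign")
```

With a fitted statsmodels result, the hard-coded `estimate` dictionary could presumably be replaced by `reg.params`, which is indexed by the same names.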

### B: $R^2$
With this regression model, $R^2 = 0.999$. This means that about 99.9% of the variation in the dependent variable (CON) around its mean is explained by variation in the independent variables (PYD and AAA).
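
To make the definition concrete, $R^2 = 1 - SS_{res}/SS_{tot}$ can be computed by hand. The sketch below uses synthetic data with made-up parameters, not the `lab3.txt` values, and fits the model with NumPy's least squares rather than statsmodels.

```python
import numpy as np

# Synthetic data for illustration only; these are not the lab3.txt values.
rng = np.random.default_rng(0)
n = 70
pyd = np.linspace(1300.0, 12000.0, n)
aaa = rng.normal(2.6, 3.3, n)
con = -150.0 + 0.9 * pyd - 10.0 * aaa + rng.normal(0.0, 50.0, n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), pyd, aaa])
beta, *_ = np.linalg.lstsq(X, con, rcond=None)
fitted = X @ beta

ss_res = np.sum((con - fitted) ** 2)      # variation left unexplained by the fit
ss_tot = np.sum((con - con.mean()) ** 2)  # total variation around the mean
r_squared = 1.0 - ss_res / ss_tot

print(f"R^2 = {r_squared:.4f}")
```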

### C: MPC
Using the definition of MPC from Step 1, the MPC is the coefficient on PYD in the linear regression, since that coefficient is the derivative of CON with respect to PYD. This gives
$$ MPC = 0.9288 $$
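
As a sanity check on this reasoning, the sketch below builds noise-free synthetic data with a known MPC and confirms that the fitted slope on PYD recovers it. All numbers are made up for illustration and are not taken from `lab3.txt`.

```python
import numpy as np

# Noise-free synthetic data built with a known MPC of 0.9 (made-up values).
n = 70
pyd = np.linspace(1300.0, 12000.0, n)
aaa = 3.0 * np.sin(np.arange(n))
true_mpc = 0.9
con = -150.0 + true_mpc * pyd - 10.0 * aaa

# Fit CON = b0 + b1*PYD + b2*AAA by least squares.
X = np.column_stack([np.ones(n), pyd, aaa])
b0, b1, b2 = np.linalg.lstsq(X, con, rcond=None)[0]

# The slope on PYD is dCON/dPYD, so it recovers the MPC used to build the data.
print(f"Estimated MPC: {b1:.4f}")
```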

### D: Consumption Changed based on Interest
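The coefficient for AAA is -11.3581. In other words, holding PYD constant, increasing the interest rate by 0.03 should change CON by $-11.3581 \times 0.03 \approx -0.34$. Note that AAA appears to be recorded in percentage points (it ranges from -11.79 to 8.84 in the data), so if the intended change is 3 percentage points, the predicted change would instead be $-11.3581 \times 3 \approx -34.07$.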
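
The calculation can be written as a small helper. The function name `predicted_change_in_con` is illustrative, and the coefficient is the one reported above.

```python
BETA_AAA = -11.3581  # AAA coefficient reported above


def predicted_change_in_con(delta_aaa, beta_aaa=BETA_AAA):
    """Predicted change in CON for a change in AAA, holding PYD fixed."""
    return beta_aaa * delta_aaa


change = predicted_change_in_con(0.03)
print(f"Change in CON: {change:.2f}")
```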

### E: Conclusion
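I think the linear regression fits the data well: $R^2 = 0.999$, and the estimated signs of $\beta_1$ and $\beta_2$ match the hypotheses, although the intercept is unexpectedly negative. I found no errors in data collection, and the model specification does not look incorrect to me.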