# Regression anatomy theorem
In this notebook we provide an example of how regression anatomy works. Suppose we want to estimate the causal effect of family size on labor supply:
\begin{equation}
    Y_i = \beta_0 + \beta_1 X_i + u_i
\end{equation}
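where $Y$ is labor supply and $X$ is family size. The cells below illustrate the mechanics with the `auto` dataset instead, regressing car price on length with and without additional controls.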
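
For reference, when the regression includes several covariates, the regression anatomy theorem is usually stated in the following textbook form. The tilde notation and the index $k$ are introduced here for illustration and do not appear elsewhere in this notebook.
\begin{equation}
    \beta_k = \frac{\mathrm{Cov}(Y_i, \tilde{x}_{ki})}{\mathrm{Var}(\tilde{x}_{ki})}
\end{equation}
where $\tilde{x}_{ki}$ is the residual from a regression of $x_{ki}$ on all the other covariates.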

In [3]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

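# read data
import ssl
# Workaround: disable HTTPS certificate verification for the download.
# This is insecure; drop these two lines if the default context works.
ssl._create_default_https_context = ssl._create_unverified_context
def read_data(file):
    # The base URL needs a trailing slash before the file name
    return pd.read_stata("https://raw.github.com/scunning1975/mixtape/master/" + file)

auto = read_data('auto.dta')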



In [6]:
auto.head()

Unnamed: 0,make,price,mpg,rep78,headroom,trunk,weight,length,turn,displacement,gear_ratio,foreign,length_diff
0,AMC Concord,4099,22,3.0,2.5,11,2930,186,40,121,3.58,Domestic,-1.932432
1,AMC Pacer,4749,17,3.0,3.0,11,3350,173,40,258,2.53,Domestic,-14.932432
2,AMC Spirit,3799,22,,3.0,12,2640,168,35,121,3.08,Domestic,-19.932432
3,Buick Century,4816,20,3.0,4.5,16,3250,196,40,196,2.93,Domestic,8.067568
4,Buick Electra,7827,15,4.0,4.0,20,4080,222,43,350,2.41,Domestic,34.067568


In [5]:
auto['length_diff'] = auto['length'] - auto['length'].mean()
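
Demeaning a regressor, as `length_diff` does, leaves the slope of a bivariate regression unchanged and moves the intercept to the sample mean of the outcome. A small sketch on made-up data (not the `auto` dataset; `fit_line` is a hypothetical helper introduced only for this illustration):

```python
import numpy as np

# Made-up data for illustration only; these are not the auto.dta values.
rng = np.random.default_rng(1)
x = rng.uniform(140, 240, size=200)
y = -4500 + 57 * x + rng.normal(0, 2000, size=200)

def fit_line(x, y):
    """Return (intercept, slope) from OLS of y on a constant and x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

b0, b1 = fit_line(x, y)                   # raw regressor
b0_c, b1_c = fit_line(x - x.mean(), y)    # centered regressor

print(b1, b1_c)          # the two slopes agree
print(b0_c, y.mean())    # the centered intercept equals the mean of y
```

With a centered regressor, the intercept can be read as the predicted outcome at the average value of the regressor.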

In [8]:
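# Short model ("modelo corto"): price on length only
modelocorto = sm.OLS.from_formula('price ~ length', data=auto).fit()
# Long model ("modelo largo"): adds weight, headroom and mpg as controls
modelolargo = sm.OLS.from_formula(
    'price ~ length + weight + headroom + mpg', data=auto).fit()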

In [10]:
print(modelocorto.params)
print(modelolargo.params)

Intercept   -4584.899018
length         57.202238
dtype: float64
Intercept    14177.582331
length         -94.496510
weight           4.335045
headroom      -490.966654
mpg            -87.958383
dtype: float64
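
The sign flip on `length` between the two regressions is what regression anatomy helps unpack: the long-regression coefficient on a covariate equals the bivariate slope of the outcome on that covariate's residual, after the other regressors have been partialled out. A minimal sketch with synthetic data, using only NumPy (the variables `x1`, `x2`, `x3` and the helper `ols` are hypothetical and not part of the notebook):

```python
import numpy as np

# Synthetic data (an assumption for illustration; not the auto dataset).
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.8 * x2 - 0.5 * x3 + rng.normal(size=n)   # x1 is correlated with the controls
y = 1.0 + 2.0 * x1 - 3.0 * x2 + 1.5 * x3 + rng.normal(size=n)

def ols(y, X):
    """Return OLS coefficients of y on X (X must already include a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)

# Long regression: y on a constant, x1, x2 and x3.
beta_long = ols(y, np.column_stack([const, x1, x2, x3]))

# Auxiliary regression: x1 on a constant and the other covariates,
# keeping the residual (the part of x1 the controls do not explain).
Z = np.column_stack([const, x2, x3])
x1_tilde = x1 - Z @ ols(x1, Z)

# Regression anatomy: Cov(y, x1_tilde) / Var(x1_tilde).
beta_anatomy = np.cov(y, x1_tilde)[0, 1] / np.var(x1_tilde, ddof=1)

print(beta_long[1], beta_anatomy)   # the two numbers coincide up to rounding
```

Applied to the notebook's data, the same steps would presumably regress `length` on `weight`, `headroom` and `mpg`, and then relate `price` to the resulting residuals; that calculation is not run here.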


In [11]:
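# Index coefficients by label; integer indexing such as params[0] on a
# labelled Series is deprecated in recent pandas versions.
auto['y_single'] = modelocorto.params['Intercept'] + modelocorto.params['length']*auto['length']
# y_multi uses only the intercept and the length term of the long model
auto['y_multi'] = modelolargo.params['Intercept'] + modelolargo.params['length']*auto['length']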

In [12]:
import plotnine as p

p.ggplot(auto) + \
    p.geom_point(p.aes(x = 'length', y = 'price')) +\
    p.geom_smooth(p.aes(x = 'length', y = 'y_single')) + \
    p.geom_smooth(p.aes(x = 'length', y = 'y_multi'))

ModuleNotFoundError: No module named 'plotnine'

In [13]:
!pip3 install plotnine

Collecting plotnine
  Downloading plotnine-0.8.0-py3-none-any.whl (4.7 MB)
Collecting numpy>=1.19.0
  Downloading numpy-1.22.4-cp39-cp39-macosx_10_15_x86_64.whl (17.7 MB)
Collecting mizani>=0.7.3
  Downloading mizani-0.7.4-py3-none-any.whl (63 kB)
Collecting statsmodels>=0.12.1
  Downloading statsmodels-0.13.2-cp39-cp39-macosx_10_9_x86_64.whl (9.6 MB)
Collecting pandas>=1.1.0
  Downloading pandas-1.4.2-cp39-cp39-macosx_10_9_x86_64.whl (11.1 MB)
Collecting descartes>=1.1.0
  Downloading descartes-1.1.0-py3-none-any.whl (5.8 kB)
Collecting patsy>=0.5.1
  Downloading patsy-0.5.2-py2.py3-none-any.whl (233 kB)
