# Week 3 - Linear Models and Research Design (Stata)

Based on this material:

https://dss.princeton.edu/training/Regression101.pdf
https://dss.princeton.edu/training/Panel101.pdf

# Introduction to Regression

In [None]:
* we will be doing all regressions with reghdfe
* http://scorreia.com/software/reghdfe/quickstart.html

* Install ftools (remove program if it existed previously)
cap ado uninstall moresyntax
cap ado uninstall ftools
net install ftools, from("https://raw.githubusercontent.com/sergiocorreia/ftools/master/src/")

* Install reghdfe 5.x
cap ado uninstall reghdfe
net install reghdfe, from("https://raw.githubusercontent.com/sergiocorreia/reghdfe/master/src/")

* Install moremata (sometimes used by ftools but not needed for reghdfe)
cap ssc install moremata

ftools, compile
reghdfe, compile

* Install ivreghdfe (instrumental-variables regression with absorbed fixed effects)
cap ado uninstall ivreg2hdfe
cap ado uninstall ivreghdfe
cap ssc install ivreg2 // Install ivreg2, the core package
net install ivreghdfe, from(https://raw.githubusercontent.com/sergiocorreia/ivreghdfe/master/src/)

In [None]:
reghdfe, version

In [None]:
* grstyle is a user-written package; install it if it is missing
cap ssc install grstyle
grstyle init
grstyle set plain, horizontal grid

In [None]:
* Load example dataset

use https://dss.princeton.edu/training/states.dta, clear

In [None]:
describe

In [None]:
su

In [None]:
* Stata's built-in OLS regression command
reg csat expense

In [None]:
* regress SAT on student expenditures, no fixed effects absorbed, robust standard errors

reghdfe csat expense, noabsorb vce(robust)

In [None]:
* add more variables

reghdfe csat expense percent income high college, noabsorb vce(robust)

In [None]:
* load example dataset shipped with Stata ("clear" replaces the data in memory)

sysuse nlsw88.dta, clear

In [None]:
describe

In [None]:
tab industry

In [None]:
tab occupation

In [None]:
* regression with industry dummy variables

reghdfe wage hours i.industry, noabsorb vce(robust) 

In [None]:
* with absorb syntax, and clustering by industry.
* "nocons" means don't report the intercept -- we won't need it with absorb

reghdfe wage hours , absorb(industry) cluster(industry) nocons
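
The slope on `hours` from `absorb(industry)` should match the one from explicit `i.industry` dummies, because absorbing a categorical variable is numerically equivalent to including its full set of indicators. A minimal check, assuming `nlsw88` is still in memory (the scalar names `b_dummies` and `b_absorb` are illustrative, not part of the original material):

```stata
* slope with explicit industry dummies (built-in regress)
qui reg wage hours i.industry
scalar b_dummies = _b[hours]

* slope with industry absorbed as a fixed effect
qui reghdfe wage hours, absorb(industry)
scalar b_absorb = _b[hours]

* the two estimates should agree up to numerical tolerance
display "dummies: " b_dummies "   absorbed: " b_absorb
assert reldif(b_dummies, b_absorb) < 1e-6
```

Only the point estimate is compared here; the standard errors depend on the `vce()` or `cluster()` choice.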

In [None]:
* two-way fixed effects and two-way clustering:

reghdfe wage hours , absorb(industry occupation) cluster(industry occupation) nocons

In [None]:
* install estout to make regression tables
ssc install estout, replace

In [None]:
* "qui" means "quietly" -- don't show regression output
eststo clear
eststo: qui reghdfe wage hours , noabsorb 
eststo: qui reghdfe wage hours , absorb(industry) cluster(industry) nocons
eststo: qui reghdfe wage hours, absorb(industry occupation) cluster(industry occupation) nocons
eststo: qui reghdfe wage hours union, absorb(industry occupation) cluster(industry occupation) nocons


In [None]:
* se means report standard errors, r2 means report r-squared
esttab, se r2

In [None]:
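* get predicted yhat from the last regression. After absorb(), predict defaults to xb,
* which omits the absorbed fixed effects; xbd includes them but needs the resid option.
qui reghdfe wage hours union, absorb(industry occupation) cluster(industry occupation) nocons resid
predict wagehat, xbd

In [None]:
* get residuals from last regression
gen wagetilde = wage - wagehat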

In [None]:
* plot predictions against true values for outcome
scatter wagehat wage 

In [None]:
* binned means are easier to read than the raw scatter (binscatter is user-written)
cap ssc install binscatter
binscatter wagehat wage

# Panel Data

In [None]:
* Load example dataset

use https://dss.princeton.edu/training/Panel101.dta, clear

In [None]:
su

In [None]:
* tell Stata that this is a panel dataset (panel variable: country, time variable: year)
tsset country year

In [None]:
xtline y
graph display

In [None]:
xtline y, overlay
graph display

In [None]:
* Generate lags (uses value in the previous period)
tsset country year
gen L_y = L.y

In [None]:
* value in next period:
gen F_y = F.y

* value two periods ago:
gen L2_y = L2.y
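
The `L.` and `F.` operators follow the time variable declared in `tsset`, not row order, so they return missing across gaps in `year` and never reach into another country. A sketch of the same one-period lag built by hand, assuming `L_y` was created earlier as above (`L_y_manual` is an illustrative name):

```stata
* lag by hand: previous row within country, only if it is exactly one year earlier
sort country year
by country: gen L_y_manual = y[_n-1] if year == year[_n-1] + 1

* should match the operator-based lag (missing values compare equal in Stata)
assert L_y_manual == L_y
drop L_y_manual
```

The `if` condition is what the operator handles automatically; without it, `y[_n-1]` would silently use the wrong period whenever a year is missing.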

In [None]:
* lgraph (user-written) plots a summary statistic of y, the mean by default, for each year
ssc install lgraph, replace
lgraph y year
graph display

In [None]:
* pooled OLS: no fixed effects, standard errors clustered by country
reghdfe y x1, noabsorb cluster(country) 

In [None]:
* the standard two-way fixed-effects model:

reghdfe y x1, absorb(country year) cluster(country) nocons
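
Read as an equation, the specification above is $y_{it} = \beta \, x1_{it} + \alpha_i + \gamma_t + \varepsilon_{it}$, with country effects $\alpha_i$ and year effects $\gamma_t$. As a cross-check under that reading, Stata's built-in `xtreg, fe` with year dummies should return the same coefficient on `x1`. This is a sketch that assumes the `tsset country year` declaration is still active; the scalar names are illustrative:

```stata
* country fixed effects via xtreg, year effects as explicit dummies
qui xtreg y x1 i.year, fe vce(cluster country)
scalar b_xtreg = _b[x1]

* same model with both sets of effects absorbed
qui reghdfe y x1, absorb(country year) cluster(country) nocons
scalar b_reghdfe = _b[x1]

* point estimates should agree up to numerical tolerance
display "xtreg: " b_xtreg "   reghdfe: " b_reghdfe
assert reldif(b_xtreg, b_reghdfe) < 1e-6
```

The comparison covers the coefficient only; the two commands may apply different degrees-of-freedom adjustments to the clustered standard errors.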

In [None]:
* with two-way clustering:

reghdfe y x1 x2, absorb(country year) cluster(country year) nocons

In [None]:
* can use lags/leads directly:

reghdfe y L.x1 x1 F.x1, absorb(country year) cluster(country) nocons