# Tutorial

# Housekeeping

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import chi2_contingency
import statsmodels.formula.api as smf
import statsmodels.api as sm
import matplotlib.pyplot as plt # type: ignore
from statsmodels.nonparametric.smoothers_lowess import lowess


import seaborn as sns # type: ignore
from causalml.match import NearestNeighborMatch # type: ignore
from causalml.match import create_table_one # type: ignore
import dowhy.datasets # type: ignore

# DiD: Effect of a Garbage Incinerator on Housing Price.

This question is based on Kiel and McClain (1995). The data are from Wooldridge, Example 13.3. The data [description](https://www.dropbox.com/s/mn2iu0gkix0pqii/KIELMC.DES?dl=0) and [raw data](https://www.dropbox.com/s/6nga0ds63zhujwq/KIELMC.raw?dl=0) are provided to you.
The variables and their meanings are listed below.

Obs:   321

1. year:                     1978 or 1981
2. age:                      age of house
3. agesq:                    age^2
4. nbh:                      neighborhood #, 1 to 6
5. cbd:                      dist. to central bus. dstrct, feet
6. intst:                    dist. to interstate, feet
7. lintst:                   log(intst)
8. price:                    selling price
9. rooms:                    # rooms in house
10. area:                     square footage of house
11. land:                     square footage lot
12. baths:                    # bathrooms
13. dist:                     dist. from house to incinerator, feet
14. ldist:                    log(dist)
15. wind:                     perc. time wind incin. to house
16. lprice:                   log(price)
17. y81:                      =1 if year == 1981
18. larea:                    log(area)
19. lland:                    log(land)
20. y81ldist:                 y81*ldist
21. lintstsq:                 lintst^2
22. nearinc:                  =1 if dist <= 15840
23. y81nrinc:                 y81*nearinc
24. rprice:                   price, 1978 dollars
25. lrprice:                  log(rprice)
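One way to attach these 25 names when loading a whitespace-delimited text file is sketched below. It is only an illustration: the two rows in `toy_raw` are made up (they are not taken from KIELMC.raw), and both the whitespace delimiter and the use of `.` for missing values are assumptions that should be checked against the actual file.

```python
import io

import pandas as pd

# Column names in the order listed above.
kielmc_names = [
    "year", "age", "agesq", "nbh", "cbd", "intst", "lintst", "price",
    "rooms", "area", "land", "baths", "dist", "ldist", "wind", "lprice",
    "y81", "larea", "lland", "y81ldist", "lintstsq", "nearinc",
    "y81nrinc", "rprice", "lrprice",
]

# Made-up stand-in for the raw file: two rows of 25 fields separated by
# runs of spaces, with "." marking a missing value (an assumption).
toy_raw = "\n".join([
    "   ".join(["1978"] + ["1"] * 24),
    "   ".join(["1981", "."] + ["2"] * 23),
])

kielmc_toy = pd.read_csv(
    io.StringIO(toy_raw),  # swap in the path to the downloaded file
    sep=r"\s+",            # split on any run of whitespace
    header=None,           # the file is assumed to have no header row
    names=kielmc_names,
    na_values=".",         # assumed missing-value marker
)
print(kielmc_toy[["year", "age", "rprice"]])
```

For the real data, the `io.StringIO(...)` argument would be replaced by the local path of the downloaded raw file.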


- Read the data and give proper names to each variable.
  - Note: The raw data is not a CSV file, so `pd.read_csv` with its default settings does not work. What can you do about it?
- Use 1981 data to estimate a linear model of `rprice` on `nearinc`.
  - Why not use `price`?
- Use 1978 data to estimate the same model.
- What is the treatment effect based on the previous two regression results?
- Set up a DiD regression as in the lecture and find the treatment effect.
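The link between the two single-year regressions and the DiD regression can be checked on simulated data before turning to the real file. The sketch below assumes a common two-period specification, $rprice = \beta_0 + \delta_0\, y81 + \beta_1\, nearinc + \delta_1\, y81 \cdot nearinc + u$, which may differ in detail from the one in the lecture. All numbers, including `true_did`, are made up for illustration and are not estimates from the Kiel and McClain data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Simulated two-period data; every coefficient here is hypothetical.
toy = pd.DataFrame({
    "y81": rng.integers(0, 2, n),      # 1 for the later period
    "nearinc": rng.integers(0, 2, n),  # 1 for houses near the site
})
true_did = -10_000.0
toy["rprice"] = (
    80_000
    + 18_000 * toy["y81"]
    - 19_000 * toy["nearinc"]
    + true_did * toy["y81"] * toy["nearinc"]
    + rng.normal(0, 5_000, n)
)

# Separate regressions by period, as in the first two tasks.
fit_78 = smf.ols("rprice ~ nearinc", data=toy[toy["y81"] == 0]).fit()
fit_81 = smf.ols("rprice ~ nearinc", data=toy[toy["y81"] == 1]).fit()
diff_of_slopes = fit_81.params["nearinc"] - fit_78.params["nearinc"]

# Pooled DiD regression: the interaction coefficient is the DiD estimate.
did_fit = smf.ols("rprice ~ y81 * nearinc", data=toy).fit()
did_coef = did_fit.params["y81:nearinc"]

print(f"difference of slopes: {diff_of_slopes:.1f}")
print(f"DiD interaction:      {did_coef:.1f}")
print(f"standard error:       {did_fit.bse['y81:nearinc']:.1f}")
```

Because the pooled model is saturated in the two dummies, the two point estimates coincide; the pooled regression additionally provides a standard error. In the real data the interaction is already available as `y81nrinc`, so `rprice ~ y81 + nearinc + y81nrinc` is an equivalent formula.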

# RDD

Let's have a competition to see who can estimate the local ATE more precisely. The true value will be revealed after the competition.

The data are [here](https://drive.google.com/file/d/1cGLkO_NkNAjOe-Q2RMNVGOMFP9Y13-0T/view?usp=sharing):
- $y$ is the outcome
- $d$ is the treatment
- $r$ is the running variable
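As a possible starting point, the sketch below runs a local linear regression on simulated data with the same three column names. Several things are assumptions rather than facts about the competition data: a sharp design, a cutoff at 0, a uniform kernel, and the bandwidth `h`, which is a hypothetical choice. The value `true_jump` is made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
cutoff = 0.0     # assumed cutoff; inspect the real data to locate it
true_jump = 2.0  # made-up effect at the cutoff

# Simulated sharp design: treatment switches on exactly at the cutoff.
r = rng.uniform(-1, 1, n)
d = (r >= cutoff).astype(int)
y = 1.0 + 0.5 * r + true_jump * d + 0.8 * d * r + rng.normal(0, 0.5, n)
toy = pd.DataFrame({"y": y, "d": d, "r": r})

# Keep observations within a bandwidth h of the cutoff (uniform kernel)
# and center the running variable at the cutoff.
h = 0.25  # hypothetical bandwidth
window = toy[(toy["r"] - cutoff).abs() <= h].copy()
window["rc"] = window["r"] - cutoff

# Separate slopes on each side; the coefficient on d is the jump at rc = 0.
rdd_fit = smf.ols("y ~ d * rc", data=window).fit(cov_type="HC1")
late_hat = rdd_fit.params["d"]

print(f"estimated jump: {late_hat:.3f} (se {rdd_fit.bse['d']:.3f})")
```

A smaller `h` reduces bias from curvature but uses fewer observations, so it is worth reporting how the estimate moves across a few bandwidths.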



# IV

IV is well covered in Basic Econometrics. If you need more details, we are happy to help during consultation. The routines in the lecture notes should serve most purposes.

# Matching

- The application is the effect of job training on subsequent earnings.
- The focus is on estimating the effect of the treatment (treat) on earnings in 1978 (re78), conditional on covariates.

The data frame has 445 observations, corresponding to 185 treated and 260 control subjects. The first variable of the data frame is the treatment assignment indicator (1 = treated; 0 = control). The covariates are:

- age, measured in years;
- education, measured in years;
- black, indicating race (1 if black, 0 otherwise);
- hispanic, indicating race (1 if Hispanic, 0 otherwise);
- married, indicating marital status (1 if married, 0 otherwise);
- nodegree, indicating high school diploma (1 if no degree, 0 otherwise);
- re74, real earnings in 1974;
- re75, real earnings in 1975.
- The last variable of the data frame is re78, the real earnings in 1978.
- u74 and u75 are derived indicators of whether earnings were zero in the corresponding year.
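The mechanics of matching on covariates can be tried on simulated data first. The sketch below uses one-to-one nearest-neighbor matching with replacement on an estimated propensity score, written directly in NumPy rather than with `causalml`, so it is a stand-in for the lecture routines and not a copy of them. The single covariate `x`, the outcome `y`, and `true_att` are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 3000
true_att = 1.5  # made-up treatment effect

# Simulated data: units with larger x are more likely to be treated
# and also have higher outcomes, so the raw comparison is confounded.
x = rng.normal(size=n)
treat = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))
y = 2.0 * x + true_att * treat + rng.normal(0, 1, n)
toy = pd.DataFrame({"treat": treat, "x": x, "y": y})

# Naive difference in means between treated and controls.
group_means = toy.groupby("treat")["y"].mean()
naive = group_means[1] - group_means[0]

# Step 1: estimate the propensity score with a logit model.
ps_fit = smf.logit("treat ~ x", data=toy).fit(disp=0)
toy["ps"] = ps_fit.predict(toy)

# Step 2: for each treated unit, pick the control with the closest score.
treated = toy[toy["treat"] == 1]
control = toy[toy["treat"] == 0]
gaps = np.abs(treated["ps"].to_numpy()[:, None] - control["ps"].to_numpy()[None, :])
nearest = gaps.argmin(axis=1)

# Step 3: average the treated-minus-matched-control outcome differences.
matched_y0 = control["y"].to_numpy()[nearest]
att_hat = (treated["y"].to_numpy() - matched_y0).mean()

print(f"naive difference: {naive:.3f}")
print(f"matched ATT:      {att_hat:.3f}")
```

Matching with replacement keeps every treated unit but reuses some controls, which lowers bias at the cost of a noisier estimate.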

In [None]:
lalonde = dowhy.datasets.lalonde_dataset() # type: ignore
