# HW1 Solutions

Authors: Jaden Fix & Matteo Shafer

Date: 2025-04-19


## First Problem – Monopoly Problem

We analyse the 1,000 simulated periods contained in **`hw1_small.csv`**, where the monopolist’s posted price `price`, quantity sold `quantity`, and marginal cost `mc` are observed.  
Throughout we interpret the intercept of the IV regression as $aI$ and the negative of the IV slope as the structural demand parameter $b$.



In [3]:

import polars as pl
import pandas as pd
import statsmodels.api as sm

# Load data
df_pl = pl.read_csv('/Users/jadenfix/Desktop/Graduate School Materials/Causal ML/hw1_small.csv')
# drop unnamed column
#df_pl = df_pl.drop('Unnamed: 0')
df_pl


| Unnamed: 0_level_0 | quantity | price | mc |
|--------------------|----------|-------|----|
| i64 | f64 | f64 | f64 |
| 0 | 745060.635 | 13.108964 | 5.658358 |
| 1 | 738666.9234 | 11.982763 | 4.596094 |
| 2 | 760484.4084 | 10.970281 | 3.365437 |
| 3 | 769312.2882 | 12.137633 | 4.44451 |
| 4 | 736410.8253 | 10.726379 | 3.362271 |
| … | … | … | … |
| 995 | 832161.3937 | 12.982367 | 4.660753 |
| 996 | 863896.0039 | 11.439888 | 2.800928 |
| 997 | 956075.942 | 11.606685 | 2.045926 |
| 998 | 892580.5516 | 11.671835 | 2.74603 |


In [7]:

# OLS: E[Q | P]
df = df_pl.to_pandas()
df['const'] = 1
ols_res = sm.OLS(df['quantity'], df[['const','price']]).fit()
print(ols_res.summary())
R2_ols = ols_res.rsquared
print(f"The R^2 is: {R2_ols}")


                            OLS Regression Results                            
Dep. Variable:               quantity   R-squared:                       0.119
Model:                            OLS   Adj. R-squared:                  0.118
Method:                 Least Squares   F-statistic:                     134.8
Date:                Fri, 18 Apr 2025   Prob (F-statistic):           2.54e-29
Time:                        18:43:57   Log-Likelihood:                -12758.
No. Observations:                1000   AIC:                         2.552e+04
Df Residuals:                     998   BIC:                         2.553e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       3.844e+05   3.55e+04     10.819      0.0

In [9]:

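# Two-stage least squares (2SLS) by hand, with mc as the instrument for price
# First stage: project price on the instrument
first = sm.OLS(df['price'], df[['const','mc']]).fit()
df['price_hat'] = first.fittedvalues
# Second stage: regress quantity on the fitted price.
# The coefficients are the 2SLS point estimates, but the standard errors and
# R^2 in this summary treat price_hat as observed data, so they are not the
# proper IV standard errors or the fit of the structural equation.
second = sm.OLS(df['quantity'], df[['const','price_hat']]).fit()
print(second.summary())
R2_iv = second.rsquared  # R^2 of the second-stage regression on price_hat
b_hat = -second.params['price_hat']  # structural b
aI_hat = second.params['const']  # intercept, interpreted as aI
print(f"The IV R^2 is: {R2_iv}")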


                            OLS Regression Results                            
Dep. Variable:               quantity   R-squared:                       0.324
Model:                            OLS   Adj. R-squared:                  0.323
Method:                 Least Squares   F-statistic:                     477.8
Date:                Fri, 18 Apr 2025   Prob (F-statistic):           7.31e-87
Time:                        18:55:17   Log-Likelihood:                -12625.
No. Observations:                1000   AIC:                         2.525e+04
Df Residuals:                     998   BIC:                         2.526e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       1.975e+06    5.4e+04     36.580      0.0


### 1. Interpretation of $R^2$ and why they differ

* **OLS conditional-expectation regression** maximises in-sample fit; it tells us *how well quantity can be predicted from the chosen price*.  
  Because the monopolist **endogenously sets higher prices in periods with high demand shifters** ($I$), the fitted slope is *positive*. The fit is modest, $R^2 = 0.119$ (see the summary above), but no other linear function of the observed price fits better in sample.

* **IV demand estimation** treats marginal cost as an instrument for price, purging the simultaneity bias.  
  The fitted slope is *negative*, as the law of demand dictates. The second-stage summary reports $R^2 = 0.324$, but that figure measures how well the *fitted* price (a linear function of `mc`) predicts quantity. Evaluated at the *actual* prices, the structural demand curve necessarily fits no better than OLS: once we strip out the endogenous correlation between $I$ and $P$, price alone explains less of the variation in $Q$.
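
To see how these $R^2$ figures relate, the following sketch simulates a linear-demand monopolist and computes three fit measures: the OLS $R^2$, the second-stage $R^2$ (on fitted prices), and the $R^2$ of the IV coefficients evaluated at actual prices. It does not use `hw1_small.csv`; the parameter values, the variable names, and the helper functions `ols` and `r2` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical structural parameters, chosen only for illustration.
b = 1.0
aI = rng.normal(20.0, 1.45, n)    # demand shifter a*I_t (unobserved by the analyst)
mc = rng.normal(4.0, 1.0, n)      # marginal cost, used as the instrument

price = (aI + b * mc) / (2 * b)   # monopoly price under linear demand
quantity = aI - b * price         # demand curve Q = aI - bP


def ols(y, x):
    """Return (intercept, slope) of a least-squares fit of y on x."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope


def r2(y, y_hat):
    """R^2 = 1 - SSR/SST; this can be negative for a non-OLS fit."""
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# OLS of quantity on the actual price.
a_ols, s_ols = ols(quantity, price)
r2_ols = r2(quantity, a_ols + s_ols * price)

# Manual 2SLS: first stage on mc, then quantity on the fitted price.
g0, g1 = ols(price, mc)
price_hat = g0 + g1 * mc
a_iv, s_iv = ols(quantity, price_hat)

r2_second_stage = r2(quantity, a_iv + s_iv * price_hat)  # what the second-stage summary reports
r2_structural = r2(quantity, a_iv + s_iv * price)        # IV coefficients at actual prices

print(f"OLS slope {s_ols:.3f}, IV slope {s_iv:.3f}")
print(f"R2: OLS {r2_ols:.3f} | second stage {r2_second_stage:.3f} | structural {r2_structural:.3f}")
```

Because OLS minimises the sum of squared residuals among linear functions of the observed price, `r2_structural` can never exceed `r2_ols`, and it may even be negative. The second-stage $R^2$ is a different object and can lie above or below the OLS value.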

**When would a firm prefer the high-$R^2$ model?**  
A marketing team forecasting daily logistics needs may care only about *predictive accuracy*; the OLS model yields the best linear forecast of $Q$ conditional on today’s announced price.

**When would it prefer the low-$R^2$ (structural) model?**  
If the firm wants to understand *how quantity would change if it itself varied the price*, it needs the causal demand curve, even though its $R^2$ is smaller.

(The underlying logic mirrors the “umbrella vs rain‐dance” example of Kleinberg et al., 2015.)



### 4. Why is $\mathbb E[Q_t\mid P_t=p]$ **increasing** in $p$?

Because the monopolist charges *higher* prices exactly in periods where the income shifter $I$ is high.  
Conditioning only on the **realised price** therefore also conditions (partially) on a *high* realisation of $I$, which pushes average quantity **up** rather than down and masks the true downward-sloping demand curve.

Mathematically,
$$Q_t = a I_t - b P_t, \qquad P_t = \arg\max_p\, (a I_t - b p)(p-m_t) \;\Rightarrow\; P_t \text{ is increasing in } I_t.$$
Hence $\text{Cov}(P_t,I_t)>0$ and the OLS slope picks up the positive bias.
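
A short worked version of this argument, assuming $b>0$ and an interior optimum, makes the sign of the OLS slope explicit:

$$
\begin{aligned}
0 &= \frac{\partial}{\partial p}\,(aI_t - bp)(p - m_t) = aI_t - 2bp + b m_t
\;\Rightarrow\; P_t = \frac{aI_t + b m_t}{2b},\\
Q_t &= aI_t - bP_t = \frac{aI_t - b m_t}{2},\\
\operatorname{Cov}(Q_t, P_t) &= \frac{\operatorname{Var}(aI_t) - b^2 \operatorname{Var}(m_t)}{4b}.
\end{aligned}
$$

The covariance terms between $aI_t$ and $m_t$ cancel, so the OLS slope $\operatorname{Cov}(Q_t,P_t)/\operatorname{Var}(P_t)$ is positive exactly when demand shocks dominate cost shocks, i.e. when $\operatorname{Var}(aI_t) > b^2\operatorname{Var}(m_t)$.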



### 5–6. Point predictions

Using the OLS conditional-expectation estimates:

$$\hat Q(15) \;=\; 384{,}416.833 + 34{,}302.612\times 15 \;\approx\; 384{,}417 + 514{,}539 \;=\; 898{,}956\text{ units}$$

$$\hat Q(12) \;=\; 384{,}416.833 + 34{,}302.612\times 12 \;\approx\; 384{,}417 + 411{,}631 \;=\; 796{,}048\text{ units}$$
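
The same arithmetic can be reproduced with a few lines of Python. This is a minimal sketch: the two coefficients are copied from the formulas above, and the function name `predict_quantity` is our own label, not part of the original notebook.

```python
# OLS coefficients as quoted in the point-prediction formulas above.
INTERCEPT = 384_416.833
SLOPE = 34_302.612


def predict_quantity(price: float) -> float:
    """Conditional-expectation prediction E[Q | P = price] from the OLS fit."""
    return INTERCEPT + SLOPE * price


for p in (15, 12):
    print(f"Q_hat({p}) = {predict_quantity(p):,.0f} units")
```

These are predictions of quantity in a period where the price happens to be 15 or 12, not forecasts of what would happen if the firm moved its price there.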



### 7. Causal effect of a \$1 price reduction

The absolute value of the IV slope coefficient, $\hat b$, is the **increase in quantity** caused by a \$1 decrease in price.  
The 2SLS code cell above stores it as `b_hat` (≈ 98,000 units per period).


In [6]:

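# Part IV – Effect of a $3/unit consumer tax
t = 3  # per-unit tax paid by consumers on top of the posted price
# Monopolist's profit-maximising posted price when consumers pay p + t
p_star = (aI_hat - b_hat*t + b_hat*df['mc'])/(2*b_hat)
# Quantity demanded at the tax-inclusive price
Q_new = aI_hat - b_hat*(p_star + t)
# Government revenue in each period
gov_rev = t * Q_new
print('Mean tax revenue per period: $', gov_rev.mean())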


Mean tax revenue per period: $ 1924853.3018502349



### 8. Why the OLS or IV demand estimates cannot evaluate the tax

The announced *per-unit tax* changes **both** the consumers’ out-of-pocket price and the **firm’s optimal price-setting rule**.  
OLS conditions on the posted price and therefore misstates the own-price response, even getting its sign wrong (umbrella problem).  
The IV regression identifies the *current* demand curve but *not* how the optimal price reacts to the tax; that reaction comes from the firm’s profit-maximisation problem.

Hence, to project revenue one must:

1. Recover $aI$ and $b$ (structural demand) via IV;  
2. Plug them into the monopolist’s best‑response function  
   $$p^* = \frac{aI - b t + b m}{2b};$$  
3. Evaluate $Q^*(t)$ and $t Q^*(t)$.

The code cell above implements this and yields a mean revenue of roughly **\$1.9 million** per period.
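
For reference, the best-response formula in step 2 follows from the firm's first-order condition when consumers pay $p + t$. The lines below are a sketch under the same linear-demand assumption, with $R(t)$ introduced here as shorthand for tax revenue:

$$
\begin{aligned}
\max_p\;& \bigl(aI - b(p+t)\bigr)(p - m)\\
0 &= aI - bt - 2bp + bm \;\Rightarrow\; p^* = \frac{aI - bt + bm}{2b},\\
Q^*(t) &= aI - b(p^* + t) = \frac{aI - bt - bm}{2},\\
R(t) &= t\,Q^*(t) = \frac{t\,(aI - bt - bm)}{2}.
\end{aligned}
$$

The posted price falls by $t/2$, so consumers bear half of the tax and the firm absorbs the other half.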



## Second Problem – The Mincer Equation

We consider the log–wage regression  
$$\ln W = \alpha + \beta\, \text{Education} + \gamma\, \text{Experience}+\delta\, \text{Experience}^2 + \varepsilon,$$  
while *ability* ($A$) is unobserved.

### 1. Expected value of $\hat\beta$ without controlling for ability
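Using the omitted-variable-bias formula (stated for the simple regression on education; with the experience terms included, the same expression holds after partialling them out of education and ability):
$$\mathbb E[\hat\beta] = \beta + \frac{\operatorname{Cov}(\text{Education},A)}{\operatorname{Var}(\text{Education})}\,\theta,$$
where $\theta$ is the coefficient on ability in the true wage equation, i.e. the causal return to ability.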

### 2. Direction of the bias
More able individuals tend to **acquire more education** *and* earn higher wages.  
Hence $\operatorname{Cov}(\text{Education},A)>0$ and $\theta>0$, so $\hat\beta$ is **upward biased**.
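
A small simulation can confirm both the formula and the direction of the bias. This is a sketch on synthetic data: all parameter values are hypothetical, and experience is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical parameters for illustration only.
beta, theta = 0.08, 0.05          # return to education, return to ability
ability = rng.normal(0.0, 1.0, n)
education = 12 + 1.5 * ability + rng.normal(0.0, 2.0, n)   # Cov(Education, A) > 0
log_wage = 1.0 + beta * education + theta * ability + rng.normal(0.0, 0.3, n)

# Short regression of ln W on Education, omitting ability.
cov_ew = np.cov(education, log_wage)
beta_short = cov_ew[0, 1] / cov_ew[0, 0]

# Omitted-variable-bias term evaluated on the same sample.
cov_ea = np.cov(education, ability)
bias_formula = theta * cov_ea[0, 1] / cov_ea[0, 0]

print(f"true beta          : {beta:.4f}")
print(f"short-regression b : {beta_short:.4f}")
print(f"beta + OVB formula : {beta + bias_formula:.4f}")
```

The short-regression coefficient should sit above the true $\beta$ and close to $\beta$ plus the formula's bias term.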

### 3. Using birthday as an instrument
Month of birth determines **school starting age**, and compulsory-schooling laws tie the earliest leaving date to age; together these rules force some pupils to stay in school longer than others, raising their education without directly affecting wages.  
Thus:

1. **Relevance** – birthday strongly predicts years of schooling.  
2. **Exogeneity** – conditional on cohort, birth month is quasi‑random and uncorrelated with ability.  
3. **Exclusion** – birth month affects wages *only* through its impact on schooling (no direct channel after controlling for experience).

This is an *instrumental-variables* strategy that exploits the discontinuity created by school-entry and school-leaving age rules.
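
The logic of the instrument can be illustrated with a simulated Wald/IV estimator. This is a sketch under strong assumptions: `born_after_cutoff` is a hypothetical binary stand-in for birth month, it is generated independently of ability, and it enters wages only through education. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical parameters for illustration only.
beta, theta = 0.08, 0.05
ability = rng.normal(0.0, 1.0, n)
born_after_cutoff = rng.integers(0, 2, n)   # instrument: 0 or 1, independent of ability

# Relevance: the instrument shifts schooling by one year on average.
education = 12 + 1.0 * born_after_cutoff + 1.5 * ability + rng.normal(0.0, 2.0, n)
# Exclusion: the instrument does not appear in the wage equation.
log_wage = 1.0 + beta * education + theta * ability + rng.normal(0.0, 0.3, n)

# OLS slope of ln W on Education (biased upward by omitted ability).
beta_ols = np.cov(education, log_wage)[0, 1] / np.var(education, ddof=1)

# IV slope: reduced form divided by first stage.
reduced_form = np.cov(born_after_cutoff, log_wage)[0, 1]
first_stage = np.cov(born_after_cutoff, education)[0, 1]
beta_iv = reduced_form / first_stage

print(f"true beta {beta:.4f} | OLS {beta_ols:.4f} | IV {beta_iv:.4f}")
```

If any of the three conditions fails in the simulated design (for example, if the instrument is correlated with ability), the IV estimate no longer centres on $\beta$.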

### 4. DAG
```
A (ability) ───────────────────┐
     │                         ▼
     └────▶ Education ──────▶ ln W
                               ▲
Experience ────────────────────┘
```
The back-door path Education ← A → ln W confounds the OLS estimate of $\beta$.



## Third Problem – *All You Need is Prediction?*

Kleinberg et al. (2015) argue that many policy questions are primarily *predictive*, yet predictive tools can fail when data or objectives embed bias.  
Drawing on **Benjamin (2019)**, **Eubanks (2018)**, and **Brock & De Haas (2022)** we discuss three “umbrella” domains.

| Domain | Potential biases | Why a high $R^2$ can still mislead |
|--------|------------------|------------------------------------|
| **Credit-worthiness** | Historical lending reflects discriminatory taste & red-lining → models inherit racial / gender bias. Brock & De Haas show Turkish bankers require guarantors 26% more often from women even when credit risk is identical. | A model that perfectly predicts *past* approvals merely reproduces past discrimination; high $R^2$ does not imply equitable allocation. |
| **Predicting “high-risk” youth / child-maltreatment** | Administrative data oversample poor & minority families; subjective case notes encode social stereotypes. Eubanks documents how Allegheny County’s AFST flags poor families, leading to over-surveillance of poverty rather than risk. | The algorithm’s cost-sensitive objective conflates poverty with abuse; high fit ≠ correct causal target. |
| **Health-risk prediction** | Using *cost* as a proxy for *need* embeds structural racism: less is spent on Black patients, so equal predicted cost ⇏ equal morbidity. Benjamin’s commentary on Obermeyer et al. shows Black patients were sicker at the same risk score. | Excellent cost prediction ($R^2$≈0.97) still routes resources away from those with greatest medical need. |

**Take‑away:** Prediction without careful attention to the data‑generating process and social context can entrench inequities, even when statistical fit is excellent.
