<a href="https://colab.research.google.com/github/jacquelinedoan/pricing_causal/blob/main/pricing_causal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Heckman Correction, Instrumental Variable, and DoubleML for Demand Modelling with Censored Purchase Data**

For a given customer $i$ and product $j$ in context $x$, we want the price $p^*$ that maximizes expected profit:
$$p^* = \arg\max_p \ \text{profit}(p \mid x), \qquad \text{profit}(p \mid x) = (p-c)\times E[D(p \mid x)]$$

where
*   $c$: unit cost
*   $D(p \mid x)$: demand at price $p$ given context $x$, so $E[D(p \mid x)]$ is the expected demand
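As a minimal sketch of the optimization above, the snippet below grid-searches the profit-maximizing price. The linear demand curve `expected_demand`, its parameters `a` and `b`, and the cost `c = 10` are hypothetical placeholders chosen for illustration; in this notebook the demand curve would come from the estimated demand model instead.

```python
import numpy as np

def expected_demand(p, a=100.0, b=2.0):
    """Hypothetical linear demand curve: E[D(p)] = max(a - b * p, 0)."""
    return np.maximum(a - b * p, 0.0)

def optimal_price(c, prices, demand_fn):
    """Return the grid price that maximizes (p - c) * E[D(p)] and the profit there."""
    profit = (prices - c) * demand_fn(prices)
    best = int(np.argmax(profit))
    return prices[best], profit[best]

c = 10.0                              # assumed unit cost
prices = np.linspace(c, 50.0, 401)    # candidate prices in steps of 0.1
p_star, max_profit = optimal_price(c, prices, expected_demand)
print(f"p* = {p_star:.2f}, profit = {max_profit:.2f}")
```

For this toy curve the first-order condition $a - 2bp + bc = 0$ gives $p^* = (a + bc)/(2b) = 30$, which the grid search should recover up to the grid resolution.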

In [27]:
import os
import numpy
import pandas as pd
import kagglehub

# Data

path = kagglehub.dataset_download("olistbr/brazilian-ecommerce")
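# os.listdir() returns files in arbitrary order, so load each table by name
# instead of relying on positional unpacking.
def load(name):
    return pd.read_csv(os.path.join(path, name))

cust = load("olist_customers_dataset.csv")
sell = load("olist_sellers_dataset.csv")
rev = load("olist_order_reviews_dataset.csv")
item = load("olist_order_items_dataset.csv")
prod = load("olist_products_dataset.csv")
geo = load("olist_geolocation_dataset.csv")
cat = load("product_category_name_translation.csv")
orders = load("olist_orders_dataset.csv")
pay = load("olist_order_payments_dataset.csv")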
master = orders.merge(item, on="order_id", how='left')\
               .merge(prod, on="product_id", how='left')\
               .merge(cat, on="product_category_name", how='left')\
               .merge(cust, on="customer_id", how='left')\
               .merge(sell, on="seller_id", how='left')\
               .merge(pay, on="order_id", how='left')\
               .merge(rev, on="order_id", how='left')
# Master Table
# Order-level data

Dynamic pricing requires a demand model, i.e. a demand curve: the relationship between the quantity demanded (or the propensity to purchase) and price, conditional on other features.


**Problem Set-Up**:


1.   Endogenous Variable: price is partially determined by demand, so it is correlated with unobserved demand shocks. **Need Instrumental Variable.**

2.   Censored Data: only completed purchases are recorded; decisions not to purchase are not observable. **Need Selection Correction.**

## Instrumental Variable
We want to estimate the response of market demand to exogenous changes in market prices. Quantity demanded depends on price, but price is not exogenously given, since it is determined in part by market demand. An instrument for price is a variable that is correlated with price but does not directly affect quantity demanded, ideally a supply-side shifter such as a cost variable.
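To illustrate the idea, here is a minimal two-stage least squares (2SLS) sketch on synthetic data, not on the Olist tables. The instrument `z` (a hypothetical cost shifter), the demand shock `u`, and all coefficients are assumptions made up for the example. Price is built to respond to the demand shock, so naive OLS is biased, while the IV estimate should land near the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta_true = -2.0                      # assumed true price effect on quantity

z = rng.normal(size=n)                # instrument: hypothetical cost shifter
u = rng.normal(size=n)                # unobserved demand shock
# Price reacts to the demand shock u, which makes it endogenous.
price = 10 + 1.0 * z + 0.8 * u + 0.5 * rng.normal(size=n)
quantity = 50 + beta_true * price + 3.0 * u + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Naive OLS of quantity on price: biased because price is correlated with u.
b_ols = ols(np.column_stack([ones, price]), quantity)[1]

# Stage 1: project price on the instrument.
Z = np.column_stack([ones, z])
price_hat = Z @ ols(Z, price)

# Stage 2: regress quantity on the fitted (exogenous) part of price.
b_iv = ols(np.column_stack([ones, price_hat]), quantity)[1]

print(f"true: {beta_true:.2f}  OLS: {b_ols:.2f}  2SLS: {b_iv:.2f}")
```

This manual version is only meant to show the mechanics. The standard errors from a hand-rolled second stage are not valid, so a dedicated IV routine is preferable for inference.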



## Selection Correction