# OLS Demand Model

This notebook estimates a simple linear demand model using Ordinary Least Squares (OLS).

The purpose of this analysis is to build a baseline model of the kind that might be used in practice, and to show how its results compare to more sophisticated options. The simulated panel data used here has the following characteristics:

- Store-level fixed effects
- Week-level fixed effects
- Lagged advertising effects
- Correlated regressors

Because the true data-generating process (DGP) is known, we can directly compare estimated coefficients to the true structural parameters.

Note that I expect the estimates here to be biased, because the model ignores the effects listed above.


In [6]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from pathlib import Path

pd.set_option("display.float_format", "{:,.4f}".format)



In [7]:
# read the data file
base_dir = Path("..")
data_path = base_dir / "data_examples" / "final_simulated_panel.csv"
params_path = base_dir / "data_examples" / "true_params.json"

df = pd.read_csv(data_path)
true_params = pd.read_json(params_path, typ="series")

print(f"Loaded {len(df):,} rows")
df.head()


Loaded 7,800 rows


Unnamed: 0,store_id,store_size,area_income,manager_experience,manager_vacant,base_competitors,store_fe,week,week_fe,price,...,eff_print,eff_other,eff_adstock_broadcast_tv,eff_adstock_stream_tv,lower_funnel_multiplier,eff_paid_search_adj,eff_paid_social_adj,eff_direct_mail_adj,log_sales,sales
0,0,1.0609,62.8912,4.2437,0,2,0.1759,0,0.0488,11.494,...,0.1252,0.2182,0.0862,0.3454,1.074,0.4733,0.4022,0.1915,21.0647,1406929191.3342
1,0,1.0609,62.8912,4.2437,0,2,0.1759,1,0.1356,10.1604,...,0.1276,0.213,0.15,0.5272,1.1126,0.451,0.3842,0.1739,21.6582,2546956189.1163
2,0,1.0609,62.8912,4.2437,0,2,0.1759,2,-0.1171,10.3948,...,0.1322,0.2048,0.1967,0.6251,1.1353,0.4683,0.3814,0.168,21.2015,1613232442.5804
3,0,1.0609,62.8912,4.2437,0,2,0.1759,3,-0.1817,11.2145,...,0.1139,0.1991,0.2316,0.6908,1.1508,0.5188,0.3544,0.1998,21.1284,1499430677.3042
4,0,1.0609,62.8912,4.2437,0,2,0.1759,4,-0.3984,10.9123,...,0.1194,0.2358,0.2502,0.7458,1.1631,0.4971,0.421,0.2031,21.1984,1608195278.1551


## Model specification

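We estimate the following naive demand model, where `ad_spend` and `ad_spend_lag` are shorthand for the seven channel-level spend variables and their one-period lags (21 regressors plus an intercept in total):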

log(sales) ~ price  
            + relative_price  
            + ad_spend  
            + ad_spend_lag  
            + competitor_count  
            + store_size  
            + area_income  
            + manager_experience  
            + manager_vacant  

This model assumes:
1. the relationship between log(sales) and the regressors is linear
2. the error variance is constant across the data set (including over time and across locations)
3. errors are i.i.d. and normal, and uncorrelated with the regressors
4. there are no fixed effects (in time or by location)
5. there is no perfect multicollinearity among the regressors



In [8]:
# define target variable and regressors

# Target variable
y = df["log_sales"]

# Base (non-advertising) regressors
X_base = df[
    [
        "price",
        "relative_price",
        "competitor_count",
        "store_size",
        "area_income",
        "manager_experience",
        "manager_vacant",
    ]
]

# Advertising channels (current period)
ad_channels = [
    "spend_paid_search",
    "spend_paid_social",
    "spend_broadcast_tv",
    "spend_stream_tv",
    "spend_direct_mail",
    "spend_print",
    "spend_other",
]

X_ads_current = df[ad_channels]

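# Lagged advertising channels (value from the previous row within each store).
# Assumes rows are ordered by week within each store; a store's first week
# has no previous value and is filled with 0.0.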
X_ads_lagged = (
    df
    .groupby("store_id")[ad_channels]
    .shift(1)
    .add_suffix("_lag")
    .fillna(0.0)
)

# Combine all regressors
X = pd.concat(
    [X_base, X_ads_current, X_ads_lagged],
    axis=1
)

# Add intercept
X = sm.add_constant(X)


In [9]:
# fit the OLS model
ols_model = sm.OLS(y, X).fit()

ols_model.summary()


Dep. Variable:,log_sales,R-squared:,0.326
Model:,OLS,Adj. R-squared:,0.324
Method:,Least Squares,F-statistic:,179.3
Date:,"Sun, 21 Dec 2025",Prob (F-statistic):,0.0
Time:,16:45:48,Log-Likelihood:,-5921.1
No. Observations:,7800,AIC:,11890.0
Df Residuals:,7778,BIC:,12040.0
Df Model:,21,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,21.5079,0.151,142.663,0.000,21.212,21.803
price,-0.1194,0.005,-22.623,0.000,-0.130,-0.109
relative_price,-1.4053,0.123,-11.403,0.000,-1.647,-1.164
competitor_count,-0.1219,0.004,-31.629,0.000,-0.129,-0.114
store_size,0.6681,0.041,16.154,0.000,0.587,0.749
area_income,0.0240,0.001,30.224,0.000,0.022,0.026
manager_experience,-0.0089,0.003,-2.938,0.003,-0.015,-0.003
manager_vacant,-0.1694,0.019,-8.779,0.000,-0.207,-0.132
spend_paid_search,0.0008,0.000,1.944,0.052,-6.76e-06,0.002

Omnibus:,22.59,Durbin-Watson:,1.031
Prob(Omnibus):,0.0,Jarque-Bera (JB):,21.71
Skew:,-0.107,Prob(JB):,1.93e-05
Kurtosis:,2.854,Cond. No.,19000.0


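The summary above reports coefficient estimates from the naive pooled OLS
specification, with nonrobust standard errors.

Because the data were generated with store- and week-level fixed effects,
several regressors are correlated with unobserved heterogeneity. The
Durbin-Watson statistic of 1.031 is well below 2, which points to positive
serial correlation in the residuals, and the condition number of about
19,000 suggests strong collinearity or poorly scaled regressors.

As a result, the OLS estimates should not be interpreted as causal, and the
reported standard errors and p-values are likely unreliable.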


In [10]:
# compare estimates to true parameters

results = (
    pd.DataFrame({
        "OLS_Estimate": ols_model.params,
        "True_Value": true_params
    })
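    .dropna()  # keep only parameters present in both series
)

# Bias is signed; Pct_Error is the bias as a fraction of the true value (1.0 = 100%)
results["Bias"] = results["OLS_Estimate"] - results["True_Value"]
results["Pct_Error"] = results["Bias"] / results["True_Value"]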

results


Unnamed: 0,OLS_Estimate,True_Value,Bias,Pct_Error
area_income,0.024,0.02,0.004,0.2018
competitor_count,-0.1219,-0.15,0.0281,-0.1877
manager_experience,-0.0089,0.01,-0.0189,-1.8936
manager_vacant,-0.1694,-0.05,-0.1194,2.3884
price,-0.1194,-0.12,0.0006,-0.0047
relative_price,-1.4053,-1.31,-0.0953,0.0728
store_size,0.6681,0.3,0.3681,1.227


## Estimation accuracy

The table above compares OLS estimates to the known true parameters used
in the data-generating process.

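Key observations:
- `price` is recovered almost exactly (-0.1194 vs. -0.12), and `relative_price` is overstated in magnitude by about 7%
- `competitor_count` is biased toward zero by about 19%
- `area_income` is inflated by about 20%, `store_size` is more than double its true value, and `manager_vacant` is more than three times its true value
- `manager_experience` has the wrong sign (-0.0089 vs. 0.01)
- Advertising coefficients do not appear in the table, because `.dropna()` keeps only names present in both the OLS results and `true_params`, so their accuracy cannot be assessed here

This behavior is broadly expected given the omission of fixed effects, although the price coefficient turns out to be robust in this sample.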
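A complementary check is whether each true value falls inside the 95% confidence interval from the OLS summary. The sketch below hard-codes the interval bounds and true values printed in the outputs above; the DataFrame name `check` and its column names are my own.

```python
import pandas as pd

# Values copied by hand from the OLS summary and the comparison table above.
check = pd.DataFrame(
    {
        "ci_low": [-0.130, -1.647, -0.129, 0.587, 0.022, -0.015, -0.207],
        "ci_high": [-0.109, -1.164, -0.114, 0.749, 0.026, -0.003, -0.132],
        "true_value": [-0.12, -1.31, -0.15, 0.30, 0.02, 0.01, -0.05],
    },
    index=[
        "price",
        "relative_price",
        "competitor_count",
        "store_size",
        "area_income",
        "manager_experience",
        "manager_vacant",
    ],
)

# True where the 95% interval contains the true parameter.
check["covered"] = check["true_value"].between(check["ci_low"], check["ci_high"])
print(check)
```

Only `price` and `relative_price` are covered; for the other five coefficients the interval excludes the true value. In the notebook itself the bounds could be taken from `ols_model.conf_int()` instead of being typed in.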


## Why OLS is biased here

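OLS fails in this setting because:

1. Store fixed effects are correlated with price, advertising, and size, so omitting them causes omitted variable bias
2. Week fixed effects induce common shocks across stores, which correlate the errors within a week
3. Lagged advertising creates serial correlation within a store
4. Competitive conditions vary systematically across stores

Items 1 and 4 break the exogeneity assumption required for OLS consistency whenever the omitted store-level heterogeneity is correlated with the included regressors. Items 2 and 3 violate the i.i.d. error assumption, which mainly distorts standard errors, and they also bias coefficients if the omitted week effects or advertising carryover are correlated with the regressors.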
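The first mechanism can be illustrated with a small simulation. This is a sketch under made-up assumptions, not the notebook's actual DGP: the store effect `alpha`, the 0.8 loading of price on it, and the noise scales are all hypothetical. It compares the pooled OLS slope with the textbook omitted-variable-bias term cov(price, alpha) / var(price).

```python
import numpy as np

rng = np.random.default_rng(2)
n_stores, n_weeks = 200, 50
beta_true = -0.12  # hypothetical price effect for this toy example

store = np.repeat(np.arange(n_stores), n_weeks)
alpha = rng.normal(size=n_stores)[store]  # unobserved store effect, one value per row

# Price is set higher in stores with a larger unobserved effect.
price = 10 + 0.8 * alpha + rng.normal(size=store.size)
log_sales = beta_true * price + alpha + rng.normal(scale=0.5, size=store.size)

# Pooled OLS slope of log_sales on price (with intercept).
cov_py = np.cov(price, log_sales)
beta_pooled = cov_py[0, 1] / cov_py[0, 0]

# Omitted-variable-bias term: cov(price, alpha) / var(price).
predicted_bias = np.cov(price, alpha)[0, 1] / np.var(price, ddof=1)

print(f"true slope     : {beta_true:.3f}")
print(f"pooled slope   : {beta_pooled:.3f}")
print(f"actual bias    : {beta_pooled - beta_true:.3f}")
print(f"predicted bias : {predicted_bias:.3f}")
```

In this toy setup the bias is large enough to flip the sign of the estimated price effect, even though the regression itself is run correctly.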


## Next steps

In the next notebook, we will introduce:
- Store fixed effects
- Week fixed effects
- Proper panel estimation

This should bring the estimates closer to the true demand parameters
and demonstrate how fixed-effects models correct OLS bias.
