by Turan G. Bali, Heiner Beckmeyer and Timo Wiedemann (2023)
This repository provides replication code for the paper Expected Mispricing by Bali, Beckmeyer and Wiedemann (2023). Please cite this paper if you are using the code or data:
@techreport{bali2023expected,
title={{Expected Mispricing}},
author={Bali, Turan G. and Beckmeyer, Heiner and Wiedemann, Timo},
type={{Working Paper}},
institution={{Available at SSRN}},
year={2023}
}
The file `1_create_dataset.py` creates the initial dataset. We use the code provided by Jensen, Kelly and Pedersen (2023) (GitHub) to obtain time series of 153 monthly firm-level characteristics and apply the filters proposed by the authors.
The file `2_run_ipca.py` estimates monthly stock-specific realized mispricing (MP), defined as the residual return component relative to a six-factor IPCA model. The estimation procedure avoids forward-looking information by using only information available at time t.
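To make the notion of a "residual return component" concrete, here is a minimal sketch of a residual relative to an IPCA-style model, in which factor loadings are a linear function of lagged characteristics. It is not the estimation code in `2_run_ipca.py`: the function name `ipca_residual`, the argument names, the omission of an intercept and the simulated inputs are all assumptions for illustration, and the mapping `gamma` and the factors are taken as given rather than estimated.

```python
import numpy as np


def ipca_residual(returns, chars, gamma, factors):
    """Residual return relative to an IPCA-style factor model (illustrative).

    returns : (N,)   realized stock returns for month t+1
    chars   : (N, L) firm characteristics observed at time t
    gamma   : (L, K) mapping from characteristics to factor loadings,
                     assumed to be estimated from data available at time t
    factors : (K,)   factor realizations for month t+1
    """
    betas = chars @ gamma      # (N, K) characteristic-implied loadings
    fitted = betas @ factors   # (N,) return explained by the factors
    return returns - fitted    # (N,) residual, i.e. realized mispricing


# Simulated toy inputs: 153 characteristics and six factors, as in the text.
rng = np.random.default_rng(0)
N, L, K = 5, 153, 6
chars = rng.standard_normal((N, L))
gamma = rng.standard_normal((L, K)) / L
factors = rng.standard_normal(K)
noise = 0.01 * rng.standard_normal(N)
returns = chars @ gamma @ factors + noise

mp = ipca_residual(returns, chars, gamma, factors)
```

In this toy setup the residual recovers the simulated noise term, which is the part of the return that the factor structure does not explain.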
Our measure of firm
- `3_run_nn.py` estimates the feed-forward neural network.
- `3_run_gbt.py` estimates the gradient-boosted regression tree.
- `3_run_rf.py` estimates the random forest.
We obtain our final measure of expected mispricing as an equal-weighted ensemble of these three forecasts, i.e. the simple average of the neural-network, gradient-boosted-tree and random-forest predictions.
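As a minimal sketch of an equal-weighted ensemble, the snippet below averages three forecast columns row by row. The column names `pred_nn`, `pred_gbt` and `pred_rf`, the helper `equal_weighted_ensemble` and the toy values are hypothetical and are not taken from the repository; how the actual scripts store their forecasts may differ.

```python
import pandas as pd


def equal_weighted_ensemble(forecasts, cols=("pred_nn", "pred_gbt", "pred_rf")):
    """Average the model forecasts with equal weights (column names are assumed)."""
    # skipna=False: the ensemble is missing whenever any single forecast is missing.
    return forecasts[list(cols)].mean(axis=1, skipna=False).rename("expected_mp")


# Toy forecasts for two stocks in one month (made-up identifiers and values).
forecasts = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-31", "2020-01-31"]),
        "permno": [10001, 10002],
        "pred_nn": [0.010, -0.020],
        "pred_gbt": [0.020, -0.010],
        "pred_rf": [0.030, 0.000],
    }
).set_index(["date", "permno"])

forecasts["expected_mp"] = equal_weighted_ensemble(forecasts)
```

Passing `skipna=False` is a design choice of this sketch: it avoids silently averaging over fewer than three models when one forecast is unavailable.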
The file `create_online_data.py` illustrates how we create the ensemble forecasts and writes the file that we make publicly available. The Apache Parquet file `EMP_data.pq` contains the following information:
Variable | Description |
---|---|
date | Datetime index |
permno | CRSP permanent stock identifier |
expected_mp | Expected mispricing (EMP) for the next month |
lead1m_mp | Realized mispricing (MP) in the next month |
We also provide a `.csv` file (`EMP_data.csv`) with the same information.
For convenience, both files can be downloaded directly from Dropbox, with firm-level mispricing data covering January 1993 through December 2022.
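Once downloaded, the data can be loaded with pandas. The sketch below reads the CSV version and indexes it by date and stock identifier. It assumes the CSV stores `date` and `permno` as ordinary columns with the names listed in the table above, which may not match the actual file layout; the helper `load_emp` and the two sample rows are made up for illustration.

```python
import io

import pandas as pd


def load_emp(path_or_buffer):
    """Read the EMP data from CSV and index it by (date, permno).

    Assumes columns named date, permno, expected_mp and lead1m_mp.
    """
    df = pd.read_csv(path_or_buffer, parse_dates=["date"])
    return df.set_index(["date", "permno"]).sort_index()


# Stand-in for EMP_data.csv with made-up rows, so the example runs without the file.
sample = io.StringIO(
    "date,permno,expected_mp,lead1m_mp\n"
    "1993-01-31,10001,0.010,0.020\n"
    "1993-01-31,10002,-0.005,0.003\n"
)
emp = load_emp(sample)

# Forecast error: realized minus expected mispricing for the same stock-month.
forecast_error = emp["lead1m_mp"] - emp["expected_mp"]
```

With the real file, the call would be `load_emp("EMP_data.csv")`. The Parquet version can be read with `pd.read_parquet("EMP_data.pq")`, which requires a Parquet engine such as `pyarrow` or `fastparquet` to be installed.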