# IVOL Puzzle & Arbitrage Asymmetry

This project studies the relationship between idiosyncratic volatility (IVOL) and expected returns on equities. It attempts to replicate the study by Stambaugh et al. (2015), with the dataset extended through 2024.

Idiosyncratic volatility is defined as the standard deviation of the residuals from regressing a stock's daily excess returns on the three Fama-French factors. In line with Stambaugh et al.'s approach, we calculate IVOL over a one-month window of daily returns. This yields a monthly time series of IVOL values for each stock in our universe.
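Written out, the definition above corresponds to the regression below. The symbols are notation introduced here for illustration: $d$ indexes the trading days in month $t$, $N_t$ is the number of such days, and $\hat\varepsilon_{i,d}$ is the fitted residual for stock $i$. The $1/N_t$ normalisation is an assumption that matches NumPy's default `np.std`; a degrees-of-freedom correction would be an equally defensible choice.

$$
r_{i,d} - r_{f,d} = \alpha_i + \beta_i\,\mathrm{MKTRF}_d + s_i\,\mathrm{SMB}_d + h_i\,\mathrm{HML}_d + \varepsilon_{i,d}, \qquad d \in t
$$

$$
\mathrm{IVOL}_{i,t} = \sqrt{\frac{1}{N_t}\sum_{d \in t} \hat\varepsilon_{i,d}^{\,2}}
$$

Because the regression includes an intercept, the fitted residuals have zero mean within the window, so the root mean square above equals their standard deviation.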

At the same time, we use a composite cross-sectional measure of mispricing based on 10 return anomalies (the financial distress score is excluded because it is not easily retrievable). Each return anomaly is a time series for each stock in our universe. To build the composite measure, at each date we rank all stocks on each anomaly separately, which gives every stock 10 rankings per date. The composite mispricing measure for a stock is the arithmetic mean of its 10 rankings.
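The ranking-and-averaging step can be sketched for a single date as follows. This is a minimal illustration, not the notebook's implementation: the function name, the toy anomaly columns, and the choices to rank in ascending order and to skip missing ranks when averaging are all assumptions.

```python
import pandas as pd

def composite_mispricing(anomalies: pd.DataFrame) -> pd.Series:
    """Average cross-sectional rank per stock for one date.

    anomalies: rows are stocks (e.g. permno), columns are anomaly variables.
    """
    # Rank stocks within each anomaly column (ascending; ties share the mean rank)
    ranks = anomalies.rank(axis=0, method="average")
    # Arithmetic mean of the ranks across anomalies; NaN ranks are skipped
    return ranks.mean(axis=1)

# Hypothetical cross-section: three stocks, two made-up anomaly variables
toy = pd.DataFrame(
    {"anomaly_a": [0.1, 0.3, 0.2], "anomaly_b": [2.0, 5.0, 9.0]},
    index=[10001, 10002, 10003],
)
score = composite_mispricing(toy)
print(score)
```

In practice each anomaly would need to be signed so that a higher rank consistently means "more overpriced" (or consistently the opposite) before averaging; the sketch leaves that orientation to the caller.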

With the IVOL and mispricing measures in hand, at each date we sort stocks on IVOL and on mispricing, splitting them into 5 groups along each dimension. This forms 25 baskets of stocks, each of which we turn into a value-weighted portfolio. We then calculate each portfolio's return over the following month and regress it on the three Fama-French factors to find the return that survives factor adjustment.
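A rough sketch of the 5x5 sort for one formation date is shown below. It assumes independent sorts (each dimension is bucketed on the full cross-section), equal-count breakpoints, and hypothetical column names `ivol`, `mispricing`, `mktcap`, and `next_ret`; none of these details are fixed by the description above.

```python
import numpy as np
import pandas as pd

def quintile(values: pd.Series, n: int = 5) -> pd.Series:
    """Label each stock 1..n by its position in the cross-section."""
    # rank(method="first") breaks ties so qcut always finds distinct bin edges
    return pd.qcut(values.rank(method="first"), n, labels=False) + 1

def double_sort_vw_returns(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """n x n table of value-weighted next-month returns for one date."""
    df = df.dropna(subset=["ivol", "mispricing", "mktcap", "next_ret"]).copy()
    df["ivol_q"] = quintile(df["ivol"], n)
    df["misp_q"] = quintile(df["mispricing"], n)
    # Value weighting: sum(mktcap * return) / sum(mktcap) within each basket
    df["w_ret"] = df["mktcap"] * df["next_ret"]
    sums = df.groupby(["misp_q", "ivol_q"])[["w_ret", "mktcap"]].sum()
    return (sums["w_ret"] / sums["mktcap"]).unstack("ivol_q")

# Synthetic cross-section purely for illustration
rng = np.random.default_rng(0)
n_stocks = 500
toy = pd.DataFrame({
    "ivol": rng.uniform(0.005, 0.05, n_stocks),
    "mispricing": rng.uniform(0, 1, n_stocks),
    "mktcap": rng.lognormal(6, 1, n_stocks),
    "next_ret": rng.normal(0.01, 0.08, n_stocks),
})
table = double_sort_vw_returns(toy)
print(table.round(4))
```

Rows of `table` are mispricing groups and columns are IVOL groups. Repeating this for every month gives 25 return series, each of which can then be regressed on the three factors to obtain its alpha.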


## Notebook Set Up

```python
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import numpy as np
import seaborn as sns
import wrds
import statsmodels.api as sm
from scipy.stats import percentileofscore

plt.style.use('seaborn-v0_8-notebook')
eps = 1e-8
mpl.rcParams['axes.titlesize'] = 12
mpl.rcParams['axes.labelsize'] = 10
mpl.rcParams['xtick.labelsize'] = 10
mpl.rcParams['ytick.labelsize'] = 10
mpl.rcParams['legend.fontsize'] = 10
mpl.rcParams['figure.figsize'] = [10, 6]
mpl.rcParams['figure.dpi'] = 100
mpl.rcParams['savefig.dpi'] = 100
mpl.rcParams['figure.autolayout'] = True

conn = wrds.Connection()
start = "1965-01-01"
end = "2025-01-01"
```

WRDS recommends setting up a .pgpass file.
Created .pgpass file successfully.
You can create this file yourself at any time with the create_pgpass_file() function.
Loading library list...
Done


## Security identification
In line with Stambaugh et al.'s methodology, we use all NYSE/AMEX/NASDAQ stocks with share prices greater than five dollars. We extend the sample through 2024.

The code used to extract all tickers from the three relevant exchanges was:

```python
tickers = conn.raw_sql(
    f"""
    SELECT permno, ticker, namedt
    FROM crsp.msenames
    WHERE namedt BETWEEN '{start}' AND '{end}'
    AND exchcd in (1, 2, 3)
    """
)
```

From the CRSP documentation, we understand that PERMNO is a unique five-digit permanent identifier assigned by CRSP to each security in the file. Unlike CUSIP, TICKER, and COMNAM, the PERMNO neither changes during an issue's trading history, nor is reassigned after an issue ceases trading. The user may track a security through its entire trading history in CRSP's files with one PERMNO, regardless of name changes or capital structure changes. A security that is included on both CRSP's Nasdaq and NYSE/AMEX files will have the same CRSP permanent number in both files.

Therefore, we identify securities by `permno` rather than by ticker.


## IVOL calculation

To calculate idiosyncratic volatility, we first retrieve the daily Fama-French three factors.

```python
factors = conn.raw_sql(
    f"""
    SELECT date, mktrf, smb, hml, rf
    FROM ff.factors_daily
    WHERE date BETWEEN '{start}' AND '{end}'
    """,
    date_cols = ['date']
).set_index('date')

factors = (
    factors.replace([np.inf, -np.inf], np.nan)
    .fillna(0)
    .astype(float)
)
```

We want a month-end time series for each security in our universe, where `ivol_df.loc[t, p]` is the volatility of the residuals from regressing the daily excess returns of security `p` during month `t` on the three Fama-French factors.

```python
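# Assumption: `month_end_dates` is not defined in this notebook; here it is
# taken to be every calendar month-end between `start` and `end`.
month_end_dates = pd.date_range(start, end, freq=pd.offsets.MonthEnd())

ivol_df = pd.DataFrame(
    index = month_end_dates,
    columns = tickers['permno'].unique(),
    dtype = float
)

# Only the first 100 permnos are processed in this loop
for permno in ivol_df.columns[:100]:
    # Daily returns for this security, joined to the daily factors by date
    ret_df = (
        conn.raw_sql(
        f"""
        SELECT date, ret
        FROM crsp.dsf
        WHERE date BETWEEN '{start}' AND '{end}'
        AND permno = {int(permno)}
        """,
        date_cols = ['date'])
        .set_index('date')
        .merge(
            factors,
            left_index=True,
            right_index=True,
            how='left'
        )
    )
    ret_df['ex_ret'] = ret_df['ret'] - ret_df['rf']

    for idx in ivol_df.index:
        # Trading days in the month ending at idx (previous month-end excluded)
        time_range_mask = (
            (ret_df.index > idx - pd.offsets.MonthEnd(1)) & (ret_df.index <= idx)
        )
        # Drop days with missing returns or factors before fitting
        window = ret_df.loc[
            time_range_mask, ['ex_ret', 'mktrf', 'smb', 'hml']
        ].dropna().astype(float)

        # Need more observations than the 4 regression parameters
        if len(window) > 4:
            X = sm.add_constant(window[['mktrf', 'smb', 'hml']].to_numpy())
            y = window['ex_ret'].to_numpy()
            model = sm.OLS(y, X).fit()
            # IVOL = standard deviation of the regression residuals
            ivol_df.loc[idx, permno] = np.std(model.resid)


# Load the precomputed IVOL panel from disk (overwrites the frame built above)
ivol_df = pd.read_csv('./Data/full_ivol_df.csv', index_col=0)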
```
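The per-window step of the loop above can be checked in isolation on synthetic data. The sketch below is a sanity check under stated assumptions rather than part of the pipeline: `window_ivol` is a hypothetical helper, it uses NumPy least squares in place of `statsmodels` OLS (the fitted residuals are the same), and the factor values and loadings are made up.

```python
import numpy as np

def window_ivol(ex_ret: np.ndarray, factors: np.ndarray) -> float:
    """Std. dev. of residuals from regressing ex_ret on factors plus an intercept."""
    X = np.column_stack([np.ones(len(ex_ret)), factors])  # prepend the constant
    coef, *_ = np.linalg.lstsq(X, ex_ret, rcond=None)     # OLS coefficients
    resid = ex_ret - X @ coef
    return float(np.std(resid))

rng = np.random.default_rng(0)
n_days = 21                                    # roughly one trading month
fac = rng.normal(0.0, 0.01, size=(n_days, 3))  # synthetic mktrf, smb, hml
betas = np.array([1.1, 0.4, -0.2])             # made-up factor loadings

exact = 0.0005 + fac @ betas                   # fully explained by the factors
noisy = exact + rng.normal(0.0, 0.02, size=n_days)

print(window_ivol(exact, fac))  # numerically zero: no idiosyncratic component
print(window_ivol(noisy, fac))  # positive: reflects the added noise
```

With only about 21 observations and 4 estimated parameters, the residual standard deviation will tend to understate the true noise level, which is worth keeping in mind when comparing IVOL across windows of different lengths.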

## Return anomalies




## IVOL EDA

```python
ivol_df = pd.read_csv('./Data/full_ivol_df.csv', index_col=0)
ivol_df.index = pd.to_datetime(ivol_df.index)
ivol_df
```

Unnamed: 0,83264,63618,10896,69906,79030,11233,44127,86580,91786,88837,...,56120,31713,85792,83586,89403,81454,79163,86036,92970,19812
1965-01-31,,,,,,,,,,,...,,,,,,,,,,
1965-02-28,,,,,,,,,,,...,,,,,,,,,,
1965-03-31,,,,,,,,,,,...,,,,,,,,,,
1965-04-30,,,,,,,,,,,...,,,,,,,,,,
1965-05-31,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-08-31,0.011945,,,,,,,0.025849,0.009253,0.013778,...,,,0.015452,,0.018388,,,,,0.042034
2024-09-30,0.010587,,,,,,,0.011175,0.011429,0.010877,...,,,0.012494,,0.010137,,,,,0.162169
2024-10-31,0.007728,,,,,,,0.016819,0.010310,0.047612,...,,,0.008540,,0.009744,,,,,0.045123
2024-11-30,0.008987,,,,,,,0.017378,0.012528,0.048934,...,,,0.010319,,0.024362,,,,,0.108532
