# Corporate event trading

In [None]:
#hide
%load_ext autoreload
%autoreload 2
%matplotlib inline

import random
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

In [None]:
#hide
from IPython.display import display, Image

## Earnings announcement premium

In [None]:
#hide 
display(Image('images/savor_ea_1.png',width=500))

Main claims of Savor and Wilson (2016):
1. earnings announcement premium = 9.9% / year 
1. announcing firms are "risky", so investors should be compensated for bearing that risk
1. more precisely, firm earnings contain news about market cash-flow risk and therefore matter for aggregate risk

We focus on the first point. 
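
The premium compares the returns of firms that announce earnings with those of firms that do not. The function below is a minimal sketch of that comparison. It is a hypothetical helper and not the portfolio construction used by Savor and Wilson (2016). It builds a daily equal-weighted portfolio that is long announcing firms and short non-announcing firms. It then annualizes the mean daily return by multiplying by an assumed 252 trading days per year.

```python
import pandas as pd


def announcement_spread(returns: pd.DataFrame, announce: pd.DataFrame) -> pd.Series:
    """Daily return of an equal-weighted 'announcers minus non-announcers' portfolio.

    returns  : dates x tickers, daily returns
    announce : dates x tickers, True on the days a firm announces
    """
    # align the flags with the returns; missing flags mean "no announcement"
    announce = announce.reindex(index=returns.index, columns=returns.columns, fill_value=False)
    long_leg = returns.where(announce).mean(axis=1)    # average return of announcers
    short_leg = returns.where(~announce).mean(axis=1)  # average return of non-announcers
    # days without any announcer give NaN for the long leg and are dropped
    return (long_leg - short_leg).dropna()


def annualize(daily: pd.Series, periods_per_year: int = 252) -> float:
    """Scale a mean daily return to an annual figure (simple, non-compounded)."""
    return float(daily.mean() * periods_per_year)


# Toy example (made-up data): firm A announces on the first day and earns 2%
dates = pd.bdate_range("2020-01-06", periods=3)
returns = pd.DataFrame({"A": [0.02, 0.0, 0.0], "B": [0.0, 0.0, 0.0]}, index=dates)
announce = pd.DataFrame({"A": [True, False, False], "B": [False, False, False]}, index=dates)
spread = announcement_spread(returns, announce)
```

Annualizing by simple multiplication ignores compounding. This is adequate for small daily means but is only an approximation.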

In [None]:
#hide 
display(Image('images/savor_ea_2.png',width=700))

In [None]:
#hide 
display(Image('images/savor_ea_3.png',width=500))

## Regulatory filings

The main regulatory filings submitted to the U.S. Securities and Exchange Commission (SEC) are the 10-K and the 10-Q. 
- The 10-K is an annual report that summarizes the company's financial performance (and includes information such as company history, executive compensation, etc.). 
- The 10-Q is a quarterly report that contains similar information to the 10-K, but in less detail. 

Regulatory filings and earnings conference calls typically take place on the same day, so that all market-moving information is disclosed to the market at the same time. 
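
When filings are matched to daily returns, a filing dated on a weekend or holiday has no return on that date. One common convention, assumed here rather than taken from this notebook, is to move such dates to the next trading day. The sketch below does this with `DatetimeIndex.searchsorted`. The helper name `align_to_trading_days` is hypothetical.

```python
import pandas as pd


def align_to_trading_days(event_dates, trading_days) -> pd.DatetimeIndex:
    """Map each event date to the first trading day on or after it.

    Events falling after the last available trading day are dropped.
    """
    trading_days = pd.DatetimeIndex(trading_days).sort_values()
    # position of the first trading day >= each event date
    pos = trading_days.searchsorted(pd.DatetimeIndex(event_dates), side="left")
    pos = pos[pos < len(trading_days)]
    return trading_days[pos]


# Example: weekdays from Monday 4 to Friday 15 January 2021 (no holiday calendar)
trading = pd.bdate_range("2021-01-04", "2021-01-15")
aligned = align_to_trading_days(["2021-01-09", "2021-01-12", "2021-02-01"], trading)
```

The Saturday filing is moved to Monday 11 January. The filing after the last available trading day is dropped.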

## Filing dates from 10-Ks/10-Qs

In this section, we match the firms in the daily stock return dataset to the filings in the McDonald repository (https://sraf.nd.edu/), which is used in particular in the work of Loughran and McDonald. 

In [None]:
from skfin.plot import line, bar
from skfin.datasets import load_sklearn_stock_returns, load_10X_summaries, mapping_10X
ret = load_sklearn_stock_returns(cache_dir="data")

In [None]:
df = load_10X_summaries()

In [None]:
df.sample(n=5).iloc[:, :10]

The dictionary `mapping_10X` maps each stock ticker to the company name(s) used in the filings:
- because firm names can change (e.g. "Dell computer corp" becoming "Dell inc"), all past names need to be tracked, so a value is either a single name or a list of names. 
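
Because a value of `mapping_10X` is either a single name or a list of names, the cells below repeat `v if isinstance(v, list) else [v]`. The sketch below normalizes such a mapping once and inverts it, so the ticker of a filing's company name can be looked up directly. It raises an error if one name is claimed by two tickers. The helper names and the toy mapping are hypothetical, apart from the Dell example quoted above.

```python
def normalize_mapping(mapping: dict) -> dict:
    """Return {ticker: [names]}, turning single-name values into one-element lists."""
    return {k: (list(v) if isinstance(v, list) else [v]) for k, v in mapping.items()}


def invert_mapping(mapping: dict) -> dict:
    """Return {name: ticker}; raise if one name is assigned to two tickers."""
    inverted = {}
    for ticker, names in normalize_mapping(mapping).items():
        for name in names:
            if inverted.get(name, ticker) != ticker:
                raise ValueError(f"{name!r} is mapped to {inverted[name]!r} and {ticker!r}")
            inverted[name] = ticker
    return inverted


# Toy mapping: one ticker with a name change, one with a single name
toy_mapping = {"DELL": ["Dell computer corp", "Dell inc"], "XYZ": "Example co"}
name_to_ticker = invert_mapping(toy_mapping)
```

With such an inverted dictionary, a filings table could be tagged with tickers via `Series.map`, e.g. `df.CoName.map(name_to_ticker)`.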

In [None]:
random.choices(list(mapping_10X.items()), k=10)

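The table below counts, for each firm, the regulatory filings of each form type over the stock-return sample period. `10_K_Q` is the total number of 10-Ks and 10-Qs, and `restatements` counts the amended filings (`10-K-A` and `10-Q-A`). 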

In [None]:
# count filings by form type for each ticker over the stock-return sample period
pd.DataFrame.from_dict({k: df.loc[lambda x: x.CoName.isin(v if isinstance(v, list) else [v])]\
                            .set_index('date')\
                            .loc[ret.index[0]:ret.index[-1]]\
                            .groupby(['FORM_TYPE'])['FILING_DATE'].count()
                        for k, v in mapping_10X.items()}, orient='index')\
            .fillna(0)\
            .assign(**{'10_K_Q': lambda x: x['10-K'] + x['10-Q'], 
                       'restatements': lambda x: x['10-K-A'] + x['10-Q-A']})\
            .sort_values(['10_K_Q', 'restatements']).astype(int)
# fillna(0) comes before the sums so that a missing form type does not turn a total into NaN
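
When per-form counts come from `groupby(...).count()` and are assembled into a table, a form type that a firm never filed shows up as NaN. Pandas addition propagates NaN. The toy example below uses made-up counts to show why missing counts should be filled with zero before the columns are summed, not after.

```python
import numpy as np
import pandas as pd

# Toy per-form counts: firm "B" never filed a 10-Q-A, so its count is missing (NaN)
counts = pd.DataFrame({"10-K-A": [1.0, 2.0], "10-Q-A": [3.0, np.nan]}, index=["A", "B"])

# Summing first propagates NaN; filling afterwards turns B's total into 0
sum_then_fill = (counts["10-K-A"] + counts["10-Q-A"]).fillna(0)

# Filling first (or adding with fill_value) keeps B's two amendments
fill_then_sum = counts.fillna(0).pipe(lambda x: x["10-K-A"] + x["10-Q-A"])
add_with_fill = counts["10-K-A"].add(counts["10-Q-A"], fill_value=0)
```

`Series.add(..., fill_value=0)` gives the same result as filling first, without modifying the table.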

### Mapping checks

In [None]:
# CVC (Cablevision): show the names it is mapped to and its filings during 2006
v = mapping_10X['CVC']
print(v)
df.loc[lambda x: x.CoName.isin(v if isinstance(v, list) else [v])].set_index('date')\
  .loc[ret.index[0]:ret.index[-1]].loc['2006'].iloc[:, :10]

Matching company names is often time-consuming. Below, we use the `rapidfuzz` package to score candidate names by fuzzy string similarity and inspect the best matches. 

In [None]:
from rapidfuzz import fuzz

# upper-cased names of all companies with at least one filing in the return sample
CoName = list(df.assign(CoName=lambda x: x.CoName.str.upper())
                .groupby(['date', 'CoName'])['FILING_DATE'].count()
                .loc[ret.index[0]:ret.index[-1]]
                .groupby(level=1).count().index)

# five closest names to 'CABLEVISION' by token-set similarity
pd.Series({c: fuzz.token_set_ratio('CABLEVISION', c) for c in CoName}).sort_values(ascending=False).head(5)
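
`fuzz.token_set_ratio` compares strings after splitting them into words, so word order and repeated words matter less than in a plain character comparison. The sketch below makes the idea concrete with a simpler token-overlap (Jaccard) score. This score is a stand-in and not what rapidfuzz computes. Before comparing, the sketch strips a hypothetical list of legal-form words (INC, CORP, ...), a common preprocessing step when matching company names.

```python
import re

# Hypothetical list of legal-form words that carry little information for matching
SUFFIXES = {"INC", "CORP", "CORPORATION", "CO", "LTD", "PLC", "LLC", "THE"}


def name_tokens(name: str) -> set:
    """Upper-case, replace punctuation by spaces, split, and drop legal-form words."""
    words = re.sub(r"[^A-Z0-9]+", " ", name.upper()).split()
    return {w for w in words if w not in SUFFIXES}


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens common to both names (0 = disjoint, 1 = identical sets)."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matches(query: str, candidates, k: int = 5):
    """Top-k candidates by Jaccard score, highest first."""
    scored = [(c, jaccard(query, c)) for c in candidates]
    return sorted(scored, key=lambda cs: cs[1], reverse=True)[:k]


matches = best_matches("Dell Inc.", ["DELL COMPUTER CORP", "DELL INC", "APPLE INC"], k=2)
```

Stripping legal-form words makes "Dell Inc." and "DELL INC" identical. The older "DELL COMPUTER CORP" still scores 0.5, which is why candidates are reviewed by hand rather than accepted automatically.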

## Stock returns on filing dates

In [None]:
# normalize each stock's daily returns by its exponentially weighted volatility (63-day half-life)
ret_norm = ret.pipe(lambda x: x.div(x.ewm(halflife=63, min_periods=21).std()))\
              .dropna(how='all',axis=0)

# number of filings per (date, ticker), from 2002 onwards
mask = pd.concat({k: df.loc[lambda x: x.CoName.isin(v if isinstance(v, list) else [v])]\
                            .set_index('date')\
                            .loc['2002-01-01':ret.index[-1]]['FORM_TYPE']
           for k, v in mapping_10X.items()}).groupby(level=[1, 0]).count()

# split the (date, ticker) returns into filing days ('ea') and all other days ('not_ea')
funcs = {'ea': lambda x: x.loc[x.FORM_TYPE.notna()].drop('FORM_TYPE', axis=1), 
         'not_ea': lambda x: x.loc[x.FORM_TYPE.isna()].drop('FORM_TYPE', axis=1)}

ret_ea = pd.concat({k: ret_norm.stack().rename('ret').to_frame().join(mask).pipe(v).squeeze() 
                    for k, v in funcs.items()}, axis=1)
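
The cell above normalizes returns, then stacks, joins and filters a (date, ticker) panel. The same split can be reproduced on synthetic data, which makes it easy to check that event days stand out. In this sketch, the returns, the event calendar (every 63rd trading day) and the fivefold scaling on event days are all made up. Volatility is also lagged by one day (`shift(1)`) so that an event-day return does not enter its own normalization. This is a design choice of the sketch; the cell above uses same-day volatility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
tickers = ["A", "B", "C"]

# Synthetic daily returns with 1% volatility
returns = pd.DataFrame(rng.normal(0.0, 0.01, (len(dates), len(tickers))), index=dates, columns=tickers)

# Hypothetical event days: every 63rd trading day, where returns are scaled up fivefold
events = pd.DataFrame(False, index=dates, columns=tickers)
events.iloc[62::63] = True
returns = returns.where(~events, returns * 5)

# Normalize by the exponentially weighted volatility known at the previous close
vol = returns.ewm(halflife=63, min_periods=21).std().shift(1)
normalized = returns.div(vol)

# Split the non-missing normalized returns into event and non-event days
values, flags = normalized.to_numpy(), events.to_numpy()
valid = ~np.isnan(values)
split = {"ea": values[flags & valid], "not_ea": values[~flags & valid]}
summary = pd.DataFrame({k: {"std": v.std(ddof=1), "mean": v.mean()} for k, v in split.items()})
```

`summary` has the same layout as the volatility/mean table computed further down, one column per group.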

The histogram below shows that normalized returns are more dispersed, with more extreme values, on filing dates than on other days.

In [None]:
bins = np.linspace(-10, 10, 50)
plt.hist(ret_ea['not_ea'].dropna(), bins, density=True, alpha=0.5, label='not ea')
plt.hist(ret_ea['ea'].dropna(), bins, density=True, alpha=0.5, label='ea')
plt.legend(loc='upper right')
plt.show()

The summary statistics confirm this: volatility is higher on filing dates. In this sample, however, average returns do not appear to differ between filing dates and other days.

In [None]:
pd.concat({'Volatility': ret_ea.std(), 'Mean': ret_ea.mean()}, axis=1).round(2)
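
One way to put numbers on the comparison is to compute a Welch t-statistic for the difference in means and a ratio of variances. The sketch below does this with NumPy only; the helper names are hypothetical. It treats every (date, ticker) observation as independent. Pooled panel returns are not independent, because returns on the same date are cross-sectionally correlated, so these statistics are only indicative.

```python
import numpy as np


def welch_t(x, y) -> float:
    """Welch t-statistic for the difference in means, allowing unequal variances."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return float((x.mean() - y.mean()) / se)


def variance_ratio(x, y) -> float:
    """Ratio of sample variances; values well above 1 mean x is more volatile than y."""
    return float(np.var(x, ddof=1) / np.var(y, ddof=1))
```

Applied to this notebook's data, the calls would be `welch_t(ret_ea['ea'].dropna(), ret_ea['not_ea'].dropna())` and `variance_ratio(ret_ea['ea'].dropna(), ret_ea['not_ea'].dropna())`.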

To check whether the effect is concentrated on the filing date itself, the cells below shift the filing indicator by a few trading days before and after each filing. They then recompute the volatility and the mean of normalized returns for each shift.

In [None]:
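ea_std, ea_mean = {}, {}
# shift the filing indicator by i trading days: with shift(i), day t is flagged when a
# filing occurred on day t - i, so i > 0 looks at days after the filing and i < 0 before
for i in range(-5, 6): 
    mask_ = mask.unstack().reindex(ret.index).shift(i).stack().rename('FORM_TYPE')
    ret_ea_ = pd.concat({k: ret_norm.stack().rename('ret').to_frame().join(mask_).pipe(v).squeeze() 
                        for k, v in funcs.items()}, axis=1)
    ea_std[i] = ret_ea_.std()
    ea_mean[i] = ret_ea_.mean()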

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
line(pd.DataFrame(ea_std).T, title='Lead-lag volatility', sort=False, ax=ax[0], bbox_to_anchor=None, loc='best')
line(pd.DataFrame(ea_mean).T, title='Lead-lag mean', sort=False, ax=ax[1], bbox_to_anchor=None, loc='best')
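
The sign convention of `shift` is easy to get wrong in this kind of lead-lag analysis. With `shift(i)`, the value at day t comes from day t - i, so a positive i flags the days after an event. The sketch below checks this on a single toy series (hypothetical helper and made-up data) where only the two event days are volatile. The dispersion should therefore peak at lag 0.

```python
import numpy as np
import pandas as pd


def lead_lag_std(ret: pd.Series, event: pd.Series, lags=range(-3, 4)) -> pd.Series:
    """Std of returns on the days flagged by the event indicator shifted by each lag.

    With shift(i), day t is flagged when the event occurred on day t - i:
    i > 0 looks at days after the event, i < 0 at days before it.
    """
    out = {}
    for i in lags:
        flagged = event.shift(i, fill_value=False)
        out[i] = ret[flagged].std()
    return pd.Series(out)


# Toy data: small alternating returns, with large moves on two event days
toy_ret = pd.Series(0.1 * (-1.0) ** np.arange(40))
toy_ret.iloc[[10, 30]] = [5.0, -5.0]
toy_event = pd.Series(False, index=toy_ret.index)
toy_event.iloc[[10, 30]] = True

profile = lead_lag_std(toy_ret, toy_event)
```

Passing `fill_value=False` to `shift` keeps the indicator boolean, so it can be used directly as a row selector.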