## PEAD Strategy

The rationale of this model is as follows: __we enter stocks of companies that announced their earnings after the market close__. The position is based on the overnight move from $Close_{t-1}$ to $Open_{t}$. A strongly positive move is read as a positive earnings surprise, so we go long; a strongly negative move is read as a negative surprise, so we go short.

We run it over the same period as the book (2011 to 2012) because we currently lack a reliable way to retrieve an earnings calendar. For now, the earningswhisper API (see utils.py) only returns data for some lesser-known stocks. A possible follow-up is to test this strategy on the stocks for which earningswhisper can retrieve data.

## Step 1: Imports

We import the required libraries and load the open prices, close prices, SPX components, and earnings calendar.

In [2]:
import numpy as np
import pandas as pd
from datetime import datetime
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

import yfinance as yf

import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import iplot
import cufflinks as cf
cf.go_offline()

In [3]:
# Importing CSVs (note: the name `open` shadows Python's built-in open() for the rest of the notebook)
open = pd.read_csv('open.csv')

close = pd.read_csv('close.csv')
stocks = pd.read_csv('components.csv')

#Cleaning data and treating date column. Putting it to index
open['Var1']=pd.to_datetime(open['Var1'],  format='%Y%m%d').dt.date # remove HH:MM:SS
open.columns=np.insert(stocks.values, 0, 'Date')
open.set_index('Date', inplace=True)

close['Var1']=pd.to_datetime(close['Var1'],  format='%Y%m%d').dt.date # remove HH:MM:SS
close.columns=np.insert(stocks.values, 0, 'Date')
close.set_index('Date', inplace=True)


# Now we read the earnings calendar of the SPX constituents.
earnann=pd.read_csv('earnings_calendar.csv')
earnann['Date']=pd.to_datetime(earnann['Date'],  format='%Y%m%d').dt.date # remove HH:MM:SS
earnann.set_index('Date', inplace=True)


# Check that the earnings calendar columns match the SPX constituents
np.testing.assert_array_equal(stocks.iloc[0,:], earnann.columns)


#df will contain the open and close prices + earnings for each SPX constituents
df=pd.merge(open, close, how='inner', left_index=True, right_index=True, suffixes=('_op', '_cl'))
df=pd.merge(earnann, df, how='inner', left_index=True, right_index=True)

In [4]:
df

| Date | A | AA | AAPL | ABC | ABT | ACE | ACN | ADBE | ADI | ADM | ... | XL_cl | XLNX_cl | XOM_cl | XRAY_cl | XRX_cl | XYL_cl | YHOO_cl | YUM_cl | ZION_cl | ZMH_cl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011-01-03 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.58 | 28.62 | 72.43 | 34.85 | 11.43 | NaN | 16.75 | 47.68 | 25.10 | 54.82 |
| 2011-01-04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.47 | 28.74 | 72.77 | 34.46 | 11.12 | NaN | 16.59 | 46.95 | 24.70 | 54.09 |
| 2011-01-05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.59 | 28.84 | 72.57 | 34.79 | 11.18 | NaN | 16.91 | 47.18 | 24.84 | 54.13 |
| 2011-01-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.58 | 29.33 | 73.04 | 34.59 | 11.18 | NaN | 17.06 | 47.51 | 24.77 | 52.45 |
| 2011-01-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.54 | 29.16 | 73.44 | 34.45 | 11.03 | NaN | 16.90 | 48.10 | 24.51 | 52.44 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2012-04-18 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 20.89 | 35.37 | 85.75 | 40.31 | 7.96 | 27.65 | 15.49 | 72.94 | 21.17 | 64.29 |
| 2012-04-19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.37 | 35.04 | 85.28 | 39.73 | 7.91 | 27.10 | 15.40 | 71.41 | 20.93 | 63.06 |
| 2012-04-20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.38 | 34.40 | 85.30 | 39.73 | 7.87 | 27.36 | 15.60 | 73.93 | 20.54 | 63.08 |
| 2012-04-23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21.03 | 33.92 | 85.69 | 39.27 | 7.88 | 26.89 | 15.33 | 73.78 | 20.81 | 62.13 |


The resulting dataframe df contains, for each stock:
- an earnings flag (no suffix, 0 or 1) marking whether there was an earnings call/announcement after the market close
- the open price (suffix: _op)
- the close price (suffix: _cl)
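
The column layout follows from how `pd.merge` applies suffixes only to overlapping column names. The sketch below reproduces the two merges on a tiny invented dataset (the prices, flags, and the names `open_px`, `close_px`, `earn`, and `toy` are illustrative assumptions, not the notebook's data):

```python
import pandas as pd

# Toy data, invented for illustration only
dates = ["2011-01-03", "2011-01-04"]
open_px = pd.DataFrame({"A": [41.5, 41.6], "AA": [15.7, 16.4]}, index=dates)
close_px = pd.DataFrame({"A": [41.9, 41.5], "AA": [16.4, 16.5]}, index=dates)
earn = pd.DataFrame({"A": [0, 1], "AA": [0, 0]}, index=dates)

# Open and close share column names, so they receive the _op / _cl suffixes
prices = pd.merge(open_px, close_px, how="inner",
                  left_index=True, right_index=True, suffixes=("_op", "_cl"))

# After suffixing, the calendar columns no longer clash, so they stay unsuffixed
toy = pd.merge(earn, prices, how="inner", left_index=True, right_index=True)

print(list(toy.columns))
```

Because the earnings flags come first and keep the plain ticker names, the three blocks can later be recovered by position.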

## Step 2: Compute the strategy

First we split df back into its calendar, open, and close components.

Then we compute the __90-day rolling standard deviation of overnight returns for every individual stock__.

On days when a stock has an earnings flag, we take the following positions:

- Long if the overnight return is at least +0.5 standard deviations
- Short if the overnight return is at most -0.5 standard deviations
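
The rule can also be written compactly. As a sketch, using the symbols $r_t$, $\sigma_t$ and $e_t$ introduced here for illustration (they are not named in the notebook), let $r_t$ be the overnight return, $\sigma_t$ its 90-day rolling standard deviation, and $e_t \in \{0, 1\}$ the earnings flag:

$$r_t = \frac{Open_t - Close_{t-1}}{Close_{t-1}}$$

$$position_t = \begin{cases} +1 & \text{if } e_t = 1 \text{ and } r_t \ge 0.5\,\sigma_t \\ -1 & \text{if } e_t = 1 \text{ and } r_t \le -0.5\,\sigma_t \\ 0 & \text{otherwise} \end{cases}$$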

In [25]:

#We separate the open and close and earnings components from general df dataframe

#I think the three dataframes are merged and then split again mainly so that they share the same shape, which makes the boolean condition checks convenient

# Calendar to boolean type
earnann=df.iloc[:, 0:(earnann.shape[1])].astype(bool)

#open and close components from general dataframe
open=df.iloc[:, (earnann.shape[1]):((earnann.shape[1])+open.shape[1])]
close=df.iloc[:, ((earnann.shape[1])+open.shape[1]):]

open.columns=stocks.iloc[0,:]
close.columns=stocks.iloc[0,:]

lookback=90

#Compute overnight returns
retC2O=(open-close.shift())/close.shift()

#Standard deviation of the overnight returns
stdC2O=retC2O.rolling(lookback).std()


#Initialize the positions array
positions=np.zeros(close.shape) 

#Getting long the ones that had a 0.5 standard deviations move and presented results
longs=  (retC2O >=  0.5*stdC2O) & earnann

#Conversely for shorts
shorts= (retC2O <= -0.5*stdC2O) & earnann

positions[longs]=1
positions[shorts]=-1


#Returns: we enter at the open and exit at the close.
#This return computation is crude: the weights (the number of stocks held) are not computed dynamically.
#Instead we divide by 30, the maximum number of stocks in our portfolio at any given point in time.
ret=np.sum(positions*(close-open)/open, axis=1)/30

cumret=(np.cumprod(1+ret)-1)
cumret.iplot()

print('APR=%f Sharpe=%f' % (np.prod(1+ret)**(252/len(ret))-1, np.sqrt(252)*np.mean(ret)/np.std(ret)))


APR=0.068126 Sharpe=1.494743
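
The comment in the cell above notes that the portfolio weights are not computed dynamically. One possible alternative, shown here only as a sketch and not as the book's method, is to divide each day's gross return by the number of positions actually held that day. The function name `equal_weight_returns` and the toy inputs are assumptions made for illustration:

```python
import pandas as pd

def equal_weight_returns(positions, open_px, close_px):
    """Daily return with capital split equally across that day's active positions."""
    intraday = (close_px - open_px) / open_px      # open-to-close return per stock
    gross = (positions * intraday).sum(axis=1)     # signed sum over stocks
    n_active = positions.abs().sum(axis=1)         # number of positions held that day
    # Days with no positions would divide by zero, so they get a 0 return
    return (gross / n_active.where(n_active > 0)).fillna(0.0)

# Toy data, invented for illustration only
toy_positions = pd.DataFrame({"A": [1, 0, -1], "B": [1, 0, 0]})
toy_open = pd.DataFrame({"A": [10, 10, 10], "B": [20, 20, 20]})
toy_close = pd.DataFrame({"A": [11, 10, 9], "B": [20, 22, 20]})

toy_ret = equal_weight_returns(toy_positions, toy_open, toy_close)
print(toy_ret.tolist())
```

This assumes all capital is deployed whenever at least one signal fires, which concentrates risk on days with few announcements. The fixed divisor of 30 is the more conservative choice.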


## Things to improve

### Backtest time

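The backtest window is short. The data run from 2011-01-03 to 2012-04-23, and roughly the first 90 trading days only warm up the rolling standard deviation, so signals are generated for less than a year.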

### Entry and Exit point

The entry is not realistic: we enter at the same moment we receive the signal. For example, if NVDA reports results today after the market close, we enter at tomorrow's open if the overnight move exceeds 0.5 times the 90-day standard deviation, yet that open price is itself the input to the signal. A more realistic test would enter later, for example one hour after the open.

It seems fairly well established that holding a position for more than a day is not profitable. We could test whether exiting one hour before the close, or even earlier, gives a better PnL profile for the strategy.
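
Both ideas amount to swapping the entry and exit prices in the return computation. The sketch below assumes intraday prices are available, which is not the case for the CSVs used here: the dataframe `px_1h_after_open`, the function `strategy_returns`, and all numbers are hypothetical and serve only to illustrate the comparison.

```python
import pandas as pd

def strategy_returns(positions, entry_px, exit_px, max_positions=30):
    """Daily strategy return for arbitrary entry and exit price snapshots."""
    trade_ret = (exit_px - entry_px) / entry_px    # return between the two snapshots
    return (positions * trade_ret).sum(axis=1) / max_positions

# Toy data, invented for illustration only
toy_positions = pd.DataFrame({"A": [1, -1]})
toy_open = pd.DataFrame({"A": [100.0, 100.0]})
px_1h_after_open = pd.DataFrame({"A": [102.0, 99.0]})  # hypothetical intraday snapshot
toy_close = pd.DataFrame({"A": [104.0, 98.0]})

# Baseline: enter at the open, exit at the close (as in the notebook)
baseline = strategy_returns(toy_positions, toy_open, toy_close)
# Variant: enter one hour after the open, exit at the close
delayed = strategy_returns(toy_positions, px_1h_after_open, toy_close)

print(baseline.tolist(), delayed.tolist())
```

An earlier exit can be tested the same way by passing a snapshot taken before the close as `exit_px`.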