# HW #2 Simple Spread Trading
[FINM 33150] Regression Analysis and Quantitative Trading Strategies\
Winter 2022 | Professor Brian Boonstra

_**Due:** Thursday, January 27th, at 11:00pm\
**Name:** Ashley Tsoi (atsoi, Student ID: 12286230)_

### 1. Fetch and clean data

#### 1-1. Import packages

In [1]:
import quandl
import json
import pandas as pd
pd.set_option("display.precision", 4)
pd.set_option('display.float_format', lambda x: '%.4f' % x)
from pandas.core.common import SettingWithCopyWarning
import warnings
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)

import math
import numpy as np
import datetime as dt
import functools

# let plot display in the notebook instead of in a different window
%matplotlib inline 
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [21, 8]

#### 1-2. Define the functions to fetch data from Quandl

**1-2-1. Get my personal keys** from ../data/APIs.json

In [2]:
# Use a context manager so the file is closed even if json.load raises
with open('../data/APIs.json') as f:
    APIs = json.load(f)

**1-2-2. Define date-format helper function**

In [3]:
def assertCorrectDateFormat(date_text):
    try:
        dt.datetime.strptime(date_text, '%Y-%m-%d')
    except ValueError:
        raise ValueError("Incorrect date format, should be YYYY-MM-DD")

**1-2-3. Define function** to retrieve raw data from Quandl

**Documentation:**
```
https://data.nasdaq.com/databases/EOD/usage/quickstart/python
```

In [4]:
# Define function that retrieves data from Quandl
def getQuandlEODData(secs,start_date,end_date):
    # Get data for one or more securities (secs) from Quandl using quandl.get_table
    # NOTE: dates with missing data do NOT return a row.

    # INPUT         | DATA TYPE                 | DESCRIPTION
    # secs          | string / list of string   | security name(s)
    # start_date    | string (YYYY-MM-DD)       | start date of data
    # end_date      | string (YYYY-MM-DD)       | end date of data (same as or after start_date)
    
    print("Quandl | START | Retrieving Quandl data for securities: \n",secs)
    
    # Retrieve data using quandl.get_table
    quandl.ApiConfig.api_key = APIs['Quandl']
    data = quandl.get_table('QUOTEMEDIA/PRICES',
                            ticker = secs, 
                            date = {'gte':start_date, 'lte':end_date})

    print("Quandl | DONE  | Returning {:d} rows of data for {}.\n".format(len(data),secs))
    return data
    
@functools.lru_cache(maxsize=16) # Cache the function output
def getSpreadData(secs,start_date,end_date,N_window=15):

    # Input validation
    assert len(secs)==2 # secs must be a pair to calculate spread
    assertCorrectDateFormat(start_date)
    assertCorrectDateFormat(end_date)
    assert end_date >= start_date

    # Get Quandl Data
    secs = list(secs) # secs arrives as a hashable tuple (required by lru_cache); use a list internally
    data = pd.DataFrame()
    for sec in secs:
        quandlData = getQuandlEODData(sec,start_date,end_date)[['date','adj_close','adj_volume']].set_index('date').sort_index(ascending=True)
        data[sec+'_adj_close'] = quandlData['adj_close']
        data[sec+'_N_t'] = (quandlData['adj_close']*quandlData['adj_volume']).rolling(N_window).median()

    ff = pd.read_csv('../data/F-F_Research_Data_Factors_daily.CSV').rename(columns={'Unnamed: 0':'date'})
    ff['date'] = ff['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
    ff = ff.set_index('date')[start_date:end_date]
    
    return pd.concat([data,ff],axis=1)
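
The `_N_t` column built in `getSpreadData` is a rolling median of daily dollar volume (adjusted close times adjusted volume). A minimal sketch of that one step, using made-up prices and volumes for a hypothetical ticker rather than Quandl data:

```python
import pandas as pd

# Synthetic inputs (assumed values, not real market data)
dates = pd.bdate_range("2021-01-04", periods=6)
adj_close = pd.Series([10.0, 10.0, 10.0, 10.0, 10.0, 10.0], index=dates)
adj_volume = pd.Series([100, 300, 200, 500, 400, 600], index=dates)

N_window = 3  # shorter than the notebook's default of 15, to keep the example small

# Daily dollar volume, then its rolling median over N_window days
dollar_volume = adj_close * adj_volume
N_t = dollar_volume.rolling(N_window).median()
print(N_t)
```

The first `N_window - 1` rows are NaN because the window is not yet full, which is why the simulation drops leading rows before trading.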

#### 1-3. Fetch cleaned spread data using functions above

Last 2 unique digits of my student ID: 3, 0

**Securities:**
```
Pair #0: FCOM VOX
```

**Securities Description:**
```
FCOM - Fidelity MSCI Communication Services Index ETF
VOX - Vanguard Communication Services ETF
```

**Dates:**
```
December 2, 2019 - December 31, 2021
```

In [5]:
# # test out the function "getSpreadData"
# secs,start_date,end_date = ("FCOM","VOX"),"2019-12-02","2021-12-31"
# df = getSpreadData(secs,start_date,end_date)
# df[15:25]

In [6]:

def spreadTradeSimulation(secs,start_date,end_date,N_window=15,M=10,j=0.05,g=0.1,s=-1,K=None): # default s must be negative to pass the assert below
    
    print("========================================")
    
    assert j<g
    assert j>=0 # j must be non-negative; together with j<g this makes g positive
    assert s<0 # s is a negative fraction of gross cash (stop-loss threshold)
    
    spread = getSpreadData(secs,start_date,end_date,N_window).dropna()
    
    if K is None:
        # set K to the maximum of N_t over the data period, times two
        N_t_cols = spread.columns[spread.columns.str.endswith('_N_t')]
        K = int(2*max(spread[N_t_cols].max()))+1
    
    print("trade  | START | K (initial capital)={}, M={}, j={}, g={}, s={}.\n".format(K,M,j,g,s))
    tradeSim = spread[spread.columns[spread.columns.str.endswith('_adj_close')]].copy() # initialize trade simulation table (copy so new columns are not written to a view of spread)
    
    for sec in secs:
        tradeSim[sec+'_M_day_cumret'] = (tradeSim[sec+'_adj_close'].pct_change()+1).rolling(M).apply(np.prod,raw=True)-1

    tradeSim['cumret_spread'] = tradeSim[secs[0]+'_M_day_cumret']-tradeSim[secs[1]+'_M_day_cumret']
    print("trade  |       | Trading on M-day cumulative return spread (X-Y) for X={}, Y={}.\n".format(secs[0],secs[1]))
    
    tradeSim = tradeSim[max(N_window,M):] # remove the NaN rows
    print("trade  |       | Trade period: {:%Y-%m-%d} - {:%Y-%m-%d}.\n".format(tradeSim.index[0],tradeSim.index[-1])) # index holds timestamps, so use a date format rather than {:d}

    tradeSim['quantity'] = [int(min(row[secs[0]+'_N_t'],row[secs[1]+'_N_t'])/100) for i,row in spread[max(N_window,M):].iterrows()]

    months = tradeSim.index.to_period('M')
    endOfMonths = tradeSim.index[[months[n]!=months[n+1] for n in range(len(months)-1)]+[True]] # set last period to "True" as well to close out position

    # Initiate columns
    positions, prev_position = [0] , 0
    signals = [0]
    position_quantities = [0]
    secX_position_values, secY_position_values = [0], [0]
    position_values = [0]
    K_balances = [K]
    total_values = [K]
    gross_cash = 0
    stop_loss_limits, stop_loss_triggers = [], []
    PnL_daily_list, PnL_cumulative_list = [], []
    for i,row in tradeSim.iterrows(): # current columns: adj_close for both securities, spread, quantities to buy

        # Signals and positions
        if i in endOfMonths:
            position = 0 # close out any position at month end
        
        else:
            currRetSpread = row['cumret_spread']
            direction = 1 if (currRetSpread>=0) else -1 # sign of the spread (named to avoid shadowing the builtin dir)
            size = direction*currRetSpread # the return spread in absolute terms

            if size>g:
                position = -1*direction # trade against the spread
            elif size<j: # around 0
                position = 0
            else: # in between j and g
                position = prev_position if ((prev_position!=direction) and (prev_position!=0)) else 0 # keep the previous position only if it already bets against the current spread direction
          
        signal = int(position-prev_position) # buy/sell signal = change of position

        # Calculate present position value (but don't append to list yet since it may hit stop-loss limit)
        if signal: # if there's a new signal
            position_quantity = position*row['quantity'] # quantity = new position's quantity

        else: # if no signal, position quantity = previous position quantity (unless stop-loss limit is triggered later)
            position_quantity = position_quantities[-1]

        # Security buy amounts (but don't append to list yet since it may hit stop-loss limit)
        secX_position_value = position_quantity*row[secs[0]+'_adj_close']
        secY_position_value = -1*position_quantity*row[secs[1]+'_adj_close']
        position_value = secX_position_value + secY_position_value
        
        # Cash balances and total values (if doesn't hit stop-loss limit)
        K_balance = K_balances[-1] - position_value
        total_value = position_value + K_balance
        total_value_delta = total_value-total_values[-1]

        # Stop loss
        if signal: # update gross_cash if entering a new position 
            if position: 
                gross_cash = abs(secX_position_value) + abs(secY_position_value)
            else:
                gross_cash = 0 # reset gross_cash back to 0 if a position is exited/covered

        stop_loss_limit = s*gross_cash if gross_cash!=0 else math.nan
        stop_loss_trigger = total_value_delta < stop_loss_limit
        
        if stop_loss_trigger: 
            # "undo" the transaction
            position = 0 # change position to 0
            signal = int(0-positions[-1])
            
            position_quantity, position_value = 0, 0
            secX_position_value, secY_position_value = 0, 0

            K_balance = K_balances[-1]
            total_value = K_balance

        # PnLs (daily and cumulative)
        PnL_daily = total_value_delta/total_values[-1]
        PnL_cumulative = total_value/K - 1

        # update prev_position for next calculation
        prev_position = position
        
        # Append new variables into lists
        positions.append(position)
        signals.append(signal)
        position_quantities.append(position_quantity)
        secX_position_values.append(secX_position_value)
        secY_position_values.append(secY_position_value)
        position_values.append(position_value)
        K_balances.append(K_balance)
        total_values.append(total_value)
        stop_loss_limits.append(stop_loss_limit)
        stop_loss_triggers.append(stop_loss_trigger)
        PnL_daily_list.append(PnL_daily)
        PnL_cumulative_list.append(PnL_cumulative)

    # Save the data in tradeSim table
    tradeSim['signal'] = signals[1:]
    tradeSim['position'] = positions[1:]
    tradeSim['position_quantity'] = position_quantities[1:]
    tradeSim[secs[0]+'_position_value'] = secX_position_values[1:]
    tradeSim[secs[1]+'_position_value'] = secY_position_values[1:]
    tradeSim['position_value'] = position_values[1:]
    tradeSim['K='+str(K)] = K_balances[1:]
    tradeSim['total_value'] = total_values[1:]
    tradeSim['stop_loss_limit'] = stop_loss_limits
    tradeSim['stop_loss_trigger'] = stop_loss_triggers
    tradeSim['PnL_daily'] = PnL_daily_list
    tradeSim['PnL_cumulative'] = PnL_cumulative_list
    
    print("trade  | DONE  | \n")

    print("plot   |       | ")

    tp = plt

    tp.title('Cumulative Return Spread and Trades')

    tp.plot(tradeSim['cumret_spread'], label='cumulative return spread')
    
    tp.plot(tradeSim['cumret_spread'][tradeSim['signal']>1], color='green', marker='o', markersize=6, linestyle='none')
    tp.plot(tradeSim['cumret_spread'][tradeSim['signal']==1], color='green', marker='o', markersize=4, linestyle='none', label='buy signal')
    tp.plot(tradeSim['cumret_spread'][tradeSim['signal']==-1], color='red', marker='o', markersize=4, linestyle='none', label='sell signal')
    tp.plot(tradeSim['cumret_spread'][tradeSim['signal']<-1], color='red', marker='o', markersize=6, linestyle='none')
    
    tp.plot(tradeSim['cumret_spread'][tradeSim['stop_loss_trigger']], color='blue', marker='o', markersize=6, linestyle='none', label='stop-loss signal')

    tp.fill_between(tradeSim.index, j, g, color='grey', alpha=.3, label='j-g band')
    tp.fill_between(tradeSim.index, -j, -g, color='grey', alpha=.3)

    tp.legend()
        
    return (tradeSim,tp)
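
The entry/exit rule inside the loop can be read in isolation. The sketch below restates only the `j`/`g` band logic as a standalone helper; `band_position` is a hypothetical name, and the month-end close-out and stop-loss steps are deliberately left out.

```python
def band_position(cumret_spread, prev_position, j, g):
    """Return the target position (-1, 0, or +1) for one day's return spread."""
    direction = 1 if cumret_spread >= 0 else -1
    size = abs(cumret_spread)
    if size > g:
        return -direction  # spread is wide: bet on it narrowing
    if size < j:
        return 0           # spread has closed: go flat
    # Between j and g: hold only a position that already bets against the spread
    return prev_position if prev_position == -direction else 0


# Made-up spread path to illustrate the rule
spreads = [0.000, 0.012, 0.006, 0.001, -0.015, -0.004]
position = 0
path = []
for x in spreads:
    position = band_position(x, position, j=0.002, g=0.010)
    path.append(position)
print(path)  # [0, -1, -1, 0, 1, 1]
```

A position opened beyond `g` is held while the spread sits between `j` and `g`, and is closed only once the spread falls inside `j`.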

Try it with some parameters:

In [None]:
secs,start_date,end_date = ("FCOM","VOX"),"2019-12-02","2021-12-31"
N_window=15
M=10
j=0.00005
g=0.001
s=-0.6

t,p = spreadTradeSimulation(secs,start_date,end_date,N_window=N_window,M=M,j=j,g=g,s=s)
t
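
The signal traded above is the difference between the two securities' `M`-day cumulative returns. As a rough check on made-up prices (the columns `X` and `Y` are placeholders, not FCOM/VOX data), compounding `M` daily returns should agree with dividing today's price by the price `M` days earlier:

```python
import numpy as np
import pandas as pd

M = 2
px = pd.DataFrame({"X": [100.0, 110.0, 121.0, 121.0],
                   "Y": [50.0, 50.0, 55.0, 60.5]})

# Compound M daily returns, as the simulation does with rolling(M).apply(np.prod)
rolled = (px.pct_change() + 1).rolling(M).apply(np.prod, raw=True) - 1

# Equivalent shortcut: price today over price M days ago
direct = px / px.shift(M) - 1

# M-day cumulative return spread (X minus Y)
spread = rolled["X"] - rolled["Y"]
print(np.allclose(rolled, direct, equal_nan=True))
print(spread)
```

Both routes leave the first `M` rows as NaN, so those rows cannot generate a signal.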

### 2. Analysis
#### 2-1. Study the performance of your strategy as you vary $j$, $g$, $s$, and $M$. Include plots.

You need not run a fancy nonlinear optimizer, but try to find which parameters work well, and explain how you did it. For one or more of the better settings you find, look into correlations to Fama French factor returns.
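
One possible way to look at factor correlations is `DataFrame.corrwith`. The sketch below uses randomly generated stand-ins for both the factor returns and the strategy returns, so the numbers are illustrative only. The column names follow the Fama-French daily file (`Mkt-RF`, `SMB`, `HML`), and the strategy series is an assumption constructed to load on the market factor.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-01-04", periods=250)

# Synthetic factor returns; real values would come from the Fama-French daily CSV
factors = pd.DataFrame(rng.normal(0, 0.01, size=(250, 3)),
                       index=dates, columns=["Mkt-RF", "SMB", "HML"])

# Hypothetical daily strategy returns with built-in market exposure
noise = pd.Series(rng.normal(0, 0.005, size=250), index=dates)
strategy = 0.5 * factors["Mkt-RF"] + noise

# Correlation of each factor column with the strategy returns
factor_corr = factors.corrwith(strategy)
print(factor_corr)
```

With real data, `strategy` would be a daily PnL series such as the `PnL_daily` column, aligned on the same dates as the factor returns.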

**2-1-1. Performance of various combinations of parameters**

The performance metric is the **cumulative rate of return** over the period.

1. Use the data to find good ranges for the parameters to try (without worrying about look-ahead bias).
   1. `j` & `g` -- use "cumulative return spread" (`cumret_spread`). NOTE: cumulative return may decrease as `M` increases.
   2. `s` -- find joint distribution of `a` and `gross_cash`
   3. `M` -- 
2. Create the combinations of parameters within the ranges found above. (Constraint: `0 <= j < g <= 1`)
3. Run the function `spreadTradeSimulation` on each combination of parameters.
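
Step 2 can be sketched with `itertools.product`, filtering out combinations that violate the constraint on `j` and `g`. The candidate values below are placeholders chosen for illustration, not ranges derived from the data.

```python
from itertools import product

# Hypothetical candidate values; real ranges should come from the data
j_values = [0.0005, 0.001, 0.002]
g_values = [0.001, 0.002, 0.005]
s_values = [-0.05, -0.02]
M_values = [5, 10]

# Keep only combinations that satisfy 0 <= j < g <= 1
param_grid = [
    {"j": j, "g": g, "s": s, "M": M}
    for j, g, s, M in product(j_values, g_values, s_values, M_values)
    if 0 <= j < g <= 1
]
print(len(param_grid))  # 24
```

Each dictionary could then be unpacked into the simulation, for example `spreadTradeSimulation(secs, start_date, end_date, **params)`.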

**2-1-2. Define analysis functions**

In [None]:
def getStats(data,annual_factor=1):
    # Summary statistics for return series; annual_factor scales mean and vol (e.g. 252 for daily data)

    data_desc = data.describe().loc[['count','mean', 'max', 'min', 'std']]

    data_desc.loc['mean'] = data_desc.loc['mean']*annual_factor
    data_desc.loc['vol'] = data_desc.loc['std']*np.sqrt(annual_factor)
    data_desc.loc['sharpe_ratio'] = data_desc.loc['mean']/data_desc.loc['vol']

    data_desc.loc['skewness'] = data.skew()
    data_desc.loc['excess_kurtosis'] = data.kurt() # pandas kurt() already returns excess (Fisher) kurtosis
    data_desc.loc['VaR05'] = data.quantile(.05)
    data_desc.loc['CVaR05'] = data[data <= data_desc.loc['VaR05']].mean() # mean of the returns at or below VaR05

    return data_desc
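
As a small worked example of the tail statistics, the 5% VaR is the 5th percentile of returns, and the 5% CVaR is usually taken as the mean of the returns at or below that level. The return series below is made up so the arithmetic can be checked by hand.

```python
import numpy as np
import pandas as pd

# Twenty evenly spaced made-up returns: -0.10, -0.09, ..., 0.09
returns = pd.Series(np.arange(-10, 10) / 100.0)

var05 = returns.quantile(0.05)              # 5th percentile (linear interpolation)
cvar05 = returns[returns <= var05].mean()   # average return in the tail at or below VaR
tail_fraction = (returns <= var05).mean()   # share of observations in that tail

print(var05, cvar05, tail_fraction)
```

Here the VaR interpolates to about -0.0905, only the worst return (-0.10) lies at or below it, and the tail holds 1 of 20 observations. The tail fraction is by construction close to 5%, so it says nothing about the size of the losses.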
    