# Intraday Mean-Reverting Pairs Trading Strategy

### In this strategy, we will focus on pairs trading between two ETFs, SPY and IWM, which track the S&P 500 and Russell 2000 indices, respectively.

We will calculate a hedge ratio (long #: short #) between the two ETFs in order to generate a spread that will (hopefully) revert to its mean. We will use the z-score of the spread as the trading signal: long signals are generated when the z-score drops below a chosen threshold, and short signals are generated when the z-score rises above a chosen threshold.

### Claim: Since SPY and IWM track broad indices of large-cap and small-cap domestic stocks, respectively, we believe that the two ETFs are cointegrated over the long term. As a result, the spread between them should revert to its mean after any divergence.

We will test the claim of them being cointegrated, as well as the viability of the strategy, including the strategy spread and cointegration factor. We will not be taking into account any transaction costs or biases in this initial stage, since we want to simplify the calculations and test if this strategy is even remotely viable.

Data: SPY and IWM prices from August 2010 to August 2017. The strategy is described for 1-minute bars, but the code below pulls daily adjusted closes from Yahoo Finance.
Spread: Calculated via a rolling linear regression with a lookback window of k bars.
Z-Score: Calculated in the usual way, subtracting the sample mean of the spread and dividing by its standard deviation. Both are computed over the full sample, which introduces lookahead bias that we ignore for now.
Trades: Entry signals are generated at z <= -2 (long) or z >= 2 (short), and exit signals when |z| <= 1.
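
The spread and z-score steps above can be sketched with pandas alone. This is a minimal illustration on synthetic prices, not the notebook's actual pipeline: the helper names `rolling_hedge_ratio` and `spread_zscore` are hypothetical, and the rolling regression slope is computed as rolling covariance over rolling variance, which equals the OLS slope (with intercept) on each window.

```python
import numpy as np
import pandas as pd

def rolling_hedge_ratio(y, x, lookback):
    """Slope of an OLS regression of y on x (with intercept) over each trailing window."""
    return y.rolling(lookback).cov(x) / x.rolling(lookback).var()

def spread_zscore(y, x, lookback):
    beta = rolling_hedge_ratio(y, x, lookback)
    # The first lookback-1 bars have no hedge ratio, so they drop out here.
    spread = (y - beta * x).dropna()
    # Full-sample mean/std: simple, but it uses future data (lookahead bias).
    z = (spread - spread.mean()) / spread.std(ddof=0)
    return beta, spread, z

# Synthetic prices, assumed for illustration only.
rng = np.random.default_rng(42)
x = pd.Series(50.0 + np.cumsum(rng.normal(scale=0.1, size=500)))
y = 2.0 * x + rng.normal(scale=0.05, size=500)  # true ratio of 2 by construction

beta, spread, z = spread_zscore(y, x, lookback=100)
long_entry = z <= -2.0   # candidate long entries
short_entry = z >= 2.0   # candidate short entries
print(beta.dropna().tail())
```

Replacing the full-sample mean and standard deviation with rolling ones would remove the lookahead bias, at the cost of a second warm-up window.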

source: https://www.quantstart.com/articles/Backtesting-An-Intraday-Mean-Reversion-Pairs-Strategy-Between-SPY-And-IWM

In [35]:
import numpy as np
import pandas as pd
import os, os.path
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas_datareader as pdr
import fix_yahoo_finance as yf
import datetime

yf.pdr_override()
%matplotlib inline

First, we create the key abstract classes that we will use as templates for backtesting our strategy:

In [26]:
from abc import ABCMeta, abstractmethod

#We declare abstract classes as a template for strategies, inheriting from an Object superclass.

class DataHandler(object):
    '''Create our data handler for the upcoming strategy. The main goal of this handler is to extract data
    from Yahoo Finance, parse it for data that is relevant to the strategy, and store it in a Pandas
    DataFrame.'''
    
    __metaclass__ = ABCMeta
    
    @abstractmethod
    def create_data_handler(self):
        #Implementations must return a DataFrame containing the data relevant to the strategy.
        raise NotImplementedError("Implement create_data_handler()")


class Strategy(object):
    """Strategy will be utilized as a template for all future inherited trading strategies.
    
    The goal of the Strategy is to output a list of signals, via a Pandas DataFrame, which is then sent to the portfolio."""
    
    __metaclass__ = ABCMeta
    
    @abstractmethod
    def generate_signals(self):
        #Implementations must return a DataFrame of symbols containing a set of signals, with ternary values (long, short, or hold).
        raise NotImplementedError("Implement generate_signals()")

        
class Portfolio(object):
    '''Portfolio is utilized to generate buy/hold/sell positions based on the signals generated from
    our Strategy. Then, we will utilize the portfolio to generate an equity curve, complete with total value
    over the duration of the backtesting period, returns, and the cash/equity holdings at any particular time.'''
    
    __metaclass__ = ABCMeta
    
    @abstractmethod
    def generate_positions(self):
        '''Should be used to determine how portfolio positions are allocated, based on signals and current cash.'''
        raise NotImplementedError("Implement generate_positions()")
    
    @abstractmethod
    def backtest_portfolio(self):
        '''Gives logic to generate trading orders and portfolio equity curve (growth of equity over time)
        and returns.'''
        raise NotImplementedError("Implement backtest_portfolio()")

In [27]:
class PairsDataHandler(DataHandler):
    #symbols is a list containing the two ticker symbols needed to run the pairs strategy.
    def __init__(self, symbols):    
        self.symbols = symbols
    
    def create_data_handler(self):
        '''Create pandas DF with Adjusted Closing prices of a pair of symbols, for us to utilize.'''

        #Pull data remotely from Yahoo Finance to parse.
        first_df = pdr.get_data_yahoo(self.symbols[0], start=datetime.datetime(2010, 8, 1), end=datetime.datetime(2017,8,1))
        second_df = pdr.get_data_yahoo(self.symbols[1], start=datetime.datetime(2010,8,1), end=datetime.datetime(2017,8,1))

        #Create new DF with only Adj Close values from data.
        pairs = pd.DataFrame(index=first_df.index)
        pairs['%s_adj_close' % self.symbols[0].lower()] = first_df['Adj Close']
        pairs['%s_adj_close' % self.symbols[1].lower()] = second_df['Adj Close']
        pairs = pairs.dropna()

        return pairs

In [77]:
class PairsTradingStrategy(Strategy):
    
    def __init__(self, pairs, symbols):
        self.pairs = pairs
        self.symbols = symbols
    
    def calculate_spread_zscore(self, symbols, pairs, lookback=100):
        """Creates a hedge ratio between the two symbols by calculating
        a rolling linear regression with a defined lookback period. This
        is then used to create a z-score of the 'spread' between the two
        symbols based on a linear combination of the two."""
        sym_y = '%s_adj_close' % symbols[0].lower()
        sym_x = '%s_adj_close' % symbols[1].lower()

        #Get the beta, or hedge ratio, for each bar. sm.OLS does not accept
        #a window argument, so we use the fact that the slope of an OLS
        #regression of y on x (with intercept) equals cov(x, y)/var(x),
        #computed here over the trailing lookback window.
        pairs['hedge_ratio'] = (pairs[sym_y].rolling(lookback).cov(pairs[sym_x])
                                / pairs[sym_x].rolling(lookback).var())
        #The first lookback-1 bars have no hedge ratio, so drop them.
        pairs = pairs.dropna().copy()

        #Create the spread, and then a z-score of the spread.
        #We standardize with the full-sample mean/std, which
        #simplifies the strategy but introduces lookahead bias.
        pairs['spread'] = pairs[sym_y] - pairs['hedge_ratio']*pairs[sym_x]
        pairs['z_scores'] = (pairs['spread'] - np.mean(pairs['spread']))/np.std(pairs['spread'])

        return pairs
    
    def create_long_short_market_signals(self, symbols, pairs,
                                        z_entry_threshold=2.0,
                                        z_exit_threshold=1.0):
        '''We are going to generate signals based on z-score thresholds. 
        If the z-score of the pairs DF drops below the z-score equivalent
        to -2 stds, then we go long. When it rises to -1 std, we exit.
        Similarly for shorting, if the z-score goes above z-score equivalent
        of 2 stds, we short. When it drops to +1 std, we exit.'''
        
        pairs['longs'] = (pairs['z_scores'] <= -z_entry_threshold)*1.0
        pairs['shorts'] = (pairs['z_scores'] >= z_entry_threshold)*1.0
        pairs['exits'] = (np.abs(pairs['z_scores']) <= z_exit_threshold)*1.0                     
        
        #Long/short signals
        pairs['long_market'] = 0.0
        pairs['short_market'] = 0.0
                             
        #Variables to track whether to long or short
        long_market = 0
        short_market = 0
        
        #Calculate when to be in the market or not, i.e. to have
        #a long or short position, and also when not to.
        #Since we iterate over the pandas DF row by row to keep track
        #of state, it will be slower than a vectorized approach.
        long_states = []
        short_states = []
        for _, row in pairs.iterrows():
            #calculate longs
            if row['longs'] == 1.0:
                long_market = 1
            #calculate shorts
            if row['shorts'] == 1.0:
                short_market = 1
            #exits
            if row['exits'] == 1.0:
                long_market = 0
                short_market = 0
            long_states.append(float(long_market))
            short_states.append(float(short_market))
        #Assign the tracked states to the long_market/short_market
        #columns in one step, which avoids chained assignment.
        pairs['long_market'] = long_states
        pairs['short_market'] = short_states
        return pairs
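
To make the entry/exit logic of `create_long_short_market_signals` concrete, here is a minimal standalone sketch that runs the same state machine over a handful of made-up z-scores. The helper name `market_state` and the input values are hypothetical and chosen only for illustration.

```python
import pandas as pd

def market_state(z_scores, z_entry=2.0, z_exit=1.0):
    """Track long/short market state bar by bar from a sequence of z-scores."""
    long_state, short_state = 0.0, 0.0
    longs, shorts = [], []
    for z in z_scores:
        if z <= -z_entry:       # spread is unusually low: enter long
            long_state = 1.0
        if z >= z_entry:        # spread is unusually high: enter short
            short_state = 1.0
        if abs(z) <= z_exit:    # spread is back near its mean: exit
            long_state, short_state = 0.0, 0.0
        longs.append(long_state)
        shorts.append(short_state)
    return pd.DataFrame({'z_scores': list(z_scores),
                         'long_market': longs,
                         'short_market': shorts})

# Made-up z-scores, for illustration only.
states = market_state([0.0, -2.3, -1.5, -0.8, 2.1, 1.4, 0.9])
print(states)
```

The long position opened at -2.3 is held through -1.5 and closed at -0.8; the short opened at 2.1 is held through 1.4 and closed at 0.9.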

In [29]:
class PairsTradingPortfolio(Portfolio):
    '''Create a portfolio in order to implement the signals generated
    by the strategy, to keep track of account equity/holdings, and 
    generate an equity curve that we can use to determine drawdowns
    and Sharpe ratios'''
    
    def __init__(self, pairs, symbols):
        self.pairs = pairs
        self.symbols = symbols
        
    def backtest_portfolio(self, pairs, symbols):    
        sym1 = self.symbols[0].lower()
        sym2 = self.symbols[1].lower()
        
        #Construct portfolio with positions info
        portfolio = pd.DataFrame(index=pairs.index)
        #Subtract so we know if we're short or long. long == 1.0, short == -1.0
        portfolio['positions'] = pairs['long_market'] - pairs['short_market']
        portfolio[sym1] = -1.0 * pairs['%s_adj_close' % sym1] * portfolio['positions']
        portfolio[sym2] = pairs['%s_adj_close' % sym2] * portfolio['positions']
        portfolio['total'] = portfolio[sym1] + portfolio[sym2]
        
        #Percentage returns; replace NaN, infinite, and -100% values with 0
        returns = portfolio['total'].pct_change().fillna(0.0)
        returns = returns.replace([np.inf, -np.inf], 0.0).replace(-1.0, 0.0)
        portfolio['returns'] = returns
        
        #Calculate full equity curve
        portfolio['returns'] = (portfolio['returns'] + 1.0).cumprod()
        return portfolio
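
The return-cleaning and compounding steps in `backtest_portfolio` can be seen in isolation on a tiny series. This is a sketch under assumed inputs: the helper name `equity_curve` and the `total` values are hypothetical. Entering a position from a flat portfolio gives an infinite percentage change, and leaving one gives -100%; both are zeroed so that only changes while a position is held compound into the curve.

```python
import numpy as np
import pandas as pd

def equity_curve(total):
    """Turn a series of position values into a cumulative-return curve."""
    returns = total.pct_change().fillna(0.0)
    # Zero out the artifacts of entering (inf) and exiting (-100%) positions.
    returns = returns.replace([np.inf, -np.inf], 0.0).replace(-1.0, 0.0)
    return (1.0 + returns).cumprod()

# Hypothetical position values: flat, one trade, flat, another trade.
total = pd.Series([0.0, 0.0, 40.0, 42.0, 0.0, 0.0, 38.0, 36.1])
curve = equity_curve(total)
print(curve)
```

Here the first trade gains 5% and the second loses 5%, so the curve ends just below 1.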

In [30]:
symbols = ['SPY','IWM']
data_handler = PairsDataHandler(symbols)
pairs = data_handler.create_data_handler()

In [31]:
pairs.head()

| Date | spy_adj_close | iwm_adj_close |
|---|---|---|
| 2010-08-02 | 97.686226 | 59.55957 |
| 2010-08-03 | 97.218399 | 59.082371 |
| 2010-08-04 | 97.868126 | 59.703629 |
| 2010-08-05 | 97.764183 | 59.001343 |
| 2010-08-06 | 97.365662 | 58.650181 |


In [78]:
strategy = PairsTradingStrategy(pairs, symbols)
#The method signature is (symbols, pairs); passing them in the reverse order
#raises AttributeError: 'list' object has no attribute 'loc'.
pairs = strategy.calculate_spread_zscore(symbols, pairs)
pairs