# Pair Trading with Cointegration

## Background Information

### What is Pair Trading?

Pair trading is a two-stock alpha model that expects the chosen equities to trade similarly. When the price ratio between the two stocks diverges to an unusual level, the model expects it to revert toward its historical norm at some point in the future. The trade is to buy the relatively cheap stock and short the relatively expensive stock, profiting when the ratio reverts.
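To make the mechanics concrete, here is a small worked example of a dollar-neutral long-short trade. All prices and the position size are hypothetical, and the sketch ignores transaction costs and short-borrow fees.

```python
# Hypothetical prices, chosen only to illustrate the mechanics
entry = {"A": 100.0, "B": 50.0}   # ratio A/B = 2.0, assumed to be unusually high
exit_ = {"A": 95.0, "B": 52.5}    # ratio A/B is about 1.81 after a partial reversion
notional = 1000.0                 # dollars committed to each leg

def leg_return(symbol):
    """Simple return of one stock between entry and exit."""
    return exit_[symbol] / entry[symbol] - 1.0

# A looks expensive relative to B, so short A and buy B
short_pnl = -notional * leg_return("A")
long_pnl = notional * leg_return("B")
total_pnl = short_pnl + long_pnl

print(round(short_pnl, 2), round(long_pnl, 2), round(total_pnl, 2))  # 50.0 50.0 100.0
```

Both legs profit here because the ratio fell; if the ratio had kept widening instead, both legs would have lost.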

### What is Cointegration?

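Cointegration is a statistical property of two (or more) time series: each series may wander like a random walk on its own, but some linear combination of them is stationary, so the gap between them tends to return to a stable mean. We will test for it to evaluate which stock pairs to trade.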
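As an illustration of the idea (not a formal test), the sketch below builds two synthetic series that are cointegrated by construction and shows that the fitted spread between them is far more tightly bounded than either series. The data, seed, and the hedge ratio of 2 are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is a random walk (non-stationary); y tracks 2 * x plus stationary noise
x = np.cumsum(rng.normal(size=500))
noise = rng.normal(scale=0.5, size=500)
y = 2.0 * x + noise

# Estimate the hedge ratio (slope) and intercept by ordinary least squares
beta, alpha = np.polyfit(x, y, 1)

# The spread is what is left of y after removing its linear relationship with x
spread = y - (beta * x + alpha)

print(f"estimated hedge ratio: {beta:.2f}")
print(f"std of y: {y.std():.2f}, std of spread: {spread.std():.2f}")
```

The spread stays close to zero while `x` and `y` drift, which is the behaviour a cointegration test looks for in real price data.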

### Grabbing Data

In [2]:
### 
# Necessary Imports
###
import numpy as np
import pandas as pd
import statsmodels.tsa.stattools as sm
import requests
import bs4 as bs

In [3]:
####
# The code below is a webscraper to grab stock tickers from Wikipedia that are in the Technology sector
####

#Obtain list of S&P500 companies from wikipedia
resp = requests.get("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
convert_soup = bs.BeautifulSoup(resp.text, 'lxml')
table = convert_soup.find('table',{'class':'wikitable sortable'})

# Stock_and_sector has information on the stock ticker and GICS sector; tickers holds only Information Technology tickers
stock_and_sector = []
tickers = []

# Grab all stock tickers and their associated GICS classification
for rows in table.findAll('tr')[1:]:
    stock_and_sector.append([rows.findAll('td')[0].text.strip(), rows.findAll('td')[2].text.strip()])

# Select only the Information Technology stocks and add their tickers to "tickers"
for stocks in stock_and_sector:
    if stocks[1] == "Information Technology":
        tickers.append(stocks[0])

In [4]:
####
# Grab historical stock data
####

import yfinance as yf

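# Get stock data over the last year; use the Adj Close to account for stock splits and dividends.
# auto_adjust=False is passed explicitly because recent yfinance versions adjust prices by default
# and then omit the "Adj Close" column.
data = yf.download(tickers, period="1y", auto_adjust=False)["Adj Close"].dropna()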

[*********************100%%**********************]  64 of 64 completed


### Identifying Pairs

In [5]:
####
# Get a list of all pairs of our technology stocks
####

pairs = []

# Brute force all possible pairs
for i in range(len(tickers)):
    for j in range(i+1, len(tickers)):
        pairs.append(tickers[i] + " | " + tickers[j])
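The nested loop above can also be written with `itertools.combinations`, which yields each unordered pair exactly once. The ticker list below is a hypothetical stand-in for `tickers`.

```python
from itertools import combinations

# Hypothetical ticker list for illustration
example_tickers = ["AAPL", "MSFT", "NVDA", "ORCL"]

# Same "A | B" string format as the loop above
example_pairs = [f"{a} | {b}" for a, b in combinations(example_tickers, 2)]

print(example_pairs)
print(len(example_pairs))  # n * (n - 1) / 2 = 6 for n = 4
```

The number of pairs grows quadratically with the number of tickers, which matters for the testing step that follows.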


In [6]:
####
# Test each pair for cointegration (Engle-Granger test via statsmodels)
# Possible extension: compare results on Adj Close vs. np.log(Adj Close)
####

cointegrated = []

for i in pairs:
    curr1, curr2 = i.split(" | ")
    
    # Run the test on the two price series; coint returns (t-statistic, p-value, critical values)
    t_stat, p_val, crit_vals = sm.coint(data[curr1], data[curr2])
    
    # The null hypothesis is "no cointegration", so a small p-value is evidence that the pair is cointegrated.
    # The t-statistic is not compared to 0.05; it is judged against crit_vals, which the p-value already summarizes.
    if p_val < 0.05:
        cointegrated.append(i)
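One caveat when screening every pair at a 5% significance level: some pairs will pass by chance alone. The arithmetic below is a rough sketch, assuming the 64 tickers shown in the download progress output and treating the tests as independent (which they are not, strictly). The Bonferroni threshold is one common, conservative correction, not something the original analysis applies.

```python
n_tickers = 64   # ticker count reported by the download step
alpha = 0.05     # per-test significance level used in the filter

# Number of unordered pairs tested
n_pairs = n_tickers * (n_tickers - 1) // 2

# Pairs expected to pass the filter even if no pair were truly cointegrated
expected_false_positives = n_pairs * alpha

# Per-test threshold that keeps the family-wise error rate near alpha
bonferroni_alpha = alpha / n_pairs

print(n_pairs)
print(round(expected_false_positives, 1))
print(f"{bonferroni_alpha:.2e}")
```

In practice this suggests treating the filtered list as candidates to validate on out-of-sample data rather than as confirmed pairs.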

### When do we trade?

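Now that we have tested for cointegration, we have a list of pairs whose prices tend to move back toward a stable relationship. (Cointegration is not the same as correlation: it concerns the gap between two price series, not how closely their returns move together.) How do we identify when to actually enter a long-short trade in a pair? In other words, at what point is the gap between the two stocks most likely to close?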

Consider a single data point that directly compares the price of one stock to another.

What statistical value tells you how far a point is from average relative to the known data? (i.e. how many standard deviations away?)
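The quantity this question points at is the z-score: the distance of a value from the mean, measured in standard deviations. A minimal worked example on made-up price ratios:

```python
from statistics import mean, pstdev

# Made-up daily price ratios (stock 1 / stock 2) for illustration
ratios = [1.0, 1.0, 1.0, 1.0, 2.0]

mu = mean(ratios)
# Population standard deviation (ddof=0), which matches the default of scipy.stats.zscore
sigma = pstdev(ratios)

z_scores = [(r - mu) / sigma for r in ratios]
print([round(z, 2) for z in z_scores])  # [-0.5, -0.5, -0.5, -0.5, 2.0]
```

The last ratio sits two standard deviations above the mean, so it is the only point that looks unusual relative to the rest.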

In [7]:
####
# Compute the price ratio of each cointegrated pair and collect the ratios in a new data frame
####

pair_data = pd.DataFrame()

for i in cointegrated:
    curr1, curr2 = i.split(" | ")
    
    # Take the ratio of the pair to identify when the stock is overpriced / underpriced relative to average
    pair_data[i] = data[curr1]/data[curr2]
    
pair_data

In [8]:
####
# Identify the points in time where you should trade that individual pair
####

from scipy.stats import zscore

# Get the z-score of each ratio on each day to understand when a stock is over- or undervalued relative to its pair
signals = pair_data.apply(zscore)

# Create a function to flag an abnormal z-score (beyond 1.5 standard deviations) and turn it into a signal to
# buy or sell the pair, where buying a pair means buying stock 1 and shorting stock 2 (ratio = stock 1 / stock 2)
def signal(val):
    # Ratio unusually high: stock 1 is expensive relative to stock 2, so sell the pair
    if val > 1.5:
        return -1
    # Ratio unusually low: stock 1 is cheap relative to stock 2, so buy the pair
    elif val < -1.5:
        return 1
    else:
        return 0

# Use the signal function for each column to get buy and sell signals
for i in cointegrated:
    signals[i] = signals[i].apply(signal)

signals

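A z-score computed over the whole year uses the full-sample mean and standard deviation, so each day's value depends on prices that were not yet known on that day. One possible alternative, sketched below, is a rolling z-score that only looks at a trailing window. The helper name `rolling_zscore`, the window length, and the ratio values are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

def rolling_zscore(ratio: pd.Series, window: int) -> pd.Series:
    """Z-score of each point against the trailing `window` observations (current point included)."""
    mean = ratio.rolling(window).mean()
    std = ratio.rolling(window).std(ddof=0)  # ddof=0 matches the default of scipy.stats.zscore
    return (ratio - mean) / std

# Made-up ratio series for illustration only
ratio = pd.Series([2.0, 2.0, 2.0, 4.0, 2.0])
z = rolling_zscore(ratio, window=4)
print(z)
```

The first `window - 1` values are NaN because there is not enough history yet. A completely flat window has zero standard deviation and would produce infinite or NaN z-scores, so real data may need a guard for that case.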