In this Jupyter notebook we build an algorithm that takes a set of publicly traded securities of our choosing and tests every pair for cointegration and correlation. We then use the cointegrated pairs to mark buy and sell signals for a pairs-trading strategy.

In [18]:
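# Imports
import datetime

import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
import yfinance as yf
from statsmodels.tsa.stattools import coint, adfuller

# Compatibility shim for older pandas_datareader releases, which look up
# is_list_like in pandas.core.common; it must run before pandas_datareader is imported.
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr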

import matplotlib.pyplot as plt
import seaborn as sns; sns.set(style="whitegrid")

Choose which stocks to track:

In [19]:
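start = datetime.datetime(2015, 1, 1)
end = datetime.datetime.now()
tickers = ['AAPL', 'ADBE', 'ORCL', 'EBAY', 'MSFT', 'QCOM', 'HPQ', 'JNPR', 'AMD', 'IBM', 'VOO']

# Download daily prices for all tickers and keep only the closing prices
# (one column per ticker). yf.download is called directly instead of the older
# yf.pdr_override() + pdr.get_data_yahoo() combination; auto_adjust=False keeps
# the raw (unadjusted) 'Close' column.
df = yf.download(tickers, start=start, end=end, auto_adjust=False)['Close']
df.tail()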



Date          AAPL    ADBE     AMD    EBAY     HPQ     IBM    JNPR    MSFT    ORCL    QCOM     VOO
2023-02-01  145.43  383.92   84.64   50.40   29.87  135.09   30.99  252.75   90.05  138.46  377.53
2023-02-02  150.82  392.23   88.31   51.66   30.79  136.39   31.45  264.60   89.38  135.85  382.94
2023-02-03  154.50  379.33   86.09   50.66   30.51  136.94   30.73  258.35   89.62  135.02  378.85
2023-02-06  151.73  375.23   83.68   49.98   29.77  136.18   30.80  256.77   88.53  132.93  376.66
2023-02-07  154.65  383.82   85.91   50.17   30.00  135.84   31.22  267.56   87.74  136.63  381.52


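First we test each series for stationarity with the augmented Dickey-Fuller (ADF) test. Cointegration is only meaningful between non-stationary price series, so we want to keep the stocks whose prices are non-stationary. The null hypothesis of the ADF test is that the series has a unit root (is non-stationary), so a p-value below the cutoff (0.01 by default here) suggests the series is stationary.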
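For reference, `adfuller` in statsmodels, with its default settings (`regression='c'`, i.e. a constant but no trend, and the lag order p chosen by AIC), fits the regression below and reports the p-value for the test of the unit-root null against the stationary alternative:

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)} \quad \text{vs.} \quad H_1:\ \gamma < 0 \ \text{(stationary)}
```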

In [20]:
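def stationarity_test(X, cutoff=0.01):
    # Augmented Dickey-Fuller test: the null hypothesis is that X has a unit root
    # (is non-stationary); a p-value below `cutoff` lets us reject it.
    pvalue = adfuller(X)[1]
    name = X.name if X.name is not None else 'series'
    if pvalue < cutoff:
        print(f'p-value = {pvalue} The series {name} is likely stationary.')
    else:
        print(f'p-value = {pvalue} The series {name} is likely non-stationary.')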

As a sanity check, we run the test on the last 60 days of AAPL closing prices, which we expect to be non-stationary:

In [21]:
ticker = yf.Ticker('AAPL')
Stock_history = ticker.history(period = "60d")
Stock_close = Stock_history["Close"]
stationarity_test(Stock_close)

p-value = 0.2526287827678902 The series Close is likely non-stationary.


Test for cointegrated pairs:

In [22]:
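def find_cointegrated_pairs(data, significance=0.05):
    # Run the Engle-Granger cointegration test on every pair of columns.
    # Returns the test statistics, the p-values (only the upper triangle is
    # filled in) and the list of pairs whose p-value is below `significance`.
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.columns
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            # coint regresses S1 on S2, so the result is not symmetric in (S1, S2)
            score, pvalue, _ = coint(S1, S2)
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < significance:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs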

Now we run the cointegrated-pairs function on the downloaded closing prices:

In [23]:
find_cointegrated_pairs(df)

(array([[ 0.        , -0.98623748, -2.45807469, -1.56105958, -2.30826372,
         -1.62809721, -1.14386441, -2.37864993, -2.78488955, -3.2352723 ,
         -2.46111956],
        [ 0.        ,  0.        , -1.8167117 , -2.6900279 , -1.97838715,
         -2.74750113, -1.16837497, -1.2134532 , -1.43941683, -2.30006887,
         -1.76646019],
        [ 0.        ,  0.        ,  0.        , -2.88732371, -2.34027444,
         -1.69288637, -1.26133615, -4.37405509, -2.66117514, -3.8565447 ,
         -3.17221162],
        [ 0.        ,  0.        ,  0.        ,  0.        , -2.65556434,
         -2.32830266, -1.57513574, -2.57886047, -1.89949683, -3.04109481,
         -2.53798769],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         -1.6517087 , -2.13354934, -2.4048132 , -2.55596398, -2.55865653,
         -2.71623031],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        , -3.31514299, -4.00341798, -3.75443526, -3.8132149

Next we build a pandas DataFrame that stores the cointegration p-value and the correlation coefficient for each pair. Each pair can be stored once, as (A, B), or twice, as both (A, B) and (B, A); storing both orderings makes it easier to iterate over the results during post-processing.