# Basic Idea

<b>We will create a semi-automatic algorithm that alerts us to buy or sell in real time.</br>
This will be done in 4 steps: </b></br>
1) Find 2 assets that have moved similarly to each other over the past X periods of time. </br>
2) Calculate the ratio between them.</br>
3) Find the right signal to buy or sell the assets, based on how many standard deviations the ratio is from its mean.</br>
4) Alert buy / sell.
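
Steps 2 to 4 can be sketched roughly as follows. This is only an illustration of the idea, not the final algorithm: the function name `ratio_signal`, the look-back `window` of 20 periods, and the `entry` threshold of 2 standard deviations are all assumptions made for this example.

```python
import numpy as np

def ratio_signal(price_a, price_b, window=20, entry=2.0):
    """Return the latest z-score of the A/B price ratio and a suggested action.

    window and entry are hypothetical defaults, not tuned values.
    """
    ratio = np.asarray(price_a, dtype=float) / np.asarray(price_b, dtype=float)
    recent = ratio[-window:]            # last `window` observations
    mean = recent.mean()
    std = recent.std(ddof=1)            # sample standard deviation
    if std == 0:
        return 0.0, 'HOLD'              # flat ratio: no signal
    z = (ratio[-1] - mean) / std        # distance from the mean in std units
    if z > entry:
        return z, 'SELL A / BUY B'      # ratio unusually high
    if z < -entry:
        return z, 'BUY A / SELL B'      # ratio unusually low
    return z, 'HOLD'

# toy prices: A jumps on the last day while B stays flat
z, action = ratio_signal([10.0] * 19 + [12.0], [10.0] * 20)
print(round(z, 2), action)
```

The z-score is computed over the same window that contains the latest observation, which keeps the sketch short; a real-time version might prefer to exclude the latest point from the mean and standard deviation.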

# Basic Concepts
<b>Let's go over some of the concepts we'll use in this project:</b></br>
1) <u>Cointegration:</u> Similar in spirit to correlation, but not the same thing: two series are cointegrated when some linear combination of them is stationary. </br>  The two series, Y and X, follow Y = ⍺ X + e, where ⍺ is the constant ratio and e is a stationary error term (white noise in the simplest case).</br>
In plain terms, the ratio between the two financial time series will vary around a constant mean. </br>
</br>
2) <u>Stationarity:</u> A stochastic process whose unconditional joint probability distribution does not change when shifted in time (basically: its statistical properties are not time dependent). </br>
3) <u>P-value</u>: The probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. </br> We will use it to test for cointegration. 
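
To make the relation Y = ⍺ X + e concrete, here is a small simulation. It is a sketch on synthetic data only: the value ⍺ = 2, the series length, and the random seed are arbitrary assumptions, and no real prices are involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
alpha = 2.0  # hypothetical constant ratio

# X is a random walk (non-stationary); Y follows Y = alpha * X + e
x = 100 + np.cumsum(rng.normal(0, 1, n))
e = rng.normal(0, 1, n)              # white noise
y = alpha * x + e

# least-squares estimate of alpha (regression through the origin)
alpha_hat = (x @ y) / (x @ x)
spread = y - alpha_hat * x           # what is left after removing alpha * X

print('estimated alpha:', round(alpha_hat, 3))
print('spread mean, first half :', round(spread[: n // 2].mean(), 3))
print('spread mean, second half:', round(spread[n // 2 :].mean(), 3))
```

Both X and Y wander freely, yet the spread stays close to zero in both halves of the sample, which is the behaviour we will try to detect in the ETF pairs.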

# Project requirements

In [6]:
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, adfuller
import yfinance as yf
from yahoo_fin.stock_info import get_data
from datetime import datetime
import matplotlib.pyplot as plt
import plotly.graph_objs as go

# Data that will be used
We will examine ETFs of tech companies.</br>
Our assumption is that, although each price series is probably not stationary on its own, pairs of them will probably be cointegrated, or at least correlated. </br>
We will be looking at the following ETFs: </br>
* VGT
* XLK
* SMH
* SOXX
* IYW 

These are the top 5 technology ETFs by total assets and by returns over a 5-year look-back window ([etfdb](https://etfdb.com/etfdb-category/technology-equities/)).

# Loading data

In [34]:
vgt = yf.Ticker('VGT').history(period='max')
xlk = yf.Ticker('XLK').history(period='max')
smh = yf.Ticker('SMH').history(period='max')
soxx = yf.Ticker('SOXX').history(period='max')
iyw = yf.Ticker('IYW').history(period='max')

# merge all ETFs data frames to one, by Close tag
vgt_xlk = pd.merge(left = vgt[['Close']], right = xlk[['Close']], left_index = True, right_index = True, 
              suffixes = ('_vgt', '_xlk'))
smh_soxx = pd.merge(left = smh[['Close']], right = soxx[['Close']], left_index = True, right_index = True, 
              suffixes = ('_smh', '_soxx'))
combined = pd.merge(left = vgt_xlk, right = smh_soxx, left_index = True, right_index = True)
# named all_etfs rather than all, to avoid shadowing the built-in all()
all_etfs = pd.merge(left = iyw[['Close']], right = combined, left_index = True, right_index = True)
all_etfs.rename(columns={'Close': 'Close_iyw'}, inplace=True)

all_etfs

| Date | Close_iyw | Close_vgt | Close_xlk | Close_smh | Close_soxx |
|---|---|---|---|---|---|
| 2004-01-30 | 11.002267 | 42.118580 | 16.467777 | 36.527378 | 55.197590 |
| 2004-02-02 | 10.987015 | 42.152905 | 16.467777 | 36.048595 | 54.348396 |
| 2004-02-03 | 11.030588 | 41.895462 | 16.491020 | 36.361977 | 54.569202 |
| 2004-02-04 | 10.677643 | 40.633953 | 16.088051 | 35.326054 | 52.921753 |
| 2004-02-05 | 10.712504 | 40.839943 | 16.111292 | 35.430519 | 53.626595 |
| ... | ... | ... | ... | ... | ... |
| 2022-05-02 | 91.500000 | 373.269989 | 143.570007 | 236.539993 | 414.160004 |
| 2022-05-03 | 91.620003 | 373.510010 | 143.820007 | 238.479996 | 417.589996 |
| 2022-05-04 | 94.970001 | 386.459991 | 148.869995 | 246.660004 | 433.790009 |
| 2022-05-05 | 90.150002 | 366.730011 | 141.710007 | 235.080002 | 412.779999 |
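
The chain of pairwise merges can also be written as a single `pd.concat` with an inner join on the date index. The sketch below uses two tiny made-up frames in place of the yfinance histories; the helper name `combine_closes` and the labels `aaa` / `bbb` are hypothetical.

```python
import pandas as pd

def combine_closes(frames):
    """frames: dict mapping a label to a DataFrame that has a 'Close' column."""
    closes = {f'Close_{name}': df['Close'] for name, df in frames.items()}
    # join='inner' keeps only the dates present in every frame
    return pd.concat(closes, axis=1, join='inner')

# toy frames standing in for the downloaded histories
idx_a = pd.date_range('2022-01-03', periods=4, freq='D')
idx_b = pd.date_range('2022-01-04', periods=4, freq='D')
toy = {
    'aaa': pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0]}, index=idx_a),
    'bbb': pd.DataFrame({'Close': [10.0, 20.0, 30.0, 40.0]}, index=idx_b),
}
combined_toy = combine_closes(toy)
print(combined_toy)
```

With the real data, the dict would map labels such as `'vgt'` to the corresponding history frame, and the resulting columns would match the `Close_vgt` style used above.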


# Check for Stationarity

In [8]:
def stationarity(a, cutoff = 0.05):
    a = np.ravel(a)  # flatten the list
    result = adfuller(a)
    if result[1] < cutoff:
        print('The series is stationary')
        print('p-value = ', result[1])
        return True
    print('The series is NOT stationary')
    print('p-value = ', result[1])
    return False
# stationarity(vgt['Close'])
# stationarity(Asset_2)

# Find cointegration
We will use the augmented Engle-Granger two-step cointegration test (`coint` from statsmodels).</br>
The null hypothesis is that there is no cointegration between the two series in a pair. </br>
Thus, we will be looking for a low p-value, which lets us reject that hypothesis. </br>
The threshold we are using is 0.05, a conventional but arbitrary choice.

In [9]:
# auxiliary function to find all pairs given a list (nC2)
def n_choose_2(lst):
    pairs_list = []
    for i in range(len(lst) - 1):
        rest = lst[i+1:]
        for j in range(len(rest)):
            pairs_list.append([lst[i], rest[j]])
    return pairs_list
#n_choose_2([1, 2, 3, 4, 5])
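
The same pairs can be produced with the standard library's `itertools.combinations`. The wrapper name `n_choose_2_itertools` is just a label for this sketch; it returns lists so the output has the same shape as `n_choose_2`.

```python
from itertools import combinations

def n_choose_2_itertools(lst):
    # combinations yields tuples in input order; convert each to a list
    return [list(pair) for pair in combinations(lst, 2)]

print(n_choose_2_itertools([1, 2, 3, 4]))
# → [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```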

In [23]:
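def find_cointegrated_pairs(etf_price_list, threshold = 0.05):
    cointegrated_pairs = []
    # filter the ETFs that aren't stationary
    # etf_price_list = list(filter(lambda a: stationarity(a), etf_price_list))
    # create pairs
    etf_pairs = n_choose_2(etf_price_list)
    # for each pair check for cointegration
    for pair in etf_pairs:
        # coint needs equal-length inputs, so keep only the dates both series share
        aligned = pd.concat([pair[0], pair[1]], axis=1, join='inner').dropna()
        result = coint(aligned.iloc[:, 0], aligned.iloc[:, 1])
        print(result)
        # result[1] is the p-value; a low value rejects "no cointegration"
        if result[1] <= threshold:
            cointegrated_pairs.append(pair)
    return cointegrated_pairs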
        
# data_hist_list is not defined earlier in the notebook; it is assumed here to be
# the list of history frames loaded above, in the order shown in the output below
data_hist_list = [vgt, xlk, smh, soxx, iyw]
# keep only the Close column of each ETF, as a Series indexed by Date
etf_price_list = [data_hist.loc[:, 'Close'] for data_hist in data_hist_list]
etf_price_list
# find_cointegrated_pairs(etf_price_list)
    

[Date
 2004-01-30     42.118587
 2004-02-02     42.152912
 2004-02-03     41.895454
 2004-02-04     40.633968
 2004-02-05     40.839931
                  ...    
 2022-05-02    373.269989
 2022-05-03    373.510010
 2022-05-04    386.459991
 2022-05-05    366.730011
 2022-05-06    362.750000
 Name: Close, Length: 4600, dtype: float64,
 Date
 1998-12-22     24.593773
 1998-12-23     25.181334
 1998-12-24     25.085398
 1998-12-28     25.157351
 1998-12-29     25.229296
                  ...    
 2022-05-02    143.570007
 2022-05-03    143.820007
 2022-05-04    148.869995
 2022-05-05    141.710007
 2022-05-06    140.570007
 Name: Close, Length: 5882, dtype: float64,
 Date
 2000-06-05     85.910301
 2000-06-06     82.373787
 2000-06-07     83.733986
 2000-06-08     84.114822
 2000-06-09     85.855896
                  ...    
 2022-05-02    236.539993
 2022-05-03    238.479996
 2022-05-04    246.660004
 2022-05-05    235.080002
 2022-05-06    232.669998
 Name: Close, Length: 5517, dtype: f

# To add: backtesting, correlation, and more
https://medium.datadriveninvestor.com/creating-and-implementing-a-pairs-trading-strategy-from-scratch-658267bab249