# Basic Idea

<b>We will create a semi-automatic algorithm that raises buy or sell alerts in real time.</br>
This will be done in 4 steps: </b></br>
1) Find 2 assets whose prices have moved similarly to each other over the past X periods of time. </br>
2) Calculate the ratio between them.</br>
3) Find the right signal to buy or sell the assets, based on how many standard deviations the ratio is from its mean.</br>
4) Alert buy / sell.
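
Steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's final logic: the function name `zscore_signal`, the entry threshold of 2 standard deviations, and the signal labels are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev

def zscore_signal(prices_a, prices_b, entry=2.0):
    """Return (signal, z) for the most recent observation.

    `entry` is an assumed threshold, in standard deviations.
    """
    # Step 2: ratio between the two assets at each point in time
    ratios = [a / b for a, b in zip(prices_a, prices_b)]
    mu = mean(ratios)
    sigma = pstdev(ratios)
    if sigma == 0:
        # The ratio never moved, so there is nothing to trade on
        return 'HOLD', 0.0
    # Step 3: distance of the latest ratio from its mean, in standard deviations
    z = (ratios[-1] - mu) / sigma
    # Step 4: alert when the ratio is unusually high or unusually low
    if z > entry:
        return 'SELL A / BUY B', z
    if z < -entry:
        return 'BUY A / SELL B', z
    return 'HOLD', z

# Asset A jumps 10% on the last day while asset B stays flat
signal, z = zscore_signal([100.0] * 9 + [110.0], [100.0] * 10)
print(signal, round(z, 2))
```

The idea is that a ratio far above its mean suggests asset A is expensive relative to asset B, and a ratio far below its mean suggests the opposite.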

# Basic Concepts
<b>Let's go over some of the concepts we'll use in this project:</b></br>
1) <u>Cointegration:</u> Related to, but not the same as, correlation. Two series Y and X are cointegrated when a linear combination of them is stationary. </br>  The two series follow the relation Y = α X + e, where α is a constant and e is a stationary error term (white noise in the simplest case).</br>
In plain terms, it means that the ratio between the two financial time series keeps fluctuating around a constant mean instead of drifting away from it. </br>
</br>
2) <u>Stationarity:</u> A stochastic process is stationary when its unconditional joint probability distribution does not change when shifted in time (in short, its statistical properties, such as mean and variance, do not depend on time). </br>
3) <u>P-value</u>: The probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. </br> We will use it to test for cointegration. 
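
To make the first two concepts concrete, here is a small simulation using only the standard library. It is a sketch on synthetic data, not on the ETF prices: the value `alpha = 1.5`, the seed, the series length, and the helper `half_means` are assumptions chosen for illustration. X is built as a random walk (not stationary), Y follows Y = α X + e, and the spread Y - α X recovers the stationary noise e.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)   # fixed seed so the run is reproducible
alpha = 1.5              # assumed constant ratio, for illustration only
n = 2000

# X is a random walk: each value is the previous one plus noise
x = [100.0]
for _ in range(n - 1):
    x.append(x[-1] + rng.gauss(0, 1))

# Y follows Y = alpha * X + e, with e drawn as white noise
y = [alpha * xi + rng.gauss(0, 1) for xi in x]

# The spread removes the shared trend and leaves only the noise term
spread = [yi - alpha * xi for xi, yi in zip(x, y)]

def half_means(series):
    """Mean of the first half and of the second half of a series."""
    mid = len(series) // 2
    return mean(series[:mid]), mean(series[mid:])

print('X half means:     ', half_means(x))
print('spread half means:', half_means(spread))
print('std of X:', pstdev(x), ' std of spread:', pstdev(spread))
```

The half means of a stationary series should be close to each other, while those of a random walk are free to wander apart. That is the informal picture behind the formal tests used later.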

# Project requirements

In [34]:
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, adfuller
import yfinance as yf
from yahoo_fin.stock_info import get_data
from datetime import datetime
import matplotlib.pyplot as plt
import plotly.graph_objs as go

# Data that will be used
We will examine ETFs of tech companies.</br>
Our assumption is that their prices will probably be cointegrated, or at least correlated. Raw price series are usually not stationary on their own, so we test each series for stationarity below rather than assume it. </br>
We will be looking at the following ETFs: </br>
* VGT
* XLK
* SMH
* SOXX
* IYW 

These are the top 5 technology ETFs by total assets and by returns over a 5-year look-back window. ([etfdb](https://etfdb.com/etfdb-category/technology-equities/))

# Loading data

In [59]:
vgt = yf.Ticker('VGT')
xlk = yf.Ticker('XLK')
smh = yf.Ticker('SMH')
soxx = yf.Ticker('SOXX')
iyw = yf.Ticker('IYW')

data_hist_list = []
for elem in [vgt, xlk, smh, soxx, iyw]:
    data_hist = elem.history(period='max')
    # count null and (near-)zero closing prices;
    # the comparison needs parentheses because | binds more tightly than <
    n_bad = len(
      data_hist[
        data_hist.Close.isna() |
        (data_hist.Close < 1e-8)
      ]
    )
    print(n_bad, 'invalid Close entries')
    data_hist_list.append(data_hist)
    
# Print data history
data_hist_list[0]

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2004-01-30 | 41.826827 | 42.255909 | 41.826827 | 42.118603 | 117600 | 0.0 | 0 |
| 2004-02-02 | 42.135741 | 42.152905 | 41.912620 | 42.152905 | 65400 | 0.0 | 0 |
| 2004-02-03 | 41.878298 | 41.895462 | 41.723829 | 41.895462 | 231100 | 0.0 | 0 |
| 2004-02-04 | 40.762682 | 40.762682 | 40.633957 | 40.633957 | 51000 | 0.0 | 0 |
| 2004-02-05 | 40.934326 | 40.934326 | 40.633967 | 40.839928 | 2600 | 0.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-04-28 | 374.200012 | 385.390015 | 371.700012 | 383.130005 | 570500 | 0.0 | 0 |
| 2022-04-29 | 378.839996 | 384.220001 | 366.660004 | 367.200012 | 615200 | 0.0 | 0 |
| 2022-05-02 | 366.380005 | 373.600006 | 362.899994 | 373.269989 | 711700 | 0.0 | 0 |
| 2022-05-03 | 372.750000 | 375.500000 | 370.149994 | 373.510010 | 405900 | 0.0 | 0 |


# Check for Stationarity

In [68]:
def stationarity(a, cutoff = 0.05):
  a = np.ravel(a)
  # run the augmented Dickey-Fuller test once and reuse its p-value
  pvalue = adfuller(a)[1]
  if pvalue < cutoff:
    print('The series is stationary')
  else:
    print('The series is NOT stationary')
  print('p-value = ', pvalue)
# stationarity(Asset_1)
# stationarity(Asset_2)

# Find cointegration
We will use the augmented Engle-Granger two-step cointegration test.</br>
The null hypothesis is that there is no cointegration between the pair. </br>
Thus, we will be looking for a low p-value. </br>
The threshold we are using is 0.05, a conventional significance level that was chosen arbitrarily.

In [44]:
# auxiliary function to find all pairs given a list (nC2)
def n_choose_2(lst):
    pairs_list = []
    for i in range(len(lst) - 1):
        rest = lst[i+1:]
        for j in range(len(rest)):
            pairs_list.append([lst[i], rest[j]])
    return pairs_list
#n_choose_2([1, 2, 3, 4, 5])
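
As an aside, the standard library can produce the same pairs. The sketch below uses `itertools.combinations`; the name `n_choose_2_itertools` is a hypothetical alternative, and the rest of the notebook keeps using `n_choose_2`.

```python
from itertools import combinations

def n_choose_2_itertools(lst):
    # combinations yields tuples in the same order as the nested loops above;
    # convert each to a list to match the output shape of n_choose_2
    return [list(pair) for pair in combinations(lst, 2)]

print(n_choose_2_itertools([1, 2, 3, 4]))
# [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```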

In [67]:
def find_cointegrated_pairs(etf_price_list, threshold=0.05):
    cointegrated_pairs = []
    # create pairs
    etf_pairs = n_choose_2(etf_price_list)
    # for each pair check for cointegration
    for pair in etf_pairs:
        # the ETFs have different inception dates and coint needs inputs of
        # equal length, so keep only the dates that both series share
        aligned = pd.concat(pair, axis=1, join='inner').dropna()
        _, pvalue, _ = coint(aligned.iloc[:, 0], aligned.iloc[:, 1])
        if pvalue <= threshold:
            cointegrated_pairs.append(pair)
    return cointegrated_pairs


etf_price_list = [data_hist[['Close']] for data_hist in data_hist_list]
# etf_price_list
find_cointegrated_pairs(etf_price_list)