<a href="https://colab.research.google.com/github/numberjuani/cryptocoint/blob/master/Stat_Arb_Candidates.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The objective of this notebook is to identify cryptocurrencies that are suitable for statistical arbitrage.
To produce an objective ranking, we test every pair of coins for correlation and cointegration.

In [None]:
#!pip3 install binance-connector

In [None]:
from statsmodels.api import OLS
from statsmodels.tsa.stattools import adfuller
from binance.spot import Spot as SpotClient
import pandas as pd
from joblib import Parallel, delayed
import numpy as np

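This is the function we use to test whether a pair is cointegrated over a given window of prices. It returns a boolean together with the hedge ratio; later we average these booleans (as 0/1) over many rolling windows to see which pairs are cointegrated most often.
Because each test only uses the prices inside its own trailing window, no future data leaks into it, which prevents lookahead bias.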

In [None]:
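def is_cointegrated(x, y):
    # regress x on y (no intercept); the slope is the hedge ratio
    result = OLS(x, y).fit()
    # x and y are pandas Series, so params is a Series; use positional .iloc
    hedge_ratio = result.params.iloc[0]
    # ADF test on the residual spread: a stationary spread means the pair is cointegrated
    adf_results = adfuller(result.resid)
    # adf_results[0] is the test statistic, [1] the p-value, [4] the critical values
    if adf_results[0] <= adf_results[4]['10%'] and adf_results[1] <= 0.1:
        return (True, hedge_ratio)
    else:
        return (False, hedge_ratio)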

A simple wrapper function that converts the kline (candlestick) data returned by the Binance API into a pandas DataFrame of close prices.

In [None]:
def api_reponse_to_pandas(klines: list, symbol: str) -> tuple[str, pd.DataFrame]:
    # each kline is a list of 12 fields, in the order documented by Binance
    df = pd.DataFrame(klines, columns=['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_asset_volume', 'num_trades', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume', 'ignore'])
    # close_time is in epoch milliseconds; UTC has no daylight saving time shifts
    df['close_time'] = pd.to_datetime(df['close_time'], unit='ms', utc=True)
    # Binance returns prices as strings
    df['close'] = df['close'].astype(float)
    df.set_index('close_time', inplace=True)
    # only the close price is needed for the pair tests
    df = df[['close']]
    return (symbol, df)
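To see what the wrapper receives, here is a sketch with two made-up 5-minute klines in the 12-field layout used above (the prices, volumes and timestamps are hypothetical). Each `close_time` is the last millisecond of its candle, so candles of different symbols land on the same timestamps and can later be aligned with an inner merge.

```python
import pandas as pd

KLINE_COLUMNS = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time',
                 'quote_asset_volume', 'num_trades', 'taker_buy_base_asset_volume',
                 'taker_buy_quote_asset_volume', 'ignore']

# two hypothetical candles: prices as strings, times as epoch milliseconds
rows = [
    [1499040000000, '0.0100', '0.0110', '0.0090', '0.0105', '100.0', 1499040299999, '1.0', 10, '50.0', '0.5', '0'],
    [1499040300000, '0.0105', '0.0120', '0.0100', '0.0115', '120.0', 1499040599999, '1.3', 12, '60.0', '0.6', '0'],
]
df = pd.DataFrame(rows, columns=KLINE_COLUMNS)
df['close_time'] = pd.to_datetime(df['close_time'], unit='ms', utc=True)
df['close'] = df['close'].astype(float)
close = df.set_index('close_time')[['close']]
# index: 2017-07-03 00:04:59.999+00:00 and 2017-07-03 00:09:59.999+00:00
print(close)
```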

This function aligns the two price series, calculates their rolling correlation, runs the cointegration test on every trailing window, and outputs summary statistics that can be used to compare pairs.

In [None]:
def check_coins(coin_1_name: str, coin_1_data: pd.DataFrame, coin_2_name: str, coin_2_data: pd.DataFrame, rolling_window: int) -> dict:
    # align both close series on the candles they have in common
    both = pd.merge(coin_1_data, coin_2_data, how='inner', left_index=True, right_index=True, suffixes=('_1', '_2'))
    both['rolling_corr'] = both['close_1'].rolling(window=rolling_window).corr(both['close_2'])
    # a flat price series makes the correlation undefined; treat it as missing
    both.replace([np.inf, -np.inf], np.nan, inplace=True)
    both.reset_index(inplace=True)
    if len(both) <= rolling_window + 1:
        # not enough overlapping candles for a single test
        return {}
    both['rolling_cointegration'] = False
    # float column; rows before the first full window have no hedge ratio
    both['hedge_ratio'] = np.nan
    tested = both.index > rolling_window
    for index in both.index[tested]:
        # each test only sees the trailing window ending at `index`, so there is no lookahead
        window = both.loc[index - rolling_window:index]
        cointegration, hedge_ratio = is_cointegrated(window['close_1'], window['close_2'])
        both.loc[index, 'hedge_ratio'] = hedge_ratio
        both.loc[index, 'rolling_cointegration'] = cointegration
    # hedge-ratio-weighted spread; a raw price difference is meaningless across different price scales
    both['spread'] = both['close_1'] - both['hedge_ratio'] * both['close_2']
    # now we summarize the correlation and cointegration data so it can be compared across pairs
    corr_mean = both['rolling_corr'].mean()
    # share of tested windows (not of all rows) in which the pair was cointegrated
    coint_mean = both.loc[tested, 'rolling_cointegration'].mean()
    return {
        'pair': coin_1_name + '_' + coin_2_name,
        'corr_mean': corr_mean,
        'coint_mean': coint_mean,
        'both': corr_mean * coint_mean,
        'spread_mean': both['spread'].mean(),
        'spread_std': both['spread'].std(),
        'hedge_ratio_last': both['hedge_ratio'].iloc[-1],
    }
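`check_coins` fits one OLS regression per row, which is the slow part of the notebook. If only a rolling hedge ratio were needed, a loop-free sketch is possible for a regression *with* an intercept, where the OLS slope equals cov(x, y) / var(x). Note the assumption: the notebook's regression has no intercept, so its hedge ratios will generally differ from these. `rolling_hedge_ratio` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_hedge_ratio(y: pd.Series, x: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: slope of OLS y ~ a + b*x over each trailing window."""
    # with an intercept, the OLS slope is cov(x, y) / var(x); both use ddof=1,
    # and the result is NaN until the first window is full
    return y.rolling(window).cov(x) / x.rolling(window).var()


rng = np.random.default_rng(1)
x = pd.Series(50 + np.cumsum(rng.normal(size=300)))
y = 3 + 2.0 * x + pd.Series(rng.normal(scale=0.5, size=300))
beta = rolling_hedge_ratio(y, x, window=100)

# cross-check the last window against an explicit least-squares line fit
slope_last = np.polyfit(x.iloc[-100:], y.iloc[-100:], 1)[0]
print(beta.iloc[-1], slope_last)
```

The ADF test itself still has to run per window, so this only removes the regression from the loop.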

In [None]:
spot_client = SpotClient()

In [None]:
spot_info = pd.DataFrame(spot_client.exchange_info()['symbols'])
# For simplicity, we only consider symbols quoted in the USDT stablecoin.
# We also require margin trading to be enabled, since each trade buys one coin of the pair and shorts the other.
quotes_to_consider = ['USDT']
spot_info = spot_info[(spot_info.quoteAsset.isin(quotes_to_consider)) & (spot_info.status == 'TRADING') & (spot_info.isMarginTradingAllowed)]
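The filter above keeps a symbol only if all three conditions hold. A sketch on a toy stand-in for `exchange_info()['symbols']` (only the fields the filter reads; `XYZUSDT` and `ABCUSDT` are made-up symbols) shows each condition removing one row:

```python
import pandas as pd

# hypothetical rows mimicking the exchange-info fields used by the filter
symbols_info = pd.DataFrame([
    {'symbol': 'BTCUSDT', 'status': 'TRADING', 'quoteAsset': 'USDT', 'isMarginTradingAllowed': True},
    {'symbol': 'ETHBTC', 'status': 'TRADING', 'quoteAsset': 'BTC', 'isMarginTradingAllowed': True},    # wrong quote asset
    {'symbol': 'XYZUSDT', 'status': 'BREAK', 'quoteAsset': 'USDT', 'isMarginTradingAllowed': True},    # not trading
    {'symbol': 'ABCUSDT', 'status': 'TRADING', 'quoteAsset': 'USDT', 'isMarginTradingAllowed': False}, # cannot be shorted
])
mask = (
    symbols_info['quoteAsset'].isin(['USDT'])
    & (symbols_info['status'] == 'TRADING')
    & symbols_info['isMarginTradingAllowed']
)
kept = symbols_info.loc[mask, 'symbol'].tolist()
print(kept)  # ['BTCUSDT']
```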

In [None]:
f'Considering {len(spot_info)} coins for pairs'

In [None]:
from itertools import combinations

symbols: list[str] = spot_info.symbol.tolist()
# every unordered pair of distinct symbols: no repeated combinations and no symbol paired with itself
pairs = [list(pair) for pair in combinations(symbols, 2)]
f'A total of {len(pairs)} distinct pairs are possible'
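A nested loop that skips any pair already seen (or seen reversed) yields the same pairs as `itertools.combinations`, but each `in` check scans the whole growing list, which becomes very slow with hundreds of symbols. The sketch below checks the equivalence on five hypothetical symbols and confirms the count n(n-1)/2.

```python
from itertools import combinations


def nested_loop_pairs(symbols: list[str]) -> list[list[str]]:
    # loop-and-skip approach: keep a pair only if neither it nor its reverse was seen
    pairs = []
    for x in range(0, len(symbols) - 1):
        for y in range(len(symbols) - 1, 0, -1):
            if symbols[x] != symbols[y]:
                pair, reverse_pair = [symbols[x], symbols[y]], [symbols[y], symbols[x]]
                if pair not in pairs and reverse_pair not in pairs:
                    pairs.append(pair)
    return pairs


symbols = ['AUSDT', 'BUSDT', 'CUSDT', 'DUSDT', 'EUSDT']  # hypothetical symbols
slow = nested_loop_pairs(symbols)
fast = [list(p) for p in combinations(symbols, 2)]
n = len(symbols)
print(len(fast), n * (n - 1) // 2)  # 10 10
```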

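Now we request OHLCV data for each of the coins identified above: the most recent 1,000 five-minute candles per symbol (1,000 is the maximum the klines endpoint returns in a single request).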

In [None]:
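def fetch_symbol(sym: str) -> tuple[str, pd.DataFrame]:
    # call the API inside the task (not as an argument to delayed), so the downloads run concurrently
    return api_reponse_to_pandas(spot_client.klines(symbol=sym, interval='5m', limit=1000), sym)

# threads suit these I/O-bound requests and avoid pickling the API client
coins_data: list[tuple[str, pd.DataFrame]] = Parallel(n_jobs=-1, prefer='threads')(delayed(fetch_symbol)(sym) for sym in symbols)
# now we transform the list of tuples into a dictionary so it's easy to work with
coins_data: dict[str, pd.DataFrame] = dict(coins_data)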
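As a quick sanity check on how much history this gives us, the arithmetic below (a sketch using the `interval`, `limit` and rolling-window values from this notebook) shows that 1,000 five-minute candles cover about three and a half days, and that each 500-candle cointegration window covers about 42 hours. Coins listed more recently than that will have fewer candles and may be skipped by `check_coins`.

```python
import pandas as pd

candle = pd.Timedelta(minutes=5)  # interval='5m'
n_candles = 1000                  # limit=1000
window = 500                      # rolling_window passed to check_coins

print(candle * n_candles)  # 3 days 11:20:00
print(candle * window)     # 1 days 17:40:00
```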

Next, we test every possible pair, using a rolling window of 500 candles.

In [None]:
results = (Parallel(n_jobs=-1)(delayed(check_coins)(pair[0],coins_data[pair[0]],pair[1],coins_data[pair[1]],500) for pair in pairs))
#remove all {} from the results
results = [x for x in results if x != {}]
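This step is by far the most expensive one, which is why it is parallelized. `check_coins` runs one regression plus ADF test for every row index greater than the window, i.e. `n_candles - window - 1` tests per pair. The sketch below estimates the total workload; `adf_tests_required` is a hypothetical helper and 300 symbols is an assumed count, not the number printed earlier.

```python
from math import comb


def adf_tests_required(n_symbols: int, n_candles: int = 1000, window: int = 500) -> int:
    # windows tested per pair: row indices window+1 .. n_candles-1
    windows_per_pair = max(n_candles - window - 1, 0)
    # one set of tests per unordered pair of symbols
    return comb(n_symbols, 2) * windows_per_pair


print(adf_tests_required(300))  # 22380150
```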

In [None]:
results_frame = pd.DataFrame(results)
results_frame.sort_values(by=['both'],ascending=False,inplace=True)
results_frame
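The `both` column is the product of the mean rolling correlation and the share of windows in which the pair was cointegrated, so a pair ranks high only if it scores well on both. A sketch on made-up rows (hypothetical symbols and values, in the shape returned by `check_coins`) also shows how to recover the two legs from the `pair` label, since Binance symbols contain no underscore:

```python
import pandas as pd

# hypothetical results, only the columns used here
results = [
    {'pair': 'AAAUSDT_BBBUSDT', 'corr_mean': 0.9, 'coint_mean': 0.2},
    {'pair': 'AAAUSDT_CCCUSDT', 'corr_mean': 0.6, 'coint_mean': 0.5},
    {'pair': 'BBBUSDT_CCCUSDT', 'corr_mean': 0.8, 'coint_mean': 0.1},
]
frame = pd.DataFrame(results)
frame['both'] = frame['corr_mean'] * frame['coint_mean']
frame = frame.sort_values(by=['both'], ascending=False)
# split 'COIN1_COIN2' back into its two symbols
legs = frame['pair'].str.split('_', n=1, expand=True)
frame['coin_1'] = legs[0]
frame['coin_2'] = legs[1]
print(frame)
```

Here the most correlated pair (0.9) ranks second, because it was cointegrated in only 20% of the windows.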