# Pairs Crash and Recovery Analysis

Volatility in financial markets creates both risk and opportunity for traders and investors. During periods of heightened volatility, comparing how assets perform relative to one another can inform trading decisions. The Pairs Crash and Recovery Analysis identifies which assets experience the largest crashes and recoveries, revealing which assets are strong or weak during turbulent times.

Key Concepts:
- **Crash**: A significant decline in the price of an asset within a short time period.
- **Recovery**: A significant rebound in the price of an asset following a crash.
- **Relative Performance**: Comparing the behavior of paired assets to determine which is stronger or weaker during periods of market stress.

Here's how it works:

- **Identifying a Pair / Basket of Tokens**: Select a set of tokens that you want to analyze. These can be tokens with similar characteristics or those that are typically traded together.
- **Defining Crash and Recovery Thresholds**: Set predefined percentage thresholds to determine what constitutes a large crash or recovery. For example, a 5% drop might be defined as a crash, and a 5% rise as a recovery.
- **Calculating Percentage Changes**: Calculate the percentage change in closing prices from one period to the next for each token at selected intervals.
- **Identifying Crashes and Recoveries**: Identify periods where the percentage change exceeds the defined thresholds for crashes and recoveries. This can be done using simple conditional logic applied to the percentage changes.
- **Combining Consecutive Periods**: Combine consecutive periods that meet the crash or recovery criteria into single events. This step ensures that sustained trends are identified rather than isolated spikes.
- **Comparing Performance**: Compare the performance of the tokens during identified crash and recovery periods to determine which asset is stronger or weaker. This can involve looking at the magnitude and duration of crashes and recoveries.
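
The steps above can be sketched for a single token as follows. This is a minimal illustration, not the notebook's implementation: the helper name `find_events`, the 5% threshold, and the sample prices are assumptions made for the example.

```python
import pandas as pd


def find_events(close: pd.Series, threshold: float = 0.05) -> pd.DataFrame:
    """Label crash/recovery periods and merge consecutive ones into events.

    Hypothetical helper for illustration only.
    """
    pct = close.pct_change().dropna()

    # +1 marks a recovery period, -1 a crash period, 0 neither.
    label = pd.Series(0, index=pct.index)
    label[pct >= threshold] = 1
    label[pct <= -threshold] = -1

    # A new run starts whenever the label changes, so consecutive
    # periods with the same label share one run id.
    run_id = (label != label.shift()).cumsum()

    events = []
    for _, run in pct.groupby(run_id):
        kind = label.loc[run.index[0]]
        if kind == 0:
            continue
        events.append({
            "type": "crash" if kind == -1 else "recovery",
            "start": run.index[0],
            "end": run.index[-1],
            "periods": len(run),
            # Compound the per-period returns over the whole event.
            "total_return": (1 + run).prod() - 1,
        })
    return pd.DataFrame(events)


# Made-up closing prices, for illustration only.
prices = pd.Series([100.0, 94.0, 88.0, 89.0, 95.0, 96.0])
events = find_events(prices, threshold=0.05)
print(events)
```

Here the two consecutive drops are merged into one crash event spanning two periods, while the single 6.7% rise forms its own recovery event. Running the same helper on every token in a basket and comparing `total_return` and `periods` is one simple way to rank relative strength.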

In this notebook, the crash and recovery analysis is demonstrated using cryptocurrency price data sourced from the Binance, OKX, and Bybit APIs.

## Prepare your Environment

Ensure that the 'venv' kernel is selected for this notebook. If not, click on 'Kernel' at the top bar, select 'Change Kernel...' and select 'venv' as the kernel. For convenience, ensure that 'Always start the preferred kernel' is ticked. Click 'Select' to confirm the setting.

Install the environment's dependencies using the command below. After installation, restart the kernel to use the updated packages: click 'Kernel' in the top bar, select 'Restart Kernel', then click 'Restart'. Skip this step if you have already done it.

In [None]:
pip install -r requirements.txt

## Import packages

In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import os
import sys
from datetime import datetime
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from statsmodels.tsa.stattools import coint
from itertools import combinations
from utils import calculate_profit, plot_strategy
from data_manager import load_ts_df, process_data, sanitize_data
from typing import List
import cvxpy as cvx
import warnings

pd.set_option('display.max_rows', None)
warnings.filterwarnings("ignore")

## Process Price Dataframe

- Before proceeding, ensure that the price data has been downloaded using ***'data_manager.py'***.
- Enter the ***cex*** (Centralized Exchange) and ***interval*** values used for data download to load the relevant *.pkl* files and retrieve the dataframe.
- You can specify a batch of pairs to load using the ***selected_pairs*** variable. The key represents the pair and its value represents the direction it must take (1: Long, -1: Short, 0: Does not matter). Unlike other notebooks, no pairs will be selected if the dictionary is empty.
- Note that some pairs might be new and may lack sufficient data within the downloaded timeframe. Such pairs will be removed based on the ***nan_remove_threshold*** setting, which defines the maximum percentage of NaN values allowed relative to the total data points. For example, with a ***nan_remove_threshold*** of 0.1, if a pair has 100 data points and 15 are NaN, the pair will be excluded.
- From the remaining pairs, you can keep only the top N pairs by rolling or mean volume using the ***top_n_volume_pairs*** and ***volume_filter_mode*** parameters.
- This part of the code will also ensure that all timeseries columns have the same number of data points.
- The earliest and latest dates for all pairs will be recorded. These dates can then be used to determine the timeframe for slicing the data in the next step.
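
The NaN-threshold and volume filters described above can be illustrated with a small sketch. The internals of `process_data` are not shown in this notebook, so the helper names `drop_sparse_pairs` and `top_n_by_mean_volume`, the toy data, and the treatment of the threshold as an inclusive upper bound are all assumptions. Only the 'mean' volume mode is sketched.

```python
import numpy as np
import pandas as pd


def drop_sparse_pairs(close: pd.DataFrame, nan_remove_threshold: float) -> pd.DataFrame:
    """Keep only columns whose share of NaN values is within the threshold.

    Hypothetical helper; assumes the threshold is an inclusive maximum.
    """
    nan_share = close.isna().mean()  # fraction of NaN values per column
    return close.loc[:, nan_share <= nan_remove_threshold]


def top_n_by_mean_volume(volume: pd.DataFrame, n: int) -> list:
    """Return the n column names with the highest mean volume (hypothetical helper)."""
    return volume.mean().nlargest(n).index.tolist()


# Toy data: 100 rows, where "NEWUSDT" is missing its first 15 closes.
close = pd.DataFrame({"BTCUSDT": np.arange(100.0), "NEWUSDT": np.arange(100.0)})
close.loc[:14, "NEWUSDT"] = np.nan

# 15% NaN exceeds the 10% threshold, so "NEWUSDT" is dropped.
kept = drop_sparse_pairs(close, nan_remove_threshold=0.1)
print(kept.columns.tolist())

# Made-up volumes, for illustration only.
volume = pd.DataFrame({
    "BTCUSDT": [10.0, 12.0],
    "SOLUSDT": [5.0, 7.0],
    "TRXUSDT": [1.0, 2.0],
})
top_pairs = top_n_by_mean_volume(volume, n=2)
print(top_pairs)
```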

### Inputs

In [2]:
##### INPUTS #####
cex = 'binance'
interval = '4h'
nan_remove_threshold = 0.0

# Select only the pairs below to analyse. All pairs will be selected if the list is empty.
selected_pairs = []

# Select only the top N pairs by volume (see volume_filter_mode) from the selected pairs to analyse.
top_n_volume_pairs = 100

# Select volume filter mode. Options: ['rolling', 'mean'].
volume_filter_mode = 'rolling'
##################

In [4]:
print("\nMode: Pairs-Trading (Beta Neutral) Strategy")
print("CEX: {}".format(str(cex).capitalize()))
print("Interval: {}".format(interval))
print("NaN Remove Threshold: {}".format(nan_remove_threshold))
print("Selected pairs to analyse: {}".format(selected_pairs))
print("Top N Volume Pairs: {}".format(top_n_volume_pairs))
print("Volume Filter Mode: {}".format(str(volume_filter_mode).capitalize()))

merged_df = process_data('crash_recovery', cex, interval, nan_remove_threshold, selected_pairs,
                 top_n_volume_pairs, volume_filter_mode)

print("\n")


Mode: Pairs-Trading (Beta Neutral) Strategy
CEX: Binance
Interval: 4h
NaN Remove Threshold: 0.0
Selected pairs to analyse: ['ALTUSDT', 'SOLUSDT', 'TRXUSDT', 'BTCUSDT']
Top N Volume Pairs: 100
Volume Filter Mode: Rolling
Train-Test Split: False
Rolling Window: 10
Minimum Long Size (percentage of portfolio): 0.1
Minimum Short Size (percentage of portfolio): 0.1
Rolling Optimisation Unsolvable Threshold (percentage of available data): 0.05
Successfully loaded candlestick dataframe for all available pairs.

Earliest time series start date: 2024-02-18
Latest time series end date: 2024-08-03




## Sanitize the dataframe

- Slice the dataframe according to the specified ***start_date*** and ***end_date***. Choose ***start_date*** and ***end_date*** within the timeframe shown by the output of the previous cell.
- Interpolate any missing values in the dataframe.
- If interpolation fails, backfill the missing values instead.
- Verify that all is as expected with an `assert`, then print the shapes of the first two pairs, which should have the same dimensions.
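
As a rough sketch of the slicing and gap-filling steps: `sanitize_data` itself is not shown here, so the helper name `sanitize_frame`, the choice of linear interpolation, and the toy data are assumptions rather than the actual implementation.

```python
import numpy as np
import pandas as pd


def sanitize_frame(df: pd.DataFrame, start_date: str, end_date: str) -> pd.DataFrame:
    """Slice to [start_date, end_date], then fill missing values.

    Hypothetical helper for illustration only.
    """
    sliced = df.loc[start_date:end_date]
    # Linear interpolation fills gaps that have a valid value before them.
    filled = sliced.interpolate(method="linear")
    # Leading NaNs have no earlier value to interpolate from; backfill them.
    return filled.bfill()


# Made-up 4-hour closes with two gaps, for illustration only.
index = pd.date_range("2024-02-18", periods=8, freq=pd.Timedelta(hours=4))
df = pd.DataFrame(
    {"Close": [np.nan, 10.0, np.nan, 14.0, 15.0, 16.0, 17.0, 18.0]},
    index=index,
)

# Date-only strings select whole days, so this keeps the six candles of 2024-02-18.
clean = sanitize_frame(df, "2024-02-18", "2024-02-18")
assert not clean.isna().any().any()
print(clean)
```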

### Inputs

In [5]:
##### INPUTS #####
start_date = '2024-02-18' # refer to the previous output to set
end_date = '2024-08-03' # refer to the previous output to set
##################

In [6]:
print("\n")

data_sanitized, sorted_available_pairs = sanitize_data(merged_df, start_date, end_date)

if data_sanitized:
    print("-Data Check-")
    keys = list(data_sanitized.keys())
    count = 0

    for key in keys:
        print("{}'s Data Shape: {}".format(key, data_sanitized[key].shape))
        count+=1

        if count == 2:
            break
            
else:
    print("No data found.")

print("\n")



-Data Check-
BTCUSDT's Data Shape: (999, 1)
SOLUSDT's Data Shape: (999, 1)




## Compute Percentage Returns

To compute percentage returns from a time series of closing prices, calculate the percentage change between consecutive candlesticks using the formula below:
$$\text{Percentage Return} = \frac{Price_{t} - Price_{t-1}}{Price_{t-1}} \times 100$$
1. ${Price_{t}}$ is the closing price of candlestick t.
2. ${Price_{t-1}}$ is the closing price of the previous candlestick.

In a pandas DataFrame, this can be computed efficiently using the `pct_change()` method on the price column. Note that `pct_change()` returns the change as a fraction (for example, 0.05 for a 5% move), that is, the formula above without the $\times 100$ factor. The resulting series holds the per-period returns at the selected interval (4h in this notebook), reflecting the period-to-period performance of the asset. Summarizing or compounding these returns over longer periods can provide insights into the overall performance of the asset over time.
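
A small worked example with made-up prices shows the fractional output of `pct_change()`, the conversion to percent, and compounding over several periods:

```python
import pandas as pd

# Made-up closing prices, for illustration only.
close = pd.Series([100.0, 110.0, 99.0])

# pct_change() gives fractional returns; the first row is NaN and is dropped.
returns = close.pct_change().dropna()

# Multiply by 100 to express the returns in percent, as in the formula.
returns_pct = returns * 100

# Compounding: multiply the growth factors, then subtract 1.
cumulative = (1 + returns).prod() - 1

print(returns_pct.round(2).tolist())
print(round(cumulative, 4))
```

A 10% rise followed by a 10% fall compounds to a net loss of 1%, which is why compounding rather than simple addition is used when the overall move matters.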

In [7]:
print("\n")

returns = pd.DataFrame()

for pair, df in data_sanitized.items():
    tmp_return_df = df['Close'].pct_change().dropna().to_frame(name=pair)
    returns = pd.merge(returns, tmp_return_df, how='outer', left_index=True, right_index=True)
returns = returns.sort_index(axis=1)  # order the columns alphabetically by pair

print("-Data Check-")
print("Percentage Returns' Data Shape (should be 1 row lesser than printed above): {}".format(returns.shape))
print("\n")




-Data Check-
Percentage Returns' Data Shape (should be 1 row lesser than printed above): (998, 4)




## Select Pairs for Detailed Analysis

- Please select any pair combination from the output below.

In [10]:
print("\nSelectable pairs:")

for pair in sorted_available_pairs:
    print("- {}".format(pair))

sorted_available_pairs = {pair:selected_pairs[pair] for pair in sorted_available_pairs if pair in selected_pairs}

print("\n")


Selectable pairs:
- ALTUSDT
- BTCUSDT
- SOLUSDT
- TRXUSDT


