# Pairs-Trading (Beta Neutral) Strategy Analysis

The beta neutral trading strategy is a market-neutral approach designed to eliminate systematic market risk by constructing a portfolio that has a net beta of zero. This strategy aims to generate alpha (excess returns above the market) by taking both long and short positions in securities such that the overall portfolio is insulated from broad market movements.

Key Concepts:
- **Beta**: A measure of how sensitive a security's returns are to the returns of the overall market. A beta of 1 indicates that the security tends to move in line with the market, a beta greater than 1 indicates larger moves than the market, and a beta between 0 and 1 indicates smaller moves than the market.
- **Market Neutral**: A strategy that seeks to minimize exposure to market risk, focusing on generating returns from the individual performance of securities rather than market movements.
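
In regression terms, beta and net portfolio beta can be written as below. These are the standard textbook definitions; the symbols $r_i$ (token return), $r_m$ (benchmark return) and $w_i$ (signed portfolio weight, positive for a long and negative for a short) are notation introduced here for illustration.
$$\beta_i = \frac{Cov(r_i, r_m)}{Var(r_m)}, \qquad \beta_{portfolio} = \sum_i w_i \beta_i$$
A portfolio is beta neutral when $\beta_{portfolio} = 0$.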

Here's how it works:

- **Identifying a Pair / Basket of Tokens**: Identify a set of tokens with varying betas.
- **Calculating Beta**: Use historical price data to calculate the beta of each token relative to a benchmark token (e.g., BTC or ETH). This can be done using regression analysis, where the token's returns are regressed against the benchmark's returns and beta is the slope of the fitted line.
- **Constructing the Portfolio**: 
    - **Long Positions**: Select tokens with positive alpha (expected to outperform their beta-implied returns) and go long on these.
    - **Short Positions**: Select tokens with negative alpha (expected to underperform their beta-implied returns) and short these.
    - Ensure the beta-weighted dollar exposure of the long positions equals the beta-weighted dollar exposure of the short positions, resulting in a net portfolio beta of zero.
- **Position Sizing**: Calculate the dollar amount for each position to ensure that the portfolio remains beta neutral. This involves balancing the dollar value of long and short positions based on their respective betas.
- **Monitoring and Rebalancing**: Regularly monitor the portfolio to ensure it remains beta neutral. Adjust positions as necessary to maintain the desired beta neutrality, especially if there are significant changes in the market or individual tokens.
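
The beta calculation and position sizing steps above can be sketched for a single long/short pair as follows. This is a minimal illustration on synthetic returns, not the notebook's own implementation: the function names `estimate_beta` and `beta_neutral_short_notional` are hypothetical, the data is randomly generated, and both betas are assumed to be positive.

```python
import numpy as np


def estimate_beta(asset_returns, benchmark_returns):
    """OLS slope of asset returns on benchmark returns: Cov(r_a, r_b) / Var(r_b)."""
    asset_returns = np.asarray(asset_returns, dtype=float)
    benchmark_returns = np.asarray(benchmark_returns, dtype=float)
    cov = np.cov(asset_returns, benchmark_returns, ddof=1)  # 2x2 covariance matrix
    return cov[0, 1] / cov[1, 1]


def beta_neutral_short_notional(long_notional, beta_long, beta_short):
    """Dollar amount to short so that long_notional * beta_long == short * beta_short."""
    return long_notional * beta_long / beta_short


# Synthetic returns (assumption): two tokens driven by a common benchmark.
rng = np.random.default_rng(0)
benchmark = rng.normal(0.0, 0.01, size=500)
token_a = 1.5 * benchmark + rng.normal(0.0, 0.002, size=500)  # higher-beta token
token_b = 0.8 * benchmark + rng.normal(0.0, 0.002, size=500)  # lower-beta token

beta_a = estimate_beta(token_a, benchmark)
beta_b = estimate_beta(token_b, benchmark)

# Go long token A with 1,000 USD and size the short in token B to cancel the beta.
long_usd = 1_000.0
short_usd = beta_neutral_short_notional(long_usd, beta_a, beta_b)
net_beta_exposure = long_usd * beta_a - short_usd * beta_b

print("beta_a={:.3f}, beta_b={:.3f}".format(beta_a, beta_b))
print("short notional: {:.2f} USD, net beta exposure: {:.6f}".format(short_usd, net_beta_exposure))
```

Because the short leg has the lower beta, it needs a larger dollar amount than the long leg, which is why a beta neutral portfolio is generally not dollar neutral.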

In this notebook, the beta neutral pairs-trading strategy is demonstrated using cryptocurrency price data sourced from the Binance, OKX, and Bybit APIs.

## Prepare your Environment

Ensure that the 'venv' kernel is selected for this notebook. If not, click on 'Kernel' at the top bar, select 'Change Kernel...' and select 'venv' as the kernel. For convenience, ensure that 'Always start the preferred kernel' is ticked. Click 'Select' to confirm the setting.

Install the environment's dependencies using the command below. After installation, restart the kernel to use the updated packages. To restart, click 'Kernel' in the top bar, select 'Restart Kernel', then click 'Restart'. Skip this step if you have already installed the dependencies.

In [2]:
pip install -r requirements.txt

Collecting cvxpy
  Downloading cvxpy-1.5.2-cp39-cp39-macosx_10_9_universal2.whl (1.4 MB)
Collecting scs>=3.2.4.post1
  Downloading scs-3.2.6-cp39-cp39-macosx_11_0_arm64.whl (93 kB)
Collecting ecos>=2
  Downloading ecos-2.0.14.tar.gz (142 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting clarabel>=0.5.0
  Downloading clarabel-0.9.0-cp37-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.7 MB)
Collecting osqp>=0.6.2
  Downloading osqp-0.6.7.post0-cp39-cp39-macosx_11_0_arm64.whl (237 kB)

## Import packages

In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import os
import sys
from datetime import datetime
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from statsmodels.tsa.stattools import coint
from itertools import combinations
from utils import calculate_profit, plot_strategy
from data_manager import load_ts_df, process_data, sanitize_data
from typing import List
import cvxpy as cp

## Process Price Dataframe

- Before proceeding, ensure that the price data has been downloaded using ***'data_manager.py'***.
- Enter the ***cex*** (Centralized Exchange) and ***interval*** values used for data download to load the relevant *.pkl* files and retrieve the dataframe.
- You can specify a batch of pairs to load using the ***selected_pairs*** variable. If no pairs are selected, all available pairs will be loaded by default.
- Note that some pairs might be new and may lack sufficient data within the downloaded timeframe. Such pairs will be removed based on the ***nan_remove_threshold*** setting, which defines the maximum fraction of NaN values allowed relative to the total data points. For example, with a ***nan_remove_threshold*** of 0.1, if a pair has 100 data points and 15 are NaN, its NaN fraction is 0.15, which exceeds the threshold, so the pair will be excluded.
- From the remaining pairs, you can filter the top N volume pairs using the ***top_n_volume_pairs*** parameter.
- This part of the code will also ensure that all timeseries columns have the same number of data points.
- The earliest and latest dates for all pairs will be recorded. These dates can then be used to determine the timeframe for slicing the data in the next step.
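
The NaN threshold and top N volume filters described above can be illustrated with a small sketch. The internals of `process_data` are not shown in this notebook, so the helper names `filter_pairs_by_nan_fraction` and `top_n_by_mean_volume`, the toy pair names, and the choice of mean volume as the ranking metric are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def filter_pairs_by_nan_fraction(close_df, nan_remove_threshold):
    """Keep only the columns whose fraction of NaN values is within the threshold."""
    nan_fraction = close_df.isna().mean()  # per-column share of missing values
    return close_df.loc[:, nan_fraction <= nan_remove_threshold]


def top_n_by_mean_volume(volume_df, n):
    """Return the names of the n columns with the highest mean volume."""
    return volume_df.mean().nlargest(n).index.tolist()


# Toy close prices (assumption): 100 data points per pair.
close = pd.DataFrame({
    "AAAUSDT": [1.0] * 100,                   # no NaN values, kept
    "BBBUSDT": [np.nan] * 15 + [2.0] * 85,    # 15% NaN, exceeds 0.1, dropped
    "CCCUSDT": [np.nan] * 5 + [3.0] * 95,     # 5% NaN, kept
})
kept = filter_pairs_by_nan_fraction(close, nan_remove_threshold=0.1)

# Toy volumes (assumption) for the surviving pairs.
volume = pd.DataFrame({"AAAUSDT": [10.0, 12.0], "CCCUSDT": [50.0, 40.0]})
top_pairs = top_n_by_mean_volume(volume[kept.columns], n=1)

print("Pairs kept after NaN filter:", list(kept.columns))
print("Top pairs by mean volume:", top_pairs)
```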

### Inputs

In [2]:
##### INPUTS #####
cex = 'binance'
interval = '1h'
nan_remove_threshold = 0.1

# Select only the pairs below to analyse. All pairs will be selected if the list is empty.
selected_pairs = []

# Select only the top N mean volume pairs from the selected pairs to analyse.
top_n_volume_pairs = 100

# Select volume filter mode. Options: ['rolling', 'mean'].
volume_filter_mode = 'rolling'
##################

In [3]:
print("\nMode: Pairs-Trading (Beta Neutral) Strategy")
print("CEX: {}".format(str(cex).capitalize()))
print("Interval: {}".format(interval))
print("NaN Remove Threshold: {}".format(nan_remove_threshold))
print("Selected pairs to analyse: {}".format(selected_pairs))
print("Top N Volume Pairs: {}".format(top_n_volume_pairs))
print("Volume Filter Mode: {}".format(str(volume_filter_mode).capitalize()))

merged_df = process_data('beta_neutral', cex, interval, nan_remove_threshold, selected_pairs,
                 top_n_volume_pairs, volume_filter_mode)

print("\n")


Mode: Pairs-Trading (Beta Neutral) Strategy
CEX: Binance
Interval: 1h
NaN Remove Threshold: 0.1
Selected pairs to analyse: []
Top N Volume Pairs: 100
Volume Filter Mode: Rolling

Filtered top 100 mean volume pairs.
Successfully loaded candlestick dataframe for all available pairs.

Earliest time series start date: 2024-06-21
Latest time series end date: 2024-07-21




## Sanitize the dataframe

- Slice the dataframe according to the specified ***start_date*** and ***end_date***. Choose ***start_date*** and ***end_date*** within the timeframe shown by the output of the previous cell.
- Interpolate any missing values in the dataframe.
- If interpolation fails, fall back to backfilling the missing values.
- As a sanity check, print the shapes of the first two pairs, which should have the same dimensions.
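
The slicing and gap-filling steps above might look roughly like the sketch below. It is not the actual `sanitize_data` implementation, which is not shown here: the helper name `slice_and_fill`, the time-based interpolation method, and the toy data are assumptions.

```python
import numpy as np
import pandas as pd


def slice_and_fill(df, start_date, end_date):
    """Slice a datetime-indexed frame, interpolate gaps, then backfill what is left."""
    sliced = df.loc[start_date:end_date]          # label slicing is inclusive on both ends
    filled = sliced.interpolate(method="time")    # fills gaps between valid values
    return filled.bfill()                         # fills leading gaps interpolation cannot reach


# Toy hourly close prices (assumption) with a leading gap and an interior gap.
idx = pd.date_range("2024-06-21", periods=6, freq=pd.Timedelta(hours=1))
close = pd.DataFrame({"Close": [np.nan, 10.0, np.nan, 14.0, 15.0, 16.0]}, index=idx)

clean = slice_and_fill(close, "2024-06-21 00:00", "2024-06-21 04:00")
print(clean)
```

In this toy series the interior gap is interpolated between its neighbours, while the leading gap has no earlier value to interpolate from and is therefore backfilled from the next valid one.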

### Inputs

In [6]:
##### INPUTS #####
start_date = '2024-06-01'
end_date = '2024-07-22'
##################

In [13]:
print("\n")

data_sanitized, sorted_available_pairs = sanitize_data(merged_df, start_date, end_date)

if data_sanitized:
    print("-Data Check-")

    # Print the shapes of the first two pairs; they should match.
    for key in list(data_sanitized)[:2]:
        print("{}'s Data Shape: {}".format(key, data_sanitized[key].shape))

else:
    print("No data found.")

print("\n")



-Data Check-
BTCUSDT's Data Shape: (730, 1)
ETHUSDT's Data Shape: (730, 1)




## Compute Percentage Returns

To compute percentage returns from a time series of close prices, calculate the percentage change between consecutive candlesticks (hourly for the `1h` interval used here) using the formula below:
$$Percentage \space Return = \frac{Price_{t} - Price_{t-1}}{Price_{t-1}} \times 100$$
1. ${Price_{t}}$ is the close price of candlestick $t$.
2. ${Price_{t-1}}$ is the close price of the previous candlestick.

In a pandas DataFrame, this can be computed efficiently with the `pct_change()` method on the price column. Note that `pct_change()` returns the change as a fraction (0.01 for 1%), i.e. the formula above without the factor of 100. The resulting series represents the per-candlestick returns of the asset. Summarizing or compounding these returns over longer periods gives insight into the asset's overall performance over time.
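
As a small worked example of the calculation, the sketch below applies `pct_change()` to three made-up close prices and compounds the result. The prices are illustrative only, and the variable names are not taken from the notebook.

```python
import pandas as pd

# Three consecutive close prices (made-up values for illustration).
close = pd.Series([100.0, 102.0, 99.96], name="Close")

# pct_change() returns fractional changes; the first row is NaN and is dropped.
fractional_returns = close.pct_change().dropna()

# Multiply by 100 to express the same returns as percentages.
percentage_returns = fractional_returns * 100

# Compound the per-candlestick returns into a single return over the whole period.
cumulative_return = (1 + fractional_returns).prod() - 1

print(percentage_returns)
print("Cumulative return: {:.4%}".format(cumulative_return))
```

The series of returns has one row fewer than the price series, because the first price has no predecessor to compare against.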

In [19]:
print("\n")

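# Collect one column of returns per pair, aligned on the shared datetime index.
returns = pd.DataFrame()

for pair, df in data_sanitized.items():
    # pct_change() leaves a NaN in the first row (no previous price), so drop it.
    tmp_return_df = df['Close'].pct_change().dropna().to_frame(name=pair)
    # Outer-join on the index so no timestamps are lost if a pair has gaps.
    returns = pd.merge(returns, tmp_return_df, how='outer', left_index=True, right_index=True)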

print("-Data Check-")
print("Percentage Returns' Data Shape (should be 1 row fewer than printed above): {}".format(returns.shape))
print("\n")



-Data Check-
Percentage Returns' Data Shape (should be 1 row fewer than printed above): (729, 100)


