# Backtesting the Strategy

This section backtests the strategy over the last six months. The results will tell us whether the strategy is worth taking further.

One addition to your previous code is a new column holding the beta between the two assets in each pair.

This beta is what we use as the hedge ratio.
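As a minimal sketch of how such a beta column could be estimated, the snippet below regresses one price series on the other with `numpy.polyfit`. The helper name `estimate_beta` and the synthetic prices are assumptions for illustration only; the betas in the pair list may well have been computed differently.

```python
import numpy as np
import pandas as pd


def estimate_beta(asset1_prices: pd.Series, asset2_prices: pd.Series) -> float:
    """Hypothetical helper: OLS slope of asset2 on asset1 (asset2 ~ alpha + beta * asset1)."""
    # Align the two series on their index and drop dates missing either price
    aligned = pd.concat([asset1_prices, asset2_prices], axis=1).dropna()
    slope, _intercept = np.polyfit(aligned.iloc[:, 0], aligned.iloc[:, 1], deg=1)
    return float(slope)


# Synthetic example (not market data): asset2 is built as 2 * asset1 plus small noise
rng = np.random.default_rng(0)
asset1 = pd.Series(100 + rng.normal(0, 1, 250).cumsum())
asset2 = 2.0 * asset1 + rng.normal(0, 0.5, 250)

beta = estimate_beta(asset1, asset2)
print(round(beta, 2))
```

The slope recovered here should land close to the 2.0 used to build the synthetic series, which is a quick sanity check before running the same helper on real prices.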


I have made a dummy CSV file in my data folder that you can use as a reference. The code is attached at the bottom.

Here is a task list for those who want to write the code themselves. Otherwise, you can use mine as a reference.

## Pairs Trading Strategy with Beta as Hedge Ratio: Task List
### Data Preparation and Research
- **Asset Pairs Selection**: Identify and finalize the list of asset pairs for pairs trading. Record their hedge ratios (beta) based on historical data.
- **Data Collection**: Research and decide on the data source for real-time or historical price data for the selected asset pairs.
### Code Development
- **Fetch Asset Data**: Write code to fetch historical or real-time price data for the selected asset pairs.
- **Calculate Spread**: Implement logic to calculate the spread between each asset pair using the formula $\text{Spread} = \text{Asset2} - \beta \times \text{Asset1}$.
- **Z-Score Calculation**: Calculate the z-score of the spread to standardize it.
- **Signal Generation**: Generate trading signals based on the calculated z-score. Define upper and lower thresholds for entering and exiting trades.
- **Use Beta as Hedge Ratio**: Implement logic to use beta as a hedge ratio while placing trades to make the portfolio market-neutral.
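The spread and z-score steps in the task list can be sketched on synthetic data as follows. The prices, the hedge ratio of 1.5, and the thresholds of plus or minus one standard deviation are all assumed values for illustration; the snippet only flags the days on which the z-score leaves the band and does not place trades.

```python
import numpy as np
import pandas as pd

# Synthetic prices (assumption: illustration only, not real market data)
rng = np.random.default_rng(1)
asset1 = pd.Series(50 + rng.normal(0, 1, 200).cumsum(), name="A")
noise = pd.Series(rng.normal(0, 1, 200))
beta = 1.5  # hypothetical hedge ratio
asset2 = (beta * asset1 + 10 + noise).rename("B")

# Spread = Asset2 - beta * Asset1
spread = asset2 - beta * asset1

# Z-score: distance of the spread from its mean, in standard deviations
z = (spread - spread.mean()) / spread.std(ddof=0)

# Thresholds of +/- 1 standard deviation (assumed values)
upper, lower = 1.0, -1.0
above = z > upper
below = z < lower
print(f"days above upper: {above.sum()}, days below lower: {below.sum()}")
```

Because the z-score is computed over the whole sample, its mean is 0 and its standard deviation is 1 by construction, so the thresholds are simply +1 and -1.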




# Pairs Trading Backtesting Framework

## Overview

This Python script performs a backtest on pairs trading strategies. Given a list of asset pairs and their hedge ratios (beta), the script does the following:

1. **Fetches historical data**: Uses Yahoo Finance to get historical price data for each asset in a pair.
2. **Calculates the spread**: Computes the spread between each pair based on the given beta.
3. **Generates trading signals**: Determines long and short positions based on the z-score of the spread.
4. **Calculates performance metrics**: Computes metrics like CAGR, Sharpe Ratio, Number of Trades, and Drawdown.
5. **Visualizes trading signals**: Plots the spread and trading signals for each pair.
6. **Saves Results**: Outputs the performance metrics to a CSV file and saves the plots.

## Code Sections

### Importing Libraries

The script starts by importing necessary Python libraries such as `pandas`, `numpy`, `datetime`, `yfinance`, `pyfolio`, and `plotly`.

### Function: `backtest(df_trading, asset1, asset2, beta)`

This function performs the core backtesting logic. It takes as input:

- `df_trading`: A DataFrame with historical prices for each asset in the pair
- `asset1`, `asset2`: The names of the assets in the pair
- `beta`: The hedge ratio for the pair

The function returns a tuple: a dictionary of metrics (CAGR, Sharpe ratio, maximum drawdown, and number of trades) and `df_trading`, updated in place with columns for the spread, z-score, and trading signals.
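To make these metrics concrete, here is a rough sketch of how they can be computed directly from a series of daily returns. The helper `summarize_returns`, the 252-day annualization factor, and the sample returns are assumptions for illustration; pyfolio's `perf_stats` may differ in detail.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor


def summarize_returns(returns: pd.Series) -> dict:
    """Hypothetical helper: basic metrics from a series of daily returns."""
    returns = returns.dropna()
    growth = (1 + returns).cumprod()  # value of 1 unit invested at the start
    years = len(returns) / TRADING_DAYS
    cagr = growth.iloc[-1] ** (1 / years) - 1
    std = returns.std()
    # Guard against a zero standard deviation (e.g. a series of all-zero returns)
    sharpe = np.sqrt(TRADING_DAYS) * returns.mean() / std if std > 0 else np.nan
    drawdown = growth / growth.cummax() - 1  # decline from the running peak
    return {"CAGR": cagr, "Sharpe ratio": sharpe, "Max Drawdown": drawdown.min()}


# Made-up daily returns, for illustration only
daily = pd.Series([0.01, -0.02, 0.015, 0.0, 0.005])
metrics = summarize_returns(daily)
print(metrics)
```

In this toy series the largest peak-to-trough decline is the single 2% loss on the second day, so the maximum drawdown is -0.02.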

### Function: `save_plotly_graph(df, asset1, asset2)`

This function takes a DataFrame with spread and trading signals, along with asset names, and saves a Plotly graph visualizing the spread and trading signals.

### Function: `main(csv_path)`

This is the main function that orchestrates the backtesting process. It:

1. Reads a CSV file with asset pairs and their hedge ratios.
2. Iterates through each pair, fetches historical data, and performs backtesting.
3. Saves the performance metrics to a CSV file.
4. Calls `save_plotly_graph()` to save the trading signals graph for each pair.

## How to Run

1. Make sure you have all the required libraries installed.
2. Prepare a CSV file with the asset pairs and their hedge ratios (the notebook below reads `data/final_pairs.csv`). The CSV should have the columns `Asset1`, `Asset2`, and `Beta`.
3. Run the script. The performance metrics will be saved to `data/backtest.csv` and the trading signals graphs will be saved in `img/signals/`.
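A small sketch of what the input file might look like is shown below. The tickers are taken from the pair names in the results at the end of this notebook, but the `Beta` values are placeholders rather than the ones actually used, and the file is written to a temporary directory instead of `data/`.

```python
import os
import tempfile

import pandas as pd

# Tickers come from the pair names in the results table;
# the Beta values are placeholders, not the ones actually used.
pairs = pd.DataFrame(
    {
        "Asset1": ["CMS", "CNP"],
        "Asset2": ["CEG", "AEE"],
        "Beta": [1.0, 1.0],
    }
)

# Written to a temporary directory here; the notebook itself reads data/final_pairs.csv
csv_path = os.path.join(tempfile.mkdtemp(), "final_pairs.csv")
pairs.to_csv(csv_path, index=False)

# Read it back to confirm the layout the script expects
loaded = pd.read_csv(csv_path)
print(list(loaded.columns))
```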

```python
!pip install pyfolio
```

```python
# Importing additional necessary libraries for quant metrics and plotting
from pyfolio.timeseries import perf_stats
import plotly.graph_objs as go
import os
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta
from scipy.stats import zscore
import yfinance as yf
```

```python
# First version of the backtest: a signal is active on every day the z-score is outside the band
def backtest(df_trading, asset1: str, asset2: str, beta: float):
    # Initialize metrics dictionary
    metrics = {}

    # Calculate the spread and its z-score over the whole sample
    df_trading['spread'] = df_trading[asset2] - beta * df_trading[asset1]
    df_trading['zscore'] = zscore(df_trading['spread'])

    # Thresholds: one standard deviation either side of the mean z-score
    UL = df_trading['zscore'].mean() + df_trading['zscore'].std()
    LL = df_trading['zscore'].mean() - df_trading['zscore'].std()

    # Trading signals scaled by beta. Sign convention as written in this notebook:
    # z-score above UL -> short beta units of asset1 and long 1 unit of asset2; the reverse below LL
    df_trading['asset1_signal'] = np.where(df_trading['zscore'] > UL, -beta, 0)
    df_trading['asset1_signal'] = np.where(df_trading['zscore'] < LL, beta, df_trading['asset1_signal'])

    df_trading['asset2_signal'] = np.where(df_trading['zscore'] > UL, 1, 0)
    df_trading['asset2_signal'] = np.where(df_trading['zscore'] < LL, -1, df_trading['asset2_signal'])

    # Daily returns of each leg, weighted by that day's signal
    df_trading['asset1_returns'] = df_trading[asset1].pct_change() * df_trading['asset1_signal']
    df_trading['asset2_returns'] = df_trading[asset2].pct_change() * df_trading['asset2_signal']

    df_trading['portfolio_returns'] = df_trading['asset1_returns'] + df_trading['asset2_returns']

    # Quantitative metrics
    stats = perf_stats(df_trading['portfolio_returns'].dropna())
    metrics['CAGR'] = stats['Annual return']
    metrics['Sharpe ratio'] = stats['Sharpe ratio']
    metrics['Max Drawdown'] = stats['Max drawdown']
    metrics['Number of Trades'] = df_trading['asset1_signal'].ne(0).sum()  # Counts days with a non-zero signal

    return metrics, df_trading
```

```python
# Second version: open one position at a time and close it when the z-score re-enters the band
def backtest(df_trading, asset1: str, asset2: str, beta: float):
    metrics = {}

    df_trading['spread'] = df_trading[asset2] - beta * df_trading[asset1]
    df_trading['zscore'] = zscore(df_trading['spread'])

    UL = df_trading['zscore'].mean() + df_trading['zscore'].std()
    LL = df_trading['zscore'].mean() - df_trading['zscore'].std()

    holding_position = False  # Flag to indicate if we are holding a position
    df_trading['asset1_signal'] = 0.0  # float, because beta is a float
    df_trading['asset2_signal'] = 0.0

    # Column positions, so each value is set with a single .iloc call.
    # Chained indexing (df[col].iloc[i] = ...) triggers SettingWithCopyWarning.
    a1 = df_trading.columns.get_loc('asset1_signal')
    a2 = df_trading.columns.get_loc('asset2_signal')

    for i in range(1, len(df_trading)):
        z = df_trading['zscore'].iloc[i]
        if not holding_position:
            if z > UL:
                df_trading.iloc[i, a1] = -beta
                df_trading.iloc[i, a2] = 1
                holding_position = True  # Now holding a position

            elif z < LL:
                df_trading.iloc[i, a1] = beta
                df_trading.iloc[i, a2] = -1
                holding_position = True  # Now holding a position

        elif LL <= z <= UL:
            # Closing the trade: reverse the previous row's signal
            df_trading.iloc[i, a1] = -df_trading.iloc[i - 1, a1]
            df_trading.iloc[i, a2] = -df_trading.iloc[i - 1, a2]
            holding_position = False  # No longer holding a position

    # Daily returns of each leg, weighted by that day's signal
    df_trading['asset1_returns'] = df_trading[asset1].pct_change() * df_trading['asset1_signal']
    df_trading['asset2_returns'] = df_trading[asset2].pct_change() * df_trading['asset2_signal']

    df_trading['portfolio_returns'] = df_trading['asset1_returns'] + df_trading['asset2_returns']

    # Quantitative metrics
    stats = perf_stats(df_trading['portfolio_returns'].dropna())
    metrics['CAGR'] = stats['Annual return']
    metrics['Sharpe ratio'] = stats['Sharpe ratio']
    metrics['Max Drawdown'] = stats['Max drawdown']
    metrics['Number of Trades'] = df_trading['asset1_signal'].ne(0).sum()  # Counting non-zero entries

    return metrics, df_trading
```

```python
# Function to save the trading signals graph
def save_plotly_graph(df, asset1, asset2):
    # Create the output directory if it doesn't exist
    os.makedirs('img/signals', exist_ok=True)

    # Create the figure
    fig = go.Figure()

    # Add z-score trace
    fig.add_trace(go.Scatter(x=df.index, y=df['zscore'], mode='lines', name='Z-Score'))

    # Add upper and lower limits as dashed lines
    UL = df['zscore'].mean() + df['zscore'].std()
    LL = df['zscore'].mean() - df['zscore'].std()

    fig.add_trace(go.Scatter(x=df.index, y=[UL]*len(df.index), mode='lines', name='Upper Limit', line=dict(dash='dash')))
    fig.add_trace(go.Scatter(x=df.index, y=[LL]*len(df.index), mode='lines', name='Lower Limit', line=dict(dash='dash')))

    # Add buy and sell signals (based on the sign of the asset1 signal)
    fig.add_trace(go.Scatter(x=df.index, y=df['zscore'].where(df['asset1_signal'] > 0),
                             mode='markers', name='Buy Signal', marker=dict(color='green', symbol='triangle-up')))

    fig.add_trace(go.Scatter(x=df.index, y=df['zscore'].where(df['asset1_signal'] < 0),
                             mode='markers', name='Sell Signal', marker=dict(color='red', symbol='triangle-down')))

    # Layout options
    fig.update_layout(title=f'Trading Signals for {asset1} and {asset2}',
                      xaxis_title='Date',
                      yaxis_title='Z-Score')

    # Save the figure
    fig.write_html(f'img/signals/{asset1}_{asset2}.html')
```

```python
# Main function
def main(csv_path: str):
    # Read the CSV file of asset pairs and hedge ratios
    asset_pairs = pd.read_csv(csv_path)

    # Collect one row of metrics per pair (DataFrame.append was removed in pandas 2.0)
    rows = []

    # Date range for backtesting (last 6 months)
    end = datetime.now().date()
    start = (datetime.now() - relativedelta(months=6)).date()

    for index, row in asset_pairs.iterrows():
        print("Running Backtest for Asset Pair")
        asset1 = row['Asset1']
        asset2 = row['Asset2']
        beta = row['Beta']
        print("Downloading Data")
        # Download both tickers in one call. auto_adjust=False keeps the 'Adj Close'
        # column, which recent yfinance releases otherwise leave out by default.
        prices = yf.download([asset1, asset2], start=start, end=end,
                             auto_adjust=False, progress=False)['Adj Close']

        # DataFrame for backtesting, restricted to dates where both assets have a price
        df_trading = prices[[asset1, asset2]].dropna().copy()
        print("Running Backtest metrics")
        # Backtest and get metrics and signals
        metrics, signals_df = backtest(df_trading, asset1, asset2, beta)

        # Store this pair's metrics
        rows.append({
            'Pair Name': f'{asset1}_{asset2}',
            'CAGR': metrics['CAGR'],
            'Sharpe Ratio': metrics['Sharpe ratio'],
            'Number of Trades': metrics['Number of Trades'],
            'Max Drawdown': metrics['Max Drawdown']
        })

        # Save the Plotly graph
        save_plotly_graph(signals_df, asset1, asset2)

    # Build and save the metrics DataFrame
    os.makedirs('data', exist_ok=True)
    metrics_df = pd.DataFrame(rows, columns=['Pair Name', 'CAGR', 'Sharpe Ratio', 'Number of Trades', 'Max Drawdown'])
    metrics_df.to_csv('data/backtest.csv', index=False)
```

```python
main("data/final_pairs.csv")
```

```text
Running Backtest for Asset Pair
Downloading Data
Running Backtest metrics
Running Backtest for Asset Pair
Downloading Data
divide by zero encountered in divide
divide by zero encountered in scalar divide
Running Backtest metrics
invalid value encountered in scalar divide
```

```python
backtest_df = pd.read_csv("data/backtest.csv")
backtest_df
```

|  | Pair Name | CAGR | Sharpe Ratio | Number of Trades | Max Drawdown |
|---|---|---|---|---|---|
| 0 | CMS_CEG | 0.243454 | 3.566101 | 7 | 0.0 |
| 1 | CNP_AEE | 0.069908 | 2.85109 | 5 | -5.02758e-07 |
