# Pairs Trading Intuition

## Prompting OpenAI's ChatGPT

> <span style="color:green;font-weight:bold;font-size:125%">"Using Python, please write a pairs trading algorithm"</span>

Pairs trading is a market-neutral strategy that takes a long position in one asset and a short position in another, closely related asset, with the expectation that their price spread will revert to its mean. Here's a simple implementation of a pairs trading algorithm in Python, using the `statsmodels` package for statistical calculations and `yfinance` for historical price data.

### Steps:
1. **Download Historical Data**: Use `yfinance` to retrieve historical stock prices for two assets.
2. **Check Cointegration**: Ensure that the two assets are cointegrated, meaning their price spread tends to revert to the mean.
3. **Generate Trading Signals**: When the price spread diverges beyond a certain threshold, enter a trade.
4. **Backtesting**: Simulate trades and calculate returns.
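
In symbols, steps 2 and 3 reduce to a spread and its rolling z-score. The notation below is introduced here for illustration: $P^{(1)}_t$ and $P^{(2)}_t$ are the two prices, $\beta$ is the hedge ratio, and $\mu_t$, $\sigma_t$ are the mean and standard deviation of the spread over the most recent $w$ observations.

$$
s_t = P^{(1)}_t - \beta\, P^{(2)}_t, \qquad z_t = \frac{s_t - \mu_t}{\sigma_t}
$$

With an entry threshold $k$, the rule is to go long the spread when $z_t < -k$ and short it when $z_t > k$.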

### Code:
```python
import numpy as np
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import coint

# Step 1: Download historical data for the two assets
def download_data(ticker1, ticker2, start_date, end_date):
    # auto_adjust=True puts split- and dividend-adjusted prices in 'Close';
    # recent yfinance releases no longer return an 'Adj Close' column by default
    prices = yf.download([ticker1, ticker2], start=start_date, end=end_date,
                         auto_adjust=True)['Close']
    return prices[[ticker1, ticker2]].dropna()

# Step 2: Check for cointegration
def check_cointegration(stock1, stock2):
    score, p_value, _ = coint(stock1, stock2)
    return p_value

# Step 3: Calculate the hedge ratio using linear regression (no intercept)
def hedge_ratio(stock1, stock2):
    model = sm.OLS(stock1, stock2).fit()
    # params is a pandas Series here, so select the slope by position with .iloc
    return model.params.iloc[0]

# Step 4: Generate trading signals based on z-score of the spread
def generate_signals(stock1, stock2, hedge_ratio, window=30, z_threshold=2):
    spread = stock1 - hedge_ratio * stock2
    z_score = (spread - spread.rolling(window).mean()) / spread.rolling(window).std()
    
    # Buy signals (long stock1, short stock2) when z-score is below -threshold
    buy_signal = z_score < -z_threshold
    # Sell signals (short stock1, long stock2) when z-score is above threshold
    sell_signal = z_score > z_threshold
    
    return buy_signal, sell_signal, spread

# Step 5: Backtest the strategy
def backtest(stock1, stock2, buy_signal, sell_signal, hedge_ratio):
    # Trade one unit of stock1 against hedge_ratio units of stock2, in opposite directions
    positions1 = np.where(buy_signal, 1, np.where(sell_signal, -1, 0))
    positions2 = -positions1 * hedge_ratio

    # Calculate daily returns
    daily_returns1 = stock1.pct_change().fillna(0)
    daily_returns2 = stock2.pct_change().fillna(0)
    
    # Portfolio returns
    portfolio_returns = positions1[:-1] * daily_returns1[1:] + positions2[:-1] * daily_returns2[1:]
    
    # Cumulative returns
    cumulative_returns = (1 + portfolio_returns).cumprod()
    
    return cumulative_returns

# Example usage:
if __name__ == "__main__":
    ticker1 = 'JPM'  # Example stock
    ticker2 = 'MS'   # Example stock
    start_date = '2020-01-01'
    end_date = '2023-01-01'
    
    # Step 1: Download data
    data = download_data(ticker1, ticker2, start_date, end_date)
    
    # Step 2: Check for cointegration
    p_value = check_cointegration(data[ticker1], data[ticker2])
    print(f"Cointegration p-value: {p_value}")
    
    if p_value < 0.05:  # Typically, we consider p-value < 0.05 as significant
        # Step 3: Calculate hedge ratio
        hedge = hedge_ratio(data[ticker1], data[ticker2])
        print(f"Hedge ratio: {hedge}")
        
        # Step 4: Generate trading signals
        buy_signal, sell_signal, spread = generate_signals(data[ticker1], data[ticker2], hedge)
        
        # Step 5: Backtest
        cumulative_returns = backtest(data[ticker1], data[ticker2], buy_signal, sell_signal, hedge)
        
        # Plot cumulative returns
        plt.plot(cumulative_returns)
        plt.title('Cumulative Returns from Pairs Trading')
        plt.show()
    else:
        print("The pairs are not cointegrated, strategy not applicable.")
```

### Explanation:

1. **Download Data**: Historical stock prices for two assets are downloaded using `yfinance`.
2. **Cointegration Test**: The Engle-Granger cointegration test is applied to the stock prices to check whether a linear combination of the two price series is stationary, that is, whether the stocks share a long-run equilibrium.
3. **Hedge Ratio**: A linear regression is used to find the hedge ratio between the two stocks.
4. **Trading Signals**: We calculate the z-score of the spread between the two assets and generate buy and sell signals when the z-score deviates beyond a threshold.
5. **Backtesting**: Simulate trades based on signals and calculate cumulative returns.

This code trades one unit of the first stock against `hedge_ratio` units of the second (not equal dollar amounts in each leg) and uses a simple rule: trade whenever the spread diverges significantly from its rolling mean.
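
The signals in the code above are active only on days when the z-score lies beyond the threshold, so a position closes as soon as the spread moves back inside the band. A common variant holds the trade until the spread has reverted most of the way. The sketch below shows one way to do that; the function name `hold_until_exit` and the `exit_level` of 0.5 are illustrative assumptions, not part of the original response.

```python
import math

def hold_until_exit(z_scores, entry=2.0, exit_level=0.5):
    """Turn z-scores into held positions: +1 long spread, -1 short spread, 0 flat."""
    positions = []
    current = 0
    for z in z_scores:
        if math.isnan(z):
            current = 0              # no signal during the rolling warm-up
        elif current == 0:
            if z < -entry:
                current = 1          # spread unusually low: go long the spread
            elif z > entry:
                current = -1         # spread unusually high: go short the spread
        elif abs(z) < exit_level:
            current = 0              # spread has reverted: close the trade
        positions.append(current)
    return positions

z = [float("nan"), 0.3, -2.4, -1.1, -0.2, 0.1, 2.6, 1.4, 0.4]
print(hold_until_exit(z))
```

The long trade opened at `-2.4` stays open through `-1.1` and closes at `-0.2`, where the threshold-only rule would have been flat after a single day.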

## Prompting Anthropic's Claude

> <span style="color:green;font-weight:bold;font-size:125%">"Using Python, please write a pairs trading algorithm"</span>

I'll help you create a pairs trading algorithm that identifies correlated pairs of stocks and generates trading signals based on their spread divergence.



```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.stattools import adfuller

class PairsTrading:
    def __init__(self, lookback_period=30, entry_threshold=2, exit_threshold=0):
        """
        Initialize pairs trading strategy parameters
        
        Parameters:
        lookback_period (int): Period for calculating rolling statistics
        entry_threshold (float): Number of standard deviations for trade entry
        exit_threshold (float): Number of standard deviations for trade exit
        """
        self.lookback_period = lookback_period
        self.entry_threshold = entry_threshold
        self.exit_threshold = exit_threshold
        
    def find_cointegrated_pairs(self, price_data, significance_level=0.05):
        """
        Find cointegrated pairs using Engle-Granger test
        
        Parameters:
        price_data (pd.DataFrame): DataFrame with stock prices as columns
        significance_level (float): P-value threshold for cointegration test
        
        Returns:
        list: Pairs of cointegrated stocks
        """
        n = price_data.shape[1]
        cointegrated_pairs = []
        
        for i in range(n):
            for j in range(i+1, n):
                stock1 = price_data.iloc[:, i]
                stock2 = price_data.iloc[:, j]
                
                # Engle-Granger step 1: regress stock2 on stock1
                result = stats.linregress(stock1, stock2)
                residuals = stock2 - (result.slope * stock1 + result.intercept)
                
                # Engle-Granger step 2: test the residuals for stationarity.
                # adfuller lives in statsmodels (scipy.stats has no adfuller).
                # Its p-value is only approximate for estimated residuals;
                # statsmodels.tsa.stattools.coint applies the adjusted critical values.
                adf_result = adfuller(residuals)
                
                if adf_result[1] < significance_level:
                    cointegrated_pairs.append((
                        price_data.columns[i],
                        price_data.columns[j],
                        result.slope,
                        result.intercept
                    ))
        
        return cointegrated_pairs
    
    def calculate_spread(self, stock1_prices, stock2_prices, beta, intercept):
        """
        Calculate the spread between two stocks
        """
        return stock2_prices - (beta * stock1_prices + intercept)
    
    def generate_signals(self, spread):
        """
        Generate trading signals based on spread z-score
        
        Returns:
        pd.DataFrame: DataFrame with position signals
        """
        # Calculate z-score
        rolling_mean = spread.rolling(window=self.lookback_period).mean()
        rolling_std = spread.rolling(window=self.lookback_period).std()
        z_score = (spread - rolling_mean) / rolling_std
        
        # Initialize signals DataFrame
        signals = pd.DataFrame(index=spread.index)
        signals['z_score'] = z_score
        signals['position'] = 0
        
        # Generate entry/exit signals
        signals.loc[z_score > self.entry_threshold, 'position'] = -1  # Short the spread
        signals.loc[z_score < -self.entry_threshold, 'position'] = 1  # Long the spread
        signals.loc[abs(z_score) < self.exit_threshold, 'position'] = 0  # Exit position
        
        return signals
    
    def calculate_returns(self, stock1_prices, stock2_prices, signals):
        """
        Calculate strategy returns
        """
        # Calculate daily returns
        stock1_returns = stock1_prices.pct_change()
        stock2_returns = stock2_prices.pct_change()
        
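        # Strategy returns: position +1 is long the spread, i.e. long stock2
        # and short stock1 (the spread is stock2 - beta * stock1)
        strategy_returns = signals['position'].shift(1) * (
            0.5 * stock2_returns - 0.5 * stock1_returns
        )
        
        return strategy_returns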
    
    def backtest(self, price_data):
        """
        Backtest the pairs trading strategy
        
        Parameters:
        price_data (pd.DataFrame): DataFrame with stock prices
        
        Returns:
        dict: Performance metrics
        """
        pairs = self.find_cointegrated_pairs(price_data)
        results = []
        
        for pair in pairs:
            stock1, stock2, beta, intercept = pair
            spread = self.calculate_spread(
                price_data[stock1],
                price_data[stock2],
                beta,
                intercept
            )
            
            signals = self.generate_signals(spread)
            returns = self.calculate_returns(
                price_data[stock1],
                price_data[stock2],
                signals
            )
            
            # Calculate performance metrics
            sharpe_ratio = np.sqrt(252) * returns.mean() / returns.std()
            max_drawdown = (returns.cumsum() - returns.cumsum().cummax()).min()
            
            results.append({
                'pair': (stock1, stock2),
                'sharpe_ratio': sharpe_ratio,
                'max_drawdown': max_drawdown,
                'total_return': returns.sum(),
                'signals': signals,
                'returns': returns
            })
            
        return results

# Example usage
if __name__ == "__main__":
    # Sample data (replace with real stock price data)
    dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='D')
    sample_data = pd.DataFrame({
        'Stock1': np.random.random(len(dates)) + 100,
        'Stock2': np.random.random(len(dates)) + 102,
    }, index=dates)
    
    # Initialize and run strategy
    strategy = PairsTrading(
        lookback_period=30,
        entry_threshold=2,
        exit_threshold=0.5
    )
    
    results = strategy.backtest(sample_data)
    
    # Print results for first pair
    if results:
        print(f"Pair: {results[0]['pair']}")
        print(f"Sharpe Ratio: {results[0]['sharpe_ratio']:.2f}")
        print(f"Max Drawdown: {results[0]['max_drawdown']:.2%}")
        print(f"Total Return: {results[0]['total_return']:.2%}")

```

I've created a comprehensive pairs trading algorithm that includes:

1. Pair selection using cointegration testing
2. Spread calculation and z-score based signal generation
3. Fixed equal-weight position sizing (50/50 across the two legs)
4. Performance calculation and backtesting functionality

Key features:
- Finds statistically significant pairs using the Engle-Granger cointegration test
- Generates trading signals based on spread divergence
- Includes customizable parameters for lookback period and entry/exit thresholds
- Calculates key performance metrics like Sharpe ratio and maximum drawdown
- Uses fixed 50/50 weights for the two legs; it has no stop-loss or other explicit risk controls
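
For reference, the two performance metrics named above can be computed as follows. This is a minimal sketch under stated assumptions: a zero risk-free rate, 252 trading days per year, and drawdown measured on a compounded equity curve (the class above uses a cumulative sum of returns instead). The function names are illustrative.

```python
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(daily_returns, dtype=float)
    # ddof=1 matches the sample standard deviation that pandas .std() uses
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the compounded equity curve (a negative number)."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    running_peak = np.maximum.accumulate(equity)
    return (equity / running_peak - 1.0).min()

returns = [0.01, -0.02, 0.015, -0.005, 0.02]
print(f"Sharpe: {sharpe_ratio(returns):.2f}")
print(f"Max drawdown: {max_drawdown(returns):.2%}")
```

In this toy series the worst drawdown is the single 2% loss on the second day, because the equity curve recovers above its earlier peak only at the end.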

To use this strategy with real data, you would need to:
1. Import historical price data for your target securities
2. Adjust the parameters (lookback_period, entry_threshold, exit_threshold) based on your risk tolerance
3. Add transaction costs and other real-world constraints
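
As a rough illustration of step 3, a proportional cost can be charged each time the position changes. The sketch below is an assumption-laden simplification: `cost_bps` is a hypothetical round number, the strategy is assumed to start flat, and `positions` is assumed to be aligned with the returns it earns.

```python
import numpy as np

def net_returns(gross_returns, positions, cost_bps=5.0):
    """Subtract cost_bps basis points per unit of position change from gross returns."""
    gross = np.asarray(gross_returns, dtype=float)
    pos = np.asarray(positions, dtype=float)
    # Turnover is the absolute change in position; prepend=0 means we start flat
    turnover = np.abs(np.diff(pos, prepend=0.0))
    return gross - turnover * cost_bps / 1e4

positions = [0, 1, 1, 0, -1, -1, 0]
gross = [0.0, 0.0, 0.004, 0.003, 0.0, 0.005, 0.002]
net = net_returns(gross, positions)
print(f"Gross: {sum(gross):.4f}, net: {net.sum():.4f}")
```

Here the position changes four times, so four charges of 5 basis points are deducted in total.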

Would you like me to:
1. Add more sophisticated pair selection criteria?
2. Implement additional risk management features?
3. Add visualization functions for the spread and signals?
4. Include position sizing based on volatility?

> <span style="color:green;font-weight:bold;font-size:125%">"Can you please explain z-score and hedge-ratio in the context of pair trading?"</span>

I'll explain both concepts in the context of pairs trading.

Z-Score:
The z-score in pairs trading measures how many standard deviations away the current spread is from its historical mean. It helps identify trading opportunities by highlighting when the relationship between two securities has deviated significantly from its normal pattern.

Here's a simple example to illustrate:

```python
# Example of z-score calculation for pairs trading
import pandas as pd
import numpy as np

# Sample price data
stock_a = [100, 101, 102, 103, 105, 106, 104, 103, 102, 101]
stock_b = [50, 50.8, 51.2, 51.8, 53.1, 53.5, 52.3, 51.7, 51.2, 50.6]

# Calculate spread
spread = pd.Series(stock_b) - pd.Series(stock_a) * 0.5  # Using 0.5 as hedge ratio

# Calculate z-score
rolling_mean = spread.rolling(window=5).mean()
rolling_std = spread.rolling(window=5).std()
z_score = (spread - rolling_mean) / rolling_std

print("Spread:", spread)
print("\nZ-Score:", z_score)
```

Interpretation:
- Z-score = 0: Spread is at its mean
- Z-score = +2: Spread is 2 standard deviations above mean (potential short opportunity)
- Z-score = -2: Spread is 2 standard deviations below mean (potential long opportunity)
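
To make the arithmetic concrete, the first full five-day window of the sample data can be worked by hand. This sketch reuses the same prices and the same 0.5 hedge ratio, and uses only the standard library so each step is visible.

```python
from statistics import mean, stdev

stock_a = [100, 101, 102, 103, 105, 106, 104, 103, 102, 101]
stock_b = [50, 50.8, 51.2, 51.8, 53.1, 53.5, 52.3, 51.7, 51.2, 50.6]

# Spread with the same 0.5 hedge ratio, rounded to remove float noise
spread = [round(b - 0.5 * a, 2) for a, b in zip(stock_a, stock_b)]

window = spread[:5]                 # [0.0, 0.3, 0.2, 0.3, 0.6]
mu = mean(window)                   # 0.28
sigma = stdev(window)               # sample standard deviation (ddof=1), as in pandas
z = (window[-1] - mu) / sigma       # roughly 1.48

print(f"mean={mu:.2f}, std={sigma:.3f}, z={z:.2f}")
```

A z-score of about 1.48 is inside the ±2 band, so this window would not trigger a trade.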

Hedge Ratio:
The hedge ratio determines the proportions in which to trade the two securities to create a market-neutral position. It represents how many units of one security to trade against the other.

There are several methods to calculate the hedge ratio:

1. Price Ratio Method (simplest):
```python
# Simple price ratio method
def price_ratio_hedge(stock_a_price, stock_b_price):
    return stock_b_price / stock_a_price
```

2. Linear Regression Method (more sophisticated):
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def calculate_hedge_ratio(stock_a_prices, stock_b_prices):
    # Reshape data for sklearn
    X = stock_a_prices.values.reshape(-1, 1)
    y = stock_b_prices.values
    
    # Fit linear regression
    model = LinearRegression()
    model.fit(X, y)
    
    # Slope coefficient is the hedge ratio
    hedge_ratio = model.coef_[0]
    return hedge_ratio

# Example usage
stock_a = np.array([100, 101, 102, 103, 105, 106, 104, 103, 102, 101])
stock_b = np.array([50, 50.8, 51.2, 51.8, 53.1, 53.5, 52.3, 51.7, 51.2, 50.6])

hedge_ratio = calculate_hedge_ratio(pd.Series(stock_a), pd.Series(stock_b))
print(f"\nHedge Ratio: {hedge_ratio:.3f}")
```
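
The regression above fits an intercept, whereas a regression through the origin (such as `sm.OLS` without an added constant) does not, and the two can give noticeably different hedge ratios. The following sketch compares them on the same sample data using plain NumPy; the variable names are illustrative.

```python
import numpy as np

stock_a = np.array([100, 101, 102, 103, 105, 106, 104, 103, 102, 101], dtype=float)
stock_b = np.array([50, 50.8, 51.2, 51.8, 53.1, 53.5, 52.3, 51.7, 51.2, 50.6])

# Regression through the origin: stock_b ≈ beta * stock_a
beta_no_intercept = (stock_a @ stock_b) / (stock_a @ stock_a)

# Regression with an intercept: stock_b ≈ alpha + beta * stock_a
X = np.column_stack([np.ones_like(stock_a), stock_a])
alpha, beta_with_intercept = np.linalg.lstsq(X, stock_b, rcond=None)[0]

print(f"No intercept:   beta = {beta_no_intercept:.3f}")
print(f"With intercept: beta = {beta_with_intercept:.3f}, alpha = {alpha:.3f}")
```

Without an intercept the slope is pulled toward the average price ratio; with one, it reflects how the two prices co-move around their means. Whichever is chosen, the spread should be computed with the same model that produced the hedge ratio.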

Practical Example combining both concepts:
```python
def pairs_trading_analysis(stock_a_prices, stock_b_prices, window=20):
    # Calculate hedge ratio using regression
    hedge_ratio = calculate_hedge_ratio(stock_a_prices, stock_b_prices)
    
    # Calculate spread
    spread = stock_b_prices - (stock_a_prices * hedge_ratio)
    
    # Calculate z-score
    z_score = (spread - spread.rolling(window=window).mean()) / spread.rolling(window=window).std()
    
    # Generate trading signals
    signals = pd.Series(index=z_score.index, data=0)
    signals[z_score > 2] = -1  # Short the spread
    signals[z_score < -2] = 1  # Long the spread
    
    return {
        'hedge_ratio': hedge_ratio,
        'spread': spread,
        'z_score': z_score,
        'signals': signals
    }

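# Example positions based on signals. The spread is stock_b - hedge_ratio * stock_a,
# so going long the spread means buying stock_b and selling hedge_ratio units of stock_a.
def calculate_positions(signals, hedge_ratio):
    stock_b_pos = signals * 1  # Base position size
    stock_a_pos = -signals * hedge_ratio  # Hedged position
    return stock_a_pos, stock_b_pos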
```

Key Points:
1. Z-score helps determine:
   - Entry points (typically ±2 standard deviations)
   - Exit points (typically when z-score returns to mean)
   - Position sizing (can be scaled based on z-score magnitude)

2. Hedge ratio importance:
   - Market neutrality (reduces directional market risk)
   - Proper position sizing between pairs
   - Risk management (ensures balanced exposure)

3. Common pitfalls to watch for:
   - Mean reversion assumption may not hold
   - Hedge ratio can change over time
   - Extreme z-scores might indicate fundamental changes rather than trading opportunities
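
Because the hedge ratio can drift, one simple option is to re-estimate it on a rolling window. The sketch below computes the rolling OLS slope as covariance over variance; the function name `rolling_hedge_ratio`, the 20-day window, and the toy data with an abrupt change in slope are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def rolling_hedge_ratio(x, y, window=60):
    """Rolling OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    return x.rolling(window).cov(y) / x.rolling(window).var()

# Toy data: the relationship changes from slope 2 to slope 3 halfway through
t = np.arange(100)
x = pd.Series(t + np.sin(t), dtype=float)
y = pd.Series(np.where(t < 50, 2.0 * x, 3.0 * x - 50.0))

ratio = rolling_hedge_ratio(x, y, window=20)
print(ratio.iloc[[30, 99]])
```

Windows that lie entirely in one regime recover that regime's slope, while windows straddling the break give a blend of the two.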

Would you like me to elaborate on any of these concepts or show how they work with real market examples?

> <span style="color:green;font-weight:bold;font-size:125%">"Yes, this would be nice"</span>

I'll create a practical example using synthetic data loosely modeled on a real stock pair to demonstrate how z-score and hedge ratio work together in pairs trading.



```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

class PairsAnalysis:
    def __init__(self):
        # Generate synthetic data that mimics real stock behavior
        # Using characteristics similar to KO and PEP (Coca-Cola and PepsiCo)
        self.dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
        np.random.seed(42)
        
        # Shared daily return shock so that the two paths are correlated. The 0.8
        # loading is illustrative and is not calibrated to real KO/PEP data.
        common_factor = np.random.normal(0, 0.01, len(self.dates))
        
        # Price paths: idiosyncratic random walk plus the shared shock
        self.stock1_prices = self.generate_stock_prices(60, 0.001, 0.015, 0.8 * common_factor)  # KO-like
        self.stock2_prices = self.generate_stock_prices(65, 0.001, 0.015, 0.8 * common_factor)  # PEP-like
        
        # Convert to pandas Series
        self.stock1_prices = pd.Series(self.stock1_prices, index=self.dates)
        self.stock2_prices = pd.Series(self.stock2_prices, index=self.dates)
        
    def generate_stock_prices(self, start_price, drift, volatility, common_returns=0.0):
        """Generate synthetic prices as a geometric random walk with an optional shared return component"""
        returns = np.random.normal(drift, volatility, len(self.dates)) + common_returns
        price_path = start_price * np.exp(np.cumsum(returns))
        return price_path
        
    def calculate_hedge_ratio(self):
        """Calculate hedge ratio using linear regression"""
        X = self.stock1_prices.values.reshape(-1, 1)
        y = self.stock2_prices.values
        
        model = LinearRegression()
        model.fit(X, y)
        
        hedge_ratio = model.coef_[0]
        r_squared = model.score(X, y)
        
        return hedge_ratio, r_squared
    
    def calculate_spread(self, hedge_ratio):
        """Calculate the spread between the pairs"""
        return self.stock2_prices - (self.stock1_prices * hedge_ratio)
    
    def calculate_zscore(self, spread, window=20):
        """Calculate rolling z-score of the spread"""
        mean = spread.rolling(window=window).mean()
        std = spread.rolling(window=window).std()
        return (spread - mean) / std
    
    def generate_signals(self, zscore, entry_threshold=2, exit_threshold=0):
        """Generate trading signals based on z-score"""
        signals = pd.Series(index=zscore.index, data=0)
        
        # Entry signals
        signals[zscore > entry_threshold] = -1  # Short the spread
        signals[zscore < -entry_threshold] = 1  # Long the spread
        
        # Exit signals
        signals[abs(zscore) < exit_threshold] = 0
        
        return signals
    
    def calculate_pnl(self, signals, hedge_ratio):
        """Calculate profit and loss for the strategy"""
        # Daily returns
        stock1_returns = self.stock1_prices.pct_change()
        stock2_returns = self.stock2_prices.pct_change()
        
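        # Position sizes: the spread is stock2 - hedge_ratio * stock1, so a
        # long-spread signal (+1) buys stock2 and sells hedge_ratio units of stock1
        stock2_position = signals
        stock1_position = -signals * hedge_ratio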
        
        # Daily P&L
        pnl = (stock1_position.shift(1) * stock1_returns + 
               stock2_position.shift(1) * stock2_returns)
        
        return pnl
    
    def plot_analysis(self):
        """Plot comprehensive pairs trading analysis"""
        hedge_ratio, r_squared = self.calculate_hedge_ratio()
        spread = self.calculate_spread(hedge_ratio)
        zscore = self.calculate_zscore(spread)
        signals = self.generate_signals(zscore)
        pnl = self.calculate_pnl(signals, hedge_ratio)
        
        # Create subplots
        fig, axs = plt.subplots(4, 1, figsize=(15, 20))
        fig.suptitle(f'Pairs Trading Analysis\nHedge Ratio: {hedge_ratio:.3f}, R²: {r_squared:.3f}')
        
        # Plot 1: Stock Prices
        axs[0].plot(self.stock1_prices, label='Stock 1')
        axs[0].plot(self.stock2_prices, label='Stock 2')
        axs[0].set_title('Stock Prices')
        axs[0].legend()
        
        # Plot 2: Spread
        axs[1].plot(spread, label='Spread')
        axs[1].set_title('Price Spread')
        axs[1].legend()
        
        # Plot 3: Z-Score and Trading Signals
        axs[2].plot(zscore, label='Z-Score')
        axs[2].axhline(y=2, color='r', linestyle='--', alpha=0.5)
        axs[2].axhline(y=-2, color='r', linestyle='--', alpha=0.5)
        axs[2].axhline(y=0, color='k', linestyle='-', alpha=0.5)
        axs[2].plot(signals * 2, label='Signals', alpha=0.5)  # Multiply by 2 for visibility
        axs[2].set_title('Z-Score and Trading Signals')
        axs[2].legend()
        
        # Plot 4: Cumulative P&L
        cumulative_pnl = pnl.cumsum()
        axs[3].plot(cumulative_pnl)
        axs[3].set_title('Cumulative P&L')
        
        # Calculate and display statistics
        sharpe_ratio = np.sqrt(252) * pnl.mean() / pnl.std()
        max_drawdown = (cumulative_pnl - cumulative_pnl.cummax()).min()
        total_return = cumulative_pnl.iloc[-1]
        
        stats_text = (f'Sharpe Ratio: {sharpe_ratio:.2f}\n'
                     f'Max Drawdown: {max_drawdown:.2%}\n'
                     f'Total Return: {total_return:.2%}')
        
        axs[3].text(0.02, 0.95, stats_text,
                   transform=axs[3].transAxes,
                   verticalalignment='top',
                   bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
        
        plt.tight_layout()
        return fig

# Run analysis
analysis = PairsAnalysis()
analysis.plot_analysis()

# Print detailed statistics
hedge_ratio, r_squared = analysis.calculate_hedge_ratio()
print(f"\nDetailed Analysis:")
print(f"Hedge Ratio: {hedge_ratio:.3f}")
print(f"R-squared: {r_squared:.3f}")

# Calculate correlations
correlation = analysis.stock1_prices.corr(analysis.stock2_prices)
print(f"Price Correlation: {correlation:.3f}")

# Regression of stock 2 on stock 1. This is not a cointegration test: the
# p-value below tests whether the slope is zero, not whether the residuals
# are stationary.
regression = stats.linregress(analysis.stock1_prices, analysis.stock2_prices)
print(f"\nRegression Analysis:")
print(f"Beta: {regression.slope:.3f}")
print(f"Alpha: {regression.intercept:.3f}")
print(f"Slope p-value: {regression.pvalue:.3f}")

```

This example demonstrates several key concepts:

1. Hedge Ratio Calculation:
   - Uses linear regression to find the optimal hedge ratio
   - Includes R-squared to measure the strength of the relationship
   - Shows how the hedge ratio affects position sizing

2. Z-Score Implementation:
   - Calculates rolling mean and standard deviation
   - Shows how z-score fluctuates around the mean
   - Demonstrates mean-reversion properties

3. Trading Strategy:
   - Entry signals at ±2 standard deviations
   - Exit signals when spread returns to mean
   - Position sizing based on hedge ratio

4. Key Insights:
   - A price-based hedge ratio depends on the relative price levels of the two stocks, so it is close to 1 only when they trade at similar prices
   - Z-scores beyond ±3 might indicate regime changes
   - Higher correlation doesn't always mean better trading opportunities

5. Risk Management:
   - Shows how hedge ratio affects portfolio risk
   - Demonstrates the importance of position sizing
   - Illustrates maximum drawdown periods

The code generates synthetic data that mimics real stock behavior (similar to Coca-Cola and PepsiCo), but the principles apply to any pair of cointegrated securities.
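
One way to build a synthetic pair that is cointegrated by construction, which is the property the strategy relies on, is to let one price follow a random walk and set the other to a multiple of it plus mean-reverting noise. The sketch below does this; the function name and all parameter values (`beta`, `phi`, the volatilities) are hypothetical choices for illustration.

```python
import numpy as np

def make_cointegrated_pair(n=500, beta=1.1, phi=0.9, seed=0):
    """Simulate prices x, y whose spread y - beta * x is a stationary AR(1) process."""
    rng = np.random.default_rng(seed)
    x = 60.0 + np.cumsum(rng.normal(0.0, 0.5, n))    # random walk (non-stationary)
    shocks = rng.normal(0.0, 0.3, n)
    noise = np.zeros(n)
    for t in range(1, n):
        noise[t] = phi * noise[t - 1] + shocks[t]    # mean-reverting AR(1), |phi| < 1
    y = beta * x + noise
    return x, y

x, y = make_cointegrated_pair()
spread = y - 1.1 * x
lag1_autocorr = np.corrcoef(spread[:-1], spread[1:])[0, 1]
print(f"Spread mean: {spread.mean():.3f}, lag-1 autocorrelation: {lag1_autocorr:.2f}")
```

Each price wanders without bound, but the spread keeps returning toward zero at a speed set by `phi`; values of `phi` closer to 1 revert more slowly.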

Would you like me to:
1. Add more sophisticated statistical tests?
2. Show how to handle different market regimes?
3. Add transaction costs and slippage to make it more realistic?
4. Demonstrate how to optimize the entry/exit thresholds?
