# Nadex Backtesting - Version 3

**Updates in V3:**
1. ✅ Simplified 3-tier pricing: $7.50 (ITM), $5.00 (ATM), $2.50 (OTM)
2. ✅ Discussion of single vs. multiple contracts per day (this version still trades one contract per signal)
3. ✅ Clear documentation of what changed

**V2 Results Summary:**
- Total Trades: 690 (vs. 57 previously)
- Win Rate: 45.94% (below the 50% break-even rate for ~$5.00 entries)
- Total Return: -7% (vs. -33% previously)
- Date Range: full historical period ✓

In [None]:
%pip install pandas numpy matplotlib seaborn pyyaml boto3 --quiet

In [None]:
import sys
sys.path.append('../src')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yaml
from nadex_common.utils_s3 import create_s3_clients

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("✓ Imports successful")

In [None]:
# Configuration
with open('../configs/s3.yaml', 'r') as f:
    s3_cfg = yaml.safe_load(f)

clients = create_s3_clients(region=s3_cfg.get('region'))
s3_client = clients['private']
BUCKET = s3_cfg['bucket']
PREFIX = s3_cfg['prefixes']['historical']

print(f"✓ Bucket: {BUCKET}")
print(f"✓ Prefix: {PREFIX}")

## 1. Load ALL Historical Data

In [None]:
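print("Loading ALL historical data from S3...")

all_data = []
file_count = 0

# list_objects_v2 returns at most 1,000 keys per call, so page through every result
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if not key.endswith('.csv'):
            continue
        try:
            obj_data = s3_client.get_object(Bucket=BUCKET, Key=key)
            df = pd.read_csv(obj_data['Body'])
            all_data.append(df)
            file_count += 1
        except Exception as e:
            print(f"Warning: {key}: {e}")

# pd.concat raises on an empty list, so fail with a clearer message instead
if not all_data:
    raise RuntimeError(f"No CSV files loaded from s3://{BUCKET}/{PREFIX}")

raw_data = pd.concat(all_data, ignore_index=True)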
raw_data['Date'] = pd.to_datetime(raw_data['Date'], format='%d-%b-%y')

print(f"\n✓ Loaded {file_count} files")
print(f"✓ Total rows: {len(raw_data):,}")
print(f"✓ Date range: {raw_data['Date'].min().date()} to {raw_data['Date'].max().date()}")
print(f"✓ Unique tickers: {raw_data['Ticker'].nunique()}")
print(f"✓ Unique dates: {raw_data['Date'].nunique()}")

display(raw_data.head(10))
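For reference, `list_objects_v2` returns at most 1,000 keys per call; when more keys match, the response sets `IsTruncated` and a `NextContinuationToken` that is passed back as `ContinuationToken`. The sketch below shows that continuation loop against `FakeS3`, a hypothetical in-memory stub, so it runs without AWS credentials. Its integer-offset token is an assumption of the stub; real S3 tokens are opaque strings.

```python
from typing import Dict, List, Optional


class FakeS3:
    """Hypothetical stand-in for an S3 client that serves keys in pages of `page_size`."""

    def __init__(self, keys: List[str], page_size: int = 1000):
        self.keys = sorted(keys)
        self.page_size = page_size

    def list_objects_v2(self, Bucket: str, Prefix: str,
                        ContinuationToken: Optional[str] = None) -> Dict:
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0  # stub token = offset
        page = matching[start:start + self.page_size]
        resp = {"Contents": [{"Key": k} for k in page],
                "IsTruncated": start + self.page_size < len(matching)}
        if resp["IsTruncated"]:
            resp["NextContinuationToken"] = str(start + self.page_size)
        return resp


def list_all_keys(client, bucket: str, prefix: str) -> List[str]:
    """Follow continuation tokens until the listing is no longer truncated."""
    keys: List[str] = []
    token: Optional[str] = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        resp = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
        if not resp.get("IsTruncated"):
            return keys
        token = resp["NextContinuationToken"]


client = FakeS3([f"historical/{i:05d}.csv" for i in range(2500)])
print(len(client.list_objects_v2(Bucket="b", Prefix="historical/")["Contents"]))  # one call
print(len(list_all_keys(client, "b", "historical/")))                              # all pages
```

With boto3 itself, `s3_client.get_paginator('list_objects_v2')` runs the same loop.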

## 2. Aggregate to Daily

In [None]:
print("Aggregating to daily (one at-the-money contract per ticker per day)...")

# Distance from each contract's strike to its expiration value (Exp Value)
raw_data['strike_distance'] = (raw_data['Exp Value'] - raw_data['Strike Price']).abs()

# Keep the ATM contract: the strike closest to the expiration value.
# Exp Value is the settlement value, so this choice uses end-of-day information.
idx = raw_data.groupby(['Ticker', 'Date'])['strike_distance'].idxmin()
daily_data = raw_data.loc[idx].copy().drop('strike_distance', axis=1)
daily_data = daily_data.sort_values(['Ticker', 'Date']).reset_index(drop=True)

print(f"\n✓ Aggregated to {len(daily_data):,} daily observations")
print(f"✓ Average {len(daily_data) / daily_data['Ticker'].nunique():.0f} days per ticker")
print("\nNote: Currently using ONE at-the-money contract per day.")
print("Alternative: Could trade MULTIPLE strikes per signal (see next version).")

display(daily_data.head(10))
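The `groupby(...).idxmin()` step keeps, for each ticker and date, the row with the smallest strike distance. The same selection in plain Python, on a few hypothetical rows (the `Contract` type, the ticker name, and the prices are illustrative assumptions, not values from the dataset):

```python
from typing import Dict, List, NamedTuple, Tuple


class Contract(NamedTuple):
    """Hypothetical, simplified row of the historical data."""
    ticker: str
    date: str
    strike: float
    exp_value: float


def nearest_strike_per_day(rows: List[Contract]) -> List[Contract]:
    """Keep, per (ticker, date), the contract whose strike is closest to the expiration value.

    Ties keep the first row seen, like pandas' groupby(...).idxmin().
    """
    best: Dict[Tuple[str, str], Contract] = {}
    for row in rows:
        key = (row.ticker, row.date)
        dist = abs(row.exp_value - row.strike)
        current = best.get(key)
        if current is None or dist < abs(current.exp_value - current.strike):
            best[key] = row
    # Sort by (ticker, date), mirroring sort_values(['Ticker', 'Date'])
    return [best[k] for k in sorted(best)]


rows = [
    Contract("US500", "2024-03-05", 5090.0, 5103.2),
    Contract("US500", "2024-03-05", 5100.0, 5103.2),
    Contract("US500", "2024-03-05", 5110.0, 5103.2),
    Contract("US500", "2024-03-06", 5100.0, 5088.9),
    Contract("US500", "2024-03-06", 5090.0, 5088.9),
]
for c in nearest_strike_per_day(rows):
    print(c.date, c.strike)
```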

## 3. Simplified 3-Tier Pricing Model

In [None]:
def calculate_tier_entry_cost(exp_value: float, strike_price: float) -> float:
    """
    3-Tier pricing model:
    - Far ITM (exp > strike + threshold): $7.50
    - At-The-Money (within threshold): $5.00  
    - Far OTM (exp < strike - threshold): $2.50
    
    Threshold = 1% of strike price
    """
    threshold = strike_price * 0.01  # 1% threshold
    diff = exp_value - strike_price
    
    if diff > threshold:
        return 7.50  # Far ITM
    elif diff < -threshold:
        return 2.50  # Far OTM
    else:
        return 5.00  # ATM

# Test the model
print("Testing 3-Tier Pricing Model:")
print("=" * 60)
for exp_val, strike, desc in [(101.5, 100, "Far ITM"), (100, 100, "ATM"), (98.5, 100, "Far OTM")]:
    cost = calculate_tier_entry_cost(exp_val, strike)
    print(f"{desc:12s} | Exp: {exp_val:6.2f} | Strike: {strike:6.2f} | Entry: ${cost:.2f}")

print("\n✓ 3-tier pricing model ready")
print("\nPricing Rules:")
print("  - Exp Value > Strike + 1%        →  $7.50 (Far ITM)")
print("  - Exp Value within ±1% of Strike  →  $5.00 (ATM)")
print("  - Exp Value < Strike - 1%        →  $2.50 (Far OTM)")
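Because the tier is chosen by comparing the expiration value with the strike, it also fixes the outcome of the two outer tiers. A sketch under one stated assumption, that `In the Money == 1` exactly when the expiration value is above the strike: a long far-ITM trade always wins $2.50, a long far-OTM trade always loses $2.50, and only ATM trades have an uncertain result. `tier_price` reproduces `calculate_tier_entry_cost`; `long_pnl` is a hypothetical helper.

```python
PAYOUT = 10.0  # settlement value of a winning contract on this notebook's scale


def tier_price(exp_value: float, strike: float) -> float:
    """Same 3-tier rule as calculate_tier_entry_cost (threshold = 1% of strike)."""
    threshold = strike * 0.01
    diff = exp_value - strike
    if diff > threshold:
        return 7.50
    if diff < -threshold:
        return 2.50
    return 5.00


def long_pnl(exp_value: float, strike: float) -> float:
    """P&L of buying one contract at the tier price.

    ASSUMPTION: the contract settles in the money iff exp_value > strike.
    """
    cost = tier_price(exp_value, strike)
    return PAYOUT - cost if exp_value > strike else -cost


for exp_value in (101.5, 100.5, 99.5, 98.5):
    print(f"exp={exp_value:6.2f}  price=${tier_price(exp_value, 100):.2f}  "
          f"long P&L={long_pnl(exp_value, 100):+.2f}")
```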

## 4. Simple RSI Strategy

**Note:** The MACD trend filter was removed in the code itself rather than disabled in the config file.  
This notebook implements a pure RSI reversal strategy with no trend confirmation.

In [None]:
def calculate_rsi(prices: pd.Series, period: int = 14) -> pd.Series:
    """Calculate RSI from simple rolling means of gains and losses (not Wilder's smoothing)."""
    delta = prices.diff()
    gain = delta.where(delta > 0, 0).rolling(window=period).mean()
    loss = -delta.where(delta < 0, 0).rolling(window=period).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

def generate_signals(data: pd.DataFrame, period: int = 14, 
                    oversold: float = 30, overbought: float = 70) -> pd.DataFrame:
    """
    Generate simple RSI reversal signals.
    NO MACD filter - removed from code.
    """
    result = data.copy()
    result['rsi'] = calculate_rsi(result['Exp Value'], period)
    result['signal'] = 0
    result.loc[result['rsi'] < oversold, 'signal'] = 1   # BUY
    result.loc[result['rsi'] > overbought, 'signal'] = -1  # SELL
    return result

print("✓ Simple RSI strategy ready")
print("  - MACD filter: REMOVED from code (not using config)")
print("  - Pure RSI reversal: Buy when RSI < 30, Sell when RSI > 70")
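`calculate_rsi` averages gains and losses with a simple rolling mean (often called Cutler's RSI); Wilder's original RSI uses exponential smoothing instead, so values can differ from charting tools. A dependency-free sketch of the same simple-mean calculation and signal rule, handy for checking a few values by hand (`sma_rsi` and `rsi_signal` are hypothetical names):

```python
from typing import List, Optional


def sma_rsi(prices: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI from simple rolling means of gains and losses, mirroring calculate_rsi.

    Returns None where calculate_rsi would return NaN.
    """
    # diff() leaves NaN at position 0; .where(delta > 0, 0) turns it into 0,
    # so the first change counts as 0 inside the first rolling window.
    changes = [0.0] + [b - a for a, b in zip(prices, prices[1:])]
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < period:
            out.append(None)  # rolling window not yet full
            continue
        window = changes[i + 1 - period:i + 1]
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            # pandas: gain/0 -> inf -> RSI 100; 0/0 -> NaN
            out.append(100.0 if avg_gain > 0 else None)
        else:
            out.append(100.0 - 100.0 / (1.0 + avg_gain / avg_loss))
    return out


def rsi_signal(rsi: Optional[float], oversold: float = 30, overbought: float = 70) -> int:
    """1 = buy (oversold), -1 = sell (overbought), 0 = no trade; same rule as generate_signals."""
    if rsi is None:
        return 0
    if rsi < oversold:
        return 1
    if rsi > overbought:
        return -1
    return 0


prices = [10.0, 11.0, 10.5]  # hypothetical expiration values
print(sma_rsi(prices, period=2))
```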

## 5. Run Backtest with 3-Tier Pricing

In [None]:
print("Running backtest with 3-tier pricing ($7.50/$5.00/$2.50)...")
print("Configuration: RSI(14), Oversold=30, Overbought=70")

all_results = []

for ticker in daily_data['Ticker'].unique():
    ticker_data = daily_data[daily_data['Ticker'] == ticker].copy()
    ticker_data = generate_signals(ticker_data)
    
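    # Calculate P&L on each signal day (entry_cost is the 3-tier contract price)
    for i in ticker_data.index[ticker_data['signal'] != 0]:
        row = ticker_data.loc[i]
        entry_cost = calculate_tier_entry_cost(row['Exp Value'], row['Strike Price'])
        expired_itm = row['In the Money'] == 1
        if row['signal'] == 1:
            # BUY: pay entry_cost, receive $10 if the contract expires in the money
            pnl = (10.0 - entry_cost) if expired_itm else -entry_cost
        else:
            # SELL: receive entry_cost, pay out $10 if the contract expires in the money
            pnl = -(10.0 - entry_cost) if expired_itm else entry_cost
        ticker_data.loc[i, 'entry_cost'] = entry_cost
        ticker_data.loc[i, 'pnl'] = pnl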
    
    all_results.append(ticker_data)

results = pd.concat(all_results, ignore_index=True)
trades = results[results['signal'] != 0].copy()

print("\n✓ Backtest complete!")
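Both sides of a binary contract have a capped payoff. On the $10 settlement scale used here, a buyer pays the price and receives $10 if the contract settles in the money; a seller receives the price and pays $10 in that case, so the seller's maximum risk is $10 minus the price. A minimal sketch of both sides (`binary_trade` is a hypothetical helper):

```python
from typing import Tuple

PAYOUT = 10.0  # settlement value of an in-the-money contract on this notebook's scale


def binary_trade(signal: int, price: float, expired_itm: bool) -> Tuple[float, float]:
    """Return (pnl, capital_at_risk) for one contract.

    signal: 1 = buy, -1 = sell; price: entry price of the contract.
    """
    if signal == 1:
        return (PAYOUT - price if expired_itm else -price), price
    if signal == -1:
        return (-(PAYOUT - price) if expired_itm else price), PAYOUT - price
    raise ValueError("signal must be 1 (buy) or -1 (sell)")


for signal, price, itm in [(1, 5.0, True), (1, 2.5, False), (-1, 7.5, False), (-1, 2.5, True)]:
    pnl, risk = binary_trade(signal, price, itm)
    print(f"signal={signal:+d} price=${price:.2f} itm={itm!s:5}  pnl={pnl:+.2f}  risk=${risk:.2f}")
```

Across the three tier prices and both directions, P&L can take only six values (±$2.50, ±$5.00, ±$7.50), so the P&L histogram below shows a handful of isolated bars.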

## 6. Results Analysis

In [None]:
wins = trades[trades['pnl'] > 0]
losses = trades[trades['pnl'] < 0]

# Count by entry cost tier
tier_counts = trades['entry_cost'].value_counts().sort_index()

avg_win = f"${wins['pnl'].mean():.2f}" if len(wins) > 0 else "N/A"
avg_loss = f"${losses['pnl'].mean():.2f}" if len(losses) > 0 else "N/A"

# Sharpe on daily P&L: several tickers can trade on the same date, so per-trade
# P&L scaled by sqrt(252) is not an annualized figure. Days without a signal
# do not appear in daily_pnl.
daily_pnl = trades.groupby('Date')['pnl'].sum()
sharpe = (daily_pnl.mean() / daily_pnl.std()) * np.sqrt(252) if len(daily_pnl) > 1 else float('nan')

print("=" * 70)
print("📊 3-TIER PRICING RESULTS")
print("=" * 70)
print(f"Total Trades:           {len(trades)}")
print(f"Winning Trades:         {len(wins)}")
print(f"Losing Trades:          {len(losses)}")
print(f"Win Rate:               {len(wins)/len(trades):.2%}")
print(f"\nTotal P&L:              ${trades['pnl'].sum():.2f}")
print(f"Average Win:            {avg_win}")
print(f"Average Loss:           {avg_loss}")
print("\nEntry Cost Distribution:")
for cost, count in tier_counts.items():
    pct = count / len(trades) * 100
    print(f"  ${cost:.2f}: {count} trades ({pct:.1f}%)")
print(f"\nAvg Entry Cost:         ${trades['entry_cost'].mean():.2f}")
print(f"Total Capital Used:     ${trades['entry_cost'].sum():.2f}")
print(f"Total Return:           {(trades['pnl'].sum() / trades['entry_cost'].sum() * 100):.2f}%")
print(f"\nSharpe Ratio (daily):   {sharpe:.2f}")
print("=" * 70)

print("\nSample Trades:")
display(trades[['Date', 'Ticker', 'Exp Value', 'Strike Price', 'rsi', 
                'entry_cost', 'In the Money', 'pnl']].head(15))
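The summary above can be cross-checked with two standard derived figures: expectancy (average P&L per trade, equal to win rate × average win + loss rate × average loss when no trade breaks even) and profit factor (gross wins divided by gross losses). A small sketch on a hypothetical list of trade P&Ls:

```python
from typing import Dict, List


def trade_stats(pnls: List[float]) -> Dict[str, float]:
    """Win rate, average win/loss, expectancy and profit factor for a list of trade P&Ls."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    n = len(pnls)
    return {
        "win_rate": len(wins) / n,
        "avg_win": sum(wins) / len(wins) if wins else 0.0,
        "avg_loss": sum(losses) / len(losses) if losses else 0.0,
        # Expectancy: mean P&L per trade
        "expectancy": sum(pnls) / n,
        # Profit factor: gross profit / gross loss (inf if there are no losses)
        "profit_factor": sum(wins) / -sum(losses) if losses else float("inf"),
    }


stats = trade_stats([5.0, 5.0, -5.0, 2.5, -7.5])  # hypothetical trade P&Ls
print(stats)
```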

## 7. Comparison: V2 vs V3

In [None]:
print("üìä V2 (Dynamic) vs V3 (3-Tier) COMPARISON")
print("=" * 70)
print(f"{'Metric':<25} {'V2 (Dynamic)':>15} {'V3 (3-Tier)':>15} {'Change':>12}")
print("-" * 70)
print(f"{'Total Trades':<25} {690:>15} {len(trades):>15} {f'{len(trades)-690:+d}':>12}")
print(f"{'Win Rate':<25} {'45.94%':>15} {f'{len(wins)/len(trades)*100:.2f}%':>15} {'?':>12}")
print(f"{'Avg Entry Cost':<25} {'~$5.00':>15} {f'${trades["entry_cost"].mean():.2f}':>15} {'?':>12}")
print(f"{'Total Return':<25} {'-7%':>15} {f'{(trades["pnl"].sum() / trades["entry_cost"].sum() * 100):.1f}%':>15} {'?':>12}")
print("=" * 70)

print("\nKey Question: Does simpler 3-tier pricing improve or worsen results?")

## 8. Visualizations

In [None]:
# Cumulative P&L
trades_sorted = trades.sort_values('Date').copy()
trades_sorted['cumulative_pnl'] = trades_sorted['pnl'].cumsum()

plt.figure(figsize=(14, 6))
plt.plot(trades_sorted['Date'], trades_sorted['cumulative_pnl'], linewidth=2)
plt.axhline(y=0, color='r', linestyle='--', alpha=0.3)
plt.fill_between(trades_sorted['Date'], 0, trades_sorted['cumulative_pnl'], 
                 where=(trades_sorted['cumulative_pnl'] >= 0), alpha=0.3, color='green')
plt.fill_between(trades_sorted['Date'], 0, trades_sorted['cumulative_pnl'], 
                 where=(trades_sorted['cumulative_pnl'] < 0), alpha=0.3, color='red')
plt.xlabel('Date')
plt.ylabel('Cumulative P&L ($)')
plt.title(f'Cumulative P&L Over Time (Final: ${trades_sorted["cumulative_pnl"].iloc[-1]:.2f})')
plt.grid(alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
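The cumulative P&L curve also shows drawdown, the largest drop from a running peak. A sketch that computes it from the same chronologically sorted P&L values; starting from a flat $0 peak is an assumption of this hypothetical `max_drawdown` helper:

```python
from itertools import accumulate
from typing import List


def max_drawdown(pnls: List[float]) -> float:
    """Largest peak-to-trough decline of cumulative P&L, as a non-positive number."""
    peak = 0.0   # start flat: the running peak is never below $0
    worst = 0.0
    for cum in accumulate(pnls):
        peak = max(peak, cum)
        worst = min(worst, cum - peak)
    return worst


print(max_drawdown([5.0, -5.0, -5.0, 5.0, 5.0]))
```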

In [None]:
# Entry Cost Distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart of 3 tiers
tier_data = trades['entry_cost'].value_counts().sort_index()
axes[0].bar(tier_data.index, tier_data.values, width=0.5, edgecolor='black', alpha=0.7)
axes[0].set_xlabel('Entry Cost ($)')
axes[0].set_ylabel('Number of Trades')
axes[0].set_title('Entry Cost Distribution (3-Tier Model)')
axes[0].set_xticks([2.50, 5.00, 7.50])
axes[0].grid(axis='y', alpha=0.3)

# P&L distribution
axes[1].hist(trades['pnl'], bins=30, edgecolor='black', alpha=0.7)
axes[1].axvline(0, color='red', linestyle='--', linewidth=2)
axes[1].axvline(trades['pnl'].mean(), color='green', linestyle='--', 
                label=f'Mean: ${trades["pnl"].mean():.2f}')
axes[1].set_xlabel('P&L ($)')
axes[1].set_ylabel('Frequency')
axes[1].set_title('P&L Distribution')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

## 9. Discussion Points

### Current Status:
- ✅ **More trades**: 690 vs. 57 (about 12x as many)
- ✅ **Full date range**: covers the entire historical period
- ⚠️ **Win rate 45.94%**: below the 50% break-even rate for $5.00 (ATM) entries
- ⚠️ **Return -7%**: better than -33%, but still losing money
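The 50% figure is specific to $5.00 entries. For a contract bought at price c with a $10 settlement payout, expected P&L per trade at win rate p is p(10 - c) - (1 - p)c = 10p - c, which is zero at p = c/10. A short sketch (the helper names are hypothetical):

```python
PAYOUT = 10.0  # settlement value used throughout this notebook


def breakeven_win_rate(price: float) -> float:
    """Win rate at which a long binary bought at `price` has zero expected P&L."""
    return price / PAYOUT


def expected_pnl(win_rate: float, price: float) -> float:
    """Expected P&L per long trade: win_rate * (PAYOUT - price) - (1 - win_rate) * price."""
    return win_rate * (PAYOUT - price) - (1 - win_rate) * price


for price in (2.50, 5.00, 7.50):
    print(f"${price:.2f} entry: break-even win rate {breakeven_win_rate(price):.0%}")

# V2's 45.94% win rate at a ~$5.00 entry
print(f"Expected P&L per trade ($): {expected_pnl(0.4594, 5.00):+.3f}")
```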

### Key Questions:

1. **Is 3-tier pricing better than dynamic?**
   - Run this notebook to compare
   - Does it improve or worsen results?

2. **Should we trade multiple strikes per day?**
   - Current: One ATM contract per signal
   - Alternative: Apply signal to multiple strikes
   - Would increase trade count but may reduce quality

3. **Is RSI viable for Nadex?**
   - A win rate below break-even suggests RSI signals may not be predictive here
   - Consider trying:
     - Different RSI parameters (period, thresholds)
     - Different indicators (momentum, volatility)
     - Mean reversion on different timeframes
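For question 3, a parameter sweep can be organized as a grid over the RSI period and thresholds. A sketch that only enumerates the combinations; the candidate values are illustrative assumptions, and each combination would be passed to `generate_signals(data, period, oversold, overbought)` and backtested as in Section 5.

```python
from itertools import product
from typing import List, Tuple

# Illustrative candidate values (assumptions, not tuned settings)
PERIODS = [7, 14, 21]
OVERSOLD = [20, 25, 30]
OVERBOUGHT = [70, 75, 80]


def rsi_grid() -> List[Tuple[int, int, int]]:
    """All (period, oversold, overbought) combinations with oversold < overbought."""
    return [(p, lo, hi) for p, lo, hi in product(PERIODS, OVERSOLD, OVERBOUGHT) if lo < hi]


grid = rsi_grid()
print(len(grid), grid[0])
```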

### Next Steps:
1. Compare V2 vs V3 results
2. Decide on pricing approach
3. Test if RSI is right indicator
4. Consider alternative strategies if RSI doesn't work