# Backtesting with Zipline Bundles

This notebook demonstrates how to use zipline's data bundle system for persistent storage and backtesting.

## Why Use Bundles?

- **Persistent Storage**: Price bars are stored in zipline's bcolz format, with asset metadata and adjustments in SQLite
- **Fast Access**: Optimized for backtesting performance
- **Adjustments**: Handles splits, dividends, and other corporate actions
- **Compatibility**: Works seamlessly with `zipline.run_algorithm()`
- **Daily Updates**: Re-running `zipline ingest` adds a new, timestamped snapshot of the data
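
The **Adjustments** point is easiest to see on a split. Conceptually, a 2-for-1 split is a price multiplier of 0.5 (and a volume multiplier of 2) applied to every bar before the effective date, so older prices line up with post-split prices. The sketch below applies that backward adjustment to a plain pandas frame. `apply_split` and the bar data are hypothetical illustrations, not zipline's adjustment reader, and dividends are left out.

```python
import pandas as pd

def apply_split(bars: pd.DataFrame, effective_date: str, ratio: float) -> pd.DataFrame:
    """Backward-adjust OHLCV bars for a single split (hypothetical helper).

    `ratio` is the price multiplier for bars *before* `effective_date`
    (0.5 for a 2-for-1 split). Volume is divided by the same ratio so the
    traded dollar value of each old bar is unchanged.
    """
    adjusted = bars.astype(float)  # work on a float copy; the input is untouched
    before = adjusted.index < pd.Timestamp(effective_date)
    adjusted.loc[before, ["open", "high", "low", "close"]] *= ratio
    adjusted.loc[before, "volume"] /= ratio
    return adjusted

# Hypothetical bars around a 2-for-1 split effective 2024-01-04
bars = pd.DataFrame(
    {
        "open": [200.0, 202.0, 101.0],
        "high": [204.0, 206.0, 103.0],
        "low": [198.0, 200.0, 100.0],
        "close": [202.0, 204.0, 102.0],
        "volume": [1_000, 1_200, 2_500],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
print(apply_split(bars, "2024-01-04", 0.5))
```

Without the adjustment, the close would appear to drop by half overnight; after it, the series is continuous.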

## Bundle vs CustomData

| Feature | CustomData (Pipeline) | Bundles (Backtesting) |
|---------|----------------------|------------------------|
| Use Case | Pipeline analysis | Full backtests |
| Storage | SQLite | bcolz (optimized) |
| API | Pipeline API | TradingAlgorithm API |
| Performance | Good | Excellent |
| Adjustments | Manual | Automatic |

**Best Practice**: Use bundles for backtesting, CustomData for custom indicators.

## Part 1: Setting Up Bundles

### Option A: Using the Management Script (Recommended)

In [None]:
# Run this in terminal (not in notebook)
# For Yahoo Finance (free):
# python scripts/manage_data.py setup --source yahoo

# For NASDAQ Data Link (requires API key):
# export NASDAQ_DATA_LINK_API_KEY=your_key
# python scripts/manage_data.py setup --source nasdaq --dataset EOD

# Custom tickers:
# python scripts/manage_data.py setup --source yahoo --tickers AAPL,MSFT,GOOGL,AMZN,TSLA

print("See terminal commands above for bundle setup")

### Option B: Register and Ingest Programmatically

In [None]:
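from zipline.data.bundles import register, ingest
from zipline.data.bundles.yahoo_bundle import yahoo_bundle
from zipline.data.bundles.nasdaq_bundle import nasdaq_bundle  # needed only for a NASDAQ bundle

# Register Yahoo Finance bundle.
# register() only affects this Python session: the `zipline` CLI runs in a separate
# process and only sees bundles registered in ~/.zipline/extension.py.
register(
    'yahoo-demo',
    yahoo_bundle(
        tickers=['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA'],
    ),
)

print("✓ Yahoo bundle registered as 'yahoo-demo'")

# Ingest from this notebook (downloads the data; uncomment to run):
# ingest('yahoo-demo', show_progress=True)

print("\nTo ingest from a terminal instead, add the register() call to")
print("~/.zipline/extension.py, then run:")
print("  zipline ingest -b yahoo-demo")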
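
Because `register()` only lives in the current Python process, the registration has to be written to `~/.zipline/extension.py` (or `$ZIPLINE_ROOT/extension.py` if that variable is set) before `zipline ingest` in a terminal can see it. The sketch below appends the same `register()` call used in this notebook to that file if the bundle name is not already there. `add_bundle_registration` is a hypothetical helper, and the `yahoo_bundle` import path is simply the one this notebook already uses.

```python
import os
from pathlib import Path

# Text appended to extension.py; mirrors the register() call in the cell above
EXTENSION_TEMPLATE = """\
from zipline.data.bundles import register
from zipline.data.bundles.yahoo_bundle import yahoo_bundle

register(
    {name!r},
    yahoo_bundle(tickers={tickers!r}),
)
"""

def add_bundle_registration(name, tickers, root=None):
    """Append a register() call for `name` to extension.py unless it is already there."""
    if root is None:
        # zipline reads $ZIPLINE_ROOT and falls back to ~/.zipline
        root = os.environ.get("ZIPLINE_ROOT", Path.home() / ".zipline")
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    ext = root / "extension.py"
    existing = ext.read_text(encoding="utf-8") if ext.exists() else ""
    if repr(name) in existing:
        return ext  # bundle name already mentioned; leave the file alone
    block = EXTENSION_TEMPLATE.format(name=name, tickers=list(tickers))
    ext.write_text(existing + ("\n" if existing else "") + block, encoding="utf-8")
    return ext

# Example (writes to your real zipline root, so it is left commented out):
# add_bundle_registration('yahoo-demo', ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA'])
```

After running the example once, `zipline ingest -b yahoo-demo` in a terminal should find the bundle.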

### Check Available Bundles

In [None]:
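# List bundles known to the zipline CLI, with their ingestion timestamps.
# Bundles registered only inside this notebook (like 'yahoo-demo' above) appear
# here only if they are also registered in ~/.zipline/extension.py.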
!zipline bundles

## Part 2: Simple Backtest Example

Let's create a simple equal-weight strategy that holds three stocks and rebalances them back to equal weights every day:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from zipline import run_algorithm
from zipline.api import (
    order_target_percent,
    symbol,
    record,
    get_datetime,
    set_benchmark,
)
from zipline.finance import commission, slippage

import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✓ Imports complete")

### Define Strategy: Equal-Weight Portfolio

In [None]:
def initialize(context):
    """
    Called once at the start of the algorithm.
    """
    # Define our stocks
    context.stocks = [
        symbol('AAPL'),
        symbol('MSFT'),
        symbol('GOOGL'),
    ]
    
    # Set commissions and slippage
    context.set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.0))
    context.set_slippage(slippage.VolumeShareSlippage())
    
    # Set benchmark
    set_benchmark(symbol('AAPL'))
    
    print(f"Strategy initialized with {len(context.stocks)} stocks")


def handle_data(context, data):
    """
    Called every trading day.
    """
    # Equal weight portfolio
    weight = 1.0 / len(context.stocks)
    
    # Rebalance portfolio
    for stock in context.stocks:
        if data.can_trade(stock):
            order_target_percent(stock, weight)
    
    # Record portfolio value
    record(
        portfolio_value=context.portfolio.portfolio_value,
        cash=context.portfolio.cash,
    )


def analyze(context, perf):
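    """
    Called once at the end of the backtest, but only if it is passed to
    run_algorithm(analyze=...). The backtest below does not pass it, so the
    results are analyzed in the following cells instead.
    """
    pass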

print("✓ Strategy defined")
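
Because `handle_data` calls `order_target_percent` every day, it helps to see roughly what share counts it requests. The sketch below is a simplified model, not zipline's code: it converts the target percentage into a target share count at the current price, subtracts the shares already held, and truncates toward zero. It ignores open orders and slippage. `shares_to_order`, `per_share_commission`, and the prices are hypothetical; the commission helper mirrors the `PerShare(cost=0.001, min_trade_cost=1.0)` settings above as a per-order minimum.

```python
import math

def shares_to_order(target_pct, portfolio_value, price, current_shares):
    """Simplified share delta for order_target_percent (hypothetical helper)."""
    target_shares = target_pct * portfolio_value / price
    # Fractional shares are truncated toward zero in this model
    return math.trunc(target_shares - current_shares)

def per_share_commission(shares, cost=0.001, min_trade_cost=1.0):
    """Simplified PerShare model: cost per share, with a minimum per order."""
    return max(abs(shares) * cost, min_trade_cost)

# Day 1: $100,000 portfolio, one third into a stock trading at $180
day1 = shares_to_order(1 / 3, 100_000, 180.0, current_shares=0)
# Day 2: the price drifts to $185 and the portfolio is worth $100,500
day2 = shares_to_order(1 / 3, 100_500, 185.0, current_shares=day1)
print(day1, day2)                  # 185 -3
print(per_share_commission(day2))  # 1.0
```

A small drift produces a 3-share sell that still pays the $1.00 minimum. With daily rebalancing, orders like this can occur on many days for each stock.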

### Run the Backtest

In [None]:
# Define backtest parameters
start_date = pd.Timestamp('2022-01-01', tz='UTC')
end_date = pd.Timestamp('2023-12-31', tz='UTC')
capital_base = 100000  # Starting capital: $100,000

print(f"Running backtest...")
print(f"  Period: {start_date.date()} to {end_date.date()}")
print(f"  Capital: ${capital_base:,}")
print(f"  Bundle: yahoo-demo (or 'yahoo' if you ingested that)\n")

# Run the algorithm
results = run_algorithm(
    start=start_date,
    end=end_date,
    initialize=initialize,
    handle_data=handle_data,
    capital_base=capital_base,
    bundle='yahoo-demo',  # Change to your bundle name
    data_frequency='daily',
)

print("\n✓ Backtest complete!")

### Analyze Results

In [None]:
# Display performance summary
print("\n" + "="*70)
print("PERFORMANCE SUMMARY")
print("="*70 + "\n")

initial_value = capital_base
final_value = results['portfolio_value'].iloc[-1]
total_return = (final_value - initial_value) / initial_value

print(f"Initial Capital:  ${initial_value:,.2f}")
print(f"Final Value:      ${final_value:,.2f}")
print(f"Total Return:     {total_return*100:+.2f}%")
print(f"Total Profit/Loss: ${final_value - initial_value:,.2f}")
print()

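# Additional metrics. zipline's risk metrics are cumulative: each row covers the
# backtest up to that day, so the last row describes the whole run.
print(f"Sharpe Ratio:     {results['sharpe'].iloc[-1]:.2f}")
print(f"Max Drawdown:     {results['max_drawdown'].min()*100:.2f}%")
print(f"Ann. Volatility:  {results['algo_volatility'].iloc[-1]*100:.2f}%")
print()

print(f"Trading Days:     {len(results)}")
# 'orders' holds the list of orders placed each day; 'transactions' holds the fills
print(f"Orders Placed:    {results['orders'].apply(len).sum()}")
print(f"Fills Executed:   {results['transactions'].apply(len).sum()}")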
print("\n" + "="*70)
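
The summary reads zipline's own metrics. To see how they are defined, the sketch below recomputes total return, annualized volatility, and Sharpe ratio from a series of daily simple returns such as `results['returns']`, assuming 252 trading days per year and a zero risk-free rate. `summarize_returns` is a hypothetical helper. Its numbers should be close to zipline's final-row values, but small differences (for example, in how the first day is treated) are possible.

```python
import numpy as np
import pandas as pd

def summarize_returns(returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Headline metrics from daily simple returns (risk-free rate assumed to be 0)."""
    total_return = (1 + returns).prod() - 1   # compound the daily returns
    daily_std = returns.std(ddof=1)            # sample standard deviation
    return {
        "total_return": total_return,
        "annual_volatility": daily_std * np.sqrt(periods_per_year),
        "sharpe": returns.mean() / daily_std * np.sqrt(periods_per_year),
    }

# Usage with the backtest above: summarize_returns(results['returns'])
example = pd.Series([0.010, -0.004, 0.006, 0.002, -0.001])
print(summarize_returns(example))
```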

### Visualize Performance

In [None]:
# Plot 1: Portfolio Value Over Time
fig, axes = plt.subplots(3, 1, figsize=(14, 12))
fig.suptitle('Backtest Results', fontsize=16, fontweight='bold')

# Portfolio value
ax1 = axes[0]
ax1.plot(results.index, results['portfolio_value'], linewidth=2)
ax1.axhline(y=capital_base, color='gray', linestyle='--', alpha=0.5, label='Initial Capital')
ax1.set_title('Portfolio Value', fontweight='bold')
ax1.set_ylabel('Value ($)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Returns
ax2 = axes[1]
ax2.plot(results.index, results['algorithm_period_return']*100, 
         linewidth=2, label='Strategy', color='blue')
ax2.plot(results.index, results['benchmark_period_return']*100, 
         linewidth=2, label='Benchmark (AAPL)', color='orange', alpha=0.7)
ax2.set_title('Cumulative Returns', fontweight='bold')
ax2.set_ylabel('Return (%)')
ax2.legend()
ax2.grid(True, alpha=0.3)

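# Running maximum drawdown: zipline's 'max_drawdown' is the worst drawdown
# seen so far, so this curve only moves down and never recovers
ax3 = axes[2]
ax3.fill_between(results.index, 0, results['max_drawdown']*100,
                 alpha=0.3, color='red')
ax3.plot(results.index, results['max_drawdown']*100,
         linewidth=2, color='darkred')
ax3.set_title('Maximum Drawdown (running)', fontweight='bold')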
ax3.set_xlabel('Date')
ax3.set_ylabel('Drawdown (%)')
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✓ Performance charts displayed")
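
The `max_drawdown` column records the worst drawdown so far, which is why the third panel never climbs back toward zero. To see how far the portfolio is below its peak on each day (the "underwater" curve), divide by the running maximum. `underwater` is a hypothetical helper; the values are made up.

```python
import pandas as pd

def underwater(portfolio_value: pd.Series) -> pd.Series:
    """Current drawdown: fraction below the running peak (0.0 at a new high)."""
    return portfolio_value / portfolio_value.cummax() - 1

values = pd.Series([100.0, 110.0, 99.0, 120.0, 108.0])
current_dd = underwater(values)
running_max_dd = current_dd.cummin()  # worst drawdown seen so far
print(pd.DataFrame({"current": current_dd, "running_max": running_max_dd}))
```

To plot it, pass `underwater(results['portfolio_value']) * 100` to `ax3.plot`. The running column corresponds roughly to zipline's `max_drawdown`, which measures from the starting capital rather than the first recorded value.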

## Part 3: Advanced Strategy with Pipeline

Combine bundles (for pricing) with Pipeline (for signals):

In [None]:
from zipline.pipeline import Pipeline
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.factors import SimpleMovingAverage, Returns
from zipline.api import attach_pipeline, pipeline_output, schedule_function, date_rules, time_rules

def initialize_momentum(context):
    """
    Momentum strategy using Pipeline.
    """
    # Create pipeline
    pipe = Pipeline()
    
    # Calculate momentum (20-day returns)
    momentum = Returns(window_length=20)
    
    # Calculate moving averages
    sma_50 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=50)
    sma_200 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=200)
    
    # Add factors to pipeline
    pipe.add(momentum, 'momentum')
    pipe.add(sma_50, 'sma_50')
    pipe.add(sma_200, 'sma_200')
    
    # Filter: only stocks in golden cross (SMA50 > SMA200)
    pipe.set_screen(sma_50 > sma_200)
    
    # Attach pipeline
    attach_pipeline(pipe, 'momentum_pipe')
    
    # Schedule rebalancing
    schedule_function(
        rebalance,
        date_rules.month_start(),
        time_rules.market_open(),
    )
    
    # Set commissions
    context.set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.0))
    context.set_slippage(slippage.VolumeShareSlippage())


def rebalance(context, data):
    """
    Rebalance portfolio to top momentum stocks.
    """
    # Get pipeline output
    pipeline_data = pipeline_output('momentum_pipe')
    
    if pipeline_data.empty:
        return
    
    # Sort by momentum and select top 5
    top_stocks = pipeline_data.nlargest(5, 'momentum').index
    
    # Equal weight
    weight = 1.0 / len(top_stocks)
    
    # Rebalance
    for stock in top_stocks:
        if data.can_trade(stock):
            order_target_percent(stock, weight)
    
    # Close positions not in top stocks
    for stock in context.portfolio.positions:
        if stock not in top_stocks and data.can_trade(stock):
            order_target_percent(stock, 0)
    
    # Record
    record(
        portfolio_value=context.portfolio.portfolio_value,
        num_positions=len(context.portfolio.positions),
    )


print("✓ Advanced momentum strategy defined")
print("\nTo run:")
print("  results = run_algorithm(")
print("      start=start_date,")
print("      end=end_date,")
print("      initialize=initialize_momentum,")
print("      capital_base=100000,")
print("      bundle='yahoo-demo',")
print("  )")
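
To see what the screen and ranking in `rebalance` do with concrete numbers, here is a pandas-only sketch that mimics the pipeline's factors for a single rebalance date from a frame of daily closes (rows are sessions, columns are tickers). `momentum_targets` and the tickers are hypothetical. The sketch assumes at least 200 rows and, like Pipeline, should only be fed data up to the session before the rebalance.

```python
import numpy as np
import pandas as pd

def momentum_targets(closes: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Target weights for one rebalance date (hypothetical stand-in for the pipeline)."""
    # Like Returns(window_length=20): change from the first to the last close of a 20-row window
    momentum = closes.iloc[-1] / closes.iloc[-20] - 1
    # Like SimpleMovingAverage over the last 50 and 200 closes
    sma_50 = closes.iloc[-50:].mean()
    sma_200 = closes.iloc[-200:].mean()
    # Golden-cross screen, then the top_n names by momentum
    top = momentum[sma_50 > sma_200].nlargest(top_n).index
    # Equal weight for selected names; 0.0 means "close the position"
    weights = pd.Series(0.0, index=closes.columns)
    if len(top) > 0:
        weights[top] = 1.0 / len(top)
    return weights

sessions = pd.bdate_range("2023-01-02", periods=250)
closes = pd.DataFrame(
    {
        "AAA": np.linspace(100, 200, 250),  # steady uptrend
        "BBB": np.linspace(200, 100, 250),  # downtrend: fails the screen
        "CCC": np.linspace(100, 300, 250),  # stronger uptrend
    },
    index=sessions,
)
print(momentum_targets(closes))           # AAA and CCC at 0.5, BBB at 0.0
print(momentum_targets(closes, top_n=1))  # only CCC
```

One difference from `rebalance`: when nothing passes the screen, this sketch returns all zeros, whereas `rebalance` returns early and keeps its existing positions.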

## Part 4: Daily Data Updates

### Manual Update

In [None]:
# Run in terminal:
# zipline ingest -b yahoo-demo

# Or using management script:
# python scripts/manage_data.py update --bundle yahoo-demo

print("See terminal commands above for manual updates")

### Automated Daily Updates

Set up a cron job to update data automatically:

```bash
# Edit crontab
crontab -e

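# Add this line to run at 17:00 Monday-Friday in the machine's local timezone
# (shift the hour if that is not US Eastern). Cron runs with a minimal PATH, so
# use the full path to your environment's python if `python` is not found.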
0 17 * * 1-5 cd /path/to/zipline-reloaded && python scripts/manage_data.py update --all >> /var/log/zipline_update.log 2>&1
```

Or create a Python script for scheduled updates:

In [None]:
# Save this as: scripts/daily_update.py

daily_update_script = r'''#!/usr/bin/env python
"""Daily data update script"""

import subprocess
import smtplib
import sys
from datetime import datetime
from email.message import EmailMessage

BUNDLES = ['yahoo', 'nasdaq']  # Your bundle names
ADMIN_EMAIL = 'your@email.com'  # For notifications

def update_bundles():
    print(f"Starting daily update: {datetime.now()}")
    
    results = []
    
    for bundle in BUNDLES:
        print(f"\nUpdating {bundle}...")
        
        try:
            result = subprocess.run(
                ['zipline', 'ingest', '-b', bundle],
                capture_output=True,
                text=True,
                timeout=600,  # 10 minute timeout
            )
            
            if result.returncode == 0:
                print(f"✓ {bundle} updated successfully")
                results.append((bundle, 'SUCCESS'))
            else:
                print(f"✗ {bundle} update failed")
                print(result.stderr)
                results.append((bundle, 'FAILED'))
                
        except subprocess.TimeoutExpired:
            print(f"✗ {bundle} update timed out")
            results.append((bundle, 'TIMEOUT'))
        except Exception as e:
            print(f"✗ {bundle} error: {e}")
            results.append((bundle, 'ERROR'))
    
    print(f"\nUpdate complete: {datetime.now()}")
    return results

def send_notification(results):
    """Send email notification (optional)"""
    # Implement email notification if desired
    pass

if __name__ == '__main__':
    results = update_bundles()
    # send_notification(results)  # Uncomment to enable
    # Exit non-zero if any bundle failed, so cron wrappers and monitoring can notice
    sys.exit(0 if all(status == 'SUCCESS' for _, status in results) else 1)
'''

# The raw string (r''') keeps the "\n" escapes intact in the saved script, and the
# shebang sits on the first line as it must.
# Uncomment to write the template to disk:
# with open('scripts/daily_update.py', 'w', encoding='utf-8') as f:
#     f.write(daily_update_script)

print("Daily update script template shown above")
print("Save it to scripts/daily_update.py and schedule with cron")
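
`send_notification` is left as a stub in the template. One way to fill it in, assuming you have an SMTP server you can send through, is to build an `EmailMessage` from the `(bundle, status)` pairs that `update_bundles()` returns. The helper name, the sender address, and the localhost SMTP server are placeholders.

```python
from email.message import EmailMessage

def build_update_report(results, sender="zipline@localhost", recipient="your@email.com"):
    """Plain-text summary email from update_bundles()' (bundle, status) pairs (hypothetical)."""
    succeeded = sum(1 for _, status in results if status == "SUCCESS")
    msg = EmailMessage()
    msg["Subject"] = f"Bundle update: {succeeded}/{len(results)} succeeded"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{bundle}: {status}" for bundle, status in results) + "\n")
    return msg

report = build_update_report([("yahoo", "SUCCESS"), ("nasdaq", "TIMEOUT")])
print(report)

# Sending assumes an SMTP server at localhost:25 (adjust host, port, and auth):
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(report)
```

Keeping message construction separate from sending makes the report easy to check without a mail server.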

## Part 5: Bundle Management

### Check Bundle Data

In [None]:
from zipline.data.bundles import bundles

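# List bundles registered in this Python process. This shows registrations only,
# not whether any data has been ingested for them.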
print("Registered bundles:")
for name in bundles.keys():
    print(f"  - {name}")
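
To check whether a bundle actually has data, look at its ingestions on disk. The sketch below assumes the default layout `$ZIPLINE_ROOT/data/<bundle>/<timestamp>/` (root `~/.zipline` by default), where each directory name is an ISO timestamp with `:` replaced by `;`, and hidden entries such as `.cache` are skipped. This roughly mirrors what `zipline bundles` reports. `list_ingestions` is a hypothetical helper, and the timestamps are assumed to be in one consistent format (all naive or all tz-aware).

```python
import os
from datetime import datetime
from pathlib import Path

def list_ingestions(bundle, root=None):
    """Ingestion timestamps for `bundle`, newest first (empty if nothing is ingested)."""
    if root is None:
        root = os.environ.get("ZIPLINE_ROOT", Path.home() / ".zipline")
    bundle_dir = Path(root) / "data" / bundle
    if not bundle_dir.is_dir():
        return []
    stamps = []
    for entry in bundle_dir.iterdir():
        if not entry.is_dir() or entry.name.startswith("."):  # skip files and .cache
            continue
        try:
            stamps.append(datetime.fromisoformat(entry.name.replace(";", ":")))
        except ValueError:
            continue  # not an ingestion directory
    return sorted(stamps, reverse=True)

print(list_ingestions("yahoo-demo"))
```

Combined with the cell above: `{name: list_ingestions(name) for name in bundles.keys()}`.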

### Clean Old Data

In [None]:
# Clean old ingestions, keep last 3
# !zipline clean -b yahoo-demo --keep-last 3

# Or using management script
# !python scripts/manage_data.py clean --bundle yahoo-demo --keep-last 3

print("See terminal commands above for cleaning old data")
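
`--keep-last 3` keeps the three most recent ingestions and deletes the rest. Before running it, a read-only dry run shows which directories would go and roughly how much disk space that frees. This sketch assumes the default layout (`~/.zipline/data/<bundle>/<timestamp>/`) and that the directory names are timestamps in one consistent format, so sorting them as strings orders them by time. `plan_keep_last` is a hypothetical helper, not part of zipline.

```python
from pathlib import Path

def plan_keep_last(bundle_dir, keep_last=3):
    """Dry run of keep-last cleanup: returns (kept, removed, bytes_reclaimed)."""
    ingestions = sorted(
        (p for p in Path(bundle_dir).iterdir() if p.is_dir() and not p.name.startswith(".")),
        key=lambda p: p.name,
        reverse=True,  # newest first
    )
    kept, removed = ingestions[:keep_last], ingestions[keep_last:]
    # Total size of every file inside the directories that would be deleted
    reclaimed = sum(f.stat().st_size for d in removed for f in d.rglob("*") if f.is_file())
    return kept, removed, reclaimed

# Example (read-only):
# kept, removed, freed = plan_keep_last(Path.home() / ".zipline" / "data" / "yahoo-demo")
```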

## Summary

### What We've Accomplished

1. ✅ Set up persistent data bundles (Yahoo Finance & NASDAQ Data Link)
2. ✅ Used a management script for setup and updates
3. ✅ Ran backtests using bundle data
4. ✅ Analyzed performance with metrics and visualizations
5. ✅ Combined bundles with Pipeline for advanced strategies
6. ✅ Outlined automated daily updates with cron and a Python script

### Data Flow

```
Data Source (Yahoo/NASDAQ)
    ↓
Bundle Ingestion (zipline ingest)
    ↓
Bundle Storage (~/.zipline/data/)
    ↓
Backtest (run_algorithm)
    ↓
Results & Analysis
```

### Daily Workflow

1. **Morning**: Check if bundles need updating
2. **5 PM ET**: Automated data update runs
3. **Evening**: Review updated data, run backtests
4. **Weekly**: Clean old ingestions to save space
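
The **Morning** check can be scripted. A minimal sketch, assuming updates are scheduled for 17:00 New York time on weekdays and that both timestamps have already been converted to naive New York wall-clock times (zipline names ingestion directories with UTC timestamps by default, so convert first, for example with `zoneinfo`). `last_scheduled_update` and `needs_update` are hypothetical helpers, and market holidays are ignored.

```python
from datetime import datetime, time, timedelta

UPDATE_TIME = time(17, 0)  # scheduled update: 5 PM New York time

def last_scheduled_update(now_et: datetime) -> datetime:
    """Most recent weekday 17:00 at or before `now_et` (holidays are not skipped)."""
    candidate = datetime.combine(now_et.date(), UPDATE_TIME)
    if candidate > now_et:
        candidate -= timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate -= timedelta(days=1)
    return candidate

def needs_update(last_ingestion_et: datetime, now_et: datetime) -> bool:
    """True if the newest ingestion predates the most recent scheduled update."""
    return last_ingestion_et < last_scheduled_update(now_et)

# Monday morning: the last scheduled update was Friday 2024-01-05 at 17:00
monday_9am = datetime(2024, 1, 8, 9, 0)
print(last_scheduled_update(monday_9am))                       # 2024-01-05 17:00:00
print(needs_update(datetime(2024, 1, 5, 17, 30), monday_9am))  # False
print(needs_update(datetime(2024, 1, 4, 18, 0), monday_9am))   # True
```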

### Best Practices

1. **Update After Market Close**: Run updates at 5 PM ET or later
2. **Keep Multiple Ingestions**: Store 3-5 recent ingestions as backup
3. **Monitor Updates**: Check logs for failed updates
4. **Version Control**: Track bundle configurations in git
5. **Test Strategies**: Always backtest before live trading

### Next Steps

- Build more sophisticated strategies
- Add custom factors and filters
- Implement risk management
- Connect to broker API for paper/live trading
- Create dashboards for monitoring

## Resources

- [Zipline Documentation](https://zipline.ml4trading.io/)
- [Bundle System Docs](https://zipline.ml4trading.io/bundles.html)
- [TradingAlgorithm API](https://zipline.ml4trading.io/api-reference.html)
- [Pipeline API](https://zipline.ml4trading.io/api-reference.html#pipeline-api)
- [Custom Bundles Guide](../docs/BUNDLES.md)

Happy backtesting! 🚀