# Multi-Source Fundamentals Strategy

This notebook demonstrates:
1. **Multi-source data integration** - Combining Sharadar and custom LSEG fundamentals
2. **FlightLog monitoring** - Real-time log streaming
3. **Pyfolio analysis** - Comprehensive performance tearsheet
4. **run_algorithm with auto loader** - Strategy execution with a one-line loader setup via `ms.setup_auto_loader()`

## Strategy Overview

**Consensus Quality Strategy:**
- Universe: Top 100 stocks by market cap (Sharadar)
- Quality filter: ROE > 15% from BOTH Sharadar AND LSEG, plus FCF > 0 and 0 < PE < 30 (Sharadar) and 0 < PEG < 2.5 (LSEG)
- Selection: Top 5 by Sharadar ROE with dual confirmation
- Rebalance: Weekly

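The key insight: when two independent data sources both report that a company clears the quality thresholds, the signal is less likely to come from one vendor's data error or methodology, so we can place more confidence in it.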
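
To make the consensus rule concrete, here is a minimal pure-Python sketch on made-up numbers. The tickers and ROE values are hypothetical, and the sketch checks only the ROE threshold, not the additional FCF, PE, and PEG filters applied in the pipeline later in this notebook.

```python
# Toy illustration of dual confirmation; all tickers and ROE values are made up.
ROE_THRESHOLD = 15.0  # percent, matching the 15% rule above

# Hypothetical ROE readings (in percent) from each source.
sharadar_roe = {"AAA": 32.0, "BBB": 18.5, "CCC": 22.0, "DDD": 9.0}
lseg_roe = {"AAA": 29.5, "BBB": 12.0, "CCC": 24.0, "DDD": 8.0}


def consensus_picks(primary, secondary, threshold, top_n):
    """Return up to top_n tickers that pass the threshold in BOTH sources,
    ranked by the primary source's value (highest first)."""
    confirmed = [
        ticker
        for ticker, value in primary.items()
        if value > threshold and secondary.get(ticker, float("-inf")) > threshold
    ]
    return sorted(confirmed, key=lambda t: primary[t], reverse=True)[:top_n]


picks = consensus_picks(sharadar_roe, lseg_roe, ROE_THRESHOLD, top_n=5)
print(picks)
```

In this toy data, `BBB` clears the threshold in the first source but not in the second, so it is excluded even though its primary ROE is above 15%.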

## Prerequisites

1. **Sharadar bundle ingested:** `zipline ingest -b sharadar`
2. **Custom LSEG database:** `~/.zipline/data/custom/fundamentals.sqlite`
3. **FlightLog running (optional):** The notebook connects to it on `localhost:9020`
4. **Pyfolio installed:** `pip install pyfolio-reloaded`

## Setup: Register Bundle and Import Multi-Source Module

In [None]:
# Register Sharadar bundle (required for Jupyter notebooks)
from zipline.data.bundles import register
from zipline.data.bundles.sharadar_bundle import sharadar_bundle

register(
    'sharadar',
    sharadar_bundle(
        tickers=None,
        incremental=True,
        include_funds=True,
    ),
)

print("✓ Sharadar bundle registered")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import logging
import warnings
warnings.filterwarnings('ignore')

# Import the centralized multi-source module - simple!
from zipline.pipeline import multi_source as ms

# Zipline API
from zipline import run_algorithm
from zipline.api import (
    attach_pipeline,
    pipeline_output,
    order_target_percent,
    schedule_function,
    date_rules,
    time_rules,
    record,
)

# Progress logging
from zipline.utils.progress import enable_progress_logging

# Setup logging
logging.basicConfig(level=logging.INFO, force=True)
logging.getLogger('matplotlib.category').setLevel(logging.WARNING)

print("✓ Imports successful")
print(f"\nAvailable multi_source components:")
print(f"  - ms.Pipeline")
print(f"  - ms.Database")
print(f"  - ms.Column")
print(f"  - ms.sharadar")
print(f"  - ms.setup_auto_loader()")

## Setup FlightLog (Optional)

FlightLog provides real-time log monitoring in a separate terminal.

**To use FlightLog:**
1. Open a second terminal
2. Run: `docker logs -f zipline-flightlog`
3. You'll see colorized logs streaming in real-time during the backtest

In [None]:
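# Enable FlightLog (optional). The import sits inside the try block so that a
# missing client module also falls back to the no-op logger below.
try:
    from zipline.utils.flightlog_client import enable_flightlog, log_to_flightlog
    enable_flightlog(host='localhost', port=9020)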
    log_to_flightlog('🚀 Multi-Source Strategy - FlightLog Connected', level='INFO')
    print("✅ FlightLog enabled - check your second terminal!")
    FLIGHTLOG_ENABLED = True
except Exception as e:
    print(f"⚠️  FlightLog not available: {e}")
    print("   Continuing without FlightLog...")
    FLIGHTLOG_ENABLED = False
    # Define no-op function
    def log_to_flightlog(msg, level='INFO'):
        pass

In [None]:
# Enable progress logging
enable_progress_logging(
    algo_name='MultiSource-Consensus',
    update_interval=5  # Update every 5 trading days
)

print("✅ Progress logging enabled")

## Define Custom LSEG Fundamentals Database

Using the new centralized imports, defining a custom database is simple:

In [None]:
class LSEGFundamentals(ms.Database):
    """Custom LSEG fundamentals database."""
    
    CODE = "fundamentals"  # Must match SQLite database name
    LOOKBACK_WINDOW = 252
    
    # Define columns matching database schema
    ReturnOnEquity_SmartEstimat = ms.Column(float)
    ForwardPEG_DailyTimeSeriesRatio_ = ms.Column(float)
    CompanyMarketCap = ms.Column(float)
    Debt_Total = ms.Column(float)

print("✓ LSEG Fundamentals database defined")
print(f"  Database: {LSEGFundamentals.CODE}")
print(f"  Location: ~/.zipline/data/custom/{LSEGFundamentals.CODE}.sqlite")
print(f"  Columns: {', '.join([k for k in dir(LSEGFundamentals) if not k.startswith('_') and k not in ['CODE', 'LOOKBACK_WINDOW']])}")

## Create Multi-Source Pipeline

Mix Sharadar and LSEG data seamlessly in one pipeline:

In [None]:
def make_pipeline(universe_size=100, selection_size=5):
    """
    Multi-source consensus quality pipeline.
    
    Combines Sharadar and LSEG fundamentals to find high-quality stocks
    where both data sources agree on quality metrics.
    """
    # ========================================================================
    # Sharadar Fundamentals
    # ========================================================================
    s_roe = ms.SharadarFundamentals.roe.latest
    s_fcf = ms.SharadarFundamentals.fcf.latest
    s_marketcap = ms.SharadarFundamentals.marketcap.latest
    s_pe = ms.SharadarFundamentals.pe.latest
    
    # ========================================================================
    # LSEG Custom Fundamentals
    # ========================================================================
    l_roe = LSEGFundamentals.ReturnOnEquity_SmartEstimat.latest
    l_peg = LSEGFundamentals.ForwardPEG_DailyTimeSeriesRatio_.latest
    l_marketcap = LSEGFundamentals.CompanyMarketCap.latest
    l_debt = LSEGFundamentals.Debt_Total.latest
    
    # ========================================================================
    # Universe: Top N by market cap (Sharadar)
    # ========================================================================
    universe = s_marketcap.top(universe_size)
    
    # ========================================================================
    # Quality Filters
    # ========================================================================
    # Sharadar quality
    sharadar_quality = (
        (s_roe > 15.0) &
        (s_fcf > 0) &
        (s_pe > 0) & (s_pe < 30)
    )
    
    # LSEG quality
    lseg_quality = (
        (l_roe > 15.0) &
        (l_peg > 0) & (l_peg < 2.5)
    )
    
    # ========================================================================
    # Consensus: BOTH sources agree = higher confidence
    # ========================================================================
    both_confirm_quality = sharadar_quality & lseg_quality
    
    # ========================================================================
    # Selection: Top M by Sharadar ROE with dual confirmation
    # ========================================================================
    selection = s_roe.top(selection_size, mask=universe & both_confirm_quality)
    
    return ms.Pipeline(
        columns={
            # Sharadar metrics
            's_roe': s_roe,
            's_fcf': s_fcf,
            's_marketcap': s_marketcap,
            's_pe': s_pe,
            # LSEG metrics
            'l_roe': l_roe,
            'l_peg': l_peg,
            'l_marketcap': l_marketcap,
            'l_debt': l_debt,
            # Quality flags
            'sharadar_quality': sharadar_quality,
            'lseg_quality': lseg_quality,
            'both_confirm': both_confirm_quality,
        },
        screen=selection,
    )

print("✓ Multi-source pipeline defined")
print("\nPipeline features:")
print("  - Sharadar: ROE, FCF, MarketCap, PE")
print("  - LSEG: ROE, PEG, MarketCap, Debt")
print("  - Consensus scoring: Both sources must agree")
print("  - Universe: Top 100 by market cap")
print("  - Selection: Top 5 by ROE with dual confirmation")

## Define Strategy Functions
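
Before the full strategy code, it may help to see the rebalancing rule in isolation. The sketch below is a simplified stand-in, not the strategy itself: `target_weights` is a hypothetical helper, and the symbols are made up. It computes what the `rebalance` function expresses through `order_target_percent` calls: exit anything no longer selected, and split the portfolio equally across the current selection.

```python
def target_weights(current_holdings, selected):
    """Map each symbol to its target portfolio weight.

    Symbols held but no longer selected get 0.0 (full exit);
    selected symbols share the portfolio equally.
    """
    if not selected:
        # Mirrors the early return in rebalance(): with nothing tradeable,
        # no orders are placed and existing positions are left alone.
        return {}
    weight = 1.0 / len(selected)
    targets = {symbol: 0.0 for symbol in set(current_holdings) - set(selected)}
    targets.update({symbol: weight for symbol in selected})
    return targets


# Hypothetical symbols, for illustration only.
targets = target_weights(
    current_holdings=["AAA", "BBB"],
    selected=["BBB", "CCC", "DDD", "EEE"],
)
print(targets)
```

Here `AAA` drops out of the selection and is targeted at 0.0, while the four selected symbols each receive a weight of 0.25.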

In [None]:
def initialize(context):
    """Initialize strategy."""
    log_to_flightlog('='*80, level='INFO')
    log_to_flightlog('🚀 Multi-Source Consensus Quality Strategy', level='INFO')
    log_to_flightlog('='*80, level='INFO')
    
    # Attach pipeline
    pipe = make_pipeline(universe_size=100, selection_size=5)
    attach_pipeline(pipe, 'multi_source')
    
    log_to_flightlog('Pipeline attached:', level='INFO')
    log_to_flightlog('  - Data sources: Sharadar + LSEG', level='INFO')
    log_to_flightlog('  - Universe: Top 100 by market cap', level='INFO')
    log_to_flightlog('  - Selection: Top 5 by ROE (dual confirmation)', level='INFO')
    
    # Schedule weekly rebalancing
    schedule_function(
        rebalance,
        date_rules.week_start(),
        time_rules.market_open(hours=1)
    )
    
    log_to_flightlog('  - Rebalancing: Weekly (first trading day of the week, 1 hour after market open)', level='INFO')
    log_to_flightlog('='*80, level='INFO')
    
    # Initialize tracking
    context.rebalance_count = 0
    context.confirmed_selections = []


def before_trading_start(context, data):
    """Get pipeline data before market opens."""
    context.pipeline_data = pipeline_output('multi_source')


def rebalance(context, data):
    """Weekly rebalancing."""
    context.rebalance_count += 1
    
    if context.pipeline_data is None or context.pipeline_data.empty:
        log_to_flightlog(f'⚠️  Rebalance #{context.rebalance_count}: No stocks in pipeline', level='WARNING')
        return
    
    # Get tradeable stocks
    all_selected = context.pipeline_data.index
    selected_stocks = [s for s in all_selected if data.can_trade(s)]
    
    if not selected_stocks:
        log_to_flightlog(f'⚠️  Rebalance #{context.rebalance_count}: No tradeable stocks', level='WARNING')
        return
    
    # Equal weight
    target_weight = 1.0 / len(selected_stocks)
    
    # Get current positions
    current_positions = set(context.portfolio.positions.keys())
    target_positions = set(selected_stocks)
    
    # Sell positions no longer selected
    for stock in current_positions - target_positions:
        if data.can_trade(stock):
            order_target_percent(stock, 0.0)
    
    # Buy/rebalance selected stocks
    for stock in selected_stocks:
        if data.can_trade(stock):
            order_target_percent(stock, target_weight)
    
    # Count confirmations
    confirmed = context.pipeline_data['both_confirm'].sum()
    
    # Get stock symbols and key metrics
    holdings_info = []
    for stock in selected_stocks:
        symbol = stock.symbol
        s_roe = context.pipeline_data.loc[stock, 's_roe']
        l_roe = context.pipeline_data.loc[stock, 'l_roe']
        confirmed_flag = '✓' if context.pipeline_data.loc[stock, 'both_confirm'] else '✗'
        holdings_info.append(f"{symbol} (S_ROE:{s_roe:.1f}%, L_ROE:{l_roe:.1f}% {confirmed_flag})")
    
    log_to_flightlog(
        f'📊 Rebalance #{context.rebalance_count}: '
        f'{len(selected_stocks)} stocks, {confirmed} with dual confirmation',
        level='INFO'
    )
    log_to_flightlog(f'   Holdings: {", ".join(holdings_info)}', level='INFO')
    
    # Track for analysis
    context.confirmed_selections.append((data.current_dt, confirmed, len(selected_stocks)))


def handle_data(context, data):
    """Record daily metrics."""
    record(
        portfolio_value=context.portfolio.portfolio_value,
        cash=context.portfolio.cash,
        leverage=context.account.leverage,
        positions_count=len(context.portfolio.positions),
    )


def analyze(context, perf):
    """Analyze results at end of backtest."""
    log_to_flightlog('='*80, level='INFO')
    log_to_flightlog('✅ Backtest Complete!', level='INFO')
    log_to_flightlog('='*80, level='INFO')
    
    # Calculate key metrics
    returns = perf['returns']
    final_value = perf['portfolio_value'].iloc[-1]
    initial_value = perf['portfolio_value'].iloc[0]
    total_return = (final_value / initial_value - 1) * 100
    
    # Sharpe ratio (annualized)
    sharpe = returns.mean() / returns.std() * np.sqrt(252) if returns.std() > 0 else 0
    
    # Max drawdown
    cum_returns = (1 + returns).cumprod()
    running_max = cum_returns.expanding().max()
    drawdown = (cum_returns - running_max) / running_max
    max_dd = drawdown.min() * 100
    
    # Win rate
    winning_days = (returns > 0).sum()
    total_days = len(returns[returns != 0])
    win_rate = (winning_days / total_days * 100) if total_days > 0 else 0
    
    log_to_flightlog('BACKTEST SUMMARY:', level='INFO')
    log_to_flightlog(f'  Period: {perf.index[0].date()} to {perf.index[-1].date()}', level='INFO')
    log_to_flightlog(f'  Trading Days: {len(perf)}', level='INFO')
    log_to_flightlog(f'  Rebalances: {context.rebalance_count}', level='INFO')
    log_to_flightlog('', level='INFO')
    log_to_flightlog('PERFORMANCE:', level='INFO')
    log_to_flightlog(f'  Initial Value: ${initial_value:,.2f}', level='INFO')
    log_to_flightlog(f'  Final Value: ${final_value:,.2f}', level='INFO')
    log_to_flightlog(f'  Total Return: {total_return:.2f}%', level='INFO')
    log_to_flightlog(f'  Sharpe Ratio: {sharpe:.2f}', level='INFO')
    log_to_flightlog(f'  Max Drawdown: {max_dd:.2f}%', level='INFO')
    log_to_flightlog(f'  Win Rate: {win_rate:.1f}%', level='INFO')
    log_to_flightlog('='*80, level='INFO')
    
    return perf

print("✓ Strategy functions defined")

## Run Backtest with Auto Loader

Notice how simple this is - just one line for the loader setup!

In [None]:
# Backtest parameters
START = pd.Timestamp('2023-01-01')
END = pd.Timestamp('2024-11-01')
CAPITAL = 100000

print(f"Running backtest from {START.date()} to {END.date()}...")
print(f"Starting capital: ${CAPITAL:,.2f}")
print(f"Bundle: sharadar")
print(f"Custom DB: fundamentals")
print()

# Run backtest - notice the simple one-line loader setup!
results = run_algorithm(
    start=START,
    end=END,
    initialize=initialize,
    before_trading_start=before_trading_start,
    handle_data=handle_data,
    analyze=analyze,
    capital_base=CAPITAL,
    bundle='sharadar',
    custom_loader=ms.setup_auto_loader(),  # That's it! Auto-detects everything
)

print("\n✅ Backtest complete!")
print(f"Final portfolio value: ${results['portfolio_value'].iloc[-1]:,.2f}")

## Quick Performance Summary
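
The metrics in this section follow standard definitions. As a rough sketch of the arithmetic, here they are computed on a handful of made-up daily returns using only the standard library. The return values and the `cagr` helper are hypothetical; the risk-free rate is assumed to be zero and a year is assumed to have 252 trading days, as in the notebook code.

```python
import math
import statistics
from itertools import accumulate

# Hypothetical daily returns, for illustration only.
daily_returns = [0.10, -0.50, 0.20, 0.25]

# Annualized Sharpe ratio: mean over sample standard deviation, scaled by sqrt(252).
sharpe = statistics.mean(daily_returns) / statistics.stdev(daily_returns) * math.sqrt(252)

# Cumulative growth of 1 unit of capital after each day.
cumulative = list(accumulate(daily_returns, lambda equity, r: equity * (1.0 + r), initial=1.0))[1:]

# Max drawdown: the worst decline from the running peak of the cumulative curve.
running_max = list(accumulate(cumulative, max))
drawdowns = [c / peak - 1.0 for c, peak in zip(cumulative, running_max)]
max_drawdown = min(drawdowns)


def cagr(initial_value, final_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (final_value / initial_value) ** (1.0 / years) - 1.0


print(f"Sharpe: {sharpe:.2f}, max drawdown: {max_drawdown:.1%}")
print(f"CAGR: {cagr(100_000, 121_000, 2.0):.1%}")
```

In the toy series the curve peaks at 1.10 after the first day and falls to 0.55 on the second, which gives a maximum drawdown of 50%.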

In [None]:
# Calculate key metrics
returns = results['returns']
final_value = results['portfolio_value'].iloc[-1]
initial_value = results['portfolio_value'].iloc[0]
total_return = (final_value / initial_value - 1) * 100

# Sharpe ratio
sharpe = returns.mean() / returns.std() * np.sqrt(252) if returns.std() > 0 else 0

# Max drawdown
cum_returns = (1 + returns).cumprod()
running_max = cum_returns.expanding().max()
drawdown = (cum_returns - running_max) / running_max
max_dd = drawdown.min() * 100

# Volatility
volatility = returns.std() * np.sqrt(252) * 100

# CAGR
days = (results.index[-1] - results.index[0]).days
years = days / 365.25
cagr = ((final_value / initial_value) ** (1 / years) - 1) * 100

print("\n" + "="*60)
print("PERFORMANCE SUMMARY")
print("="*60)
print(f"Period: {results.index[0].date()} to {results.index[-1].date()}")
print(f"Days: {len(results)} trading days ({years:.2f} years)")
print()
print(f"Initial Value: ${initial_value:,.2f}")
print(f"Final Value: ${final_value:,.2f}")
print(f"Total Return: {total_return:.2f}%")
print(f"CAGR: {cagr:.2f}%")
print()
print(f"Sharpe Ratio: {sharpe:.2f}")
print(f"Max Drawdown: {max_dd:.2f}%")
print(f"Volatility: {volatility:.2f}%")
print("="*60)

## Basic Visualizations

In [None]:
# Portfolio value and drawdown
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))

# Portfolio value
results['portfolio_value'].plot(ax=ax1, label='Portfolio Value', linewidth=2)
ax1.set_ylabel('Portfolio Value ($)', fontsize=12)
ax1.set_title('Multi-Source Consensus Quality Strategy - Portfolio Value', fontsize=14, fontweight='bold')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))

# Drawdown
(drawdown * 100).plot(ax=ax2, label='Drawdown', color='red', alpha=0.7, linewidth=2)
ax2.fill_between(drawdown.index, 0, drawdown * 100, color='red', alpha=0.3)
ax2.set_ylabel('Drawdown (%)', fontsize=12)
ax2.set_xlabel('Date', fontsize=12)
ax2.set_title('Drawdown Over Time', fontsize=14, fontweight='bold')
ax2.legend(loc='lower left')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Prepare Data for Pyfolio Analysis
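
Pyfolio takes positions as a wide table: one row per day, one column per asset holding its dollar value, plus a `cash` column. The sketch below shows that reshaping on made-up records using plain dictionaries. The record layout is a simplified assumption (the real backtest output stores an asset object under `sid` rather than a `symbol` string), and `to_wide_positions` is a hypothetical helper, not part of pyfolio or zipline.

```python
# Hypothetical daily records, loosely shaped like the 'cash' and 'positions'
# columns of the backtest results; symbols and numbers are made up.
daily_records = [
    {"date": "2023-01-03", "cash": 100000.0, "positions": []},
    {"date": "2023-01-04", "cash": 50000.0, "positions": [
        {"symbol": "AAA", "amount": 100, "last_sale_price": 250.0},
        {"symbol": "BBB", "amount": 50, "last_sale_price": 500.0},
    ]},
]


def to_wide_positions(records):
    """Return {date: {column: dollar value}} with one column per symbol plus 'cash'."""
    columns = {"cash"}
    for rec in records:
        columns.update(p["symbol"] for p in rec["positions"])

    wide = {}
    for rec in records:
        row = dict.fromkeys(columns, 0.0)  # assets not held that day are 0
        row["cash"] = rec["cash"]
        for p in rec["positions"]:
            row[p["symbol"]] = p["amount"] * p["last_sale_price"]
        wide[rec["date"]] = row
    return wide


wide = to_wide_positions(daily_records)
for date, row in wide.items():
    print(date, row)
```

Filling missing assets with 0 plays the same role as the `fillna(0)` call in the notebook code: every day ends up with the same set of columns.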

In [None]:
# Import pyfolio
try:
    import pyfolio as pf
    PYFOLIO_AVAILABLE = True
    print("✓ Pyfolio imported successfully")
except ImportError:
    PYFOLIO_AVAILABLE = False
    print("⚠️  Pyfolio not installed. Install with: pip install pyfolio-reloaded")
    print("   Skipping pyfolio analysis...")

In [None]:
if PYFOLIO_AVAILABLE:
    # Extract returns (required)
    returns = results['returns']
    
    # Extract positions (optional but recommended)
    positions_data = []
    
    for date, row in results.iterrows():
        pos_dict = {}
        
        # Get cash
        if 'cash' in results.columns:
            pos_dict['cash'] = row['cash']
        
        # Get stock positions
        if row['positions']:
            for pos in row['positions']:
                sid = pos['sid']
                amount = pos['amount']
                last_sale_price = pos['last_sale_price']
                
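                # Fall back to str(sid) so no position is silently dropped
                # (same convention as the transactions loop below)
                symbol = sid.symbol if hasattr(sid, 'symbol') else str(sid)
                pos_dict[symbol] = amount * last_sale_price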
        
        if pos_dict:
            positions_data.append((date, pos_dict))
    
    if positions_data:
        positions = pd.DataFrame([p[1] for p in positions_data],
                                index=[p[0] for p in positions_data])
        positions = positions.fillna(0)
    else:
        positions = None
    
    # Extract transactions (optional)
    # Collect each transaction's date alongside its record so the index
    # is built in the same pass instead of re-iterating over the results.
    transactions_list = []
    transaction_dates = []
    for date, row in results.iterrows():
        if row['transactions']:
            for txn in row['transactions']:
                sid = txn['sid']
                symbol = sid.symbol if hasattr(sid, 'symbol') else str(sid)
                
                transaction_dates.append(date)
                transactions_list.append({
                    'symbol': symbol,
                    'amount': txn['amount'],
                    'price': txn['price'],
                    'value': txn['amount'] * txn['price'],
                })
    
    if transactions_list:
        transactions = pd.DataFrame(transactions_list, index=transaction_dates)
    else:
        transactions = None
    
    print("✓ Data prepared for pyfolio")
    print(f"  Returns: {len(returns)} days")
    if positions is not None:
        print(f"  Positions: {len(positions)} days, {len(positions.columns)} assets")
    print(f"  Transactions: {len(transactions) if transactions is not None else 0} trades")

## Generate Pyfolio Tearsheet

This creates a comprehensive analysis including:
- Summary statistics
- Worst drawdown periods
- Rolling metrics (Sharpe, volatility)
- Monthly/yearly returns heatmap
- Return distribution plots
- Position analysis
- Transaction analysis

In [None]:
if PYFOLIO_AVAILABLE:
    # Create full tearsheet
    pf.create_full_tear_sheet(
        returns,
        positions=positions,
        transactions=transactions,
        live_start_date=None,
        round_trips=False,
        estimate_intraday=False,
    )
else:
    print("Pyfolio not available - skipping tearsheet generation")

## Additional Pyfolio Metrics

In [None]:
if PYFOLIO_AVAILABLE:
    # Calculate detailed metrics
    annual_return = pf.timeseries.annual_return(returns)
    sharpe_ratio = pf.timeseries.sharpe_ratio(returns)
    max_drawdown = pf.timeseries.max_drawdown(returns)
    sortino_ratio = pf.timeseries.sortino_ratio(returns)
    calmar_ratio = pf.timeseries.calmar_ratio(returns)
    volatility = pf.timeseries.annual_volatility(returns)
    
    print("\n" + "="*60)
    print("DETAILED PYFOLIO METRICS")
    print("="*60)
    print(f"Annual Return: {annual_return*100:.2f}%")
    print(f"Annual Volatility: {volatility*100:.2f}%")
    print(f"Sharpe Ratio: {sharpe_ratio:.2f}")
    print(f"Sortino Ratio: {sortino_ratio:.2f}")
    print(f"Calmar Ratio: {calmar_ratio:.2f}")
    print(f"Max Drawdown: {max_drawdown*100:.2f}%")
    print("="*60)
    
    # Cumulative returns
    cum_returns = pf.timeseries.cum_returns(returns)
    total_return = cum_returns.iloc[-1]
    print(f"\nTotal Return: {total_return*100:.2f}%")

## Export Results

In [None]:
# Uncomment to save results
# results.to_csv('multi_source_backtest_results.csv')
# returns.to_csv('multi_source_returns.csv')
# if positions is not None:
#     positions.to_csv('multi_source_positions.csv')
# if transactions is not None:
#     transactions.to_csv('multi_source_transactions.csv')

print("To save results, uncomment the lines above.")

## Summary

### What We Accomplished

1. **Multi-Source Integration:** Successfully combined Sharadar and LSEG fundamentals in one pipeline
2. **Simple Setup:** Used centralized `multi_source` module for clean imports
3. **Auto Loader:** One-line setup with `ms.setup_auto_loader()`
4. **FlightLog Monitoring:** Real-time log streaming during backtest
5. **Comprehensive Analysis:** Pyfolio tearsheet with detailed metrics

### Key Features

- **Consensus Scoring:** Higher confidence when multiple sources agree
- **Quality Filters:** ROE > 15% (both sources), FCF > 0, 0 < PE < 30, 0 < PEG < 2.5
- **Weekly Rebalancing:** Systematic position updates
- **Equal Weighting:** Simple, transparent allocation

### Next Steps

1. **Experiment with parameters:**
   - Universe size (50, 100, 200)
   - Selection size (3, 5, 10)
   - Quality thresholds
   - Rebalancing frequency

2. **Add more factors:**
   - Momentum
   - Value metrics
   - Technical indicators

3. **Compare strategies:**
   - Sharadar only
   - LSEG only
   - Consensus (both sources)

4. **Risk management:**
   - Position sizing
   - Stop losses
   - Volatility targeting
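
For the parameter experiments in step 1, one possible way to enumerate the combinations is sketched below. It only builds the grid: the keys match the `universe_size` and `selection_size` arguments of `make_pipeline` in this notebook, but how each combination is wired into `initialize` and backtested is left as an assumption.

```python
from itertools import product

universe_sizes = [50, 100, 200]
selection_sizes = [3, 5, 10]

# One dict per combination, keyed by make_pipeline's argument names.
grid = [
    {"universe_size": u, "selection_size": s}
    for u, s in product(universe_sizes, selection_sizes)
]

for params in grid:
    # In the notebook, each combination would be passed as
    # make_pipeline(**params) inside initialize() and then backtested.
    print(params)
```

Keeping the grid as a list of keyword dictionaries makes it straightforward to store each backtest's results next to the parameters that produced them.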

### Resources

- **Documentation:** `docs/MULTI_SOURCE_DATA.md`
- **Quick Reference:** `docs/MULTI_SOURCE_QUICKREF.md`
- **Examples:** `examples/custom_data/simple_multi_source_example.py`