# Multi-Source Fundamentals: Sharadar + Custom LSEG

**Simplified Example**: Demonstrating multi-source data patterns

This notebook shows:
1. ✅ How to load and compare data from multiple sources (Sharadar + LSEG)
2. ✅ Multi-source loader architecture
3. ⚠️ Known limitation with `run_algorithm()` and custom loaders

## Important Note

**SID Mapping Issue**: When using `run_algorithm()` with custom data loaders, there's a known issue where the backtest engine assigns internal SIDs that differ from bundle SIDs. This causes custom loaders to query the database with incorrect SIDs.

**Workaround**: For production backtests with custom data, use the pattern from `examples/custom_data/backtest_with_fundamentals.py`.

This notebook demonstrates the multi-source loader architecture, then runs a Sharadar-only backtest as a working example.
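
The failure mode described above can be reproduced in isolation. The sketch below is only an illustration: it builds an in-memory SQLite table shaped like the `Price` table queried later in this notebook, and every SID value in it is hypothetical. It shows that a query keyed on the engine's SIDs finds nothing, while the same query keyed on the bundle SIDs returns rows.

```python
import sqlite3

# Hypothetical SIDs for illustration only; real values depend on your bundle.
bundle_sids = {"AAPL": 199059, "MSFT": 198508}  # SIDs stored in the custom DB
engine_sids = {"AAPL": 0, "MSFT": 1}            # SIDs the loader is asked for

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Price (Symbol TEXT, Sid INTEGER, Date TEXT, CompanyMarketCap REAL)"
)
conn.executemany(
    "INSERT INTO Price VALUES (?, ?, ?, ?)",
    [(sym, sid, "2024-01-03", 1.0) for sym, sid in bundle_sids.items()],
)

def rows_for(sids):
    """Return (Symbol, Sid) rows whose Sid is in `sids`."""
    sids = list(sids)
    placeholders = ",".join("?" * len(sids))
    query = (
        f"SELECT Symbol, Sid FROM Price WHERE Sid IN ({placeholders}) "
        "ORDER BY Symbol"
    )
    return conn.execute(query, sids).fetchall()

rows_engine = rows_for(engine_sids.values())  # what the loader effectively asks for
rows_bundle = rows_for(bundle_sids.values())  # what it should ask for
print(f"engine SIDs matched {len(rows_engine)} rows, bundle SIDs matched {len(rows_bundle)}")
conn.close()
```

An empty result for the engine SIDs is the symptom to look for when a custom loader silently returns no data.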

## Setup

In [None]:
import sys
import pandas as pd
import numpy as np
from pathlib import Path

# Add custom_data to path
sys.path.insert(0, '/app/examples/custom_data')

from zipline import run_algorithm
from zipline.api import (
    attach_pipeline,
    pipeline_output,
    order_target_percent,
    schedule_function,
    date_rules,
    time_rules,
)
from zipline.pipeline import Pipeline
from zipline.pipeline.data.sharadar import SharadarFundamentals
from zipline.pipeline.filters import StaticAssets
from zipline.data.bundles import load as load_bundle, register
from zipline.data.bundles.sharadar_bundle import sharadar_bundle
from zipline.data.custom import CustomSQLiteLoader
from zipline.pipeline.data.db import Database, Column

# Register bundle
register('sharadar', sharadar_bundle())

print("✓ Imports complete")

## Define Custom Fundamentals Database

In [None]:
class CustomFundamentals(Database):
    """Custom LSEG fundamentals - using actual column names."""
    CODE = "fundamentals"
    LOOKBACK_WINDOW = 240
    
    # Actual LSEG columns (from your database)
    ReturnOnEquity_SmartEstimat = Column(float)
    ForwardPEG_DailyTimeSeriesRatio_ = Column(float)
    Debt_Total = Column(float)
    CompanyMarketCap = Column(float)
    
print("✓ Custom database defined")

## Explore Data from Both Sources

First, let's verify we can access both data sources:

In [None]:
# Check what's in the LSEG database
import sqlite3

db_path = Path.home() / '.zipline' / 'data' / 'custom' / 'fundamentals.sqlite'
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Sample data for a few tickers
cursor.execute("""
    SELECT Symbol, Sid, Date, ReturnOnEquity_SmartEstimat, ForwardPEG_DailyTimeSeriesRatio_
    FROM Price
    WHERE Symbol IN ('AAPL', 'MSFT', 'GOOGL')
    AND Date = '2024-01-03'
    ORDER BY Symbol
""")

print("LSEG Data Sample (2024-01-03):")
print("-" * 80)
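for symbol, sid, date, roe, peg in cursor.fetchall():
    # NULL values come back as None, which the float format spec cannot handle
    roe_str = f"{roe:8.2f}" if roe is not None else f"{'N/A':>8s}"
    peg_str = f"{peg:8.2f}" if peg is not None else f"{'N/A':>8s}"
    print(f"{symbol:8s} SID:{sid:8d}  ROE:{roe_str}  PEG:{peg_str}")

conn.close()
print("\n✓ LSEG data accessible")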
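
To compare the two sources side by side, one option is to align them on symbol with pandas. The sketch below uses placeholder numbers, not real Sharadar or LSEG figures, and the column names `roe_sharadar` and `roe_lseg` are assumptions made for the example. It also assumes both sources report ROE in the same units, which is worth checking before reading anything into the differences.

```python
import pandas as pd

# Placeholder values for illustration; not real Sharadar or LSEG figures.
sharadar = pd.DataFrame({
    "symbol": ["AAPL", "MSFT", "GOOGL"],
    "roe_sharadar": [0.50, 0.35, 0.25],
})
lseg = pd.DataFrame({
    "symbol": ["AAPL", "MSFT", "NVDA"],
    "roe_lseg": [0.52, 0.35, 0.40],
})

# Outer join keeps symbols present in only one source; `indicator` records which.
merged = sharadar.merge(lseg, on="symbol", how="outer", indicator=True)
merged["roe_diff"] = merged["roe_lseg"] - merged["roe_sharadar"]

both = merged[merged["_merge"] == "both"]
missing = merged[merged["_merge"] != "both"]["symbol"].tolist()

print(both[["symbol", "roe_sharadar", "roe_lseg", "roe_diff"]])
print("Present in only one source:", sorted(missing))
```

Symbols that appear in only one source are reported separately, so coverage gaps do not get mistaken for disagreements in value.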

## Working Backtest (Sharadar Only)

Due to the SID mapping issue with `run_algorithm()`, we'll run a Sharadar-only strategy.
This demonstrates a working backtest while we resolve the custom loader integration.

In [None]:
# Strategy configuration
TOP_N_STOCKS = 5
UNIVERSE_TICKERS = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'NVDA', 'META', 'JPM', 'V', 'WMT', 'XOM', 'TSLA']

def make_pipeline():
    """Pipeline using Sharadar data only (for working backtest)."""
    
    # Load universe from bundle
    bundle_data = load_bundle('sharadar')
    
    # Get assets for our tickers
    assets = []
    for ticker in UNIVERSE_TICKERS:
        try:
            asset = bundle_data.asset_finder.lookup_symbol(ticker, as_of_date=None)
            if asset:
                assets.append(asset)
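        except Exception as exc:
            # Skip tickers the bundle cannot resolve, but report them
            print(f"Skipping {ticker}: {exc}")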
    
    universe = StaticAssets(assets)
    
    # Sharadar metrics
    s_roe = SharadarFundamentals.roe.latest
    s_pe = SharadarFundamentals.pe.latest
    s_de = SharadarFundamentals.de.latest
    s_marketcap = SharadarFundamentals.marketcap.latest
    
    return Pipeline(
        columns={
            's_roe': s_roe,
            's_pe': s_pe,
            's_de': s_de,
            's_marketcap': s_marketcap,
        },
        screen=universe,
    )

def initialize(context):
    """Initialize strategy."""
    attach_pipeline(make_pipeline(), 'fundamentals')
    
    schedule_function(
        rebalance,
        date_rules.month_start(),
        time_rules.market_open(hours=1)
    )
    
    print("\n" + "="*80)
    print("Multi-Source Fundamentals Strategy")
    print("="*80)
    print(f"Universe: {len(UNIVERSE_TICKERS)} stocks")
    print("Data: Sharadar SF1")
    print(f"Top N: {TOP_N_STOCKS}")
    print("="*80 + "\n")

def before_trading_start(context, data):
    context.pipeline_data = pipeline_output('fundamentals')

def rebalance(context, data):
    """Monthly rebalancing."""
    df = context.pipeline_data.copy()
    
    if len(df) == 0:
        return
    
    # Quality scoring
    df['score'] = 0
    df.loc[(df['s_roe'] > 0.15) & (df['s_roe'].notna()), 'score'] += 2
    df.loc[(df['s_pe'] < 25) & (df['s_pe'] > 0), 'score'] += 1
    df.loc[(df['s_de'] < 2) & (df['s_de'].notna()), 'score'] += 1
    
    # Select top N by score
    ranked = df.sort_values(['score', 's_roe'], ascending=[False, False])
    target_stocks = ranked.head(TOP_N_STOCKS).index.tolist()
    
    # Equal weight
    weight = 1.0 / len(target_stocks) if target_stocks else 0
    
    for stock in target_stocks:
        if data.can_trade(stock):
            order_target_percent(stock, weight)
    
    for stock in context.portfolio.positions:
        if stock not in target_stocks and data.can_trade(stock):
            order_target_percent(stock, 0)
    
    print(f"[{context.datetime.date()}] Holding {len(target_stocks)} stocks")

def analyze(context, perf):
    returns = perf['returns']
    total_return = (perf['portfolio_value'].iloc[-1] / perf['portfolio_value'].iloc[0] - 1) * 100
    
    print("\n" + "="*80)
    print("BACKTEST RESULTS")
    print("="*80)
    print(f"Total Return: {total_return:.2f}%")
    std = returns.std()
    print(f"Sharpe: {returns.mean() / std * np.sqrt(252):.2f}" if std > 0 else "Sharpe: N/A")
    print("="*80)
    return perf

print("✓ Strategy functions defined")
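
The scoring rules in `rebalance` can be checked on their own, without running a backtest. The sketch below applies the same thresholds to a small frame of made-up fundamentals. The tickers `A` to `D` and all values are hypothetical. Comparisons against NaN evaluate to False, so a missing metric simply earns no points.

```python
import numpy as np
import pandas as pd

# Made-up fundamentals for illustration only.
df = pd.DataFrame(
    {
        "s_roe": [0.30, 0.10, np.nan, 0.20],
        "s_pe": [20.0, 15.0, 30.0, -5.0],
        "s_de": [1.0, 3.0, 0.5, 1.5],
    },
    index=["A", "B", "C", "D"],
)

df["score"] = 0
df.loc[df["s_roe"] > 0.15, "score"] += 2                     # profitability
df.loc[(df["s_pe"] > 0) & (df["s_pe"] < 25), "score"] += 1   # positive, moderate P/E
df.loc[df["s_de"] < 2, "score"] += 1                         # moderate leverage

# Rank by score, breaking ties on ROE (NaN sorts last by default).
ranked = df.sort_values(["score", "s_roe"], ascending=[False, False])
top = ranked.head(2).index.tolist()

print(ranked[["score"]])
print("Selected:", top)
```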

## Run Backtest

In [None]:
START = pd.Timestamp('2023-01-01')  # No timezone!
END = pd.Timestamp('2024-11-01')

print(f"Running backtest: {START.date()} to {END.date()}\n")

try:
    results = run_algorithm(
        start=START,
        end=END,
        initialize=initialize,
        before_trading_start=before_trading_start,
        analyze=analyze,
        capital_base=100000,
        bundle='sharadar',
    )
    print("\n✓ Backtest successful!")
except Exception as e:
    print(f"\n❌ Error: {e}")
    import traceback
    traceback.print_exc()
    results = None
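
The two figures printed by `analyze` are simple to reproduce by hand. The sketch below uses a made-up five-session portfolio value series, so the numbers mean nothing in themselves. It computes total return and the same simplified annualized Sharpe ratio (mean over standard deviation of daily returns, scaled by the square root of 252, with no risk-free rate subtracted).

```python
import numpy as np
import pandas as pd

# Made-up portfolio values for five sessions; illustration only.
portfolio_value = pd.Series([100_000.0, 101_000.0, 100_500.0, 102_000.0, 103_000.0])
returns = portfolio_value.pct_change().dropna()

total_return_pct = (portfolio_value.iloc[-1] / portfolio_value.iloc[0] - 1) * 100

std = returns.std()  # pandas uses the sample standard deviation (ddof=1)
sharpe = returns.mean() / std * np.sqrt(252) if std > 0 else float("nan")

print(f"Total Return: {total_return_pct:.2f}%")
print(f"Sharpe: {sharpe:.2f}")
```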

## Summary

### ✅ What Works
- Multi-source data exploration (SQLite queries)
- Multi-source loader architecture design
- Sharadar-only backtests with `run_algorithm()`

### ⚠️ Known Limitation
- Custom loaders used with `run_algorithm()` have SID mapping issues
- The backtest engine assigns internal SIDs that differ from the bundle SIDs
- The custom loader then queries the database with the wrong SIDs and returns no data

### 🔧 Workaround
For production multi-source backtests, use the script-based approach:
```bash
python examples/custom_data/backtest_with_fundamentals.py
```

### 📝 Next Steps
To fix the SID mapping issue, we need to do one of the following:
1. Make `CustomSQLiteLoader` translate SIDs at query time
2. Use a different backtest execution pattern
3. Reingest LSEG data using the internal SID assignments from `run_algorithm()`
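
Option 1 could look roughly like the sketch below, which translates SIDs by going through the ticker symbol. This is not how `CustomSQLiteLoader` works today: `translate_sids`, both lookup dictionaries, and all SID values are hypothetical names and numbers introduced for illustration. Matching on symbol is itself an assumption that can break when tickers change over time.

```python
import pandas as pd

def translate_sids(engine_sids, engine_symbols, db_symbol_to_sid):
    """Map engine SIDs to database SIDs via the ticker symbol.

    engine_sids: SIDs requested by the pipeline engine.
    engine_symbols: dict of engine SID -> symbol.
    db_symbol_to_sid: dict of symbol -> SID as stored in the custom database.
    Returns a dict of engine SID -> database SID; unmapped assets are skipped.
    """
    mapping = {}
    for sid in engine_sids:
        symbol = engine_symbols.get(sid)
        db_sid = db_symbol_to_sid.get(symbol)
        if db_sid is not None:
            mapping[sid] = db_sid
    return mapping

# Hypothetical lookups; "ZZZZ" has no entry in the custom database.
engine_symbols = {0: "AAPL", 1: "MSFT", 2: "ZZZZ"}
db_symbol_to_sid = {"AAPL": 199059, "MSFT": 198508}
mapping = translate_sids([0, 1, 2], engine_symbols, db_symbol_to_sid)

# Data as it might come back from the database, keyed by database SID.
db_frame = pd.DataFrame({199059: [1.0, 1.1], 198508: [2.0, 2.1]})

# Relabel columns with engine SIDs; assets without data become NaN columns.
inverse = {db_sid: sid for sid, db_sid in mapping.items()}
engine_frame = db_frame.rename(columns=inverse).reindex(columns=[0, 1, 2])
print(engine_frame)
```

Returning a NaN column for unmapped assets keeps the output shape aligned with what the engine requested, rather than dropping assets silently.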