# Pipeline API - Factor-Based Stock Screening

Zipline's Pipeline API is a powerful framework for:
- Defining reusable factors (momentum, value, quality, etc.)
- Screening large universes of stocks
- Building quantitative trading strategies
- Combining multiple factors

This notebook demonstrates how to use Pipeline to build a multi-factor strategy.

## Setup

In [None]:
# Register Sharadar bundle (required for Jupyter notebooks)
from zipline.data.bundles import register
from zipline.data.bundles.sharadar_bundle import sharadar_bundle

register('sharadar', sharadar_bundle(tickers=None, incremental=True, include_funds=True))
print("✓ Sharadar bundle registered")

In [None]:
import logging
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from zipline import run_algorithm
from zipline.api import (
    attach_pipeline,
    pipeline_output,
    order_target_percent,
    record,
    schedule_function,
    date_rules,
    time_rules,
)
from zipline.pipeline import Pipeline, CustomFactor
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.factors import (
    SimpleMovingAverage,
    Returns,
    AverageDollarVolume,
)
from zipline.pipeline.filters import StaticAssets
from zipline.utils.progress import enable_progress_logging

# Enable logging
logging.basicConfig(level=logging.INFO, force=True)
enable_progress_logging(algo_name='Pipeline-Demo', update_interval=20)

## Define Custom Factors

Create reusable factors for stock selection.

In [None]:
class Momentum(CustomFactor):
    """
    Price momentum factor.
    Fractional change in closing price over the lookback window (0.05 = +5%).
    """
    inputs = [USEquityPricing.close]
    window_length = 60  # 60-day momentum
    
    def compute(self, today, assets, out, close):
        out[:] = (close[-1] - close[0]) / close[0]


class Volatility(CustomFactor):
    """
    Price volatility factor.
    Standard deviation of daily simple returns over the lookback window (not annualized).
    """
    inputs = [USEquityPricing.close]
    window_length = 20  # 20-day volatility
    
    def compute(self, today, assets, out, close):
        # Calculate daily returns
        returns = np.diff(close, axis=0) / close[:-1]
        # Calculate standard deviation
        out[:] = np.std(returns, axis=0)


print("✓ Custom factors defined")
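
Inside `compute`, each input arrives as a 2-D array with one row per day in the window (oldest first, so `close[-1]` is the most recent close) and one column per asset. The sketch below replays the same arithmetic on a small hand-made price array with plain NumPy, so you can check what each factor returns without loading a bundle. The `momentum` and `volatility` functions are illustrative helpers, not Zipline APIs.

```python
import numpy as np

def momentum(close: np.ndarray) -> np.ndarray:
    """Fractional price change from the first to the last row, per asset (column)."""
    return (close[-1] - close[0]) / close[0]

def volatility(close: np.ndarray) -> np.ndarray:
    """Population standard deviation of daily simple returns, per asset."""
    returns = np.diff(close, axis=0) / close[:-1]
    return np.std(returns, axis=0)

# 4 days (rows, oldest first) x 2 assets (columns)
close = np.array([
    [100.0, 50.0],
    [102.0, 50.0],
    [104.0, 50.0],
    [110.0, 50.0],
])

print(momentum(close))    # asset 0 gained 10%, asset 1 is flat
print(volatility(close))  # asset 1 never moves, so its volatility is 0
```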

## Build Pipeline

Combine factors to screen stocks.

In [None]:
def make_pipeline():
    """
    Create a pipeline that:
    1. Calculates 60-day momentum and 20-day volatility
    2. Filters for liquid stocks (top 500 by 30-day average dollar volume)
    3. Ranks those stocks by a combined score (momentum z-score minus volatility z-score)
    4. Selects the top 5 high-momentum, low-volatility stocks
    """
    # Define universe: liquid stocks (top 500 by dollar volume)
    dollar_volume = AverageDollarVolume(window_length=30)
    liquid_stocks = dollar_volume.top(500)
    
    # Calculate factors
    momentum = Momentum()
    volatility = Volatility()
    
    # Combine factors:
    # Score = High momentum + Low volatility
    # Normalize each factor to z-scores for fair comparison
    momentum_z = momentum.zscore(mask=liquid_stocks)
    volatility_z = volatility.zscore(mask=liquid_stocks)
    
    # Combined score (high momentum, low volatility)
    combined_score = momentum_z - volatility_z  # Subtract vol (want low vol)
    
    # Select top stocks
    top_stocks = combined_score.top(5, mask=liquid_stocks)
    
    # Create pipeline
    pipe = Pipeline(
        columns={
            'momentum': momentum,
            'volatility': volatility,
            'combined_score': combined_score,
        },
        screen=top_stocks
    )
    
    return pipe

print("✓ Pipeline defined")
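
Pipeline evaluates these expressions cross-sectionally: on each day, every term holds one value per asset. The pandas sketch below mimics one day of `make_pipeline()` on a toy table: keep the top N names by dollar volume, z-score both factors within that mask, subtract, and take the top k. It assumes the z-score uses the population standard deviation (ddof=0), which is our reading of Zipline's `zscore`; the column names and the `screen_one_day` helper are hypothetical.

```python
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    # Population std (ddof=0); assumed to mirror Zipline's Factor.zscore
    return (s - s.mean()) / s.std(ddof=0)

def screen_one_day(df: pd.DataFrame, universe_size: int, n_select: int) -> pd.DataFrame:
    """Apply the make_pipeline() logic to one day's cross-section (one row per asset)."""
    # Liquidity mask: top `universe_size` assets by average dollar volume
    liquid = df.nlargest(universe_size, "dollar_volume")
    # Normalize within the mask only, like zscore(mask=liquid_stocks)
    score = zscore(liquid["momentum"]) - zscore(liquid["volatility"])
    out = liquid.assign(combined_score=score)
    # Keep the n_select highest combined scores, like top(n, mask=liquid_stocks)
    return out.nlargest(n_select, "combined_score")

day = pd.DataFrame(
    {
        "dollar_volume": [9e8, 8e8, 7e8, 6e8, 1e5],
        "momentum": [0.20, 0.05, 0.15, -0.10, 0.90],
        "volatility": [0.010, 0.020, 0.030, 0.015, 0.050],
    },
    index=["AAA", "BBB", "CCC", "DDD", "TINY"],
)

print(screen_one_day(day, universe_size=4, n_select=2))
```

`TINY` has by far the highest momentum but never reaches the ranking step, because it fails the liquidity mask. That is the effect of passing `mask=liquid_stocks` to both `zscore` and `top`.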

## Strategy Implementation

Use pipeline output to trade.

In [None]:
def initialize(context):
    """
    Initialize strategy and attach pipeline.
    """
    # Attach pipeline
    pipe = make_pipeline()
    attach_pipeline(pipe, 'stock_screen')
    
    # Schedule rebalance monthly
    schedule_function(
        rebalance,
        date_rules.month_start(),
        time_rules.market_open(hours=1)
    )
    
    context.rebalance_count = 0
    context.output = None  # populated daily by before_trading_start
    
    logging.info("Pipeline strategy initialized")
    logging.info("  Rebalancing: Monthly")
    logging.info("  Universe: Top 500 liquid stocks")
    logging.info("  Selection: Top 5 by (momentum - volatility)")


def before_trading_start(context, data):
    """
    Called daily before market opens.
    Get pipeline output.
    """
    # Get pipeline output
    context.output = pipeline_output('stock_screen')


def rebalance(context, data):
    """
    Monthly rebalance based on pipeline output.
    """
    context.rebalance_count += 1
    
    # Get current pipeline output
    if context.output is None or context.output.empty:
        logging.warning("No stocks in pipeline output")
        return
    
    # Get selected stocks - FILTER OUT UNTRADEABLE ASSETS
    all_selected = context.output.index
    selected_stocks = [stock for stock in all_selected if data.can_trade(stock)]
    
    if len(selected_stocks) == 0:
        logging.warning("No tradeable stocks in pipeline output")
        return
    
    # Log if any stocks were filtered out
    if len(selected_stocks) < len(all_selected):
        filtered_out = [s.symbol for s in all_selected if s not in selected_stocks]
        logging.info(f"  Filtered out untradeable: {', '.join(filtered_out)}")
    
    # Equal weight
    target_weight = 1.0 / len(selected_stocks)
    
    # Get current positions
    current_positions = set(context.portfolio.positions.keys())
    
    # Sell stocks no longer in selection (only if tradeable)
    for stock in current_positions:
        if stock not in selected_stocks and data.can_trade(stock):
            order_target_percent(stock, 0.0)
    
    # Buy/rebalance selected stocks (already filtered for tradeability)
    for stock in selected_stocks:
        order_target_percent(stock, target_weight)
    
    # Log holdings
    holdings = [s.symbol for s in selected_stocks]
    logging.info(f"Rebalance #{context.rebalance_count}: {', '.join(holdings)}")
    
    # Log factor values (for tradeable stocks only)
    tradeable_output = context.output.loc[selected_stocks]
    if not tradeable_output.empty:
        logging.info(f"  Avg momentum: {tradeable_output['momentum'].mean():.2%}")
        logging.info(f"  Avg volatility: {tradeable_output['volatility'].mean():.4f}")


def handle_data(context, data):
    """
    Record daily metrics.
    """
    record(
        portfolio_value=context.portfolio.portfolio_value,
        num_positions=len(context.portfolio.positions),
        leverage=context.account.leverage,
    )
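
The order logic in `rebalance` reduces to a mapping from asset to target weight: equal weight across the tradeable selections, and zero for held names that dropped out (when they can be traded). The pure-Python sketch below isolates that mapping so it can be unit-tested without running a backtest. `compute_targets` is a hypothetical helper, and its `can_trade` callback stands in for `data.can_trade`.

```python
from typing import Callable, Dict, Iterable

def compute_targets(
    selected: Iterable[str],
    held: Iterable[str],
    can_trade: Callable[[str], bool],
) -> Dict[str, float]:
    """Return {asset: target_weight}, mirroring the rebalance() logic above."""
    tradeable = [a for a in selected if can_trade(a)]
    if not tradeable:
        return {}  # rebalance() logs a warning and places no orders in this case
    weight = 1.0 / len(tradeable)
    targets = {a: weight for a in tradeable}
    # Liquidate positions that left the selection, skipping untradeable ones
    for a in held:
        if a not in targets and can_trade(a):
            targets[a] = 0.0
    return targets

halted = {"CCC"}
targets = compute_targets(
    selected=["AAA", "BBB", "CCC"],
    held=["BBB", "OLD", "CCC"],
    can_trade=lambda a: a not in halted,
)
print(targets)  # {'AAA': 0.5, 'BBB': 0.5, 'OLD': 0.0}
```

A halted holding (`CCC` here) receives no order and keeps its current shares until it becomes tradeable again, which matches the behavior of `rebalance` above.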

## Run Backtest

In [None]:
# Run backtest
results = run_algorithm(
    start=pd.Timestamp('2020-01-01'),
    end=pd.Timestamp('2023-12-31'),
    initialize=initialize,
    before_trading_start=before_trading_start,
    handle_data=handle_data,
    capital_base=100000,
    data_frequency='daily',
    bundle='sharadar',
)

print("\n✓ Backtest complete!")

## Analyze Results

In [None]:
# Plot results
fig, axes = plt.subplots(3, 1, figsize=(14, 12), sharex=True)

# Portfolio value
axes[0].plot(results.index, results['portfolio_value'], linewidth=2)
axes[0].set_ylabel('Portfolio Value ($)', fontsize=12)
axes[0].set_title('Pipeline Strategy: Multi-Factor Stock Selection', fontsize=14, fontweight='bold')
axes[0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))
axes[0].grid(True, alpha=0.3)

# Number of positions
axes[1].plot(results.index, results['num_positions'], linewidth=2, color='orange')
axes[1].set_ylabel('Number of Positions', fontsize=12)
axes[1].set_ylim(0, 6)
axes[1].grid(True, alpha=0.3)

# Leverage
axes[2].plot(results.index, results['leverage'], linewidth=2, color='green')
axes[2].set_ylabel('Leverage', fontsize=12)
axes[2].set_xlabel('Date', fontsize=12)
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Performance metrics
total_return = results['algorithm_period_return'].iloc[-1] * 100  # cumulative return vs. capital_base
daily_returns = results['returns']  # Zipline's daily returns, including the first trading day
# Annualized Sharpe ratio, assuming a zero risk-free rate and 252 trading days per year
sharpe_ratio = (daily_returns.mean() / daily_returns.std()) * np.sqrt(252)
max_drawdown = ((results['portfolio_value'] / results['portfolio_value'].cummax()) - 1).min() * 100

print("\n" + "="*60)
print("PERFORMANCE SUMMARY")
print("="*60)
print(f"Total Return: {total_return:.2f}%")
print(f"Sharpe Ratio: {sharpe_ratio:.2f}")
print(f"Max Drawdown: {max_drawdown:.2f}%")
print(f"Final Portfolio Value: ${results['portfolio_value'].iloc[-1]:,.2f}")
print(f"Avg Positions: {results['num_positions'].mean():.1f}")
print("="*60)
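
The three summary statistics can be computed from a daily return series alone. The sketch below packages the same formulas into one function and checks them on a short synthetic series. It assumes 252 trading days per year and a zero risk-free rate, as the cell above does, and it counts the starting capital as the first peak for drawdown purposes. `summarize` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def summarize(daily_returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Total return, annualized Sharpe ratio (zero risk-free rate), max drawdown."""
    equity = (1 + daily_returns).cumprod()  # growth of $1 invested
    total_return = equity.iloc[-1] - 1
    sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(periods_per_year)
    # Running peak, counting the starting $1 as the first peak
    peak = equity.cummax().clip(lower=1.0)
    max_drawdown = (equity / peak - 1).min()
    return {"total_return": total_return, "sharpe": sharpe, "max_drawdown": max_drawdown}

# +10%, -20%, +5%: $1 -> 1.10 -> 0.88 -> 0.924
rets = pd.Series([0.10, -0.20, 0.05])
stats = summarize(rets)
print(stats)
```

With the backtest above, `summarize(results['returns'])` applies the same formulas to Zipline's daily return column.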

## Holdings Analysis

In [None]:
# Analyze which stocks were held most frequently
all_holdings = {}
for date, row in results.iterrows():
    if row['positions']:
        for position in row['positions']:
            symbol = position['sid'].symbol
            all_holdings[symbol] = all_holdings.get(symbol, 0) + 1

if all_holdings:
    holdings_df = pd.DataFrame([
        {'Symbol': symbol, 'Days Held': days, 'Frequency (%)': days / len(results) * 100}
        for symbol, days in sorted(all_holdings.items(), key=lambda x: x[1], reverse=True)
    ])
    
    print("\nMost Frequently Held Stocks (Top 10):")
    print("="*60)
    print(holdings_df.head(10).to_string(index=False))
else:
    print("\nNo holdings to analyze.")
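
Each entry of `results['positions']` is a list of per-position dicts whose `'sid'` key holds the asset, as the loop above relies on. The same tally can be written with `collections.Counter`. The sketch below runs on a tiny fake `positions` column and uses a hypothetical `FakeAsset` type in place of Zipline's equity objects, so it works without a backtest.

```python
from collections import Counter
from dataclasses import dataclass

import pandas as pd

@dataclass(frozen=True)
class FakeAsset:
    """Hypothetical stand-in for a Zipline equity; only .symbol is used."""
    symbol: str

def holding_frequency(positions: pd.Series) -> pd.DataFrame:
    """Count how many days each symbol appears in the daily positions lists."""
    counts = Counter(
        p["sid"].symbol
        for day_positions in positions
        for p in (day_positions or [])
    )
    n_days = len(positions)
    rows = [
        {"Symbol": sym, "Days Held": days, "Frequency (%)": 100 * days / n_days}
        for sym, days in counts.most_common()
    ]
    return pd.DataFrame(rows, columns=["Symbol", "Days Held", "Frequency (%)"])

a, b = FakeAsset("AAA"), FakeAsset("BBB")
positions = pd.Series([
    [],
    [{"sid": a, "amount": 10}],
    [{"sid": a, "amount": 10}, {"sid": b, "amount": 5}],
    [{"sid": a, "amount": 10}],
])
print(holding_frequency(positions).to_string(index=False))
```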

## Pipeline Testing (Standalone)

You can also run pipelines standalone to see what they output.

In [None]:
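# Example: run the pipeline outside a backtest for a single day
from zipline.data import bundles
from zipline.pipeline.engine import SimplePipelineEngine
from zipline.pipeline.loaders import USEquityPricingLoader

# Load the ingested bundle registered in the Setup cell
bundle_data = bundles.load('sharadar')

# Loader that serves split/dividend-adjusted OHLCV columns from the bundle
pricing_loader = USEquityPricingLoader.without_fx(
    bundle_data.equity_daily_bar_reader,
    bundle_data.adjustment_reader,
)

def get_loader(column):
    if column in USEquityPricing.columns:
        return pricing_loader
    raise ValueError(f"No PipelineLoader registered for column {column}")

engine = SimplePipelineEngine(get_loader, bundle_data.asset_finder)

# 2023-12-01 is a trading session; start and end are the same day
day = pd.Timestamp('2023-12-01')
screen_output = engine.run_pipeline(make_pipeline(), day, day)
print(screen_output)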

## Advanced Pipeline Techniques

**Custom Factors:**
- Value factors (P/E, P/B, dividend yield)
- Quality factors (ROE, ROA, profit margins)
- Technical factors (RSI, MACD, Bollinger Bands)
- Sentiment factors (news, social media)

**Combining Factors:**
- Weighted combinations
- Quantile-based selection
- Factor neutralization
- Multi-factor rankings

**Advanced Screens:**
- Sector/industry filters
- Market cap constraints
- Liquidity requirements
- Correlation/covariance filters
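
On a single day's cross-section, several of these combination techniques reduce to a few pandas operations. The sketch below shows a weighted combination of percentile ranks, quantile bucketing, and sector neutralization (demeaning within each sector) on made-up data. Inside Pipeline, the analogous building blocks are Factor methods such as `rank()`, `quantiles()`, and `demean(groupby=...)`; the weights, sector labels, and column names here are illustrative assumptions.

```python
import pandas as pd

day = pd.DataFrame(
    {
        "sector": ["Tech", "Tech", "Energy", "Energy", "Health", "Health"],
        "momentum": [0.30, 0.10, 0.05, -0.05, 0.12, 0.02],
        "value": [0.02, 0.04, 0.09, 0.07, 0.05, 0.06],  # e.g. an earnings yield
    },
    index=["T1", "T2", "E1", "E2", "H1", "H2"],
)

# 1. Weighted combination of percentile ranks (weights chosen arbitrarily)
weights = {"momentum": 0.7, "value": 0.3}
day["combo"] = sum(w * day[col].rank(pct=True) for col, w in weights.items())

# 2. Quantile-based selection: split into 3 buckets, keep the top one
day["bucket"] = pd.qcut(day["combo"], q=3, labels=False)  # 0 = lowest bucket
top_bucket = day[day["bucket"] == day["bucket"].max()]

# 3. Sector neutralization: subtract each sector's mean combined score
day["combo_neutral"] = day["combo"] - day.groupby("sector")["combo"].transform("mean")

print(day.round(3))
print("Top bucket:", list(top_bucket.index))
```

After step 3, every sector's neutralized scores sum to zero, so a selection based on `combo_neutral` picks the strongest names within each sector rather than loading up on whichever sector scores best overall.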

## Next Steps

1. **Add more factors** - Implement value, quality, or technical factors
2. **Tune weights** - Optimize factor combinations
3. **Test sectors** - Apply to specific industries
4. **Compare periods** - Test in different market regimes
5. **Add constraints** - Maximum position sizes, sector limits
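
As one way to approach the position-size constraint in step 5, a per-position cap can be applied to any set of target weights before calling `order_target_percent`. The sketch below uses a simple iterative scheme that is our own assumption rather than a Zipline feature: clip weights at the cap, hand the excess to the uncapped names in proportion to their weights, and repeat until nothing exceeds the cap. `cap_weights` is a hypothetical helper.

```python
from typing import Dict

def cap_weights(weights: Dict[str, float], max_weight: float, tol: float = 1e-12) -> Dict[str, float]:
    """Cap each weight at max_weight, redistributing the excess pro rata.

    Assumes non-negative weights summing to 1. Requires max_weight * len(weights) >= 1,
    otherwise the portfolio cannot stay fully invested under the cap.
    """
    if max_weight * len(weights) < 1 - tol:
        raise ValueError("cap too tight to stay fully invested")
    w = dict(weights)
    while True:
        over = {k for k, v in w.items() if v > max_weight + tol}
        if not over:
            return w
        excess = sum(w[k] - max_weight for k in over)
        for k in over:
            w[k] = max_weight
        # Only names strictly below the cap can absorb the excess
        free = {k: v for k, v in w.items() if v < max_weight - tol}
        free_total = sum(free.values())
        for k, v in free.items():
            w[k] = v + excess * v / free_total

targets = cap_weights({"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}, max_weight=0.4)
print(targets)
```

Each pass permanently caps at least one more name, so the loop ends after at most one iteration per asset. With the notebook's equal weighting across 5 names, a cap above 20% never binds; it matters once weights are derived from factor scores.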

**Resources:**
- Zipline Pipeline docs: https://zipline.ml4trading.io/pipeline.html
- Factor research: AQR, Research Affiliates papers
- Quantitative value/momentum strategies