# Alphalens Factor Analysis

This notebook demonstrates how to use Alphalens to analyze factor performance:
- Test if a factor predicts future returns
- Analyze factor returns across quantiles
- Measure information coefficient (IC)
- Evaluate factor turnover and decay

Alphalens is a widely used open-source library for quantitative factor research; `alphalens-reloaded` is the maintained fork used here.

## Prerequisites

```bash
pip install alphalens-reloaded
```

In [None]:
# Register Sharadar bundle (required for Jupyter notebooks)
from zipline.data.bundles import register
from zipline.data.bundles.sharadar_bundle import sharadar_bundle

register('sharadar', sharadar_bundle(tickers=None, incremental=True, include_funds=True))
print("✓ Sharadar bundle registered")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Import Alphalens
try:
    import alphalens as al
    print("✓ Alphalens imported")
except ImportError:
    print("⚠️  Alphalens not installed")
    print("   Install with: pip install alphalens-reloaded")
    raise

from zipline.pipeline import Pipeline, CustomFactor
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.factors import AverageDollarVolume
from zipline.data.bundles import load
from zipline.utils.calendar_utils import get_calendar
from zipline.pipeline.engine import SimplePipelineEngine
from zipline.pipeline.loaders import USEquityPricingLoader

plt.rcParams['figure.figsize'] = (14, 8)
sns.set_style('darkgrid')

print("✓ Imports complete")

## Setup Pipeline Environment

In [None]:
# Load bundle and create pipeline engine
bundle_data = load('sharadar')
trading_calendar = get_calendar('XNYS')

pricing_loader = USEquityPricingLoader.without_fx(
    bundle_data.equity_daily_bar_reader,
    bundle_data.adjustment_reader,
)

engine = SimplePipelineEngine(
    get_loader=lambda column: pricing_loader,
    asset_finder=bundle_data.asset_finder,
)

print("✓ Pipeline engine initialized")

## Define Research Factor: Momentum

We'll test a momentum factor to see if it predicts future returns.

In [None]:
class Momentum(CustomFactor):
    """
    Price momentum: fractional return over the lookback window.
    
    This is the alpha factor we want to test.
    """
    inputs = [USEquityPricing.close]
    window_length = 60  # compares the first and last close of a 60-day window
    
    def compute(self, today, assets, out, close):
        out[:] = (close[-1] - close[0]) / close[0]


def make_momentum_pipeline():
    """
    Create a pipeline that computes momentum for liquid stocks.
    """
    # Calculate momentum
    momentum = Momentum()
    
    # Universe: the 500 most liquid stocks by 30-day average dollar volume
    dollar_volume = AverageDollarVolume(window_length=30)
    universe = dollar_volume.top(500)
    
    return Pipeline(
        columns={
            'momentum': momentum,
        },
        screen=universe
    )

print("✓ Momentum factor defined")
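Inside `compute`, `close` is a 2-D array with one row per day (oldest first) and one column per asset, and the factor writes one value per asset into `out`. A plain NumPy sketch of the same arithmetic on a made-up 4-day window (the helper `momentum` and the prices are hypothetical):

```python
import numpy as np


def momentum(close: np.ndarray) -> np.ndarray:
    """Fractional return from the first to the last row of a price window.

    `close` has shape (window_length, n_assets): one row per day, oldest first,
    mirroring the array a zipline CustomFactor receives in compute().
    """
    return (close[-1] - close[0]) / close[0]


# Hypothetical 4-day window for 3 assets (rows = days, columns = assets)
close = np.array([
    [100.0, 50.0, 20.0],
    [102.0, 49.0, 21.0],
    [105.0, 48.0, 22.0],
    [110.0, 45.0, 22.0],
])
print(momentum(close))  # approximately [0.1, -0.1, 0.1]
```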

## Run Pipeline to Get Factor Data

Compute the momentum factor for all stocks over our research period.

In [None]:
# Define research period (need enough data for forward returns)
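# Snap the requested dates to valid XNYS trading sessions.
# exchange_calendars exposes every session as `.sessions` (older trading_calendars used `.all_sessions`)
all_sessions = trading_calendar.sessions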

desired_start = pd.Timestamp('2022-01-01')
desired_end = pd.Timestamp('2023-06-30')

# Get first trading day on or after desired start
start_date = all_sessions[all_sessions >= desired_start][0]
# Get last trading day on or before desired end
end_date = all_sessions[all_sessions <= desired_end][-1]

print(f"Running pipeline from {start_date.date()} to {end_date.date()}...")
print(f"  (Adjusted to valid trading days)")

pipeline = make_momentum_pipeline()
factor_data = engine.run_pipeline(pipeline, start_date, end_date)

print(f"\n✓ Pipeline complete")
print(f"  Observations: {len(factor_data):,}")
print(f"  Unique stocks: {factor_data.index.get_level_values(1).nunique()}")
print(f"  Trading days: {factor_data.index.get_level_values(0).nunique()}")

factor_data.head()

## Get Pricing Data for Alphalens

Alphalens needs pricing data to compute forward returns.

In [None]:
# Get all unique assets from factor data
assets = factor_data.index.get_level_values(1).unique()

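# Get pricing data (need extra days beyond end_date for forward returns).
# Use the 30th trading session after end_date, well past the longest forward period (10 days).
pricing_start = start_date
pricing_end = all_sessions[all_sessions > end_date][29]

# Load daily closes from the bundle. load_raw_arrays returns prices as stored in
# the bundle, without applying the bundle's split/dividend adjustments; if the
# bundle stores unadjusted prices, corporate actions can show up as spurious returns.
pricing_data = bundle_data.equity_daily_bar_reader.load_raw_arrays(
    columns=['close'],
    start_date=pricing_start,
    end_date=pricing_end,
    assets=assets,
)

# Convert to DataFrame: each returned array already has shape (n_sessions, n_assets)
dates = trading_calendar.sessions_in_range(pricing_start, pricing_end)
prices = pd.DataFrame(
    data=pricing_data[0],
    index=dates,
    columns=assets,  # same asset objects as the factor index, so Alphalens can align them
)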

print(f"✓ Pricing data loaded")
print(f"  Shape: {prices.shape}")
print(f"  Date range: {prices.index.min()} to {prices.index.max()}")

prices.head()
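Forward returns are why the price table must extend past `end_date`: the return for day t needs the close n sessions later. Alphalens computes these internally; the sketch below shows the basic idea on hypothetical prices (the helper `forward_returns` is illustrative and skips Alphalens' calendar labelling and filtering):

```python
import numpy as np
import pandas as pd


def forward_returns(prices: pd.DataFrame, periods=(1, 5, 10)) -> dict:
    """Simple n-day forward returns from a wide (dates x assets) price table.

    Row t holds the return from the close on t to the close n rows later, so the
    last n rows are NaN: there is no future price to compare against.
    """
    return {f"{n}D": prices.shift(-n) / prices - 1 for n in periods}


# Hypothetical prices: 12 sessions, one asset rising 1% and one falling 1% per day
idx = pd.bdate_range("2023-01-02", periods=12)
prices = pd.DataFrame(
    {"AAA": 100 * 1.01 ** np.arange(12), "BBB": 50 * 0.99 ** np.arange(12)},
    index=idx,
)

fwd = forward_returns(prices)
print(fwd["5D"].round(4))
```

With 12 sessions, only the first two rows of the 10-day column are populated, which is the same reason the notebook loads 30 extra sessions of prices.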

## Prepare Data for Alphalens

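Alphalens expects the factor as a Series with a (date, asset) MultiIndex, plus a wide price DataFrame (dates as rows, assets as columns) covering the same assets and extending past the last factor date.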

In [None]:
# Extract factor as Series
factor = factor_data['momentum']

# Get alphalens format (computes forward returns automatically)
factor_data_clean = al.utils.get_clean_factor_and_forward_returns(
    factor=factor,
    prices=prices,
    quantiles=5,  # Divide into 5 quintiles
    periods=(1, 5, 10),  # Forward returns: 1, 5, 10 days
    max_loss=0.35,  # Raise if more than 35% of factor rows are dropped (NaNs, missing forward returns, binning)
)

print("✓ Factor data prepared for Alphalens")
print(f"  Clean observations: {len(factor_data_clean):,}")
print(f"\nColumns:")
for col in factor_data_clean.columns:
    print(f"  - {col}")

factor_data_clean.head()
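The `factor_quantile` column is assigned separately on each date, so Q5 always means "top fifth of that day's cross-section". A minimal sketch of equal-count bucketing with `pd.qcut`, assuming a (date, asset) MultiIndex and hypothetical factor values (the helper `assign_quantiles` is illustrative, not the Alphalens API):

```python
import numpy as np
import pandas as pd


def assign_quantiles(factor: pd.Series, quantiles: int = 5) -> pd.Series:
    """Bucket factor values into 1..quantiles separately on each date.

    `factor` has a (date, asset) MultiIndex. Equal-count buckets via pd.qcut;
    label 1 holds the lowest values.
    """
    return factor.groupby(level="date").transform(
        lambda x: pd.qcut(x, quantiles, labels=False) + 1
    )


# Hypothetical factor: 2 dates x 10 assets, values 0..9 shuffled on each date
dates = pd.to_datetime(["2023-01-03", "2023-01-04"])
assets = [f"S{i}" for i in range(10)]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
rng = np.random.default_rng(0)
values = np.concatenate([rng.permutation(10), rng.permutation(10)]).astype(float)
factor = pd.Series(values, index=index, name="momentum")

quantile = assign_quantiles(factor)
print(quantile.groupby(level="date").value_counts().unstack())
```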

## Alphalens Analysis: Full Tearsheet

Generate comprehensive factor analysis including:
- Returns Analysis
- Information Coefficient (IC)
- Turnover Analysis
- Quantile Analysis

In [None]:
# Create full tearsheet
al.tears.create_full_tear_sheet(factor_data_clean)

## Interpretation Guide

### Key Metrics to Look For:

**1. Returns by Quantile**
- Does the top quantile (Q5) outperform the bottom (Q1)?
- Is there a monotonic relationship (Q1 < Q2 < Q3 < Q4 < Q5)?

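**2. Information Coefficient (IC)**
- IC is the per-date Spearman rank correlation between factor values and forward returns
- **Mean IC > 0.05**: useful predictive power (common rule of thumb)
- **Mean IC > 0.10**: strong predictive power (rare in practice)
- **Consistently negative IC**: the factor may work in reverse (low values predict higher returns)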
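A pandas-only sketch of the per-date rank IC, assuming factor values and forward returns share a (date, asset) MultiIndex. Spearman's coefficient is the Pearson correlation of the ranks, so the sketch ranks within each date and then correlates; the helper `daily_ic` and the data are hypothetical:

```python
import pandas as pd


def daily_ic(factor: pd.Series, fwd_ret: pd.Series) -> pd.Series:
    """Per-date Spearman rank IC between factor values and forward returns."""
    df = pd.DataFrame({"factor": factor, "ret": fwd_ret}).dropna()
    return df.groupby(level="date").apply(
        lambda g: g["factor"].rank().corr(g["ret"].rank())
    )


dates = pd.to_datetime(["2023-01-03", "2023-01-04"])
index = pd.MultiIndex.from_product([dates, ["A", "B", "C", "D"]], names=["date", "asset"])
factor = pd.Series([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0], index=index)
# Day 1: returns rise with the factor; day 2: returns fall with it
fwd_ret = pd.Series([0.01, 0.02, 0.05, 0.09, 0.03, 0.02, 0.01, -0.04], index=index)

ic = daily_ic(factor, fwd_ret)
print(ic)
```

Only the ordering matters: day 1 has IC = 1 even though the return gaps are uneven, which makes rank IC robust to outliers.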

**3. IC Over Time**
- Is IC stable or does it decay?
- Are there regime changes?

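**4. Turnover**
- Turnover is the fraction of names entering a quantile from one period to the next
- Higher turnover means higher trading costs
- All else equal, lower turnover is easier to implement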
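A minimal sketch of name turnover for a single quantile: the share of that quantile's names on each date that were not in it on the previous date. The helper `quantile_name_turnover` and the labels are hypothetical; the tear sheet's turnover plots are based on the same idea.

```python
import pandas as pd


def quantile_name_turnover(quantile: pd.Series, q: int) -> pd.Series:
    """Fraction of names in quantile `q` on each date that were absent the previous date.

    `quantile` holds integer labels with a (date, asset) MultiIndex.
    The first date has no predecessor, so it is omitted.
    """
    members = {
        date: set(group.index.get_level_values("asset"))
        for date, group in quantile[quantile == q].groupby(level="date")
    }
    dates = sorted(members)
    return pd.Series(
        {
            date: len(members[date] - members[prev]) / len(members[date])
            for prev, date in zip(dates, dates[1:])
        }
    )


dates = pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"])
index = pd.MultiIndex.from_product([dates, ["A", "B", "C", "D"]], names=["date", "asset"])
# Hypothetical labels: top bucket is {A, B}, then {A, C}, then {A, C}
quantile = pd.Series([5, 5, 1, 1, 5, 1, 5, 1, 5, 1, 5, 1], index=index)

print(quantile_name_turnover(quantile, q=5))  # 0.5 on day 2, 0.0 on day 3
```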

**5. Factor Returns**
- Cumulative returns of top vs bottom quantile
- Sharpe ratio of factor returns

## Custom Analysis: Quantile Returns

In [None]:
# Calculate mean returns by quantile
mean_returns = factor_data_clean.groupby('factor_quantile')[['1D', '5D', '10D']].mean()

print("\nMean Forward Returns by Momentum Quantile:")
print("="*60)
print(mean_returns)
print("\n")

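# Calculate spread (Q5 - Q1); 5D and 10D values are cumulative over their horizon
spread = mean_returns.loc[5] - mean_returns.loc[1]
print("Spread (Q5 - Q1):")
print(spread)
print("\nA positive spread is consistent with a predictive factor; check IC significance before concluding.")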
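Because the 5D and 10D means are cumulative over several sessions, they are not directly comparable with the 1D column. A sketch that converts each column to a per-day rate, (1 + r)^(1/n) - 1, assuming columns are labelled `'1D'`, `'5D'`, `'10D'` as in `factor_data_clean`; the helper `to_daily_rate` and the numbers are hypothetical:

```python
import pandas as pd


def to_daily_rate(mean_returns: pd.DataFrame) -> pd.DataFrame:
    """Convert n-day returns in columns labelled '1D', '5D', '10D', ... to a per-day rate."""
    horizons = {col: int(col.rstrip("D")) for col in mean_returns.columns}
    return mean_returns.apply(lambda s: (1 + s) ** (1 / horizons[s.name]) - 1)


# Hypothetical mean forward returns for the bottom and top quintiles
mean_returns = pd.DataFrame(
    {"1D": [0.0002, 0.0010], "5D": [0.0010, 0.0050], "10D": [0.0020, 0.0101]},
    index=pd.Index([1, 5], name="factor_quantile"),
)
daily = to_daily_rate(mean_returns)
print(daily.round(6))
```

After conversion all three horizons for Q5 sit near 0.1% per day, so the apparent "bigger" 10D spread is mostly a longer holding period.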

## Information Coefficient Analysis

In [None]:
# Calculate IC for each period
ic = al.performance.factor_information_coefficient(factor_data_clean)

print("\nInformation Coefficient (IC) Statistics:")
print("="*60)
print(ic.describe())

# Plot IC over time
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# IC time series
ic.plot(ax=axes[0], alpha=0.6)
axes[0].axhline(0, color='black', linestyle='-', linewidth=0.8)
axes[0].set_title('Information Coefficient Over Time', fontsize=14, fontweight='bold')
axes[0].set_ylabel('IC')
axes[0].legend(['1-Day', '5-Day', '10-Day'])
axes[0].grid(True, alpha=0.3)

# IC distribution
ic.plot(kind='hist', bins=50, ax=axes[1], alpha=0.7)
axes[1].set_title('IC Distribution', fontsize=14, fontweight='bold')
axes[1].set_xlabel('IC Value')
axes[1].set_ylabel('Frequency')
axes[1].axvline(0, color='black', linestyle='--', linewidth=0.8)
axes[1].legend(['1-Day', '5-Day', '10-Day'])

plt.tight_layout()
plt.show()
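To answer "Is IC stable or does it decay?" numerically, one option is a rolling mean of the daily IC plus the IC information ratio (mean IC divided by its standard deviation). A sketch on a hypothetical IC series that flips sign halfway through; the helper `ic_stability` and the 21-session window (about one trading month) are assumptions:

```python
import pandas as pd


def ic_stability(ic: pd.Series, window: int = 21) -> tuple:
    """Rolling mean IC and the IC information ratio (mean IC / std of IC)."""
    rolling_mean = ic.rolling(window).mean()
    ic_ir = ic.mean() / ic.std()
    return rolling_mean, ic_ir


# Hypothetical daily IC with a regime change after 42 sessions
ic = pd.Series([0.05] * 42 + [-0.05] * 42, index=pd.bdate_range("2023-01-02", periods=84))
rolling_mean, ic_ir = ic_stability(ic)
print(rolling_mean.iloc[[20, 41, 62, 83]].round(3))
print(f"IC IR: {ic_ir:.3f}")
```

The full-sample IC IR is near zero here even though each half is strongly predictive, which is why the time-series plot matters alongside the summary statistics.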

## Factor Returns Analysis

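Analyze the returns of a factor-weighted long-short portfolio. With default arguments, Alphalens' `factor_returns` weights each stock by its demeaned factor value, so stocks above the day's mean momentum are held long and those below are shorted (this is not a pure Q5-minus-Q1 portfolio).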
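A sketch of that weighting, assuming the default settings (demeaned, not equal-weighted, no group neutrality): on each date, weights are the demeaned factor divided by the sum of its absolute values, so they sum to zero and their absolute values sum to one. The helper `factor_weighted_returns` and the data are hypothetical:

```python
import pandas as pd


def factor_weighted_returns(factor: pd.Series, fwd_ret: pd.Series) -> pd.Series:
    """Per-date return of a portfolio weighted by the demeaned factor."""
    demeaned = factor - factor.groupby(level="date").transform("mean")
    weights = demeaned / demeaned.abs().groupby(level="date").transform("sum")
    return (weights * fwd_ret).groupby(level="date").sum()


dates = pd.to_datetime(["2023-01-03", "2023-01-04"])
index = pd.MultiIndex.from_product([dates, ["A", "B", "C", "D"]], names=["date", "asset"])
# Hypothetical data: the factor orders assets correctly on day 1 and backwards on day 2
factor = pd.Series([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0], index=index)
fwd_ret = pd.Series([0.00, 0.01, 0.02, 0.03] * 2, index=index)

print(factor_weighted_returns(factor, fwd_ret))  # 0.0125 on day 1, -0.0125 on day 2
```

On day 1 the weights are [-0.375, -0.125, 0.125, 0.375], so the extreme names dominate the portfolio just as Q1 and Q5 would.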

In [None]:
# Factor-weighted long-short portfolio returns. With default arguments, Alphalens
# weights each stock by its demeaned factor value (long above the daily mean, short below).
factor_returns = al.performance.factor_returns(factor_data_clean)

# The 5D and 10D columns are multi-day returns that start every day and overlap.
# Convert each column to an approximate per-day rate before compounding.
horizon = pd.Series({col: int(col.rstrip('D')) for col in factor_returns.columns})
daily_rate = factor_returns.apply(lambda s: (1 + s) ** (1 / horizon[s.name]) - 1)
cumulative_returns = (1 + daily_rate).cumprod()

# Plot
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Cumulative factor returns
cumulative_returns.plot(ax=axes[0], linewidth=2)
axes[0].set_title('Cumulative Factor-Weighted Long-Short Returns', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Cumulative Return')
axes[0].axhline(1, color='black', linestyle='--', linewidth=0.8, alpha=0.5)
axes[0].legend(['1-Day', '5-Day', '10-Day'])
axes[0].grid(True, alpha=0.3)

# 5-day forward factor returns, one (overlapping) observation per day
factor_returns['5D'].plot(ax=axes[1], linewidth=1, alpha=0.7)
axes[1].set_title('5-Day Forward Factor Returns (sampled daily)', fontsize=14, fontweight='bold')
axes[1].set_ylabel('5-Day Return')
axes[1].set_xlabel('Date')
axes[1].axhline(0, color='black', linestyle='-', linewidth=0.8)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Annualized Sharpe ratios: a series of n-day returns is annualized with sqrt(252 / n)
sharpe_ratios = factor_returns.mean() / factor_returns.std() * np.sqrt(252 / horizon)
print("\nFactor Sharpe Ratios:")
print(sharpe_ratios)
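Each 5D or 10D observation spans several sessions and a new one starts every session, so compounding the raw column day after day counts each session's return several times. A toy price path rising exactly 1% per session (hypothetical data) makes the effect visible:

```python
import numpy as np
import pandas as pd

# Hypothetical price path rising exactly 1% per session for 20 sessions
prices = pd.Series(100 * 1.01 ** np.arange(21), index=pd.bdate_range("2023-01-02", periods=21))
true_growth = prices.iloc[-1] / prices.iloc[0]        # 1.01 ** 20

# 5-day forward return recorded every session: consecutive windows overlap
fwd_5d = (prices.shift(-5) / prices - 1).dropna()     # 16 observations

naive = (1 + fwd_5d).prod()                           # each session counted up to 5 times
non_overlapping = (1 + fwd_5d.iloc[::5]).prod()       # sessions 0, 5, 10, 15 only

print(f"true {true_growth:.4f}  naive {naive:.4f}  non-overlapping {non_overlapping:.4f}")
```

The naive product is roughly 1.01^80 instead of 1.01^20. Sampling every n-th observation, or compounding a per-day rate, keeps multi-day columns on the same footing as the 1D column.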

## Summary Statistics

In [None]:
# Calculate key metrics
print("\n" + "="*80)
print("FACTOR ANALYSIS SUMMARY")
print("="*80)

# Mean IC
mean_ic = ic.mean()
print("\nMean Information Coefficient:")
print(mean_ic)

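# IC t-stat (mean / std * sqrt(n)). For the 5D and 10D periods, consecutive daily
# IC values share overlapping return windows, so these t-stats overstate significance.
ic_tstat = ic.mean() / ic.std() * np.sqrt(ic.count())
print("\nIC t-statistic:")
print(ic_tstat)
print("(|t| > 2 roughly corresponds to 5% significance; most reliable for the 1D period)")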

# Quantile spread
print("\nQuantile Spread (Q5 - Q1):")
print(spread)

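# Factor returns (arithmetic annualization: an n-day mean return is scaled by 252 / n)
print("\nAnnualized Factor Returns:")
horizon = pd.Series({col: int(col.rstrip('D')) for col in factor_returns.columns})
annual_returns = factor_returns.mean() * (252 / horizon)
print(annual_returns)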

print("\nFactor Sharpe Ratios:")
print(sharpe_ratios)

print("\n" + "="*80)
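The usual IC t-statistic treats every daily IC as independent, but for n-day periods consecutive values share most of their return window. One simple, conservative alternative (Newey-West standard errors are another) keeps only every n-th observation so windows do not overlap. A sketch with a hypothetical helper `nonoverlapping_ic_tstat` and made-up IC values:

```python
import numpy as np
import pandas as pd


def nonoverlapping_ic_tstat(ic: pd.Series, horizon: int) -> float:
    """t-statistic of mean IC using every `horizon`-th observation only."""
    sample = ic.dropna().iloc[::horizon]
    return sample.mean() / sample.std(ddof=1) * np.sqrt(len(sample))


# Hypothetical daily IC alternating between 0.02 and 0.08 (mean 0.05) for 250 sessions
ic = pd.Series(np.tile([0.02, 0.08], 125), index=pd.bdate_range("2022-01-03", periods=250))

print(f"all 250 observations: t = {nonoverlapping_ic_tstat(ic, 1):.2f}")
print(f"every 5th (50 obs):   t = {nonoverlapping_ic_tstat(ic, 5):.2f}")
```

With the same mean and spread, the t-statistic shrinks roughly with the square root of the number of independent observations, which is the correction the overlapping 5D series needs.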

## Conclusions

Based on the analysis above, evaluate:

1. **Is the factor predictive?**
   - Look at mean IC (above ~0.05 is a common rule of thumb) and its t-statistic
   - Check the quantile spread (Q5 > Q1)
   - Examine cumulative factor returns (trending up?)

2. **Is the factor stable?**
   - Is IC consistent over time?
   - Are there regime changes?

3. **Is it tradeable?**
   - What's the turnover?
   - Does it work across different holding periods?

## Next Steps

- Test different factor definitions (window lengths, calculations)
- Analyze factor performance in different market regimes
- Combine multiple factors (see `09_multi_factor_research.ipynb`)
- Account for transaction costs in factor returns
- Build a strategy using the factor (if it's predictive!)