# Signal Research Notebook

Interactive research and development notebook for signal creation and testing.

## Workflow
1. **Load & Explore Data** - Load market data and inspect distributions
2. **Develop Signal** - Build and test signal logic interactively
3. **Implement** - Copy validated logic to `create_signal.py`
4. **Validate Signal** - Use `research.py` to view signal characteristics
5. **Execute** - Run `python src/create_signal.py` to generate `data/signal.parquet`
6. **Backtest** - Use `run_backtest.py` to run the backtest via Slurm on the supercomputer
7. **Performance** - Use `production.py` for in-depth analysis of the MVO-backtested signal

## Tips
- Use cells to isolate different aspects of your signal
- Modify parameters directly in cells to test variations
- Check signal statistics regularly to catch issues early
- Document your assumptions and findings as you develop

## Setup

In [None]:
import polars as pl
import numpy as np
import datetime as dt
import sf_quant.data as sfd
import sf_quant.research as sfr

## 1. Load & Explore Data

Load your market data and inspect key characteristics before developing the signal.

In [None]:
def load_data() -> pl.DataFrame:
    """
    Load and prepare market data for signal development.
    
    Returns:
        pl.DataFrame: Market data with required columns
    """
    
    start = dt.date(1996, 1, 1)
    end = dt.date(2024, 12, 31)

    columns = [
        'date',
        'barrid',
        'ticker',
        'price',
        'return',
        'specific_risk',
        'predicted_beta'
    ]

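    return (
        sfd.load_assets(
            start=start,
            end=end,
            in_universe=True,
            columns=columns
        )
        # Sort so that shift(1) refers to each asset's previous row.
        .sort('barrid', 'date')
        .filter(
            # Keep rows whose previous price, taken per asset, is above 5.
            pl.col('price')
            .shift(1)
            .over('barrid')
            .gt(5)
        )
    )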

df = load_data()
df.head()

## 2. Signal Development

Build and test your signal logic. Modify parameters and logic here to find optimal configurations.

In [None]:
def create_signal(df: pl.DataFrame) -> pl.DataFrame:
    """
    Create signal based on market data.
    
    Args:
        df: Market data DataFrame
        
    Returns:
        pl.DataFrame: DataFrame with signal column added
    """
    return (
        df
        .sort('barrid', 'date')
        .with_columns(
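            # Per-asset momentum: sum of 230 daily log returns, lagged 22 rows.
            # .over('barrid') keeps the window and lag within a single asset.
            pl.col('return')
            .log1p()
            .rolling_sum(window_size=230)
            .shift(22)
            .over('barrid')
            .alias('signal')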
        ).filter(
            # Parenthesize the comparison: `&` binds tighter than `>=` in Python.
            (pl.col('date') >= dt.date(1997, 1, 1)) & pl.col('signal').is_not_null()
        ).with_columns(
            pl.col('specific_risk')
            .fill_null(strategy='forward')
            .over('barrid')
        )
        .with_columns(
            pl.col('signal')
            .sub(pl.col('signal').mean())
            .truediv(pl.col('signal').std())
            .over('date')
            .alias('score')
        )
        .with_columns(
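            # alpha = 0.05 * score * specific_risk. Use pl.col(...) explicitly,
            # since a bare string passed to .mul() is parsed as a string literal.
            pl.lit(0.05)
            .mul(pl.col('score'))
            .mul(pl.col('specific_risk'))
            .alias('alpha')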
        )
    )
signal = create_signal(df)
signal.head()
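
To make the window arithmetic concrete: `rolling_sum(window_size=230)` followed by `shift(22)` means the signal on row t sums the log returns of rows t-251 through t-22, which is roughly one year of returns that skips the most recent month. The plain-Python sketch below reproduces the same steps (lagged log-return sum, cross-sectional z-score, alpha scaling) on toy numbers with a shorter window. The helper names `lagged_log_momentum` and `zscore`, and all input values, are illustrative assumptions and not part of the notebook.

```python
import math
import statistics

def lagged_log_momentum(returns, window, lag):
    """Sum of log(1 + r) over `window` rows, ending `lag` rows before each row.

    Returns None where the window is incomplete, mirroring the nulls that
    rolling_sum(...).shift(...) leaves at the start of a series.
    """
    logs = [math.log1p(r) for r in returns]
    out = []
    for t in range(len(logs)):
        end = t - lag              # last row included in the window
        start = end - window + 1   # first row included in the window
        out.append(sum(logs[start:end + 1]) if start >= 0 else None)
    return out

def zscore(values):
    """Cross-sectional z-score using the sample standard deviation (ddof=1)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# One asset, six days of simple returns; window=3 and lag=1 stand in for 230 and 22.
returns = [0.01, 0.02, -0.01, 0.03, 0.00, 0.01]
toy_signal = lagged_log_momentum(returns, window=3, lag=1)

# Three assets on a single date: signal values and specific risks (toy numbers).
signals_on_date = [0.10, 0.20, 0.30]
specific_risk = [0.20, 0.25, 0.30]
scores = zscore(signals_on_date)
alphas = [0.05 * s * r for s, r in zip(scores, specific_risk)]
```

With `window=3` and `lag=1`, the first non-null value appears on the fourth row and uses the first three returns, just as the first non-null `signal` in the notebook needs 252 rows of history for an asset.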

## 3. Signal Analysis

Examine signal statistics and distributions to understand its characteristics.

### Statistics

In [None]:
sfr.signal_stats(signal)

### Distribution

In [None]:
sfr.signal_distribution(signal)

## 4. Validation Checks

Verify signal quality and identify any issues before implementation.

## 5. Next Steps

When satisfied with your signal:

1. **Copy** your data loading and signal calculation logic to `create_signal.py`
2. **Run** `python src/create_signal.py` to save the signal to `data/signal.parquet`
3. **Open** `research.py` to analyze the signal before backtesting