# Signal Research Notebook

Interactive research and development notebook for signal creation and testing.

## Workflow
1. **Load & Explore Data** - Load market data and inspect distributions
2. **Develop Signal** - Build and test signal logic interactively
3. **Implement** - Copy validated logic to `create_signal.py`
4. **Execute** - Run `python src/create_signal.py` to generate `data/signal.parquet`
5. **Validate Signal** - Use `research.py` to inspect the signal's characteristics
6. **Backtest** - Use `run_backtest.py` to run the backtest as a Slurm job on the supercomputer
7. **Performance** - Use `production.py` for in-depth analysis of the MVO-backtested signal

## Tips
- Use cells to isolate different aspects of your signal
- Modify parameters directly in cells to test variations
- Check signal statistics regularly to catch issues early
- Document your assumptions and findings as you develop

## Setup

In [39]:
import polars as pl
import numpy as np
import datetime as dt
import sf_quant.data as sfd
import sf_quant.research as sfr
import polars_ols

## 1. Load & Explore Data

Load your market data and inspect key characteristics before developing the signal.

In [26]:
def load_data() -> pl.DataFrame:
    """
    Load and prepare market data for signal development.
    
    Returns:
        pl.DataFrame: Market data with required columns
    """
    
    start = dt.date(1996, 1, 1)
    end = dt.date(2024, 12, 31)

    columns = [
        'date',
        'barrid',
        'ticker',
        'price',
        'return',
        'specific_risk',
        'predicted_beta'
    ]

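    return (
        sfd.load_assets(
            start=start,
            end=end,
            in_universe=True,
            columns=columns
        )
        # Sort so each asset's rows are in date order, then keep rows whose
        # previous price for the same asset (barrid) was above $5
        .sort('barrid', 'date')
        .filter(
            pl.col('price').shift(1).over('barrid').gt(5)
        )
    )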

df = load_data()
df.head()

| date | barrid | ticker | price | return | specific_risk | predicted_beta |
|---|---|---|---|---|---|---|
| date | str | str | f64 | f64 | f64 | f64 |
| 2013-08-01 | "USA06Z1" | "MDXG" | 6.32 | 0.9585 | 55.028021 | 0.353329 |
| 2013-08-02 | "USA06Z1" | "MDXG" | 6.31 | -0.1582 | 54.807402 | 0.363624 |
| 2013-08-05 | "USA06Z1" | "MDXG" | 6.45 | 2.2187 | 54.76671 | 0.356596 |
| 2013-08-06 | "USA06Z1" | "MDXG" | 6.29 | -2.4806 | 54.692162 | 0.399196 |
| 2013-08-07 | "USA06Z1" | "MDXG" | 5.78 | -8.1081 | 55.020769 | 0.504134 |


## 2. Signal Development

Build and test your signal logic. Modify parameters and logic here to find optimal configurations.
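For reference, the cell below can be written out as a formula. Here $r_{i,t}$ is the decimal return of asset $i$ on date $t$, $t-k$ means $k$ observations earlier for the same `barrid` (equal to trading days only when the asset's history has no gaps), $\omega_{i,t}$ is `specific_risk / 100`, and $\bar{s}_t$ and $\hat{\sigma}_t$ are the cross-sectional mean and sample standard deviation (polars' default `ddof=1`) of the signal on date $t$. Reading the constant 0.05 as an information coefficient in the "alpha = IC × volatility × score" convention is an interpretation, not something the notebook states.

$$
\ell_{i,t} = \log(1 + r_{i,t}), \qquad
s_{i,t} = \sum_{k=22}^{251} \ell_{i,t-k}
$$

$$
z_{i,t} = \frac{s_{i,t} - \bar{s}_t}{\hat{\sigma}_t}, \qquad
\alpha_{i,t} = 0.05 \, z_{i,t} \, \omega_{i,t}
$$

The sum has $251 - 22 + 1 = 230$ terms, matching `rolling_sum(window_size=230)` followed by `shift(22)`; together they span roughly one trading year while skipping the most recent month.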

In [34]:
def create_signal(df: pl.DataFrame) -> pl.DataFrame:
    """
    Create signal based on market data.
    
    Args:
        df: Market data DataFrame
        
    Returns:
        pl.DataFrame: DataFrame with signal column added
    """
    return (
        df
        # Convert percent units to decimals
        .with_columns(
            pl.col('return', 'specific_risk').truediv(100)
        )
        .with_columns(
            pl.col('return').log1p().alias('log_return')
        )
        .sort('barrid', 'date')
        # Sum of the last 230 log returns for each asset...
        .with_columns(
            pl.col('log_return').rolling_sum(window_size=230).over('barrid').alias('signal')
        )
        # ...lagged 22 rows so the most recent month is skipped
        .with_columns(
            pl.col('signal').shift(22).over('barrid')
        )
        .drop('log_return')
        # Cross-sectional z-score of the signal on each date
        .with_columns(
            pl.col('signal')
            .sub(pl.col('signal').mean())
            .truediv(pl.col('signal').std())
            .over('date')
            .alias('score')
        )
        # alpha = 0.05 * score * specific_risk
        .with_columns(
            pl.lit(0.05).mul(pl.col('score')).mul(pl.col('specific_risk')).alias('alpha')
        )
        .filter(
            pl.col('alpha').is_not_null()
        )
    )

signal = create_signal(df)
signal.head(500)

| date | barrid | ticker | price | return | specific_risk | predicted_beta | signal | score | alpha |
|---|---|---|---|---|---|---|---|---|---|
| date | str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 2014-09-15 | "USA06Z1" | "MDXG" | 6.92 | -0.041551 | 0.469176 | 1.573021 | -0.16477 | -0.831711 | -0.019511 |
| 2014-09-16 | "USA06Z1" | "MDXG" | 6.94 | 0.00289 | 0.4685167 | 1.530437 | -0.157191 | -0.817641 | -0.019154 |
| 2014-09-17 | "USA06Z1" | "MDXG" | 7.13 | 0.027378 | 0.468748 | 1.533007 | -0.154195 | -0.766171 | -0.017957 |
| 2014-09-18 | "USA06Z1" | "MDXG" | 7.4 | 0.037868 | 0.4697122 | 1.575733 | -0.174728 | -0.855052 | -0.020081 |
| 2014-09-19 | "USA06Z1" | "MDXG" | 7.29 | -0.014865 | 0.466126 | 1.567784 | -0.153849 | -0.793256 | -0.018488 |
| … | … | … | … | … | … | … | … | … | … |
| 2016-08-31 | "USA06Z1" | "MDXG" | 7.24 | 0.025496 | 0.3840362 | 1.396332 | -0.188565 | -0.605529 | -0.011627 |
| 2016-09-01 | "USA06Z1" | "MDXG" | 7.14 | -0.013812 | 0.382929 | 1.359901 | -0.240113 | -0.683706 | -0.013091 |
| 2016-09-02 | "USA06Z1" | "MDXG" | 7.34 | 0.028011 | 0.383467 | 1.410638 | -0.247475 | -0.742292 | -0.014232 |
| 2016-09-06 | "USA06Z1" | "MDXG" | 7.66 | 0.043597 | 0.386088 | 1.410135 | -0.245796 | -0.754878 | -0.014572 |


In [35]:
quantile_df = sfr.generate_quantile_ports(
    signal,
    num_bins=10,
    signal_col='alpha'
)

In [40]:
ff_results = sfr.run_ff_regression(quantile_df)
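`sfr.run_ff_regression` is likewise a black box in this notebook. A common form of this test regresses a portfolio's excess return on factor returns (for example the Fama-French factors) and reads the intercept as the return not explained by the factors. The numpy sketch below does that with classical OLS standard errors on synthetic data; the helper `ff_alpha`, the three factors and their loadings are all assumptions for illustration.

```python
import numpy as np


def ff_alpha(port_excess: np.ndarray, factors: np.ndarray) -> tuple[float, float]:
    """OLS of portfolio excess returns on factor returns with an intercept.

    Returns (alpha, t-stat of alpha) using classical (non-robust) standard errors.
    """
    n, k = factors.shape
    X = np.column_stack([np.ones(n), factors])
    beta, *_ = np.linalg.lstsq(X, port_excess, rcond=None)
    resid = port_excess - X @ beta
    sigma2 = resid @ resid / (n - k - 1)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[0]), float(beta[0] / np.sqrt(cov[0, 0]))


# Synthetic example: three hypothetical factors and a portfolio with a known
# daily alpha of 0.0005 and loadings (1.0, 0.3, -0.2).
rng = np.random.default_rng(2)
factors = rng.normal(0.0, 0.01, size=(2000, 3))
port = 0.0005 + factors @ np.array([1.0, 0.3, -0.2]) + rng.normal(0.0, 0.002, 2000)
alpha, t_stat = ff_alpha(port, factors)
print(alpha, t_stat)
```

With real data, `port_excess` would be one portfolio's return minus the risk-free rate, aligned by date with the factor returns.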

## 3. Signal Analysis

Examine signal statistics and distributions to understand its characteristics.

### Statistics

In [28]:
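# sf_quant.research has no `signal_stats` in the installed version (the call
# raised AttributeError), so summarize the signal columns with polars instead
signal.select('signal', 'score', 'alpha').describe()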

### Distribution

In [None]:
sfr.signal_distribution(signal)

## 4. Validation Checks

Verify signal quality and identify any issues before implementation.

## 5. Next Steps

When satisfied with your signal:

1. **Copy** your data loading and signal calculation logic to `create_signal.py`
2. **Run** `python src/create_signal.py` to save the signal to `data/signal.parquet`
3. **Open** `research.py` to analyze the signal before backtesting

In [8]:
signal_df = pl.read_parquet("/home/stiten/silverfund/momentum/data/signal.parquet")
signal_df.head()

| date | barrid | ticker | price | return | specific_risk | predicted_beta | signal | score | alpha |
|---|---|---|---|---|---|---|---|---|---|
| date | str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 2014-09-15 | "USA06Z1" | "MDXG" | 6.92 | -4.1551 | 46.91758 | 1.573021 | null | null | null |
| 2014-09-16 | "USA06Z1" | "MDXG" | 6.94 | 0.289 | 46.85167 | 1.530437 | null | null | null |
| 2014-09-17 | "USA06Z1" | "MDXG" | 7.13 | 2.7378 | 46.874822 | 1.533007 | null | null | null |
| 2014-09-18 | "USA06Z1" | "MDXG" | 7.4 | 3.7868 | 46.97122 | 1.575733 | null | null | null |
| 2014-09-19 | "USA06Z1" | "MDXG" | 7.29 | -1.4865 | 46.612619 | 1.567784 | null | null | null |
