# Signal Computation Workflow

**Systematic Macro Credit Research — Step 2 of 5**

This notebook computes tactical credit signals from cached market data using the SignalRegistry. It automatically detects whether Bloomberg or synthetic data is available and works identically with both.

## Workflow Position

```
1. Data Download (01_data_download.ipynb) OR generate_synthetic_data.py
   ↓
2. Signal Computation ← YOU ARE HERE
   ↓
3. Signal Suitability Evaluation (03_suitability_evaluation.ipynb)
   ↓
4. Backtest Execution (04_backtest_execution.ipynb)
   ↓
5. Performance Analysis (05_performance_analysis.ipynb)
```

## Prerequisites

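- Completed `01_data_download.ipynb` with cached market data, or generated synthetic data with `generate_synthetic_data.py`
- Cache files exist in `data/cache/bloomberg/` (Bloomberg) or `data/cache/file/` (synthetic)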
- Visualization dependencies installed (`uv sync --extra viz`)

## What This Notebook Does

1. **Load Market Data** — Read cached CDX, VIX, and ETF data from Step 1
2. **Display Signal Catalog** — Show all registered signals and their requirements
3. **Compute Signals** — Execute enabled signals via SignalRegistry
4. **Validate Outputs** — Check z-score properties and alignment
5. **Visualize Signals** — Plot time series and correlations
6. **Persist Results** — Save signals DataFrame and metadata

## Outputs

- **Signals DataFrame:** `data/processed/signals.parquet`
- **Computation Metadata:** `logs/signal_computation_metadata.json`

## Key Design Patterns

- **SignalRegistry:** Batch computation of catalog-defined signals
- **Z-Score Normalization:** All signals normalized for regime independence
- **Data-Source Agnostic:** Auto-detects Bloomberg or synthetic data cache
- **Metadata Tracking:** Full provenance in computation metadata

---

## 1. Imports and Configuration

Import dependencies and verify cache availability.

In [1]:
import logging
from datetime import datetime

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from aponyx.config import DATA_DIR, LOGS_DIR, SIGNAL_CATALOG_PATH, REGISTRY_PATH
from aponyx.data import fetch_cdx, fetch_vix, fetch_etf
from aponyx.data.sources import BloombergSource
from aponyx.data.registry import DataRegistry
from aponyx.persistence import save_parquet, save_json
from aponyx.models import compute_registered_signals
from aponyx.models.registry import SignalRegistry
from aponyx.models.config import SignalConfig
from aponyx.visualization import plot_signal

# Configure logging for notebook
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)
logger = logging.getLogger(__name__)

print("=" * 80)
print("SIGNAL COMPUTATION WORKFLOW — Step 2 of 5")
print("=" * 80)
print(f"\nConfiguration:")
print(f"  Data directory: {DATA_DIR}")
print(f"  Logs directory: {LOGS_DIR}")
print(f"  Signal catalog: {SIGNAL_CATALOG_PATH}")
print(f"  Registry path: {REGISTRY_PATH}")
print(f"\n✓ Imports complete")

SIGNAL COMPUTATION WORKFLOW — Step 2 of 5

Configuration:
  Data directory: C:\Users\ROG3003\PythonProjects\aponyx\data
  Logs directory: C:\Users\ROG3003\PythonProjects\aponyx\logs
  Signal catalog: C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\models\signal_catalog.json
  Registry path: C:\Users\ROG3003\PythonProjects\aponyx\data\registry.json

✓ Imports complete


In [2]:
# Verify registry and cache availability
registry = DataRegistry(REGISTRY_PATH, DATA_DIR)
datasets = registry.list_datasets()

print(f"\n{'='*80}")
print(f"DATA REGISTRY STATUS")
print(f"{'='*80}\n")
print(f"Registry path: {REGISTRY_PATH}")
print(f"Total datasets: {len(datasets)}")

if not datasets:
    raise FileNotFoundError(
        "No datasets found in registry.\n"
        "Run 01_data_download.ipynb first to download market data."
    )

# Display registered datasets by instrument
cdx_datasets = registry.list_datasets(instrument="cdx")
vix_datasets = registry.list_datasets(instrument="vix")
etf_datasets = registry.list_datasets(instrument="etf")

print(f"\nBy Instrument Type:")
print(f"  CDX: {len(cdx_datasets)} datasets")
print(f"  VIX: {len(vix_datasets)} datasets")
print(f"  ETF: {len(etf_datasets)} datasets")

print(f"\n✓ Registry verified with {len(datasets)} datasets")

2025-11-13 22:34:41,503 - aponyx.persistence.json_io - INFO - Loading JSON from C:\Users\ROG3003\PythonProjects\aponyx\data\registry.json
2025-11-13 22:34:41,515 - aponyx.data.registry - INFO - Loaded existing registry: path=C:\Users\ROG3003\PythonProjects\aponyx\data\registry.json, datasets=5



DATA REGISTRY STATUS

Registry path: C:\Users\ROG3003\PythonProjects\aponyx\data\registry.json
Total datasets: 5

By Instrument Type:
  CDX: 2 datasets
  VIX: 1 datasets
  ETF: 2 datasets

✓ Registry verified with 5 datasets


## 2. Load Cached Market Data

Load data using the fetch functions, which handle cache lookups automatically.

In [3]:
print(f"\n{'='*80}")
print(f"LOADING MARKET DATA")
print(f"{'='*80}\n")

# Detect data source: try the Bloomberg cache first, then fall back to the file cache
use_cache = True
cache_bloomberg = DATA_DIR / "cache" / "bloomberg"
cache_file = DATA_DIR / "cache" / "file"

if cache_bloomberg.exists() and list(cache_bloomberg.glob("*.parquet")):
    # Bloomberg cache available
    from aponyx.data.sources import BloombergSource
    source_type = "Bloomberg"
    print("Data source: Bloomberg cache")
    
    source = BloombergSource()
    cdx_df = fetch_cdx(source=source, security="cdx_ig_5y", use_cache=use_cache)
    vix_df = fetch_vix(source=source, use_cache=use_cache)
    etf_df = fetch_etf(source=source, security="hyg", use_cache=use_cache)
    
elif cache_file.exists() and list(cache_file.glob("*.parquet")):
    # Synthetic data cache available
    from aponyx.data.sources import FileSource
    source_type = "Synthetic (File)"
    print("Data source: Synthetic data cache")
    print("  (Run generate_synthetic_data.py if files are missing)\n")
    
    cdx_source = FileSource(cache_file / "cdx_cdx_ig_5y.parquet")
    vix_source = FileSource(cache_file / "vix_vix.parquet")
    etf_source = FileSource(cache_file / "etf_hyg.parquet")
    
    cdx_df = fetch_cdx(source=cdx_source, security="cdx_ig_5y", use_cache=use_cache)
    vix_df = fetch_vix(source=vix_source, use_cache=use_cache)
    etf_df = fetch_etf(source=etf_source, security="hyg", use_cache=use_cache)
else:
    raise FileNotFoundError(
        "No data cache found. Please run either:\n"
        "  1. 01_data_download.ipynb (Bloomberg Terminal), or\n"
        "  2. python 00_generate_synthetic_data.py (synthetic data)"
    )

# Report and verify CDX IG 5Y data (already loaded above)
print(f"Loading CDX IG 5Y...")
print(f"✓ Loaded CDX IG 5Y: {len(cdx_df)} rows")
print(f"  Columns: {list(cdx_df.columns)}")
print(f"  Date range: {cdx_df.index.min()} to {cdx_df.index.max()}")

if 'spread' not in cdx_df.columns:
    raise ValueError(f"CDX data missing 'spread' column. Found: {list(cdx_df.columns)}")

print()

# Report and verify VIX data (already loaded above)
print("Loading VIX...")
print(f"✓ Loaded VIX: {len(vix_df)} rows")
print(f"  Columns: {list(vix_df.columns)}")
print(f"  Date range: {vix_df.index.min()} to {vix_df.index.max()}")

if 'level' not in vix_df.columns:
    raise ValueError(f"VIX data missing 'level' column. Found: {list(vix_df.columns)}")

print()

# Report and verify ETF (HYG) data (already loaded above)
print("Loading HYG ETF...")
print(f"✓ Loaded HYG ETF: {len(etf_df)} rows")
print(f"  Columns: {list(etf_df.columns)}")
print(f"  Date range: {etf_df.index.min()} to {etf_df.index.max()}")

if 'spread' not in etf_df.columns:
    raise ValueError(f"ETF data missing 'spread' column. Found: {list(etf_df.columns)}")

print()

# Create market data dictionary
market_data = {
    "cdx": cdx_df,
    "vix": vix_df,
    "etf": etf_df,
}

# Display summary
summary_data = [
    {
        'Dataset': 'CDX IG 5Y',
        'Rows': len(cdx_df),
        'Start': cdx_df.index.min().strftime('%Y-%m-%d'),
        'End': cdx_df.index.max().strftime('%Y-%m-%d'),
        'Columns': ', '.join(cdx_df.columns),
    },
    {
        'Dataset': 'VIX',
        'Rows': len(vix_df),
        'Start': vix_df.index.min().strftime('%Y-%m-%d'),
        'End': vix_df.index.max().strftime('%Y-%m-%d'),
        'Columns': ', '.join(vix_df.columns),
    },
    {
        'Dataset': 'HYG ETF',
        'Rows': len(etf_df),
        'Start': etf_df.index.min().strftime('%Y-%m-%d'),
        'End': etf_df.index.max().strftime('%Y-%m-%d'),
        'Columns': ', '.join(etf_df.columns),
    },
]

summary_df = pd.DataFrame(summary_data)
print(f"\nMarket Data Summary:\n")
print(summary_df.to_markdown(index=False))
print(f"\n✓ All market data loaded successfully from {source_type} cache")


2025-11-13 22:34:41,527 - aponyx.data.cache - INFO - Cache hit: cdx_c3bedc49b771b0f2.parquet
2025-11-13 22:34:41,528 - aponyx.persistence.parquet_io - INFO - Loading Parquet file: path=C:\Users\ROG3003\PythonProjects\aponyx\data\cache\file\cdx_c3bedc49b771b0f2.parquet, columns=all
2025-11-13 22:34:41,561 - aponyx.persistence.parquet_io - INFO - Loaded 1304 rows, 2 columns from C:\Users\ROG3003\PythonProjects\aponyx\data\cache\file\cdx_c3bedc49b771b0f2.parquet
2025-11-13 22:34:41,563 - aponyx.data.cache - INFO - Cache hit: vix_d09015690dfa93d9.parquet
2025-11-13 22:34:41,564 - aponyx.persistence.parquet_io - INFO - Loading Parquet file: path=C:\Users\ROG3003\PythonProjects\aponyx\data\cache\file\vix_d09015690dfa93d9.parquet, columns=all
2025-11-13 22:34:41,567 - aponyx.persistence.parquet_io - INFO - Loaded 1304 rows, 1 columns from C:\Users\ROG3003\PythonProjects\aponyx\data\cache\file\vix_d09015690dfa93d9.parquet
2025-11-13 22:34:41,568 - aponyx.data.cache - INFO - Cache hit: etf_6ca5


LOADING MARKET DATA

Data source: Synthetic data cache
  (Run generate_synthetic_data.py if files are missing)

Loading CDX IG 5Y...
✓ Loaded CDX IG 5Y: 1304 rows
  Columns: ['spread', 'security']
  Date range: 2020-11-11 00:00:00 to 2024-06-06 00:00:00

Loading VIX...
✓ Loaded VIX: 1304 rows
  Columns: ['level']
  Date range: 2020-11-11 00:00:00 to 2024-06-06 00:00:00

Loading HYG ETF...
✓ Loaded HYG ETF: 1304 rows
  Columns: ['spread', 'security']
  Date range: 2020-11-11 00:00:00 to 2024-06-06 00:00:00


Market Data Summary:

| Dataset   |   Rows | Start      | End        | Columns          |
|:----------|-------:|:-----------|:-----------|:-----------------|
| CDX IG 5Y |   1304 | 2020-11-11 | 2024-06-06 | spread, security |
| VIX       |   1304 | 2020-11-11 | 2024-06-06 | level            |
| HYG ETF   |   1304 | 2020-11-11 | 2024-06-06 | spread, security |

✓ All market data loaded successfully from Synthetic (File) cache
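
The source-detection branch in the cell above can be factored into a small helper. The sketch below is a hypothetical function, not part of `aponyx`; it assumes only the two cache directories the notebook already checks and prefers the Bloomberg cache when both hold Parquet files.

```python
from pathlib import Path


def detect_cache_source(data_dir: Path) -> tuple[str, Path]:
    """Return (source_label, cache_dir) for the first cache holding Parquet files.

    Hypothetical helper: mirrors the notebook's order of preference,
    Bloomberg cache first, then the file (synthetic) cache.
    """
    candidates = [
        ("Bloomberg", data_dir / "cache" / "bloomberg"),
        ("Synthetic (File)", data_dir / "cache" / "file"),
    ]
    for label, cache_dir in candidates:
        # any() stops at the first matching file instead of listing them all
        if cache_dir.is_dir() and any(cache_dir.glob("*.parquet")):
            return label, cache_dir
    raise FileNotFoundError(f"No Parquet cache found under {data_dir / 'cache'}")
```

Calling `detect_cache_source(DATA_DIR)` would return the label used in the summary message together with the directory to read from. An empty Bloomberg directory falls through to the file cache, matching the `glob` test in the notebook.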


## 3. Display Signal Catalog

Review all registered signals and identify which are enabled for computation.

In [4]:
# Initialize signal registry (note: rebinds `registry`, which held the DataRegistry in cell 2)
registry = SignalRegistry(SIGNAL_CATALOG_PATH)

print(f"\n{'='*80}")
print(f"SIGNAL CATALOG")
print(f"{'='*80}\n")
print(f"Catalog path: {SIGNAL_CATALOG_PATH}")

# Get all signals
all_signals = registry.list_all()
enabled_signals = registry.get_enabled()

print(f"Total signals: {len(all_signals)}")
print(f"Enabled signals: {len(enabled_signals)}")

# Display all signals
print(f"\n\nAll Registered Signals:\n")
catalog_data = []
for name, metadata in all_signals.items():
    data_req_str = ', '.join([f"{k}:{v}" for k, v in metadata.data_requirements.items()])
    catalog_data.append({
        'Signal': name,
        'Description': metadata.description,
        'Data Requirements': data_req_str,
        'Enabled': '✓' if metadata.enabled else '✗',
    })

catalog_df = pd.DataFrame(catalog_data)
print(catalog_df.to_markdown(index=False))

# Display enabled signals separately
if enabled_signals:
    print(f"\n\nEnabled Signals for Computation:\n")
    enabled_data = []
    for name, metadata in enabled_signals.items():
        enabled_data.append({
            'Signal': name,
            'Description': metadata.description,
            'Function': metadata.compute_function_name,
        })
    enabled_df = pd.DataFrame(enabled_data)
    print(enabled_df.to_markdown(index=False))
else:
    print("\n⚠️  No signals enabled in catalog")

print(f"\n✓ Signal catalog loaded")

2025-11-13 22:34:41,586 - aponyx.models.registry - INFO - Loaded signal registry: catalog=C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\models\signal_catalog.json, signals=3, enabled=3



SIGNAL CATALOG

Catalog path: C:\Users\ROG3003\PythonProjects\aponyx\src\aponyx\models\signal_catalog.json
Total signals: 3
Enabled signals: 3


All Registered Signals:

| Signal          | Description                                                                | Data Requirements      | Enabled   |
|:----------------|:---------------------------------------------------------------------------|:-----------------------|:----------|
| cdx_etf_basis   | Flow-driven mispricing signal from CDX-ETF basis divergence                | cdx:spread, etf:spread | ✓         |
| cdx_vix_gap     | Cross-asset risk sentiment divergence between credit and equity volatility | cdx:spread, vix:level  | ✓         |
| spread_momentum | Short-term volatility-adjusted momentum in CDX spreads                     | cdx:spread             | ✓         |


Enabled Signals for Computation:

| Signal          | Description                                                                | Function                |


## 4. Configure Signal Computation

Set parameters for signal calculation.

In [5]:
# Signal configuration
config = SignalConfig(
    lookback=20,      # 20-day rolling window for tactical signals
    min_periods=10,   # Minimum 10 observations for statistical validity
)

print(f"\n{'='*80}")
print(f"SIGNAL CONFIGURATION")
print(f"{'='*80}\n")
print(f"Configuration:")
print(f"  Lookback window: {config.lookback} days")
print(f"  Minimum periods: {config.min_periods} observations")
print(f"\nRationale:")
print(f"  - 20-day lookback captures tactical credit signal dynamics")
print(f"  - Minimum 10 observations ensures statistical validity")
print(f"  - Z-score normalization for regime independence")
print(f"\n✓ Configuration ready")


SIGNAL CONFIGURATION

Configuration:
  Lookback window: 20 days
  Minimum periods: 10 observations

Rationale:
  - 20-day lookback captures tactical credit signal dynamics
  - Minimum 10 observations ensures statistical validity
  - Z-score normalization for regime independence

✓ Configuration ready
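
As a rough illustration of what `lookback` and `min_periods` control, a rolling z-score can be written in a few lines of pandas. This is a generic sketch, not the `aponyx` implementation (which this notebook does not show); `rolling_zscore` is a hypothetical helper and the input series is made up.

```python
import numpy as np
import pandas as pd


def rolling_zscore(series: pd.Series, lookback: int = 20, min_periods: int = 10) -> pd.Series:
    """Z-score each value against its trailing window (hypothetical helper)."""
    window = series.rolling(window=lookback, min_periods=min_periods)
    # A flat window has zero std; map it to NaN rather than dividing by zero
    std = window.std().replace(0.0, np.nan)
    return (series - window.mean()) / std


# Made-up input: a steadily rising series of 30 observations
spread = pd.Series(np.arange(30, dtype=float))
z = rolling_zscore(spread, lookback=20, min_periods=10)
print(z.notna().sum())  # 21: the first min_periods - 1 = 9 values are NaN
```

Under this scheme the first nine rows are undefined, which is consistent with `cdx_etf_basis` reporting 1295 valid observations out of 1304 rows in the statistics table of Section 5. Note also that a rolling z-score does not guarantee a full-sample mean of 0 or standard deviation of 1.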


## 5. Compute Signals via Registry

Execute enabled signals using batch computation.

In [6]:
print(f"\n{'='*80}")
print(f"COMPUTING SIGNALS")
print(f"{'='*80}\n")

# Compute all enabled signals
try:
    signals_dict = compute_registered_signals(registry, market_data, config)
    print(f"\n✓ Successfully computed {len(signals_dict)} signals")
except ValueError as e:
    print(f"\n✗ Data requirement error: {e}")
    raise
except AttributeError as e:
    print(f"\n✗ Compute function error: {e}")
    raise

# Verify expected number of signals
expected_count = len(enabled_signals)
actual_count = len(signals_dict)

if actual_count != expected_count:
    raise ValueError(
        f"Signal count mismatch: expected {expected_count}, got {actual_count}"
    )

# Display per-signal statistics
print(f"\n\nSignal Statistics:\n")
stats_data = []
for name, series in signals_dict.items():
    stats_data.append({
        'Signal': name,
        'Valid Obs': series.notna().sum(),
        'Mean': f"{series.mean():.3f}",
        'Std': f"{series.std():.3f}",
        'Min': f"{series.min():.2f}",
        'Max': f"{series.max():.2f}",
    })

stats_df = pd.DataFrame(stats_data)
print(stats_df.to_markdown(index=False))

# Combine into single DataFrame (preserves catalog order)
signals = pd.DataFrame(signals_dict)

print(f"\n\nCombined Signals DataFrame:")
print(f"  Shape: {signals.shape}")
print(f"  Columns: {list(signals.columns)}")
print(f"  Index: {signals.index.name} ({len(signals)} dates)")
print(f"  Date range: {signals.index.min()} to {signals.index.max()}")

print(f"\n✓ Signals computed and combined")

2025-11-13 22:34:41,599 - aponyx.models.catalog - INFO - Computing 3 enabled signals: cdx_etf_basis, cdx_vix_gap, spread_momentum
2025-11-13 22:34:41,599 - aponyx.models.signals - INFO - Computing CDX-ETF basis: cdx_rows=1304, etf_rows=1304, lookback=20
2025-11-13 22:34:41,610 - aponyx.models.signals - INFO - Computing CDX-VIX gap: cdx_rows=1304, vix_rows=1304, lookback=20
2025-11-13 22:34:41,612 - aponyx.models.signals - INFO - Computing spread momentum: cdx_rows=1304, lookback=20
2025-11-13 22:34:41,614 - aponyx.models.catalog - INFO - Successfully computed 3 signals



COMPUTING SIGNALS


✓ Successfully computed 3 signals


Signal Statistics:

| Signal          |   Valid Obs |   Mean |   Std |   Min |   Max |
|:----------------|------------:|-------:|------:|------:|------:|
| cdx_etf_basis   |        1295 | -0.136 | 1.287 | -3.33 |  2.62 |
| cdx_vix_gap     |        1286 |  0.029 | 1.205 | -3.8  |  3.36 |
| spread_momentum |        1284 | -0.034 | 1.944 | -5.71 |  5.1  |


Combined Signals DataFrame:
  Shape: (1304, 3)
  Columns: ['cdx_etf_basis', 'cdx_vix_gap', 'spread_momentum']
  Index: date (1304 dates)
  Date range: 2020-11-11 00:00:00 to 2024-06-06 00:00:00

✓ Signals computed and combined
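
The notebook does not show how `compute_registered_signals` works internally. The following is a generic sketch of catalog-driven dispatch under stated assumptions: every name in it (`CatalogEntry`, `COMPUTE_FUNCTIONS`, `compute_enabled`, the toy compute functions) is hypothetical, and plain lists stand in for `pd.Series`. It raises `AttributeError` for an unknown function name only to mirror the error type the cell above handles.

```python
from dataclasses import dataclass
from typing import Callable

Series = list[float]  # stand-in for pd.Series to keep the sketch dependency-free


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    compute_function_name: str
    enabled: bool = True


def compute_level(values: Series) -> Series:
    return list(values)


def compute_change(values: Series) -> Series:
    return [b - a for a, b in zip(values, values[1:])]


# Lookup table from catalog function names to callables
COMPUTE_FUNCTIONS: dict[str, Callable[[Series], Series]] = {
    "compute_level": compute_level,
    "compute_change": compute_change,
}


def compute_enabled(catalog: list[CatalogEntry], values: Series) -> dict[str, Series]:
    """Run every enabled catalog entry, preserving catalog order."""
    results: dict[str, Series] = {}
    for entry in catalog:
        if not entry.enabled:
            continue
        if entry.compute_function_name not in COMPUTE_FUNCTIONS:
            raise AttributeError(f"Unknown compute function: {entry.compute_function_name}")
        results[entry.name] = COMPUTE_FUNCTIONS[entry.compute_function_name](values)
    return results


catalog = [
    CatalogEntry("level", "compute_level"),
    CatalogEntry("change", "compute_change"),
    CatalogEntry("disabled_example", "compute_level", enabled=False),
]
signals_by_name = compute_enabled(catalog, [100.0, 102.0, 101.0])
print(list(signals_by_name))  # ['level', 'change']
```

Because the results dictionary is filled in catalog order, building a DataFrame from it keeps that order, as the notebook's comment on the combined DataFrame notes.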


## 6. Validate Signal Properties

Check z-score normalization, alignment, and correlations.

In [7]:
print(f"\n{'='*80}")
print(f"SIGNAL VALIDATION")
print(f"{'='*80}\n")

validation_results = []
all_passed = True

# Check 1: Z-score normalization
print("Check 1: Z-Score Normalization\n")
for col in signals.columns:
    mean = signals[col].mean()
    std = signals[col].std()
    mean_ok = -0.3 <= mean <= 0.3
    std_ok = 0.7 <= std <= 1.3
    passed = mean_ok and std_ok
    
    validation_results.append({
        'Check': 'Z-Score Normalization',
        'Signal': col,
        'Status': '✓ PASS' if passed else '✗ FAIL',
        'Details': f"mean={mean:.3f}, std={std:.3f}",
    })
    
    if not passed:
        all_passed = False
        print(f"  ⚠️  {col}: mean={mean:.3f} (target: ±0.3), std={std:.3f} (target: 0.7-1.3)")

if all([r['Status'] == '✓ PASS' for r in validation_results if r['Check'] == 'Z-Score Normalization']):
    print("  ✓ All signals properly normalized\n")

# Check 2: DatetimeIndex alignment
print("Check 2: DatetimeIndex Alignment\n")
aligned = signals.index.equals(cdx_df.index)
validation_results.append({
    'Check': 'Index Alignment',
    'Signal': 'All',
    'Status': '✓ PASS' if aligned else '✗ FAIL',
    'Details': f"signals.index == cdx_df.index: {aligned}",
})

if aligned:
    print(f"  ✓ Signals aligned with CDX index ({len(signals)} dates)\n")
else:
    all_passed = False
    print(f"  ✗ Index mismatch detected\n")

# Check 3: Correlation matrix
print("Check 3: Signal Correlations\n")
corr_matrix = signals.corr()

# Check for excessive correlation (>0.9 indicates redundancy)
high_corr = []
for i in range(len(corr_matrix.columns)):
    for j in range(i+1, len(corr_matrix.columns)):
        corr_val = corr_matrix.iloc[i, j]
        if abs(corr_val) > 0.9:
            high_corr.append(f"{corr_matrix.columns[i]} vs {corr_matrix.columns[j]}: {corr_val:.3f}")

corr_ok = len(high_corr) == 0
validation_results.append({
    'Check': 'Correlation Range',
    'Signal': 'All',
    'Status': '✓ PASS' if corr_ok else '⚠️  WARNING',
    'Details': f"High correlations (>0.9): {len(high_corr)}",
})

if corr_ok:
    print(f"  ✓ No excessive correlations detected\n")
else:
    print(f"  ⚠️  High correlations found:")
    for item in high_corr:
        print(f"    {item}")
    print()

print(f"Correlation Matrix:\n")
print(corr_matrix.to_string(float_format='%.3f'))

# Display validation summary
print(f"\n\nValidation Summary:\n")
validation_df = pd.DataFrame(validation_results)
print(validation_df.to_markdown(index=False))

if all_passed:
    print(f"\n✓ All validation checks passed")
else:
    print(f"\n⚠️  Some validation checks failed - review warnings above")
    print(f"Signal computation will continue, but results should be reviewed carefully.")


SIGNAL VALIDATION

Check 1: Z-Score Normalization

  ⚠️  spread_momentum: mean=-0.034 (target: ±0.3), std=1.944 (target: 0.7-1.3)
Check 2: DatetimeIndex Alignment

  ✓ Signals aligned with CDX index (1304 dates)

Check 3: Signal Correlations

  ✓ No excessive correlations detected

Correlation Matrix:

                 cdx_etf_basis  cdx_vix_gap  spread_momentum
cdx_etf_basis            1.000        0.040           -0.033
cdx_vix_gap              0.040        1.000           -0.623
spread_momentum         -0.033       -0.623            1.000


Validation Summary:

| Check                 | Signal          | Status   | Details                             |
|:----------------------|:----------------|:---------|:------------------------------------|
| Z-Score Normalization | cdx_etf_basis   | ✓ PASS   | mean=-0.136, std=1.287              |
| Z-Score Normalization | cdx_vix_gap     | ✓ PASS   | mean=0.029, std=1.205               |
| Z-Score Normalization | spread_momentum | ✗ FAIL   | m

## 7. Visualize Signal Time Series

Plot individual signals and comparative analysis.

In [8]:
print(f"\n{'='*80}")
print(f"SIGNAL VISUALIZATION")
print(f"{'='*80}\n")

# Plot each signal individually
for signal_name in signals.columns:
    print(f"Plotting {signal_name}...")
    fig = plot_signal(
        signals[signal_name],
        title=f"Signal: {signal_name}",
        threshold_lines=[-2, 2],
    )
    fig.show()

print(f"\n✓ Individual signal plots complete")

2025-11-13 22:34:41,641 - aponyx.visualization.plots - INFO - Plotting signal: 1304 observations



SIGNAL VISUALIZATION

Plotting cdx_etf_basis...


2025-11-13 22:34:43,185 - aponyx.visualization.plots - INFO - Plotting signal: 1304 observations


Plotting cdx_vix_gap...


2025-11-13 22:34:43,229 - aponyx.visualization.plots - INFO - Plotting signal: 1304 observations


Plotting spread_momentum...



✓ Individual signal plots complete


In [9]:
# Create comparison subplot with one panel per signal
print(f"\nCreating signal comparison subplot...")

signal_names = list(signals.columns)
fig = make_subplots(
    rows=len(signal_names),  # follows the catalog instead of assuming three signals
    cols=1,
    subplot_titles=signal_names,
    vertical_spacing=0.08,
)

colors = ['steelblue', 'darkorange', 'darkgreen']

for i, signal_name in enumerate(signal_names, start=1):
    fig.add_trace(
        go.Scatter(
            x=signals.index,
            y=signals[signal_name],
            name=signal_name,
            line=dict(color=colors[(i - 1) % len(colors)], width=1.5),  # cycle colors if there are more signals than colors
        ),
        row=i,
        col=1,
    )
    
    # Add threshold lines
    for threshold in [-2, 2]:
        fig.add_hline(
            y=threshold,
            line_dash="dot",
            line_color="red",
            opacity=0.4,
            row=i,
            col=1,
        )
    
    # Add zero line
    fig.add_hline(
        y=0,
        line_dash="dash",
        line_color="gray",
        opacity=0.5,
        row=i,
        col=1,
    )

fig.update_layout(
    height=900,
    title_text="Signal Comparison - All Signals",
    showlegend=False,
    template="plotly_white",
)

fig.update_yaxes(title_text="Signal Value")

fig.show()

print(f"✓ Comparison subplot complete")


Creating signal comparison subplot...


✓ Comparison subplot complete


In [10]:
# Create correlation heatmap
print(f"\nCreating correlation heatmap...")

fig = px.imshow(
    corr_matrix,
    text_auto=".2f",
    color_continuous_scale='RdBu_r',
    aspect="auto",
    title="Signal Correlation Matrix",
    labels=dict(color="Correlation"),
    zmin=-1,
    zmax=1,
)

fig.update_layout(
    width=600,
    height=500,
    template="plotly_white",
)

fig.show()

print(f"✓ Correlation heatmap complete")
print(f"\n✓ All visualizations complete")


Creating correlation heatmap...


✓ Correlation heatmap complete

✓ All visualizations complete
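
Alongside the heatmap, the same matrix can be scanned programmatically at different thresholds. The sketch below rebuilds the matrix from the values printed by the validation cell; `correlated_pairs` and `printed_corr` are hypothetical names, and the 0.5 threshold is chosen only for illustration.

```python
from itertools import combinations

import pandas as pd


def correlated_pairs(corr: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (signal_a, signal_b, correlation) for pairs with |correlation| above threshold."""
    pairs = []
    for a, b in combinations(corr.columns, 2):  # visit each unordered pair once
        value = float(corr.loc[a, b])
        if abs(value) > threshold:
            pairs.append((a, b, value))
    return pairs


# Correlation values as printed by the validation cell
names = ["cdx_etf_basis", "cdx_vix_gap", "spread_momentum"]
printed_corr = pd.DataFrame(
    [
        [1.000, 0.040, -0.033],
        [0.040, 1.000, -0.623],
        [-0.033, -0.623, 1.000],
    ],
    index=names,
    columns=names,
)

print(correlated_pairs(printed_corr))                 # no pair exceeds 0.9
print(correlated_pairs(printed_corr, threshold=0.5))  # flags cdx_vix_gap vs spread_momentum
```

Taking the absolute value means strongly negative relationships, such as the -0.623 between `cdx_vix_gap` and `spread_momentum`, are treated the same way as positive ones.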


## 8. Persist Signals and Metadata

Save signals DataFrame and computation metadata for reproducibility.

In [11]:
print(f"\n{'='*80}")
print(f"PERSISTING OUTPUTS")
print(f"{'='*80}\n")

# Save signals DataFrame
signals_path = DATA_DIR / "processed" / "signals.parquet"
save_parquet(signals, signals_path)

signals_size_mb = signals_path.stat().st_size / (1024 * 1024)
print(f"✓ Signals saved to: {signals_path}")
print(f"  Size: {signals_size_mb:.2f} MB")
print(f"  Shape: {signals.shape}")

# Create computation metadata
metadata = {
    "timestamp": datetime.now().isoformat(),
    "config": {
        "lookback": config.lookback,
        "min_periods": config.min_periods,
    },
    "date_range": {
        "start": signals.index.min().isoformat(),
        "end": signals.index.max().isoformat(),
    },
    "signal_names": list(signals.columns),
    "observation_count": len(signals),
}

# Save metadata
metadata_path = LOGS_DIR / "signal_computation_metadata.json"
save_json(metadata, metadata_path)

metadata_size_kb = metadata_path.stat().st_size / 1024
print(f"\n✓ Metadata saved to: {metadata_path}")
print(f"  Size: {metadata_size_kb:.2f} KB")

print(f"\n✓ All outputs persisted successfully")

2025-11-13 22:34:43,511 - aponyx.persistence.parquet_io - INFO - Saving DataFrame to Parquet: path=C:\Users\ROG3003\PythonProjects\aponyx\data\processed\signals.parquet, rows=1304, columns=3, compression=snappy
2025-11-13 22:34:43,511 - aponyx.persistence.json_io - INFO - Saving JSON to C:\Users\ROG3003\PythonProjects\aponyx\logs\signal_computation_metadata.json (5 top-level keys)



PERSISTING OUTPUTS

✓ Signals saved to: C:\Users\ROG3003\PythonProjects\aponyx\data\processed\signals.parquet
  Size: 0.05 MB
  Shape: (1304, 3)

✓ Metadata saved to: C:\Users\ROG3003\PythonProjects\aponyx\logs\signal_computation_metadata.json
  Size: 0.32 KB

✓ All outputs persisted successfully
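
A downstream step can use the metadata file to confirm it is reading the signals it expects. The sketch below assumes `save_json` writes plain JSON with the keys built in the cell above; `check_metadata` is a hypothetical helper, and the temporary file stands in for `logs/signal_computation_metadata.json`.

```python
import json
import tempfile
from pathlib import Path


def check_metadata(path: Path, expected_signals: list[str]) -> dict:
    """Load computation metadata and fail loudly if expected signals are absent."""
    metadata = json.loads(path.read_text(encoding="utf-8"))
    missing = sorted(set(expected_signals) - set(metadata["signal_names"]))
    if missing:
        raise ValueError(f"Metadata lacks expected signals: {missing}")
    return metadata


# Stand-in metadata using values shown in this notebook's output
example = {
    "config": {"lookback": 20, "min_periods": 10},
    "signal_names": ["cdx_etf_basis", "cdx_vix_gap", "spread_momentum"],
    "observation_count": 1304,
}

with tempfile.TemporaryDirectory() as tmp:
    metadata_path = Path(tmp) / "signal_computation_metadata.json"
    metadata_path.write_text(json.dumps(example), encoding="utf-8")
    loaded = check_metadata(metadata_path, ["cdx_vix_gap", "spread_momentum"])
    print(loaded["config"]["lookback"], loaded["observation_count"])
```

Checking signal names before loading `signals.parquet` catches the case where the catalog was edited between runs and the saved file no longer contains a column the next step relies on.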


---

## Workflow Complete

Signal computation successful! The computed signals are now ready for suitability evaluation in Step 3.

### What Was Accomplished

✓ **Market Data Loaded** — CDX, VIX, and ETF data from the Bloomberg or synthetic cache  
✓ **Signals Computed** — All enabled signals via SignalRegistry  
✓ **Validation Run** — Z-score normalization, alignment, and correlations checked; in the run shown, `spread_momentum` failed the z-score check (std=1.944)  
✓ **Visualizations Created** — Time series and correlation analysis  
✓ **Outputs Persisted** — Signals and metadata saved for reproducibility

### Data Flow

```
Cached Market Data (Step 1)
    ↓
Signal Computation (this notebook)
    ↓
signals: pd.DataFrame
├─ cdx_etf_basis: z-score normalized
├─ cdx_vix_gap: z-score normalized
└─ spread_momentum: z-score normalized
    ↓
Suitability Evaluation (next notebook)
```

### Re-Running This Notebook

- **Data source:** Loads from cache created in Step 1
- **Recomputation:** Signals are recomputed from scratch each run
- **Outputs:** Overwrites `signals.parquet` and metadata JSON
- **Catalog changes:** Edit `signal_catalog.json` to enable/disable signals

### Key Files Generated

```
data/
└── processed/
    └── signals.parquet (multi-column DataFrame)

logs/
└── signal_computation_metadata.json
```

### Troubleshooting

**Cache files not found:**
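- Run `01_data_download.ipynb` first, or generate synthetic data
- Verify the cache directory: `data/cache/bloomberg/` (Bloomberg) or `data/cache/file/` (synthetic)
- For the synthetic file cache, check for `cdx_cdx_ig_5y.parquet`, `vix_vix.parquet`, `etf_hyg.parquet`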

**Signal computation errors:**
- Check data requirements in signal catalog
- Verify market_data dict has correct keys: cdx, vix, etf
- Review ERROR logs for missing columns or functions
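
A pre-flight check along the following lines can surface missing keys or columns before calling `compute_registered_signals`. It is a sketch with hypothetical names (`missing_requirements`, plain dicts standing in for the catalog metadata and the DataFrames); the requirement mappings are copied from the catalog table in Section 3.

```python
def missing_requirements(
    requirements: dict[str, str],
    available: dict[str, list[str]],
) -> list[str]:
    """List each required dataset or column that the market data cannot satisfy."""
    problems = []
    for key, column in requirements.items():
        if key not in available:
            problems.append(f"{key}: dataset missing")
        elif column not in available[key]:
            problems.append(f"{key}:{column}: column missing")
    return problems


# Data requirements as displayed in the signal catalog
catalog_requirements = {
    "cdx_etf_basis": {"cdx": "spread", "etf": "spread"},
    "cdx_vix_gap": {"cdx": "spread", "vix": "level"},
    "spread_momentum": {"cdx": "spread"},
}

# Columns per dataset; in the notebook this could be built as
# {key: list(df.columns) for key, df in market_data.items()}
available_columns = {"cdx": ["spread", "security"], "vix": ["level"]}  # "etf" omitted on purpose

for name, requirements in catalog_requirements.items():
    print(name, missing_requirements(requirements, available_columns) or "ok")
```

Here only `cdx_etf_basis` reports a problem, because the `etf` dataset was deliberately left out of the example input.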

**Validation warnings:**
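- Z-score statistics vary with the data regime, so the full-sample mean and std will not be exactly 0 and 1
- The validation cell flags signals outside mean ±0.3 or std 0.7-1.3; deviations up to mean ±0.5 or std 0.5-1.5 are generally acceptable
- Larger deviations (for example, the std of 1.944 reported for `spread_momentum`) call for reviewing the signal statistics before proceeding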
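
To judge how far a flagged signal sits from either set of bounds, the reported statistics can be checked directly. The sketch below hard-codes the mean and standard deviation values printed in the Signal Statistics table; `ZScoreBounds` and `zscore_ok` are hypothetical names, the strict bounds are those used by the validation cell, and the loose bounds are the tolerances listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZScoreBounds:
    max_abs_mean: float
    min_std: float
    max_std: float


def zscore_ok(mean: float, std: float, bounds: ZScoreBounds) -> bool:
    """True when both the mean and the standard deviation fall inside the bounds."""
    return abs(mean) <= bounds.max_abs_mean and bounds.min_std <= std <= bounds.max_std


NOTEBOOK_BOUNDS = ZScoreBounds(max_abs_mean=0.3, min_std=0.7, max_std=1.3)  # validation cell
LOOSE_BOUNDS = ZScoreBounds(max_abs_mean=0.5, min_std=0.5, max_std=1.5)     # troubleshooting tolerances

# (mean, std) as printed in the Signal Statistics table
reported = {
    "cdx_etf_basis": (-0.136, 1.287),
    "cdx_vix_gap": (0.029, 1.205),
    "spread_momentum": (-0.034, 1.944),
}

for name, (mean, std) in reported.items():
    strict = zscore_ok(mean, std, NOTEBOOK_BOUNDS)
    loose = zscore_ok(mean, std, LOOSE_BOUNDS)
    print(f"{name}: notebook bounds={strict}, loose bounds={loose}")
```

With these inputs `spread_momentum` falls outside both sets of bounds because of its standard deviation of 1.944, so it is more than a minor deviation.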

**Visualization errors:**
- Ensure visualization dependencies installed: `uv sync --extra viz`
- Check plotly import succeeds in first cell
- Verify Jupyter can render plotly figures