# Experiment 01: Baseline & TTT Benchmark

**Research Question** (from README):
> Can self-supervised Test-Time Training adapt a vision-based volatility classifier to non-stationary crypto market regimes without access to ground-truth labels during inference?

This notebook reproduces the baseline benchmark and compares three evaluation modes:
1. **Baseline** - fixed model, no adaptation
2. **TTT (standard)** - adapt per sample, reset encoder
3. **TTT (online)** - adapt sequentially, keep encoder state
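To make the difference between the three modes concrete, here is a minimal pure-Python sketch. `ToyModel`, `ttt_adapt`, and `evaluate` are hypothetical stand-ins, not the project's `src.eval` code. The "encoder" is a single `shift` parameter, the self-supervised aux loss only asks encoded values to have zero mean, and adaptation uses plain gradient descent (the benchmark itself is run with `--ttt_optimizer adam`).

```python
import copy
import math
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in: one adaptable "encoder" parameter (`shift`)
    and a classifier head (`weight`) that TTT leaves frozen."""
    shift: float = 0.0
    weight: float = 1.0

    def aux_loss(self, window):
        # Self-supervised objective (no labels): encoded values should have zero mean.
        m = sum(x - self.shift for x in window) / len(window)
        return m * m

    def aux_grad(self, window):
        # d/d(shift) of mean(x - shift)^2 is -2 * mean(x - shift).
        m = sum(x - self.shift for x in window) / len(window)
        return -2.0 * m

    def predict_proba(self, window):
        # P(high-vol): sigmoid of the head applied to the last encoded value.
        z = window[-1] - self.shift
        return 1.0 / (1.0 + math.exp(-self.weight * z))


def ttt_adapt(model, window, steps=10, lr=0.5):
    """Gradient descent on the aux loss only; returns (initial, final) aux loss."""
    initial = model.aux_loss(window)
    for _ in range(steps):
        model.shift -= lr * model.aux_grad(window)
    return initial, model.aux_loss(window)


def evaluate(model, windows, mode="baseline", threshold=0.35, steps=10, lr=0.5):
    online_model = copy.deepcopy(model)  # carries adapted state across samples
    preds = []
    for window in windows:
        if mode == "baseline":
            m = model                    # fixed model, no adaptation
        elif mode == "standard":
            m = copy.deepcopy(model)     # fresh copy per sample = encoder reset
            ttt_adapt(m, window, steps, lr)
        elif mode == "online":
            m = online_model             # keeps whatever earlier samples did to it
            ttt_adapt(m, window, steps, lr)
        else:
            raise ValueError(f"unknown mode: {mode}")
        preds.append(int(m.predict_proba(window) >= threshold))
    return preds


windows = [[1.0, 2.0, 3.0], [10.0, 11.0, 9.0]]
model = ToyModel()
for mode in ("baseline", "standard", "online"):
    print(mode, evaluate(model, windows, mode))
```

This prints `[1, 1]` for the baseline and `[1, 0]` for both TTT modes. The only structural difference between standard and online TTT is whether the adapted state is discarded after each sample (`copy.deepcopy`) or carried into the next one. In this toy, `lr=0.5` happens to make the aux loss converge in a single step, so the two TTT modes give the same predictions here.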

**Requirements**: `data/raw/btcusdt_1h.parquet` must exist. Run from the project root or from `experiments/`; the setup cell switches to the project root either way.

## 1. Setup

In [None]:
import os
import sys

# Ensure project root is on path (works from experiments/ or project root)
cwd = os.getcwd()
PROJECT_ROOT = os.path.dirname(cwd) if os.path.basename(cwd) == "experiments" else cwd
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
os.chdir(PROJECT_ROOT)
print(f"Working directory: {os.getcwd()}")

## 2. Data Preparation

Creates `data/processed/dataset.pt` from the raw OHLCV parquet file. Skips this step if the file already exists.

In [None]:
from pathlib import Path
from src.dataset import prepare_dataset

pt_path = Path("data/processed/dataset.pt")
if not pt_path.exists():
    prepare_dataset(
        parquet_path="data/raw/btcusdt_1h.parquet",
        output_dir="data/processed",
        train_end="2022-12-31",
        val_end="2023-12-31",
        seed=42,
    )
    print("Dataset created.")
else:
    print(f"Dataset exists: {pt_path}")
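The `train_end`/`val_end` arguments suggest a chronological split rather than a shuffled one, which keeps the test period strictly after the training period so that regime shifts actually show up at evaluation time. A minimal sketch of such a split, assuming inclusive end dates; `chronological_split` is a hypothetical helper, not `src.dataset.prepare_dataset`:

```python
from datetime import date, datetime


def chronological_split(rows, train_end, val_end):
    """Hypothetical helper: assign (timestamp, value) rows to train/val/test by date.
    Assumes both end dates are inclusive; no shuffling, so time order is preserved."""
    train_end = date.fromisoformat(train_end)
    val_end = date.fromisoformat(val_end)
    splits = {"train": [], "val": [], "test": []}
    for ts, value in rows:
        day = ts.date()  # compare calendar days so hourly bars on the end date stay in
        if day <= train_end:
            splits["train"].append((ts, value))
        elif day <= val_end:
            splits["val"].append((ts, value))
        else:
            splits["test"].append((ts, value))
    return splits


rows = [
    (datetime(2022, 12, 31, 23), 0.01),  # last training hour
    (datetime(2023, 6, 1, 12), 0.02),    # validation year
    (datetime(2024, 1, 1, 0), 0.03),     # first test hour
]
splits = chronological_split(rows, train_end="2022-12-31", val_end="2023-12-31")
print({k: len(v) for k, v in splits.items()})  # {'train': 1, 'val': 1, 'test': 1}
```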

## 3. Training

**Baseline (no aux)**: `--aux_task none`

**Joint (mask aux)**: `--aux_task mask --lambda_aux 1.0`. Required for TTT: test-time adaptation minimizes this auxiliary loss, so the model must be trained with it.
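Assuming `--lambda_aux` weights the auxiliary loss in the usual joint-training way, the training objective and the test-time update can be written as below. Here $f_{\theta_e}$ is the shared encoder, $h_{\theta_c}$ the volatility classifier head, $g_{\theta_a}$ the auxiliary head, $y$ the volatility label, $\tilde{x}$ the input with time steps masked, and $\eta$ the TTT learning rate; this notation is ours, not the project's.

$$
\mathcal{L}_{\text{joint}} = \mathcal{L}_{\text{cls}}\big(h_{\theta_c}(f_{\theta_e}(x)),\, y\big) + \lambda_{\text{aux}}\, \mathcal{L}_{\text{aux}}\big(g_{\theta_a}(f_{\theta_e}(\tilde{x})),\, x\big)
$$

$$
\theta_e \leftarrow \theta_e - \eta\, \nabla_{\theta_e} \mathcal{L}_{\text{aux}}\big(g_{\theta_a}(f_{\theta_e}(\tilde{x})),\, x\big)
$$

The update needs only $x$, never $y$, and would be repeated `--ttt_steps` times with $\eta$ = `--ttt_lr`; with `--ttt_optimizer adam` the plain gradient step is replaced by an Adam step. Only the encoder is shown as adapted, matching the "reset encoder" wording above; whether the project also updates other parameters is not stated. With `--aux_task none` the second term drops out, leaving nothing to minimize at test time.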

In [None]:
# Joint training with temporal masking aux task (for TTT)
!python -m src.train --parquet data/raw/btcusdt_1h.parquet \
  --train_end 2022-12-31 --val_end 2023-12-31 \
  --epochs 30 --aux_task mask --lambda_aux 1.0 \
  --checkpoint_dir checkpoints/joint

In [None]:
# Optional: Baseline (no aux) for comparison
# !python -m src.train --parquet data/raw/btcusdt_1h.parquet \
#   --train_end 2022-12-31 --val_end 2023-12-31 \
#   --epochs 30 --aux_task none --checkpoint_dir checkpoints/baseline

## 4. Evaluation

Runs the baseline, TTT (standard), and TTT (online) evaluations. Uses `--threshold 0.35` to improve recall on the minority (high-volatility) class.

In [None]:
!python -m src.eval --checkpoint checkpoints/joint/best.pt \
  --ttt_steps 10 --ttt_lr 0.5 --ttt_optimizer adam \
  --entropy_adaptive --threshold 0.35
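A lower threshold trades precision for recall. The sketch below assumes a sample is labeled high-volatility when its predicted probability is at least `threshold`; `precision_recall_f1` and the probabilities are made up for illustration, not taken from `src.eval`.

```python
def precision_recall_f1(probs, labels, threshold):
    """Binary metrics for the positive (high-vol) class at a given threshold."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up predicted P(high-vol) and true labels (1 = high-vol, the minority class).
probs = [0.10, 0.20, 0.40, 0.45, 0.30, 0.60, 0.15, 0.38]
labels = [0, 0, 1, 1, 0, 1, 0, 0]
for t in (0.5, 0.35):
    print(t, precision_recall_f1(probs, labels, t))
```

On this toy data, dropping the threshold from 0.5 to 0.35 raises recall from 1/3 to 1.0 while precision falls from 1.0 to 0.75.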

## 5. Results Summary

| Mode | Accuracy | F1 | ECE | Brier | IC |
|------|----------|-----|-----|-------|-----|
| Baseline | - | - | - | - | - |
| TTT (standard) | - | - | - | - | - |
| TTT (online) | - | - | - | - | - |

*Copy values from the eval output above (ECE = expected calibration error; IC = information coefficient). Example from a typical run:*

| Mode | Accuracy | F1 | ECE | Brier | IC |
|------|----------|-----|-----|-------|-----|
| Baseline | 0.76 | 0.08 | 0.06 | 0.17 | 0.09 |
| TTT (standard) | 0.47 | 0.32 | 0.15 | 0.19 | 0.06 |
| TTT (online) | 0.34 | 0.35 | 0.20 | 0.21 | -0.03 |
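For reference, a minimal sketch of two of the calibration metrics in the table, computed on P(high-vol). The ECE variant here (10 equal-width bins, comparing the mean predicted probability with the observed fraction of positives in each bin) is one common choice and an assumption; `src.eval` may bin differently.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted P(high-vol) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE on the positive-class probability (assumed variant)."""
    bins = {}
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins.setdefault(b, []).append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins.values():
        mean_p = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(frac_pos - mean_p)  # weight by bin size
    return ece


# Predicting 0.25 when 1 in 4 samples is high-vol is perfectly calibrated (ECE 0),
# yet still has a nonzero Brier score because the predictions are not sharp.
probs, labels = [0.25] * 4, [1, 0, 0, 0]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))  # 0.1875 0.0
```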

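**Interpretation**: In this example run, TTT raises F1 from 0.08 (baseline) to 0.32 (standard) and 0.35 (online), i.e. better recall on the high-volatility class, at the cost of accuracy (0.76 → 0.47 / 0.34) and calibration (ECE 0.06 → 0.15 / 0.20, Brier 0.17 → 0.19 / 0.21). The baseline's high accuracy combined with an F1 of 0.08 suggests it rarely predicts the high-volatility class at all. Online TTT also turns IC negative (-0.03). To confirm that adaptation is actually occurring, check the `TTT aux_loss: initial=... → final=...` line in the eval output: the final value should be below the initial one.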
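That check can be made mechanical. A small sketch that scans saved eval output for the log line quoted above; it assumes the line appears exactly as `TTT aux_loss: initial=<float> → final=<float>`, `adaptation_occurred` is a hypothetical helper, and the sample log text is invented.

```python
import re

# Assumed log format, taken from the line quoted above; values are plain decimals.
AUX_RE = re.compile(r"TTT aux_loss: initial=([-+0-9.eE]+) → final=([-+0-9.eE]+)")


def adaptation_occurred(eval_output: str) -> bool:
    """True if every logged TTT aux-loss line shows final < initial."""
    matches = AUX_RE.findall(eval_output)
    if not matches:
        raise ValueError("no 'TTT aux_loss' line found in eval output")
    return all(float(final) < float(initial) for initial, final in matches)


sample_log = "TTT aux_loss: initial=0.4123 → final=0.2011\n"  # invented example
print(adaptation_occurred(sample_log))  # True
```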

## 6. Future Experiments

Add new cells with different configurations and record the results in the Section 5 table:

- `--ttt_steps 5` or `20`
- `--ttt_lr 0.1` or `0.3`
- `--ttt_optimizer sgd`
- `--aux_task rotation` (training flag: retrain with the rotation aux task, then evaluate)
- `--mask_mode rightmost` (extrapolation instead of interpolation)
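The `rightmost` option is described above as extrapolation rather than interpolation. A sketch of what the two masking styles could look like over the time steps (image columns) of one window; `make_time_mask` and the `"random"` mode name are hypothetical, since the notebook only names `rightmost`.

```python
import random


def make_time_mask(n_steps, n_masked, mode="random", rng=None):
    """Hypothetical helper: choose which time steps to hide (True = masked).
    'random'    -> steps scattered across the window; most gaps have context
                   on both sides, so the aux task is mostly interpolation
    'rightmost' -> the last n_masked steps; the aux task must predict ahead
                   from the past only (extrapolation)"""
    if not 0 < n_masked < n_steps:
        raise ValueError("n_masked must be between 1 and n_steps - 1")
    if mode == "rightmost":
        masked = set(range(n_steps - n_masked, n_steps))
    elif mode == "random":
        rng = rng or random.Random()
        masked = set(rng.sample(range(n_steps), n_masked))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [i in masked for i in range(n_steps)]


print(make_time_mask(6, 2, "rightmost"))  # [False, False, False, False, True, True]
```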