# Experiment 02: Rotation vs Masking Auxiliary Task

**Proposal**: "Explore whether finance-specific self-supervised tasks can better align auxiliary and main objectives than generic rotation-based tasks."

This notebook trains a model with **rotation** as the auxiliary (aux) task and compares its test metrics against the **mask**-trained model from Experiment 01.

**Prerequisites**: Run Experiment 01 first to obtain `checkpoints/joint/best.pt` (mask aux).

## 1. Setup

In [1]:
import os
import sys

cwd = os.getcwd()
PROJECT_ROOT = os.path.dirname(cwd) if os.path.basename(cwd) == "experiments" else cwd
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
os.chdir(PROJECT_ROOT)
print(f"Working directory: {os.getcwd()}")

Working directory: /home/psinghavi/crypto-ttt-regime


## 2. Train with Rotation Aux Task

In [2]:
# Training is already complete and the checkpoints exist; the command is kept here for reference.
# !python -m src.train --parquet data/raw/btcusdt_1h.parquet \
#   --train_end 2022-12-31 --val_end 2023-12-31 \
#   --epochs 30 --aux_task rotation --lambda_aux 1.0 \
#   --checkpoint_dir checkpoints/rotation

## 3. Evaluation

In [3]:
!python -m src.eval --checkpoint checkpoints/rotation/best.pt \
  --ttt_steps 10 --ttt_lr 0.05 --ttt_optimizer adam \
  --entropy_adaptive --entropy_gate_threshold 0.3 --threshold 0.35

02:37:51 | INFO    | __main__ | Device: cuda
02:37:53 | INFO    | __main__ | Loaded checkpoint from epoch 4 (val_acc=0.668)
02:37:55 | INFO    | __main__ | Test set size: 770
02:37:57 | INFO    | __main__ | Running baseline evaluation …
02:38:08 | INFO    | __main__ | Running standard TTT (steps=10, lr=0.0500) …
Standard TTT: 100%|███████████████████████████| 770/770 [01:46<00:00,  7.20it/s]
02:39:55 | INFO    | __main__ | TTT aux_loss (first 100 samples): initial=0.0037 → final=22.5250
02:39:55 | INFO    | __main__ | Running online TTT …

----------------------------------------------------------------------
Mode                  accuracy        f1       ece     brier        IC
----------------------------------------------------------------------
Baseline                0.5013    0.3663    0.1532    0.1892    0.1913
TTT (standard)          0.4987    0.3299    0.1421    0.1895    0.0147
TTT (online)            0.4026    0.3072    0.1738    0.2068   -0.0587
----------------------------
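
The `ece` and `brier` columns are calibration metrics. As a rough illustration of what they measure, here is a minimal sketch for binary predictions. It assumes 0/1 labels, a predicted probability for the positive class, a 0.5 decision threshold, and 10 equal-width confidence bins; `src.eval` may use a different binning scheme, threshold, or a multi-class formulation. The function names `brier_score` and `expected_calibration_error` are hypothetical and not taken from the project.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted P(y=1) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin samples by confidence in the predicted class, then
    average |accuracy - mean confidence| per bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0          # assumed 0.5 decision threshold
        conf = p if pred == 1 else 1.0 - p   # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(a for _, a in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece
```

Lower is better for both metrics: a model that is 90% confident but right only half the time in that bin contributes a gap of 0.4 to the ECE.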

## 4. Comparison: Mask vs Rotation

| Aux Task | Mode | Accuracy | F1 | ECE | Brier | IC |
|----------|------|----------|-----|-----|-------|------|
| **Mask** | Baseline | 0.7636 | 0.0808 | 0.0585 | 0.1723 | 0.0951 |
| Mask | TTT (standard) | 0.5688 | 0.3054 | 0.1149 | 0.1850 | 0.0046 |
| Mask | TTT (online) | 0.7065 | 0.1567 | 0.1104 | 0.1827 | -0.0677 |
| **Rotation** | Baseline | 0.5013 | 0.3663 | 0.1532 | 0.1892 | 0.1913 |
| Rotation | TTT (standard) | 0.4987 | 0.3299 | 0.1421 | 0.1895 | 0.0147 |
| Rotation | TTT (online) | 0.4026 | 0.3072 | 0.1738 | 0.2068 | -0.0587 |
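
To make the effect of adaptation easier to read off the table, the sketch below computes each TTT mode's change relative to its own baseline. The numbers are copied from the table above; the dictionary layout and the helper name `deltas_vs_baseline` are illustrative only.

```python
METRICS = ("accuracy", "f1", "ece", "brier", "ic")

# Values copied from the comparison table above.
results = {
    "mask": {
        "baseline":     (0.7636, 0.0808, 0.0585, 0.1723, 0.0951),
        "ttt_standard": (0.5688, 0.3054, 0.1149, 0.1850, 0.0046),
        "ttt_online":   (0.7065, 0.1567, 0.1104, 0.1827, -0.0677),
    },
    "rotation": {
        "baseline":     (0.5013, 0.3663, 0.1532, 0.1892, 0.1913),
        "ttt_standard": (0.4987, 0.3299, 0.1421, 0.1895, 0.0147),
        "ttt_online":   (0.4026, 0.3072, 0.1738, 0.2068, -0.0587),
    },
}


def deltas_vs_baseline(results):
    """Return metric differences (TTT mode minus baseline) per aux task."""
    out = {}
    for aux, modes in results.items():
        base = modes["baseline"]
        out[aux] = {
            mode: {m: round(v - b, 4) for m, v, b in zip(METRICS, vals, base)}
            for mode, vals in modes.items()
            if mode != "baseline"
        }
    return out


deltas = deltas_vs_baseline(results)
for aux, modes in deltas.items():
    for mode, d in modes.items():
        row = "  ".join(f"{m}={d[m]:+.4f}" for m in METRICS)
        print(f"{aux:9s} {mode:13s} {row}")
```

Read this way, standard TTT on the mask model gains 0.2246 F1 while losing 0.1948 accuracy, and standard TTT on the rotation model loses 0.1766 IC.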

**Summary**: The rotation baseline has higher F1 (0.37 vs 0.08) and IC (0.19 vs 0.10) than the mask baseline but much lower accuracy (0.50 vs 0.76), which suggests rotation learns more discriminative features during joint training without improving overall accuracy. During adaptation, however, the rotation aux loss explodes (0.0037 → 22.5 over the first 100 samples), which is consistent with catastrophic forgetting: the rotation task carries no regime semantics, so adapting on it degrades the encoder, and IC falls from 0.19 to 0.01 under standard TTT.

Mask TTT (standard) raises F1 from 0.08 to 0.31, at the cost of accuracy dropping from 0.76 to 0.57, with the entropy gate limiting over-adaptation on easy samples. Mask online TTT now maintains 0.71 accuracy (vs 0.34 without fixes), which suggests that the entropy gate and a consistent mask objective stabilize sequential adaptation.
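
For readers unfamiliar with entropy gating, the sketch below shows one plausible form of the gate: skip the adaptation steps when the model's prediction is already confident (low entropy) and adapt only on uncertain samples. It assumes the entropy is normalized to [0, 1] by dividing by `log(K)` for `K` classes and compared against the `--entropy_gate_threshold` value (0.3 in the run above). The actual logic in `src.eval` is not shown in this notebook and may differ; `normalized_entropy` and `should_adapt` are hypothetical names.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to [0, 1] by log(K).

    Expects at least two classes; zero-probability entries contribute nothing.
    """
    k = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def should_adapt(probs, threshold=0.3):
    """Assumed gate: run TTT steps only when the prediction is uncertain."""
    return normalized_entropy(probs) > threshold


# A confident prediction is left alone; an uncertain one triggers adaptation.
print(should_adapt([0.99, 0.01]))  # low entropy, gate stays closed
print(should_adapt([0.60, 0.40]))  # high entropy, gate opens
```

The design intent of such a gate is to spend adaptation steps only where the model is unsure, which limits drift on samples it already handles well.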

## 5. Regime-Stratified Evaluation (Experiment 03)

See Experiment 03 for performance stability across volatility percentiles and regime-specific analysis.