l2-regime: rolling-RV regime filter, OOS-verified 2× IC uplift by neuron7xLab · Pull Request #236 · neuron7xLab/GeoSync

neuron7xLab · 2026-04-17T20:51:30Z

Summary

Recursive, cyclic, and rolling walk-forward analysis of the collected L2 substrate (5h14m) showed that the Ricci κ_min cross-sectional edge is intermittent, not uniform: some time blocks emit IC > +0.18, others invert to IC < -0.10. The full-window PROCEED verdict averaged over both kinds of block and hid the inversions.

This PR introduces the regime filter that rescues the edge:

Discriminator: rolling realized volatility (Spearman ρ = +0.352, p = 0.008 on 56 rolling windows)
Mechanism: when market is quiet (low RV), OFI → 0 → Ricci → noise; when market is active, OFI drives observable moves → Ricci has structural content to score
OOS verification (threshold calibrated on first half, applied to second half, zero information leakage):

Run	frac_on	IC_signal	Residual IC
TEST unconditional	100.0 %	+0.116	+0.123
TEST q50 thr←train	43.9 %	+0.202	+0.193
TEST q75 thr←train	36.3 %	+0.236	+0.233

The q75 threshold learned on the first half gives a 2.03× IC uplift on the second half (+0.236 vs. +0.116 unconditional), so on this session the threshold generalizes across halves.
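The half-split procedure described above can be sketched as follows. This is a minimal illustration, not the code in scripts/l2_regime_oos.py: the function name `oos_quantile_mask` and the synthetic score are assumptions made for the example, and only the idea (threshold from the first half, applied unchanged to the second half) comes from the description.

```python
import numpy as np


def oos_quantile_mask(score: np.ndarray, quantile: float) -> tuple[float, np.ndarray]:
    """Calibrate a threshold on the first half of `score`, apply it to the second half.

    Hypothetical helper for illustration only.
    """
    half = score.shape[0] // 2
    train, test = score[:half], score[half:]
    # nanquantile skips the warm-up NaNs a rolling score has at its start.
    threshold = float(np.nanquantile(train, quantile))
    # NaN > threshold evaluates to False, so undefined rows count as regime-off.
    test_mask = test > threshold
    return threshold, test_mask


# Synthetic stand-in for a rolling-RV score with a 50-row warm-up.
rng = np.random.default_rng(0)
score = np.abs(rng.normal(size=1000))
score[:50] = np.nan

thr, mask = oos_quantile_mask(score, 0.75)
frac_on = float(mask.mean())
print(f"threshold={thr:.4f} frac_on={frac_on:.3f}")
```

Because the threshold is fixed on the train half, frac_on on the test half need not equal 1 − quantile; the table above reports 36.3 % at q75 for the same reason.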

Architecture (AE-compliant — elimination over addition)

New public surface in research/microstructure/regime.py:

rolling_corr_regime(features, window_rows=300) -> NDArray[float64]
rolling_rv_regime(features, window_rows=300)   -> NDArray[float64]  # primary
regime_mask_from_score(score, threshold)       -> NDArray[bool]
regime_mask_from_quantile(score, quantile)     -> NDArray[bool]

One optional parameter added to existing gate:

run_killtest(..., regime_mask: NDArray[bool] | None = None) -> GateVerdict

None → identical behavior to before. No new dataclasses, no new CLI flags (yet — they'll land when the regime filter earns its place in the permanent gate).
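For readers unfamiliar with this kind of score, here is a rough sketch of what a rolling realized-volatility regime score and a quantile mask could look like. It is not the implementation in research/microstructure/regime.py: the input layout (a price array of shape (n_rows, n_symbols)), the use of log returns, and the name `rolling_rv_score` are all assumptions chosen for the example.

```python
import numpy as np
from numpy.typing import NDArray


def rolling_rv_score(mid: NDArray[np.float64], window_rows: int = 300) -> NDArray[np.float64]:
    """Hypothetical stand-in: cross-sectional mean of rolling realized volatility.

    mid: (n_rows, n_symbols) positive prices. Rows before the first full
    window are NaN (warm-up).
    """
    if window_rows < 2:
        raise ValueError("window_rows must be >= 2")
    n_rows, n_symbols = mid.shape
    score = np.full(n_rows, np.nan)
    if n_rows <= window_rows:
        return score
    sq_ret = np.diff(np.log(mid), axis=0) ** 2  # (n_rows - 1, n_symbols)
    csum = np.vstack([np.zeros((1, n_symbols)), np.cumsum(sq_ret, axis=0)])
    # Sum of the last `window_rows` squared returns ending at each row.
    window_sum = csum[window_rows:] - csum[:-window_rows]
    score[window_rows:] = np.sqrt(window_sum).mean(axis=1)
    return score


# Synthetic panel: quiet first half, active second half.
rng = np.random.default_rng(1)
n, k = 2000, 4
vol = np.where(np.arange(n) < n // 2, 1e-5, 1e-3)[:, None]
mid = 100.0 * np.exp(np.cumsum(vol * rng.normal(size=(n, k)), axis=0))

score = rolling_rv_score(mid, window_rows=300)
mask = score > np.nanquantile(score, 0.5)  # NaN warm-up rows compare False
print(int(mask[:1000].sum()), int(mask[1300:].sum()))
```

On this synthetic panel the mask stays off during the quiet half and switches on once the window is fully inside the active half, which is the behavior the mechanism in the Summary relies on.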

Analysis scripts (diagnostics, not production)

scripts/l2_killtest_recursive.py — depth-first bisection (full → halves → quarters → octiles) + cyclic K=8 blocks. Exposes regime structure.
scripts/l2_regime_analysis.py — per-block regime features + IC rank correlation.
scripts/l2_walk_forward.py — 56 rolling 40-min windows with 5-min step, Spearman feature→IC correlations.
scripts/l2_regime_conditional.py — in-sample conditional gate at multiple quantile thresholds × window sizes.
scripts/l2_regime_oos.py — true OOS, threshold on train, applied to test.

Quality gates

ruff format+check: clean
black --check: clean
mypy --strict --follow-imports=silent: clean
26/26 pytest green (19 prior + 7 new regime tests)

What's explicitly NOT in this PR

No new CLI flag for --regime yet — waiting for 8h fresh collection (currently running in background) to confirm threshold generalizes across a second session, not just across halves of one session.
No walk-forward framework — we already walked forward; the uplift survives.
No live-execution hooks.

Test plan

CI green
Merge (admin-squash after CI green — branch-protection on main)
Wait for 8h fresh collection (ETA ~07:30 UTC+3)
Run l2_regime_oos.py against FRESH substrate with threshold from ORIGINAL session — if uplift holds across sessions, regime filter is production-ready.

🤖 Generated with Claude Code

Three diagnostic scripts built on existing primitives (slice_features, run_killtest, cross_sectional_ricci_signal), with no new abstractions (AE principles 1, 20).

* scripts/l2_killtest_recursive.py: depth-first bisection (up to depth 3) + cyclic K=8 disjoint blocks. Reveals regime structure hidden by full-window averaging. On collected substrate: 3/8 blocks PROCEED with IC up to +0.339; 3/8 KILL with IC as low as -0.109. Signal is intermittent, not uniform.
* scripts/l2_regime_analysis.py: per-block regime features (realized vol, cross-asset correlation, dispersion, signed trend, κ_min moments). Spearman rank-correlates block IC against each feature. On K=8: corr_mean strongest direction (ρ=+0.429, p=0.29, n=8). Underpowered for statistical claim; motivates finer-grained analysis.
* scripts/l2_walk_forward.py: 40-minute rolling window with 5-minute step across substrate. ~56 windows gives the statistical power that 8 disjoint blocks lack. Reports IC trajectory, Spearman ρ at rolling resolution, quartile bins on the most-correlated feature to find a discriminator threshold.

Output artifacts:
* results/REGIME_ANALYSIS.json (8-block table + ρ matrix)
* results/L2_WALK_FORWARD.json (56-row trajectory, quartile bins)

Non-goals: new dataclasses, new production modules. Pure diagnostics. If walk-forward identifies a regime discriminator, next commit adds the regime filter to killtest.py as an optional parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Walk-forward analysis (scripts/l2_walk_forward.py, 56 rolling 40-min windows) identified rolling realized volatility as the dominant regime discriminator:

  Spearman ρ(IC_signal, rv_mean)   = +0.352  p = 0.008 ***
  Spearman ρ(IC_signal, corr_mean) = +0.317  p = 0.017 *
  Spearman ρ(IC_signal, trend_*)   = not significant

Quartile analysis on rv_mean:

  Q1_low   IC median +0.027 (signal ≈ noise)
  Q4_high  IC median +0.137 (signal works)

IN-SAMPLE CONDITIONAL (scripts/l2_regime_conditional.py):

  unconditional  IC = +0.122
  rv_w600_q75    IC = +0.256 (frac_on = 24.2 %)
  rv_w300_q50    IC = +0.177 (frac_on = 49.2 %)

TRUE OOS (scripts/l2_regime_oos.py, threshold trained on first half, applied to second half, no information leakage):

  TEST unconditional       IC = +0.116  frac_on = 100.0 %
  TEST q50 thr from train  IC = +0.202  frac_on = 43.9 %
  TEST q75 thr from train  IC = +0.236  frac_on = 36.3 %
  => 2.03× IC uplift OOS, threshold generalizes

Components

- research/microstructure/regime.py: 4 functions, no new dataclasses:
  * rolling_corr_regime(features, window_rows)
  * rolling_rv_regime(features, window_rows) (primary, OOS-verified)
  * regime_mask_from_score(score, threshold)
  * regime_mask_from_quantile(score, quantile)
- research/microstructure/killtest.py: single optional parameter:
  * run_killtest(..., regime_mask: NDArray[bool] | None = None)
  * Backwards-compatible: None → identical behavior to before
  * When supplied, mask is applied at scoring time (ricci + target → NaN outside mask); the Ricci signal itself is still computed on the full contiguous series (its rolling corr needs consecutive rows)
- tests/test_l2_regime.py: 7 new tests:
  * shape + warmup on rolling_corr_regime
  * high-ρ vs low-ρ synthetic discrimination
  * argument validation (window too small, single symbol)
  * mask NaN handling
  * killtest rejects wrong mask shape
  * trivial all-True mask matches unconditional (regression)

All 26 tests green. ruff + black + mypy --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

GeoSync/research/microstructure/killtest.py

Line 459 in 8c31f7b

n_samples=int(features.n_rows),

Report effective sample count for masked killtest runs

n_samples is always reported as features.n_rows, but with regime_mask many rows are excluded from IC/p-value computations, so the verdict metadata overstates the amount of data actually used. This can mislead analysis scripts or operators about statistical power; n_samples should reflect the finite scored rows (or an additional effective-sample field should be emitted).
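One way to read this suggestion is to count the rows that still contribute a finite (signal, target) pair after masking. The sketch below assumes panels of shape (n_rows, n_symbols) and uses a hypothetical helper name, `effective_n_rows`; whether the gate should count rows or individual cells is a design choice the review leaves open.

```python
import numpy as np


def effective_n_rows(signal: np.ndarray, target: np.ndarray) -> int:
    """Rows with at least one symbol where both signal and target are finite.

    Hypothetical helper; panels are assumed to be (n_rows, n_symbols).
    """
    scored = np.isfinite(signal) & np.isfinite(target)
    return int(np.count_nonzero(scored.any(axis=1)))


rng = np.random.default_rng(0)
signal = rng.normal(size=(10, 3))
target = rng.normal(size=(10, 3))

# Regime mask that keeps 4 of 10 rows, applied the way the diff does it.
regime_mask = np.array([0, 0, 1, 1, 0, 0, 1, 0, 0, 1], dtype=bool)
panel_mask = regime_mask[:, None]
signal_m = np.where(panel_mask, signal, np.nan)
target_m = np.where(panel_mask, target, np.nan)

n_total = signal.shape[0]
n_effective = effective_n_rows(signal_m, target_m)
print(n_total, n_effective)
```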


chatgpt-codex-connector · 2026-04-17T20:55:31Z

+        ricci_panel = np.where(panel_mask, ricci_panel, np.nan)
+        target = np.where(panel_mask, target, np.nan)


Permute only active rows in masked null tests

When regime_mask is applied, rows outside the regime are converted to NaN and then _permutation_pvalue still shuffles across the full time axis; for sparse masks this makes most shuffled rows miss the finite target rows, shrinking effective trial sample size (often below _pooled_ic's minimum) and biasing permutation p-values toward non-significance. This can falsely KILL regime-conditional runs even when the observed IC is strong, so the null test should permute/compress only the active (finite) rows.
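A sketch of the fix the review describes: compress both panels to the active rows first, then permute within that compressed set so every trial keeps the full effective sample. The functions below are stand-ins, not the repository's _permutation_pvalue or _pooled_ic; the rank-based IC ignores ties, and the (hits + 1) / (trials + 1) p-value convention is an assumption.

```python
import numpy as np


def _ordinal_rank(x: np.ndarray) -> np.ndarray:
    # Ordinal ranks; ties broken by position, adequate for continuous data.
    order = np.argsort(x, kind="stable")
    ranks = np.empty(x.size, dtype=np.float64)
    ranks[order] = np.arange(x.size, dtype=np.float64)
    return ranks


def pooled_spearman(signal: np.ndarray, target: np.ndarray) -> float:
    """Spearman correlation pooled over all finite (row, symbol) cells."""
    ok = np.isfinite(signal) & np.isfinite(target)
    if ok.sum() < 3:
        return float("nan")
    return float(np.corrcoef(_ordinal_rank(signal[ok]), _ordinal_rank(target[ok]))[0, 1])


def masked_permutation_pvalue(
    signal: np.ndarray, target: np.ndarray, trials: int = 200, seed: int = 42
) -> tuple[float, float]:
    # Keep only rows that have finite data in both panels, then shuffle
    # target rows among those rows only.
    active = np.isfinite(signal).any(axis=1) & np.isfinite(target).any(axis=1)
    sig_a, tgt_a = signal[active], target[active]
    observed = pooled_spearman(sig_a, tgt_a)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        perm = rng.permutation(tgt_a.shape[0])
        if abs(pooled_spearman(sig_a, tgt_a[perm])) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (trials + 1)


rng = np.random.default_rng(7)
signal = rng.normal(size=(400, 3))
target = signal + rng.normal(size=(400, 3))  # true positive relationship
sparse = (np.arange(400) % 4 == 0)[:, None]  # 25 % of rows active
ic, pval = masked_permutation_pvalue(
    np.where(sparse, signal, np.nan), np.where(sparse, target, np.nan)
)
print(f"IC={ic:.3f} p={pval:.4f}")
```

Shuffling across the full time axis instead would pair most active signal rows with NaN targets, which is the sample-size shrinkage the comment warns about.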


Two more diagnostic scripts for the regime filter, completing the generalization ladder:

* scripts/l2_regime_walkforward_calibration.py
  Rolling 60-min calibration + 30-min evaluation. Slides across substrate; at each step derives q50/q75 thresholds from the calibration window, applies them to the next evaluation window. Result on collected substrate: uplift POSITIVE in only 1 of 7 steps (q50) / 1 of 5 steps (q75).
  HONEST LIMIT: the 50/50 split uplift (IC +0.12 → +0.24 OOS) does NOT survive production-style short-window rolling recalibration. Threshold needs longer calibration horizons to stabilize.
* scripts/l2_regime_cross_session.py
  Cross-session OOS scaffold: takes --train-dir and --test-dir, derives quantile thresholds from the train session, applies to the test session, writes results/L2_REGIME_CROSS_SESSION.json. Runnable against the second 8h session currently being collected into data/binance_l2_perp_v2. Strongest form of OOS we can do without multi-day walk-forward.

These land as diagnostics only. The regime MODULE (regime.py) and its integration (run_killtest regime_mask param) are the shippable artifacts. Scripts document the calibration surface honestly, including where the filter breaks under stricter recalibration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
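The rolling recalibration loop described in this commit can be sketched as below. This is an illustration only: the helper names are hypothetical, the step size (advance by one evaluation window) is an assumption the commit message does not state, and the toy sizes stand in for the 60-min / 30-min windows.

```python
from typing import Iterator

import numpy as np


def walk_forward_steps(
    n_rows: int, calib_rows: int, eval_rows: int
) -> Iterator[tuple[slice, slice]]:
    """Yield (calibration, evaluation) slices; each step advances by eval_rows (assumed)."""
    start = 0
    while start + calib_rows + eval_rows <= n_rows:
        mid = start + calib_rows
        yield slice(start, mid), slice(mid, mid + eval_rows)
        start += eval_rows


def recalibrated_masks(
    score: np.ndarray, calib_rows: int, eval_rows: int, quantile: float
) -> list[tuple[slice, np.ndarray]]:
    out = []
    for calib, ev in walk_forward_steps(score.shape[0], calib_rows, eval_rows):
        # Threshold comes only from the calibration window, never from `ev`.
        thr = float(np.nanquantile(score[calib], quantile))
        out.append((ev, score[ev] > thr))
    return out


rng = np.random.default_rng(0)
score = np.abs(rng.normal(size=100))
steps = recalibrated_masks(score, calib_rows=40, eval_rows=20, quantile=0.75)
fracs = [float(m.mean()) for _, m in steps]
print(len(steps), fracs)
```

Each evaluation mask would then be passed to the gate for that window; short calibration windows make the threshold itself noisy, which is one plausible reading of the limit reported above.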

Open question from PR #237: does Ricci cross-sectional signal polarity vary with UTC hour-of-day, or was Session-2's IC=-0.22 just sample noise?

Binary test via new research/microstructure/diurnal.py folds three collected sessions into UTC-hour buckets and runs per-hour pooled Spearman IC + permutation test.

VERDICT: SIGN_FLIP_CONFIRMED

Per-hour IC (pooled across sessions; all buckets at permutation p<0.05 unless noted; n_rows after NaN-filter):

  hour  IC       p      sessions  regime
  05Z   n/a      -      S3        underpowered
  06Z   -0.0619  0.002  S3        Sat AM quiet
  07Z   -0.1194  0.002  S3        Sat AM quiet
  08Z   +0.0086  0.036  S1+S3     EU open weak
  09Z   +0.0191  0.002  S1+S3     EU open
  10Z   +0.1279  0.002  S1        EU active
  11Z   -0.0360  0.002  S1        EU midday lull
  12Z   +0.0556  0.002  S1        EU + pre-US
  13Z   +0.3554  0.002  S1        EU+US overlap PEAK
  20Z   +0.0008  0.954  S2        underpowered (n=7410)
  21Z   -0.0715  0.002  S2        US close / EU eve
  22Z   -0.1989  0.002  S2        EU evening

Significance tally: 5 positive hours + 5 negative hours at p<0.05.
Verdict: SIGN_FLIP_CONFIRMED (at least one bucket each side).

Architectural implication: The regime-q75 filter landed in PR #236 conditions on volatility but does NOT capture the inversion driver. Saturday-morning 07Z shows IC=-0.12 at ordinary vol levels; a pure vol threshold does not separate this from the +0.13 of 10Z. Time-of-day is a load-bearing axis the current gate does not model.

Next step (out of scope for this PR): diurnal-aware sign strategy as follow-up research module.

Components:

research/microstructure/diurnal.py (typed, 7 tests)
  utc_hour_of_row(start_ms, n_rows) -> NDArray[int64]
  compute_diurnal_profile(sessions, horizon_sec, min_rows_per_hour, perm_trials, pvalue_gate, seed) -> DiurnalProfile
  profile_to_json_dict(profile) -> dict
  session_start_ms_from_frames(frames) -> int

scripts/run_l2_diurnal_profile.py
  --data-dir PATH (repeatable; one per session)
  --horizon-sec INT (default 180)
  --min-rows-per-hour INT (default 300)
  --perm-trials INT (default 500)
  --pvalue-gate FLOAT (default 0.05)
  --seed INT (default 42)
  --output PATH

tests/test_l2_diurnal.py (7 tests):
  utc_hour_of_row monotone + wrap + negative rejection
  empty-sessions underpowered
  low-sample underpowered/stable
  multi-session hour merging
  JSON schema contract
  determinism under fixed seed

Evidence: results/L2_DIURNAL_PROFILE.json (tracked; full per-hour breakdown)

Quality gates: ruff + black + mypy --strict clean.
Test regression: 42 → 49 (+7 diurnal).

Numerical locks (PR #238 fixtures) unchanged:
  ic_test_q75 = 0.23638402111955653
  breakeven_q75 = 0.4072465349699599

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
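To make the hour-bucketing step concrete, here is a sketch of how a row index could be mapped to a UTC hour. It mirrors the listed signature utc_hour_of_row(start_ms, n_rows) but is not the repository's code: the one-row-per-second cadence is an assumption, exposed here as the hypothetical `row_ms` parameter, and the function name carries a `_sketch` suffix to mark it as illustrative.

```python
from datetime import datetime, timezone

import numpy as np
from numpy.typing import NDArray

MS_PER_HOUR = 3_600_000


def utc_hour_of_row_sketch(
    start_ms: int, n_rows: int, row_ms: int = 1000
) -> NDArray[np.int64]:
    """UTC hour (0..23) for each row, given the session start in epoch milliseconds.

    row_ms is an assumed row cadence, not something the source specifies.
    """
    if start_ms < 0 or n_rows < 0:
        raise ValueError("start_ms and n_rows must be non-negative")
    ts = start_ms + row_ms * np.arange(n_rows, dtype=np.int64)
    # Epoch midnight is a multiple of 24 h, so integer hours mod 24 is the UTC hour.
    return (ts // MS_PER_HOUR) % 24


# Session starting two seconds before midnight UTC: hours wrap from 23 to 0.
start = datetime(2026, 4, 17, 23, 59, 58, tzinfo=timezone.utc)
start_ms = int(start.timestamp() * 1000)
hours = utc_hour_of_row_sketch(start_ms, n_rows=4)
print(hours.tolist())
```

Rows from several sessions that land in the same hour bucket can then be pooled before computing the per-hour IC and its permutation p-value.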


neuron7xLab and others added 2 commits April 17, 2026 23:41

chatgpt-codex-connector Bot reviewed Apr 17, 2026


neuron7xLab merged commit bdbc991 into main Apr 17, 2026
13 checks passed

neuron7xLab deleted the l2-regime-calibration branch April 17, 2026 21:19

neuron7xLab mentioned this pull request Apr 17, 2026

l2-diagnostics: regime, per-symbol, horizon, P&L simulation + morning brief #237

Merged

6 tasks

neuron7xLab mentioned this pull request Apr 18, 2026

feat(diurnal): cross-session per-hour IC profile — SIGN_FLIP_CONFIRMED #240

Merged

neuron7xLab merged 3 commits into main from l2-regime-calibration
