feat: ship shared validation runners#11
Conversation
There was a problem hiding this comment.
Pull request overview
This PR factors out shared cross-framework validation/benchmark runners (LEAN, VectorBT, Backtrader, Zipline) into ml4t.backtest._validation, refactors the benchmark harness to use them, and updates the LEAN parity profile plus associated docs/tests.
Changes:
- Add shared validation runners for LEAN/VectorBT/Backtrader/Zipline under
src/ml4t/backtest/_validation/. - Refactor
validation/benchmark_suite.pyto reuse shared runners and update LEAN benchmarking artifacts (fills/equity/order-events). - Update the LEAN profile semantics (next-bar open + leverage) and refresh parity docs/results and focused tests.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| validation/benchmark_suite.py | Refactors the benchmark harness to consume shared runners; updates LEAN artifact generation/parsing. |
| validation/README.md | Refreshes benchmark/parity result tables. |
| validation/METHODOLOGY.md | Updates methodology tables for the new LEAN “fills” parity framing. |
| uv.lock | Updates lock metadata/extras for dependency wiring. |
| tests/test_config_wiring.py | Updates preset wiring tests; adds EXIT_FIRST edge-case regression tests; adds lean preset assertions. |
| tests/contracts/test_profile_parity_basics.py | Updates LEAN profile contract assertions to reflect new semantics. |
| tests/benchmark/test_lean_adapter.py | Adds tests for LEAN artifact loader behavior surfaced via benchmark suite adapter. |
| tests/benchmark/test_backtrader_zipline_runners.py | Adds unit tests for new shared runner helpers (trade log + normalization/flattening). |
| src/ml4t/backtest/types.py | Adds internal _created_bar_index to Order for next-bar queue handling. |
| src/ml4t/backtest/profiles.py | Updates LEAN profile semantics; removes lean_strict; enables queue shadow validation for strict-like parity. |
| src/ml4t/backtest/core/order_book.py | Stamps orders with _created_bar_index at submission time. |
| src/ml4t/backtest/core/execution_engine.py | Adds next-bar queue shadow validation path; adjusts EXIT_FIRST processing. |
| src/ml4t/backtest/config.py | Wires new next_bar_queue_shadow_validation knob through serialization and presets. |
| src/ml4t/backtest/broker.py | Threads next_bar_queue_shadow_validation from config into runtime broker. |
| src/ml4t/backtest/_validation/zipline_runner.py | New shared Zipline orchestration + bundle caching + artifact extraction helpers. |
| src/ml4t/backtest/_validation/vectorbt_runner.py | New shared VectorBT import/execution/artifact helpers. |
| src/ml4t/backtest/_validation/lean_runner.py | New shared LEAN CLI orchestration, data export, and artifact parsing. |
| src/ml4t/backtest/_validation/backtrader_runner.py | New shared Backtrader orchestration + PyFolio trade-log helpers. |
| src/ml4t/backtest/_validation/init.py | Introduces internal _validation package marker. |
| docs/user-guide/profiles.md | Removes lean_strict references; updates LEAN parity row to fills-based metrics. |
| docs/user-guide/accounts.md | Removes lean_strict from the “strict variants” list. |
| docs/index.md | Updates LEAN parity row to fills-based metrics. |
| README.md | Updates top-level validation summary to fills-based LEAN parity. |
| LIMITATIONS.md | Removes lean_strict mention; updates LEAN parity row. |
| CHANGELOG.md | Adds release notes for the new shared validation runners and parity updates. |
Comments suppressed due to low confidence (1)
validation/benchmark_suite.py:1940
- In the generated LEAN algorithm,
on_datareturns early when the timestamp key is missing from_targets. Becausetargets.csvis written sparsely (non-zero targets only), a day with all-zero targets (possible when signals are missing/NaN early in the series) would skip both equity logging and the logic that closes any existing positions. Consider treating missing keys as an empty target set (targets = self._targets.get(key, {})) and still appending the equity row.
def on_data(self, data: Slice):
key = self.time.strftime("%Y-%m-%d")
targets = self._targets.get(key)
if targets is None:
return
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| broker._partial_orders.pop(order.order_id, None) | ||
| else: | ||
| fill.update_partial_order(order) | ||
| broker.mark_account_positions(use_open=False) |
There was a problem hiding this comment.
broker.mark_account_positions(use_open=False) forces close/price_col marking even when this path is triggered for next-bar open execution (use_open=True). This can desync position marks (and any mark-to-market dependent logic) within the bar. Pass through use_open (or call once after the loop with the correct flag).
| broker.mark_account_positions(use_open=False) | |
| broker.mark_account_positions(use_open=use_open) |
| cache_hit = False | ||
| if manifest_path.exists(): | ||
| try: | ||
| manifest = json.loads(manifest_path.read_text(encoding="utf-8")) | ||
| cache_hit = manifest.get("signature") == lean_data_sig | ||
| except Exception: | ||
| cache_hit = False | ||
|
|
There was a problem hiding this comment.
The pre-check that sets cache_hit by comparing manifest['signature'] to lean_data_sig no longer matches the manifest format written by export_lean_daily_data (it hashes the full payload). This makes the local check misleading/dead code. Either remove this block and rely on export_lean_daily_data’s return value, or align the signature scheme so both sides compare the same value.
| | Zipline | **0 (0.00%)** | **$10.30 (0.0014%)** | **99.9%+** | DONE | | ||
| | Backtrader | **0 (0.00%)** | float noise | **99.9%+** | DONE | | ||
| | VBT OSS | +91 (0.04%) | $0 (0.00%) | **99%+** | Production | | ||
| | LEAN | +663 (0.29%) | $13K (1.2%) | **97%+** | Buying-power reservation gap | |
There was a problem hiding this comment.
The “Parity Confidence” table still lists the old LEAN trade/value diffs (+663 trades, $13K) even though this doc now reports LEAN parity as 0 fill gap and ~$1.55 value gap above. Update the LEAN row to reflect the current LEAN semantics/metrics (fills-based) to avoid contradictory guidance.
| | LEAN | +663 (0.29%) | $13K (1.2%) | **97%+** | Buying-power reservation gap | | |
| | LEAN | **0 (0.00%)** | **$1.55 (0.0002%)** | **99.9%+** | DONE | |
| validation_price = broker._current_prices.get(order.asset, fill_price) | ||
| valid, rejection_reason = self._validate_shadow_queue_order( | ||
| order=order, | ||
| validation_price=validation_price, | ||
| shadow_cash=shadow_cash, | ||
| shadow_positions=shadow_positions, | ||
| ) |
There was a problem hiding this comment.
In _process_orders_next_bar_queue_shadow, validation_price is derived from broker._current_prices, which will typically be the feed’s price_col (often close). In NEXT_BAR + open execution this can validate commissions/buying power against a different price than the actual fill_price, causing incorrect accepts/rejects. Use fill_price (or price) for validation/commission inputs, and only use mark prices for marking positions.
Summary
Verification