fix: route UBJSON-format .bst files to UBJ scanner #1037
Modern XGBoost (2.0+) saves .bst files in UBJSON format by default. The binary structure validator didn't recognize UBJSON, marking these scans as inconclusive and failing closed. Detect UBJSON by its header signature and route to the existing UBJ scanner for proper analysis.

Fixes Python 3.12 CI failure in TestXGBoostScannerIntegration::test_real_xgboost_model_creation_and_scan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
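For readers who want to see what "detect UBJSON by its header signature" can look like, here is a minimal sketch. It is not the PR's `_is_ubjson_file()` implementation: the names `looks_like_ubjson_object` and `is_ubjson_file` are hypothetical, and the accepted set of second bytes is an assumption derived from the UBJSON object layout (an opening `{` followed by a key-length integer marker, an optimized-container marker, or an immediate `}`).

```python
from __future__ import annotations

from pathlib import Path

# Bytes that may follow the opening "{" of a UBJSON object (assumed set):
#   i U I l L : integer type markers that prefix the first key's length
#   $ #       : optimized-container type / count markers
#   }         : immediate close of an empty object
_UBJSON_SECOND_BYTES = frozenset(b"iUIlL$#}")


def looks_like_ubjson_object(header: bytes) -> bool:
    """Return True if the first two bytes resemble a UBJSON object header."""
    if len(header) < 2:
        return False
    # Indexing bytes yields ints, so compare against integer code points.
    return header[0] == ord("{") and header[1] in _UBJSON_SECOND_BYTES


def is_ubjson_file(path: str | Path) -> bool:
    """Hypothetical helper: sniff the first two bytes of a file on disk."""
    try:
        with open(path, "rb") as fh:
            return looks_like_ubjson_object(fh.read(2))
    except OSError:
        # An unreadable file is treated as "not UBJSON" in this sketch.
        return False
```

Checking the second byte is what separates UBJSON from plain-text JSON in this sketch: a JSON object starts with `{` followed by `"` or whitespace, neither of which is in the accepted set.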
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 466858bcc8
if self._is_ubjson_file(path):
    self._scan_ubj_model(path, result)
    return
Preserve xgboost-load fallback when UBJSON decoder is missing
Routing every UBJSON-looking .bst file directly to _scan_ubj_model() introduces a regression for environments where enable_xgb_loading=True and ubjson is not installed: _scan_ubj_model() marks the scan inconclusive and returns, so _safe_xgboost_load() is never attempted even though the same XGBoost runtime can load modern UBJSON-backed .bst files. Before this change, those files could still be validated via the XGBoost load path; now they fail closed with xgboost_ubj_dependency_missing solely due to the optional Python UBJSON package.
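The ordering this comment asks for can be sketched as a small decision function. This is an illustration only, not the code in the PR or in any follow-up commit: `choose_bst_route`, `ubjson_decoder_available`, and the returned route labels are hypothetical names, and the sketch assumes the optional decoder is importable as `ubjson`.

```python
import importlib.util


def ubjson_decoder_available() -> bool:
    """Check for the optional decoder without importing it (assumed module name)."""
    return importlib.util.find_spec("ubjson") is not None


def choose_bst_route(
    is_ubjson: bool,
    decoder_available: bool,
    enable_xgb_loading: bool,
) -> str:
    """Pick a scan path for a .bst file; the labels are illustrative only."""
    if not is_ubjson:
        # Legacy binary .bst: keep using the binary structure validator.
        return "binary_validator"
    if decoder_available:
        # Preferred path: static analysis with the UBJ scanner.
        return "ubj_scanner"
    if enable_xgb_loading:
        # Decoder missing, but the XGBoost runtime can still load the file.
        return "xgboost_load"
    # Nothing can inspect the file: fail closed as inconclusive.
    return "inconclusive"
```

A caller might invoke it as `choose_bst_route(is_ubjson, ubjson_decoder_available(), enable_xgb_loading)`. The point of the ordering is that a missing optional package only produces an inconclusive result when no other loader is enabled.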
Addressed review feedback in 4113278:
Summary
- Modern XGBoost (2.0+) saves .bst files in UBJSON format by default, not the legacy binary format
- The binary structure validator did not recognize UBJSON: binary_structure_unrecognized → inconclusive → success=False
- Added _is_ubjson_file() detection that checks for the UBJSON object header signature ({ + type marker)
- When a .bst file is detected as UBJSON, it's routed to _scan_ubj_model() for proper analysis

Fixes the Python 3.12 CI failure: TestXGBoostScannerIntegration::test_real_xgboost_model_creation_and_scan was asserting bst_result.success, which was False because the fail-closed logic from #1019 correctly rejected the unrecognized binary structure, but the file was actually valid UBJSON.

Test plan
- test_real_xgboost_model_creation_and_scan passes (was failing)

🤖 Generated with Claude Code