fix: fail closed on incomplete Flax traversal#1295
Conversation
Recognize bounded, structurally valid Flax/JAX MessagePack payloads when they use misleading suffixes, while leaving generic MessagePack state maps skipped. Add routing, filtering, fail-closed, and malicious payload regression coverage plus user-facing documentation.
Treat MessagePack recursion-limit exhaustion as inconclusive coverage, preserve observed suspicious patterns, and add renamed hidden-payload plus benign regression coverage.
Performance BenchmarksCompared
|
There was a problem hiding this comment.
Pull request overview
This PR updates the Flax/JAX MessagePack scanner to fail closed when traversal cannot fully cover the payload due to recursion-depth exhaustion, classifying those cases as inconclusive (exit code 2) and ensuring they are not cached as clean results.
Changes:
- Mark recursion-depth exhaustion during Flax MessagePack traversal as an inconclusive scan outcome rather than a clean success.
- Update scanner success semantics so
success=Falsewhenscan_outcome == inconclusive, even without CRITICAL findings. - Add regression tests verifying exit code
2for benign/hidden-payload recursion-limit cases and confirming inconclusive results are not cached; update changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
modelaudit/scanners/flax_msgpack_scanner.py |
Marks recursion-depth exhaustion as inconclusive and ensures final success is false for inconclusive outcomes. |
tests/scanners/test_flax_msgpack_scanner.py |
Adds tests for inconclusive recursion-limit behavior, aggregate exit code 2, and “not cached as clean” guarantees. |
CHANGELOG.md |
Documents the recursion-limit inconclusive-coverage behavior change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
|
Codex Review: Didn't find any major issues. You're on a roll. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
2and are not cached as clean resultsFinding
This PR is stacked on #1280, which routes structurally valid renamed Flax MessagePack files by content and now fails closed when its bounded routing probe is exhausted. With
max_recursion_depth=2, a routedhidden_payload.jpgcontainingos.system("id")below the traversal cap previously emitted an informational recursion-depth check but returnedsuccess=Trueand aggregate exit code0. Content beyond the configured cap was unexamined, so that clean result was a false negative.The scanner now records
flax_msgpack_recursion_limit_exceededand returns aggregate exit code2when traversal cannot cover the payload. Both a benign deep checkpoint and a malicious payload hidden beyond the cap are explicitly inconclusive and remain absent from the clean-result cache. A suspicious pattern found within the permitted depth remains a security finding with exit code1.Verification
env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/env npm_config_cache=/private/tmp/modelaudit-npm-cache npx prettier --check CHANGELOG.md README.md docs/user/compatibility-matrix.mdenv PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest tests/scanners/test_flax_msgpack_scanner.py tests/test_core.py -q(156 passed)env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest tests/test_performance_benchmarks.py::TestPerformanceBenchmarks::test_concurrent_performance -q(1 skippedunder the local-overhead guard after an initial loaded run timed out)env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest -n auto -m "not slow and not integration" --maxfail=1(5707 passed, 16 skipped)