fix: fail closed on incomplete Flax traversal #1295

Merged
mldangelo-oai merged 8 commits into main from mdangelo/codex/fix-flax-recursion-limit-outcome
May 28, 2026

@mldangelo-oai
Contributor

@mldangelo-oai mldangelo-oai commented May 24, 2026

Summary

  • merge the refreshed renamed-Flax-routing base so this stacked PR is reviewable against current behavior
  • mark Flax MessagePack recursion-depth exhaustion as inconclusive coverage instead of reporting a clean scan
  • prove benign and hidden-payload recursion-limit results return exit code 2 and are not cached as clean results

Finding

This PR is stacked on #1280, which routes structurally valid renamed Flax MessagePack files by content and now fails closed when its bounded routing probe is exhausted. With max_recursion_depth=2, a routed hidden_payload.jpg containing os.system("id") nested deeper than the traversal cap previously emitted an informational recursion-depth check but still returned success=True and aggregate exit code 0. Content beyond the configured cap was never examined, so that clean result was a false negative.

The scanner now records flax_msgpack_recursion_limit_exceeded and returns aggregate exit code 2 when traversal cannot cover the payload. Both a benign deep checkpoint and a malicious payload hidden beyond the cap are explicitly inconclusive and remain absent from the clean-result cache. A suspicious pattern found within the permitted depth remains a security finding with exit code 1.
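
The outcome mapping described above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: `ScanReport`, `walk`, `scan_tree`, and the substring patterns are hypothetical, and plain Python containers stand in for decoded MessagePack data. Only the check name `flax_msgpack_recursion_limit_exceeded` and the exit codes 0, 1, and 2 come from this PR.

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative patterns only; the real scanner's rules are not shown in this PR.
SUSPICIOUS_PATTERNS = ("os.system", "subprocess", "eval(")
LIMIT_CHECK = "flax_msgpack_recursion_limit_exceeded"  # check name from the PR


@dataclass
class ScanReport:  # hypothetical result container
    findings: list[str] = field(default_factory=list)
    checks: list[str] = field(default_factory=list)

    @property
    def exit_code(self) -> int:
        if self.findings:
            return 1  # a pattern seen within the permitted depth stays a finding
        if LIMIT_CHECK in self.checks:
            return 2  # coverage incomplete: inconclusive, never reported as clean
        return 0


def walk(node: Any, depth: int, max_depth: int, report: ScanReport) -> None:
    if isinstance(node, (dict, list)):
        if depth >= max_depth:
            # Do not descend; record that content below this point was unexamined.
            if LIMIT_CHECK not in report.checks:
                report.checks.append(LIMIT_CHECK)
            return
        children = [*node.keys(), *node.values()] if isinstance(node, dict) else node
        for child in children:
            walk(child, depth + 1, max_depth, report)
    elif isinstance(node, str):
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern in node:
                report.findings.append(pattern)


def scan_tree(tree: Any, max_recursion_depth: int) -> ScanReport:
    report = ScanReport()
    walk(tree, 0, max_recursion_depth, report)
    return report


hidden = {"params": {"layer": {"cmd": 'os.system("id")'}}}
print(scan_tree(hidden, 2).exit_code)  # 2
```

In this sketch the payload sits under a container that the cap prevents the walker from entering, so the result is inconclusive rather than clean. The same string placed at the top level would be seen and would yield exit code 1.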

Verification

  • env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • env npm_config_cache=/private/tmp/modelaudit-npm-cache npx prettier --check CHANGELOG.md README.md docs/user/compatibility-matrix.md
  • env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest tests/scanners/test_flax_msgpack_scanner.py tests/test_core.py -q (156 passed)
  • env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest tests/test_performance_benchmarks.py::TestPerformanceBenchmarks::test_concurrent_performance -q (1 skipped under the local-overhead guard after an initial loaded run timed out)
  • env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest -n auto -m "not slow and not integration" --maxfail=1 (5707 passed, 16 skipped)

Recognize bounded, structurally valid Flax/JAX MessagePack payloads when they use misleading suffixes, while leaving generic MessagePack state maps skipped. Add routing, filtering, fail-closed, and malicious payload regression coverage plus user-facing documentation.
Treat MessagePack recursion-limit exhaustion as inconclusive coverage, preserve observed suspicious patterns, and add renamed hidden-payload plus benign regression coverage.
@mldangelo-oai mldangelo-oai marked this pull request as ready for review May 25, 2026 00:18
@github-actions
Contributor

github-actions Bot commented May 25, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 726.71ms -> 724.26ms (-0.3%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 102.17ms | 96.28ms | -5.8% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 461.2us | 477.8us | +3.6% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 36.82ms | 37.68ms | +2.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 474.7us | 484.2us | +2.0% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 1.66ms | 1.64ms | -1.2% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 15.25ms | 15.09ms | -1.1% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 288.50ms | 291.22ms | +0.9% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 486.8us | 490.1us | +0.7% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 18.51ms | 18.55ms | +0.2% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 1.58ms | 1.59ms | +0.2% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 61.40ms | 61.31ms | -0.1% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 199.39ms | 199.45ms | +0.0% | stable |
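
For readers checking the Change and Status columns by hand, the arithmetic can be sketched as below. This assumes a symmetric rule (slower by more than the threshold is a regression, faster by more than the threshold is improved); the benchmark workflow's actual classification logic is not shown in this PR, and `percent_change` and `classify` are hypothetical names.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of current versus baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark against a regression threshold (assumed symmetric)."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# Aggregate median from the report: 726.71ms -> 724.26ms
print(round(percent_change(726.71, 724.26), 1))  # -0.3
print(classify(102.17, 96.28))  # stable
```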

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the Flax/JAX MessagePack scanner to fail closed when traversal cannot fully cover the payload due to recursion-depth exhaustion, classifying those cases as inconclusive (exit code 2) and ensuring they are not cached as clean results.

Changes:

  • Mark recursion-depth exhaustion during Flax MessagePack traversal as an inconclusive scan outcome rather than a clean success.
  • Update scanner success semantics so success=False when scan_outcome == inconclusive, even without CRITICAL findings.
  • Add regression tests verifying exit code 2 for benign/hidden-payload recursion-limit cases and confirming inconclusive results are not cached; update changelog.
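
The success and caching rules in the list above might look like the following sketch. It is illustrative only: `ScanOutcome`, `Result`, and `CleanResultCache` are hypothetical names, and treating a CRITICAL finding as a failed scan is inferred from the wording "even without CRITICAL findings". Only `success`, `scan_outcome`, and the `inconclusive` value are taken from the PR text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScanOutcome(str, Enum):  # hypothetical enum; "inconclusive" is from the PR
    COMPLETE = "complete"
    INCONCLUSIVE = "inconclusive"


@dataclass
class Result:  # hypothetical stand-in for the scanner's result object
    scan_outcome: ScanOutcome
    has_critical: bool = False

    @property
    def success(self) -> bool:
        # Inconclusive coverage fails the scan even with no CRITICAL findings.
        if self.scan_outcome is ScanOutcome.INCONCLUSIVE:
            return False
        return not self.has_critical


class CleanResultCache:  # hypothetical cache keyed by file hash
    def __init__(self) -> None:
        self._clean: dict[str, Result] = {}

    def store(self, file_hash: str, result: Result) -> bool:
        # Only fully covered, successful scans may be remembered as clean.
        if not result.success:
            return False
        self._clean[file_hash] = result
        return True

    def get(self, file_hash: str) -> Optional[Result]:
        return self._clean.get(file_hash)
```

Gating at `store` means a later rescan of the same file cannot be answered from the cache with a stale clean verdict; the file is traversed again each time until coverage is complete.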

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| modelaudit/scanners/flax_msgpack_scanner.py | Marks recursion-depth exhaustion as inconclusive and ensures final success is false for inconclusive outcomes. |
| tests/scanners/test_flax_msgpack_scanner.py | Adds tests for inconclusive recursion-limit behavior, aggregate exit code 2, and “not cached as clean” guarantees. |
| CHANGELOG.md | Documents the recursion-limit inconclusive-coverage behavior change. |

Comment thread modelaudit/scanners/flax_msgpack_scanner.py
Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

Base automatically changed from mdangelo/codex/fix-renamed-flax-msgpack-routing to main May 26, 2026 23:51
@mldangelo-oai mldangelo-oai merged commit 335d06c into main May 28, 2026
29 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/fix-flax-recursion-limit-outcome branch May 28, 2026 11:14
3 participants