fix: fail closed on incomplete Flax traversal by mldangelo-oai · Pull Request #1295 · promptfoo/modelaudit

mldangelo-oai · 2026-05-24T22:25:30Z

Summary

merge the refreshed renamed-Flax-routing base so this stacked PR is reviewable against current behavior
mark Flax MessagePack recursion-depth exhaustion as inconclusive coverage instead of reporting a clean scan
prove benign and hidden-payload recursion-limit results return exit code 2 and are not cached as clean results

Finding

This PR is stacked on #1280, which routes structurally valid renamed Flax MessagePack files by content and now fails closed when its bounded routing probe is exhausted. With max_recursion_depth=2, a routed hidden_payload.jpg containing os.system("id") nested deeper than the traversal cap previously emitted an informational recursion-depth check but still returned success=True and aggregate exit code 0. Content beyond the configured cap was never examined, so that clean result was a false negative.
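A minimal sketch of why the old result was a false negative and what the fix records. It performs a depth-bounded walk over an already-decoded MessagePack tree. Instead of silently returning when it hits the cap, the walk flags that coverage was incomplete. The names `walk`, `TraversalResult`, and `SUSPICIOUS_PATTERNS` are hypothetical and do not reflect modelaudit's internal API. The depth convention (containers at depth >= max_depth are not entered) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical byte patterns for illustration only; not modelaudit's real rule set.
SUSPICIOUS_PATTERNS = (b"os.system", b"subprocess", b"eval(")


@dataclass
class TraversalResult:
    findings: list[str] = field(default_factory=list)
    limit_exceeded: bool = False  # True once any subtree is left unexamined


def walk(
    node: Any,
    max_depth: int,
    depth: int = 0,
    path: str = "$",
    result: Optional[TraversalResult] = None,
) -> TraversalResult:
    """Scan string/bytes leaves of a decoded MessagePack tree up to max_depth."""
    if result is None:
        result = TraversalResult()
    if isinstance(node, (dict, list)):
        if depth >= max_depth:
            # Do not descend further, but remember that coverage is incomplete
            # instead of letting the skipped subtree count as clean.
            result.limit_exceeded = True
            return result
        children = node.items() if isinstance(node, dict) else enumerate(node)
        for key, child in children:
            walk(child, max_depth, depth + 1, f"{path}/{key}", result)
    elif isinstance(node, (str, bytes)):
        data = node.encode("utf-8", "replace") if isinstance(node, str) else node
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern in data:
                result.findings.append(f"{path}: {pattern.decode()}")
    return result
```

Under these assumptions, `walk({"params": {"layer": {"blob": {"code": 'os.system("id")'}}}}, max_depth=2)` returns no findings but sets `limit_exceeded=True`. Reporting that as clean is the false negative described above.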

The scanner now records flax_msgpack_recursion_limit_exceeded and returns aggregate exit code 2 when traversal cannot cover the payload. Both a benign deep checkpoint and a malicious payload hidden beyond the cap are explicitly inconclusive and remain absent from the clean-result cache. A suspicious pattern found within the permitted depth remains a security finding with exit code 1.
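The outcome rules above can be expressed as a hedged sketch. Findings observed within the cap keep exit code 1. An incomplete but finding-free walk becomes inconclusive with exit code 2. Only fully covered clean scans are eligible for the clean-result cache. `Outcome`, `outcome_for`, `aggregate_exit_code`, and `cacheable_as_clean` are hypothetical names. When several files are aggregated, this sketch assumes that exit code 1 takes precedence over 2; the PR does not state that precedence.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    CLEAN = "clean"
    FINDINGS = "findings"
    INCONCLUSIVE = "inconclusive"


def outcome_for(findings: list[str], limit_exceeded: bool) -> Outcome:
    """Observed suspicious patterns win; otherwise an incomplete walk is inconclusive."""
    if findings:
        return Outcome.FINDINGS
    if limit_exceeded:
        return Outcome.INCONCLUSIVE
    return Outcome.CLEAN


def aggregate_exit_code(outcomes: Iterable[Outcome]) -> int:
    """0 = clean, 1 = security findings, 2 = inconclusive (assumed precedence 1 > 2 > 0)."""
    seen = set(outcomes)
    if Outcome.FINDINGS in seen:
        return 1
    if Outcome.INCONCLUSIVE in seen:
        return 2
    return 0


def cacheable_as_clean(outcome: Outcome) -> bool:
    # Only a fully covered, finding-free scan may be reused as a clean result.
    return outcome is Outcome.CLEAN
```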

Verification

env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
env VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --active --no-sync mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
env npm_config_cache=/private/tmp/modelaudit-npm-cache npx prettier --check CHANGELOG.md README.md docs/user/compatibility-matrix.md
env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest tests/scanners/test_flax_msgpack_scanner.py tests/test_core.py -q (156 passed)
env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest tests/test_performance_benchmarks.py::TestPerformanceBenchmarks::test_concurrent_performance -q (1 skipped under the local-overhead guard after an initial loaded run timed out)
env PYTHONPATH=/private/tmp/modelaudit-pr1295-audit VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --active --no-sync pytest -n auto -m "not slow and not integration" --maxfail=1 (5707 passed, 16 skipped)

Recognize bounded, structurally valid Flax/JAX MessagePack payloads when they use misleading suffixes, while leaving generic MessagePack state maps skipped. Add routing, filtering, fail-closed, and malicious payload regression coverage plus user-facing documentation.
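The following hypothetical first-stage check illustrates suffix-independent routing. It assumes the checkpoint's top-level object is a state dict serialized as a MessagePack map, as `flax.serialization.to_bytes` produces for a dict of parameters. Under that assumption, the first byte is a map marker: fixmap 0x80 to 0x8f, map 16 0xde, or map 32 0xdf. The check reads only a small bounded prefix. It does not attempt the structural validation or the Flax-versus-generic-state-map distinction described above. `PROBE_BYTES`, `has_msgpack_map_header`, and `route_by_content` are illustrative names, not modelaudit functions.

```python
from pathlib import Path

PROBE_BYTES = 64  # hypothetical probe budget; only this many bytes are read


def has_msgpack_map_header(data: bytes) -> bool:
    """True if the first byte is a MessagePack map marker (fixmap, map 16, map 32)."""
    if not data:
        return False
    first = data[0]
    return 0x80 <= first <= 0x8F or first in (0xDE, 0xDF)


def route_by_content(path: Path) -> str:
    # Decide by content, not by suffix: a renamed ".jpg" that starts with a
    # map marker becomes a MessagePack candidate for deeper validation.
    with path.open("rb") as handle:
        head = handle.read(PROBE_BYTES)
    return "msgpack_candidate" if has_msgpack_map_header(head) else "by_extension"
```

A real JPEG starts with 0xFF 0xD8, and 0xFF is a MessagePack negative fixint rather than a map marker, so a genuine image is not routed by this check.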

Treat MessagePack recursion-limit exhaustion as inconclusive coverage, preserve observed suspicious patterns, and add renamed hidden-payload plus benign regression coverage.

github-actions · 2026-05-25T00:18:47Z


Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 726.71ms -> 724.26ms (-0.3%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 102.17ms | 96.28ms | -5.8% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 461.2us | 477.8us | +3.6% | stable |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 36.82ms | 37.68ms | +2.3% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 474.7us | 484.2us | +2.0% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 1.66ms | 1.64ms | -1.2% | stable |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 15.25ms | 15.09ms | -1.1% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.3 KiB | 32 | 288.50ms | 291.22ms | +0.9% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 486.8us | 490.1us | +0.7% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 18.51ms | 18.55ms | +0.2% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 1.58ms | 1.59ms | +0.2% | stable |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.3 KiB | 32 | 61.40ms | 61.31ms | -0.1% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 199.39ms | 199.45ms | +0.0% | stable |
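The Change and Status columns follow a simple relative-change rule that can be reproduced with the sketch below. It assumes the 15% threshold applies symmetrically, so a drop of more than 15% counts as improved. The report does not state this. `pct_change` and `classify_change` are hypothetical helpers, not the workflow's actual code.

```python
def pct_change(baseline: float, current: float) -> float:
    """Relative change in percent; negative means the benchmark got faster."""
    return (current - baseline) / baseline * 100.0


def classify_change(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    change = pct_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:  # assumed symmetric improvement threshold
        return "improved"
    return "stable"
```

For example, the aggregate median moving from 726.71ms to 724.26ms gives about -0.34%, reported as -0.3%. The largest single move, -5.8% for `suspicious-pickle-intake`, stays well inside the threshold.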

Copilot

Pull request overview

This PR updates the Flax/JAX MessagePack scanner to fail closed when traversal cannot fully cover the payload due to recursion-depth exhaustion, classifying those cases as inconclusive (exit code 2) and ensuring they are not cached as clean results.

Changes:

Mark recursion-depth exhaustion during Flax MessagePack traversal as an inconclusive scan outcome rather than a clean success.
Update scanner success semantics so success=False when scan_outcome == inconclusive, even without CRITICAL findings.
Add regression tests verifying exit code 2 for benign/hidden-payload recursion-limit cases and confirming inconclusive results are not cached; update changelog.
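The success rule described above can be sketched as follows. A result counts as successful only if it has no CRITICAL issue and its outcome is not inconclusive, so an informational recursion-depth issue alone no longer yields success=True. `Issue`, `compute_success`, the severity strings, and the `"complete"` outcome value are hypothetical stand-ins for modelaudit's result types.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str  # hypothetical labels, e.g. "INFO", "WARNING", "CRITICAL"
    message: str


def compute_success(issues: list[Issue], scan_outcome: str) -> bool:
    # Before the change, only CRITICAL issues could flip success to False.
    has_critical = any(issue.severity == "CRITICAL" for issue in issues)
    # After the change, incomplete coverage also blocks a successful result.
    return not has_critical and scan_outcome != "inconclusive"
```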

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`modelaudit/scanners/flax_msgpack_scanner.py`	Marks recursion-depth exhaustion as inconclusive and ensures final `success` is false for inconclusive outcomes.
`tests/scanners/test_flax_msgpack_scanner.py`	Adds tests for inconclusive recursion-limit behavior, aggregate exit code `2`, and “not cached as clean” guarantees.
`CHANGELOG.md`	Documents the recursion-limit inconclusive-coverage behavior change.


mldangelo-oai · 2026-05-25T04:32:18Z

@codex review

chatgpt-codex-connector · 2026-05-25T04:36:49Z

Codex Review: Didn't find any major issues. You're on a roll.


…-pr1295

mldangelo-oai added 5 commits May 24, 2026 16:15

fix: route large renamed Flax checkpoints

ee16c67

fix: fail closed on ambiguous renamed Flax checkpoints

61e74e7

fix: fail closed on incomplete Flax traversal

3cddea8


fix: sync and harden incomplete Flax traversal

56f5494

mldangelo-oai marked this pull request as ready for review May 25, 2026 00:18

mldangelo-oai requested a review from Copilot May 25, 2026 04:26

Copilot started reviewing on behalf of mldangelo-oai May 25, 2026 04:26

Copilot AI reviewed May 25, 2026


Comment thread modelaudit/scanners/flax_msgpack_scanner.py

ianw-oai approved these changes May 25, 2026


Base automatically changed from mdangelo/codex/fix-renamed-flax-msgpack-routing to main May 26, 2026 23:51

mldangelo-oai added 3 commits May 27, 2026 11:59

fix: harden incomplete Flax traversal after main sync

31190e5

Merge remote-tracking branch 'origin/main' into mdangelo/codex/review…

0c527ac

…-pr1295

test: cover mixed incomplete Flax findings

227c466

mldangelo-oai merged commit 335d06c into main May 28, 2026
29 checks passed

mldangelo-oai deleted the mdangelo/codex/fix-flax-recursion-limit-outcome branch May 28, 2026 11:14

mldangelo-oai merged 8 commits into main from mdangelo/codex/fix-flax-recursion-limit-outcome


3 participants
