fix: harden pickle nested bypass detection by mldangelo-oai · Pull Request #1027 · promptfoo/modelaudit

mldangelo-oai · 2026-04-15T18:19:30Z

Summary

Expand nested pickle prefix probing to detect valid binary opcode streams that lack a PROTO header, whether they appear as raw bytes or inside base64 or hex literals.
Fail closed with an inconclusive critical finding when raw nested pickle probe candidates exceed the bounded probe budget.
Flag loader-termination, crash, resource-limit, and low-level process primitives in both Rust and Python policy tables.
Add regression coverage for the reported bypass payload classes and update the changelog.
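The prefix probing and fail-closed budget described above can be sketched in Python with the standard `pickletools` module. This is a minimal illustration under stated assumptions, not modelaudit's implementation: the helper names (`pickle_length`, `probe_raw`, `decode_literal`) and the set of start opcodes are hypothetical, and only the 64-offset cap is taken from the review below. A candidate counts as a pickle when `pickletools.dis` can walk it to `STOP` with a consistent stack, so a stream that starts with `SHORT_BINUNICODE` qualifies without any `PROTO` header. Exceeding the budget returns an inconclusive flag instead of silently stopping.

```python
import base64
import binascii
import io
import pickletools
from typing import List, Optional, Tuple

# Assumed values for illustration; modelaudit's real budget handling and
# candidate set live in its Rust and Python engines and may differ.
MAX_PROBE_OFFSETS = 64
START_BYTES = {
    0x80,  # PROTO
    0x8C,  # SHORT_BINUNICODE (no PROTO header needed)
    0x58,  # BINUNICODE
    0x63,  # GLOBAL
    0x28,  # MARK
    0x7D,  # EMPTY_DICT
    0x5D,  # EMPTY_LIST
    0x29,  # EMPTY_TUPLE
}


def pickle_length(data: bytes) -> Optional[int]:
    """Length of a structurally valid pickle at the start of `data`, or None.

    pickletools.dis checks opcode arguments and stack effects through STOP,
    so a lone STOP byte or a truncated stream is rejected.
    """
    try:
        pickletools.dis(data, out=io.StringIO())
        positions = [pos for _op, _arg, pos in pickletools.genops(data)]
    except Exception:  # any disassembly failure means "no pickle here"
        return None
    return positions[-1] + 1  # genops stops after STOP, the last opcode


def probe_raw(value: bytes) -> Tuple[List[Tuple[int, int]], bool]:
    """Return ((offset, length) hits, inconclusive) for a raw byte string."""
    candidates = [i for i, b in enumerate(value) if b in START_BYTES]
    hits: List[Tuple[int, int]] = []
    for offset in candidates[:MAX_PROBE_OFFSETS]:
        length = pickle_length(value[offset:])
        if length is not None:
            hits.append((offset, length))
    # Fail closed: candidates beyond the budget were never probed.
    return hits, len(candidates) > MAX_PROBE_OFFSETS


def decode_literal(text: str) -> List[bytes]:
    """Candidate byte strings from hex and strict base64 decodings."""
    decoded: List[bytes] = []
    try:
        decoded.append(bytes.fromhex(text))
    except ValueError:
        pass
    try:
        decoded.append(base64.b64decode(text, validate=True))
    except (binascii.Error, ValueError):
        pass
    return decoded
```

For example, `probe_raw(b"junk\x8c\x01x.tail")` finds the four-byte `SHORT_BINUNICODE ... STOP` stream at offset 4, while a long run of `MARK` bytes exhausts the budget and comes back inconclusive. Decoded literals would then be probed the same way as raw bytes.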

Security fixes

Finding 1: detects nested pickle payloads that start with opcodes such as SHORT_BINUNICODE instead of a PROTO header.
Finding 2: treats exhausted nested-probe budgets as unsafe/incomplete instead of silently stopping before later payloads.
Finding 3: flags builtins.exit and builtins.quit as dangerous callables.
Finding 4: flags faulthandler crash helpers, resource.setrlimit, and _posixsubprocess.fork_exec.
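A policy table covering Findings 3 and 4 might look like the following Python sketch. The `DANGEROUS_CALLABLES` mapping and the `dangerous_globals` helper are hypothetical; the PR only says "faulthandler crash helpers", so the private `faulthandler` names listed here are assumptions. The sketch reads `GLOBAL`/`INST` arguments directly and approximates `STACK_GLOBAL` by remembering the two most recent string pushes.

```python
import pickletools
from typing import Dict, FrozenSet, List, Tuple

# Illustrative policy table. The builtins, resource and _posixsubprocess
# entries come from the PR text; the faulthandler helper names are
# assumptions standing in for "faulthandler crash helpers".
DANGEROUS_CALLABLES: Dict[str, FrozenSet[str]] = {
    "builtins": frozenset({"exit", "quit"}),
    "faulthandler": frozenset(
        {"_sigsegv", "_sigabrt", "_sigfpe", "_read_null", "_fatal_error", "_stack_overflow"}
    ),
    "resource": frozenset({"setrlimit"}),
    "_posixsubprocess": frozenset({"fork_exec"}),
}

# Opcodes whose argument is a text string that STACK_GLOBAL may consume.
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
                  "STRING", "BINSTRING", "SHORT_BINSTRING"}


def dangerous_globals(data: bytes) -> List[Tuple[str, str]]:
    """Return (module, name) pairs from the policy table that `data` imports."""
    hits: List[Tuple[str, str]] = []
    strings: List[str] = []  # recent string pushes (memo lookups not tracked)
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            strings.append(arg)
            continue
        if opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)  # genops joins the two lines
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            module, name = strings[-2], strings[-1]
        else:
            continue
        if name in DANGEROUS_CALLABLES.get(module, frozenset()):
            hits.append((module, name))
    return hits
```

Strings fetched back from the memo (`BINGET` and similar) are not tracked, so a production scanner needs fuller stack emulation than this sketch provides.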

Validation

uv run --with maturin maturin develop --manifest-path packages/modelaudit-picklescan/Cargo.toml
cargo test --manifest-path packages/modelaudit-picklescan/Cargo.toml
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_rust_engine.py packages/modelaudit-picklescan/tests/test_adversarial_pickle_oracle.py tests/scanners/test_pickle_scanner.py tests/scanners/test_picklescan_adapter.py -q
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
git diff --check

github-actions · 2026-04-15T18:20:50Z


Performance Benchmarks

Compared 18 shared benchmarks with a regression threshold of 15%.
Status: 1 regression, 4 improved, 13 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 550.24ms -> 541.80ms (-1.5%).
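The status labels follow from the relative change in median time. A Python sketch, assuming a benchmark counts as a regression or improvement only when the absolute change strictly exceeds the 15% threshold (the workflow's exact comparison is not shown); `classify` and `summarize` are hypothetical helpers:

```python
from typing import Dict, List, Tuple

# Assumption: strict ">" against the threshold; the workflow may differ.
THRESHOLD_PCT = 15.0


def percent_change(baseline: float, current: float) -> float:
    """Relative change of `current` versus `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a benchmark by how far its median moved (slower is a regression)."""
    change = percent_change(baseline, current)
    if change > threshold:
        return "regression"
    if change < -threshold:
        return "improved"
    return "stable"


def summarize(rows: List[Tuple[float, float]]) -> Dict[str, int]:
    """Count labels over (baseline, current) pairs in the same unit."""
    counts = {"regression": 0, "improved": 0, "stable": 0}
    for baseline, current in rows:
        counts[classify(baseline, current)] += 1
    return counts
```

For example, `classify(99.5, 115.4)` gives `"regression"` (about +16.0%), `classify(77.9, 65.3)` gives `"improved"` (about -16.2%), and `percent_change(550.24, 541.80)` is about -1.5%, matching the aggregate line. Recomputing from the rounded table values can differ from the reported change in the last digit (for example, -12.5% instead of -12.6% for `malicious_reduce`).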

Top regressions:

tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle +16.0% (99.5us -> 115.4us, safe_model.pkl, size=49.4 KiB, files=1)

Top improvements:

tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[long_benign_string] -30.7% (898.5us -> 623.1us, long_benign_string, size=1.0 MiB, files=1)
tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_hex] -24.8% (84.9us -> 63.8us, nested_hex, size=130 B, files=1)
tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_large] -20.8% (3.67ms -> 2.91ms, safe_large, size=278.2 KiB, files=1)

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
|---|---|---|---|---|---|---|---|
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[long_benign_string]` | `long_benign_string` | 1.0 MiB | 1 | 898.5us | 623.1us | -30.7% | improved |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_hex]` | `nested_hex` | 130 B | 1 | 84.9us | 63.8us | -24.8% | improved |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_large]` | `safe_large` | 278.2 KiB | 1 | 3.67ms | 2.91ms | -20.8% | improved |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_base64]` | `nested_base64` | 98 B | 1 | 77.9us | 65.3us | -16.2% | improved |
| `tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle` | `safe_model.pkl` | 49.4 KiB | 1 | 99.5us | 115.4us | +16.0% | regression |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[malicious_reduce]` | `malicious_reduce` | 52 B | 1 | 52.6us | 46.0us | -12.6% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_stream` | `chunked_stream` | 278.2 KiB | 1 | 5.86ms | 5.21ms | -11.1% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[stack_global]` | `stack_global` | 21 B | 1 | 43.3us | 38.6us | -10.8% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_hidden_suspicious_string_budget` | `hidden_suspicious_string` | 8.0 KiB | 1 | 412.1us | 450.1us | +9.2% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip` | `state_dict.pt` | 1.5 MiB | 1 | 32.8us | 34.9us | +6.5% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_raw]` | `nested_raw` | 78 B | 1 | 63.6us | 61.6us | -3.2% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_opcode_budget_tail_payload` | `opcode_budget_tail` | 14 B | 1 | 44.5us | 43.1us | -3.2% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle` | `safe_model.pkl` | 49.4 KiB | 1 | 22.71ms | 22.13ms | -2.6% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_multi_stream_padded_payload` | `multi_stream_padded` | 4.1 KiB | 1 | 83.1us | 84.9us | +2.1% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory` | `duplicate-corpus` | 840.0 KiB | 81 | 382.03ms | 376.38ms | -1.5% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory` | `mixed-corpus` | 1.7 MiB | 54 | 106.83ms | 106.22ms | -0.6% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_small]` | `safe_small` | 68 B | 1 | 37.4us | 37.2us | -0.4% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip` | `state_dict.pt` | 1.5 MiB | 1 | 27.21ms | 27.29ms | +0.3% | stable |

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 8789a88bb7


chatgpt-codex-connector · 2026-04-15T23:40:55Z

+        if limit_exceeded {
+            self.record_nested_probe_limit_exceeded("raw", value.len(), position);
+        }


Only raise probe-limit when skipped offsets remain

limit_exceeded is computed before skip_offsets_before filtering, then always emitted here. If offset 0 is a valid full nested pickle, many inner bytes can still count as prefix candidates (e.g., repeated 0x80 0x04 inside string data), trip the 64-offset cap, and force an inconclusive critical result even though all relevant bytes were already covered by the parsed payload.
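The reviewer's suggestion amounts to counting only candidates that an already-parsed payload does not cover. A Python sketch under assumptions: `probe_raw`, `pickle_length`, and the explicit candidate list are hypothetical stand-ins for the Rust code in the hunk above; only the 64-offset cap and the `skip_offsets_before` idea come from the review.

```python
import io
import pickletools
from typing import List, Optional, Tuple

PROBE_BUDGET = 64  # the 64-offset cap cited in the review


def pickle_length(data: bytes) -> Optional[int]:
    """Length of a structurally valid pickle at the start of `data`, or None."""
    try:
        pickletools.dis(data, out=io.StringIO())
        return [pos for _op, _arg, pos in pickletools.genops(data)][-1] + 1
    except Exception:  # any disassembly failure means "no pickle here"
        return None


def probe_raw(value: bytes, candidates: List[int],
              budget: int = PROBE_BUDGET) -> Tuple[List[Tuple[int, int]], bool]:
    """Probe candidate offsets; flag the limit only if unprobed offsets remain."""
    hits: List[Tuple[int, int]] = []
    skip_offsets_before = 0  # end of the last payload that parsed cleanly
    probed = 0
    for offset in sorted(candidates):
        if offset < skip_offsets_before:
            continue  # covered by a parsed payload, so not a skipped offset
        if probed == budget:
            return hits, True  # a genuinely unprobed candidate remains: fail closed
        probed += 1
        length = pickle_length(value[offset:])
        if length is not None:
            hits.append((offset, length))
            skip_offsets_before = offset + length
    return hits, False
```

With a valid pickle at offset 0 whose `SHORT_BINBYTES` data repeats `0x80 0x04` a hundred times, there are 101 `0x80` candidates, but the probe reports one hit and no limit flag. A 200-byte run of `0x80 0x04` with no `STOP` still exhausts the budget and fails closed.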


61b0b3a fix: harden pickle nested bypass detection

mldangelo-oai added 3 commits April 15, 2026 12:49
2991bda perf: gate nested pickle probe fallbacks
71d31f0 perf: avoid redundant encoded pickle probes
8789a88 fix: reduce nested pickle probe false positives

mldangelo-oai marked this pull request as ready for review April 15, 2026 23:31


mldangelo-oai merged commit c3a3b9d into main Apr 16, 2026
31 checks passed

mldangelo-oai deleted the mdangelo/codex/fix-pickle-bypass-policy-20260415 branch April 16, 2026 06:34

This was referenced Apr 16, 2026:
chore: release main #1007 (Merged)
chore: release main #1050 (Merged)

mldangelo-oai merged 4 commits into main from mdangelo/codex/fix-pickle-bypass-policy-20260415