feat: replace picklescan with Rust-native engine by mldangelo-oai · Pull Request #990 · promptfoo/modelaudit

mldangelo-oai · 2026-04-12T23:34:42Z

Summary

Replaces the old in-repo Python pickle opcode scanner with the Rust-native modelaudit-picklescan package and wires ModelAudit to use that Rust scanner as the only pickle engine. The root PickleScanner is now a thin integration layer for file/container handling, bounded ModelAudit compatibility checks, and ScanResult adaptation.

This PR also removes the Python picklescan engine-selection path, deletes the old Python opcode-support module, adds Rust CI/build/smoke coverage, and keeps the public Python API/report models for users of the standalone package.

What Changed

Rust-native picklescan package: adds the PyO3/maturin Rust extension exposed through modelaudit_picklescan._rust.
Rust-only Python boundary: removes the old Python engine modules and MODELAUDIT_PICKLESCAN_ENGINE=python selection path; the Python package now calls Rust directly.
ModelAudit integration: replaces the root pickle scanner implementation with a Rust-backed wrapper that preserves file checks, archive-member context, parse-incomplete/fail-closed metadata, legacy rule-code aliases, and bounded root-level compatibility checks.
Security compatibility: preserves legacy signals for raw eval/exec, __import__, dangerous protocol-0 globals, copyreg extension REDUCE, non-allowlisted __main__ references, GLOBAL __dict__ import-only references, encoded execution strings, and malformed/truncated pickles.
False-positive fixes: avoids treating comment-only importlib# ... text as critical and tightens the CVE SETITEM compatibility check so plain system bytes do not look like a SETITEM opcode.
Hot-path performance: reduces large-literal and nested-payload allocation/copying in Rust, adds suspicious-string fast rejection, avoids expensive encoded-nested probes unless a plausible encoded pickle prefix exists, and skips duplicated Python compatibility work where Rust has already provided the primary signal.
CI and packaging: adds Rust toolchain checks, cargo fmt/check/clippy/test, standalone wheel build/smoke coverage, root dependency wiring, and Docker/release updates.
Repository cleanup: removes temporary comparison harnesses now that the replacement evidence has been captured here, keeping the repo focused on durable scanner code and regression tests.
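The SETITEM false positive described above comes from the fact that the SETITEM opcode is the single byte `s`, which also appears inside ordinary string data such as `system`. As a minimal sketch of the idea (not the PR's actual Rust or Python check), the example below uses the standard-library `pickletools.genops` to walk real opcodes instead of searching raw bytes. The function names `has_setitem_opcode` and `naive_has_setitem` are hypothetical and exist only for illustration.

```python
import pickle
import pickletools


def naive_has_setitem(data: bytes) -> bool:
    """Byte search: flags any 's' byte, including bytes inside string literals."""
    return b"s" in data


def has_setitem_opcode(data: bytes) -> bool:
    """Opcode-aware check: True only if a real SETITEM/SETITEMS opcode exists.

    pickletools.genops raises ValueError on malformed input; this sketch lets
    that propagate so the caller can decide how to fail closed.
    """
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in ("SETITEM", "SETITEMS"):
            return True
    return False


# A plain string containing the word "system" has 's' bytes but no SETITEM opcode.
plain_string = pickle.dumps("system", protocol=4)
# A dict pickle stores its items with SETITEM or SETITEMS.
small_dict = pickle.dumps({"a": 1}, protocol=4)
```

The byte search reports a hit for both payloads, while the opcode walk reports a hit only for the dict, which is the distinction the tightened compatibility check relies on.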

Equivalence Evidence

Before removing temporary comparison harnesses from the branch, I ran a historical equivalence suite against origin/main across standalone package bytes, standalone package stream, root stream, and root file surfaces.

Final full-corpus run:

Corpus	Cases	Comparisons	Match	Strengthened	Operational improvement
Full generated + fixtures + malformed prefixes	852	3384	2477	905	2
Generated deterministic corpus	339	1332	1126	204	2
Committed fixture corpus	46	182	160	20	2

The two operational improvements are the negative stream-size cases: old main treated size=-1 as empty input or failed closed, while the Rust-backed path now treats negative size as unknown size and scans the actual stream. That directly addresses the review finding around negative stream sizes.
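The negative-size behavior can be sketched in a few lines of Python. This is an illustrative assumption about the semantics described above, not the actual wrapper code: `normalize_size` and `read_for_scan` are hypothetical helper names, and the real scanner may stream in chunks rather than read everything at once.

```python
import io
from typing import BinaryIO, Optional


def normalize_size(size: Optional[int]) -> Optional[int]:
    """Map None or any negative size to 'unknown' (represented as None)."""
    if size is None or size < 0:
        return None
    return size


def read_for_scan(stream: BinaryIO, size: Optional[int] = None) -> bytes:
    """Read the bytes that should be scanned.

    An unknown size means "read the actual stream to EOF" instead of
    treating the input as empty.
    """
    known = normalize_size(size)
    if known is None:
        return stream.read()
    return stream.read(known)


# size=-1 yields the full stream contents rather than empty input.
full = read_for_scan(io.BytesIO(b"\x80\x04N."), size=-1)
```

Only an explicit non-negative size limits the read; `size=0` still means an empty read, so the two cases stay distinguishable.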

The strengthened results are additional detections or stricter fail-closed behavior, including base64/encoded execution strings, malformed missing-STOP inputs, dangerous alias globals, and root compatibility aliases. No main-detected malicious signal was lost, and no safe fixture regressed to malicious in the equivalence corpus.
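One way to express the match/strengthened distinction is as a set comparison over rule codes. The sketch below is an assumption about how such a classification could work, not the removed comparison harness itself. `Outcome` and `classify` are hypothetical names, and of the rule codes used, only `S101` appears in this PR; `X900` is a made-up placeholder. The "operational improvement" bucket is not modeled here.

```python
from enum import Enum
from typing import AbstractSet


class Outcome(Enum):
    MATCH = "match"
    STRENGTHENED = "strengthened"
    REGRESSION = "regression"


def classify(old: AbstractSet[str], new: AbstractSet[str]) -> Outcome:
    """Compare rule codes reported by the old and new scanners for one case."""
    if not old <= new:
        # Some signal detected on main is missing from the new result.
        return Outcome.REGRESSION
    if old == new:
        return Outcome.MATCH
    # Every old signal is preserved and at least one new signal was added.
    return Outcome.STRENGTHENED


same = classify({"S101"}, {"S101"})
stronger = classify({"S101"}, {"S101", "X900"})
lost = classify({"S101"}, set())
```

Under this rule, "no main-detected malicious signal was lost" corresponds to the absence of any `REGRESSION` outcome across the corpus.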

The current-branch fixture gate, run before cleanup, also passed across 34 committed pickle fixtures and 102 package/adapter/root comparisons.

Benchmark Results

Latest incremental benchmark for this commit. The extension was a release-built editable maturin build:

uv run --with 'maturin>=1.9,<2' maturin develop --release --manifest-path packages/modelaudit-picklescan/Cargo.toml

The benchmarks were then run with:

uv run --with pytest-benchmark pytest tests/benchmarks/test_picklescan_benchmarks.py --benchmark-json /tmp/modelaudit-picklescan-bench-after-release.json --benchmark-min-rounds=16 -q

Results (macOS arm64, Python 3.11):

Case	Mean	Notes
`long_benign_string`	0.560 ms	1 MiB repeated benign text literal; previously 70.066 ms in the pre-change benchmark harness and 4.684 ms in the dev-profile post-change harness
`safe_large`	2.553 ms	model-shaped benign pickle
`chunked_stream`	4.213 ms	Python stream wrapper path
`hidden_suspicious_string_budget`	0.344 ms	still detects middle-of-literal `os.system('id')`
`nested_hex`	0.071 ms	encoded nested payload
`nested_base64`	0.058 ms	encoded nested payload
`malicious_reduce`	0.037 ms	malicious reduce payload
`stack_global`	0.032 ms	direct dangerous global payload
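The `nested_hex` and `nested_base64` cases exercise encoded nested payloads. A cheap prefix test can gate the expensive decode step, in the spirit of probing only when a plausible encoded pickle prefix exists. The sketch below is an assumption about one possible gate, not the Rust implementation. `guess_encoding` and `decode_nested` are hypothetical names. It relies on the fact that protocol 2+ pickles start with the PROTO opcode `0x80` followed by the protocol number, so their base64 form starts with `gA` and their hex form starts with `80` plus the protocol digits. Protocol 0/1 pickles have no such header and are not covered.

```python
import base64
import binascii
import pickle
from typing import Optional


def guess_encoding(text: bytes) -> Optional[str]:
    """Cheap prefix check for a base64- or hex-encoded protocol 2-5 pickle."""
    if text[:2] == b"gA":
        return "base64"
    if text[:2] == b"80" and text[2:4] in (b"02", b"03", b"04", b"05"):
        return "hex"
    return None


def decode_nested(text: bytes) -> Optional[bytes]:
    """Decode only when the prefix is plausible; otherwise skip the work."""
    kind = guess_encoding(text)
    if kind is None:
        return None
    try:
        if kind == "base64":
            return base64.b64decode(text, validate=True)
        return binascii.unhexlify(text)
    except (binascii.Error, ValueError):
        # Prefix looked plausible but the body is not valid encoded data.
        return None


payload = pickle.dumps({"k": 1}, protocol=4)
as_b64 = base64.b64encode(payload)
as_hex = binascii.hexlify(payload)
```

Ordinary text such as `b"hello world"` fails the prefix test and is never decoded, which is where the saving comes from on large benign literals.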

A user-facing CLI subprocess smoke benchmark using .venv/bin/modelaudit scan --no-cache -q -f json measured a median of 277.1 ms for the 1 MiB benign literal and 221.8 ms for a tiny safe pickle. Since even the tiny pickle costs 221.8 ms, the current wall time is dominated by Python process startup and root CLI orchestration rather than native pickle scanning.
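A subprocess smoke benchmark of this kind can be approximated with the standard library. The helper below is a sketch under assumptions: `median_wall_ms` is a hypothetical name, the run count of 5 is arbitrary, and `check=False` is used because the CLI's exit code for a given file is not stated here.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def median_wall_ms(cmd: Sequence[str], runs: int = 5) -> float:
    """Return the median wall-clock time in milliseconds over `runs` launches."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only elapsed wall time matters here.
        subprocess.run(
            cmd,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Baseline: cost of starting a bare Python interpreter that does nothing.
startup_ms = median_wall_ms([sys.executable, "-c", "pass"], runs=3)

# Hypothetical usage against the CLI (model.pkl is a placeholder path):
# median_wall_ms([".venv/bin/modelaudit", "scan", "--no-cache", "-q", "-f", "json", "model.pkl"])
```

Comparing the interpreter-only baseline with the full CLI timing gives a rough split between process startup and scanner work.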

The latest local benchmark after the hot-path work used Python 3.12.12 and the rebuilt editable maturin extension. Each value is the median of 5 runs through both standalone modelaudit_picklescan.scan_bytes and the user-facing PickleScanner().scan(path) path.

Artifact	Size	Before root scanner median	After root scanner median	Root speedup	Standalone Rust median	Security result
`safe_8m_bytes`	8,388,636 bytes	7,957.52 ms	197.52 ms	40.3x	77.68 ms	clean, 0 issues
`safe_8m_string`	8,388,636 bytes	8,725.32 ms	171.37 ms	50.9x	30.21 ms	clean, 0 issues
`hidden_mid_string`	8,394,795 bytes	8,682.80 ms	105.83 ms	82.0x	84.52 ms	detected, `S101` suspicious string

A pre-optimization profile showed the Python CLI path was dominated by duplicated compatibility/root detectors rather than Rust itself. The hot-path changes move the root CLI path much closer to standalone Rust while preserving the compatibility signals above.

Validation

Latest local validation after rebasing onto the PR branch tip and pushing commit d6e05ac:

cargo fmt --manifest-path packages/modelaudit-picklescan/Cargo.toml -- --check
cargo check --manifest-path packages/modelaudit-picklescan/Cargo.toml
cargo clippy --manifest-path packages/modelaudit-picklescan/Cargo.toml --all-targets -- -D warnings
cargo test --manifest-path packages/modelaudit-picklescan/Cargo.toml (77 passed)
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py packages/modelaudit-picklescan/tests/test_adversarial_pickle_oracle.py -q (567 passed)
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (381 files already formatted)
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (All checks passed)
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (Success: no issues found in 436 source files)
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (4224 passed, 13 skipped, 21 warnings)

mldangelo-oai merged 176 commits into main from mdangelo/codex/rust-picklescan-rewrite

2 participants
