
feat: replace picklescan with Rust-native engine#990

Merged
mldangelo-oai merged 176 commits into main from mdangelo/codex/rust-picklescan-rewrite on Apr 15, 2026

mldangelo-oai commented Apr 12, 2026

Summary

Replaces the old in-repo Python pickle opcode scanner with the Rust-native modelaudit-picklescan package and wires ModelAudit to use that Rust scanner as the only pickle engine. The root PickleScanner is now a thin integration layer for file/container handling, bounded ModelAudit compatibility checks, and ScanResult adaptation.

This PR also removes the Python picklescan engine-selection path, deletes the old Python opcode-support module, adds Rust CI/build/smoke coverage, and keeps the public Python API/report models for users of the standalone package.

What Changed

  • Rust-native picklescan package: adds the PyO3/maturin Rust extension exposed through modelaudit_picklescan._rust.
  • Rust-only Python boundary: removes the old Python engine modules and MODELAUDIT_PICKLESCAN_ENGINE=python selection path; the Python package now calls Rust directly.
  • ModelAudit integration: replaces the root pickle scanner implementation with a Rust-backed wrapper that preserves file checks, archive-member context, parse-incomplete/fail-closed metadata, legacy rule-code aliases, and bounded root-level compatibility checks.
  • Security compatibility: preserves legacy signals for raw eval/exec, __import__, dangerous protocol-0 globals, copyreg extension REDUCE, non-allowlisted __main__ references, GLOBAL __dict__ import-only references, encoded execution strings, and malformed/truncated pickles.
  • False-positive fixes: avoids treating comment-only importlib# ... text as critical and tightens the CVE SETITEM compatibility check so plain system bytes do not look like a SETITEM opcode.
  • Hot-path performance: reduces large-literal and nested-payload allocation/copying in Rust, adds suspicious-string fast rejection, avoids expensive encoded-nested probes unless a plausible encoded pickle prefix exists, and skips duplicated Python compatibility work where Rust has already provided the primary signal.
  • CI and packaging: adds Rust toolchain checks, cargo fmt/check/clippy/test, standalone wheel build/smoke coverage, root dependency wiring, and Docker/release updates.
  • Repository cleanup: removes temporary comparison harnesses now that the replacement evidence has been captured here, keeping the repo focused on durable scanner code and regression tests.
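The compatibility signals above are opcode-level checks. As a rough illustration of the idea (not the Rust engine's actual rules), the sketch below walks a pickle with Python's standard pickletools.genops, which decodes opcodes without executing anything. The denylist, OpcodeReport and walk_pickle are hypothetical names. The sketch also shows why counting decoded SETITEM opcodes avoids the "plain system bytes" false positive, and how a truncated stream can fail closed.

```python
import pickletools
from dataclasses import dataclass, field

# Hypothetical denylist for illustration; NOT the rule set used by modelaudit-picklescan.
DANGEROUS_GLOBALS = {
    ("builtins", "eval"),
    ("builtins", "exec"),
    ("builtins", "__import__"),
    ("os", "system"),
    ("posix", "system"),
    ("nt", "system"),
}

# Opcodes that push a str onto the stack (candidate STACK_GLOBAL operands).
STRING_OPS = {"UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"}


@dataclass
class OpcodeReport:
    globals_seen: list = field(default_factory=list)
    dangerous: list = field(default_factory=list)
    reduce_count: int = 0
    setitem_count: int = 0
    parse_incomplete: bool = False

    @property
    def is_clean(self) -> bool:
        # Fail closed: an unparseable stream is never reported as clean.
        return not self.dangerous and not self.parse_incomplete


def walk_pickle(data: bytes) -> OpcodeReport:
    """Statically walk opcodes with pickletools.genops; nothing is unpickled."""
    report = OpcodeReport()
    strings: list = []  # recent string pushes; STACK_GLOBAL usually consumes the last two
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            ref = None
            if opcode.name == "GLOBAL":
                # Protocols 0-3: genops reports the operand as "module name".
                module, _, name = arg.partition(" ")
                ref = (module, name)
            elif opcode.name == "STACK_GLOBAL":
                # Approximation: ignores operands fetched from the memo via GET/BINGET.
                ref = tuple(strings[-2:]) if len(strings) >= 2 else ("?", "?")
            elif opcode.name in STRING_OPS:
                strings.append(arg)
            elif opcode.name == "REDUCE":
                report.reduce_count += 1
            elif opcode.name == "SETITEM":
                # Counted from decoded opcodes, not from raw b"s" bytes.
                report.setitem_count += 1
            if ref is not None:
                report.globals_seen.append(ref)
                if ref in DANGEROUS_GLOBALS:
                    report.dangerous.append(ref)
    except ValueError:
        # Truncated input (e.g. missing STOP) or an unknown opcode.
        report.parse_incomplete = True
    return report
```

For example, walk_pickle(b"cos\nsystem\n(S'id'\ntR.") reports ("os", "system") with one REDUCE and zero SETITEM opcodes, even though the raw bytes contain three b"s" characters. The STACK_GLOBAL handling only looks at the two most recent string pushes, so memoized operands would need real stack and memo tracking.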

Equivalence Evidence

Before removing temporary comparison harnesses from the branch, I ran a historical equivalence suite against origin/main across standalone package bytes, standalone package stream, root stream, and root file surfaces.

Final full-corpus run:

| Corpus | Cases | Comparisons | Match | Strengthened | Operational improvement | Regressions |
|---|---|---|---|---|---|---|
| Full generated + fixtures + malformed prefixes | 852 | 3384 | 2477 | 905 | 2 | 0 |
| Generated deterministic corpus | 339 | 1332 | 1126 | 204 | 2 | 0 |
| Committed fixture corpus | 46 | 182 | 160 | 20 | 2 | 0 |

The two operational improvements are the negative stream-size cases: old main treated size=-1 as empty input or failed closed, while the Rust-backed path now treats negative size as unknown size and scans the actual stream. That directly addresses the review finding around negative stream sizes.
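One way to express the negative-size rule described above: treat a missing or negative size as "unknown" and read the stream itself, under a byte budget. This is a hypothetical helper (read_stream_for_scan, max_bytes), not the PR's actual code, and the budget value is an arbitrary placeholder.

```python
from typing import BinaryIO, Optional

# Hypothetical scan budget; the real limit used by ModelAudit is not stated in the PR.
DEFAULT_MAX_BYTES = 64 * 1024 * 1024


def read_stream_for_scan(stream: BinaryIO, size: Optional[int] = None,
                         max_bytes: int = DEFAULT_MAX_BYTES) -> bytes:
    """Return the bytes to scan, treating size=None or size < 0 as 'unknown'."""
    if size is None or size < 0:
        # Unknown size: read the actual stream (one byte past the budget to detect overflow).
        data = stream.read(max_bytes + 1)
    else:
        # Known size: never read more than the caller declared.
        data = stream.read(min(size, max_bytes + 1))
    if len(data) > max_bytes:
        # Fail closed rather than silently scanning only a prefix.
        raise ValueError("stream exceeds scan budget")
    return data
```

With this rule, read_stream_for_scan(io.BytesIO(data), -1) returns the whole stream instead of an empty buffer.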

The strengthened results are additional detections or stricter fail-closed behavior, including base64/encoded execution strings, malformed missing-STOP inputs, dangerous alias globals, and root compatibility aliases. No main-detected malicious signal was lost, and no safe fixture regressed to malicious in the equivalence corpus.
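The four result buckets in the equivalence table can be read as a classification of each old-main vs Rust comparison. The sketch below is one hypothetical set of rules consistent with the descriptions above; the PR does not publish the harness's exact logic, and ScanOutcome and classify are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanOutcome:
    signals: frozenset   # detection identifiers reported by one engine
    failed_closed: bool  # scan could not finish and was not reported clean
    bytes_scanned: int   # how much of the input was actually examined


def classify(old: ScanOutcome, new: ScanOutcome, expected_malicious: bool) -> str:
    """Bucket one old-main vs Rust comparison (hypothetical rules)."""
    # Regression: a signal old main raised is gone...
    if not old.signals <= new.signals:
        return "regression"
    # ...or a known-safe input picked up new detections.
    if not expected_malicious and new.signals - old.signals:
        return "regression"
    # Same detections, but the new path examined input the old one skipped
    # (e.g. a negative stream size previously treated as empty).
    if new.signals == old.signals and new.bytes_scanned > old.bytes_scanned:
        return "operational improvement"
    # Extra detections, or failing closed where old main did not.
    if new.signals > old.signals or (new.failed_closed and not old.failed_closed):
        return "strengthened"
    if new == old:
        return "match"
    # Anything unexplained is treated conservatively.
    return "regression"
```

Checking the regression conditions first means an input can only count as strengthened or improved if no previously detected signal was lost.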

Before cleanup, the branch's fixture gate also passed on 34 committed pickle fixtures (102 package/adapter/root comparisons).

Benchmark Results

Latest incremental benchmark for this commit (macOS arm64, Python 3.11, editable maturin extension built in release mode). Build and benchmark commands:

  • uv run --with 'maturin>=1.9,<2' maturin develop --release --manifest-path packages/modelaudit-picklescan/Cargo.toml
  • uv run --with pytest-benchmark pytest tests/benchmarks/test_picklescan_benchmarks.py --benchmark-json /tmp/modelaudit-picklescan-bench-after-release.json --benchmark-min-rounds=16 -q

| Case | Mean | Notes |
|---|---|---|
| long_benign_string | 0.560 ms | 1 MiB repeated benign text literal; previously 70.066 ms in the pre-change benchmark harness and 4.684 ms in the dev-profile post-change harness |
| safe_large | 2.553 ms | model-shaped benign pickle |
| chunked_stream | 4.213 ms | Python stream wrapper path |
| hidden_suspicious_string_budget | 0.344 ms | still detects middle-of-literal os.system('id') |
| nested_hex | 0.071 ms | encoded nested payload |
| nested_base64 | 0.058 ms | encoded nested payload |
| malicious_reduce | 0.037 ms | malicious reduce payload |
| stack_global | 0.032 ms | direct dangerous global payload |
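The nested_hex and nested_base64 cases depend on the prefix gating described in What Changed: an expensive decode-and-rescan only makes sense when a string plausibly starts with an encoded pickle header. A minimal sketch, assuming protocol 2 to 5 pickles (which begin with the PROTO opcode 0x80 followed by the protocol number) and a hypothetical looks_like_encoded_pickle helper:

```python
import base64

# PROTO (0x80) followed by a protocol number; protocols 2-5 emit this header.
PICKLE_PROTOCOLS = range(2, 6)


def _is_proto_header(head: bytes) -> bool:
    return len(head) >= 2 and head[0] == 0x80 and head[1] in PICKLE_PROTOCOLS


def looks_like_encoded_pickle(text: str) -> bool:
    """Cheap prefix gate (hypothetical): True only if `text` plausibly starts with
    an encoded pickle header; otherwise the full decode + nested scan is skipped."""
    s = text.lstrip()
    if len(s) < 4:
        return False
    try:
        if s.startswith("80"):
            # Hex: "80" then the protocol byte, e.g. "8004".
            return _is_proto_header(bytes.fromhex(s[:4]))
        if s.startswith("gA"):
            # Base64: 0x80 followed by a byte below 16 always encodes to "gA...".
            return _is_proto_header(base64.b64decode(s[:4], validate=True))
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return False
```

Hex-encoded protocol 4 pickles start with "8004", and base64-encoded ones commonly start with "gASV" (the fourth character comes from the FRAME opcode), so ordinary text is rejected after a couple of character comparisons.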

A user-facing CLI subprocess smoke benchmark using .venv/bin/modelaudit scan --no-cache -q -f json measured a median of 277.1 ms for the 1 MiB benign literal and 221.8 ms for a tiny safe pickle. Because the native scan of that literal takes about 0.56 ms (long_benign_string above), current wall time is dominated by Python process startup and root CLI orchestration rather than native pickle scanning.
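To reproduce this kind of CLI smoke measurement, a small harness can time repeated subprocess runs and take the median. This is a sketch; median_wall_ms is a hypothetical name, and the run count is a placeholder.

```python
import statistics
import subprocess
import time
from typing import Sequence


def median_wall_ms(argv: Sequence[str], runs: int = 5) -> float:
    """Run `argv` `runs` times and return the median wall-clock time in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=False: a scanner CLI may exit non-zero when it reports findings.
        subprocess.run(list(argv), check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, median_wall_ms([".venv/bin/modelaudit", "scan", "--no-cache", "-q", "-f", "json", "model.pkl"]), where model.pkl is a placeholder path. Timing a no-op interpreter (python -c pass) the same way gives a rough startup baseline to compare against.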

Latest local benchmark after the hot-path work, using Python 3.12.12 and the rebuilt editable maturin extension. Each value is the median of 5 runs, measured through both the standalone modelaudit_picklescan.scan_bytes API and the user-facing PickleScanner().scan(path) path.

| Artifact | Size (bytes) | Root scanner median, before | Root scanner median, after | Root speedup | Standalone Rust median | Security result |
|---|---|---|---|---|---|---|
| safe_8m_bytes | 8,388,636 | 7,957.52 ms | 197.52 ms | 40.3x | 77.68 ms | clean, 0 issues |
| safe_8m_string | 8,388,636 | 8,725.32 ms | 171.37 ms | 50.9x | 30.21 ms | clean, 0 issues |
| hidden_mid_string | 8,394,795 | 8,682.80 ms | 105.83 ms | 82.0x | 84.52 ms | detected, S101 suspicious string |
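The root speedup column is the before median divided by the after median, and the gap between the after-root median and the standalone Rust median is the Python-side overhead that still sits on top of the native scan. A quick check using the numbers in the table:

```python
# (before root median, after root median, standalone Rust median), in ms, from the table.
MEDIANS_MS = {
    "safe_8m_bytes": (7957.52, 197.52, 77.68),
    "safe_8m_string": (8725.32, 171.37, 30.21),
    "hidden_mid_string": (8682.80, 105.83, 84.52),
}


def summarize(medians):
    """Return {artifact: (root speedup, root overhead above standalone Rust in ms)}."""
    return {
        name: (before / after, after - rust)
        for name, (before, after, rust) in medians.items()
    }


for name, (speedup, overhead_ms) in summarize(MEDIANS_MS).items():
    print(f"{name}: {speedup:.1f}x root speedup, {overhead_ms:.2f} ms above standalone Rust")
```

The speedups round to the 40.3x, 50.9x and 82.0x shown above; the remaining root overhead ranges from about 21 ms to about 141 ms per artifact.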

A pre-optimization profile showed the Python CLI path was dominated by duplicated compatibility/root detectors rather than Rust itself. The hot-path changes move the root CLI path much closer to standalone Rust while preserving the compatibility signals above.
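Skipping duplicated compatibility work can be pictured as gating each Python compatibility check on whether Rust already reported the corresponding rule. A hypothetical sketch; the rule codes, the check registry and run_compat_checks are illustrative and not ModelAudit's API:

```python
from typing import Callable, Dict, Iterable, List

# A compatibility check inspects raw pickle bytes and returns True if its rule fires.
CompatCheck = Callable[[bytes], bool]


def run_compat_checks(data: bytes, rust_codes: Iterable[str],
                      checks: Dict[str, CompatCheck]) -> List[str]:
    """Run only the Python checks whose rule code Rust has not already reported."""
    already_reported = set(rust_codes)
    extra = []
    for code, check in checks.items():
        if code in already_reported:
            continue  # Rust supplied the primary signal; skip the duplicate pass
        if check(data):
            extra.append(code)
    return extra
```

Because the skip happens before a check runs, its cost disappears entirely on files where Rust already flagged the rule, while rules Rust did not report are still evaluated.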

Validation

Latest local validation after rebasing onto the PR branch tip and pushing commit d6e05ac:

  • cargo fmt --manifest-path packages/modelaudit-picklescan/Cargo.toml -- --check
  • cargo check --manifest-path packages/modelaudit-picklescan/Cargo.toml
  • cargo clippy --manifest-path packages/modelaudit-picklescan/Cargo.toml --all-targets -- -D warnings
  • cargo test --manifest-path packages/modelaudit-picklescan/Cargo.toml (77 passed)
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py packages/modelaudit-picklescan/tests/test_adversarial_pickle_oracle.py -q (567 passed)
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (381 files already formatted)
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (All checks passed)
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (Success: no issues found in 436 source files)
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (4224 passed, 13 skipped, 21 warnings)
