
fix(deps): promote msgpack to core dependency for Flax scanner#583

Open
yash2998chhabria wants to merge 10 commits into main from fix/flax-dep

Conversation


@yash2998chhabria yash2998chhabria commented Feb 25, 2026

Summary

  • Promotes msgpack>=1.0.0,<2.0 from optional dependency (extras: flax) to a core dependency in pyproject.toml
  • The flax_msgpack scanner was registered but silently fell back to generic scanning when msgpack was not installed, causing .msgpack Flax/JAX model files to miss specialized security analysis
  • msgpack is a lightweight, zero-dependency C extension (~100KB) that is appropriate as a core dependency
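
The silent-fallback failure mode described above can be sketched as follows. This is a minimal illustration with hypothetical names (`pick_scanner` is not the modelaudit API): when the optional dependency is missing, nothing fails loudly, and `.msgpack` files are quietly downgraded to generic scanning.

```python
def pick_scanner(path: str, msgpack_available: bool) -> str:
    """Choose a scanner name for a file path (illustrative sketch only)."""
    if path.endswith(".msgpack"):
        if msgpack_available:
            return "flax_msgpack"
        # Before this PR: no error was raised, just a quiet coverage downgrade.
        return "generic"
    return "generic"

print(pick_scanner("model.msgpack", True))   # flax_msgpack
print(pick_scanner("model.msgpack", False))  # generic
```

Promoting msgpack to a core dependency makes the `msgpack_available=False` branch unreachable in normal installs.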

Test plan

  • Verified FlaxMsgpackScanner loads successfully with python -c "from modelaudit.scanners.flax_msgpack_scanner import FlaxMsgpackScanner; print('OK')"
  • Scanned 3 HuggingFace Flax models: sshleifer/tiny-gpt2, hf-internal-testing/tiny-bert-flax-only, patrickvonplaten/t5-tiny-random -- all recognized by flax_msgpack scanner (not generic fallback)
  • Regression tested 2 additional Flax models: ArthurZ/tiny-random-bert-flax-only, sanchit-gandhi/tiny-random-flax-bert -- both scanned correctly
  • Full test suite passes: 377 passed, 0 failed (python -m pytest tests/ -x -q --timeout=60)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Pickle scanner: multi-stream parsing, resync after non-pickle bytes, and propagation of pickle-boundary metadata for downstream binary scanning.
    • Flax/msgpack scanner: improved transformer-model detection including nested-model and name-key heuristics.
  • Behavior Changes

    • Scanner now flags a much larger set of stdlib/third-party modules and call patterns as suspicious in pickle contexts.
  • Tests

    • New regression suite for crafted pickle payloads, multi-stream scenarios, and expanded blocklist coverage.
  • Chores

    • Added msgpack dependency.

yash2998chhabria and others added 4 commits February 25, 2026 10:46
Expand ALWAYS_DANGEROUS_MODULES with ~45 new modules not covered by
PR #518: network (smtplib, imaplib, poplib, nntplib, xmlrpc, socketserver,
ssl, requests, aiohttp), compilation (codeop, marshal, types, compileall,
py_compile), FFI (_ctypes), debugging (bdb, trace), operator bypasses
(_operator, functools), pickle recursion (pickle, _pickle, dill,
cloudpickle, joblib), filesystem (tempfile, filecmp, fileinput, glob,
distutils, pydoc, pexpect), venv/pip (venv, ensurepip, pip), threading
(signal, _signal, threading), and more (webbrowser, asyncio, mmap,
select, selectors, logging, syslog, tarfile, zipfile, shelve, sqlite3,
_sqlite3, doctest, idlelib, lib2to3, uuid).

Add uuid as novel detection (VULN-6: _get_command_stdout calls
subprocess.Popen) and pkgutil.resolve_name as dynamic resolution
trampoline.

Mirror new modules in SUSPICIOUS_GLOBALS for defense-in-depth. Fix
multi-stream opcode analysis so detailed checks run across all pickle
streams. Add NEWOBJ_EX to all opcode check lists.

Includes 10 regression tests covering pkgutil trampoline, uuid RCE,
multi-stream exploit, NEWOBJ_EX, and spot-checks for smtplib, sqlite3,
tarfile, marshal, cloudpickle, and webbrowser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
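
The module blocklist described in the commit message above amounts to a lookup keyed on the module portion of a pickle GLOBAL argument. A hedged sketch (the module names come from the commit message; the helper and its signature are illustrative, not the scanner's real code):

```python
# Small subset of the expanded blocklist, for illustration only.
ALWAYS_DANGEROUS_MODULES = {
    "smtplib", "marshal", "uuid", "webbrowser", "tarfile", "sqlite3",
}

def is_dangerous_global(global_arg: str) -> bool:
    """Check a pickle GLOBAL argument ("module attr") against the blocklist.

    pickletools renders the GLOBAL opcode's argument as a single
    space-separated "module attr" string, so split off the module first.
    """
    module = global_arg.split(" ", 1)[0]
    root = module.split(".", 1)[0]  # flag submodules too, e.g. "uuid._x"
    return root in ALWAYS_DANGEROUS_MODULES

print(is_dangerous_global("uuid _get_command_stdout"))            # True
print(is_dangerous_global("numpy.core.multiarray _reconstruct"))  # False
```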
…e scanning

- Fix test regression: track first pickle STOP position and use it for
  binary content scanning instead of f.tell() (which advances past
  multi-stream data)
- Fix multi-stream resync bypass: skip junk separator bytes between
  streams (up to 256 bytes) instead of returning immediately
- Gracefully handle ValueErrors on subsequent streams (non-pickle data
  after valid pickle) instead of propagating them as file-level errors
- Add type hints: ScanResult return type on _scan_bytes, -> None on all
  test methods
- Add separator-byte resync regression test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Buffer opcodes for subsequent pickle streams and discard partial
  streams (binary tensor data misinterpreted as opcodes) to prevent
  MANY_DANGEROUS_OPCODES false positives
- Add resync logic to skip up to 256 non-pickle separator bytes
  between streams, preventing single-byte bypass of multi-stream
  detection
- Track pickle memo (BINPUT/BINGET) for safe ML globals so REDUCE
  calls referencing safe callables via memo are not counted as
  dangerous opcodes
- Reset dangerous opcode counters at STOP boundaries so each stream
  is evaluated independently
- Use first_pickle_end_pos for binary content scanning instead of
  f.tell() which advances past tensor data during multi-stream parsing

Validated against 5 HuggingFace models with A/B comparison to main
branch — no new false positives introduced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
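
The `first_pickle_end_pos` idea above can be illustrated with the stdlib `pickletools` module. This is a minimal sketch of the concept, not the scanner's actual implementation: `pickletools.genops` stops at the first STOP opcode, so `f.tell()` at that point marks the end of the first stream without advancing into trailing tensor bytes.

```python
import io
import pickle
import pickletools

def first_pickle_end_pos(data: bytes) -> int:
    """Return the offset just past the first pickle STOP opcode."""
    f = io.BytesIO(data)
    for opcode, _arg, _pos in pickletools.genops(f):
        if opcode.name == "STOP":
            return f.tell()
    raise ValueError("no complete pickle stream found")

payload = pickle.dumps({"weights": [1, 2, 3]})
blob = payload + b"\x93NUMPY-tensor-bytes"  # trailing binary data
end = first_pickle_end_pos(blob)
assert blob[:end] == payload  # position has not advanced past the tensor data
print(end, len(blob))
```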
The flax_msgpack scanner was registered but failed to load when msgpack
was not installed, silently falling back to generic scanning. Since
msgpack is a lightweight zero-dependency C extension, promote it from
optional (extras: flax) to a core dependency so Flax/JAX .msgpack files
are scanned correctly out of the box.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Expands suspicious globals and always-dangerous lists, adds multi-stream pickle parsing with resync and first_pickle_end_pos propagation, treats NEWOBJ_EX in danger checks, adds tests for blocklist and multi-stream cases, tightens msgpack dependency, and refines Flax/MsgPack transformer detection.

Changes

  • Suspicious symbols mapping (modelaudit/detectors/suspicious_symbols.py): Large expansion of SUSPICIOUS_GLOBALS with many module/function entries (networking, execution/compilation, FFI, threading/process, stdlib trampolines, packaging, tooling).
  • Pickle scanner logic (modelaudit/scanners/pickle_scanner.py): Adds multi-stream parsing with resync and _MAX_RESYNC_BYTES, buffers subsequent streams, tracks and propagates first_pickle_end_pos, treats NEWOBJ_EX like NEWOBJ in many checks, and expands ALWAYS_DANGEROUS_FUNCTIONS/ALWAYS_DANGEROUS_MODULES.
  • Pickle scanner tests (tests/scanners/test_pickle_scanner.py): New test class covering GLOBAL+REDUCE payloads, NEWOBJ_EX detection, multi-stream benign-to-malicious flows, separator-byte resync, and blocklist assertions for many modules.
  • Flax/msgpack scanner (modelaudit/scanners/flax_msgpack_scanner.py): Improves transformer-model detection by inspecting nested dicts and top-level keys matching common model-name tokens when transformer keys are absent.
  • Dependencies (pyproject.toml): Adds the msgpack>=1.0.0,<2.0 constraint.
  • Real-world tests tweak (tests/test_real_world_dill_joblib.py): Relaxes the assertion for compressed joblib scans to require any reported issue when bytes_scanned == 0.

Sequence Diagram(s)

sequenceDiagram
    participant FileObj as FileObj
    participant Parser as PickleParser
    participant Buffer as StreamBuffer
    participant Detector as OpcodeDetector
    participant Binary as BinaryScanner

    FileObj->>Parser: read bytes
    Parser->>Parser: attempt parse stream
    alt first stream completes (STOP)
        Parser->>Detector: yield ops for analysis
        Parser->>Parser: set first_pickle_end_pos
    else subsequent stream
        Parser->>Buffer: buffer opcodes
        Buffer->>Parser: validate buffered stream
        alt buffered parse succeeds
            Buffer->>Detector: yield ops for analysis
        else within _MAX_RESYNC_BYTES
            Parser->>FileObj: skip separator bytes (resync)
            Parser->>Parser: continue parsing
        end
    end
    Detector->>Binary: attach first_pickle_end_pos to metadata
    Binary->>Detector: return binary-scan results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"I nibbled bytes and hopped through streams,
Resynced gaps and chased new schemes.
NEWOBJ_EX lit the carrot sky,
Globals grew—oh my, oh my!
🐇✨"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the primary change: promoting msgpack from an optional dependency to a core dependency for the Flax scanner.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 95.83%, above the required 80.00% threshold.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 52-54: The scanner currently terminates multi-stream scanning when
a later stream yields at least one opcode before an error; instead, in the
scanning loop that uses resync_skipped and _MAX_RESYNC_BYTES you must remove the
early return on partial parse and implement resync logic: when a stream errors
after yielding opcodes, do not return — increment resync_skipped, advance the
input by one byte and attempt to resync/continue scanning this stream (and
subsequent streams) until a match or until resync_skipped exceeds a sensible
limit; do not treat a single opcode+junk separator as final. Replace the
immediate return with a continue/resync flow, reset resync_skipped on successful
parse, and revise the hard-coded 256-byte ceiling (_MAX_RESYNC_BYTES) to either
be higher or configurable so simple padding cannot evade detection; add tests
for benign and malicious samples to verify detection still fires after padding.

ℹ️ Review info

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e736ebb and b1ff559.

📒 Files selected for processing (4)
  • modelaudit/detectors/suspicious_symbols.py
  • modelaudit/scanners/pickle_scanner.py
  • pyproject.toml
  • tests/scanners/test_pickle_scanner.py

yash2998chhabria and others added 3 commits February 25, 2026 12:34
Some HuggingFace Flax models (e.g. bert-for-token-classification) use a
nested structure where model weights are wrapped under a model-name key
(e.g. "bert", "classifier") rather than placing transformer keys like
"encoder" or "embeddings" at the top level. This caused an INFO-level
false positive: "Suspicious data structure - does not match known ML
model patterns".

Add detection for nested model structures by checking if top-level keys
contain transformer sub-keys, and add a set of common HuggingFace model
name keys to recognize known architectures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts in modelaudit/scanners/pickle_scanner.py:
- ALWAYS_DANGEROUS_GLOBALS: keep both branch additions (pkgutil.resolve_name,
  uuid internals) and main additions (cProfile/profile/pdb/timeit, ctypes)
- ALWAYS_DANGEROUS_MODULES: merge branch's comprehensive module blocklist with
  main's PR #518 additions (ctypes, profiling, thread/process, module loading)
  and preserve main's linecache/logging.config exclusion note
- REDUCE opcode handler: adopt main's resolved_callables approach with branch's
  _safe_memo fallback for BINGET memo patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 52-54: Document the security trade-off for _MAX_RESYNC_BYTES in
pickle_scanner.py: update the comment near the _MAX_RESYNC_BYTES and
resync_skipped definitions to explain that the 256-byte limit prevents skipping
arbitrarily large gaps between pickle streams but can be evaded by inserting
>=256 bytes of non-pickle data before a malicious stream, note the rationale for
stopping on partial opcode sequences to avoid misinterpreting binary tensor
data, and recommend making _MAX_RESYNC_BYTES configurable (and covered by tests)
if callers need a different performance/security trade-off.
- Around line 409-411: The scan currently lists "functools" in
ALWAYS_DANGEROUS_MODULES which will produce false positives for legitimate ML
usages; update ML_SAFE_GLOBALS to whitelist safe functools members (e.g.,
partial, reduce, lru_cache, wraps) and adjust any checks that treat the entire
module as dangerous so they allow these specific attributes, or add a clarifying
comment (similar to the existing logging/linecache rationale) explaining why the
module is treated specially; reference ALWAYS_DANGEROUS_MODULES and
ML_SAFE_GLOBALS in your change so the scanner accepts common safe usages like
functools.partial or functools.lru_cache while still flagging dangerous patterns
that use functools to wrap unsafe callables.
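
The whitelist-before-blocklist ordering the reviewer suggests can be sketched like this. ML_SAFE_GLOBALS and ALWAYS_DANGEROUS_MODULES mirror names mentioned in the comment, but the classification helper and its return values are illustrative, not the scanner's real code:

```python
ALWAYS_DANGEROUS_MODULES = {"functools", "marshal"}  # subset for illustration
ML_SAFE_GLOBALS = {
    ("functools", "partial"),
    ("functools", "reduce"),
    ("functools", "lru_cache"),
    ("functools", "wraps"),
}

def classify_global(module: str, attr: str) -> str:
    """The specific-member whitelist wins before the module blocklist applies."""
    if (module, attr) in ML_SAFE_GLOBALS:
        return "safe"
    if module.split(".", 1)[0] in ALWAYS_DANGEROUS_MODULES:
        return "dangerous"
    return "unknown"

print(classify_global("functools", "partial"))     # safe
print(classify_global("functools", "cmp_to_key"))  # dangerous (not whitelisted)
print(classify_global("marshal", "loads"))         # dangerous
```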


📥 Commits

Reviewing files that changed from the base of the PR and between b1ff559 and c3b8150.

📒 Files selected for processing (4)
  • modelaudit/detectors/suspicious_symbols.py
  • modelaudit/scanners/flax_msgpack_scanner.py
  • modelaudit/scanners/pickle_scanner.py
  • tests/scanners/test_pickle_scanner.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)

2132-2138: ⚠️ Potential issue | 🟠 Major

Split INST from OBJ/NEWOBJ/NEWOBJ_EX handling—the current condition disables the latter three.

In pickletools.genops(), INST yields a string argument (module and class name), but OBJ, NEWOBJ, and NEWOBJ_EX have no inline arguments and yield arg=None. The isinstance(arg, str) guard prevents these three opcodes from ever triggering the detection pattern, making their addition ineffective.

Suggested fix
-        if opcode.name in ["INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"] and isinstance(arg, str):
-            return {
-                "pattern": f"{opcode.name}_EXECUTION",
-                "argument": arg,
-                "position": pos,
-                "opcode": opcode.name,
-            }
+        if opcode.name == "INST" and isinstance(arg, str):
+            return {
+                "pattern": f"{opcode.name}_EXECUTION",
+                "argument": arg,
+                "position": pos,
+                "opcode": opcode.name,
+            }
+
+        if opcode.name in {"OBJ", "NEWOBJ", "NEWOBJ_EX"}:
+            return {
+                "pattern": f"{opcode.name}_EXECUTION",
+                "argument": arg,
+                "position": pos,
+                "opcode": opcode.name,
+            }

This violates the guideline: "Preserve or strengthen security detections; test both benign and malicious samples when modifying scanners."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/pickle_scanner.py` around lines 2132 - 2138, The current
check conflates INST with OBJ/NEWOBJ/NEWOBJ_EX by requiring isinstance(arg,
str), which blocks OBJ/NEWOBJ/NEWOBJ_EX because they yield arg=None; update the
logic in pickle_scanner.py (around the opcode handling using opcode.name and
arg, e.g., in the code that references pickletools.genops and opcode.name) to
split cases: handle opcode.name == "INST" with the isinstance(arg, str) guard
and emit the INST_EXECUTION pattern including the string argument, and add a
separate branch for opcode.name in ["OBJ","NEWOBJ","NEWOBJ_EX"] that emits a
distinct pattern (e.g., OBJ_EXECUTION or NEWOBJ_EXECUTION) using position and
opcode.name while not requiring arg to be a string so those opcodes are detected
correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/flax_msgpack_scanner.py`:
- Around line 726-737: The loop that checks top-level keys (using found_keys,
obj, transformer_keys, model_name_keys, has_transformer_keys) can call
key.lower() on non-string keys and raise AttributeError; guard against that by
first verifying the key is a string (e.g., if isinstance(key, str)) before
calling key.lower(), and only perform the model-name membership checks
(key.lower() in model_name_keys and any(mk in key.lower() for mk in
model_name_keys)) when the key is a string so the scanner won't crash while
preserving the same transformer-key detection logic.
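
The guard the comment asks for is a one-line isinstance check before any string method is called. A hypothetical helper showing the shape of the fix (not the scanner's actual code):

```python
def string_keys_lowered(keys):
    """Lowercase only str keys; msgpack maps may carry int/bytes keys (sketch)."""
    return [k.lower() for k in keys if isinstance(k, str)]

# Non-string keys are skipped instead of raising AttributeError on .lower().
print(string_keys_lowered(["Encoder", 42, b"raw", "BERT"]))  # ['encoder', 'bert']
```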



📥 Commits

Reviewing files that changed from the base of the PR and between c3b8150 and 1869d08.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • modelaudit/scanners/flax_msgpack_scanner.py
  • modelaudit/scanners/pickle_scanner.py

- Guard key.lower() for non-string msgpack keys to prevent
  AttributeError that could abort the scan (evasion vector)
- Add functools safe functions to ML_SAFE_GLOBALS to whitelist
  common ML utilities like partial, reduce, lru_cache
- Increase _MAX_RESYNC_BYTES from 256 to 8192 so binary tensor
  data between pickle streams doesn't cause early scan termination
- Continue scanning on partial stream errors instead of returning
  early, preventing evasion via hidden payloads after binary blobs
- Clear _safe_memo at STOP boundaries so safe callables from one
  pickle stream cannot whitelist dangerous REDUCEs in the next

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/flax_msgpack_scanner.py`:
- Around line 695-740: The substring check using model_name_keys (especially
short tokens like "opt" or "score") is too permissive; update the logic in the
block that computes has_transformer_keys (references: model_name_keys,
has_transformer_keys, transformer_keys, key_lower, found_keys, obj.get) to use
stricter token/boundary matching instead of any(mk in key_lower for mk in
model_name_keys): normalize key_lower and match whole tokens (e.g., split on
non-alphanumerics or use regex word boundaries) and explicitly check both exact
matches and token-level membership for entries in model_name_keys, and
add/update unit tests to include adversarial keys (e.g., "optional_score",
"desktop") and true positives to ensure the heuristic no longer false-positives
benign payloads.

In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 95-100: When the branch with "if stream_error and had_opcodes" is
taken inside the PickleScanner scanning loop, increment the resync budget by
adding the number of bytes consumed by the discarded buffer to resync_skipped
and abort (raise/stop) when resync_skipped exceeds _MAX_RESYNC_BYTES;
specifically, compute the consumed byte count for the current partial stream
(e.g., bytes between the previous sync point and current file position or
len(buffer) being discarded), add that value to resync_skipped, and enforce the
existing _MAX_RESYNC_BYTES check before continuing so a crafted binary region
cannot indefinitely bypass the resync limit (update references in the scanning
loop where stream_error, had_opcodes, and resync_skipped are used).
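
The budget accounting the comment describes reduces to adding the discarded bytes to the running counter and comparing against the ceiling. A sketch with hypothetical names (the 8192-byte ceiling matches the value mentioned elsewhere in this PR):

```python
_MAX_RESYNC_BYTES = 8192  # ceiling referenced in later commits of this PR

def charge_resync(resync_skipped: int, consumed: int) -> tuple[int, bool]:
    """Charge bytes consumed by a discarded partial stream to the budget.

    Returns the updated counter and whether scanning may continue.
    """
    resync_skipped += consumed
    return resync_skipped, resync_skipped <= _MAX_RESYNC_BYTES

skipped, ok = charge_resync(0, 4096)
print(skipped, ok)   # 4096 True
skipped, ok = charge_resync(skipped, 5000)
print(skipped, ok)   # 9096 False -> stop resyncing this region
```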


📥 Commits

Reviewing files that changed from the base of the PR and between 1869d08 and 6fa1a0e.

📒 Files selected for processing (2)
  • modelaudit/scanners/flax_msgpack_scanner.py
  • modelaudit/scanners/pickle_scanner.py

- Guard non-string keys: skip int/bytes msgpack keys instead of
  converting them to string, preventing potential crashes (CR #1)
- Remove substring matching for model_name_keys: use exact match only
  to avoid false positives like "opt" matching "optional" (CR #2)
- Charge partial-stream bytes against resync budget: prevents attackers
  from using repeated partial streams to bypass _MAX_RESYNC_BYTES (CR #3)
- Fix Windows test: relax compressed joblib assertion to not require
  specific keywords in issue messages, fixing CI on Windows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/flax_msgpack_scanner.py`:
- Around line 695-745: The current heuristic sets has_transformer_keys true if a
top-level key_lower is in model_name_keys, which lets generic names like
"score", "classifier", "qa_outputs", "lm_head" falsely trigger; change the check
so that ambiguous generic names only count when they have nested transformer
sub-keys (i.e., value is a dict and sub_keys & transformer_keys is non-empty) or
when another more-specific model name is also present. Concretely, introduce an
AMBIGUOUS_KEYS set (e.g., {"score","classifier","qa_outputs","lm_head"}) and in
the loop over found_keys: if key_lower in AMBIGUOUS_KEYS, require the value to
be a dict and contain transformer_keys before setting has_transformer_keys; if
key_lower in model_name_keys minus AMBIGUOUS_KEYS, you may set
has_transformer_keys immediately; keep the existing exact-match and type checks
for keys and use obj.get(key)/sub_keys & transformer_keys as before.

In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 82-90: The buffered list used inside the subsequent-streams loop
(variable buffered in the loop iterating pickletools.genops(file_obj)) can grow
unbounded; introduce a fixed cap (e.g. MAX_BUFFERED_OPCODES) and enforce it
(either by using collections.deque(maxlen=MAX_BUFFERED_OPCODES) or by popping
oldest entries when len(buffered) >= MAX_BUFFERED_OPCODES) so buffered cannot
consume unlimited memory for large valid streams; update the code that appends
to buffered in the for item in pickletools.genops(file_obj) loop (and keep
had_opcodes logic unchanged) to respect this cap.

In `@tests/test_real_world_dill_joblib.py`:
- Around line 151-159: Replace the weak type check on result.success with a
concrete assertion covering expected outcomes: assert that either the scan
succeeded (result.success is True) or, if no bytes were scanned, the scanner
reported issues (result.bytes_scanned == 0 implies len(result.issues) > 0); you
can implement this as a single boolean assertion combining these conditions
(reference symbols: result.success, result.bytes_scanned, result.issues).
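
The AMBIGUOUS_KEYS hardening proposed for the Flax scanner above can be sketched as follows. All names here are from the review comment but the function itself is a hypothetical reduction, not the scanner's code: generic names like "score" only count when corroborated by nested transformer sub-keys, while specific architecture names count on their own.

```python
AMBIGUOUS_KEYS = {"score", "classifier", "qa_outputs", "lm_head"}
MODEL_NAME_KEYS = {"bert", "gpt2", "llama", "t5"} | AMBIGUOUS_KEYS
TRANSFORMER_KEYS = {"encoder", "embeddings", "attention"}

def looks_like_transformer(obj: dict) -> bool:
    """Ambiguous names need nested transformer sub-keys; specific names don't."""
    for key, value in obj.items():
        if not isinstance(key, str):
            continue  # msgpack allows int/bytes keys; never model names
        key_lower = key.lower()
        sub_keys = set(value.keys()) if isinstance(value, dict) else set()
        if sub_keys & TRANSFORMER_KEYS:
            return True  # nested transformer structure under any key
        if key_lower in AMBIGUOUS_KEYS:
            continue  # a generic name alone is not sufficient evidence
        if key_lower in MODEL_NAME_KEYS:
            return True
    return False

print(looks_like_transformer({"bert": {"dense": {}}}))       # True
print(looks_like_transformer({"score": {"kernel": [1.0]}}))  # False
print(looks_like_transformer({"score": {"encoder": {}}}))    # True
```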


📥 Commits

Reviewing files that changed from the base of the PR and between 6fa1a0e and 345ca63.

📒 Files selected for processing (3)
  • modelaudit/scanners/flax_msgpack_scanner.py
  • modelaudit/scanners/pickle_scanner.py
  • tests/test_real_world_dill_joblib.py

Comment on lines +695 to +745
# Common HuggingFace model name keys that wrap transformer substructure
model_name_keys = {
    "bert",
    "roberta",
    "distilbert",
    "albert",
    "electra",
    "xlm",
    "gpt2",
    "gpt_neo",
    "gpt_neox",
    "gptj",
    "opt",
    "llama",
    "t5",
    "bart",
    "pegasus",
    "mbart",
    "blenderbot",
    "vit",
    "clip",
    "whisper",
    "wav2vec2",
    "flax_model",
    "classifier",
    "qa_outputs",
    "lm_head",
    "score",
}
has_transformer_keys = any(key in found_keys for key in transformer_keys)

# Check if any top-level key contains transformer sub-keys (nested model structure)
if not has_transformer_keys:
    for key in found_keys:
        value = obj.get(key)
        if isinstance(value, dict):
            sub_keys = set(value.keys())
            if sub_keys & transformer_keys:
                has_transformer_keys = True
                break
        # Also check if the top-level key is a known model name.
        # Only check string keys — msgpack allows int/bytes keys which
        # are never valid model names and would cause crashes on .lower().
        if not isinstance(key, str):
            continue
        key_lower = key.lower()
        # Use exact match only to avoid false positives from substring
        # matching (e.g. "opt" matching "optional", "t5" matching "t500").
        if key_lower in model_name_keys:
            has_transformer_keys = True
            break

⚠️ Potential issue | 🟠 Major

Avoid promoting transformer classification from generic key names alone.

Using standalone top-level names like "score"/"classifier" as sufficient evidence is still too permissive and can misclassify benign msgpack payloads.

Suggested hardening
-                    if key_lower in model_name_keys:
+                    # Require corroborating structure before classifying as transformer.
+                    if key_lower in model_name_keys and (
+                        sub_keys & transformer_keys
+                        or any(isinstance(child, dict) for child in value.values())
+                    ):
                         has_transformer_keys = True
                         break

As per coding guidelines "Preserve or strengthen security detections; test both benign and malicious samples when modifying scanners".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/flax_msgpack_scanner.py` around lines 695 - 745, The
current heuristic sets has_transformer_keys true if a top-level key_lower is in
model_name_keys, which lets generic names like "score", "classifier",
"qa_outputs", "lm_head" falsely trigger; change the check so that ambiguous
generic names only count when they have nested transformer sub-keys (i.e., value
is a dict and sub_keys & transformer_keys is non-empty) or when another
more-specific model name is also present. Concretely, introduce an
AMBIGUOUS_KEYS set (e.g., {"score","classifier","qa_outputs","lm_head"}) and in
the loop over found_keys: if key_lower in AMBIGUOUS_KEYS, require the value to
be a dict and contain transformer_keys before setting has_transformer_keys; if
key_lower in model_name_keys minus AMBIGUOUS_KEYS, you may set
has_transformer_keys immediately; keep the existing exact-match and type checks
for keys and use obj.get(key)/sub_keys & transformer_keys as before.

Comment on lines +82 to +90
# Subsequent streams: buffer opcodes so that partial streams
# (e.g. binary tensor data misinterpreted as opcodes) don't
# produce false positives.
buffered: list[Any] = []
try:
    for item in pickletools.genops(file_obj):
        had_opcodes = True
        buffered.append(item)
except ValueError:

⚠️ Potential issue | 🟠 Major

Bound buffered opcode growth for subsequent streams.

buffered grows without limits for later streams. A large valid second stream can consume substantial memory before any caller-side opcode cap is enforced.

Suggested guardrail
+    _MAX_BUFFERED_OPCODES = 200_000
@@
             try:
                 for item in pickletools.genops(file_obj):
                     had_opcodes = True
                     buffered.append(item)
+                    if len(buffered) > _MAX_BUFFERED_OPCODES:
+                        stream_error = True
+                        logger.warning(
+                            "Aborting oversized secondary pickle stream during resync "
+                            "(>%d buffered opcodes)",
+                            _MAX_BUFFERED_OPCODES,
+                        )
+                        break
             except ValueError:
                 # Any ValueError on a subsequent stream means we hit
                 # non-pickle data or a junk separator byte.
                 stream_error = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/pickle_scanner.py` around lines 82 - 90, The buffered
list used inside the subsequent-streams loop (variable buffered in the loop
iterating pickletools.genops(file_obj)) can grow unbounded; introduce a fixed
cap (e.g. MAX_BUFFERED_OPCODES) and enforce it (either by using
collections.deque(maxlen=MAX_BUFFERED_OPCODES) or by popping oldest entries when
len(buffered) >= MAX_BUFFERED_OPCODES) so buffered cannot consume unlimited
memory for large valid streams; update the code that appends to buffered in the
for item in pickletools.genops(file_obj) loop (and keep had_opcodes logic
unchanged) to respect this cap.

Comment on lines 151 to +159
         assert isinstance(result.success, bool)
-        # May not scan bytes if compression format isn't recognized as pickle
+        # The scanner may:
+        # 1. Successfully scan some bytes (multi-stream parsing may consume compressed data)
+        # 2. Report issues when the compressed data isn't valid pickle
+        # Either outcome is acceptable — what matters is no unhandled crash.
         if result.bytes_scanned == 0:
-            # Should have reported format issues
-            assert len(result.issues) > 0
-            format_issues = [i for i in result.issues if "opcode" in str(i.message).lower()]
-            assert len(format_issues) > 0, "Should report format/opcode issues for compressed files"
+            # When no bytes were scanned, the scanner should have reported
+            # some kind of issue explaining why.
+            assert len(result.issues) > 0, "Should report issues for compressed files"

⚠️ Potential issue | 🟡 Minor

Strengthen this assertion to preserve regression value.

assert isinstance(result.success, bool) is effectively non-informative here and can let scanner regressions slip through undetected. Assert a concrete behavior outcome instead of type shape.

Proposed test tightening
-        assert isinstance(result.success, bool)
+        # Scanner must either parse bytes or explain why it could not.
+        assert result.bytes_scanned > 0 or len(result.issues) > 0
         # The scanner may:
         # 1. Successfully scan some bytes (multi-stream parsing may consume compressed data)
         # 2. Report issues when the compressed data isn't valid pickle
         # Either outcome is acceptable — what matters is no unhandled crash.
         if result.bytes_scanned == 0:
             # When no bytes were scanned, the scanner should have reported
             # some kind of issue explaining why.
             assert len(result.issues) > 0, "Should report issues for compressed files"

As per coding guidelines "Add comprehensive tests including edge cases and regression coverage for scanner changes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_real_world_dill_joblib.py` around lines 151 - 159, Replace the
weak type check on result.success with a concrete assertion covering expected
outcomes: assert that either the scan succeeded (result.success is True) or, if
no bytes were scanned, the scanner reported issues (result.bytes_scanned == 0
implies len(result.issues) > 0); you can implement this as a single boolean
assertion combining these conditions (reference symbols: result.success,
result.bytes_scanned, result.issues).
