fix: resolve ONNX weight extraction failure by yash2998chhabria · Pull Request #589 · promptfoo/modelaudit

yash2998chhabria · 2026-02-25T20:06:45Z

Summary

Root cause: The scanner dispatch system only runs one scanner per file. OnnxScanner (priority 6) always won over WeightDistributionScanner (priority 13) for .onnx files, so weight distribution analysis never ran. Additionally, _extract_onnx_weights called onnx.load() without load_external_data=False, causing ValidationError on models with missing external data files (common with HuggingFace downloads).
Fix: Add _check_weight_distribution() to OnnxScanner that extracts inline weights from the already-loaded model and delegates statistical analysis to WeightDistributionScanner._analyze_weight_distributions(). Also fix the standalone extraction method to use load_external_data=False and skip external-data tensors.
Safety: Added max_array_size guard (default 100MB) to both extraction paths, onnx import check to can_handle(), and proper handling for external-data-only models.

Test plan

Verified "Failed to extract weights" is gone on Silero VAD (kennysuper007/vispo-models)
Verified weight extraction works on Xenova/all-MiniLM-L6-v2 (model.onnx: 40 layers analyzed)
Regression: model_quantized.onnx (40 layers, 25 anomalies detected)
Regression: model.int8.onnx (352 layers, 0 anomalies)
All 380 existing tests pass (pytest tests/ -x -q)
CI pipeline

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Weight-distribution analysis for ONNX models (runs when ONNX support is available).
Improvements
- Expanded detection coverage for many additional risky modules and patterns.
- Pickle scanning: multi-stream parsing, improved resynchronization, extended opcode handling, and stronger blocklisting.
- More accurate scanned-bytes reporting when scans exit early.
- More robust ONNX weight extraction with external-data and size safeguards.
Tests
- New regression tests validating pickle hardening and bypass scenarios.

Expand ALWAYS_DANGEROUS_MODULES with ~45 new modules not covered by PR #518: network (smtplib, imaplib, poplib, nntplib, xmlrpc, socketserver, ssl, requests, aiohttp), compilation (codeop, marshal, types, compileall, py_compile), FFI (_ctypes), debugging (bdb, trace), operator bypasses (_operator, functools), pickle recursion (pickle, _pickle, dill, cloudpickle, joblib), filesystem (tempfile, filecmp, fileinput, glob, distutils, pydoc, pexpect), venv/pip (venv, ensurepip, pip), threading (signal, _signal, threading), and more (webbrowser, asyncio, mmap, select, selectors, logging, syslog, tarfile, zipfile, shelve, sqlite3, _sqlite3, doctest, idlelib, lib2to3, uuid). Add uuid as novel detection (VULN-6: _get_command_stdout calls subprocess.Popen) and pkgutil.resolve_name as dynamic resolution trampoline. Mirror new modules in SUSPICIOUS_GLOBALS for defense-in-depth. Fix multi-stream opcode analysis so detailed checks run across all pickle streams. Add NEWOBJ_EX to all opcode check lists. Includes 10 regression tests covering pkgutil trampoline, uuid RCE, multi-stream exploit, NEWOBJ_EX, and spot-checks for smtplib, sqlite3, tarfile, marshal, cloudpickle, and webbrowser. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e scanning - Fix test regression: track first pickle STOP position and use it for binary content scanning instead of f.tell() (which advances past multi-stream data) - Fix multi-stream resync bypass: skip junk separator bytes between streams (up to 256 bytes) instead of returning immediately - Gracefully handle ValueErrors on subsequent streams (non-pickle data after valid pickle) instead of propagating them as file-level errors - Add type hints: ScanResult return type on _scan_bytes, -> None on all test methods - Add separator-byte resync regression test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Buffer opcodes for subsequent pickle streams and discard partial streams (binary tensor data misinterpreted as opcodes) to prevent MANY_DANGEROUS_OPCODES false positives - Add resync logic to skip up to 256 non-pickle separator bytes between streams, preventing single-byte bypass of multi-stream detection - Track pickle memo (BINPUT/BINGET) for safe ML globals so REDUCE calls referencing safe callables via memo are not counted as dangerous opcodes - Reset dangerous opcode counters at STOP boundaries so each stream is evaluated independently - Use first_pickle_end_pos for binary content scanning instead of f.tell() which advances past tensor data during multi-stream parsing Validated against 5 HuggingFace models with A/B comparison to main branch — no new false positives introduced. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Scanners that return early (missing optional dependency, parse error, etc.) left bytes_scanned at its default value of 0, causing the scan summary to always display "Size: 0 bytes". This affected ONNX, TFLite, Flax, XGBoost, and several other format scanners. Add a fallback in _scan_file_internal that sets bytes_scanned to the actual file size (already computed via os.path.getsize) whenever a scanner leaves it at zero. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The weight distribution scanner never ran on ONNX files because the scanner dispatch system only executes one scanner per file, and the OnnxScanner (priority 6) always won over WeightDistributionScanner (priority 13). Additionally, the standalone _extract_onnx_weights method called onnx.load() without load_external_data=False, causing ValidationError on models with missing external data files. Changes: - Add _check_weight_distribution to OnnxScanner that extracts inline weights from the already-loaded model and delegates statistical analysis to WeightDistributionScanner._analyze_weight_distributions - Fix _extract_onnx_weights to use load_external_data=False and skip external-data tensors whose raw bytes are unavailable - Add onnx import check to WeightDistributionScanner.can_handle() - Add max_array_size guard to both extraction paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-25T20:07:04Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Expanded detector blocklists; added ONNX weight-distribution analysis into OnnxScanner; reworked PickleScanner for multi-stream resynchronization, NEWOBJ_EX handling, and extended allow/block lists; added regression tests for pickle hardening and a bytes_scanned fallback in core scan reporting.

Changes

Cohort / File(s)	Summary
Detection Pattern Expansion `modelaudit/detectors/suspicious_symbols.py`	Added ~65 entries to `SUSPICIOUS_GLOBALS` (mix of `"*"` and specific callables) covering dynamic/import trampolines, network/exfiltration, code-execution/compilation, FFI/native hooks, profiling/debugging, filesystem/runtime, threading/process, DB/archive, logging/config, and miscellaneous modules.
ONNX Weight Distribution Analysis `modelaudit/scanners/onnx_scanner.py`, `modelaudit/scanners/weight_distribution_scanner.py`	Added `OnnxScanner._check_weight_distribution` and integrated it into scan flow; lazy-imports `onnx`, filters 2D+ initializers, skips external-data tensors, enforces `max_array_size`, builds `weights_info`, and delegates analysis to `WeightDistributionScanner`; ONNX handling gated on import availability and failures are silently skipped.
Pickle Parsing & Safety Enhancements `modelaudit/scanners/pickle_scanner.py`	Replaced opcode generator with `_genops_with_fallback(file_obj: BinaryIO, *, multi_stream: bool = False)` supporting multi-stream resynchronization, buffered yielding, STOP/separator handling, and tracking `first_pickle_end_pos`; added `ALWAYS_DANGEROUS_MODULES` and `ML_SAFE_GLOBALS`; included `NEWOBJ_EX` in safety and CVE-pattern checks; updated downstream binary-content boundary logic.
Pickle Scanner Tests `tests/scanners/test_pickle_scanner.py`	Added `TestPickleScannerBlocklistHardening` with helpers to craft GLOBAL+REDUCE payloads and scan bytes; new tests assert CRITICAL detections for numerous dangerous modules, multi-stream/resync, and NEWOBJ_EX scenarios; tests now import `ScanResult`.
Core scan metadata fallback `modelaudit/core.py`	Added fallback in `_scan_file_internal` to set `bytes_scanned` to file size when a scanner exits early with zero bytes scanned to preserve accurate scan metadata.

Sequence Diagram(s)

sequenceDiagram
    participant Scanner as OnnxScanner
    participant ONNX as "onnx (lib)"
    participant WDS as WeightDistributionScanner
    participant Result as ScanResult

    Scanner->>Scanner: scan() existing checks
    Scanner->>Scanner: _check_weight_distribution(model, path, result)
    Scanner->>ONNX: lazy import onnx
    ONNX-->>Scanner: import success / ImportError
    alt import success
        Scanner->>Scanner: iterate initializers, filter 2D+, skip EXTERNAL, enforce max_array_size
        Scanner->>Scanner: build weights_info map
        alt weights_info non-empty
            Scanner->>WDS: analyze(weights_info)
            activate WDS
            WDS->>WDS: detect anomalies
            WDS-->>Scanner: anomalies list
            deactivate WDS
            Scanner->>Result: emit anomaly checks and update metadata
        else no suitable tensors
            Scanner->>Scanner: exit silently
        end
    else import fails
        Scanner->>Scanner: exit silently
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through code both quick and sly,
Blocklists grew long beneath my eye,
Pickles now pause where streams collide,
ONNX weights get scanned with pride,
A rabbit guards the model trail nearby.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: resolve ONNX weight extraction failure' directly addresses the core problem described in the PR objectives: fixing an ONNX weight extraction issue caused by ValidationError on models with external-data tensors. It is concise, specific, and captures the primary fix.
Docstring Coverage	✅ Passed	Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/onnx-weights

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/onnx_scanner.py`:
- Around line 441-443: The except blocks that currently call logger.debug(...)
are swallowing failures in the ONNX weight extraction/distribution analysis and
must surface degraded coverage to the scan result; replace those silent
debug-only handlers (the logger.debug calls in the weight
extraction/distribution analysis code paths) with logic that (1) logs the full
exception at warning/error level and (2) updates the ScanResult (e.g., set a
coverage_degraded flag or call scan_result.add_finding()/add_warning with a
clear message and exception details) so downstream consumers know the weight
analysis failed; apply this change to both locations where logger.debug is used
for these exceptions.
- Line 177: Add regression tests in test_onnx_scanner.py that exercise the ONNX
scanner's _check_weight_distribution path: create small ONNX models or mocked
TensorProto objects and call the ONNX scanning entry (or directly invoke
_check_weight_distribution via a scanner instance) to verify external-data
tensors are skipped (simulate TensorProto with external_data set), verify
max_array_size size guard blocks large tensors (set a small max_array_size in
scanner config and supply a large tensor), and verify detection behavior for
benign vs malicious-style weight distributions (supply normal random weights and
crafted anomalous patterns and assert expected result/anomaly flags). Reference
the _check_weight_distribution method and the scanner instance used by
test_onnx_scanner.py to locate where to add these focused cases.
- Around line 434-438: The code currently calls
onnx.numpy_helper.to_array(initializer) (in onnx_scanner.py within the loop
handling initializers) before checking max_array_size, which can allocate huge
arrays; change the logic to estimate the byte size first by converting
initializer.dims and the element dtype via
tensor_dtype_to_np_dtype(initializer.data_type) (or equivalent) to compute
estimated_bytes = np.dtype(np_dtype).itemsize * product(initializer.dims) and
only call onnx.numpy_helper.to_array(initializer) if estimated_bytes <=
max_array_size (and max_array_size > 0); apply the same pre-check pattern in
weight_distribution_scanner.py where to_array is used so oversized tensors are
skipped before materialization.

In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 1549-1591: The _safe_memo map is never cleared at STOP and
individual memo slots can carry stale True values; modify the STOP handling in
the loop (the branch where opcode.name == "STOP") to clear or reinitialize
_safe_memo (e.g., _safe_memo.clear()) so memo safety can't be carried across
STOP boundaries, and in the BINPUT/LONG_BINPUT handling (the block that looks up
prev GLOBAL/STACK_GLOBAL and writes _safe_memo[arg]) set a default safe state
first (e.g., _safe_memo[arg] = False) before reclassifying based on
_is_safe_ml_global, and apply the same default-reset fix to the analogous block
later in the file (the second BINPUT/LONG_BINPUT handling around the later
lines) so no memo slot retains stale True values.

In `@modelaudit/scanners/weight_distribution_scanner.py`:
- Around line 433-437: The current size guard checks array.nbytes after calling
onnx.numpy_helper.to_array() in the ONNX extractor; change this to estimate size
before materialization by converting initializer.data_type with
onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type), computing
element_count from initializer.dims, and comparing element_count *
np_dtype.itemsize against self.max_array_size before calling
onnx.numpy_helper.to_array(); apply the same pre-conversion-size check logic to
the other extractors (_extract_tensorflow_checkpoint_weights and
_extract_tensorflow_pb_weights) so behavior is consistent (or add a clear
comment documenting an intentional post-conversion check if you choose not to
change all three).

In `@tests/scanners/test_pickle_scanner.py`:
- Around line 647-695: Each test should assert that the scan itself succeeded in
addition to detecting issues: in test_smtplib_blocked, test_sqlite3_blocked,
test_tarfile_blocked, test_marshal_blocked, test_cloudpickle_blocked, and
test_webbrowser_blocked, add an assertion after creating result (result =
self._scan_bytes(...)) such as "assert result.success" (or "assert
result.success is True") before checking result.has_errors and result.issues so
scan failures can't be hidden by later issue assertions.

ℹ️ Review info

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e736ebb and 84da75c.

📒 Files selected for processing (5)

modelaudit/detectors/suspicious_symbols.py
modelaudit/scanners/onnx_scanner.py
modelaudit/scanners/pickle_scanner.py
modelaudit/scanners/weight_distribution_scanner.py
tests/scanners/test_pickle_scanner.py

coderabbitai · 2026-02-25T20:15:20Z

        self._check_custom_ops(model, path, result)
        self._check_external_data(model, path, result)
        self._check_tensor_sizes(model, path, result)
+        self._check_weight_distribution(model, path, result)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check for ONNX scanner regression tests tied to new weight-distribution behavior echo "=== Finding ONNX test files ===" fd -t f "test*onnx*.py" tests 2>/dev/null || echo "No exact match, searching more broadly..." fd -t f "*.py" tests -E __pycache__ | grep -i onnx echo -e "\n=== Searching for weight-distribution and external-data test coverage ===" rg -n "load_external_data|TensorProto\.EXTERNAL|max_array_size|weight.distribution|_check_weight_distribution" tests -S 2>/dev/null echo -e "\n=== Getting context from onnx_scanner.py around line 177 ===" wc -l modelaudit/scanners/onnx_scanner.py

Repository: promptfoo/modelaudit

Length of output: 2110

🏁 Script executed:

#!/bin/bash # Get specific line context from onnx_scanner.py echo "=== Lines around 177 in onnx_scanner.py ===" sed -n '170,185p' modelaudit/scanners/onnx_scanner.py echo -e "\n=== Lines 400-467 in onnx_scanner.py ===" sed -n '400,467p' modelaudit/scanners/onnx_scanner.py

Repository: promptfoo/modelaudit

Length of output: 3693

🏁 Script executed:

#!/bin/bash # Check for test files that test onnx_scanner echo "=== Searching for test files that import or test onnx_scanner ===" rg -n "onnx_scanner|OnnxScanner" tests -S -t py echo -e "\n=== Check test directory structure ===" fd -t f "*.py" tests | head -20

Repository: promptfoo/modelaudit

Length of output: 3044

🏁 Script executed:

#!/bin/bash # Read the ONNX scanner test file to check weight-distribution coverage echo "=== Content of test_onnx_scanner.py ===" cat -n tests/scanners/test_onnx_scanner.py

Repository: promptfoo/modelaudit

Length of output: 4854

🏁 Script executed:

#!/bin/bash # Check weight distribution scanner tests to understand test patterns echo "=== Weight distribution scanner test file (first 150 lines) ===" head -150 tests/scanners/test_weight_distribution_scanner.py echo -e "\n=== Check if ONNX weight-distribution tests exist ===" rg -n "_check_weight_distribution|weight.distribution.*onnx|test.*weight.*onnx" tests/scanners/test_onnx_scanner.py

Repository: promptfoo/modelaudit

Length of output: 4658

🏁 Script executed:

#!/bin/bash # Search for tests with benign and malicious samples related to weights/ONNX echo "=== Searching for benign/malicious sample tests ===" rg -n "benign|malicious|anomal" tests/scanners/test_onnx_scanner.py -A 3

Repository: promptfoo/modelaudit

Length of output: 115

🏁 Script executed:

#!/bin/bash # Double-check: Search for any weight distribution tests specifically in ONNX context echo "=== Search for weight-distribution tests anywhere in test_onnx_scanner.py ===" grep -n "weight" tests/scanners/test_onnx_scanner.py || echo "No 'weight' references found in test_onnx_scanner.py" echo -e "\n=== Full count of test functions in test_onnx_scanner.py ===" grep -c "^def test_" tests/scanners/test_onnx_scanner.py echo -e "\n=== Check test_safetensors_optimization.py for weight_distribution context ===" grep -B 2 -A 5 "test_weight_distribution_scanner_skips_safetensors" tests/test_safetensors_optimization.py

Repository: promptfoo/modelaudit

Length of output: 703

Add regression tests for ONNX weight-distribution analysis, especially external-data tensor handling and malicious weight patterns.

The _check_weight_distribution() method (lines 400–467) integrates weight distribution scanning into ONNX processing with external-data tensor skipping and size guards, but test_onnx_scanner.py has no tests covering this path. Add focused tests for:

External-data tensors (skipped safely during analysis)

Size guard enforcement (max_array_size limiting)

Benign and malicious-style weight distribution samples to ensure anomaly detection works

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/onnx_scanner.py` at line 177, Add regression tests in test_onnx_scanner.py that exercise the ONNX scanner's _check_weight_distribution path: create small ONNX models or mocked TensorProto objects and call the ONNX scanning entry (or directly invoke _check_weight_distribution via a scanner instance) to verify external-data tensors are skipped (simulate TensorProto with external_data set), verify max_array_size size guard blocks large tensors (set a small max_array_size in scanner config and supply a large tensor), and verify detection behavior for benign vs malicious-style weight distributions (supply normal random weights and crafted anomalous patterns and assert expected result/anomaly flags). Reference the _check_weight_distribution method and the scanner instance used by test_onnx_scanner.py to locate where to add these focused cases.

- Upgrade logger.debug() to logger.warning() in weight extraction/analysis except blocks so scan output signals reduced coverage on failure - Pre-check estimated array size before calling to_array() to avoid materializing oversized tensors into memory - Assert result.success in blocked-module pickle tests before checking result.has_errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (2)

modelaudit/scanners/onnx_scanner.py (2)

177-177: ⚠️ Potential issue | 🟠 Major

Add ONNX regression tests for the newly integrated weight-distribution path.

Line 177 introduces scanner behavior that should be regression-tested (external-data tensor skipping, max_array_size guard behavior, and anomaly propagation into ScanResult metadata/checks). This path is security-relevant and currently not covered in the provided test changes.

As per coding guidelines: "tests/**/*.py: Add comprehensive tests including edge cases and regression coverage for scanner changes".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/onnx_scanner.py` at line 177, Add regression tests for
the newly exercised weight-distribution scan path by creating tests that call
the ONNX scanner's _check_weight_distribution via the public scan entry (e.g.,
using the scanner class that invokes _check_weight_distribution) and assert: (1)
external-data tensors are skipped (construct an ONNX ModelProto with
external-data tensors and verify no weight reads occur), (2) max_array_size
guard triggers appropriately (create a large tensor > max_array_size and assert
the scanner avoids loading contents and records the guard), and (3) anomalies
detected by _check_weight_distribution are propagated into the ScanResult
metadata and checks (inspect the returned ScanResult object for the expected
anomaly entries and check flags). Use existing test utilities to build small
ONNX models and ensure tests cover both normal and edge cases to provide
regression coverage.

422-449: ⚠️ Potential issue | 🟠 Major

Harden weight extraction robustness to prevent scan interruption and numeric overflow.

The outer try block (lines 422-449) aborts extraction for all remaining initializers if any single tensor fails, reducing security coverage. Additionally, numpy.prod(initializer.dims) can overflow with large dimensions—for example, np.prod([2^32, 2^32]) returns 0 instead of 2^64, causing size estimates to wrap and bypass the max_array_size guard.

Both issues should be fixed by moving exception handling inside the loop (allowing continued processing) and switching to math.prod for arbitrary-precision arithmetic. The new weight distribution path is also untested and should include regression tests with representative ONNX models.

🔧 Proposed fix

+        from math import prod
+
         weights_info: dict[str, Any] = {}
-        try:
-            for initializer in model.graph.initializer:
-                self.check_interrupted()
+        raw_max_array_size = self.config.get("max_array_size", 100 * 1024 * 1024)
+        try:
+            max_array_size = int(raw_max_array_size)
+        except (TypeError, ValueError):
+            max_array_size = 100 * 1024 * 1024
+
+        for initializer in model.graph.initializer:
+            try:
+                self.check_interrupted()
 
                 # Skip external-data tensors — their bytes were not loaded.
                 if initializer.data_location == onnx.TensorProto.EXTERNAL:
                     continue
 
                 # Only 2-D+ tensors are interesting for distribution analysis.
                 if len(initializer.dims) < 2:
                     continue
 
                 # Pre-check estimated size before materializing the array.
                 import numpy as np
 
-                estimated_bytes = (
-                    np.prod(initializer.dims)
-                    * np.dtype(onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type)).itemsize
-                )
+                numel = prod(int(dim) for dim in initializer.dims)
+                itemsize = int(np.dtype(onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type)).itemsize)
+                estimated_bytes = numel * itemsize
                 if max_array_size and max_array_size > 0 and estimated_bytes > max_array_size:
                     continue
 
                 array = onnx.numpy_helper.to_array(initializer)
 
                 weights_info[initializer.name] = array
-        except Exception as e:
-            logger.warning(f"Failed to extract ONNX weights for distribution analysis: {e}")
+            except Exception as e:
+                logger.warning(
+                    "Failed to extract ONNX initializer '%s' for distribution analysis: %s",
+                    initializer.name,
+                    e,
+                )
+                continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/onnx_scanner.py` around lines 422 - 449, The current
outer try/except around iterating model.graph.initializer aborts processing of
all initializers on any failure and uses np.prod which can overflow; change to
per-initializer error handling by moving the try/except inside the for
initializer in model.graph.initializer loop (catch and log exceptions
per-initializer while continuing), and replace np.prod(initializer.dims) with
math.prod(initializer.dims) to use arbitrary-precision multiplication (import
math where needed); keep the existing max_array_size guard logic but compute
estimated_bytes using math.prod and the dtype itemsize, and add regression tests
exercising the weights extraction/distribution path with representative ONNX
models (including large-dimension cases and an initializer that intentionally
raises) to ensure robustness.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@modelaudit/scanners/onnx_scanner.py`:
- Line 177: Add regression tests for the newly exercised weight-distribution
scan path by creating tests that call the ONNX scanner's
_check_weight_distribution via the public scan entry (e.g., using the scanner
class that invokes _check_weight_distribution) and assert: (1) external-data
tensors are skipped (construct an ONNX ModelProto with external-data tensors and
verify no weight reads occur), (2) max_array_size guard triggers appropriately
(create a large tensor > max_array_size and assert the scanner avoids loading
contents and records the guard), and (3) anomalies detected by
_check_weight_distribution are propagated into the ScanResult metadata and
checks (inspect the returned ScanResult object for the expected anomaly entries
and check flags). Use existing test utilities to build small ONNX models and
ensure tests cover both normal and edge cases to provide regression coverage.
- Around line 422-449: The current outer try/except around iterating
model.graph.initializer aborts processing of all initializers on any failure and
uses np.prod which can overflow; change to per-initializer error handling by
moving the try/except inside the for initializer in model.graph.initializer loop
(catch and log exceptions per-initializer while continuing), and replace
np.prod(initializer.dims) with math.prod(initializer.dims) to use
arbitrary-precision multiplication (import math where needed); keep the existing
max_array_size guard logic but compute estimated_bytes using math.prod and the
dtype itemsize, and add regression tests exercising the weights
extraction/distribution path with representative ONNX models (including
large-dimension cases and an initializer that intentionally raises) to ensure
robustness.

ℹ️ Review info

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 84da75c and 137268f.

📒 Files selected for processing (2)

modelaudit/scanners/onnx_scanner.py
tests/scanners/test_pickle_scanner.py

Resolve conflicts in pickle_scanner.py: - ALWAYS_DANGEROUS_GLOBALS: keep both branch additions (pkgutil, uuid) and main additions (cProfile, ctypes, etc.) - ALWAYS_DANGEROUS_MODULES: merge branch's expanded blocklist with main's PR #518 modules and linecache/logging.config notes - REDUCE opcode handling: use main's resolved_callables approach (handles memo indirection via symbolic stack simulation), replacing manual lookback loop

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

modelaudit/scanners/pickle_scanner.py (1)

2132-2138: ⚠️ Potential issue | 🟡 Minor

Use resolved_callables for NEWOBJ/NEWOBJ_EX execution-pattern detection.

At Line 2132, this branch requires arg to be str, so it doesn’t leverage the callable targets already resolved for OBJ/NEWOBJ/NEWOBJ_EX. That leaves this detection path weaker than intended.

🔧 Proposed fix

-        # Check for INST or OBJ opcodes which can also be used for code execution
-        if opcode.name in ["INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"] and isinstance(arg, str):
-            return {
-                "pattern": f"{opcode.name}_EXECUTION",
-                "argument": arg,
-                "position": pos,
-                "opcode": opcode.name,
-            }
+        # Check for INST/OBJ/NEWOBJ/NEWOBJ_EX opcodes which can also be used for code execution
+        if opcode.name in {"INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}:
+            resolved = resolved_callables.get(i)
+            if resolved:
+                mod, func = resolved
+                if _is_dangerous_ref(mod, func):
+                    return {
+                        "pattern": f"{opcode.name}_EXECUTION",
+                        "module": mod,
+                        "function": func,
+                        "position": pos,
+                        "opcode": opcode.name,
+                    }
+            elif opcode.name == "INST" and isinstance(arg, str):
+                return {
+                    "pattern": f"{opcode.name}_EXECUTION",
+                    "argument": arg,
+                    "position": pos,
+                    "opcode": opcode.name,
+                }

Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when modifying scanners".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/pickle_scanner.py` around lines 2132 - 2138, The branch
that returns an execution-pattern for opcode names
["INST","OBJ","NEWOBJ","NEWOBJ_EX"] should not require arg to be a str and must
consult the already-resolved callable targets (resolved_callables) for
NEWOBJ/NEWOBJ_EX; update the check in the block around opcode.name handling so
that for NEWOBJ and NEWOBJ_EX you inspect resolved_callables (matching entries
for the current pos/arg/stack position) and use those callable names/targets
when building the returned dict (pattern, argument, position, opcode) instead of
only using arg when it's a str; ensure you still handle OBJ/INST as before and
fall back safely when no resolved callable is available.

♻️ Duplicate comments (1)

modelaudit/scanners/weight_distribution_scanner.py (1)

491-495: ⚠️ Potential issue | 🟠 Major

Move the ONNX size guard before tensor materialization.

At Line 491, onnx.numpy_helper.to_array() materializes the tensor before the size check at Line 494, so oversized tensors can still trigger memory spikes and reduce scan coverage.

🔧 Proposed fix

             for initializer in model.graph.initializer:
                 # Skip external-data tensors — their raw bytes are not available
                 # when loaded with load_external_data=False and calling to_array
                 # on them would raise a ValidationError.
                 if initializer.data_location == onnx.TensorProto.EXTERNAL:
                     continue
                 if len(initializer.dims) >= 2:
+                    if self.max_array_size and self.max_array_size > 0:
+                        # Pre-check approximate size before materializing tensor
+                        elem_count = 1
+                        for dim in initializer.dims:
+                            elem_count *= int(dim)
+                        np_dtype = onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type)
+                        estimated_nbytes = elem_count * np_dtype.itemsize
+                        if estimated_nbytes > self.max_array_size:
+                            continue
                     array = onnx.numpy_helper.to_array(  # type: ignore[possibly-unresolved-reference]
                         initializer,
                     )
                     if self.max_array_size and self.max_array_size > 0 and array.nbytes > self.max_array_size:
                         continue
                     weights_info[initializer.name] = array

#!/bin/bash
# Verify guard ordering around ONNX tensor materialization and size checks.
rg -n -C3 "numpy_helper\.to_array|max_array_size|initializer\.dims" modelaudit/scanners/weight_distribution_scanner.py

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelaudit/scanners/weight_distribution_scanner.py` around lines 491 - 495,
The size guard is applied after materializing the ONNX tensor via
onnx.numpy_helper.to_array, causing large tensors to be loaded into memory;
instead, compute the tensor byte size from the initializer metadata and apply
the max_array_size check before calling numpy_helper.to_array. Update the code
around numpy_helper.to_array (referencing initializer, max_array_size, and
onnx.numpy_helper.to_array) to calculate nbytes from initializer.dims and its
data type / raw_data length (or use initializer.raw_data length when present)
and skip materialization (continue) if the computed size exceeds
self.max_array_size, only calling to_array when the size check passes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/detectors/suspicious_symbols.py`:
- Around line 173-237: Add explicit regression tests that assert the new
suspicious module patterns are detected: create unit tests for
cloudpickle/joblib recursive pickle loading, for functools.partial and
functools.reduce callable-chain cases, for threading/_thread/_signal
thread-spawning patterns, for tarfile/zipfile path-traversal (zip-slip) cases,
and for logging.config.fileConfig execution; also add tests covering
cProfile/profile/pdb/timeit.timeit and dynamic-loading/compilation patterns
(pkgutil.resolve_name, marshal.loads, py_compile, compileall). Place these cases
into the existing suspicious-symbols detector test module and include malicious
pickle/asset payloads to exercise detection paths so each named symbol
(cloudpickle, joblib, functools.partial/reduce, threading, _thread, _signal,
tarfile, zipfile, logging.config.fileConfig, cProfile, profile, pdb,
timeit.timeit, pkgutil.resolve_name, marshal.loads, py_compile, compileall) is
explicitly asserted as flagged by the scanner.

In `@tests/scanners/test_pickle_scanner.py`:
- Around line 723-740: Add a companion test to ensure NEWOBJ_EX doesn't
false-positive: create a new test (e.g., test_newobj_ex_benign_class) modeled on
test_newobj_ex_dangerous_class that crafts a pickle for a benign global class
(for example use GLOBAL 'builtins\nlist\n' or another harmless class) with
EMPTY_TUPLE, EMPTY_DICT, NEWOBJ_EX and STOP; call self._scan_bytes(data) and
assert result.success and that there are no CRITICAL issues referencing "os" (or
other malicious markers), ensuring the scanner accepts this benign NEWOBJ_EX
path to prevent overblocking regressions.

---

Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 2132-2138: The branch that returns an execution-pattern for opcode
names ["INST","OBJ","NEWOBJ","NEWOBJ_EX"] should not require arg to be a str and
must consult the already-resolved callable targets (resolved_callables) for
NEWOBJ/NEWOBJ_EX; update the check in the block around opcode.name handling so
that for NEWOBJ and NEWOBJ_EX you inspect resolved_callables (matching entries
for the current pos/arg/stack position) and use those callable names/targets
when building the returned dict (pattern, argument, position, opcode) instead of
only using arg when it's a str; ensure you still handle OBJ/INST as before and
fall back safely when no resolved callable is available.

---

Duplicate comments:
In `@modelaudit/scanners/weight_distribution_scanner.py`:
- Around line 491-495: The size guard is applied after materializing the ONNX
tensor via onnx.numpy_helper.to_array, causing large tensors to be loaded into
memory; instead, compute the tensor byte size from the initializer metadata and
apply the max_array_size check before calling numpy_helper.to_array. Update the
code around numpy_helper.to_array (referencing initializer, max_array_size, and
onnx.numpy_helper.to_array) to calculate nbytes from initializer.dims and its
data type / raw_data length (or use initializer.raw_data length when present)
and skip materialization (continue) if the computed size exceeds
self.max_array_size, only calling to_array when the size check passes.

ℹ️ Review info

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between feccaef and ad59987.

📒 Files selected for processing (4)

modelaudit/detectors/suspicious_symbols.py
modelaudit/scanners/pickle_scanner.py
modelaudit/scanners/weight_distribution_scanner.py
tests/scanners/test_pickle_scanner.py

coderabbitai · 2026-02-26T01:33:05Z

+    def test_newobj_ex_dangerous_class(self) -> None:
+        """NEWOBJ_EX opcode with a dangerous class should be flagged."""
+        # Craft pickle: PROTO 4 | GLOBAL 'os _wrap_close' | EMPTY_TUPLE | EMPTY_DICT | NEWOBJ_EX | STOP
+        # Protocol 4 is needed for NEWOBJ_EX (opcode 0x92)
+        proto = b"\x80\x04"
+        global_op = b"c" + b"os\n_wrap_close\n"
+        empty_tuple = b")"
+        empty_dict = b"}"
+        newobj_ex = b"\x92"  # NEWOBJ_EX opcode
+        stop = b"."
+        data = proto + global_op + empty_tuple + empty_dict + newobj_ex + stop
+
+        result = self._scan_bytes(data)
+        assert result.success
+        assert result.has_errors
+        os_issues = [i for i in result.issues if i.severity == IssueSeverity.CRITICAL and "os" in i.message.lower()]
+        assert os_issues, f"Expected CRITICAL os issue for NEWOBJ_EX, got: {[i.message for i in result.issues]}"
+


🧹 Nitpick | 🔵 Trivial

Add a benign NEWOBJ_EX counterpart test to lock false-positive behavior.

This section validates the malicious NEWOBJ_EX path well; adding one safe NEWOBJ_EX case would prevent overblocking regressions as allowlists/blocklists evolve.

Based on learnings: "Add comprehensive tests including edge cases and regression coverage for scanner changes" and "Preserve or strengthen security detections; test both benign and malicious samples when modifying scanners".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/scanners/test_pickle_scanner.py` around lines 723 - 740, Add a companion test to ensure NEWOBJ_EX doesn't false-positive: create a new test (e.g., test_newobj_ex_benign_class) modeled on test_newobj_ex_dangerous_class that crafts a pickle for a benign global class (for example use GLOBAL 'builtins\nlist\n' or another harmless class) with EMPTY_TUPLE, EMPTY_DICT, NEWOBJ_EX and STOP; call self._scan_bytes(data) and assert result.success and that there are no CRITICAL issues referencing "os" (or other malicious markers), ensuring the scanner accepts this benign NEWOBJ_EX path to prevent overblocking regressions.

- pickle_scanner: Clear _safe_memo at STOP opcode boundaries to prevent stale True values from carrying across pickle streams and bypassing dangerous-global checks. Default memo slots to False before reclassifying in BINPUT/LONG_BINPUT handling. - pickle_scanner: Continue scanning after partial stream errors instead of terminating, so malicious payloads hidden after a partial stream are not missed. - onnx_scanner: Move try/except inside the initializer loop so a single failing tensor does not abort extraction for all remaining tensors. Replace np.prod with math.prod for arbitrary-precision arithmetic to prevent silent integer overflow on large dimension products. Surface weight distribution analysis failures in ScanResult. - weight_distribution_scanner: Pre-check estimated tensor byte size before calling onnx.numpy_helper.to_array() to avoid memory spikes from oversized arrays. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 52-55: The resync limit in _genops_with_fallback (using
_MAX_RESYNC_BYTES and resync_skipped) allows an attacker to hide a second pickle
stream after >256 padding bytes; change the logic so it does not unconditionally
stop searching after _MAX_RESYNC_BYTES but instead either (a) make
_MAX_RESYNC_BYTES configurable and significantly larger or (b) loop until EOF
while attempting to validate potential stream headers, only giving up on
definite EOF or a configurable hard limit, and ensure resync_skipped is
correctly reset when a candidate stream is found; update all uses in
_genops_with_fallback (and the similar block around lines 107-116) to perform a
robust scan-for-next-stream routine rather than aborting after 256 bytes.

In `@modelaudit/scanners/weight_distribution_scanner.py`:
- Around line 473-509: The try/except around the whole ONNX load-and-loop lets
one malformed initializer abort processing; move the exception handling to
per-initializer level inside the for initializer in model.graph.initializer loop
(around the size-estimate and onnx.numpy_helper.to_array call) so each
initializer is individually guarded, log a debug message including
initializer.name, path and the caught exception, and continue to the next
initializer; keep the outer model = onnx.load(...) call as-is but ensure
weights_info[initializer.name] is only set after a successful to_array
conversion.

ℹ️ Review info

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ad59987 and ae97a7a.

📒 Files selected for processing (3)

modelaudit/scanners/onnx_scanner.py
modelaudit/scanners/pickle_scanner.py
modelaudit/scanners/weight_distribution_scanner.py

coderabbitai · 2026-02-26T03:45:50Z

+    # Maximum number of consecutive non-pickle bytes to skip when resyncing
+    _MAX_RESYNC_BYTES = 256
+    resync_skipped = 0
+    # Track whether we've successfully parsed at least one complete stream


⚠️ Potential issue | 🟠 Major

Resync window is bypassable with >256 separator bytes.

_genops_with_fallback stops searching for the next pickle stream after 256 non-pickle bytes. A crafted file can place a malicious second stream after 257+ bytes of padding and evade multi-stream analysis.

🔧 Proposed fix

-def _genops_with_fallback(file_obj: BinaryIO, *, multi_stream: bool = False) -> Any: +def _genops_with_fallback( + file_obj: BinaryIO, + *, + multi_stream: bool = False, + max_resync_bytes: int = 64 * 1024, +) -> Any: @@ - _MAX_RESYNC_BYTES = 256 @@ - if resync_skipped >= _MAX_RESYNC_BYTES: + if resync_skipped >= max_resync_bytes: return

Also applies to: 107-116

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 52 - 55, The resync limit in _genops_with_fallback (using _MAX_RESYNC_BYTES and resync_skipped) allows an attacker to hide a second pickle stream after >256 padding bytes; change the logic so it does not unconditionally stop searching after _MAX_RESYNC_BYTES but instead either (a) make _MAX_RESYNC_BYTES configurable and significantly larger or (b) loop until EOF while attempting to validate potential stream headers, only giving up on definite EOF or a configurable hard limit, and ensure resync_skipped is correctly reset when a candidate stream is found; update all uses in _genops_with_fallback (and the similar block around lines 107-116) to perform a robust scan-for-next-stream routine rather than aborting after 256 bytes.

The test_real_joblib_compressed test was failing on Windows CI (Python 3.11) because compressed joblib files produce "Unable to parse pickle file: MemoryError" rather than an opcode-related message. Broaden the keyword match to accept opcode, parse, format, memory, or unable. Validated with 5 ONNX models (mobilenetv2-7, resnet50-v2-7, squeezenet1.0-12, shufflenet-7, efficientnet-lite4-11) -- all scan clean with no false positives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Move per-initializer try/except inside loop in weight_distribution_scanner so one bad initializer does not abort extraction for remaining layers - Move numpy import outside the per-initializer loop in onnx_scanner - Track and expose extraction_failures count in scan metadata - Increase pickle multi-stream resync window from 256 to 4096 bytes to prevent bypass via separator byte injection - Fix formatting in test_real_world_dill_joblib.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Missing external data files are common for HuggingFace-hosted ONNX models and are NOT a security issue. Reporting them as CRITICAL causes false positive security alerts. Also deduplicate per-file (not per-tensor) to avoid flooding scan output with hundreds of identical warnings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…er merge The merge of main into fix/onnx-weights incorrectly split the try block, placing extraction code after a return statement (unreachable) and creating a duplicate except clause. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yash2998chhabria and others added 5 commits February 25, 2026 10:46

coderabbitai Bot reviewed Feb 25, 2026

View reviewed changes

yash2998chhabria added 2 commits February 25, 2026 12:33

Merge branch 'fix/size-bug' into fix/onnx-weights

feccaef

coderabbitai Bot reviewed Feb 26, 2026

View reviewed changes

yash2998chhabria and others added 8 commits February 25, 2026 20:00

Merge branch 'main' into fix/onnx-weights

69656a2

Merge remote-tracking branch 'origin/main' into fix/onnx-weights

3c2943d

Merge remote-tracking branch 'origin/main' into fix/onnx-weights

9a78085

Merge remote-tracking branch 'origin/main' into fix/onnx-weights

2f1c477

yash2998chhabria merged commit 3f54602 into main Feb 27, 2026
25 checks passed

yash2998chhabria deleted the fix/onnx-weights branch February 27, 2026 21:05

github-actions Bot mentioned this pull request Feb 27, 2026

chore(main): release 0.2.27 #573

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve ONNX weight extraction failure#589

fix: resolve ONNX weight extraction failure#589
yash2998chhabria merged 17 commits intomainfrom
fix/onnx-weights

yash2998chhabria commented Feb 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Feb 25, 2026 •

edited

Loading

Reviews paused

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Feb 25, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Feb 26, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Feb 26, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yash2998chhabria commented Feb 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yash2998chhabria commented Feb 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Feb 25, 2026 •

edited

Loading