
fix(security): bound pickle metadata reads in metadata extraction #712

Merged

mldangelo-oai merged 8 commits into main from codex/fix-dos-vulnerability-in-pickle-extraction on Mar 20, 2026

Conversation

mldangelo (Member) commented Mar 16, 2026

Motivation

  • Prevent unbounded memory reads during metadata extraction of pickle files, which previously allowed a denial of service by calling f.read() on arbitrarily large .pkl files.
  • Ensure metadata path respects existing file-size safeguards without changing scanner behavior for normal files.

Description

  • Add a bounded-read guard in PickleScanner.extract_metadata() using a new config key, max_metadata_pickle_read_size (default 10 MiB), and raise a ValueError when the file or the read exceeds the limit.
  • Read at most max_metadata_pickle_read_size + 1 bytes and surface a clear extraction_error when the limit is exceeded, avoiding large allocations while preserving opcode analysis for small files.
  • Add a regression test test_pickle_metadata_enforces_read_limit to tests/test_metadata_extractor.py that verifies oversized pickle metadata extraction is rejected.
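
The bounded-read guard described above can be sketched as follows. The config key max_metadata_pickle_read_size and the 10 MiB default come from this PR; the function name, error messages, and protocol inference here are illustrative assumptions, not the merged code.

```python
# Illustrative sketch of the bounded-read guard; the real implementation lives
# in PickleScanner.extract_metadata() and may differ in structure.
import os

DEFAULT_MAX_METADATA_READ = 10 * 1024 * 1024  # 10 MiB default and hard ceiling


def extract_pickle_metadata(file_path, config):
    limit = int(config.get("max_metadata_pickle_read_size", DEFAULT_MAX_METADATA_READ))
    limit = min(limit, DEFAULT_MAX_METADATA_READ)  # clamp caller-supplied limits
    if limit <= 0:
        raise ValueError("max_metadata_pickle_read_size must be positive")

    # Reject oversized files up front instead of allocating their contents.
    if os.path.getsize(file_path) > limit:
        raise ValueError(f"pickle file exceeds metadata read limit of {limit} bytes")

    with open(file_path, "rb") as f:
        data = f.read(limit + 1)  # one extra byte catches files that grew after stat
    if len(data) > limit:
        raise ValueError(f"pickle read exceeded limit of {limit} bytes")

    # Protocol >= 2 pickles start with the PROTO opcode (0x80) plus a version byte.
    protocol = data[1] if data[:1] == b"\x80" else 0
    return {"pickle_size": len(data), "pickle_protocol": protocol}
```

The key property is that no code path ever passes an unbounded (or negative) size to f.read(), so memory use is capped regardless of the on-disk file size.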

Testing

  • Ran formatting and linting: uv run ruff format modelaudit/ tests/ (reformatted 2 files) and uv run ruff check --fix modelaudit/ tests/ (passed).
  • Type checking: uv run mypy modelaudit/ (passed with no issues).
  • Added a regression unit test and validated the behavior with a focused runtime check (a small Python snippet asserting that PickleScanner({'max_metadata_pickle_read_size': 64}).extract_metadata() returns an extraction_error for a 128-byte file), which passed.
  • The full test suite (uv run pytest -n auto -m "not slow and not integration" --maxfail=1) was attempted but hit unrelated, pre-existing failures in other tests; all change-specific checks and linters passed.

Codex Task

Summary by CodeRabbit

  • New Features

    • Added a configurable read limit for pickle metadata extraction (default 10 MB). Extraction now enforces a positive limit, fails closed on out‑of‑range values, and clamps any caller-supplied limit to the 10 MB ceiling.
  • Tests

    • Added parameterized tests validating enforcement for valid, zero, negative, and oversized limits, and updated an existing pickle-related test to reflect the new read-limit behavior.

coderabbitai (Contributor) bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 492fa2cd-2208-432a-a010-b6329f8a83c6

📥 Commits

Reviewing files that changed from the base of the PR and between 93c38ad and 7cb61a3.

📒 Files selected for processing (2)
  • modelaudit/scanners/pickle_scanner.py
  • tests/test_metadata_extractor.py

Walkthrough

Adds a configurable, validated read-size cap (max_metadata_pickle_read_size, default capped at 10 MiB) to pickle metadata extraction: stat the file, enforce the cap (>0), perform a bounded read (limit + 1) to detect over-limit content, compute metadata from the truncated bytes, and raise ValueError on violations.

Changes

  • Pickle Scanner — modelaudit/scanners/pickle_scanner.py: Add max_metadata_pickle_read_size config (clamped to 10 MiB). Validate > 0, compare the on-disk size to the cap, read up to cap + 1 bytes to detect overflow, set pickle_size/pickle_protocol from the (possibly truncated) bytes, and raise ValueError when limits are exceeded.
  • Tests — tests/test_metadata_extractor.py: Add parameterized test_pickle_metadata_enforces_read_limit (cases: 256, 64, 0, -1) asserting success for valid limits and specific extraction errors for invalid/non-positive limits. Add test_pickle_metadata_caps_configured_read_limit_at_10_mib to verify clamping to 10 MiB. Existing dangerous-opcode tests unchanged.

Sequence Diagram

sequenceDiagram
    participant Scanner as PickleScanner
    participant Config as Configuration
    participant FileIO as File I/O
    participant Validator as Validator
    participant Metadata as MetadataExtractor

    Scanner->>Config: read max_metadata_pickle_read_size
    Config-->>Scanner: return limit (clamped to 10 MiB)

    Scanner->>FileIO: stat(file) -> size
    FileIO-->>Scanner: return size

    Scanner->>Validator: ensure limit > 0
    Validator-->>Scanner: ok / raise ValueError

    Scanner->>Validator: compare file size <= limit
    alt file size > limit
        Validator-->>Scanner: raise ValueError
    else file size <= limit
        Scanner->>FileIO: read up to (limit + 1) bytes
        FileIO-->>Scanner: return pickle_data

        Scanner->>Validator: ensure len(pickle_data) <= limit
        alt read exceeded limit
            Validator-->>Scanner: raise ValueError
        else within limit
            Scanner->>Scanner: infer pickle_protocol from pickle_data
            Scanner->>Metadata: extract metadata from limited data
            Metadata-->>Scanner: return metadata
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble bytes with careful paws,
A cap in place to mind the laws.
Read one extra to catch the leak,
If limits break, I sound the beak.
Hop, scan, and stash the tidy freak.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — Passed (check skipped: CodeRabbit's high-level summary is enabled)
  • Title check — Passed (the title clearly and concisely summarizes the main security fix: bounding pickle metadata reads to prevent DoS during metadata extraction)
  • Docstring Coverage — Passed (coverage is 100.00%, which is sufficient; the required threshold is 80.00%)


coderabbitai (Contributor) bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 6129-6148: The code allows non-positive max_metadata_read_size
which causes f.read(-1) to read the entire file and bypass the guard; change the
logic in the pickle metadata read block (where max_metadata_read_size,
get_file_size, open(file_path) and pickle_data are used) to reject non-positive
values or coerce them to a safe default (e.g., 10*1024*1024) before reading;
specifically, validate max_metadata_read_size > 0 at the start of the try block
and if it is <= 0 raise a ValueError (or set it to the documented default) so
that f.read is always called with a bounded positive length and the subsequent
length checks remain effective.
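
The pitfall this review flags follows directly from Python's file semantics: read(size) with a negative size means "read to EOF", so a non-positive limit silently bypasses any byte cap. A minimal demonstration (io.BytesIO shares the same read() contract as an on-disk binary file):

```python
# Demonstrates why a non-positive read size defeats the guard:
# read(-1) reads the ENTIRE stream, read(0) reads nothing.
import io

f = io.BytesIO(b"x" * 1024)
print(len(f.read(-1)))  # 1024 — negative size means "read everything"

f.seek(0)
print(len(f.read(0)))   # 0 — a zero read makes later length checks pass vacuously
```

This is why the fix validates the limit before the read: only then is f.read() guaranteed to be called with a bounded positive length.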

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2bc940b8-4f9f-4bb4-9f56-2b5dbb99ec89

📥 Commits

Reviewing files that changed from the base of the PR and between d9fe283 and b639d2d.

📒 Files selected for processing (2)
  • modelaudit/scanners/pickle_scanner.py
  • tests/test_metadata_extractor.py

coderabbitai (Contributor) bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_metadata_extractor.py`:
- Around line 469-489: The parametrized test
test_pickle_metadata_enforces_read_limit currently only asserts failure cases;
add a valid success case (e.g., limit 256) to the ("limit", "expected_error")
matrix and update assertions so that when limit is the success value the test
asserts that PickleScanner({"max_metadata_pickle_read_size":
limit}).extract_metadata(...) does not contain "extraction_error" (i.e., assert
"extraction_error" not in metadata), using the same tmp_path pkl_file setup and
referencing PickleScanner and the max_metadata_pickle_read_size config key.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4e0768a7-e299-4842-8114-1b30f84ba28b

📥 Commits

Reviewing files that changed from the base of the PR and between b639d2d and d3ff557.

📒 Files selected for processing (2)
  • modelaudit/scanners/pickle_scanner.py
  • tests/test_metadata_extractor.py

coderabbitai (Contributor) bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 6885-6895: The code reads max_metadata_read_size from self.config
but only rejects non-positive values and allows arbitrarily large sizes; change
the logic around max_metadata_read_size in pickle_scanner.py (the variable
max_metadata_read_size, the config.get call) so that after parsing you enforce a
hard upper bound of 10 * 1024 * 1024 (10 MiB) — e.g., validate > 0 then apply
max_metadata_read_size = min(max_metadata_read_size, 10 * 1024 * 1024) (or
replace oversized values with the cap) to ensure metadata reads cannot exceed 10
MiB.
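
The clamp this review asks for is a one-line min(). A sketch using the naming from the comment (the merged code may structure this differently):

```python
# Sketch of the suggested validation: reject non-positive limits, then clamp
# any caller-supplied value to the documented 10 MiB ceiling.
HARD_CAP = 10 * 1024 * 1024  # 10 MiB


def bounded_read_limit(configured: int) -> int:
    if configured <= 0:
        raise ValueError("max_metadata_read_size must be positive")
    return min(configured, HARD_CAP)


print(bounded_read_limit(4096))    # 4096 — small limits pass through unchanged
print(bounded_read_limit(10**12))  # 10485760 — oversized limits clamp to 10 MiB
```

Clamping (rather than rejecting) oversized values keeps existing configurations working while still guaranteeing the metadata path can never allocate more than 10 MiB.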

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 38d7bd2d-53f6-42de-9347-8bad4c7dc0e8

📥 Commits

Reviewing files that changed from the base of the PR and between d3ff557 and 93c38ad.

📒 Files selected for processing (1)
  • modelaudit/scanners/pickle_scanner.py

mldangelo and others added 4 commits March 18, 2026 07:02
Clamp caller-supplied metadata read limits to 10 MiB, keep malformed limit parsing inside extract_metadata error handling, and cover the success and hard-cap paths in regression tests.

Co-authored-by: Codex <noreply@openai.com>
mldangelo-oai merged commit f1d0698 into main on Mar 20, 2026
5 of 6 checks passed
mldangelo-oai deleted the codex/fix-dos-vulnerability-in-pickle-extraction branch on March 20, 2026