Skip to content

Pure Python pickle.py Raises Wrong Exceptions for Invalid Data #141749

@djoume

Description

@djoume

Bug report

Bug description:

Summary

The pure Python implementation of pickle.py in CPython's standard library raises incorrect exceptions (KeyError and IndexError) when unpickling invalid data, instead of the expected pickle.UnpicklingError. This bug is masked in normal CPython usage because the C implementation (_pickle) is used by default, but it affects any Python implementation that relies on the pure Python fallback (e.g., Brython, PyPy in certain modes).

Issue Details

When the pure Python _Unpickler class encounters invalid pickle data, it fails to provide proper error handling in two scenarios:

Problem 1: Invalid Opcodes (KeyError)

When an unrecognized opcode is encountered, dispatch[key[0]](self) raises KeyError instead of UnpicklingError.

Expected behavior: pickle.UnpicklingError: invalid load key, '\xff'.
Actual behavior: KeyError: 255

Problem 2: Missing MARK Instructions (IndexError)

When opcodes like TUPLE are encountered without a preceding MARK instruction, pop_mark() attempts to pop from an empty metastack, raising IndexError instead of UnpicklingError.

Expected behavior: pickle.UnpicklingError: could not find MARK
Actual behavior: IndexError: pop from empty list

Steps to Reproduce

Reproducing in CPython (Force Pure Python Implementation)

The bug exists in CPython's pickle.py but is not normally visible because CPython uses the C _pickle module. To reproduce, you must disable the C implementation:

# test_pickle_bug.py
import sys

# Block the C implementation to force pure Python fallback
sys.modules['_pickle'] = None

import pickle

print("Test 1: Invalid opcode (should raise UnpicklingError)")
try:
    pickle.loads(b'\xff\xff\xff')
    print("FAIL: Expected UnpicklingError")
except pickle.UnpicklingError as e:
    print(f"PASS: {e}")
except KeyError as e:
    print(f"FAIL: Got KeyError instead: {e}")

print("\nTest 2: Missing MARK (should raise UnpicklingError)")
try:
    pickle.loads(b'this is not valid pickle data')
    print("FAIL: Expected UnpicklingError")
except pickle.UnpicklingError as e:
    print(f"PASS: {e}")
except IndexError as e:
    print(f"FAIL: Got IndexError instead: {e}")

Expected output (with C _pickle):

Test 1: Invalid opcode (should raise UnpicklingError)
PASS: invalid load key, '\xff'.

Test 2: Missing MARK (should raise UnpicklingError)
PASS: could not find MARK

Actual output (with pure Python pickle.py):

Test 1: Invalid opcode (should raise UnpicklingError)
FAIL: Got KeyError instead: 255

Test 2: Missing MARK (should raise UnpicklingError)
FAIL: Got IndexError instead: pop from empty list

Root Cause Analysis

Location 1: _Unpickler.load() method

File: Lib/pickle.py
Approximate line: 1212 (may vary by Python version)

try:
    while True:
        key = read(1)
        if not key:
            raise EOFError
        assert isinstance(key, bytes_types)
        dispatch[key[0]](self)  # <-- KeyError raised here for invalid opcodes
except _Stop as stopinst:
    return stopinst.value

The dispatch dictionary lookup is not wrapped in error handling, so KeyError propagates directly to the caller instead of being converted to UnpicklingError.

Location 2: _Unpickler.pop_mark() method

File: Lib/pickle.py
Approximate line: 1219 (may vary by Python version)

def pop_mark(self):
    items = self.stack
    self.stack = self.metastack.pop()  # <-- IndexError raised here if metastack is empty
    self.append = self.stack.append
    return items

When pop_mark() is called without a preceding MARK instruction, the metastack is empty, causing list.pop() to raise IndexError.

Proposed Fix

Add proper error handling to both locations:

Fix 1: Handle KeyError in load() method

try:
    while True:
        key = read(1)
        if not key:
            raise EOFError
        assert isinstance(key, bytes_types)
        try:
            dispatch[key[0]](self)
        except KeyError:
            raise UnpicklingError(
                f"invalid load key, '\\x{key[0]:02x}'.")
except _Stop as stopinst:
    return stopinst.value

Fix 2: Handle empty metastack in pop_mark() method

def pop_mark(self):
    if not self.metastack:
        raise UnpicklingError("could not find MARK")
    items = self.stack
    self.stack = self.metastack.pop()
    self.append = self.stack.append
    return items

Impact

Who is affected?

  • Alternative Python implementations that cannot use the C _pickle module (e.g., Brython, potentially PyPy in pure mode, MicroPython if it includes pickle)
  • Security tools that analyze pickle data and expect proper exception types
  • Testing frameworks that validate pickle error handling
  • Code that explicitly uses pure Python pickle for debugging or compatibility

Why this matters?

  1. API consistency: The pure Python implementation should behave identically to the C implementation
  2. Exception handling: Code that catches pickle.UnpicklingError for malformed data will fail to catch these errors
  3. Security: Malformed pickle data should be handled gracefully with appropriate error messages
  4. Debugging: Proper error messages help developers identify pickle format issues

Verification

After applying the fix, both test cases should pass:

import sys
sys.modules['_pickle'] = None
import pickle

# Test 1: Invalid opcode
try:
    pickle.loads(b'\xff\xff\xff')
    assert False, "Should raise UnpicklingError"
except pickle.UnpicklingError as e:
    assert "invalid load key" in str(e)
    print("✓ Test 1 passed")

# Test 2: Missing MARK
try:
    pickle.loads(b'this is not valid pickle data')
    assert False, "Should raise UnpicklingError"
except pickle.UnpicklingError as e:
    assert "could not find MARK" in str(e)
    print("✓ Test 2 passed")

# Test 3: Using mock file (from unittest.mock)
from io import BytesIO
from unittest.mock import patch, mock_open

invalid_data = b"invalid pickle"
with patch("builtins.open", mock_open(read_data=invalid_data)):
    with open("dummy", "rb") as f:
        try:
            pickle.load(f)
            assert False, "Should raise UnpicklingError"
        except pickle.UnpicklingError:
            print("✓ Test 3 passed")

Additional Context

This issue was discovered while fixing pickle compatibility in Brython. The bug probably exists in all versions of CPython but has gone unnoticed because:

  1. CPython always uses the C _pickle module in practice
  2. The pure Python pickle.py is rarely executed
  3. Most developers don't test with malformed pickle data

However, the pure Python implementation is part of the standard library API contract and should behave identically to the C implementation.

Related Code Patterns

Note that CPython's pickle.py already has precedent for this type of error handling. For example, the load_get(), load_binget(), and load_long_binget() methods all wrap KeyError exceptions when accessing the memo dictionary:

def load_binget(self):
    i = self.read(1)[0]
    try:
        self.append(self.memo[i])
    except KeyError as exc:
        msg = f'Memo value not found at index {i}'
        raise UnpicklingError(msg) from None

The proposed fixes follow this same established pattern.

Python Versions Affected

All Python versions with pure Python pickle.py implementation (verified on 3.9 and 3.14, likely affects 3.9+).

CPython versions tested on:

3.14

Operating systems tested on:

macOS

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    stdlibStandard Library Python modules in the Lib/ directorytype-bugAn unexpected behavior, bug, or error

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions