Commit da2df33

sytelus and claude committed
switch RestrictedUnpickler from blocklist to allowlist for pickle hardening
The previous blocklist approach could be bypassed via modules not on the list (e.g. types, functools, _thread). The allowlist only permits modules TensorWatch actually needs (builtins, collections, numpy, torch, pandas, tensorwatch, and pickle internals), blocking everything else by default. Also adds security logging for HMAC verification failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f286895 commit da2df33

3 files changed

Lines changed: 72 additions & 41 deletions

README.md

Lines changed: 11 additions & 7 deletions
@@ -58,8 +58,11 @@ TensorWatch supports Python 3.x and is tested with PyTorch 0.4-1.x. Most feature
 > - All incoming ZMQ messages are HMAC-SHA256 verified **before**
 > deserialization (`ZmqWrapper.verify_and_loads`). Messages with invalid
 > signatures are rejected without being deserialized.
-> - A `RestrictedUnpickler` blocks known-dangerous modules (`os`,
-> `subprocess`, `socket`, `ctypes`, etc.) as defense-in-depth.
+> - A `RestrictedUnpickler` uses an **allowlist** of permitted modules
+> (builtins, collections, numpy, torch, pandas, tensorwatch, and pickle
+> internals) as defense-in-depth. Any module not explicitly approved is
+> blocked, which is significantly stronger than the previous blocklist
+> approach.
 > - For multi-process setups, set the `TENSORWATCH_HMAC_KEY` environment
 > variable to a shared hex-encoded secret (e.g.
 > `export TENSORWATCH_HMAC_KEY=$(python -c "import os; print(os.urandom(32).hex())")`).
@@ -76,9 +79,10 @@ TensorWatch supports Python 3.x and is tested with PyTorch 0.4-1.x. Most feature
 > from files. A crafted pickle file can execute arbitrary code when loaded.
 >
 > **Mitigations in place:**
-> - A `RestrictedUnpickler` blocks known-dangerous modules (`os`,
-> `subprocess`, etc.) as defense-in-depth. This is **not** a complete
-> sandbox — determined attackers may find bypasses.
+> - A `RestrictedUnpickler` uses an **allowlist** of permitted modules
+> as defense-in-depth. Only modules TensorWatch needs (builtins,
+> collections, numpy, torch, pandas, tensorwatch) are allowed; all
+> others are blocked by default.
 >
 > **User responsibilities:**
 > - **Only open TensorWatch data files (`.log`, `.pkl`) that you created
@@ -97,8 +101,8 @@ TensorWatch supports Python 3.x and is tested with PyTorch 0.4-1.x. Most feature
 > | Risk | Mitigation | User Action |
 > |------|-----------|-------------|
 > | `eval()` on expressions from clients | HMAC auth + localhost binding | Never expose ports to untrusted networks |
-> | `pickle.loads()` from ZMQ | HMAC + RestrictedUnpickler | Keep HMAC key secret |
-> | `pickle.load()` from files | RestrictedUnpickler (defense-in-depth) | Only load trusted files |
+> | `pickle.loads()` from ZMQ | HMAC + allowlist RestrictedUnpickler | Keep HMAC key secret |
+> | `pickle.load()` from files | Allowlist RestrictedUnpickler (defense-in-depth) | Only load trusted files |
 > | YAML deserialization | `yaml.SafeLoader` by default | Do not override with unsafe loaders |
 > | ZMQ port exposure | Binds to `127.0.0.1` by default | Do not change to `0.0.0.0` in untrusted environments |
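Reviewer note: the README asks for a shared hex-encoded secret in `TENSORWATCH_HMAC_KEY`. The sketch below shows one way such a value could be decoded into key bytes. It is an illustration only: `load_hmac_key` and the 32-byte minimum are assumptions, and this diff does not show how `ZmqWrapper.get_hmac_key` actually reads the variable.

```python
import os

ENV_VAR = "TENSORWATCH_HMAC_KEY"  # variable name taken from the README text
MIN_KEY_BYTES = 32                # assumed minimum, matching os.urandom(32)


def load_hmac_key(environ=os.environ):
    """Decode a hex-encoded shared secret into raw key bytes (hypothetical helper)."""
    hex_key = environ.get(ENV_VAR)
    if not hex_key:
        raise RuntimeError("{} is not set".format(ENV_VAR))
    # bytes.fromhex raises ValueError if the string is not valid hex
    key = bytes.fromhex(hex_key.strip())
    if len(key) < MIN_KEY_BYTES:
        raise ValueError("HMAC key is shorter than {} bytes".format(MIN_KEY_BYTES))
    return key


# Simulate the export shown in the README without touching the real environment
fake_env = {ENV_VAR: os.urandom(32).hex()}
key = load_hmac_key(fake_env)
```

Every process that should exchange messages would need to see the same value, which is why the README has the user export it once and share it.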

tensorwatch/safe_pickle.py

Lines changed: 52 additions & 34 deletions
@@ -3,41 +3,52 @@

 """Defense-in-depth RestrictedUnpickler for TensorWatch.

-Blocks modules commonly exploited in pickle deserialization attacks while
-allowing the data types that TensorWatch legitimately serializes (numpy
-arrays, torch tensors, TensorWatch data classes, built-in collections, etc.).
-
-WARNING: This is NOT a complete sandbox. A determined attacker may still find
-bypass techniques. Do not load pickle data from untrusted sources.
+Uses an **allowlist** approach: only modules that TensorWatch legitimately
+needs for serialization are permitted. Everything else is blocked by default.
+
+Allowed module families:
+- Python builtins and standard-library data types (collections, datetime,
+  decimal, fractions, numbers, uuid)
+- Pickle internals (_codecs, copyreg, _collections — used by the pickle
+  protocol itself)
+- numpy, torch, pandas — data-science libraries whose objects appear in
+  StreamItem values
+- tensorwatch — the library's own data classes
+
+WARNING: This is NOT a complete sandbox. Pickle allowlists reduce the attack
+surface dramatically compared to blocklists, but do not eliminate all risk.
+Do not load pickle data from untrusted sources.
 """

 import io
 import pickle
 import logging

-_BLOCKED_MODULES = frozenset({
-    # OS / filesystem access
-    'os', 'posix', 'nt', 'os.path',
-    'shutil', 'pathlib',
-    'tempfile', 'glob', 'fnmatch',
-    # Process / subprocess execution
-    'subprocess', 'multiprocessing',
-    'pty', 'commands',
-    # Code compilation / execution
-    'code', 'codeop', 'compileall',
-    'importlib', 'runpy', 'pkgutil',
-    # Network
-    'socket', 'http', 'urllib', 'ftplib', 'smtplib', 'xmlrpc',
-    'socketserver', 'asyncio',
-    # Low-level / FFI
-    'ctypes', 'mmap',
-    # Interactive / debug
-    'pdb', 'profile', 'webbrowser',
-    # Signal handling
-    'signal',
+# ---------------------------------------------------------------------------
+# Allowlist — a module is permitted if its top-level package appears here.
+# Any module NOT in this set is rejected outright.
+# ---------------------------------------------------------------------------
+_ALLOWED_PREFIXES = frozenset({
+    # Python built-in / standard-library data types
+    'builtins',
+    'collections', '_collections',  # _collections holds C-accelerated types
+    'datetime',
+    'decimal',
+    'fractions',
+    'numbers',
+    'uuid',
+    # Pickle protocol internals (required for __reduce__ reconstruction)
+    'copyreg',
+    '_codecs',
+    # Data-science libraries
+    'numpy',
+    'torch',
+    'pandas',
+    # TensorWatch's own types
+    'tensorwatch',
 })

-# Specific names blocked from builtins module
+# Even within allowed modules, these builtins are too dangerous to permit.
 _BLOCKED_BUILTINS = frozenset({
     'eval', 'exec', 'compile', '__import__',
     'open', 'input', 'breakpoint',
@@ -46,25 +57,32 @@
     'getattr', 'setattr', 'delattr',
 })

+_log = logging.getLogger(__name__)
+

 class RestrictedUnpickler(pickle.Unpickler):
-    """Unpickler that blocks known-dangerous modules and callables.
+    """Unpickler that only allows classes from explicitly approved modules.

-    Allowed: numpy, torch, tensorwatch, collections, standard data types.
-    Blocked: os, subprocess, socket, ctypes, importlib, etc.
+    Uses an allowlist (not a blocklist), so unknown or newly-introduced
+    dangerous modules are blocked by default.
     """

     def find_class(self, module, name):
         top_module = module.split('.')[0]

-        if top_module in _BLOCKED_MODULES:
+        # Reject any module not in the allowlist
+        if top_module not in _ALLOWED_PREFIXES:
+            _log.warning("Pickle restricted: blocked %s.%s (module '%s' not in allowlist)",
+                         module, name, top_module)
             raise pickle.UnpicklingError(
-                "Blocked: unpickling {}.{} is not allowed "
-                "(module '{}' is restricted)".format(module, name, top_module))
+                "Blocked: module '{}' is not in the allowlist "
+                "(attempted to load {}.{})".format(top_module, module, name))

+        # Block dangerous builtins even though 'builtins' is allowed
         if top_module == 'builtins' and name in _BLOCKED_BUILTINS:
+            _log.warning("Pickle restricted: blocked builtins.%s", name)
             raise pickle.UnpicklingError(
-                "Blocked: unpickling builtins.{} is not allowed".format(name))
+                "Blocked: builtins.{} is not allowed".format(name))

         return super().find_class(module, name)
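Reviewer note: the allowlist check in `find_class` can be tried in isolation with the standard library alone. This is a minimal sketch under stated assumptions, not the TensorWatch module: the allowlist is trimmed to three entries, and `ALLOWED_TOP_LEVEL`, `BLOCKED_BUILTINS`, `AllowlistUnpickler`, `allowlist_loads` and `is_rejected` are hypothetical names. It uses `functools`, which the commit message names as a module the old blocklist missed.

```python
import collections
import functools
import io
import pickle

# Trimmed, hypothetical allowlist for illustration (the real one is longer)
ALLOWED_TOP_LEVEL = frozenset({'builtins', 'collections', '_collections'})
BLOCKED_BUILTINS = frozenset({'eval', 'exec', 'compile', '__import__', 'open'})


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        top_module = module.split('.')[0]
        # Default-deny: anything outside the allowlist is refused
        if top_module not in ALLOWED_TOP_LEVEL:
            raise pickle.UnpicklingError(
                "module '{}' is not in the allowlist".format(top_module))
        # Some builtins stay blocked even though 'builtins' is allowed
        if top_module == 'builtins' and name in BLOCKED_BUILTINS:
            raise pickle.UnpicklingError(
                "builtins.{} is not allowed".format(name))
        return super().find_class(module, name)


def allowlist_loads(data):
    return AllowlistUnpickler(io.BytesIO(data)).load()


def is_rejected(data):
    """Return True if the restricted unpickler refuses the payload."""
    try:
        allowlist_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


# An ordinary data type from an allowed module round-trips
restored = allowlist_loads(pickle.dumps(collections.OrderedDict(a=1)))

# functools is not on the allowlist, so a pickled partial is refused
partial_rejected = is_rejected(pickle.dumps(functools.partial(print, 'x')))

# eval lives in an allowed module but is explicitly blocked
eval_rejected = is_rejected(pickle.dumps(eval))
```

The sketch only exercises the lookup path. As the module docstring warns, an allowlist narrows the attack surface and does not make untrusted pickles safe to load.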

tensorwatch/zmq_wrapper.py

Lines changed: 9 additions & 0 deletions
@@ -15,6 +15,8 @@
 from .safe_pickle import restricted_loads
 import weakref, logging

+_log = logging.getLogger(__name__)
+
 class ZmqWrapper:

     _thread:Thread = None
@@ -56,11 +58,18 @@ def verify_and_loads(signed_data: bytes):
         Raises ValueError if the signature does not match.
         """
         if len(signed_data) < 32:
+            _log.warning("ZMQ security: rejected message too short for HMAC "
+                         "(%d bytes) - possible malformed or unsigned message",
+                         len(signed_data))
             raise ValueError("Message too short to contain HMAC signature")
         sig = signed_data[:32]
         payload = signed_data[32:]
         expected = hmac.new(ZmqWrapper.get_hmac_key(), payload, hashlib.sha256).digest()
         if not hmac.compare_digest(sig, expected):
+            _log.warning("ZMQ security: HMAC verification failed - "
+                         "rejecting untrusted message (%d bytes payload). "
+                         "This may indicate a connection from an unauthorized "
+                         "process or a tampering attempt.", len(payload))
             raise ValueError("HMAC verification failed - rejecting untrusted message")
         return restricted_loads(payload)
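Reviewer note: the framing that `verify_and_loads` expects (a 32-byte HMAC-SHA256 signature followed by the payload) can be exercised without ZMQ or pickle. This is a rough sketch: `sign_payload` and `verify_payload` are hypothetical names, and the sender side is assumed to mirror the verifier, since the signing code is not part of this diff. `verify_payload` returns the raw payload instead of calling `restricted_loads`.

```python
import hashlib
import hmac
import os

SIG_LEN = 32  # size of a SHA-256 digest in bytes


def sign_payload(key, payload):
    """Prefix the payload with its HMAC-SHA256 signature (assumed sender side)."""
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return sig + payload


def verify_payload(key, signed_data):
    """Check the signature and return the payload, or raise ValueError."""
    if len(signed_data) < SIG_LEN:
        raise ValueError("Message too short to contain HMAC signature")
    sig = signed_data[:SIG_LEN]
    payload = signed_data[SIG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking match position
    if not hmac.compare_digest(sig, expected):
        raise ValueError("HMAC verification failed - rejecting untrusted message")
    return payload


key = os.urandom(32)
message = sign_payload(key, b"hello")
recovered = verify_payload(key, message)

# Flip one bit in the last payload byte to simulate tampering
tampered = message[:-1] + bytes([message[-1] ^ 1])
```

Because verification happens before any deserialization, a tampered or unsigned message never reaches the unpickler. The new log lines make those rejections visible.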
