Skip to content

nripankadas07/safejson

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

safejson

License: MIT Python: 3.9+ Typed: strict Tests: 183 Coverage: 100%

Hardened JSON parser for Python. Drop-in replacement for the relevant bits of json.loads that adds the safety knobs you want when the input might be hostile:

  • Configurable depth limit — protects against deep-nesting attacks that crash the stdlib decoder with RecursionError.
  • Configurable string / array / object / total-node limits — refuses gigantic payloads before allocating them.
  • Duplicate-key detection — the stdlib silently keeps the last value; safejson raises by default.
  • NaN / Infinity rejection — pure-JSON only by default; opt in with allow_nan=True.
  • Type whitelisting — restrict the result to a known set of Python types.
  • Streaming pre-scan — every limit is enforced before json.loads is allowed to allocate, so a malicious payload can never blow the stack or fill memory.
  • Zero runtime dependencies, fully typed (py.typed).

Why?

The stdlib json module is fast and correct, but it is recursive, accepts NaN / Infinity, has no notion of resource limits, and silently picks the last value when an object key repeats. These choices are fine for hand-written documents and disastrous when the input is attacker-controlled.

safejson keeps the stdlib decoder in the loop (it's the fast path), but wraps it with an iterative, allocation-free pre-scan that enforces every safety policy before any Python object is constructed.

Install

pip install safejson

Python 3.9+ is required.

Usage

from safejson import loads, Limits

# Sensible defaults: max_depth=64, NaN/Infinity rejected, duplicate
# keys rejected.
value = loads('{"name": "Forge", "tags": ["python", "json"]}')

# Tighten the policy for untrusted input:
strict = Limits(
    max_depth=16,
    max_string_length=4096,
    max_array_length=1024,
    max_object_keys=128,
    max_total_nodes=8192,
)
value = loads(payload, limits=strict)

# Allow IEEE-754 specials when interoperating with a producer
# that emits them (e.g. NumPy, JavaScript engines):
value = loads("Infinity", allow_nan=True)

# Allow repeated keys (last write wins, like the stdlib):
value = loads('{"a": 1, "a": 2}', allow_duplicate_keys=True)
# -> {"a": 2}

# Whitelist permitted result types:
value = loads(
    '{"id": 1, "name": "alpha"}',
    allowed_types=(dict, str, int),
)

If you only need a yes/no answer ("would this parse safely?") and don't want to allocate the result, use validate:

from safejson import validate, Limits

try:
    validate(payload, limits=Limits(max_depth=16))
except SafeJsonError as exc:
    log.warning("rejected: %s", exc)

API

loads(text, *, limits=None, allow_nan=False, allow_duplicate_keys=False, allowed_types=None) -> Any

Parse text (str, bytes, or bytearray) into a Python value. Bytes are decoded as UTF-8; non-UTF-8 input raises JsonSyntaxError.

Parameter Default Effect
limits DEFAULT_LIMITS Resource ceilings. None ⇒ defaults.
allow_nan False Permit NaN / Infinity / -Infinity.
allow_duplicate_keys False Last-write-wins instead of raising.
allowed_types None Iterable of permitted Python types.

validate(text, *, limits=None, allow_nan=False) -> None

Run the safety scan without constructing the result. Useful for preflight checks or when you only need to gate an upstream decode.

Limits(...)

Immutable, hashable dataclass:

Field Default Meaning
max_depth 64 Maximum container nesting.
max_string_length None Maximum chars per string / key.
max_array_length None Maximum elements per array.
max_object_keys None Maximum keys per object.
max_total_nodes None Maximum scalar + container starts.

None for any size limit disables that check. max_depth=0 is invalid; size limits may be 0 (meaning "permit empty containers / strings but nothing else").

DEFAULT_LIMITS is a module-level singleton equal to Limits().

Errors

SafeJsonError(ValueError)
├── JsonSyntaxError       (.detail, .position)
├── LimitExceededError    (.limit_name, .actual, .limit, .position)
├── DuplicateKeyError     (.key, .position)
├── DisallowedConstantError (.constant)
└── DisallowedTypeError   (.type_name)

All errors also inherit ValueError, so existing except ValueError: catches still work.

Comparison with json.loads

Behaviour json.loads safejson.loads
Depth limit None (recurses; crashes ~990) Limits.max_depth (default 64)
Max string length None Limits.max_string_length
Max array length None Limits.max_array_length
Max object keys None Limits.max_object_keys
NaN / Infinity Accepted Rejected (opt-in with allow_nan)
Duplicate keys Silent overwrite Rejected (opt-in with allow_duplicate_keys)
Type whitelist N/A allowed_types=...
Recursion-safe No Yes (iterative scanner)

Running Tests

pip install pytest pytest-cov mypy
pytest                                # 183 tests
pytest --cov=safejson --cov-branch    # 100% line + 100% branch
mypy --strict src/safejson            # strict, 0 errors

License

MIT

About

Hardened JSON parser: configurable depth/size limits, duplicate-key detection, NaN/Infinity rejection, type whitelisting, allocation-free streaming pre-scan. Zero deps.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages