::JSON.pretty_generate should sort hash keys #976

@jcpunk

Summary
JSON.pretty_generate currently emits hash keys in insertion order. When the same data is built in a different order, the output differs, which makes diffs noisy and reproducibility harder. Add an opt-in option to sort hash keys during generation so that the JSON output is stable and predictable.

Motivation / Problem

  • In many workflows (config generation, test fixtures, CI artifacts), stable serialization is critical.

  • Hash insertion order may differ across code paths or data sources, so semantically identical objects can produce different JSON.

  • This complicates:

    • Git diffs and code reviews
    • Caching and content hashing
    • Snapshot testing
    • Reproducible builds

Proposed Solution
Introduce an option to JSON.pretty_generate (and possibly JSON.generate) to sort object keys lexicographically.

API Options (one of):

  1. Keyword argument:

    JSON.pretty_generate(obj, sort_keys: true)
  2. Extend JSON::State:

    state = JSON::State.new(sort_keys: true)
    JSON.pretty_generate(obj, state)

Behavior

  • When sort_keys: true, all hashes are serialized with keys sorted (string comparison).
  • Default remains false to preserve backward compatibility and performance characteristics.

Example

require "json"

obj = { b: 1, a: 2 }

JSON.pretty_generate(obj)
# => {
#      "b": 1,
#      "a": 2
#    }

# Proposed behavior (sort_keys is not an existing option):
JSON.pretty_generate(obj, sort_keys: true)
# => {
#      "a": 2,
#      "b": 1
#    }

Alternatives Considered

  • Pre-sorting hashes before serialization:

    • Requires deep traversal and duplication of data structures
    • Error-prone and inefficient for large nested objects
  • Relying on insertion order discipline:

    • Not robust across boundaries or contributors

Impact

  • Improves determinism and reproducibility across tooling and environments
  • Reduces diff noise and improves developer experience
  • Aligns with behavior available in other ecosystems (e.g., Python’s json.dumps(sort_keys=True))

Performance Considerations

  • Sorting adds roughly O(n log n) overhead per object, where n is the number of keys in that object
  • Acceptable when opt-in; no impact on default behavior
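
A rough way to get a feel for the overhead is sketched below. It uses a hypothetical `deep_sort_keys` helper as a stand-in for the proposed option, so it measures copying plus sorting and most likely overstates what a native implementation would cost. The data shape (1,000 objects with 20 keys each) is an arbitrary assumption.

```ruby
require "json"
require "benchmark"

# Hypothetical stand-in for the proposed option (not part of the json gem).
def deep_sort_keys(value)
  case value
  when Hash
    value.map { |k, v| [k.to_s, deep_sort_keys(v)] }.sort_by { |k, _| k }.to_h
  when Array
    value.map { |v| deep_sort_keys(v) }
  else
    value
  end
end

# Synthetic data: 1,000 objects, each with 20 keys in shuffled order.
rng = Random.new(42)
data = Array.new(1_000) do
  (1..20).to_a.shuffle(random: rng).to_h { |i| ["key#{i}", i] }
end

unsorted = Benchmark.realtime { 20.times { JSON.pretty_generate(data) } }
sorted   = Benchmark.realtime { 20.times { JSON.pretty_generate(deep_sort_keys(data)) } }

puts format("unsorted: %.3fs, sorted: %.3fs", unsorted, sorted)
```

The absolute numbers depend on the machine and the json gem version, so only the ratio between the two runs is meaningful.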

Backward Compatibility

  • Fully backward compatible if default remains unsorted

Test Plan

  • Unit tests verifying:

    • Sorted vs unsorted output for flat and deeply nested hashes
    • Stability across multiple invocations
    • Mixed key types (symbols/strings) normalized to strings before sort
  • Benchmark comparison with and without sorting

Open Questions

  • Should sorting be strictly lexicographic on stringified keys?
  • Should there be a global default toggle via JSON::State configuration?
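
To make the first question concrete: Ruby's default `String#<=>` compares bytes, so a plain `sort` on stringified keys gives an ordering that may surprise readers expecting numeric or case-insensitive order. The key values below are illustrative only.

```ruby
# Byte-wise ordering: digits sort before uppercase, uppercase before
# lowercase, and multi-byte UTF-8 characters after ASCII.
keys = ["b", "a", "B", "10", "9", "é"]
sorted_keys = keys.sort
p sorted_keys
# => ["10", "9", "B", "a", "b", "é"]

# Symbol and string keys can collide once stringified, so the
# normalization rule also needs to say what happens in that case.
mixed = { "a" => 1, a: 2 }
stringified = mixed.keys.map(&:to_s)
p stringified
# => ["a", "a"]
```

Byte-wise order has the advantage of being locale-independent, which matters if the goal is identical output across environments.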

Additional Context
This feature would support reproducible outputs in CI pipelines and long-lived systems where deterministic artifacts are a requirement.
