
Avoid redundant whitespace scan in json.loads() for documents without surrounding whitespace #150860

@gaborbernat

Every call to json.loads() goes through the pure-Python decode() wrapper, which runs a whitespace-skipping regular expression once at the start of the document and once at the end, before and after the C scanner does the actual parsing. The overwhelming majority of documents have no leading or trailing whitespace, so both matches scan zero characters, yet each still pays for the call and the match-object allocation. For the small documents that dominate real traffic, that fixed overhead is a meaningful fraction of the total decode time.
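To make the overhead concrete, the sketch below replays the steps the wrapper performs on a small document without surrounding whitespace. It uses the module-level `WHITESPACE` pattern from `json.decoder` (an internal name rather than documented API, so treat its availability as an assumption) and the public `JSONDecoder.raw_decode()` method.

```python
import json
from json.decoder import WHITESPACE  # internal compiled pattern r'[ \t\n\r]*'

doc = '{"a": 1}'
decoder = json.JSONDecoder()

# Leading match: the document starts with '{', so the regex matches the empty string.
lead = WHITESPACE.match(doc, 0)
print(lead.end())  # 0

# The scanner does the real parsing and reports the index where it stopped.
obj, end = decoder.raw_decode(doc, lead.end())
print(obj, end)  # {'a': 1} 8

# Trailing match: parsing already reached len(doc), so this match is empty too.
trail = WHITESPACE.match(doc, end)
print(trail.end() == len(doc))  # True
```

Both regex calls return zero-length match objects here; they do no useful work for this document but are still executed on every decode.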

Skipping the leading match when the document does not begin with whitespace, and the trailing match when the parse already consumed the whole string, removes that overhead from the common case. On a tiny document it is roughly 1.5x faster; documents that do have surrounding whitespace keep the original behavior, and large documents are unaffected.
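A minimal sketch of the proposed fast path, written as a `JSONDecoder` subclass so it can be tried without patching the stdlib. `FastPathDecoder` and `_WS_CHARS` are hypothetical names, and this is an illustration of the idea rather than the patch itself; it relies on the internal `WHITESPACE` pattern and on `JSONDecodeError` from `json.decoder`.

```python
import json
from json.decoder import WHITESPACE, JSONDecodeError, JSONDecoder

# Hypothetical constant: the same character set the WHITESPACE regex matches.
_WS_CHARS = ' \t\n\r'


class FastPathDecoder(JSONDecoder):
    """Hypothetical decoder that skips the zero-width whitespace matches."""

    def decode(self, s, _w=WHITESPACE.match):
        # Leading whitespace: only run the regex if the first character is whitespace.
        if s and s[0] in _WS_CHARS:
            idx = _w(s, 0).end()
        else:
            idx = 0
        obj, end = self.raw_decode(s, idx)
        # Trailing whitespace: if the scanner consumed the whole string, skip the regex.
        if end != len(s):
            end = _w(s, end).end()
            if end != len(s):
                raise JSONDecodeError("Extra data", s, end)
        return obj


print(json.loads('{"a": 1}', cls=FastPathDecoder))   # {'a': 1}
print(json.loads(' [1, 2]\n', cls=FastPathDecoder))  # [1, 2]
```

The leading check is a single character comparison and the trailing check is an integer comparison, so the common case avoids both regex calls. Documents that do have surrounding whitespace fall through to the same regex path as before, and trailing garbage is still reported at the same position.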

This overlaps with the broader rewrite proposed in gh-117397; it is filed separately so the small, self-contained version can be evaluated on its own.

Labels: performance, stdlib, type-feature