Speed up JSON and reduce HTML formatter consumption #1569
In #1425, the author of an app that depends on Pygments reported slowness when running Pygments against large JSON files.
I investigated by generating a 118MB JSON file as input, using the inputs reported by the author of that app. I found that the regex parser's
I also found that Pygments was consuming ~3GB of memory when formatting the 118MB JSON file in HTML. It appears that the buffered file I/O is caching everything in memory before finally writing to the file all at once as Pygments exits. While I wasn't able to stop the buffering from waiting until the entire file was in memory, I was able to shave off an entire gigabyte of memory consumption by caching the opening span classes that are generated per-token (like
For the Terminal256 formatter, this patch cuts the total runtime almost in half, from 2:02 to 1:09 (as measured by PowerShell's `Measure-Command`).
For the HTML formatter, the gains are more significant:
All output from the HTML and Terminal256 formatters is 100% byte-for-byte identical between master branch and this branch.
Changes in this patch:

* Update the JSON-LD URL to HTTPS
* Update the list of JSON-LD keywords
* Make the JSON-LD parser less dependent on the JSON lexer implementation
* Add unit tests for the JSON-LD lexer
Related to pygments#1425

Included in this change:

* The JSON parser is rewritten
* The JSON bare object parser no longer requires additional code
* `get_tokens_unprocessed()` returns as much as it can to reduce yields (for example, side-by-side punctuation is not returned separately)
* The unit tests were updated
* Add unit tests based on Hypothesis test results
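The effect of returning side-by-side punctuation as one token can be illustrated with a minimal sketch. This is not the lexer code from this change: `merge_adjacent` is a hypothetical helper, and token types are plain strings here instead of Pygments token objects so that the example stays self-contained. It only shows why fewer, larger tokens mean fewer yields for the formatter to process.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (start index, token type name, text); the type is a plain string here,
# standing in for a Pygments token type (assumption for illustration).
Token = Tuple[int, str, str]


def merge_adjacent(tokens: Iterable[Token]) -> Iterator[Token]:
    """Combine runs of consecutive tokens that share a type into one token."""
    pending: Optional[Token] = None
    for index, ttype, text in tokens:
        if pending is not None and pending[1] == ttype:
            # Same type as the buffered token: extend it instead of yielding.
            pending = (pending[0], ttype, pending[2] + text)
        else:
            if pending is not None:
                yield pending
            pending = (index, ttype, text)
    if pending is not None:
        yield pending


tokens = [
    (0, "Punctuation", "["),
    (1, "Punctuation", "{"),
    (2, "String", '"a"'),
    (5, "Punctuation", ":"),
    (6, "Number", "1"),
    (7, "Punctuation", "}"),
    (8, "Punctuation", "]"),
]
merged = list(merge_adjacent(tokens))
```

In this sketch the seven input tokens collapse to five, because `[{` and `}]` are each emitted once. A lexer that scans the input directly can get the same result without a post-processing pass by only yielding when the token type changes.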
Related to pygments#1425

Tested on a 118MB JSON file. Memory consumption tops out at ~3GB before this patch and drops to only ~2GB with this patch.

These were the command lines used:

    python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
    python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
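The idea of caching the opening span generated per token, as mentioned in the description, can be sketched roughly as follows. This is an illustration under stated assumptions and not the formatter code from this patch: `format_spans` and the `css_classes` mapping are hypothetical names, and token types are plain strings. The point is that each distinct token type builds its opening tag once, and every later token of that type reuses the same string object instead of allocating a new one.

```python
from html import escape
from typing import Dict, Iterable, Iterator, Tuple


def format_spans(
    tokens: Iterable[Tuple[str, str]],
    css_classes: Dict[str, str],
) -> Iterator[str]:
    """Yield HTML for (token type, text) pairs, reusing opening span tags."""
    # Maps a token type to its already-built opening tag (hypothetical cache).
    opening_cache: Dict[str, str] = {}
    for ttype, text in tokens:
        opening = opening_cache.get(ttype)
        if opening is None:
            css_class = css_classes.get(ttype, "")
            # Token types without a class are emitted without a span.
            opening = f'<span class="{css_class}">' if css_class else ""
            opening_cache[ttype] = opening
        body = escape(text)
        yield f"{opening}{body}</span>" if opening else body


# Class names below are illustrative values chosen for this example.
classes = {"Punctuation": "p", "String": "s2"}
sample = [("Punctuation", "{"), ("String", '"a"'), ("Punctuation", "}")]
html_output = "".join(format_spans(sample, classes))
```

With a large input that contains only a handful of distinct token types, the cache stays tiny while the number of tokens runs into the millions, which is the situation where reusing one string per type is most likely to pay off.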