fix: speed up parsing long lists #2302
The List tokenizer was using a regex to capture the potential next list item, then splitting that captured text so it could work line by line, deciding whether each line had the proper indentation and belonged to the current list item.

The problem is that the captured text was effectively the entire remaining document, so for every potential list item we were capturing the whole document and then splitting it into lines. For longer documents, this meant spending the majority of parse time just splitting the document into lines over and over.
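To make the cost concrete, here's a hypothetical sketch (not the actual marked source) of the old pattern: a greedy capture grabs everything after each bullet, and that capture gets split into lines on every iteration, so the work grows quadratically with document length.

```javascript
// Hypothetical illustration of the old approach: for every potential list
// item, a greedy regex capture grabs the rest of the document, which is
// then split into lines just to look at the next one.
function tokenizeOld(src) {
  const items = [];
  let rest = src;
  while (rest.length) {
    // [\s\S]* greedily captures everything after the bullet --
    // effectively the entire remaining document.
    const cap = /^([*+-] )([\s\S]*)/.exec(rest);
    if (!cap) break;
    // An O(n) split on every iteration -> O(n^2) overall for n lines.
    const lines = cap[2].split('\n');
    items.push(cap[1] + (lines[0] ?? ''));
    rest = lines.slice(1).join('\n');
  }
  return items;
}
```

Each pass re-splits everything that's left, which is where the time went.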
Here's the offending regex:
This PR changes that so we only capture the first line (with its bullet point), and once we verify that it's a candidate for starting a new list item, we just traverse `src` one line at a time. No more mass line-splitting when we really only need to look at one line at a time anyway.
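The line-at-a-time idea can be sketched as follows. This is a simplified illustration under assumed rules (fixed-width continuation indent, no nested lists), not the actual PR code; `readListItem` and its return shape are hypothetical names.

```javascript
// Hypothetical sketch of the new approach: capture only the first line,
// then walk src one line at a time, consuming continuation lines that are
// indented enough to belong to the current item.
function readListItem(src) {
  // Capture just the bullet and the first line -- not the whole document.
  const cap = /^([*+-] )(.*)(?:\n|$)/.exec(src);
  if (!cap) return null;
  const indent = cap[1].length; // required indent for continuation lines
  let item = cap[2];
  let pos = cap[0].length;
  while (pos < src.length) {
    const nl = src.indexOf('\n', pos);
    const line = nl === -1 ? src.slice(pos) : src.slice(pos, nl);
    // Stop at the first line that isn't indented as part of this item.
    if (!line.startsWith(' '.repeat(indent))) break;
    item += '\n' + line.slice(indent);
    pos = nl === -1 ? src.length : nl + 1;
  }
  return { raw: src.slice(0, pos), text: item };
}
```

Each line is examined exactly once, so the cost stays linear in the length of the item rather than the length of the document.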
The offending line-splitting call, where 90% of processing time was spent:
Needs a bit of cleanup, but it's passing tests.
@UziTech Cleaned up the logic the way I wanted. It passes the specs but is failing the Snyk security test here, and I'm not sure why.

Otherwise, this is now ready to merge.

Edit: Ah, there was a merge conflict somewhere setting it off. All fixed now!