
Improve CddlLexer performance #1959

Merged: 1 commit into pygments:master from blu-base:cddl_performance on Nov 16, 2021
Conversation

@blu-base (Contributor) commented Nov 15, 2021

The CddlLexer always seemed incredibly slow, even for the tiny provided example file. Profiling showed that the re engine took most of the time. After some testing, the _re_id expression proved to be the culprit.

Another minor change improved the grouping of the String.Single token. That's the only reason the example file output needs an update.


If you want, I can remove the modification for the String.Single token and keep only the important change to the _re_id expression.
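The kind of slowdown described above typically comes from catastrophic backtracking: when a pattern offers the re engine many overlapping ways to consume the same characters, a near-miss input forces it to try all of them. A minimal stdlib sketch of the effect (the patterns here are illustrative, NOT the actual _re_id expression from the CddlLexer):

```python
import re
import time

# Hypothetical sketch (not the real _re_id pattern): nested, overlapping
# quantifiers such as (?:\w+)* let a backtracking engine split a run of
# word characters in exponentially many ways once the overall match
# fails near the end of the input.
slow_id = re.compile(r'(?:\w+)*-')

# Unambiguous rewrite: a single \w+ leaves the engine exactly one way
# to consume the run, so failure is detected in linear time.
fast_id = re.compile(r'\w+-')

text = 'a' * 20 + '!'  # almost matches, then fails: worst case for slow_id

t0 = time.perf_counter()
assert slow_id.match(text) is None
slow = time.perf_counter() - t0

t0 = time.perf_counter()
assert fast_id.match(text) is None
fast = time.perf_counter() - t0

print(f'slow pattern: {slow:.4f}s, rewritten pattern: {fast:.6f}s')
```

The fix in a lexer is usually the same shape: restructure the expression so that at every position there is only one way for the engine to proceed.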

The cddl lexer always seemed incredibly slow for the tiny example
file. Profiling showed that the re engine took most of the time.
After some testing the _re_id expression proved to be the culprit.
@blu-base (Contributor, Author) commented Nov 15, 2021

The failed check seems to be a fluke?

  =================================== FAILURES ===================================
  ________________________ test_happy_javascript_fragment ________________________
  
  lexer_html = <pygments.lexers.HtmlLexer>
  
      def test_happy_javascript_fragment(lexer_html):
          """valid, even long Javascript fragments should still get parsed ok"""
      
          fragment = "<script type=\"text/javascript\">"+"alert(\"hi\");"*2000+"</script>"
          start_time = time.time()
          tokens = list(lexer_html.get_tokens(fragment))
          assert all(x[1] != Token.Error for x in tokens)
  >       assert time.time() - start_time < MAX_HL_TIME, \
              'The HTML lexer might have an expensive happy-path script case'
  E       AssertionError: The HTML lexer might have an expensive happy-path script case
  E       assert (1637013420.331315 - 1637013409.7813637) < 10
  E        +  where 1637013420.331315 = <built-in function time>()
  E        +    where <built-in function time> = time.time
  
  /home/runner/work/pygments/pygments/tests/test_html_lexer.py:31: AssertionError
  =========================== short test summary info ============================
  FAILED tests/test_html_lexer.py::test_happy_javascript_fragment - AssertionEr...
  ============ 1 failed, 3775 passed, 12 skipped in 294.47s (0:04:54) ============
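The failed assertion compares a wall-clock time.time() delta against a 10-second bound, and wall-clock time on a shared CI runner includes scheduler noise from other jobs, which is why such checks can fail sporadically. A hedged stdlib sketch of a less flaky measurement using CPU time instead (the `timed` helper is mine for illustration, not part of the Pygments test suite):

```python
import time

def timed(fn, *args):
    """Return (result, cpu_seconds). process_time() counts only CPU time
    of this process, ignoring time spent descheduled on a busy machine."""
    start = time.process_time()
    result = fn(*args)
    return result, time.process_time() - start

# Toy workload standing in for list(lexer_html.get_tokens(fragment)).
result, cpu = timed(sum, range(1_000_000))
assert cpu < 10, 'workload unexpectedly slow'
print(f'{cpu:.4f} CPU seconds')
```

A timing assertion written this way still catches a genuinely expensive happy path while being much less sensitive to runner load.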

@birkenfeld (Member) commented:

Looks good to me (and I like the cleanup for the string case), thanks!

@birkenfeld birkenfeld merged commit d9f2e6c into pygments:master Nov 16, 2021
@blu-base blu-base deleted the cddl_performance branch November 16, 2021 22:06
@birkenfeld birkenfeld added the changelog-update Items which need to get mentioned in the changelog label Nov 17, 2021
hwayne pushed a commit to hwayne/pygments that referenced this pull request Nov 29, 2021
@Anteru Anteru added this to the 2.11.0 milestone Dec 12, 2021
@Anteru Anteru removed the changelog-update Items which need to get mentioned in the changelog label Dec 12, 2021