
Improve CddlLexer performance #1959

Merged: 1 commit into pygments:master from blu-base:cddl_performance on Nov 16, 2021
Conversation

@blu-base (Contributor) commented Nov 15, 2021

The CddlLexer always seemed incredibly slow, even for the tiny provided example file. Profiling showed that the re engine took most of the time. After some testing, the _re_id expression proved to be the culprit.

Another minor change improved the grouping of the String.Single token. That's the only reason the example file output needs an update.


If you want, I can remove the modification for the String.Single token and keep only the important change to the _re_id expression.
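The kind of slowdown described above typically comes from catastrophic backtracking: when a pattern offers the re engine many overlapping ways to consume the same characters, a near-miss input forces it to try all of them. A minimal stdlib sketch of the effect (the patterns here are illustrative, NOT the actual _re_id expression from the CddlLexer):

```python
import re
import time

# Hypothetical sketch (not the real _re_id pattern): nested, overlapping
# quantifiers such as (?:\w+)* let a backtracking engine split a run of
# word characters in exponentially many ways once the overall match
# fails near the end of the input.
slow_id = re.compile(r'(?:\w+)*-')

# Unambiguous rewrite: a single \w+ leaves the engine exactly one way
# to consume the run, so failure is detected in linear time.
fast_id = re.compile(r'\w+-')

text = 'a' * 20 + '!'  # almost matches, then fails: worst case for slow_id

t0 = time.perf_counter()
assert slow_id.match(text) is None
slow = time.perf_counter() - t0

t0 = time.perf_counter()
assert fast_id.match(text) is None
fast = time.perf_counter() - t0

print(f'slow pattern: {slow:.4f}s, rewritten pattern: {fast:.6f}s')
```

The fix in a lexer is usually the same shape: restructure the expression so that at every position there is only one way for the engine to proceed.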

The cddl lexer always seemed incredibly slow for the tiny example
file. Profiling showed that the re engine took most of the time.
After some testing the _re_id expression proved to be the culprit.
@blu-base (Contributor, Author) commented Nov 15, 2021

The failed check seems to be a fluke?

  =================================== FAILURES ===================================
  ________________________ test_happy_javascript_fragment ________________________
  
  lexer_html = <pygments.lexers.HtmlLexer>
  
      def test_happy_javascript_fragment(lexer_html):
          """valid, even long Javascript fragments should still get parsed ok"""
      
          fragment = "<script type=\"text/javascript\">"+"alert(\"hi\");"*2000+"</script>"
          start_time = time.time()
          tokens = list(lexer_html.get_tokens(fragment))
          assert all(x[1] != Token.Error for x in tokens)
  >       assert time.time() - start_time < MAX_HL_TIME, \
              'The HTML lexer might have an expensive happy-path script case'
  E       AssertionError: The HTML lexer might have an expensive happy-path script case
  E       assert (1637013420.331315 - 1637013409.7813637) < 10
  E        +  where 1637013420.331315 = <built-in function time>()
  E        +    where <built-in function time> = time.time
  
  /home/runner/work/pygments/pygments/tests/test_html_lexer.py:31: AssertionError
  =========================== short test summary info ============================
  FAILED tests/test_html_lexer.py::test_happy_javascript_fragment - AssertionEr...
  ============ 1 failed, 3775 passed, 12 skipped in 294.47s (0:04:54) ============
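The failed assertion compares a wall-clock time.time() delta against a 10-second bound, and wall-clock time on a shared CI runner includes scheduler noise from other jobs, which is why such checks can fail sporadically. A hedged stdlib sketch of a less flaky measurement using CPU time instead (the `timed` helper is mine for illustration, not part of the Pygments test suite):

```python
import time

def timed(fn, *args):
    """Return (result, cpu_seconds). process_time() counts only CPU time
    of this process, ignoring time spent descheduled on a busy machine."""
    start = time.process_time()
    result = fn(*args)
    return result, time.process_time() - start

# Toy workload standing in for list(lexer_html.get_tokens(fragment)).
result, cpu = timed(sum, range(1_000_000))
assert cpu < 10, 'workload unexpectedly slow'
print(f'{cpu:.4f} CPU seconds')
```

A timing assertion written this way still catches a genuinely expensive happy path while being much less sensitive to runner load.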

@birkenfeld (Member) commented:

Looks good to me (and I like the cleanup for the string case), thanks!

@birkenfeld birkenfeld merged commit d9f2e6c into pygments:master Nov 16, 2021
@blu-base blu-base deleted the cddl_performance branch November 16, 2021 22:06
@birkenfeld birkenfeld added the changelog-update Items which need to get mentioned in the changelog label Nov 17, 2021
hwayne pushed a commit to hwayne/pygments that referenced this pull request Nov 29, 2021
@Anteru Anteru added this to the 2.11.0 milestone Dec 12, 2021
@Anteru Anteru removed the changelog-update Items which need to get mentioned in the changelog label Dec 12, 2021