Performance tuning for RegexLexer and HTML formatter #114

korny · 2013-11-21T02:14:51Z

TL;DR

Merge it to make Rouge 2x as fast.

Story

This branch is called "hiddensee" because that's where I am right now, on holiday ;-)

Continuing my tinkering with the internals of CodeRay and Rouge, I dug deeper into the DSL-/rule-based lexer system, comparing it to CodeRay's hand-written approach to find out where they differ and what's standing in the way of better performance. It took a few days, but I think I have a set of patches ready that make Rouge much faster than current master (especially faster than Pygments.rb) without changing any of the lexer's definitions.

I tried to reduce the changes to a minimum, and while some of them may have a relatively minor impact on performance, all of them together give Rouge a remarkable speed boost, running at least twice as fast in my benchmarks.

Benchmarks

The tests include example inputs in C, CSS, HTML, JavaScript, JSON, Perl, and Ruby.

Depending on input size, Pygments.rb is mostly faster than Rouge right now for files larger than 1kB. Below that, the additional overhead of switching to Python seems to slow it down.

With these patches, Rouge can match and surpass Pygments' speed, even for large files.

If you have Numbers, you can download the detailed result tables that I used to create the graph. All benchmarks were done with Ruby 2.0.0p247 on the Shootout v1.6.

Changes

Here's an overview of the changes, roughly ordered by the impact they have on performance:

Lexer / RegexLexer

Unroll deeply nested calls during rule recognition and rule execution. Especially:
- Remove #with_output_stream. I found no reason for the use of Enumerator::Yielder; the whole thing with the nested callbacks seemed pretty convoluted, but perhaps I'm missing something?
- Remove the block parameter from #step, because the value is static. Instead, @output_stream is set by #stream_tokens at the beginning.
- The three main actions (token, state push and :pop!) are no longer using DRY method calls, but are optimized to manipulate the stack and output stream directly. This allows for several other optimizations, like knowing that the state is already a Symbol when pushing. They also make use of the given stream parameter.
- ::get_state uses Symbols instead of Strings as keys for the states hash now, because we can directly try @states[next_state] when pushing.
- Inline #run_rule and #run_callback into the #step method to reduce the number of method calls considerably, and use local variables as far as possible. This allows using next instead of return.
Using if @debug as a guard for debug statements, because evaluating an instance variable is much faster than calling a method with a block. This eliminates the need for the hack that redefines the method when it's called for the first time, but makes debug { … } lines a bit more ugly.
Optimize the hotspot, #step:
- Use StringScanner#skip instead of StringScanner#scan, because it's faster, and the size of the match is more useful at this point than the match itself.
- Check @null_steps only after incrementing it.
- Use if instead of case in #step, and rely on rule being a Rule when it's not a State.
- Make Rule#beginning_of_line an attribute instead of a method.
Define @states as a cache for self.class.states.
Define @null_steps beforehand because we need it anyway.
Set the default value for the second parameter of #token directly in the signature. Using a magic value like :__absent__ seems strange to me.
Token#name, #parent, and #shortname are now attributes instead of methods.

HTML formatter

Replace CGI.escape_html (which is quite slow) with a simplified version that only escapes &, <, and >.
- In my opinion there's no need to escape ", ', and / outside of argument lists and script tags.
- I left your TODO in because I'm not sure what it was about.
Optimize the hotspot, #span:
- Combine yield calls as much as possible.
- Don't use inspect to wrap the token shortname in quotes. Its definition could change in the future, and it's faster to add the quotes to the wrapping strings.
- Call tok.shortname only once.
- Use if instead of case.
Remove block parameter in #stream_tableized and span (it already uses yield instead).

Open issues

There are more things that I wanted to optimize:

The @group_count = 0 line in #step looks awfully wasteful, but I couldn't find a better way.
Do we really need the @null_steps check? It seems to be a good tool for debugging, but the Lexers shouldn't have such bugs in the first place. Silently failing here is also a bit weird…it would be great to get rid of this construction.

Cheers!

korny · 2013-11-21T02:38:02Z

Pulling @siong1987, @robin850, and @mattr- into the discussion. Referencing jekyll/jekyll#930 and #41.

korny · 2013-11-21T02:42:21Z

Another graph, this time comparing a typical file size of 1kB in different languages:

mattr- · 2013-11-21T03:46:12Z

Based on the benchmarks provided here, having this merged would get me very excited about including this in Jekyll.

mattr- · 2013-11-21T03:46:48Z

Also, @korny, this is nice work. Thanks for the detailed description and the benchmarks.

jneen · 2013-11-21T05:56:59Z

Brilliant, merged. I'm okay with regex_lexer.rb being a bit ugly for the sake of performance, as long as it's still easy to write a lexer :).

jneen · 2013-11-21T05:58:59Z

I'd like to keep the @null_steps check unless there's a really pressing need to get rid of it. Ideally lexers would be bug-free, but that check ensures that the failure mode is "you get error tokens in the output" rather than "your server crashes because rouge infinite-looped". I'd be okay with it raising an error instead, for example, but I think the safety check is still necessary.

jneen · 2013-11-21T08:35:03Z

I killed RegexLexer#group (and hence the @group_count assignment) in c95a5d3 and d469551.

robin850 · 2013-11-21T15:55:22Z

Awesome work @korny! ❤️

jneen · 2013-11-21T18:14:46Z

I've got one or two lexers in the pipeline before I release (OCaml and SML in particular), and then I'll release 1.2.0.

siong1987 · 2013-11-21T19:33:19Z

@korny 👍

jneen · 2013-11-26T21:24:06Z

This is released in v1.2.0.

@kevin-buttercoin

…uby#114 Thanks to @kevin-buttercoin for this fix!

lots of performance tuning (combined in 1 commit)

cf83b49

jneen merged commit cf83b49 into rouge-ruby:master Nov 21, 2013

siong1987 mentioned this pull request Nov 21, 2013

Add support for the Rouge syntax highlighter jekyll/jekyll#930

Closed

lucidbee pushed a commit to BonsaiAI/rouge that referenced this pull request Mar 16, 2017

Fix scrollIntoView undefined bug when there is no hash, fixes rouge-r…

be28b78

…uby#114 Thanks to @kevin-buttercoin for this fix!

pyrmont mentioned this pull request May 30, 2019

JSON rewrite #1029

Merged

ashmaroli mentioned this pull request Jun 3, 2019

Bring back CGI.escape_html in Formatters::HTML #1154

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance tuning for RegexLexer and HTML formatter #114

Performance tuning for RegexLexer and HTML formatter #114

korny commented Nov 21, 2013

korny commented Nov 21, 2013

korny commented Nov 21, 2013

mattr- commented Nov 21, 2013

mattr- commented Nov 21, 2013

jneen commented Nov 21, 2013

jneen commented Nov 21, 2013

jneen commented Nov 21, 2013

robin850 commented Nov 21, 2013

jneen commented Nov 21, 2013

siong1987 commented Nov 21, 2013

jneen commented Nov 26, 2013

Performance tuning for RegexLexer and HTML formatter #114

Performance tuning for RegexLexer and HTML formatter #114

Conversation

korny commented Nov 21, 2013

TL;DR

Story

Benchmarks

Changes

Lexer / RegexLexer

HTML formatter

Open issues

korny commented Nov 21, 2013

korny commented Nov 21, 2013

mattr- commented Nov 21, 2013

mattr- commented Nov 21, 2013

jneen commented Nov 21, 2013

jneen commented Nov 21, 2013

jneen commented Nov 21, 2013

robin850 commented Nov 21, 2013

jneen commented Nov 21, 2013

siong1987 commented Nov 21, 2013

jneen commented Nov 26, 2013