Performance improvements #41

Merged
2 commits merged on Oct 21, 2012

Contributor

korny commented Oct 21, 2012

Hey, I'm playing around with Rouge right now. It's awesome to have a really good port of Pygments in Ruby.

The only thing it misses right now is comparable speed. In my tests (Ruby 1.9.3 and Python 2.7.2), rougify is about 4-7x slower than pygmentize.

To help things, I detected two hot spots that are easy to optimize. It should speed up highlighting by about 40%:

1.9.3-p194 ~/ruby/rouge:master time ruby -Ilib bin/rougify highlight code.json > /dev/null
real    0m3.437s

1.9.3-p194 ~/ruby/rouge:speedup time ruby -Ilib bin/rougify highlight code.json > /dev/null
real    0m2.077s

1.9.3-p194 ~/ruby/rouge:master time ruby -Ilib bin/rougify highlight code.rb > /dev/null
real    0m12.583s

1.9.3-p194 ~/ruby/rouge:speedup time ruby -Ilib bin/rougify highlight code.rb > /dev/null
real    0m6.885s
  • code.rb is the Rouge code (348K)
  • code.json is a simple JSON log file (216K)

Maybe we can find more such optimizations. Based on my own experience, I firmly believe that Ruby code can be as fast as Python code without sacrificing too much beauty.

Owner

jneen commented Oct 21, 2012

Neat, thanks! The @debug hotspot is a great one to optimize. I'm a bit wary of extending core classes like Regexp, especially since the only reason we need that is for a hacky workaround to a Ruby bug. But I think it should be possible to cache that on the Rule object instead, since a Rule never changes its regex.
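
A minimal sketch of what caching on the rule could look like. The `Rule` class and its `anchored` method below are hypothetical and are not Rouge's actual code. The thread does not say which derived value is being cached, so the anchored copy of the pattern is purely an assumption chosen for illustration.

```ruby
# Hypothetical Rule class; not Rouge's actual implementation.
class Rule
  attr_reader :pattern, :token

  def initialize(pattern, token)
    @pattern = pattern
    @token = token
  end

  # Derived regexp (an assumed example of "that"), built on first use
  # and reused afterwards. Memoizing is safe because a rule never
  # changes its pattern, and no core class has to be reopened.
  def anchored
    @anchored ||= Regexp.new("\\A(?:#{@pattern.source})", @pattern.options)
  end
end

rule = Rule.new(/\d+/, :number)
puts rule.anchored.equal?(rule.anchored) # the same object is returned each time
puts rule.anchored.match("42abc")[0]
```

The cache lives in an instance variable of the rule, so Regexp itself stays untouched.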

jneen added a commit that referenced this pull request Oct 21, 2012

Merge pull request #41 from korny/speedup
Performance improvements

jneen merged commit 9343724 into jneen:master Oct 21, 2012

1 check passed

default The Travis build passed

Contributor

korny commented Oct 21, 2012

> But I think it should be possible to cache that on the Rule object instead, since a Rule never changes its regex.

Great idea, I didn't see that :-)

Contributor

korny commented Oct 31, 2012

Just wanted to share this with you.

[Image: Rouge performance comparison]

Rouge is almost as fast as Pygments now, at least on my system.

Owner

jneen commented Oct 31, 2012

Neat! That's very... validating :)

siong1987 commented Aug 22, 2013

@korny I wonder whether you have the latest data on the speed of Rouge 4.0 compared to others?

korny deleted the korny:speedup branch Aug 23, 2013

Contributor

korny commented Aug 23, 2013

I only have benchmarks up to 0.3.5 right now. Were there any performance-related changes since then?

Contributor

korny commented Aug 23, 2013

[Image: highlighting-performance]

"speedup 2" were some experimental changes for Rouge that I played around with.

siong1987 commented Aug 23, 2013

Thanks. I am just wondering. :)

siong1987 commented Aug 23, 2013

One of the reasons that Jekyll isn't using Rouge (instead of Pygments) is the performance concern. Judging from your graph, Rouge is actually comparable in terms of speed.

Maybe it is time to revisit this issue: jekyll/jekyll#930

Owner

jneen commented Aug 24, 2013

@korny which version of Ruby are you testing against? I know they've focused a lot on performance lately, and I'd be interested to see how it shakes out on, e.g., the falcon fork. Do you have a script to automate that graph? I'd love to play around with it.

siong1987 commented Aug 24, 2013

@jayferd Yeah, it would be cool if @korny could commit the script to the repo. 👍

Contributor

korny commented Aug 24, 2013

@jayferd Tests were done with Ruby 2.0.0p247 (the latest one). Didn't write a script, just ran some basic commands using time. On holiday right now, but I'll create a repo containing the sample code etc. when I get back.

Ruby 2 is much faster than 1.8, and marginally faster than 1.9. I think it's fair to compare Ruby 2.0 with Python 2.7 or 3, but I don't know about specific setups. I'm always testing on a 2011 MacBook Pro (i7) which has awesome Ruby (= single-thread) performance.

The biggest issue with my stats is that I don't know enough about Python to really tune it. For example, I only tested with CPython 2.7 and 3 (3 is slower), but there seem to be faster implementations out there. We should ask the Pygments guys how to properly benchmark it.

Also, how about exploring the possibilities of precompilation or some other nice hacks to make Rouge even faster? CodeRay shows that Ruby can scan code much faster using ad-hoc scanners instead of a DSL. If we could somehow make the DSL-based scanners compile into single-method scanners, it would make CodeRay mostly obsolete.

Owner

jneen commented Aug 25, 2013

I should also mention that the ruby lexer in particular catches a few more edge cases than the one in pygments - I don't remember which ones exactly, but it's possible it makes it slower, especially since the rouge code contains so much complicated regex syntax. I'll take a minute soon to make sure pygments actually lexes rouge correctly.

Contributor

korny commented Aug 29, 2013

As for the results, I'd say that Rouge is in the same ballpark as Pygments (and Ruby wrappers around it), but still a bit slower.

The most interesting output format would be HTML snippets with CSS classes. Two ways to look at it: speed (input code kilobytes per second) or time (how many seconds it takes to highlight a file).

Speed

[Image: speed]

Time

[Image: time]

Owner

jneen commented Aug 29, 2013

Cool! I just pushed a branch I'd like to try this out on (spike.token-perf). I'll also take a look at the json and ruby lexers in particular, since a lot of optimization can be done on a per-lexer basis.

siong1987 commented Aug 29, 2013

@korny 👍

Contributor

robin850 commented Aug 29, 2013

@korny: Awesome, thank you! Could you also tell us what you used to make these nice graphs, please?

Contributor

korny commented Aug 29, 2013

@robin850 That's just Numbers :-) It's pretty awesome for making some quick graphs, because it's more flexible than Excel when it comes to layout.

Here's the file: http://rubychan.de/share/Shootout.numbers

Owner

jneen commented Aug 30, 2013

Cool, I just merged a few commits that tightened things up a bit, the main one being that lexers no longer specify strings as tokens, but constants. So instead of rule /regex/, 'Literal.String.Regex', it's rule /regex/, Literal::String::Regex, which eliminates token lookup entirely. But it's a major API change, so it's gonna be in 0.5.
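
A toy illustration of why constants avoid a lookup. The `Token` module, the `REGISTRY` hash and `lookup` are invented for this sketch and are not Rouge's API. Only the idea comes from the comment above: a string token name has to be resolved to a token object, while a constant already is the object.

```ruby
# Toy token hierarchy; names are illustrative, not Rouge's real classes.
module Token
  module Literal
    module String
      Regex = :literal_string_regex
    end
  end

  # String-keyed registry, standing in for name-based token lookup.
  REGISTRY = { 'Literal.String.Regex' => Literal::String::Regex }.freeze

  def self.lookup(name)
    REGISTRY.fetch(name)
  end
end

# String form: the name must be resolved through the registry.
by_name = Token.lookup('Literal.String.Regex')

# Constant form: Ruby resolves the constant where the rule is written,
# so no registry lookup is needed.
by_constant = Token::Literal::String::Regex

puts by_name.equal?(by_constant)
```

Both forms end up at the same token object; the constant form just skips the registry step.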

Contributor

korny commented Aug 30, 2013

@jayferd:

> lexers no longer specify strings as tokens, but constants

Interesting. Did you find this to be faster?

Contributor

korny commented Aug 30, 2013

I benchmarked again with latest rouge master, and it's now faster than Pygments for both HTML and Ruby!

[Image: rouge-master]

Looks very promising. If @jayferd tweaks the lexers a bit more, I'm pretty sure Rouge can finally beat Pygments in all categories…

(I also re-benchmarked Pygments.rb and Albino; the numbers were strange. Pygments.rb is supposed to be faster than Albino because it seems to avoid reloading the Python bridge every time, but Albino can't be much faster than calling pygmentize…)

Contributor

robin850 commented Aug 30, 2013

@korny : Thank you! :-)

Owner

jneen commented Aug 30, 2013

:D. Yep, that's why I waited to merge it until I'd run the shootout.

Contributor

korny commented Aug 31, 2013

I think we're doing this wrong. What I did was benchmarking performance for very large files (200 kB and up). For a blog generator like Jekyll, the most typical input for the syntax highlighter would be a short code example, somewhere between 1 and 20 lines of code. So we're talking about less than 1 kB.

This matters a lot because even Pygments.rb still has a small overhead for calling the Python engine. For HTML output, this overhead is ~1ms. Every call takes at least this long.

So for a small input (let's take 42 bytes), Rouge beats Pygments by a large margin:

                    Rouge 0.5.0   Pygments.rb 0.5.2
   C (0 kB)
=> html                 0.15 ms             1.11 ms

HTML (0 kB)
=> html                 0.23 ms             1.23 ms

JSON (0 kB)
=> html                 0.23 ms             1.13 ms

RUBY (0 kB)
=> html                 0.26 ms             1.33 ms

If we look at an input size of 400 bytes, Rouge is still faster:

                    Rouge 0.5.0   Pygments.rb 0.5.2
   C (0 kB)
=> html                 2.70 ms             2.75 ms

HTML (0 kB)
=> html                 1.57 ms             1.74 ms

JSON (0 kB)
=> html                 1.82 ms             2.08 ms

RUBY (0 kB)
=> html                 3.11 ms             3.84 ms

For 1000 bytes, the winner starts to depend on the language:

                    Rouge 0.5.0   Pygments.rb 0.5.2
   C (1 kB)
=> html                 6.97 ms             7.31 ms

HTML (1 kB)
=> html                 3.95 ms             2.64 ms

JSON (1 kB)
=> html                 4.28 ms             3.41 ms

RUBY (1 kB)
=> html                 6.97 ms             8.53 ms

(Of course, this all depends heavily on the kind of code you input. For a single multi-line comment you get different numbers than a code golfing competition winner. I tried to use real-world code examples, but YMMV.)

At the high end (10 kB and up), Rouge is clearly slower than Pygments (except for Ruby code):

                    Rouge 0.5.0   Pygments.rb 0.5.2
   C (10 kB)
=> html                64.50 ms            45.48 ms

HTML (10 kB)
=> html                43.24 ms            20.84 ms

JSON (10 kB)
=> html                42.79 ms            24.35 ms

RUBY (10 kB)
=> html                78.34 ms            76.47 ms

For very large files (1 MB), Pygments wins:

                    Rouge 0.5.0   Pygments.rb 0.5.2
   C (1000 kB)
=> html              6454.04 ms          4596.14 ms

HTML (1000 kB)
=> html              3746.45 ms          1720.50 ms

JSON (1000 kB)
=> html              4261.04 ms          2176.45 ms

RUBY (1000 kB)
=> html              3433.13 ms          3189.98 ms

So, instead of looking at raw theoretical performance for large data sets, I suggest we should look at actual real-world cases that would affect Jekyll's performance.
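
To make the size dependence easier to see, the C rows of the tables above can be reduced to one ratio per input size. This is only a sketch: the timings are copied from the tables, the helper name is made up, and the byte counts for the "10 kB" and "1000 kB" rows assume 1 kB = 1000 bytes.

```ruby
# Timings for C => html copied from the tables above, in milliseconds.
# Keys are input sizes in bytes (assuming 1 kB = 1000 bytes);
# values are [Rouge 0.5.0, Pygments.rb 0.5.2].
C_TIMINGS_MS = {
  42        => [0.15, 1.11],
  400       => [2.70, 2.75],
  1_000     => [6.97, 7.31],
  10_000    => [64.50, 45.48],
  1_000_000 => [6454.04, 4596.14]
}.freeze

# A ratio below 1.0 means Rouge took less time than Pygments.rb.
def rouge_to_pygments_ratio(rouge_ms, pygments_ms)
  rouge_ms / pygments_ms
end

C_TIMINGS_MS.each do |bytes, (rouge_ms, pygments_ms)|
  ratio = rouge_to_pygments_ratio(rouge_ms, pygments_ms)
  faster = ratio < 1.0 ? 'Rouge' : 'Pygments.rb'
  puts format('%9d bytes  ratio %.2f  faster: %s', bytes, ratio, faster)
end
```

For C, the ratio crosses 1.0 somewhere between the 1000-byte and 10 kB inputs, which is consistent with the fixed per-call overhead dominating only for small snippets.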

siong1987 commented Aug 31, 2013

Nice observation.

It's good to know that Rouge is faster on small files. A side effect of improving speed for files in general is that small files may get faster too.

We should definitely bring this up on Jekyll.

Owner

jneen commented Sep 1, 2013

Yeah, perhaps a better test would be to sequentially lex all the demos a number of times (since a blog post is likely to have lots of small snippets). That would test across all the lexers as well, and I think there are some that are pretty inefficient (although honestly ruby is one of the most complex ones).

The other angle is eliminating hangs, which certain inputs can cause (see #77 and #78).
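
A rough sketch of the "lex all the demos a number of times" benchmark, using only Ruby's standard `Benchmark` module. The `HIGHLIGHT` lambda is a stand-in that merely escapes HTML so the example runs without any gems, and the `DEMOS` snippets are made up. A real run would call the highlighter under test and load one demo per lexer.

```ruby
require 'benchmark'

# Stand-in for a real highlighter call; swap in the library under test.
# It only escapes HTML so the example runs without any gems installed.
HIGHLIGHT = lambda do |source|
  source.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Hypothetical demo snippets keyed by language.
DEMOS = {
  'ruby' => "puts 'hello'\n",
  'json' => "{\"a\": [1, 2, 3]}\n",
  'c'    => "int main(void) { return 0; }\n"
}.freeze

# Highlights every demo `repeats` times and returns the mean
# wall-clock time per call in milliseconds.
def mean_ms_per_call(demos, repeats, highlighter)
  calls = demos.size * repeats
  seconds = Benchmark.realtime do
    repeats.times do
      demos.each_value { |source| highlighter.call(source) }
    end
  end
  seconds * 1000.0 / calls
end

puts format('%.4f ms per call', mean_ms_per_call(DEMOS, 1_000, HIGHLIGHT))
```

Averaging over many short calls keeps per-call overhead in the measurement, which is the part that matters for a page full of small snippets.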

marcamillion commented Jun 4, 2016

Any further updates on these tests, guys? Have recent optimizations made Rouge faster than Pygments in all categories for all file sizes?

Contributor

korny commented Jun 4, 2016

I ran the shootout again with the latest versions of Ruby 2.3, Python 2.7, Rouge, and Pygments.rb. Lower times are better.

$ rake SHOOTERS="Rouge Pygments.rb" LANGUAGES=c METRIC=time SIZES="[42, 400, 1000, 10_000, 1_000_000]" REPEATS=1 FORMATS=html

                       Welcome to
  ~~~ The Great Syntax Highlighter Shootout v1.7 ~~~

using Ruby 2.3.1 and Python 2.7.11, repeating 1 times

                   Rouge 1.10.1   Pygments.rb 0.6.3
using 42 bytes
   C (10000 repeats)
=> html                 0.03 ms             0.66 ms
---------------------------------------------------
Total score           1211 kB/s             64 kB/s
Relative                                     5.27 %

using 400 bytes
   C (2500 repeats)
=> html                 0.79 ms             1.73 ms
---------------------------------------------------
Total score            506 kB/s            231 kB/s
Relative                                    45.73 %

using 1000 bytes
   C (1000 repeats)
=> html                 2.14 ms             3.50 ms
---------------------------------------------------
Total score            466 kB/s            285 kB/s
Relative                                    61.17 %

using 10000 bytes
   C (100 repeats)
=> html                19.76 ms            27.17 ms
---------------------------------------------------
Total score            506 kB/s            368 kB/s
Relative                                    72.73 %

using 1000000 bytes
   C (1 repeats)
=> html              1959.04 ms          2706.23 ms
---------------------------------------------------
Total score            510 kB/s            370 kB/s
Relative                                    72.39 %

Rouge is clearly faster than Pygments.rb in all cases tested here (C to HTML at every input size).
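
For readers wondering how the "Total score" and "Relative" lines relate to the millisecond timings, the 1000000-byte row can be reproduced with a little arithmetic. This is a sketch with made-up helper names, not the shootout's own code, and it assumes 1 kB = 1000 bytes, which matches the printed figures.

```ruby
# Throughput in kB/s, taking 1 kB = 1000 bytes (an assumption that
# matches the figures printed above).
def throughput_kb_per_s(bytes, milliseconds)
  (bytes / 1000.0) / (milliseconds / 1000.0)
end

# Second tool's throughput as a percentage of the first tool's.
# For equal input sizes this is just the inverse ratio of the times.
def relative_percent(first_ms, second_ms)
  100.0 * first_ms / second_ms
end

rouge_ms = 1959.04
pygments_ms = 2706.23
bytes = 1_000_000

puts throughput_kb_per_s(bytes, rouge_ms).round        # 510
puts throughput_kb_per_s(bytes, pygments_ms).round     # 370
puts relative_percent(rouge_ms, pygments_ms).round(2)  # 72.39
```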

marcamillion commented Jun 8, 2016

Awesome. Thanks @korny. Just what I was looking for.
