New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize header parsing #513

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@glebm
Contributor

glebm commented May 18, 2018

Fixes #505

Benchmarks:

"setext":

ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'

"atx":

ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'

@glebm glebm force-pushed the glebm:header-parsing branch 2 times, most recently from bc1c4f8 to 5fa05e5 May 18, 2018

Optimize header parsing
Fixes #505

Benchmarks:

"setext":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'
```

"atx":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'
```

@glebm glebm force-pushed the glebm:header-parsing branch from 5fa05e5 to 9b35ff2 May 18, 2018

@krasnoukhov

Seems like this fixes a performance issue with no breaking changes. Good work!

@gettalong

Thanks for the pull request and your great work!

Before I merge, please also remove the changes to .gitignore and remove the zero-byte test/testcases/block/04_header/atx_header.hcd file.

@src.check(ATX_HEADER_MATCH)
level, text, id = @src[1], @src[2].to_s.strip, @src[3]
text, id = parse_header_contents
text.sub!(/[\t ]#+\z/, '') && text.rstrip!

This comment has been minimized.

@gettalong

gettalong May 23, 2018

Owner

Is there a reason why these two statements are on one line? I would rather prefer them on separate lines.

This comment has been minimized.

@glebm

glebm May 24, 2018

Contributor

There is no need to call rstrip! if sub! return nil.

The other options here are:

A:

text.rstrip! if text.sub!(...)

B:

if text.sub!(...)
  text.rstrip!
end

I find the current one (&&) to be the most readable but can go with A or B if that's what you prefer.

This comment has been minimized.

@gettalong

gettalong May 24, 2018

Owner

Personally I would just put them on separate lines and take this minimal performance hit because it would be easier to see what's going on without the additional conditional. But I'm okay with leaving this as it is.

@@ -13,47 +13,60 @@ module Kramdown
module Parser
class Kramdown

HEADER_ID=/(?:[ \t]+\{#([A-Za-z][\w:-]*)\})?/
SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/
SETEXT_HEADER_START = /^#{OPT_SPACE}(?<contents>.*)\n(?<level>[-=])[-=]*[ \t\r\f\v]*\n/

This comment has been minimized.

@gettalong

gettalong May 23, 2018

Owner

Why did you leave out [^ \t] in the new version? It is not strictly necessary for the main kramdown parser due to the fixed order of block parsers but behaviour of derived parser may change. As far as I can tell, it doesn't make a performance difference.

Also: What is the reason behind changing \s to [ \t\r\f\v]?

This comment has been minimized.

@glebm

glebm May 24, 2018

Contributor

Why did you leave out [^ \t] in the new version?

Because it's not necessary as we have to validate header length later on anyway. I have now added it back in. No performance difference.

Also: What is the reason behind changing \s to [ \t\r\f\v]?

I find constructs like \s*?\n a bit cryptic. Performance-wise both are the same.

By the way, do we actually want to accept \r\f\v here? CommonMark spec only mentions trailing spaces (https://spec.commonmark.org/0.28/#example-54), and also allows OPT_SPACE for the underline.


HEADER_ID = /[\t ]{#(?<id>[A-Za-z][\w:-]*)}\z/

# @return [[String, String]] header text and optional ID.

This comment has been minimized.

@gettalong

gettalong May 23, 2018

Owner

Please use standard RDoc comments.

This comment has been minimized.

@glebm

glebm May 24, 2018

Contributor

I'm not familiar with RDoc, and although I've read the RDoc Markup Reference, it doesn't seem to mention the correct markup for type hints, so I'm not sure what to do here.

The format I've used is YARD, which is also widely supported by IDEs for type hints. Can you convert to the desired format when merging?

This comment has been minimized.

@gettalong

gettalong May 24, 2018

Owner

RDoc doesn't have type hints.

Just leave out this API documentation (most of the other parsing methods don't have API documentation as well).

This comment has been minimized.

@glebm

glebm May 25, 2018

Contributor

Done


# @param [Number] level
# @param [String] text
# @param [String, nil] id

This comment has been minimized.

@gettalong

gettalong May 23, 2018

Owner

Please use standard RDoc comments.

This comment has been minimized.

@gettalong

gettalong May 24, 2018

Owner

Same here, just remove the API documentation.

This comment has been minimized.

@glebm

glebm May 25, 2018

Contributor

Done

glebm added some commits May 24, 2018

@gettalong

This comment has been minimized.

Owner

gettalong commented May 25, 2018

I just noticed that the PR doesn't include the test cases themselves. Could you add them or should I add them after I merge the changes?

And please squash the commits into one before I merge them - thank you!

@glebm

This comment has been minimized.

Contributor

glebm commented May 26, 2018

I just noticed that the PR doesn't include the test cases themselves. Could you add them or should I add them after I merge the changes?

Please add them yourself, it will be faster this way.

And please squash the commits into one before I merge them - thank you!

Nowadays you can do it yourself at merge time by clicking "Squash and merge"!

@gettalong

This comment has been minimized.

Owner

gettalong commented May 26, 2018

Merged - thanks!

@gettalong gettalong closed this May 26, 2018

@krasnoukhov

This comment has been minimized.

krasnoukhov commented May 30, 2018

@gettalong Will you cut a release please? Thank you

@gettalong

This comment has been minimized.

Owner

gettalong commented May 30, 2018

Yes, if I have time this weekend.

@glebm glebm deleted the glebm:header-parsing branch May 31, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment