Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added location information to Kramdown::Element. #96

Merged
merged 1 commit into from Dec 8, 2013

Conversation

@jhund
Copy link
Contributor

@jhund jhund commented Nov 19, 2013

  • Uses StringScannerKramdown (which has current_line_number method) instead of StringScanner.
  • Records :location option on all Kramdown::Elements where this makes sense and where it is easily implemented.
  • Initializes nested StringScanners with their parent Element's :location as start_line_number.
  • Adds tests for :location information.
  • Adds tests for StringScannerKramdown.

See pull request for more details.

* Uses StringScannerKramdown (which has current_line_number method) instead of StringScanner.
* Records :location option on all Kramdown::Elements where this makes sense and where it is easily implemented.
* Initializes nested StringScanners with their parent Element's :location as start_line_number.
* Adds tests for :location information.
* Adds tests for StringScannerKramdown.

See pull request for more details.
@jhund
Copy link
Contributor Author

@jhund jhund commented Nov 19, 2013

How it works

Whenever we instantiate a new Kramdown::Element, we record the current source line number in the Element's :location option. I added a method #current_line_number to StringScannerKramdown that gives us the line number based on StringScanner's current position. We count all instances of \n from the beginning of the document to the current position to get the line number. This requires that all line endings in the source are normalized to \n.

Kramdown uses nested StringScanners for :span level elements (during #update_tree), and for some :block level elements. These StringScanners can give us only the line number in their context (a substring of the global source). In order to get the correct global line number, we add the parent Element's :location to the nested StringScanner's #current_line_number via the #start_line_number accessor.

StringScanner was made multi-byte character aware in Ruby 2.0 (via #char_pos). I use #char_pos to compute #current_line_number if it is available. Otherwise I fall back to #pos which works fine for single-byte characters. Please note that strings with multi-byte characters on versions of Ruby prior to 2.0 will yield incorrect line numbers.

Design considerations

  • Rather than patch the (core) StringScanner class, I subclassed it as StringScannerKramdown and added new features to the sub class. (Don't patch core classes!)
  • At one point I considered creating a factory method to instantiate new Elements and add location information in that method. However it turned out that with all the nested StringScanners, it was necessary to record location information in the various parser methods.
  • Only Kramdown::Parser::Kramdown and descendants record location. The HTML parser doesn't record location information. It could be added, however it was beyond the scope of the project.
  • At first I wanted to record :location information as a hash, however that turned out to be too complicated and I decided to just record the line number as Integer.

Performance impact

I haven't run a benchmark, so I'm not sure what the performance impact is. The biggest hit probably comes from computing StringScannerKramdown.current_line_number where we count all \n instances in the source document. I assume that's O(N).

If performance is an issue, then we could change all instances of start_line_number = @src.current_line_number to just record orig_pos and then update #current_line_number to accept a position override. Then we would count \n only if required (e.g. when we embed it into a warning).

Alternatively, there may be a more efficient way to keep track of line numbers in StringScanner. Maybe we could keep a running count of \ns rather than recounting them every time we need the current line number. We could e.g., remember the line count and position of the last invocation of #current_line_number and just count \ns since that position.

Element types that don't record :location information

  • :blank_line (too much noise)
  • :block_boundary (non-content element)
  • :eob (non-content element)
  • :escaped_chars (use text element type which doesn't have :location info)
  • :extensions: only partially implemented. (when generating warnings on start tag), too much work at this stage.
  • :html (would require implementing of location info for HTML parser, beyond scope for now).
  • :line_break (too much noise)
  • :raw_text (temporary element)
  • :smart_quotes (too much noise)
  • :table: location recorded only for table, not for nested elements (e.g., table rows)
  • :text (too much noise)

:location information can be added to all of the above if desired.

Loading

@ghost ghost assigned gettalong Nov 20, 2013
@gettalong
Copy link
Owner

@gettalong gettalong commented Nov 27, 2013

Thanks for the pull request and detailed information! I'm rather busy at the moment but I will try to have a look at it this weekend!

Loading

@gettalong gettalong merged commit 02d5b09 into gettalong:master Dec 8, 2013
@gettalong
Copy link
Owner

@gettalong gettalong commented Dec 8, 2013

I have run some benchmarks and the performance impact does not seem to be that much.

I also moved the custom StringScanner class to Kramdown::Utils.

Loading

@jhund
Copy link
Contributor Author

@jhund jhund commented Dec 10, 2013

Thank you for accepting the pull request! Good call on moving the custom
StringScanner.

On 2013-12-08 2:57 , Thomas Leitner wrote:

I have run some benchmarks and the performance impact does not seem to
be that much.

I also moved the custom StringScanner class to Kramdown::Utils.


Reply to this email directly or view it on GitHub
#96 (comment).

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants