Commit 3ad9db3

Optimize ripper bounds
Basically a port of ruby/ruby@c45f781 into Ruby.

It's quite effective, with a ~97% hit rate for me. It speeds things up from ~6.77x slower to only 4.07x slower. For the lexer's `on_sp` handling it also gives a bit of an improvement: from 1.04x slower to 1.10x faster.

I guess the class may be universally useful, but for now I just made it nodoc.
1 parent ceb4699 commit 3ad9db3

7 files changed

Lines changed: 91 additions & 15 deletions

lib/prism/lex_compat.rb

Lines changed: 9 additions & 2 deletions
@@ -23,6 +23,12 @@ module Prism
   #           def self.[]: (Integer value) -> State
   #         end
   #       end
+  #
+  #       class LineAndColumnCache
+  #         def initialize: (Source source) -> void
+  #
+  #         def line_and_column: (Integer byte_offset) -> [Integer, Integer]
+  #       end
   #     end
   #   end

@@ -837,6 +843,8 @@ def post_process_tokens(tokens, source, data_loc, bom, eof_token)
       prev_token_state = Translation::Ripper::Lexer::State[Translation::Ripper::EXPR_BEG]
       prev_token_end = bom ? 3 : 0
 
+      cache = Translation::Ripper::LineAndColumnCache.new(source)
+
       tokens.each do |token|
         # Skip missing heredoc ends.
         next if token[1] == :on_heredoc_end && token[2] == ""
@@ -851,8 +859,7 @@ def post_process_tokens(tokens, source, data_loc, bom, eof_token)
 
         if start_offset > prev_token_end
           sp_value = source.slice(prev_token_end, start_offset - prev_token_end)
-          sp_line = source.line(prev_token_end)
-          sp_column = source.column(prev_token_end)
+          sp_line, sp_column = cache.line_and_column(prev_token_end)
           # Ripper reports columns on line 1 without counting the BOM
           sp_column -= 3 if sp_line == 1 && bom
           continuation_index = sp_value.byteindex("\\")

lib/prism/parse_result.rb

Lines changed: 1 addition & 3 deletions
@@ -223,9 +223,7 @@ def deep_freeze
       freeze
     end
 
-    private
-
-    # Binary search through the offsets to find the line number for the given
+    # Binary search through the offsets to find the index for the given
     # byte offset.
     #--
     #: (Integer byte_offset) -> Integer

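The updated comment describes a binary search over the line-start offsets that yields a line index rather than a line number. A minimal sketch of that idea follows, assuming `offsets` is an ascending array of the byte offsets at which each line begins. The method name `find_line_index` is hypothetical and this is not Prism's actual implementation.

```ruby
# Hypothetical stand-in for a binary search over line-start offsets.
# `offsets` is ascending and offsets[0] == 0.
def find_line_index(offsets, byte_offset)
  # In find-minimum mode, bsearch_index returns the first index whose
  # offset lies past byte_offset, or nil when byte_offset is on the last line.
  first_after = offsets.bsearch_index { |offset| offset > byte_offset }
  (first_after || offsets.size) - 1
end

offsets = [0, 4, 9, 15] # line starts of "abc\ndefg\nhijkl\nmn"
p find_line_index(offsets, 4)  # => 1
p find_line_index(offsets, 16) # => 3
```

Returning a zero-based index lets a caller derive both values from one lookup: the line number is the index plus the starting line, and the column is the byte offset minus `offsets[index]`.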
lib/prism/translation/ripper.rb

Lines changed: 64 additions & 5 deletions
@@ -446,6 +446,64 @@ def self.sexp_raw(src, filename = "-", lineno = 1, raise_errors: false)
       autoload :SexpBuilder, "prism/translation/ripper/sexp"
       autoload :SexpBuilderPP, "prism/translation/ripper/sexp"
 
+      # Provides optimized access to line and column information.
+      # Ripper bounds are mostly accessed in a linear fashion, so
+      # we can try a linear scan first and fall back to binary search.
+      class LineAndColumnCache # :nodoc:
+        # How many should it look ahead/behind before falling back to binary searching.
+        WINDOW = 8
+        private_constant :WINDOW
+
+        #: (Source source) -> void
+        def initialize(source)
+          @source = source
+          @offsets = source.offsets
+          @hint = 0
+        end
+
+        #: (Integer byte_offset) -> [Integer, Integer]
+        def line_and_column(byte_offset)
+          @hint = new_hint(byte_offset) || @source.find_line(byte_offset)
+          return [@hint + @source.start_line, byte_offset - @offsets[@hint]]
+        end
+
+        private
+
+        def new_hint(byte_offset)
+          if @offsets[@hint] <= byte_offset
+            # Same line?
+            if (@hint + 1 >= @offsets.size || @offsets[@hint + 1] > byte_offset)
+              return @hint
+            end
+
+            # Scan forwards
+            limit = [@hint + WINDOW + 1, @offsets.size].min
+            idx = @hint + 1
+            while idx < limit
+              if @offsets[idx] > byte_offset
+                return idx - 1
+              end
+              if @offsets[idx] == byte_offset
+                return idx
+              end
+              idx += 1
+            end
+          else
+            # Scan backwards
+            limit = @hint > WINDOW ? @hint - WINDOW : 0
+            idx = @hint
+            while idx >= limit + 1
+              if @offsets[idx - 1] <= byte_offset
+                return idx - 1
+              end
+              idx -= 1
+            end
+          end
+
+          nil
+        end
+      end
+
       # :stopdoc:
       # This is not part of the public API but used by some gems.

@@ -489,6 +547,7 @@ def initialize(source, filename = "(ripper)", lineno = 1)
         @lineno = lineno
         @column = 0
         @result = nil
+        @line_and_column_cache = nil
       end
 
       ##########################################################################
@@ -4014,6 +4073,10 @@ def result
         @result ||= Prism.parse(source, partial_script: true, version: "current")
       end
 
+      def line_and_column_cache
+        @line_and_column_cache ||= LineAndColumnCache.new(result.source)
+      end
+
       ##########################################################################
       # Helpers
       ##########################################################################
@@ -4114,12 +4177,8 @@ def visit_write_value(node)
 
       # This method is responsible for updating lineno and column information
       # to reflect the current node.
-      #
-      # This method could be drastically improved with some caching on the start
-      # of every line, but for now it's good enough.
       def bounds(location)
-        @lineno = location.start_line
-        @column = location.start_column
+        @lineno, @column = line_and_column_cache.line_and_column(location.start_offset)
       end
 
       # :startdoc:

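The caching strategy in `LineAndColumnCache` can be tried in isolation. The following is a simplified, standalone sketch rather than a line-for-line copy of the class above: `HintedLineLookup`, `line_start_offsets`, and the `hits`/`misses` counters are hypothetical names added for illustration, the scan window differs slightly at its edges, and the result is a zero-based line index instead of a line number adjusted by `start_line`.

```ruby
# Illustrative re-implementation; all names here are hypothetical.
class HintedLineLookup
  WINDOW = 8 # lines to scan around the last answer before giving up

  attr_reader :hits, :misses

  def initialize(offsets)
    @offsets = offsets # ascending byte offsets of each line start
    @hint = 0          # line index returned by the previous lookup
    @hits = 0
    @misses = 0
  end

  # Returns [zero-based line index, column in bytes].
  def line_and_column(byte_offset)
    index = scan_near_hint(byte_offset)
    if index
      @hits += 1
    else
      @misses += 1
      index = binary_search(byte_offset)
    end
    @hint = index
    [index, byte_offset - @offsets[index]]
  end

  private

  # Linear scan over at most WINDOW lines on either side of the hint.
  def scan_near_hint(byte_offset)
    if @offsets[@hint] <= byte_offset
      last = [@hint + WINDOW, @offsets.size - 1].min
      (@hint..last).each do |idx|
        next_start = @offsets[idx + 1]
        return idx if next_start.nil? || next_start > byte_offset
      end
    else
      first = [@hint - WINDOW, 0].max
      (@hint - 1).downto(first) do |idx|
        return idx if @offsets[idx] <= byte_offset
      end
    end
    nil
  end

  # Fallback: the line is the one before the first start past byte_offset.
  def binary_search(byte_offset)
    first_after = @offsets.bsearch_index { |offset| offset > byte_offset }
    (first_after || @offsets.size) - 1
  end
end

# Byte offset of the start of every line in `source`.
def line_start_offsets(source)
  offsets = [0]
  source.each_byte.with_index { |byte, idx| offsets << idx + 1 if byte == 10 }
  offsets
end

lookup = HintedLineLookup.new(line_start_offsets("abc\ndefg\nhijkl\nmn"))
p lookup.line_and_column(10)   # => [2, 1]
p lookup.line_and_column(5)    # => [1, 1]
p [lookup.hits, lookup.misses] # => [2, 0]
```

Counters like `hits` and `misses` are one way to measure a hit rate on real input. Lookups that move through the file roughly in order stay within the window, and only long jumps pay for the binary search.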
rbi/generated/prism/lex_compat.rbi

Lines changed: 8 additions & 0 deletions
Generated file; diff not rendered by default.

rbi/generated/prism/parse_result.rbi

Lines changed: 2 additions & 2 deletions
Generated file; diff not rendered by default.

sig/generated/prism/lex_compat.rbs

Lines changed: 6 additions & 0 deletions
Generated file; diff not rendered by default.

sig/generated/prism/parse_result.rbs

Lines changed: 1 addition & 3 deletions
Generated file; diff not rendered by default.
