
Rewrite RubyLex to fix some bugs and make it possible to add new features easily #500

Merged
merged 7 commits into from Jun 15, 2023

@tompng (Member) commented Jan 9, 2023

Description

Fixes #499
In RubyLex, there are several parser-like methods, listed below. Fixing the bugs requires changes to almost all of them.

  • process_nesting_level
  • check_corresponding_token_depth
  • check_newline_depth_difference
  • check_string_literal
  • is_the_in_correspond_to_a_for
  • take_corresponding_syntax_to_kw_do
  • in_keyword_case_scope?
  • is_method_calling?
  • heredoc_scope?

I combined them all into a single parser that calculates the open tokens for each line, which also reduces duplicated code.

What I changed

Rewrite nesting parser

Create IRB::NestingParser and use it from several methods.

# Example of open tokens for each line
if true # open tokens: ['if']
  puts( # open tokens: ['if', '(']
    1   # open tokens: ['if', '(']
  )     # open tokens: ['if']
end     # open tokens: []
each_line_result = RubyLex::NestingParser.parse_line(tokens)
line_tokens, open_tokens_before_line, open_tokens_after_line, minimum_token_depth_in_line = each_line_result[line_index]

# Example
puts((
    1 + 2
  ) + [ # HERE
    1
  ].size
)
# line_tokens: ['  ', ')', ' ', '+', ' ', '[', ' ', "# HERE\n"]
# open tokens changes: '((' → '(' (minimum) → '(['
# open_tokens_before_line: ['(', '(']
# open_tokens_after_line: ['(', '[']
# minimum_token_depth_in_line: 1
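To make the per-line tuple concrete, here is a minimal sketch (not IRB::NestingParser itself) that computes open_tokens_before_line, open_tokens_after_line and minimum_token_depth_in_line for brackets only, using Ripper.lex from the standard library. The helper bracket_nesting_by_line and its constants are hypothetical. It omits line_tokens and deliberately ignores keywords, string literals and heredocs, which the real parser has to handle.

```ruby
require 'ripper'

# Hypothetical, simplified sketch: tracks only (, [ and {.
# Keywords (if/do/end), strings and heredocs are ignored.
OPEN_EVENTS = %i[on_lparen on_lbracket on_lbrace].freeze
CLOSE_EVENTS = %i[on_rparen on_rbracket on_rbrace].freeze

# Returns one [open_tokens_before_line, open_tokens_after_line,
# minimum_token_depth_in_line] triple per line of +code+.
def bracket_nesting_by_line(code)
  # Each Ripper.lex element is [[lineno, column], event, tok, state].
  tokens_by_line = Ripper.lex(code).group_by { |token| token[0][0] }
  opens = []
  (1..code.lines.size).map do |lineno|
    before = opens.dup
    min_depth = opens.size
    (tokens_by_line[lineno] || []).each do |_pos, event, tok, _state|
      if OPEN_EVENTS.include?(event)
        opens << tok
      elsif CLOSE_EVENTS.include?(event)
        opens.pop
        min_depth = [min_depth, opens.size].min # shallowest point in the line
      end
    end
    [before, opens.dup, min_depth]
  end
end

code = <<~RUBY
  puts((
      1 + 2
    ) + [ # HERE
      1
    ].size
  )
RUBY
p bracket_nesting_by_line(code)[2] # => [["(", "("], ["(", "["], 1]
```

The third line reproduces the values listed for the "# HERE" line above.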

The prompt (check_corresponding_token_depth, check_string_literal), the indent (process_indent_level, check_newline_depth_difference) and the termination check can all be calculated from the open tokens.
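Here is a hedged sketch of why one per-line result is enough. The triples are written by hand for the `if true ... end` example at the top, and each hypothetical helper (indent_columns, next_prompt_mark, code_terminated?) reads only that data. These are illustrations, not the RubyLex methods, and the prompt rule in particular is simplified.

```ruby
# Hand-written per-line results for the `if true ... end` example, as
# [open_tokens_before_line, open_tokens_after_line, minimum_token_depth_in_line].
LINE_RESULTS = [
  [[],          ['if'],      0], # if true
  [['if'],      ['if', '('], 1], #   puts(
  [['if', '('], ['if', '('], 2], #     1
  [['if', '('], ['if'],      1], #   )
  [['if'],      [],          0]  # end
].freeze

# Indent: the shallowest depth reached in the line, so a leading `)` or `end`
# dedents the line it is on.
def indent_columns(line_result, width: 2)
  line_result[2] * width
end

# Prompt (simplified): is the *next* line a continuation ('*') or a new statement ('>')?
def next_prompt_mark(line_result)
  line_result[1].empty? ? '>' : '*'
end

# Termination: the input is complete when nothing is open after the last line.
def code_terminated?(line_results)
  line_results.last[1].empty?
end

p LINE_RESULTS.map { |r| indent_columns(r) }   # => [0, 2, 4, 2, 0]
p LINE_RESULTS.map { |r| next_prompt_mark(r) } # => ["*", "*", "*", "*", ">"]
p code_terminated?(LINE_RESULTS)               # => true
p code_terminated?(LINE_RESULTS[0...-1])       # => false
```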

Update test

  • Add tests for NestingParser
  • Delete some tests that are now covered by the newly added tests
  • Fix tests that were testing broken behavior

Refactor

I did only minimal refactoring in ruby-lex.rb:

  • Reduce instance variables; split readmultiline from each_top_level_statement (moved to Simplify each_top_level_statement #576)
  • Refactor the free-indentation-inside-heredoc feature, because it needs to be implemented in both process_indent_level and check_corresponding_token_depth

Bugs that cannot be fixed in this pull-request

The following kind of code (an endless def inside a while condition) still cannot be indented correctly:

if false
  while def f() = p do end and ()[] do end
end
if false
  while def f() = p do end && ()[] do end
  end
end

The only difference between the two Ripper.lex results is [[2, 27], :on_kw, "and", BEG] versus [[2, 27], :on_op, "&&", BEG].
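The comparison can be reproduced with Ripper from the standard library. This sketch lexes only the `while` line of each snippet and drops positions, because `and` is one character longer than `&&` and every later column shifts. The helper name tokens_without_position is ours, not IRB's. If the observation above holds, the only pair printed is the `and`/`&&` one. The exact state values may vary between Ruby versions, and endless def needs Ruby 3.0+.

```ruby
require 'ripper'

# The `while` line of each snippet above.
line_with_and = "  while def f() = p do end and ()[] do end\n"
line_with_amp = "  while def f() = p do end && ()[] do end\n"

# Keep [event, tok, state] and drop the [line, column] position.
def tokens_without_position(code)
  Ripper.lex(code).map { |_pos, event, tok, state| [event, tok, state] }
end

differences = tokens_without_position(line_with_and)
              .zip(tokens_without_position(line_with_amp))
              .reject { |a, b| a == b }
differences.each { |a, b| puts "#{a.inspect}  vs  #{b.inspect}" }
```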

Other good things for the future

Testability

We can now test these functionalities separately.

  • calculate nesting for many ruby syntax
  • open tokens to indent conversion
  • open tokens to prompt conversion
  • actual indentation

Indent

We can now implement heredoc indentation like the code below.
Previously this was impossible, because the indent was calculated with indent += 1 and indent -= 1.

if true
  if true
    s = <<HEREDOC
#{ # nesting level gets deeper but indent gets shallower
  (
    1
  )
}
heredoc
HEREDOC
    puts s # restores indent
  end
end
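The following is a minimal sketch of how such an indent could be derived from the open tokens. It is not IRB's implementation. It assumes open tokens are plain strings, heredoc openers start with "<<", and each line comes with the (open_tokens_before_line, minimum_token_depth_in_line) pair from the description. The helper names are hypothetical. Heredoc body and terminator lines return nil, meaning "keep the user's own indentation".

```ruby
def heredoc_opener?(token)
  token.start_with?('<<')
end

# Indent in columns for one line, or nil for heredoc body/terminator lines.
def indent_for_line(open_tokens_before_line, minimum_token_depth_in_line, width: 2)
  innermost = open_tokens_before_line.last
  return nil if innermost && heredoc_opener?(innermost)

  # Closing tokens at the start of the line dedent it to the shallowest depth.
  opens = open_tokens_before_line.take(minimum_token_depth_in_line)
  # Heredoc content starts at column 0, so only tokens opened after the
  # innermost heredoc count. Indent gets shallower while nesting gets deeper,
  # which indent += 1 / indent -= 1 cannot express.
  last_heredoc = opens.rindex { |t| heredoc_opener?(t) }
  opens = opens.drop(last_heredoc + 1) if last_heredoc
  opens.size * width
end

outer = ['if', 'if', '<<HEREDOC']
p indent_for_line(outer, 3)               # `#{` line   => nil
p indent_for_line(outer + ['#{'], 4)      # `(` line    => 2
p indent_for_line(outer + ['#{', '('], 5) # `1` line    => 4
p indent_for_line(outer + ['#{', '('], 4) # `)` line    => 2
p indent_for_line(['if', 'if'], 2)        # `puts s`    => 4
```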

The parsing logic and the indent calculation logic are separated, so we can now easily update the indent of specific syntax.

# easy to implement this indent
words = %w[
  irb
  reline
]
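As a hypothetical illustration of that separation: if the indent rule only looks at open tokens, indenting %w[ like an array rather than like a string literal becomes a one-line change. The constants and the indent_word_arrays flag below are assumptions made for this sketch, not IRB options.

```ruby
# Hypothetical indent rule: open tokens are plain strings. Inside string-like
# literals the user's indentation is kept (nil).
STRING_LIKE_OPENERS = ['"', "'", '`', '/'].freeze
WORD_ARRAY_OPENERS = ['%w[', '%i['].freeze

def indent_for(open_tokens, indent_word_arrays: true, width: 2)
  free = STRING_LIKE_OPENERS
  free += WORD_ARRAY_OPENERS unless indent_word_arrays # treat %w[ like a string
  return nil if open_tokens.any? { |t| free.include?(t) }

  open_tokens.size * width
end

p indent_for(['%w['])                            # `irb` / `reline` lines => 2
p indent_for(['%w['], indent_word_arrays: false) # string-like handling   => nil
```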

Prompt

The prompt string is calculated from ltype, indent, continue and line_no, so we can now easily change it.

> string = "hello #{[
>   1, # ltype is currently `"` here because it is restricted to string-like literal
>   2, # changing ltype to `]` might be better
> ].join} world"
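A hedged sketch of the two ltype choices for the `1,` line of the example. build_prompt and both ltype helpers are hypothetical. The prompt format is only loosely modeled on IRB's default prompt, and open tokens are plain strings.

```ruby
STRING_LITERAL_OPENERS = ['"', "'", '`', '/'].freeze
CLOSER_OF = { '(' => ')', '[' => ']', '{' => '}', '#{' => '}' }.freeze

# Current behavior described above: ltype is the innermost *string-like* open token.
def ltype_string_only(open_tokens)
  open_tokens.reverse.find { |t| STRING_LITERAL_OPENERS.include?(t) }
end

# Possible alternative: report the closer of the innermost open token of any kind.
def ltype_innermost(open_tokens)
  t = open_tokens.last
  t && CLOSER_OF.fetch(t, t)
end

def build_prompt(ltype:, indent:, continue:, line_no:)
  mark = ltype || (continue ? '*' : '>')
  format('irb(main):%03d:%d%s ', line_no, indent, mark)
end

open_tokens = ['"', '#{', '['] # while typing the `1,` line
puts build_prompt(ltype: ltype_string_only(open_tokens), indent: open_tokens.size, continue: false, line_no: 2)
puts build_prompt(ltype: ltype_innermost(open_tokens), indent: open_tokens.size, continue: false, line_no: 2)
```

The first call prints `irb(main):002:3" ` and the second `irb(main):002:3] `.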

Completion

Currently, completion is implemented with regexps, so it can be inaccurate: array[index].??? shows Array methods and array.map{}.??? shows Hash | Proc methods.
By closing the incomplete code with closing tokens calculated from its open tokens, we can get an S-expression or a syntax tree of it and use that for more accurate completion.

closing_tokens_calculated_from_open_tokens = open_tokens.reverse.map { |t| closing_token_from(t) }
Ripper.sexp(incomplete_code + closing_tokens_calculated_from_open_tokens.join)
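A runnable version of the idea, under stated assumptions: open tokens are plain strings ordered outermost first (as in open_tokens_after_line earlier), and closing_token_from is a hypothetical lookup table. Closers must be appended innermost first, hence the reverse. Ripper.sexp returns nil when the completed code still has a syntax error.

```ruby
require 'ripper'

# Hypothetical stand-in for closing_token_from.
CLOSING_TOKEN = { '(' => ')', '[' => ']', '{' => '}', 'if' => 'end', 'do' => 'end' }.freeze

def closing_token_from(open_token)
  CLOSING_TOKEN.fetch(open_token)
end

# Close innermost first. Newlines keep `end` from gluing onto the previous token.
def sexp_of_incomplete_code(incomplete_code, open_tokens)
  closing = open_tokens.reverse.map { |t| closing_token_from(t) }
  Ripper.sexp(incomplete_code + "\n" + closing.join("\n"))
end

sexp = sexp_of_incomplete_code("if x\n  foo(1, [2, 3", ['if', '(', '['])
p sexp.nil? # => false: the completed code parses
```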

@tompng marked this pull request as draft January 9, 2023 14:50
@tompng (Member, Author) commented Jan 9, 2023

Changed to draft because it hits an infinite loop on Ruby 2.7. Trying to fix it now.

@tompng marked this pull request as ready for review January 9, 2023 19:40
@st0012 (Member) left a comment

Thank you so much for the rewrite 🙏 I believe it’ll be a tremendous improvement to IRB.
But given the size of the change, we may need to go through this slowly with multiple reviews. I hope that’ll be ok.

I’ve given it a couple of scans and I have a question: Should we make context an instance variable?

From what I see, RubyLex can be used as an instance (e.g. scanner.set_input) or as a helper class (e.g. RubyLex.generate_local_variables_assign_code). And currently both usages would take context as an argument.

But for the former usages, the invocations could be largely simplified if context could be stored as an instance variable. So I did a quick search on RubyLex.new and I think it should be possible to do:

  • In ShowSource it could access the context through irb_context (provided by Nop)
  • In Irb.initialize it has context defined just a couple lines prior.
  • In RubyLex.set_input it also has access to context through itself. I wonder why we need to initialise another RubyLex instance here though.

If the answer is yes, I will do that refactor before this PR is merged, which is likely to cause some conflicts. That's why I'm asking here 🙂

lib/irb/ruby-lex.rb (outdated, resolved)
@tompng (Member, Author) commented Jan 10, 2023

@st0012

> Should we make context an instance variable?

Thanks for the explanation. I think it's a very good idea too. My answer is yes.

> likely to cause some conflicts

I'll resolve them when needed 😄

@st0012 (Member) commented Jan 10, 2023

I ended up doing 3 types of refactor:

I hope once they're all merged, it'll make the rewrite even simpler 🙂

@st0012 (Member) commented Jan 14, 2023

@tompng All 3 refactor PRs have been merged 😄

@tompng (Member, Author) commented Jan 14, 2023

Thanks, I rebased it

@st0012 added the bug (Something isn't working) label Mar 1, 2023
@tompng marked this pull request as draft May 19, 2023 14:37
@tompng force-pushed the rewrite_rubylex branch 4 times, most recently from 3a92159 to b7b46ee May 20, 2023 04:50
@tompng marked this pull request as ready for review May 20, 2023 05:08
@tompng mentioned this pull request Jun 2, 2023
lib/irb/nesting_parser.rb (resolved)
lib/irb/nesting_parser.rb (outdated, resolved)
lib/irb/nesting_parser.rb (resolved)
lib/irb/ruby-lex.rb (outdated, resolved)
lib/irb/ruby-lex.rb (outdated, resolved)
lib/irb/ruby-lex.rb (outdated, resolved)
lib/irb/ruby-lex.rb (outdated, resolved)
lib/irb/ruby-lex.rb (outdated, resolved)
@st0012 (Member) left a comment

I like that we're getting more detailed coverage with this rewrite and I think some small tweaks can make the new tests even more approachable 👍

test/irb/test_nesting_parser.rb (outdated, resolved)
test/irb/test_nesting_parser.rb (outdated, resolved)
test/irb/test_nesting_parser.rb (outdated, resolved)
test/irb/test_nesting_parser.rb (outdated, resolved)
@tompng force-pushed the rewrite_rubylex branch 2 times, most recently from 16be166 to 6642387 June 13, 2023 10:48
@st0012 (Member) left a comment

This is amazing work. I think we're pretty close to merging it 👍

assert_equal(code.lines.size, line_results.size)
class_open, *inner_line_results, class_close = line_results
assert_equal(['class'], class_open[2].map(&:tok))
inner_line_results.each {|result| assert_equal(['class'], result[2].map(&:tok)) }
Member

Just curious: why do we pick class as the target for checking Ruby syntax? Is the assumption: "if the class is not accidentally closed by any of the complicated syntax inside it, then we assume the nesting parser is working correctly"?

Member Author

Yes, the intention of the test is to ensure that "class is not accidentally closed".
We could also use if true or other nesting syntax instead of class A.

Member

I think class A is fine. But perhaps in later PRs we can try to match the inner lines' tokens too to increase the coverage.

test/irb/test_nesting_parser.rb (outdated, resolved)
lib/irb/ruby-lex.rb (resolved)
lib/irb/ruby-lex.rb (outdated, resolved)
@st0012 (Member) left a comment

Let's merge it 🚀

@tompng merged commit 1b17101 into ruby:master Jun 15, 2023
24 checks passed
@tompng deleted the rewrite_rubylex branch June 15, 2023 15:40
matzbot pushed a commit to ruby/ruby that referenced this pull request Jun 15, 2023
add new features easily
(ruby/irb#500)

* Add nesting level parser for multiple use (indent, prompt, termination check)

* Rewrite RubyLex using NestingParser

* Add nesting parser tests, fix some existing tests

* Add description comment, rename method to NestingParser

* Add comments and tweak code to RubyLex

* Update NestingParser test

* Extract list of ltype tokens to constants
@smmr0 smmr0 mentioned this pull request Aug 20, 2023
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

# RubyLex cannot handle some edge-case code correctly
2 participants