Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Lint/OutOfRangeRefInRegexp cop #7755 #8407

Merged

Conversation

@sonalinavlakhe
Copy link
Contributor

@sonalinavlakhe sonalinavlakhe commented Jul 27, 2020

This cop looks for out of range referencing for Regexp, as while capturing groups out of range reference always returns nil.

  /(foo)bar/ =~ 'foobar'

  #   bad - always returns nil
  #   puts $2 # => nil

  #   good
  #   puts $1 # => foo

Issue: #7755

Before submitting the PR make sure the following are checked:

  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.
Copy link
Contributor

@marcandre marcandre left a comment

Interesting idea :-)

I think this cop can only marked as unsafe though, as I don't think there's a way to be sure of the diagnostic.

lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@marcandre marcandre left a comment

Pretty much done. Must be rebased to resolve conflict in Changelog.

Naming: OutOfRangeRefInRegexp => OutOfRangeRegexpRef ?

@@ -1,4 +1,4 @@
= Installation
= Installation

This comment has been minimized.

@marcandre

marcandre Jul 30, 2020
Contributor

You probably didn't mean to change this

@@ -1620,6 +1620,12 @@ Lint/OrderedMagicComments:
Enabled: true
VersionAdded: '0.53'

Lint/OutOfRangeRefInRegexp:
Description: 'Checks for out of range reference for Regep because it always returns nil.'

This comment has been minimized.

@marcandre

marcandre Jul 30, 2020
Contributor

Regep => Regexp

@sonalinavlakhe sonalinavlakhe force-pushed the sonalinavlakhe:feature/out_of_range_ref_in_regexp branch from d9c62a4 to a3497ff Jul 30, 2020
@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Jul 30, 2020

@marcandre, I have resolved all the suggested changes, please have a look and let me know.

Copy link
Contributor

@marcandre marcandre left a comment

I'm a tough reviewer, I hope you don't mind 😅.

There should be a test for encountering $4 before any regexp. You'll want to do an initialization in on_new_investigation

MSG = 'Do not use out of range reference for the Regexp.'

def on_regexp(node)
@valid_ref = cop_config['Count']

This comment has been minimized.

@marcandre

marcandre Jul 30, 2020
Contributor

I'm not sure I understand cop_config['Count'], maybe you simply meant nil?

named_capture += 1 if e.instance_of?(Regexp::Expression::Group::Named)
numbered_capture += 1 if e.instance_of?(Regexp::Expression::Group::Capture)
Comment on lines 52 to 53

This comment has been minimized.

@marcandre

marcandre Jul 30, 2020
Contributor

Is it possible to rely on e.type instead of the class?

This comment has been minimized.

@sonalinavlakhe

sonalinavlakhe Jul 31, 2020
Author Contributor

yeah, we can do it but we have to check with the token instead.

tree.each_expression do |e|
  named_capture += 1 if e.token == :named
  numbered_capture += 1 if e.token == :capture
end

This comment has been minimized.

@marcandre

marcandre Jul 31, 2020
Contributor

Neither seem great to me, but that might be the gem's fault. Could you use e.capturing?, and then e.respond_to?(:name), say?

This comment has been minimized.

@sonalinavlakhe

sonalinavlakhe Jul 31, 2020
Author Contributor

I can check with e.capturing? but again we cant check further for e.respond_to?(:name).

This comment has been minimized.

@marcandre

marcandre Jul 31, 2020
Contributor

Sorry, why?

This comment has been minimized.

@sonalinavlakhe

sonalinavlakhe Aug 1, 2020
Author Contributor

@marcandre, e.capturing? will raise an error
undefined method capturing? for #<Regexp::Expression::CharacterType::Word:0x00007fb466af9cf0>
for all non-group type expressions so better we to use e.type?(:group) then e.respond_to?(:name)

This comment has been minimized.

@marcandre

marcandre Aug 1, 2020
Contributor

e.type?(:group) && e.respond_to?(:name) 👍, I think it's much better than relying on the class, even if longer...

Copy link
Contributor

@marcandre marcandre left a comment

I'm a tough reviewer, and I don't see everything from the get go, I hope you don't mind 😅

You should add a test if $4 is encountered before any regexp. You'll need define on_new_investigation too.

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Jul 31, 2020

Never mind @marcandre, for me, it's a huge learning opportunity 😄

What I understood from the below comment
" you should add a test if $4 is encountered before any regexp. You'll need to define on_new_investigation too"
that you must be referring below example

$4
/(foo)(bar)/ =~ "foobar"

Do I need to initialize @valid_ref to nil in the on_new_investigation method? if so then I am not able to understand the use case for the on_new_investigation method because we are not creating any offense if we got the reference before any regexp.
Please suggest.

@marcandre
Copy link
Contributor

@marcandre marcandre commented Jul 31, 2020

Do I need to initialize @valid_ref to nil in the on_new_investigation method?

Yes

if so then I am not able to understand the use case for the on_new_investigation method because we are not creating any offense if we got the reference before any regexp.

First, shouldn't we create an offense then?
Second: in warning mode you would get a 'uninitialized instance variable' I think. Also running the cop on a second source would still hold the result form the previous one.

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Jul 31, 2020

First, shouldn't we create an offense then?

Please suggest we should consider this case to raise offense.

Second:` in warning mode you would get a 'uninitialized instance variable' I think. Also running the cop on a second source would still hold the result form the previous one.

I got this one. Thanks 👍

@marcandre
Copy link
Contributor

@marcandre marcandre commented Jul 31, 2020

Please suggest we should consider this case to raise offense.

Sorry, I don't understand... If I'm not mistaken, def foo; $4; end should raise an offense, in that it can not be non nil, so maybe we should initialize @valid_ref to 0, no?

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Aug 1, 2020

Sorry, I don't understand... If I'm not mistaken, def foo; $4; end should raise an offense, in that it can not be non nil, so maybe we should initialize @valid_ref to 0, no?

Yes, you are right. only for regular expression with non-literals, I have to make @valid_ref set to nil, so we will have a check to not raise an offense.

Copy link
Contributor

@marcandre marcandre left a comment

Great! The implementation looks really good to me 🎉

You're still missing a test for $3 without a regexp though...

it 'does not register an offence when containing a ivar' do
expect_no_offenses(<<~'RUBY')
@var = '(\d+)'
/(?<foo>#{@var}*)/ =~ "12"
puts $1
puts $3
RUBY
end

it 'does not register an offence when containing a cvar' do
expect_no_offenses(<<~'RUBY')
@@var = '(\d+)'
/(?<foo>#{@@var}*)/ =~ "12"
puts $1
puts $4
RUBY
end

it 'does not register an offence when containing a gvar' do
expect_no_offenses(<<~'RUBY')
$var = '(\d+)'
/(?<foo>#{$var}*)/ =~ "12"
puts $1
puts $2
RUBY
end

it 'does not register an offence when containing a method' do
expect_no_offenses(<<~'RUBY')
def do_something
'(\d+)'
end
/(?<foo>#{do_something}*)/ =~ "12"
puts $1
puts $4
RUBY
end

it 'does not register an offence when containing a constant' do
expect_no_offenses(<<~'RUBY')
CONST = "12"
/(?<foo>#{CONST}*)/ =~ "12"
puts $1
puts $3
RUBY
end
Comment on lines 83 to 128

This comment has been minimized.

@marcandre

marcandre Aug 1, 2020
Contributor

All these tests can be removed, we only need to test one "dynamic" case.

This comment has been minimized.

@sonalinavlakhe

sonalinavlakhe Aug 1, 2020
Author Contributor

hi @marcandre but actually these specs are self-explanatory and devs who are reading these will get to understand the code better is what I think, please give an example of how this can be improved better with dynamic test case.

This comment has been minimized.

@marcandre

marcandre Aug 1, 2020
Contributor

What I meant to say is that the previous test (with a variable) is sufficient; a good implementation (like we have now) can not pass the variable test and fail the instance variable, the method call, etc... or vice versa. They all test the same idea:is the regexp dynamic or not. It's not about what is inside the #{...}, just that there is a #{...} or not. Otherwise to be "complete", we'd have to add all possibilities, for example method call with block, or with rescue, or literal, or ...

it 'does not register offense when using a Regexp cannot be processed by regexp_parser gem' do
expect_no_offenses(<<~'RUBY')
/data = ({"words":.+}}}[^}]*})/m
RUBY

This comment has been minimized.

@marcandre

marcandre Aug 1, 2020
Contributor

Might as well add $3 or something. Hopefully the Regexp parser will handle this one day and that test will fail, but right now it would always pass.

This comment has been minimized.

@marcandre

marcandre Aug 3, 2020
Contributor

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

This comment has been minimized.

@sonalinavlakhe

sonalinavlakhe Aug 3, 2020
Author Contributor

I have added $1 in last commit to fix this. please refer this

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Aug 2, 2020

@marcandre, Please review the updated changes. let me know all good or we can still improve 😄

Copy link
Contributor

@marcandre marcandre left a comment

Believe it or not, I found something else 🤣

module Lint
# This cops looks for out of range referencing for Regexp, as while capturing groups out of
# out of range reference always returns nil.

This comment has been minimized.

@marcandre

marcandre Aug 2, 2020
Contributor

I just noticed that the doc file is incomplete, because you have a missing # here


# @example
# /(foo)bar/ =~ 'foobar'

This comment has been minimized.

@marcandre

marcandre Aug 2, 2020
Contributor

and here...


# # bad - always returns nil
# puts $2 # => nil

This comment has been minimized.

@marcandre

marcandre Aug 2, 2020
Contributor

and here. Might be a cop idea! 😆

This comment has been minimized.

@sonalinavlakhe

sonalinavlakhe Aug 3, 2020
Author Contributor

Cool, hope I fixed everything now 🤣 🤞

@sonalinavlakhe sonalinavlakhe force-pushed the sonalinavlakhe:feature/out_of_range_ref_in_regexp branch from 9d91d81 to c33997b Aug 3, 2020
Copy link
Contributor

@marcandre marcandre left a comment

Last tweaks, I promise 😅

it 'does not register offense when using a Regexp cannot be processed by regexp_parser gem' do
expect_no_offenses(<<~'RUBY')
/data = ({"words":.+}}}[^}]*})/m
RUBY

This comment has been minimized.

@marcandre

marcandre Aug 3, 2020
Contributor

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

module RuboCop
module Cop
module Lint
# This cops looks for out of range referencing for Regexp, as while capturing groups out of

This comment has been minimized.

@marcandre

marcandre Aug 3, 2020
Contributor

I'm not a native speaker, but the "as while" sounds strange to me. How about?

"This cops looks for references of Regexp captures that are out of range and thus always returns nil."

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Aug 4, 2020

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

I have added $1 in the last commit to fix this. please refer this

@marcandre
Copy link
Contributor

@marcandre marcandre commented Aug 4, 2020

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

I have added $1 in the last commit to fix this.

Right, but the "unparseable" regexp has one captured group, so $1 is not out-of-range...

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Aug 5, 2020

Right, but the "unparseable" regexp has one captured group, so $1 is not out-of-range...

Sorry, I missed that one, when I tried to use other than $1 i.e $2 or $3 this test is failing. I checked more on this and conclude that regex_parser now able to parse this type of expression.
I checked with a series of examples - #8083 and ammar/regexp_parser#15. regex_parser parsing those without throwing an exception

irb(main):001:0> require 'regexp_parser'
=> true
irb(main):002:0> Regexp::Parser.parse('data = ({"words":.+}}}[^}]*})')
=> #<Regexp::Expression::Root:0x00007fa07a9c5c08 @type=:expression, @token=:root, @text="", @ts=0, @level=nil, @set_level=nil, @conditional_level=nil, @nesting_level=0, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b857bb8 @type=:literal, @token=:literal, @text="data = ", @ts=0, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}>, #<Regexp::Expression::Group::Capture:0x00007fa07b857b90 @type=:group, @token=:capture, @text="(", @ts=7, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b8579d8 @type=:literal, @token=:literal, @text="{\"words\":", @ts=8, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>, #<Regexp::Expression::CharacterType::Any:0x00007fa07b8579b0 @type=:meta, @token=:dot, @text=".", @ts=17, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00007fa07b857938 @token=:one_or_more, @text="+", @mode=:greedy, @min=1, @max=-1>, @options={}>, #<Regexp::Expression::Literal:0x00007fa07b8578e8 @type=:literal, @token=:literal, @text="}}}", @ts=19, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>, #<Regexp::Expression::CharacterSet:0x00007fa07b8578c0 @negative=true, @closed=true, @type=:set, @token=:character, @text="[", @ts=22, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00007fa07b857640 @token=:zero_or_more, @text="*", @mode=:greedy, @min=0, @max=-1>, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b8577d0 @type=:literal, @token=:literal, @text="}", @ts=24, @level=1, @set_level=1, @conditional_level=0, @nesting_level=3, @quantifier=nil, @options={}>]>, #<Regexp::Expression::Literal:0x00007fa07b8575f0 @type=:literal, @token=:literal, @text="}", @ts=27, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>], @number=1, @number_at_level=1>]> 

Please suggest should we remove the rescue block then?

rescue Regexp::Scanner::ScannerError
  return
end

as Errors like this (https://github.com/ammar/regexp_parser/blob/master/spec/scanner/errors_spec.rb) are handled by Lint/Syntax cop.
#app/controllers/accounts_controller.rb

/\p{foobar}/
$1
Inspecting 1 file
E

Offenses:

app/controllers/accounts_controller.rb:7:4: E: Lint/Syntax: invalid character property name {foobar}: /\p{foobar}/
(Using Ruby 2.5 parser; configure using TargetRubyVersion parameter, under AllCops)
   /\p{foobar}/
   ^^^^^^^^^^^^

1 file inspected, 1 offense detected
@marcandre
Copy link
Contributor

@marcandre marcandre commented Aug 5, 2020

That's good news :-)

Right, I see in their Changelog the issue has been changed in v1.7.1, so we could bump our gemspec requirement to that version and remove the rescue (and the test). If ever there's such an exception, it won't be the end of the world (just a notice there was an issue when processing the file) and a bug report can be filed with the gem.

It also means that the other rescue in the code could be removed and that the associated specs are incorrect (as they should fail with the regexp parser 1.7.1)

@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Aug 5, 2020

That's good news :-)

Right, I see in their Changelog the issue has been changed in v1.7.1, so we could bump our gemspec requirement to that version and remove the rescue (and the test). If ever there's such an exception, it won't be the end of the world (just a notice there was an issue when processing the file) and a bug report can be filed with the gem.

It also means that the other rescue in the code could be removed and that the associated specs are incorrect (as they should fail with the regexp parser 1.7.1)

@marcandre Well, I got this ok I have to remove the rescue block from code and the test from the out_of_range_regexp_ref_spec.rb.
Any other action item needed to fix this?

@marcandre
Copy link
Contributor

@marcandre marcandre commented Aug 5, 2020

Any other action item needed to fix this?

No, rest can be done separately.

You'll have to rebase to fix the Changelog conflict though

@sonalinavlakhe sonalinavlakhe force-pushed the sonalinavlakhe:feature/out_of_range_ref_in_regexp branch from 8c36fe8 to b5e412a Aug 5, 2020
@sonalinavlakhe
Copy link
Contributor Author

@sonalinavlakhe sonalinavlakhe commented Aug 5, 2020

@marcandre, I have done rebase and all the changes.

@marcandre marcandre merged commit a678aba into rubocop-hq:master Aug 5, 2020
26 checks passed
26 checks passed
windows 2.4
Details
windows 2.5
Details
windows 2.6
Details
windows 2.7
Details
windows mingw
Details
ci/circleci: cc-setup Your tests passed on CircleCI!
Details
ci/circleci: cc-upload-coverage Your tests passed on CircleCI!
Details
ci/circleci: documentation-checks Your tests passed on CircleCI!
Details
ci/circleci: jruby-9.2-ascii_spec Your tests passed on CircleCI!
Details
ci/circleci: jruby-9.2-rubocop Your tests passed on CircleCI!
Details
ci/circleci: jruby-9.2-spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.4-ascii_spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.4-rubocop Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.4-spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.5-ascii_spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.5-rubocop Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.5-spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.6-ascii_spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.6-rubocop Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.6-spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.7-ascii_spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.7-rubocop Your tests passed on CircleCI!
Details
ci/circleci: ruby-2.7-spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-head-ascii_spec Your tests passed on CircleCI!
Details
ci/circleci: ruby-head-rubocop Your tests passed on CircleCI!
Details
ci/circleci: ruby-head-spec Your tests passed on CircleCI!
Details
@marcandre
Copy link
Contributor

@marcandre marcandre commented Aug 5, 2020

🎉 Thank you @sonalinavlakhe for this PR and your quick turn around to my multiple review comments 👍

Congratulations on your first Cop 🍾 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.