Add new Lint/OutOfRangeRefInRegexp cop #7755 #8407

sonalinavlakhe · 2020-07-27T16:54:51Z

This cop looks for references to Regexp capture groups that are out of range; such references always return nil.

  /(foo)bar/ =~ 'foobar'

  # bad - always returns nil
  puts $2 # => nil

  # good
  puts $1 # => foo
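The behaviour the cop relies on can be checked in plain Ruby. The sketch below is stdlib-only and uses a hypothetical capture_count helper (not part of RuboCop, which analyses the pattern with the regexp_parser gem instead). It counts capture groups by OR-ing the regexp with an empty pattern, so matching the empty string always succeeds and MatchData#captures has one slot per group.

```ruby
# Hypothetical helper, not RuboCop's implementation: OR-ing the regexp with an
# empty pattern guarantees that matching '' succeeds, and MatchData#captures
# then has exactly one (nil) slot per capture group Ruby actually creates.
def capture_count(regexp)
  Regexp.union(regexp, //).match('').captures.size
end

# True when $n can never be set by a match of +regexp+.
def out_of_range_ref?(regexp, n)
  n > capture_count(regexp)
end

/(foo)bar/ =~ 'foobar'
p $1 # => "foo"
p $2 # => nil

p capture_count(/(foo)bar/)        # => 1
p out_of_range_ref?(/(foo)bar/, 2) # => true
p out_of_range_ref?(/(foo)bar/, 1) # => false
```

Because it counts what Ruby really captures, a regexp with named groups such as /(?<a>x)(y)/ counts as 1: plain (...) groups do not capture once a named group is present.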

Issue: #7755

Before submitting the PR make sure the following are checked:

Wrote good commit messages.
Commit message starts with [Fix #issue-number] (if the related issue exists).
Feature branch is up-to-date with master (if not - rebase it).
Squashed related commits together.
Added tests.
Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.

marcandre

Interesting idea :-)

I think this cop can only be marked as unsafe, though, as I don't think there's a way to be sure of the diagnostic.
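One way the diagnostic can be wrong: assuming the cop only remembers the regexp literal it visited last in source order (the @valid_ref approach discussed further down), control flow can make a flagged reference valid at runtime. A minimal example; second_char is a hypothetical method:

```ruby
# The textually last regexp (/(\w)/) has one group, so a static check based on
# it would flag $2, yet $2 is non-nil whenever the first branch ran.
def second_char(str, two_groups)
  if two_groups
    /(\w)(\w)/ =~ str
  else
    /(\w)/ =~ str
  end
  $2
end

p second_char('ab', true)  # => "b"
p second_char('ab', false) # => nil
```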

lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb

spec/rubocop/cop/lint/out_of_range_ref_in_regexp_spec.rb

lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb

docs/modules/ROOT/pages/cops_lint.adoc

lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb

marcandre

Pretty much done. Must be rebased to resolve conflict in Changelog.

Naming: OutOfRangeRefInRegexp => OutOfRangeRegexpRef ?

marcandre · 2020-07-30T13:27:10Z

docs/modules/ROOT/pages/installation.adoc

@@ -1,4 +1,4 @@
-= Installation
+  = Installation


You probably didn't mean to change this

marcandre · 2020-07-30T13:28:04Z

config/default.yml

@@ -1620,6 +1620,12 @@ Lint/OrderedMagicComments:
  Enabled: true
  VersionAdded: '0.53'

+Lint/OutOfRangeRefInRegexp:
+  Description: 'Checks for out of range reference for Regep because it always returns nil.'


Regep => Regexp

sonalinavlakhe · 2020-07-30T16:58:13Z

@marcandre, I have addressed all the suggested changes; please have a look and let me know.

marcandre

I'm a tough reviewer; I hope you don't mind 😅.

There should be a test for encountering $4 before any regexp. You'll want to do the initialization in on_new_investigation.

marcandre · 2020-07-30T20:12:42Z

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb

+        MSG = 'Do not use out of range reference for the Regexp.'
+
+        def on_regexp(node)
+          @valid_ref = cop_config['Count']


I'm not sure I understand cop_config['Count'], maybe you simply meant nil?

marcandre · 2020-07-30T20:22:37Z

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb

+            named_capture += 1 if e.instance_of?(Regexp::Expression::Group::Named)
+            numbered_capture += 1 if e.instance_of?(Regexp::Expression::Group::Capture)


Is it possible to rely on e.type instead of the class?

Yeah, we can do it, but we would have to check the token instead:

  tree.each_expression do |e|
    named_capture += 1 if e.token == :named
    numbered_capture += 1 if e.token == :capture
  end

Neither seems great to me, but that might be the gem's fault. Could you use e.capturing?, and then e.respond_to?(:name), say?

I can check with e.capturing?, but then we can't check further for e.respond_to?(:name).

Sorry, why?

@marcandre, e.capturing? raises an error for all non-group expressions:
undefined method capturing? for #<Regexp::Expression::CharacterType::Word:0x00007fb466af9cf0>
so it is better to use e.type?(:group) and then e.respond_to?(:name).

e.type?(:group) && e.respond_to?(:name) 👍, I think it's much better than relying on the class, even if longer...
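A minimal stdlib-only sketch of the counting rule being discussed, assuming the named and numbered group counts have already been collected (with regexp_parser, via e.type?(:group) and e.respond_to?(:name)). effective_capture_count is a hypothetical helper; what it illustrates is that Ruby stops capturing plain (...) groups once a regexp contains a named group, so only one of the two counts matters.

```ruby
# Hypothetical helper: if any named group exists, only named groups capture;
# otherwise every plain (...) group is a numbered capture.
def effective_capture_count(named, numbered)
  named.positive? ? named : numbered
end

# Cross-check the rule against Ruby's own MatchData#captures.
mixed = /(?<year>\d+)-(\d+)/
p mixed.match('2020-08').captures # => ["2020"]
p effective_capture_count(1, 1)   # => 1

plain = /(\d+)-(\d+)/
p plain.match('2020-08').captures # => ["2020", "08"]
p effective_capture_count(0, 2)   # => 2
```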

marcandre

I'm a tough reviewer, and I don't see everything from the get-go; I hope you don't mind 😅

You should add a test if $4 is encountered before any regexp. You'll need to define on_new_investigation too.

sonalinavlakhe · 2020-07-31T09:19:47Z

Not at all, @marcandre; for me, it's a huge learning opportunity 😄

From the comment below,
"You should add a test if $4 is encountered before any regexp. You'll need to define on_new_investigation too",
I understand that you are referring to the following example:

$4
/(foo)(bar)/ =~ "foobar"

Do I need to initialize @valid_ref to nil in the on_new_investigation method? If so, then I am not able to understand the use case for the on_new_investigation method, because we are not creating any offense if we got the reference before any regexp.
Please advise.

marcandre · 2020-07-31T14:52:06Z

> Do I need to initialize @valid_ref to nil in the on_new_investigation method?

Yes

> If so, then I am not able to understand the use case for the on_new_investigation method, because we are not creating any offense if we got the reference before any regexp.

First, shouldn't we create an offense then?
Second: in warning mode you would get an 'uninitialized instance variable' warning, I think. Also, running the cop on a second source would still hold the result from the previous one.

sonalinavlakhe · 2020-07-31T17:35:18Z

> First, shouldn't we create an offense then?

Please advise whether we should raise an offense in this case.

> Second: in warning mode you would get an 'uninitialized instance variable' warning, I think. Also, running the cop on a second source would still hold the result from the previous one.

I got this one. Thanks 👍

marcandre · 2020-07-31T18:04:47Z

> Please advise whether we should raise an offense in this case.

Sorry, I don't understand... If I'm not mistaken, def foo; $4; end should raise an offense, since $4 cannot be non-nil there, so maybe we should initialize @valid_ref to 0, no?

sonalinavlakhe · 2020-08-01T10:52:12Z

> Sorry, I don't understand... If I'm not mistaken, def foo; $4; end should raise an offense, since $4 cannot be non-nil there, so maybe we should initialize @valid_ref to 0, no?

Yes, you are right. Only for regular expressions with non-literal parts do I have to set @valid_ref to nil, so that the check skips raising an offense.
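The state handling agreed on here can be sketched without RuboCop. ToyRegexpRefChecker is hypothetical: its method names only mirror the callbacks mentioned in this thread (on_new_investigation, on_regexp, on_nth_ref), and it takes plain integers instead of AST nodes. @valid_ref starts at 0 for each source, is set to the capture count of each regexp literal, and is nil for a regexp with interpolation, in which case nothing is reported.

```ruby
# Hypothetical stand-in for a cop; no RuboCop APIs are used.
class ToyRegexpRefChecker
  attr_reader :offenses

  # Called once per source, playing the role of on_new_investigation.
  def on_new_investigation
    @valid_ref = 0 # no regexp seen yet, so any $n is out of range
    @offenses = []
  end

  # capture_count is nil for a regexp containing #{...}: it cannot be checked.
  def on_regexp(capture_count)
    @valid_ref = capture_count
  end

  def on_nth_ref(n)
    return if @valid_ref.nil?

    @offenses << "$#{n}" if n > @valid_ref
  end
end

checker = ToyRegexpRefChecker.new

# Source 1: /(a)(b)(c)(d)/ =~ 'abcd'; $4
checker.on_new_investigation
checker.on_regexp(4)
checker.on_nth_ref(4)
p checker.offenses # => []

# Source 2: def foo; $4; end
checker.on_new_investigation
checker.on_nth_ref(4)
p checker.offenses # => ["$4"]

# Source 3: /#{pattern}/ =~ str; $4 (dynamic regexp)
checker.on_new_investigation
checker.on_regexp(nil)
checker.on_nth_ref(4)
p checker.offenses # => []
```

Without the reset in on_new_investigation, source 2 would inherit @valid_ref = 4 from source 1 and $4 would go unreported.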

marcandre

Great! The implementation looks really good to me 🎉

You're still missing a test for $3 without a regexp though...

marcandre · 2020-08-01T14:32:33Z

spec/rubocop/cop/lint/out_of_range_regexp_ref_spec.rb

+    it 'does not register an offence when containing a ivar' do
+      expect_no_offenses(<<~'RUBY')
+        @var = '(\d+)'
+        /(?<foo>#{@var}*)/ =~ "12"
+        puts $1
+        puts $3
+      RUBY
+    end
+
+    it 'does not register an offence when containing a cvar' do
+      expect_no_offenses(<<~'RUBY')
+        @@var = '(\d+)'
+        /(?<foo>#{@@var}*)/ =~ "12"
+        puts $1
+        puts $4
+      RUBY
+    end
+
+    it 'does not register an offence when containing a gvar' do
+      expect_no_offenses(<<~'RUBY')
+        $var = '(\d+)'
+        /(?<foo>#{$var}*)/ =~ "12"
+        puts $1
+        puts $2
+      RUBY
+    end
+
+    it 'does not register an offence when containing a method' do
+      expect_no_offenses(<<~'RUBY')
+        def do_something
+          '(\d+)'
+        end
+        /(?<foo>#{do_something}*)/ =~ "12"
+        puts $1
+        puts $4
+      RUBY
+    end
+
+    it 'does not register an offence when containing a constant' do
+      expect_no_offenses(<<~'RUBY')
+        CONST = "12"
+        /(?<foo>#{CONST}*)/ =~ "12"
+        puts $1
+        puts $3
+      RUBY
+    end


All these tests can be removed, we only need to test one "dynamic" case.

Hi @marcandre, I think these specs are self-explanatory, and developers reading them will understand the code better. Could you give an example of how this could be improved with a single dynamic test case?

What I meant is that the previous test (with a variable) is sufficient; a good implementation (like the one we have now) cannot pass the variable test and fail the instance variable, method call, etc. tests, or vice versa. They all test the same idea: is the regexp dynamic or not? It's not about what is inside the #{...}, just whether there is a #{...} or not. Otherwise, to be "complete", we'd have to add all possibilities, for example a method call with a block, or with rescue, or a literal, or ...
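To illustrate that only the presence of #{...} matters, here is a stdlib sketch using Ripper from Ruby's standard library rather than RuboCop's own AST (a swap made so the example runs on its own); dynamic_regexp? and its helpers are hypothetical. Whatever the interpolated expression is (ivar, constant, method call), the regexp literal contains a :string_embexpr node.

```ruby
require 'ripper'

# True if +node+ (a Ripper S-expression) is, or contains, a node of +type+.
def contains?(node, type)
  return false unless node.is_a?(Array)

  node.first == type || node.any? { |child| contains?(child, type) }
end

# Collects every :regexp_literal node in the tree.
def regexp_nodes(node, found = [])
  return found unless node.is_a?(Array)

  found << node if node.first == :regexp_literal
  node.each { |child| regexp_nodes(child, found) }
  found
end

# A regexp literal is "dynamic" as soon as it has any #{...} in it.
def dynamic_regexp?(code)
  regexp_nodes(Ripper.sexp(code)).any? { |re| contains?(re, :string_embexpr) }
end

p dynamic_regexp?('/(?<foo>#{@var}*)/ =~ "12"')  # => true
p dynamic_regexp?('/(?<foo>#{CONST}*)/ =~ "12"') # => true
p dynamic_regexp?("/(foo)bar/ =~ 'foobar'")      # => false
```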

marcandre · 2020-08-01T14:33:50Z

spec/rubocop/cop/lint/out_of_range_regexp_ref_spec.rb

+  it 'does not register offense when using a Regexp cannot be processed by regexp_parser gem' do
+    expect_no_offenses(<<~'RUBY')
+      /data = ({"words":.+}}}[^}]*})/m
+    RUBY


Might as well add $3 or something. Hopefully the Regexp parser will handle this one day and that test will fail, but right now it would always pass.

sonalinavlakhe · 2020-08-02T16:50:04Z

@marcandre, please review the updated changes and let me know if all is good or if we can still improve something 😄

marcandre

Believe it or not, I found something else 🤣

marcandre · 2020-08-02T19:11:09Z

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb

+    module Lint
+      # This cops looks for out of range referencing for Regexp, as while capturing groups out of
+      # out of range reference always returns nil.
+


I just noticed that the doc file is incomplete because a # is missing here

marcandre · 2020-08-02T19:11:24Z

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb

+
+      # @example
+      #   /(foo)bar/ =~ 'foobar'
+


and here...

marcandre · 2020-08-02T19:11:56Z

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb

+
+      #   # bad - always returns nil
+      #   puts $2 # => nil
+


and here. Might be a cop idea! 😆

Cool, hope I fixed everything now 🤣 🤞

marcandre

Last tweaks, I promise 😅

marcandre · 2020-08-03T14:16:56Z

spec/rubocop/cop/lint/out_of_range_regexp_ref_spec.rb

+  it 'does not register offense when using a Regexp cannot be processed by regexp_parser gem' do
+    expect_no_offenses(<<~'RUBY')
+      /data = ({"words":.+}}}[^}]*})/m
+    RUBY


There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

marcandre · 2020-08-03T14:20:23Z

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb

+module RuboCop
+  module Cop
+    module Lint
+      # This cops looks for out of range referencing for Regexp, as while capturing groups out of


I'm not a native speaker, but "as while" sounds strange to me. How about:

"This cop looks for references of Regexp captures that are out of range and thus always return nil."

sonalinavlakhe · 2020-08-04T06:18:04Z

> There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

I have added $1 in the last commit to fix this. Please refer to it.

marcandre · 2020-08-04T17:10:16Z

> There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

> I have added $1 in the last commit to fix this.

Right, but the "unparseable" regexp has one captured group, so $1 is not out-of-range...
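Ruby itself confirms the point: the regexp from this spec has exactly one capture group, so $1 can be non-nil and only $2 and above are out of range. A quick check (the input string is made up for illustration):

```ruby
# The regexp from the spec compiles in Ruby and has exactly one capture group.
re = /data = ({"words":.+}}}[^}]*})/m
md = re.match('data = {"words":1}}}}')

p md[1]            # => "{\"words\":1}}}}"
p md.captures.size # => 1
```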

sonalinavlakhe · 2020-08-05T07:52:59Z

> Right, but the "unparseable" regexp has one captured group, so $1 is not out-of-range...

Sorry, I missed that one. When I use something other than $1, i.e. $2 or $3, this test fails. I looked into it and concluded that regexp_parser is now able to parse this type of expression.
I checked a series of examples from #8083 and ammar/regexp_parser#15, and regexp_parser parses them without raising an exception:

irb(main):001:0> require 'regexp_parser'
=> true
irb(main):002:0> Regexp::Parser.parse('data = ({"words":.+}}}[^}]*})')
=> #<Regexp::Expression::Root:0x00007fa07a9c5c08 @type=:expression, @token=:root, @text="", @ts=0, @level=nil, @set_level=nil, @conditional_level=nil, @nesting_level=0, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b857bb8 @type=:literal, @token=:literal, @text="data = ", @ts=0, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}>, #<Regexp::Expression::Group::Capture:0x00007fa07b857b90 @type=:group, @token=:capture, @text="(", @ts=7, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b8579d8 @type=:literal, @token=:literal, @text="{\"words\":", @ts=8, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>, #<Regexp::Expression::CharacterType::Any:0x00007fa07b8579b0 @type=:meta, @token=:dot, @text=".", @ts=17, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00007fa07b857938 @token=:one_or_more, @text="+", @mode=:greedy, @min=1, @max=-1>, @options={}>, #<Regexp::Expression::Literal:0x00007fa07b8578e8 @type=:literal, @token=:literal, @text="}}}", @ts=19, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>, #<Regexp::Expression::CharacterSet:0x00007fa07b8578c0 @negative=true, @closed=true, @type=:set, @token=:character, @text="[", @ts=22, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00007fa07b857640 @token=:zero_or_more, @text="*", @mode=:greedy, @min=0, @max=-1>, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b8577d0 @type=:literal, @token=:literal, @text="}", @ts=24, @level=1, @set_level=1, @conditional_level=0, @nesting_level=3, @quantifier=nil, @options={}>]>, #<Regexp::Expression::Literal:0x00007fa07b8575f0 @type=:literal, @token=:literal, @text="}", @ts=27, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>], @number=1, @number_at_level=1>]>

Should we remove the rescue block, then?

rescue Regexp::Scanner::ScannerError
  return
end

Errors like these (https://github.com/ammar/regexp_parser/blob/master/spec/scanner/errors_spec.rb) are already reported by the Lint/Syntax cop, for example:
#app/controllers/accounts_controller.rb

/\p{foobar}/
$1

Inspecting 1 file
E

Offenses:

app/controllers/accounts_controller.rb:7:4: E: Lint/Syntax: invalid character property name {foobar}: /\p{foobar}/
(Using Ruby 2.5 parser; configure using TargetRubyVersion parameter, under AllCops)
   /\p{foobar}/
   ^^^^^^^^^^^^

1 file inspected, 1 offense detected
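For reference, Ruby refuses to compile this pattern at all. Written as a literal it is rejected at parse time, which is what Lint/Syntax reports above; the sketch below uses Regexp.new instead so the error can be rescued at runtime.

```ruby
# Regexp.new raises RegexpError for an unknown character property.
begin
  Regexp.new('\p{foobar}')
rescue RegexpError => e
  puts e.message # expected to mention "invalid character property name {foobar}"
end
```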

marcandre · 2020-08-05T12:24:36Z

That's good news :-)

Right, I see in their Changelog the issue has been changed in v1.7.1, so we could bump our gemspec requirement to that version and remove the rescue (and the test). If ever there's such an exception, it won't be the end of the world (just a notice there was an issue when processing the file) and a bug report can be filed with the gem.

It also means that the other rescue in the code could be removed and that the associated specs are incorrect (as they should fail with the regexp parser 1.7.1)
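A sketch of what bumping the requirement means, using RubyGems' own classes. The gem name and version in the Gem::Specification are placeholders, and the exact constraint string is an assumption; the thread only says to require regexp_parser v1.7.1.

```ruby
require 'rubygems'

# Hypothetical gemspec fragment; only the regexp_parser line matters here.
spec = Gem::Specification.new do |s|
  s.name    = 'example-gem'
  s.version = '0.0.1'
  s.add_dependency 'regexp_parser', '>= 1.7.1'
end

dep = spec.dependencies.find { |d| d.name == 'regexp_parser' }
p dep.requirement.satisfied_by?(Gem::Version.new('1.7.0')) # => false
p dep.requirement.satisfied_by?(Gem::Version.new('1.7.1')) # => true
```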

sonalinavlakhe · 2020-08-05T12:40:15Z

> That's good news :-)

> Right, I see in their Changelog the issue has been changed in v1.7.1, so we could bump our gemspec requirement to that version and remove the rescue (and the test). If ever there's such an exception, it won't be the end of the world (just a notice there was an issue when processing the file) and a bug report can be filed with the gem.

> It also means that the other rescue in the code could be removed and that the associated specs are incorrect (as they should fail with the regexp parser 1.7.1)

@marcandre, got it: I have to remove the rescue block from the code and the test from out_of_range_regexp_ref_spec.rb.
Any other action item needed to fix this?

marcandre · 2020-08-05T13:46:55Z

> Any other action item needed to fix this?

No, rest can be done separately.

You'll have to rebase to fix the Changelog conflict though

sonalinavlakhe · 2020-08-05T16:00:19Z

@marcandre, I have done rebase and all the changes.

marcandre · 2020-08-05T17:33:31Z

🎉 Thank you @sonalinavlakhe for this PR and your quick turnaround on my multiple review comments 👍

Congratulations on your first Cop 🍾 😄

marcandre requested changes Jul 27, 2020

marcandre reviewed Jul 28, 2020

docs/modules/ROOT/pages/cops_lint.adoc (outdated, resolved)

marcandre reviewed Jul 28, 2020

lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb (outdated, resolved)

marcandre requested changes Jul 30, 2020

sonalinavlakhe force-pushed the feature/out_of_range_ref_in_regexp branch from d9c62a4 to a3497ff on July 30, 2020 16:40

marcandre requested changes Jul 30, 2020

marcandre mentioned this pull request Jul 30, 2020

Forces that could run along with cops #8420 (closed)

marcandre requested changes Jul 30, 2020

marcandre requested changes Aug 1, 2020

marcandre requested changes Aug 2, 2020

sonalinavlakhe force-pushed the feature/out_of_range_ref_in_regexp branch from 9d91d81 to c33997b on August 3, 2020 07:00

marcandre reviewed Aug 3, 2020

sonalinavlakhe added 5 commits August 5, 2020 20:44

Add new Lint/OutOfRangeRefInRegexp cop rubocop#7755

d1e1f64

code review fixes

98d8cc4

code review fixes

2beea65

mark a cop unsafe

a5e89f5

refactor the contain_non_literal? method

51dda40

sonalinavlakhe added 5 commits August 5, 2020 20:44

code review fixes and rename lint to OutOfRangeRegexpRef

4efe602

refactor regexp_captures method

6984914

Update examples in specs

29d162f

fix review comments

b17f32a

Remove rescue block and update test

b5e412a

sonalinavlakhe force-pushed the feature/out_of_range_ref_in_regexp branch from 8c36fe8 to b5e412a on August 5, 2020 15:18

marcandre merged commit a678aba into rubocop:master Aug 5, 2020
