Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #8859] Add new Lint/UnmodifiedReduceAccumulator cop #8916

Merged
merged 1 commit into from Oct 25, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -6,6 +6,7 @@

* [#8895](https://github.com/rubocop-hq/rubocop/pull/8895): Add new `Lint/EmptyBlock` cop. ([@fatkodima][])
* [#7549](https://github.com/rubocop-hq/rubocop/issues/7549): Add new `Style/ArgumentsForwarding` cop. ([@koic][])
* [#8859](https://github.com/rubocop-hq/rubocop/issues/8859): Add new `Lint/UnmodifiedReduceAccumulator` cop. ([@dvandersluis][])

### Changes

Expand Down
5 changes: 5 additions & 0 deletions config/default.yml
Expand Up @@ -1942,6 +1942,11 @@ Lint/UnifiedInteger:
Enabled: true
VersionAdded: '0.43'

Lint/UnmodifiedReduceAccumulator:
Description: Checks for `reduce` or `inject` blocks that do not update the accumulator each iteration.
Enabled: pending
VersionAdded: '1.1'

Lint/UnreachableCode:
Description: 'Unreachable code.'
Enabled: true
Expand Down
1 change: 1 addition & 0 deletions docs/modules/ROOT/pages/cops.adoc
Expand Up @@ -274,6 +274,7 @@ In the following section you find all available cops:
* xref:cops_lint.adoc#linttrailingcommainattributedeclaration[Lint/TrailingCommaInAttributeDeclaration]
* xref:cops_lint.adoc#lintunderscoreprefixedvariablename[Lint/UnderscorePrefixedVariableName]
* xref:cops_lint.adoc#lintunifiedinteger[Lint/UnifiedInteger]
* xref:cops_lint.adoc#lintunmodifiedreduceaccumulator[Lint/UnmodifiedReduceAccumulator]
* xref:cops_lint.adoc#lintunreachablecode[Lint/UnreachableCode]
* xref:cops_lint.adoc#lintunreachableloop[Lint/UnreachableLoop]
* xref:cops_lint.adoc#lintunusedblockargument[Lint/UnusedBlockArgument]
Expand Down
65 changes: 65 additions & 0 deletions docs/modules/ROOT/pages/cops_lint.adoc
Expand Up @@ -4190,6 +4190,71 @@ This cop checks for using Fixnum or Bignum constant.
1.is_a?(Integer)
----

== Lint/UnmodifiedReduceAccumulator

|===
| Enabled by default | Safe | Supports autocorrection | VersionAdded | VersionChanged

| Pending
| Yes
| No
| 1.1
| -
|===

Looks for `reduce` or `inject` blocks where the value returned (implicitly or
explicitly) does not include the accumulator. A block is considered valid as
long as at least one return value includes the accumulator.

If the accumulator is not included in the return value, then the entire
block will just return a transformation of the last element value, and
could be rewritten as such without a loop.

Also catches instances where an index of the accumulator is returned, as
this may change the type of object being retained.

NOTE: For the purpose of reducing false positives, this cop only flags
returns in `reduce` blocks where the element is the only variable in
the expression (since we will not be able to tell what other variables
relate to via static analysis).

=== Examples

[source,ruby]
----
# bad
(1..4).reduce(0) do |acc, el|
el * 2
end

# bad, may raise a NoMethodError after the first iteration
%w(a b c).reduce({}) do |acc, letter|
acc[letter] = true
end

# good
(1..4).reduce(0) do |acc, el|
acc + el * 2
end

# good, returns the accumulator instead of the index
%w(a b c).reduce({}) do |acc, letter|
acc[letter] = true
acc
end

# good, at least one branch returns the accumulator
values.reduce(nil) do |result, value|
break result if something?
value
end

# ignored as the return value cannot be determined
enum.reduce do |acc, el|
x + y
end
----

== Lint/UnreachableCode

|===
Expand Down
1 change: 1 addition & 0 deletions lib/rubocop.rb
Expand Up @@ -306,6 +306,7 @@
require_relative 'rubocop/cop/lint/percent_symbol_array'
require_relative 'rubocop/cop/lint/raise_exception'
require_relative 'rubocop/cop/lint/rand_one'
require_relative 'rubocop/cop/lint/reduce_accumulator'
require_relative 'rubocop/cop/lint/redundant_cop_disable_directive'
require_relative 'rubocop/cop/lint/redundant_cop_enable_directive'
require_relative 'rubocop/cop/lint/redundant_require_statement'
Expand Down
164 changes: 164 additions & 0 deletions lib/rubocop/cop/lint/reduce_accumulator.rb
@@ -0,0 +1,164 @@
# frozen_string_literal: true

module RuboCop
module Cop
module Lint
# Looks for `reduce` or `inject` blocks where the value returned (implicitly or
# explicitly) does not include the accumulator. A block is considered valid as
# long as at least one return value includes the accumulator.
#
# If the accumulator is not included in the return value, then the entire
# block will just return a transformation of the last element value, and
# could be rewritten as such without a loop.
#
# Also catches instances where an index of the accumulator is returned, as
# this may change the type of object being retained.
#
# NOTE: For the purpose of reducing false positives, this cop only flags
# returns in `reduce` blocks where the element is the only variable in
# the expression (since we will not be able to tell what other variables
# relate to via static analysis).
#
# @example
#
# # bad
# (1..4).reduce(0) do |acc, el|
# el * 2
# end
Comment on lines +25 to +27
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad example, acc is not used at all. There are already cops that deal with this, and Ruby too.

Even if acc was used, this could still lead to false negatives:

def nest(*values)
  values.inject([]) do |prev, elem|
    elem = [elem] unless elem.is_a?(Array)
    elem << prev
    elem
  end
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad example, acc is not used at all. There are already cops that deal with this, and Ruby too.

Do you mean Lint/UnusedBlockArgument here? While that's true it doesn't really convey the same thing that this cop does. (I don't see any other offenses for the example, so I think it's valid).

I think you meant false positives (false negatives are inherent here on purpose), but you're right that there is a class of false positives here that I did not consider. I'm looking at this now, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any other offenses for the example, so I think it's valid

specs only check for the current cop, you can't rely on that.

I think you meant false positives

Yes, sorry 馃槄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcandre no I meant I ran all of rubocop on the example code haha, it raised Lint/UnusedBlockArgument and this cop. 馃榿

#
# # bad, may raise a NoMethodError after the first iteration
# %w(a b c).reduce({}) do |acc, letter|
# acc[letter] = true
# end
Comment on lines +30 to +32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue is in the word may.
There are very valid uses:

def chain(*values)
  values.inject({value: nil}) do |acc, elem|
    node = {value: elem, next: acc}
    acc[:prev] = node
  end
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concede that that is a valid use that will be registered by the cop, but I'd argue that this is going to be much less prevalent than mistakenly forgetting to return the accumulator when setting a hash key.

I see a few options:

  1. We can remove this behaviour (I think this isn't ideal, because I do think this is still a good check)
  2. We can put it behind a conf variable that is not enabled by default
  3. We can accept the false positives (if this was my code I'd just # rubocop:disable it, but maybe that's suboptimal).

What do we think? /cc @bbatsov

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgetting to return the accumulator when setting a hash key.

BTW this is a double error, as inject probably shouldn't be used in the first place in that case (see Style/EachWithObject)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely true, but in this case Style/EachWithObject doesn't register (because the accumulator is not being returned). We definitely have other places where a piece of code is chain corrected by multiple cops so I don't see this as being an issue (except for it not being autocorrectable here I guess 馃槗 ).

#
# # good
# (1..4).reduce(0) do |acc, el|
# acc + el * 2
# end
#
# # good, returns the accumulator instead of the index
# %w(a b c).reduce({}) do |acc, letter|
# acc[letter] = true
# acc
# end
#
# # good, at least one branch returns the accumulator
# values.reduce(nil) do |result, value|
# break result if something?
# value
# end
#
# # ignored as the return value cannot be determined
# enum.reduce do |acc, el|
# x + y
# end
Comment on lines +51 to +54
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad example, as it could be determined that this is an error. x = foo(acc, el); bar(x) would be a better example

class UnmodifiedReduceAccumulator < Base
MSG = 'Ensure the accumulator `%<accum>s` will be modified by `%<method>s`.'
MSG_INDEX = 'Do not return an element of the accumulator in `%<method>s`.'

def_node_matcher :reduce_with_block?, <<~PATTERN
(block (send _recv {:reduce :inject} !sym) ...)
PATTERN

def_node_matcher :accumulator_index?, <<~PATTERN
(send (lvar %1) {:[] :[]=} ...)
PATTERN

def_node_matcher :lvar_used?, <<~PATTERN
{
(lvar %1)
(lvasgn %1 ...)
(send (lvar %1) :<< ...)
(dstr (begin (lvar %1)))
(%RuboCop::AST::Node::SHORTHAND_ASSIGNMENTS (lvasgn %1))
}
PATTERN

def_node_search :expression_values, <<~PATTERN
`{
marcandre marked this conversation as resolved.
Show resolved Hide resolved
(%RuboCop::AST::Node::VARIABLES $_)
(%RuboCop::AST::Node::EQUALS_ASSIGNMENTS $_ ...)
(send (%RuboCop::AST::Node::VARIABLES $_) :<< ...)
(send nil? $_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mistakes the name of a method with the name of a variable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually needed, because I'm looking for identifiers in general, not specifically variables. This is valid code and should not be flagged since we can't determine what method is from static analysis:

(1..4).reduce(0) do |acc, el|
  method + el
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to why there's nil? then. For example self.method + el, or el.method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW, please do not "resolve conversation" when there's no agreement)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right there shouldn't be!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW, please do not "resolve conversation" when there's no agreement)

Sorry about that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right there shouldn't be!

Ok, if you remove nil?, then el and self.el are treated the same way, even though they are not related?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I've fixed that now too. (btw I've been taking all your notes and applying them locally for the purpose of a new fix PR)

(dstr (begin {(send nil? $_) (%RuboCop::AST::Node::VARIABLES $_)}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (first part of {})

(%RuboCop::AST::Node::SHORTHAND_ASSIGNMENTS (%RuboCop::AST::Node::EQUALS_ASSIGNMENTS $_) ...)
}
PATTERN
Comment on lines +77 to +86
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcandre I feel like you probably have a simpler way to do this 馃榿


def on_block(node)
return unless reduce_with_block?(node)

check_return_values(node)
end

private

# Return values in a block are either the value given to next,
# the last line of a multiline block, or the only line of the block
def return_values(block_body_node)
nodes = [block_body_node.begin_type? ? block_body_node.child_nodes.last : block_body_node]

block_body_node.each_descendant(:next, :break) do |n|
# Ignore `next`/`break` inside an inner block
next if n.each_ancestor(:block).first != block_body_node.parent
next unless n.first_argument

nodes << n.first_argument
end

nodes
end

def check_return_values(block_node)
return_values = return_values(block_node.body)
accumulator_name = block_arg_name(block_node, 0)
element_name = block_arg_name(block_node, 1)
message_opts = { method: block_node.method_name, accum: accumulator_name }

if (node = returned_accumulator_index(return_values, accumulator_name))
add_offense(node, message: format(MSG_INDEX, message_opts))
elsif !returns_accumulator_anywhere?(return_values, accumulator_name)
return_values.each do |return_val|
unless acceptable_return?(return_val, element_name)
add_offense(return_val, message: format(MSG, message_opts))
end
end
end
end

def block_arg_name(node, index)
node.arguments[index].node_parts[0]
end

# Look for an index of the accumulator being returned
# This is always an offense, in order to try to catch potential exceptions
# due to type mismatches
def returned_accumulator_index(return_values, accumulator_name)
return_values.detect { |val| accumulator_index?(val, accumulator_name) }
end

# If the accumulator is used in any return value, the node is acceptable since
# the accumulator has a chance to change each iteration
def returns_accumulator_anywhere?(return_values, accumulator_name)
return_values.any? { |node| lvar_used?(node, accumulator_name) }
end

# Determine if a return value is acceptable for the purposes of this cop
# If it is an expression containing the accumulator, it is acceptable
# Otherwise, it is only unacceptable if it contains the iterated element, since we
# otherwise do not have enough information to prevent false positives.
def acceptable_return?(return_val, element_name)
vars = expression_values(return_val).uniq
return true if vars.none? || (vars - [element_name]).any?

false
end

# Exclude `begin` nodes inside a `dstr` from being collected by `return_values`
def allowed_type?(parent_node)
!parent_node.dstr_type?
end
end
end
end
end