Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821

Pierre-Sassoulas · 2020-09-07T19:13:26Z

Description

len-as-conditions is now triggered only for classes that are inheriting directly from list, dict, or set and not implementing the __bool__ function, or from generators like range or list/dict/set comprehension. This should reduce the false positive for other classes, like pandas's DataFrame or numpy's Array. See discussion in related issue.

Type of Changes

	Type
✓	🐛 Bug fix

Related Issue

#1879

Pierre-Sassoulas · 2020-09-07T19:19:06Z

@hippo91 and @bersbersbers I'd like your opinion on this :)
I think @ethan-leba could be interested too.

I don't think infering the class and checking for its implementation is very efficient, suggestions welcomed :)

hippo91

Great change! The message should be clearer thanks to it!

pylint/checkers/refactoring/len_checker.py

coveralls · 2020-09-07T19:59:12Z

Coverage increased (+0.02%) to 90.66% when pulling b228d77 on Pierre-Sassoulas:fix-github-issue-1879 into 973cdeb on PyCQA:master.

ethan-leba · 2020-09-07T19:59:21Z

pylint/checkers/refactoring/len_checker.py

+                try:
+                    instance = next(node.args[0].infer())
+                except astroid.InferenceError:
+                    self.add_message("len-as-condition", node=node)
+                    return


I don't know much about the internals of astroid's infer, but what's the reasoning to emitting the error if the value can't be inferred? I'm assuming that astroid would probably less likely to be able to infer the type of an imported library, but that could be wrong

It actually happen for list comprehension, dict comprehension, set comprehension and generators like range. It's not very elegant nor explicit..

I don't know exactly what to think about it.
The implementation you propose is quite powerfull but it will emit a message as soon as the len argument cannot be inferred. I'm not confident about the fact it will occur only for comprehensions and generator...

Maybe we could try to filter comprehension before trying to infer. Something along the lines:

# we're finally out of any nested boolean operations so check if # this len() call is part of a test condition if _is_test_condition(node, parent): len_arg = node.args[0] # First, the node is a generator or comprehesion as in len([x for x in ...]) if isinstance(len_arg, (astroid.ListComp, astroid.SetComp, astroid.DictComp, astroid.GeneratorExp): self.add_message("len-as-condition", node=node) return # Or the node is a name as in len(foo) and we look for name assignment and analyze it if isinstance(len_arg, astroid.Name): lookup for name assignment and check if it is a Comprehension or Generator # Last case i see is the node is a call to a function as in len(my_func(my_arg)) but it that case we should be able to check whether the result is a Generator if isinstance(len_arg, astroid.Call) and astroid.GeneratorExpr in len_arg.inferred(): self.add_message("len-as-condition", node=node) ...

Note that i'm not found of this implementation neither because it uses a lot of isinstance calls...
What do you think about it?
Maybe we can ask @AWhetter advice...

I agree, it's very heavy. I'm searching for a better way than checking all the code of the class in search of a __bool__ function. If there isn't one, maybe not raising a message for anything that is not a list, set, dict, range, comprehension or generator has a better yield in term of result versus calculation to do ?

I think doing the checks @hippo91 suggested might even end up being faster than calling .infer(). If we're worried about the performance that much, we could do some performance comparisons and see which is faster.
My preference is to do the checks because we're way less likely to do raise false positives, and I think they're more annoying than pylint running slowly. At least this section of code will get called only when len() is calling in the source code being analysed.

Doing it like @hippo91 suggested is fine, my implementation wasn't efficient at all.

pylint/testutils.py

bersbersbers · 2020-09-07T20:33:15Z

Based on the change log, it seems this has the desired effect. I did not check the implementation as I am not used to pylint internals.

I don't think infering the class and checking for its implementation is very efficient

I don't even see why the current change does not already do that in infer() and instance_has_bool().

hippo91 · 2020-09-08T20:11:09Z

@Pierre-Sassoulas i will try to review this PR next week end. Sorry for the delay.

Pierre-Sassoulas · 2020-09-08T20:44:50Z

I don't even see why the current change does not already do that in infer() and instance_has_bool().

Yes I did exactly that :) But maybe there is a better way, or the value given is not worth the calculation ?

hippo91

It's a nice work @Pierre-Sassoulas !
I left a bunch of comments for discussion.

pylint/checkers/refactoring/len_checker.py

pylint/testutils.py

tests/functional/l/len_checks.py

hippo91 · 2020-11-26T14:52:07Z

@Pierre-Sassoulas let me know if you want another feedback before merging it.

Pierre-Sassoulas · 2020-11-29T12:35:23Z

@hippo91 I'm stuck on the way to handle functions. Here's my functional test:

    def function_returning_list(r):
        if r==1:
            return [1]
        return [2]

    def function_returning_int(r):
        if r==1:
            return 1
        return 2

    def function_returning_generator(r):
        for i in [r, 1, 2, 3]:
            yield i

    def function_returning_comprehension(r):
        return [x+1 for x in [r, 1, 2, 3]]

    def function_returning_function(r):
        return function_returning_generator(r)

    assert len(function_returning_list(z))  # [len-as-condition]
    assert len(function_returning_int(z))
    assert len(function_returning_generator(z))  # [len-as-condition] (line 173 here)
    assert len(function_returning_comprehension(z))  # [len-as-condition]
    assert len(function_returning_function(z))  # [len-as-condition]

The last three one (l 173, 174, 175) aren't working right now. How would you infer the type of the return of a function from the following ?

 Call(func=<Name.function_returning_generator l.173 at 0x7fb5e231f970>,
     args=[<Name.z l.173 at 0x7fb5e231f9a0>],
     keywords=None)
{'lineno': 173, 'col_offset': 15, 'parent': <Call l.173 at 0x7fb5e231f880>,
 'func': <Name.function_returning_generator l.173 at 0x7fb5e231f970>, 
'args': [<Name.z l.173 at 0x7fb5e231f9a0>], 'keywords': None, 'fromlineno': 173, 'tolineno': 173}

 Call(func=<Name.function_returning_comprehension l.174 at 0x7fb5e231fb50>,
     args=[<Name.z l.174 at 0x7fb5e231fb80>],
     keywords=None)
{'lineno': 174, 'col_offset': 15, 'parent': <Call l.174 at 0x7fb5e231fa60>,
 'func': <Name.function_returning_comprehension l.174 at 0x7fb5e231fb50>, 
'args': [<Name.z l.174 at 0x7fb5e231fb80>], 'keywords': None, 'fromlineno': 174, 'tolineno': 174}

 Call(func=<Name.function_returning_function l.175 at 0x7fb5e231fd30>,
     args=[<Name.z l.175 at 0x7fb5e231fd60>],
     keywords=None)
{'lineno': 175, 'col_offset': 15, 'parent': <Call l.175 at 0x7fb5e231fc40>, 
'func': <Name.function_returning_function l.175 at 0x7fb5e231fd30>,
 'args': [<Name.z l.175 at 0x7fb5e231fd60>], 'keywords': None, 'tolineno': 175, 'fromlineno': 175}

To be fair, I found other adjacent problem in testutil while doing that and did not take a lot of time for this. But maybe we could ignore function call if this is hard to do ?

hippo91 · 2020-12-12T15:24:17Z

@Pierre-Sassoulas sorry for the last answer.
The problem here is that astroid doesn't seem to be able to infer list comprehension nor generator.
For example, if you run this little snippet:

from astroid import extract_node, helpers

ast_node = extract_node(
"""
(x for x in [1,2,3])
""")
print(ast_node.inferred())

you obtain the following:

File "astroid/astroid/node_classes.py", line 734, in _infer
    "No inference function for {node!r}.", node=self, context=context
astroid.exceptions.InferenceError: No inference function for <GeneratorExp l.2 at 0x7f68f3bde990>.

The same arises for ListComp. We will have to try to fix this but IMHO it will be in another astroids PR.

Following review https://github.com/PyCQA/pylint/pull/3821/files#r484553606

More efficient, more elegant. 👌

Or a function returning a generator.

Pierre-Sassoulas · 2020-12-31T10:48:25Z

Thanks for the review @hippo91, I commented the test cases that are not possible with the current astroid and I created #4002 to keep track of this. I'll merge the current version.

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch from 1e8ba7e to f4357fd Compare September 7, 2020 19:14

Pierre-Sassoulas requested a review from hippo91 September 7, 2020 19:15

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch from f4357fd to 84f0957 Compare September 7, 2020 19:20

hippo91 self-assigned this Sep 7, 2020

hippo91 reviewed Sep 7, 2020

View reviewed changes

pylint/checkers/refactoring/len_checker.py Show resolved Hide resolved

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch from 84f0957 to 3e84349 Compare September 7, 2020 19:53

ethan-leba reviewed Sep 7, 2020

View reviewed changes

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from c9c653c to 4fe3ff4 Compare September 7, 2020 20:17

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from f101d3a to d92c8de Compare September 10, 2020 21:34

hippo91 reviewed Sep 13, 2020

View reviewed changes

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 3 times, most recently from d69f819 to 98b5e41 Compare September 14, 2020 20:00

Pierre-Sassoulas added this to the 2.6.1 release milestone Sep 14, 2020

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from 55fed7c to ef5c811 Compare September 15, 2020 05:06

hippo91 approved these changes Sep 19, 2020

View reviewed changes

tests/functional/l/len_checks.py Show resolved Hide resolved

hippo91 added the Work in progress label Sep 20, 2020

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch 2 times, most recently from a5c76f7 to bec7283 Compare September 23, 2020 17:31

Pierre-Sassoulas mentioned this pull request Oct 24, 2020

Better assert message when dealing with functional tests messages #3918

Merged

hippo91 added the Waiting on author Indicate that maintainers are waiting for a message of the author label Nov 26, 2020

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch from bec7283 to e3ee0e6 Compare November 29, 2020 12:36

Pierre-Sassoulas added 13 commits December 31, 2020 10:56

Add new function to recover all base class of a Node

0c21c6e

Restrict the number of classes affected by len-as-condition

57648d5

Handle range, generator list/dict/set comprehension

651660f

Add a message in the changelog and update contributors.txt

00b4755

Add a docstring to LenChecker.base_classes_of_node

9961c4f

Following review https://github.com/PyCQA/pylint/pull/3821/files#r484553606

Add a comment for unclear InferenceError

9944264

Apply hippo91's suggestion for retrieving ancestors

7ff76be

More efficient, more elegant. 👌

Add a test for function in len() call for len_check

43244d6

Treat generator and comprehension more gracefully

9375cb8

Better and faster implementation of instance_has_bool

5234dae

Add a test for function with comprehension or generator

250abec

Or a function returning a generator.

Use premature return to simplify the code

07fc0dd

Remove test for function inference as astroid is not ready yet

b228d77

Pierre-Sassoulas force-pushed the fix-github-issue-1879 branch from e3ee0e6 to b228d77 Compare December 31, 2020 10:19

Pierre-Sassoulas mentioned this pull request Dec 31, 2020

Raising len-as-conditions when using len on function returning generators or list comprehensions #4002

Open

Pierre-Sassoulas merged commit f95bbac into pylint-dev:master Dec 31, 2020

Pierre-Sassoulas mentioned this pull request Dec 31, 2020

len-as-sequence with numpy.arrays #1879

Closed

Pierre-Sassoulas added Bug 🪲 False Positive 🦟 A message is emitted but nothing is wrong with the code and removed Waiting on author Indicate that maintainers are waiting for a message of the author Work in progress labels Dec 31, 2020

Pierre-Sassoulas deleted the fix-github-issue-1879 branch December 31, 2020 10:59

Pierre-Sassoulas mentioned this pull request Jul 29, 2021

Add checker for comparing for equality to empty literal #4774

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821

Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821

Pierre-Sassoulas commented Sep 7, 2020

Pierre-Sassoulas commented Sep 7, 2020

hippo91 left a comment

coveralls commented Sep 7, 2020 •

edited

Loading

ethan-leba Sep 7, 2020

Pierre-Sassoulas Sep 7, 2020

hippo91 Sep 12, 2020

Pierre-Sassoulas Sep 13, 2020 •

edited

Loading

AWhetter Sep 13, 2020

Pierre-Sassoulas Sep 15, 2020

bersbersbers commented Sep 7, 2020

hippo91 commented Sep 8, 2020

Pierre-Sassoulas commented Sep 8, 2020

hippo91 left a comment

hippo91 commented Nov 26, 2020

Pierre-Sassoulas commented Nov 29, 2020

hippo91 commented Dec 12, 2020

Pierre-Sassoulas commented Dec 31, 2020

Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821

Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821

Conversation

Pierre-Sassoulas commented Sep 7, 2020

Description

Type of Changes

Related Issue

Pierre-Sassoulas commented Sep 7, 2020

hippo91 left a comment

Choose a reason for hiding this comment

coveralls commented Sep 7, 2020 • edited Loading

ethan-leba Sep 7, 2020

Choose a reason for hiding this comment

Pierre-Sassoulas Sep 7, 2020

Choose a reason for hiding this comment

hippo91 Sep 12, 2020

Choose a reason for hiding this comment

Pierre-Sassoulas Sep 13, 2020 • edited Loading

Choose a reason for hiding this comment

AWhetter Sep 13, 2020

Choose a reason for hiding this comment

Pierre-Sassoulas Sep 15, 2020

Choose a reason for hiding this comment

bersbersbers commented Sep 7, 2020

hippo91 commented Sep 8, 2020

Pierre-Sassoulas commented Sep 8, 2020

hippo91 left a comment

Choose a reason for hiding this comment

hippo91 commented Nov 26, 2020

Pierre-Sassoulas commented Nov 29, 2020

hippo91 commented Dec 12, 2020

Pierre-Sassoulas commented Dec 31, 2020

coveralls commented Sep 7, 2020 •

edited

Loading

Pierre-Sassoulas Sep 13, 2020 •

edited

Loading