Less false positives for len-as-conditions (pandas's DataFrame, numpy's Array) #3821
@hippo91 and @bersbersbers I'd like your opinion on this :) I don't think inferring the class and checking for its implementation is very efficient, suggestions welcomed :)
Great change! The message should be clearer thanks to it!
    try:
        instance = next(node.args[0].infer())
    except astroid.InferenceError:
        self.add_message("len-as-condition", node=node)
        return
I don't know much about the internals of astroid's infer, but what's the reasoning for emitting the message if the value can't be inferred? I'm assuming that astroid would probably be less likely to infer the type of something from an imported library, but I could be wrong.
It actually happens for list comprehensions, dict comprehensions, set comprehensions and generators like range. It's neither very elegant nor explicit.
I don't know exactly what to think about it.
The implementation you propose is quite powerful, but it will emit a message as soon as the len argument cannot be inferred. I'm not confident that this will occur only for comprehensions and generators...
Maybe we could try to filter out comprehensions before trying to infer. Something along these lines:
    # we're finally out of any nested boolean operations so check if
    # this len() call is part of a test condition
    if _is_test_condition(node, parent):
        len_arg = node.args[0]
        # First, the node is a generator or comprehension as in len([x for x in ...])
        if isinstance(len_arg, (astroid.ListComp, astroid.SetComp, astroid.DictComp, astroid.GeneratorExp)):
            self.add_message("len-as-condition", node=node)
            return
        # Or the node is a name as in len(foo): look up the name's assignment and analyze it
        if isinstance(len_arg, astroid.Name):
            ...  # look up the assignment and check whether it is a comprehension or generator
        # Last case I see: the node is a call to a function, as in len(my_func(my_arg)),
        # in which case we should be able to check whether the result is a generator
        # (a call to a generator function infers to astroid.bases.Generator)
        if isinstance(len_arg, astroid.Call) and any(
            isinstance(inferred, astroid.bases.Generator) for inferred in len_arg.inferred()
        ):
            self.add_message("len-as-condition", node=node)
        ...
Note that I'm not fond of this implementation either, because it uses a lot of isinstance calls...
What do you think about it?
Maybe we can ask @AWhetter for advice...
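The first filtering step suggested above can be sketched without astroid. This is only an illustration that uses the standard library's `ast` module as a stand-in; the helper names (`is_len_call`, `len_candidates`, `find_len_of_comprehension`) and the sample source are made up for the example and are not part of pylint.

```python
import ast

# Node types that always denote a comprehension or generator expression.
COMPREHENSIONS = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)


def is_len_call(node):
    """Return True for a call of the bare name `len` with one positional argument."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "len"
        and len(node.args) == 1
    )


def len_candidates(test):
    """Yield len() calls used directly as a condition, unwrapping and/or/not."""
    if isinstance(test, ast.BoolOp):
        for value in test.values:
            yield from len_candidates(value)
    elif isinstance(test, ast.UnaryOp) and isinstance(test.op, ast.Not):
        yield from len_candidates(test.operand)
    elif is_len_call(test):
        yield test


def find_len_of_comprehension(source):
    """Return the line numbers where len(<comprehension>) is used as a condition."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # if, while, assert and conditional expressions all carry a `test` node.
        if isinstance(node, (ast.If, ast.While, ast.Assert, ast.IfExp)):
            for call in len_candidates(node.test):
                if isinstance(call.args[0], COMPREHENSIONS):
                    lines.append(call.lineno)
    return sorted(lines)


# Hypothetical sample: only lines 1 and 4 use len(<comprehension>) as a condition.
SAMPLE = """\
if len([x for x in data]):
    pass
assert len(frame)
while not len({k: v for k, v in pairs}):
    pass
total = len([y for y in data])
"""

print(find_len_of_comprehension(SAMPLE))
```

The purely syntactic check never needs inference, which is why it cannot produce a false positive for `len(frame)`: the argument is a plain name, so it is left for the later, more expensive steps.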
I agree, it's very heavy. I'm searching for a better way than checking all the code of the class in search of a __bool__ function. If there isn't one, maybe not raising a message for anything that is not a list, set, dict, range, comprehension or generator gives a better trade-off between results and the computation required?
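As a rough runtime analogue of the check discussed here (the checker itself works statically on inferred nodes, not on live classes), one could look through a class's MRO for a user-defined `__bool__`. The names `defines_own_bool`, `len_check_is_redundant` and the three sample classes are assumptions made up for this sketch.

```python
# Built-in containers whose truthiness is documented to match "non-empty".
PEP8_CONTAINERS = (list, dict, set)


def defines_own_bool(cls):
    """Return True if cls or one of its non-builtin ancestors defines __bool__."""
    for klass in cls.__mro__:
        if klass in PEP8_CONTAINERS or klass is object:
            continue
        if "__bool__" in vars(klass):
            return True
    return False


def len_check_is_redundant(cls):
    """True when `if len(x):` can safely be written `if x:` for instances of cls."""
    return issubclass(cls, PEP8_CONTAINERS) and not defines_own_bool(cls)


class Stack(list):
    """Plain list subclass: truthiness still means non-empty."""


class AlwaysTruthy(list):
    """List subclass that overrides __bool__, so `if x:` differs from `if len(x):`."""

    def __bool__(self):
        return True


class Frame:
    """Hypothetical third-party container that does not inherit from list/dict/set."""

    def __len__(self):
        return 0


for cls in (Stack, AlwaysTruthy, Frame):
    print(cls.__name__, len_check_is_redundant(cls))
```

Only `Stack` would warrant the message under this rule; `AlwaysTruthy` and `Frame` are exactly the kinds of classes for which replacing `len(x)` with `x` could change behaviour.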
I think doing the checks @hippo91 suggested might even end up being faster than calling .infer(). If we're worried about the performance that much, we could do some performance comparisons and see which is faster.
My preference is to do the checks because we're way less likely to raise false positives, and I think they're more annoying than pylint running slowly. At least this section of code will get called only when len() is called in the source code being analysed.
Doing it like @hippo91 suggested is fine, my implementation wasn't efficient at all.
Based on the change log, it seems this has the desired effect. I did not check the implementation as I am not used to pylint internals.
I don't even see why the current change does not already do that in infer() and instance_has_bool().
@Pierre-Sassoulas I will try to review this PR next weekend. Sorry for the delay.
Yes, I did exactly that :) But maybe there is a better way, or the value gained is not worth the computation?
It's nice work, @Pierre-Sassoulas!
I left a bunch of comments for discussion.
@Pierre-Sassoulas let me know if you want more feedback before merging it.
@hippo91 I'm stuck on how to handle functions. Here's my functional test:

    def function_returning_list(r):
        if r==1:
            return [1]
        return [2]

    def function_returning_int(r):
        if r==1:
            return 1
        return 2

    def function_returning_generator(r):
        for i in [r, 1, 2, 3]:
            yield i

    def function_returning_comprehension(r):
        return [x+1 for x in [r, 1, 2, 3]]

    def function_returning_function(r):
        return function_returning_generator(r)

    assert len(function_returning_list(z))  # [len-as-condition]
    assert len(function_returning_int(z))
    assert len(function_returning_generator(z))  # [len-as-condition] (line 173 here)
    assert len(function_returning_comprehension(z))  # [len-as-condition]
    assert len(function_returning_function(z))  # [len-as-condition]

The last three (lines 173, 174 and 175) aren't working right now. How would you infer the return type of a function from the following?
To be fair, I found another adjacent problem in testutil while doing that and did not spend a lot of time on this. But maybe we could ignore function calls if this is hard to do?
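To make the difficulty concrete, here is a small sketch of a purely syntactic classification of a function's return kind. It uses the standard library's `ast` module rather than astroid, and `own_nodes`, `return_kind` and `classify` are hypothetical helpers for this example, not pylint or astroid API. It deliberately gives up on `function_returning_function`, which is the case that needs real inference.

```python
import ast

SOURCE = '''
def function_returning_list(r):
    if r == 1:
        return [1]
    return [2]

def function_returning_generator(r):
    for i in [r, 1, 2, 3]:
        yield i

def function_returning_comprehension(r):
    return [x + 1 for x in [r, 1, 2, 3]]

def function_returning_function(r):
    return function_returning_generator(r)
'''

# Scopes whose yield/return statements do not belong to the enclosing function.
NESTED_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)


def own_nodes(func):
    """Walk func's body without descending into nested functions, lambdas or classes."""
    stack = list(func.body)
    while stack:
        node = stack.pop()
        yield node
        if not isinstance(node, NESTED_SCOPES):
            stack.extend(ast.iter_child_nodes(node))


def return_kind(func):
    """Classify a function as 'generator', 'list' or 'unknown' from syntax alone."""
    nodes = list(own_nodes(func))
    if any(isinstance(n, (ast.Yield, ast.YieldFrom)) for n in nodes):
        return "generator"
    values = [n.value for n in nodes if isinstance(n, ast.Return) and n.value is not None]
    if values and all(isinstance(v, (ast.List, ast.ListComp)) for v in values):
        return "list"
    # Anything else (e.g. returning the result of another call) needs inference.
    return "unknown"


def classify(source):
    """Map each top-level function name in source to its return kind."""
    tree = ast.parse(source)
    return {n.name: return_kind(n) for n in tree.body if isinstance(n, ast.FunctionDef)}


for name, kind in classify(SOURCE).items():
    print(f"{name}: {kind}")
```

The generator and comprehension cases are recognisable from the function body alone, while the function that forwards another call's result is not, which matches where the functional test gets hard.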
@Pierre-Sassoulas sorry for the late answer.
you obtain the following:
The same arises for
More efficient, more elegant. 👌
Or a function returning a generator.
Description
len-as-condition is now triggered only for classes that inherit directly from list, dict, or set and do not implement the __bool__ function, or for generators like range or list/dict/set comprehensions. This should reduce the false positives for other classes, like pandas's DataFrame or numpy's Array. See the discussion in the related issue.
Type of Changes
Related Issue
#1879
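For readers wondering why the message was a false positive for array-like types in the first place, here is a minimal sketch. `Grid` is a hypothetical stand-in written for this example, not numpy's or pandas's actual implementation; it only mimics the behaviour of raising on an ambiguous truth test.

```python
class Grid:
    """Hypothetical array-like container whose truth value is not its emptiness."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __bool__(self):
        # Mimics array-like types that refuse an ambiguous truth test.
        if len(self.values) != 1:
            raise ValueError("the truth value of a multi-element Grid is ambiguous")
        return bool(self.values[0])


def truth_test_raises(obj):
    """Return True if `if obj:` would raise ValueError instead of answering."""
    try:
        bool(obj)
    except ValueError:
        return True
    return False


grid = Grid([0, 0, 0])

# `if len(grid):` is well defined ...
print(bool(len(grid)))
# ... whereas the rewrite suggested by len-as-condition, `if grid:`, would raise.
print(truth_test_raises(grid))
```

For such a class, following the old advice and dropping `len()` would turn working code into code that raises, which is the situation the narrower rule in this PR is meant to avoid.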