-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: Add getAMethodCall
to LocalSourceNode
#6064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This seems like something we have been missing for a while now, so I figured it might be useful to add. It is roughly based on the JavaScript equivalent, with one major difference: in the JavaScript libraries, `getAMethodCall` is reserved for syntactic method calls (`obj.m(...)`) whereas `getAMemberInvocation` is used for both this and the case where the bound method `obj.m` is stored in a temporary variable and then subsequently invoked in the same local scope. It seems to me that the more general predicate is more useful, and hence should have the simpler name. (And also we don't really work with a notion of "invocation" in the Python libraries, so we would need a better name for it anyway.) I think as long as the documentation makes the behaviour clear, it should be okay.
Roughly patterned after the JS equivalent.
Not as many opportunities to clean stuff up here.
Now that we have a `MethodCallNode` class, it would be silly not to use that as the return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really love it 😍
I think it's important to highlight that this only tracks method references in local scope.
def foo(meth):
res = meth() # not found by MethodCallNode
print(res)
def bar():
r = RocketStation()
meth = r.fire_the_rockets
meth() # found by MethodCallNode
foo(meth)
Receiver
We haven't used the term receiver
a lot in the Python libraries until now, and personally I've only seen it used here in CodeQL. So I just want to check if there are arguments for keeping it this way, or whether we should maybe call it getOjbect()
? (I think we talked about this the other day, but I simply can't remember)
calls
Personally I don't immediately love the way calls
read, as in the example below. Something rubs me wrong about the way the object/receiver is specified... just like for accesses
. I think it's very natural that we should either go with both of these, or none of them.
result.(DataFlow::MethodCallNode).calls(cipherInstance(algorithmName), "encryptor")
Based on the fact that I can't really find a different approach replacing calls
for MethodCallNode
, that means the only alternative is to not provide these shortcuts, and always require a query writer to specify both getMethodName()
and getReceiver()
(likewise for AttrRead
/AttrWrite
).
I feel that if we provide these shortcuts, it should be because we want people (mostly us) to use them. I personally don't love them, but if you're feeling good about them, we can try them out, and maybe I'll change my mind after some time 🤷♂️ (and I guess you are, since you've made these suggestion for using these already 😛)
* Also covers the case where the method lookup is done separately from the call itself, as in | ||
* `temp = foo.bar; temp(...)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Also covers the case where the method lookup is done separately from the call itself, as in | |
* `temp = foo.bar; temp(...)`. | |
* Also covers the case where the method lookup is done separately from the call itself, as in | |
* `temp = foo.bar; temp(...)`, although this is only tracked through local scope. |
I got curious how it would handle multiple methods flowing to the same call-site. So I made a little test: if cond:
meth = obj1.foo
else:
meth = obj2.bar
meth() with this query: import python
import semmle.python.dataflow.new.DataFlow
import experimental.dataflow.TestUtil.PrintNode
from DataFlow::MethodCallNode methCall
select prettyNode(methCall), methCall.getMethodName(), prettyNode(methCall.getReceiver()) results in
So there is a risk of someone wanting all objects where the @tausbn do you see an (easy) way to fix this? I personally don't, but it would be nice if we avoid this 😊 (I'm thinking we would need a new constructor in Otherwise I guess it's a tradeoff between how big of a problem we estimate this to be vs. how much of an improvement we expect |
Amusingly, your second comment (which is an excellent find, by the way ❤️) explains why we need The only way to avoid this is to specify/access the object and the method name at the same time, for which the following implementation of predicate calls(Node object, string methodName) {
object = method_lookup.getObject() and
methodName = method_lookup.getAttributeName()
} By ensuring that it's the same With this in mind, I also think it's reasonable to add a big disclaimer on Regarding some of your other points, we agreed that |
- changes `getReceiver` to `getObject` - fixes `calls` to avoid unwanted cross-talk - adds some more documentation to highlight the above issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amusingly, your second comment (which is an excellent find, by the way ) explains why we need
calls
in the first place: taken separately,getReceiver
/getObject
andgetMethodName
may have multiple results (because, indeed, there may be multiple methods flowing to the call in question). If we require people to specify these separately as a conjuction, we will get the kind of overlap you describe above, leading to false positives.
I got to the same conclusion while I was eating my breakfast thinking this over, but nice to see it has already been implemented 👍
This seems like something we have been missing for a while now, so I
figured it might be useful to add. It is roughly based on the JavaScript
equivalent, with one major difference: in the JavaScript libraries,
getAMethodCall
is reserved for syntactic method calls (obj.m(...)
)whereas
getAMemberInvocation
is used for both this and the case wherethe bound method
obj.m
is stored in a temporary variable and thensubsequently invoked in the same local scope.
It seems to me that the more general predicate is more useful, and hence
should have the simpler name. (And also we don't really work with a
notion of "invocation" in the Python libraries, so we would need a
better name for it anyway.)
I think as long as the documentation makes the behaviour clear, it
should be okay.
Needs a change note, so I'm keeping it as draft for now.Change note added.