Python: Add `getAMethodCall` to `LocalSourceNode` #6064

tausbn · 2021-06-11T21:37:02Z

This seems like something we have been missing for a while now, so I
figured it might be useful to add. It is roughly based on the JavaScript
equivalent, with one major difference: in the JavaScript libraries,
getAMethodCall is reserved for syntactic method calls (obj.m(...))
whereas getAMemberInvocation is used for both this and the case where
the bound method obj.m is stored in a temporary variable and then
subsequently invoked in the same local scope.

It seems to me that the more general predicate is more useful, and hence
should have the simpler name. (And also we don't really work with a
notion of "invocation" in the Python libraries, so we would need a
better name for it anyway.)

I think as long as the documentation makes the behaviour clear, it
should be okay.

~~Needs a change note, so I'm keeping it as draft for now.~~ Change note added.

This seems like something we have been missing for a while now, so I figured it might be useful to add. It is roughly based on the JavaScript equivalent, with one major difference: in the JavaScript libraries, `getAMethodCall` is reserved for syntactic method calls (`obj.m(...)`) whereas `getAMemberInvocation` is used for both this and the case where the bound method `obj.m` is stored in a temporary variable and then subsequently invoked in the same local scope. It seems to me that the more general predicate is more useful, and hence should have the simpler name. (And also we don't really work with a notion of "invocation" in the Python libraries, so we would need a better name for it anyway.) I think as long as the documentation makes the behaviour clear, it should be okay.

Roughly patterned after the JS equivalent.

Not as many opportunities to clean stuff up here.

Now that we have a `MethodCallNode` class, it would be silly not to use that as the return type.

RasmusWL

Really love it 😍

I think it's important to highlight that this only tracks method references in local scope.

def foo(meth):
    res = meth() # not found by MethodCallNode
    print(res)

def bar():
    r = RocketStation()
    meth = r.fire_the_rockets
    meth() # found by MethodCallNode
    foo(meth)

Receiver

We haven't used the term receiver a lot in the Python libraries until now, and personally I've only seen it used here in CodeQL. So I just want to check if there are arguments for keeping it this way, or whether we should maybe call it getOjbect()? (I think we talked about this the other day, but I simply can't remember)

`calls`

Personally I don't immediately love the way calls read, as in the example below. Something rubs me wrong about the way the object/receiver is specified... just like for accesses. I think it's very natural that we should either go with both of these, or none of them.

result.(DataFlow::MethodCallNode).calls(cipherInstance(algorithmName), "encryptor")

Based on the fact that I can't really find a different approach replacing calls for MethodCallNode, that means the only alternative is to not provide these shortcuts, and always require a query writer to specify both getMethodName() and getReceiver() (likewise for AttrRead/AttrWrite).

I feel that if we provide these shortcuts, it should be because we want people (mostly us) to use them. I personally don't love them, but if you're feeling good about them, we can try them out, and maybe I'll change my mind after some time 🤷‍♂️ (and I guess you are, since you've made these suggestion for using these already 😛)

RasmusWL · 2021-06-21T10:04:10Z

python/ql/src/semmle/python/dataflow/new/internal/DataFlowPublic.qll

+ * Also covers the case where the method lookup is done separately from the call itself, as in
+ * `temp = foo.bar; temp(...)`.


Suggested change

* Also covers the case where the method lookup is done separately from the call itself, as in

* `temp = foo.bar; temp(...)`.

* Also covers the case where the method lookup is done separately from the call itself, as in

* `temp = foo.bar; temp(...)`, although this is only tracked through local scope.

RasmusWL · 2021-06-21T12:55:48Z

I got curious how it would handle multiple methods flowing to the same call-site.

So I made a little test:

if cond:
    meth = obj1.foo
else:
    meth = obj2.bar

meth()

with this query:

import python
import semmle.python.dataflow.new.DataFlow
import experimental.dataflow.TestUtil.PrintNode

from DataFlow::MethodCallNode methCall
select prettyNode(methCall), methCall.getMethodName(), prettyNode(methCall.getReceiver())

results in

| meth() | bar | obj1 |
| meth() | bar | obj2 |
| meth() | foo | obj1 |
| meth() | foo | obj2 |

So there is a risk of someone wanting all objects where the foo method is called on, using any(DataFlow::MethodCallNode methCall).calls(result, "foo"), and getting FPs (like obj2) 😞 (and technically also FNs due to the local scope only property).

@tausbn do you see an (easy) way to fix this? I personally don't, but it would be nice if we avoid this 😊 (I'm thinking we would need a new constructor in TNode, which then couldn't just subclass CallCfgNode, and it would be sort of a mess).

Otherwise I guess it's a tradeoff between how big of a problem we estimate this to be vs. how much of an improvement we expect MethodCallNode to provide.

tausbn · 2021-06-21T14:38:40Z

Amusingly, your second comment (which is an excellent find, by the way ❤️) explains why we need calls in the first place: taken separately, getReceiver/getObject and getMethodName may have multiple results (because, indeed, there may be multiple methods flowing to the call in question). If we require people to specify these separately as a conjuction, we will get the kind of overlap you describe above, leading to false positives.

The only way to avoid this is to specify/access the object and the method name at the same time, for which the following implementation of calls will do:

  predicate calls(Node object, string methodName) {
    object = method_lookup.getObject() and
    methodName = method_lookup.getAttributeName()
  }

By ensuring that it's the same method_lookup, we avoid the unfortunate crosstalk. (I have created a test illustrating this point, which I'll push shortly.)

With this in mind, I also think it's reasonable to add a big disclaimer on getObject and getMethodName saying that you should use calls if you want to specify both at the same time (to avoid people falling into this trap by mistake).

Regarding some of your other points, we agreed that getObject was more in line with the rest of the Python library, so I will make this change as well. Your suggestion for clarifying that this only works in local scope will also be folded in. 🙂

- changes `getReceiver` to `getObject` - fixes `calls` to avoid unwanted cross-talk - adds some more documentation to highlight the above issue

RasmusWL

Amusingly, your second comment (which is an excellent find, by the way ) explains why we need calls in the first place: taken separately, getReceiver/getObject and getMethodName may have multiple results (because, indeed, there may be multiple methods flowing to the call in question). If we require people to specify these separately as a conjuction, we will get the kind of overlap you describe above, leading to false positives.

I got to the same conclusion while I was eating my breakfast thinking this over, but nice to see it has already been implemented 👍

github-actions bot added the Python label Jun 11, 2021

tausbn mentioned this pull request Jun 11, 2021

Python: Small Cleanups #5926

Merged

tausbn added 6 commits June 15, 2021 15:04

Python: Add MethodCallNode class

d4b0554

Roughly patterned after the JS equivalent.

Python: Clean up Cryptography.qll

82fab3b

Python: Clean up Ssl.qll

e90ec80

Python: Clean up Stdlib.qll

41ee325

Not as many opportunities to clean stuff up here.

Python: Add change note

92063dc

Python: Fix up getAMethodCall

b55c034

Now that we have a `MethodCallNode` class, it would be silly not to use that as the return type.

github-actions bot added the documentation label Jun 15, 2021

tausbn marked this pull request as ready for review June 15, 2021 15:54

tausbn requested a review from a team as a code owner June 15, 2021 15:54

Python: Autoformat

359bc5e

RasmusWL requested changes Jun 21, 2021

View reviewed changes

Python: Address review comments

768cab3

- changes `getReceiver` to `getObject` - fixes `calls` to avoid unwanted cross-talk - adds some more documentation to highlight the above issue

tausbn requested a review from RasmusWL June 21, 2021 16:12

RasmusWL approved these changes Jun 22, 2021

View reviewed changes

RasmusWL merged commit e05d6e7 into github:main Jun 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Python: Add `getAMethodCall` to `LocalSourceNode` #6064

Python: Add `getAMethodCall` to `LocalSourceNode` #6064

Uh oh!

tausbn commented Jun 11, 2021 •

edited

Loading

Uh oh!

RasmusWL left a comment •

edited

Loading

Uh oh!

RasmusWL Jun 21, 2021

Uh oh!

RasmusWL commented Jun 21, 2021

Uh oh!

tausbn commented Jun 21, 2021

Uh oh!

RasmusWL left a comment

Uh oh!

Uh oh!

		* Also covers the case where the method lookup is done separately from the call itself, as in
		* `temp = foo.bar; temp(...)`.

Python: Add getAMethodCall to LocalSourceNode #6064

Python: Add getAMethodCall to LocalSourceNode #6064

Uh oh!

Conversation

tausbn commented Jun 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RasmusWL left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Receiver

calls

Uh oh!

RasmusWL Jun 21, 2021

Choose a reason for hiding this comment

Uh oh!

RasmusWL commented Jun 21, 2021

Uh oh!

tausbn commented Jun 21, 2021

Uh oh!

RasmusWL left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Python: Add `getAMethodCall` to `LocalSourceNode` #6064

Python: Add `getAMethodCall` to `LocalSourceNode` #6064

tausbn commented Jun 11, 2021 •

edited

Loading

RasmusWL left a comment •

edited

Loading

`calls`