Python: Port use-use implementation from Java #4235
Decisions about which predicates and modules to make

I'm a bit worried about leaving out
RasmusWL left a comment
Overall really good stuff 👍 and results look promising 🎉
Can you explain why we don't need adjacentUseUse? -- I would think we should use that instead of adjacentUseUseSameVar.
 * Holds if `b2` is a transitive successor of `b1` and `v` occurs in `b1` [and
 * in `b2` or one of its transitive successors]? but not in any block on the path
 * between `b1` and `b2`.
Why did you add the square brackets to the qldoc?
Because it is wrong: it does not hold in the base case.
Reverted, now that it is correct.
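For readers coming from the Python side, the qldoc above can be read as a graph search over basic blocks. The following is a minimal Python sketch of that reading, not the CodeQL implementation: `Cfg`, `Occurs`, `block_precedes_var` and `var_block_reaches` are hypothetical names, blocks are plain strings, and `occurs` is assumed to already contain only definitions and source uses (no refinements or implicit uses).

```python
from typing import Dict, List, Set

# Hypothetical toy model (not the CodeQL API):
#   Cfg    -- basic block -> successor blocks
#   Occurs -- basic block -> variables defined or source-used in that block
Cfg = Dict[str, List[str]]
Occurs = Dict[str, Set[str]]


def block_precedes_var(cfg: Cfg, occurs: Occurs, v: str, b: str) -> bool:
    """True if `v` occurs in `b` or in one of its transitive successors."""
    seen: Set[str] = set()
    stack = [b]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if v in occurs.get(n, set()):
            return True
        stack.extend(cfg.get(n, []))
    return False


def var_block_reaches(cfg: Cfg, occurs: Occurs, v: str, b1: str) -> Set[str]:
    """All `b2` that are transitive successors of `b1` such that `v` occurs in
    `b1` and in `b2` or one of its transitive successors, but not in any block
    strictly between `b1` and `b2` on some path."""
    result: Set[str] = set()
    if v not in occurs.get(b1, set()):
        return result
    frontier = list(cfg.get(b1, []))
    while frontier:
        b2 = frontier.pop()
        # Skip blocks already found and blocks from which `v` is never seen again.
        if b2 in result or not block_precedes_var(cfg, occurs, v, b2):
            continue
        result.add(b2)
        # Only continue past blocks that do not mention `v` themselves.
        if v not in occurs.get(b2, set()):
            frontier.extend(cfg.get(b2, []))
    return result
```

With `x` mentioned in `entry`, `then` and `join` of an if/else diamond, the search from `entry` still reaches `join`, because the path through `else` does not mention `x`; from `then`, only `join` is reached.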
tausbn left a comment
A few comments, but otherwise this looks really nice!
  i = rank[rankix](int j | variableDefine(v, _, b, j) or variableSourceUse(v, _, b, j))
}

/** A `VarAccess` `use` of `v` in `b` at index `i`. */
I don't think VarAccess has a special meaning in the Python libraries (whereas I assume it does in the Java libraries), so maybe this should just be spelled out as variable access instead?
(And in writing this, I realise that the variableUse predicate also has this odd reference to VarAccess.)
Probably it was copied from the same place. :)
I changed both places.
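The `rank` expression in the snippet above numbers, within one basic block, the positions where `v` is defined or source-used. A minimal Python sketch of that ranking, assuming each block is given as a list of hypothetical `Ref` records; the kind labels and function names are illustrative only and not part of the Python libraries.

```python
from typing import Dict, List, NamedTuple


class Ref(NamedTuple):
    index: int  # position of the control-flow node within the basic block
    var: str
    kind: str   # hypothetical labels: "define", "source_use", "refinement", "implicit_use"


# Only these kinds take part in the ranking; refinements and implicit uses are
# skipped, in the spirit of variableDefine/variableSourceUse vs variableDef/variableUse.
RANKED_KINDS = {"define", "source_use"}


def def_source_use_rank(block: List[Ref], v: str) -> Dict[int, int]:
    """Map each 1-based rank to the node index it selects, mirroring
    i = rank[rankix](int j | variableDefine(v, _, b, j) or variableSourceUse(v, _, b, j))
    """
    # The QL aggregate ranges over the *set* of matching j, so duplicates collapse.
    indices = sorted({r.index for r in block if r.var == v and r.kind in RANKED_KINDS})
    return {rankix: i for rankix, i in enumerate(indices, start=1)}


def last_rank(block: List[Ref], v: str) -> int:
    """Highest rank of `v` in the block, or 0 if `v` does not occur."""
    return len(def_source_use_rank(block, v))
```

Because refinements and implicit uses are filtered out before ranking, two source uses separated only by a refinement of the same variable get consecutive ranks.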
In Java, the module
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Are you saying that we don't need to handle this explicitly in a

Yes, the phi-nodes are in our dataflow graph. But perhaps they should not be. Perhaps we should only have

Perhaps @hvitved can weigh in on the situation in C# (as we're aligning ourselves more with that than with Java)?
RasmusWL left a comment
Besides the open question of how to handle phi nodes, looks good to me.
Gonna merge this now; I think we can resolve that part in a separate PR.
@@ -120,6 +128,14 @@ module EssaFlow {
  nodeFrom.(EssaNode).getVar() = p.getAnInput()
In these three cases, C# takes any of the last reads of the input variable as nodeFrom, and only if there are no reads do we take the SSA node. I believe this will currently not work:
if (...) {
    x = taint;
    clean(x);
} else {
    x = taint;
    clean(x);
}
sink(x);
because you will jump directly from both definitions of x to the call to sink.
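To make the first example concrete, here is a toy Python model of the C#-style choice of phi inputs. It is a sketch under stated assumptions, not how either library is implemented: all node names are invented, and it assumes (hypothetically) that `clean(x)` acts as a barrier on the read of `x`. With the definitions stepping straight into the phi node, taint from `x = taint` reaches `sink(x)`; when the last reads feed the phi node instead, the barrier is respected.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def phi_input_sources(inputs: List[str], last_reads: Dict[str, List[str]]) -> Set[str]:
    """For each input definition of a phi node, use its last reads as the
    source of the step into the phi node, or the definition itself if it has no reads."""
    sources: Set[str] = set()
    for d in inputs:
        sources.update(last_reads.get(d) or [d])
    return sources


def reaches(edges: List[Edge], src: str, dst: str, barriers: Set[str]) -> bool:
    """Plain graph search; flow never leaves a barrier node."""
    succ: Dict[str, List[str]] = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen: Set[str] = set()
    stack = [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n in seen or n in barriers:
            continue
        seen.add(n)
        stack.extend(succ.get(n, []))
    return False


# Hypothetical nodes for the if/else example: two SSA definitions `x = taint`,
# the reads of `x` in `clean(x)`, the phi node after the if/else, and `sink(x)`.
defs = ["x_then", "x_else"]
last_reads = {"x_then": ["clean_then"], "x_else": ["clean_else"]}
common: List[Edge] = [("x_then", "clean_then"), ("x_else", "clean_else"), ("phi", "sink")]
barriers = {"clean_then", "clean_else"}  # assumption: clean(x) sanitizes the read


def graph(sources: Set[str]) -> List[Edge]:
    """Shared def-to-read steps plus one step from each chosen source into the phi node."""
    return common + [(s, "phi") for s in sorted(sources)]


def_to_phi = graph(set(defs))                               # definitions feed the phi
read_to_phi = graph(phi_input_sources(defs, last_reads))   # last reads feed the phi
```

Here `reaches(def_to_phi, "x_then", "sink", barriers)` holds, because the step from the definition into the phi node bypasses `clean(x)`, while `reaches(read_to_phi, "x_then", "sink", barriers)` does not.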
Even worse, something like this will (I believe) also not work:
x = ...;
if (...) {
    x.Foo = taint;
} else {
    x = ...;
}
sink(x.Foo);
because there is no step from the x in x.Foo = taint to the phi node for x after the if-then-else.
@yoff have you added the test cases from above somewhere and checked how we handle them after use-use flow? 😊
Actually, I think the latter case will work, as long as the store step in x.Foo = taint targets the refined SSA node for x. But if it instead targets the post-update node for x (as for Java and C#), the change is needed (and it will be needed for the first case anyway).
Not yet (but very soon); the need is tracked here. I think we probably need to remove some essa-flow and let use-use do the work.
I think it is time to let the team see what is going on with the implementation of use-use.
After re-adding def-use steps to global variables and fixing the bug that pre-update nodes lost out-flow, all existing flow is recovered (as judged by our test files; we should add specific tests for use-use).
After fixing the bug that post-update nodes never had out-flow, we finally obtain some of the expected improvements: the taint-tracking test for `list_append` now passes (as feared, fixing those bugs on main does not give the same improvement).

The details
The implementation follows `java/ql/src/semmle/code/java/dataflow/SSA.qll` lines 754-856 and is carried out in `SsaCompute.qll`. That file already contains several predicates that seem to have been copied from the Java implementation and then adapted (not always adapting the comments) to the Python analysis, which includes refinements and many implicit uses.

This PR adds a new module in `SsaCompute.qll` called `AdjacentUsesImpl`, which is exported as `AdjacentUses`. This module contains the ported computation, but also redefines some of the underlying predicates to exclude refinements and implicit uses. For instance, the Java computation relies on a predicate called `defUseRank`, and `SsaCompute.qll` already provides one in `SsaComputeImpl`. But rather than reuse that one, `AdjacentUsesImpl` defines `defSourceUseRank`, which is based on `getASourceUse` rather than on `getAUse` and which excludes refinements (compare `variableUse` to the new `variableSourceUse`, and `variableDef` to `variableDefine`).

Apart from renaming `defUseRank` to `defSourceUseRank` and `variableUse` to `variableSourceUse`, the Java implementation can be used almost verbatim. Only `definesAt` had to be implemented (and `getABBSuccessor` renamed to `getASuccessor`).
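How the ranks and the block reachability combine into use-use adjacency can be sketched in Python as below. This is an illustrative sketch, not the ported QL: `adjacent_var_refs` and `adjacent_use_use_same_var` are hypothetical names, the per-block ranked references (as `defSourceUseRank` would order them) and the reachability sets (as described by the qldoc discussed earlier in the thread) are taken as precomputed inputs for a single variable, and the cross-block rule (last-ranked reference in one block to first-ranked reference in a reached block) is an assumption about the Java scheme.

```python
from typing import Dict, List, Set, Tuple

Ref = Tuple[str, int]  # (basic block, node index within the block)


def adjacent_var_refs(refs: Dict[str, List[Tuple[int, str]]],
                      reaches: Dict[str, Set[str]]) -> Set[Tuple[Ref, Ref]]:
    """Pairs of adjacent references to one variable `v`.

    refs[b]    -- (index, kind) of each def/source use of `v` in block b, sorted by
                  index, so list position + 1 plays the role of the rank.
    reaches[b] -- blocks b2 reached from b without an intervening occurrence of `v`.
    """
    pairs: Set[Tuple[Ref, Ref]] = set()
    for b1, rs in refs.items():
        # Same block: references with consecutive ranks are adjacent.
        for (i1, _), (i2, _) in zip(rs, rs[1:]):
            pairs.add(((b1, i1), (b1, i2)))
        if not rs:
            continue
        # Across blocks: last-ranked reference in b1 to first-ranked reference in b2.
        for b2 in reaches.get(b1, set()):
            if refs.get(b2):
                pairs.add(((b1, rs[-1][0]), (b2, refs[b2][0][0])))
    return pairs


def adjacent_use_use_same_var(refs: Dict[str, List[Tuple[int, str]]],
                              reaches: Dict[str, Set[str]]) -> Set[Tuple[Ref, Ref]]:
    """Keep only pairs where both ends are uses; a definition in between breaks adjacency."""
    kind = {(b, i): k for b, rs in refs.items() for i, k in rs}
    return {(r1, r2) for r1, r2 in adjacent_var_refs(refs, reaches)
            if kind[r1] == "use" and kind[r2] == "use"}
```

For an if/else diamond where `x` is defined and used in `entry`, used in `then` and used again in `join`, the use in `entry` is adjacent both to the use in `then` and, via `else`, to the use in `join`. Turning the use in `then` into a definition removes every pair that touches `then`.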