-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Java: Add backward dataflow edges through modelled function invocations #5223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java: Add backward dataflow edges through modelled function invocations #5223
Conversation
I've confirmed this is sufficient to replace #5178 |
Also add convenience abstract classes for easily modelling new functions as fluent or value-preserving.
8963b73
to
da0a7f3
Compare
@github/codeql-java rebased this now its dependency is merged |
* These steps will be visible for all global data-flow purposes, as well as via | ||
* `DataFlow::Node.getASuccessor` and other related functions exposing intraprocedural dataflow. | ||
*/ | ||
abstract class ValuePreservingCallable extends Callable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an abstract class for this comes with a heavy commitment to bidirectional import. Therefore, I don't think this is the right place for this. Instead we should put both of these abstract classes in FlowSteps.qll
.
The few that are already modelled here (StandardLibraryValuePreservingCallable
) should move to a different place as well: Either directly in FlowSteps.qll
as that already contains a bit of JDK modelling, or quite possibly moved to a new file along with the other modelling from FlowSteps.qll
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved them both to FlowSteps.qll
Note value-preserving functions can't be constructors Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
) | ||
or | ||
this.getDeclaringType() | ||
.getSourceDeclaration() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should possibly get rid of this line and move the responsibility for handling generics into DataFlowUtil.qll
.
java/ql/src/semmle/code/java/dataflow/internal/DataFlowUtil.qll
Outdated
Show resolved
Hide resolved
m.hasName(["requireNonNull", "requireNonNullElseGet"]) and node1.asExpr() = ma.getArgument(0) | ||
or | ||
m.hasName("requireNonNullElse") and node1.asExpr() = ma.getAnArgument() | ||
node1.asExpr() = ma.getArgument(argNo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using the syntactic MethodAcces.getArgument(int)
we should use the ArgumentNode.argumentOf(call, pos)
. That also covers the -1
case.
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
…e uses of ValuePreservingCallable -> ValuePreservingMethod
@aschackmull all changes applied |
exists(MethodAccess ma, ValuePreservingCallable c, int argNo | | ||
ma.getCallee().getSourceDeclaration() = c and c.returnsValue(argNo) | ||
exists(MethodAccess ma, ValuePreservingMethod m, int argNo | | ||
ma.getCallee().getSourceDeclaration() = m and m.returnsValue(argNo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the DIL for this join? It's a triangular join involving an integer argument position and that has been known to have a tendency to go bad. (The DIL for this join should be inspected in a suitably realistic query context - i.e. not through quick-eval).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It's the order of returnsValue
, getCallee
+ getSourceDeclaration
, and argumentOf
that matters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of an XSS.ql evaluation:
{3} r43 = JOIN Expr::Call::getCallee_dispred#ff_10#join_rhs AS L WITH Member::Callable::getSourceDeclaration_dispred#bf AS R ON FIRST 1 OUTPUT L.<1>, 61, R.<1>
{2} r44 = JOIN r43 WITH exprs ON FIRST 2 OUTPUT r43.<0>, r43.<2>
{3} r45 = JOIN r44 WITH DataFlowUtil::TExprNode#ff@staged_ext AS R ON FIRST 1 OUTPUT r44.<0>, r44.<1>, R.<1>
{4} r46 = JOIN r45 WITH DataFlowPrivate::ArgumentNode::argumentOf_dispred#fff_102#join_rhs AS R ON FIRST 1 OUTPUT r45.<1>, R.<2>, r45.<2>, R.<1>
{2} r47 = JOIN r46 WITH FlowSteps::StandardLibraryValuePreservingMethod#class#ff AS R ON FIRST 2 OUTPUT r46.<3>, r46.<2>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So looks like at present, the fact that all models define returnsValue
as -1 means any join on the integer is wholly optimised away?
The join against the class relation is joining on the Method and the argument number concurrently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the last line is doing the integer join. So this is all good, the integer join is scheduled last, so the pipeline is bounded linearly by arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively,
exists(/* FlowSteps::ValuePreservingMethod */ cached entity m, int argNo |
exists(/* Expr::MethodAccess */ cached entity ma |
exists(/* Member::Callable */ cached entity receiver#30 |
Member::Callable::getSourceDeclaration_dispred#bf(receiver#30, m),
Expr::Call::getCallee_dispred#ff(ma, receiver#30)
),
exists(cached int arg1, /* @type */ cached dontcare entity _,
/* @exprparent */ cached dontcare entity _1, cached dontcare int _2 |
arg1 = 61, exprs(ma, arg1, _, _1, _2)
),
DataFlowUtil::TExprNode(ma, node2),
DataFlowPrivate::ArgumentNode::argumentOf_dispred#fff(node1, ma, argNo)
),
FlowSteps::StandardLibraryValuePreservingMethod#class#ff(m, argNo)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The member predicate returnsValue
is the same as the characteristic predicate FlowSteps::StandardLibraryValuePreservingMethod#class#ff
- note that this is binary. This is because the index is a field on the class.
This extends #5219 to also consider modelled functions, and provides a simple abstract class to annotate new value-preserving functions.