Skip to content

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Feb 19, 2021

This extends #5219 to also consider modelled functions, and provides a simple abstract class to annotate new value-preserving functions.

@smowton smowton requested a review from a team as a code owner February 19, 2021 16:01
@github-actions github-actions bot added the Java label Feb 19, 2021
@smowton
Copy link
Contributor Author

smowton commented Feb 19, 2021

I've confirmed this is sufficient to replace #5178

smowton added 2 commits March 4, 2021 11:45
Also add convenience abstract classes for easily modelling new functions as fluent or value-preserving.
@smowton smowton force-pushed the smowton/feature/backward-dataflow-for-modelled-fluent-methods branch from 8963b73 to da0a7f3 Compare March 4, 2021 11:47
@smowton smowton added the no-change-note-required This PR does not need a change note label Mar 4, 2021
@smowton
Copy link
Contributor Author

smowton commented Mar 4, 2021

@github/codeql-java rebased this now its dependency is merged

* These steps will be visible for all global data-flow purposes, as well as via
* `DataFlow::Node.getASuccessor` and other related functions exposing intraprocedural dataflow.
*/
abstract class ValuePreservingCallable extends Callable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an abstract class for this comes with a heavy commitment to bidirectional import. Therefore, I don't think this is the right place for this. Instead we should put both of these abstract classes in FlowSteps.qll.
The few that are already modelled here (StandardLibraryValuePreservingCallable) should move to a different place as well: Either directly in FlowSteps.qll as that already contains a bit of JDK modelling, or quite possibly moved to a new file along with the other modelling from FlowSteps.qll.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved them both to FlowSteps.qll

Note value-preserving functions can't be constructors

Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
)
or
this.getDeclaringType()
.getSourceDeclaration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should possibly get rid of this line and move the responsibility for handling generics into DataFlowUtil.qll.

m.hasName(["requireNonNull", "requireNonNullElseGet"]) and node1.asExpr() = ma.getArgument(0)
or
m.hasName("requireNonNullElse") and node1.asExpr() = ma.getAnArgument()
node1.asExpr() = ma.getArgument(argNo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the syntactic MethodAcces.getArgument(int) we should use the ArgumentNode.argumentOf(call, pos). That also covers the -1 case.

Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
smowton and others added 2 commits March 5, 2021 13:28
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
…e uses of ValuePreservingCallable -> ValuePreservingMethod
@smowton
Copy link
Contributor Author

smowton commented Mar 5, 2021

@aschackmull all changes applied

exists(MethodAccess ma, ValuePreservingCallable c, int argNo |
ma.getCallee().getSourceDeclaration() = c and c.returnsValue(argNo)
exists(MethodAccess ma, ValuePreservingMethod m, int argNo |
ma.getCallee().getSourceDeclaration() = m and m.returnsValue(argNo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the DIL for this join? It's a triangular join involving an integer argument position and that has been known to have a tendency to go bad. (The DIL for this join should be inspected in a suitably realistic query context - i.e. not through quick-eval).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's the order of returnsValue, getCallee + getSourceDeclaration, and argumentOf that matters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of an XSS.ql evaluation:

    {3} r43 = JOIN Expr::Call::getCallee_dispred#ff_10#join_rhs AS L WITH Member::Callable::getSourceDeclaration_dispred#bf AS R ON FIRST 1 OUTPUT L.<1>, 61, R.<1>
    {2} r44 = JOIN r43 WITH exprs ON FIRST 2 OUTPUT r43.<0>, r43.<2>
    {3} r45 = JOIN r44 WITH DataFlowUtil::TExprNode#ff@staged_ext AS R ON FIRST 1 OUTPUT r44.<0>, r44.<1>, R.<1>
    {4} r46 = JOIN r45 WITH DataFlowPrivate::ArgumentNode::argumentOf_dispred#fff_102#join_rhs AS R ON FIRST 1 OUTPUT r45.<1>, R.<2>, r45.<2>, R.<1>
    {2} r47 = JOIN r46 WITH FlowSteps::StandardLibraryValuePreservingMethod#class#ff AS R ON FIRST 2 OUTPUT r46.<3>, r46.<2>

Copy link
Contributor Author

@smowton smowton Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looks like at present, the fact that all models define returnsValue as -1 means any join on the integer is wholly optimised away?

The join against the class relation is joining on the Method and the argument number concurrently

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the last line is doing the integer join. So this is all good, the integer join is scheduled last, so the pipeline is bounded linearly by arguments.

Copy link
Contributor Author

@smowton smowton Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively,

exists(/* FlowSteps::ValuePreservingMethod */ cached entity m, int argNo |
    exists(/* Expr::MethodAccess */ cached entity ma |
      exists(/* Member::Callable */ cached entity receiver#30 |
        Member::Callable::getSourceDeclaration_dispred#bf(receiver#30, m),
        Expr::Call::getCallee_dispred#ff(ma, receiver#30)
      ),
      exists(cached int arg1, /* @type */ cached dontcare entity _,
             /* @exprparent */ cached dontcare entity _1, cached dontcare int _2 |
        arg1 = 61, exprs(ma, arg1, _, _1, _2)
      ),
      DataFlowUtil::TExprNode(ma, node2),
      DataFlowPrivate::ArgumentNode::argumentOf_dispred#fff(node1, ma, argNo)
    ),
    FlowSteps::StandardLibraryValuePreservingMethod#class#ff(m, argNo)
  )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The member predicate returnsValue is the same as the characteristic predicate FlowSteps::StandardLibraryValuePreservingMethod#class#ff - note that this is binary. This is because the index is a field on the class.

@aschackmull aschackmull merged commit cf4f55d into github:main Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants