Skip to content

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented Feb 23, 2021

Really, this boils down to "Port re library model to use API graphs
instead of points-to", which is what this PR actually does.

Instead of using points-to to track flags, we use a type tracker. To
handle multiple flags at the same time, we add additional flow from

x to x | y and y | x

and, as an added bonus, the above with + instead of |, neatly
fixing #4707

I had to modify the Qualified.ql test slightly, as it now had a
result stemming from the standard library (in warnings.py) that
points-to previously ignored.

It might be possible to implement this as a type tracker on
LocalSourceNodes, but with the added steps for the above operations,
this was not obvious to me, and so I opted for the simpler
"smallstep" variant.


Before merging, I need to

  • check that performance isn't impacted because of bad joins in the type tracker.
  • check that no other tests need to be modified because of the change to the re model
  • figure out if a change note is needed. The queries haven't changed and the behaviour should be the same, but I did deprecate a public predicate in Regex.qll, so that probably merits a change note.

Really, this boils down to "Port `re` library model to use API graphs
instead of points-to", which is what this PR actually does.

Instead of using points-to to track flags, we use a type tracker. To
handle multiple flags at the same time, we add additional flow from

`x` to `x | y` and `y | x`

and, as an added bonus, the above with `+` instead of `|`, neatly
fixing github#4707

I had to modify the `Qualified.ql` test slightly, as it now had a
result stemming from the standard library (in `warnings.py`) that
points-to previously ignored.

It might be possible to implement this as a type tracker on
`LocalSourceNode`s, but with the added steps for the above operations,
this was not obvious to me, and so I opted for the simpler
"`smallstep`" variant.
@tausbn tausbn added Python Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish no-change-note-required This PR does not need a change note labels Feb 23, 2021
@tausbn tausbn requested a review from a team as a code owner February 23, 2021 21:13
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically looks great, two comments.

flag = Value::named("sre_constants.SRE_FLAG_" + result).(ObjectInternal).intValue() and
obj.(ObjectInternal).intValue().bitAnd(flag) = flag
flag = Value::named("sre_constants.SRE_FLAG_" + result).(OI::ObjectInternal).intValue() and
obj.(OI::ObjectInternal).intValue().bitAnd(flag) = flag
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this does not involve points-to, but please reassure me that this is not code we will be moving away from :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your statement confuses me. The change you have highlighted above does indeed involve points-to, and it is code we will be moving away from. However, it's a predicate that could have seen use outside of this codebase, and so I couldn't simply delete it.

I should add a comment saying it's deprecated, though, to make this clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the indicated statement. (And dated it, so we can actually see when we can delete it. 🙂)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and I guess that the signature makes it rather awkward to rewrite it to not use points-to...

These were increased because of the indirection needed to get to the
regex flags, but as we no longer rely on this, we can make do with a
smaller import depth.
@tausbn
Copy link
Contributor Author

tausbn commented Feb 24, 2021

With the join-order fix, the performance is now well within parameters.

@tausbn tausbn removed the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Feb 24, 2021
@tausbn tausbn requested review from yoff and RasmusWL February 24, 2021 11:36
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides me wanting to understand the rationale behind t.continue() a bit deeper, this PR LGTM 👍

exists(API::Node flag | flag_name = canonical_name(flag) and result = flag.getAUse())
or
exists(BinaryExprNode binop, DataFlow::Node operand |
operand.getALocalSource() = re_flag_tracker(flag_name, t.continue()) and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bit of trouble understanding if t.continue() is the only right choice here.

I would have thought that if we were able to track a flag with some type tracker t2, if that flag is used in a binary or operation, the resulting type-tracker would be the continuation of t2. So

  exists(BinaryExprNode binop, DataFlow::Node operand, DataFlow::TypeTracker t2 |
    operand.getALocalSource() = re_flag_tracker(flag_name, t2) and
    t = t2.continue() and
    ...

But I'm also wondering if we need this continue stuff at all, or we could just use re_flag_tracker(flag_name) instead -- and if there would be any bad consequences of that.

  exists(BinaryExprNode binop, DataFlow::Node operand |
    operand.getALocalSource() = re_flag_tracker(flag_name) and
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggested rewrite would have at least two consequences, both of which may affect performance, and one of which will affect behaviour:

  1. By referring to re_flag_tracker/1, the fixpoint computation now has to evaluate both this and re_flag_tracker/2 at the same time. Currently, re_flag_tracker/1 is simply an extra join on top of the result of the other predicate. The impact on performance probably isn't terribly big, but I imagine there is some overhead in doing this.
  2. Probably more impactful on performance is the fact that by not reusing the type tracker, we lose track of whether we have previously propagated the type information across a call. Thus, with your suggestion we might track into a call, then through a binary operation and then out of a different call to the same function. This is potentially a much larger set of nodes.

Finally, rewriting to use t2 actually doesn't change the behaviour. t = t2.continue() is equivalent to t = t2 and t.attr = "" (with a slight abuse of notation). Thus, continue really just checks that we're not tracking an attribute (which makes sense -- we can add re flags, but not objects that happen to have an re flag in an attribute).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Finally, rewriting to use t2 actually doesn't change the behaviour. t = t2.continue() is equivalent to t = t2 and t.attr = "" (with a slight abuse of notation). Thus, continue really just checks that we're not tracking an attribute (which makes sense -- we can add re flags, but not objects that happen to have an re flag in an attribute).

Right. Although the behavior ends up being the same, for me, it just reads wrong semantically. I would like to use the t2 approach, and have made a suggestion for that (couldn't do it in this thread, since that only covered one of the lines 🤦)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere else (in Python and JavaScript) where this construction appears, and I do mean everywhere is in the form t.continue(). We only see t2.continue() (and currently only in JavaScript) when (back)track(t2,t) is used in an adjacent disjunct.

So I think to suddenly change what is the standard idiom doesn't make much sense.

Rather, it seems to me that a better solution would be to add predicate canContinue to TypeTracker, with the meaning that this.attr is empty. Then we can reuse t in both places, as long as we make sure to check ... and t.canContinue() and we don't have to waste precious space on declaring a variable that is exactly equal to t anyway.

This, however, I think it outside the scope of the present PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that changing the way it's used probably isn't in the scope of this PR.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +78 to +79
exists(BinaryExprNode binop, DataFlow::Node operand |
operand.getALocalSource() = re_flag_tracker(flag_name, t.continue()) and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exists(BinaryExprNode binop, DataFlow::Node operand |
operand.getALocalSource() = re_flag_tracker(flag_name, t.continue()) and
exists(BinaryExprNode binop, DataFlow::Node operand, DataFlow::TypeTracker t2 |
operand.getALocalSource() = re_flag_tracker(flag_name, t2) and
t = t2.continue() and

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed for the reasons outlined in #5250 (comment)

@tausbn tausbn merged commit 01d581e into github:main Feb 25, 2021
@tausbn tausbn deleted the python-port-re-security-queries branch February 25, 2021 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants