Skip to content

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented May 12, 2022

A simple rewrite to use API graphs instead.

The handling of falsy values is potentially a bit more restrictive now,
as it only accounts for local flow. We should probably figure out a
better way of capturing this pattern, but I felt that this was out of
scope for the present PR.


I don't think a change note is necessary for this, but I'll gladly add one if needed.

A simple rewrite to use API graphs instead.

The handling of falsy values is potentially a bit more restrictive now,
as it only accounts for local flow. We should probably figure out a
better way of capturing this pattern, but I felt that this was out of
scope for the present PR.
@tausbn tausbn added the no-change-note-required This PR does not need a change note label May 12, 2022
@tausbn tausbn marked this pull request as ready for review May 12, 2022 13:50
@tausbn tausbn requested a review from a team as a code owner May 12, 2022 13:50
yoff
yoff previously approved these changes May 16, 2022
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should probably use global data flow to track the booleans. Ideally, when parameterised modules arrive in anger, we can have a data flow configuration for tracking booleans that can just be seamlessly referred to in situations such as this.

For now, though, I am fine with this rewrite provided we do not lose many alerts. Do we know that the experiment in progress does contain a bunch of alerts (it looks to be the nightly suite)?

@tausbn
Copy link
Contributor Author

tausbn commented May 16, 2022

The experiment suite does contain results (e.g. for saltstack/salt), but there was something weird in the last run, so I restarted it a short while ago.

Comment on lines 48 to 49
any(DataFlow::LocalSourceNode n | n.asExpr().(ImmutableLiteral).booleanValue() = false)
.flowsTo(getAutoEscapeParameter(call))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if call was an API::CallNode you could use the following code

Suggested change
any(DataFlow::LocalSourceNode n | n.asExpr().(ImmutableLiteral).booleanValue() = false)
.flowsTo(getAutoEscapeParameter(call))
call.getKeywordParameter("autoescape").getAValueReachingRhs().asExpr().(ImmutableLiteral).booleanValue() = false

to also get global flow -- that's how I've been tracking boolean values in other library modeling, so I think doing it here would also make sense 😊

doing this would require inlining getAutoEscapeParameter, which I think is fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried this, and it introduced a false positive in

def checked(cond=False):
    if cond:
        e = Environment(autoescape=cond) # GOOD (but now has an alert)

So I'm inclined not to do it at the present time. I think this may have to do with flow coming out of default values not being quite right.

I'll switch DataFlow::CallCfgNode anyway, though, as that'll set us up better for the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may have to do with flow coming out of default values not being quite right.

Is it not that the if does not act as a barrier, since there is really not a concept of value involved in type tracking? It seems correct to take the default value into account..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. I think I had somehow inverted the logic in my head, thinking it was cond=True rather than cond=False, and that we were seeing the alert due to missing flow (to a call that was in fact safe).

This makes me wonder, though, if we couldn't incorporate these barriers into our local flow (and hence type tracking). Consider e.g. splitting the local flow relation into two forms, one for truthy values, and one for falsy values. We could then only allow flow into the body of an if like the above if it is of the correct truthyness. (The general local flow relation would then be the union of the truthy and falsy relations.)

It's probably not worth the effort, but it would be an interesting experiment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to also get global flow -- that's how I've been tracking boolean values in other library modeling, so I think doing it here would also make sense 😊

I am generally fine with rewriting pointsTo into getAValuereachingRhs as a quick way to convert our queries. I will submit, though, that this does not really get global flow, but a different global property, and in cases where global flow is important, we should use global data flow.

Introduces a false positive, but arguably that false positive should
have been there with the local flow as well.
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if we get many FP reports, we can rewrite it or lower the precision as appropriate.

@yoff yoff merged commit 23d64ff into github:main May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants