-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: Modernise py/jinja2/autoescape-false #9135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Modernise py/jinja2/autoescape-false #9135
Conversation
A simple rewrite to use API graphs instead. The handling of falsy values is potentially a bit more restrictive now, as it only accounts for local flow. We should probably figure out a better way of capturing this pattern, but I felt that this was out of scope for the present PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should probably use global data flow to track the booleans. Ideally, when parameterised modules arrive in anger, we can have a data flow configuration for tracking booleans that can just be seamlessly referred to in situations such as this.
For now, though, I am fine with this rewrite provided we do not lose many alerts. Do we know that the experiment in progress does contain a bunch of alerts (it looks to be the nightly suite)?
The experiment suite does contain results (e.g. for |
any(DataFlow::LocalSourceNode n | n.asExpr().(ImmutableLiteral).booleanValue() = false) | ||
.flowsTo(getAutoEscapeParameter(call)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if call
was an API::CallNode
you could use the following code
any(DataFlow::LocalSourceNode n | n.asExpr().(ImmutableLiteral).booleanValue() = false) | |
.flowsTo(getAutoEscapeParameter(call)) | |
call.getKeywordParameter("autoescape").getAValueReachingRhs().asExpr().(ImmutableLiteral).booleanValue() = false |
to also get global flow -- that's how I've been tracking boolean values in other library modeling, so I think doing it here would also make sense 😊
doing this would require inlining getAutoEscapeParameter
, which I think is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried this, and it introduced a false positive in
def checked(cond=False):
if cond:
e = Environment(autoescape=cond) # GOOD (but now has an alert)
So I'm inclined not to do it at the present time. I think this may have to do with flow coming out of default values not being quite right.
I'll switch DataFlow::CallCfgNode
anyway, though, as that'll set us up better for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may have to do with flow coming out of default values not being quite right.
Is it not that the if
does not act as a barrier, since there is really not a concept of value involved in type tracking? It seems correct to take the default value into account..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. I think I had somehow inverted the logic in my head, thinking it was cond=True
rather than cond=False
, and that we were seeing the alert due to missing flow (to a call that was in fact safe).
This makes me wonder, though, if we couldn't incorporate these barriers into our local flow (and hence type tracking). Consider e.g. splitting the local flow relation into two forms, one for truthy values, and one for falsy values. We could then only allow flow into the body of an if
like the above if it is of the correct truthyness. (The general local flow relation would then be the union of the truthy and falsy relations.)
It's probably not worth the effort, but it would be an interesting experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to also get global flow -- that's how I've been tracking boolean values in other library modeling, so I think doing it here would also make sense 😊
I am generally fine with rewriting pointsTo
into getAValuereachingRhs
as a quick way to convert our queries. I will submit, though, that this does not really get global flow, but a different global property, and in cases where global flow is important, we should use global data flow.
Introduces a false positive, but arguably that false positive should have been there with the local flow as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, if we get many FP reports, we can rewrite it or lower the precision as appropriate.
A simple rewrite to use API graphs instead.
The handling of falsy values is potentially a bit more restrictive now,
as it only accounts for local flow. We should probably figure out a
better way of capturing this pattern, but I felt that this was out of
scope for the present PR.
I don't think a change note is necessary for this, but I'll gladly add one if needed.