-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: port modification of default value #6557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: port modification of default value #6557
Conversation
The old query uses `pointsTo` to limit the sinks to methods on lists and dictionaries. That constraint is omitted here which could hurt performance.
This should avoid potentially terrible performance. Also noted the missing syntactic constructs, as I went through the documnetation.
Consider removing the original test
The one found in the original test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks like some good improvements 💪
I feel that the tests the makes a modification that end up not changing the value (like dict_nonempty_nochange
and dict_update_op_nochange
), feels a bit like a hypothetical code-pattern, and that we shouldn't spend more time trying to handle those better for now. If we end up getting lots of complaints on this, or seeing this being a major problem from real code, we can consider looking more into it 😊 The same goes for the sanitizer
test-case. (unless of course you have seen this in the wild a lot, but I don't see that mentioned anywhere)
Return of default value
I just wanted to note that a related problem can occur if you return the default value, illustrated in the example below.
def foo(x = [])
return x
foo().append(42)
assert foo() == [42]
BarrierGuard
I wanted to go a bit more into details with the effort that was put into not giving an alert on this code
def sanitizer(l = []):
if l:
l.append(1) # actually ok, since it will not trigger for default value
...
I don't see any changes to results from the barrier guard introduced in a762373 -- the changes to the .expected
files seems to be related to the changes made in 49ae549
I also don't think it quite works as intended, since for if not x:
the expression for x
does not seem to be a part of the IdentityGuarded
class, so the logic for handling the not
will not be applicable. To get it to be part of the IdentityGuarded
class we would need something along the lines of any(If i).getTest().contains(this)
, although that code might not be the best way to handle it in practice (I haven't really thought that through).
As I mentioned above, I would think that code that modifies the argument value in this way would be a rare occurrence, and not something we should worry too much about. So although it might sound a bit disappointing, I will suggest that we remove this bit of the PR, and then just be happy about the other improvements you have made for this query 💪 (if you already have the data indicating I'm wrong, very happy to take a closer look at this though)
In case you agree that we can drop support for this, I think there's a few of my PR comments that can be ignored 😉
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll
Outdated
Show resolved
Hide resolved
python/ql/test/query-tests/Functions/ModificationOfParameterWithDefault/test.py
Outdated
Show resolved
Hide resolved
python/ql/test/query-tests/Functions/ModificationOfParameterWithDefault/test.py
Outdated
Show resolved
Hide resolved
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll
Show resolved
Hide resolved
* A sanitizer guard for detecting modifications of a parameters default value. | ||
*/ | ||
abstract class BarrierGuard extends DataFlow::BarrierGuard { | ||
abstract boolean blocksNonEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs QLDoc
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll
Outdated
Show resolved
Hide resolved
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll
Outdated
Show resolved
Hide resolved
I don't have any good suggestions for how to fix the tests 😞 |
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
…fication-of-default-value Files have moved around, specifically PrintNode.qll.
The test case has different behaviour between py2/3. When merging this, we should create an issue to resolve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic involving invertedness, emptyness, and truthyness is quite complicated. I'm reminded of this quote:
There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.
I didn't invest too much in addressing this part initially, since I thought it might be going away altogether.
As we discussed offline, we probably don't need to handle not not x
, so we should be able to get away with just handling a single level of not
, thereby removing a lot of the complexity in this queue. Hopefully we'll be able to invest more effort into making guards just work nicely out of the box, instead of implementors having to go through these hoops themselves.
# Python 2 and 3 versions of the analysis. | ||
# We comment it out until we have resoved the issue. | ||
# | ||
# def issue1143(expr, param=[]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AHA, so #1143 is the reason we want to have that guard 🤔
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefault.qll
Outdated
Show resolved
Hide resolved
# OK | ||
def sanitizer(l = []): | ||
if l: | ||
l.append(1) | ||
return l | ||
|
||
# OK | ||
def sanitizer_negated(l = [1]): | ||
if not l: | ||
l.append(1) | ||
return l | ||
|
||
# Not OK | ||
def sanitizer(l = []): | ||
if not l: | ||
l.append(1) #$ modification=l | ||
return l | ||
|
||
# Not OK | ||
def sanitizer_negated(l = [1]): | ||
if l: | ||
l.append(1) #$ modification=l | ||
return l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the tests for guards more complete, can we get else
branches on all of these, such as:
def sanitizer(l = []):
if l:
# OK
l.append(1)
else:
# NOT OK
l.append(1) # $ modification=l
return l
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! These highlighted the inherent problem with using BarrierGuard
s.
/** | ||
* A sanitizer guard that does not let a truthy value flow to the true branch. | ||
* | ||
* Since guards with different behaviour cannot exist on the same node, | ||
* we let all guards have the same behaviour, in the sense that they all check | ||
* the true branch. Instead, we partition guards into those that block | ||
* truthy values and those that block falsy values. | ||
*/ | ||
abstract class BlocksTruthy extends DataFlow::BarrierGuard { } | ||
|
||
/** | ||
* A sanitizer guard that does not let a falsy value flow to the true branch. | ||
* | ||
* Since guards with different behaviour cannot exist on the same node, | ||
* we let all guards have the same behaviour, in the sense that they all check | ||
* the true branch. Instead, we partition guards into those that block | ||
* truthy values and those that block falsy values. | ||
*/ | ||
abstract class BlocksFalsey extends DataFlow::BarrierGuard { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed since new barrier are introduced below in other suggestion.
/** | |
* A sanitizer guard that does not let a truthy value flow to the true branch. | |
* | |
* Since guards with different behaviour cannot exist on the same node, | |
* we let all guards have the same behaviour, in the sense that they all check | |
* the true branch. Instead, we partition guards into those that block | |
* truthy values and those that block falsy values. | |
*/ | |
abstract class BlocksTruthy extends DataFlow::BarrierGuard { } | |
/** | |
* A sanitizer guard that does not let a falsy value flow to the true branch. | |
* | |
* Since guards with different behaviour cannot exist on the same node, | |
* we let all guards have the same behaviour, in the sense that they all check | |
* the true branch. Instead, we partition guards into those that block | |
* truthy values and those that block falsy values. | |
*/ | |
abstract class BlocksFalsey extends DataFlow::BarrierGuard { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we wanted all these abstract classes as extension points for future, unknown customisations?
/** | ||
* An expression that is checked directly in an `if`, possibly with `not`, such as `if x:` or `if not x:`. | ||
*/ | ||
private class IdentityGuarded extends Expr { | ||
boolean inverted; | ||
|
||
IdentityGuarded() { | ||
this = any(If i).getTest() and | ||
inverted = false | ||
or | ||
exists(IdentityGuarded ig, UnaryExpr notExp | | ||
notExp.getOp() instanceof Not and | ||
ig = notExp and | ||
notExp.getOperand() = this | ||
| | ||
inverted = ig.isInverted().booleanNot() | ||
) | ||
} | ||
|
||
/** | ||
* Whether this guard has been inverted. For `if x:` the result is `false`, and for `if not x:` the result is `true`. | ||
*/ | ||
boolean isInverted() { result = inverted } | ||
} | ||
|
||
/** | ||
* Holds iff `guard` is checking the `Name` represented by `guarded` for truthyness. | ||
* `result` is true if the check is inverted and false if it is not. | ||
*/ | ||
boolean isIdentityGuard(DataFlow::GuardNode guard, ControlFlowNode guarded) { | ||
exists(IdentityGuarded ig | | ||
ig instanceof Name and | ||
// In `not l`, the `ControlFlowNode` for `l` is not an instance of `GuardNode`. | ||
// TODO: This is slightly naive, not handling e.g. `l or cond` correctly. | ||
// We should change it when we have a proper guards library. | ||
guard.getNode().getAChildNode*() = ig and | ||
result = ig.isInverted() and | ||
guarded.getNode() = ig | ||
) | ||
} | ||
|
||
/** | ||
* A sanitizer guard that does not let a truthy value flow to the true branch. | ||
* Based on `isIdentityGuard`, so comes with the same caveats. | ||
*/ | ||
class BlocksTruthyGuard extends BlocksTruthy { | ||
ControlFlowNode guarded; | ||
|
||
BlocksTruthyGuard() { | ||
// The raw guard is true if the value is non-empty. | ||
// We wish to send truthy falues to the false branch, | ||
// se we are looking for inverted guards. | ||
isIdentityGuard(this, guarded) = true | ||
} | ||
|
||
override predicate checks(ControlFlowNode node, boolean branch) { | ||
node = guarded and | ||
branch = true | ||
} | ||
} | ||
|
||
/** | ||
* A sanitizer guard that does not let a falsy value flow to the true branch. | ||
* Based on `isIdentityGuard`, so comes with the same caveats. | ||
*/ | ||
class BlocksFalseyGuard extends BlocksFalsey { | ||
ControlFlowNode guarded; | ||
|
||
BlocksFalseyGuard() { | ||
// The raw guard is true if the value is non-empty. | ||
// We wish to send falsy falues to the false branch, | ||
// se we are looking for guards that are not inverted. | ||
isIdentityGuard(this, guarded) = false | ||
} | ||
|
||
override predicate checks(ControlFlowNode node, boolean branch) { | ||
node = guarded and | ||
branch = true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline
/** | |
* An expression that is checked directly in an `if`, possibly with `not`, such as `if x:` or `if not x:`. | |
*/ | |
private class IdentityGuarded extends Expr { | |
boolean inverted; | |
IdentityGuarded() { | |
this = any(If i).getTest() and | |
inverted = false | |
or | |
exists(IdentityGuarded ig, UnaryExpr notExp | | |
notExp.getOp() instanceof Not and | |
ig = notExp and | |
notExp.getOperand() = this | |
| | |
inverted = ig.isInverted().booleanNot() | |
) | |
} | |
/** | |
* Whether this guard has been inverted. For `if x:` the result is `false`, and for `if not x:` the result is `true`. | |
*/ | |
boolean isInverted() { result = inverted } | |
} | |
/** | |
* Holds iff `guard` is checking the `Name` represented by `guarded` for truthyness. | |
* `result` is true if the check is inverted and false if it is not. | |
*/ | |
boolean isIdentityGuard(DataFlow::GuardNode guard, ControlFlowNode guarded) { | |
exists(IdentityGuarded ig | | |
ig instanceof Name and | |
// In `not l`, the `ControlFlowNode` for `l` is not an instance of `GuardNode`. | |
// TODO: This is slightly naive, not handling e.g. `l or cond` correctly. | |
// We should change it when we have a proper guards library. | |
guard.getNode().getAChildNode*() = ig and | |
result = ig.isInverted() and | |
guarded.getNode() = ig | |
) | |
} | |
/** | |
* A sanitizer guard that does not let a truthy value flow to the true branch. | |
* Based on `isIdentityGuard`, so comes with the same caveats. | |
*/ | |
class BlocksTruthyGuard extends BlocksTruthy { | |
ControlFlowNode guarded; | |
BlocksTruthyGuard() { | |
// The raw guard is true if the value is non-empty. | |
// We wish to send truthy falues to the false branch, | |
// se we are looking for inverted guards. | |
isIdentityGuard(this, guarded) = true | |
} | |
override predicate checks(ControlFlowNode node, boolean branch) { | |
node = guarded and | |
branch = true | |
} | |
} | |
/** | |
* A sanitizer guard that does not let a falsy value flow to the true branch. | |
* Based on `isIdentityGuard`, so comes with the same caveats. | |
*/ | |
class BlocksFalseyGuard extends BlocksFalsey { | |
ControlFlowNode guarded; | |
BlocksFalseyGuard() { | |
// The raw guard is true if the value is non-empty. | |
// We wish to send falsy falues to the false branch, | |
// se we are looking for guards that are not inverted. | |
isIdentityGuard(this, guarded) = false | |
} | |
override predicate checks(ControlFlowNode node, boolean branch) { | |
node = guarded and | |
branch = true | |
} | |
} | |
/** | |
* A sanitizer guard that does not let a truthy value flow to the true branch. | |
* | |
* Blocks flow of `x` in the true branch in the example below. | |
* ```py | |
* if x: | |
* x.append(42) | |
* ``` | |
* | |
* Implementation note: Since guards with different behaviour cannot exist on the same | |
* node, we let all guards have the same behaviour, in the sense that they all check | |
* the true branch. Instead, we partition guards into those that block truthy values | |
* and those that block falsy values. | |
*/ | |
class BlocksTruthy extends DataFlow::BarrierGuard { | |
BlocksTruthy() { | |
this instanceof NameNode | |
} | |
override predicate checks(ControlFlowNode node, boolean branch) { | |
node = this and | |
branch = true | |
} | |
} | |
/** | |
* A sanitizer guard that does not let a falsey value flow to the true branch. | |
* | |
* Blocks flow of `x` in the true branch in the example below. | |
* ```py | |
* if not x: | |
* x.append(42) | |
* ``` | |
* | |
* Implementation note: Since guards with different behaviour cannot exist on the same | |
* node, we let all guards have the same behaviour, in the sense that they all check | |
* the true branch. Instead, we partition guards into those that block truthy values | |
* and those that block falsy values. | |
*/ | |
class BlocksFalsey extends DataFlow::BarrierGuard { | |
NameNode guarded; | |
BlocksFalsey() { | |
guarded = this.(UnaryExprNode).getOperand() and | |
this.(UnaryExprNode).getNode().getOp() instanceof Not | |
} | |
override predicate checks(ControlFlowNode node, boolean branch) { | |
node = guarded and | |
branch = true | |
} | |
} |
/** | ||
* A sanitizer for detecting modifications of a parameters default value. | ||
*/ | ||
abstract class Barrier extends DataFlow::Node { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is not used in the configuration anymore, so let's just remove it
/** | |
* A sanitizer for detecting modifications of a parameters default value. | |
*/ | |
abstract class Barrier extends DataFlow::Node { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we wanted all these abstract classes as extension points for future, unknown customisations?
…ithDefault.qll Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
They are simply not right for this problem. We should not even make them available as an extension point.
I have moved away from |
…fication-of-default-value To get the subpaths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks ok. It's not perfect that our data-flow library is not powerful enough to handle the guards nicely out of the box, but I also don't think it's worth spending too much more time on this.
A few minor comments, otherwise looks ready to be merged from my point of view.
python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll
Outdated
Show resolved
Hide resolved
DataFlow::GuardNode guard; | ||
NameNode guarded; | ||
boolean branch; | ||
boolean truthy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's only the field truthy
that is used by subclasses, can you please move the other field to an exists
? (to get more idiomatic CodeQL code) 😊
I don't see how this should change the behavior of the code, but do let me know if I'm wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Analysis of results reproduced here for convenience: For the 15 unclassified: For the 6 classified ones: So how were these results filtered out before? The previous query would only include the forms |
…ithDefaultCustomizations.qll Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
I guess the question is if we should elide these new sources of modifications or not... |
You mean sinks of modification right? (bad dad joke laugh) Anyway, sounds good if they are "intended positives". I think the new additional places we detect modifications are good. The one in |
I guess there is a pattern where you populate the default value before you use it. If all modifications are idempotent and come before all reads, is that simply safe? |
Sounds reasonable to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance looks fine 👍
Only missing bit is sanitizers. @tausbn mentioned comparing against the empty list. I could add that, but I am not really sure that is a recommended practice. On the other hand, we do not want a slew of false positives from missing that. Let me know.