-
Notifications
You must be signed in to change notification settings - Fork 127
Conversation
I added further tests including ones exhibiting inaccuracy of the current implementation. TODO:
|
I also note a search of Github suggests this is now the only user of MarshalOptions.MarshalState that exists (the API was only introduced earlier this year) |
TODOs done apart from the change-note, which I'll write last so as not to forget to update it. I gave up on modelling UnmarshalOptions.UnmarshalState, as that would need to propagate taint like
That would need us to work backwards from the input struct to its field, and then from |
But note that github code search is pretty unreliable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review with a few simple suggestions; will continue later.
ql/src/semmle/go/dataflow/SSA.qll
Outdated
class SsaWithFields extends SsaWithFieldsAndElements { | ||
SsaWithFields() { | ||
this = TRoot(_) or | ||
exists(SsaWithFieldsAndElements base | this = TFieldStep(base, _)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exists(SsaWithFieldsAndElements base | this = TFieldStep(base, _)) | |
exists(SsaWithFields base | this = TFieldStep(base, _)) |
} | ||
|
||
/** | ||
* Additional taint-flow step modelling flow from MarshalInput.Message to MarshalOutput, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Additional taint-flow step modelling flow from MarshalInput.Message to MarshalOutput, | |
* Additional taint-flow step modelling flow from `MarshalInput.Message` to `MarshalOutput`, |
|
||
/** | ||
* Additional taint-flow step modelling flow from MarshalInput.Message to MarshalOutput, | ||
* mediated by a MarshalOptions.MarshalState call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* mediated by a MarshalOptions.MarshalState call. | |
* mediated by a `MarshalOptions.MarshalState` call. |
* Additional taint-flow step modelling flow from MarshalInput.Message to MarshalOutput, | ||
* mediated by a MarshalOptions.MarshalState call. | ||
* | ||
* Note we can taint the whole MarshalOutput as it only has one field (Buf), and taint- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Note we can taint the whole MarshalOutput as it only has one field (Buf), and taint- | |
* Note we can taint the whole `MarshalOutput` as it only has one field (`Buf`), and taint- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you didn't know, you can click the + for comments and drag to make a multi-line comment or suggestion.
passedMarshalInput.asExpr().getGlobalValueNumber() = | ||
marshalInput.asExpr().getGlobalValueNumber() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passedMarshalInput.asExpr().getGlobalValueNumber() = | |
marshalInput.asExpr().getGlobalValueNumber() and | |
globalValueNumber(passedMarshalInput) = globalValueNumber(marshalInput) and |
Alternatively, you can replace the single use of passedMarshalInput
with globalValueNumber(marshalInput).getANode()
.
succ.(DataFlow::PostUpdateNode).getPreUpdateNode().asInstruction() = | ||
accessPath.getBaseVariable().getAUse() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
succ.(DataFlow::PostUpdateNode).getPreUpdateNode().asInstruction() = | |
accessPath.getBaseVariable().getAUse() | |
succ = DataFlow::ssaNode(accessPath.getBaseVariable()) |
I seem to vaguely remember that we don't do field-sensitive propagation through taint, only through plain data flow. In other words, if we know that there is data flow between |
Distcompare result was unremarkable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rather belated review. (Sorry!)
ql/src/semmle/go/dataflow/SSA.qll
Outdated
TFieldStep(SsaWithFieldsAndElements base, Field f) { exists(accessPathFieldAux(base, f)) } or | ||
TElementStep(SsaWithFieldsAndElements base, IR::Instruction idx) { | ||
exists(accessPathElementAux(base, idx)) | ||
} | ||
|
||
/** | ||
* Gets a representation of `nd` as an ssa-with-fields value if there is one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this comment was out of date, can you fix it?
ql/src/semmle/go/dataflow/SSA.qll
Outdated
/** | ||
* Gets the SSA variable corresponding to the base of this SSA variable with fields. | ||
* Gets the SSA variable corresponding to the base of this SSA variable with fields and/or array elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not also include maps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I forgot they were a first-class type. Looks like the Go implementation does indeed produce maps too -- I'll add this.
ql/src/semmle/go/dataflow/SSA.qll
Outdated
TRoot(SsaVariable v) or | ||
TStep(SsaWithFields base, Field f) { exists(accessPathAux(base, f)) } | ||
TFieldStep(SsaWithFieldsAndElements base, Field f) { exists(accessPathFieldAux(base, f)) } or | ||
TElementStep(SsaWithFieldsAndElements base, IR::Instruction idx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use GVNs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean as an alternative to IR::Instruction
as the representation of the index? That does sound better, I was just worried that because GVN analysis is based on the SSA representation this would invert the pass order.
ql/src/semmle/go/dataflow/SSA.qll
Outdated
exists(Field f | this = TFieldStep(_, f) | result = f.getType()) | ||
or | ||
exists(SsaWithFieldsAndElements base | this = TElementStep(base, _) | | ||
result = base.getType().getUnderlyingType().(ArrayType).getElementType() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also deal with slices and maps, as well as maybe strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slices and maps agreed -- I'll do both.
* Additional taint-flow step modelling flow from MarshalInput.Message to MarshalOutput, | ||
* mediated by a MarshalOptions.MarshalState call. | ||
* | ||
* Note we can taint the whole MarshalOutput as it only has one field (Buf), and taint- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you didn't know, you can click the + for comments and drag to make a multi-line comment or suggestion.
// pred -> marshalInput.Message | ||
any(DataFlow::Write w) | ||
.writesField(marshalInput.(DataFlow::PostUpdateNode).getPreUpdateNode(), | ||
inputMessageField(), pred) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to just make any write to MarshalInput.Message
taint the entire struct? I feel like that would reduce the complexity of this quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is probably overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me, have we checked that this is sensible in practice, i.e., if one field is tainted then so are the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case the fields are an incoming byte buffer, the message to be serialized into it, and some flags, so I don't think there's a strong chance of a false positive.
@@ -0,0 +1,129 @@ | |||
package main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make the tests use something like the comment tests from here and it's corresponding ql? I think it's something we should move to.
I'm also happy to do it myself if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds like a good step, I'll have a go
*/ | ||
private class WriteMessageFieldStep extends TaintTracking::AdditionalTaintStep { | ||
override predicate step(DataFlow::Node pred, DataFlow::Node succ) { | ||
exists(SsaWithFieldsAndElements accessPath | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I think SsaWithFieldsAndElements
might be useful elsewhere, I think it might also be useful to have a superclass for instructions / nodes / exprs that have a "base" so you can just use .getBase()+
here.
Pushed fixes for the simple review comments, will get to Sauyon's improvements this afternoon. |
@max-schaefer @sauyon I've pushed the two big changes: support for maps, and using getBase() instead of SsaWithFieldsAndElements, as the top two commits. If you like the getBase solution better I'll drop the SsaWithFieldsAndElements commit. |
Yes, I much prefer it. I hadn't realised (or perhaps had forgotten) that we weren't matching up the series of field and element accesses between write and read anyway, so there really isn't much point to capturing it in a special data structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more suggestions. IIUC, moving away from SsaWithFieldsAndElements
should simplify this considerably, which I look forward to.
/** | ||
* A data-flow node that reads the value of a field from a struct, or an element from an array, slice, map or string. | ||
*/ | ||
abstract class ReadFromAggregateNode extends ReadNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to introducing this.
Inevitable naming bikeshed: I don't think "aggregate" is a common term in Go (it's not mentioned in the language spec), and of course in QL aggregates are something completely different. How about using "component" as an umbrella term for "element" and "field", and then renaming this class to ComponentReadNode
?
Also, I would strongly advise not making this abstract
but instead going the slightly more convoluted route of introducing corresponding IR instructions and AST node types and overriding the type of insn
. This may seem like overkill, but you'll thank me eventually.
On a similar note, instead of making getBase()
abstract (meaning that every subclass has to override it), implement it as none()
.
* | ||
* For example, in the expression a.b[c].d[e], this would return the dataflow node for the read from `a`. | ||
*/ | ||
Node getUnderlyingNode(ReadNode read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is currently only used in Protobuf.qll
, so perhaps just move it there?
|
||
private Field inputMessageField() { | ||
result | ||
.hasQualifiedName("google.golang.org/protobuf/runtime/protoiface", "MarshalInput", "Message") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to pull out the very long name of this package into a helper predicate to avoid repetition and line wrapping?
// pred -> marshalInput.Message | ||
any(DataFlow::Write w) | ||
.writesField(marshalInput.(DataFlow::PostUpdateNode).getPreUpdateNode(), | ||
inputMessageField(), pred) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me, have we checked that this is sensible in practice, i.e., if one field is tainted then so are the rest?
*/ | ||
private class WriteMessageFieldStep extends TaintTracking::AdditionalTaintStep { | ||
override predicate step(DataFlow::Node pred, DataFlow::Node succ) { | ||
exists(DataFlow::ReadNode base | succ = DataFlow::getUnderlyingNode(base) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this do the right thing? Shouldn't succ
rather be the post-update node whose pre-update is the base of the write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the post-update node stuff loses you cases like x[0].y = value
, because x
does not have a post-update node. What do you recommend here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, hm, I see. But what's the use of having a step from value
to x
in this example? There isn't any outgoing flow from x
, is there? So wouldn't the flow get stuck?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, looks like I had a pair of nearly-cancelling bugs -- the default taint rule for derefs was conveying the taint from the operand of an implicit deref (here x
) back out to the deref (implicit *x
) and then to the post-update node of x
-- which did exist because of the implicit deref operation.
Only for complex cases I was missing the post-update node's existence because I was failing to account for implicit derefs happening during x[0].y
.
I've now fixed that omission and introduced a cast to PostUpdateNode as expected. This means that this case works, due to an implicit deref of x
at the top level:
x := &SomeType{}
x.y[z].w[v].a = getTaint()
But this case doesn't, as x
doesn't have a post-update node:
x := SomeType{}
x.y[z].w[v].a = getTaint()
or | ||
exists(DataFlow::ReadNode base | succ = DataFlow::getUnderlyingNode(base) | | ||
any(DataFlow::Write w).writesElement(base, _, pred) and | ||
[succ.getType(), succ.getType().getPointerType()] instanceof MessageType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we might need a type that allows us to uniformly treat element/field writes as well as reads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined these by introducing writesComponent
@max-schaefer most suggestions implemented except as noted |
@max-schaefer pushed a fix to the post-update node problems, and an extra probably controversial commit that bypasses |
extractor/dbscheme/tables.go
Outdated
@@ -165,11 +165,14 @@ var CompositeLitExpr = ExprKind.NewBranch("@compositelit") | |||
// ParenExpr is the type of parenthesis expression AST nodes | |||
var ParenExpr = ExprKind.NewBranch("@parenexpr") | |||
|
|||
// ComponentAccessExpr is the type of field- or element-access nodes | |||
var ComponentAccessExpr = NewUnionType("@componentaccessexpr", NodeType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I didn't mean to suggest that you should do a dbscheme update. That's overkill, and also not the right level of abstraction since a @selectorexpr
can be any number of things, not just a field access.
I meant to suggest that you should do this at the QL level:
class ComponentAccess extends Expr {
ComponentAccess() {
this instanceof @indexexpr or
// slightly more complicated reasoning about selectors omitted
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted the DB and Expr parts of this
* This is a copy of the logic in `DataFlow::PostUpdateNode`, with the restriction to particular use contexts loosened | ||
* to include any underlying target of a message update. | ||
*/ | ||
DataFlow::Node getPreUpdateNode(DataFlow::Node node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would support extending PostUpdateNode
s to do this always. For example, in an assignment
x.y.z = 0
I think it makes sense to consider that both x
and x.y
have been changed, and hence should have post-update nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is a somewhat subtle change and would need careful benchmarking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will certainly have unforeseen consequences. Let's do it, but as its own subsequent PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about trying it out in a separate PR first? This approach is so ugly that I only want to accept it into main
once alternatives have been tried and found wanting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped this commit for now
…s taint-flow cases Only some of the cases are currently working.
This should be either refined to just Message types, or else a macro taint step should be added conducting taint from field-write-of-argument to Marshal's result. On the read-side we're currently fine: the bytes are tainted, so the object is tainted, so the field reads are tainted.
This is top-level, not a member.
The MarshalState test doesn't work yet, because we don't know to read taint from the Message field of the input or write it to the Buf field of the output
Currently relies on blanket field-write propagation.
Support for UnmarshalOptions.UnmarshalState is dropped for now as too hard to model.
I don't think we need these when we have the end-to-end taintFlows test.
No functional changes
634dbdf
to
d4b7129
Compare
@max-schaefer dropped the post-update node changes to pursue in a future PR, and reverted the DB and Expr.qll changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is looking quite reasonable now. I think it would be worth doing a dist-compare, though, just to make sure.
* Gets the base of `node`, looking through any dereference node found. | ||
*/ | ||
DataFlow::Node getBaseLookingThroughDerefs(DataFlow::ComponentReadNode node) { | ||
if node.getBase() instanceof DataFlow::PointerDereferenceNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps pull node.getBase()
out into a variable. While it may not affect performance much in this case, it's generally best to keep if
conditions as simple as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote to not use if
in that case
/** | ||
* Gets the base of `node`, looking through any dereference node found. | ||
*/ | ||
DataFlow::Node getBaseLookingThroughDerefs(DataFlow::ComponentReadNode node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps would could make this predicate private?
marshalStateCall = marshalStateMethod().getACall() and | ||
// pred -> marshalInput.Message | ||
any(DataFlow::Write w) | ||
.writesField(marshalInput.(DataFlow::PostUpdateNode).getPreUpdateNode(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be easier to just declare marshalInput
as a PostUpdateNode
? Then you don't need the cast.
…ritten For example, writing to a[b].c[d] taints 'a'.
This is the union of a field-access and an element-access instruction
This is its only user for now.
…chain of field- and element-access instructions. This enables us to use PostUpdateNode properly. Also introduce a test showing a case where this doesn't work, because the underlying variable doesn't have a post-update node.
No behavioural changes
@max-schaefer changes applied, running a distcompare now |
Distcompare results are unremarkable; think this is ready to merge. |
This isn't quite done yet, but it's close enough that I'd appreciate feedback on the trickier points here.
Problems to overcome:
msg.SubMessage.RepeatedMessage[2].Description = somethingNasty()
)Merge
andClone
, which should copy taint from corresponding field to fieldMarshalOptions.MarshalState
, which takes its message from and returns its buffer in a struct fieldCurrent approach:
Possible changes:
similar()
method will currently conflate access paths using differing indices. In that case we need a different way to walk down an access path consisting of field-read and element-read operations to taint the bottom Message in the pile.