Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: Allow data flow through varargs parameters #11732

Merged
merged 16 commits into from
May 11, 2023

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Dec 19, 2022

@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch 2 times, most recently from 02de827 to 79ad9ff Compare January 4, 2023 16:01
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor niggles, plus the bot is right about spelling

arg.asExpr() =
c.getCall().getImplicitVarargsArgument(index - c.getTarget().getNumParameter() + 1)
|
result.(DataFlow::PostUpdateNode).getPreUpdateNode() = arg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now identical to line 305, push the disjunction inside the exists at line 300

ArgumentNode() { this = getArgument(c, i) }
ArgumentNode() {
exists(CallExpr call | call = c.getCall() |
this = getArgument(c, i) and not this.asExpr() = call.getImplicitVarargsArgument(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is and not this.asExpr() = call.getImplicitVarargsArgument(_) ever false? The implicit arg is wrapped into a slice, and therefore shouldn't be a direct argument per getArgument?

Copy link
Contributor Author

@owen-mc owen-mc Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that piece of code can be false. There is some ambiguity about whether arguments to a variadic parameter (which aren't slices with ... after them) should be called arguments or not. Currently they are returned by getArgument on CallExpr and getArgument on CallNode, but they aren't given argument nodes. I admit this is confusing, and I'm happy to change it for the sake of clarity, though it's unclear exactly what to change it to.

@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from e410bcf to dd09cc0 Compare January 4, 2023 21:55
@owen-mc owen-mc marked this pull request as draft January 21, 2023 08:31
@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from 2b321be to a363128 Compare February 11, 2023 14:02
@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from f117d7a to 8f14cc9 Compare April 20, 2023 16:06
@owen-mc
Copy link
Contributor Author

owen-mc commented Apr 20, 2023

I have fixed most of the test failures. I believe there are only two left now, which will need closer examination. It is possible that some of the changes I made in early commits are misguided, since the implementation has changed since then, so I intend to go through those in more detail before this is ready to review.

We don't want a post update node for the implicit varargs slice, and we
do want one for each argument which is stored in the implicit varargs
slice.
@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from 8f14cc9 to 4e50d51 Compare April 25, 2023 06:33
@owen-mc owen-mc marked this pull request as ready for review April 25, 2023 06:34
@owen-mc
Copy link
Contributor Author

owen-mc commented Apr 25, 2023

I have fixed the remaining test failures, and cleaned up the commit history so that it makes sense. This is ready for review. I should update the change note to acknowledge more changes that might affect user code, most notably the change to CallNode.getArgument and the new predicate CallNode.getSyntacticArgument which replicates the old behaviour. But maybe that is best done after this PR has been reviewed.

@@ -74,7 +74,17 @@ private class ParameterInput extends FunctionInput, TInParameter {

override predicate isParameter(int i) { i = index }

override DataFlow::Node getEntryNode(DataFlow::CallNode c) { result = c.getArgument(index) }
override DataFlow::Node getEntryNode(DataFlow::CallNode c) {
result = getArgument(c, index) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally adding method receivers through use of getArgument not c.getArgument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that was a copy and paste error 😬

override DataFlow::Node getEntryNode(DataFlow::CallNode c) { result = c.getArgument(index) }
override DataFlow::Node getEntryNode(DataFlow::CallNode c) {
result = getArgument(c, index) and
not (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can all this be replaced by getSyntacticArgument?

@@ -299,8 +309,13 @@ private class OutParameter extends FunctionOutput, TOutParameter {
override DataFlow::Node getExitNode(DataFlow::CallNode c) {
exists(DataFlow::Node arg |
arg = getArgument(c, index) and
// exclude slices passed to varargs parameters using `...` calls
not (c.hasEllipsis() and index = c.getNumArgument() - 1)
not (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, except apparently we want to exclude explicit varargs arrays, so syntactic argument except a ... expression?

@@ -10,6 +10,7 @@ private newtype TNode =
MkInstructionNode(IR::Instruction insn) or
MkSsaNode(SsaDefinition ssa) or
MkGlobalFunctionNode(Function f) or
MkImplicitVarargsSlice(CallExpr c) { c.getTarget().isVariadic() and not c.hasEllipsis() } or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull out c.getTarget().isVariadic() and not c.hasEllipsis() as CallExpr.hasImplicitVarargs, and perhaps CallNode.hasImplicitVarargs?

@@ -519,12 +550,62 @@ module Public {
)
}

/**
* Gets the data flow node corresponding to an argument of this call, where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the data flow node corresponding to an argument of this call, where
* Gets a data flow node corresponding to an argument of this call, where

* For calls of the form `f(g())` where `g` has multiple results, the arguments of the call to
* `i` are the (implicit) element extraction nodes for the call to `g`.
*
* For calls to variadic functions without an ellipsis (`...`), there is a single argument of type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* For calls to variadic functions without an ellipsis (`...`), there is a single argument of type
* For calls to variadic functions without an ellipsis (`...`), there is a single argument of CodeQL type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded it in a different way

*
* For calls to variadic functions without an ellipsis (`...`), there is a single argument of type
* `ImplicitVarargsSlice` corresponding to the variadic parameter. This is in contrast to the member
* predicate `getArgument` on `CallExpr`, which gets the syntactic arguments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* predicate `getArgument` on `CallExpr`, which gets the syntactic arguments.
* predicate `getArgument` on `CallExpr`, which gets the syntactic arguments, or similarly `CallNode.getSyntacticArgument`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded it in a different way

Comment on lines 575 to 587
exists(SignatureType t, int lastParamIndex |
t = this.getACalleeIncludingExternals().getType() and
lastParamIndex = t.getNumParameter() - 1
|
if
not this.hasEllipsis() and
t.isVariadic() and
i >= lastParamIndex
then
result.(ImplicitVarargsSlice).getCallNode() = this and
i = lastParamIndex
else result = this.getSyntacticArgument(i)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exists(SignatureType t, int lastParamIndex |
t = this.getACalleeIncludingExternals().getType() and
lastParamIndex = t.getNumParameter() - 1
|
if
not this.hasEllipsis() and
t.isVariadic() and
i >= lastParamIndex
then
result.(ImplicitVarargsSlice).getCallNode() = this and
i = lastParamIndex
else result = this.getSyntacticArgument(i)
)
result = this.getSyntacticArgument(i) and
not result = this.getImplicitVarargsArgument(_)
or
i = this.getACalleeIncludingExternals().getType().getNumParameter() - 1 and
result.(ImplicitVarargsSlice).getCallNode() = this

It now has one only result corresponding to a variadic parameter. If the
argument is followed by an ellipsis then it is just the argument itself.
Otherwise it is a ImplicitVarargsSlice node.
@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from 4e50d51 to fb61ff1 Compare April 27, 2023 09:32
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good; there are real test failures though

@@ -857,6 +857,24 @@ class CallExpr extends CallOrConversionExpr {
/** Gets the number of argument expressions of this call. */
int getNumArgument() { result = count(this.getAnArgument()) }

/** Holds if this call is has implicit variadic arguments. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Holds if this call is has implicit variadic arguments. */
/** Holds if this call has implicit variadic arguments. */

Comment on lines 303 to 310
(
arg = c.getSyntacticArgument(index) and
// exclude slices passed to varargs parameters using `...` calls
not (c.hasEllipsis() and index = c.getNumArgument() - 1)
or
arg = c.(DataFlow::MethodCallNode).getReceiver() and
index = -1
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(
arg = c.getSyntacticArgument(index) and
// exclude slices passed to varargs parameters using `...` calls
not (c.hasEllipsis() and index = c.getNumArgument() - 1)
or
arg = c.(DataFlow::MethodCallNode).getReceiver() and
index = -1
)
arg = c.getSyntacticArgument(index) and
// exclude slices passed to varargs parameters using a `...` expression
not (c.hasEllipsis() and index = c.getNumArgument() - 1)
or
arg = c.(DataFlow::MethodCallNode).getReceiver() and
index = -1

* Returns a single `Node` corresponding to a variadic parameter. If there is no corresponding
* argument with an ellipsis (`...`), then it is a `ImplicitVarargsSlice`. This is in contrast
* to `getArgument` on `CallExpr`, which gets the syntactic arguments. Use
* `getSyntacticArgument` to get that behaviour.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `getSyntacticArgument` to get that behaviour.
* `getSyntacticArgument` to get that behavior.

*/
Node getArgument(int i) {
result = this.getSyntacticArgument(i) and
not result = this.getImplicitVarargsArgument(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
not result = this.getImplicitVarargsArgument(_)
not (this.getTarget().isVariadic() and i >= this.getTarget().getNumParameter() - 1)

sorry, should have noticed this would lead to an odd result for an explicit varargs ... expression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we should avoid using getTarget() as it doesn't exist when the called expr is a variable.

structParam246.Key, // $ untrustedFlowSource
structParam246.Value, // $ untrustedFlowSource
sink(structParam246.Key) // $ untrustedFlowSource
sink(structParam246.Value) // $ untrustedFlowSource
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
)

@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from fb61ff1 to aa5cdd7 Compare April 28, 2023 05:10
This avoids using `getTarget()`, so it works even when that doesn't
exist (for example when calling a variable with function type).
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small cleanups then LGTM!

---
category: minorAnalysis
---
* Fixed data flow through variadic function parameters. The arguments corresponding to a variadic parameter are no longer returned by `CallNode.getArgument(int i)` and `CallNode.getAnArgument()`, and hence aren't `ArgumentNode`s. They now have one result, which is an `ImplicitVarargsSlice` node. For example, a call `f(a b, c)` to a function `f(T...)` is treated like `f([]T{a, b, c})`. The old behaviour is preserved by `CallNode.getSyntacticArgument(int i)` and `CallNode.getASyntacticArgument()`. `CallExpr.getArgument(int i)` and `CallExpr.getAnArgument()` are unchanged, and will still have three results in the example given.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Fixed data flow through variadic function parameters. The arguments corresponding to a variadic parameter are no longer returned by `CallNode.getArgument(int i)` and `CallNode.getAnArgument()`, and hence aren't `ArgumentNode`s. They now have one result, which is an `ImplicitVarargsSlice` node. For example, a call `f(a b, c)` to a function `f(T...)` is treated like `f([]T{a, b, c})`. The old behaviour is preserved by `CallNode.getSyntacticArgument(int i)` and `CallNode.getASyntacticArgument()`. `CallExpr.getArgument(int i)` and `CallExpr.getAnArgument()` are unchanged, and will still have three results in the example given.
* Fixed data flow through variadic function parameters. The arguments corresponding to a variadic parameter are no longer returned by `CallNode.getArgument(int i)` and `CallNode.getAnArgument()`, and hence aren't `ArgumentNode`s. They now have one result, which is an `ImplicitVarargsSlice` node. For example, a call `f(a, b, c)` to a function `f(T...)` is treated like `f([]T{a, b, c})`. The old behaviour is preserved by `CallNode.getSyntacticArgument(int i)` and `CallNode.getASyntacticArgument()`. `CallExpr.getArgument(int i)` and `CallExpr.getAnArgument()` are unchanged, and will still have three results in the example given.

@@ -336,7 +331,7 @@ module StringOps {
formatDirective = this.getComponent(n) and
formatDirective.charAt(0) = "%" and
formatDirective.charAt(1) != "%" and
result = this.getArgument((n / 2) + f.getFirstFormattedParameterIndex())
result = this.getImplicitVarargsArgument((n / 2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = this.getImplicitVarargsArgument((n / 2))
result = this.getImplicitVarargsArgument(n / 2)

(
exists(Write w | w.writesElement(node2, _, node1))
or
node1 = node2.(ImplicitVarargsSlice).getCallNode().getImplicitVarargsArgument(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add and use getAnImplicitVarargsArgument(), like most indexing getters

(
preupd instanceof ArgumentNode and not preupd instanceof ImplicitVarargsSlice
or
preupd = any(CallNode c).getImplicitVarargsArgument(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
preupd = any(CallNode c).getImplicitVarargsArgument(_)
preupd = any(CallNode c).getAnImplicitVarargsArgument()

@@ -124,7 +124,7 @@ module Revel {
or
methodName = "RenderText" and
contentType = "text/plain" and
this = methodCall.getAnArgument()
this = methodCall.getSyntacticArgument(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this = methodCall.getSyntacticArgument(_)
this = methodCall.getASyntacticArgument()

@@ -134,40 +134,44 @@ module NetHttp {
result = call.getReceiver()
}

private class ResponseBody extends Http::ResponseBody::Range, DataFlow::ArgumentNode {
private class ResponseBody extends Http::ResponseBody::Range, DataFlow::Node {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private class ResponseBody extends Http::ResponseBody::Range, DataFlow::Node {
private class ResponseBody extends Http::ResponseBody::Range {

Because that class already derives from Node

Comment on lines 141 to 174
this = any(DataFlow::CallNode call).getASyntacticArgument() and
(
exists(DataFlow::CallNode call |
// A direct call to ResponseWriter.Write, conveying taint from the argument to the receiver
call.getTarget().(Method).implements("net/http", "ResponseWriter", "Write") and
this = call.getArgument(0) and
responseWriter = call.(DataFlow::MethodCallNode).getReceiver()
)
or
exists(TaintTracking::FunctionModel model |
// A modeled function conveying taint from some input to the response writer,
// e.g. `io.Copy(responseWriter, someTaintedReader)`
model.taintStep(this, responseWriter) and
responseWriter.getType().implements("net/http", "ResponseWriter")
)
or
exists(
SummarizedCallable callable, DataFlow::CallNode call, SummaryComponentStack input,
SummaryComponentStack output
|
callable = call.getACalleeIncludingExternals() and
callable.propagatesFlow(input, output, _)
|
// A modeled function conveying taint from some input to the response writer,
// e.g. `io.Copy(responseWriter, someTaintedReader)`
// NB. SummarizedCallables do not implement a direct call-site-crossing flow step; instead
// they are implemented by a function body with internal dataflow nodes, so we mimic the
// one-step style for the particular case of taint propagation direct from an argument or receiver
// to another argument, receiver or return value, matching the behavior for a `TaintTracking::FunctionModel`.
this = getSummaryInputOrOutputNode(call, input) and
responseWriter.(DataFlow::PostUpdateNode).getPreUpdateNode() =
getSummaryInputOrOutputNode(call, output) and
responseWriter.getType().implements("net/http", "ResponseWriter")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this = any(DataFlow::CallNode call).getASyntacticArgument() and
(
exists(DataFlow::CallNode call |
// A direct call to ResponseWriter.Write, conveying taint from the argument to the receiver
call.getTarget().(Method).implements("net/http", "ResponseWriter", "Write") and
this = call.getArgument(0) and
responseWriter = call.(DataFlow::MethodCallNode).getReceiver()
)
or
exists(TaintTracking::FunctionModel model |
// A modeled function conveying taint from some input to the response writer,
// e.g. `io.Copy(responseWriter, someTaintedReader)`
model.taintStep(this, responseWriter) and
responseWriter.getType().implements("net/http", "ResponseWriter")
)
or
exists(
SummarizedCallable callable, DataFlow::CallNode call, SummaryComponentStack input,
SummaryComponentStack output
|
callable = call.getACalleeIncludingExternals() and
callable.propagatesFlow(input, output, _)
|
// A modeled function conveying taint from some input to the response writer,
// e.g. `io.Copy(responseWriter, someTaintedReader)`
// NB. SummarizedCallables do not implement a direct call-site-crossing flow step; instead
// they are implemented by a function body with internal dataflow nodes, so we mimic the
// one-step style for the particular case of taint propagation direct from an argument or receiver
// to another argument, receiver or return value, matching the behavior for a `TaintTracking::FunctionModel`.
this = getSummaryInputOrOutputNode(call, input) and
responseWriter.(DataFlow::PostUpdateNode).getPreUpdateNode() =
getSummaryInputOrOutputNode(call, output) and
responseWriter.getType().implements("net/http", "ResponseWriter")
)
exists(DataFlow::CallNode call |
// A direct call to ResponseWriter.Write, conveying taint from the argument to the receiver
call.getTarget().(Method).implements("net/http", "ResponseWriter", "Write") and
this = call.getArgument(0) and
responseWriter = call.(DataFlow::MethodCallNode).getReceiver()
)
or
exists(TaintTracking::FunctionModel model |
// A modeled function conveying taint from some input to the response writer,
// e.g. `io.Copy(responseWriter, someTaintedReader)`
this = model.getACall().getASyntacticArgument() and
model.taintStep(this, responseWriter) and
responseWriter.getType().implements("net/http", "ResponseWriter")
)
or
exists(
SummarizedCallable callable, DataFlow::CallNode call, SummaryComponentStack input,
SummaryComponentStack output
|
this = call.getASyntacticArgument() and
callable = call.getACalleeIncludingExternals() and
callable.propagatesFlow(input, output, _)
|
// A modeled function conveying taint from some input to the response writer,
// e.g. `io.Copy(responseWriter, someTaintedReader)`
// NB. SummarizedCallables do not implement a direct call-site-crossing flow step; instead
// they are implemented by a function body with internal dataflow nodes, so we mimic the
// one-step style for the particular case of taint propagation direct from an argument or receiver
// to another argument, receiver or return value, matching the behavior for a `TaintTracking::FunctionModel`.
this = getSummaryInputOrOutputNode(call, input) and
responseWriter.(DataFlow::PostUpdateNode).getPreUpdateNode() =
getSummaryInputOrOutputNode(call, output) and
responseWriter.getType().implements("net/http", "ResponseWriter")

I appreciate it's more verbose, but it's less of an odd-looking constraint, and hopefully less of a temptation to the join-orderer too!

@owen-mc owen-mc force-pushed the go/fix/model-data-flow-through-varargs branch from a59e7a3 to 1c66564 Compare May 10, 2023 13:05
@smowton
Copy link
Contributor

smowton commented May 10, 2023

I'd feel free to ignore the weird internal failure on the macos job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants