-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Ruby: Document flow summary syntax #10899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@hmac Could you split this PR? Documentation changes for the GA should go into If you think the doc changes are not fit for the GA release, then please retarget this PR to |
I don't know why this has a GA label - these are internal docs (and also this PR is unfinished). I've removed it. |
Ok great! I mistakenly labelled it as GA, sorry for that. |
edadd59
to
9776dfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the excellent write-up.
ruby/ql/docs/flow_summaries.md
Outdated
|
||
`input` and `output` define the "from" and "to" locations in the flow summary. | ||
They use a custom string-based syntax which is similar but not identical to | ||
Models as Data. These strings are often referred to as access paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These strings should be identical to the input
and output
string in MaD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, MaD access paths map to API graph paths wheras flow summary paths map to dataflow nodes, and so the tokens supported by each can differ. This slack thread highlights one difference. Another example I've just spotted is that MaD has no support for the hash-splat
specifier in Argument
tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see what you mean. However, I think of MaD collectively as all the components that make up a summary; but you are right that the access path component (which identifies the relevant calls) maps to API graphs, where as the input/output components have their own (but similar) syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should call out the difference with the MaD path
column just now (called the "identifying access path"). We can elaborate on this when documenting the MaD format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've slightly reworded this sentence, but I want to retain the reference to MaD because I think it's useful to the reader. Putting myself in the shoes of a newcomer to the codebase: if I've encountered MaD already then this helps provide context and a point of comparison, and if I haven't seen MaD yet then this lets me know that it exists and I can choose to explore more by reading the MaD documentation if I want (and if/when it exists).
Any keyword argument to the call. | ||
|
||
#### `hash-splat` | ||
The special "hash splat" argument/parameter, which is written as `**args`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will also include all explicit keyword arguments, wrapped in an implicit hash splat argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit that elaborates on this.
`ReturnValue` refers to the return value of the element identified in the | ||
preceding access path. For example, `Argument[0].ReturnValue` refers to the | ||
return value of the first argument. Of course this only makes sense if the first | ||
argument is a callable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe callback
instead of callable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I wrote callable
to mean that, of course, it makes no sense to use ReturnValue
on an object. Instead it must be a proc, lambda or block. I guess depending on how you define "callback", then these would all be considered callbacks. Ruby doesn't tend to use that terminology, though, and in cases like arr.detect(&:even?)
I would probably not call &:even?
a callback. 🤷
Would it improve things if I change this to say if the first argument is callable - i.e. it is a proc, lambda or block.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
ruby/ql/docs/flow_summaries.md
Outdated
TODO: I've not seen this component used in an output path; I don't know if it makes | ||
sense to do so, or what meaning it would have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't make sense to use in ouput.
ruby/ql/docs/flow_summaries.md
Outdated
TODO: I've not seen this component used in an output path; I don't know if it makes | ||
sense to do so, or what meaning it would have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, doesn't make sense.
ruby/ql/docs/flow_summaries.md
Outdated
``` | ||
|
||
any data in any index of the first argument will be copied to the return value, | ||
with the exception of data at index 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A prominent use case of WithoutElement
is when you want to model a method that removes (a set of) element(s) from a collection. For example, if we want to model a method that removes the first element of the receiver collection, we can do so via
input = "Argument[self].WithoutElement[0]" and
output = "Argument[self]"
where both the input and the output refer to the same argument. The data-flow library will then prevent use-use flow from any corresponding receivers:
a[0] = taint
a[1] = taint
a.remove_first # use-use flow from `a` to the `a` below will be blocked, but there will still be flow from `[post-update] a` to `a` below
sink(a[0]) # good
sink(a[1]) # bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've written this example into the docs. If you get time, it would be great to have an example of where you should use .WithoutElement[...].WithElement[...]
. I've tried removing instances of this pattern from every flow summary and all tests still pass, which leads me to believe that it is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good example of this is something like Hash#except
. If we see a call like hash.except(:foo)
, we would like that to map to a flow summary with
input = Argument[self].WithoutElement[":foo"].WithElement[any]
output = ReturnValue
I.e. data can flow out of the call, but only when it is stored in some element that is not known to be at index :foo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure that WithElement[any]
is necessary. Test 53 has an example:
private class S53 extends SimpleSummarizedCallable {
S53() { this = "s53" }
override predicate propagatesFlowExt(string input, string output, boolean preservesValue) {
preservesValue = true and
input = "Argument[self].WithoutElement[:foo]" and
output = "ReturnValue"
}
}
def m53(i, h)
h[:foo] = source("a")
h[:bar] = source("b")
h[i] = source("c")
sink h[:foo] # $ hasValueFlow=a hasValueFlow=c
sink h[:bar] # $ hasValueFlow=b hasValueFlow=c
x = h.s53()
sink x[:foo]
sink x[:bar] # $ hasValueFlow=b
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also incorrectly include flow for sink(source("d").s53())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... that seems very unintuitive to me, but you're certainly right. How do I explain this? Maybe something like:
An important point to note is that in a summary such as
input = "Argument[self].WithoutElement[0]" and output = "ReturnValue"if
Argument[self]
contains data, it will be copied toReturnValue
. If you only want to copy data in elements, and not in the container itself, addWithElement[any]
to the input path:input = "Argument[self].WithoutElement[0].WithElement[any]" and output = "ReturnValue"See tests 53 and 54 [todo: i'll add this test] for examples of this behaviour.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks good. WithoutElement[x]
just says that data must not be stored inside a collection at index x
, not that data must be stored inside a collection.
This is also the reason why something like a reverse
method needs a summary like
input = "Argument[self].WithElement[any]" and
output = "ReturnValue"
and not
input = "Argument[self]" and
output = "ReturnValue"
Ping @hmac - ok to merge (after resolving conflicts)? |
There's a few comments above where I'd like to get a response from @hvitved before merging this. |
59a519c
to
18fe233
Compare
These test cases are a companion to the flow summary docs, and ensure that the documentated behaviour matches reality.
c8f67ba
to
f49507e
Compare
This PR adds some internal documentation for the Ruby flow summary syntax. There's a few TODOs left where I'm not sure of some behaviour - if you know the answer, let me know and I'll fill them in.