-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Ruby: Document flow summary syntax #10899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
0a4a851
6f852aa
78f604a
43f2713
0c2dd1a
5e3a817
5b07c3a
c0501c1
0b06500
fe13ac1
df398fb
f49507e
0340549
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,277 @@ | ||
# Flow summaries | ||
|
||
Flow summaries describe how data flows through methods whose definition is not | ||
included in the database. For example, methods in the standard library or a gem. | ||
|
||
Say we have the following code: | ||
|
||
```rb | ||
x = gets | ||
y = x.chomp | ||
system(y) | ||
``` | ||
|
||
This code reads a line from STDIN, strips any trailing newlines, and executes it | ||
as a shell command. Assuming `x` is considered tainted, we want the argument `y` | ||
to be tainted in the call to `system`. | ||
|
||
`chomp` is a standard library method in the `String` class for which we | ||
have no source code, so we include a flow summary for it: | ||
|
||
```ql | ||
private class ChompSummary extends SimpleSummarizedCallable { | ||
ChompSummary() { this = "chomp" } | ||
|
||
override predicate propagatesFlowExt(string input, string output, boolean preservesValue) { | ||
input = "Argument[self]" and | ||
output = "ReturnValue" and | ||
preservesValue = false | ||
} | ||
} | ||
``` | ||
|
||
The shared dataflow library will use this summary to construct a fake definition | ||
for `chomp`. The behaviour of this definition depends on the body of | ||
`propagatesFlowExt`. In this case, the method will propagate taint flow from the | ||
`self` argument (i.e. the receiver) to the return value. | ||
|
||
If `preservesValue = true` then value flow is propagated. If it is `false` then | ||
only taint flow is propagated. | ||
|
||
Any call to `chomp` in the database will be translated, in the dataflow graph, | ||
to a call to this fake definition. | ||
|
||
`input` and `output` define the "from" and "to" locations in the flow summary. | ||
They use a custom string-based syntax which is similar to that used in `path` | ||
column in the Models as Data format. These strings are often referred to as | ||
access paths. | ||
|
||
Note: The behaviour documented below is tested in | ||
`dataflow/flow-summaries/behaviour.ql`. Where specific quirks exist, we may | ||
reference a particular test case in this file which demonstrates the quirk. | ||
|
||
# Syntax | ||
|
||
Access paths consist of zero or more components separated by dots (`.`). The | ||
permitted components differ for input and output paths. The meaning of each | ||
component is defined relative to the implicit context of the component as | ||
defined by the preceding access path. For example, | ||
|
||
``` | ||
Argument[0].Element[1].ReturnValue | ||
``` | ||
|
||
refers to the return value of the element at index 1 in the array at argument 0 | ||
of the method call. | ||
|
||
## `Argument` and `Parameter` | ||
|
||
The `Argument` and `Parameter` components refer respectively to an argument to a | ||
call or a parameter of a callable. They contain one or more _specifiers_[^1] which | ||
constrain the range of arguments/parameters that the component refers to. For | ||
example, `Argument[0]` refers to the first argument. | ||
|
||
If multiple specifiers are given then the result is a disjunction, meaning that | ||
the component refers to any argument/parameter that satisfies at least one of | ||
the specifiers. For example, `Argument[0, 1]` refers to the first and second | ||
arguments. | ||
|
||
### Specifiers | ||
|
||
#### `self` | ||
The receiver of the call. | ||
|
||
#### `<integer>` | ||
The argument to the method call at the position given by the integer. For | ||
example, `Argument[0]` refers to the first argument to the call. | ||
|
||
#### `<integer>..` | ||
An argument to the call at a position greater or equal to the integer. For | ||
example, `Argument[1..]` refers to all arguments except the first one. This | ||
specifier is not available on `Parameter` components. | ||
|
||
#### `<string>:` | ||
A keyword argument to the call with the given name. For example, | ||
`Argument[foo:]` refers to the keyword argument `foo:` in the call. | ||
|
||
#### `block` | ||
The block argument passed to the call, if any. | ||
|
||
#### `any` | ||
Any argument to the call, except `self` or `block` arguments. | ||
|
||
#### `any-named` | ||
Any keyword argument to the call. | ||
|
||
#### `hash-splat` | ||
The special "hash splat" argument/parameter, which is written as `**args`. | ||
When used in an `Argument` component, this specifier refers to special dataflow | ||
node which is constructed at the call site, containing any elements in a hash | ||
splat argument (`**args`) along with any explicit keyword arguments (`foo: | ||
bar`). The node behaves like a normal dataflow node for a hash, meaning that you | ||
can access specific elements of it using the `Element` component. | ||
|
||
For example, the following flow summary states that values flow from any keyword | ||
arguments (including those in a hash splat) to the return value: | ||
|
||
```ql | ||
input = "Argument[hash-splat].Element[any]" and | ||
output = "ReturnValue" and | ||
preservesValue = true | ||
``` | ||
|
||
Assuming this summary is for a global method `foo`, the following test will pass: | ||
|
||
```rb | ||
a = source "a" | ||
b = source "b" | ||
|
||
h = {a: a} | ||
|
||
x = foo(b: b, **h) | ||
|
||
sink x # $ hasValueFlow=a hasValueFlow=b | ||
``` | ||
|
||
If the method returns the hash itself, you will need to use `WithElement` in | ||
order to preserve taint/value in its elements. For example: | ||
|
||
```ql | ||
input = "Argument[hash-splat].WithElement[any]" and | ||
output = "ReturnValue" and | ||
preservesValue = true | ||
``` | ||
```rb | ||
a = source "a" | ||
x = foo(a: a) | ||
sink x[:a] # $ hasValueFlow=a | ||
``` | ||
|
||
## `ReturnValue` | ||
`ReturnValue` refers to the return value of the element identified in the | ||
preceding access path. For example, `Argument[0].ReturnValue` refers to the | ||
return value of the first argument. Of course this only makes sense if the first | ||
argument is a callable. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting, I wrote Would it improve things if I change this to say There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||
|
||
## `Element` | ||
This component refers to elements inside a collection of some sort. Typically | ||
this is an Array or Hash. Elements are considered to have an index, which is an | ||
integer in arrays and a symbol or string in hashes (even though hashes can have | ||
arbitrary objects as keys). Elements can also have an unknown index, which means | ||
we know the element exists in the collection but we don't know where. | ||
|
||
Many of the specifiers have an optional suffix `!`. If this suffix is used then | ||
the specifier excludes elements at unknown indices. Otherwise, these are | ||
included by default. | ||
|
||
### Specifiers | ||
|
||
#### `?` | ||
If used in an input path: an element at an unknown index. If used in an output | ||
path: an element at any known or unkown index. In other words, `?` in an output | ||
path means the same as `any`. | ||
|
||
#### `any` | ||
An element at any known or unknown index. | ||
|
||
#### `<integer>`, `<integer>!` | ||
An element at the index given by the integer. | ||
|
||
#### `<integer>..`, `<integer>..!` | ||
Any element at a known index greater or equal to the integer. | ||
|
||
#### `<string>`, `<string>!` | ||
An element at the index given by string. The string should match the result of | ||
`serialize()` on the `ConstantValue` that represents the index. For a string | ||
with contents `foo` this is `"foo"` and for a symbol `:foo` it is `:foo`. The | ||
Ruby values `true`, `false` and `nil` can be written verbatim. See tests 31-33 | ||
for examples. | ||
|
||
## `Field` | ||
A "field" in the object. In practice this refers to a value stored in an | ||
instance variable in the object. The only valid specifier is `@<string>`, where | ||
`<string>` is the name of the instance variable. Currently we assume that a | ||
setter call such as `x.foo = bar` means there is a field `foo` in `x`, backed by | ||
an instance variable `@foo`. | ||
|
||
For example, the access path `Argument[0].Field[@foo]` would refer to the value `"foo"` in | ||
|
||
```rb | ||
x = SomeClass.new | ||
x.foo = "foo" | ||
some_call(x) | ||
``` | ||
|
||
## `WithElement` | ||
This component restricts the set of elements that are included in the preceding | ||
access path to to those at a specific set of indices. The specifiers are the | ||
same as those for `Element`. It is only valid in an input path. | ||
|
||
This component has the effect of copying all relevant elements from the input to | ||
the output. For example, in the following summary: | ||
|
||
```ql | ||
input = "Argument[0].WithElement[1, 2]" and | ||
output = "ReturnValue" | ||
``` | ||
|
||
any data in indices 1 and 2 of the first argument will be copied to indices 1 | ||
and 2 of the return value. We use this in many Hash summaries that return the | ||
receiver, in order to preserve any data stored in it. For example, the summary | ||
for `Hash#to_h` is | ||
|
||
```ql | ||
input = "Argument[self].WithElement[any]" and | ||
output = "ReturnValue" and | ||
preservesValue = true | ||
``` | ||
|
||
## `WithoutElement` | ||
This component is used to exclude certain elements from the set included in the | ||
preceding access path. It takes the same specifiers as `WithElement` and | ||
`Element`. It is only valid in an input path. | ||
|
||
This component has the effect of excluding the relevant elements when copying | ||
from input to output. It is useful for modelling methods that remove elements | ||
from a collection. For example to model a method that removes the first element | ||
from the receiver, we can do so like this: | ||
|
||
```ql | ||
input = "Argument[self].WithoutElement[0]" and | ||
output = "Argument[self]" | ||
``` | ||
|
||
Note that both the input and output refer to the receiver. The effect of this | ||
summary is that use-use flow between the receiver in the method call and a | ||
subsequent use of the same receiver will be blocked: | ||
|
||
```ruby | ||
a[0] = source 0 | ||
a[1] = source 1 | ||
|
||
a.remove_first # use-use flow from `a` on this line to `a` below will be blocked. | ||
# there will still be flow from `[post-update] a` to `a` below. | ||
|
||
sink a[0] | ||
sink a[1] # $ hasValueFlow=1 | ||
``` | ||
|
||
It is also important to note that in a summary such as | ||
|
||
```ql | ||
input = "Argument[self].WithoutElement[0]" and | ||
output = "ReturnValue" | ||
``` | ||
|
||
if `Argument[self]` contains data, it will be copied to `ReturnValue`. If you only want to copy data in elements, and not in the container itself, add `WithElement[any]` to the input path: | ||
|
||
```ql | ||
input = "Argument[self].WithoutElement[0].WithElement[any]" and | ||
output = "ReturnValue" | ||
``` | ||
|
||
See tests 53 and 54 for examples of this behaviour. | ||
|
||
|
||
|
||
[^1]: I've chosen this name to avoid overloading the word "argument". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will also include all explicit keyword arguments, wrapped in an implicit hash splat argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit that elaborates on this.