Ruby: Document flow summary syntax #10899

hmac · 2022-10-20T05:23:31Z

This PR adds some internal documentation for the Ruby flow summary syntax. There's a few TODOs left where I'm not sure of some behaviour - if you know the answer, let me know and I'll fill them in.

aibaars · 2022-10-27T09:56:32Z

@hmac Could you split this PR? Documentation changes for the GA should go into codeql-cli-2.11.2 , but QL changes should go to main.

If you think the doc changes are not fit for the GA release, then please retarget this PR to main and remove the GA label.

hmac · 2022-10-27T20:07:23Z

I don't know why this has a GA label - these are internal docs (and also this PR is unfinished). I've removed it.

aibaars · 2022-10-28T08:10:05Z

I don't know why this has a GA label - these are internal docs (and also this PR is unfinished). I've removed it.

Ok great! I mistakenly labelled it as GA, sorry for that.

hvitved

Thanks for the excellent write-up.

hvitved · 2022-11-04T10:12:05Z

ruby/ql/docs/flow_summaries.md

+
+`input` and `output` define the "from" and "to" locations in the flow summary.
+They use a custom string-based syntax which is similar but not identical to
+Models as Data. These strings are often referred to as access paths.


These strings should be identical to the input and output string in MaD.

As I understand it, MaD access paths map to API graph paths wheras flow summary paths map to dataflow nodes, and so the tokens supported by each can differ. This slack thread highlights one difference. Another example I've just spotted is that MaD has no support for the hash-splat specifier in Argument tokens.

Ok, I see what you mean. However, I think of MaD collectively as all the components that make up a summary; but you are right that the access path component (which identifies the relevant calls) maps to API graphs, where as the input/output components have their own (but similar) syntax.

I don't think we should call out the difference with the MaD path column just now (called the "identifying access path"). We can elaborate on this when documenting the MaD format.

I've slightly reworded this sentence, but I want to retain the reference to MaD because I think it's useful to the reader. Putting myself in the shoes of a newcomer to the codebase: if I've encountered MaD already then this helps provide context and a point of comparison, and if I haven't seen MaD yet then this lets me know that it exists and I can choose to explore more by reading the MaD documentation if I want (and if/when it exists).

hvitved · 2022-11-04T10:14:26Z

ruby/ql/docs/flow_summaries.md

+Any keyword argument to the call.
+
+#### `hash-splat`
+The special "hash splat" argument/parameter, which is written as `**args`.


Note that this will also include all explicit keyword arguments, wrapped in an implicit hash splat argument.

I've pushed a commit that elaborates on this.

hvitved · 2022-11-04T10:14:52Z

ruby/ql/docs/flow_summaries.md

+`ReturnValue` refers to the return value of the element identified in the
+preceding access path. For example, `Argument[0].ReturnValue` refers to the
+return value of the first argument. Of course this only makes sense if the first
+argument is a callable.


Maybe callback instead of callable.

Interesting, I wrote callable to mean that, of course, it makes no sense to use ReturnValue on an object. Instead it must be a proc, lambda or block. I guess depending on how you define "callback", then these would all be considered callbacks. Ruby doesn't tend to use that terminology, though, and in cases like arr.detect(&:even?) I would probably not call &:even? a callback. 🤷

Would it improve things if I change this to say if the first argument is callable - i.e. it is a proc, lambda or block.?

hvitved · 2022-11-04T10:17:28Z

ruby/ql/docs/flow_summaries.md

+TODO: I've not seen this component used in an output path; I don't know if it makes
+sense to do so, or what meaning it would have.


Doesn't make sense to use in ouput.

hvitved · 2022-11-04T10:18:15Z

ruby/ql/docs/flow_summaries.md

+TODO: I've not seen this component used in an output path; I don't know if it makes
+sense to do so, or what meaning it would have.


No, doesn't make sense.

hvitved · 2022-11-04T10:30:48Z

ruby/ql/docs/flow_summaries.md

+```
+
+any data in any index of the first argument will be copied to the return value,
+with the exception of data at index 0.


A prominent use case of WithoutElement is when you want to model a method that removes (a set of) element(s) from a collection. For example, if we want to model a method that removes the first element of the receiver collection, we can do so via

input = "Argument[self].WithoutElement[0]" and output = "Argument[self]"

where both the input and the output refer to the same argument. The data-flow library will then prevent use-use flow from any corresponding receivers:

a[0] = taint a[1] = taint a.remove_first # use-use flow from `a` to the `a` below will be blocked, but there will still be flow from `[post-update] a` to `a` below sink(a[0]) # good sink(a[1]) # bad

Thanks! I've written this example into the docs. If you get time, it would be great to have an example of where you should use .WithoutElement[...].WithElement[...]. I've tried removing instances of this pattern from every flow summary and all tests still pass, which leads me to believe that it is redundant.

A good example of this is something like Hash#except. If we see a call like hash.except(:foo), we would like that to map to a flow summary with

input = Argument[self].WithoutElement[":foo"].WithElement[any] output = ReturnValue

I.e. data can flow out of the call, but only when it is stored in some element that is not known to be at index :foo.

I'm still not sure that WithElement[any] is necessary. Test 53 has an example:

private class S53 extends SimpleSummarizedCallable { S53() { this = "s53" } override predicate propagatesFlowExt(string input, string output, boolean preservesValue) { preservesValue = true and input = "Argument[self].WithoutElement[:foo]" and output = "ReturnValue" } }

def m53(i, h) h[:foo] = source("a") h[:bar] = source("b") h[i] = source("c") sink h[:foo] # $ hasValueFlow=a hasValueFlow=c sink h[:bar] # $ hasValueFlow=b hasValueFlow=c x = h.s53() sink x[:foo] sink x[:bar] # $ hasValueFlow=b end

This would also incorrectly include flow for sink(source("d").s53()).

Hmmm... that seems very unintuitive to me, but you're certainly right. How do I explain this? Maybe something like:

An important point to note is that in a summary such as

input = "Argument[self].WithoutElement[0]" and output = "ReturnValue"

if Argument[self] contains data, it will be copied to ReturnValue. If you only want to copy data in elements, and not in the container itself, add WithElement[any] to the input path:

input = "Argument[self].WithoutElement[0].WithElement[any]" and output = "ReturnValue"

See tests 53 and 54 [todo: i'll add this test] for examples of this behaviour.

?

Yes, that looks good. WithoutElement[x] just says that data must not be stored inside a collection at index x, not that data must be stored inside a collection.

This is also the reason why something like a reverse method needs a summary like

input = "Argument[self].WithElement[any]" and output = "ReturnValue"

and not

input = "Argument[self]" and output = "ReturnValue"

calumgrant · 2022-11-14T09:27:00Z

Ping @hmac - ok to merge (after resolving conflicts)?

hmac · 2022-11-16T00:37:30Z

There's a few comments above where I'd like to get a response from @hvitved before merging this.

ruby/ql/test/library-tests/dataflow/flow-summaries/behaviour.ql

These test cases are a companion to the flow summary docs, and ensure that the documentated behaviour matches reality.

github-actions bot added documentation Ruby labels Oct 20, 2022

aibaars added the Ruby GA label Oct 24, 2022

aibaars changed the base branch from main to codeql-cli-2.11.2 October 27, 2022 09:44

hmac removed Ruby GA documentation labels Oct 27, 2022

hmac changed the base branch from codeql-cli-2.11.2 to main October 27, 2022 22:17

hmac force-pushed the flow-summary-docs branch from edadd59 to 9776dfd Compare November 3, 2022 23:14

github-actions bot added the documentation label Nov 3, 2022

hmac marked this pull request as ready for review November 4, 2022 01:32

hmac requested a review from a team as a code owner November 4, 2022 01:32

hmac added the no-change-note-required This PR does not need a change note label Nov 4, 2022

hvitved reviewed Nov 4, 2022

View reviewed changes

hmac mentioned this pull request Nov 16, 2022

Ruby: add flow summary for Enumerable#index_by #11252

Merged

hmac force-pushed the flow-summary-docs branch from 59a519c to 18fe233 Compare November 16, 2022 04:05

github-advanced-security bot found potential problems Nov 16, 2022

View reviewed changes

ruby/ql/test/library-tests/dataflow/flow-summaries/behaviour.ql Fixed Show resolved Hide resolved

hmac added 9 commits November 25, 2022 16:55

Ruby: simplify Hash#transform_keys! flow summary

0a4a851

Ruby: Document flow summary syntax

6f852aa

Ruby: Document Field access path token

78f604a

Ruby: Update test fixture

43f2713

Ruby: Flesh out hash-splat docs

0c2dd1a

Ruby: With[out]Element only valid in input

5e3a817

Ruby: Elaborate WithoutElement docs

5b07c3a

Ruby: Document ?/any behaviour in output paths

c0501c1

Ruby: Add tests for flow summary behaviour

0b06500

These test cases are a companion to the flow summary docs, and ensure that the documentated behaviour matches reality.

hmac added 3 commits November 25, 2022 16:55

Ruby: US spelling

fe13ac1

Ruby: Add more flow summary tests

df398fb

Ruby: Add note about WithElement usage

f49507e

hmac force-pushed the flow-summary-docs branch from c8f67ba to f49507e Compare November 25, 2022 03:55

calumgrant requested a review from asgerf November 28, 2022 09:54

Ruby: Slight rewording

0340549

aibaars approved these changes Dec 22, 2022

View reviewed changes

hmac merged commit b70ca77 into github:main Dec 27, 2022

hmac deleted the flow-summary-docs branch December 27, 2022 21:47

erik-krogh mentioned this pull request Jan 2, 2023

Ruby: Fix compile error in test #11796

Merged

		TODO: I've not seen this component used in an output path; I don't know if it makes
		sense to do so, or what meaning it would have.

Ruby: Document flow summary syntax #10899

Ruby: Document flow summary syntax #10899

Uh oh!

Conversation

hmac commented Oct 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aibaars commented Oct 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hmac commented Oct 27, 2022

Uh oh!

aibaars commented Oct 28, 2022

Uh oh!

hvitved left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hvitved Nov 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asgerf Nov 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

calumgrant commented Nov 14, 2022

Uh oh!

hmac commented Nov 16, 2022

Uh oh!

Uh oh!

Uh oh!

hmac commented Oct 20, 2022 •

edited

Loading

aibaars commented Oct 27, 2022 •

edited

Loading

hvitved Nov 16, 2022 •

edited

Loading

asgerf Nov 28, 2022 •

edited

Loading