-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Data flow: Introduce ContentSet
#8641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
68503d8
to
c55f66e
Compare
c55f66e
to
415a1c2
Compare
ContentExt
, interpretRead
, and interpretStore
ContentSet
Before ``` [2022-04-06 13:19:29] (96s) Tuple counts for DataFlowImpl2::Stage1::revFlowConsCand#7ad53399#ff/2@i14#aa10f2wi after 4.4s: 10681 ~0% {2} r1 = SCAN DataFlowImpl2::Stage1::revFlow#7ad53399#fff#prev_delta OUTPUT In.0, In.2 'config' 982 ~1% {3} r2 = JOIN r1 WITH DataFlowImpl2::readSet#7ad53399#ffff_2301#join_rhs ON FIRST 2 OUTPUT Rhs.3, Lhs.1 'config', Rhs.2 83691528 ~2% {3} r3 = JOIN r2 WITH DataFlowPublic::ContentSet::getAReadContent#dispred#f0820431#ff ON FIRST 1 OUTPUT Lhs.1 'config', Lhs.2, Rhs.1 'c' 83581763 ~2% {3} r4 = r3 AND NOT DataFlowImpl2::Stage1::revFlowConsCand#7ad53399#ff#prev(Lhs.2 'c', Lhs.0 'config') 83581763 ~0% {3} r5 = SCAN r4 OUTPUT In.2 'c', In.0 'config', In.1 0 ~0% {3} r6 = JOIN r5 WITH DataFlowImpl2::Stage1::fwdFlowConsCand#7ad53399#ff ON FIRST 2 OUTPUT Lhs.2, Lhs.1 'config', Lhs.0 'c' 0 ~0% {2} r7 = JOIN r6 WITH DataFlowImpl2::Stage1::fwdFlow#7ad53399#2#fff_02#join_rhs ON FIRST 2 OUTPUT Lhs.2 'c', Lhs.1 'config' return r7 ``` After ``` [2022-04-06 13:44:38] (6s) Tuple counts for DataFlowImpl2::Stage1::revFlowConsCand#7ad53399#ff/2@i14#5abbf2wn after 6ms: 10681 ~0% {2} r1 = SCAN DataFlowImpl2::Stage1::revFlow#7ad53399#fff#prev_delta OUTPUT In.0, In.2 'config' 982 ~1% {3} r2 = JOIN r1 WITH DataFlowImpl2::readSet#7ad53399#ffff_2301#join_rhs ON FIRST 2 OUTPUT Rhs.3, Lhs.1 'config', Rhs.2 109765 ~0% {3} r3 = JOIN r2 WITH DataFlowImpl2::Stage1::fwdFlowConsCandSet#7ad53399#fff#reorder_0_2_1 ON FIRST 2 OUTPUT Lhs.1 'config', Lhs.2, Rhs.2 'c' 0 ~0% {3} r4 = r3 AND NOT DataFlowImpl2::Stage1::revFlowConsCand#7ad53399#ff#prev(Lhs.2 'c', Lhs.0 'config') 0 ~0% {3} r5 = SCAN r4 OUTPUT In.1, In.0 'config', In.2 'c' 0 ~0% {2} r6 = JOIN r5 WITH DataFlowImpl2::Stage1::fwdFlow#7ad53399#2#fff_02#join_rhs ON FIRST 2 OUTPUT Lhs.2 'c', Lhs.1 'config' return r6 ```
I have pushed another commit that is not strictly needed yet (it will be for #7914), but it belongs logically with this PR. |
I think there's a point in the design space worth talking about here. In the literature of (sound) JavaScript analysis the standard solution is to have two "unknown" contents rather than one:
The rule is then:
(Another common solution is to collapse a heterogenous array into a homogenous array as soon as any operation acts on it with non-constant index, but that is hard to apply in our setting). I can't immediately say whether this is better than Anyway, this isn't meant as an objection, just thought I'd mention it in case it helps somehow. |
I think the
This should be equivalent, but ought to perform a bit better in an analysis that is forward-first, as our is. |
Thanks for your inputs @asgerf and @aschackmull . I agree that the above should work as well (except we would still have to handle reverse stores), but I think I have a slight preference for the
|
As discussed offline: Let's move the store-related |
26e7971
to
360e3e7
Compare
360e3e7
to
bc6ee10
Compare
read(n1, c, n2, pragma[only_bind_into](config)) | ||
revFlowIsReadAndStored(pragma[only_bind_into](c), pragma[only_bind_into](config)) and | ||
read(n1, c, n2, pragma[only_bind_into](config)) and | ||
revFlow(n2, pragma[only_bind_into](config)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This join-order is not optimal, but presumably good enough. Realising the optimal order would require another helper predicate, so I'm not suggesting any change here, merely noting the fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
In github#8641, `localFlowExit` was changed to use `Stage2::readStepCand` instead of `read`, which means that the big-step relation is broken up less. This causes test result changes. Nothing is lost from the `select` clause, but some results may have fewer paths, and fewer nodes and edges are output in the test results.
In github#8641, `localFlowExit` was changed to use `Stage2::readStepCand` instead of `read`, which means that the big-step relation is broken up less. This causes test result changes. Nothing is lost from the `select` clause, but some results may have fewer paths, and fewer nodes and edges are output in the test results.
In github#8641, `localFlowExit` was changed to use `Stage2::readStepCand` instead of `read`, which means that the big-step relation is broken up less. This causes test result changes. Nothing is lost from the `select` clause, but some results may have fewer paths, and fewer nodes and edges are output in the test results.
In github#8641, `localFlowExit` was changed to use `Stage2::readStepCand` instead of `read`, which means that the big-step relation is broken up less. This causes test result changes. Nothing is lost from the `select` clause, but some results may have fewer paths, and fewer nodes and edges are output in the test results.
Background
In Ruby, we are tracking flow through arrays with precise array index information, when available. That is, we are correctly able to handle
Flow is tracked using the data flow library's notion of contents, reads, and stores, and we can model known/unknown array information as follows
Then the following statements
correspond to the following read/store steps
assuming that
x
does not have a known constant value.The Problem
In the example above, we can see how
a4[x]
gives rise to 11 read steps, since we need to be able to read out values stored at any known or unknown index. Unfortunately, we need to encode this up-front in thereadStep
relation, which means that all such reads will be replicated 11 times, and it would become even worse if we were to increase the range ofKnownArrayElementContent
.Another issue is that when the data-flow library generates reverse store steps from read steps
we will get the following reverse store steps
whereas what we really wanted was just a single store step
A Solution
This PR proposes a solution to both problems above, where we generalize the interface of the data-flow library for
readStep
andstoreStep
, to use a new typeContentSet
instead ofContent
.ContentSet
will allow us to represent sets ofContent
s:And we will be able to interpret
ContentSet
s differently, depending on whether they occur in reads or in storesWe will still be using
Content
s in access paths, but we will useContentSet
s internally, and the data-flow library will make sure to avoid fan-out duplication viagetAStoreContent/getAReadContent
.Returning to the original example
we will now have the following read/store steps:
And for
we get just a single reverse store step
which translates via
getAStoreContent
into a store intoUnknownArrayElementContent
as desired.PR Structure
This PR is structured into a sequence of logically self-contained commits, and I strongly encourage reviewing them in order:
ContentSet
into the data-flow library.ContentSet
for Ruby, as per the description above.ContentSet
for C++ as a simple bijection intoContent
.ContentSet
for Java as a simple bijection intoContent
.ContentSet
for C# as a simple bijection intoContent
.ContentSet
for Python as a simple bijection intoContent
.