[RFC] figure out story around linear queries #41710
Comments
nikomatsakis changed the title from "figure out story around linear queries" to "[RFC] figure out story around linear queries" May 2, 2017
nikomatsakis added A-allocators T-compiler and removed A-allocators labels May 2, 2017
nikomatsakis referenced this issue May 2, 2017: Merged, rework the queries for the MIR pipeline #41625
|
IMO not requiring declaring inputs up-front is a plus, that's why I didn't even mention the scheme you came up with but rather dismissed it because of the constraints I had. That is, I wanted a system that could maintain the current flexibility in that queries are never declared (not even when setting up the providers, although I like that it is possible), but rather arise from imperative code. Perhaps more than correctness, I was focused on performance and somewhat on ergonomics, assuming we could use no interior mutability in the values themselves (i.e. no The immediate result is the basic linear query idea, where you can only invoke the query once, and you own the value instead of a clone being made. But then if several other queries want to access the same value, you have no way to share it. You can tuple up those other queries into a single query, but not if you only want some of the results to be linear too. So the constraints more or less involve a bunch of related analyses/transformations, where both the inputs and the outputs contain values you don't want to duplicate, and also values you do want to duplicate (e.g. MIR const-qualification results). All of that said, the only element of correctness was reducing the "trusted abstraction base", specifically the use of interior mutability outside of the implementation of the query engine. |
nikomatsakis referenced this issue May 8, 2017: Closed, [FIXME] settle future of MIR pass manager #41712
|
I'm still not sure if we need or should have linear queries at all. They complicate things, like caching things to disk in the background (e.g. unoptimized MIR is queued for serialization in another thread, then another query steals it), and I'm not sure they are worth the trouble. As far as I can tell, the MIR transformation pipeline is the only case where this is ever going to be used. Can't the optimization pass just make its own copy of the unoptimized MIR in the beginning and then modify that in place? |
|
I don't think background caching is realistic, because of |
|
What maps are you referring to here? I would imagine the cache to be mainly |
|
@michaelwoerister I mean that serialization might require concurrent map lookup for some leaves. FWIW, you don't even need |
|
@eddyb Yeah, that's a good point. I kind of assumed that anything reachable would already be interned and not need any lookups. That might not be true... |
nikomatsakis commented May 2, 2017 (edited)
PR #41625, in an effort to rationalize the MIR pipeline, introduced queries that yield a `Steal` type. Cribbing from a rather long and overwrought comment that I wrote, the idea is to model linear queries as ordinary queries that return a `&'tcx Steal<D>`, which supports two operations ("borrow" and "steal"). Once a value is stolen, any further operation is a `bug!()`. This means that all parts of the compiler which invoke a query with a steal result must be aware of any other parts that might use that same query and coordinate with them. (This is a performance optimization: that is, we could make "stolen" queries simply regenerate the resulting value, and everything would work, but we don't want to be regenerating the results of these queries multiple times, so instead we make this error a `bug!`.)
In all the MIR cases, at least, there really isn't much to intercoordinate. For the most part, each MIR optimization is just used by one other query: the next optimization in the sequence. (In the current PR, each optimization pass has its own query, but if we convert to just having a query-per-suite, then this would apply at the level of suites.) Other parts of the compiler should use one of the queries (e.g., `optimized_mir()`) that do not return a `Steal` result.
However, there are a few exceptions. One example is const qualification. This is a lightweight scan of the contents of a `const` item, and it needs to take place relatively early in the pipeline (before optimizations are applied and so forth). The way I handled this is (a) to have it `borrow()` from the steal and then (b) to use `force` before we steal the relevant MIR, so that we know that it has executed. If we forgot to add the `force()` call, then the result would be dependent on the order in which queries were issued; that is, if one requested `const-qualif(D)` first, it would execute successfully, but if one requested `optimized_mir(D)` first, then `const-qualif(D)` afterwards, you would get a `bug!` because `const-qualif` would be trying to read stolen data. (However, if compilation succeeds, we are always assured of a consistent result.)
The complete set of examples that I am aware of where we have a linear query that also needs to be accessed is as follows:
`const fn`, need to save a copy of the IR before validation and optimization that can be used by miri (indeed, optimization may want to execute miri to evaluate constant expressions, perhaps even some appearing within the `const fn` itself).
The current solution is simple and may indeed be "good enough" -- it depends a bit (in my mind) on how often we introduce linear queries and how many things depend on them. However, there have been some alternative proposals for how to handle it. The purpose of this issue is to describe those proposals and try to settle on which is best.
Rather than continuing with concrete examples, in the descriptions that follow, I will abstract out the pattern as follows. Consider a "stealable" query A that needs to be read by query B (e.g., borrow checking) but consumed by query C (e.g., optimization).
For each system, I will describe how it works, and then discuss some of the tradeoffs.
Current system: "stealable queries"
How it works: As described in the run-up, you have a query that yields a `&'tcx Steal<T>`. This can be either borrowed or stolen. An attempt to borrow after a steal (or a second attempt to steal) will result in a `bug!` call.
How you can mess it up: You can forget to call `force()` on a potential consumer when stealing. In the case of our example, this would mean that query C fails to force query B.
What happens if you mess it up: Compilation may succeed with some query execution orders (e.g., if `B` executes first) but fail with a `bug!` in others (e.g., if `C` executes first).
Other downsides: entanglement. It's kind of a drag that the stealer (C) must be aware of the reader (B). It
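To make the borrow/steal behavior and the ordering hazard concrete, here is a minimal sketch. It is not the compiler's actual `Steal` implementation: the `RefCell<Option<T>>` representation, the `query_b`/`query_c` functions, and the use of `panic!` in place of `bug!` are all assumptions made for illustration.

```rust
use std::cell::{Ref, RefCell};

/// Hypothetical stand-in for `Steal<T>`; panics where rustc would `bug!()`.
pub struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    pub fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Read access; panics if the value was already stolen.
    pub fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read from stolen value")
        })
    }

    /// Takes ownership of the value; panics on a second steal.
    pub fn steal(&self) -> T {
        self.value
            .borrow_mut()
            .take()
            .expect("attempted to steal an already stolen value")
    }
}

/// Query B (the reader): only borrows A.
fn query_b(a: &Steal<Vec<u32>>) -> usize {
    let len = a.borrow().len();
    len
}

/// Query C (the consumer): forces B first, then steals A.
fn query_c(a: &Steal<Vec<u32>>) -> (usize, Vec<u32>) {
    // The "force" step. Without it, requesting B after C would panic.
    let b = query_b(a);
    let mut owned = a.steal();
    owned.push(0); // stand-in for an in-place transformation
    (b, owned)
}

fn main() {
    let a = Steal::new(vec![1, 2, 3]);
    let (b, c) = query_c(&a);
    println!("B = {}, C = {:?}", b, c);
}
```

In this sketch the entanglement is visible in `query_c`, which has to know that `query_b` exists. A real query engine would also cache B's result, so that forcing it is not wasted work.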
Proposed alternative: "linear queries" with "tuple providers"
How it works: @eddyb proposed an alternative in which linear queries can only be used once. In this scheme, once you execute a query once, any further attempt to request the same query is a `bug!`. So we can't have a query A that is read by B and consumed by C, as I had, since both B and C must consume, and that would violate linearity. To support this scenario, then, eddyb wanted to introduce "tuple" providers (name is mine). Basically, when setting up the `Providers` struct, I can use a bit of magic to knit together a function that processes the result from `A` and produces a `(B, C)` tuple. This magic would then divide the tuple and store the `B` into the `B` map and store the `C` into the `C` map in the right spots. This might look something like this:
How you can mess it up: You could fail to use the tuple system, and instead write queries B and C independently.
What happens if you mess it up: In any execution in which both queries B and C are used, no matter which order they are used in, compilation will ICE. This is thus mildly more robust than the stealable scheme.
Other downsides: entanglement. It's kind of a drag that we must produce B and C from one function.
Other downsides: imprecise dependencies. It's possible that producing B and C use different sets of inputs. But since there is one function that is doing both bits of work, we won't be able to tell them apart in the query system, and hence they will wind up with imprecise dependencies.
Other advantages: splitting. This technique allows you to take the result of query A and split it into pieces without cloning. This is not possible with the other alternatives.
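As a rough illustration of the tuple-provider idea (not the code from the original proposal), the sketch below uses a hypothetical `Maps` struct, a `DefId` alias, and a `provide_b_and_c` function, all invented here. One function consumes A by value, and the engine files each half of the returned tuple under its own query.

```rust
use std::collections::HashMap;

type DefId = u32; // hypothetical stand-in for the compiler's DefId

/// Hypothetical query storage: one linear input map and two result maps.
struct Maps {
    a: HashMap<DefId, Vec<u32>>, // linear: each entry can be taken only once
    b: HashMap<DefId, usize>,
    c: HashMap<DefId, Vec<u32>>,
}

/// The single "tuple provider": consumes A and produces both B and C.
fn provide_b_and_c(a: Vec<u32>) -> (usize, Vec<u32>) {
    let b = a.len();
    let c: Vec<u32> = a.into_iter().map(|x| x * 2).collect(); // reuses A, no clone
    (b, c)
}

impl Maps {
    /// Runs the tuple provider at most once and stores each half separately.
    fn ensure_b_and_c(&mut self, def_id: DefId) {
        if self.b.contains_key(&def_id) {
            return;
        }
        let a = self.a.remove(&def_id).expect("linear query A requested twice");
        let (b, c) = provide_b_and_c(a);
        self.b.insert(def_id, b);
        self.c.insert(def_id, c);
    }

    fn query_b(&mut self, def_id: DefId) -> usize {
        self.ensure_b_and_c(def_id);
        self.b[&def_id]
    }

    fn query_c(&mut self, def_id: DefId) -> &Vec<u32> {
        self.ensure_b_and_c(def_id);
        &self.c[&def_id]
    }
}

fn main() {
    let mut maps = Maps {
        a: HashMap::from([(7, vec![1, 2, 3])]),
        b: HashMap::new(),
        c: HashMap::new(),
    };
    println!("C = {:?}", maps.query_c(7));
    println!("B = {}", maps.query_b(7));
}
```

Because B and C come out of one function, either query can be requested first. The cost is that the engine cannot tell which inputs belong to which result, which is the imprecise-dependencies downside noted above.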
Proposed alternative: "linear queries" with "mapping providers"
In my original comment, I described a variant of tuple providers. I want to describe a variant of that idea here that I've been thinking about since. The idea is to make linear queries part of the query framework, as in the "tuple providers" proposal. However, when you define a linear query, you don't use it in the same way as non-linear queries. That is, if you have a linear query `A`, you can't do `tcx.A(def_id)` as I showed earlier. Instead, when creating the provider struct, we use some magic functions to "connect" derived queries to a linear query. This connection can be in one of two modes ("read" or "consume"). For example:
(These `read_a()` and `to_produce_b_with()` methods would be auto-generated by the macro.)
The basic idea here is that we are telling the providers struct two things:
This allows the framework to do more intelligent routing of results. In particular, if there is more than one consume query for a linear query, we can detect that at provider creation time -- i.e., before we even process any input. Moreover, we can ensure that before the "consume" query for a given linear value runs, we force all of the "read queries". (In this case, that means that if you request C, we will force the query B, so that it can read before C executes.) (This last part is why the provider functions have to know what query you are producing.)
How you can mess it up: You cannot mess this up, since you can't access linear queries using the normal methods. Even if you tried to register two "consume" queries for the same linear value, that would fail when creating the provider struct (so such a PR could never land, for example, and no test could ever pass).
What happens if you mess it up: You cannot, unless I missed something. =)
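A toy model of the registration step may help. This is only a sketch under assumptions: the `Providers` struct here, its `read`/`consume` methods, and the string query names are all hypothetical, and they stand in for whatever the macro would actually generate. It shows the two checks described above: a second consumer is rejected when the providers are set up, and the framework can look up which readers to force before a consumer runs.

```rust
use std::collections::HashMap;

/// Hypothetical registry of how derived queries connect to linear queries.
#[derive(Default)]
struct Providers {
    readers: HashMap<&'static str, Vec<&'static str>>,
    consumer: HashMap<&'static str, &'static str>,
}

impl Providers {
    /// Registers `query` as a reader of the linear query `linear`.
    fn read(&mut self, linear: &'static str, query: &'static str) {
        self.readers.entry(linear).or_default().push(query);
    }

    /// Registers `query` as the sole consumer of `linear`.
    /// A second consumer is rejected before any input is processed.
    fn consume(&mut self, linear: &'static str, query: &'static str) -> Result<(), String> {
        if let Some(existing) = self.consumer.get(linear) {
            return Err(format!("`{}` is already consumed by `{}`", linear, existing));
        }
        self.consumer.insert(linear, query);
        Ok(())
    }

    /// Lists the reader queries the framework must force before `query` runs.
    fn must_force_before(&self, query: &str) -> Vec<&'static str> {
        self.consumer
            .iter()
            .filter(|(_, q)| **q == query)
            .flat_map(|(linear, _)| self.readers.get(linear).cloned().unwrap_or_default())
            .collect()
    }
}

fn main() {
    let mut providers = Providers::default();
    providers.read("A", "B");
    providers.consume("A", "C").unwrap();
    println!("force before C: {:?}", providers.must_force_before("C"));
    println!("second consumer: {:?}", providers.consume("A", "D"));
}
```

Neither B nor C mentions the other here; the ordering comes entirely from the registry.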
Other advantages: no entanglement. This is the only proposal that avoids the need for queries B and C to know about one another. Simply by registering the queries, you help the framework sort things out and ensure that the appropriate `force` calls happen.
Other proposals?
I'll try to update this header if more ideas come up.
cc @rust-lang/compiler @matthewhammer