Add "The Query Evaluation Model in Detail" Chapter. #270
Merged
michaelwoerister
merged 2 commits into
rust-lang:master
from
michaelwoerister:query-eval-model-update
Jan 30, 2019
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,237 @@ | ||
|
||
|
||
# The Query Evaluation Model in Detail | ||
|
||
This chapter provides a deeper dive into the abstract model queries are built on. | ||
It does not go into implementation details but tries to explain | ||
the underlying logic. The examples here, therefore, have been stripped down and | ||
simplified and don't directly reflect the compiler's internal APIs. | ||
|
||
## What is a query? | ||
|
||
Abstractly we view the compiler's knowledge about a given crate as a "database" | ||
and queries are the way of asking the compiler questions about it, i.e. | ||
we "query" the compiler's "database" for facts. | ||
|
||
However, there's something special to this compiler database: It starts out empty | ||
and is filled on-demand when queries are executed. Consequently, a query must | ||
know how to compute its result if the database does not contain it yet. For | ||
doing so, it can access other queries and certain input values that the database | ||
is pre-filled with on creation. | ||
|
||
A query thus consists of the following things: | ||
|
||
- A name that identifies the query | ||
- A "key" that specifies what we want to look up | ||
- A result type that specifies what kind of result it yields | ||
- A "provider" which is a function that specifies how the result is to be | ||
computed if it isn't already present in the database. | ||
|
||
As an example, the name of the `type_of` query is `type_of`, its query key is a | ||
`DefId` identifying the item we want to know the type of, the result type is | ||
`Ty<'tcx>`, and the provider is a function that, given the query key and access | ||
to the rest of the database, can compute the type of the item identified by the | ||
key. | ||
|
||
So in some sense a query is just a function that maps the query key to the | ||
corresponding result. However, we have to apply some restrictions in order for | ||
this to be sound: | ||
|
||
- The key and result must be immutable values. | ||
- The provider function must be a pure function, that is, for the same key it | ||
must always yield the same result. | ||
- The only parameters a provider function takes are the key and a reference to | ||
the "query context" (which provides access to the rest of the "database"). | ||
|
||
The database is built up lazily by invoking queries. The query providers will | ||
invoke other queries, for which the result is either already cached or computed | ||
by calling another query provider. These query provider invocations | ||
conceptually form a directed acyclic graph (DAG) at the leaves of which are | ||
input values that are already known when the query context is created. | ||
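As a rough illustration of the four parts and the restrictions listed above, the following self-contained sketch models a single `type_of`-like query. Everything in it (`QueryCtxt`, `declared_types`, the toy `Ty` enum, the `type_of_provider` field) is a hypothetical stand-in rather than the compiler's actual API, and caching is deliberately left out.

```rust
/// Hypothetical stand-in for the compiler's query key type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

/// Hypothetical stand-in for the query's result type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Ty {
    Int,
    Bool,
}

/// The query context: immutable input data plus the provider table.
pub struct QueryCtxt {
    /// Input data, fixed when the context is created.
    pub declared_types: Vec<(DefId, &'static str)>,
    /// The provider registered for the `type_of` query.
    pub type_of_provider: fn(&QueryCtxt, DefId) -> Ty,
}

impl QueryCtxt {
    /// Invokes the query named `type_of` for `key` (no caching in this sketch).
    pub fn type_of(&self, key: DefId) -> Ty {
        (self.type_of_provider)(self, key)
    }
}

/// A pure provider: it only sees the key and whatever `cx` gives access to.
pub fn type_of_provider(cx: &QueryCtxt, key: DefId) -> Ty {
    let (_, name) = cx
        .declared_types
        .iter()
        .find(|(id, _)| *id == key)
        .expect("unknown DefId");
    match *name {
        "i32" => Ty::Int,
        _ => Ty::Bool,
    }
}
```

Because the provider reads nothing but its key and the context, calling `cx.type_of(key)` twice with the same key necessarily gives the same answer, which is the property the rest of the model relies on.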
|
||
|
||
|
||
## Caching/Memoization | ||
|
||
Results of query invocations are "memoized" which means that the query context | ||
will cache the result in an internal table and, when the query is invoked with | ||
the same query key again, will return the result from the cache instead of | ||
running the provider again. | ||
|
||
This caching is crucial for making the query engine efficient. Without | ||
memoization the system would still be sound (that is, it would yield the same | ||
results) but the same computations would be done over and over again. | ||
|
||
Memoization is one of the main reasons why query providers have to be pure | ||
functions. If calling a provider function could yield different results for | ||
each invocation (because it accesses some global mutable state) then we could | ||
not memoize the result. | ||
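The effect of memoization can be sketched with a toy context that caches one made-up query, `square`. The names `MemoCtxt`, `square` and `provider_runs` are assumptions for illustration only; the run counter is instrumentation added so that the caching is observable, not something a real provider would do.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical query context with a cache table for one query.
pub struct MemoCtxt {
    /// Cache table for the made-up `square` query.
    cache: RefCell<HashMap<u32, u64>>,
    /// Counts provider runs, purely to make caching visible.
    pub provider_runs: Cell<u32>,
}

impl MemoCtxt {
    pub fn new() -> Self {
        MemoCtxt {
            cache: RefCell::new(HashMap::new()),
            provider_runs: Cell::new(0),
        }
    }

    /// Invokes the `square` query, consulting the cache first.
    pub fn square(&self, key: u32) -> u64 {
        let cached = self.cache.borrow().get(&key).copied();
        if let Some(hit) = cached {
            return hit;
        }
        // Cache miss: run the provider and remember its result.
        let result = square_provider(self, key);
        self.cache.borrow_mut().insert(key, result);
        result
    }
}

/// The provider: same key in, same result out.
fn square_provider(cx: &MemoCtxt, key: u32) -> u64 {
    cx.provider_runs.set(cx.provider_runs.get() + 1);
    u64::from(key) * u64::from(key)
}
```

Invoking `square(7)` twice on the same context runs the provider once; the second call is answered from the table.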
|
||
|
||
|
||
## Input data | ||
|
||
When the query context is created, it is still empty: No queries have been | ||
executed, no results are cached. But the context already provides access to | ||
"input" data, i.e. pieces of immutable data that were computed before the | ||
context was created and that queries can access to do their computations. | ||
Currently this input data consists mainly of the HIR map and the command-line | ||
options the compiler was invoked with. In the future, inputs will just consist | ||
of command-line options and a list of source files -- the HIR map will itself | ||
be provided by a query which processes these source files. | ||
|
||
Without inputs, queries would live in a void without anything to compute their | ||
result from (remember, query providers only have access to other queries and | ||
the context but not any other outside state or information). | ||
|
||
For a query provider, input data and results of other queries look exactly the | ||
same: It just tells the context "give me the value of X". Because input data | ||
is immutable, the provider can rely on it being the same across | ||
different query invocations, just as is the case for query results. | ||
|
||
|
||
|
||
## An example execution trace of some queries | ||
|
||
How does this DAG of query invocations come into existence? At some point | ||
the compiler driver will create the, as yet empty, query context. It will then, | ||
from outside of the query system, invoke the queries it needs to perform its | ||
task. This looks something like the following: | ||
|
||
```rust,ignore | ||
fn compile_crate() { | ||
    let cli_options = ...; | ||
    let hir_map = ...; | ||
|
||
    // Create the query context `tcx` | ||
    let tcx = TyCtxt::new(cli_options, hir_map); | ||
|
||
    // Do type checking by invoking the type check query | ||
    tcx.type_check_crate(); | ||
} | ||
``` | ||
|
||
The `type_check_crate` query provider would look something like the following: | ||
|
||
```rust,ignore | ||
fn type_check_crate_provider(tcx, _key: ()) { | ||
    let list_of_items = tcx.hir_map.list_of_items(); | ||
|
||
    for item_def_id in list_of_items { | ||
        tcx.type_check_item(item_def_id); | ||
    } | ||
} | ||
``` | ||
|
||
We see that the `type_check_crate` query accesses input data | ||
(`tcx.hir_map.list_of_items()`) and invokes other queries | ||
(`type_check_item`). The `type_check_item` | ||
invocations will themselves access input data and/or invoke other queries, | ||
so that in the end the DAG of query invocations will be built up backwards | ||
from the node that was initially executed: | ||
|
||
```ignore | ||
         (2)                                                  (1) | ||
  list_of_all_hir_items <----------------------------- type_check_crate() | ||
                                                               | | ||
    (5)             (4)                   (3)                  | | ||
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+ | ||
                                      |                        | | ||
                    +-----------------+                        | | ||
                    |                                          | | ||
    (7)             v  (6)                (8)                  | | ||
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+ | ||
|
||
// (x) denotes invocation order | ||
``` | ||
|
||
We also see that often a query result can be read from the cache: | ||
`type_of(bar)` was computed for `type_check_item(foo)` so when | ||
`type_check_item(bar)` needs it, it is already in the cache. | ||
|
||
Query results stay cached in the query context as long as the context lives. | ||
So if the compiler driver invoked another query later on, the above graph | ||
would still exist and already executed queries would not have to be re-done. | ||
|
||
|
||
|
||
## Cycles | ||
|
||
Earlier we stated that query invocations form a DAG. However, it would be easy | ||
to form a cyclic graph by, for example, having a query provider like the following: | ||
|
||
```rust,ignore | ||
fn cyclic_query_provider(tcx, key) -> u32 { | ||
    // Invoke the same query with the same key again | ||
    tcx.cyclic_query(key) | ||
} | ||
``` | ||
|
||
Since query providers are regular functions, this would behave much as expected: | ||
Evaluation would get stuck in an infinite recursion. A query like this would not | ||
be very useful either. However, sometimes certain kinds of invalid user input | ||
can result in queries being called in a cyclic way. The query engine includes | ||
a check for cyclic invocations and, because cycles are an irrecoverable error, | ||
will abort execution with a "cycle error" message that tries to be human | ||
readable. | ||
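One way such a check could work is to keep a stack of the keys currently being evaluated and to refuse to start a key that is already on it. The sketch below follows that idea under stated assumptions: `CycleCtxt`, `CycleError`, the `depth` query and the `depends_on` input are all made up for illustration, and the error is returned as a value only so that it can be inspected, whereas the text above describes the real engine as aborting.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical error carrying the chain of keys that formed the cycle.
#[derive(Debug, PartialEq, Eq)]
pub struct CycleError {
    pub stack: Vec<u32>,
}

pub struct CycleCtxt {
    /// Input data: for each key, the key its value depends on (if any).
    pub depends_on: HashMap<u32, u32>,
    cache: RefCell<HashMap<u32, u32>>,
    /// Keys whose providers are currently running, outermost first.
    active: RefCell<Vec<u32>>,
}

impl CycleCtxt {
    pub fn new(depends_on: HashMap<u32, u32>) -> Self {
        CycleCtxt {
            depends_on,
            cache: RefCell::new(HashMap::new()),
            active: RefCell::new(Vec::new()),
        }
    }

    /// Made-up `depth` query: length of the dependency chain starting at `key`.
    pub fn depth(&self, key: u32) -> Result<u32, CycleError> {
        let cached = self.cache.borrow().get(&key).copied();
        if let Some(hit) = cached {
            return Ok(hit);
        }
        // A key that is already being evaluated means we have come full circle.
        if self.active.borrow().contains(&key) {
            let mut stack = self.active.borrow().clone();
            stack.push(key);
            return Err(CycleError { stack });
        }
        self.active.borrow_mut().push(key);
        let result = depth_provider(self, key);
        self.active.borrow_mut().pop();
        let value = result?;
        self.cache.borrow_mut().insert(key, value);
        Ok(value)
    }
}

/// Provider for `depth`; it invokes the same query for the key it depends on.
fn depth_provider(cx: &CycleCtxt, key: u32) -> Result<u32, CycleError> {
    match cx.depends_on.get(&key) {
        Some(&next) => Ok(cx.depth(next)? + 1),
        None => Ok(0),
    }
}
```

With inputs `1 -> 2 -> 3` the query succeeds; with `1 -> 2 -> 1` the check reports the chain `[1, 2, 1]` instead of recursing forever.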
|
||
At some point the compiler had a notion of "cycle recovery", that is, one could | ||
"try" to execute a query and if it ended up causing a cycle, proceed in some | ||
other fashion. However, this was later removed because it is not entirely | ||
clear what the theoretical consequences of this are, especially regarding | ||
incremental compilation. | ||
|
||
|
||
## "Steal" Queries | ||
|
||
Some queries have their result wrapped in a `Steal<T>` struct. These queries | ||
behave exactly the same as regular queries with one exception: Their result is expected | ||
to be "stolen" out of the cache at some point, meaning some other part of the | ||
program is taking ownership of it and the result cannot be accessed anymore. | ||
|
||
This stealing mechanism exists purely as a performance optimization because some | ||
result values are too costly to clone (e.g. the MIR of a function). It seems | ||
like result stealing would violate the condition that query results must be | ||
immutable (after all we are moving the result value out of the cache) but it is | ||
OK as long as the mutation is not observable. This is achieved by two things: | ||
|
||
- Before a result is stolen, we make sure to eagerly run all queries that | ||
might ever need to read that result. This has to be done manually by calling | ||
those queries. | ||
- Whenever a query tries to access a stolen result, we make the compiler ICE so | ||
that such a condition cannot go unnoticed. | ||
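The two rules above can be illustrated with a simplified, single-threaded wrapper. This is a hypothetical sketch and not the compiler's own `Steal<T>` definition, which may differ in its details; a plain panic stands in for the ICE.

```rust
use std::cell::{Ref, RefCell};

/// Simplified stand-in for a stealable query result.
pub struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    pub fn new(value: T) -> Self {
        Steal {
            value: RefCell::new(Some(value)),
        }
    }

    /// Read access. Panics if the value was already stolen, so that a
    /// late reader cannot go unnoticed.
    pub fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read a stolen value")
        })
    }

    /// Moves the value out without cloning it. Panics on a second steal.
    pub fn steal(&self) -> T {
        self.value
            .borrow_mut()
            .take()
            .expect("attempted to steal an already stolen value")
    }

    /// Reports whether the value has been moved out.
    pub fn is_stolen(&self) -> bool {
        self.value.borrow().is_none()
    }
}
```

All readers call `borrow()` before anyone calls `steal()`; after that point the expensive value has a single owner and any further `borrow()` fails loudly.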
|
||
This is not an ideal setup because of the manual intervention needed, so it | ||
should be used sparingly and only when it is well known which queries might | ||
access a given result. In practice, however, stealing has not turned out to be | ||
much of a maintenance burden. | ||
|
||
To summarize: "Steal queries" break some of the rules in a controlled way. | ||
There are checks in place that make sure that nothing can go silently wrong. | ||
|
||
|
||
## Parallel Query Execution | ||
|
||
The query model has some properties that make it feasible to evaluate | ||
multiple queries in parallel without too much effort: | ||
|
||
- All data a query provider can access is accessed via the query context, so | ||
the query context can take care of synchronizing access. | ||
- Query results are required to be immutable so they can safely be used by | ||
different threads concurrently. | ||
|
||
The nightly compiler already implements parallel query evaluation as follows: | ||
|
||
When a query `foo` is evaluated, the cache table for `foo` is locked. | ||
|
||
- If there already is a result, we can clone it, release the lock and | ||
we are done. | ||
- If there is no cache entry and no other active query invocation computing the | ||
same result, we mark the key as being "in progress", release the lock and | ||
start evaluating. | ||
- If there *is* another query invocation for the same key in progress, we | ||
release the lock, and just block the thread until the other invocation has | ||
computed the result we are waiting for. This cannot deadlock because, as | ||
mentioned before, query invocations form a DAG. Some thread will always make | ||
progress. | ||
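The three cases above can be sketched with a mutex-protected table and a condition variable. This is only an illustration under stated assumptions: `ParCtxt`, `Slot`, the `square` query and the `provider_runs` counter are hypothetical, and nothing here is meant to mirror how the compiler actually implements its tables or blocks its threads.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

/// State of one cache entry.
#[derive(Clone, Copy)]
enum Slot {
    InProgress,
    Done(u64),
}

pub struct ParCtxt {
    /// Cache table for the made-up `square` query.
    table: Mutex<HashMap<u32, Slot>>,
    /// Signalled whenever some in-progress entry is completed.
    finished: Condvar,
    /// Counts provider runs, purely for demonstration.
    pub provider_runs: AtomicUsize,
}

impl ParCtxt {
    pub fn new() -> Self {
        ParCtxt {
            table: Mutex::new(HashMap::new()),
            finished: Condvar::new(),
            provider_runs: AtomicUsize::new(0),
        }
    }

    pub fn square(&self, key: u32) -> u64 {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key).copied() {
                // Case 1: a result exists; copy it out (the lock is released on return).
                Some(Slot::Done(value)) => return value,
                // Case 3: someone else is computing it; wait releases the lock while blocked.
                Some(Slot::InProgress) => table = self.finished.wait(table).unwrap(),
                // Case 2: nobody has started yet; we will compute it.
                None => break,
            }
        }
        table.insert(key, Slot::InProgress);
        drop(table);

        // Run the provider without holding the lock.
        let value = square_provider(self, key);

        self.table.lock().unwrap().insert(key, Slot::Done(value));
        self.finished.notify_all();
        value
    }
}

fn square_provider(cx: &ParCtxt, key: u32) -> u64 {
    cx.provider_runs.fetch_add(1, Ordering::SeqCst);
    u64::from(key) * u64::from(key)
}
```

If several threads ask for the same key at once, exactly one of them finds the table empty and runs the provider; the others either wait on the condition variable or read the finished entry.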
|
This is probably a dumb question, but why is the query context called `TyCtxt`? I'd have thought it would be called `QryCtxt` or `QCtxt` or something.

It's been that way as long as I can remember and I assumed it was short for "type context" or something.
The `tcx` is called "type context" or "type checking context" purely for historical reasons, I'd say. There's been talk about renaming to "query context" (e.g. `qx` or `qcx`) for a while. For now, the `tcx` acts as the query context (and does a few other things too).