Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upCross item dependencies, take 2 #30532
Conversation
rust-highfive
assigned
michaelwoerister
Dec 22, 2015
This comment has been minimized.
This comment has been minimized.
|
I'll take a close look at this later today. |
This comment has been minimized.
This comment has been minimized.
|
I never replied to this comment of yours, sorry:
I think this is equivalent to the current setup in terms of the final result. It might result in a more compact dependency graph (fewer edges), which seems like a good thing, but I see it as a bit of an optimization that strays from the mostly "pure" form that I've done so far. Still, might be a good idea just because a smaller dep graph is generally a win. The "pure" form I am describing is that we keep a split between "fn" and "data". That is, we have various task nodes, like
Hmm. Well, first, I don't see that the edge from task to subtask is "artificial": it models return values. That is, pushing a subtask corresponds to calling a fn or something like that, which has a return value. If we were to omit the edges from task to subtask, then there would be cases where that return value allows data to leak in a kind of subtle way, like so: fn foo(tcx: DepGraph, key: Key) -> Ty<'tcx> {
let _task = tcx.dep_graph.in_task(Foo);
tcx.some_map[&key]
}Note though there is a corresponding potential for a leak that comes through the arguments. That is, when entering a new task, you must "account for" all the data you gained access through implicitly from your scope: fn foo(tcx: DepGraph, def_id: DefId, ty: Ty<'tcx>) -> Ty<'tcx> {
// here, `ty` is the type of `def_id`
let _task = tcx.dep_graph.in_task(Foo);
use(ty); // this read would otherwise go unnoticed
tcx.some_map[&def_id]
}So in a case like this you should add So I guess I could see an argument that we should treat arguments/return-values the same way: you must account for any and all data that comes in/out of a task with explicit reads/writes (or perhaps we add some helper called
I guess here you are suggesting that one task might invoke another in some kind of random order? Still, I do sort of like the idea of distinguishing |
This comment has been minimized.
This comment has been minimized.
|
@michaelwoerister so I sort of forgot the context of the last comment in writing my reply. I think I stand by everything I said, but I had forgotten we were talking about memoized maps in particular. Thinking about this more, I agree with you that the best approach is to say that the "current task" in such a case is the node for the map-entry itself that we are looking up. I still think it's equivalent to the setup that exists today though. I was debating about integrating this into the
The interesting tricky case is that final edge. If we go back to the code in question, it would now look something like:
Note the line marked Other options:
For now I think I prefer the first choice, which is roughly what you suggested. The last idea sounds cool but seems like it's getting just a bit too clever. Maybe worth experimenting with though, it would really help make the system more bullet-proof. (*) I'm actually not entirely clear on what that set is. Certainly HIR nodes. But probably |
This comment has been minimized.
This comment has been minimized.
|
I've also been thinking about these "meta-nodes" like It's not clear to me that it is useful to push such tasks. It seems like pure overhead. It would be better to replace them with two distinct tasks, I think: "read-only" and "meta". The rule of thumb would be something like this:
Note that using
Now anything that depends on Note that I'm actually not aware of any places that we need the Meta task. I think read-only suffices. But it does seem like something we might want in general. |
This comment has been minimized.
This comment has been minimized.
I think we are just talking about two different but equivalent dependency tracking models here. In mine, there's only data, while yours also has nodes for functions/tasks being executed. There is not much of a difference though as long functions don't have side-effects other than writing the piece of data that they are supposed to produce, just that from the perspective of a "data-only" model, the "fn+data" model introduces an "artificial" intermediate node:
As long as we are only interested in data, and each piece of data has a unique key, I think it's simple to reason about. Each piece of code that contributes to the value of some piece of data just has to say so by specifying the key of that piece of data before reading any inputs -- i.e., just what In the end, it does not matter which model we want to follow, as long as we know how to reason about it and it is simple to use in practice. The data-only model is simpler, I think, but it might be less obvious to use, since it doesn't correspond as closely to the structure of the code itself. On the other hand, a "false correspondence", i.e. something that works almost like something else but doesn't in corner cases, can be its own source of bugs and confusion.
True, when function executions are modeled within the dependency graph, they are not. As said above, it's modeling function executions in the graph itself that can be viewed as artificial.
I was thinking of two things here:
Both of these things would not lead to incorrect results, but I do see them as kind of a code smell.
Maybe we could think about introducing something like a
It's definitely a neat idea. I like how Rust allows one to express such things
I would probably move the |
This comment has been minimized.
This comment has been minimized.
This seems about right. What is appealing to me about the design I originally put forward is that one can instrument the code in a naive and direct way, just by mapping out the structure of the tasks it is doing, and by identifying the shared state. This accommodates all kind of code, including functions that produce multiple values or which don't produce any value at all (e.g., lint checking). However, I think what I really want it so adopt a kind of hybrid. That is, where it makes sense, we can conflate the task nodes with the nodes for the value that they are producing. I do this already for memoized tables in some places, and it probably makes sense to do it in the type-checking case. Basically there are places where having distinct nodes for the table and the code that populates it feels sort of artificial, and I think in those cases it makes sense to use a single node. |
nikomatsakis
added some commits
Dec 22, 2015
nikomatsakis
force-pushed the
nikomatsakis:cross-item-dependencies
branch
from
f9cebb2
to
11c671b
Jan 6, 2016
This comment has been minimized.
This comment has been minimized.
|
@michaelwoerister ok, pushed an updated version. It turned out I was wrong about the overhead -- measurements now show around 1 or 2% building rustdoc, and it appears to be slightly faster if you check for a boolean enabled flag. In any case, this isn't quite ready to land -- there are some TODOs for which I should open separate issues, for example -- but it's ready to review, I think. (Essentially, I'd like to land basically this code and go from there.) |
michaelwoerister
reviewed
Jan 6, 2016
| Instead, the idea is to *encapsulate* shared state into some API that | ||
| will invoke `read` and `write` automatically. The most common way to | ||
| do this is to use a `DepTrackingMap`, described in the next section, | ||
| but any sort of abstraction brarier will do. In general, the strategy |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| `DepNode::ItemSignature(K)` for a given key. | ||
|
|
||
| Once that is done, you can just use the `DepTrackingMap` like any | ||
| other map. |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| There are various ways to write tests against the dependency graph. | ||
| The simplest mechanism are the | ||
| `#[rustc_if_this_changed` and `#[rustc_then_this_would_need]` |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| ``` | ||
| would select the predecessors of all `TypeckItemBody` nodes. Usually though you | ||
| want the `TypeckItemBody` nod for some particular fn, so you might write: |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| This will dump out all the nodes that lead from `Hir(foo)` to | ||
| `TypeckItemBody(bar)`, from which you can (hopefully) see the source | ||
| of the erroneous edge. |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| map: FnvHashMap<M::Key, M::Value>, | ||
| } | ||
|
|
||
| pub trait DepTrackingMapId { |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
Jan 6, 2016
Contributor
DepTrackingMapId is a bit confusing as a name for this trait.
michaelwoerister
reviewed
Jan 6, 2016
| /// **Important:* when `op` is invoked, the current task will be | ||
| /// switched to `Map(key)`. Therefore, if `op` makes use of any | ||
| /// HIR nodes or shared state accessed through its closure | ||
| /// environment, it must explicitly read that state. As an |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
Jan 6, 2016
Contributor
I'd write this as it must explicitly *read* that state. or it must explicitly register a read from that state. to emphasize that "read" refers to a dependency graph edge. Otherwise it's a bit confusing.
michaelwoerister
reviewed
Jan 6, 2016
| /// The key is the line marked `(*)`: the closure implicitly | ||
| /// accesses the body of the item `item`, so we register a read | ||
| /// from `Hir(item_def_id)`. | ||
| fn memoize<OP>(&self, key: M::Key, op: OP) -> M::Value |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
Jan 6, 2016
Contributor
I think it's very nice how memoize() works now, keeping the depgraph clean.
michaelwoerister
reviewed
Jan 6, 2016
| use super::{DepGraphQuery, DepNode}; | ||
|
|
||
| pub struct DepGraphEdges { | ||
| ids: Vec<DepNode>, |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| pub fn push_task(&mut self, key: DepNode) { | ||
| let top_node = self.current_node(); | ||
|
|
||
| let new_node = self.make_node(key.clone()); |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| // where to send buffer when full | ||
| swap_out: Sender<Vec<DepMessage>>, | ||
|
|
||
| // where to receiver query results |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| //! we send that vector over to the depgraph thread. At the same time, | ||
| //! we receive an empty vector from the depgraph thread that we can use | ||
| //! to accumulate more messages. This way we only ever have two vectors | ||
| //! allocated (and both have a fairly large capacity). |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
| pub rcache: RefCell<FnvHashMap<ty::CReaderCacheKey, Ty<'tcx>>>, | ||
|
|
||
| // Cache for the type-contents routine. FIXME -- track deps? |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
Jan 6, 2016
Contributor
Good question. Initially I thought that it's probably not necessary, since relevant dependencies should already be covered by previous phases. But with generics in play it becomes less clear...
michaelwoerister
reviewed
Jan 6, 2016
|
|
||
| pub fn trait_items(&self, trait_did: DefId) -> Rc<Vec<ty::ImplOrTraitItem<'tcx>>> { | ||
| // since this is cached, pushing a dep-node for the | ||
| // computation yields the correct dependencies. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
nikomatsakis
Jan 6, 2016
Author
Contributor
Isn't this the case that DepTrackingMap::memoize() is for?
Ah yes, I meant to port this and repr_hints to use the new memoize() helper, but forgot.
michaelwoerister
reviewed
Jan 6, 2016
| return None; | ||
| } | ||
|
|
||
| let filters: Vec<&str> = filter.split("&").map(|s| s.trim()).collect(); |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
reviewed
Jan 6, 2016
|
|
||
| #[rustc_then_this_would_need(ItemSignature)] //~ ERROR OK | ||
| #[rustc_then_this_would_need(CollectItem)] //~ ERROR OK | ||
| fn indirect(x: WillChange) { } |
This comment has been minimized.
This comment has been minimized.
michaelwoerister
Jan 6, 2016
Contributor
What's the difference to the some_fn case above? Should this be x: &WillChange maybe?
This comment has been minimized.
This comment has been minimized.
nikomatsakis
Jan 6, 2016
Author
Contributor
What's the difference to the some_fn case above? Should this be x: &WillChange maybe?
Good catch. Was supposed to be WillChanges, I think.
michaelwoerister
reviewed
Jan 6, 2016
| // option. This file may not be copied, modified, or distributed | ||
| // except according to those terms. | ||
|
|
||
| // Test that two unrelated functions have no trans dependency. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Looks very good! Great readme, nice incremental improvements over the previous version too. |
nikomatsakis
added some commits
Jan 6, 2016
This comment has been minimized.
This comment has been minimized.
|
@bors r=mw |
This comment has been minimized.
This comment has been minimized.
|
|
nikomatsakis commentedDec 22, 2015
This is roughly the same as my previous PR that created a dependency graph, but that:
RUST_DEP_GRAPH_FILTERto filter the dep graph to just the nodes you are interested in, which is super help.This is potentially a
[breaking-change]for plugin authors. If you are poking about in tcx state or something like that, you probably want to addlet _ignore = tcx.dep_graph.in_ignore();, which will cause your reads/writes to be ignored and not affect the dep-graph.After this, or perhaps as an add-on to this PR in some cases, what I would like to do is the following:
MetanodeRegarding performance: adding this tracking does add some overhead, approximately 2% in my measurements (I was comparing the build times for rustdoc). Interestingly, enabling or disabling tracking doesn't seem to do very much. I want to poke at this some more and gather a bit more data -- in some tests I've seen that 2% go away, but on others it comes back. It's not entirely clear to me if that 2% is truly due to constructing the dep-graph at all.
The next big step after this is write some code to dump the dep-graph to disk and reload it.
r? @michaelwoerister