Future-proof the Futures API #59119
Conversation
rust-highfive assigned withoutboats on Mar 11, 2019
rust-highfive added the S-waiting-on-review label on Mar 11, 2019
In terms of the list of possible extensions this is intended to be forward compatible with, I would divide them into two groups: …

I'd like to be confident our API is sufficient for supporting the former, and to rule out the idea that the argument passed to the future is the correct way to pass the latter (I see this argument, currently called waker, as strictly a way to interact with the executor this future is being executed on). So I would like to see task locals properly explored, to see what would need to change about the current API. I am not really in favor of adding this indirection, because I think we should not pass context unrelated to the executor through this interface. In terms of compatibility, I'm much more concerned about ending up with a very unwieldy construction API for …
yoshuawuyts referenced this pull request on Mar 12, 2019: Tracking issue for RFC 2592, futures_api #59113 (open)
Meta-question: why is this a PR rather than an RFC? It seems to be a significant departure from the Futures API design accepted in rust-lang/rfcs#2592, and there isn't consensus yet on this API modification. Maybe it's just me, but by opening a PR directly on stdlib instead of going through the RFC process, I feel unnecessary urgency (and tension) is added to the decision-making process.
I don't agree; I'd prefer that we not rule it out so early. Other languages have shown it to be useful, and we don't have experience to the contrary in Rust. Forward compatibility is not something libstd should rule out quickly. In comparison, the ergonomics of creating a waker are not something I would optimize for: regular users will never have to create one, only a couple of executor libraries will.

It was a question brought up multiple times in the RFC, but never really addressed. The tracking issue mentions it as an unresolved question. As for why it's a PR already, it might be slightly early. :shrug:
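As an aside on that last point, here is a sketch of the kind of code only executor authors ever write: building a `Waker` by hand. This uses the `RawWaker`/`RawWakerVTable` API in the shape it eventually stabilized; at the time of this thread the exact vtable layout was still in flux, so treat this as illustrative only.

```rust
use std::ptr;
use std::task::{RawWaker, RawWakerVTable, Waker};

// A waker that does nothing when woken: useful for tests and toy executors.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn no_op(_: *const ()) {}
    // Order of entries: clone, wake, wake_by_ref, drop.
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(ptr::null(), &VTABLE)
}

fn noop_waker() -> Waker {
    // Safety: the vtable functions trivially uphold the RawWaker contract,
    // since the data pointer is never dereferenced.
    unsafe { Waker::from_raw(noop_raw_waker()) }
}
```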
I consider the task-locals discussion to have been settled on the RFC thread. As for whether or not to provide executors this way, adding any additional data like this to the …
Only if the …
The future-proofing was left as an open question during FCP, and we will still require an FCP for stabilization which covers all final amendments that have been made to the API. I don't personally view this as a significant departure from the API suggested there, as the only functional change is to move the `Waker` behind a `Context` type.
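For readers skimming the thread, here is a compilable sketch of the two shapes being compared. The trait names are invented for illustration; what goes inside `Context` beyond the waker is exactly the open question.

```rust
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

// The shape from RFC 2592 as previously implemented: the waker itself is
// the argument to poll.
trait FutureWakerArg {
    type Output;
    fn poll(self: Pin<&mut Self>, waker: &Waker) -> Poll<Self::Output>;
}

// The shape this PR proposes: the waker is reached through a Context,
// leaving room to add task-level data later without breaking the trait.
trait FutureContextArg {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
```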
(From a procedural point of view, I think that discussing this point in a PR is reasonable. As @cramertj says, we often amend details from the RFC during the implementation period, particularly when addressing unresolved questions.)
Yup! I agree.
I'll try reiterating the core arguments of both sides; please correct me if my understanding is wrong. The argument in favor of …
@stjepang that does not reflect my argument. My argument is that we should not block stabilizing Future on exploring all possibilities of the future API to an extent that we can comfortably say "this is it, we will never have to change this ever again". My examples are simply illustrations of things that could be possible. The fact is that hardly any of the ecosystem has switched to using the explicit waker; once there is more usage, who is to say what other use cases will come up? I came up with possibilities as a straw man. Of course the ecosystem won't start moving until the API is stable, so it is a chicken-and-egg problem. The proposal for a context is to allow changes to the API in the future as cases come up, instead of locking the API down now against potential future changes. Focusing on the specific example is missing the point.
@carllerche Thanks for clarifying! Do you think it is possible a use case will come up which cannot be satisfied through other means like TLS or global statics? It seems spawners, reactors, timers, and task locals can be accessed through TLS just fine. (Even wakers could be accessed through TLS, but there are valid reasons to pass them as arguments instead.) Or did you mean we might want …
@stjepang I cannot say for sure, as I have not spent the time investigating; this is my point. One issue is going to be FuturesUnordered: this type alters the waker. How would this interact with any task-local state set via thread locals? Also, I have had cases where I wanted a fresh task context while polling (using Spawn in 0.1); I don't know how this would apply. Regardless, I'm pretty sure that using the context argument would be noticeably faster than thread locals, so not future-proofing would give up these potential improvements.
cramertj force-pushed the cramertj:cx-back branch from 6fa0adc to 37fdb45 on Mar 12, 2019
> I see the problem with task local state, but don't see how putting additional data into …

Sorry, I don't follow. The futures 0.1 crate doesn't have …

That is true, but it is similar to the argument against the RFC proposing to merge … That is not to say you're not raising fair points - I appreciate the concerns around stabilizing …
Hello, async world. This thread seems to have slowed down, which is good. I wanted to add a link to an experiment that I am trying here. Below is a Dropbox Paper document containing my best effort at a summary of the pros/cons for this change. This is based on me skimming the thread here as well as some of the other threads, along with some conversation with @cramertj, @yoshuawuyts, and @withoutboats.

I am posting it here because I would like to encourage others to take a look and offer suggestions -- especially @carllerche, as a strong advocate of this PR. I included some directions in the document, but the idea is for this to be a collaboratively produced summary that captures the major points. To that end, my hope is that -- if you leave a comment -- you can offer alternate wording that takes an existing point and makes it more precise (or perhaps adds details). I will either adopt that wording or try to incorporate it into the main text.

I do plan to use editorial discretion to keep this concise, but I honestly don't have a very strong opinion about this particular PR, so I aim to be unbiased. Please let me know if you feel I am not (ideally over privmsg, so we can chat). I know we are all eager to see the …
To clarify, I think that the decisions re: …
I just wanted to add to the discussion that using async functions it is possible to provide a context (or any argument) without changing the `Future` trait:

```rust
async fn my_future(ctx: Context) -> () {
    println!("Context field: {}", ctx.field);
    ()
}
```

This leaves the …
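A usage sketch of the pattern @Thomasdezeeuw describes; the `Context` struct here is a hypothetical application-level type, not `std::task::Context`. The point is that this argument is captured when the future is created, while the task-level context is still supplied by the executor on each poll.

```rust
// Hypothetical application-level context, unrelated to std::task::Context.
struct Context {
    field: String,
}

async fn my_future(ctx: Context) {
    println!("Context field: {}", ctx.field);
}

fn main() {
    // Creating the future only captures the call-site argument; nothing runs
    // until an executor polls it with the task-level context (the waker).
    let fut = my_future(Context { field: "hello".into() });
    let _ = fut; // no executor in this sketch
}
```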
@Thomasdezeeuw that'd be more like call context. This PR is about execution context: things that you may not know at the call site (similar to how we don't know what the waker is yet).
@nikomatsakis My primary argument is stronger than "in short, we cannot know the future." It is: "there are things we know about now that are worth exploring, but let's not hold up stabilization to do so." I have explicitly avoided arguing for any of these potential improvements so far, in an effort to focus on stabilization. To be honest, I did not expect the forward-compatibility proposal to be controversial. If it is rejected, then we will need to front-load evaluating these changes.
OK, thanks for that. I have attempted to rework the introduction in a way that I think captures what you wrote here: …

If that's not quite right, please supply an edit that works better. =)
@jethrogb I would like to be able to get mutable access.
I also would like to point out that I have a real use case that could benefit from avoiding thread-local access. Tokio Trace is an instrumentation API for Tokio and Tokio libraries that requires access to task-local data; instrumentation points will need to read it, and currently it is stored in a thread-local variable.

The number of instrumentation points that can be gated is performance-dependent: the faster instrumentation is, the more instrumentation can be added, and the greater visibility there is into applications. I would expect thread-local accesses to increase by multiple orders of magnitude. Given that Tokio Trace still isn't fully shipped and there is no production experience with it, it is hard to say how much avoiding thread-locals will matter, but I would be surprised if the impact were not significant.

This is just to point out that the performance implications of this PR should not be judged only on today's usage of futures. I am calling out Tokio Trace as an example of a usage pattern that could potentially benefit noticeably from additions to the futures task system.
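To make the trade-off concrete, here is a minimal sketch of the TLS pattern such instrumentation relies on today. All names here are hypothetical stand-ins, not tokio-trace's actual API; the point is that every instrumentation point pays for a TLS lookup plus a `RefCell` borrow.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for a tracing subscriber.
struct Subscriber;

impl Subscriber {
    fn event(&self, _msg: &str) {
        // Record the event somewhere.
    }
}

thread_local! {
    // Current subscriber, installed by the runtime for the duration of a poll.
    static CURRENT: RefCell<Option<Subscriber>> = RefCell::new(None);
}

// Called at every instrumentation point.
fn record_event(msg: &str) {
    CURRENT.with(|cur| {
        if let Some(sub) = cur.borrow().as_ref() {
            sub.event(msg);
        }
    });
}

fn main() {
    CURRENT.with(|cur| *cur.borrow_mut() = Some(Subscriber));
    record_event("poll started");
}
```

Passing the same data through a `poll` argument would replace the TLS lookup with an ordinary field access, which is the performance delta debated below.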
@nikomatsakis a minor point, but you can pretty easily get a …
That's helpful, thanks! Even though Tokio Trace isn't fully shipped, I wonder if it's possible to get some preliminary profile data? I'm not sure what state it's in. (I also realize your point is that you might add more profiling if the perf would let you get away with it.) Sounds like a cool tool.

Indeed! And if the only thing you have to thread through to do work is the …

The point of this part of the doc, though, is that one would likely expect to pass around the …
Thanks so much for the summary @nikomatsakis, and for getting involved in moderating this discussion at all. I'm currently very busy with other work and I haven't been able to participate in this discussion as much as I might have liked, but I wanted to write down my current thoughts even though I probably won't be able to engage much over the next week and a half.

For me the important downside of adding a context is not that it makes the API less ergonomic so much as that it introduces complexity that makes the API harder to understand. What I like about the current API is that it centers the relationship between polling and waking, and we've reduced the number of additional types (like LocalWaker) to a minimum, making it easier to explain an overview of our futures model to users. A context argument would hurt this by requiring users to navigate through Context to understand waking, which would be quite unfortunate if we never added anything to Context and it remained pure indirection.

I'm also concerned about what seems to me like an arbitrary decision around the ownership of Context that's being made right now, and which the conversation between niko and carl has highlighted. Because we have no concrete proposal for what will be added to Context someday, we have no real basis for deciding by which ownership mode it should be passed, yet we need to make that decision in order to stabilize.

I have to admit I feel frustrated by the sense of pressure I've felt to approve an API change based on what feel like vaporware proposals. This API change is not strictly neutral forward compatibility: it has drawbacks, and absent a clearly stated motivation those weigh more heavily for me.
I briefly talked with @cramertj on Discord re: an API that I was interested in exploring. His response was (paraphrasing): "sounds interesting, let's punt until after stabilization". This is a reasonable response. I highlighted ways in which the …

If the concern is that there is no concrete proposal, then I can try to work on something, though it would be much easier to evaluate the merits once the ecosystem has moved than it is now. This would take time.

Dismissing the thoughts as vaporware, instead of engaging with them on their merits, does not add to the conversation. I am assuming you are referring to this as the vaporware?
@nikomatsakis thank you for the write-up, I think that has been useful! As far as I can tell, the very core of the issue is at the intersection of: …

Since that is the case, I think it is useful to trace the lineage of this particular set of decisions so that we don't re-do conversations that have been had before. We initially added an explicit …

So, all that is to say: I don't think a formal justification was ever given for the change from … I think this discussion therefore comes down to this: are there concrete examples or strong arguments that counter the original reasoning for adding …? If I have missed links to important historical arguments here, I'd appreciate it if someone could point them out.
Absolutely not, tokio-trace seems like a great project. The "API you are interested in exploring" is what I'm referring to.
For my part, I don't know of any historical arguments to this side. When I made the PR you referenced, I removed …
@cramertj ah, sorry, I didn't intend to imply that you made the decision as a "we shouldn't have …"
oh no worries! you're absolutely correct and that's how I interpreted your comment.
I mentioned the API I was thinking of in Discord. Roughly, it would be a way to get access to extension types:

```rust
impl<'a> Context<'a> {
    fn get_ext_mut<T: 'static>(&mut self) -> Option<&mut T> { ... }
}
```

To implement it, …:

```rust
pub trait Ext {
    unsafe fn get_mut_raw(&mut self, type_id: TypeId) -> Option<*mut ()>;
}
```

Unlike downcasting, this would allow implementations to potentially match multiple type IDs. The Tokio runtime could use this to pass along runtime context. For Tokio Trace, this could be the subscriber that instrumentation points report to. An instrumentation point would then do something like (very roughly):

```rust
if let Some(subscriber) = cx.get_ext_mut::<trace::Subscriber>() {
    if subscriber.is_interested_in(event_callsite) { // don't do work if not interested in event
        subscriber.event(...);
    }
}
```

I would like it to be possible to add a lot of instrumentation points at trace level. Doing so would hit the thread-local a lot; it would be nice to avoid that. An off-the-cuff micro benchmark shows significant improvement from avoiding TLS. Of course, I have no idea how big the impact would be with real usage (it would obviously not be anywhere close to as much), and I don't think we are close (on the order of weeks) to having a real-world case to experiment with. Up until now, I specifically avoided the details of a proposal because I hoped to punt on actually exploring this until much later.
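A self-contained sketch of how those fragments might fit together, under the stated assumptions (none of these names exist in std; the `unsafe` contract is the one described in the comment above):

```rust
use std::any::TypeId;

pub trait Ext {
    // Contract: a Some return for `type_id` must point at a live value of
    // exactly the type that `type_id` identifies.
    unsafe fn get_mut_raw(&mut self, type_id: TypeId) -> Option<*mut ()>;
}

pub struct Context<'a> {
    ext: &'a mut dyn Ext,
}

impl<'a> Context<'a> {
    pub fn get_ext_mut<T: 'static>(&mut self) -> Option<&mut T> {
        // Safety: relies on the Ext contract above.
        unsafe {
            self.ext
                .get_mut_raw(TypeId::of::<T>())
                .map(|ptr| &mut *(ptr as *mut T))
        }
    }
}

// An extension exposing a single counter as a usize. A runtime could match
// several TypeIds here, which plain downcasting would not allow.
struct MyExt {
    num: usize,
}

impl Ext for MyExt {
    unsafe fn get_mut_raw(&mut self, type_id: TypeId) -> Option<*mut ()> {
        if type_id == TypeId::of::<usize>() {
            Some(&mut self.num as *mut usize as *mut ())
        } else {
            None
        }
    }
}

fn main() {
    let mut ext = MyExt { num: 0 };
    let mut cx = Context { ext: &mut ext };
    if let Some(n) = cx.get_ext_mut::<usize>() {
        *n += 1;
    }
    assert_eq!(ext.num, 1);
}
```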
Summarizing the results posted in comments on the linked gist, so far we have: …

I don't think this is enough data to draw hard conclusions, but it's entirely possible that TLS is no slower than Context on all modern systems except macOS. The macOS case might be explained by it having a threading implementation that is suboptimal in some ways (we also had problems in Tokio with thread yielding on macOS, which is why I find this somewhat unsurprising). Finally, here's a C++ benchmark showing TLS is essentially as fast as normal variables (I assume this is x86-64 Linux).
Why don't we need `black_box` in that micro benchmark?
@stjepang In the C++ benchmark with the GCC backend, the local variable is faster than TLS.
Here's the benchmark with `black_box`:

```rust
#[bench]
fn tls(b: &mut test::Bencher) {
    b.iter(|| {
        for _ in 0..ITER {
            FOO.with(|f| {
                let v = f.get() + 1;
                f.set(v);
                test::black_box(v);
            });
        }
    });
}

#[bench]
fn obj(b: &mut test::Bencher) {
    let mut ext = MyExt { num: 0 };
    let mut cx = Context { ext: &mut ext };
    b.iter(|| {
        for _ in 0..ITER {
            let r = cx.get_mut::<usize>().unwrap();
            let v = *r + 1;
            *r = v;
            test::black_box(v);
        }
    });
}
```

Results (x86-64 Linux): TLS is slightly slower now, but not a huge difference: …

Looks like LLVM compiles the benchmark slightly better in this case. Still, what matters is the fact that the TLS variable is simply stored at address …
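For completeness, a sketch of the declarations the benchmark above assumes (reconstructed; `ITER = 1000` matches the iteration count discussed below, and `Context`/`MyExt` follow the `Ext` sketch earlier in the thread, with the accessor named `get_mut` here):

```rust
#![feature(test)]
extern crate test;

use std::cell::Cell;

const ITER: usize = 1000;

thread_local! {
    // Counter reached through TLS in the `tls` benchmark.
    static FOO: Cell<usize> = Cell::new(0);
}
```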
@stjepang's findings also seem to line up with the numbers Google reported on TLS overhead for their fleet-wide tracing infrastructure ("almost negligible"); see sections 2.2 and 4.1 of the Dapper paper.
I think we do. Adding:

```toml
[profile.bench]
codegen-units = 1
incremental = false
```

I get: …
@lnicola Doesn't that disable ThinLTO? What happens if you add …?
@eddyb With … I also tried adding … regardless of LTO. But I question the validity of …
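For reference, the Cargo profile knobs being toggled in these comments can be combined in one profile; a sketch using standard Cargo keys (the exact values the commenters used are not all recoverable):

```toml
[profile.bench]
codegen-units = 1    # single codegen unit: better optimization, slower builds
incremental = false  # incremental codegen can inhibit some optimizations
lto = "thin"         # or `true` for full (fat) LTO
```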
On Linux x86-64 with the black box variant:

codegen-units only:

```toml
[profile.bench]
codegen-units = 1
```

…

codegen-units, incremental off:

```toml
[profile.bench]
codegen-units = 1
incremental = false
```

…

These results seem to be similar to what others have found without the black box / optimizations. I think if we look at the context in which these methods are planned to be used, it's fair to say it's unlikely that choosing one method over the other will be the cause of a bottleneck in an application.
BigBigos commented Mar 21, 2019
Please note that the microbenchmark operates solely on the L1 cache of modern CPUs. With 1000 iterations, each iteration takes 1-2 ns in most of your results. That's very fast; I don't think it is attainable in practice.

Regarding the TLS penalty: according to Agner [1], loads with a non-zero segment base (as is used for TLS) usually add a single cycle to load-to-use latency and a single prefix byte of code size. But the benchmark you are using is a throughput benchmark (the iterations don't depend on each other), so the latency effect is not visible. When the data is not cached, this penalty doesn't matter in the least, as the performance will be dominated by the cache miss.

If the additional data is not accessed often, the TLS version might miss the L1 and maybe even the other caches. Depending on how the Context version is implemented, it can be a lot faster (if the held data is stored in the same cache line as the Context) or similar (if it is stored in some sort of hash map). What you are also not testing is how the …

TL;DR: I believe the microbenchmark doesn't tell us much, and we can't conclude which approach is faster using it.
I feel like we're getting overly bogged down in a discussion of TLS performance. While that may be relevant in the sense that if TLS were significantly slower, that'd be a strong argument in favor of …

Also, as an aside, I'd like to respond to this comment from @withoutboats: …

I agree with you that it's unfortunate that we have to decide some of these things ahead of time. One could argue that this implies that we're not ready to stabilize …

On a more general note, I want the RFC to give a serious and thorough explanation of the alternatives in this space, which I don't think it currently does. And I don't think it's okay for the RFC to go ahead if it cannot make those arguments. This is an important core component of Rust's async story, and saying "it seems to have worked out fine thus far" isn't compelling enough when the trait is still in flux, the ecosystem is still primarily using futures 0.1, and maintainers of important pieces of infrastructure (like …
@yoshuawuyts while related, the use case for Tokio Trace is fairly different from Dapper's, so the performance trade-offs must be re-evaluated. For example, Tokio Trace is intended to be able to instrument the body of a hot loop.

I did hesitate to provide the microbenchmark in addition to the API illustration; the conversation is getting derailed. As @BigBigos outlined, the benchmark is a poor setup, and there is more than performance in question. IMO either we need this PR or we need to do a full investigation of the proposed alternatives. I would rather move forward with stabilizing futures.
ping from triage: @cramertj @withoutboats, any updates on this?
@Dylan-DPC I'm waiting on feedback from @rust-lang/libs as to their thoughts on this type of future-proofing, as @nikomatsakis mentioned above.
Dylan-DPC added the S-waiting-on-team label and removed the S-waiting-on-review label on Apr 1, 2019
Thanks, labeled it accordingly :)
The libs team got a chance to discuss this today (sorry for the delay!), and we wanted to make sure to thank @nikomatsakis and all involved in creating the summary document; it clearly took a lot of work and was very carefully crafted!

Our conclusion was that the libs team doesn't feel strongly one way or the other on this issue. There are good arguments both ways, and nothing jumped out as a clear winner. Furthermore, we weren't able to think of any examples, either in libstd or throughout the ecosystem, to draw on for inspiration in attempting to reach a conclusion.

What we did conclude, though, is that the most important issue is actually stabilizing these APIs. Futures are clearly going to make a huge splash in Rust, so the sooner we can get them out on stable, the better!
cramertj commented Mar 11, 2019
cc #59113, @carllerche, @rust-lang/libs
r? @withoutboats