
Future-proof the Futures API #59119

Open
wants to merge 1 commit into base: master

Conversation

@cramertj
Member

cramertj commented Mar 11, 2019

@rust-highfive

Collaborator

rust-highfive commented Mar 12, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

$ git checkout -qf FETCH_HEAD

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
Setting environment variables from .travis.yml
---
[00:03:31]    Compiling build_helper v0.1.0 (/checkout/src/build_helper)
[00:03:35] error[E0432]: unresolved import `task::Context`
[00:03:35]  --> src/libcore/future/future.rs:8:12
[00:03:35]   |
[00:03:35] 8 | use task::{Context, Poll};
[00:03:35]   |            ^^^^^^^ no `Context` in `task`
[00:03:37]    Compiling compiler_builtins v0.1.5
[00:03:37]    Compiling cmake v0.1.33
[00:03:37]    Compiling backtrace-sys v0.1.27
[00:03:40]    Compiling std v0.0.0 (/checkout/src/libstd)
[00:03:40]    Compiling std v0.0.0 (/checkout/src/libstd)
[00:03:40]    Compiling rustc_msan v0.0.0 (/checkout/src/librustc_msan)
[00:03:40]    Compiling rustc_tsan v0.0.0 (/checkout/src/librustc_tsan)
[00:03:41]    Compiling rustc_lsan v0.0.0 (/checkout/src/librustc_lsan)
[00:03:41]    Compiling rustc_asan v0.0.0 (/checkout/src/librustc_asan)
[00:03:47] error[E0308]: mismatched types
[00:03:47]    --> src/libcore/task/wake.rs:117:9
[00:03:47]     |
[00:03:47] 116 |       ) -> RawWaker {
[00:03:47]     |            -------- expected `task::wake::RawWaker` because of return type
[00:03:47] 117 | /         RawWakerVTable {
[00:03:47] 119 | |             wake,
[00:03:47] 120 | |             drop,
[00:03:47] 121 | |         }
[00:03:47]     | |_________^ expected struct `task::wake::RawWaker`, found struct `task::wake::RawWakerVTable`
[00:03:47]     |
[00:03:47]     = note: expected type `task::wake::RawWaker`
[00:03:47]                found type `task::wake::RawWakerVTable`
[00:03:49] error: aborting due to 2 previous errors
[00:03:49] 
[00:03:49] Some errors occurred: E0308, E0432.
[00:03:49] For more information about an error, try `rustc --explain E0308`.
---
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@withoutboats

Contributor

withoutboats commented Mar 12, 2019

In terms of the list of possible extensions this is intended to be forward compatible with, I would divide them into two groups:

  • Things related to the executor (e.g. task locals)
  • Things unrelated to the executor but instead related to sources of events (e.g. exposing a global timer or reactor)

I'd like to be confident our API is sufficient for supporting the former, and rule out the idea that the argument passed to future is the correct way to pass the latter (I see this argument, currently called waker, as strictly a way to interact with the executor this future is being executed on).

So I would like to see task locals properly explored to see what would need to change about the current API. I am not really in favor of adding this indirection, because I think we should not pass context unrelated to the executor through this interface.

In terms of compatibility, I'm much more concerned about ending up with a very unwieldy construction API for RawWaker (because we keep adding optional fields) than I am about having the argument called Waker. So much so that I'd consider holding off on stabilizing RawWaker's constructor(s) for a cycle or two while we explore ways of interacting with the executor beyond waking.

@yoshuawuyts yoshuawuyts referenced this pull request Mar 12, 2019

Open

Tracking issue for RFC 2592, futures_api #59113

2 of 5 tasks complete
@yoshuawuyts

Member

yoshuawuyts commented Mar 12, 2019

Meta-question: why is this a PR rather than an RFC? It seems to be a significant departure from the Futures API design accepted in rust-lang/rfcs#2592.

There isn't consensus yet on this API modification. And maybe it's just me, but by opening a PR directly on the stdlib instead of going through the RFC process, I feel unnecessary urgency (and tension) is added to the decision-making process.

@seanmonstar

Contributor

seanmonstar commented Mar 12, 2019

I'd like to be confident our API is sufficient for supporting the former, and rule out the idea that the argument passed to future is the correct way to pass the latter.

I don't agree. I'd prefer that we not rule it out so early. Other languages have shown it to be useful, and we don't have experience to the contrary in Rust. Forwards-compatibility is not something libstd should rule out quickly.

By comparison, the ergonomics of creating a waker are not something I would optimize for. Regular users will never have to create one; only a couple of executor libraries will.

why is this a PR rather than an RFC?

It was a question brought up multiple times in the RFC, but never really addressed. The tracking issue mentioned it as an unresolved question. As for why it's a PR already, it might be slightly early. :shrug:

@cramertj

Member Author

cramertj commented Mar 12, 2019

@withoutboats

I'd like to be confident our API is sufficient for supporting the former, and rule out the idea that the argument passed to future is the correct way to pass the latter (I see this argument, currently called waker, as strictly a way to interact with the executor this future is being executed on).

I consider the task-locals discussion to have been settled on the RFC thread.

As for whether or not to provide executors this way, adding any additional data like this to the Context type would require an additional RFC thread, and I'd be happy to argue this point there. I'm not interested in discussing it here, as a decision there is not a blocker for stabilizing the Future trait.

@carllerche

Member

carllerche commented Mar 12, 2019

@cramertj

I'm not interested in discussing it here, as a decision there is not a blocker for stabilizing the Future trait.

Only if the Future trait is forwards-compatible with that change :). I agree with you entirely that Future should be stabilized sooner rather than later and that discussing any possible additions can be punted until later. This is why I am hoping we can agree on a forwards-compatible API and punt any further discussion until the ecosystem catches up.

@cramertj

Member Author

cramertj commented Mar 12, 2019

@yoshuawuyts

The future-proofing was left as an open question during FCP. We will still require an FCP for stabilization, which will cover all final amendments that have been made to the API.

I don't personally view this as a significant departure from the API suggested there, as the only functional change is to move &Waker into a wrapper type with a constructor.

@nikomatsakis

Contributor

nikomatsakis commented Mar 12, 2019

(From a procedural point-of-view, I think that discussing this point in a PR is reasonable. As @cramertj says, we often make amendments to details from the RFC during the implementation period, particularly when addressing unresolved questions.)

@cramertj

Member Author

cramertj commented Mar 12, 2019

@carllerche

Only if the Future trait is forwards-compatible with that change :). I agree with you entirely that Future should be stabilized sooner rather than later and that discussing any possible additions can be punted until later. This is why I am hoping we can agree on a forwards-compatible API and punt any further discussion until the ecosystem catches up.

Yup! I agree.

@stjepang

Contributor

stjepang commented Mar 12, 2019

I'll try reiterating the core arguments of both sides. Please correct me if my understanding is wrong.

The argument in favor of Context

We need to make all components of the runtime (waker, spawner, reactor, timer, task locals) available to the polled Future one way or another. We'll either make them available through TLS or through Context passed as an argument to poll().

Passing only Waker as an argument means only one component of the runtime (waker) is made available so everything else will have to be threaded through TLS. This doesn't make sense - either everything should go through TLS or everything should go through the Context argument.

Since we've already decided in a previous RFC that Waker should be passed as an argument, then it makes sense to pass everything else that way, too!
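For readers following along, the two competing shapes of poll can be sketched like this. The types below are simplified stand-ins written for this sketch, not the real std items, and the trait names are invented to contrast the two designs:

```rust
use std::pin::Pin;

// Stand-in types so the sketch is self-contained; NOT the real std items.
pub enum Poll<T> {
    Ready(T),
    Pending,
}
pub struct Waker;
pub struct Context<'a> {
    pub waker: &'a Waker, // today this would be the only field
}

// RFC 2592 as accepted: poll receives the waker directly.
pub trait FutureWithWaker {
    type Output;
    fn poll(self: Pin<&mut Self>, waker: &Waker) -> Poll<Self::Output>;
}

// This PR: poll receives an extensible Context that, for now, only wraps the waker.
pub trait FutureWithContext {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
```

The only signature difference is the argument type; the question in this thread is whether that indirection buys useful room to grow.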

The argument against Context

In order to have fully working async/await, we need three things: the Future trait, Pin for self-references, and Waker for notification. That's it. Everything else (spawner, reactor, timer, task locals) is a completely orthogonal feature, irrelevant to async/await. As such, those features do not belong in the core API!

Furthermore, we want to make the components of runtimes as decoupled as possible. Passing a Context to every Future would de facto make it the central configuration point for the whole runtime. There are potential compatibility hazards among runtime components in such a design, so we'd rather avoid it altogether.


My feeling is we should not have Context because passing the whole runtime configuration to poll() simply seems like an inappropriate thing to do. It would burden the API with something it doesn't really need and I think the API should be the minimal set of features that makes fully working async/await possible. No more, no less.

@carllerche

Member

carllerche commented Mar 12, 2019

@stjepang that does not reflect my argument. My argument is that we should not block stabilizing Future on exploring all possibilities of the future api to an extent that we can comfortably say “this is it, we will never have to change this ever again”.

My examples are simply illustrations of things that could be possible. The fact is that hardly any of the ecosystem has switched to using the explicit waker. Once there is more usage, who is to say what other use cases will come up? I came up with possibilities as a straw man.

Of course the ecosystem won’t start moving until the api is stable, so it is a chicken / egg problem.

The proposal for Context is to allow changes to the API in the future as cases come up, instead of locking the API down now against potential future changes.

Focusing on the specific example is missing the point.

@stjepang

Contributor

stjepang commented Mar 12, 2019

@carllerche Thanks for clarifying!

Do you think it is possible a use case will come up which cannot be satisfied through other means like TLS or global statics? It seems spawners, reactors, timers, and task locals can be accessed through TLS just fine. (Even wakers could be accessed through TLS but there are valid reasons to pass them as arguments instead.)

Or did you mean we might want Waker to be able to do more than wake, be cloned, and be dropped? By that I mean to do something unrelated to other components of the runtime (spawner, reactor, timer, task locals).
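As a concrete illustration of the TLS route discussed here, a runtime could expose a spawner (or timer, reactor, etc.) through a thread-local installed around each poll. This is a minimal sketch under assumed names; a String stands in for a real spawner handle, and the sketch is not panic-safe:

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for a runtime handle such as a spawner.
    static SPAWNER: RefCell<Option<String>> = RefCell::new(None);
}

// Install a "spawner" for the duration of a closure, the way a runtime
// could wrap each call to poll(). (No panic safety in this sketch.)
fn with_spawner<R>(spawner: String, f: impl FnOnce() -> R) -> R {
    SPAWNER.with(|s| *s.borrow_mut() = Some(spawner));
    let result = f();
    SPAWNER.with(|s| *s.borrow_mut() = None);
    result
}

// What a future (or leaf library) would call to reach the runtime.
fn current_spawner() -> Option<String> {
    SPAWNER.with(|s| s.borrow().clone())
}
```

This works without any Context argument, which is exactly why the debate reduces to ergonomics, decoupling, and the performance of thread-local access.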

@carllerche

Member

carllerche commented Mar 12, 2019

@stjepang I cannot say for sure as I have not spent the time investigating. This is my point.

One issue is going to be FuturesUnordered. This type alters the waker. How would this interact with any task local state set via thread locals? Also, I have had cases where I wanted a fresh task context while polling (using Spawn in 0.1). I don’t know how this would apply.

Regardless of this, I'm pretty sure that using the context argument would be noticeably faster than thread-locals. So not future-proofing would give up these potential improvements.

@cramertj cramertj force-pushed the cramertj:cx-back branch from 6fa0adc to 37fdb45 Mar 12, 2019

@sagebind

Contributor

sagebind commented Mar 13, 2019

  • Unknown unknowns are always unknown. Unless it significantly hurts the API today, forwards compatibility is always a welcome property.
  • I'd avoid stuffing things into thread-local storage if we can. Thread-locals are like global variables: they should be used with care and only if necessary, and they can come with a performance cost. This PR makes them unnecessary if we discover new things that should be added down the road. Of course, we don't know what those are now; see my previous point.
  • Most code will touch neither poll nor wakers, especially once async/await lands, so I wouldn't put too much weight on the immediate ergonomics.
@stjepang

Contributor

stjepang commented Mar 13, 2019

@carllerche

One issue is going to be FuturesUnordered. This type alters the waker. How would this interact with any task local state set via thread locals?

I see the problem with task-local state, but I don't see how putting additional data into Context fundamentally changes the situation. Can't whatever is done with Context also be done equally well with TLS?

Also, I have had cases where I wanted a fresh task context while polling (using Spawn in 0.1). I don’t know how this would apply.

Sorry, I don't follow. The futures 0.1 crate doesn't have Context and yet can do this through futures::executor::spawn. Why couldn't we access the executor through TLS to spawn a fresh task?

Regardless of this, I'm pretty sure that using the context argument would be noticeably faster than thread-locals. So not future-proofing would give up these potential improvements.

That is true, but it is similar to the argument against the RFC proposing to merge LocalWaker and Waker. In that RFC, the question was whether the performance gains of avoiding atomics and TLS in some cases were worth the ergonomic losses and increased API surface. In the end, we decided to sacrifice performance. Introducing Context is almost a reversal of that decision, because now &mut Context is sort of like a beefed-up &LocalWaker.

That is not to say you're not raising fair points - I appreciate the concerns around stabilizing Waker and acknowledge the downsides! But I still feel pretty confident it's the right way forward.

@nikomatsakis nikomatsakis referenced this pull request Mar 13, 2019

Merged

Roadmap for 2019 #2657

@nikomatsakis

Contributor

nikomatsakis commented Mar 17, 2019

Hello, async world. This thread seems to have slowed down, which is good. I wanted to add a link to an experiment that I am trying here. Below is a Dropbox Paper document containing my best effort at a summary of the pros/cons of this change:

Summary Document

This is based on me skimming the thread here as well as some of the other threads, along with some conversation with @cramertj, @yoshuawuyts, and @withoutboats. I am posting it here because I would like to encourage others to take a look and offer suggestions -- especially @carllerche, as a strong advocate of this PR.

I included some directions in the document, but the idea is for this to be a collaboratively produced summary that captures the major points. To that end, my hope is that -- if you leave a comment -- you can offer alternate wording that takes an existing point and makes it more precise (or perhaps adds details). I will either adopt that wording or try to incorporate it into the main text. I do plan to use editorial discretion to keep this concise, but I honestly don't have a very strong opinion about this particular PR, so I aim to be unbiased. Please let me know if you feel I am not (ideally over privmsg, so we can chat). ❤️

I know we are all eager to see the Future trait stabilized. To that end, since this particular PR seems to be largely a @rust-lang/libs question to me -- it is a question of "best practices" around API design -- I've also asked the @rust-lang/libs team to take a look at the summary and weigh in on the questions at the end. I believe they have this on their agenda for this coming Wednesday.

@nikomatsakis

Contributor

nikomatsakis commented Mar 17, 2019

To clarify, I think that the decisions re: Future are ultimately both a T-lang and a T-libs concern, but I am particularly interested to get T-libs feedback here because of the intersection with API design. For example, the future proofing section of the Rust API guidelines covers a number of important points, but doesn't recommend anything clearly related to this PR, so I was curious if this had come up before in prior discussions. (Also, if others have examples they can point at that seem relevant, leave comments in the doc, maybe I'll make a "prior art" section or something like this.)

@Thomasdezeeuw

Contributor

Thomasdezeeuw commented Mar 17, 2019

I just wanted to add to the discussion that using async functions it is possible to provide a context (or any argument) without changing the Future trait.

async fn my_future(ctx: Context) {
    println!("Context field: {}", ctx.field);
}

(playground link)

This leaves the Future trait at the bare minimum, but allows for extensibility as well as framework/crate-specific contexts.

@seanmonstar

Contributor

seanmonstar commented Mar 17, 2019

@Thomasdezeeuw that'd be more like call context. This PR is about execution context: things that you may not know at the call site (similar to how we don't know what the waker is yet).

@carllerche

Member

carllerche commented Mar 18, 2019

@nikomatsakis My primary argument is stronger than "In short, we cannot know the future." It is: "There are things we know about now that are worth exploring, but let's not hold up stabilization to do so."

I have explicitly avoided arguing for any of these potential improvements so far, in an effort to focus on stabilization. To be honest, I did not expect the forwards-compatibility proposal to be controversial. If it is rejected, then we will need to front-load evaluating these changes.

@nikomatsakis

Contributor

nikomatsakis commented Mar 18, 2019

@carllerche

My primary argument is stronger than "In short, we cannot know the future." It is: "There are things we know about now that are worth exploring, but let's not hold up stabilization to do so."

OK. Thanks for that. I have attempted to rework the introduction in a way that I think captures what you wrote here:

Pro: Clearly, futures need to be given a Waker, but frameworks (like tokio) often want to give the future access to other parts of the runtime (e.g., to spawn new tasks), and we may find other bits of data we want in the future. There is a bit of a catch-22: we won’t see widespread adoption until we stabilize, but if we stabilize we lose the ability to add more parameters. Therefore, we should add a flexible Context struct that we can extend later.

If that's not quite right, please supply an edit that works better. =)
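To make the "extend later" point from the summary concrete: the wrapper is forward-compatible because construction and access go through functions rather than public fields, so private fields can be added in a later release without breaking any caller of poll. A minimal sketch, simplified from the PR, with Waker as a stand-in for std::task::Waker:

```rust
// Stand-in for std::task::Waker, to keep the sketch self-contained.
pub struct Waker;

pub struct Context<'a> {
    waker: &'a Waker,
    // A later release could add private fields here (task-locals,
    // extensions, ...) without changing the signature of poll
    // or breaking existing executors and combinators.
}

impl<'a> Context<'a> {
    // All construction funnels through here, so new fields can be defaulted.
    pub fn from_waker(waker: &'a Waker) -> Context<'a> {
        Context { waker }
    }

    pub fn waker(&self) -> &'a Waker {
        self.waker
    }
}
```

The con side of the summary is the flip side of the same sketch: if nothing is ever added, `cx.waker()` is pure indirection.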

@carllerche

Member

carllerche commented Mar 19, 2019

@jethrogb I would like to be able to get mutable access.

@carllerche

Member

carllerche commented Mar 19, 2019

@nikomatsakis

I also would like to point out that I have a real use case that could benefit from avoiding thread-local access. Tokio Trace is an instrumentation API for Tokio and Tokio libraries that requires access to task-local data. Instrumentation will require accessing this data; currently it is stored in a thread-local variable.

The number of instrumentation points that can be added is performance-dependent: the faster instrumentation is, the more instrumentation can be added and the greater the visibility into applications.

I would expect thread-local accesses to increase by multiple orders of magnitude. Given that Tokio Trace still isn't fully shipped and there is no production experience with it, it is hard to say how much avoiding thread-locals will help, but I would be surprised if the impact is not significant.

This is just to point out that the performance implications of this PR should not be judged only on today's usage of futures. I am calling out Tokio Trace as an example of a usage pattern that could benefit noticeably from additions to the futures task system.

@jonhoo

Contributor

jonhoo commented Mar 19, 2019

@nikomatsakis a minor point, but you can pretty easily get a Waker from an &mut Context though?

@nikomatsakis

Contributor

nikomatsakis commented Mar 19, 2019

@carllerche

That's helpful, thanks! Even though Tokio Trace isn't fully shipped, I wonder if it's possible to get some preliminary profiling data? I'm not sure what state it's in. (I also realize your point is that you might add more instrumentation if the performance would let you get away with it.) Sounds like a cool tool.


@jonhoo

a minor point, but you can pretty easily get a Waker from an &mut Context though?

Indeed! And if the only thing you have to thread through to do work is the Waker, that is probably fine, and we are left only with the clarity/ergonomic hit of writing cx.waker().

The point of this part of the doc, though, is that likely one would expect to pass around the cx as a whole. After all, the idea here is to "pave the way" for having other bits of runtime data that may be needed by futures. So if you are going to delegate to another future or invoke some other fn, it may very well take a cx and not just an &Waker. In that case, it's not good enough to just store the Waker in your struct, you really want to store the whole Context.
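The delegation pattern described here can be shown with the std::task types as they eventually shipped. A combinator must thread the whole cx down to the inner future, which is why storing only a Waker in a struct would not be enough. The Map combinator and noop_waker below are illustrative names written for this sketch, not items from the futures crate:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Illustrative combinator: applies a function to the inner future's output.
struct Map<F, G> {
    inner: F,
    f: Option<G>,
}

impl<F, G, T> Future for Map<F, G>
where
    F: Future + Unpin,
    G: FnOnce(F::Output) -> T + Unpin,
{
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut(); // fine: Map is Unpin because F and G are
        // Delegation: the whole Context is handed to the inner future, so
        // anything a runtime later adds to Context flows through for free.
        match Pin::new(&mut this.inner).poll(cx) {
            Poll::Ready(v) => {
                let f = this.f.take().expect("polled after completion");
                Poll::Ready(f(v))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

// A no-op Waker so the combinator can be polled by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```

Note that Map never calls cx.waker() itself; it only forwards cx, which is the behavior this part of the doc is pointing at.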

@withoutboats

Contributor

withoutboats commented Mar 20, 2019

Thanks so much for the summary @nikomatsakis, and for getting involved in moderating this discussion at all.

I'm currently very busy with other work and I haven't been able to participate in this discussion as much as I might have liked, but I wanted to just write down my current thoughts even if I probably won't be able to engage further very much over the next week and a half.

For me, the important downside of adding Context is not that it makes the API less ergonomic so much as that it introduces complexity that makes the API harder to understand. What I like about the current API is that it centers the relationship between polling and waking, and we've reduced the number of additional types (like LocalWaker) to a minimum, making it easier to explain an overview of our futures model to users. Context would hurt this by requiring users to navigate through it to understand waking, and that would be quite unfortunate if we never added anything to Context and it remained unnecessary indirection.

I'm also concerned about what seems to me like an arbitrary decision around the ownership of Context that's being made right now, which the conversation between niko and carl has highlighted. Because we have no concrete proposal for what will be added to Context someday, we really have no basis for deciding by which ownership mode Context should be passed, but we need to make that decision in order to stabilize.

I have to admit I feel frustrated by the sense of pressure I've felt to approve an API change based on what feels like vaporware proposals. It is not the case that this API change is some strictly neutral forward compatibility. It has some drawbacks, and absent a clearly stated motivation those weigh more heavily to me.

@carllerche

Member

carllerche commented Mar 20, 2019

@withoutboats

Because we have no concrete proposal for what will be added to context someday,

I briefly talked with @cramertj on Discord re: an API that I was interested in exploring. His response was (paraphrasing): "sounds interesting, let's punt until after stabilization". This is a reasonable response. I highlighted ways in which the Future trait needed to be future-proofed to support the potential API addition, and here we are.

If the concern is that there is no concrete proposal, then I can try to work on something, though it would be much easier to evaluate the merits once the ecosystem has moved vs. now. This would take time.

based on what feels like vaporware proposals

Dismissing the thoughts as vaporware, instead of engaging with them on their merits, does not add to the conversation.

I am assuming you are referring to this as the vaporware?

@jonhoo

Contributor

jonhoo commented Mar 20, 2019

@nikomatsakis thank you for the write-up, I think that has been useful! As far as I can tell, the very core of the issue is at the intersection of:

  • Do we think we will want to expose more resources to poll in the future?
  • If we do, are we okay with those resources being passed through thread-local storage?

Since that is the case, I think it is useful to trace the lineage of this particular set of decisions so that we don't re-do conversations that have been had before. We initially added an explicit Context argument in rust-lang-nursery/futures-rfcs#2, and that RFC gives the rationale for why. As far as I can see, that never changed in 0.2. In rust-lang/rfcs#2418, we also still had Context. Then rust-lang/rfcs#2592 came along, and it replaced Context with Waker. It re-iterated some of the arguments for why thread-local storage was undesirable but, as far as I can see, did not motivate the change from Context to Waker. Instead, that change happened in #54339 (pointed out here), and the discussion seems to have happened in rustasync/team#7 and rustasync/team#56 (more so the latter than the former). However, those discussions never actually reached any consensus. Instead, the change was made with the justification:

The point of having the current std API is to experiment with the intended design-- any design (even the current one) needs to go through a real RFC before stabilization. I think the conversation in rustasync/team#56 has turned up enough motivating factors that we should give this approach a shot. If it doesn't work out or the RFC process resolves to a different design, then we can change it. If you think there's a solid reason we shouldn't do this (to the point that it's not worth trying) please leave a comment on rustasync/team#56 so we can take that into account!

So, all that is to say: I don't think a formal justification was ever given for the change from Context to Waker. And all the discussion and opposition to that change suggests to me that we are not at all sure that a Waker is sufficient. This, I think, is what @carllerche is getting at: we have so many reasons to be hesitant about Waker being all we'll ever need, and the whole point of the change to Context in the first place was that thread-locals were deemed insufficient. That is why we should go with Context, not Waker.

I think this discussion therefore comes down to this: are there concrete examples or strong arguments that counter the original reasoning for adding Context in the first place? If there are, those should be a part of the RFC. Currently, the RFC does not motivate why an explicit argument is added to poll (which we can find in rust-lang-nursery/futures-rfcs#2), and it does not motivate why that argument was changed from Context to Waker compared to futures 0.2 (#54339). I think the RFC must include both of those points, and if a strong argument for the latter cannot be made, then I think that speaks to Context being the right choice.


If I have missed links to important historical arguments here, I'd appreciate if someone could point them out.

@withoutboats

Contributor

withoutboats commented Mar 20, 2019

I am assuming you are referring to this as the vaporware?

Absolutely not, tokio-trace seems like a great project. The "API you are interested in exploring" is what I'm referring to.

@cramertj

Member Author

cramertj commented Mar 20, 2019

@jonhoo

If I have missed links to important historical arguments here, I'd appreciate if someone could point them out.

For my part, I don't know of any historical arguments to this side. When I made the PR you referenced, I removed Context because its only remaining field was &Waker (that PR's primary purpose was to remove the other field), so it seemed fine to just replace it with &Waker. I had not taken any future-proofing of the sort that is being discussed here into account when I did this.

@jonhoo

Contributor

jonhoo commented Mar 20, 2019

@cramertj ah, sorry, I didn't intend to imply that you made the decision as a "we shouldn't have Context", I was just trying to trace the lineage of how we arrived at where we are today, and that seems like the exact point where that happened :)

@cramertj

Member Author

cramertj commented Mar 20, 2019

I was just trying to trace the lineage of how we arrived at where we are today, and that seems like the exact point where that happened :)

oh no worries! you're absolutely correct and that's how I interpreted your comment.

@carllerche

Member

carllerche commented Mar 20, 2019

I mentioned the API I was thinking of in discord. Roughly, it would be a way to get access to extension types:

impl<'a> Context<'a> {
    fn get_ext_mut<T: 'static>(&mut self) -> Option<&mut T> { ... }
}

To implement this, Context could hold an additional, optional trait object used to access extensions:

pub trait Ext {
    unsafe fn get_mut_raw(&mut self, type_id: TypeId) -> Option<*mut ()>;
}

Unlike downcasting, it would allow implementations to potentially match multiple type IDs.

The Tokio runtime could use this to pass along runtime context. For Tokio Trace, this could be the subscriber that points of instrumentation report to.

The instrumentation point would then do something (very roughly):

if let Some(subscriber) = cx.get_ext_mut::<trace::Subscriber>() {
    if subscriber.is_interested_in(event_callsite) { // don't do work if not interested in event
        subscriber.event(...);
    }
}

I would like it to be possible to add a lot of instrumentation points at trace level. Doing so would hit the thread-local a lot. It would be nice to avoid it.

@nikomatsakis

An off-the-cuff micro benchmark shows a significant improvement from avoiding TLS. Of course, I have no idea how big the impact could be with real usage, and I don't think we are close (on the order of weeks) to having a real-world case to experiment with. The real-world gain would obviously not be anywhere close to as large.


Up until now, I specifically avoided details of a proposal because I hoped to punt actually exploring this until much later.

@stjepang

stjepang commented Mar 21, 2019

An off-the-cuff micro benchmark shows a significant improvement from avoiding TLS.

Summarizing the results posted in comments on the linked gist, so far we have:

  • TLS is slower than Context on x86-64 macOS and armv7 linux.
  • TLS is comparable to or faster than Context on x86-64 linux, x86-64 windows, and armv7 linux.

I don't think this is enough data to draw hard conclusions, but it's entirely possible that TLS is no slower than Context on all modern systems except macOS.

The case of macOS might be explained by it having a suboptimal threading implementation in some ways (we also had problems in Tokio with thread yielding on macOS, which is why I find this somewhat unsurprising).

Finally, here's a C++ benchmark showing TLS is essentially as fast as normal variables (I assume this is x86-64 linux).

@lzutao

lzutao commented Mar 21, 2019

Why don't we need black_box in that micro benchmark?

@lzutao

lzutao commented Mar 21, 2019

@stjepang In the C++ benchmark with gcc backend, local var is faster than TLS.

@stjepang

stjepang commented Mar 21, 2019

Why don't we need black_box in that micro benchmark?

Here's with test::black_box.

Code
#[bench]
fn tls(b: &mut test::Bencher) {
    b.iter(|| {
        for _ in 0..ITER {
            FOO.with(|f| {
                let v = f.get() + 1;
                f.set(v);
                test::black_box(v);
            });
        }
    });
}

#[bench]
fn obj(b: &mut test::Bencher) {
    let mut ext = MyExt { num: 0 };
    let mut cx = Context { ext: &mut ext };

    b.iter(|| {
        for _ in 0..ITER {
            let r = cx.get_mut::<usize>().unwrap();
            let v = *r + 1;
            *r = v;
            test::black_box(v);
        }
    });
}
Results (x86-64 linux)

TLS is slightly slower now, but not a huge difference:

test obj ... bench:       2,239 ns/iter (+/- 123)
test tls ... bench:       2,561 ns/iter (+/- 159)

In the C++ benchmark with gcc backend, local var is faster than TLS.

Looks like LLVM compiles the benchmark slightly better in this case. Still, what matters is the fact that the TLS variable is simply stored at address %fs:0xfffffffffffffffc, which is quick to access.

@yoshuawuyts

yoshuawuyts commented Mar 21, 2019

@stjepang's findings also seem to line up with the numbers Google reported on TLS overhead for their fleet-wide tracing infrastructure ("almost negligible"):

section 2.2

When a thread handles a traced control path, Dapper attaches a trace context to thread-local storage. A trace context is a small and easily copyable container of span attributes such as trace and span ids.

section 4.1

The cost of additional span annotations is almost negligible if the span is not sampled for tracing, consisting of a thread-local lookup in the Dapper runtime, averaging about 9 nanoseconds. If it is sampled, annotating the trace with a string literal – much like what’s shown in Figure 4 – costs 40 nanoseconds on average. These measurements were made on a 2.2GHz x86 server.

@lnicola

lnicola commented Mar 21, 2019

Why don't we need black_box in that micro benchmark?

I think we do. Adding

[profile.bench]
codegen-units = 1
incremental = false

I get

test obj ... bench:           0 ns/iter (+/- 0)
test tls ... bench:         233 ns/iter (+/- 8)

@eddyb

eddyb commented Mar 21, 2019

@lnicola Doesn't that disable ThinLTO? What happens if you add lto = true?

@lnicola

lnicola commented Mar 21, 2019

@eddyb With lto = true I get 0 ns for obj and 1 ns for tls, which doesn't say much.

I also tried adding test::black_box on the reads (not sure if that's the proper way to use it):

test obj ... bench:       1,388 ns/iter (+/- 50)
test tls ... bench:       2,304 ns/iter (+/- 94)

regardless of LTO.

But I question the validity of any benchmark without codegen-units = 1. See #11084 (comment) for another instance.

@yoshuawuyts

yoshuawuyts commented Mar 21, 2019

On Linux x86-64 with the black box variant:

codegen

[profile.bench]
codegen-units = 1

test obj ... bench:       2,166 ns/iter (+/- 150)
test tls ... bench:       1,993 ns/iter (+/- 196)

codegen, incremental off

[profile.bench]
codegen-units = 1
incremental = false

test obj ... bench:       2,106 ns/iter (+/- 346)
test tls ... bench:       2,021 ns/iter (+/- 184)

These results seem to be similar to what others have found without the black box / optimizations. I think if we look at the context in which these methods are planned to be used, it's fair to say it's unlikely that choosing one approach over the other will be the cause of a bottleneck in an application.

@BigBigos

BigBigos commented Mar 21, 2019

Please note that the microbenchmark operates solely on the L1 cache of modern CPUs. With 1000 iterations, each iteration takes 1-2ns in most of your results. That's very fast; I don't think it's attainable in practice.

Regarding the TLS penalty, according to Agner [1], loads with a non-zero segment base (as is used for TLS) usually add a single cycle to load-to-use latency and a single prefix byte of code size, but the benchmark you are using is a throughput benchmark (the iterations don't depend on each other), so the latency effect is not visible.

When the data is not cached, this penalty doesn't matter in the least, as the performance will be dominated by the cache miss. If the additional data is not accessed often, the TLS version might miss the L1 and maybe even the other caches. Depending on how the Context version is implemented, it can be a lot faster (if the held data is stored in the same cache line as the Context) or similar (if it is stored in some sort of hash map).

What you are also not testing is how the Ext::get_mut_raw() implementation decides between different TypeIds. Since in the benchmark it is only ever called with TypeId::of::<usize>(), the whole body is probably optimized away. In the real world, TLS is usually resolved by the (dynamic) linker, so the access is simple, but your dynamic get_mut_raw wouldn't have that luxury. I imagine a hash map would be used in such a case, which is probably a lot slower than a TLS access and is also held far away from the hot parts of memory (if it is not accessed often). A less convenient strategy would be to store the data inline (as is done in the microbenchmark), but that would allow at most one user (i.e. the tracing framework) to use Ext.

TL;DR: I believe the microbenchmark doesn't tell us much, and we can't conclude which approach is faster from it.

[1] https://www.agner.org/optimize/

@jonhoo

jonhoo commented Mar 21, 2019

I feel like we're getting overly bogged down in a discussion of TLS performance. While that may be relevant in the sense that if TLS were significantly slower, that'd be a strong argument in favor of Context, I think it detracts from some of the other questions around the "just use TLS for anything else" argument. Specifically, the explicit context RFC argues for explicit context threading over TLS, citing concerns over "Reasoning about code", "Debuggability", and no_std support. In fact, even the current RFC argues that passing implicit context through TLS is undesirable. As far as I can tell, those arguments all apply beyond just Waker, and argue more generally for not providing necessary context through TLS. Thus, we should have reasonable counter-arguments to all of them before we are willing to say "anything beyond Waker should just use TLS".


Also, as an aside, I'd like to respond to this comment from @withoutboats:

I'm also concerned about what seems to me like an arbitrary decision around ownership of Context that's being made right now, and which the conversation between niko and carl has highlighted. Because we have no concrete proposal for what will be added to context someday, we really have no basis to decide by which ownership mode context should be passed, but we need to make that decision in order to stabilize.

I agree with you that it's unfortunate that we have to decide some of these things ahead of time. One could argue that this implies that we're not ready to stabilize Future, but at the same time, we have a catch-22 situation where implementations are waiting to move to the new Future trait precisely because it hasn't stabilized yet. I think we may just have to make a decision as best we can. For what it's worth, the idea of an &mut Context argument isn't new — futures 0.2 had it, as did the first futures RFC — and I don't think this is discussion "just for the sake of discussion". The change from &mut to & was also made in #54339 without consideration of the full implications of that change, and I think the RFC should make a compelling case for why we want & over &mut.


On a more general note, I want the RFC to give a serious and thorough explanation of the alternatives in this space, which I don't think it currently does. And I don't think it's okay for the RFC to go ahead if it cannot make those arguments. This is an important core component of Rust's async story, and saying "it seems to have worked out fine thus far" isn't compelling enough when the trait is still in flux, the ecosystem is still primarily using futures 0.1, and maintainers of important pieces of infrastructure (like tokio) say that there are merits to the alternative designs.

@carllerche

carllerche commented Mar 21, 2019

@yoshuawuyts while related, the use case for Tokio Trace is fairly different from Dapper's. As such, the performance trade-offs must be re-evaluated. For example, it is intended for Tokio Trace to be able to instrument the body of a hot loop.

I did hesitate to provide the microbenchmark in addition to the API illustration; the convo is getting derailed. As @BigBigos outlined, the benchmark setup is poor, and there is more than performance in question.

IMO either we need this PR or we need to do a full investigation of the proposed alternatives. I would rather move forward with stabilizing futures.

@Dylan-DPC

Dylan-DPC commented Apr 1, 2019

ping from triage @cramertj @withoutboats any updates on this?

@cramertj

cramertj commented Apr 1, 2019

@Dylan-DPC I'm waiting on feedback from @rust-lang/libs as to their thoughts on this type of future-proofing, as @nikomatsakis mentioned above.

@Dylan-DPC

Dylan-DPC commented Apr 1, 2019

Thanks, labeled it accordingly :)

@alexcrichton

alexcrichton commented Apr 3, 2019

The libs team got a chance to discuss this today (sorry for the delay!), and we wanted to make sure to thank @nikomatsakis and all involved in creating the summary document, it clearly took a lot of work and was very carefully crafted!

Our conclusion was that the libs team didn't really feel strongly one way or the other on this issue. There are good arguments both ways, and nothing really jumped out as a clear winner. Furthermore, we weren't able to think of any examples, either in libstd or throughout the ecosystem, to draw on for inspiration in attempting to reach a conclusion.

What we did conclude, though, was that the issue most paramount is actually stabilizing these APIs. Futures are clearly going to make a huge splash in Rust, so the sooner we can get it out on stable the better!
