RFC: Existential types with external definition #2492
Ericson2314
added some commits
Jun 29, 2018
Centril
added
the
T-lang
label
Jul 2, 2018
This was referenced Jul 2, 2018
Ericson2314
changed the title
RFC: existential types with external definition
RFC: Existential types with external definition
Jul 2, 2018
|
@crlf0710 Thanks, copied to the OP disassociated from the specific commit. |
Ericson2314
added some commits
Jul 2, 2018
Ericson2314
referenced this pull request
Jul 2, 2018
Closed
Tracking issue for RFC 2070: stable mechanism to specify the behavior of panic! in no-std applications #44489
So if I understand this proposal correctly, it avoids that problem by essentially not generating any code until the instantiation of the existential appears in the crate tree? Every function becomes implicitly generic over whatever instance is picked for whatever existentials are still "open"? That seems like a huge change in the way code is generated, which is why I am surprised to not see it discussed further. If that's not what happens, then I do not understand how you plan to actually implement this proposal.
|
@RalfJung Yes exactly. I didn't discuss it further because that's what generics do, so I figured it wasn't very novel, but I'm happy to add more text. [FWIW, eventually we might be able to do a bound like
|
@RalfJung "how code is generated" is the simplest part, the type-checking side of things is way more involved. It's very similar to a parametrized |
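To make the "implicitly generic" reading above concrete, here is a minimal sketch in today's Rust, not the RFC's syntax. The names `Backend`, `describe`, `SerialPort` and `ChosenBackend` are hypothetical; the explicit type parameter `B` stands in for the still-open existential, and the type alias stands in for the downstream definition.

```rust
// Hypothetical stand-in for a facility such as a panic handler or allocator.
trait Backend {
    fn name(&self) -> &'static str;
}

// Upstream library code: the still-"open" existential is modeled as an
// explicit type parameter `B`. No machine code can be emitted for `describe`
// until some crate picks a concrete `B`.
fn describe<B: Backend>(backend: &B) -> String {
    format!("using backend: {}", backend.name())
}

// Downstream crate: the single concrete definition.
struct SerialPort;

impl Backend for SerialPort {
    fn name(&self) -> &'static str {
        "serial-port"
    }
}

// Assumption: this alias plays the role of the external definition. Once it
// is known, the parameter is substituted and `describe` is monomorphized.
type ChosenBackend = SerialPort;

fn main() {
    let backend: ChosenBackend = SerialPort;
    println!("{}", describe(&backend));
}
```

Under this reading, everything that transitively calls `describe` also stays generic until the definition is seen, which is where the build-time concerns raised later in the thread come from.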
rkruppe
reviewed
Jul 2, 2018
|
I have doubts about two of the motivating examples |
|
|
||
| - [`core::alloc::GlobalAlloc`](https://doc.rust-lang.org/nightly/core/alloc/trait.GlobalAlloc.html), chosen with [`#[global_allocator]`](https://doc.rust-lang.org/1.23.0/unstable-book/language-features/global-allocator.html) | ||
| - `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md) | ||
| - The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html) |
rkruppe
Jul 2, 2018
Member
The OOM hook can be changed repeatedly at run time. I don't know where (if anywhere) this ability is used, but at least it's not obvious that we even can replace the OOM hook with a static singleton. Deciding that requires wading into the details of OOM handling which is probably out of scope for this RFC.
(There is of course the option of building a singleton with mutable state that provides exactly the current API, but if that would be used widely, many of the purported benefits evaporate.)
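A rough sketch of what such a "singleton with mutable state" could look like. Everything here is hypothetical (`SwappableHook`, `OOM_HOOK`, and a hook signature taking a byte count rather than a `Layout`); it only illustrates that the singleton's type is fixed statically while every call still goes through a lock and an indirect call.

```rust
use std::sync::Mutex;

// Hypothetical hook signature; the real hook receives a `Layout`.
type Hook = fn(usize) -> String;

fn default_hook(size: usize) -> String {
    format!("allocation of {size} bytes failed")
}

fn custom_hook(size: usize) -> String {
    format!("custom hook: {size} bytes")
}

// The singleton's type is chosen at build time, but it carries mutable state,
// so calls are still dispatched through a function pointer at run time.
struct SwappableHook {
    current: Mutex<Hook>,
}

impl SwappableHook {
    const fn new() -> Self {
        SwappableHook { current: Mutex::new(default_hook as Hook) }
    }

    // Install a new hook, replacing whatever was there.
    fn set(&self, hook: Hook) {
        *self.current.lock().unwrap() = hook;
    }

    // Remove the current hook, restoring the default, and return it.
    fn take(&self) -> Hook {
        std::mem::replace(&mut *self.current.lock().unwrap(), default_hook as Hook)
    }

    // Copy the pointer out first so the lock is not held during the call.
    fn handle(&self, size: usize) -> String {
        let hook = *self.current.lock().unwrap();
        hook(size)
    }
}

static OOM_HOOK: SwappableHook = SwappableHook::new();

fn main() {
    OOM_HOOK.set(custom_hook);
    println!("{}", OOM_HOOK.handle(64));
    let _previous = OOM_HOOK.take();
    println!("{}", OOM_HOOK.handle(64));
}
```

If most users ended up with a wrapper like this, the static-dispatch benefit would mostly disappear, which is the point being made above.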
Ericson2314
Jul 2, 2018
Author
Contributor
I should link the alloc lang item that exists right now. All this is just machinery over oom_impl, exposed in alloc::alloc::handle_alloc_error.
rkruppe
Jul 2, 2018
Member
I don't care much about existing implementation details here, but about the public (if unstable) API provided. If the public API we end up providing in alloc is a runtime-settable hook instead of something statically dispatched for whatever reasons, then it simply isn't very relevant to this RFC (though this RFC, if accepted, would be one way to implement that hook). I am sympathetic to wanting static dispatch by default and letting those who need runtime-varying behavior implement a hook themselves, but again, the details of how OOM handling ought to be done are controversial and out of scope for this RFC, so IMO the RFC text is being over-eager by saying this feature would obsolete the hook in its current form.
Ericson2314
Jul 5, 2018
Author
Contributor
N.B. per rust-lang/rust#51607 (comment) we might be changing to a static hook anyways.
| - `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md) | ||
| - The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html) | ||
| - [`std::collections::hash_map::RandomState`](https://doc.rust-lang.org/std/collections/hash_map/struct.RandomState.html), if https://github.com/rust-lang/rust/pull/51846 is merged, the `hashmap_random_keys` lang item | ||
| - [`log::Log`](https://docs.rs/log/0.4.3/log/trait.Log.html) set with [`log::set_logger`](https://docs.rs/log/0.4.3/log/fn.set_logger.html) |
rkruppe
Jul 2, 2018
Member
Logging can require resources such as files, network connections, run-time configuration data, etc. so it's difficult to make some logger implementations into statics. In theory everything could be initialized lazily on first logging call (though then you haven't eliminated the overhead of dynamic dispatch!), but this is mostly just shifting the problem (and run-time costs) around and in more complex scenarios -- e.g. when you want to read a configuration file to determine what kind of logging to do -- this can require making much more state global than currently necessary. It also affects all other control flow surrounding logger initialization, e.g. error reporting from being unable to open a file for logging now has to be moved into the lazy-initialization code.
Ericson2314
Jul 2, 2018
Author
Contributor
Not every case needs initialization. An example is serial port logging for embedded.
For the logging case there are these options:
- Just as local allocation exists, there should be `*_in` variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.
- The case where one really wants a single xxx, such that a static is nice because any passed pointer / value is just overhead, but also wants to be sure the xxx is initialized first, is heavily explored by @japaric and the rest of the working group. In general, something still needs to be passed around, but it can just be a ZST "token" indicating initialization is complete. So this is just a riff off the above. My stuff still helps if you want to be monomorphic over the token type.
- Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like `lazy_static` + my stuff can make this more ergonomic.
- Lazy init as you mention. Actually the `lazy_static` crate + my stuff makes this decently ergonomic.
So overall I think mine is still valuable for not needing to cfg around the no-std case, by offering a better separation of concerns between functionality and its "singletonness", while inciting one to make the singleton aspect optional.
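As an illustration of the ZST "token" option, here is a small sketch in plain Rust. The module `logger`, the type `Ready` and the in-memory `SINK` are all hypothetical stand-ins (a real embedded logger would write to a peripheral); the point is only that the proof of initialization costs nothing to pass around.

```rust
mod logger {
    use std::sync::Mutex;

    // Hypothetical global sink standing in for a serial port or log file.
    static SINK: Mutex<Vec<String>> = Mutex::new(Vec::new());

    // Zero-sized proof that `init` has run. The private field means code
    // outside this module cannot construct it without calling `init`.
    #[derive(Clone, Copy)]
    pub struct Ready(());

    // Perform the one-time setup and hand back the proof token.
    pub fn init() -> Ready {
        SINK.lock().unwrap().push("logger initialized".to_string());
        Ready(())
    }

    // Logging requires the token, so it cannot happen before `init`.
    pub fn log(_proof: Ready, message: &str) {
        SINK.lock().unwrap().push(message.to_string());
    }

    // Snapshot of everything logged so far (for demonstration only).
    pub fn lines() -> Vec<String> {
        SINK.lock().unwrap().clone()
    }
}

fn main() {
    let ready = logger::init();
    logger::log(ready, "hello");
    // The token occupies no space, so passing it is free at run time.
    println!("token size: {}", std::mem::size_of::<logger::Ready>());
    println!("{:?}", logger::lines());
}
```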
rkruppe
Jul 2, 2018
Member
Not every case needs initialization. An example is serial port logging for embedded.
Yes, obviously.
Just as local allocation exists, there should be *_in variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.
While in theory I commend getting rid of global state, the fact is that such an API for logging is entirely hypothetical while the "global logger" API is pervasive in the ecosystem. If it's a huge pain to replace the current global singleton with a different mechanism for installing the same global singleton, everyone who (for whatever reasons) uses the global singleton will rightly prefer the current way of installing it over what this RFC presents.
Besides, this argument cuts both ways: all the code that avoids global singletons doesn't need this RFC either.
Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like lazy_static + my stuff can make this more ergonomic.
As with the OOM hook, the question is how commonly this would be done. If it's quite common, the benefit of external existential traits for this use case is diminished.
Lazy init as you mention. Actually lazy_static crate + my stuff makes this decently ergonomic.
lazy_init is nice but it doesn't help you get the data needed for initialization (e.g., user configuration) into the block doing the initialization, as it can only reference other globals, not e.g. a local binding created in main. That's what I meant by requiring more state to be made global.
None of this is a fatal objection to using external existential types for logging. I am just saying that for some quite common use cases, it won't have all the claimed benefits and indeed some downsides as well.
Ericson2314
Jul 2, 2018
Author
Contributor
What downsides? The worst case of manual initialization forced by effects and user config works exactly like today.
rkruppe
Jul 2, 2018
Member
The downside is that any logger implementations that need it have to separately go through the work of (and accept the risk of bugs in) emulating today's behavior. (Plus the churn of changing APIs.)
Ericson2314
Jul 2, 2018
Author
Contributor
@rkruppe liblog can come with a ManuallyInitializedLogger<L> that does all that for you.
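A sketch of how such a wrapper might look, assuming `std::sync::OnceLock` for the one-time initialization. This is not an existing `log` API: the `Log` trait below is a simplified hypothetical stand-in (it returns the formatted line instead of writing it, to keep the example checkable), and `PrefixLogger` is made up.

```rust
use std::sync::OnceLock;

// Hypothetical, simplified stand-in for a logging trait.
trait Log: Send + Sync {
    fn log(&self, message: &str) -> String;
}

// Statically typed singleton slot that is filled in manually at run time.
struct ManuallyInitializedLogger<L> {
    inner: OnceLock<L>,
}

impl<L> ManuallyInitializedLogger<L> {
    const fn new() -> Self {
        Self { inner: OnceLock::new() }
    }
}

impl<L: Log> ManuallyInitializedLogger<L> {
    // Install the logger; hands it back as `Err` if one was already set.
    fn init(&self, logger: L) -> Result<(), L> {
        self.inner.set(logger)
    }

    // Returns `None` when called before initialization.
    fn log(&self, message: &str) -> Option<String> {
        self.inner.get().map(|logger| logger.log(message))
    }
}

// Example logger whose state comes from run-time configuration.
struct PrefixLogger {
    prefix: String,
}

impl Log for PrefixLogger {
    fn log(&self, message: &str) -> String {
        format!("[{}] {}", self.prefix, message)
    }
}

static LOGGER: ManuallyInitializedLogger<PrefixLogger> = ManuallyInitializedLogger::new();

fn main() {
    // A local binding in `main`, e.g. read from a configuration file.
    let prefix = String::from("app");
    let _ = LOGGER.init(PrefixLogger { prefix });
    println!("{:?}", LOGGER.log("started"));
}
```

Because `init` takes its argument from the caller, a value computed locally in `main` can reach the global, which addresses the limitation of lazy initialization noted earlier in this thread.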
rkruppe
Jul 2, 2018
Member
Hm, true. It's still not clear to me that a significant portion of liblog users would do anything different, but let's leave that discussion at #2492 (comment)
|
I agree with @eddyb that the codegen aspect is relatively simple conceptually and in implementation. However, applying this feature to pervasive concerns such as panicking and memory allocations will make a staggering amount of currently monomorphic (or generic but monomorphized in libraries rather than leaf crates) code generic and only monomorphized in leaf crates, which has practically the same impact as MIR-only rlibs: virtually no LLVM IR or machine code is generated while compiling libraries, most of it will be monomorphized and codegen'd only when reaching the leaf crates. While MIR-only libs are very desirable for a number of reasons (see the linked issue), there are also good reasons why we still don't have that feature: it significantly regresses wall-clock build times. Further experiments in this direction are considered blocked by parallelizing rustc, which to my knowledge is being pushed forward but probably still a far cry from being turned on by default, let alone being effective enough to offset the downsides of MIR-only rlibs. This should be taken into account when estimating how quickly this feature can be landed and applied. |
jethrogb
reviewed
Jul 2, 2018
| Only one crate in the build plan can define the `pub extern existential type`. | ||
| Unlike the trait system, there are no orphan restrictions that ensure crates can always be composed: | ||
| any crate is free to define the `pub extern existential type`, as long is it isn't used with another that also does, in which case the violation will only be caught when building a crate that depends on both (or if one of the crates depends on the other). | ||
| This is not very nice, but exactly like "lang items" and the annotations that exist for this purpose today, |
jethrogb
Jul 2, 2018
Contributor
This sentence and the rest of this paragraph shouldn't be in the "guide-level explanation"
| As mentioned in the introduction, code gen can be reasoned about by comparing with generic and inlining). | ||
| We cannot generate for code for generic items until they are instantiated. | ||
| Likewise, imagine that everything that uses an `pub extern existential type` gets an extra parameter, | ||
| and then when the `impl pub extern existential type` is defined, we go back and eliminate that parameter by substituting the actual definition. |
|
@rkruppe It need not be pushed all the way to the leaf crates. We can have a |
Ericson2314
added some commits
Jul 2, 2018
First off, you've not laid out any monomorphization strategy, but this crucially depends on how and when we monomorphize. If I had to guess, I'd say you are suggesting something like:
There are some problems with that:
"Eventually", hopefully this problem disappears entirely as MIR-only rlibs become feasible in general. The issue is whether we're willing to eat massive regressions in the mean time. |
sfackler
reviewed
Jul 2, 2018
| - Of course, we can always do nothing and just keep the grab bag of ad-hoc solutions we have today, and leave log with just a imperative dynamic solution. | ||
|
|
||
| - We could continue special-casing and white-listing in the compiler the use-cases I give in the motivation, but at least use the same sort of annotation for all of them for consistency. | ||
| But that still requires leaving out `log`, or special casing it for the first time. |
sfackler
Jul 2, 2018
Member
I don't think we'd want to use this for log at all. Runtime configuration of loggers is pervasive.
Ericson2314
Jul 2, 2018
Author
Contributor
Who does that run-time configuration? The libraries logging or the end application?
sfackler
reviewed
Jul 2, 2018
| } | ||
| } | ||
| extern existential type alloc::Heap = JemallocHeap; |
sfackler
Jul 2, 2018
Member
nit: This declaration wouldn't live in the jemalloc crate, but rather the downstream consumer. We want to allow people to pull in jemalloc without using it as the global allocator. (Imagine you want to wrap it in a layer of tracking or whatever)
Ericson2314
Jul 2, 2018
•
Author
Contributor
Yes good point thanks @sfackler. I'll amend that example to show the 3rd crate. I suppose that is like today, too.
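For comparison, this is roughly how the same split works today with `#[global_allocator]`: a library provides an allocator type (or a wrapper around one), and only the final application installs it. In this sketch `Tracking` is a hypothetical wrapper and `System` stands in for jemalloc.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Library side: a reusable tracking layer, generic over the wrapped allocator.
// Pulling in this type does not make it the global allocator.
struct Tracking<A> {
    inner: A,
    // Total bytes ever requested; only grows, never decremented on free.
    total_allocated: AtomicUsize,
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Tracking<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        if !ptr.is_null() {
            self.total_allocated.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) }
    }
}

// Application side: the downstream consumer decides what is global.
#[global_allocator]
static GLOBAL: Tracking<System> = Tracking {
    inner: System,
    total_allocated: AtomicUsize::new(0),
};

fn main() {
    let before = GLOBAL.total_allocated.load(Ordering::Relaxed);
    let data = vec![0u8; 4096];
    // Keep the optimizer from eliding the allocation.
    std::hint::black_box(&data);
    let after = GLOBAL.total_allocated.load(Ordering::Relaxed);
    println!("bytes requested while building the Vec: {}", after - before);
}
```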
|
So another need for this feature might have popped up in Basically, Without a solution to this problem, we cannot really use SIMD intrinsics effectively in Similar issues happen with |
gnzlbg
referenced this pull request
Sep 20, 2018
Closed
Expose is_x86_feature_detected to libcore #464
gnzlbg
referenced this pull request
Oct 8, 2018
Open
Tracking issue for RFC 2351, "Add `is_sorted` to the standard library" #53485
Centril
added
A-traits
A-typesystem
A-impl-trait
A-syntax
labels
Nov 22, 2018
Ericson2314
referenced this pull request
Jan 6, 2019
Open
Use the parking_lot locking primitives #56410
graydon
commented
Jan 12, 2019
|
Oppose. This is too intrusive relative to the tolerable costs of the existing not-great system. (Fwiw we did start with a parameterized module system, but that was forever-ago and there are already enough abstraction systems to get by with. The cost of adding back another here is too high.) |
Can you please suggest which existing abstraction system solves the specific issues mentioned in the motivation section of this RFC? |
graydon
commented
Jan 12, 2019
Trait objects and lang items. Extensively discussed in the doc and thread here. They work, I gather you don't think they work well enough, I'm saying I think they do and probably the replacement will work badly in different ways (as many in this thread have suggested), as well as incurring implementation and cognitive costs. |
|
Not saying that I necessarily agree with this proposal (haven't thought about it in a while), however, lang items are not free either and have their own cognitive and implementation costs. In fact I'd say they have a larger cost when you add all the lang items together that would otherwise be dealt with by this proposal. Since you are fond of limits, it seems to me that the lang items approach is a bad way to manage limits and allows for growth that is harder to understand and make deliberate choices about from an overview.
graydon
commented
Jan 12, 2019
|
I think you have that backwards. Adding a general facility for any user to declare this sort of thing will certainly make any concerns you have about their growth and proliferation worse. |
|
@graydon so the issue here is nastiness of the intersection of state and modularity. If everything was really up to me, I might embrace capability theory and the principle of least authority, and ban these global singletons (global allocator, global logger) and the ambient authority they grant. But given that most people don't want to add that many more (type) parameters on everything, and given that at the very bottom of the software stack the physical world rears its ugly head (singletons in the form of control registers, interrupt tables, peripherals, and other unique resources), I don't see another choice.
Let me step way back and refer to two works of writing of yours. I've read your https://graydon2.dreamwidth.org/263429.html and your concern about endless complexity in so many places. I would agree complexity and growth is very bad, in fact I would say it is the single worst technical problem with the software ecosystem/industry! I think the key is always to abstract, abstract, abstract. Divide things into as fine "layers" or "components" as possible, and compose them with rigor. In the case of languages this means core languages and syntactic sugar. Thankfully Rust has MIR (and also Chalk). A system of IRs and embeddings breaks down complexity and ensures a global coherence of concepts. That said, this proposal does not fit into either of these core languages. That is indeed cause for concern! I can only hope to assuage you that I very much intend this to fit into a larger story of module-language research, not unlike https://graydon2.dreamwidth.org/253769.html . Declaring abstract types with abstract scope up front is a pretty straightforward sugar over Λs (functors) in an https://people.mpi-sws.org/~rossberg/f-ing/-like manner. 1ML, which you cited, is a wonderful example of when it's nice to skip the sugar and go straight for the core forms (and also unify concepts and then some). I'd love to see a combination of that with the phase separation which our "const eval" provides to allow "cost-free" module combination without a proliferation of superfluous syntax. (I'd consider that combination immanent / low-risk research.) The odd parts of this are the fact that "functor crates" can't be applied multiple times, and crates include state in statics. This makes for a crudely substructural module system. The nicer version of that, again to address the inherent state of the nasty problems this tackles, I'd characterize as adventurous/uncertain yet-to-be-done research, but still something I'd expect to be worked out by the time Rust would seek to generalize this feature.
I've been musing on this line of things for a while with Rust. Get a chuckle out of me starting to write https://github.com/Ericson2314/rust-rfcs/tree/modules in late 2014 as if anything so radical could be done 6 months before 1.0! I'm trying to find something this time that 1. is very small and incremental while not being myopic, 2. "feels similar" to existing features like impl trait while chipping away at the larger modularity problem, and 3. solves an immanent problem rather than just working towards something larger, given the reduced scope of the change.
|
Let me back up @Centril more on the danger of lang items. Each lang item can seem deceptively simple: a little cordoned-off trick for a specific problem, modest in the way it presents itself and the problems it seeks to tackle. But take all the lang items together and now there's a laundry list of one-off hacks. There's nothing at all to unify them, and the way they chip off problems 1 by 1 prevents problems from piling up to the point a larger solution can be discovered. [I'm hoping the various lang items and annotations that already exist to solve these sorts of problems can be desugared into this approach.] Fundamentally, I'm wary of people confusing the sheer amount of information with the difficulty of learning said information. [Lots of rhetoric these days ends up framing simplicity/complexity exclusively in terms of the latter.] I maintain that what's easy to learn is heavily biased by what exists, and pedagogical methods can improve. Learning difficulty is immanent to some specific place/time and can be improved after the fact, but total information is irreducible after the fact. Lang items as a solution reek of just optimizing for the second.
graydon
commented
Jan 12, 2019
|
@Ericson2314 I appreciate the thoughtful reply but I think the ship has sailed. The cognitive space that would be in the language for parametric modules wound up assigned to the trait system. Adding in parametric modules at this point in addition is overkill. As is adding "mini-parametric-modules", which you're doing here. IOW I think I would be more sympathetic to approaching this by trying to define |
burdges
commented
Jan 13, 2019
|
I've no opinions about the As an aside, adversarial crates can break |
To be clear, you're talking about the case where there's a default, an unrelated-looking crate provides an implementation (overriding that default), and there are no other crates that also provide an implementation? Or are you thinking of something else? |
The restrictions this imposes (when it is viewed as sugar over full parametric modules and crates) makes it cover less territory already covered by traits. If/when we expand this to a full ML-style module system, there's no reason we cannot also combine that with the trait system too. (Traits would be canonicalized modules/functors.)
I just don't believe this. Traditional batch compilation is a really really simplistic system of incremental compilation. Nothing in the language today or with this PR prevents fancier notions of incremental compilation, which the compiler team is already working on.
|
The story around defaults needs to be cleared up. You will always need a definition of these existential types, as most of the time the user doesn't care and wants “sane” defaults. The RFC should thoroughly specify how defaults work. Here are some defaults we'd want:
|
This would be a big advantage of tracking the declarations and definitions in Cargo.toml: put the "default provider" there too. I think Cargo.toml referring to a potentially-unused crate is more natural, since it is also a build plan solving concern, and since rustc can be nicely blind to the default if it ends up not being used (Cargo wouldn't mention it.)
crlf0710
commented
Jan 15, 2019
|
@jethrogb I think a third option is to provide some mechanism for downstream crates to declare a candidate who can satisfy some upstream extern existential types. If in the final build plan there's only one such declaration, it's chosen. If there's more than one, and user didn't explicitly make their choice, an error is raised. |
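The selection rule described here can be sketched as a small function. This is only an illustration: `resolve` and `ResolveError` are hypothetical names, candidates are represented by crate-name strings, and the handling of "no candidate at all" and "explicit choice that matches no candidate" is an assumption, since the comment does not cover those cases.

```rust
#[derive(Debug, PartialEq)]
enum ResolveError {
    // Assumption: no candidate in the build plan is an error.
    Missing,
    // More than one candidate and no explicit choice by the user.
    Ambiguous(Vec<String>),
    // Assumption: an explicit choice must name one of the candidates.
    UnknownChoice(String),
}

// Pick the definition for one extern existential type from the candidates
// declared by downstream crates, honoring an explicit user choice if given.
fn resolve<'a>(candidates: &[&'a str], explicit: Option<&str>) -> Result<&'a str, ResolveError> {
    if let Some(choice) = explicit {
        return candidates
            .iter()
            .copied()
            .find(|candidate| *candidate == choice)
            .ok_or_else(|| ResolveError::UnknownChoice(choice.to_string()));
    }
    match candidates {
        [] => Err(ResolveError::Missing),
        [only] => Ok(*only),
        many => Err(ResolveError::Ambiguous(
            many.iter().map(|candidate| candidate.to_string()).collect(),
        )),
    }
}

fn main() {
    println!("{:?}", resolve(&["jemalloc"], None));
    println!("{:?}", resolve(&["jemalloc", "system"], None));
    println!("{:?}", resolve(&["jemalloc", "system"], Some("system")));
}
```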
Right now, defaults are handled "magically" by rustc, and this RFC wouldn't change that. For the external existential types used in the standard library, rustc can continue to use magic at its own discretion, while for those used by external crates, with this RFC either the final binary contains a single definition, or the build fails. Allowing users to "somehow" specify defaults sounds like an orthogonal problem to me and I am not sure whether that's a problem worth solving at the language level. Maybe we could solve that at the cargo level, e.g., if the build fails because an existential type definition is missing, cargo could automatically add a meaningful crate for you to the dependency graph or something like that, but I don't think we need to solve this problem here. My thoughts on the "parametric modules" issue are that what this feature provides is a "parametric root-module / program", requiring whoever monomorphizes / builds the program (e.g. EDIT: so basically what I was trying to say here is that I think this feature and parametric modules are kind of orthogonal. For all we know, we might end up settling on the same way to pass / use parameters of the root-module that's specified in this RFC.
|
@gnzlbg well put! It's indeed useful that we can punt on defaults while it's just the standard library that uses this. |
cbeck88
commented
Jan 28, 2019
•
|
Hi, I just want to say that as a user of Rust I'm really excited for this RFC and Right now Rust doesn't give a very good way to do what I'm going to call This RFC will help to fix this gap in Rust, and the solution will be more sane If you don't have something like this RFC, then the alternatives for writing 1.) Do runtime dependency injection instead Option 1 means roughly this:
This is error prone -- someone has to make sure that each target actually calls Additionally, there are tricky systems problems here with mutable global state, Additionally, using runtime dependency injection instead of link-time dependency Option 2 means roughly:
This is really difficult in practice because any other crate that depends on A Option 2 is basically a non-solution. If I'm developing Crate A as an open-source Option 3 means roughly: We mimic one conventional way of doing this in C and C++. The dependency injection In our rust sources, we probably just have to decorate our functions with The main drawbacks of this compared to existential types as I see it are:
|
|
Now that Rust 2018 is out, #2480 is picking up steam again, the switch to hashbrown and parking_lot are in progress, and jamesmunns#1 it's an interesting time for facade related things and hopefully less busy core teams. Could we reach a decision on this soon then? |
Ericson2314 commented Jul 2, 2018
•
edited
Rendered
An extension of #2071's `existential type` where the definition can live in a different crate than the declaration, rather than the same module. This is a crucial tool for untangling dependencies within `std` and other libraries at the root of the ecosystem concerning global resources.
In particular, I hope if we do this before #2480, we can stabilize an `alloc` crate whose notions of various global resources do not require copious amounts of special attention from the compiler or any run-time cost. I've purposefully picked a more light-weight solution to achieve that goal while minimally delaying `alloc`'s stabilization and deviating from existing/planned Rust features.