RFC: Existential types with external definition #2492

Open
wants to merge 12 commits into
base: master
from

@Ericson2314
Contributor

commented Jul 2, 2018

Rendered

An extension of #2071's existential type where the definition can live in a different crate than the declaration, rather than the same module. This is a crucial tool for untangling dependencies within std and other libraries at the root of the ecosystem concerning global resources.

In particular, I hope that if we do this before #2480, we can stabilize an alloc crate whose notions of various global resources do not require copious amounts of special attention from the compiler or any run-time cost. I've purposefully picked a more light-weight solution to achieve that goal while minimally delaying alloc's stabilization and deviating from existing/planned Rust features.

@Centril Centril added the T-lang label Jul 2, 2018

@Ericson2314

Contributor Author

commented Jul 2, 2018

@crlf0710 Thanks, copied to the OP disassociated from the specific commit.

@RalfJung

Member

commented Jul 2, 2018

In the annotation case, there's essentially an extra extern { fn special_name(..); } whose definition the annotation generates. This isn't easily inlined outside of LTO, and even then would prohibit rustc's own optimizations going into effect.

So if I understand this proposal correctly, it avoids that problem by essentially not generating any code until the instantiation of the existential appears in the crate tree? Every function becomes implicitly generic over whatever instance is picked for whatever existentials are still "open"? That seems like a huge change in the way code is generated, which is why I am surprised to not see it discussed further.

If that's not what happens, then I do not understand how you plan to actually implement this proposal.

@Ericson2314

Contributor Author

commented Jul 2, 2018

@RalfJung Yes exactly. I didn't discuss it further because that's what generics do, so I figured it wasn't very novel, but I'm happy to add more text.

[FWIW, eventually we might be able to do a bound like Size<size = 0, align = 0> for extern existential ZST proxy types like Heap, in which case we can generate code again. But honestly, caching e.g. std + jemalloc separately from every project that uses it should be good enough.]
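The "implicitly generic" model confirmed here can be emulated in today's stable Rust with an ordinary type parameter. The following is only a sketch under that assumption: `Heap`, `SystemHeap`, `reserve_many`, and `leaf_total` are hypothetical names, and the RFC's `extern existential type` syntax is deliberately not used because no compiler accepts it.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the trait an extern existential would be bounded by.
pub trait Heap {
    /// Reports how many bytes a request of `size` bytes would reserve.
    fn reserve(size: usize) -> usize;
}

/// A zero-sized proxy type: it carries no data, only a choice of implementation.
pub struct SystemHeap;

impl Heap for SystemHeap {
    fn reserve(size: usize) -> usize {
        // Toy policy: round up to a multiple of 8.
        (size + 7) / 8 * 8
    }
}

/// "Library" code. While the existential is still open, this function is
/// effectively generic over `H`, so no machine code can be emitted for it.
pub fn reserve_many<H: Heap>(sizes: &[usize]) -> usize {
    sizes.iter().map(|&s| H::reserve(s)).sum()
}

/// "Leaf crate" code. Choosing `SystemHeap` closes the parameter, and only
/// at this point does monomorphization (and therefore codegen) happen.
pub fn leaf_total(sizes: &[usize]) -> usize {
    reserve_many::<SystemHeap>(sizes)
}

/// The proxy type occupies no space, which is what the `Size<size = 0, ..>`
/// idea above would rely on.
pub fn proxy_is_zero_sized() -> bool {
    size_of::<SystemHeap>() == 0
}
```

In this emulation the parameter has to be threaded through by hand; the proposal would have the compiler do that implicitly for every item that mentions the open existential.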

@eddyb

Member

commented Jul 2, 2018

@RalfJung "how code is generated" is the simplest part, the type-checking side of things is way more involved. It's very similar to a parametrized mod, with everything inside inheriting its parameters.

@rkruppe
Member

left a comment

I have doubts about two of the motivating examples


- [`core::alloc::GlobalAlloc`](https://doc.rust-lang.org/nightly/core/alloc/trait.GlobalAlloc.html), chosen with [`#[global_allocator]`](https://doc.rust-lang.org/1.23.0/unstable-book/language-features/global-allocator.html)
- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md)
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html)

@rkruppe

rkruppe Jul 2, 2018

Member

The OOM hook can be changed repeatedly at run time. I don't know where (if anywhere) this ability is used, but at least it's not obvious that we even can replace the OOM hook with a static singleton. Deciding that requires wading into the details of OOM handling which is probably out of scope for this RFC.

(There is of course the option of building a singleton with mutable state that provides exactly the current API, but if that would be used widely, many of the purported benefits evaporate.)
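A rough sketch of the "singleton with mutable state" option mentioned in the parenthetical. Everything here is hypothetical: the real hook takes a `Layout` and does not return, whereas this simplified `Hook` type takes a size and returns a label so the behavior can be observed.

```rust
use std::sync::Mutex;

/// Simplified hook signature (assumption: the real one takes a `Layout`).
pub type Hook = fn(usize) -> &'static str;

fn default_hook(_size: usize) -> &'static str {
    "default"
}

/// Example replacement hook.
pub fn custom_hook(_size: usize) -> &'static str {
    "custom"
}

/// Hypothetical singleton with interior mutable state. Even if the singleton
/// itself were selected statically, every call still goes through a lock and
/// a function pointer.
pub struct MutableOomHook {
    current: Mutex<Hook>,
}

impl MutableOomHook {
    pub const fn new() -> Self {
        MutableOomHook { current: Mutex::new(default_hook as Hook) }
    }

    /// Mirrors `set_alloc_error_hook`: replace the hook at run time.
    pub fn set(&self, hook: Hook) {
        *self.current.lock().unwrap() = hook;
    }

    /// Mirrors `take_alloc_error_hook`: restore the default, return the old hook.
    pub fn take(&self) -> Hook {
        std::mem::replace(&mut *self.current.lock().unwrap(), default_hook as Hook)
    }

    /// Dispatches dynamically through whatever hook is currently installed.
    pub fn handle(&self, size: usize) -> &'static str {
        let hook = *self.current.lock().unwrap();
        hook(size)
    }
}

pub static OOM_HOOK: MutableOomHook = MutableOomHook::new();
```

This reproduces the set/take API shape, but it also shows why the static-dispatch benefit disappears if such a wrapper is what most users end up installing.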

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

I should link the alloc lang item that exists right now. All this is just machinery over oom_impl, exposed in alloc::alloc::handle_alloc_error.

@rkruppe

rkruppe Jul 2, 2018

Member

I don't care much about existing implementation details here, but about the public (if unstable) API provided. If the public API we end up providing in alloc is a runtime-settable hook instead of something statically dispatched for whatever reasons, then it simply isn't very relevant to this RFC (though this RFC, if accepted, would be one way to implement that hook). I am sympathetic to wanting static dispatch by default and letting those who need runtime-varying behavior implement a hook themselves, but again, the details of how OOM handling ought to be done are controversial and out of scope for this RFC, so IMO the RFC text is being over-eager by saying this feature would obsolete the hook in its current form.

@Ericson2314

Ericson2314 Jul 5, 2018

Author Contributor

N.B. per rust-lang/rust#51607 (comment) we might be changing to a static hook anyways.

- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md)
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html)
- [`std::collections::hash_map::RandomState`](https://doc.rust-lang.org/std/collections/hash_map/struct.RandomState.html), if https://github.com/rust-lang/rust/pull/51846 is merged, the `hashmap_random_keys` lang item
- [`log::Log`](https://docs.rs/log/0.4.3/log/trait.Log.html) set with [`log::set_logger`](https://docs.rs/log/0.4.3/log/fn.set_logger.html)

@rkruppe

rkruppe Jul 2, 2018

Member

Logging can require resources such as files, network connections, run-time configuration data, etc. so it's difficult to make some logger implementations into statics. In theory everything could be initialized lazily on first logging call (though then you haven't eliminated the overhead of dynamic dispatch!), but this is mostly just shifting the problem (and run-time costs) around and in more complex scenarios -- e.g. when you want to read a configuration file to determine what kind of logging to do -- this can require making much more state global than currently necessary. It also affects all other control flow surrounding logger initialization, e.g. error reporting from being unable to open a file for logging now has to be moved into the lazy-initialization code.

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

Not every case needs initialization. An example is serial port logging for embedded.

For the logging case there's these options:

  1. Just as local allocation exists, there should be *_in variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.

  2. The case where one really wants a single xxx, such that a static is nice because any passed pointer / value is just overhead, but also wants to be sure the xxx is initialized first, is heavily explored by @japaric and the rest of the working group. In general, something still needs to be passed around, but it can just be a ZST "token" indicating initialization is complete. So this is just a riff off the above. My stuff still helps if you want to be monomorphic over the token type.

  3. Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like lazy_static + my stuff can make this more ergonomic.

  4. Lazy init as you mention. Actually lazy_static crate + my stuff makes this decently ergonomic.

So overall I think mine is still valuable for not needing to cfg around the no-std case, by offering a better separation of concerns between functionality and its "singletonness", while inciting one to make the singleton aspect optional.
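Options 1 and 2 from the list above might look roughly like this. It is a sketch only: the `Logger` trait, `info_in`, `MemoryLogger`, and the `Initialized` token are hypothetical names and are not part of the `log` crate.

```rust
use std::cell::RefCell;

/// Hypothetical minimal logging interface.
pub trait Logger {
    fn log(&self, message: &str);
}

/// Option 1: an `*_in`-style function takes the logger explicitly, so no
/// global singleton is required.
pub fn info_in<L: Logger>(logger: &L, message: &str) {
    logger.log(message);
}

/// A logger that records messages in memory, used here for demonstration.
pub struct MemoryLogger {
    lines: RefCell<Vec<String>>,
}

impl MemoryLogger {
    pub fn new() -> Self {
        MemoryLogger { lines: RefCell::new(Vec::new()) }
    }

    /// Returns a copy of everything logged so far.
    pub fn lines(&self) -> Vec<String> {
        self.lines.borrow().clone()
    }
}

impl Logger for MemoryLogger {
    fn log(&self, message: &str) {
        self.lines.borrow_mut().push(message.to_string());
    }
}

/// Option 2: a zero-sized token that is only handed out by the
/// initialization routine, so holding one is evidence that init has run.
#[derive(Clone, Copy)]
pub struct Initialized(());

pub fn init_serial_port() -> Initialized {
    // Real hardware setup would happen here.
    Initialized(())
}

/// Requires the token as evidence, but the token carries no run-time data.
pub fn log_with_token<L: Logger>(_proof: Initialized, sink: &L, message: &str) {
    sink.log(message);
}
```

Because `Initialized` is zero-sized, passing it around costs nothing at run time; it only exists for the type checker.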

@rkruppe

rkruppe Jul 2, 2018

Member

Not every case needs initialization. An example is serial port logging for embedded.

Yes, obviously.

Just as local allocation exists, there should be *_in variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.

While in theory I commend getting rid of global state, the fact is that such an API for logging is entirely hypothetical while the "global logger" API is pervasive in the ecosystem. If it's a huge pain to replace the current global singleton with a different mechanism for installing the same global singleton, everyone who (for whatever reasons) uses the global singleton will rightly prefer the current way of installing it over what this RFC presents.

Besides, this argument cuts both ways: all the code that avoids global singletons doesn't need this RFC either.

Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like lazy_static + my stuff can make this more ergonomic.

As with the OOM hook, the question is how commonly this would be done. If it's quite common, the benefit of external existential traits for this use case is diminished.

Lazy init as you mention. Actually lazy_static crate + my stuff makes this decently ergonomic.

lazy_init is nice but it doesn't help you get the data needed for initialization (e.g., user configuration) into the block doing the initialization, as it can only reference other globals, not e.g. a local binding created in main. That's what I meant by requiring more state to be made global.
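The difference can be sketched as follows. This assumes `std::sync::OnceLock` as a dependency-free stand-in for the `lazy_static` crate, and `Config`, `lazy_config`, `install`, and `installed` are hypothetical names.

```rust
use std::sync::OnceLock;

/// Hypothetical logger configuration, normally built inside `main`.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub level: String,
}

/// Lazily initialized global. The initializer runs on first use and can only
/// consult other globals (here a hard-coded fallback), never a local binding
/// that lives in `main`.
static LAZY_CONFIG: OnceLock<Config> = OnceLock::new();

pub fn lazy_config() -> &'static Config {
    LAZY_CONFIG.get_or_init(|| Config { level: "info".to_string() })
}

/// Explicitly initialized global. The caller hands over a locally built
/// value, which is what a `set_logger`-style API permits.
static EXPLICIT_CONFIG: OnceLock<Config> = OnceLock::new();

/// Installs the configuration once; later attempts get their value back.
pub fn install(config: Config) -> Result<(), Config> {
    EXPLICIT_CONFIG.set(config)
}

pub fn installed() -> Option<&'static Config> {
    EXPLICIT_CONFIG.get()
}
```

With the explicit variant, code in `main` can parse user configuration into a local value and pass it to `install`; with the lazy variant, that configuration would first have to be made reachable from a global.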


None of this is a fatal objection to using external existential types for logging. I am just saying that for some quite common use cases, it won't have all the claimed benefits and indeed some downsides as well.

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

What downsides? The worst case of manual initialization forced by effects and user config works exactly like today.

@rkruppe

rkruppe Jul 2, 2018

Member

The downside is that any logger implementation that needs it has to separately go through the work of (and accept the risk of bugs in) emulating today's behavior. (Plus the churn of changing APIs.)

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

@rkruppe liblog can come with a ManuallyInitializedLogger<L> that does all that for you.
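One possible shape for such a wrapper, as a sketch only. `ManuallyInitializedLogger` is taken from the comment above but does not exist in `log`; the `Log` trait here is a simplified hypothetical one whose method returns a byte count so the effect is observable.

```rust
use std::sync::OnceLock;

/// Simplified, hypothetical logging trait (not `log::Log`).
pub trait Log {
    /// Returns the number of bytes that were (notionally) written.
    fn log(&self, message: &str) -> usize;
}

/// Hypothetical wrapper: statically selected, but initialized by hand.
pub struct ManuallyInitializedLogger<L> {
    inner: OnceLock<L>,
}

impl<L> ManuallyInitializedLogger<L> {
    pub const fn new() -> Self {
        ManuallyInitializedLogger { inner: OnceLock::new() }
    }

    /// Installs the logger once; a second call returns the rejected value.
    pub fn init(&self, logger: L) -> Result<(), L> {
        self.inner.set(logger)
    }
}

impl<L: Log> Log for ManuallyInitializedLogger<L> {
    /// Messages logged before initialization are dropped (0 bytes written).
    fn log(&self, message: &str) -> usize {
        match self.inner.get() {
            Some(logger) => logger.log(message),
            None => 0,
        }
    }
}

/// Demo logger that counts the bytes of its prefix plus the message.
pub struct PrefixLogger {
    pub prefix: String,
}

impl Log for PrefixLogger {
    fn log(&self, message: &str) -> usize {
        self.prefix.len() + message.len()
    }
}

pub static LOGGER: ManuallyInitializedLogger<PrefixLogger> =
    ManuallyInitializedLogger::new();
```

The inner logger type is fixed at compile time, so calls after initialization are statically dispatched; only the "is it initialized yet" check remains at run time.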

@rkruppe

rkruppe Jul 2, 2018

Member

Hm, true. It's still not clear to me that a significant portion of liblog users would do anything different, but let's leave that discussion at #2492 (comment)

@rkruppe

Member

commented Jul 2, 2018

I agree with @eddyb that the codegen aspect is relatively simple conceptually and in implementation. However, applying this feature to pervasive concerns such as panicking and memory allocations will make a staggering amount of currently monomorphic (or generic but monomorphized in libraries rather than leaf crates) code generic and only monomorphized in leaf crates, which has practically the same impact as MIR-only rlibs: virtually no LLVM IR or machine code is generated while compiling libraries, most of it will be monomorphized and codegen'd only when reaching the leaf crates.

While MIR-only libs are very desirable for a number of reasons (see the linked issue), there are also good reasons why we still don't have that feature: it significantly regresses wall-clock build times. Further experiments in this direction are considered blocked by parallelizing rustc, which to my knowledge is being pushed forward but probably still a far cry from being turned on by default, let alone being effective enough to offset the downsides of MIR-only rlibs. This should be taken into account when estimating how quickly this feature can be landed and applied.

@jethrogb
Contributor

left a comment

This looks great to me.

NB. existential type is defined in RFC 2071.

Only one crate in the build plan can define the `pub extern existential type`.
Unlike the trait system, there are no orphan restrictions that ensure crates can always be composed:
any crate is free to define the `pub extern existential type`, as long as it isn't used with another that also does, in which case the violation will only be caught when building a crate that depends on both (or if one of the crates depends on the other).
This is not very nice, but exactly like "lang items" and the annotations that exist for this purpose today,

@jethrogb

jethrogb Jul 2, 2018

Contributor

This sentence and the rest of this paragraph shouldn't be in the "guide-level explanation"

As mentioned in the introduction, code gen can be reasoned about by comparing with generics and inlining.
We cannot generate code for generic items until they are instantiated.
Likewise, imagine that everything that uses a `pub extern existential type` gets an extra parameter,
and then when the `impl pub extern existential type` is defined, we go back and eliminate that parameter by substituting the actual definition.

@jethrogb

jethrogb Jul 2, 2018

Contributor

What is impl pub extern existential type?

@Ericson2314

Contributor Author

commented Jul 2, 2018

@rkruppe It need not be pushed all the way to the leaf crates. We can have a std+jemalloc rlib in the sysroot, for example. I'd hope to eventually automate that sort of stuff with Cargo's help too.

@rkruppe

Member

commented Jul 2, 2018

@Ericson2314

It need not be pushed all the way to the leaf crates. We can have a std+jemalloc rlib in the sysroot for example.

First off, you've not laid out any monomorphization strategy, but this crucially depends on how and when we monomorphize. If I had to guess, I'd say you are suggesting something like:

  • when a crate defines an extern existential, it also immediately codegens all code from upstream crates that are monomorphic after substituting the now-defined existentials
    • note that this gets more complex when multiple external existentials defined over multiple crates are involved (e.g. consider three crates: A defines the global allocator, B defines the panic implementation, C panics and allocates memory)
  • when a crate using an external existential is compiled, it eagerly checks its transitive dependencies to see which ones are already defined and monomorphizes accordingly

There are some problems with that:

  • providing such a library as part of the rust distribution only helps monomorphize the standard library code earlier than in the leaf crates, it doesn't do anything for the (vastly larger) third party ecosystem (edit: this is not quite true as stated here, and most of what is true about it was already stated in the other points below)
  • this hypothetical std+jemalloc rlib would have to be actually linked (transitively) by every crate that wants to benefit from it...
  • ...and doing so in a non-leaf crate makes the non-leaf crate impossible to use with any other choice of global allocator, so it's unreasonable for libraries to use that manually (but libraries are precisely those that need to use it to avoid monomorphizing everything in leaf crates)
    • injecting these crates "automatically" would fix this problem, but that is completely hypothetical
  • more generally the effectiveness of such hacks is highly dependent on the shape of the dependency graph, as one crate fixing some externals can't affect its siblings, only upstream and downstream crates
    • and since upstream crates only get monomorphized at the point where the existentials are defined, defining them relatively late (in a crate with) still has the same effect as MIR-only rlibs within that subgraph of the dependency graph -- it doesn't have to be the leaf crate, but there's still a trade off
  • if we accepted "throw in a crate that prematurely defines some external existentials somewhere in the middle of your dependency graph" as a workaround for build time regressions, we incentivize such hacks in the crates.io ecosystem as well and that sounds extremely unappealing, even actively harmful (since such crates restrict possible uses of everything that transitively depends on them)
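The three-crate scenario from the first list (A defines the allocator, B defines the panic implementation, C uses both) can be emulated with two type parameters. This is only an illustration of when code becomes fully monomorphic; the traits and types are hypothetical and all "crates" are collapsed into one file.

```rust
/// Hypothetical traits standing in for two independent extern existentials.
pub trait GlobalHeap {
    const NAME: &'static str;
}

pub trait PanicImpl {
    const NAME: &'static str;
}

/// "Crate C": allocates and panics, so in this model it is generic over both.
pub fn describe<H: GlobalHeap, P: PanicImpl>() -> String {
    format!("heap={}, panic={}", H::NAME, P::NAME)
}

/// "Crate A" defines only the allocator.
pub struct Jemalloc;

impl GlobalHeap for Jemalloc {
    const NAME: &'static str = "jemalloc";
}

/// With only A's definition visible, C's code is still open in `P`:
/// partially applied, but not yet something machine code can be emitted for.
pub fn describe_with_jemalloc<P: PanicImpl>() -> String {
    describe::<Jemalloc, P>()
}

/// "Crate B" defines only the panic implementation.
pub struct Abort;

impl PanicImpl for Abort {
    const NAME: &'static str = "abort";
}

/// Only a crate that can see both definitions gets a closed function.
pub fn describe_closed() -> String {
    describe_with_jemalloc::<Abort>()
}
```

Neither A nor B alone can trigger codegen for `describe`; that only happens in a crate downstream of both, which is the dependency-graph sensitivity described in the problem list.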

I'd hope to eventually automate that sort of stuff with Cargo's help too.

"Eventually", hopefully this problem disappears entirely as MIR-only rlibs become feasible in general. The issue is whether we're willing to eat massive regressions in the mean time.

- Of course, we can always do nothing and just keep the grab bag of ad-hoc solutions we have today, and leave log with just an imperative dynamic solution.

- We could continue special-casing and white-listing in the compiler the use-cases I give in the motivation, but at least use the same sort of annotation for all of them for consistency.
But that still requires leaving out `log`, or special casing it for the first time.

@sfackler

sfackler Jul 2, 2018

Member

I don't think we'd want to use this for log at all. Runtime configuration of loggers is pervasive.

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

Who does that run-time configuration? The libraries logging or the end application?

}
}
extern existential type alloc::Heap = JemallocHeap;

@sfackler

sfackler Jul 2, 2018

Member

nit: This declaration wouldn't live in the jemalloc crate, but rather the downstream consumer. We want to allow people to pull in jemalloc without using it as the global allocator. (Imagine you want to wrap it in a layer of tracking or whatever)

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

Yes good point thanks @sfackler. I'll amend that example to show the 3rd crate. I suppose that is like today, too.
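For comparison, here is how the "wrap it in a layer of tracking" case looks today with `#[global_allocator]`, where the choice is likewise made in the consuming crate. `Tracking` is a hypothetical wrapper, and `std::alloc::System` stands in for a jemalloc allocator so the sketch has no dependencies.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical tracking layer that can wrap any allocator.
pub struct Tracking<A> {
    inner: A,
    live_bytes: AtomicUsize,
}

impl<A> Tracking<A> {
    pub const fn new(inner: A) -> Self {
        Tracking { inner, live_bytes: AtomicUsize::new(0) }
    }

    /// Bytes currently allocated through this wrapper.
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.load(Ordering::Relaxed)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Tracking<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward to the wrapped allocator, then count successful requests.
        let ptr = unsafe { self.inner.alloc(layout) };
        if !ptr.is_null() {
            self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

/// The downstream consumer, not the allocator crate, makes the global choice.
#[global_allocator]
static GLOBAL: Tracking<System> = Tracking::new(System);
```

Under the RFC, the last item would presumably become an `extern existential type` definition in the same downstream crate, leaving the allocator crate usable as an ordinary dependency.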

@Ericson2314

Contributor Author

commented Jan 15, 2019

mini-parametric-modules

The restrictions this imposes (when it is viewed as sugar over full parametric modules and crates) make it cover less territory already covered by traits. If/when we expand this to a full ML-style module system, there's no reason we cannot also combine that with the trait system too. (Traits would be canonicalized modules/functors.)

As mentioned by many above, parametric modules in the language we have will play hell with the compilation model. That kind of thing really only works well in a uniform representation language, which rust very much turned out not to be.

I just don't believe this. Traditional batch compilation is a really really simplistic system of incremental compilation. Nothing in the language today or with this PR prevents fancier notions of incremental compilation, which the compiler team is already working on.

@jethrogb

Contributor

commented Jan 15, 2019

The story around defaults needs to be cleared up. You will always need a definition of these existential types, as most of the time the user doesn't care and wants “sane” defaults. The RFC should thoroughly specify how defaults work.

Here are some defaults we'd want:

  • alloc: alloc_system
  • panic: panic_unwind
  • OOM hook: std
  • std::collections::hash_map::RandomState: std
  • feature detection: stdsimd
  • log::Log: something that hooks into the existing log::init logic for backcompat
  1. One option I've seen mentioned in this thread is "cargo-awareness". This is quite vague and needs to be defined. I imagine something where each [lib] crate specifies which existential types are defined in it and which other crate should be used to provide the default. I don't know if this complexity is worth it

  2. The default could be specified at declaration time. If there is no definition in the final artifact, the default will be used. How would this work for extern existential types where the default implementation is in a different crate than the declaration? For example for Alloc, you'd need a separate crate for the trait definition, then alloc and alloc_system can both depend on that, and alloc can depend on alloc_system (to provide the default).
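Option 2 resembles a default type parameter in today's Rust, which may help when reasoning about the crate layout it requires. The sketch below is an analogy only: `Heap`, `SystemHeap`, `JemallocHeap`, and `Program` are hypothetical, and a real design would have to name the default across crate boundaries.

```rust
use std::marker::PhantomData;

/// Would live in the separate "trait-only" crate described above.
pub trait Heap {
    fn name() -> &'static str;
}

/// Would live in `alloc_system`, which the declaring crate depends on.
pub struct SystemHeap;

impl Heap for SystemHeap {
    fn name() -> &'static str {
        "alloc_system"
    }
}

/// An alternative provider from some other crate.
pub struct JemallocHeap;

impl Heap for JemallocHeap {
    fn name() -> &'static str {
        "jemalloc"
    }
}

/// The default is named at the declaration: `H = SystemHeap` plays the role
/// of "use this if the final artifact defines nothing else".
pub struct Program<H: Heap = SystemHeap> {
    _heap: PhantomData<H>,
}

impl<H: Heap> Program<H> {
    pub fn new() -> Self {
        Program { _heap: PhantomData }
    }

    pub fn heap_name(&self) -> &'static str {
        H::name()
    }
}

/// No explicit choice: the declared default applies.
pub fn default_build() -> Program {
    Program::new()
}

/// Explicit choice: the default is ignored.
pub fn overridden_build() -> Program<JemallocHeap> {
    Program::new()
}
```

The analogy also shows the dependency constraint from the comment: the declaration must be able to name `SystemHeap`, so the default provider has to sit upstream of it.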

@Ericson2314

Contributor Author

commented Jan 15, 2019

How would this work for extern existential types where the default implementation is in a different crate than the declaration?

This would be a big advantage of tracking the declarations and definitions in Cargo.toml: put the "default provider" there too. I think Cargo.toml referring to a potentially-unused crate is more natural, since it is also a build plan solving concern, and since rustc can be nicely blind to the default if it ends up not being used (Cargo wouldn't mention it.)

@crlf0710


commented Jan 15, 2019

@jethrogb I think a third option is to provide some mechanism for downstream crates to declare a candidate who can satisfy some upstream extern existential types. If in the final build plan there's only one such declaration, it's chosen. If there's more than one, and user didn't explicitly make their choice, an error is raised.
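The selection rule in this third option can be written down as a small function. This is a sketch of the rule as stated, not of any existing rustc or Cargo behavior; `Resolution` and `resolve` are hypothetical, and the explicit choice is taken at face value without checking it against the candidate list.

```rust
/// Outcome of picking a definition for one extern existential type.
#[derive(Debug, PartialEq)]
pub enum Resolution {
    /// Exactly one provider was selected.
    Chosen(String),
    /// Nobody in the build plan offered a definition.
    Missing,
    /// Several candidates and no explicit choice: this would be a build error.
    Ambiguous(Vec<String>),
}

/// `candidates` are the crates that declared themselves able to satisfy the
/// type; `explicit` is the user's own choice, if any.
pub fn resolve(candidates: &[&str], explicit: Option<&str>) -> Resolution {
    // An explicit choice always wins.
    if let Some(choice) = explicit {
        return Resolution::Chosen(choice.to_string());
    }
    match candidates {
        [] => Resolution::Missing,
        [only] => Resolution::Chosen((*only).to_string()),
        many => Resolution::Ambiguous(many.iter().map(|c| (*c).to_string()).collect()),
    }
}
```

For example, `resolve(&["alloc_system"], None)` selects `alloc_system`, while two candidates without an explicit choice yield `Ambiguous`.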

@gnzlbg

Contributor

commented Jan 15, 2019

The RFC should thoroughly specify how defaults work.

Right now, defaults are handled "magically" by rustc, and this RFC wouldn't change that. For the external existential types used in the standard library, rustc can continue to use magic at its own discretion, while for those used by external crates, with this RFC either the final binary contains a single definition, or the build fails.

Allowing users to "somehow" specify defaults sounds like an orthogonal problem to me and I am not sure whether that's a problem worth solving at the language level.

Maybe we could solve that at the cargo level, e.g., if the build fails because an existential type definition is missing, cargo could automatically add a meaningful crate for you to the dependency graph or something like that, but I don't think we need to solve this problem here.


My thought on the "parametric modules" issue is that what this feature provides is a "parametric root-module / program", requiring whoever monomorphizes / builds the program (e.g. main) to pass the appropriate type parameters. Even if we had parametric modules, we would still need to add features to "somehow" allow crates to declare that the root module needs to take some parameters and use them, as well as for whoever builds this root-module to pass the appropriate parameters required by all crates.

EDIT: so basically what I was trying to say here is that I think this feature and parametric modules are kind of orthogonal. For all we know, we might end up settling on the same way to pass / use parameters of the root-module that's specified in this RFC.

@Ericson2314

Contributor Author

commented Jan 15, 2019

@gnzlbg well put! It's indeed useful that we can punt on defaults while it's just the standard library that uses this.

@cbeck88


commented Jan 28, 2019

Hi, I just want to say that as a user of Rust I'm really excited for this RFC and
I think it will make it much easier to use Rust for low-level systems programming
and embedded development.

Right now Rust doesn't give a very good way to do what I'm going to call
"link-time dependency injection".
This means, crate A does not depend on crate B, but crate B is able to
modify the behavior of crate A and inject code into it, and this happens
at build time, so there is no overhead and no moving parts required at runtime.
(I could try to give a concrete example like replacing functionality like
std::time which isn't very easy in Rust right now. In firmware applications
this is sometimes necessary, especially if you want to be able to test firmware
in a controlled environment. At the same time, this is likely to end up in the
critical path because it may be needed for logging. This is an example where I think
the best solution would be for this functionality to be provided by an existential
type (or a new lang item) with a default implementation in std that can be
swapped out. And a lang item for this seems pretty unsatisfying because time
doesn't seem like a fundamental language feature.)

This RFC will help to fix this gap in Rust, and the solution will be more sane
than the solutions in C and C++. In this RFC, you are allowed to swap out the default
implementation of some function at build time, but only if it was marked by the
library author as an existential type, which means that they anticipate that it
will be swapped out and have considered the implications of that for their
library. In C and C++, any symbols can be interposed at link time whether
the author intended it or not, and there are no real sanity checks around any of that.


If you don't have something like this RFC, then the alternatives for writing
configurable low-level crates and breaking circular dependencies are:

1.) Do runtime dependency injection instead
2.) Try to use features for this
3.) Build your crates as dylibs and try to use shared library interposition
to shadow symbols

Option 1 means roughly this:

  • Crate A contains a static mutable function pointer instead which points to
    some default implementation, and an explicit initialization function is provided
    which B can call at program startup to swap in a different implementation.
    (In Rust it might instead look like, there is a static mut Box<T> where T
    is some trait, and probably some mutex or atomic variables around it.)

This is error prone -- someone has to make sure that each target actually calls
this initialization routine correctly. Rust has no life-before-main or static
constructors where this initialization call could be placed, and it's probably
for the better. It's always better to have a crate with no explicit
initialization required if at all possible.

Additionally, there are tricky systems problems here with mutable global state,
and swapping out function pointers (or trait objects) safely at runtime like
this can be sketchy. It's hard to be sure that no one began using the function
pointers before they were set to the final value, and there are challenges
introduced if there are multiple threads in the program.

Additionally, using runtime dependency injection instead of link-time dependency
injection means that we end up with an indirect jump instead of a direct jump.
If the program must be compiled with Spectre / Meltdown mitigations enabled,
then an indirect jump may be several times slower, even if using retpolines.
So if the code being injected is some performance-critical I/O routines, then
this approach may have measurable and serious consequences for
performance-critical systems.
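A minimal sketch of Option 1 next to a statically dispatched alternative, using the clock example from earlier in this comment. All names are hypothetical, and the static variant is emulated with a type parameter because existential types with external definitions are not available.

```rust
use std::sync::RwLock;

fn default_now() -> u64 {
    1_000
}

/// Replacement clock that "crate B" might inject, e.g. in a test setup.
pub fn test_now() -> u64 {
    42
}

/// Run-time injection: a global, swappable function pointer in "crate A".
static NOW_IMPL: RwLock<fn() -> u64> = RwLock::new(default_now as fn() -> u64);

/// "Crate B" has to remember to call this before anyone reads the clock.
pub fn set_now_impl(f: fn() -> u64) {
    *NOW_IMPL.write().unwrap() = f;
}

/// Every call takes a lock and then makes an indirect call.
pub fn now_dynamic() -> u64 {
    let f = *NOW_IMPL.read().unwrap();
    f()
}

/// Build-time injection, emulated with a type parameter.
pub trait Clock {
    fn now() -> u64;
}

pub struct FixedClock;

impl Clock for FixedClock {
    fn now() -> u64 {
        42
    }
}

/// Once `C` is known the call is direct and can be inlined.
pub fn now_static<C: Clock>() -> u64 {
    C::now()
}
```

Any caller that runs before `set_now_impl` observes the default clock, which is the initialization-order hazard described above; the static variant has no such window.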

Option 2 means roughly:

  • Crate A implementation must know about all possible alternate implementations,
    and expose features that select them appropriately
  • Crate B depends on Crate A with the correct feature selected

This is really difficult in practice because any other crate that depends on A
will also have to have default-features = false or some similar thing in its
Cargo.toml, and it also fundamentally requires patching A to change its behavior.
In a large project this basically means we have to patch the whole world to change
the behavior of A, and in a C/C++ project the analogous thing would be introducing
ifdefs in order to try to solve the problem.

Option 2 is basically a non-solution. If I'm developing Crate A as an open-source
project, I want to be able to give people customization points that they can easily
use to make my crate behave differently, in ways that I didn't anticipate in advance.
Option 2 basically means everyone must patch my crate to get it to work the way
they want.
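For illustration, Option 2 inside crate A might look like the following. The feature name `mock-clock` is hypothetical and would also need to be declared under `[features]` in A's Cargo.toml.

```rust
/// Default implementation, compiled when the hypothetical feature is off.
#[cfg(not(feature = "mock-clock"))]
pub fn now() -> u64 {
    1_000
}

/// Alternate implementation. Crate A has to contain this code itself, which
/// is why every new alternative means patching A.
#[cfg(feature = "mock-clock")]
pub fn now() -> u64 {
    0
}
```

Each additional implementation adds another `cfg` branch to A, and since Cargo unifies features across the dependency graph, every crate that depends on A takes part in deciding which branch gets compiled.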

Option 3 means roughly:

We mimic one conventional way of doing this in C and C++. The dependency injection
isn't happening at run-time, but when the dynamic link loader runs -- we link to
a library that provides symbols of the same name but with different implementations,
in such a way that these implementations get chosen.

In our Rust sources, we probably just have to decorate our functions with
#[no_mangle] and pub extern.

The main drawbacks of this compared to existential types as I see it are:

  • With existential types, you either replace the whole thing or none of it.
    It's not possible to replace malloc but not free for instance, so that
    seems safer
  • Existential types document your intention that this is a customization point
    that Rust code can make use of.
    #[no_mangle] and pub extern have other uses like interfacing with C, they
    don't make it clear that you expect that someone might start shadowing these
    functions.
  • Ideally this functionality would work with static linkage as well -- shadowing
    symbols is orthogonal to shared vs. static in C and C++. If you want everything
    to actually get inlined and optimized using LTO then it needs to be static.
  • Particularly, it would ideally work for things you are stubbing out in std,
    even if you are statically linking the standard libraries.
  • Using existential types seems to have less cognitive overload than this
    approach.

@Centril Centril referenced this pull request Feb 14, 2019

Merged

Pre-RFC: `std` aware Cargo #1

@Ericson2314

Contributor Author

commented Mar 6, 2019

Now that Rust 2018 is out, #2480 is picking up steam again, the switch to hashbrown and parking_lot is in progress, and with jamesmunns#1 as well, it's an interesting time for facade related things and hopefully less busy core teams. Could we reach a decision on this soon then?

@jethrogb

Contributor

commented Apr 18, 2019

@nikomatsakis How do we get this RFC on the lang team agenda?

@nikomatsakis

Contributor

commented Apr 26, 2019

We discussed this in the @rust-lang/lang meeting yesterday. While we agree that there is a need to "do better" when it comes to the problem being addressed here, and we are interested in this direction as a possible solution, we felt that the time is not yet ripe. Therefore, I am going to move to postpone this RFC to be addressed at a later point.

Let me elaborate:

  • First, this doesn't really fit the general roadmap theme of "finishing up important, in-progress items" -- this is rather a piece of "new design".
  • Second, it's a big job with a lot of implementation complexity. It's essentially a kind of generic module. It seems to be significantly "out ahead" of where the compiler is -- at the moment, we're trying to improve the compiler implementation to the point where it catches up to the RFCs we have (think GATs, impl Trait, specialization) and not to keep pushing the frontier out.
  • Finally, it's a big piece of design, and we'd like to take those on in a "committed way", only when we have all the pieces lined up -- the people to do the design work, yes, but also the implementation, documentation, and support work. I guess this is sort of the two last points rephrased =)

I do want to thank you @Ericson2314 for the work you put into this RFC. I feel pretty confident we'll be revisiting this topic at some point later and that this RFC will provide a useful starting point for that conversation.

@rfcbot fcp postpone

@rfcbot

commented Apr 26, 2019

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nikomatsakis

Contributor

commented Apr 26, 2019

Also, I should add, if you think that the points made above are wrong, please feel free to let me know =)

@Centril

Contributor

commented Apr 26, 2019

@nikomatsakis Postponing seems like a good idea roadmap & impl wise... However, where does that leave us wrt. various ad-hoc attributes that people may want in the interim that would have been implemented with this? I feel we should postpone those as well until we have a general solution like this.

@Ericson2314

Contributor Author

commented Apr 26, 2019

First of all, thanks for discussing it. I'm very sympathetic to "tech debt before features". This isn't even a feature I'd prioritize in a vacuum! But as @Centril says, there's a bunch of ad-hoc things appearing to solve this problem (and more on the way), and so, given those priorities, I felt the need to get some sort of reusable design out there.

@rkruppe

Member

commented Apr 26, 2019

I don't see any good reason to stall desirable features that are well motivated in isolation, just because we might at some future point reconsider extern existentials. I do not see much cost in introducing those features before extern existentials, whereas postponing those other features would have some opportunity cost (not to mention being potentially quite frustrating for everyone involved).

Since a significant number of such one-off hooks are already stable/widespread in the library ecosystem, any attempt at extern existentials already needs to work out how to replace all the existing hooks and possibly deprecate the old surface syntax for supplying and using them. Having to do that for some more hooks than we have right now doesn't seem harder (if anything, now that we have this general feature in mind, we can actually take care to be forward-compatible when designing new hooks) or qualitatively worse (deprecating 10 hooks vs 5 hooks? same difference).

Conversely, if replacing the existing hooks with extern existentials and deprecating the current one-off syntax for them turns out to be not possible for the currently existing hooks, then that is IMO already a fatal flaw of extern existentials, as we'd still end up with a considerable menagerie of one-off hooks plus a general feature that looks like it should subsume those hooks but doesn't. So I wouldn't worry about increasing the number of such vestigial hooks that can't be replaced with extern existentials -- IMO, either we can deprecate and replace all such hooks, including ones introduced in the future, or we don't introduce this feature at all.

It is true that there might be features that are relatively simple if extern existentials are available but don't themselves justify one-off compiler support. But that should be decided on a case by case basis.

@Centril

Contributor

commented Apr 26, 2019

I do not see much cost in introducing those features before extern existential

I think a bunch of ad-hoc rules in our static semantics is a significant cost, especially if it is hard to know if an ad-hoc rule can be subsumed later by a feature such as this one. (This is well argued in #2492 (comment).)

whereas postponing those other features would have some opportunity cost (not to mention being potentially quite frustrating for everyone involved).

I personally believe that opportunity costs and frustration are not sufficient motivation in general to accept technical debt. We should build a language that lasts for decades instead of looking for short term gains.

Since a significant number of such one-off hooks are already stable/widespread in the library ecosystem, any attempt at extern existentials already needs to work out how to replace all the existing hooks and possibly deprecate their surface syntax.

I doubt we have a sufficient number of hooks in the stable language to be deserving of "significant". I recall us having two attributes, one for allocators and one for panics. Hooks in the crates.io ecosystem are also something that can be changed as it isn't baked into the language. That makes a whole lot of difference in my view.

Having to do that for some more hooks than we have right now doesn't seem harder (if anything, now that we have this general feature in mind, we can actually take care to be forward-compatible when designing new hooks) or qualitatively worse (deprecating 10 hooks vs 5 hooks? same difference).

Taking care to be forward-compatible sounds good. However, it's easy to make mistakes and forget about things. Having a general framework keeps you honest and reduces risk substantially.

Conversely, if replacing the existing hooks with extern existentials and deprecating the current one-off syntax for them turns out to be not possible for the currently existing hooks, then that is IMO already a fatal flaw of extern existentials, as we'd still end up with a considerable menagerie of one-off hooks plus a general feature that looks like it should subsume those hooks but doesn't.

There's "not possible" and then there's "has some limitations". The benefit of a general framework is to some extent the limitations it provides. Anything that doesn't fit the general plan simply doesn't make it in; I think that's a good sanity check in a design of any language. (e.g. GHC has Core and anything that doesn't fit with that doesn't make it into GHC; and changing Core is not an option)

as we'd still end up with a considerable menagerie of one-off hooks plus a general feature that looks like it should subsume those hooks but doesn't.

Presumably we'd actually test out this feature concurrently with the otherwise one-off hooks.

It is true that there might be features that are relatively simple if extern existentials are available but don't themselves justify one-off compiler support. But that should be decided on a case by case basis.

To be clear, I'm suggesting a general policy, not absolute rules.

@jethrogb

Contributor

commented Apr 30, 2019

💯% agree with all of @Centril's points so far.

@Ericson2314

Contributor Author

commented Apr 30, 2019

I'd like to add that if we use the same ZST-only trick that Fn(Box) uses, we can sidestep many of the dependencies on low-priority compiler work @nikomatsakis talked about. Fundamentally, it sounds like there is a prioritization conflict between the libs team and lang team, and a ZST-only shortcut would ideally lead to:

  • No stabilized stop-gaps
  • No one gets blocked
  • No one gets massively preempted, effectively blocking their other priorities.

@jethrogb

Contributor

commented May 4, 2019

Inspired by linkme, I came up with the idea to prototype this using linker features. Here it is:

macro_rules! extern_existential {
    ( extern existential type $i:ident: $tr:path = $ty:path; ) => {
        #[no_mangle]
        pub static $i: &(dyn $tr + Send + Sync + 'static) = &$ty;
    };
    ( pub extern existential type $i:ident: $tr:path; ) => {
        pub struct $i;

        impl core::ops::Deref for $i {
            type Target = dyn $tr + 'static;

            // make sure the use of the extern symbol appears in another crate
            // so the undefined symbol appears in the right place in the link
            // order
            #[inline(always)]
            fn deref(&self) -> &(dyn $tr + 'static) {
                #[allow(improper_ctypes)]
                extern "C" {
                    static $i: &'static (dyn $tr + Send + Sync + 'static);
                }

                unsafe { $i }
            }
        }
    };
}

crates.io, git repo

There's one difference from the syntax suggested in the RFC: you have to specify the trait again when defining the existential type alias. The type you're using there has to be a unit struct. Also, it's not type safe: if the trait doesn't match between declaration and definition, all hell breaks loose, of course.

I intend to test using this in some places in std or some of the other motivating use cases, but not sure when I'll have time for this.
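To make the shape of the two macro arms easier to follow, here is a rough single-crate sketch of what they boil down to. It is only an illustration under stated assumptions: the linker indirection (`#[no_mangle]` plus the `extern "C"` static) is replaced by an ordinary static so that it compiles on its own, and the names `Greeter`, `EnglishGreeter`, `GREETER_IMPL` and `GlobalGreeter` are hypothetical, not part of the macro or the RFC.

```rust
use std::ops::Deref;

/// Hypothetical trait standing in for the interface of a global resource.
pub trait Greeter {
    fn greet(&self, name: &str) -> String;
}

/// Hypothetical unit struct, playing the role of the type named in the
/// "definition" arm of the macro. Being a unit struct, it is zero-sized.
pub struct EnglishGreeter;

impl Greeter for EnglishGreeter {
    fn greet(&self, name: &str) -> String {
        format!("Hello, {}!", name)
    }
}

// Roughly what the definition arm produces, minus `#[no_mangle]`:
// a static reference to a trait object backed by the unit struct.
static GREETER_IMPL: &(dyn Greeter + Send + Sync + 'static) = &EnglishGreeter;

/// Roughly what the declaration arm produces: a handle type whose `Deref`
/// hands out the trait object. In the real macro the static is found through
/// an `extern "C"` declaration resolved at link time; here it is read directly.
pub struct GlobalGreeter;

impl Deref for GlobalGreeter {
    type Target = dyn Greeter + 'static;

    fn deref(&self) -> &(dyn Greeter + 'static) {
        // Dropping the `Send + Sync` auto traits is an ordinary coercion.
        GREETER_IMPL
    }
}
```

With this in place, `GlobalGreeter.greet("world")` dispatches dynamically through the trait object. Nothing in this sketch checks that a declaration and a definition in different crates agree on the trait, which is the type-safety gap mentioned above.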

@eddyb

Member

commented May 5, 2019

@jethrogb Heh, that's kind of how the compiler-builtin "weak lang items" work, and there are stable aliases for some of them (like #[panic_impl], I think?)

You have extern { #[lang = "..."] fn foo(...); } for "import it from downstream" (which I think can only work with rlib, not dylib, as an rlib is a collection of object files, not linked yet) and then #[lang = "..."] fn foo(...) {...} to define it (which gives it the same symbol name as the import).

Global allocators use something similar, and we could arguably have libcore contain all of liballoc.

But the issue then is how do you enforce, at a high level, that you don't use e.g. String, unless you have #[global_allocator] somewhere in your crate graph? You really don't want the failure to be in the linker!

Similar for your solution: how would you even enforce opt-in?
(Also, I think it's maybe a bit early to put something like that on crates.io? At least without having some sort of way to force users to acknowledge that this is just an experiment and they shouldn't use it unless they read the documentation and fully understand what they're getting themselves into)
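For readers less familiar with the existing one-off hooks, this is roughly what the stable `#[global_allocator]` attribute mentioned above looks like in use. It is a minimal sketch, assuming a binary or test crate that may install its own allocator; `CountingAlloc`, `ALLOCATIONS` and `GLOBAL` are hypothetical names chosen for illustration, and the allocator simply forwards to `std::alloc::System`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far (hypothetical bookkeeping).
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator: counts requests and forwards to the system allocator.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Safety: the caller upholds the `GlobalAlloc::alloc` contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Safety: `ptr` was returned by `System.alloc` with this `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// The hook itself: at most one such static may exist in the whole crate graph,
// and every `Box`, `Vec`, `String`, ... is routed through it.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```

The attribute is resolved by the compiler and not by a user-visible type, so this is the kind of special-cased hook the discussion is about.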
