RFC: Existential types with external definition #2492

Closed
239 changes: 239 additions & 0 deletions text/0000-extern-existential-type.md
@@ -0,0 +1,239 @@
- Feature Name: extern-existential-type
- Start Date: 2018-6-29
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

An extension of [#2071](https://github.com/rust-lang/rfcs/pull/2071)'s `existential type` where the definition can live in a different crate than the declaration, rather than in the same module.
This is a crucial tool for untangling dependencies within `std` and other libraries at the root of the ecosystem that concern global resources.

# Motivation
[motivation]: #motivation

We have a number of situations where one crate defines an interface, and a different crate implements that interface with a canonical singleton:
Contributor Author

Thanks for letting me know!

Contributor

@gnzlbg, Nov 15, 2019

That was mentioned back then here (#2492 (comment)) and here (#2492 (comment)).

Contributor Author

@gnzlbg Well predicted :)

Contributor

@cramertj is the one that realized that this could be useful for futures.

- [`core::alloc::GlobalAlloc`](https://doc.rust-lang.org/nightly/core/alloc/trait.GlobalAlloc.html), chosen with [`#[global_allocator]`](https://doc.rust-lang.org/1.23.0/unstable-book/language-features/global-allocator.html)
- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md)
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html)

The OOM hook can be changed repeatedly at run time. I don't know where (if anywhere) this ability is used, but at least it's not obvious that we can even replace the OOM hook with a static singleton. Deciding that requires wading into the details of OOM handling, which is probably out of scope for this RFC.

(There is of course the option of building a singleton with mutable state that provides exactly the current API, but if that would be used widely, many of the purported benefits evaporate.)

Contributor Author

I should link the `alloc` lang item that exists right now. All of this is just machinery over `oom_impl`, exposed in `alloc::alloc::handle_alloc_error`.

I don't care much about existing implementation details here, but about the public (if unstable) API provided. If the public API we end up providing in alloc is a runtime-settable hook instead of something statically dispatched for whatever reasons, then it simply isn't very relevant to this RFC (though this RFC, if accepted, would be one way to implement that hook). I am sympathetic to wanting static dispatch by default and letting those who need runtime-varying behavior implement a hook themselves, but again, the details of how OOM handling ought to be done are controversial and out of scope for this RFC, so IMO the RFC text is being over-eager by saying this feature would obsolete the hook in its current form.

Contributor Author

N.B. per rust-lang/rust#51607 (comment) we might be changing to a static hook anyways.

- [`std::collections::hash_map::RandomState`](https://doc.rust-lang.org/std/collections/hash_map/struct.RandomState.html), and, if https://github.com/rust-lang/rust/pull/51846 is merged, the `hashmap_random_keys` lang item
- [`log::Log`](https://docs.rs/log/0.4.3/log/trait.Log.html) set with [`log::set_logger`](https://docs.rs/log/0.4.3/log/fn.set_logger.html)

Logging can require resources such as files, network connections, run-time configuration data, etc., so it's difficult to make some logger implementations into statics. In theory everything could be initialized lazily on the first logging call (though then you haven't eliminated the overhead of dynamic dispatch!), but this mostly just shifts the problem (and run-time costs) around, and in more complex scenarios (e.g. when you want to read a configuration file to determine what kind of logging to do) it can require making much more state global than is currently necessary. It also affects all other control flow surrounding logger initialization; e.g. error reporting for being unable to open a log file now has to be moved into the lazy-initialization code.

Contributor Author

Not every case needs initialization. An example is serial port logging for embedded.

For the logging case there's these options:

  1. Just as local allocation exists, there should be `*_in` variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.

  2. The case where one really wants a single xxx, such that a static is nice because any passed pointer / value is just overhead, but also wants to be sure the xxx is initialized first, has been heavily explored by @japaric and the rest of the working group. In general, something still needs to be passed around, but it can just be a ZST "token" indicating initialization is complete. So this is just a riff off the above. My stuff still helps if you want to be monomorphic over the token type.

  3. Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like `lazy_static` + my stuff can make this more ergonomic.

  4. Lazy init as you mention. Actually the `lazy_static` crate + my stuff makes this decently ergonomic.

So overall I think mine still has value: it avoids needing to `cfg` around the no-std case by offering a better separation of concerns between functionality and its "singletonness", while encouraging one to make the singleton aspect optional.
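The ZST "token" idea from point 2 can be sketched in today's Rust roughly as follows. This is a minimal illustration and is not code from the working group. The `logger` module, `Ready`, `init` and `log` are all hypothetical names. Because the token's field is private, code outside the module can only obtain a `Ready` by calling `init`. The token itself is zero-sized, so passing it costs nothing.

```rust
mod logger {
    /// Zero-sized proof that `init` has run. The private field means code
    /// outside this module can only obtain one by calling `init`.
    #[derive(Clone, Copy)]
    pub struct Ready {
        _private: (),
    }

    /// Hypothetical one-time setup, e.g. configuring a serial port.
    pub fn init() -> Ready {
        // ... hardware or file setup would go here ...
        Ready { _private: () }
    }

    /// Logging demands the token, so it cannot be called before `init`.
    /// Returns the formatted line instead of writing it, to keep the sketch testable.
    pub fn log(_ready: Ready, msg: &str) -> String {
        format!("serial: {msg}")
    }
}
```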

> Not every case needs initialization. An example is serial port logging for embedded.

Yes, obviously.

> Just as local allocation exists, there should be `*_in` variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.

While in theory I commend getting rid of global state, the fact is that such an API for logging is entirely hypothetical while the "global logger" API is pervasive in the ecosystem. If it's a huge pain to replace the current global singleton with a different mechanism for installing the same global singleton, everyone who (for whatever reasons) uses the global singleton will rightly prefer the current way of installing it over what this RFC presents.

Besides, this argument cuts both ways: all the code that avoids global singletons doesn't need this RFC either.

> Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like `lazy_static` + my stuff can make this more ergonomic.

As with the OOM hook, the question is how commonly this would be done. If it's quite common, the benefit of external existential types for this use case is diminished.

> Lazy init as you mention. Actually the `lazy_static` crate + my stuff makes this decently ergonomic.

Lazy initialization is nice, but it doesn't help you get the data needed for initialization (e.g., user configuration) into the block doing the initialization, as it can only reference other globals, not e.g. a local binding created in `main`. That's what I meant by requiring more state to be made global.


None of this is a fatal objection to using external existential types for logging. I am just saying that for some quite common use cases, it won't have all the claimed benefits and will indeed have some downsides as well.

Contributor Author

What downsides? The worst case of manual initialization forced by effects and user config works exactly like today.

The downside is that any logger implementation that needs it has to separately go through the work of (and accept the risk of bugs in) emulating today's behavior. (Plus the churn of changing APIs.)

Contributor Author

@rkruppe liblog can come with a `ManuallyInitializedLogger<L>` that does all that for you.

Hm, true. It's still not clear to me that a significant portion of liblog users would do anything different, but let's leave that discussion at #2492 (comment)


Each of these is an instance of the same general pattern.
But the solutions are all ad hoc and distinct. They burden users of Rust and implementers of rustc with extra work to remember and implement each one, and they hinder rapid prototyping.

They also incur a run-time cost due to dynamism and indirection, which can lead to initialization bugs or bloat in space-constrained environments.
In the annotation case, there's essentially an extra `extern { fn special_name(..); }` whose definition the annotation generates.
This isn't easily inlined outside of LTO, and even with LTO it keeps rustc's own optimizations from taking effect.
The `set`-method based ones involve mutating a `static mut` or equivalent with a function or trait object, and thus can basically never be inlined away.
So there's the overhead of the initialization, and then one or two memory dereferences to get the implementation function's actual address.
The potential bugs come from not `set`ing the resource before it is needed. Avoiding them is a manual task, because there's no static way to prevent accessing the resource while it isn't set.
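The following sketch shows the shape of the `set`-based status quo, so that the costs described above are concrete. It is loosely modeled on `log::set_logger` but deliberately does not use the real `log` crate API. `Logger`, `set_logger`, `log_line` and `StderrLogger` are hypothetical names. The global is a trait object behind a set-once cell, so every call loads a pointer and dispatches through a vtable. Calling `log_line` before `set_logger` is only caught at run time.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for an interface such as `log::Log` (not the real trait).
pub trait Logger: Sync {
    /// Returns the line it would emit, to keep the sketch testable.
    fn format_line(&self, msg: &str) -> String;
}

/// The runtime-settable global: a trait object behind a set-once cell.
static LOGGER: OnceLock<&'static dyn Logger> = OnceLock::new();

/// Installs the global logger; fails if one was already installed.
pub fn set_logger(logger: &'static dyn Logger) -> Result<(), ()> {
    LOGGER.set(logger).map_err(|_| ())
}

/// Every call loads the pointer and dispatches through a vtable, and nothing
/// in the types stops a caller from logging before `set_logger` has run.
pub fn log_line(msg: &str) -> Option<String> {
    LOGGER.get().map(|logger| logger.format_line(msg))
}

pub struct StderrLogger;

impl Logger for StderrLogger {
    fn format_line(&self, msg: &str) -> String {
        format!("[stderr] {msg}")
    }
}
```

Here the "not yet set" state surfaces as `None`. A real logging facade might instead silently drop the message.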

The `extern existential type` feature just covers the deferred definition of a type, and not the singleton itself, but that is actually enough. For example, with global allocation:

```rust
// In `alloc`

pub extern existential type Heap: Copy + Alloc + Default + Send + Sync;

pub struct Box<T, A: Alloc = Heap> { /* .. */ }

impl<T, A: Alloc> Box<T, A> {
    pub fn new_in(x: T, a: A) -> Self { /* .. */ }
}

// `Box<T>` means `Box<T, Heap>` thanks to the default type parameter.
impl<T> Box<T> {
    pub fn new(x: T) -> Self { Self::new_in(x, Heap::default()) }
}
```

```rust
// In `jemalloc`

#[derive(Clone, Copy, Default)]
pub struct JemallocHeap;

impl Alloc for JemallocHeap {
    fn alloc(&self, stuff: Stuff) -> Thing {
        ...
    }
}

extern existential type alloc::Heap = JemallocHeap;
```

Member

nit: This declaration wouldn't live in the jemalloc crate, but rather the downstream consumer. We want to allow people to pull in jemalloc without using it as the global allocator. (Imagine you want to wrap it in a layer of tracking or whatever.)

Contributor Author

Yes, good point, thanks @sfackler. I'll amend that example to show the third crate. I suppose that is like today, too.

```rust
// In a crate making a Rust-implemented local allocator global.

struct MyConcurrentLocalAlloc(/* .. */);

impl Alloc for MyConcurrentLocalAlloc { /* .. */ }

static GLOBALIZED_LOCAL_ALLOC: MyConcurrentLocalAlloc = MyConcurrentLocalAlloc(/* .. */);

#[derive(Clone, Copy, Default)]
struct MyConcurrentLocalAllocHeap;

impl Alloc for MyConcurrentLocalAllocHeap {
    fn alloc(&self, stuff: Stuff) -> Thing {
        // Forward every request to the single global instance.
        GLOBALIZED_LOCAL_ALLOC.alloc(stuff)
    }
}

extern existential type alloc::Heap = MyConcurrentLocalAllocHeap;
```
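The `extern existential type` syntax above does not compile today. The two pieces it relies on can still be tried in stable Rust: a zero-sized handle that forwards to a single static, and a container whose allocator parameter defaults to that handle. This is a minimal sketch under the assumption of a much-simplified, hypothetical `Alloc` trait (not `core::alloc::Allocator`). The names `CountingAlloc`, `Heap` and `Buf` are illustrative. The one thing it cannot show is the crate split, because here the default is fixed in the same crate rather than supplied downstream.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, much-simplified allocator trait (not `core::alloc::Allocator`).
pub trait Alloc {
    fn alloc(&self, size: usize) -> Vec<u8>;
}

/// Lets a borrowed allocator be passed where an owned one is expected.
impl<T: Alloc + ?Sized> Alloc for &T {
    fn alloc(&self, size: usize) -> Vec<u8> {
        (**self).alloc(size)
    }
}

/// A thread-safe "local" allocator that just counts the bytes it hands out.
pub struct CountingAlloc {
    bytes: AtomicUsize,
}

impl CountingAlloc {
    pub const fn new() -> Self {
        CountingAlloc { bytes: AtomicUsize::new(0) }
    }
    pub fn allocated(&self) -> usize {
        self.bytes.load(Ordering::Relaxed)
    }
}

impl Alloc for CountingAlloc {
    fn alloc(&self, size: usize) -> Vec<u8> {
        self.bytes.fetch_add(size, Ordering::Relaxed);
        vec![0; size]
    }
}

/// The one global instance, playing the role of `GLOBALIZED_LOCAL_ALLOC`.
pub static GLOBALIZED_LOCAL_ALLOC: CountingAlloc = CountingAlloc::new();

/// Zero-sized handle playing the role of `alloc::Heap`: it forwards to the static.
#[derive(Clone, Copy, Default)]
pub struct Heap;

impl Alloc for Heap {
    fn alloc(&self, size: usize) -> Vec<u8> {
        GLOBALIZED_LOCAL_ALLOC.alloc(size)
    }
}

/// A buffer generic over its allocator, defaulting to the global handle.
pub struct Buf<A: Alloc = Heap> {
    data: Vec<u8>,
    _alloc: A, // kept so later operations could allocate from the same source
}

impl<A: Alloc> Buf<A> {
    pub fn new_in(size: usize, alloc: A) -> Self {
        let data = alloc.alloc(size);
        Buf { data, _alloc: alloc }
    }
    pub fn len(&self) -> usize {
        self.data.len()
    }
}

impl Buf {
    /// `Buf` here is `Buf<Heap>`, so no allocator argument is needed.
    pub fn new(size: usize) -> Self {
        Self::new_in(size, Heap)
    }
}
```

`Buf::new(16)` draws from the global instance, while `Buf::new_in(8, &local)` uses a local allocator. Because `Heap` is zero-sized and statically dispatched, the default path adds no indirection beyond the call into the static.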

By defining traits for each bit of deferred functionality (`Alloc`, `Log`), we can likewise cover each of the other use-cases.
This frees the compiler and programmer to forget about the specific instances and just learn the general pattern.
This is especially important for `log`, which isn't a sysroot crate and thus isn't known to the compiler at all at the moment.
It would be very hard to justify special-casing `log` in rustc (e.g. with yet another attribute, as the other cases are solved today) when the compiler needs no knowledge of it at the moment.
As for the cost concerns with the existing techniques, no code is generated until the `extern existential type` definition is provided, similar to generic types. The mechanism therefore adds no run-time cost: uses are statically dispatched to the concrete type and can be inlined as usual.

Many of the mechanisms listed in this RFC above are on the verge of stabilization.
This RFC does not want to appear to be tying things up forever, so the design strives to be simple while still being general enough.
This ought to also be forwards compatible with the more comprehensive solutions as described in the [alternatives](#alternatives) section.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

It's best to understand this feature in terms of regular `existential type`.
Type checking when *using* a `pub extern existential type` works exactly the same way:
the type is opaque except for its bounds, and no traits can be implemented for it.
```rust
pub extern existential type Foo: Baz;
existential type Bar: Baz;
// both are used the same way in other modules
```
C and C++ programmers will also be familiar with the remote-definition aspect of this from those languages' "forward declarations" and their use in header files.

On the definition side, since it is explicitly defined (somewhere), there are no inference constraints on items in the same module as the declaration or definition.
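Stable Rust's closest analogue to this opacity is return-position `impl Trait`. The following rough sketch shows that analogue; it is not the proposed feature. A caller of `make` can use only what the `Baz` bound provides, not the hidden concrete type. `Baz`, `Hidden`, `make` and `use_it` are hypothetical names.

```rust
pub trait Baz {
    fn describe(&self) -> String;
}

mod defining {
    use super::Baz;

    /// The concrete type; callers of `make` never learn its name.
    pub struct Hidden(pub u32);

    impl Baz for Hidden {
        fn describe(&self) -> String {
            format!("hidden #{}", self.0)
        }
    }

    /// Return-position `impl Trait`: the return type is opaque except for `Baz`.
    pub fn make() -> impl Baz {
        Hidden(7)
    }
}

pub fn use_it() -> String {
    let x = defining::make();
    // `x.0` would not compile here: only the `Baz` bound is visible.
    x.describe()
}
```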

One more interesting difference is the scope in which the type is transparent versus opaque, i.e. where we can see the type's definition and where only its bounds.
Just as in C, where one gets:
```c
struct Foo;

// I know nothing about Foo

struct Foo { int a; };

// Ah now I do
```
when the `extern existential type` definition is in scope, the `pub extern existential type` becomes transparent and behaves as if the declaration and definition were put together into a normal type alias.
The defining crate decides whether downstream crates get to take advantage of this by making the definition public or not.
```rust
pub extern existential type alloc::Foo = Bar; // the big reveal
extern existential type alloc::Foo = Bar; // the tiny reveal
```
A private definition allows the item to be used (as some definition is needed), while no one downstream knows its true definition, like a regular `existential type`.
A public definition allows downstream crates to choose between staying agnostic for increased flexibility and peeking behind the veil for extra functionality
(e.g. a crate may want to require that the global allocator be jemalloc so that it can use some jemalloc-specific debug output).
Definitions are not subject to any visibility restrictions beyond those on other items.
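The difference between the "big reveal" and the "tiny reveal" can be approximated in stable Rust. A transparent `type` alias stands in for the public definition, and a function returning `impl Trait` stands in for the private one. This is only an analogy under those assumptions. `Alloc::alloc_bytes` and `JemallocHeap::debug_stats` are hypothetical names, not the real jemalloc API. Only the transparent path lets downstream code call the jemalloc-specific method.

```rust
pub trait Alloc {
    fn alloc_bytes(&self, size: usize) -> usize;
}

#[derive(Clone, Copy, Default)]
pub struct JemallocHeap;

impl Alloc for JemallocHeap {
    fn alloc_bytes(&self, size: usize) -> usize {
        size // placeholder: pretend `size` bytes were allocated
    }
}

impl JemallocHeap {
    /// Hypothetical jemalloc-specific extra, not part of `Alloc`.
    pub fn debug_stats(&self) -> &'static str {
        "jemalloc stats"
    }
}

/// "Big reveal": a transparent alias, so downstream sees `JemallocHeap` itself.
pub mod public_definition {
    pub type Heap = super::JemallocHeap;
}

/// "Tiny reveal": downstream only sees something implementing `Alloc + Copy`.
pub mod private_definition {
    pub fn heap() -> impl super::Alloc + Copy {
        super::JemallocHeap
    }
}
```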

Only one crate in the build plan can define the `pub extern existential type`.
Unlike the trait system, there are no orphan restrictions that ensure crates can always be composed:
any crate is free to define the `pub extern existential type`, as long as it isn't used together with another crate that also does. Such a violation is only caught when building a crate that depends on both (or if one of the crates depends on the other).

As mentioned in the introduction, code generation can be reasoned about by comparison with generics and inlining.
We cannot generate code for generic items until they are instantiated.
Likewise, imagine that everything that uses a `pub extern existential type` gets an extra parameter,
and then, when the matching `extern existential type <path> = ...` definition is found, we go back and eliminate that parameter by substituting the actual definition.
Only then can we generate code.
This is why from the root crate of compilation (the binary, static library, or other such "final" component), the dependency closure must contain an `extern existential type` for every `pub extern existential type` that's actually used.
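The "extra parameter" mental model can be written out explicitly with ordinary generics. In the following sketch, the library code is generic over a hypothetical `HeapLike` bound. The root module picks the concrete type once through a type alias, and only that instantiation is monomorphized. The modules `library` and `root` merely stand in for separate crates; all names are illustrative.

```rust
/// Hypothetical bound standing in for `Copy + Alloc + Default + ...`.
pub trait HeapLike: Copy + Default {
    fn name(&self) -> &'static str;
}

/// "Library" code: instead of naming the deferred type directly, each item
/// that would mention it takes it as an extra type parameter `H`.
pub mod library {
    use super::HeapLike;

    pub fn describe_alloc<H: HeapLike>(size: usize) -> String {
        format!("{} bytes from {}", size, H::default().name())
    }
}

/// "Root crate": the single place where the concrete type is chosen.
pub mod root {
    use super::HeapLike;

    #[derive(Clone, Copy, Default)]
    pub struct SystemHeap;

    impl HeapLike for SystemHeap {
        fn name(&self) -> &'static str {
            "system"
        }
    }

    // Plays the role of `extern existential type alloc::Heap = SystemHeap;`.
    pub type Heap = SystemHeap;

    pub fn run() -> String {
        // Only at this instantiation does `describe_alloc` get monomorphized.
        super::library::describe_alloc::<Heap>(64)
    }
}
```

The proposed feature would make this parameter implicit, so library code would not have to thread `H` through every signature.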

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

`pub extern existential type` can also be formally defined in reference to `existential type`.
As explained in the guide-level explanation,
```rust
(pub <scope>)? extern existential type <name>: <bounds>;
```
creates an existential type alias that behaves just like a `use`d `existential type <name>: <bounds>` defined in another module, so it is opaque, while
```rust
(pub <scope>)? extern existential type <path> = <type>;
```
reveals the definition of the existential type alias at `path` as if it was a regular type alias.

There is a post-hoc coherence rule that every used `pub extern existential type` has exactly one `extern existential type` definition within the entire build plan of crates.
"Used" here can be roughly defined by tracing all identifiers through their bindings, but should make intuitive sense.

Nothing prevents a private (non-`pub`) `extern existential type` declaration, or an `extern existential type` definition in the same module as its declaration.
Both these situations make the feature useless and could be linted, but are well defined from the rules above.
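The post-hoc coherence rule amounts to a simple whole-build-plan check. This sketch assumes a hypothetical per-crate summary (`CrateInfo`) that some tool has already collected. It is the kind of check a build tool such as Cargo could run, not an existing rustc or Cargo API. For every declaration that is actually used, it reports a missing definition or a definition supplied by more than one crate.

```rust
use std::collections::HashMap;

/// Hypothetical per-crate summary a build tool might collect.
pub struct CrateInfo {
    pub name: &'static str,
    /// Paths this crate defines via `extern existential type <path> = ...`.
    pub defines: Vec<&'static str>,
    /// Declared paths (`pub extern existential type`) this crate actually uses.
    pub uses: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
pub enum CoherenceError {
    /// A used declaration has no definition anywhere in the build plan.
    Missing(&'static str),
    /// A used declaration is defined by more than one crate.
    Duplicate(&'static str, Vec<&'static str>),
}

/// Checks that every *used* declaration has exactly one definition.
pub fn check(plan: &[CrateInfo]) -> Result<(), CoherenceError> {
    // Map each defined path to the crates that define it.
    let mut definers: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for krate in plan {
        for &path in &krate.defines {
            definers.entry(path).or_default().push(krate.name);
        }
    }
    for krate in plan {
        for &path in &krate.uses {
            match definers.get(path) {
                None => return Err(CoherenceError::Missing(path)),
                Some(crates) if crates.len() == 1 => {}
                Some(crates) => return Err(CoherenceError::Duplicate(path, crates.clone())),
            }
        }
    }
    Ok(())
}
```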

# Drawbacks
[drawbacks]: #drawbacks

The fact that not all crates can compose with this, due to duplicate or missing definitions per declaration, is not very nice.
However, this is exactly like "lang items" and the annotations that exist for this purpose today, so it is nothing worse than what's currently about to be stabilized.
There are no natural orphan rules for this feature (alternatively, regular `existential type` can be seen as this feature with the orphan rule that it must be defined in the same module), so this is expected.
See the first alternative for how we can use Cargo to ameliorate this.

Niko Matsakis has expressed concerns about this being abused because singletons are bad.
Singletons are indeed bad, but the connection between existential types and singletons is not obvious at first sight (unlike if we had a deferred-definition mechanism for `static`s directly), which will hopefully make this sufficiently difficult to abuse.
Even if we deem this very toxic, it is better for all the use cases listed above to be white-listed and use the same mechanism for consistency (one that is cost-free at run time) than to use a bunch of separate solutions.
Also, by forcing the use of a trait in the bounds of the `extern existential type`, we hopefully nudge the user in the direction of providing a non-singleton-based way of accomplishing the same task (e.g. local allocators in addition to the global allocator).

Stabilization of many annotations and APIs called out in the [motivation](#motivation) section is imminent, and yes this would delay that a bit if we opted to do this and then rewrite those APIs to use it.

As per the [prior art](#prior-art) section, something like Haskell's backpack is wholly superior.
But as stabilization of the status quo is imminent, I wanted to pick something easier to implement and closer to existing rust features mentally/pedagogically.

# Rationale and alternatives
[alternatives]: #alternatives

- We can additionally mandate that `Cargo.toml` include all `extern existential type` declarations and definitions, and Cargo reject any build plan where they don't match 1-1.
This ameliorates the crate composition issue in practice for the vast majority of users using Cargo (even just `Cargo.toml`s).

- Of course, we can always do nothing and just keep the grab bag of ad-hoc solutions we have today, leaving `log` with just an imperative, dynamic solution.

- We could continue special-casing and white-listing in the compiler the use-cases I give in the motivation, but at least use the same sort of annotation for all of them for consistency.
But that still requires leaving out `log`, or special casing it for the first time.
Member

I don't think we'd want to use this for log at all. Runtime configuration of loggers is pervasive.

Contributor Author

Who does that run-time configuration? The libraries logging or the end application?

As much as I agree that singletons are bad, I imagine a few as-yet-unforeseen use cases for this arising (e.g. peripherals for bare-metal programming, which are morally singletons).
So that leaves other special-cases we would need to add in the future.

- I mentioned that just doing this would delay stabilization.
But, we could also retrofit those annotations as desugaring into this feature so as not to delay it.
This keeps the cruft around in the compiler forever, but at least we can deprecate the annotations in the next edition.
I don't think it's worth bending over backwards for something that is still unstable, and I consider it unwise to "whiplash" the ecosystem by telling people to use one stable thing and then immediately another. Still, for those who really want to stabilize things now, this is an option.

- In many cases, the `extern existential type` would just be a ZST proxy used in a default argument.
If we could add default arguments to existing type parameters, then the original items wouldn't need an abstract stand-in.
@eddyb and others have thought very hard about this approach for many years, and it doesn't seem possible, however.

See the [prior art](#prior-art) section below for context on the last two.

- We couldn't do exactly ML's functors for this problem, because people both want to import `std` without passing in a global allocator, yet also be able to use `std` with different global allocators.

- I opted against proposing exactly Haskell's Backpack because of the perceived time pressure mentioned above, but it's straightforward to imagine `extern existential type` being convenient sugar for some more general parameterized module system, similar to `impl Trait` and regular `existential type`.

# Prior art
[prior-art]: #prior-art

The basic idea comes from the "functors" of the ML family of languages, where a module is given explicit parameters, like
```rust
mod foo<...> { ... }
```
in Rust syntax, and then those modules can be applied like functions.
```rust
mod bar = foo<...>;
```
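ML functors have no direct Rust counterpart. A loose emulation uses a generic type over a "signature" trait, where applying the functor means naming a concrete instantiation. The sketch below illustrates that emulation under that assumption; `HeapSig`, `Foo`, `Jemalloc`, `System`, `Bar` and `Qux` are hypothetical names. Unlike the single-instantiation rule of this RFC, the "module" can be applied more than once with different arguments.

```rust
use std::marker::PhantomData;

/// A "module signature": what the parameterized module needs from its argument.
pub trait HeapSig {
    const NAME: &'static str;
}

/// A functor-like "module" `foo<H>`: its items are associated functions generic over `H`.
pub struct Foo<H: HeapSig>(PhantomData<H>);

impl<H: HeapSig> Foo<H> {
    pub fn which_heap() -> &'static str {
        H::NAME
    }
}

pub struct Jemalloc;
impl HeapSig for Jemalloc {
    const NAME: &'static str = "jemalloc";
}

pub struct System;
impl HeapSig for System {
    const NAME: &'static str = "system";
}

// Roughly `mod bar = foo<Jemalloc>;`, plus a second application with another argument.
pub type Bar = Foo<Jemalloc>;
pub type Qux = Foo<System>;
```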

More appropriate is Haskell's new [backpack](https://plv.mpi-sws.org/backpack/) module system, where the parameterization is not explicit in the code (`use`d modules may be resolved or just module signatures, in which case they act as parameters), and Cabal (the Cargo equivalent), auto-applies everything.
This would work for Rust, and in fact is wholly better:

- It is more expressive because modules can be applied multiple times like ML and unlike this.

- There is still no syntactic overhead of manual applications at use sites, like this and unlike ML.

- Cabal, with its knowledge of who needs what, can still complain early if something would be defined twice or if two different instantiations do not unify as a downstream crate needs, like the first alternative.

[That latter problem cannot arise here under the single-instantiation rule.]

# Unresolved questions
[unresolved]: #unresolved-questions

- The exact syntax. "existential" is a temporary stand-in from [#2071](https://github.com/rust-lang/rfcs/pull/2071), which I just use here for consistency. I personally prefer "abstract" FWIW.

- Should Cargo have some knowledge of `extern abstract type` declarations and definitions from the get-go so it can catch invalid build plans early?