RFC: Existential types with external definition #2492

Open
wants to merge 12 commits into
base: master
from

@Ericson2314
Contributor

Ericson2314 commented Jul 2, 2018

Rendered

An extension of #2071's existential type where the definition can live in a different crate than the declaration, rather than the same module. This is a crucial tool for untangling dependencies within std and other libraries at the root of the ecosystem concerning global resources.

In particular, I hope that if we do this before #2480, we can stabilize an alloc crate whose notions of various global resources do not require copious amounts of special attention from the compiler or incur any run-time cost. I've purposefully picked a more light-weight solution to achieve that goal while minimally delaying alloc's stabilization and minimally deviating from existing/planned Rust features.

@Centril Centril added the T-lang label Jul 2, 2018

@Ericson2314

Contributor Author

Ericson2314 commented Jul 2, 2018

@crlf0710 Thanks, copied to the OP disassociated from the specific commit.

@RalfJung

Member

RalfJung commented Jul 2, 2018

In the annotation case, there's essentially an extra extern { fn special_name(..); } whose definition the annotation generates. This isn't easily inlined outside of LTO, and even then would prohibit rustc's own optimizations from taking effect.

So if I understand this proposal correctly, it avoids that problem by essentially not generating any code until the instantiation of the existential appears in the crate tree? Every function becomes implicitly generic over whatever instance is picked for whatever existentials are still "open"? That seems like a huge change in the way code is generated, which is why I am surprised to not see it discussed further.

If that's not what happens, then I do not understand how you plan to actually implement this proposal.

@Ericson2314

Contributor Author

Ericson2314 commented Jul 2, 2018

@RalfJung Yes, exactly. I didn't discuss it further because that's what generics do, so I figured it wasn't very novel, but I'm happy to add more text.

[FWIW, eventually we might be able to do a bound like Size<size = 0, align = 0> for extern existential ZST proxy types like Heap, in which case we can generate code again. But honestly, caching e.g. std + jemalloc separately from every project that uses it should be good enough.]
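The "implicitly generic" reading confirmed above can be approximated in today's Rust with an explicit type parameter. This is only a minimal sketch: the `Heap` trait, `SystemHeap`, and the function names are hypothetical stand-ins, and the proposed `extern existential type` syntax is not used because it does not exist in the language.

```rust
// Hypothetical stand-in for the trait bounding an extern existential type.
pub trait Heap {
    /// Name of the allocator; stands in for real allocation entry points.
    const NAME: &'static str;
    fn alloc_bytes(n: usize) -> Vec<u8>;
}

// "Upstream library" code. Under the proposal it would mention the existential
// directly; here the open choice is modelled as the explicit parameter `H`.
// No machine code can be emitted for this function until `H` is known.
pub fn make_buffer<H: Heap>(n: usize) -> (Vec<u8>, &'static str) {
    (H::alloc_bytes(n), H::NAME)
}

// "Defining crate": picks the instance, playing the role that a definition of
// the extern existential type would play in the proposal.
pub struct SystemHeap;

impl Heap for SystemHeap {
    const NAME: &'static str = "system";
    fn alloc_bytes(n: usize) -> Vec<u8> {
        vec![0u8; n]
    }
}

// "Leaf crate": the parameter is substituted and the code is monomorphized.
pub fn leaf_buffer(n: usize) -> (Vec<u8>, &'static str) {
    make_buffer::<SystemHeap>(n)
}
```

The difference from the proposal is that here every intermediate function has to thread `H` through by hand, whereas the RFC would have the compiler do that threading implicitly.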

@eddyb

Member

eddyb commented Jul 2, 2018

@RalfJung "how code is generated" is the simplest part, the type-checking side of things is way more involved. It's very similar to a parametrized mod, with everything inside inheriting its parameters.

@rkruppe
Member

rkruppe left a comment

I have doubts about two of the motivating examples


- [`core::alloc::GlobalAlloc`](https://doc.rust-lang.org/nightly/core/alloc/trait.GlobalAlloc.html), chosen with [`#[global_allocator]`](https://doc.rust-lang.org/1.23.0/unstable-book/language-features/global-allocator.html)
- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md)
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html)

@rkruppe

rkruppe Jul 2, 2018

Member

The OOM hook can be changed repeatedly at run time. I don't know where (if anywhere) this ability is used, but at least it's not obvious that we even can replace the OOM hook with a static singleton. Deciding that requires wading into the details of OOM handling, which is probably out of scope for this RFC.

(There is of course the option of building a singleton with mutable state that provides exactly the current API, but if that would be used widely, many of the purported benefits evaporate.)
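A minimal sketch of the "singleton with mutable state" option, assuming a simplified hook signature (the real hook takes a `Layout`, and std stores it differently). All names here are hypothetical.

```rust
use std::sync::Mutex;

// Hypothetical, simplified hook signature.
pub type Hook = fn(usize) -> String;

fn default_hook(size: usize) -> String {
    format!("memory allocation of {size} bytes failed")
}

// The singleton's *type* is fixed statically, but its contents stay mutable,
// so it can offer the same set/take API as a run-time hook.
pub struct MutableOomHandler {
    hook: Mutex<Hook>,
}

impl MutableOomHandler {
    pub const fn new() -> Self {
        MutableOomHandler { hook: Mutex::new(default_hook as Hook) }
    }

    pub fn set_hook(&self, hook: Hook) {
        *self.hook.lock().unwrap() = hook;
    }

    /// Removes the current hook, restoring the default, and returns it.
    pub fn take_hook(&self) -> Hook {
        std::mem::replace(&mut *self.hook.lock().unwrap(), default_hook as Hook)
    }

    pub fn handle(&self, size: usize) -> String {
        // Copy the function pointer out so the lock is not held during the call.
        let hook = *self.hook.lock().unwrap();
        hook(size)
    }
}

pub static OOM: MutableOomHandler = MutableOomHandler::new();
```

Every call to `handle` still takes a lock and makes an indirect call, which is the sense in which the static-dispatch benefit evaporates if this pattern is what most users end up choosing.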

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

I should link the alloc lang item that exists right now. All this is just machinery over oom_impl, exposed in alloc::alloc::handle_alloc_error.

@rkruppe

rkruppe Jul 2, 2018

Member

I don't care much about existing implementation details here, but about the public (if unstable) API provided. If the public API we end up providing in alloc is a runtime-settable hook instead of something statically dispatched for whatever reasons, then it simply isn't very relevant to this RFC (though this RFC, if accepted, would be one way to implement that hook). I am sympathetic to wanting static dispatch by default and letting those who need runtime-varying behavior implement a hook themselves, but again, the details of how OOM handling ought to be done are controversial and out of scope for this RFC, so IMO the RFC text is being over-eager by saying this feature would obsolete the hook in its current form.

@Ericson2314

Ericson2314 Jul 5, 2018

Author Contributor

N.B. per rust-lang/rust#51607 (comment) we might be changing to a static hook anyways.

- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md)
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html)
- [`std::collections::hash_map::RandomState`](https://doc.rust-lang.org/std/collections/hash_map/struct.RandomState.html), if https://github.com/rust-lang/rust/pull/51846 is merged, the `hashmap_random_keys` lang item
- [`log::Log`](https://docs.rs/log/0.4.3/log/trait.Log.html) set with [`log::set_logger`](https://docs.rs/log/0.4.3/log/fn.set_logger.html)

@rkruppe

rkruppe Jul 2, 2018

Member

Logging can require resources such as files, network connections, run-time configuration data, etc., so it's difficult to make some logger implementations into statics. In theory everything could be initialized lazily on the first logging call (though then you haven't eliminated the overhead of dynamic dispatch!), but this is mostly just shifting the problem (and run-time costs) around, and in more complex scenarios -- e.g. when you want to read a configuration file to determine what kind of logging to do -- this can require making much more state global than is currently necessary. It also affects all other control flow surrounding logger initialization, e.g. error reporting from being unable to open a file for logging now has to be moved into the lazy-initialization code.

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

Not every case needs initialization. An example is serial port logging for embedded.

For the logging case there's these options:

  1. Just as local allocation exists, there should be *_in variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.

  2. The case where one really wants a single xxx, such that a static is nice because any passed pointer / value is just overhead, but also wants to be sure the xxx is initialized first, is heavily explored by @japaric and the rest of the working group. In general, something still needs to be passed around, but it can just be a ZST "token" indicating initialization is complete. So this is just a riff off the above. My stuff still helps if you want to be monomorphic over the token type.

  3. Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like lazy_static + my stuff can make this more ergonomic.

  4. Lazy init as you mention. Actually lazy_static crate + my stuff makes this decently ergonomic.

So overall I think mine is still valuable for not needing to cfg around the no-std case, by offering a better separation of concerns between functionality and its "singletonness", while inciting one to make the singleton aspect optional.
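Options 1 and 2 from the list above can be sketched as follows. This is a rough illustration only: the `Logger` trait, `info_in`, and `LoggerReady` are hypothetical names, not the `log` crate's API, and the global sink is a plain `Mutex<Vec<String>>` to keep the block self-contained.

```rust
use std::fmt::Write as _;
use std::sync::Mutex;

// Hypothetical minimal logger trait.
pub trait Logger {
    fn log(&mut self, msg: &str);
}

pub struct BufferLogger(pub String);

impl Logger for BufferLogger {
    fn log(&mut self, msg: &str) {
        let _ = writeln!(self.0, "INFO {msg}");
    }
}

// Option 1: an `_in`-style entry point that takes the logger explicitly,
// so no singleton is involved at all.
pub fn info_in<L: Logger>(logger: &mut L, msg: &str) {
    logger.log(msg);
}

// Option 2: a global sink guarded by a zero-sized "initialized" token.
static GLOBAL_SINK: Mutex<Vec<String>> = Mutex::new(Vec::new());

// The private field means other modules can only obtain a token by
// calling `init_global_logger`.
#[derive(Clone, Copy)]
pub struct LoggerReady(());

pub fn init_global_logger() -> LoggerReady {
    GLOBAL_SINK.lock().unwrap().clear();
    LoggerReady(())
}

// Requiring the token proves at compile time that initialization happened.
pub fn info(_ready: LoggerReady, msg: &str) {
    GLOBAL_SINK.lock().unwrap().push(format!("INFO {msg}"));
}

pub fn global_lines() -> Vec<String> {
    GLOBAL_SINK.lock().unwrap().clone()
}
```

The token costs nothing at run time because it has size zero, but it still has to be threaded through call sites, which is the trade-off the comment describes.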

@rkruppe

rkruppe Jul 2, 2018

Member

Not every case needs initialization. An example is serial port logging for embedded.

Yes, obviously.

Just as local allocation exists, there should be *_in variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.

While in theory I commend getting rid of global state, the fact is that such an API for logging is entirely hypothetical while the "global logger" API is pervasive in the ecosystem. If it's a huge pain to replace the current global singleton with a different mechanism for installing the same global singleton, everyone who (for whatever reasons) uses the global singleton will rightly prefer the current way of installing it over what this RFC presents.

Besides, this argument cuts both ways: all the code that avoids global singletons doesn't need this RFC either.

Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like lazy_static + my stuff can make this more ergonomic.

As with the OOM hook, the question is how commonly this would be done. If it's quite common, the benefit of external existential traits for this use case is diminished.

Lazy init as you mention. Actually lazy_static crate + my stuff makes this decently ergonomic.

lazy_init is nice but it doesn't help you get the data needed for initialization (e.g., user configuration) into the block doing the initialization, as it can only reference other globals, not e.g. a local binding created in main. That's what I meant by requiring more state to be made global.


None of this is a fatal objection to using external existential types for logging. I am just saying that for some quite common use cases, it won't have all the claimed benefits and indeed some downsides as well.

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

What downsides? The worst case of manual initialization forced by effects and user config works exactly like today.

@rkruppe

rkruppe Jul 2, 2018

Member

The downside is that any logger implementation that needs it has to separately go through the work of (and accept the risk of bugs in) emulating today's behavior. (Plus the churn of changing APIs.)

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

@rkruppe liblog can come with a ManuallyInitializedLogger<L> that does all that for you.
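A rough sketch of what such a `ManuallyInitializedLogger<L>` wrapper might look like, assuming a hypothetical `Logger` trait and using `std::sync::OnceLock` for the one-time initialization. Nothing here is taken from the `log` crate.

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical logger trait; not the `log` crate's `Log` trait.
pub trait Logger {
    fn log(&self, msg: &str);
}

// Statically dispatched wrapper whose inner logger is installed at run time.
pub struct ManuallyInitializedLogger<L> {
    inner: OnceLock<L>,
}

impl<L> ManuallyInitializedLogger<L> {
    pub const fn new() -> Self {
        Self { inner: OnceLock::new() }
    }

    /// Installs the logger; returns it back as `Err` if one is already set.
    pub fn init(&self, logger: L) -> Result<(), L> {
        self.inner.set(logger)
    }

    pub fn get(&self) -> Option<&L> {
        self.inner.get()
    }
}

impl<L: Logger> Logger for ManuallyInitializedLogger<L> {
    fn log(&self, msg: &str) {
        // Messages logged before `init` are silently dropped in this sketch.
        if let Some(logger) = self.inner.get() {
            logger.log(msg);
        }
    }
}

// Example logger configured from run-time data (the prefix).
pub struct PrefixLogger {
    pub prefix: String,
    pub lines: Mutex<Vec<String>>,
}

impl PrefixLogger {
    pub fn new(prefix: &str) -> Self {
        PrefixLogger { prefix: prefix.to_string(), lines: Mutex::new(Vec::new()) }
    }
}

impl Logger for PrefixLogger {
    fn log(&self, msg: &str) {
        self.lines.lock().unwrap().push(format!("{}: {}", self.prefix, msg));
    }
}

pub static LOGGER: ManuallyInitializedLogger<PrefixLogger> =
    ManuallyInitializedLogger::new();
```

Because `init` is an ordinary method call, `main` can pass in values computed from local bindings such as parsed user configuration, which lazy initialization from a static initializer cannot do.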

@rkruppe

rkruppe Jul 2, 2018

Member

Hm, true. It's still not clear to me that a significant portion of liblog users would do anything different, but let's leave that discussion at #2492 (comment)

@rkruppe

Member

rkruppe commented Jul 2, 2018

I agree with @eddyb that the codegen aspect is relatively simple conceptually and in implementation. However, applying this feature to pervasive concerns such as panicking and memory allocations will make a staggering amount of currently monomorphic (or generic but monomorphized in libraries rather than leaf crates) code generic and only monomorphized in leaf crates, which has practically the same impact as MIR-only rlibs: virtually no LLVM IR or machine code is generated while compiling libraries, most of it will be monomorphized and codegen'd only when reaching the leaf crates.

While MIR-only libs are very desirable for a number of reasons (see the linked issue), there are also good reasons why we still don't have that feature: it significantly regresses wall-clock build times. Further experiments in this direction are considered blocked by parallelizing rustc, which to my knowledge is being pushed forward but probably still a far cry from being turned on by default, let alone being effective enough to offset the downsides of MIR-only rlibs. This should be taken into account when estimating how quickly this feature can be landed and applied.

@jethrogb
Contributor

jethrogb left a comment

This looks great to me.

NB. existential type is defined in RFC 2071.

Only one crate in the build plan can define the `pub extern existential type`.
Unlike the trait system, there are no orphan restrictions that ensure crates can always be composed:
any crate is free to define the `pub extern existential type`, as long as it isn't used with another that also does, in which case the violation will only be caught when building a crate that depends on both (or if one of the crates depends on the other).
This is not very nice, but exactly like "lang items" and the annotations that exist for this purpose today,

@jethrogb

jethrogb Jul 2, 2018

Contributor

This sentence and the rest of this paragraph shouldn't be in the "guide-level explanation"

As mentioned in the introduction, code gen can be reasoned about by comparing with generics and inlining.
We cannot generate code for generic items until they are instantiated.
Likewise, imagine that everything that uses a `pub extern existential type` gets an extra parameter,
and then when the `impl pub extern existential type` is defined, we go back and eliminate that parameter by substituting the actual definition.

@jethrogb

jethrogb Jul 2, 2018

Contributor

What is impl pub extern existential type?

@Ericson2314

Contributor Author

Ericson2314 commented Jul 2, 2018

@rkruppe It need not be pushed all the way to the leaf crates. We can have a std+jemalloc rlib in the sysroot, for example. I'd hope to eventually automate that sort of stuff with Cargo's help too.

@rkruppe

Member

rkruppe commented Jul 2, 2018

@Ericson2314

It need not be pushed all the way to the leaf crates. We can have a std+jemalloc rlib in the sysroot for example.

First off, you've not laid out any monomorphization strategy, but this crucially depends on how and when we monomorphize. If I had to guess, I'd say you are suggesting something like:

  • when a crate defines an extern existential, it also immediately codegens all code from upstream crates that are monomorphic after substituting the now-defined existentials
    • note that this gets more complex when multiple external existentials defined over multiple crates are involved (e.g. consider three crates: A defines the global allocator, B defines the panic implementation, C panics and allocates memory)
  • when a crate using an external existential is compiled, it eagerly checks its transitive dependencies to see which ones are already defined and monomorphizes accordingly

There are some problems with that:

  • providing such library as part of the rust distribution only helps monomorphize the standard library code earlier than in the leaf crates, it doesn't do anything for the (vastly larger) third party ecosystem (edit: this is not quite true as stated here, and most of what is true about it was already stated in the other points below)
  • this hypothetical std+jemalloc rlib would have to be actually linked (transitively) by every crate that wants to benefit from it...
  • ...and doing so in a non-leaf crate makes the non-leaf crate impossible to use with any other choice of global allocator, so it's unreasonable for libraries to use that manually (but libraries are precisely those that need to use it to avoid monomorphizing everything in leaf crates)
    • injecting these crates "automatically" would fix this problem, but that is completely hypothetical
  • more generally, the effectiveness of such hacks is highly dependent on the shape of the dependency graph, as one crate fixing some externals can't affect its siblings, only upstream and downstream crates
    • and since upstream crates only get monomorphized at the point where the existentials are defined, defining them relatively late (in a crate with) still has the same effect as MIR-only rlibs within that subgraph of the dependency graph -- it doesn't have to be the leaf crate, but there's still a trade off
  • if we accepted "throw in a crate that prematurely defines some external existentials somewhere in the middle of your dependency graph" as a workaround for build time regressions, we incentivize such hacks in the crates.io ecosystem as well and that sounds extremely unappealing, even actively harmful (since such crates restrict possible uses of everything that transitively depends on them)

I'd hope to eventually automate that sort of stuff with Cargo's help too.

"Eventually", hopefully this problem disappears entirely as MIR-only rlibs become feasible in general. The issue is whether we're willing to eat massive regressions in the mean time.
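The three-crate scenario from the guessed strategy (A defines the allocator, B defines the panic implementation, C uses both) can be modelled with modules and explicit type parameters. This is a sketch under assumptions: the traits and type names are hypothetical, and modules stand in for crates.

```rust
// Hypothetical traits standing in for two extern existentials declared upstream.
pub trait Heap {
    fn name() -> &'static str;
}
pub trait PanicImpl {
    fn name() -> &'static str;
}

// "Crate C": panics and allocates, so it is open in both choices.
pub mod crate_c {
    use super::{Heap, PanicImpl};

    pub fn describe<H: Heap, P: PanicImpl>() -> String {
        format!("alloc={}, panic={}", H::name(), P::name())
    }
}

// "Crate A": fixes only the allocator. Code from C that it reaches is still
// open in the panic implementation, so it cannot be fully codegen'd here.
pub mod crate_a {
    use super::{Heap, PanicImpl};

    pub struct Jemalloc;

    impl Heap for Jemalloc {
        fn name() -> &'static str {
            "jemalloc"
        }
    }

    pub fn describe_partial<P: PanicImpl>() -> String {
        super::crate_c::describe::<Jemalloc, P>()
    }
}

// "Crate B": fixes only the panic implementation.
pub mod crate_b {
    use super::PanicImpl;

    pub struct Abort;

    impl PanicImpl for Abort {
        fn name() -> &'static str {
            "abort"
        }
    }
}

// Leaf: the first point where both choices are known and C can be monomorphized.
pub fn leaf() -> String {
    crate_a::describe_partial::<crate_b::Abort>()
}
```

In this model, C's code is only instantiated in `leaf`, which mirrors the concern that monomorphization tends to slide towards whichever crate sees the last definition.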

- Of course, we can always do nothing and just keep the grab bag of ad-hoc solutions we have today, and leave log with just an imperative dynamic solution.

- We could continue special-casing and white-listing in the compiler the use-cases I give in the motivation, but at least use the same sort of annotation for all of them for consistency.
But that still requires leaving out `log`, or special casing it for the first time.

@sfackler

sfackler Jul 2, 2018

Member

I don't think we'd want to use this for log at all. Runtime configuration of loggers is pervasive.

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

Who does that run-time configuration? The libraries logging or the end application?

}
}
extern existential type alloc::Heap = JemallocHeap;

@sfackler

sfackler Jul 2, 2018

Member

nit: This declaration wouldn't live in the jemalloc crate, but rather the downstream consumer. We want to allow people to pull in jemalloc without using it as the global allocator. (Imagine you want to wrap it in a layer of tracking or whatever)

@Ericson2314

Ericson2314 Jul 2, 2018

Author Contributor

Yes good point thanks @sfackler. I'll amend that example to show the 3rd crate. I suppose that is like today, too.
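The "wrap it in a layer of tracking" case works today with the stable `GlobalAlloc` trait and `#[global_allocator]`, with the choice made in the downstream consumer. In this sketch, `Tracking` is a hypothetical wrapper, and `std::alloc::System` stands in for the jemalloc allocator so that the block has no external dependencies.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical tracking layer over any allocator `A`.
pub struct Tracking<A> {
    inner: A,
    allocations: AtomicUsize,
}

impl<A> Tracking<A> {
    pub const fn new(inner: A) -> Self {
        Tracking { inner, allocations: AtomicUsize::new(0) }
    }

    /// Number of allocation calls forwarded so far.
    pub fn allocations(&self) -> usize {
        self.allocations.load(Ordering::Relaxed)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Tracking<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocations.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds `GlobalAlloc::alloc`'s contract.
        unsafe { self.inner.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `self.inner` with the same layout.
        unsafe { self.inner.dealloc(ptr, layout) }
    }
}

// The consumer, not the allocator crate, decides what the global allocator is.
#[global_allocator]
static GLOBAL: Tracking<System> = Tracking::new(System);
```

Under the RFC, the last item would presumably become the definition of the extern existential in the third crate, with the allocator crate itself only exporting its type.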

@gnzlbg

Contributor

gnzlbg commented Sep 19, 2018

So another need for this feature might have popped up in libcore and libstd (https://internals.rust-lang.org/t/using-run-time-feature-detection-in-core/8419/3).

Basically, libstd currently provides run-time feature detection support, but libcore does not. However, it would be extremely convenient if libcore could use libstd's run-time feature detection support when the final binary is linked with it, requiring the user to either implement their own or use a libcore-specific implementation otherwise.

Without a solution to this problem, we cannot really use SIMD intrinsics effectively in libcore. Since libstd re-exports many libcore components, this means that libstd cannot really use SIMD effectively either.


Similar issues happen with libm, libmvec, etc. For example, floats in core could use libm when it is linked against the final binary, or some rust-specific re-implementation when it is not. The same applies to packed_simd, which could use libmvec when it is linked with the final binary, and a libmvec re-implementation (that just scalarizes vector operations and calls libm) when it is not.
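For reference, this is roughly what run-time feature detection looks like in code that can depend on libstd. It is a minimal sketch: `sum`, `sum_scalar`, and `sum_avx2` are hypothetical functions, and the AVX2 variant simply reuses the scalar loop under a `#[target_feature]` attribute rather than using intrinsics.

```rust
// Portable fallback, usable on every target.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Same loop, but compiled with AVX2 enabled so the compiler may use it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

pub fn sum(xs: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // This macro lives in std; the comment above is about core lacking it.
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: guarded by the run-time check on the line above.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

The detection macro is the piece that code in libcore cannot currently reach, which is why a link-time choice of detection provider is being suggested as another use case.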

@graydon


graydon commented Jan 12, 2019

Oppose. This is too intrusive relative to the tolerable costs of the existing not-great system.

(Fwiw we did start with a parameterized module system, but that was forever-ago and there are already enough abstraction systems to get by with. The cost of adding back another here is too high.)

@jethrogb

Contributor

jethrogb commented Jan 12, 2019

@graydon

there are already enough abstraction systems to get by with

Can you please suggest which existing abstraction system solves the specific issues mentioned in the motivation section of this RFC?

@graydon


graydon commented Jan 12, 2019

@graydon

there are already enough abstraction systems to get by with

Can you please suggest which existing abstraction system solves the specific issues mentioned in the motivation section of this RFC?

Trait objects and lang items. Extensively discussed in the doc and thread here. They work, I gather you don't think they work well enough, I'm saying I think they do and probably the replacement will work badly in different ways (as many in this thread have suggested), as well as incurring implementation and cognitive costs.

@Centril

Contributor

Centril commented Jan 12, 2019

Not saying that I necessarily agree with this proposal (haven't thought about it in a while); however, lang items are not free either and have their own cognitive and implementation costs. In fact, I'd say they have a larger cost when you add together all the lang items that would otherwise be dealt with by this proposal. Since you are fond of limits, it seems to me that the lang items approach is a bad way to manage limits, and allows for growth that is harder to understand and make deliberate choices about from an overview.

@graydon


graydon commented Jan 12, 2019

I think you have that backwards. Adding a general facility for any user to declare this sort of thing will certainly make any concerns you have about their growth and proliferation worse.

@Ericson2314

Contributor Author

Ericson2314 commented Jan 12, 2019

@graydon so the issue here is nastiness of the intersection of state and modularity. If everything was really up to me, I might embrace capability theory and the principle of least authority, and ban these global singletons (global allocator, global logger) and the ambient authority they grant.

But given that most people don't want to add that many more (type) parameters on everything, and given that at the very bottom of the software stack the physical world rears its ugly head (singletons in the form of control registers, interrupt tables, peripherals, and other unique resources), I don't see another choice.

  1. Trait objects: This uses dynamism to solve a static problem. I always reject such dynamism as papering over the issue. Life before main, extra synchronization for lazy initialization, or manual initialization all come with their own can of worms and lack of safety.

  2. "Lang items": This is saying we don't trust our users to avoid the lure of singletons and thus they must lobby with an RFC for official recognition of their problematic peripheral. I am sympathetic; developers definitely shoot themselves in the foot, and a popular, extensible but unduly singleton-ish library could beget all sorts of ecosystem problems. But the various ways to get around this feature without lang items (and new lang items take far too long for most programmers to bother waiting for) are also mine-strewn.


Let me step way back and refer to two works of writing of yours. I've read your https://graydon2.dreamwidth.org/263429.html and your concern about endless complexity in so many places. I would agree complexity and growth is very bad; in fact, I would say it is the single worst technical problem with the software ecosystem/industry! I think the key is always to abstract, abstract, abstract. Divide things into as fine "layers" or "components" as possible, and compose them with rigor. In the case of languages this means core languages and syntactic sugar. Thankfully Rust has MIR (and also Chalk). A system of IRs and embeddings breaks down complexity and ensures a global coherence of concepts.

That said, this proposal does not fit into either of these core languages. That is indeed cause for concern! I can only hope to assuage you that I very much intend this to fit into a larger story of module-language research, not unlike https://graydon2.dreamwidth.org/253769.html . Declaring abstract types with abstract scope up front is a pretty straightforward sugar over Λs (functors) in an https://people.mpi-sws.org/~rossberg/f-ing/-like manner. 1ML, which you cited, is a wonderful example of when it's nice to skip the sugar and go straight for the core forms (and also unify concepts, and then some). I'd love to see a combination of that with the phase separation which our "const eval" provides to allow "cost-free" module combination without a proliferation of superfluous syntax. (I'd consider that combination immanent / low-risk research.)

The odd parts of this are the fact that "functor crates" can't be applied multiple times, and crates include state in statics. This makes for a crudely substructural module system. The nicer version of that, again to address the inherent state of the nasty problems this tackles, I'd characterize as adventurous/uncertain yet-to-be-done research, but still something I'd expect to be worked out by the time Rust would seek to generalize this feature.


I've been musing on this line of things for a while with Rust. Get a chuckle out of me starting to write https://github.com/Ericson2314/rust-rfcs/tree/modules in late 2014 as if anything so radical could be done 6 months before 1.0! I'm trying to find something this time that 1. is very small and incremental while not being myopic, 2. "feels similar" to existing features like impl trait while chipping away at the larger modularity problem, and 3. solves an immanent problem rather than just working towards something larger, given the reduced scope of the change.

@Ericson2314

Contributor Author

Ericson2314 commented Jan 12, 2019

Let me back up @Centril more on the danger of lang items. Each lang item can seem deceptively simple: a little cordoned-off trick for a specific problem, modest in the way it presents itself and the problems it seeks to tackle. But take all the lang items together and now there's a laundry list of one-off hacks. There's nothing at all to unify them, and the way they chip off problems one by one prevents problems from piling up to the point where a larger solution can be discovered. [I'm hoping the various lang items and annotations that already exist to solve these sorts of problems can be desugared into this approach.]

Fundamentally, I'm wary of people confusing the sheer amount of information with the difficulty of learning said information. [Lots of rhetoric these days ends up framing simplicity/complexity exclusively in terms of the latter.] I maintain that what's easy to learn is heavily biased by what exists, and pedagogical methods can improve. Learning difficulty is immanent to some specific place/time and can be improved after the fact, but total information is irreducible after the fact. Lang items as a solution reek of just optimizing for the second.

@graydon


graydon commented Jan 12, 2019

@Ericson2314 I appreciate the thoughtful reply but I think the ship has sailed. The cognitive space that would be in the language for parametric modules wound up assigned to the trait system. Adding in parametric modules at this point in addition is overkill. As is adding "mini-parametric-modules", which you're doing here.

IOW I think I would be more sympathetic to approaching this by trying to define impl Trait for mod to mean something, but only if it was being done six years ago while we were deciding how to do the monomorphization system. As mentioned by many above, parametric modules in the language we have will play hell with the compilation model. That kind of thing really only works well in a uniform representation language, which rust very much turned out not to be (I wanted it to be on a variety of seams that would allow this sort of thing more easily -- I lost).

@burdges


burdges commented Jan 13, 2019

I've no opinions about the alloc case for this, but.. I worry this would be abused quite badly by the problematic micro-crating trend, meaning it'd make documentation for complex entangled crates even less comprehensible.

As an aside, adversarial crates can break static mut trait objects, so those should be avoided by anything sensitive like randomness. We've enough options for those issues afaik, but if we'd benefit from more then lighter options exist and this sounds like overkill.

@jethrogb

Contributor

jethrogb commented Jan 13, 2019

As an aside, adversarial crates can break static mut trait objects, so those should be avoided by anything sensitive like randomness.

To be clear, you're talking about the case where there's a default, an unrelated-looking crate provides an implementation (overriding that default), and there are no other crates that also provide an implementation? Or are you thinking of something else?

@Ericson2314

Contributor Author

Ericson2314 commented Jan 15, 2019

mini-parametric-modules

The restrictions this imposes (when it is viewed as sugar over full parametric modules and crates) make it cover less of the territory already covered by traits. If/when we expand this to a full ML-style module system, there's no reason we cannot also combine that with the trait system too. (Traits would be canonicalized modules/functors.)

As mentioned by many above, parametric modules in the language we have will play hell with the compilation model. That kind of thing really only works well in a uniform representation language, which rust very much turned out not to be.

I just don't believe this. Traditional batch compilation is a really, really simplistic system of incremental compilation. Nothing in the language today or with this PR prevents fancier notions of incremental compilation, which the compiler team is already working on.

@jethrogb

Contributor

jethrogb commented Jan 15, 2019

The story around defaults needs to be cleared up. You will always need a definition of these existential types, as most of the time the user doesn't care and wants “sane” defaults. The RFC should thoroughly specify how defaults work.

Here are some defaults we'd want:

  • alloc: alloc_system
  • panic: panic_unwind
  • OOM hook: std
  • std::collections::hash_map::RandomState: std
  • feature detection: stdsimd
  • log::Log: something that hooks into the existing log::init logic for backcompat

  1. One option I've seen mentioned in this thread is "cargo-awareness". This is quite vague and needs to be defined. I imagine something where each [lib] crate specifies which existential types are defined in it and which other crate should be used to provide the default. I don't know if this complexity is worth it.

  2. The default could be specified at declaration time. If there is no definition in the final artifact, the default will be used. How would this work for extern existential types where the default implementation is in a different crate than the declaration? For example for Alloc, you'd need a separate crate for the trait definition, then alloc and alloc_system can both depend on that, and alloc can depend on alloc_system (to provide the default).
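Option 2 has a loose analogue in existing Rust: a default type parameter, which must be nameable at the point of declaration. The sketch here is only an analogy with hypothetical names (`Heap`, `SystemHeap`, `ArenaHeap`, `Collection`); the defaults discussed above would be whole-program rather than per-use.

```rust
use std::marker::PhantomData;

// Hypothetical trait; in the layout described above it would live in its own
// crate so that both the declaring crate and the default provider can see it.
pub trait Heap {
    const NAME: &'static str;
}

// Default provider (playing the role of alloc_system).
pub struct SystemHeap;
impl Heap for SystemHeap {
    const NAME: &'static str = "system";
}

// An alternative a downstream user might pick instead.
pub struct ArenaHeap;
impl Heap for ArenaHeap {
    const NAME: &'static str = "arena";
}

// The declaration names its default, so the declaring code must be able to
// refer to `SystemHeap`, i.e. depend on the crate that provides it.
pub struct Collection<H: Heap = SystemHeap> {
    items: Vec<u32>,
    _heap: PhantomData<H>,
}

impl<H: Heap> Collection<H> {
    pub fn new() -> Self {
        Collection { items: Vec::new(), _heap: PhantomData }
    }

    pub fn push(&mut self, x: u32) {
        self.items.push(x);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn heap_name(&self) -> &'static str {
        H::NAME
    }
}
```

Writing `let c: Collection = Collection::new();` picks the default, while `Collection<ArenaHeap>` overrides it. The dependency from the declaration to the default provider is the same edge that forces the separate trait-definition crate in the Alloc example.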

@Ericson2314

Contributor Author

Ericson2314 commented Jan 15, 2019

How would this work for extern existential types where the default implementation is in a different crate than the declaration?

This would be a big advantage of tracking the declarations and definitions in Cargo.toml: put the "default provider" there too. I think having Cargo.toml refer to a potentially unused crate is more natural, since it is also a build-plan-solving concern, and since rustc can be nicely blind to the default if it ends up not being used (Cargo wouldn't mention it).

@crlf0710

crlf0710 commented Jan 15, 2019

@jethrogb I think a third option is to provide some mechanism for downstream crates to declare a candidate that can satisfy some upstream extern existential type. If in the final build plan there's only one such declaration, it's chosen. If there's more than one, and the user didn't explicitly make their choice, an error is raised.
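
That selection rule could be sketched roughly as below. This is only an illustration: `resolve`, `ResolveError`, and the crate names passed in are hypothetical, nothing like this exists in rustc or Cargo, and the handling of "no candidate at all" and "explicit choice that is not a candidate" is an assumption the rule above does not spell out.

```rust
/// Why no provider could be picked (hypothetical error type).
#[derive(Debug, PartialEq)]
enum ResolveError {
    /// No crate in the build plan offers a definition (assumed to be an error).
    NoCandidate,
    /// Several crates offer a definition and the user did not choose one.
    Ambiguous(Vec<String>),
    /// The user named a crate that is not among the candidates (assumed to be an error).
    UnknownChoice(String),
}

/// Pick the crate that defines an extern existential type.
/// `candidates` are the downstream crates declaring a definition;
/// `explicit` is the user's choice, if any.
fn resolve(candidates: &[&str], explicit: Option<&str>) -> Result<String, ResolveError> {
    // An explicit choice wins, as long as it names a real candidate.
    if let Some(choice) = explicit {
        return if candidates.contains(&choice) {
            Ok(choice.to_string())
        } else {
            Err(ResolveError::UnknownChoice(choice.to_string()))
        };
    }
    match candidates {
        [] => Err(ResolveError::NoCandidate),
        // Exactly one declaration in the build plan: it is chosen.
        [only] => Ok(only.to_string()),
        // More than one and no explicit choice: raise an error.
        many => Err(ResolveError::Ambiguous(
            many.iter().map(|c| c.to_string()).collect(),
        )),
    }
}

fn main() {
    println!("{:?}", resolve(&["alloc_system"], None));
    println!("{:?}", resolve(&["alloc_system", "other_alloc"], None));
    println!("{:?}", resolve(&["alloc_system", "other_alloc"], Some("other_alloc")));
}
```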

@gnzlbg

Contributor

gnzlbg commented Jan 15, 2019

The RFC should thoroughly specify how defaults work.

Right now, defaults are handled "magically" by rustc, and this RFC wouldn't change that. For the external existential types used in the standard library, rustc can continue to use magic at its own discretion, while for those used by external crates, with this RFC either the final binary contains a single definition, or the build fails.

Allowing users to "somehow" specify defaults sounds like an orthogonal problem to me and I am not sure whether that's a problem worth solving at the language level.

Maybe we could solve that at the cargo level, e.g., if the build fails because an existential type definition is missing, cargo could automatically add a meaningful crate for you to the dependency graph or something like that, but I don't think we need to solve this problem here.


My thought on the "parametric modules" issue is that what this feature provides is a "parametric root-module / program", requiring whoever monomorphizes / builds the program (e.g. main) to pass the appropriate type parameters. Even if we had parametric modules, we would still need to add features to "somehow" allow crates to declare that the root module needs to take some parameters and use them, as well as for whoever builds this root-module to pass the appropriate parameters required by all crates.

EDIT: so basically what I was trying to say here is that I think this feature and parametric modules are kind of orthogonal. For all we know, we might end up settling on the same way to pass / use parameters of the root-module that's specified in this RFC.
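
To make the "parametric root-module" reading a bit more concrete, here is a minimal sketch in today's Rust where the parameter is passed explicitly as a generic. The `Clock` trait, `elapsed_since`, and `FixedClock` are invented for illustration and are not part of the RFC; the point is only that the upstream code is written against a bound, and the root of the program picks the instance when it monomorphizes.

```rust
/// Stand-in for the bound on an extern existential type (hypothetical).
trait Clock {
    fn now_ticks() -> u64;
}

/// "Upstream crate" code: written against the bound only,
/// with no knowledge of which clock will be used.
fn elapsed_since<C: Clock>(start: u64) -> u64 {
    C::now_ticks().saturating_sub(start)
}

/// One possible definition, e.g. supplied by a test harness.
struct FixedClock;

impl Clock for FixedClock {
    fn now_ticks() -> u64 {
        1_000
    }
}

fn main() {
    // The root of the program passes the parameter; only here does
    // `elapsed_since` get monomorphized for a concrete clock.
    println!("elapsed: {}", elapsed_since::<FixedClock>(400));
}
```

With the RFC, the explicit `::<FixedClock>` would go away: the choice would be made once for the whole program rather than threaded through every generic signature.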

@Ericson2314

Contributor Author

Ericson2314 commented Jan 15, 2019

@gnzlbg well put! It's indeed useful that we can punt on defaults while it's just the standard library that uses this.

@cbeck88

cbeck88 commented Jan 28, 2019

Hi, I just want to say that as a user of Rust I'm really excited for this RFC and
I think it will make it much easier to use Rust for low-level systems programming
and embedded development.

Right now Rust doesn't give a very good way to do what I'm going to call
"link-time dependency injection".
This means, crate A does not depend on crate B, but crate B is able to
modify the behavior of crate A and inject code into it, and this happens
at build time, so there is no overhead and no moving parts required at runtime.
(I could try to give a concrete example like replacing functionality like
std::time which isn't very easy in Rust right now. In firmware applications
this is sometimes necessary, especially if you want to be able to test firmware
in a controlled environment. At the same time, this is likely to end up in the
critical path because it may be needed for logging. This is an example where I think
the best solution would be for this functionality to be provided by an existential
type (or a new lang item) with a default implementation in std that can be
swapped out. And a lang item for this seems pretty unsatisfying because time
doesn't seem like a fundamental language feature.)
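
For comparison, the global allocator is one place where Rust already offers
this kind of build-time injection, through the `#[global_allocator]`
attribute. A minimal sketch follows; `CountingAlloc`, `ALLOCATIONS`, and
`allocation_count` are names made up for this example, and the allocator
simply forwards to `std::alloc::System` while counting calls.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Allocator that counts requests and forwards them to the system allocator.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This crate, not std or alloc, decides which allocator the whole program
// uses. The choice is fixed at build time; no init call is needed at runtime.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box keeps the optimizer from removing the allocation.
    let v = std::hint::black_box(vec![1u8, 2, 3]);
    println!("allocations observed: {}", allocation_count() - before);
    drop(v);
}
```

This works only because the compiler knows about that one attribute; the
RFC would let libraries declare such customization points themselves.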

This RFC will help to fix this gap in Rust, and the solution will be more sane
than the solutions in C and C++. In this RFC, you are allowed to swap out the default
implementation of some function at build time, but only if it was marked by the
library author as an existential type, which means that they anticipate that it
will be swapped out and have considered the implications of that for their
library. In C and C++, any symbols can be interposed at link time whether
the author intended it or not, and there are no real sanity checks around any of that.


If you don't have something like this RFC, then the alternatives for writing
configurable low-level crates and breaking circular dependencies are:

1.) Do runtime dependency injection instead
2.) Try to use features for this
3.) Build your crates as dylibs and try to use shared library interposition
to shadow symbols

Option 1 means roughly this:

  • Crate A contains a static mutable function pointer which points to
    some default implementation, and an explicit initialization function is provided
    which B can call at program startup to swap in a different implementation.
    (In Rust it might instead look like, there is a static mut Box<T> where T
    is some trait, and probably some mutex or atomic variables around it.)

This is error prone -- someone has to make sure that each target actually calls
this initialization routine correctly. Rust has no life-before-main or static
constructors where this initialization call could be placed, and it's probably
for the better. It's always better to have a crate with no explicit
initialization required if at all possible.

Additionally, there are tricky systems problems here with mutable global state,
and swapping out function pointers (or trait objects) safely at runtime like
this can be sketchy. It's hard to be sure that no one began using the function
pointers before they were set to the final value, and there are challenges
introduced if there are multiple threads in the program.

Additionally, using runtime dependency injection instead of link-time dependency
injection means that we end up with an indirect jump instead of a direct jump.
If the program must be compiled with Spectre / Meltdown mitigations enabled,
then an indirect jump may be several times slower, even if using retpolines.
So if the code being injected is some performance-critical I/O routines, then
this approach may have measurable and serious consequences for
performance-critical systems.
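
A minimal sketch of what Option 1 tends to look like. Instead of a raw
`static mut`, it uses `std::sync::OnceLock` to hold the function pointer,
which is a substitution on my part; `TimeFn`, `set_time_hook`, `now`,
`default_time`, and `fake_time` are all hypothetical names.

```rust
use std::sync::OnceLock;

/// Signature of the injectable function.
type TimeFn = fn() -> u64;

/// Crate A's built-in implementation.
fn default_time() -> u64 {
    0
}

/// The global slot that holds the chosen implementation.
static TIME_HOOK: OnceLock<TimeFn> = OnceLock::new();

/// Crate B must call this at startup, before anyone calls `now`.
/// Returns false if an implementation was already fixed.
pub fn set_time_hook(f: TimeFn) -> bool {
    TIME_HOOK.set(f).is_ok()
}

/// Crate A's public API. The first call locks in the default if no
/// hook has been installed yet.
pub fn now() -> u64 {
    let f = *TIME_HOOK.get_or_init(|| default_time as TimeFn);
    // Indirect call through a function pointer loaded at runtime.
    f()
}

/// Replacement implementation provided by crate B.
fn fake_time() -> u64 {
    42
}

fn main() {
    // Every binary has to remember to do this before the first `now()`.
    set_time_hook(fake_time);
    println!("now() = {}", now());
}
```

Both problems above are visible here: if any code calls `now()` before
`set_time_hook`, the default is silently locked in and the later call
returns false, and every call to `now()` goes through a pointer that the
compiler cannot see through.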

Option 2 means roughly:

  • Crate A implementation must know about all possible alternate implementations,
    and expose features that select them appropriately
  • Crate B depends on Crate A with the correct feature selected

This is really difficult in practice because any other crate that depends on A
will also have to have default-features = false or some similar thing in its
Cargo.toml, and it also fundamentally requires patching A to change its behavior.
In a large project this basically means we have to patch the whole world to change
the behavior of A, and in a C/C++ project the analogous thing would be introducing
ifdefs in order to try to solve the problem.

Option 2 is basically a non-solution. If I'm developing Crate A as an open-source
project, I want to be able to give people customization points that they can easily
use to make my crate behave differently, in ways that I didn't anticipate in advance.
Option 2 basically means everyone must patch my crate to get it to work the way
they want.
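
For illustration, a sketch of crate A under Option 2. The feature name
`fake-clock` and the function `now_secs` are hypothetical; the feature
would also have to be declared under `[features]` in crate A's Cargo.toml.

```rust
// Crate A (sketch). Every alternative backend has to be listed here,
// inside crate A itself, behind a feature chosen in advance.

/// Backend used when some crate in the graph enables `fake-clock`.
#[cfg(feature = "fake-clock")]
pub fn now_secs() -> u64 {
    42
}

/// Default backend: seconds since the Unix epoch from the system clock.
#[cfg(not(feature = "fake-clock"))]
pub fn now_secs() -> u64 {
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

fn main() {
    println!("now_secs() = {}", now_secs());
}
```

A backend that crate A's author did not think of has no `cfg` branch to
live in, so the only way to add it is to edit crate A.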

Option 3 means roughly:

We mimic one conventional way of doing this in C and C++. The dependency injection
isn't happening at run-time, but when the dynamic link loader runs -- we link to
a library that provides symbols of the same name but with different implementations,
in such a way that these implementations get chosen.

In our Rust sources, we probably just have to decorate our functions with
#[no_mangle] and pub extern.

The main drawbacks of this compared to existential types as I see it are:

  • With existential types, you either replace the whole thing or none of it.
    It's not possible to replace malloc but not free for instance, so that
    seems safer
  • Existential types document your intention that this is a customization point
    that Rust code can make use of.
    #[no_mangle] and pub extern have other uses like interfacing with C; they
    don't make it clear that you expect that someone might start shadowing these
    functions.
  • Ideally this functionality would work with static linkage as well -- shadowing
    symbols is orthogonal to shared vs. static in C and C++. If you want everything
    to actually get inlined and optimized using LTO then it needs to be static.
  • Particularly, it would ideally work for things you are stubbing out in std,
    even if you are statically linking the standard libraries.
  • Using existential types seems to have less cognitive overhead than this
    approach.

@Centril Centril referenced this pull request Feb 14, 2019

Merged

Pre-RFC: `std` aware Cargo #1

@Ericson2314

Contributor Author

Ericson2314 commented Mar 6, 2019

Now that Rust 2018 is out, #2480 is picking up steam again, the switch to hashbrown and parking_lot is in progress, and there is jamesmunns#1, it's an interesting time for facade-related things, and hopefully a less busy one for the core teams. Could we reach a decision on this soon, then?
