Define a Rust ABI #600

Open
steveklabnik opened this Issue Jan 20, 2015 · 63 comments

steveklabnik commented Jan 20, 2015

Right now, Rust has no defined ABI. That may or may not be something we want eventually.

ranma42 commented Jan 20, 2015

CC me

nrc commented Aug 17, 2016

See #1675 for some motivation for this feature (implementing plugins, that is plugins for Rust programs, not for the compiler).

genodeftest commented Dec 16, 2016

Another motivation is the ability to ship shared libraries which could be reused by multiple applications on disk (reducing bandwidth usage on update, reducing disk usage) and in RAM (through shared .text pages, reducing RAM usage).

sdroege commented Mar 27, 2017

It would also make Linux distributions happier, as it would allow the use of shared libraries. Apart from reducing memory usage in various ways, shared libraries also make handling of security issues (or other important bugs) simpler: you only have to update the code in a single place and rebuild that, instead of having to fix many copies of the code in different versions and rebuild everything.

Arzte commented Mar 29, 2017

Shared libraries can be useful, provided there is a way to work around similar libraries that accomplish the same thing in different ways, such as openssl & libressl. There could also be value in a switch between static and dynamic linking that can be set by the person compiling the crate (a flag, perhaps?).

Conan-Kudo commented Mar 29, 2017

In my opinion, it's very hard to take Rust seriously as a replacement systems programming language if it's not possible in any reasonably sane manner to build applications linking to libraries with a safe, stable ABI.

There are a lot of good reasons for supporting shared libraries, not the least of which is building fully-featured systems for more resource constrained devices (like your average SBC or mobile device). Not having shared libraries really blows up the disk utilization in places where it's not cheap.

@jpakkane describes this really well in his blog post, where he conducts an experiment demonstrating this problem.

vks commented Mar 29, 2017

In my opinion, it's very hard to take Rust seriously as a replacement systems programming language if it's not possible in any reasonably sane manner to build applications linking to libraries with a safe, stable ABI.

Note that C++ still doesn't have a stable ABI, and it took C decades to get one.

steveklabnik commented Mar 29, 2017

It would also make Linux distributions more happy as it would allow usage of shared libraries.

There are a lot of good reasons for supporting shared libraries

(two quotes from two different people) Note that Rust does support shared libraries. What it doesn't support is mixing them from different compiler toolchains. Some Linux distros already do this with Rust; since they have one global rustc version, it works just fine.

nagisa commented Mar 29, 2017

Rust also supports exporting functions with stable ABI just fine, as, for example, this shows.
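
(As a rough illustration of the mechanism being referred to -- the function name and signature below are made up, not taken from the linked example -- exporting a C-ABI function from Rust looks like this:)

```rust
// Exported with an unmangled symbol name and the platform's C calling
// convention, so it can be called from C, or from Rust code built by a
// different rustc, without relying on the unspecified Rust ABI.
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}
```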

sdroege commented Mar 29, 2017

Note that C++ still doesn't have a stable ABI, and it took C decades to get one.

While that is true, implementations (GCC, Clang, MSVC at least) have a (somewhat) defined ABI, and it only changes every now and then. With Rust there is no defined ABI at all: things might break in incompatible ways with any change in the compiler, in a library you're using, or in your own code, and you can't know when this is the case because the ABI is in no way defined (sure, you could look at the compiler code, but that could also change at any moment).

What it doesn't support is mixing them from different compiler toolchains. Some linux distros already do this with Rust; since they have one global rustc version, it works just fine.

The problem is not only about compiler versions but, as written above, about knowing what you can change in your code without breaking your ABI, and about crates tracking their ABI in one way or another. Otherwise you can't use shared libraries built from crates in any reasonable way other than recompiling everything (and assuming the ABI changed) whenever something changes.

cuviper commented Mar 29, 2017

At least on Linux, everyone has pretty much settled on the Itanium C++ ABI. But even with the compiler locked down, it still requires very careful maintenance by a library author who hopes to export a stable ABI. Check out these KDE policies, for instance.

Rust crates would have many of the same challenges in presenting a stable ABI. Plus I think this is actually compounded by not having the separation between headers and source, so it's harder to tell what code is actually reachable. It's much more than just pub -- any generic code that gets monomorphized in your consumers may make many layers of both public and private calls back into the original crate's library. And all of that monomorphized code has to remain supported as-is when you're shipping updates to your crate.
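
(A hypothetical sketch of that point -- the names below are invented: even a private, non-pub function becomes part of the de-facto ABI once it is reachable through a public generic.)

```rust
// Private helper: not `pub`, and not part of the crate's public API...
fn checked_len(bytes: &[u8]) -> usize {
    bytes.len()
}

// ...but this public generic is monomorphized inside each consumer crate,
// and every monomorphized copy calls (or inlines) checked_len. Shipping a
// new version of this crate therefore has to keep those already-compiled
// copies working, even though checked_len was never publicly exposed.
pub fn byte_len<T: AsRef<[u8]>>(value: T) -> usize {
    checked_len(value.as_ref())
}
```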

plietar commented Mar 29, 2017

Had a long term crazy idea to solve this.

Define a stable format for MIR and distribute all Rust binaries/shared libraries in that format. Package managers have a post-install step to translate the MIR into executables or .so files. Only the version of the MIR->binary backend "linker" has to be the same, the version of the source->MIR frontend compiler can differ.

Because of monomorphization and unboxed types, you still need to relink all the MIR files when a dependency is updated. Similarly, updating the backend compiler requires all the MIR files to be recompiled.

However, assuming we can push more and more optimisation passes into MIR rather than LLVM, the time spent in the backend should be reduced to something acceptable.

If you want to push it even further, keep everything in MIR form and use miri (or a JIT version of it) to run them. Frequently used files can be linked and persisted to disk. And we've just reinvented the JVM/CLR/Webassembly.

jpakkane commented Mar 29, 2017

Rust also supports exporting functions with stable ABI just fine, as, for example, this shows.

For the purposes of this discussion that would require that every single Rust crate only exports a C ABI. Which is not going to happen.

vks commented Mar 29, 2017

While that is true, implementations (GCC, clang, MSVC at least) have a (somewhat) defined ABI and it only changes every now and then.

Try linking to any interface that uses std::string. You cannot mix different gcc versions and clang, because they use incompatible implementations of std::string.

sdroege commented Mar 29, 2017

While that is true, implementations (GCC, clang, MSVC at least) have a (somewhat) defined ABI and it only changes every now and then.

Try linking to any interface that uses std::string. You cannot mix different gcc versions and clang, because they use incompatible implementations of std::string.

Somewhat off-topic (we're not talking about C++ here), but as long as you stay in a compatible release series of those there is no problem. And with the correct compiler switches you can, for example, build your C++ code with GCC 7 against a library that was built with GCC 4.8 and uses/exposes std::string in its API.

The second part seems unnecessary but nice and useful (and a lot of work), but if the first part were true for Rust, that would already be a big improvement: a defined ABI, which might change for a new release whenever necessary.

FranklinYu commented May 29, 2017

Quote from GCC about ABI compatibility (as a note to myself):

…Versioning gives subsequent releases of library binaries the ability to add new symbols and add functionality, all the while retaining compatibility with the previous releases in the series. Thus, program binaries linked with the initial release of a library binary will still run correctly if the library binary is replaced by carefully-managed subsequent library binaries. This is called forward compatibility. …

le-jzr commented May 29, 2017

Relying on libraries to correctly maintain binary compatibility is just an easily avoidable safety hazard. What is wrong with just letting the package repository rebuild dependent code? (To be clear, in this scenario using shared libraries is a given. Shared libraries and ABI stability are independent issues.)

And before someone argues about update download size, I must note that differential updates are not that difficult (no pun intended). In fact, if reproducible builds are done well, the resulting binary should be identical unless the library ABI has actually changed.

Conan-Kudo commented May 29, 2017

Because that assumes rebuilding is cheap. While it might be true for small projects, it's a fairly expensive and crappy process when you have long chains of things in big projects.

And it also makes it impossible to rely on Rust for actual systems programming because pure Rust libraries cannot be relied on for any given period of time.

le-jzr commented May 29, 2017

And it also makes it impossible to rely on Rust for actual systems programming because pure Rust libraries cannot be relied on for any given period of time.

What exactly can't be relied on? Systems programming is my area of interest and I don't see what you mean.

jpakkane commented May 29, 2017

To get an understanding of how much rebuilding and how many interdependencies there actually are in a full-blown Linux distro, please read this blog post. It talks about static linking, so it is not directly related to this discussion, but it is useful for getting a sense of scale.

What is wrong with just letting the package repository rebuild dependent code?

Well, as an example, on Debian there are 2906 packages that depend on GLib 2.0. Many more depend on it indirectly.

le-jzr commented May 29, 2017

Fair enough. But you don't actually need a stable ABI for any of that. The ABI can change between rustc versions, but different builds on the same version are still compatible. So your distro only needs to rebuild things whenever they bump the rustc version, which I assume is not going to happen often for a typical distro.

Conan-Kudo commented May 29, 2017

At least in Fedora and Mageia, you'd be wrong about that. We bump Rust almost right after the new version arrives.

mssun commented Sep 27, 2017

Additionally, you can build with -C prefer-dynamic to get rustc to prefer linking against dylibs when possible. You can get cargo to use this with the RUSTFLAGS env var.

This is not as easy as described. Setting RUSTFLAGS will affect cargo's building behavior. See this issue: rust-lang/cargo#4538

Basically, you cannot easily build dependencies as dylibs by setting RUSTFLAGS with -C prefer-dynamic. Cargo will compile dependencies as rlib by default (and this is not configurable).

Therefore, I agree that the whole discussion is not just about ABI stability. To achieve dynamic linking, some core tools like cargo need modifications.

One misconception is that because Rust doesn't have a defined ABI, you can't do dylibs. This is factually incorrect. rustc is dynamically linked to many of its deps. You can dynamically link to all your crate dependencies. This doesn't mean all code will be shared, or even most. Any generic types will get monomorphized on-demand and usually inlined into their callsites.

Yes. One can compile a dylib by passing rustc the --crate-type dylib flag. However, as discussed before, you cannot link a library compiled with an old rustc to a binary built with a new rustc. All binaries and dynamic libraries need to be compiled with the same rustc. I believe this is more about ABI compatibility.
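
(A minimal sketch of the same-rustc dylib workflow described above; the crate name, file names, and library filename are illustrative and assume a Linux target.)

```rust
// greeter.rs -- build as a Rust dylib:
//     rustc --crate-type dylib greeter.rs        # emits libgreeter.so on Linux
// A consumer can then link against it dynamically:
//     rustc -C prefer-dynamic --extern greeter=libgreeter.so main.rs
// Both invocations must use the same rustc, since the Rust-to-Rust ABI
// between the two artifacts is otherwise unspecified.
pub fn greet(name: &str) -> String {
    format!("hello, {}", name)
}
```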

All problems you mention can already be solved nowadays without any of the two though, you "only" need to recompile the world whenever anything changes.

Yes. But when using Rust as a systems language, "recompile the world" is not a good idea. It means that when a library is updated, all libraries and binaries that depend on it need to be recompiled. That leads to very long compilation times when many Rust binaries are dynamically linked.

not sure how e.g. Linux distributions are handling issues related to that.

I think package maintainers have to resolve the linking issues. In addition, I guess most core C/C++ libraries in Linux distributions are more stable than Rust crates.

mzabaluev commented Nov 18, 2017

Because you have no guarantees whether that change causes the ABI to be different, and if that change in generic code requires rebuilding users of that generic code because they got their own monomorphized copies of it.

This means the following two things:

  1. The ABI boundary of a crate does not necessarily consist of public items only. So the crate developers would have much more than #1105 to consider if semver semantics are extended to the ABI. To mirror one particular conclusion in that RFC, backward compatibility across behavioral changes in generics would be entirely up to the developers to maintain.
  2. ABI stabilization would also have to include stabilization of the serialization format for generics baked into dylibs.

I think the linkability guarantee can be solved by an ABI check tool akin to rust-semverver, which should be able to check the serialized generics and the underlying non-generic items for semver-significant breaks.

Making the crate's ABI surface evident to the developers could perhaps also be solved by tools, by requiring every internal item exposed via generics to be annotated with API stability attributes once commitment to an API version is made.

mzabaluev commented Nov 18, 2017

Note that reachability analysis for internal functions and methods should already be performed by the crate linker, unless it emits all internal non-generic functions as dynamic symbols just in case they are used in some generic. That would be a bad thing.

dgrunwald commented Nov 18, 2017

There are really three different goals here:

  1. Provide guarantees about what code changes a developer can make to their crates without breaking ABI compatibility for binaries compiled against the older version of that crate.
  2. Allow recompiling a crate with a newer rustc version without breaking ABI compatibility.
  3. Document the ABI used by rustc, so that other compilers that link against rust binaries can be written.

A Rust ABI specification would solve all three goals. However, the first goal is only solved indirectly -- an explicit list of allowed code changes is still desirable.
But there's nothing really stopping us from reaching goal 1 without defining a Rust ABI. Yes, you'll still need to recompile the world after upgrading rustc, but at least developers can upgrade their dynamic library if they keep using the same rustc. This already kind of works in practice (as long as you follow an undocumented set of rules); we just need some documentation, and ideally an automatic ABI compatibility check.

As an example for a starting guarantee that we could already give:

  • Changing the body of non-generic, non-inline functions does not affect the crate ABI.

Such a guarantee does restrict our ABI choice -- e.g. it prevents us from picking the best calling convention based on the call sites or the function body (at least for exported functions). But it places no restrictions on what the calling convention is, as long as it's only influenced by the function signature and rustc version.

Start with something like that, build an ABI compatibility check tool, then slowly add more guarantees (thus slowly restricting the set of possible ABIs). This way we get the benefits of goal 1 without the huge amount of work of fully defining the rust ABI (for each platform!), and without preventing all future optimizations to type layout / calling conventions.
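
(To make that first guarantee concrete, a hypothetical before/after -- the crate name and function below are invented:)

```rust
// my_codec 1.0 exports this non-generic, non-#[inline] function:
pub fn parse_header(input: &[u8]) -> Option<u16> {
    input.get(0..2).map(|b| u16::from_be_bytes([b[0], b[1]]))
}

// my_codec 1.1 may change only the body, keeping the signature identical.
// Under the proposed guarantee, a binary built against 1.0 keeps working
// with the 1.1 dylib (same rustc), because the calling convention depends
// only on the signature. No such guarantee is possible for generic or
// #[inline] functions, whose code was already copied into the callers.
```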

marmistrz commented Mar 22, 2018

It could be a good idea to do exactly what GHC does - the binaries produced by 8.x are compatible with what 8.y produces. This means that compiler ABI change implies a major version bump.

Centril commented Mar 22, 2018

@marmistrz I'd like to note that GHC has a version 8, so there was a version 7, but Rust will never have a 2, so in effect what you are saying is that we should define a stable ABI forever.

marmistrz commented Mar 22, 2018

Why not? Why not release a 2.0 even if there are no API changes but just ABI ones?

Centril commented Mar 22, 2018

@marmistrz Sure, that is possible. However, moving to a version 2.0 has psychological effects beyond technical aspects. It could lead to the impression that we don't take backwards compatibility seriously even if there only were ABI changes. We would need a new merged RFC to support this change.

marmistrz commented Mar 22, 2018

@Centril
Rust breaks backwards compat anyway, e.g. by rust-lang/rust#34537

I sometimes wish C++ threw away some of the old C-compat features. It would be a much nicer language to write in. Rust may need to face the same on a 20-year timeframe.

When Python 3 was announced, Python 2 was given a dying period of at least 5 years to give people the time to move to Python 3.

The main point is that ABI compatibility may be handled by versioning conventions, and the GHC scheme was just an example. Whether it's handled by minor versions that are multiples of prime numbers or any other scheme - that's just an implementation detail.

sdroege commented Mar 22, 2018

It could be a good idea to do exactly what GHC does - the binaries produced by 8.x are compatible with what 8.y produces. This means that compiler ABI change implies a major version bump.

This solves nothing from the issue here. The problem is defining an ABI so that crate authors also know which changes they can make without breaking the ABI, and ideally defining even some way of versioning the ABI :)

While it would certainly be nice if Rust and the standard library itself would somehow signal ABI compatibility between versions, that's only a very small part of the whole thing (and to some degree we already have that: every stable release changes the ABI).

saschmit commented Mar 23, 2018

Let me see if I understand this right:

  • Rust will never have a version 2 because people will think Rust doesn't take backwards compatibility seriously, and that would be bad
  • Rust deliberately breaks ABI every release with no backward compatibility at all, and that's just fine for a compiled system programming language that made a big deal out of language stability starting with 1.0.
cuviper commented Mar 23, 2018

Rust has been committed to a stable language and API since 1.0, but ABI stability has never been claimed. Your code is compatible; your binaries are not.

marmistrz commented Mar 23, 2018

For reference: on the ABI state in Swift: https://swift.org/abi-stability/#data-layout

Timmmm commented Jul 31, 2018

Another benefit I don't think anyone has mentioned is that distributing closed-source libraries is a huge pain without a stable, or at least versioned, ABI.

C++ also has this problem - most people's solution is just to compile the library with a load of different compilers and settings and hope for the best, but that is pretty awful. Another awful solution is to provide a C wrapper of your C++ library, and then an open source C++ wrapper of that. It would be nice if Rust was better.

eddyb commented Jul 31, 2018

@Timmmm You can totally distribute a closed source library, but you have to pick a rustc version and by default we keep far too much information around to be too "closed source".

This isn't even about ABI compatibility - you can't compile against the Rust typesystem across compiler versions, and I don't recall proposals for how this could even be made to work.

Diggsey commented Jul 31, 2018

Yeah, I don't really see this happening as it would effectively prohibit all future changes to the language. However, there's another option which might work:

  • Define a new ABI, which will essentially be a superset of the C ABI
  • This ABI will support C types, plus a limited set of rust types (Vec, slices, strings, structs annotated with repr(NewABI), etc.)

This would allow much more convenient interop, whilst only locking down those layouts which are already effectively stable.

For bonus points, the layout of types in this new ABI could be defined as mappings to equivalent C structures, allowing the use of the new ABI from other languages as well.
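
(For contrast, a sketch of what already works today with the plain C ABI -- the type and function below are illustrative. The proposal above would extend this to types such as Vec, slices, and strings, which currently have no layout this ABI can describe.)

```rust
// C-compatible layout: this is the extent of what crosses the FFI
// boundary with a guaranteed layout today.
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

#[no_mangle]
pub extern "C" fn point_norm(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}
```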

eddyb commented Jul 31, 2018

@Diggsey So you would still be restricted to monomorphic declarations? What's the advantage over some sort of interop library using the C ABI that's provided as source?

Ixrec commented Jul 31, 2018

Some previous discussion on a "safe superset of C" ABI: https://internals.rust-lang.org/t/cross-language-safer-abi-based-on-rust/4691

Timmmm commented Aug 1, 2018

So if this task were completed, and you wanted to use Rust plugins or shared libraries you'd still have to compile them all with exactly the same compiler, settings, libc, etc? That doesn't seem particularly workable.

I like the idea of a better-than-C ABI though that supports a stable subset of Rust types.

golddranks commented Aug 7, 2018

I also think it would be valuable to have an officially supported ABI that supports, at version 1, a set of commonly used Rust "vocabulary" types: Option (with deterministic null-pointer optimization and tag layout), Result, Vecs, strings, slices, references (with possibly limited semantics for lifetimes, like supporting non-static things only as parameters to higher-ranked lifetimed functions), possibly dyn Traits. It would need to be opt-in for types used in FFI so as not to limit the evolution of the language, but that doesn't differ from the current situation, does it?

I think it would be great to have an official solution because that would bring people together, towards a shared interface. We only have a single alternative – the C ABI – at the moment, and people are building their own abstractions on top of that because C ABI isn't expressive enough. I think Rust provides a great set of primitives, and I think providing an official ABI would bring value to other languages too. I could imagine some scripting language runtimes and some other emerging systems languages such as Zig saying "we now also support Rust ABI v1.0 for FFI!"

marcthe12 commented Aug 11, 2018

Some useful points from other languages.
Here is an Arch wiki article on Haskell: Arch dynamically links Haskell packages even though there is no stable ABI.
This should give some ideas on how to go about the dynamic linking aspect.
An important quote:

Dynamic linking is used for most Haskell modules packaged through pacman and some packages in the AUR. Since GHC provides no ABI compatibility between compiler releases, static linking is often the preferred option for local development outside of the package system.

Here is a Haskell package and one of its dependencies. It could also be a reference on how to go about it:
https://www.archlinux.org/packages/community/x86_64/shellcheck/
https://www.archlinux.org/packages/community/x86_64/haskell-aeson/
As I use shellcheck personally, I note that with every update of ghc or haskell-aeson, shellcheck gets rebuilt. At the time of writing, shellcheck has had 59 rebuilds without a version bump (C++ packages generally have fewer than 10). So a stable ABI is not needed, but it is very much preferred for dynamic linking.

One step is to list which parts need to be exported in binary form. Some ideas for the ABI could be taken from Vala and Zig, since they have a stable ABI. Vala can be a reference for OO features, for example.

mzabaluev commented Oct 8, 2018

Vala can be a reference for OO feature for example.

As far as I know Vala translates into C-equivalent code + GLib/GObject/Gio library calls, so it does not need to define its own ABI. Its OO features are partially dynamic and are backed by the runtime library stack.

sdroege commented Oct 8, 2018

As far as I know Vala translates into C-equivalent code + GLib/GObject/Gio library calls

Yes, Vala just compiles to C and apart from that uses the GLib/GObject conventions and ABI for naming functions, types, etc. It does not really apply to Rust in any way.


https://internals.rust-lang.org/t/pre-rfc-a-new-symbol-mangling-scheme/8501 is a good starting point for defining the symbol name part of an ABI instead of the current ad-hoc (AFAIU?) scheme.

eddyb commented Nov 3, 2018

@sdroege Note that I really don't want to guarantee any symbol mangling scheme, and I'd prefer if, from the start, we had an option to generate short symbol names (even just a hash).
cc @michaelwoerister

sdroege commented Nov 3, 2018

@sdroege Note that I really don't want to guarantee any symbol mangling scheme, and I'd prefer if, from the start, we had an option to generate short symbol names (even just a hash).

Sure, but having it documented and having it "guaranteed" for a single, specific compiler version is already a good improvement over the current mangling scheme, which is undocumented (AFAIK, apart from the rustc code of course).

eddyb commented Nov 3, 2018

@sdroege Sure, but it would depend on compiler flags, and not be part of the ABI itself.

That is, resolving cross-compilation-unit function/global references will still be done in an implementation-defined manner¹, and symbol names would only serve as a form of debuginfo.

¹ this is done through the identity of the item being referenced, even if that requires recording a mapping to the symbol name or enough information to deterministically recompute it (I wish binary formats were more identity-oriented instead of relying on strings everywhere...)
