Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: add Rust submodule as libbasic_rust #19598

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

bluca
Copy link
Member

@bluca bluca commented May 13, 2021

Proof-of-concept to test integrating Rust submodules.

This adds a (very naive) rewrite of string_extract_line from string-util.h
in Rust, and compiles it in so that the rest of our C code, including the
unit tests, call into it instead.

Given we are only using the standard library, there is no need for a
(complicated) integration with Cargo, and the only additional
requirement is the Rust compiler. Meson abstracts it quite nicely.

Note that this is the first time I've written Rust - so it's most likely horrible and completely wrong. Learning was the main idea of the exercise :-)
Opening a PR as I think there were questions in the past about how easily it would be to add Rust modules in the code base.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the meson integration doesn't look to bad.

one thing that makes me wonder: does it make sense to build mixed C+rust projects still with gcc at all? i.e. during the fedora default compiler discussion people suggested gcc and llvm still generate slightly incompatible interfaces. but otoh it sounds weird using llvm for the rust stuff but then still use C for the gcc part. what#s the story there?

@mbiebl on a scale from 1 to 10 how much would debian hate if we actually would adopt this?

src/basic/string-util.h Outdated Show resolved Hide resolved
pub extern "C" fn string_extract_line(s: *const c_char, i: usize, ret: *mut *mut c_char) -> i32 {
if ret.is_null() {
panic!("string_extract_line(): 'ret' cannot be NULL");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i ifgure the 4ch indenting is rust-native styling?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, not too much clue about rust here, but there must be a way to encode this in the function call signature, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i ifgure the 4ch indenting is rust-native styling?

https://github.com/rust-dev-tools/fmt-rfcs/blob/master/guide/guide.md#indentation-and-line-width

hmm, not too much clue about rust here, but there must be a way to encode this in the function call signature, no?

Quite possibly, but I don't know. First time ever I wrote Rust :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust person here; can I offer some help? 😀

i ifgure the 4ch indenting is rust-native styling?

Yes

hmm, not too much clue about rust here, but there must be a way to encode this in the function call signature, no?

Not really. An actual Rust-y version of this function would look somewhat like this:

fn string_extract_line(s: &str, line_index: usize) -> Option<&str> {
    s.lines().nth(line_index)
}

(Note that this always returns a slice of the original string and never does a copy.)

But this function is instead written to use C types (and C ABI) and be callable from external C. So it's rather pointless to try to enforce anything in its Rust-side signature, as C is still going to see its own definition. That, and also raw pointers (as opposed to references) can be null, dangling and unaligned.

On the other hand, since this function is already unsafe to call (it dereferences raw pointers! — and by the way, it should be marked unsafe for this reason), and since the (old) C version of this function check doesn't appear to check the pointer for being null, it's OK to just say this is similarly a precondition here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust person here; can I offer some help?

Yes, please! Pushed a new version, with some improvements

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Rust code should be tested thoroughly as well.

Sure.

I'm talking about various analyzers (like Coverity and LGTM for C), fuzzers and so on.

Well, I again admit I don't have any experience with using those tools with Rust, but Rust should not be much different from C in the regard. You pass the appropriate flags to the compiler (and perhaps Meson handles this for you similarly to how Cargo does), and then you invoke whatever analyzer tool. As discovered below, you'd likely need libstd built with the same flags, but that should be it.

But this is not my area of expertise (I do use Cargo for all my Rust projects...); better ask Meson developers for what they recommend.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are 2 ways that cargo runs commands

  1. The subcommand is built into cargo. In this case it is intended that anything you can do with cargo you can also do by calling rustc or other binaries like rustdoc directly.
  2. Cargo also provides an extension mechanism so arbitrary commands can masquerade as cargo subcommands. Basically, if some executable with a name like cargo-mytool or cargo-mytool.exe on windows exists, then running cargo mytool will run your tool and pass command line parameters in.

It should be possible to run those programs simply by calling them directly (e.g. cargo-fuzz --my --args).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cursory search of cargo-fuzz suggests that there's no requirement for cargo when running the tool (although building it does require cargo).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the second case the command very likely uses cargo internally. Depending on the tool, the cargo subcommand either has a hard dependency on cargo or just uses cargo to build everything with the necessary flags and for example overrides the rustc used by cargo or gets the necessary information about the target crate after building all dependencies and then invokes the main tool. cargo clippy for example uses RUSTC_WORKSPACE_WRAPPER to run clippy-driver instead of rustc for all crates inside the current workspace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I more or less figured out how to extract the part of "cargo fuzz" dealing with the compiler options like -Cpasses=sancov, -Cllvm-args=-sanitizer-coverage-level=4 and so on I need. It kind of even works in the sense that I was able to run the systemd fuzz targets built with both clang and rustc. So I agree that it's possible to kind of bring cargo plugins to the systemd build system. The problem is that borrowing code like that isn't maintainable. Someone would have to keep track of what changed where and update systemd accordingly.

@poettering
Copy link
Member

poettering commented May 14, 2021

i wonder if one approach to the rustification would be to start with the stuff in src/basic/ and convert it one by one, always making sure our tests for this still pass, and keeping the C version in the codebase, verifying they match the rest version in behaviour by providing careful tests. THen one day, when we are confident about it we could just remove the C code and then make rust a requirement. This would allow us to graudually move things over, and more forward looking distros could build the rust versions of stuff while more traditional distros (where compat with legacy archs like sparc or alpha is important) could hold out a bit longer with the C versions of stuff.

The price for such an approach would of course be that we'd have to maintain two versions of much of the stuff in src/basic for a while, but maybe that's a good thing, while we are still learning (and by "we" I mean in particular myself, since at this point my superficial understanding of Rust is very superficial)

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

@mbiebl on a scale from 1 to 10 how much would debian hate if we actually would adopt this?

https://buildd.debian.org/status/package.php?p=rustc
This would affect quite a few ports, admittedly the more exotic ones.
I know there was quite some backlash when librsvg switch to rust.

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

https://buildd.debian.org/status/package.php?p=rustc
This would affect quite a few ports, admittedly the more exotic ones.
I know there was quite some backlash when librsvg switch to rust.

I know @smcv was involved there, maybe he can share his experience.

@poettering
Copy link
Member

This would affect quite a few ports, admittedly the more exotic ones.
I know there was quite some backlash when librsvg switch to rust.

But how did that play out? Is DEbian still stuck with the old C version of librsvg for all arcs? or did they finally make the jump? does debian ship the rust version on modern archs and the C version on legacy archs?

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

Would it make more sense to start with a leaf/optional component, like say homed, which can be disabled if necessary to start getting familiar with rust?

};
let mut found: bool = false;

for (j, item) in s.lines().enumerate() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's a "line" in rust btw? i.e. what separators are accepted? \n? \r? \0? combinations thereof?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's quite some difference to our current logic... i'd prefer if we didn#t introduce that silently

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the tests are passing :-) Apart from the \r\n, what else is different?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW using \n unconditionally is simple enough:

    for (j, item) in s.split('\n').enumerate() {

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

But how did that play out? Is DEbian still stuck with the old C version of librsvg for all arcs? or did they finally make the jump? does debian ship the rust version on modern archs and the C version on legacy archs?

I think older versions of the C-based library still linger on in the archive as binary packages. Afaik, there is no source package of the old librsvg anymore. So librsvg can't be rebuilt on those architectures. I assume for librsvg one could introduce a separate source package which a different name for those architectures but it seems nobody has done it, as this is quite a maintenance overhead. For systemd this wouldn't be feasible.

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

I know there was quite some backlash when librsvg switch to rust.

https://lists.debian.org/debian-devel/2018/11/msg00035.html

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

@glaubitz ^

@bluca
Copy link
Member Author

bluca commented May 14, 2021

the meson integration doesn't look to bad.

one thing that makes me wonder: does it make sense to build mixed C+rust projects still with gcc at all? i.e. during the fedora default compiler discussion people suggested gcc and llvm still generate slightly incompatible interfaces. but otoh it sounds weird using llvm for the rust stuff but then still use C for the gcc part. what#s the story there?

It's using rustc though, not llvm? Or are they the same thing? The rustc debian package pulls in gcc for example, not llvm

@behrmann
Copy link
Contributor

It's using rustc though, not llvm? Or are they the same thing? The rustc debian package pulls in gcc for example, not llvm

rustc is LLVM-based. Currently GCC doesn't have a rust frontend, although one is currently being built and there is also the mrustc project not related to either LLVM or GCC.

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

It's using rustc though, not llvm? Or are they the same thing? The rustc debian package pulls in gcc for example, not llvm

I guess rustc uses llvm via libllvm? Not sure why it has the gcc dependency tbh.

@bluca
Copy link
Member Author

bluca commented May 14, 2021

i wonder if one approach to the rustification would be to start with the stuff in src/basic/ and convert it one by one, always making sure our tests for this still pass, and keeping the C version in the codebase, verifying they match the rest version in behaviour by providing careful tests. THen one day, when we are confident about it we could just remove the C code and then make rust a requirement. This would allow us to graudually move things over, and more forward looking distros could build the rust versions of stuff while more traditional distros (where compat with legacy archs like sparc or alpha is important) could hold out a bit longer with the C versions of stuff.

The price for such an approach would of course be that we'd have to maintain two versions of much of the stuff in src/basic for a while, but maybe that's a good thing, while we are still learning (and by "we" I mean in particular myself, since at this point my superficial understanding of Rust is very superficial)

Yeah, that's pretty much what I was thinking too, it would make sense. The double implementation wouldn't probably be too bad - in the end when we add helpers to src/basic/ they are always pretty small and self-contained, so writing them twice shouldn't take too much time. And tree-wide refactors are often due to style or C-specific stuff, so wouldn't apply anyway.

The main issue I see is the one you already clocked - there are many architectures where rustc is not available. This is a known problem, and unfortunately I do not have an answer there. In Debian, the affected architectures are not "release" archs - so by the letter of the law, one is allowed to break things there. But it is not very nice :-)

If we do the double-implementation thing, then I can cook up the meson-ry required to make it work, so that if there's no rustc things work gracefully.

@glaubitz
Copy link
Contributor

FWIW, a Rust frontend for GCC is currently work-in-progress with the developers being commercially supported:

https://rust-gcc.github.io/

So it might not be need to use the original Rust compiler in the near future.

@bluca
Copy link
Member Author

bluca commented May 14, 2021

It's using rustc though, not llvm? Or are they the same thing? The rustc debian package pulls in gcc for example, not llvm

rustc is LLVM-based. Currently GCC doesn't have a rust frontend, although one is currently being built and there is also the mrustc project not related to either LLVM or GCC.

It's using rustc though, not llvm? Or are they the same thing? The rustc debian package pulls in gcc for example, not llvm

I guess rustc uses llvm via libllvm? Not sure why it has the gcc dependency tbh.

Ah I see, thank you - it shows that I'm a total Rust n00b :-)
I just looked at the runtime dependency and versioning and got confused, but I see rustc has indeed a libllvm build-dependency.

@glaubitz
Copy link
Contributor

But how did that play out? Is DEbian still stuck with the old C version of librsvg for all arcs? or did they finally make the jump? does debian ship the rust version on modern archs and the C version on legacy archs?

Yes, we're still stuck with these old librsvg versions on the architectures which don't have Rust support yet.

m68k is gaining Rust support in the foreseeable future since LLVM has now an M68k backend as well and I have already added support to m68k in Rust in a forked repository.

However, I think in the long-term, the better approach will be to use the upcoming Rust frontend in GCC. This way, there also won't be any need for additional compilers or tools to be able to build the codebase.

@glaubitz
Copy link
Contributor

I think older versions of the C-based library still linger on in the archive as binary packages. Afaik, there is no source package of the old librsvg anymore.

There is on snapshot.debian.org.

So librsvg can't be rebuilt on those architectures.

It can. But the API has changed in the meantime a bit so some packages using librsvg currently FTBFS.

I assume for librsvg one could introduce a separate source package which a different name for those architectures but it seems nobody has done it, as this is quite a maintenance overhead. For systemd this wouldn't be feasible.

librsvg will be buildable on the other architectures in the foreseeable future when GCC has merged the upcoming Rust frontend.

From my discussion with GCC developers I know that they are looking forward to the frontend being part of the official sources.

I already tested it myself and while it doesn't support macros yet which are required for stuff like println(), I managed to compile a small sample Rust code on m68k.

@bluca
Copy link
Member Author

bluca commented May 14, 2021

That's really promising! I had heard a frontend was under development, but didn't know it was that far ahead, as haven't been following closely.

@poettering
Copy link
Member

so maybe we can do as i proposed and the time where we drop the C versions would be the moment where gcc rust is merged?

@glaubitz
Copy link
Contributor

so maybe we can do as i proposed and the time where we drop the C versions would be the moment where gcc rust is merged?

I would probably wait for that particular GCC version to be released first, but yes, that's sounds like a reasonable approach.

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

one thing that makes me wonder: does it make sense to build mixed C+rust projects still with gcc at all? i.e. during the fedora default compiler discussion people suggested gcc and llvm still generate slightly incompatible interfaces. but otoh it sounds weird using llvm for the rust stuff but then still use C for the gcc part. what#s the story there?

I assume you are suggesting here, that we build all of systemd with clang/llvm?
What does this mean for libudev/libsystemd which are used by hundreds of binaries and even linked into other libraries, which still use GCC? Would this cause any (subtle) issues?

@poettering
Copy link
Member

I assume you are suggesting here, that we build all of systemd with clang/llvm?

Yeah, that's what I was wondering. i.e. to me it sounds like the best option to go full-llvm or full-gcc, but mix and match not sure.

What does this mean for libudev/libsystemd which are used by hundreds of binaries and even linked into other libraries, which still use GCC? Would this cause any (subtle) issues?

No idea, this was pretty much the question i was asking.

But i guess if gcc-rust is a thing soon, then maybe the question doesn't matter that much, as then one can easily do a pure-gcc build

@mbiebl
Copy link
Contributor

mbiebl commented May 14, 2021

Do we have any idea how this might affect binary sizes?
At least looking at https://packages.debian.org/unstable/librsvg2-2 , the size increase is quite significant: 300KB vs 10MB

Proof-of-concept to test integrating Rust submodules.

This adds a (very naive) rewrite of string_extract_line from string-util.h
in Rust, and compiles it in so that the rest of our C code, including the
unit tests, call into it instead.

Given we are only using the standard library, there is no need for a
(complicated) integration with Cargo, and the only additional
requirement is the Rust compiler. Meson abstracts it quite nicely.
@TheAlgorythm
Copy link

Property-based testing (basically fuzzing of unittests with different inputs) could be a tool to check for differential behaviour of the Rust and C implementations. This should decrease the maintenance overhead of having two implementations.
Possible tools would be:

@keszybz keszybz added please-review PR is ready for (re-)review by a maintainer and removed please-review PR is ready for (re-)review by a maintainer labels Jan 25, 2023
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I can't shake the feeling that systemd is probably not the best project for exploration of integration methods. It'd be better to develop a working solution somewhere else first and then apply it cleanly here. Our CI is a monster with a thousand heads already, and we would make it doubly hard by mixing Rust into it.

@@ -1071,6 +1071,7 @@ int string_truncate_lines(const char *s, size_t n_lines, char **ret) {
return truncation_applied;
}

#ifdef BUILD_LEGACY_HELPERS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use ifdef, but change the define to be 0 or 1, and use #if here.

Comment on lines -28 to +31
- { COMPILER: "gcc", COMPILER_VERSION: "11", LINKER: "bfd", CRYPTOLIB: "gcrypt" }
- { COMPILER: "gcc", COMPILER_VERSION: "11", LINKER: "bfd", CRYPTOLIB: "gcrypt", RUST: "yes" }
- { COMPILER: "gcc", COMPILER_VERSION: "12", LINKER: "gold", CRYPTOLIB: "openssl" }
- { COMPILER: "clang", COMPILER_VERSION: "13", LINKER: "mold", CRYPTOLIB: "gcrypt" }
- { COMPILER: "clang", COMPILER_VERSION: "14", LINKER: "lld", CRYPTOLIB: "openssl" }
- { COMPILER: "clang", COMPILER_VERSION: "14", LINKER: "lld", CRYPTOLIB: "openssl", RUST: "yes" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd generally want to try rust with the latest compilers, not oldest. More likely to work ;)

}
}

*ret = CString::new("").expect("CString::new failed").into_raw();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd want to use Result and ? everywhere, and then just have a wrapper at the Rust↔C interface which converts it into a negative-errno.

@glaubitz
Copy link
Contributor

Looking at this, I can't shake the feeling that systemd is probably not the best project for exploration of integration methods. It'd be better to develop a working solution somewhere else first and then apply it cleanly here. Our CI is a monster with a thousand heads already, and we would make it doubly hard by mixing Rust into it.

I agree with this stance. Luckily, the Rust frontend for GCC is already progressing nicely and I think projects will be able to integrate Rust code in the foreseeable future much easier.

@bluca
Copy link
Member Author

bluca commented Mar 21, 2023

Looking at this, I can't shake the feeling that systemd is probably not the best project for exploration of integration methods. It'd be better to develop a working solution somewhere else first and then apply it cleanly here. Our CI is a monster with a thousand heads already, and we would make it doubly hard by mixing Rust into it.

I have a pretty good idea about how to do the integration, and it relies on rust-lang/rust#106560 landing. Once that's happened, I'll update the Meson PR. I wouldn't consider this anywhere ready until well after there are stable versions of both rustc and meson available with everything buttoned up and ready.

@keszybz keszybz added postponed and removed please-review PR is ready for (re-)review by a maintainer labels Apr 12, 2023
@bjorn3
Copy link

bjorn3 commented May 11, 2023

rust-lang/rust#106560 landed yesterday and is available on the latest nightly behind a feature flag.

@bluca
Copy link
Member Author

bluca commented May 11, 2023

rust-lang/rust#106560 landed yesterday and is available on the latest nightly behind a feature flag.

Nice, thank you! I'll resume work on the meson side as soon as I can

1 similar comment
@bluca
Copy link
Member Author

bluca commented May 11, 2023

rust-lang/rust#106560 landed yesterday and is available on the latest nightly behind a feature flag.

Nice, thank you! I'll resume work on the meson side as soon as I can

@c3Ls1US
Copy link

c3Ls1US commented Dec 6, 2023

@bluca what's the status of this? I notice the meson PR has been open for a while, so I'm curious if Rust integration is still planned for systemd.

@bluca
Copy link
Member Author

bluca commented Dec 8, 2023

Haven't had time yet, sadly

@nyabinary
Copy link

@bluca what's the status of this? I notice the meson PR has been open for a while, so I'm curious if Rust integration is still planned for systemd.

Yeah, we should get on it since the kernel is having more mature rust support (would be cool to rewrite memory safety critical parts of systemd into rust)

@ssokolow
Copy link

ssokolow commented Jan 10, 2024

Sorry if any of these have already been answered and I forgot to pull them back off my TODO pile while reading through the thread, but I noticed a few spots where I felt I may be able to add some useful information:

I guess rustc uses llvm via libllvm? Not sure why it has the gcc dependency tbh.

Rustc's POSIX platform, which assumes BSD/UNIX-style "libc is the point of ABI stability for syscalls", is also used on Linux, where it treats delegating to to the system C compiler for the final link as the point of API stability for identifying what linker flags are needed for libc.

I can imagine that having multiple rust components talk to each other through a C ABI can be awkward on larger scale as you'll lose the tracking of ownership which is exactly the thing you want to catch classes of bugs that are harder to track in C.

If we are forced to talk through the C ABI between rust code then I wonder if the added complexity actually is a net plus or not.

The abi_stable crate may be relevant. It's sort of a Rust-to-C-to-Rust FFI framework intended to provide a stable way to encode Rust types in C FFI.

Considering binary size: the best thing you can do to reduce binary size for rust code is to a) enable link time optimization and b) strip the symbols. It might also be worthwhile to evaluate if it is possible to not use std at all, but only core.

https://github.com/johnthagen/min-sized-rust

I'd suggest starting by seeing what cargo-bloat (sort of a clone of Bloaty McBloatface) says about something that got much bigger in Rust. In my experience, it tends to be a mixture of the following:

  1. Static linking of things that C would link dynamically.
  2. Overeager use of monomorphic dispatch. (I remember the clap argument parser announcing they'd cleared out a lot of that)
  3. Neglecting to strip the debug information that shows up in release builds by default because both debug and release builds statically link the same precompiled "optimized, with debug symbols" copy of libstd that's included with rustc.

...and possibly pathological expressions of LTO not being used, though I haven't investigated how likely or severe that would be.

Though it's much older than min-sized-rust, Why is a Rust executable large? is also good, since it does detailed comparisons, including test builds, between C, C++, and Rust, to explore how much each change helps and why.

.expect("...") will panic on failure. By default, a panic is implemented via unwinding, which triggers UB if it crosses a foreign boundary.

In case it's relevant, a C-unwind ABI type was merged on July 31, 2020 and is documented in the FFI and unwinding section of Foreign Function Interface - The Rustonomicon.

According to this comment use of it requires that calling C code be compiled with -fexceptions or undefined behaviour results.

There's currently an RFC to close the soundness hole relating to extern "C" and panics by converting escaping panics into aborts.

Looks like rustc switched to tikv-jemallocator.

jemallocator now gets published as a "same contents, different names, but we recommend you use the tikv-xxxx versions" duplicate of tikv-jemallocator and its most recent update was on Jul 27, 2023.

-C prefer-dynamic

Bear in mind that that Rust makes no affordances for C++-style de facto ABI stabilization, so even a day-to-day bump in the nightly compiler might turn a dynamic link against libstd into undefined behaviour, corruption, and crashes.

-C prefer-dynamic is generally intended for embedded use-cases where space is tight but a BusyBox-style multibinary is unsuited and you have complete control over packaging so you can ensure version mismatches never occur.

There's ongoing discussion about a stable ABI, but it's not intended to be the internal ABI used for libstd but, rather, an easier-to-design-and-bikeshed external ABI for things like plugins.

Rust's liberal use of generics like Option<T> and Result<T, E> runs into the same problems that were enumerated by Michał Górny's The impact of C++ templates on library ABI in 2012.

However, if the concern is symbol collisions between different copies of the same libraries, I'd focus on the binary sizes, not the effects on whether it will build successfully. Rust's whole "fearless upgrades" mantra stems in part from an attitude that symbol collisions should only be possible when dependency resolution fails to prevent two different versions of the same C library from getting pulled in.

As such, it's designed to default to not exporting symbols and to use name-mangling unless asked not to, with a hash getting incorporated into the symbol to allow non-conflicting use of different versions of the same dependency in the same artifact if that's what it takes to get a dependency tree to resolve.

(eg. the image crate is prone to pulling in multiple versions of compression codecs if you turn on all the image formats, different image format crates decided to update their dependencies at different rates, and they aren't all on the same side of a breaking change.)

That would be a problem until there's mixed C and Rust in the executable itself though - because as far as I can tell, that is what forced us to build the Rust modules as staticlib (which does the vendoring of the whole stack). If everything was Rust that wouldn't be an issue and we could do the dynamic linking thing, for the libraries too.

If it helps, here's the documentation page that covers all the crate types (staticlib and so on): Linkage - The Rust Reference.

@bjorn3
Copy link

bjorn3 commented Jan 10, 2024

Neglecting to strip the debug information that shows up in release builds by default because both debug and release builds statically link the same precompiled "optimized, with debug symbols" copy of libstd that's included with rustc.

When using cargo this will likely be fixed in the near future by having cargo passing -Cstrip=debuginfo by default when doing a release build: rust-lang/cargo#13257 Making a similar change to Meson would make sense I think.

Bear in mind that that Rust makes no affordances for C++-style de facto ABI stabilization, so even a day-to-day bump in the nightly compiler might turn a dynamic link against libstd into undefined behaviour, corruption, and crashes.

I don't think this is the case at all. While the ABI itself is not stable, if you use the same libstd when running as when compiling, you shouldn't ever have any issues. And because the filename of libstd is unique for every rustc version and every mangled symbol mangled differently depending on the rustc version string, so even the official builds and distro builds of the same rust release will use different filenames and mangled symbol names and thus don't risk accidentally mixing different abi's. The only case I can imagine where mixing would be possible is if a distro does a patch release of rust without changing the version reported by rustc using rustc -V.

@ssokolow
Copy link

ssokolow commented Jan 10, 2024

I don't think this is the case at all. While the ABI itself is not stable, if you use the same libstd when running as when compiling, you shouldn't ever have any issues. And because the filename of libstd is unique for every rustc version and every mangled symbol mangled differently depending on the rustc version string, so even the official builds and distro builds of the same rust release will use different filenames and mangled symbol names and thus don't risk accidentally mixing different abi's. The only case I can imagine where mixing would be possible is if a distro does a patch release of rust without changing the version reported by rustc using rustc -V.

That does sound sensible, but, especially for a piece of important infrastructure, I'd want to see if anyone from the Rust team can point out something we're missing.

@bjorn3
Copy link

bjorn3 commented Jan 10, 2024

I'm part of the compiler contributors team. This means that I have merge permission, but don't have decision making power like full compiler team members do. (See https://forge.rust-lang.org/compiler/membership.html for more details) I have a fair amount of knowledge about the current state of the compilation/linking model of rust. I can tell you that it should currently work without getting UB and will at least for a long time as rustc itself depends on it. I don't have the decision making power to block any changes that would break this, but I don't think anyone wants to break it and I did personally oppose breaking it.

Be aware however that dynamic linking in rust has a couple of restrictions and rough edges:

  • When compiling a dylib you must specify -Cprefer-dynamic as otherwise libstd is statically linked into it and then when you try to depend on it, rustc will complain because it tries to dynamically link libstd and include it from your dylib at the same time, which is not allowed. This could probably be fixed.
  • Two dylibs may not statically link the same crate. If rustc detects this at compile time you will get a compile error, but trying to do this with dlopen it is UB. The easiest way to avoid this error is to build all crates as dylib, but it is also possible to have one dylib statically link several crates and then have other dylibs depend on this dylib. Note that rustc by default doesn't actually link against dependencies specified with --extern. Only when actually referencing them do they get linked. This referencing can be as simple as use my_crate as _;, it doesn't have to be calling a function defined by the dependency.

If on the other hand you choose to use the staticlib crate type, be aware of rust-lang/rust#104707. You will almost certainly want to use a version script to avoid exporting the entire rust standard library from the dynamic library into which you are linking the staticlib. Not doing so increases the size of the dynamic library (as --gc-sections will consider every function in the standard library used) and has the risk of conflicting with downstream rust code. By the way if you know of a cross-platform (so no ld -r) fix for this issue, please do let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.