Virtualised cargo fails with a "No such device" error n°19 likely because of MAP_SHARED mmap call on qemu + virtiofsd with --cache=never. #122262
Comments
Confirmation. I did an
[pid 43835] mmap(NULL, 48973144, PROT_READ, MAP_SHARED, 9, 0 <unfinished ...>
[pid 43835] <... mmap resumed>) = -1 ENODEV (No such device)
MAP_SHARED is the culprit that triggers an ENODEV failure. That MAP_PRIVATE instead of MAP_SHARED fixes it is witnessed by the gix fix in the gitoxide repo. |
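A minimal sketch of a probe that repeats the kind of call seen in the strace above, so the two flags can be compared on a given mount. It assumes Linux on x86_64: the constant values and the `i64` offset type are assumptions for that target, and `probe_mmap` is a hypothetical helper, not something taken from rustc or cargo.

```rust
use std::fs::File;
use std::io;
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

// Assumed Linux x86_64 values from <sys/mman.h>.
const PROT_READ: c_int = 0x1;
const MAP_SHARED: c_int = 0x01;
const MAP_PRIVATE: c_int = 0x02;

unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, offset: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical probe: map `path` read-only with `flags`, then unmap it.
/// Returns the OS error (for example ENODEV, error 19) if the mapping fails.
fn probe_mmap(path: &str, flags: c_int) -> io::Result<()> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    if len == 0 {
        // mmap rejects zero-length mappings, so report that up front.
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "cannot map an empty file"));
    }
    // SAFETY: NULL address hint, a valid open fd, and a length taken from the
    // file metadata. The mapping is never dereferenced, only unmapped.
    let ptr = unsafe { mmap(std::ptr::null_mut(), len, PROT_READ, flags, file.as_raw_fd(), 0) };
    if ptr as isize == -1 {
        // MAP_FAILED is (void *) -1; errno holds the reason.
        return Err(io::Error::last_os_error());
    }
    // SAFETY: `ptr` and `len` describe the mapping created just above.
    unsafe { munmap(ptr, len) };
    Ok(())
}
```

Calling `probe_mmap(path, MAP_SHARED)` and `probe_mmap(path, MAP_PRIVATE)` on a file that lives on the suspect mount would show whether only the shared mapping is rejected, which is what the strace suggests for this setup.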
Thank you for the report. Really appreciate the effort! The error came from rustc the compiler. Seems pretty likely from here. |
I'm tracing it here in strace -f: [pid 43793] execve("/home/mini-me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc", ["/home/mini-me/.rustup/toolchains"..., "-", "--crate-name", "___", "--print=file-names", "--crate-type", "bin", "--crate-type", "rlib", "--crate-type", "dylib", "--crate-type", "cdylib", "--crate-type", "staticlib", "--crate-type", "proc-macro", "--print=sysroot", "--print=split-debuginfo", "--print=crate-name", "--print=cfg"], 0x61e426d8b710 /* 72 vars */ <unfinished ...> If anyone knows of tooling able to cross-examine both strace output and a debugger, I'd be happy to pinpoint the offending piece of rust code that calls mmap with MAP_SHARED.
OK. |
I meant, the rustc code shares the same code path with gitoxide.
We need someone familiar with this part to determine if it is safe and better to use |
From what I understood, MAP_PRIVATE has some corner cases that make it unsafe, but I tend to believe it is nitpicking to keep MAP_SHARED because of that. It would be sacrificing pragmatism (having rustc behave when virtualised) for ideological purity (safety in a context where it doesn't mean that much). Quite a few pieces of code rely on virtualisation, specifically given the (horrible) modern tendency to use docker everywhere to build code, the worst of which not being GitHub Actions. That issue seems to pop up unrecognised in many places where people use arcane workarounds. |
I implemented Miri's |
The man page of MAP_PRIVATE says:
People on #irc seemed to claim that this unspecified behaviour made MAP_PRIVATE unsafe in some sense or another. I find that beside the point of the current problem. But, well, that's not the kind of discussion that really is up to me. I'd be happy getting MAP_PRIVATE instead of MAP_SHARED into rustc with respect to such virtualisation issues. Trying to recompile rust with that change at the moment. We'll see if that fixes my virtualisation issue. |
Ah. Whatever that IRC discussion was is not about the Rust concept of unsafety. In Rust, the way we use I'd be happy to review such a change (put r? saethlin in the PR description on its own line), or I'll make the PR myself when I have time. |
UB ?
I'd be happy to try making a first commit to the rust project, to be honest. Just busy doing nonsense like resizing my VM to compile rust on it. May take a few days given that I'm new to the rust codebase. I'll try to make it happen ASAP. |
UB is short for Undefined Behavior; alternatively "things the compiler will assume implicitly in order to optimize". In this case, Normally this property of |
So, roughly, we can safely assume that MAP_SHARED will misbehave, and unsafely assume MAP_PRIVATE will behave. Great. |
OK. Got a fix and tested it (in as much as it is possible in my limited context). Familiarising myself with contribution guidelines, and you people should get a PR before tomorrow. |
I'm not against changing this but I'll note that going around and patching projects using mmap doesn't seem like a great solution either. If you execute your programs on a weird filesystem that doesn't support it you might hit this issue every time you install new software or upgrade existing software (that starts using it). So it'd be better to fix the root cause, either by switching to a different filesystem or by getting the FS fixed. |
Which is why I am currently trying to raise the issue in the virtiofs gitlab repository. https://gitlab.com/virtio-fs/virtiofsd/-/issues/149 Note that while this is here specific to virtiofs, it seems many virtualisation bug reports tend to refer to such a situation in somewhat veiled terms. The 2016 bug report referenced at the top of this bug report already points to MAP_SHARED in its strace analysis, but it did not quite light a bulb and people have been using workarounds for years now because the issue has been misunderstood. This is essentially a question of what rust aims to support. Any virtualisation setup is bound at one point to grapple with issues such as memmap() (as virtiofs is currently doing with the dax window, which currently is not merged in qemu and requires going down the serial patching route) and this is going to force virtualisation support for rust code to continuously lag behind until virtualisation has become a fully mature technology. If people want to use MAP_SHARED everywhere, fine. But at least get rustc running in these setups. If an email client or whatever doesn't work when virtualised because of MAP_SHARED, fine, whatever. If rustc bails out, that is an entirely different matter. P.S.: it remains a rust specific issue insofar as MAP_SHARED is being used as a default much more in rust than it is in other language ecosystems. While the POSIX standard may say something, the real reality of the real world is that much less code will break if not written in rust. |
No, not really. If you just virtualize a plain block device then such an issue does not arise because mmap will be implemented inside the guest OS with a regular filesystem. I've been using rust in various virtual machines (virtualbox, hyperv, kvm) for years. It is only specific less commonly used filesystems that cause issues. But virtiofs is not alone in this. 9p, NFS, various FUSE drivers, the linux NTFS implementation... many niche filesystems have quirks that cause problems in a lot of software. Switching to a mature filesystem implementation rather than an immature/incomplete one can very well be the answer. |
That is called deflecting. The issue is not whether I like chocolate or vanilla. The issue is that this chocolate tastes bad. And so does vanilla, if you ask me. |
Please cease to argue about this, or at least find a more suitable venue to do so. GitHub issues only survive a limited number of back-and-forth remarks and we have approximately hit that limit. If rustc does not need to use |
Hey folks, this discussion is starting to get out of hand and off topic. Please take the discussion to another venue other than GitHub issues. |
Yes. Though, it is worth mentioning that go is impacted too in much the same way. It just bailed out on building gosec with that same kind of error. Big C/C++ projects like qemu, however, do not seem to be impacted. |
Use `MAP_PRIVATE` (not unsound-prone `MAP_SHARED`) Solves rust-lang#122262
Problem
I believe this may be the same issue that was raised back in 2016:
rust-lang/cargo#2808
Context: Virtualised "cargo build" fails with "No such device" whenever the setup is qemu + virtiofsd with --cache=never. Essentially the same issue that I just got fixed for gix. (The link below has more details and context.)
GitoxideLabs/gitoxide#1312
As can be seen on the gix issue and the links referenced there, rust uses MAP_SHARED by default (like gix before today's fix) instead of MAP_PRIVATE (like git) for mmap system calls. MAP_SHARED has more stringent coherence requirements than MAP_PRIVATE, BUT that bug doesn't materialise with a virtiofs filesystem unless cache is disabled with --cache=never (and I need to disable caching because heavy IO workloads on the guest blow up the file descriptors on the host if cache is enabled).
To see the "No such device" issue, one needs both MAP_SHARED in place of MAP_PRIVATE and --cache=never on virtiofsd.
The fix is to use MAP_PRIVATE in rust code, like done for gix:
GitoxideLabs/gitoxide@88061a1
I believe this is a bug that is pervasive in the wider rust ecosystem when virtualised because MAP_SHARED is used by default. It hits gix. It hits cargo.
Conclusion: Either the rust ecosystem was wrong to choose MAP_SHARED as a default, or virtiofs has a semi-broken memory mapped file implementation. I assume the latter is true, but that doesn't change the fact that virtualised rust using mmap() system calls tends to be affected, like cargo is, by spurious "No such device" (Linux OS error n° 19) errors for code that runs perfectly fine when not virtualised.
Debugging gix was easy with strace. Debugging cargo to provide nice reproducible strace dumps is harder for me as I get lost in the subprocesses it triggers. Any guidance as to how to get the relevant strace dumps would be appreciated. But the gix link shows that MAP_SHARED indeed is what triggers these spurious errors when virtualised, like in that old 2016 bug report linked at the top. Here is my own stracing of cargo build:
The problem likely occurs in the clone3() system call. I don't know how to strace it to provide clear cut proof.
But MAP_SHARED inside the clone does seem to be something that occurred in the 2016 bug:
rust-lang/cargo#2808 (comment)
Steps
1. Virtualise an Ubuntu Mantic VM, with a virtiofs mount backed up by a ZFS share.
2. Ensure virtiofsd is launched with --cache=never.
3. Install cargo on that virtualised filesystem.
4. Try to build some rust code with cargo build.
5. Observe "No such device" error 19 with "cargo build".

This MAP_SHARED issue likely impacts more virtualisation setups than the one I described. MAP_PRIVATE is arguably more unsafe with ugly corner cases, but MAP_SHARED asks for too many guarantees on some virtualised filesystems.
Possible Solution(s)
Replace the MAP_SHARED mmap() system calls in the rust stack of cargo with MAP_PRIVATE mmap() system calls, like gix / gitoxide just did.
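To illustrate what such a replacement might look like, here is a small sketch of a read-only file mapping created with MAP_PRIVATE. It is not the rustc or gitoxide code: `PrivateMap` is a hypothetical type, and the constant values and `i64` offset type assume Linux on x86_64.

```rust
use std::fs::File;
use std::io;
use std::ops::Deref;
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

// Assumed Linux x86_64 values from <sys/mman.h>.
const PROT_READ: c_int = 0x1;
const MAP_PRIVATE: c_int = 0x02;

unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, offset: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical read-only, private mapping of a whole file.
struct PrivateMap {
    ptr: *mut c_void,
    len: usize,
}

impl PrivateMap {
    fn open(path: &str) -> io::Result<PrivateMap> {
        let file = File::open(path)?;
        let len = file.metadata()?.len() as usize;
        if len == 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "cannot map an empty file"));
        }
        // SAFETY: NULL address hint, valid fd, length taken from metadata.
        // The mapping stays valid after `file` is closed at the end of this function.
        let ptr = unsafe { mmap(std::ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0) };
        if ptr as isize == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(PrivateMap { ptr, len })
    }
}

impl Deref for PrivateMap {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: `ptr` points at `len` readable bytes for as long as `self` lives.
        // Caveat from the thread: if another process rewrites or truncates the
        // file, these bytes may change or become inaccessible.
        unsafe { std::slice::from_raw_parts(self.ptr as *const u8, self.len) }
    }
}

impl Drop for PrivateMap {
    fn drop(&mut self) {
        // SAFETY: `ptr` and `len` describe the mapping created in `open`.
        unsafe { munmap(self.ptr, self.len) };
    }
}
```

With this shape, `PrivateMap::open(path)` returns something that can be read as a byte slice, and the only difference from a shared mapping is the flag passed to `mmap`.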