Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected getrandom error due to Linux kernel bug on QNAP #52609

Closed
glebpom opened this issue Jul 22, 2018 · 18 comments · Fixed by #60453
Closed

Unexpected getrandom error due to Linux kernel bug on QNAP #52609

glebpom opened this issue Jul 22, 2018 · 18 comments · Fixed by #60453
Labels
C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-linux Operating system: Linux T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@glebpom
Copy link

glebpom commented Jul 22, 2018

Hi,

Rust code panics on linux 3.10:

[admin@lcstr-qnap lanlord]# uname -a
Linux lcstr-qnap 3.10.20-al-2.5.3 #39 SMP PREEMPT Tue Jul 10 04:05:13 CST 2018 armv7l unknown

Panic message:

thread 'main' panicked at 'unexpected getrandom error: 14', libstd/sys/unix/rand.rs:56:21

Rustc from rustup, version 1.27.0
The code is running on Qnap NAS

Rust core library relies on SYS_getrandom which was introduced only in kernel version 3.4.17.

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

I see there is a check in the code (is_getrandom_available), but it doesn't seem to work correctly.
Backtrace:

thread 'main' panicked at 'unexpected getrandom error: 14', libstd/sys/unix/rand.rs:56:21
stack backtrace:
   0:   0x7dcf83 - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h04ab691e2d5a2a25
                       at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:   0x7e2e23 - std::sys_common::backtrace::print::hac888a06b71fcb05
                       at libstd/sys_common/backtrace.rs:71
                       at libstd/sys_common/backtrace.rs:59
   2:   0x7c84d3 - std::panicking::default_hook::{{closure}}::h726fe9dee04cfca4
                       at libstd/panicking.rs:211
   3:   0x7c817b - std::panicking::default_hook::h50a9ec73820d3541
                       at libstd/panicking.rs:227
   4:   0x7c8b97 - std::panicking::rust_panic_with_hook::ha5eff468dabed44e
                       at libstd/panicking.rs:463
   5:   0x7c8723 - std::panicking::begin_panic_fmt::h127c95263e9d91e1
                       at libstd/panicking.rs:350
   6:   0x7d2323 - std::sys::unix::rand::imp::fill_bytes::h107c175968024ac6
                       at libstd/sys/unix/rand.rs:56
                       at libstd/sys/unix/rand.rs:101
   7:   0x7daaaf - std::sys::unix::rand::hashmap_random_keys::h7d2ea6e457897731
                       at libstd/sys/unix/rand.rs:19
   8:   0x2a99bb - <clap::args::arg_matches::ArgMatches<'a> as core::default::Default>::default::hdc9a1ee68179ea39
   9:   0x2a8d17 - clap::app::App::get_matches_from_safe_borrow::h9dd85a73ec350cce
  10:   0x2a867f - clap::app::App::get_matches::hd0be77d101c2233a
  11:    0xe2c03 - lanlord::main::hf5a0300982924cc3
  12:   0x17d6ff - std::rt::lang_start::{{closure}}::hf87301db5ac10996
  13:   0x7c862b - std::panicking::try::do_call::hbdb89cc4df2089c8
                       at libstd/rt.rs:59
                       at libstd/panicking.rs:310
  14:   0x7ea81f - __rust_maybe_catch_panic
                       at libpanic_unwind/lib.rs:105
  15:   0x7ca263 - std::rt::lang_start_internal::hbbf5e767c9c6dfac
                       at libstd/panicking.rs:289
                       at libstd/panic.rs:374
                       at libstd/rt.rs:58
  16:    0xe7d13 - main

is_getrandom_available treats syscall as available if the error is not libc::ENOSYS. I think Qnap kernel violates this rule (there are already some similar bug reports: https://bugs.python.org/issue27955).

@kennytm kennytm added O-linux Operating system: Linux C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jul 22, 2018
@kennytm
Copy link
Member

kennytm commented Jul 22, 2018

Is this sufficient to fix it?

diff --git a/src/libstd/sys/unix/rand.rs b/src/libstd/sys/unix/rand.rs
index 01c0ada4ff..967a517442 100644
--- a/src/libstd/sys/unix/rand.rs
+++ b/src/libstd/sys/unix/rand.rs
@@ -77,7 +77,7 @@ mod imp {
             let result = getrandom(&mut buf);
             let available = if result == -1 {
                 let err = io::Error::last_os_error().raw_os_error();
-                err != Some(libc::ENOSYS)
+                err != Some(libc::ENOSYS) && err != Some(libc::EPERM)
             } else {
                 true
             };

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

I'm trying to reproduce this error with the minimal code. Will get back soon

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

The problem is that QNAP kernel returns EFAULT error on getrandom syscall.
So we need to handle three error codes:
err != Some(libc::ENOSYS) && err != Some(libc::EPERM) && err != Some(libc::EFAULT)

@nagisa
Copy link
Member

nagisa commented Jul 22, 2018

That’s very nonsensical error code usage by QNAP. I can understand EPERM (even though it is not exactly a valid code for it to return), but EFAULT is just outright wrong to ignore or interpret as lack of the syscall as it could easily indicate a bug somewhere in the code.

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

I totally agree.
But Qnap devices exist with all these buggy kernels. The code related to random syscall is in the rust core library, which makes the rust code just impossible to use there.
I don't know what is the rust policy regarding the buggy implementations in external systems, but it seems like there should be some way to deal with such situations.

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

It's also possible to handle this by providing the way to disable getrandom syscall completely. E.g.

mod imp {
   ....
    pub fn disable_getrandom() {
      ...
    }
   ...
}

@kennytm
Copy link
Member

kennytm commented Jul 22, 2018

According to https://bugs.php.net/bug.php?id=75409, receiving anything besides EINTR or EAGAIN should fallback to /dev/urandom.

@nagisa
Copy link
Member

nagisa commented Jul 22, 2018

According to that page, it seems that PHP people aren’t really sure what should have been done either, other than that QNAP should fix their own stuff. Whatever they implemented is just them giving up on a sensible solution.

My ultimate suggestion would be to apply a QNAP specific patch for whatever distribution is used by QNAP during compilation/packaging of their rustc package and use that rustc to compile all the rust code.

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

It's very common in embedded software to have this type of bugs, which exist for many years without having a chance to fix it someday (including the fact, that some devices' firmware cannot be upgraded at all). If rust targets embedded development, there should always be a clear way on how to deal with it. Rebuilding rustc works, but it's very distracting and unsustainable from the development side.

As of this QNAP bug, I fully agree, that EFAULT means completely different, and using it in auto-detection is a wrong and harmful way.

I would suggest to refactor this code a bit and implement "random backends" with a clear strategy on how to deal with them. The default should be the same - auto-detect. But in addition to that, the developer would have a chance to select which particular random implementation to use (or even provide own?).

I can image many cases where it may be useful, besides this horrible QNAP bug.

@nagisa
Copy link
Member

nagisa commented Jul 22, 2018

It's very common in embedded software to have this type of bugs, which exist for many years without having a chance to fix it someday

The statement is true, but *-unknown-linux-gnu is not an embedded target at all! It is a target that targets (heh) a very well specified and understood hosted system (even if that hosted system is implemented in many embedded-like use-cases). Furthermore, “real” embedded targets don’t get to use libstd, which makes dealing with… hosted system idiosyncrasies a problem of no relevance.

But in addition to that, the developer would have a chance to select which particular random implementation to use (or even provide own?).

I can image many cases where it may be useful, besides this horrible QNAP bug.

For the user code this is absolutely a non-problem. libstd exposes no way to generate random data – for that you would use a library from the ecosystem. Furthermore, the infrastructure to switch out the random number generator used by the hash maps (the only user of randomness in libstd) is already present.

Given that neither of those seem like a satisfactory solution, there already exists a mechanism to tailor for particularities of certain hosted system. Namely, the target_os, target_vendor and target_env configuration variables that correspond to the final three parts of a target specification. Since QNAP is not really a "well specified and understood Linux" that the *-unknown-linux-* is targeting, then perhaps adding a *-qnap-linux-* would be an acceptable solution?

Bear in mind, that a new target does not necessarily imply that there would be binary artefacts (sysroot and/or librustc itself) for the target, in which case you’d still have to at least build your own sysroot for the target.

@glebpom
Copy link
Author

glebpom commented Jul 22, 2018

Regardless of how we call it (embedded or embedded-like), there are still hundreds of millions of devices, including routers, NASes, IoT hubs, smart speakers, kiosks, IP cameras, etc. running Linux. And vendors typically provide their SDKs, with the modified kernel and *libc, which may work unpredictably sometimes.

It sounds like the idea with custom target makes sense. Will try to use xargo for that

@nicokoch
Copy link
Contributor

On a side note: -EPERM can be generated by any system call, because the end user can use the seccomp kernel module. We had the same problem in #51270 already.

@tbu-
Copy link
Contributor

tbu- commented Aug 29, 2018

Python treats -EPERM the same as -ENOSYS: https://bugs.python.org/issue27955, https://hg.python.org/cpython/rev/27d05bb6f832.

@sanxiyn sanxiyn added the O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Mar 15, 2019
@sanxiyn sanxiyn changed the title Unexpected getrandom error on linux kernel 3.10 Unexpected getrandom error due to Linux kernel bug on QNAP Mar 15, 2019
@newpavlov
Copy link
Contributor

When you'll decide on how to deal with this issue, please also send a patch to rand/getrandom.

@sanxiyn
Copy link
Member

sanxiyn commented Apr 19, 2019

Nominating this. I propose we follow Python here, the rationale that seccomp can block getrandom seems reasonable to me.

@alexcrichton
Copy link
Member

This was discussed recently at a libs triage meeting, and the conclusion was that following python's precedent of handling EPERM as well as ENOSYS makes sense for our usage of getrandom

tbu- added a commit to tbu-/rust that referenced this issue May 1, 2019
This can happen because of seccomp or some VMs.

Fixes rust-lang#52609.
Centril added a commit to Centril/rust that referenced this issue May 20, 2019
Fall back to `/dev/urandom` on `EPERM` for `getrandom`

This can happen because of seccomp or some VMs.

Fixes rust-lang#52609.
@nanpuyue
Copy link
Contributor

http://man7.org/linux/man-pages/man2/getrandom.2.html

ERRORS

       EAGAIN The requested entropy was not available, and getrandom() would
              have blocked if the GRND_NONBLOCK flag was not set.

       EFAULT The address referred to by buf is outside the accessible
              address space.

       EINTR  The call was interrupted by a signal handler; see the
              description of how interrupted read(2) calls on "slow" devices
              are handled with and without the SA_RESTART flag in the
              signal(7) man page.

       EINVAL An invalid flag was specified in flags.

       ENOSYS The glibc wrapper function for getrandom() determined that the
              underlying kernel does not implement this system call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state O-linux Operating system: Linux T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants