crypto/rand: crash process on error reading randomness #66821

Open
FiloSottile opened this issue Apr 14, 2024 · 22 comments
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Milestone

Comments

@FiloSottile
Contributor

FiloSottile commented Apr 14, 2024

On almost all our platforms, we now have crypto/rand backends that ~never fail.

  • On Linux, we primarily use the getrandom(2) system call, which never fails.

    • It may block if the pool is not initialized yet at early boot, and may be interrupted by a signal handler if requesting more than 256 bytes, but neither of those surface as errors to the application.
    • getrandom() was first available in Linux 3.17, released in October 2014. Debian oldstable is on Linux 5.10.
    • getrandom() can be blocked with seccomp. That's a bad (and weird) idea, and the default Docker profile doesn't do that. In that case we fall back to opening /dev/urandom, which might fail if the file is not available or file descriptors run out.
  • On macOS and iOS we use arc4random() since https://go.dev/cl/569655. From the man page:

    These functions are always successful, and no return value is reserved to indicate an error.

  • On Windows we use the ProcessPrng function. From the docs:

    Always returns TRUE.

  • The BSDs use similar syscalls with similar properties (whether getrandom or getentropy) although we should switch the ones we can to arc4random.

  • On js/wasm we use getRandomValues which doesn't have documented failure modes.

  • On WASIP1 there's random_get which regrettably has an error return value, making it the one platform (ignoring misconfigured Linux) where there might be errors getting platform random bytes. Since WASI rests on an underlying platform, and every underlying platform has failure-less CSPRNGs, it's hard to imagine why random_get should actually return an error.

I'm proposing we make crypto/rand throw (irrecoverably crash the program) if an error occurs, and document that the error return values of crypto/rand.Read and crypto/rand.Reader.Read are always nil.

This will free applications from having to do error handling for a condition that essentially can't happen, and that if it did happen is essentially not possible to handle securely by the application.

This will also allow introducing new APIs like a hypothetical String(charset string) string (not part of this proposal) without an error return, making them more usable and appealing.
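To make the usability point concrete, here is a minimal sketch of what an error-free helper in the spirit of the hypothetical String(charset string) string could look like once Read is guaranteed never to fail. The name randomString and its extra length parameter are illustrative assumptions, not a crypto/rand API. It uses rejection sampling so that every character of charset is equally likely, and it assumes a single-byte (ASCII) charset of at most 256 characters.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// randomString is a hypothetical helper, not part of crypto/rand. It returns
// length characters drawn uniformly from charset, which is assumed to be
// ASCII with 1 to 256 distinct bytes.
func randomString(charset string, length int) string {
	n := len(charset)
	if n == 0 || n > 256 {
		panic("randomString: charset must have 1 to 256 bytes")
	}
	// limit is the largest multiple of n that is <= 256. Bytes >= limit are
	// rejected, so the modulo below introduces no bias.
	limit := 256 - 256%n
	out := make([]byte, 0, length)
	buf := make([]byte, 1)
	for len(out) < length {
		// Assumes the proposed guarantee: Read always fills buf and never
		// returns an error, so there is no error path to propagate.
		rand.Read(buf)
		if int(buf[0]) >= limit {
			continue
		}
		out = append(out, charset[int(buf[0])%n])
	}
	return string(out)
}

func main() {
	fmt.Println(randomString("abcdefghijklmnopqrstuvwxyz0123456789", 16))
}
```

Because the signature has no error result, call sites can use the value inline, which is the ergonomic benefit the proposal describes.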

Based on a suggestion by @rsc.

/cc @golang/security @golang/proposal-review

@FiloSottile FiloSottile added the Proposal-Crypto Proposal related to crypto packages or other security issues label Apr 14, 2024
@gopherbot gopherbot added this to the Proposal milestone Apr 14, 2024
@daira

daira commented Apr 14, 2024

Do you mean crypto/rand should panic on errors? I'm not very familiar with Go but I didn't think it used the terminology of "throwing" errors. I agree, on general language-independent robustness principles, that it should panic.

@FiloSottile
Contributor Author

Panics are recoverable; throw is the runtime's internal name for fatal errors. Think of it as a call to exit(1).

If we made crypto/rand panic, that would risk encouraging applications to wrap the calls in defer/recover "for robustness", when really we think a failure is so unlikely and so unrecoverable that applications shouldn't try.
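To illustrate the concern, here is a minimal sketch of the defer/recover wrapper that a panicking API would invite. The names readKey and robustReadKey are hypothetical, and readKey simulates a failure with a panic instead of reading real randomness. The point is that recover lets the caller carry on with a nil key, whereas a runtime fatal error (throw) cannot be intercepted this way.

```go
package main

import (
	"errors"
	"fmt"
)

// readKey is a hypothetical randomness helper that panics on failure.
// When fail is true it simulates an error reading randomness.
func readKey(fail bool) []byte {
	if fail {
		panic(errors.New("simulated randomness failure"))
	}
	return make([]byte, 32) // placeholder bytes, not real randomness
}

// robustReadKey shows the anti-pattern: recover swallows the panic, so the
// caller continues with a nil key instead of the process stopping.
func robustReadKey(fail bool) (key []byte, recovered bool) {
	defer func() {
		if r := recover(); r != nil {
			recovered = true
		}
	}()
	return readKey(fail), false
}

func main() {
	key, recovered := robustReadKey(true)
	fmt.Println(key == nil, recovered)
}
```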

go/src/runtime/panic.go

Lines 1008 to 1022 in 519f6a0

// throw triggers a fatal error that dumps a stack trace and exits.
//
// throw should be used for runtime-internal fatal errors where Go itself,
// rather than user code, may be at fault for the failure.
//
//go:nosplit
func throw(s string) {
	// Everything throw does should be recursively nosplit so it
	// can be called even when it's unsafe to grow the stack.
	systemstack(func() {
		print("fatal error: ", s, "\n")
	})
	fatalthrow(throwTypeRuntime)
}

@bradfitz
Contributor

I think all our code already panics on crypto/rand errors (via wrappers that don't return errors) so SGTM 😀

@ydnar

ydnar commented Apr 14, 2024

WASI 0.2 random interface thankfully does not return an error: https://github.com/WebAssembly/wasi-random/blob/main/wit/random.wit

@icholy

icholy commented Apr 15, 2024

Sounds like we need a rand v3

@Jorropo
Member

Jorropo commented Apr 15, 2024

Sounds like we need a rand v3

This is the best argument for always using crand and mrand aliases.
||crypto/rand != math/rand 😉||

@Jorropo
Member

Jorropo commented Apr 16, 2024

  • On Linux, we primarily use the getrandom(2) system call, which never fails.
    • It may block if the pool is not initialized yet at early boot, and may be interrupted by a signal handler if requesting more than 256 bytes, but neither of those surface as errors to the application.
    • getrandom() was first available in Linux 3.17, released in October 2014. Debian oldstable is on Linux 5.10.
    • getrandom() can be blocked with seccomp. That's a bad (and weird) idea, and the default Docker profile doesn't do that.

Should we keep the /dev/urandom fallback for 3.17+?
I would rather be forced to tweak my seccomp config than have to debug rare, flaky throws because I misconfigured some sandboxing options.
We can't remove /dev/urandom completely on Linux without raising the 2.6.32 baseline.

@FiloSottile
Contributor Author

Should we keep the /dev/urandom fallback for 3.17+?

This is tempting, but I think making decisions based on kernel version is opening a can of worms. I think even the urandom fallback is reasonably reliable: the file is opened only once, so either crypto/rand never works in a given process or it always works (although it might flake across process executions, if you run out of fds before the first Read call, or if the file is removed).
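A minimal sketch of the "opened only once" behavior described above, assuming a reader that opens its device lazily under sync.Once and caches the result. The type onceFileReader is hypothetical and is not the actual crypto/rand code; it only shows why the outcome is sticky for the life of the process.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// onceFileReader is a hypothetical fallback reader that opens its
// randomness device exactly once. Success or failure of that open is cached.
type onceFileReader struct {
	path string
	once sync.Once
	f    *os.File
	err  error
}

func (r *onceFileReader) Read(b []byte) (int, error) {
	r.once.Do(func() {
		r.f, r.err = os.Open(r.path)
	})
	if r.err != nil {
		// Every call sees the same open error: it never works in this process.
		return 0, r.err
	}
	// The file stays open, so later reads do not depend on free fds.
	return io.ReadFull(r.f, b)
}

func main() {
	r := &onceFileReader{path: "/dev/urandom"}
	buf := make([]byte, 16)
	if _, err := r.Read(buf); err != nil {
		fmt.Println("open failed:", err)
		return
	}
	fmt.Printf("%x\n", buf)
}
```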

@Jorropo
Member

Jorropo commented Apr 16, 2024

Ah, I thought it opened a new file each time.

Then can we use import-time side effects to solve this (open the file in init)?
It's very unlikely you are running out of fds before main has even started running.

I get that the standard library tries not to do that, but in the overwhelming majority of cases init will run, try getrandom, succeed, and do nothing else, which seems fine to me. (There is also a clear path to solving this if anyone finds it to be an issue: fixing their seccomp config.)

@FiloSottile
Contributor Author

I was thinking about that but I have no intuition as to whether the cost of calling getrandom (to check if it's available) on init() for every Linux program is acceptable.

@randall77
Contributor

If crypto/rand is imported (even indirectly), it should be fine to make a single system call during init. I'm kinda surprised it doesn't do so already.
The runtime reads from /dev/urandom on every startup.

@mateusz834
Member

mateusz834 commented Apr 16, 2024

@randall77

The runtime reads from /dev/urandom on every startup.

Does it? On Linux, when the auxv contains random bytes, it does not even open it.

go/src/runtime/rand.go

Lines 44 to 58 in f17b28d

if startupRand != nil {
	for i, c := range startupRand {
		seed[i%len(seed)] ^= c
	}
	clear(startupRand)
	startupRand = nil
} else {
	if readRandom(seed[:]) != len(seed) {
		// readRandom should never fail, but if it does we'd rather
		// not make Go binaries completely unusable, so make up
		// some random data based on the current time.
		readRandomFailed = true
		readTimeRandom(seed[:])
	}
}
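The first branch above folds startup randomness of any length into a fixed-size seed with XOR, wrapping around the seed. A minimal standalone sketch of just that fold; foldInto is a hypothetical name, and seed must be non-empty.

```go
package main

import "fmt"

// foldInto XORs src into seed, wrapping around so every byte of src
// contributes. It mirrors the seed[i%len(seed)] ^= c loop quoted above.
// seed must be non-empty, or the modulo panics.
func foldInto(seed, src []byte) {
	for i, c := range src {
		seed[i%len(seed)] ^= c
	}
}

func main() {
	seed := make([]byte, 4)
	foldInto(seed, []byte{1, 2, 3, 4, 5})
	fmt.Println(seed) // [4 2 3 4]: the fifth byte wraps to index 0 (1^5 = 4)
}
```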

@randall77
Contributor

@mateusz834 True, on Linux if we get auxv randomness we don't read /dev/urandom.
We read it unconditionally on lots of OSes, like Darwin and the BSDs.

@rsc rsc changed the title proposal: crypto/rand: throw on errors proposal: crypto/rand: crash process on errors Apr 24, 2024
@rsc rsc changed the title proposal: crypto/rand: crash process on errors proposal: crypto/rand: crash process on error reading randomness Apr 24, 2024
@rsc
Contributor

rsc commented Apr 24, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Contributor

rsc commented May 8, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to document that crypto/rand.Read and crypto/rand.Reader.Read always return the full amount requested and never return errors. If the underlying OS returns an error, the Go process will runtime.throw, meaning the process crashes with no chance to recover. But no underlying OS actually returns errors from random reads anymore.

This lets callers simplify and delete their dead error handling paths.
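A minimal before/after sketch of that simplification. Both function names are hypothetical; newKeyAfter assumes the proposed guarantee that Read always fills the buffer and returns a nil error.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newKeyBefore shows the error handling callers write today.
func newKeyBefore() ([]byte, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, fmt.Errorf("reading randomness: %w", err)
	}
	return key, nil
}

// newKeyAfter relies on the proposed guarantee: Read always fills key and
// its error result is always nil, so the dead error path disappears.
func newKeyAfter() []byte {
	key := make([]byte, 32)
	rand.Read(key)
	return key
}

func main() {
	fmt.Printf("%x\n", newKeyAfter())
}
```

Dropping the error result also removes it from every caller up the stack, not just from this one function.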

@mateusz834
Member

mateusz834 commented May 9, 2024

FYI, the Reader is a var and people might change it to a custom Reader implementation. I wonder whether that might be an issue for us here?

The proposal is to document that crypto/rand.Read and crypto/rand.Reader.Read always return the full amount requested and never return errors.

@rsc I think that we cannot document rand.Read as such.

I think we can only document the default rand.Reader this way.

@mateusz834
Member

mateusz834 commented May 11, 2024

Also, it seems like Plan 9 always opens /dev/urandom, so I guess it might not always exist. Maybe we can replace it with some kind of syscall?

func (r *reader) Read(b []byte) (n int, err error) {
	r.seeded.Do(func() {
		t := time.AfterFunc(time.Minute, func() {
			println("crypto/rand: blocked for 60 seconds waiting to read random data from the kernel")
		})
		defer t.Stop()
		entropy, err := os.Open(randomDevice)
		if err != nil {
			r.seedErr = err
			return
		}
		defer entropy.Close()
		_, r.seedErr = io.ReadFull(entropy, r.key[:])
	})
	// ...
}

CC @0intro

@rsc
Contributor

rsc commented May 14, 2024

Plan 9 does not have /dev/urandom. It has /dev/random. That may not be present in the name space. But '#c/random' is always present, and the code should be opening that anyway.

@rsc
Contributor

rsc commented May 14, 2024

@FiloSottile and I discussed this.

We believe that func Read should be documented to never return an error. It is also documented to use Reader, but if it observes an error from Reader, it will crash the program. That helps with the security of code that assumes Read never returns an error because the default implementations don't. If that code runs when Reader has been replaced with an erroring implementation, the call sites calling Read may not be correct. The security guarantee simply doesn't hold if Read has to be as lax as any possible replacement Reader. The value-add for Read is simply that it does this check and provides this guarantee.

We also believe that we should document that if Reader is replaced, it should be replaced with an implementation that never returns an error.

@rsc
Contributor

rsc commented May 23, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to document that rand.Read never returns an error, nor does the default rand.Reader. If rand.Reader is set to an erroring io.Reader, then rand.Read throws (fatal crashes) on error. Programs that leave rand.Reader alone will never observe the “out of randomness” throw because all operating systems guarantee that getrandom works.

(This issue depends on #67001.)

@rsc
Contributor

rsc commented May 30, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Jun 5, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: crypto/rand: crash process on error reading randomness crypto/rand: crash process on error reading randomness Jun 5, 2024
@rsc rsc modified the milestones: Proposal, Backlog Jun 5, 2024
Projects
Status: Accepted
Development

No branches or pull requests

10 participants