-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OSRng initialization crash on Windows #44911
Comments
Possibly related: golang/go#15655 They thought it was due to delay-loaded DLLs not loading, but didn't figure it out. Delay-loading a DLL certainly sounds like the type of thing that would fail occasionally due to filesystem issues. |
Yeah, i saw that issue and figured it probably was something similar, but also this is crashing pretty often so having some built in mitigation (I don't know what) might be interesting. |
What is the distribution of Windows’ versions that experience these failures? |
You can see that in the crash report (I linked to an older one by accident). |
It's very likely that these numbers are representative of the actual distribution of Firefox users, since you can expect most Windows 8 users to have upgraded to 10. I don't know the actual numbers though. |
Windows 7 is overrepresented here. The distribution of OS versions among Windows Firefox users is about 52% Windows 7 and 36% Windows 10. https://hardware.metrics.mozilla.com/#goto-os-and-architecture |
That somewhat supports the broken DLL theory, you'd probably have more broken DLLs on older systems. |
We started to see a similar error recently in geckodriver tests [1]. That seems to be on Linux 64, so it's possible that this is a different root cause to the Windows bug. |
This commit updates the OS random number generator on Windows to match the upstream implementation in the `rand` crate. First proposed in rust-random/rand#111 this implementation uses a "private" API of `RtlGenRandom`. Despite the [documentation][dox] indicating this is a private function its widespread use in Chromium and Firefox as well as [comments] from Microsoft internally indicates that it's highly unlikely to break. Another motivation for switching this is to also attempt to make progress on rust-lang#44911. It may be the case that this function succeeds while the previous implementation may fail in "weird" scenarios. [dox]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa387694(v=vs.85).aspx [comments]: rust-random/rand#111 (comment)
std: Update randomness implementation on Windows This commit updates the OS random number generator on Windows to match the upstream implementation in the `rand` crate. First proposed in rust-random/rand#111 this implementation uses a "private" API of `RtlGenRandom`. Despite the [documentation][dox] indicating this is a private function its widespread use in Chromium and Firefox as well as [comments] from Microsoft internally indicates that it's highly unlikely to break. Another motivation for switching this is to also attempt to make progress on #44911. It may be the case that this function succeeds while the previous implementation may fail in "weird" scenarios. [dox]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa387694(v=vs.85).aspx [comments]: rust-random/rand#111 (comment)
Not entirely sure if this story is relevant: While debugging a SlimerJS application that runs on Firefox 59, I encountered the identical symptom, here is the output of Firefox captured by SlimerJS:
!!! could not start server on port 41599: [Exception... "Component returned fail
ure code: 0x80004005 (NS_ERROR_FAILURE) [nsIServerSocket.init]" nsresult: "0x80
004005 (NS_ERROR_FAILURE)" location: "JS frame :: jar:file:///C:/xxxxxxx/SlimerJS/omni.ja!/components/httpd.js :: _start :: line 556" data:
no]
The SlimerJS is usually launched via a Go program, and the above symptom was visible in that way; however, the issue disappears when launching the SlimerJS process manually in command prompt. Then, curious about the process environment, I instructed SlimerJS to inherit OS and user's environment variables, and the issue disappeared. |
@HouzuoGuo Which version of Windows is this? Do you have a backtrace? The caller might be able to switch from OsRng to one of the other rand RNGs that does panic on error: https://docs.rs/rand/0.5.3/rand/rngs/ |
@cpeterso The system is an old Windows 7 64-bit laptop running on a slow AMD A10 CPU, I'll try to turn on back trace on Tuesday. |
Here is the back trace, perhaps not as useful as it could be... !!! could not start server on port 1512: [Exception... "Component returned failu re code: 0x80004005 (NS_ERROR_FAILURE) [nsIServerSocket.init]" nsresult: "0x800 04005 (NS_ERROR_FAILURE)" location: "JS frame :: jar:file:///C:/laitos-windows- supplements/SlimerJS/omni.ja!/components/httpd.js :: _start :: line 556" data: no] PhantomJS Javascript Exception Error: 2147746065 thread '' panicked at 'failed to create an OS RNG: Error { repr: Os { c ode: -2146893818, message: "Invalid Signature." } }', src\libcore\result.rs:906: 4 stack backtrace: 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: mozilla_dump_image 10: mozilla_dump_image 11: mozilla_dump_image 12: 13: 14: 15: 16: 17: 18: 19: 20: PRP_TryLock 21: PR_MD_UNLOCK 22: iswascii 23: 24: 25: 26: c:\gopath\src> Be aware that SlimerJS is not compatible with firefox version newer than 59, hence the error was captured from firefox version 59.0.3, launched via SlimerJS 1.0.0. |
Unfortunately, I don't have a lot of helpful information to share. The "Invalid Signature" error message might mean that:
99% of these "Invalid Signature" errors in Firefox happen on Windows 7. |
thanks @cpeterso |
@Manishearth @cpeterso is this still a thing? Several things have changed since:
Note that Thus, I presume you still see occasional crashes of FF due to this? Secondary question: do you believe having a backup entropy source in Rand is generally useful enough to be worth keeping? I ask because quite a few people object to |
I don't this is currently a problem (for Firefox). I just looked for recent Firefox crashes that matched the original function signature and only found a few. So we can probably close this issue.
I think it's useful for Rand to have a default that "just works" (i.e. has a backup entropy source). If people have a use case requires a secure RNG, then they can use a secure RNG in their code. 😃 OTOH, Rand could change to not use the backup entropy source by default and Rust std libs could explicitly opt into the backup entropy source for internal use cases that don't absolutely require a secure RNG (like seeding a hash map). |
Doing so. |
Firefox is getting a rare but recurring crash on Windows due to OsRng failing to initialize.
The bug is being tracked here, and you can see the crash reports here
Since the crash message isn't public on that page, just to give an idea, these are the kinds of crash reasons we have:
failed to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 15100)" } }
(15100 isERROR_MUI_FILE_NOT_FOUND
, i.e. it's unable to find the strings file to return the translated error message)failed to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 5)" } }
(5 isERROR_ACCESS_DENIED
, so the translation string file is)failed to create an OS RNG: Error { repr: Os { code: -2146893801, message: "Provider type not defined." } }
failed to create an OS RNG: Error { repr: Os { code: -2146893801, message: "Tipo de proveedor no definido." } }
and basically the same thing in different languagesfailed to create an OS RNG: Error { repr: Os { code: -2146893818, message: "Invalid Signature." } }
The FormatMessage errors are not a problem, they just mean that the formatter was unable to figure out what the error message was -- if we hit that we're erroring out already.
The actual OS error seems to mostly be error code 127 (
ERROR_PROC_NOT_FOUND
). I believe-2146893801
is also one of these with some extra flags set. This is concerning; it doesn't seem like this should happen, because we are only feeding known system crypto provider (PROV_RSA_FULL
) toCryptAcquireContextA
. I'm unsure what the "invalid signature" error is about.This might just be system DLLs being broken, but it's occurring regularly enough (we have a almost thousand crash reports) that we're having to work around it by disabling RandomState on systems where this doesn't work (which we can do because we've already forked HashMap for fallible allocations). Furthermore, Firefox code uses
CryptAcquireContext
andCryptGenRandom
in a bunch of places and those don't seem to be particularly crashy.We should see if this is an issue on our side, or perhaps make OsRng (or perhaps just
RandomState
's initializer?) more resilient to this (perhaps with a more expensive fallback?).The text was updated successfully, but these errors were encountered: