Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSRng initialization crash on Windows #44911

Closed
Manishearth opened this issue Sep 28, 2017 · 17 comments
Closed

OSRng initialization crash on Windows #44911

Manishearth opened this issue Sep 28, 2017 · 17 comments
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Manishearth
Copy link
Member

Firefox is getting a rare but recurring crash on Windows due to OsRng failing to initialize.

The bug is being tracked here, and you can see the crash reports here

Since the crash message isn't public on that page, just to give an idea, these are the kinds of crash reasons we have:

  • failed to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 15100)" } } (15100 is ERROR_MUI_FILE_NOT_FOUND, i.e. it's unable to find the strings file to return the translated error message)
  • failed to create an OS RNG: Error { repr: Os { code: 127, message: "OS Error 127 (FormatMessageW() returned error 5)" } } (5 is ERROR_ACCESS_DENIED, so the translation string file is)
  • failed to create an OS RNG: Error { repr: Os { code: -2146893801, message: "Provider type not defined." } }
    • failed to create an OS RNG: Error { repr: Os { code: -2146893801, message: "Tipo de proveedor no definido." } } and basically the same thing in different languages
  • failed to create an OS RNG: Error { repr: Os { code: -2146893818, message: "Invalid Signature." } }

The FormatMessage errors are not a problem, they just mean that the formatter was unable to figure out what the error message was -- if we hit that we're erroring out already.

The actual OS error seems to mostly be error code 127 (ERROR_PROC_NOT_FOUND). I believe -2146893801 is also one of these with some extra flags set. This is concerning; it doesn't seem like this should happen, because we are only feeding known system crypto provider (PROV_RSA_FULL) to CryptAcquireContextA. I'm unsure what the "invalid signature" error is about.

This might just be system DLLs being broken, but it's occurring regularly enough (we have a almost thousand crash reports) that we're having to work around it by disabling RandomState on systems where this doesn't work (which we can do because we've already forked HashMap for fallible allocations). Furthermore, Firefox code uses CryptAcquireContext and CryptGenRandom in a bunch of places and those don't seem to be particularly crashy.

We should see if this is an issue on our side, or perhaps make OsRng (or perhaps just RandomState's initializer?) more resilient to this (perhaps with a more expensive fallback?).

@Manishearth Manishearth added O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 28, 2017
@mattico
Copy link
Contributor

mattico commented Sep 28, 2017

Possibly related: golang/go#15655

They thought it was due to delay-loaded DLLs not loading, but didn't figure it out. Delay-loading a DLL certainly sounds like the type of thing that would fail occasionally due to filesystem issues.

@Manishearth
Copy link
Member Author

Yeah, i saw that issue and figured it probably was something similar, but also this is crashing pretty often so having some built in mitigation (I don't know what) might be interesting.

@nagisa
Copy link
Member

nagisa commented Sep 29, 2017

What is the distribution of Windows’ versions that experience these failures?

@Manishearth
Copy link
Member Author

You can see that in the crash report (I linked to an older one by accident).

image

@Manishearth
Copy link
Member Author

It's very likely that these numbers are representative of the actual distribution of Firefox users, since you can expect most Windows 8 users to have upgraded to 10. I don't know the actual numbers though.

@cpeterso
Copy link
Contributor

cpeterso commented Sep 29, 2017

Windows 7 is overrepresented here. The distribution of OS versions among Windows Firefox users is about 52% Windows 7 and 36% Windows 10.

https://hardware.metrics.mozilla.com/#goto-os-and-architecture

@Manishearth
Copy link
Member Author

That somewhat supports the broken DLL theory, you'd probably have more broken DLLs on older systems.

@TimNN TimNN added the C-bug Category: This is a bug. label Oct 1, 2017
@jgraham
Copy link

jgraham commented Oct 2, 2017

We started to see a similar error recently in geckodriver tests [1]. That seems to be on Linux 64, so it's possible that this is a different root cause to the Windows bug.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1404647

alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 18, 2017
This commit updates the OS random number generator on Windows to match the
upstream implementation in the `rand` crate. First proposed in
rust-random/rand#111 this implementation uses a "private" API of
`RtlGenRandom`. Despite the [documentation][dox] indicating this is a private
function its widespread use in Chromium and Firefox as well as [comments] from
Microsoft internally indicates that it's highly unlikely to break.

Another motivation for switching this is to also attempt to make progress
on rust-lang#44911. It may be the case that this function succeeds while the previous
implementation may fail in "weird" scenarios.

[dox]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa387694(v=vs.85).aspx
[comments]: rust-random/rand#111 (comment)
bors added a commit that referenced this issue Oct 21, 2017
std: Update randomness implementation on Windows

This commit updates the OS random number generator on Windows to match the
upstream implementation in the `rand` crate. First proposed in
rust-random/rand#111 this implementation uses a "private" API of
`RtlGenRandom`. Despite the [documentation][dox] indicating this is a private
function its widespread use in Chromium and Firefox as well as [comments] from
Microsoft internally indicates that it's highly unlikely to break.

Another motivation for switching this is to also attempt to make progress
on #44911. It may be the case that this function succeeds while the previous
implementation may fail in "weird" scenarios.

[dox]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa387694(v=vs.85).aspx
[comments]: rust-random/rand#111 (comment)
@HouzuoGuo
Copy link

Not entirely sure if this story is relevant:

While debugging a SlimerJS application that runs on Firefox 59, I encountered the identical symptom, here is the output of Firefox captured by SlimerJS:

!!! could not start server on port 41599: [Exception... "Component returned fail ure code: 0x80004005 (NS_ERROR_FAILURE) [nsIServerSocket.init]" nsresult: "0x80 004005 (NS_ERROR_FAILURE)" location: "JS frame :: jar:file:///C:/xxxxxxx/SlimerJS/omni.ja!/components/httpd.js :: _start :: line 556" data: no]

PhantomJS Javascript Exception
Error: 2147746065
thread '' panicked at 'failed to create an OS RNG: Error { repr: Os { c
ode: -2146893818, message: "Invalid Signature." } }', src\libcore\result.rs:906:
4
note: Run with RUST_BACKTRACE=1 for a backtrace.

The SlimerJS is usually launched via a Go program, and the above symptom was visible in that way; however, the issue disappears when launching the SlimerJS process manually in command prompt. Then, curious about the process environment, I instructed SlimerJS to inherit OS and user's environment variables, and the issue disappeared.

@cpeterso
Copy link
Contributor

@HouzuoGuo Which version of Windows is this? Do you have a backtrace? The caller might be able to switch from OsRng to one of the other rand RNGs that does panic on error: https://docs.rs/rand/0.5.3/rand/rngs/

@HouzuoGuo
Copy link

@cpeterso The system is an old Windows 7 64-bit laptop running on a slow AMD A10 CPU, I'll try to turn on back trace on Tuesday.

@HouzuoGuo
Copy link

Here is the back trace, perhaps not as useful as it could be...

!!! could not start server on port 1512: [Exception... "Component returned failu
re code: 0x80004005 (NS_ERROR_FAILURE) [nsIServerSocket.init]"  nsresult: "0x800
04005 (NS_ERROR_FAILURE)"  location: "JS frame :: jar:file:///C:/laitos-windows-
supplements/SlimerJS/omni.ja!/components/httpd.js :: _start :: line 556"  data:
no]


PhantomJS Javascript Exception
Error: 2147746065
thread '' panicked at 'failed to create an OS RNG: Error { repr: Os { c
ode: -2146893818, message: "Invalid Signature." } }', src\libcore\result.rs:906:
4
stack backtrace:
   0: 
   1: 
   2: 
   3: 
   4: 
   5: 
   6: 
   7: 
   8: 
   9: mozilla_dump_image
  10: mozilla_dump_image
  11: mozilla_dump_image
  12: 
  13: 
  14: 
  15: 
  16: 
  17: 
  18: 
  19: 
  20: PRP_TryLock
  21: PR_MD_UNLOCK
  22: iswascii
  23: 
  24: 
  25: 
  26: 

c:\gopath\src>

Be aware that SlimerJS is not compatible with firefox version newer than 59, hence the error was captured from firefox version 59.0.3, launched via SlimerJS 1.0.0.

@cpeterso
Copy link
Contributor

cpeterso commented Jul 6, 2018

Unfortunately, I don't have a lot of helpful information to share. The "Invalid Signature" error message might mean that:

  • You have some anti-virus or internet security software or malware installed (and it is interfering with your program loading the advapi32.dll).
  • Your advapi32.dll is corrupt, either on disk or due to some hardware issue like bad RAM.

99% of these "Invalid Signature" errors in Firefox happen on Windows 7.

@HouzuoGuo
Copy link

thanks @cpeterso

@dhardy
Copy link
Contributor

dhardy commented Feb 15, 2019

@Manishearth @cpeterso is this still a thing?

Several things have changed since:

  • the Rand lib has a backup entropy source based on CPU jitter (since Dec 2017, as you know)
  • RandomState no longer uses the Rand lib, but uses a minimal hashmap_random_keys function
  • hashmap_random_keys is a simplified re-implementation of OsRng and panics on failure (see Windows implementation)

Note that hashmap_random_keys actually returns constants on some platforms, hence it is somewhat surprising that other platforms will panic on error. It might make more sense if hashmap_random_keys returned an Option (it's not part of the public API).

Thus, I presume you still see occasional crashes of FF due to this?

Secondary question: do you believe having a backup entropy source in Rand is generally useful enough to be worth keeping? I ask because quite a few people object to JitterRng due to it not being provably secure.

@cpeterso
Copy link
Contributor

is this still a thing?

I don't this is currently a problem (for Firefox). I just looked for recent Firefox crashes that matched the original function signature and only found a few. So we can probably close this issue.

do you believe having a backup entropy source in Rand is generally useful enough to be worth keeping?

I think it's useful for Rand to have a default that "just works" (i.e. has a backup entropy source). If people have a use case requires a secure RNG, then they can use a secure RNG in their code. 😃

OTOH, Rand could change to not use the backup entropy source by default and Rust std libs could explicitly opt into the backup entropy source for internal use cases that don't absolutely require a secure RNG (like seeding a hash map).

@the8472
Copy link
Member

the8472 commented Dec 7, 2021

I don't this is currently a problem (for Firefox). I just looked for recent Firefox crashes that matched the original function signature and only found a few. So we can probably close this issue.

Doing so.

@the8472 the8472 closed this as completed Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants