Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential issues in ReseedingRng #1169

Closed
joshlf opened this issue Aug 30, 2021 · 10 comments
Closed

Potential issues in ReseedingRng #1169

joshlf opened this issue Aug 30, 2021 · 10 comments

Comments

@joshlf
Copy link
Contributor

joshlf commented Aug 30, 2021

There are two potential issues in this function:

pub fn register_fork_handler() {
static REGISTER: Once = Once::new();
REGISTER.call_once(|| unsafe {
libc::pthread_atfork(None, None, Some(fork_handler));
});
}

Error handling

The first is that, per the POSIX spec, pthread_atfork can return an error. This error should be checked (the code should probably panic on error because an error means that the handler has not been registered).

libc dependency

The second is more speculative. If there's any way for libc's implementation to depend on ReseedingRng, then it's possible that libc code will call into a ReseedingRng after forking but before the pthread_atfork callback has been called, which would result I've run into a similar issue before when writing a global allocator (see the alloc-tls crate for details).

For this, I recommend registering the same callback for all three of pthread_atfork's arguments. This will result in the counter being incremented at the following points:

  • In the parent, at the beginning of the fork call, before any processing has begun
  • In the parent, at the end of the fork call, after all processing has completed
  • In the child, at the end of the fork call

If the counter is incremented in these places, the only way for there to be a problem is if ReseedingRng is used in the parent in the middle of fork processing (after the first increment but before the second) and in the child before the fork call ends (and thus before the child's counter is incremented). This isn't 100% guaranteed not to happen (and I haven't dug through any fork implementations to figure out whether it's likely to happen in practice), but it's the best you can do with what pthread_atfork makes available. If you wanted to be 100% certain, you could do something more custom - e.g., directly invoke syscalls to get your PID - but that's almost certainly too complex and brittle to justify hedging against something that will almost certainly never happen anyway.

@vks
Copy link
Collaborator

vks commented Aug 30, 2021

I think the first issue is not a problem. Reseeding is not strictly necessary and on a "best-effort" basis. If it does not work, it's better to ignore the error and continue.

@joshlf
Copy link
Contributor Author

joshlf commented Aug 30, 2021

I think the first issue is not a problem. Reseeding is not strictly necessary and on a "best-effort" basis. If it does not work, it's better to ignore the error and continue.

I don't think that's true. If you fail to reseed on fork, then you end up with two processes which are both producing identical PRNG output until the next reseeding event. This may be a correctness issue for some applications, and it's certainly a security issue if the PRNGs are ever used in a cryptographic context (ReseedingRng implements CryptoRng).

@dhardy
Copy link
Member

dhardy commented Aug 31, 2021

Error handling

Panic-on-error does sound the most appropriate here. The one reservation I have is that, should any erroneous/incomplete posix impls return an error instead of implementing pthread_atfork, it would break rand for all users on that platform. This is purely hypothetical however, thus it's likely sensible to care about this only if such an issue turns up.

libc dependency

@joshlf could you please be more specific since a search of the alloc-tls source does not find the mentioned issue?

Worth mentioning also is that the fork protection already does not guarantee no duplicate bytes in the forked process, since the protection only applies from the next time BlockRngCore::generate is called.

@vks
Copy link
Collaborator

vks commented Aug 31, 2021

@dhardy @joshlf

Worth mentioning also is that the fork protection already does not guarantee no duplicate bytes in the forked process, since the protection only applies from the next time BlockRngCore::generate is called.

This is what I meant with "best effort". Furthermore, it only works on POSIX platforms. You can't rely on ReseedingRng alone to avoid duplicate bytes for correctness. You can end up with duplicate output even if pthread_atfork didn't fail. I think a better approach is to initialize the RNG after forking.

I'm also worried about breaking rand for a lot of users. Under which conditions does pthread_atfork fail?

@vks
Copy link
Collaborator

vks commented Aug 31, 2021

I'm also worried about breaking rand for a lot of users. Under which conditions does pthread_atfork fail?

It seems that pthread_atfork can only fail if it cannot allocate memory. In this case, panicking should be fine.

However, does that mean we should only use pthread_atfork if the alloc feature is enabled?

@joshlf
Copy link
Contributor Author

joshlf commented Aug 31, 2021

@joshlf could you please be more specific since a search of the alloc-tls source does not find the mentioned issue?

Sure. libc itself uses the global allocator to perform certain operations, which includes both thread-local storage initialization and thread teardown. This has two consequences:

  • The first time that the allocator is accessed in a thread, it lazily initializes thread-local storage. Since this initialization can itself perform allocation, a naive implementation will infinitely recurse. alloc-tls is able to detect this kind of reentrancy and fail gracefully (which the allocator implementation is supposed to handle by falling back to an allocation slow path).
  • During thread teardown, if the allocator is accessed after the allocator's thread-local storage has already been destructed, a naive implementation will access uninitialized memory expecting it to contain a thread-local cache. alloc-tls is able to detect this and fail gracefully.

For global allocators, the issue is thread-local variables, not pthread_atfork (although I could see that being an issue for global allocators as well - I just never ran into it). The point that I'm making by mentioning alloc-tls is simply that it's possible for library code to accidentally assume that it won't be called from libc itself. In the case of ReseedingRng, I could imagine this happening, e.g., if client code registers its own pthread_atfork handler which uses ReseedingRng. There may be other ways to cause this to happen that I'm not aware of - that's just off the top of my head. (here's the glibc fork source if you want to poke around on your own)

Worth mentioning also is that the fork protection already does not guarantee no duplicate bytes in the forked process, since the protection only applies from the next time BlockRngCore::generate is called.

This is what I meant with "best effort". Furthermore, it only works on POSIX platforms. You can't rely on ReseedingRng alone to avoid duplicate bytes for correctness. You can end up with duplicate output even if pthread_atfork didn't fail. I think a better approach is to initialize the RNG after forking.

Reading the ReseedingRng docs, I see what you're saying. However, given how expensive forking is anyway, it probably wouldn't impose a meaningful performance cost to register all three handlers instead of only the child handler, so you might as well do it just to be safe.

I'm also worried about breaking rand for a lot of users. Under which conditions does pthread_atfork fail?

It seems that pthread_atfork can only fail if it cannot allocate memory. In this case, panicking should be fine.

However, does that mean we should only use pthread_atfork if the alloc feature is enabled?

It looks like the entire adapter module is gated behind the std feature flag, so I don't think this is an issue.

@dhardy
Copy link
Member

dhardy commented Sep 11, 2021

e.g., if client code registers its own pthread_atfork handler which uses ReseedingRng

This does not sound like a good idea: besides the potential allocation issue, any returned result may not be unique to the process even after fixing the issue you mention due to BlockRng's buffer. I guess we should document that.

This is what I meant with "best effort".

Another point of interest is that the current implementation only logs a warning if, for some reason, reseeding fails (presumably if the call to getrandom fails, depending on the platform).

@joshlf
Copy link
Contributor Author

joshlf commented Sep 11, 2021

Another point of interest is that the current implementation only logs a warning if, for some reason, reseeding fails (presumably if the call to getrandom fails, depending on the platform).

Some other libraries - such as BoringSSL - handle this by aborting the process on the theory that, in practice, no code securely handles this case (and that it happens so rarely that aborting is acceptable).

@dhardy
Copy link
Member

dhardy commented Sep 11, 2021

@joshlf the reasons we don't panic are two-fold: (1) getrandom (crate) is portable across a bunch of platforms, and I don't care to investigate just how rare failures are, and (2) that thread_rng has always been a compromise between convenience and security. I think here the compromise is acceptable?

And I think we can close this now?

@vks
Copy link
Collaborator

vks commented Sep 17, 2021

I think the original two issues are fixed by #1178. Please open a new issue if there is anything else that could be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants