Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pure API for the RNG #117

Open
infinity0 opened this issue Mar 7, 2017 · 8 comments
Open

Pure API for the RNG #117

infinity0 opened this issue Mar 7, 2017 · 8 comments

Comments

@infinity0
Copy link

infinity0 commented Mar 7, 2017

It would be good to have a pure interface for the RNG, such as what the Haskell crypto-api library provides. This would be useful for composing with pure code as well as for easier testing.

However I'm not sure what the best approach is. The obvious "first attempt" is to clone your g before calling generate on it. However, this causes the RNG internal secret to be stored many times in memory, which makes it more difficult to securely delete later (#11). This may or may not be a big deal security-wise; this needs further thought. It also reduces performance, though probably this wouldn't be significant compared within the overall application.

A "second attempt" is to have:

type pure_g = (index * ref g)

and extend g to remember everything it has generated so far. Then, generate pure_g would either reproduce a previously-generated output (at the relevant index), or advance the "real g" as normal.

This would improve time-performance but would waste a lot of memory, and also it's unclear if storing previous RNG output in memory, is really more beneficial security-wise than storing the previous RNG internal secret states.

A "third attempt" is like the second one above, but with an extra forget function for g to delete all the previously-saved RNG output. This could be called when the application is certain that it never needs old output again, e.g. inside the impure react/poll/read loop of a network application. [1] This would reduce the space costs, however the downside is that the API becomes more complex for the end user.

[1] In fact in most cases it could be called automatically even after pure uses of the RNG; but this would expose the effects of the impure internal implementation to the pure code that is using it, potentially making it impure.

@pqwy
Copy link
Contributor

pqwy commented Sep 30, 2017

RNG as an ambient effect is a very strong design decision, because randomness is pervasive.

In Haskell, people typically thread pure RNGs around using a state monad. Internally, the meaning of a state monad exactly matches the meaning of our operations on mutable gs. The difference is that externally, you can create a referentially transparent observer runRandom :: Rng -> RandomM a -> a.

You can match this either by consistently threading a generator around

let f ?g ... =
  ... gen ?g 42 ...
  ... generate ?g n ...
  ... prime ?g 1024 ...

or by setting the assumed generator before your operation.

You can make your own generators by implementing Rng.S.Generator and invoking Rng.create, or you can use the trivial generator that is provided for testing.

@infinity0
Copy link
Author

infinity0 commented Sep 30, 2017

Pseudo-randomness does not need to be an effect, because it's completely deterministic for each given seed. I think it's good to make that obvious in the type signature so that it's obvious where the security comes from (i.e. when you seed the generator from a non-deterministic IO entropy source).

Yes, I can achieve a similar-looking result by adding ref g as a parameter to each function but then they would be impure and less-easily testable.

For example in ocaml-tls you have a bunch of pure functions which take state -> state (or similar), I argue that the RNG state is an inherent part of this and passing it around as a reference to a mutable g is destroying some of the purity.

In terms of cryptography, there is really nothing special about RNG secret state vs TLS protocol secret state.

Given all the media uproar about "doing RNGs properly" I can understand the desire to hide away the RNG secret state underneath a private mutable reference that clients can't access, but this doesn't improve the security. As long as you make it clear where the security comes from (the "seed" IO operation) I think that is fine from a "responsible crypto library development" point of view.

@pqwy
Copy link
Contributor

pqwy commented Oct 1, 2017

This idea, that the RNG is an effect because of security concerns -- where did this come from?

I'm staring at the text, staring at the code, then staring at the ceiling.... we cannot possibly be talking about the same code here.

@infinity0
Copy link
Author

You didn't explain where it comes from here so I extrapolated from previous conversations. Do you want to explain it in more detail then?

@pqwy
Copy link
Contributor

pqwy commented Oct 5, 2017

It's pure convenience, really.

And the generator state tends to be abstract anyway. I'm not even sure what a flawed reasoning, that hiding the concrete values is improving security, would look like. It's not like you can just leak them.

@pqwy
Copy link
Contributor

pqwy commented Oct 10, 2017

For reference:

To make a well-behaved randomized function, pass an optional generator down:

let f ?g a ... =
  ... Rng.generate ?g ...
  ... h ?g ....
(* val f : ?g:Rng.g -> ... *)

To use it deterministically:

let new_fortuna seed = Rng.(create ~seed (module Generators.Fortuna))

f ~g:(new_fortuna ...) ...

To have it observe exactly the bytestream that you want:

let generates seed = Rng.(create ~seed (module Generators.Null))

f ~g:(generates "deadbeef") ...

(Or write your own generator.)

To control an ill-behaved function:

let with_gen ~g f =
  let g0 = !Rng.generator in
  Rng.generator := g;
  let r = f () in Rng.generator := g0; r

with_gen ~g:(new_fortuna ...) (fun () -> bad_f ...)

(But only for playing around; sticking that into a library is bad form.)

This is how you are meant to test the RNG clients, or generally make them referentially transparent when needed.

To answer your original point, the simplest way to get e.g. Fortuna to be stateless would be to implement it that way, which is not difficult. Its just that it would complicate things for no gain.

RNGs are special. From the information perspective, not cryptographically. The information that interaction traces impart on an RNG instance is zero, so interacting with RNG is (near-perfectly) non-informative. You never care if somebody arbitrarily advanced an instance, or reseeded it. Why worry about keeping track of its movement around the code, then?

@infinity0
Copy link
Author

Sure, I can see how the API works and wrote similar stuff to the above already, what I meant is that it would be nice (convenience) for those of us that are writing pure code, to not have to go through the above shenanigans.

Actually one thing your example misses out, is that we can't easily use an existing generator deterministically - your example creates one from scratch using the same seed. My original post was taking about ways of using existing already-created generators, deterministically - so that we can write decomposable utility functions that use an already-existing RNG and can't seed a new one (because it's not appropriate for them to "collect entropy", since they are pure utility functions).

The gain I see, is analogous to the gain of writing pure code in general - I can clearly see which parts of the code are using (pseudo-)randomness and which parts are deterministic.

I don't think your argument about RNGs being "non-informative" follows here. It's only perfectly non-informative if you don't have access to the internal secret-state, otherwise it's completely deterministic and informative - it helps me write test cases using specific values! Now, this is very similar to encryption, which should be perfectly non-informative if you don't have access to the secret key. So your argument would apply for encryption and basically all other crypto as well - i.e. "there is no need" to write pure functions implementing any sort of cryptography whatsoever.

Now it's true that when I'm writing an application, different layers shouldn't know each other's secrets, but that's why we expose only the abstract type declaration in the .mli and not the full type definition. It would still be good for application writers (who are writing pure code) to be able to get immutable instances of this opaque type and pass them around. A PRNG is not a magic third-party oracle that we get random values from, it's very much a first-party deterministic algorithm that I'd like to be in full control of when writing pure code.

I totally understand why most people would just use the mutable API, it is convenient. But I think implementing protocols using pure code is an interesting and valuable approach that you started with the ocaml-tls stuff, and it's worth exploring it further including extending it to any impure stuff remaining on the lower levels of the stack.

@infinity0
Copy link
Author

To answer your original point, the simplest way to get e.g. Fortuna to be stateless would be to implement it that way, which is not difficult. Its just that it would complicate things for no gain.

The existing implementation could be left alone, what's needed would minimally be a "clone" function. The pure API could then clone the input, perform operations on the clone, and return that. (i.e. the first option I described in my OP.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants