New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nonrandom distribution of Random.generate #673

Closed
rtfeldman opened this Issue Jul 24, 2016 · 7 comments

Comments

Projects
None yet
5 participants
@rtfeldman
Member

rtfeldman commented Jul 24, 2016

To demonstrate the problem, put this into Try Elm.

It generates a random integer between 0 and 2 using Random.generate on init, and reports it, along with the current time at startup (just so you can confirm that it actually did recompile). The time changes, but the "random" number is extremely consistent.

import Html exposing (..)
import Html.App
import Random
import Time exposing (Time)
import Task


main =
  Html.App.program
    { init = init
    , view = view
    , update = update
    , subscriptions = \_ -> Sub.none
    }



-- MODEL


type alias Model =
  { randomNumber : Maybe Int
  , initialTime : Maybe Time
  }


initialModel : Model
initialModel =
  { randomNumber = Nothing
  , initialTime = Nothing
  }


init : (Model, Cmd Msg)
init =
  ( initialModel
  , Cmd.batch
      [ Random.generate SetRandomNumber (Random.int 0 2) 
      , Task.perform never SetInitialTime Time.now
      ]
  )


never : Never -> a
never a =
  never a


-- UPDATE


type Msg
  = SetRandomNumber Int
  | SetInitialTime Time


update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
  case msg of
    SetRandomNumber num ->
      ( { model | randomNumber = Just num }, Cmd.none)

    SetInitialTime newTime ->
      ( { model | initialTime = Just newTime }, Cmd.none)



-- VIEW


view : Model -> Html Msg
view { randomNumber, initialTime } =
  case ( randomNumber, initialTime ) of 
    ( Just num, Just time ) ->
      ul []
        [ li [] [ text (toString num ++ " <-- Initial random number between 0 and 2") ]
        , li [] [ text (toString time ++ " <-- Initial time") ]
        ]

    _ ->
      text ""

Its output looks like this:

screen shot 2016-07-24 at 11 27 25 am

If you keep pressing Compile, you'll see the initial time change, but the "initial random number between 0 and 2" stays extremely consistent. I'm seeing "runs" of like 20 times in a row where it outputs (let's say) 0, then the time will change sufficiently, and then I'll get another ~20 times in a row where it outputs 2.

@process-bot

This comment has been minimized.

Show comment
Hide comment
@process-bot

process-bot Jul 24, 2016

Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

process-bot commented Jul 24, 2016

Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Jul 24, 2016

Contributor

Just the same as https://github.com/elm-lang/core/issues/635, isn't it?

And the cause of this is known. It has been brought up several times, but not acted upon. I know that @mgold could tell a long story about it. His most recent mention of it, in an attempt to get the issue fixed, is the last item in https://github.com/elm-lang/core/pull/643#issue-159564404, I think.

Contributor

jvoigtlaender commented Jul 24, 2016

Just the same as https://github.com/elm-lang/core/issues/635, isn't it?

And the cause of this is known. It has been brought up several times, but not acted upon. I know that @mgold could tell a long story about it. His most recent mention of it, in an attempt to get the issue fixed, is the last item in https://github.com/elm-lang/core/pull/643#issue-159564404, I think.

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Jul 24, 2016

Contributor

Yes, this is part of how spectacularly bad the current RNG is. If you were to chose a 32-bit integer chosen uniformly at random, my impression is that the RNG would be reasonably random. However, the current time is far from such a number. It seems there are large segments (millions?) of sequential numbers that all produce the same "random" number.

When Richard, Evan, and I spoke at the last meetup, I was caught up in statistical measurements and totally forgot that user-observable cases like this one also occur. I should have emphasized that more. But, I also remember him pointing out that if you chain seeds like you're supposed to instead of calling initialSeed, things work out mostly okay.

PCG does not (to my knowledge) have this deficiency, and it would be acceptable to initialize it to the current time (as long as you do that once and not n times). A task that called Math.floor(Math.random() * 0xFFFFFFFF) would be even better, but not essential.

Janis, I've been working on implementing a variant of PCG that has 32 bits of state but still produces 32 bits of output. The result is nearly as fast as core, and will be faster once Evan implements a fairly simple optimization. The statistical performance is only slightly diminished. Long term we still want to merge that to core, but short-term it's going to be third-party.

Contributor

mgold commented Jul 24, 2016

Yes, this is part of how spectacularly bad the current RNG is. If you were to chose a 32-bit integer chosen uniformly at random, my impression is that the RNG would be reasonably random. However, the current time is far from such a number. It seems there are large segments (millions?) of sequential numbers that all produce the same "random" number.

When Richard, Evan, and I spoke at the last meetup, I was caught up in statistical measurements and totally forgot that user-observable cases like this one also occur. I should have emphasized that more. But, I also remember him pointing out that if you chain seeds like you're supposed to instead of calling initialSeed, things work out mostly okay.

PCG does not (to my knowledge) have this deficiency, and it would be acceptable to initialize it to the current time (as long as you do that once and not n times). A task that called Math.floor(Math.random() * 0xFFFFFFFF) would be even better, but not essential.

Janis, I've been working on implementing a variant of PCG that has 32 bits of state but still produces 32 bits of output. The result is nearly as fast as core, and will be faster once Evan implements a fairly simple optimization. The statistical performance is only slightly diminished. Long term we still want to merge that to core, but short-term it's going to be third-party.

@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Jul 24, 2016

Member

So can we close this? What are we actually tracking here?

Member

evancz commented Jul 24, 2016

So can we close this? What are we actually tracking here?

@rtfeldman

This comment has been minimized.

Show comment
Hide comment
@rtfeldman
Member

rtfeldman commented Jul 25, 2016

@rtfeldman rtfeldman closed this Jul 25, 2016

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Jul 25, 2016

Contributor

@mgold, about this:

PCG does not (to my knowledge) have this deficiency, and it would be acceptable to initialize it to the current time (as long as you do that once and not n times). A task that called Math.floor(Math.random() * 0xFFFFFFFF) would be even better, but not essential.

There is a post on "the PCG page" that seems to imply that using the current time to seed PCG would also be problematic.

http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html

Contributor

jvoigtlaender commented Jul 25, 2016

@mgold, about this:

PCG does not (to my knowledge) have this deficiency, and it would be acceptable to initialize it to the current time (as long as you do that once and not n times). A task that called Math.floor(Math.random() * 0xFFFFFFFF) would be even better, but not essential.

There is a post on "the PCG page" that seems to imply that using the current time to seed PCG would also be problematic.

http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Jul 26, 2016

Contributor

Good to know, but I think it's not pathological the way we're seeing with core.

Evan has said that he'd rather wait and fix things properly than put on a bandaid, so I'll mention only briefly that one could, in initialSeed, discard the first seed and return the next one instead.

Contributor

mgold commented Jul 26, 2016

Good to know, but I think it's not pathological the way we're seeing with core.

Evan has said that he'd rather wait and fix things properly than put on a bandaid, so I'll mention only briefly that one could, in initialSeed, discard the first seed and return the next one instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment