New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate a superfluous List.reverse call #522

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@jvoigtlaender
Contributor

jvoigtlaender commented Mar 14, 2016

Note that we are accumulating random values there. So their order does not matter!

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Mar 14, 2016

Contributor

It matters if you are accumulating a different number of items. Consider this example:

import Random
import Html

seed =
  Random.initialSeed 314159

list n =
  let gen = Random.list n (Random.int 0 100)
  in Random.generate gen seed |> fst

main =
  Html.text <| toString (list 2) ++ " " ++ toString (list 5)

The result is

[2,22] [2,22,89,50,42]

Notice that the shorter list is a prefix of the longer list. I think this is a nice property to have, and is the purpose of the List.reverse call.

Yes, this example reuses a seed. Sometimes reproducible randomness is desired within a program. Similar situations can arise if n is determined by user input, or varied between different compilations of otherwise the same program.

Contributor

mgold commented Mar 14, 2016

It matters if you are accumulating a different number of items. Consider this example:

import Random
import Html

seed =
  Random.initialSeed 314159

list n =
  let gen = Random.list n (Random.int 0 100)
  in Random.generate gen seed |> fst

main =
  Html.text <| toString (list 2) ++ " " ++ toString (list 5)

The result is

[2,22] [2,22,89,50,42]

Notice that the shorter list is a prefix of the longer list. I think this is a nice property to have, and is the purpose of the List.reverse call.

Yes, this example reuses a seed. Sometimes reproducible randomness is desired within a program. Similar situations can arise if n is determined by user input, or varied between different compilations of otherwise the same program.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Mar 14, 2016

Contributor

You are postulating a property of Random.list that is nowhere specified, and not tested.

Specifically, this is not about reproducability. Even after my change, the results of every expression will still be the same when started with the same random seed.

What you instead are saying is that Random.list 2 generator should be related to Random.list 5 generator in some way. But that is not "reproducability", because 2 and 5 are different.

Let me put it differently: Can you come up with a test for core that will let the current version of Random pass, but not anymore after my suggested change? Of course, you can, but will the test you come up with be defensible by anything that is currently documented or specified? If not, what is the justification for your test?

(I'm actually interested in seeing what test you would come up with.)

Contributor

jvoigtlaender commented Mar 14, 2016

You are postulating a property of Random.list that is nowhere specified, and not tested.

Specifically, this is not about reproducability. Even after my change, the results of every expression will still be the same when started with the same random seed.

What you instead are saying is that Random.list 2 generator should be related to Random.list 5 generator in some way. But that is not "reproducability", because 2 and 5 are different.

Let me put it differently: Can you come up with a test for core that will let the current version of Random pass, but not anymore after my suggested change? Of course, you can, but will the test you come up with be defensible by anything that is currently documented or specified? If not, what is the justification for your test?

(I'm actually interested in seeing what test you would come up with.)

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Mar 14, 2016

Contributor

Just to add the obvious thing. You said, about using Random.list with two different lengths:

Notice that the shorter list is a prefix of the longer list.

After the proposed change (which improves efficiency), the same statement holds with "suffix" instead of "prefix".

In what sense is "prefix" preferrable to "suffix"? For example, is there a specific user interaction example (given your comment about repeated runs with different user input) where it's clear that "prefix" is the better behavior?

Contributor

jvoigtlaender commented Mar 14, 2016

Just to add the obvious thing. You said, about using Random.list with two different lengths:

Notice that the shorter list is a prefix of the longer list.

After the proposed change (which improves efficiency), the same statement holds with "suffix" instead of "prefix".

In what sense is "prefix" preferrable to "suffix"? For example, is there a specific user interaction example (given your comment about repeated runs with different user input) where it's clear that "prefix" is the better behavior?

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Mar 14, 2016

Contributor

Okay, it's not about reproducibility. Let's use the word stability for "similar programs giving similar output". Usually stability is not desired for pseudo-randomness, but for lists I think it's useful (for reasons I'll get to).

As for a test, using my definition of list above, List.head (list 1) == List.head (list 2). This will pass in current core and not without reversing the list (except when list 2 generates the same value twice).

It's true that this behavior isn't "documented or specified" anywhere. But a conceptual model for Random.list is not unlike taking from Haskell's infinite lists. "I can generate any number of values I like, and I'll take the first n of them." This is why prefix is "preferable" to suffix.

A UI example: let's say I have a deck of cards and can be dealt some cards. It's not a perfect example because it samples without replacement, but go with it. If I draw three cards, I'd expect to get the first two cards I draw to be exactly the same and in the same order as if I had drawn only two cards. I'll still get the same cards, but they should be in order. (Think of a card game where there's a card I'm trying to avoid drawing, and I'm allowed to privately view the top of the deck without drawing. Now order really matters.)

From the opposite vantage point, is there any reason to remove the reverse call other than performance? Have you benchmarked what effect it will make? (I doubt it will save more time, in the entire future of Elm, than we've already spent arguing over it...)

Contributor

mgold commented Mar 14, 2016

Okay, it's not about reproducibility. Let's use the word stability for "similar programs giving similar output". Usually stability is not desired for pseudo-randomness, but for lists I think it's useful (for reasons I'll get to).

As for a test, using my definition of list above, List.head (list 1) == List.head (list 2). This will pass in current core and not without reversing the list (except when list 2 generates the same value twice).

It's true that this behavior isn't "documented or specified" anywhere. But a conceptual model for Random.list is not unlike taking from Haskell's infinite lists. "I can generate any number of values I like, and I'll take the first n of them." This is why prefix is "preferable" to suffix.

A UI example: let's say I have a deck of cards and can be dealt some cards. It's not a perfect example because it samples without replacement, but go with it. If I draw three cards, I'd expect to get the first two cards I draw to be exactly the same and in the same order as if I had drawn only two cards. I'll still get the same cards, but they should be in order. (Think of a card game where there's a card I'm trying to avoid drawing, and I'm allowed to privately view the top of the deck without drawing. Now order really matters.)

From the opposite vantage point, is there any reason to remove the reverse call other than performance? Have you benchmarked what effect it will make? (I doubt it will save more time, in the entire future of Elm, than we've already spent arguing over it...)

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Mar 14, 2016

Contributor

Another UI example: procedurally generated content. Let's say I've defined a very complex generator for some kind of image (I'm thinking of these). My program displays many of these images, using Random.list. Now let's say that, either by changing the code and recompiling, or by adjusting a slider, I affect some parameter of the generator. I'd expect the images generated to look something like the old ones, so I can see the effect of the change. If done interactively, I might get start with a row of images, and then get a new row when I change the parameter, so I can read down a column of similar images.

So far, having the list reversed wouldn't make a difference since it's random anyway. But let's say that I could also vary the number of images generated. Then I would expect that you'd just add or lose images from the end, and that the prefixes would be the same.

I wouldn't expect the first value generated to be affected by the number of values I ask for.

Contributor

mgold commented Mar 14, 2016

Another UI example: procedurally generated content. Let's say I've defined a very complex generator for some kind of image (I'm thinking of these). My program displays many of these images, using Random.list. Now let's say that, either by changing the code and recompiling, or by adjusting a slider, I affect some parameter of the generator. I'd expect the images generated to look something like the old ones, so I can see the effect of the change. If done interactively, I might get start with a row of images, and then get a new row when I change the parameter, so I can read down a column of similar images.

So far, having the list reversed wouldn't make a difference since it's random anyway. But let's say that I could also vary the number of images generated. Then I would expect that you'd just add or lose images from the end, and that the prefixes would be the same.

I wouldn't expect the first value generated to be affected by the number of values I ask for.

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Mar 14, 2016

Contributor

Finally, I know that "Haskell does it this way" is a pretty flimsy argument, but.... Haskell does it this way.

$ cabal install random
<SNIP>
$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> import System.Random
Prelude System.Random> let seed = mkStdGen 31415
Prelude System.Random> take 5 $ randoms seed
[3138221575734723342,-7346230400103730202,5835898653713406151,-4136156571590303768,-7113147695219624588]
Prelude System.Random> take 2 $ randoms seed
[3138221575734723342,-7346230400103730202]
Contributor

mgold commented Mar 14, 2016

Finally, I know that "Haskell does it this way" is a pretty flimsy argument, but.... Haskell does it this way.

$ cabal install random
<SNIP>
$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> import System.Random
Prelude System.Random> let seed = mkStdGen 31415
Prelude System.Random> take 5 $ randoms seed
[3138221575734723342,-7346230400103730202,5835898653713406151,-4136156571590303768,-7113147695219624588]
Prelude System.Random> take 2 $ randoms seed
[3138221575734723342,-7346230400103730202]
@bbqbaron

This comment has been minimized.

Show comment
Hide comment
@bbqbaron

bbqbaron Mar 14, 2016

I understand the intent of the PR, but if “randomness” from a given seed is deterministic, it does kind of feel as if you’d expect the first X random list number generated from that seed to be the same. If you think of random lists as slices of an infinite lazy list, I can see why prefix ought to be privileged over suffix.

bbqbaron commented Mar 14, 2016

I understand the intent of the PR, but if “randomness” from a given seed is deterministic, it does kind of feel as if you’d expect the first X random list number generated from that seed to be the same. If you think of random lists as slices of an infinite lazy list, I can see why prefix ought to be privileged over suffix.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Mar 14, 2016

Contributor

About performance: I haven't benchmarked, and I won't. It's an obvious strict performance improvement (you are mentioning time, but there are also space/allocation costs). So in case there are no semantic reasons to prefer one version over the other, there is no justification for not making the change.

So let's talk about stability.

If the property is not documented, I'd rather not have it. Invariants that are only true "accidentally" can be dangerous. If not documented, it might be changed any time. By then, people might have developed an expectation (even if the property is not documented, just observed), and built assumptions into their code or development habits that then break, maybe with bad discoverability of what actually has changed and breaks their stuff.

Even if the implementation is never changed, the property is very brittle due to its non-compositionality. Say I have a generator for some custom tree type of mine. Why, on earth, is the produced list stable in Random.map2 (,) (Random.list n (Random.int 0 100)) treeGenerator, but not in Random.map2 (,) treeGenerator (Random.list n (Random.int 0 100))? Big puzzlement on the developer's part. Or, expanding on your minimal example above, let's look at

triple n =
  let list = Random.list n (Random.int 0 100)
      gen = Random.map3 (,,) list Random.bool list
  in Random.generate gen seed |> fst

main =
  Html.text <| toString (triple 2) ++ " " ++ toString (triple 5)

with results ([2,22],True,[50,42]) vs. ([2,22,89,50,42],False,[47,79,82,79,6]). How disppointing that one list is stable, but the other list isn't. I'd say that from a development/methodology standpoint, an "invariant" that holds only sometimes is worse than one that holds never (because people are tempted into making assumptions etc.). It's a bit like the "fail early, fail often" principle?

For the same reason, the card or image examples are not really convincing me. Such stuff would happen inside a bigger program. Only under very specific circumstances would you actually see the kind of stability you describe. As soon as you happen to do some non-trivial (dynamic size) random value generation "before" the call to the sequence generator, the property is lost. Which might easily be introduced during an otherwise completely innocent refactoring step. (Think of the Random.map2 (,) listGenerator treeGenerator vs. Random.map2 (,) treeGenerator listGenerator example from above.)

I guess my overall stance is: Even disregarding performance, I find the property more potentially hurtful (in context and in the long run) than helpful, and certainly so if it is not elevated to an "official" and documented and tested property.

Contributor

jvoigtlaender commented Mar 14, 2016

About performance: I haven't benchmarked, and I won't. It's an obvious strict performance improvement (you are mentioning time, but there are also space/allocation costs). So in case there are no semantic reasons to prefer one version over the other, there is no justification for not making the change.

So let's talk about stability.

If the property is not documented, I'd rather not have it. Invariants that are only true "accidentally" can be dangerous. If not documented, it might be changed any time. By then, people might have developed an expectation (even if the property is not documented, just observed), and built assumptions into their code or development habits that then break, maybe with bad discoverability of what actually has changed and breaks their stuff.

Even if the implementation is never changed, the property is very brittle due to its non-compositionality. Say I have a generator for some custom tree type of mine. Why, on earth, is the produced list stable in Random.map2 (,) (Random.list n (Random.int 0 100)) treeGenerator, but not in Random.map2 (,) treeGenerator (Random.list n (Random.int 0 100))? Big puzzlement on the developer's part. Or, expanding on your minimal example above, let's look at

triple n =
  let list = Random.list n (Random.int 0 100)
      gen = Random.map3 (,,) list Random.bool list
  in Random.generate gen seed |> fst

main =
  Html.text <| toString (triple 2) ++ " " ++ toString (triple 5)

with results ([2,22],True,[50,42]) vs. ([2,22,89,50,42],False,[47,79,82,79,6]). How disppointing that one list is stable, but the other list isn't. I'd say that from a development/methodology standpoint, an "invariant" that holds only sometimes is worse than one that holds never (because people are tempted into making assumptions etc.). It's a bit like the "fail early, fail often" principle?

For the same reason, the card or image examples are not really convincing me. Such stuff would happen inside a bigger program. Only under very specific circumstances would you actually see the kind of stability you describe. As soon as you happen to do some non-trivial (dynamic size) random value generation "before" the call to the sequence generator, the property is lost. Which might easily be introduced during an otherwise completely innocent refactoring step. (Think of the Random.map2 (,) listGenerator treeGenerator vs. Random.map2 (,) treeGenerator listGenerator example from above.)

I guess my overall stance is: Even disregarding performance, I find the property more potentially hurtful (in context and in the long run) than helpful, and certainly so if it is not elevated to an "official" and documented and tested property.

@jvoigtlaender

This comment has been minimized.

Show comment
Hide comment
@jvoigtlaender

jvoigtlaender Mar 14, 2016

Contributor

Oh, BTW, the comparison to take n (randoms seed) in Haskell is apples to oranges, since there is no composable generator type in use there. An example using QuickCheck's Gen monad would be more convincing.

Contributor

jvoigtlaender commented Mar 14, 2016

Oh, BTW, the comparison to take n (randoms seed) in Haskell is apples to oranges, since there is no composable generator type in use there. An example using QuickCheck's Gen monad would be more convincing.

@evancz evancz closed this Apr 28, 2016

@evancz evancz deleted the jvoigtlaender-patch-1 branch Apr 28, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment