Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upEliminate a superfluous List.reverse call #522
Conversation
jvoigtlaender
referenced this pull request
Mar 14, 2016
Closed
Eliminate a superfluous List.reverse call #2
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mgold
Mar 14, 2016
Contributor
It matters if you are accumulating a different number of items. Consider this example:
import Random
import Html
seed =
Random.initialSeed 314159
list n =
let gen = Random.list n (Random.int 0 100)
in Random.generate gen seed |> fst
main =
Html.text <| toString (list 2) ++ " " ++ toString (list 5)The result is
[2,22] [2,22,89,50,42]
Notice that the shorter list is a prefix of the longer list. I think this is a nice property to have, and is the purpose of the List.reverse call.
Yes, this example reuses a seed. Sometimes reproducible randomness is desired within a program. Similar situations can arise if n is determined by user input, or varied between different compilations of otherwise the same program.
|
It matters if you are accumulating a different number of items. Consider this example: import Random
import Html
seed =
Random.initialSeed 314159
list n =
let gen = Random.list n (Random.int 0 100)
in Random.generate gen seed |> fst
main =
Html.text <| toString (list 2) ++ " " ++ toString (list 5)The result is
Notice that the shorter list is a prefix of the longer list. I think this is a nice property to have, and is the purpose of the Yes, this example reuses a seed. Sometimes reproducible randomness is desired within a program. Similar situations can arise if n is determined by user input, or varied between different compilations of otherwise the same program. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jvoigtlaender
Mar 14, 2016
Contributor
You are postulating a property of Random.list that is nowhere specified, and not tested.
Specifically, this is not about reproducability. Even after my change, the results of every expression will still be the same when started with the same random seed.
What you instead are saying is that Random.list 2 generator should be related to Random.list 5 generator in some way. But that is not "reproducability", because 2 and 5 are different.
Let me put it differently: Can you come up with a test for core that will let the current version of Random pass, but not anymore after my suggested change? Of course, you can, but will the test you come up with be defensible by anything that is currently documented or specified? If not, what is the justification for your test?
(I'm actually interested in seeing what test you would come up with.)
|
You are postulating a property of Specifically, this is not about reproducability. Even after my change, the results of every expression will still be the same when started with the same random seed. What you instead are saying is that Let me put it differently: Can you come up with a test for (I'm actually interested in seeing what test you would come up with.) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jvoigtlaender
Mar 14, 2016
Contributor
Just to add the obvious thing. You said, about using Random.list with two different lengths:
Notice that the shorter list is a prefix of the longer list.
After the proposed change (which improves efficiency), the same statement holds with "suffix" instead of "prefix".
In what sense is "prefix" preferrable to "suffix"? For example, is there a specific user interaction example (given your comment about repeated runs with different user input) where it's clear that "prefix" is the better behavior?
|
Just to add the obvious thing. You said, about using
After the proposed change (which improves efficiency), the same statement holds with "suffix" instead of "prefix". In what sense is "prefix" preferrable to "suffix"? For example, is there a specific user interaction example (given your comment about repeated runs with different user input) where it's clear that "prefix" is the better behavior? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mgold
Mar 14, 2016
Contributor
Okay, it's not about reproducibility. Let's use the word stability for "similar programs giving similar output". Usually stability is not desired for pseudo-randomness, but for lists I think it's useful (for reasons I'll get to).
As for a test, using my definition of list above, List.head (list 1) == List.head (list 2). This will pass in current core and not without reversing the list (except when list 2 generates the same value twice).
It's true that this behavior isn't "documented or specified" anywhere. But a conceptual model for Random.list is not unlike taking from Haskell's infinite lists. "I can generate any number of values I like, and I'll take the first n of them." This is why prefix is "preferable" to suffix.
A UI example: let's say I have a deck of cards and can be dealt some cards. It's not a perfect example because it samples without replacement, but go with it. If I draw three cards, I'd expect to get the first two cards I draw to be exactly the same and in the same order as if I had drawn only two cards. I'll still get the same cards, but they should be in order. (Think of a card game where there's a card I'm trying to avoid drawing, and I'm allowed to privately view the top of the deck without drawing. Now order really matters.)
From the opposite vantage point, is there any reason to remove the reverse call other than performance? Have you benchmarked what effect it will make? (I doubt it will save more time, in the entire future of Elm, than we've already spent arguing over it...)
|
Okay, it's not about reproducibility. Let's use the word stability for "similar programs giving similar output". Usually stability is not desired for pseudo-randomness, but for lists I think it's useful (for reasons I'll get to). As for a test, using my definition of It's true that this behavior isn't "documented or specified" anywhere. But a conceptual model for A UI example: let's say I have a deck of cards and can be dealt some cards. It's not a perfect example because it samples without replacement, but go with it. If I draw three cards, I'd expect to get the first two cards I draw to be exactly the same and in the same order as if I had drawn only two cards. I'll still get the same cards, but they should be in order. (Think of a card game where there's a card I'm trying to avoid drawing, and I'm allowed to privately view the top of the deck without drawing. Now order really matters.) From the opposite vantage point, is there any reason to remove the reverse call other than performance? Have you benchmarked what effect it will make? (I doubt it will save more time, in the entire future of Elm, than we've already spent arguing over it...) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mgold
Mar 14, 2016
Contributor
Another UI example: procedurally generated content. Let's say I've defined a very complex generator for some kind of image (I'm thinking of these). My program displays many of these images, using Random.list. Now let's say that, either by changing the code and recompiling, or by adjusting a slider, I affect some parameter of the generator. I'd expect the images generated to look something like the old ones, so I can see the effect of the change. If done interactively, I might get start with a row of images, and then get a new row when I change the parameter, so I can read down a column of similar images.
So far, having the list reversed wouldn't make a difference since it's random anyway. But let's say that I could also vary the number of images generated. Then I would expect that you'd just add or lose images from the end, and that the prefixes would be the same.
I wouldn't expect the first value generated to be affected by the number of values I ask for.
|
Another UI example: procedurally generated content. Let's say I've defined a very complex generator for some kind of image (I'm thinking of these). My program displays many of these images, using So far, having the list reversed wouldn't make a difference since it's random anyway. But let's say that I could also vary the number of images generated. Then I would expect that you'd just add or lose images from the end, and that the prefixes would be the same. I wouldn't expect the first value generated to be affected by the number of values I ask for. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mgold
Mar 14, 2016
Contributor
Finally, I know that "Haskell does it this way" is a pretty flimsy argument, but.... Haskell does it this way.
$ cabal install random
<SNIP>
$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
Prelude> import System.Random
Prelude System.Random> let seed = mkStdGen 31415
Prelude System.Random> take 5 $ randoms seed
[3138221575734723342,-7346230400103730202,5835898653713406151,-4136156571590303768,-7113147695219624588]
Prelude System.Random> take 2 $ randoms seed
[3138221575734723342,-7346230400103730202]
|
Finally, I know that "Haskell does it this way" is a pretty flimsy argument, but.... Haskell does it this way.
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bbqbaron
Mar 14, 2016
I understand the intent of the PR, but if “randomness” from a given seed is deterministic, it does kind of feel as if you’d expect the first X random list number generated from that seed to be the same. If you think of random lists as slices of an infinite lazy list, I can see why prefix ought to be privileged over suffix.
bbqbaron
commented
Mar 14, 2016
|
I understand the intent of the PR, but if “randomness” from a given seed is deterministic, it does kind of feel as if you’d expect the first X random list number generated from that seed to be the same. If you think of random lists as slices of an infinite lazy list, I can see why prefix ought to be privileged over suffix. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jvoigtlaender
Mar 14, 2016
Contributor
About performance: I haven't benchmarked, and I won't. It's an obvious strict performance improvement (you are mentioning time, but there are also space/allocation costs). So in case there are no semantic reasons to prefer one version over the other, there is no justification for not making the change.
So let's talk about stability.
If the property is not documented, I'd rather not have it. Invariants that are only true "accidentally" can be dangerous. If not documented, it might be changed any time. By then, people might have developed an expectation (even if the property is not documented, just observed), and built assumptions into their code or development habits that then break, maybe with bad discoverability of what actually has changed and breaks their stuff.
Even if the implementation is never changed, the property is very brittle due to its non-compositionality. Say I have a generator for some custom tree type of mine. Why, on earth, is the produced list stable in Random.map2 (,) (Random.list n (Random.int 0 100)) treeGenerator, but not in Random.map2 (,) treeGenerator (Random.list n (Random.int 0 100))? Big puzzlement on the developer's part. Or, expanding on your minimal example above, let's look at
triple n =
let list = Random.list n (Random.int 0 100)
gen = Random.map3 (,,) list Random.bool list
in Random.generate gen seed |> fst
main =
Html.text <| toString (triple 2) ++ " " ++ toString (triple 5)with results ([2,22],True,[50,42]) vs. ([2,22,89,50,42],False,[47,79,82,79,6]). How disppointing that one list is stable, but the other list isn't. I'd say that from a development/methodology standpoint, an "invariant" that holds only sometimes is worse than one that holds never (because people are tempted into making assumptions etc.). It's a bit like the "fail early, fail often" principle?
For the same reason, the card or image examples are not really convincing me. Such stuff would happen inside a bigger program. Only under very specific circumstances would you actually see the kind of stability you describe. As soon as you happen to do some non-trivial (dynamic size) random value generation "before" the call to the sequence generator, the property is lost. Which might easily be introduced during an otherwise completely innocent refactoring step. (Think of the Random.map2 (,) listGenerator treeGenerator vs. Random.map2 (,) treeGenerator listGenerator example from above.)
I guess my overall stance is: Even disregarding performance, I find the property more potentially hurtful (in context and in the long run) than helpful, and certainly so if it is not elevated to an "official" and documented and tested property.
|
About performance: I haven't benchmarked, and I won't. It's an obvious strict performance improvement (you are mentioning time, but there are also space/allocation costs). So in case there are no semantic reasons to prefer one version over the other, there is no justification for not making the change. So let's talk about stability. If the property is not documented, I'd rather not have it. Invariants that are only true "accidentally" can be dangerous. If not documented, it might be changed any time. By then, people might have developed an expectation (even if the property is not documented, just observed), and built assumptions into their code or development habits that then break, maybe with bad discoverability of what actually has changed and breaks their stuff. Even if the implementation is never changed, the property is very brittle due to its non-compositionality. Say I have a generator for some custom tree type of mine. Why, on earth, is the produced list stable in triple n =
let list = Random.list n (Random.int 0 100)
gen = Random.map3 (,,) list Random.bool list
in Random.generate gen seed |> fst
main =
Html.text <| toString (triple 2) ++ " " ++ toString (triple 5)with results For the same reason, the card or image examples are not really convincing me. Such stuff would happen inside a bigger program. Only under very specific circumstances would you actually see the kind of stability you describe. As soon as you happen to do some non-trivial (dynamic size) random value generation "before" the call to the sequence generator, the property is lost. Which might easily be introduced during an otherwise completely innocent refactoring step. (Think of the I guess my overall stance is: Even disregarding performance, I find the property more potentially hurtful (in context and in the long run) than helpful, and certainly so if it is not elevated to an "official" and documented and tested property. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jvoigtlaender
Mar 14, 2016
Contributor
Oh, BTW, the comparison to take n (randoms seed) in Haskell is apples to oranges, since there is no composable generator type in use there. An example using QuickCheck's Gen monad would be more convincing.
|
Oh, BTW, the comparison to |
jvoigtlaender commentedMar 14, 2016
Note that we are accumulating random values there. So their order does not matter!