Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested linear data structures (list, array) cause exponential size blowup #309

Closed
eiriktsarpalis opened this issue Sep 20, 2016 · 6 comments

Comments

@eiriktsarpalis
Copy link
Contributor

eiriktsarpalis commented Sep 20, 2016

Consider the code:

open FsCheck

Check.Quick(fun (t: int list list list list list) -> true)

Running the above never completes on my machine, I had to kill the process after it ended up allocating > 3 gigs of memory. Is this reasonable behaviour to expect or have I stumbled on some kind of performance bug?

Even though the type I gave is capable of having monstrous instances, my feeling is I should still be able to generate 100 instances without utterly exploding in size. Is this a reasonable expectation?

@moodmosaic
Copy link
Contributor

Can you try setting a smaller size?


This works:

open FsCheck

let gn = Arb.from<int list list list list list>.Generator
gn |> Gen.eval 10 (Random.newSeed())

val gn : Gen<int list list list list list>
val it : int list list list list list =
  [[[[[-3; 6; -5]; [-9; 0; -7; 3; -8; 1]];
     [[8; -3; -10; 0; 10; -2]; [2; -5]; [0; 10; 3; -8; 2]; [-6];
      [-8; 2; -5; 5]; []; [5; -6; 4]; [-1; -8; 2; -9; 0; -7; 3; -8; 6; -6];
      [-3; 7]]; [[-1; -8]];
     [[1; -10; 4; 9; 2; -9; 5; 10]; []]; [[]; ...]; ...]; ...]; ...]
> 

Though, this doesn't and reproduces the reported behavior:

let gn = Arb.from<int list list list list list>.Generator
gn |> Gen.eval 1000 (Random.newSeed())

// Never returns.

@moodmosaic
Copy link
Contributor

@kurtschelfthout could that be a bug, in the sense that size is picked up incorrectly in some cases?

@kurtschelfthout
Copy link
Member

kurtschelfthout commented Sep 21, 2016

I'm not sure what to call it (somewhere between feature and bug...), but I think I know what's going on.

Size in relation to lists (and probably any linear data structre which is based on it) works like this. If you generate a list of size n, that means the length of the list can be up to n elements. So far so good. The problem occurs because this size is not "divided" among the elements, but each element is again generated with the same size bound (or maybe size-1, can't remember, in any case not significantly lower).

So you generate a list with size 100, with each element being a list of size 100, and each of those again being a list of size 100 and so on. Exponential blowup.

A solution - perhaps - would be to instead divide the size among the elements somehow. So generate a list of size 100, then since it has 100 elements each of those has size 1 (or some other size that is less than 100). You see the problem: in your case, you're only going to get empty lists at the leaves.

So it's a bit tricky to tackle generically; in this case how the size is divided among constituents parts determines what the generator does.

Another way to look at it is that you're really generating a 5-dimensional structure (like a 5D array). For that kind of structure, you probably want the number of elements to be proportional to size, whereas now it's proportional to size^5. See the generator for Array2D for example.

If you're interested, this issue is also described in https://gupea.ub.gu.se/handle/2077/22087

@eiriktsarpalis
Copy link
Contributor Author

@moodmosaic @kurtschelfthout thanks for the info, it all makes sense.

For some more context, here's what I've been trying to do:
https://github.com/eiriktsarpalis/TypeShape/blob/master/samples/testing.fsx
The tested types are auto-generated by FsCheck and passed to a nested checker.

@kurtschelfthout
Copy link
Member

kurtschelfthout commented Sep 22, 2016

That's a nice application! Thanks for mentioning it.

I'm going to keep this issue around to raise awareness and pre-empt questions. Perhaps someone gets a brilliant idea to tackle it :) For now I must admit I'm not sure how to tackle generically (ad hoc workaround is writing a 'a list list list etc generator yourself).

We could perhaps also add a list generator that allows you to specify how the size gets divided among elements.

@kurtschelfthout kurtschelfthout changed the title Performance issues when generating values for complex types Nested linear data structures (list, array) cause exponential size blowup Sep 22, 2016
@kurtschelfthout
Copy link
Member

Released in 2.10.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants