Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative approaches to prop_flat_map due to shrinking behavior? #181

Open
Andlon opened this issue Mar 1, 2020 · 4 comments
Open

Alternative approaches to prop_flat_map due to shrinking behavior? #181

Andlon opened this issue Mar 1, 2020 · 4 comments

Comments

@Andlon
Copy link
Contributor

Andlon commented Mar 1, 2020

I'm using proptest to generate random matrices (dense and sparse). I do so as follows:

  1. Use a strategy of (rows, cols).
  2. prop_flat_map on this strategy and generate a vector of rows * cols number of items.

Here's a concrete example:

pub fn dense_matrix_strategy<T, S>(
    rows: impl Strategy<Value = usize>,
    cols: impl Strategy<Value = usize>,
    strategy: S,
) -> impl Strategy<Value = MockDenseMatrix<T>>
where
    T: Debug,
    S: Clone + Strategy<Value = T>,
{
    (rows, cols).prop_flat_map(move |(r, c)| {
        proptest::collection::vec(strategy.clone(), r * c)
            .prop_map(move |data| MockDenseMatrix::from_row_major(r, c, data))
    })
}

This is simple and works very well. However, if I understand correctly, the shrinking procedure for flat_map prefers to try to shrink the "inner" strategy (the set of elements) before it tries to shrink the "outer" strategy (rows/cols). This is problematic, because often proptest never tries to reduce the dimension of the matrices, instead favoring trying to reduce the size of the individual elements. If I increase PROPTEST_MAX_SHRINK_ITERS, it will typically manage to reduce the size as well, but this obviously takes longer to run and I guess there are no guarantees.

I have so far been able to think of a good way around this. I suppose in principle, one could generate a large number of elements first, flat map on this and take only rows * cols items from this set, but this seems highly inefficient and you'd need to set a maximum number of elements and it would require some additional care in choosing the rows and cols properly. Does anyone have an idea for how I can approach this problem differently? Any help would be greatly appreciated!

@Andlon
Copy link
Contributor Author

Andlon commented Mar 1, 2020

Upon further reflection, it seems to me that these kind of data structures warrant custom ValueTree implementations. In particular, we probably want to try to shrink by selecting submatrices (subject to provided constraints, perhaps), which is something that we'll probably have to code ourselves rather than rely on combinators. Am I right in thinking so?

@Centril
Copy link
Collaborator

Centril commented Mar 1, 2020

Alternatively, you could try to generate from a triple (strategy, row, cols) at the same time, where strategy is within a range of min/max number of elements that would be satisfactory, and then you apportion out the number of elements you got to row * cols. When the number of elements are too many or too few as compared to row * cols, you can either: a) adjust row * cols to fit the number of elements, b) throw away the sample using filtering -- I would suggest attempting a) first as it will avoid flat mapping and filtering entirely. It's a good idea to try different approaches and see what works best for your use case.

Upon further reflection, it seems to me that these kind of data structures warrant custom ValueTree implementations. In particular, we probably want to try to shrink by selecting submatrices (subject to provided constraints, perhaps), which is something that we'll probably have to code ourselves rather than rely on combinators. Am I right in thinking so?

This might give you the best results, but it would seem likely that there is some new combinator that you could derive for this that works for matrices in general. One could then e.g. build up an extended set of combinator for more advanced data structures as a library (though proptest is fairly large already, so it's probably best to have this as separate crate).

@Andlon
Copy link
Contributor Author

Andlon commented Mar 2, 2020

@Centril: Thank you very much for your input! I suppose your first option is immediately the most pragmatic, I can maybe try that out. Though it does have some issues: If for examples rows and cols are e.g. 5 and 5, I would somehow need to make sure I have enough elements to begin with. And I'd prefer having to separately specify min/max dimensions, and rather take a strategy. Actually, it should probably rather take a (row, col) tuple strategy instead of separate strategies. I'll think about it.

This might give you the best results, but it would seem likely that there is some new combinator that you could derive for this that works for matrices in general. One could then e.g. build up an extended set of combinator for more advanced data structures as a library (though proptest is fairly large already, so it's probably best to have this as separate crate).

Do you already have an inkling on how such a combinator might look like? Basically like flat map, but one that prefers to shrink its "outer" layer first? I am not sure how the current flat map shrinker works, but how does one obtain the shrunk values for the inner layer if one shrinks the outer layer first? I suppose I'll find the answer if I dive into the code.

Thinking more about it, it does however seem like some kind of generalization of the vec shrinker might be the best fit. Although I haven't looked into the exact details of how the vec shrinker works, I would expect it to perform something like a binary search, taking various subselections of the original vec in a binary-search-like fashion. I suppose one could generalize this to a quadtree/octree/hyper-octree-like approach for multi-dimensional arrays, in which one would take multi-dimensional subarrays while moving through a tree of such possible selections. It does seem rather complicated to accomplish in the general case though, and it quickly suffers from the curse of dimensionality (i.e. combinatorial explosion in the number of dimensions). I guess at that point, writing a custom one for matrices might be a more pragmatic option...

@Centril
Copy link
Collaborator

Centril commented Mar 2, 2020

I don't have the code for the flat map shrinker in my mental cache right now so I would recommend looking at the code. :) Studying the code for vectors and flat mapping does seem sensible though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants