Alternative approaches to `prop_flat_map` due to shrinking behavior? #181

Andlon · 2020-03-01T09:46:26Z

I'm using proptest to generate random matrices (dense and sparse). I do so as follows:

Use a strategy of (rows, cols).
prop_flat_map on this strategy and generate a vector of rows * cols number of items.

Here's a concrete example:

pub fn dense_matrix_strategy<T, S>(
    rows: impl Strategy<Value = usize>,
    cols: impl Strategy<Value = usize>,
    strategy: S,
) -> impl Strategy<Value = MockDenseMatrix<T>>
where
    T: Debug,
    S: Clone + Strategy<Value = T>,
{
    (rows, cols).prop_flat_map(move |(r, c)| {
        proptest::collection::vec(strategy.clone(), r * c)
            .prop_map(move |data| MockDenseMatrix::from_row_major(r, c, data))
    })
}

This is simple and works very well. However, if I understand correctly, the shrinking procedure for flat_map prefers to try to shrink the "inner" strategy (the set of elements) before it tries to shrink the "outer" strategy (rows/cols). This is problematic, because often proptest never tries to reduce the dimension of the matrices, instead favoring trying to reduce the size of the individual elements. If I increase PROPTEST_MAX_SHRINK_ITERS, it will typically manage to reduce the size as well, but this obviously takes longer to run and I guess there are no guarantees.

I have so far been able to think of a good way around this. I suppose in principle, one could generate a large number of elements first, flat map on this and take only rows * cols items from this set, but this seems highly inefficient and you'd need to set a maximum number of elements and it would require some additional care in choosing the rows and cols properly. Does anyone have an idea for how I can approach this problem differently? Any help would be greatly appreciated!

The text was updated successfully, but these errors were encountered:

Andlon · 2020-03-01T11:20:34Z

Upon further reflection, it seems to me that these kind of data structures warrant custom ValueTree implementations. In particular, we probably want to try to shrink by selecting submatrices (subject to provided constraints, perhaps), which is something that we'll probably have to code ourselves rather than rely on combinators. Am I right in thinking so?

Centril · 2020-03-01T23:38:32Z

Alternatively, you could try to generate from a triple (strategy, row, cols) at the same time, where strategy is within a range of min/max number of elements that would be satisfactory, and then you apportion out the number of elements you got to row * cols. When the number of elements are too many or too few as compared to row * cols, you can either: a) adjust row * cols to fit the number of elements, b) throw away the sample using filtering -- I would suggest attempting a) first as it will avoid flat mapping and filtering entirely. It's a good idea to try different approaches and see what works best for your use case.

Upon further reflection, it seems to me that these kind of data structures warrant custom ValueTree implementations. In particular, we probably want to try to shrink by selecting submatrices (subject to provided constraints, perhaps), which is something that we'll probably have to code ourselves rather than rely on combinators. Am I right in thinking so?

This might give you the best results, but it would seem likely that there is some new combinator that you could derive for this that works for matrices in general. One could then e.g. build up an extended set of combinator for more advanced data structures as a library (though proptest is fairly large already, so it's probably best to have this as separate crate).

Andlon · 2020-03-02T07:54:10Z

@Centril: Thank you very much for your input! I suppose your first option is immediately the most pragmatic, I can maybe try that out. Though it does have some issues: If for examples rows and cols are e.g. 5 and 5, I would somehow need to make sure I have enough elements to begin with. And I'd prefer having to separately specify min/max dimensions, and rather take a strategy. Actually, it should probably rather take a (row, col) tuple strategy instead of separate strategies. I'll think about it.

This might give you the best results, but it would seem likely that there is some new combinator that you could derive for this that works for matrices in general. One could then e.g. build up an extended set of combinator for more advanced data structures as a library (though proptest is fairly large already, so it's probably best to have this as separate crate).

Do you already have an inkling on how such a combinator might look like? Basically like flat map, but one that prefers to shrink its "outer" layer first? I am not sure how the current flat map shrinker works, but how does one obtain the shrunk values for the inner layer if one shrinks the outer layer first? I suppose I'll find the answer if I dive into the code.

Thinking more about it, it does however seem like some kind of generalization of the vec shrinker might be the best fit. Although I haven't looked into the exact details of how the vec shrinker works, I would expect it to perform something like a binary search, taking various subselections of the original vec in a binary-search-like fashion. I suppose one could generalize this to a quadtree/octree/hyper-octree-like approach for multi-dimensional arrays, in which one would take multi-dimensional subarrays while moving through a tree of such possible selections. It does seem rather complicated to accomplish in the general case though, and it quickly suffers from the curse of dimensionality (i.e. combinatorial explosion in the number of dimensions). I guess at that point, writing a custom one for matrices might be a more pragmatic option...

Centril · 2020-03-02T10:45:35Z

I don't have the code for the flat map shrinker in my mental cache right now so I would recommend looking at the code. :) Studying the code for vectors and flat mapping does seem sensible though.

nzeh mentioned this issue Jul 28, 2022

Make tests pass and improve shrinking for prop_flat_map #269

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative approaches to `prop_flat_map` due to shrinking behavior? #181

Alternative approaches to `prop_flat_map` due to shrinking behavior? #181

Andlon commented Mar 1, 2020

Andlon commented Mar 1, 2020

Centril commented Mar 1, 2020 •

edited

Loading

Andlon commented Mar 2, 2020 •

edited

Loading

Centril commented Mar 2, 2020

Alternative approaches to prop_flat_map due to shrinking behavior? #181

Alternative approaches to prop_flat_map due to shrinking behavior? #181

Comments

Andlon commented Mar 1, 2020

Andlon commented Mar 1, 2020

Centril commented Mar 1, 2020 • edited Loading

Andlon commented Mar 2, 2020 • edited Loading

Centril commented Mar 2, 2020

Alternative approaches to `prop_flat_map` due to shrinking behavior? #181

Alternative approaches to `prop_flat_map` due to shrinking behavior? #181

Centril commented Mar 1, 2020 •

edited

Loading

Andlon commented Mar 2, 2020 •

edited

Loading