Conversation

@anntzer (Contributor) commented Feb 7, 2020

E.g. check_figures_equal tests should typically work regardless of the random dataset used. If anything, this may slightly help catch brittle tests that only work for certain seed values.

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@tacaswell (Member)

I'm a little worried about not seeding the random numbers, as it could expose some value-dependent bugs that we have no way of reproducing.

@anntzer (Contributor, Author) commented Feb 7, 2020

But then at least we can pin the numbers back while investigating.

@pharshalp (Contributor)

> I'm a little worried about not seeding the random numbers as it could expose some value dependent bugs, but we have no way of reproducing.

Would a parametrized test work better, to cover both a random case and a reproducible case?

@pytest.mark.parametrize('np_rand_seed', [1234, None])
def test_long_path(np_rand_seed):
    np.random.seed(np_rand_seed)
    ...  # rest of the existing test body, unchanged

@anntzer (Contributor, Author) commented Feb 7, 2020

That feels like overengineering it, imho.

@tacaswell (Member)

@anntzer Fair, and knowing it has a possible value-dependent bug is better than not knowing it has one.

@pharshalp That is starting to go down the path of rewriting Hypothesis (https://hypothesis.readthedocs.io/en/latest/), and while useful, that is not a route I think we want to go down in our test suite.

y = np.random.random(npoints)
x[duplicate] = x[duplicate_of]
y[duplicate] = y[duplicate_of]
xy = np.random.random((npoints, 2))
A Member commented on this diff:

https://dev.azure.com/matplotlib/matplotlib/_build/results?buildId=8800&view=logs&j=a00bdc14-80a3-51a4-3934-86a314523526&t=1906b723-82f5-57cd-ebf1-a6ef0c620a3f&l=138

It looks like sometimes it picks 7 to drop and sometimes it picks 3 to drop. Should confirm with @ianthomas23 that this is OK behavior.

The PR author (@anntzer) replied:

That's exactly the kind of thing that I was hoping to catch...

A Member replied:

Short answer:
The seed call should stay.

Long answer:
If we are using truly random (i.e. non-seeded) points, the qhull-generated Delaunay triangulation will use just one of the two duplicated points, and we don't know in advance which one it will be. So there are two options:

  1. Keep the seed call.

  2. Remove the seed call and change the assert at the end of the test because the indices of the points used in the triangulation that are returned by qhull, np.unique(triang.triangles), will either be np.delete(np.arange(npoints), duplicate) (the existing code) or np.delete(np.arange(npoints), duplicate_of).
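For illustration only (with made-up index values, not the test's real data), option 2's order-independent assertion could be sketched as:

```python
import numpy as np

# Hypothetical values for illustration; the real test derives these
# from its random point set.
npoints, duplicate, duplicate_of = 10, 7, 3

# qhull keeps exactly one of the two duplicated points, but which one is
# not known in advance; pretend here that it dropped `duplicate`:
used = np.delete(np.arange(npoints), duplicate)

# The relaxed assertion accepts either outcome:
expected_a = np.delete(np.arange(npoints), duplicate)
expected_b = np.delete(np.arange(npoints), duplicate_of)
assert np.array_equal(used, expected_a) or np.array_equal(used, expected_b)
```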

The first option is better for 3 reasons:

  • It is simpler (less code).
  • It is what the code currently does and existing working code shouldn't be changed without good reason.
  • It is far better defensive programming to keep all tests deterministic. There is an exceedingly small but finite chance that a truly random (non-seeded) point will be close enough to the real duplicate points that the triangulation algorithm identifies this as the same point as the duplicate points and the test will fail with a different point index. The failure will happen at some point in the future on an entirely unrelated PR and will be totally unreproducible.

Hence keep the seed call. Then there is no reason to change any other code in the same function in a PR about seed calls.

There is a wider issue here that all tests should be deterministic and hence no seed calls should be removed without the explicit permission of each person who wrote the initial code in each case. Otherwise we are risking a number of extremely unlikely possible failures that are avoided with the existing code. The use of random numbers here is just shorthand for a small number of hard-coded points, we don't actually want to run the test with truly random numbers. If there is fundamental opposition to the use of seed in tests then the random values can be replaced with hard-coded ones. That is easily done for this test, but what about in test_agg that needs 70000 values?!
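The "seed as shorthand for hard-coded values" point is easy to demonstrate: reseeding replays the exact same stream, so the data is effectively hard-coded without writing out tens of thousands of literals (a minimal sketch, not the actual test code):

```python
import numpy as np

# With the same seed, the legacy global RandomState stream is fully
# deterministic, so these two draws are identical arrays.
np.random.seed(0)
a = np.random.random(70000)
np.random.seed(0)
b = np.random.random(70000)
assert np.array_equal(a, b)
```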

The PR author (@anntzer) replied:

I'm fine reverting the change here (which I will do) -- especially as you point out a real possible failure mode -- but for most other cases, I am strongly convinced removing the seed is correct. For example, test_agg tests that you can draw a 70,000-point line without raising an exception. If there is any 70,000-point line which causes an exception, this is clearly a bug on our side. It may be tricky to investigate, so we may patch the forced seed back in while investigating, but we can't just say "oh, that's an unlikely case, let's ignore it". Likewise, tests involving check_figures_equal are testing logical equivalences -- two series of calls should result in the same backend-level draw operations, regardless of the input.

np.testing.assert_allclose(cb.get_ticks(), [0.2, 0.4, 0.6, 0.8])
plt.rcParams["_internal.classic_mode"] = False
fig, ax = plt.subplots()
pc = ax.pcolormesh([[.05, .95]])
The PR author (@anntzer) commented on this diff:

Actually this is nicer (and more robust) without using random data...

@timhoffm (Member) commented May 8, 2020

I don't think we should randomize regular test runs to try and find bugs.

If a random failure hits, it may not be at a time when you want to investigate it. More generally, getting tests to pass is a significant step for a contribution. It is frustrating for new as well as experienced contributors when your PR tests fail and you don't know why.

If we really care about this, we should collect the relevant tests and run them locally without seed. This is about 30 tests, so you can run them quite often in a reasonable amount of time.
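One way to do such an occasional unseeded local run (a hedged sketch, not an existing matplotlib facility) is to neutralize np.random.seed from a local conftest.py so the relevant tests see truly random data:

```python
# Hypothetical conftest.py fragment for a local, unseeded test run;
# this is a sketch, not part of matplotlib's test suite.
import numpy as np

_real_seed = np.random.seed

def _ignore_seed(*args, **kwargs):
    # Swallow the requested seed so every run draws fresh random data.
    pass

np.random.seed = _ignore_seed

# Restore determinism afterwards with: np.random.seed = _real_seed
```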

@anntzer (Contributor, Author) commented May 8, 2020

OK, extracted one of the changes to #17361 (as I think that one is certainly more correct without the randomness), and closing the rest.

@anntzer anntzer closed this May 8, 2020
@anntzer anntzer deleted the unseed branch May 8, 2020 09:31
@QuLogic QuLogic modified the milestones: v3.3.0, unassigned Jul 9, 2020