Remove np.random.seed from tests that don't need it. #16438
Conversation
I'm a little worried about not seeding the random numbers, as it could expose some value-dependent bugs that we have no way of reproducing.

But then at least we can pin the numbers back while investigating.
Would a parameterized test work better to address both a random case and a reproducible case?

```python
@pytest.mark.parametrize('np_rand_seed', [1234, None])
def test_long_path(np_rand_seed):
    np.random.seed(np_rand_seed)
```
That feels like overengineering it, imho.
@anntzer fair, and knowing it has a possible value-dependent bug is better than not knowing it has one.

@pharshalp That is starting to go down the path of re-writing hypothesis (https://hypothesis.readthedocs.io/en/latest/), and while useful, that is not a route I think we want to go down in our test suite.
```python
y = np.random.random(npoints)
x[duplicate] = x[duplicate_of]
y[duplicate] = y[duplicate_of]
xy = np.random.random((npoints, 2))
```
It looks like sometimes it picks 7 to drop and sometimes it picks 3 to drop. Should confirm with @ianthomas23 that this is OK behavior.
That's exactly the kind of things that I was hoping to catch...
Short answer: the `seed` call should stay.
Long answer:
If we are using truly random (i.e. non-seeded) points, the qhull-based Delaunay triangulation will use just one of the two duplicated points, and we don't know in advance which one it will be. So there are two options:
- Keep the `seed` call.
- Remove the `seed` call and change the `assert` at the end of the test, because the indices of the points used in the triangulation that are returned by qhull, `np.unique(triang.triangles)`, will be either `np.delete(np.arange(npoints), duplicate)` (the existing code) or `np.delete(np.arange(npoints), duplicate_of)`.
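For illustration only, the second option could look roughly like the sketch below. The helper name is hypothetical; `triang`, `npoints`, `duplicate` and `duplicate_of` are the names used in the test under discussion:

```python
import numpy as np

def assert_valid_triangulation_indices(triangles, npoints, duplicate, duplicate_of):
    # With unseeded points, qhull may drop either the duplicate point or
    # its twin, so the test must accept both possible index sets.
    used = np.unique(triangles)
    expected_drop_dup = np.delete(np.arange(npoints), duplicate)
    expected_drop_twin = np.delete(np.arange(npoints), duplicate_of)
    assert (np.array_equal(used, expected_drop_dup)
            or np.array_equal(used, expected_drop_twin))
```

This doubles the assertion logic just to tolerate nondeterminism the test never needed, which is part of why option 1 is preferable.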
The first option is better for 3 reasons:
- It is simpler (less code).
- It is what the code currently does and existing working code shouldn't be changed without good reason.
- It is far better defensive programming to keep all tests deterministic. There is an exceedingly small but finite chance that a truly random (non-seeded) point will be close enough to the real duplicate points that the triangulation algorithm identifies this as the same point as the duplicate points and the test will fail with a different point index. The failure will happen at some point in the future on an entirely unrelated PR and will be totally unreproducible.
Hence keep the `seed` call. Then there is no reason to change any other code in the same function in a PR about `seed` calls.
There is a wider issue here: all tests should be deterministic, and hence no `seed` calls should be removed without the explicit permission of each person who wrote the initial code in each case. Otherwise we are risking a number of extremely unlikely possible failures that are avoided with the existing code. The use of random numbers here is just shorthand for a small number of hard-coded points; we don't actually want to run the test with truly random numbers. If there is fundamental opposition to the use of `seed` in tests, then the random values can be replaced with hard-coded ones. That is easily done for this test, but what about in `test_agg`, which needs 70000 values?!
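As an aside (a sketch, not something proposed in this thread): NumPy's newer `Generator` API gives deterministic values without mutating global RNG state, which sidesteps part of this debate. The helper name and seed value below are hypothetical:

```python
import numpy as np

def make_test_points(npoints, seed=1234):
    # A local, explicitly seeded Generator: deterministic like
    # np.random.seed, but without touching the global RNG state that
    # later tests might silently depend on.
    rng = np.random.default_rng(seed)
    return rng.random((npoints, 2))

# The same seed always reproduces the same "random" points.
assert np.array_equal(make_test_points(10), make_test_points(10))
```

This keeps the reproducibility argument intact while avoiding the cross-test coupling that global seeding introduces.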
I'm fine reverting the change here (which I will do) -- especially since you point out a real possible failure mode -- but actually, for most other cases, I am strongly convinced removing the seed is correct. For example, `test_agg` tests that you can write a 70,000-point line without raising an exception. If there is any 70,000-point line which causes an exception, this is clearly a bug on our side. It may be tricky to investigate, so we may patch back to forcing the seed while investigating, but we can't just say "oh, that's just an unlikely case, let's ignore it". Likewise, tests involving `check_figures_equal` are testing logical equivalences -- two series of calls should result in the same backend-level draw operations, regardless of the input.
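The argument above can be illustrated with a hedged sketch (this is not the real `test_agg`; the function name and sizes are illustrative): any 70,000-point line should render without raising, so the input values should not matter.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, as used in the test suite
import matplotlib.pyplot as plt

def draw_long_line(y):
    # Drawing should succeed for *any* input values; an exception here
    # would indicate a genuine bug, seeded input or not.
    fig, ax = plt.subplots()
    ax.plot(y)
    fig.canvas.draw()
    plt.close(fig)

draw_long_line(np.random.random(70_000))  # unseeded: any failure is a real bug
```

Under this view, a seeded input only narrows the set of bugs the test can ever catch; an unseeded one is a legitimate, if hard to reproduce, stress test.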
```python
np.testing.assert_allclose(cb.get_ticks(), [0.2, 0.4, 0.6, 0.8])
plt.rcParams["_internal.classic_mode"] = False
fig, ax = plt.subplots()
pc = ax.pcolormesh([[.05, .95]])
```
Actually this is nicer (and more robust) without using random data...
e.g. check_figures_equal tests should typically work regardless of the random dataset used. If anything this may slightly help catching brittle tests that only work for certain seed values.
I don't think we should randomize regular test runs to try to find bugs. If they fail, it may not be at a time when you want to investigate. More generally, getting tests to pass is a significant step for a contribution. It is frustrating for new but also experienced contributors if your PR's tests fail and you don't know why. If we really care about this, we should collect the relevant tests and run them locally without a seed. This is about 30 tests, so you can run them quite often in a reasonable amount of time.
ok, extracted one of the changes to #17361 (as I think that one is certainly more correct without the randomness) and closing the rest.
PR Summary
PR Checklist