Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: update random and asserts in test guidelines #18777

Merged
merged 5 commits into from Apr 21, 2021

Conversation

tupui
Copy link
Contributor

@tupui tupui commented Apr 14, 2021

I propose to update the testing guide.

  • Change np.random.seed to using `np.random. default_rng.
  • Change assert_ to plain assert as the statement about python stripping them out is not relevant in this context.
  • Hide sections about assert_almost_equal and alike as better alternative are recommended.

There was a discussion over at SciPy which suggested that this guide needed updates.
xref scipy/scipy#8583

@rkern
Copy link
Member

rkern commented Apr 14, 2021

Using np.random and np.random.seed() is still OK for unit tests. In fact, depending on the use case, it still might be preferred due to the stability requirements we have kept on RandomState. default_rng() is actually probably not recommended, except for feeding into code under test that uses Generators. But for pseudorandomly generating data to pass to the code under test, np.random functions or RandomState are typically preferred.

@tupui
Copy link
Contributor Author

tupui commented Apr 14, 2021

Using np.random and np.random.seed() is still OK for unit tests. In fact, depending on the use case, it still might be preferred due to the stability requirements we have kept on RandomState. default_rng() is actually probably not recommended, except for feeding into code under test that uses Generators. But for pseudorandomly generating data to pass to the code under test, np.random functions or RandomState are typically preferred.

Oh sorry yes, I got confused with the other PR on SciPy which is about the examples and not the tests. I will remove this. Is the rest fine?

@tupui
Copy link
Contributor Author

tupui commented Apr 14, 2021

Apparently there is an issue with using :hidden:, if this is not possible I would suggest to add a dedicated section to preface that the other methods are preferred?

@tupui
Copy link
Contributor Author

tupui commented Apr 14, 2021

I am not sure about the global state though. Shouldn't it advertise at least for rng = np.random.RandomState? I am not a fan of global state settings. I reverted the original. Let me know if I should change anything else.

@rkern
Copy link
Member

rkern commented Apr 14, 2021

In the confines of a unit test, the global doesn't really present a problem. It even makes it a smidge easier to do the seeding in a setUp() and then just use np.random in each test. I'm not particularly zealous about recommending people away from it in this context (and I'm super-zealous about it in most contexts). If you want to document the pattern using self.rng = RandomState(seed) as a primary recommendation, but then also document that np.random.seed() will work for tests, that's fine.

Comment on lines +35 to +43
assert_almost_equal
assert_approx_equal
assert_array_almost_equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are not recommended, can you add a ..note in the docstring for each method that explains why, and what the alternative is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a note in these functions pointing out to assert_allclose and alike. That's why I am listing these specifically.

Are you talking about something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I am not sure how to hide these. Because if I am not mistaking they are only defined here so I would need to do the autosummary. But then I don't know how to hide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was not aware that we already had those notes.

Perhaps instead of hiding these you should just put them under a different heading, along with a remark saying that they are not recommended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sure. And also I am still bugged with how to do it otherwise 😅. So this? Asserts (not recommanded)

@tupui tupui force-pushed the test_guidelines_random_asserts branch from a86919e to 2cbaaba Compare April 15, 2021 09:38
@eric-wieser
Copy link
Member

It even makes it a smidge easier to do the seeding in a setUp() and then just use np.random in each test.

Doesn't this result in non-deterministic tests if pytest decides to run tests in multiple threads, such as with pytest-parallel?

@tupui
Copy link
Contributor Author

tupui commented Apr 15, 2021

It even makes it a smidge easier to do the seeding in a setUp() and then just use np.random in each test.

Doesn't this result in non-deterministic tests if pytest decides to run tests in multiple threads, such as with pytest-parallel?

My 2 cents. For testing most people just need to draw number from a random uniform distribution. At most from a normal. Then instead of fighting with seeds and complicated algo, I would recommend to use simple deterministic sequences such as Halton. These are fixed and if someone want a different "seed", they can just advance the sequence. This way you are always guaranty to have the same exact sequence.

@tupui tupui force-pushed the test_guidelines_random_asserts branch from 2cbaaba to f47f64e Compare April 15, 2021 10:00
@tupui
Copy link
Contributor Author

tupui commented Apr 15, 2021

@rkern @eric-wieser I updated the PR with your suggestions. As for the seed, I could add the following if wanted:

Instead of setting the seed globally, it is recommanded to use NumPy's API::

    rng = np.random.RandomState(some_number)
    random_numbers = rng.random(...)

@mattip
Copy link
Member

mattip commented Apr 15, 2021

In the confines of a unit test, the global doesn't really present a problem. It even makes it a smidge easier to do the seeding in a setUp() and then just use np.random in each test. I'm not particularly zealous about recommending people away from it in this context (and I'm super-zealous about it in most contexts).

@rkern I am curious about this. Why would you be hesitant to recommend using an rng in tests to get people used to the more correct API? I think

  • using them in tests should be convenient and natural. If not we have a UX problem
  • changing the order tests are run can change results, whether by running only specific tests or by using multithreaded frameworks. These types of errors are quite hard to debug.
  • the sooner we transition people off the older APIs the sooner we can deprecate them, reducing the NumPy API surface.

@tupui
Copy link
Contributor Author

tupui commented Apr 15, 2021

  • using them in tests should be convenient and natural. If not we have a UX problem

I agree. If we must stick with RandomState, as the doc suggest, I have another proposal. We could add a parameter to default_rng to specify that we want to ensure future reproducibility. Ex. default_rng(mode='test'). This way, it could call behind the seen RandomState or anything else. But we would have a single entry point.

  • the sooner we transition people off the older APIs the sooner we can deprecate them, reducing the NumPy API surface.

Can't agree more as currently we have 2 cases to handle and it leads to confusions.

@rkern
Copy link
Member

rkern commented Apr 15, 2021

Why would you be hesitant to recommend using an rng in tests to get people used to the more correct API?

I'm not. I said to recommend it as the primary method. I definitely would prefer that people do that in new code; their tests will be a little bit better for it. I'm saying not to disrecommend np.random.seed(). Using np.random.seed() in tests is still an official use case, and I don't want our documentation to chide people with mounds of existing code into doing a lot of churn. They're not getting any benefit from Generator and might not be able (or need) to use multithreading for their test suite. Churn is a quick way to generate resentment against the changes that we've made.

By all means, document RandomState(seed) as the primary way to do it, and document the limitation that using np.random functions prevents speeding up your test suite via multithreading. Provide the carrots to move away from np.random.seed().

@rkern
Copy link
Member

rkern commented Apr 15, 2021

changing the order tests are run can change results

Not really. People do know enough to set the seed in the setUp() method or at the top of each function that is using np.random, not relying on one call at the beginning to cross multiple test methods.

@tupui
Copy link
Contributor Author

tupui commented Apr 15, 2021

By all means, document RandomState(seed) as the primary way to do it, and document the limitation that using np.random functions prevents speeding up your test suite via multithreading. Provide the carrots to move away from np.random.seed().

My reading of the following is that I should add the text I proposed. This is ok?

Instead of setting the seed globally, it is recommanded to use NumPy's API::

    rng = np.random.RandomState(some_number)
    random_numbers = rng.random(...)

@mattip
Copy link
Member

mattip commented Apr 15, 2021

By all means, document RandomState(seed) as the primary way to do it,

If they already are getting a rng, why not recommend rng = np.random.default_rng(seed) ?

@rkern
Copy link
Member

rkern commented Apr 15, 2021

Relevant NEP 19 section.

@mattip
Copy link
Member

mattip commented Apr 16, 2021

@rkern would it be acceptable to recommend

rng = np.random.default_rng(seed)
random_numbers = rng.random(...)

@seberg
Copy link
Member

seberg commented Apr 16, 2021

Sorry didn't notice this one. The changes in doc/TESTS.rst.txt were now already made in gh-18787

@tupui
Copy link
Contributor Author

tupui commented Apr 16, 2021

Sorry didn't notice this one. The changes in doc/TESTS.rst.txt were now already made in gh-18787

Ah no problem. What matters is that this is in 😃

cc @larsoner then.

@larsoner
Copy link
Contributor

Ahh sorry didn't see you already had a PR open for some related changes @tupui . I think my PR had a bit more explanatory text and content related to pytest so if you rebase and just keep my changes you should be good I think.

@tupui
Copy link
Contributor Author

tupui commented Apr 16, 2021

Ahh sorry didn't see you already had a PR open for some related changes @tupui . I think my PR had a bit more explanatory text and content related to pytest so if you rebase and just keep my changes you should be good I think.

No worries @larsoner 😄 , I merge master with your changes already.

@rkern
Copy link
Member

rkern commented Apr 16, 2021

@mattip No. Use RandomState.

Co-authored-by: Matthias Bussonnier <bussonniermatthias@gmail.com>
@tupui
Copy link
Contributor Author

tupui commented Apr 17, 2021

Anything else? I believe this is ready.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert-related changed LGTM.

+1 to all @rkern said about unit testing. The current version of this PR doesn't include and random changes, which is right.

@tupui
Copy link
Contributor Author

tupui commented Apr 21, 2021

@mattip @seberg is there anything I should do here?

@charris charris merged commit ab24563 into numpy:main Apr 21, 2021
@charris
Copy link
Member

charris commented Apr 21, 2021

Thanks @tupui , I was waiting to see if someone else wanted to comment.

@tupui tupui deleted the test_guidelines_random_asserts branch April 21, 2021 13:42
@tupui
Copy link
Contributor Author

tupui commented Apr 21, 2021

Thanks @tupui , I was waiting to see if someone else wanted to comment.

Thanks, no problem 😃 My first commits in NumPy, hopefully not the last ones!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants