
[MAINT] Use Numpy Random Generator to replace legacy RandomState #4084

Merged
Merged 9 commits into nilearn:main on Nov 14, 2023

Conversation

ymzayek
Member

@ymzayek ymzayek commented Oct 26, 2023

Changes proposed in this pull request:

  • Update NumPy RandomState to the NumPy Generator created with numpy.random.default_rng
    • The Generator has an API similar to RandomState's, though some methods have different names or signatures; NumPy's API docs recommend which new methods replace the RandomState ones (a rough mapping is sketched just below this list).
  • I did not yet update all instances of RandomState in the code and mostly stuck to our tests. The problem is that some of our functions (e.g. in decomposition, decoding, massunivariate) pass a RandomState to scikit-learn functions that do not accept the Generator yet, so it will take a bit more care to make sure we don't pass a Generator where we call those functions. In some testing cases it could work to just pass an integer seed, since scikit-learn uses sklearn.utils.check_random_state internally to instantiate a RandomState object.
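As a rough illustration of the mapping mentioned above (not taken from the PR diff), only the seeding call and a few method names change between the two APIs:

```python
import numpy as np

# Legacy API: a RandomState instance.
rs = np.random.RandomState(42)
rs.randn(3)             # standard normal draws
rs.randint(0, 10, 5)    # random integers in [0, 10)
rs.binomial(2, 0.3, 5)  # binomial draws

# New API: a Generator created with default_rng.
rng = np.random.default_rng(42)
rng.standard_normal(3)   # replaces randn
rng.integers(0, 10, 5)   # replaces randint
rng.binomial(2, 0.3, 5)  # same name and signature
```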

@github-actions
Contributor

👋 @ymzayek Thanks for creating a PR!

Until this PR is ready for review, you can include the [WIP] tag in its title, or leave it as a GitHub draft.

Please make sure it is compliant with our contributing guidelines. In particular, be sure it checks the boxes listed below.

  • PR has an interpretable title.
  • PR links to a GitHub issue with the mention Closes #XXXX (see our documentation on PR structure)
  • Code is PEP8-compliant (see our documentation on coding style)
  • Changelog or what's new entry in doc/changes/latest.rst (see our documentation on PR structure)

For new features:

  • There is at least one unit test per new function / class (see our documentation on testing)
  • The new feature is demoed in at least one relevant example.

For bug fixes:

  • There is at least one test that would fail under the original bug conditions.

We will review it as quickly as possible; feel free to ping us with questions if needed.

@ymzayek
Member Author

ymzayek commented Oct 27, 2023

I updated the codebase to use the new Generator where possible. This is primarily not possible in decomposition, because CanICA and DictLearning accept a RandomState object through their random_state parameter, so changing it would break user code and would need a proper deprecation.
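For concreteness, this is the kind of user code that would break without a deprecation path (an illustrative snippet, not taken from the PR):

```python
import numpy as np
from nilearn.decomposition import CanICA

# Existing user code may hand the estimator a legacy RandomState instance;
# requiring a Generator here without a deprecation cycle would break calls
# like this one.
canica = CanICA(n_components=5, random_state=np.random.RandomState(42))
```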

@Remi-Gau
Collaborator

> I updated the codebase to use the new Generator where possible. This is primarily not possible in decomposition, because CanICA and DictLearning accept a RandomState object through their random_state parameter, so changing it would break user code and would need a proper deprecation.

I have not looked at it in detail, but could we not just expand the accepted types to also support the more "modern" rng options?

@Remi-Gau
Collaborator

Reread the top message; ignore what I just wrote.

@ymzayek
Member Author

ymzayek commented Oct 27, 2023

Yes, I should have been clearer. It turns out that the public API only accepts a RandomState in the cases where it is eventually passed to a scikit-learn function that uses check_random_state internally. There are open issues about this on the scikit-learn GitHub, so I guess we can see how that develops, but I have not read through them yet: scikit-learn/scikit-learn#16988 and scikit-learn/scikit-learn#20669.
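For reference, this is roughly what check_random_state accepted at the time of this PR (a standalone sketch; the behavior may change once the linked issues are resolved):

```python
import numpy as np
from sklearn.utils import check_random_state

check_random_state(None)                      # global RandomState singleton
check_random_state(0)                         # new RandomState seeded with 0
check_random_state(np.random.RandomState(0))  # passed through unchanged

# A Generator is rejected by scikit-learn versions without Generator
# support, hence the workaround of passing plain integer seeds.
try:
    check_random_state(np.random.default_rng(0))
except ValueError as err:
    print(err)
```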

@ymzayek ymzayek marked this pull request as ready for review October 31, 2023 13:33
Member

@bthirion bthirion left a comment


LGTM, thx.


codecov bot commented Nov 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparing base (aa9c914) at 91.63% to head (0e4f200) at 91.56%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4084      +/-   ##
==========================================
- Coverage   91.63%   91.56%   -0.07%     
==========================================
  Files         143      143              
  Lines       16115    16113       -2     
  Branches     3353     3353              
==========================================
- Hits        14767    14754      -13     
- Misses        800      813      +13     
+ Partials      548      546       -2     
Flag                  Coverage Δ
macos-latest_3.10     91.55% <100.00%> (+0.01%) ⬆️
macos-latest_3.11     91.55% <100.00%> (+0.01%) ⬆️
macos-latest_3.12     91.55% <100.00%> (+0.01%) ⬆️
macos-latest_3.9      91.52% <100.00%> (+0.01%) ⬆️
ubuntu-latest_3.10    91.55% <100.00%> (+0.01%) ⬆️
ubuntu-latest_3.11    91.55% <100.00%> (+0.01%) ⬆️
ubuntu-latest_3.12    91.55% <100.00%> (+0.01%) ⬆️
ubuntu-latest_3.9     91.52% <100.00%> (+0.01%) ⬆️
windows-latest_3.10   91.51% <100.00%> (+0.01%) ⬆️
windows-latest_3.11   91.51% <100.00%> (+0.01%) ⬆️
windows-latest_3.12   91.51% <100.00%> (+0.01%) ⬆️
windows-latest_3.9    91.48% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@@ -979,7 +980,7 @@ def test_decoder_multiclass_error_incorrect_cv(multiclass_data):

def test_decoder_multiclass_warnings(multiclass_data, rng):
X, y, _ = multiclass_data
groups = rng.binomial(2, 0.3, size=len(y))
Collaborator


not sure I understand this change?

Any reason we cannot use the fixture rng here?

And if we cannot, then it should be removed from the arguments of the test, as it is not used.

Member Author


@Remi-Gau In some test cases, the random numbers generated by the two strategies differ enough to cause a failure, so those tests need a different seed. In the cases where I call the generator with an explicit seed like this, it is because the fixture's seed of 42 fails the test.
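To see why the same seed is not enough (a standalone sketch, not part of the test suite): RandomState defaults to the MT19937 bit generator while default_rng uses PCG64, so the two streams are unrelated even when seeded identically:

```python
import numpy as np

seed = 42
legacy = np.random.RandomState(seed).binomial(2, 0.3, size=5)
modern = np.random.default_rng(seed).binomial(2, 0.3, size=5)

# The arrays generally differ, so assertions tuned to the legacy stream
# can fail after switching the fixture to default_rng, and the test may
# need a different explicit seed.
print(legacy)
print(modern)
```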

Collaborator


OK, but then we can remove rng from the list of arguments, no?
I may need more coffee, but I don't see it being used anywhere else in the test.

Member Author


Looks like I need more coffee haha

Member Author


Removing it

Collaborator

@Remi-Gau Remi-Gau left a comment


Except for one change I don't understand, the rest is good with me.

Thanks for taking care of this one @ymzayek

@ymzayek ymzayek merged commit a24bf42 into nilearn:main Nov 14, 2023
29 checks passed
@ymzayek ymzayek deleted the update-rng-fixture branch November 14, 2023 13:26

Successfully merging this pull request may close these issues.

[MAINT] Use np.random.default_rng for rng fixture
3 participants