New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Replace numpy with pandas in data loaders #2829
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2829 +/- ##
==========================================
- Coverage 90.45% 90.44% -0.02%
==========================================
Files 122 122
Lines 14562 14593 +31
Branches 2973 2982 +9
==========================================
+ Hits 13172 13198 +26
Misses 824 824
- Partials 566 571 +5
Continue to review full report at Codecov.
|
Thanks @achamma723 ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thx !
Hi @achamma723 |
Hello @NicolasGensollen , Is there an error with |
nilearn/datasets/atlas.py
Outdated
@@ -237,7 +238,7 @@ def fetch_atlas_destrieux_2009(lateralized=True, data_dir=None, url=None, | |||
files_ = _fetch_files(data_dir, files, resume=resume, | |||
verbose=verbose) | |||
|
|||
params = dict(maps=files_[1], labels=np.recfromcsv(files_[0])) | |||
params = dict(maps=files_[1], labels=pd.read_csv(files_[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need to be something like
params = dict(maps=files_[1], labels=pd.read_csv(files_[0], index_col=0).to_records())
there isn't an error with the test; the PR changes the output of the function: it is now a pd DataFrame with 2 columns (one repeating the index) rather than a 1d np recarray |
@@ -0,0 +1,6 @@ | |||
{ | |||
"cells": [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you remove the checkpoints and the notebook? thanks!
@NicolasGensollen I couldn't tell from the issue description: did we just want to start using pandas internally, or do we want to start returning DataFrames or Series instead of recarrays wherever they were used? In the latter case we need a deprecation cycle |
@jeromedockes there is already a deprecation cycle running, initiated with #2663, and supposed to end with release 0.9. |
Also linking related #2655 |
Ok thanks! in that case we need to track down those that still return recarrays, such as |
@achamma723 @jeromedockes I might be missing something, but I don't understand why we would sometimes return pandas dataframes and sometimes rec arrays... 🤔 |
yes, IIUC it will always be dataframes or series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achamma723 Could you rebase or merge master to get the doc changes of #3083 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for merging master @achamma723
In the last meeting, we decided to extend the deprecation cycle for rec arrays as we realized the current warning might have been a little light for such a change.
The idea would be to add an optional parameter to affected fetchers (return_dataframe
or something alike) defaulting to the current behavior of returning rec arrays for backward compatibility.
Let me know if you want to take care of this, otherwise I can commit to your branch.
so many fetchers may have a part that roughly looks like: if legacy_format:
result["confounds"] = result["confounds"].to_records()
... |
* Add failing test * get rid of scaling_axis attribute * deprecate scaling_axis * [circle full] add whatsnew entry * Fix
) * Including Hierarchical Kmeans in doc and example * Adding hierarchical k_means and make it available in parcellations * test hierarchical_kmeans and test Parcellation using it * fix incompatible apis of fit() and transform() methods * Improve docstring and example layout * "Upgrade to get_data function" * "changing nose function to pytest function as the rest of the library evolved" * "solving sklearn.utils.testing deprecation" * "improve flake8" * "flake8, missing import, adding test exception" * Apply suggestions from code review Including B.T. suggestions Co-authored-by: bthirion <bertrand.thirion@inria.fr> * "adding tests for 'scaling' and warning ; fixed scaling bug" * "test_refactoring" * "flake8 fixes" * "clarify example, fix PEP8" * Apply suggestions from code review Co-authored-by: bthirion <bertrand.thirion@inria.fr> * "apply suggestions from N.G. review" * Apply suggestions from code review Co-authored-by: Gensollen <nicolas.gensollen@gmail.com> * "apply B.T refactoring" * "flake8" * "flake8" * "codespell" * "enforcing final number of clusters is the one asked by the user" * "solving bugging test_case" * "solving edge case" * "missing float casting" * "check that adjusted clusters are int" * "flake8 fix" * Update nilearn/regions/hierarchical_kmeans_clustering.py Co-authored-by: bthirion <bertrand.thirion@inria.fr> * Update examples/03_connectivity/plot_data_driven_parcellations.py Co-authored-by: bthirion <bertrand.thirion@inria.fr> * Update examples/03_connectivity/plot_data_driven_parcellations.py Co-authored-by: bthirion <bertrand.thirion@inria.fr> * "fix code review comments" * 'fix bug introduced by review suggestion' * Update nilearn/regions/hierarchical_kmeans_clustering.py Co-authored-by: Gensollen <nicolas.gensollen@gmail.com> * Update nilearn/regions/hierarchical_kmeans_clustering.py Co-authored-by: Gensollen <nicolas.gensollen@gmail.com> * Update nilearn/regions/hierarchical_kmeans_clustering.py Co-authored-by: Gensollen <nicolas.gensollen@gmail.com> * Update nilearn/regions/hierarchical_kmeans_clustering.py Co-authored-by: Gensollen <nicolas.gensollen@gmail.com> * "apply code review suggestions" * "add whats_new" * fixing docstring * addressing comments of jdockes * trying to fix pep8 violations * simplifying hierarchical k-means avoiding useless checks * pep8 * Addressing other comments by jdockes Co-authored-by: BAZEILLE Thomas <thomas.bazeille@inria.fr> Co-authored-by: bthirion <bertrand.thirion@inria.fr> Co-authored-by: Gensollen <nicolas.gensollen@gmail.com>
* added deploy key * clean deploy key * clean deploy key * clean cricreci.yml * clean circleci.yml * clean circleci.yml * clean circleci.yml * fixed Hommel function + added test * pep8 fixes * pep8 fixes * pep8 fixes * correct use of assert * revert unrelated change * minor fixed on hommel value computation * addedwhatsnew entry Co-authored-by: Gensollen <nicolas.gensollen@gmail.com>
* Add gh_substitutions * Add links to contributor profiles (unfinished) * Use contributor links in AUTHORS * Break and refactor changelog (unfinished...) * [circle full] Fix typo and run full build in strict mode. * Add more links * iter * [circle full] request full build * Add more links again * [circle full] Iter. * Update AUTHORS.rst * gh_role --> _gh_role * [circle full] Add fundings. * [circle full] Fix wrong ref. * [circle full] update niconnect link * [circle full] fix ref * fix refs: input_data -> maskers * [circle full] request full build * [circle full] fix old ref to fetch_cobre * [circle full] fix broken ref * [circle full] Fix error in rebase
Thanks @achamma723 ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @achamma723 and @NicolasGensollen
@NicolasGensollen if you had another look at the fetchers and all the changes in column names have been fixed I think we're good!
my only remaining question is about the whatsnew, not sure if there was an issue with the rebase due to #3049
* bump dependencies * [circle full] Add whatsnew.
requirements-min.txt
Outdated
@@ -1,8 +1,8 @@ | |||
numpy==1.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this should not be here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achamma723 could you rebase to clean this? Thanks!
besides the requirement thing, this looks OK. |
* Add tests min req with matplotlib * remove useless variable * [circle full] update README
Thanks @achamma723 but it looks like there are still unrelated changes. I'm not sure how you rebase, but you should be able to drop the unrelated commits. |
Hello @NicolasGensollen , I'm pulling from main, rebasing and then pushing again. Should I drop something unrelated? |
Hmm, it's weird that you get some commits from main in here. You can try to manually drop them when rebasing. I think the following commits shouldn't be there: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thx!
Thanks everyone! Merging... |
This PR is fixing the issue of replacing
Numpy
withPandas
in data loaders:Closes #2621