Skip to content

ENH introduce SKRUB_DATA_DIRECTORY envar to control the data directory#1215

Merged
jeromedockes merged 11 commits into
skrub-data:mainfrom
thomass-dev:control-skrub-data-home-using-envvar
Feb 4, 2025
Merged

ENH introduce SKRUB_DATA_DIRECTORY envar to control the data directory#1215
jeromedockes merged 11 commits into
skrub-data:mainfrom
thomass-dev:control-skrub-data-home-using-envvar

Conversation

@thomass-dev

@thomass-dev thomass-dev commented Jan 14, 2025

Copy link
Copy Markdown
Contributor

Closes #1216.

Adding a new way to control the location of the data directory, using envar.

It can be useful when you want to set the data cache directory from outside the code, especially in CI.
In addition, fix the expansion of ~ when passing programmatically an explicit path to get_data_home.

@thomass-dev thomass-dev force-pushed the control-skrub-data-home-using-envvar branch from 39c9f22 to ceeb5ec Compare January 14, 2025 22:30
@thomass-dev

Copy link
Copy Markdown
Contributor Author

Can i instantiate a logger from stdlib?
I've seen the use of loguru in the benchmark folder, so i reproduced, but i see that it's not a dependency.

@glemaitre

Copy link
Copy Markdown
Member

loguru will not be a dependency for the main package (the benchmark lives on its own).

I see that we are using logger:

https://github.com/probabl-ai/skore/blob/418b72ca6711ef467c95f67416f2162382c645a7/skore/src/skore/__init__.py#L21-L23

I did not recall because in scikit-learn, we using some verbose parameter + print function. But since, the logger is here, it should be used I assume.

@jeromedockes can probably confirm it

@glemaitre

Copy link
Copy Markdown
Member

Whoops, it is actually late and I'm confusing the sk*** pacakge. So we don't use logger which is more what I thought. I would expect that we use verbose + print.

@jeromedockes could confirm

@glemaitre

Copy link
Copy Markdown
Member

we will need a small test to check that we write in the good directory as well. I think that we only need to test the get_data_home for that.

@jeromedockes jeromedockes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Comment thread skrub/datasets/_utils.py Outdated
Comment thread skrub/datasets/_utils.py Outdated
Comment thread skrub/datasets/_utils.py Outdated
Comment thread skrub/datasets/_utils.py Outdated
Comment thread skrub/datasets/_utils.py
@jeromedockes

Copy link
Copy Markdown
Member

the tests failure are not related to your PR but to #1217 :/

@thomass-dev

thomass-dev commented Jan 21, 2025

Copy link
Copy Markdown
Contributor Author

@jeromedockes i have fixed your comments.
In addition, i have removed the print statement. It makes no sense to print the data_home each time the function is called.

@thomass-dev thomass-dev force-pushed the control-skrub-data-home-using-envvar branch from 738923c to ba61142 Compare January 21, 2025 12:58
@thomass-dev thomass-dev force-pushed the control-skrub-data-home-using-envvar branch from eed90a4 to 618d9ea Compare January 21, 2025 13:00
@thomass-dev thomass-dev force-pushed the control-skrub-data-home-using-envvar branch from 6163279 to b366fe0 Compare February 3, 2025 12:28
@thomass-dev

Copy link
Copy Markdown
Contributor Author

The PR is up-to-date and ready for review.

@jeromedockes jeromedockes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @thomass-dev ! sorry you needed to update it after fixing a bunch of conflicts and thanks for addressing all comments

@jeromedockes

Copy link
Copy Markdown
Member

oops sorry I forgot one last detail: if you could add a brief entry with your name and github handle in the changelog that would be great:

New features

thanks!

@thomass-dev

Copy link
Copy Markdown
Contributor Author

Thanks @jeromedockes , it should be okay?

@jeromedockes

Copy link
Copy Markdown
Member

yes, perfect. thanks!

@jeromedockes jeromedockes merged commit 00df92f into skrub-data:main Feb 4, 2025
rcap107 pushed a commit to rcap107/skrub that referenced this pull request Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH introduce SKRUB_DATA_DIRECTORY envar to control the data directory

3 participants