Refactor of the fetching system #155
…fetching_refactor
So, for this (I hope last) error, we have an issue with the JSON decoding. Unless told to, I won't dig into sklearn's code to figure out at which step it goes wrong and perhaps monkeypatch a solution.
What do you suggest?

Is this error happening for all the datasets, or only some?

By the way, I checked: sklearn 0.21.0 is a couple of years old. It's fine to bump the requirement.
GaelVaroquaux
left a comment
I finished reviewing this and made all my comments. After these small things, this will be ready for merge.
f8bf8f5 to 76979d6
These most recent changes improve the user interface.

This PR looks good to me, I think it can be merged.

I was about to merge, but don't we need an entry summarizing things in the changelog?

Yes, you're right, I always forget that.

Yay! Merged.

Follows #147
This is a refactor of the dataset fetching system.
Historically, datasets were fetched from various websites.
Our aim with this update is to only use OpenML.org's API, through Scikit-learn's fetch_openml() function. This gives us a much more reliable and unified interface, and avoids losing access to datasets due to deletion, renaming, etc.
For instance, with the current system, 4 of the 7 datasets are unavailable (403 Access Denied, website down, etc.).
The user interface stays the same: the fetch_*() functions (e.g. fetch_open_payments()) still return a similar dictionary. The major difference is that this dictionary contains, among other information, a path to a CSV file, which must then be loaded (for instance with pandas' read_csv() function).

TL;DR of the previous thread:
The way fetch_openml() is used here makes pandas a requirement.
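
As a rough illustration of the loading step mentioned above, here is a minimal sketch. It assumes the returned dictionary exposes the CSV location under a "path" key. The helper load_dataset(), the "description" key and the CSV contents are hypothetical stand-ins, not the actual output of any fetch_*() function.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(info: dict) -> pd.DataFrame:
    """Load the CSV file referenced by a fetch_*()-style dictionary.

    Assumes the dictionary stores the file location under "path".
    """
    return pd.read_csv(info["path"])


# Stand-in for the dictionary a fetch_*() function might return.
# The file is created locally so the sketch runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("name,amount\nalice,10\nbob,20\n")

    info = {"description": "hypothetical dataset", "path": csv_path}
    df = load_dataset(info)
```

Keeping the loading step in the caller's hands means the fetcher only has to guarantee that the file exists at the given path.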