Fix OpenML timeout #23358

Merged
merged 4 commits into from
May 13, 2022

Conversation

Member

@lesteve lesteve commented May 13, 2022

Fix #23357.

I am not sure why we were passing timeout=delay, and I did not find an explanation in #21901. They are not the same thing: delay is the time to wait between two attempts, while timeout is the time to wait to get the data.
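To illustrate the distinction, here is a minimal sketch, not the scikit-learn implementation. The helper name `call_with_retries`, its parameters, and the choice to retry only on `URLError` are assumptions made for the example. The point is that `delay` only controls the pause between attempts, while any network timeout belongs to the request itself.

```python
import time
from urllib.error import URLError


def call_with_retries(func, n_retries=3, delay=1.0, sleep=time.sleep):
    """Hypothetical helper: call func(), retrying on URLError.

    `delay` is only the pause between two attempts. It says nothing about
    how long a single attempt may take; that is the job of a timeout,
    which is a property of the network call made inside `func`.
    """
    for attempt in range(n_retries + 1):
        try:
            return func()
        except URLError:
            if attempt == n_retries:
                # No attempts left: propagate the last error.
                raise
            sleep(delay)


def fetch(url, n_retries=3, delay=1.0):
    """Hypothetical usage: no timeout is passed to urlopen, so the
    global default socket timeout applies to each attempt."""
    from urllib.request import urlopen

    return call_with_retries(
        lambda: urlopen(url).read(), n_retries=n_retries, delay=delay
    )
```

With this separation, increasing `delay` spaces out the attempts without shortening or lengthening the time each individual request is allowed to take.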

By not passing timeout, we use the default, which is "no timeout" (i.e. "wait forever"). You can always use socket.setdefaulttimeout if you want to change it.
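For anyone who does want a bounded wait, one possible way to use socket.setdefaulttimeout is sketched below. The context manager name `default_socket_timeout` is hypothetical, and the 30-second value in the usage comment is an arbitrary choice for the example. Note that the default applies process-wide to sockets created while it is set, so restoring the previous value afterwards is the safer pattern.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Hypothetical helper: temporarily change the global default
    socket timeout, then restore the previous value."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


# Usage (not executed here because it needs network access):
#
#     from sklearn.datasets import fetch_openml
#     with default_socket_timeout(30.0):
#         data = fetch_openml("leukemia")
```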

Side comment: maybe part of the reason we did not see this before is that OpenML recently started doing some redirections (openml.org -> old.openml.org), and for some datasets that just happens to exceed the 1s timeout we were using. The timeout occurred in _get_data_info_by_name in the issue's original post and in _get_data_features in my case.

@lesteve lesteve added the "To backport" label May 13, 2022
@glemaitre glemaitre modified the milestones: 1.1, 1.1.1 May 13, 2022
Member

@glemaitre glemaitre left a comment

LGTM.

Member

@ogrisel ogrisel left a comment

LGTM as well. I see no reason why the retry delay should be of the same order as a network timeout. The only purpose of the retry delay is to avoid DOSing a service that is already potentially overloaded via a retry mechanism.

This was probably overlooked during the initial review of the PR that introduced the retry mechanism.

@ogrisel ogrisel merged commit d247579 into scikit-learn:main May 13, 2022
@lesteve lesteve deleted the fix-openml-timeout branch May 13, 2022 12:31
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request May 19, 2022
glemaitre pushed a commit that referenced this pull request May 19, 2022
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Labels
module:datasets To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Development

Successfully merging this pull request may close these issues.

fetch_openml fails on leukemia
3 participants