Convert single task study calls to a task call #303

PGijsbers · 2021-05-14T10:23:30Z

This avoids performing a call to retrieve the study and all of its dataset metadata (2 calls total) for each job.
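As a rough illustration of the idea (not the code in this PR), the conversion could be sketched as below. The function name `to_task_call` and the `task_ids` lookup (task name to OpenML task id) are hypothetical; only the `openml/s/N` and `openml/t/N` notations and the emptied task-name list come from this thread. The review further down suggests that leaving the task names untouched would work as well.

```python
from typing import Dict, List, Tuple

STUDY_PREFIX = "openml/s/"  # benchmark given as an OpenML study


def to_task_call(
    benchmark: str,
    task_names: List[str],
    task_ids: Dict[str, int],  # hypothetical lookup: task name -> OpenML task id
) -> Tuple[str, List[str]]:
    """Return the (benchmark, task_names) pair to hand to a single job."""
    single_task_of_study = (
        benchmark.startswith(STUDY_PREFIX) and len(task_names) == 1
    )
    if single_task_of_study and task_names[0] in task_ids:
        # Address the task directly, so the job never has to fetch the study.
        return f"openml/t/{task_ids[task_names[0]]}", []
    # Whole studies, multi-task jobs and file-based benchmarks stay unchanged.
    return benchmark, task_names
```

The ids and names passed in are whatever the caller already knows about the study; the sketch performs no OpenML requests itself.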

Innixma · 2021-05-18T17:41:06Z

Is this a fix that will reduce the frequency of OpenML server errors? Does it affect runs that use pre-existing benchmark files instead of studies (such as resources/benchmarks/medium.yaml)?

PGijsbers · 2021-05-19T08:31:35Z

It reduces requests only when the benchmark was specified with the openml/s/N format. There should be an update to the retry policy for openml-python later this week which hopefully alleviates the server issues more generally.
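To make the distinction concrete, here is a minimal sketch that classifies a benchmark argument. It assumes references have exactly the form `openml/s/<id>` or `openml/t/<id>` and treats everything else as a file-based benchmark; the helper names `benchmark_kind` and `affected_by_change` are made up for this example and are not part of amlb.

```python
import re

# Assumption: study and task references look like "openml/s/<id>" / "openml/t/<id>".
OPENML_REF = re.compile(r"^openml/(?P<kind>[st])/(?P<id>\d+)$")


def benchmark_kind(benchmark: str) -> str:
    """Classify a benchmark argument as 'study', 'task' or 'file'."""
    match = OPENML_REF.match(benchmark)
    if match is None:
        # e.g. a YAML definition such as resources/benchmarks/medium.yaml
        return "file"
    return "study" if match.group("kind") == "s" else "task"


def affected_by_change(benchmark: str) -> bool:
    """Only study-based benchmarks triggered the extra study/metadata calls."""
    return benchmark_kind(benchmark) == "study"
```

Under these assumptions, a run based on resources/benchmarks/medium.yaml is classified as "file" and would see no change in the number of requests.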

sebhrusen · 2021-05-26T14:11:59Z

amlb/runners/aws.py

+                    _task_names = []
+                else:
+                    _task_names = task_names


You don't need to change the task_names variable; it will work either way.

Did you try it?
I want to be sure that the folder structure generated on S3 is the same as before, and that this change does not make it more difficult to retrieve results from S3 a posteriori.
Currently, S3 is the long-term storage for results. They are organized by session, and inside a session each folder name contains the original benchmark name and the task name, which makes it relatively easy to download only a specific result.
I think it should be fine though, as AWS mode runs benchmarks using --session= (which removes the session folder on the EC2 instance to avoid an additional subfolder), and this should prevent the modified params from appearing anywhere.

PGijsbers · 2021-05-27

Testing with python runbenchmark.py constantpredictor openml/s/264 -m aws -f 0, the structure on the bucket seems the same, but the local result directory is actually different. Both have the same aws.openml_s_264.test.all_tasks.0.constantpredictor subdirectory with the data from that run, but the main directory produced by this branch does not contain logs and logs.zip.

You seem to be correct that the task names don't need to be modified, though I find the openml/t/61 -t iris notation a bit odd to support explicitly.
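A comparison like the one described above can be automated with a small sketch such as the following. It only compares the names of top-level entries in two local result directories; the function name `top_level_diff` and the idea of keeping a baseline directory from the main branch are assumptions for illustration, not part of amlb.

```python
from pathlib import Path
from typing import Set, Tuple


def top_level_diff(baseline: Path, candidate: Path) -> Tuple[Set[str], Set[str]]:
    """Compare the top-level entries of two result directories.

    Returns (missing, extra): names found only in `baseline`,
    and names found only in `candidate`.
    """
    base_names = {entry.name for entry in baseline.iterdir()}
    cand_names = {entry.name for entry in candidate.iterdir()}
    return base_names - cand_names, cand_names - base_names
```

For the run described above, the first set would be expected to contain logs and logs.zip when the baseline comes from the main branch and the candidate from this branch.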

Convert single task study calls to a task call

02844be

PGijsbers requested a review from sebhrusen May 14, 2021 10:23

PGijsbers mentioned this pull request May 21, 2021

code 107: Database connection error. #229

Closed

sebhrusen reviewed May 26, 2021

PGijsbers added this to the 2.1 milestone Mar 3, 2023

PGijsbers modified the milestones: 2.1, 2.2 May 30, 2023
