Lazy download of data splits by Neeratyoy · Pull Request #659 · openml/openml-python

Neeratyoy · 2019-04-04T10:40:34Z

What does this PR implement/fix? Explain your changes.

Addresses the pending part of #346: the data and the splits are no longer downloaded when get_task is called with download_data=False. They are downloaded only when a run is executed.
A unit test has been added for the lazy fetch of a task.

How should this PR be tested?

Calling get_task or get_tasks with the additional parameter download_data=False downloads only the task.xml file, which can be verified by monitoring the cache directory. When the task is run, the corresponding datasplits.arff file is downloaded and becomes available in the cache directory.
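
The behaviour described above can be pictured with a minimal, self-contained sketch. It does not call openml-python: `LazyTask`, `run_task`, the `tasks/<id>/` cache layout, the placeholder file contents, and the task ID 31 are all assumptions made for illustration. Only the `download_data` flag and the file names task.xml and datasplits.arff come from this PR.

```python
import tempfile
from pathlib import Path


class LazyTask:
    """Hypothetical stand-in for a task object; not the openml-python class."""

    def __init__(self, task_id, cache_dir):
        self.task_id = task_id
        # Assumed cache layout: <cache_dir>/tasks/<task_id>/
        self.task_dir = Path(cache_dir) / "tasks" / str(task_id)
        self.task_dir.mkdir(parents=True, exist_ok=True)
        # The task description is always fetched (placeholder for a real download).
        (self.task_dir / "task.xml").write_text("<task/>")

    @property
    def split_file(self):
        return self.task_dir / "datasplits.arff"

    def download_split(self):
        """Fetch the split file only if it is not cached yet."""
        if not self.split_file.exists():
            self.split_file.write_text("% placeholder split\n")
        return self.split_file


def get_task(task_id, cache_dir, download_data=True):
    """Create the task; fetch the splits right away only if requested."""
    task = LazyTask(task_id, cache_dir)
    if download_data:
        task.download_split()
    return task


def run_task(task):
    """Running a task needs the splits, so they are fetched here at the latest."""
    return task.download_split()


with tempfile.TemporaryDirectory() as cache_dir:
    task = get_task(31, cache_dir, download_data=False)
    cached_before = sorted(p.name for p in task.task_dir.iterdir())
    run_task(task)
    cached_after = sorted(p.name for p in task.task_dir.iterdir())
```

Listing the cache directory before and after `run_task` mirrors the manual check suggested above: the split file should appear only after the run.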

Any other comments?

Earlier: task.class_labels was assigned in a nested block after task.download_split.
Now: task.class_labels is assigned even if the splits are not downloaded, by fetching it from the dataset's description.
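
A small sketch of the idea behind this change, under stated assumptions: the `features` dictionary, the helper `class_labels_from_description`, and the sample values are hypothetical and are not the library's data structures. It only illustrates reading the labels from metadata that is already available, without touching any split file.

```python
# Hypothetical dataset description: feature name -> nominal values,
# or None for a non-nominal (e.g. numeric) feature.
features = {
    "sepallength": None,
    "class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
}


def class_labels_from_description(features, target_name):
    """Return the nominal values of the target feature, or None if it has none."""
    return features.get(target_name)


# No split file is involved: the labels come from the description alone.
class_labels = class_labels_from_description(features, "class")
```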

mfeurer

Thanks a lot! This looks really good, I only have a few minor comments.

mfeurer · 2019-04-04T11:40:56Z

-def get_task(task_id):
-    """Download the OpenML task for a given task ID.
+def get_task(task_id, download_data=True):
+    """Download the OpenML task representation for a given task ID, optionally


Could you please keep the first line of the docstring to a single line and add the information about optional arguments afterwards?

codecov-io · 2019-04-04T11:48:18Z

Codecov Report

Merging #659 into develop will decrease coverage by 0.11%.
The diff coverage is 76.92%.

@@             Coverage Diff             @@
##           develop     #659      +/-   ##
===========================================
- Coverage       91%   90.89%   -0.12%     
===========================================
  Files           36       32       -4     
  Lines         3526     3395     -131     
===========================================
- Hits          3209     3086     -123     
+ Misses         317      309       -8
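
As a quick sanity check on the numbers in this table, line coverage can be recomputed as hits divided by lines. The sketch below uses only the figures reported above; the helper name `coverage_percent` is made up for illustration, and the rounding Codecov applies for display is not stated here, so the last digit may differ from the table.

```python
def coverage_percent(hits, lines):
    """Line coverage as a percentage: covered lines divided by total lines."""
    return 100.0 * hits / lines


# Figures taken from the coverage diff above.
before = coverage_percent(3209, 3526)  # develop
after = coverage_percent(3086, 3395)   # after merging #659
delta = after - before

print(f"{before:.2f}% -> {after:.2f}% ({delta:+.2f} points)")
```

The unrounded difference is about 0.11 percentage points, which agrees with the summary line of the report.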

Impacted Files	Coverage Δ
openml/tasks/functions.py	`85.53% <76.92%> (-0.92%)`	⬇️
openml/testing.py	`93.82% <0%> (-1.5%)`	⬇️
openml/flows/functions.py	`87.4% <0%> (-0.79%)`	⬇️
openml/config.py	`89.28% <0%> (-0.19%)`	⬇️
openml/runs/trace.py	`90.84% <0%> (-0.18%)`	⬇️
openml/flows/flow.py	`94.08% <0%> (-0.1%)`	⬇️
openml/__init__.py	`100% <0%> (ø)`	⬆️
openml/datasets/functions.py	`95.47% <0%> (ø)`	⬆️
openml/flows/__init__.py	`100% <0%> (ø)`	⬆️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e8e904...23209af.

mfeurer · 2019-04-05T09:43:40Z

    Parameters
    ----------
-    task_id : int
+    The task representation is downloaded while the download of data


Sorry, that's not how I meant it. Could you please check for example the doc strings of the scikit-learn random forest? There's a single line, then some additional information (where I think the sentence about the optional download of the split and data should go) and then the Parameters.

Neeratyoy · 2019-04-05

Sorry about that, I should have looked up an example. I will make the changes.
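
For reference, the docstring layout requested in this review thread (a one-line summary, then additional information, then the Parameters section) might look like the sketch below. The wording is an illustrative paraphrase, not the docstring that was actually merged, and the function body is a stub.

```python
def get_task(task_id, download_data=True):
    """Download the OpenML task for a given task ID.

    The task description is always downloaded. The dataset and the data
    splits are downloaded right away only if ``download_data`` is True;
    otherwise they are fetched later, when they are first needed.

    Parameters
    ----------
    task_id : int
        The OpenML task ID.
    download_data : bool, optional (default=True)
        Whether to download the dataset and splits together with the task.
    """
    # Illustrative stub: only the docstring layout matters here.
    raise NotImplementedError


doc_lines = get_task.__doc__.splitlines()
```

The first line stands alone as the summary, a blank line separates it from the extended description, and the parameter list comes last.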

* Added comments in examples for dataset 68 belonging to only test server
* Added comment in flow and run example for dataset 68 belonging to only test server
* Making download of datasplits optional and adding a relevant unit test
* Adding error handling for task ID type
* Changes suggested by Matthias on PR #659
* Removing inappropriate dataset check from test case
* Fixing docstring
* Fixing whitespace issue for PEP8

mfeurer and others added 6 commits February 15, 2019 09:38

Merge pull request #617 from openml/develop

a0ef724

[WIP] New release

Merge branch 'master' of https://github.com/openml/openml-python into develop

7bc078a

Added comments in examples for dataset 68 belonging to only test server

8d591dd

Added comment in flow and run example for dataset 68 belonging to only test server

6f33612

Making download of datasplits optional and adding a relevant unit test

f1a77f5

Adding error handling for task ID type

d8d4773

mfeurer reviewed Apr 4, 2019

Neeratyoy added 2 commits April 4, 2019 15:05

Changes suggested by Matthias on PR #659

ea6fe99

Removing inappropriate dataset check from test case

90057fb

Neeratyoy requested a review from mfeurer April 4, 2019 17:14

mfeurer reviewed Apr 5, 2019

Fixing docstring

9d7ec52

Neeratyoy requested a review from mfeurer April 5, 2019 10:42

Fixing whitespace issue for PEP8

76630ff

mfeurer approved these changes Apr 5, 2019

Merge branch 'develop' into fix346

23209af

mfeurer merged commit 6b5dfe6 into develop Apr 9, 2019

mfeurer deleted the fix346 branch April 9, 2019 15:27

mfeurer mentioned this pull request Apr 9, 2019

be lazy in downloading datasets and splits #346

Closed

mfeurer merged 11 commits into develop from fix346
