[WIP] New release
mfeurer left a comment
Thanks a lot! This looks really good, I only have a few minor comments.
- def get_task(task_id):
-     """Download the OpenML task for a given task ID.
+ def get_task(task_id, download_data=True):
+     """Download the OpenML task representation for a given task ID, optionally
Could you please try to keep the first line of the docstring to be a single line and add information about optional arguments afterwards?
Codecov Report
@@            Coverage Diff             @@
##           develop     #659      +/-   ##
===========================================
- Coverage       91%   90.89%   -0.12%
===========================================
  Files           36       32       -4
  Lines         3526     3395     -131
===========================================
- Hits          3209     3086     -123
+ Misses         317      309       -8
    Parameters
    ----------
    task_id : int
        The task representation is downloaded while the download of data
Sorry, that's not how I meant it. Could you please check for example the doc strings of the scikit-learn random forest? There's a single line, then some additional information (where I think the sentence about the optional download of the split and data should go) and then the Parameters.
sorry about that, should have looked up an example, will make the changes
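For reference, the layout requested in the review above (a one-line summary, then additional information, then the Parameters section) might look like the sketch below. The wording of the descriptions and the placeholder body are assumptions made for illustration; this is not the docstring that was merged.

```python
def get_task(task_id, download_data=True):
    """Download the OpenML task for a given task ID.

    The task representation is downloaded, while the download of the
    data and its splits is optional (illustrative wording, assumed here).

    Parameters
    ----------
    task_id : int
        The OpenML task ID (description assumed for this sketch).
    download_data : bool, optional (default=True)
        Whether to also download the data and the splits
        (description assumed for this sketch).
    """
    # Placeholder body: this sketch only illustrates the docstring layout.
    raise NotImplementedError


# The summary is the first line of the docstring and fits on one line.
summary_line = get_task.__doc__.splitlines()[0]
```

The summary stays on one line so that tools which show only the first docstring line still give a complete sentence.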
* Added comments in examples for dataset 68 belonging to only test server
* Added comment in flow and run example for dataset 68 belonging to only test server
* Making download of datasplits optional and adding a relevant unit test
* Adding error handling for task ID type
* Changes suggested by Matthias on PR #659
* Removing inappropriate dataset check from test case
* Fixing docstring
* Fixing whitespace issue for PEP8
What does this PR implement/fix? Explain your changes.
Addresses the pending part of #346: the splits and the data are not downloaded when `get_task` is called. The data and the respective splits are downloaded only when a run is executed.
A unit test has been added for the lazy fetch of a task.
How should this PR be tested?
Calling `get_task` or `get_tasks` with the additional parameter `download_data=False` downloads only the task.xml file, which can be verified by monitoring the cache directory. When a task is run, the corresponding datasplits.arff file is downloaded and becomes available in the cache directory.
Any other comments?
The `task.class_labels` assignment was happening in a nested manner after `task.download_split`.
`task.class_labels` is now assigned even if splits are not downloaded, by fetching it from the dataset's description.
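The cache-directory check described under "How should this PR be tested?" can be illustrated with a small, self-contained sketch of the lazy-fetch pattern. `LazyTask`, `cached_files`, the directory layout, and the placeholder file contents are all hypothetical stand-ins and not the openml-python implementation. Only the file names task.xml and datasplits.arff come from the description above.

```python
import os
import tempfile


class LazyTask:
    """Hypothetical stand-in for a task whose splits are fetched lazily."""

    def __init__(self, task_id, cache_dir):
        self.task_id = task_id
        # Assumed cache layout, chosen only for this sketch.
        self.task_dir = os.path.join(cache_dir, "tasks", str(task_id))
        os.makedirs(self.task_dir, exist_ok=True)
        # Only the task description is written up front.
        self._write("task.xml", "<task/>")

    def _write(self, name, content):
        with open(os.path.join(self.task_dir, name), "w") as fh:
            fh.write(content)

    def download_split(self):
        # Fetch the split file only if it is not cached yet.
        path = os.path.join(self.task_dir, "datasplits.arff")
        if not os.path.exists(path):
            self._write("datasplits.arff", "% placeholder split file")
        return path

    def run(self):
        # Running the task is what triggers the split download.
        return self.download_split()


def cached_files(task):
    """List the files currently cached for a task, sorted by name."""
    return sorted(os.listdir(task.task_dir))


with tempfile.TemporaryDirectory() as cache_dir:
    task = LazyTask(task_id=1, cache_dir=cache_dir)
    files_before_run = cached_files(task)
    task.run()
    files_after_run = cached_files(task)
```

In this sketch, listing the cache directory before and after `run()` plays the role of "monitoring the cache directory" from the description.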