…laces that might use the arff file internally.
…ing it out of __init__.
…ed when downloading the arff file.
Codecov Report
@@ Coverage Diff @@
## develop #644 +/- ##
===========================================
+ Coverage 90.13% 90.67% +0.54%
===========================================
Files 32 32
Lines 3366 3390 +24
===========================================
+ Hits 3034 3074 +40
+ Misses 332 316 -16
… differently from one with multiple (e.g. feat 5 of d/2).
It looks like |
I can't seem to restart the |
Awesome! ::hooray::
Datasets can now be downloaded without downloading the arff file.

The function signature of `get_dataset` changed from
`def get_dataset(dataset_id: Union[int, str]) -> OpenMLDataset:`
to
`def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> OpenMLDataset:`
I chose to default to `True` so there are no breaking changes to existing code.

If `download_data=False`, only metadata is downloaded (i.e. everything except the arff file).

Whenever a user invokes `retrieve_class_labels` or `get_data`, both of which require the arff file, the arff file is retrieved and processed as if it had been downloaded from the server on initialization (e.g. pickled). This happens without any warning, error, or extra argument.

As long as only functionality that does not require the arff file is used, no additional data is downloaded.
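To illustrate the behaviour described above, here is a minimal, self-contained sketch of the lazy-download pattern. It does not use the actual openml code: `LazyDataset`, `fake_fetch`, and `fetch_log` are hypothetical names made up for this example, and the real implementation may be structured quite differently.

```python
from typing import Callable, Dict, List, Optional


class LazyDataset:
    """Hypothetical stand-in for a dataset whose arff file is fetched on demand."""

    def __init__(
        self,
        metadata: Dict[str, str],
        fetch_arff: Callable[[], List[list]],
        download_data: bool = True,
    ) -> None:
        self.metadata = metadata
        self._fetch_arff = fetch_arff
        self._data: Optional[List[list]] = None
        if download_data:
            # Eager path: same behaviour as before the flag existed.
            self._data = self._fetch_arff()

    def get_data(self) -> List[list]:
        # Lazy path: fetch once, on first use, without warning or error.
        if self._data is None:
            self._data = self._fetch_arff()
        return self._data


# Records every simulated download so the laziness is observable.
fetch_log: List[str] = []


def fake_fetch() -> List[list]:
    """Assumed placeholder for the real arff download."""
    fetch_log.append("arff")
    return [[5.1, "setosa"], [6.3, "virginica"]]


lazy = LazyDataset({"name": "iris"}, fake_fetch, download_data=False)
print(lazy.metadata["name"], len(fetch_log))  # metadata only, nothing fetched yet

rows = lazy.get_data()  # first access triggers the download
lazy.get_data()         # second access reuses the cached rows
print(len(rows), len(fetch_log))
```

Keeping the default at `download_data=True` means callers that never pass the flag still get the data on construction, which is why the change is backwards compatible.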
I took the liberty to refactor `retrieve_class_labels` such that it uses the already downloaded feature metadata instead of reading the arff file. This means `retrieve_class_labels` can be used without downloading the underlying data, and it should substantially speed up the method in cases where a huge arff file was loaded just to check the header 👍

As part of this I also addressed open issue #507, which was being worked on in #508 (I only realized afterwards). It looks like that code predates @janvanrijn's change to the xml that made the nominal values a list. My solution uses this update and addresses the issue with only minor changes.
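The metadata-based lookup described above could look roughly like the sketch below. It is an illustration only: `Feature` and `class_labels_from_metadata` are hypothetical names, and the field names (`name`, `data_type`, `nominal_values`) are assumptions about how the feature metadata is represented, not the actual classes in the codebase.

```python
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical stand-in for one entry of the downloaded feature metadata."""

    name: str
    data_type: str
    nominal_values: Optional[List[str]] = None


def class_labels_from_metadata(
    features: Dict[int, Feature], target_name: str
) -> Optional[List[str]]:
    """Return the nominal values of the target feature, or None if it is not nominal."""
    for feature in features.values():
        if feature.name == target_name and feature.data_type == "nominal":
            # The values come straight from the metadata; no arff file is read.
            return feature.nominal_values
    return None


features = {
    0: Feature("sepallength", "numeric"),
    1: Feature("class", "nominal", ["setosa", "versicolor", "virginica"]),
}

print(class_labels_from_metadata(features, "class"))
print(class_labels_from_metadata(features, "sepallength"))
```

Because the labels are read from metadata that is already on disk, the cost no longer depends on the size of the arff file.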
Fixes:
#643
#612
#507
#446
#346 in part