Bugs in sparse data with missing nominal values #52
Comments
Because of this bug, liac-arff cannot decode many OpenML datasets, such as https://www.openml.org/d/35002. Weka's default behavior is to fill in a missing nominal value with the first value in the nominal specification list. |
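For illustration, the fill-in rule described above could be sketched roughly as follows. This is not Weka's or liac-arff's code; `densify_weka_style` and the `(name, spec)` attribute layout are hypothetical names chosen for this example, and the numeric default of 0 is an assumption based on the usual sparse ARFF convention.

```python
def densify_weka_style(sparse_values, attributes):
    """Expand a sparse row into a dense one (hypothetical helper).

    sparse_values: dict mapping column index -> value explicitly present in the row.
    attributes: list of (name, spec) pairs; spec is a list of allowed values
                for a nominal attribute, or a type name such as "NUMERIC".
    """
    dense = []
    for index, (name, spec) in enumerate(attributes):
        if index in sparse_values:
            # The value was written explicitly in the sparse row.
            dense.append(sparse_values[index])
        elif isinstance(spec, list):
            # Nominal attribute omitted from the row: use the first declared value.
            dense.append(spec[0])
        else:
            # Non-nominal attribute omitted from the row: assume 0.
            dense.append(0)
    return dense


attributes = [("x", "NUMERIC"), ("colour", ["red", "green"])]
row = densify_weka_style({0: 1.5}, attributes)
```

Here the omitted `colour` column is filled with `"red"`, the first value in its declaration, rather than with a literal zero.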
Thank you a lot for pointing this out, but I would argue that this is not a bug in liac-arff, but in the arff specification. This behavior is not defined there, and I am very hesitant to add functionality here which is undocumented behavior of the WEKA arff reader. In fact, we deactivated the datasets in question on OpenML because they are not valid arff. Also, this is a known bug in WEKA with a workaround on the WEKA side:
|
@mfeurer This dataset is still available and fails with this error: https://www.openml.org/d/35002 |
I am aware of that; however, the dataset is flagged as 'in preparation' which basically means that it's not ready for production use. Someone needs to download all QSAR datasets, fix the bug in them, and re-upload them |
Closing this as it is an issue with OpenML and WEKA. |
Reopening as WEKA can actually read this file :( |
Because the default value is str(0), decoding may raise a BadNominalValue() error when sparse data contains missing nominal values.
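A minimal sketch of the failure mode, assuming a simplified sparse-row parser: `decode_sparse_row` and the local `BadNominalValue` class are hypothetical stand-ins written for this example, not liac-arff's actual implementation. The parser also ignores quoting, so it only handles simple unquoted values.

```python
class BadNominalValue(ValueError):
    """Stand-in for the error named in the report (not the library's class)."""


def decode_sparse_row(row, attributes, default="0"):
    """Expand a sparse ARFF row such as '{0 1.5, 1 green}' into a dense list."""
    # Every column starts out with the default, i.e. str(0).
    values = [default] * len(attributes)
    body = row.strip().strip("{}").strip()
    if body:
        for entry in body.split(","):
            index, value = entry.strip().split(None, 1)
            values[int(index)] = value
    # Nominal columns must hold one of their declared values.
    for (name, spec), value in zip(attributes, values):
        if isinstance(spec, list) and value not in spec:
            raise BadNominalValue(
                f"{value!r} is not declared for attribute {name!r}"
            )
    return values


attributes = [("x", "NUMERIC"), ("colour", ["red", "green"])]

ok = decode_sparse_row("{0 1.5, 1 green}", attributes)

try:
    decode_sparse_row("{0 1.5}", attributes)  # 'colour' is omitted
    failed = False
except BadNominalValue:
    failed = True
```

The second row omits the nominal column, so it keeps the default `"0"`, which is not among the declared values and triggers the error.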