List of OML files that don't work #17

Closed
berndbischl opened this issue Jan 7, 2016 · 3 comments

Comments

@berndbischl
Member


  # list of other data set ids (dids) that do not work (some of them don't even work with RWeka)
  bad = c(70,71,73,74,75,76,78,115,116,118,119,121,122,123,124,125,126,127,128,129,130,
          131,132,133,135,136,138,140,141,142,144,146,147,148,273,292,293,350,358,383,
          384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,572)

  # some of the files are also "big" and take a long time
  size.bad = vapply(bad, function(X) {
    path = OpenML:::downloadOMLObject(X, object = "data")$files$dataset.arff$path
    file.size(path)
  }, numeric(1))

We need to check those and reduce the list. Maybe convert some of them into new issues.

@jakobbossek
Contributor

Downloaded and checked all of these except 358 (no access).
The majority can now be parsed without problems.
Parsing the remaining files fails because they are in sparse format which is currently unsupported.
We have a dedicated issue for that: #4
Closing here.

@jakobbossek
Contributor

Ahh. Damn. Should have had a look at the files which fail before closing.
They are not in sparse format. Instead each data row is wrapped in curly braces.
The remaining ones:

bad = c(273, 292, 293, 350, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401)
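
To see which ids have dropped off the list, the two vectors from this thread can be compared with `setdiff`. This is only a sketch: the names `bad.initial`, `bad.remaining` and `dropped` are made up for illustration, and "dropped" does not mean "confirmed to parse", since 358 could not be checked at all.

```r
# Ids from the original list at the top of this issue.
bad.initial = c(70,71,73,74,75,76,78,115,116,118,119,121,122,123,124,125,126,127,128,129,130,
                131,132,133,135,136,138,140,141,142,144,146,147,148,273,292,293,350,358,383,
                384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,572)

# Ids that still fail to parse (the list directly above).
bad.remaining = c(273, 292, 293, 350, 383:401)

# Ids that are no longer on the list. Note: 358 ends up here only because
# it was not accessible, not because it was verified.
dropped = setdiff(bad.initial, bad.remaining)
print(dropped)
```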

@jakobbossek
Contributor

Ahhhhh! Ok. I am stupid. I should have looked not only at the ARFF files, but also at the ARFF format documentation. The ARFF files that could not be parsed are indeed in sparse format.
Closing again 😉
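
For reference, a sparse ARFF data row is wrapped in curly braces and lists only the non-zero entries as `index value` pairs, with 0-based attribute indices; omitted entries are 0, not missing. Below is a minimal sketch of expanding one such row into a dense vector. `parseSparseRow` is a hypothetical helper, not part of the OpenML package, and it assumes that values contain no commas (quoted strings with commas would need a real tokenizer).

```r
# Hypothetical helper (not an OpenML function): expand one sparse ARFF
# data row such as '{1 X, 3 Y}' into a dense character vector.
# Assumption: no value contains a comma.
parseSparseRow = function(line, n.attrs) {
  # strip the surrounding curly braces and any outer whitespace
  inner = trimws(gsub("^\\s*\\{|\\}\\s*$", "", line))
  # entries not listed in a sparse row are 0
  dense = rep("0", n.attrs)
  if (nchar(inner) == 0) return(dense)
  for (entry in strsplit(inner, ",", fixed = TRUE)[[1]]) {
    entry = trimws(entry)
    # leading integer is the 0-based attribute index
    idx = as.integer(sub("^(\\d+)\\s+.*$", "\\1", entry)) + 1L
    # everything after the index and the whitespace is the value
    dense[idx] = sub("^\\d+\\s+", "", entry)
  }
  dense
}

row = parseSparseRow("{1 X, 3 Y, 4 \"class A\"}", n.attrs = 5)
print(row)
```

The values are kept as strings here; converting them to the attribute types declared in the ARFF header would be a separate step.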
