Update benchmark docs (WIP) #867
docs/docs/benchmark.md
Outdated
- You can add other tags as well. For each you can set:
- User IDs (who can extend study with that tag)
- Beginning/end dates (only stuff tagged within those dates are part of the study)
- You do not have to tag any runs with study_X, these will be linked automatically
Is this really true?
I wasn't sure, but this was in the notes from the Paris meeting. Can you explain the process better? How do you add runs to a study?
Yes, but the biggest problem with the Paris minutes is that a lot of things were discussed and new ideas were introduced without clear consensus and, more importantly, without deciding who would implement them (though that last question is usually answered implicitly).
Indeed :(
For clarity, can you list what is not possible yet? Then we need to see who has the time (if anyone), and maybe open issues for these.
Currently, we can do exactly what we could do before: add tasks, datasets, flows, etc. to a study. A benchmark suite is not a separate entity. This could be changed easily though (by adding a flag and only accepting tasks).
I am still not entirely happy about the current and proposed study and benchmark infrastructure.
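To make the "add a flag and only accept tasks" idea concrete, here is a minimal sketch. It is not the OpenML implementation: the `Study` class, the `is_suite` flag, and the set of entity kinds are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical entity kinds a study can hold; "run" stands in for the "etc." above.
ENTITY_KINDS = {"task", "dataset", "flow", "run"}


@dataclass
class Study:
    name: str
    is_suite: bool = False  # hypothetical flag, not an existing OpenML field
    entities: list = field(default_factory=list)

    def add(self, kind: str, entity_id: int) -> None:
        """Attach an entity; a study flagged as a suite accepts tasks only."""
        if kind not in ENTITY_KINDS:
            raise ValueError(f"unknown entity kind: {kind}")
        if self.is_suite and kind != "task":
            raise ValueError("a benchmark suite only accepts tasks")
        self.entities.append((kind, entity_id))
```

With this shape, a regular study keeps its current behaviour, and a suite is simply a study whose flag restricts what can be added to it.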
We need a way to move forward here. Let's break it down so we can discuss this.
- You can add other tags as well. For each you can set:
- User IDs (who can extend study with that tag)
- Beginning/end dates (only stuff tagged within those dates are part of the study)
Looking at the API docs, this seems to not be supported in the API. The dataset does keep this info. Are we missing the API calls for this?
- You do not have to tag any runs with study_X, these will be linked automatically
This is what I remember @janvanrijn saying, but maybe I misremembered. Can you clarify?
The underlying question is: how can we create a new suite and make sure other people can't mess with it? Anything that does that is fine. If we create a special API call for this, that is fine by me, too. At this point it would be nice to have a concrete, workable proposal that holds up in the long term and is easy to use and maintain. If the above design is not OK, then let's find a better one.
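As a starting point for that discussion, the per-tag restrictions quoted above (user IDs plus begin/end dates) could be modelled roughly as follows. This is only a sketch under stated assumptions: `TagRule` and its fields are hypothetical and do not correspond to any existing API call, and the date bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TagRule:
    """Hypothetical rule describing who may extend a study via a tag, and when."""

    tag: str
    allowed_user_ids: FrozenSet[int]
    begin: Optional[date] = None  # None means no lower bound
    end: Optional[date] = None    # None means no upper bound

    def accepts(self, user_id: int, tagged_on: date) -> bool:
        """Return True if this tagging action counts toward the study."""
        if user_id not in self.allowed_user_ids:
            return False
        if self.begin is not None and tagged_on < self.begin:
            return False
        if self.end is not None and tagged_on > self.end:
            return False
        return True
```

Under this model, "other people can't mess with it" reduces to keeping `allowed_user_ids` limited to the suite's owners.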
docs/docs/benchmark.md
Outdated
The OpenML100 was a predecessor of the OpenML-CC18, consisting of <a href="https://www.openml.org/search?q=tags.tag%3AOpenML100&type=data&table=1&size=100" target="_blank">100 classification datasets</a>. We recommend that you use the OpenML-CC18 instead, because the OpenML100 suffers from some teething issues in the design of benchmark suites. For instance, it contains several datasets that are too easy to model with today's machine learning algorithms, as well as datasets that represent time series analysis problems. These do not invalidate benchmarks run on the OpenML100, but may obfuscate the interpretation of results. The 'OpenML-CC18' handle is also more descriptive and allows easier versioning.

For reference, the OpenML100 included datasets satisfying the following requirements:
<ul><li>the number of observations is between 500 and 100000 to focus on medium-sized datasets, that are not too small for proper training and not too big for practical experimentation,
Minor, but for clarity I would indent this as follows:
<ul>
<li> ... </li>
etc ...
</ul>
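Separately from the indentation point, the size requirement in the quoted line (between 500 and 100000 observations) can be read as a simple filter. The sketch below assumes the bounds are inclusive, which the text does not state, and uses made-up dataset records purely for illustration.

```python
# Bounds taken from the quoted requirement; inclusivity is an assumption.
MIN_OBSERVATIONS = 500
MAX_OBSERVATIONS = 100_000


def is_medium_sized(n_observations: int) -> bool:
    """Return True if a dataset falls in the medium-sized range."""
    return MIN_OBSERVATIONS <= n_observations <= MAX_OBSERVATIONS


# Hypothetical records, not real OpenML datasets.
datasets = [
    {"name": "tiny", "n_observations": 120},
    {"name": "medium", "n_observations": 20_000},
    {"name": "huge", "n_observations": 5_000_000},
]

selected = [d["name"] for d in datasets if is_medium_sized(d["n_observations"])]
```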
docs/docs/benchmark.md
Outdated
<ul><li>cannot be randomized via a 10-fold cross-validation due to grouped samples,
</li><li>have an unknown origin or no clearly defined task,
</li><li>are variants of other datasets (e.g. binarized regression tasks),
</li><li>include sparse data (e.g., text mining data sets).
GitHub doesn't seem convinced by this line (it is partly red), although I can't figure out why.
docs/docs/benchmark.md
Outdated
</li></ul>

In addition, OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
Not all benchmark suites are necessarily curated though. The OpenML100 and CC18 are.
docs/docs/benchmark.md
Outdated
- Create a new study with name and description
- Set flag to "private" - we don't want people adding tasks here
- Set "suite" flag
- Find the study tag study_X
This is only part of the story.
Can you complete the steps, please?
What I tried to say is that this is hardly implemented, nor has it been discussed.
docs/docs/benchmark.md
Outdated
Detailed steps:
- Create a new study with name and description
- Set flag to "private" - we don't want people adding tasks here
- Set "suite" flag
Is this there already?
I can't see this in the API docs, so I assume not?
docs/docs/benchmark.md
Outdated
</li><li>sharing of benchmarking results in a reproducible way through the [APIs](APIs), enabling large scale comparisons
</li></ul>
OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
covering a wide spectrum of domains and statistical properties. This makes benchmarking results more comparable,
I think the word "more" needs to compare some A to some B.

### Retrieving runs on a benchmarking suite:
Once a benchmark suite has been created, the listing functions can be used to
obtain all results on the benchmark suite. Note that there are several other
Will there also be a separate call to retrieve all studies for a suite?
Good question. This isn't covered yet.
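For reference, the retrieval described in the quoted paragraph amounts to selecting the runs whose task belongs to the suite. The sketch below illustrates that selection on in-memory records only; `runs_for_suite` and the `task_id`/`run_id` keys are hypothetical and are not the actual listing functions or their output format.

```python
from collections import defaultdict


def runs_for_suite(suite_task_ids, runs):
    """Group run IDs by task, keeping only tasks that are part of the suite."""
    wanted = set(suite_task_ids)
    grouped = defaultdict(list)
    for run in runs:
        if run["task_id"] in wanted:
            grouped[run["task_id"]].append(run["run_id"])
    return dict(grouped)


# Hypothetical listing output: one record per run.
example_runs = [
    {"run_id": 10, "task_id": 1},
    {"run_id": 11, "task_id": 2},
    {"run_id": 12, "task_id": 1},
    {"run_id": 13, "task_id": 99},  # task not in the suite, so it is ignored
]

suite_runs = runs_for_suite([1, 2], example_runs)
```

A call that goes the other way (all studies for a suite) would need its own lookup, which this sketch does not cover.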
Updates the documentation for the benchmark suites.
Not entirely finished yet. Will do a few more updates.