Update benchmark docs (WIP) #867

joaquinvanschoren · 2018-11-20T11:04:54Z

Updates the documentation for the benchmark suites.
Not entirely finished yet. Will do a few more updates.

janvanrijn · 2018-11-26T15:31:18Z

docs/docs/benchmark.md

+- You can add other tags as well. For each you can set:
+    - User IDs (who can extend study with that tag)
+    - Beginning/end dates (only stuff tagged within those dates are part of the study)
+- You do not have to tag any runs with study_X, these will be linked automatically


Is this really true?

I wasn't sure, but this was in the notes from the Paris meeting. Can you explain the process better? How do you add runs to a study?

Yes, but the biggest problem with the Paris minutes is that a lot of things were discussed and new ideas were introduced without clear consensus and, more importantly, without deciding who is going to implement them (although that last question is usually answered implicitly).

Indeed :(
For clarity, can you list what is currently not possible yet? Then we need to see who has the time (if anyone), and maybe open issues for these.

Currently, we can do exactly what we could do before: add tasks, datasets, flows, etc. to a study. A benchmark suite is not a separate entity. This could be changed easily, though (by adding a flag and only accepting tasks).

I am still not entirely happy about the current and proposed study and benchmark infrastructure.

We need a way to move forward here. Let's break it down so we can discuss this.

- You can add other tags as well. For each you can set:
    - User IDs (who can extend study with that tag)
    - Beginning/end dates (only stuff tagged within those dates are part of the study)

Looking at the API docs, this seems to not be supported in the API. The dataset does keep this info. Are we missing the API calls for this?

- You do not have to tag any runs with study_X, these will be linked automatically

This is what I remember @janvanrijn saying, but maybe I misremembered. Can you clarify?

The underlying question is: how can we create a new suite and make sure other people can't mess with it? Anything that does that is fine. If we create a special API call for this, that is fine by me, too. At this point it would be nice to have a concrete, workable proposal that will hold up in the long term and is easy to use and maintain. If the above design is not OK, then let's find a better one.
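
To make the proposed tag semantics concrete for discussion, here is a minimal in-memory sketch of the membership rule described in the bullets above: an entity belongs to a study if it was tagged with one of the study's tags by a permitted user within the permitted dates. Everything here is hypothetical (the class names `TagEvent` and `TagRule`, the function `study_members`, the tag `study_14`, the inclusive date bounds); none of it is an existing OpenML API call.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set


@dataclass
class TagEvent:
    """One tagging action: who tagged which entity with which tag, and when."""
    entity_id: int
    tag: str
    user_id: int
    tagged_on: date


@dataclass
class TagRule:
    """Constraints a study attaches to one tag (hypothetical model)."""
    tag: str
    allowed_user_ids: Optional[Set[int]] = None  # None means any user may tag
    begin: Optional[date] = None                 # None means no lower bound
    end: Optional[date] = None                   # None means no upper bound

    def accepts(self, event: TagEvent) -> bool:
        """Return True if the tagging action satisfies every constraint."""
        if event.tag != self.tag:
            return False
        if self.allowed_user_ids is not None and event.user_id not in self.allowed_user_ids:
            return False
        # Assumption: both date bounds are inclusive.
        if self.begin is not None and event.tagged_on < self.begin:
            return False
        if self.end is not None and event.tagged_on > self.end:
            return False
        return True


def study_members(rules: Iterable[TagRule], events: Iterable[TagEvent]) -> Set[int]:
    """Collect the IDs of all entities accepted by at least one rule."""
    rules = list(rules)
    return {e.entity_id for e in events if any(r.accepts(e) for r in rules)}


if __name__ == "__main__":
    rules = [TagRule("study_14", allowed_user_ids={1},
                     begin=date(2018, 1, 1), end=date(2018, 12, 31))]
    events = [
        TagEvent(101, "study_14", 1, date(2018, 6, 1)),  # permitted user, in range
        TagEvent(102, "study_14", 2, date(2018, 6, 1)),  # user not permitted
        TagEvent(103, "study_14", 1, date(2019, 1, 5)),  # tagged after the end date
    ]
    print(study_members(rules, events))
```

Under this model, "other people can't mess with it" reduces to restricting `allowed_user_ids`, or closing the date window once the suite is final. Whether that is preferable to a dedicated API call is exactly the open design question.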

janvanrijn · 2018-11-26T15:32:43Z

docs/docs/benchmark.md

+The OpenML100 was a predecessor of the OpenML-CC18, consisting of <a href="https://www.openml.org/search?q=tags.tag%3AOpenML100&type=data&table=1&size=100" target="_blank">100 classification datasets</a>. We recommend that you use the OpenML-CC18 instead, because the OpenML100 suffers from some teething issues in the design of benchmark suites. For instance, it contains several datasets that are too easy to model with today's machine learning algorithms, as well as datasets that represent time series analysis problems. These do not invalidate benchmarks run on the OpenML100, but may obfuscate the interpretation of results. The 'OpenML-CC18' handle is also more descriptive and allows easier versioning.
+
+For reference, the OpenML100 included datasets satisfying the following requirements:
+<ul><li>the number of observations are between 500 and 100000 to focus on medium-sized datasets, that are not too small for proper training and not too big for practical experimentation,


Minor, but for clarity I would indent this as follows:

<ul> <li> ... </li> etc ... </ul>
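
For readers who want to check the size requirement quoted in the hunk above against their own metadata, a minimal sketch follows. It assumes each dataset is described by a plain dictionary and that the instance count is stored under the key `NumberOfInstances`; the function name `medium_sized` and the inclusive treatment of both bounds are assumptions, since the text only says "between 500 and 100000".

```python
from typing import Dict, Iterable, List

# Bounds taken from the requirement text; treated as inclusive (assumption).
MIN_INSTANCES = 500
MAX_INSTANCES = 100_000


def medium_sized(datasets: Iterable[Dict], key: str = "NumberOfInstances") -> List[Dict]:
    """Keep only datasets whose instance count lies within the bounds.

    Datasets that lack the metadata key are dropped rather than guessed at.
    """
    return [
        d for d in datasets
        if key in d and MIN_INSTANCES <= d[key] <= MAX_INSTANCES
    ]


if __name__ == "__main__":
    candidates = [
        {"name": "too_small", "NumberOfInstances": 120},
        {"name": "medium", "NumberOfInstances": 5_000},
        {"name": "too_big", "NumberOfInstances": 2_000_000},
    ]
    print([d["name"] for d in medium_sized(candidates)])
```

This covers only the one criterion visible in the hunk; the remaining requirements would need their own checks.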

janvanrijn · 2018-11-26T15:33:40Z

docs/docs/benchmark.md

+<ul><li>cannot be randomized via a 10-fold cross-validation due to grouped samples,
+</li><li>have an unknown origin or no clearly defined task,
+</li><li>are variants of other datasets (e.g. binarized regression tasks),
+</li><li>include sparse data (e.g., text mining data sets).


GitHub doesn't seem convinced by this line (it is partly red), although I can't figure out why.

janvanrijn · 2018-11-26T19:36:57Z

docs/docs/benchmark.md

-</li></ul>
-
-In addition, OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
+OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,


Not all benchmark suites are necessarily curated though. The OpenML100 and CC18 are.

janvanrijn · 2018-11-26T19:58:08Z

docs/docs/benchmark.md

+- Create a new study with name and description
+- Set flag to "private" - we dont want people adding tasks here
+- Set "suite" flag
+- Find the study tag study_X


This is only part of the story.

Can you complete the steps, please?

What I tried to say is that this is hardly implemented, nor has it been discussed.

janvanrijn · 2018-11-26T19:58:29Z

docs/docs/benchmark.md

+Detailed steps:
+- Create a new study with name and description
+- Set flag to "private" - we dont want people adding tasks here
+- Set "suite" flag


Is this there already?

I can't see this in the API docs, so I assume not?

mfeurer · 2019-02-26T12:27:51Z

docs/docs/benchmark.md

-</li><li>sharing of benchmarking results in a reproducible way through the [APIs](APIs), enabling large scale comparisons
-</li></ul>
+OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
+covering a wide spectrum of domains and statistical properties. This makes benchmarking results more comparable,


I think the word "more" needs to compare some A to some B.

mfeurer · 2019-02-26T12:37:18Z

docs/docs/benchmark.md

+
+### Retrieving runs on a benchmarking suites:
+Once a benchamrk suite has been created, the listing functions can be used to 
+obtain all results on the benchmark suite. Note that there are several other


Will there also be a separate call to retrieve all studies for a suite?

Good question. This isn't covered yet.

Update benchmark docs (WIP)

5986485

janvanrijn reviewed Nov 26, 2018

layout fixes

3109d35

janvanrijn reviewed Nov 26, 2018

joaquinvanschoren and others added 15 commits November 26, 2018 22:25

completed description and added TODOs

b691021

fixed hyperlinks

e8a4707

typo

5b30758

layout

cb8be31

code layout

4303ec8

code layout

fc0c161

added links to additional scripts and added comment about java uploads

1962bfe

markup code blocks

42a30f5

markup fix

ab3e6a7

markup

acd7cb8

markup

b13cb34

markup

d20d2be

markdown

29fab03

Merge pull request #934 from openml/develop

455d7ef

Develop

updated benchmark docs

8aebc42

mfeurer reviewed Feb 26, 2019

janvanrijn and others added 4 commits February 26, 2019 14:01

tiny update

411c9cf

work on benchmark docs

dcdd721

General cleanup, clarifications, and fixes

373cc1f

Fixes based on Matthias' review

a5c62c4

joaquinvanschoren merged commit 7a9a5d0 into develop Mar 4, 2019

janvanrijn deleted the benchmarksuitedocs branch March 4, 2019 08:09
