
Update benchmark docs (WIP) #867

Merged
merged 21 commits into develop from benchmarksuitedocs on Mar 4, 2019

Conversation

joaquinvanschoren (Contributor, author)

Updates the documentation for the benchmark suites.
Not entirely finished yet. Will do a few more updates.

- You can add other tags as well. For each you can set:
    - User IDs (who can extend the study with that tag)
    - Beginning/end dates (only items tagged within those dates are part of the study)
- You do not have to tag any runs with study_X; these will be linked automatically
Member

Is this really true?

Contributor (author)

I wasn't sure, but this was in the notes from the Paris meeting. Can you explain the process better? How do you add runs to a study?

Member

Yes, but the biggest problem with the Paris minutes is that a lot of things were discussed and new ideas were introduced without clear consensus, and more importantly without deciding who is going to implement them (though that last question is usually answered implicitly).

Contributor (author)

Indeed :(
For clarity, can you list what is currently not possible yet? Then we need to see who has the time (if anyone), and maybe open issues for these.

Member

Currently, we can do exactly what we could do before: add tasks, datasets, flows, etc. to a study. A benchmark suite is not a separate entity. This could be changed easily, though (by adding a flag and only accepting tasks).

I am still not entirely happy with the current and proposed study and benchmark infrastructure.

Contributor (author)

We need a way to move forward here. Let's break it down so we can discuss this.

- You can add other tags as well. For each you can set:
    - User IDs (who can extend the study with that tag)
    - Beginning/end dates (only items tagged within those dates are part of the study)

Looking at the API docs, this seems not to be supported in the API. The dataset does keep this info. Are we missing the API calls for this?

- You do not have to tag any runs with study_X, these will be linked automatically

This is what I remember @janvanrijn saying, but maybe I misremembered. Can you clarify?

The underlying question is: how can we create a new suite and make sure other people can't mess with it. Anything that does that is fine. If we create a special API call for this, that is fine by me, too. At this time it would be nice to have a concrete, workable proposal that will work in the long term and is easy to use and maintainable. If the above design is not OK, then let's find a better one.

The OpenML100 was a predecessor of the OpenML-CC18, consisting of <a href="https://www.openml.org/search?q=tags.tag%3AOpenML100&type=data&table=1&size=100" target="_blank">100 classification datasets</a>. We recommend that you use the OpenML-CC18 instead, because the OpenML100 suffers from some teething issues in the design of benchmark suites. For instance, it contains several datasets that are too easy to model with today's machine learning algorithms, as well as datasets that represent time series analysis problems. These do not invalidate benchmarks run on the OpenML100, but may obfuscate the interpretation of results. The 'OpenML-CC18' handle is also more descriptive and allows easier versioning.

For reference, the OpenML100 included datasets satisfying the following requirements:
<ul>
  <li>the number of observations is between 500 and 100,000, to focus on medium-sized datasets that are not too small for proper training and not too big for practical experimentation,
Member

Minor, but for clarity I would indent this as follows:

<ul>
  <li> ... </li>
  etc ... 
</ul>

<ul>
  <li>cannot be randomized via a 10-fold cross-validation due to grouped samples,</li>
  <li>have an unknown origin or no clearly defined task,</li>
  <li>are variants of other datasets (e.g. binarized regression tasks),</li>
  <li>include sparse data (e.g., text mining data sets).</li>
Member

GitHub doesn't seem convinced by this line (partly red), although I can't figure out why.

</ul>
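For illustration, a minimal sketch of how the datasets tagged OpenML100 (or the recommended OpenML-CC18 successor) could be retrieved with the openml-python client; the `OpenML100` tag matches the search link above, but the suite ID 99 for the OpenML-CC18 and the exact listing helpers are assumptions about the client, not something specified in this PR:

```python
import openml

# List all datasets carrying the OpenML100 tag (the tag used in the search link above).
openml100_datasets = openml.datasets.list_datasets(tag="OpenML100")
print(len(openml100_datasets), "datasets tagged OpenML100")

# Preferably use the OpenML-CC18 successor suite instead.
# NOTE: suite ID 99 is an assumption; look up the current ID/alias on openml.org.
cc18 = openml.study.get_suite(99)
print(cc18.name, "-", len(cc18.tasks), "tasks")
```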

OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
Member

Not all benchmark suites are necessarily curated though. The OpenML100 and CC18 are.

- Create a new study with name and description
- Set flag to "private" - we don't want people adding tasks here
- Set "suite" flag
- Find the study tag study_X
Member

This is only part of the story.

Contributor (author)

Can you complete the steps, please?

Member

What I tried to say is that this is hardly implemented, nor discussed.

Detailed steps:
- Create a new study with name and description
- Set flag to "private" - we don't want people adding tasks here
- Set "suite" flag
Member

Is this there already?

Contributor (author)

I can't see this in the API docs, so I assume not?
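For reference, a hedged sketch of what the steps above might look like from the client side, assuming a `create_benchmark_suite` helper such as the one that later appeared in openml-python; the helper name, its arguments, and the task IDs below are assumptions, not part of the server API discussed in this thread:

```python
import openml

# Placeholder task IDs; a real suite would list the curated tasks.
task_ids = [3, 6, 11]

# Assumed helper: bundles "create study", the "suite" flag, and ownership in one call.
suite = openml.study.create_benchmark_suite(
    name="my-new-suite",
    description="A fixed, versioned set of classification tasks.",
    task_ids=task_ids,
    alias="my-new-suite-v1",  # a handle comparable to 'OpenML-CC18'
)
suite.publish()  # goal per the steps above: only the owner should be able to modify it
print("published suite")
```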

  <li>sharing of benchmarking results in a reproducible way through the [APIs](APIs), enabling large-scale comparisons</li>
</ul>
OpenML offers <b>benchmarking suites</b>: curated, comprehensive sets of machine learning datasets,
covering a wide spectrum of domains and statistical properties. This makes benchmarking results more comparable,
Contributor

I think the word 'more' needs to compare some A to some B.


### Retrieving runs on a benchmarking suite
Once a benchmark suite has been created, the listing functions can be used to
obtain all results on the benchmark suite. Note that there are several other
Contributor

Will there also be a separate call to retrieve all studies for a suite?

Contributor (author)

Good question. This isn't covered yet.
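To make the "Retrieving runs" section above concrete, a sketch of how the listing functions could be used to pull results for a whole suite; it assumes the openml-python `list_runs`/`list_evaluations` helpers accept task-ID filters and that suite 99 is the OpenML-CC18, neither of which is defined in this PR:

```python
import openml

# Assumption: suite 99 is the OpenML-CC18; check openml.org for the actual ID.
suite = openml.study.get_suite(99)

# All runs on the suite's tasks (capped at 1000 here, since the full list is large).
runs = openml.runs.list_runs(task=suite.tasks, size=1000)

# Or fetch evaluations for one metric, which is usually more convenient to compare.
evals = openml.evaluations.list_evaluations(
    function="predictive_accuracy",
    tasks=suite.tasks,
    size=1000,
)
print(len(runs), "runs and", len(evals), "evaluations retrieved (capped)")
```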

@joaquinvanschoren joaquinvanschoren merged commit 7a9a5d0 into develop Mar 4, 2019
@janvanrijn janvanrijn deleted the benchmarksuitedocs branch March 4, 2019 08:09