
Conversation

Collaborator

@cwognum cwognum commented Oct 18, 2023

Changelogs


Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

Results now look like this 👇

Test set   Target label   Metric   Score
test_iid   EGFR_WT        AUC      0.9
test_ood   EGFR_WT        AUC      0.75
...        ...            ...      ...
test_ood   EGFR_L858R     AUC      0.79

I thought about the results data structure over the last few days, but couldn't come up with anything more convenient than a tabular Pandas DataFrame. Let's see how far that gets us! 🙂
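For illustration, a plain DataFrame in the same tabular shape as the results above (a minimal sketch only; the column names are assumptions, not the exact polaris schema):

```python
import pandas as pd

# One row per (test set, target label, metric) combination, mirroring the table above.
results = pd.DataFrame(
    [
        {"Test set": "test_iid", "Target label": "EGFR_WT", "Metric": "AUC", "Score": 0.9},
        {"Test set": "test_ood", "Target label": "EGFR_WT", "Metric": "AUC", "Score": 0.75},
        {"Test set": "test_ood", "Target label": "EGFR_L858R", "Metric": "AUC", "Score": 0.79},
    ]
)

# The tabular layout makes slicing a one-liner, e.g. all OOD scores:
print(results[results["Test set"] == "test_ood"])
```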

@cwognum cwognum added the feature label Oct 18, 2023
@cwognum cwognum requested a review from zhu0619 October 18, 2023 20:25
@cwognum cwognum requested a review from hadim as a code owner October 18, 2023 20:25
Contributor

@hadim hadim left a comment


Thanks!

LGTM. The tabular structure is 100% easier to work with than a nested one (whether in Python or on the Hub).

Question: should we provide some averages over the test/target dimensions, or should we let the user or the Hub do that themselves?

Collaborator Author

cwognum commented Oct 18, 2023

Question: should we provide some averages over the test/target dimensions, or should we let the user or the Hub do that themselves?

Interesting! Definitely wouldn't prioritize this now, but it could be nice to ultimately add this to the Results page on the Hub!
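In the meantime, such aggregates are easy to compute client-side from the tabular results with plain pandas (a minimal sketch reusing the illustrative `results` DataFrame above; not something polaris or the Hub provides):

```python
# Average score per test set (across target labels) and per target label
# (across test sets), using the illustrative `results` DataFrame from above.
avg_per_test_set = results.groupby(["Test set", "Metric"])["Score"].mean()
avg_per_target = results.groupby(["Target label", "Metric"])["Score"].mean()

print(avg_per_test_set)
print(avg_per_target)
```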

Contributor

@zhu0619 zhu0619 left a comment


Thanks @cwognum
It looks good to me.
I have a general question: how would a multitask result be presented on the leaderboard? As an aggregated score?

@cwognum
Copy link
Collaborator Author

cwognum commented Oct 19, 2023

how would a multitask result be presented on the leaderboard? As an aggregated score?

The leaderboard is compiled based on the main_metric, so it is up to the user specifying the benchmark to decide.

We support two scenarios for the results:

  1. A metric is computed per task.
  2. A metric is computed across tasks.
    (see the is_multitask field in the MetricInfo class)

Using scenario 2, you can add a custom metric that aggregates the results across tasks. You can then specify this metric to be the main metric that is used to compile the leaderboard.
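To make the two scenarios concrete, here is a rough sketch in plain Python with scikit-learn (purely illustrative; the array layout and names are assumptions, not the polaris API):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical predictions for two tasks (one column per task), for illustration only.
y_true = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y_pred = np.array([[0.2, 0.8], [0.9, 0.3], [0.7, 0.6], [0.1, 0.4]])

# Scenario 1: the metric is computed per task, yielding one row per task in the results table.
per_task = {f"task_{i}": roc_auc_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])}

# Scenario 2: a single metric aggregates across tasks (here a simple macro average),
# which could then be designated as the main metric used to compile the leaderboard.
macro_auc = float(np.mean(list(per_task.values())))

print(per_task, macro_auc)
```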

@cwognum cwognum merged commit 283fb48 into main Oct 19, 2023
@cwognum cwognum deleted the feat/new-results-datastructure branch October 19, 2023 14:41

Labels

feature: Annotates any PR that adds new features; used in the release process

Development

Successfully merging this pull request may close these issues.

Enhancement: Rethink the data-type for the results field in BenchmarkResults

4 participants