
Conversation

Collaborator

@cwognum cwognum commented Oct 18, 2023

Changelogs


Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

Results now look like this 👇

Test set   Target label   Metric   Score
test_iid   EGFR_WT        AUC      0.9
test_ood   EGFR_WT        AUC      0.75
...        ...            ...      ...
test_ood   EGFR_L858R     AUC      0.79

I thought about the results data structure over the last few days, but couldn't come up with anything more convenient than a tabular Pandas DataFrame. Let's see how far that gets us! 🙂
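For illustration, a plain DataFrame in the same tabular shape as the results above (a minimal sketch only; the column names are assumptions, not the exact polaris schema):

```python
import pandas as pd

# One row per (test set, target label, metric) combination, mirroring the table above.
results = pd.DataFrame(
    [
        {"Test set": "test_iid", "Target label": "EGFR_WT", "Metric": "AUC", "Score": 0.9},
        {"Test set": "test_ood", "Target label": "EGFR_WT", "Metric": "AUC", "Score": 0.75},
        {"Test set": "test_ood", "Target label": "EGFR_L858R", "Metric": "AUC", "Score": 0.79},
    ]
)

# The tabular layout makes slicing a one-liner, e.g. all OOD scores:
print(results[results["Test set"] == "test_ood"])
```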

@cwognum cwognum added the feature label Oct 18, 2023
@cwognum cwognum requested a review from zhu0619 October 18, 2023 20:25
@cwognum cwognum requested a review from hadim as a code owner October 18, 2023 20:25
Contributor

@hadim hadim left a comment


Thanks!

LGTM. The tabular structure is 100% easier to work with than a nested one (whether in Python or on the Hub).

Question: should we provide some averages over the test/target dimensions, or should we let the user or the Hub do that themselves?

Collaborator Author

cwognum commented Oct 18, 2023

Question: should we provide some averages over the test/target dimensions, or should we let the user or the Hub do that themselves?

Interesting! Definitely wouldn't prioritize this now, but it could be nice to ultimately add this to the Results page on the Hub!
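In the meantime, such aggregates are easy to compute client-side from the tabular results with plain pandas (a minimal sketch reusing the illustrative `results` DataFrame above; not something polaris or the Hub provides):

```python
# Average score per test set (across target labels) and per target label
# (across test sets), using the illustrative `results` DataFrame from above.
avg_per_test_set = results.groupby(["Test set", "Metric"])["Score"].mean()
avg_per_target = results.groupby(["Target label", "Metric"])["Score"].mean()

print(avg_per_test_set)
print(avg_per_target)
```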

Contributor

@zhu0619 zhu0619 left a comment


Thanks @cwognum
It looks good to me.
I have a general question: how would a multitask result be presented on the leaderboard? As an aggregated score?

@cwognum
Copy link
Collaborator Author

cwognum commented Oct 19, 2023

how would a multitask result be presented on the leaderboard? As an aggregated score?

The leaderboard is compiled based on the main_metric, so it is up to the user specifying the benchmark to decide.

We support two scenarios for the results:

  1. A metric is computed per task.
  2. A metric is computed across tasks.
    (see the is_multitask field in the MetricInfo class)

Using scenario 2, you can add a custom metric that aggregates the results across tasks. You can then specify this metric to be the main metric that is used to compile the leaderboard.
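To make the two scenarios concrete, here is a rough sketch in plain Python with scikit-learn (purely illustrative; the array layout and names are assumptions, not the polaris API):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical predictions for two tasks (one column per task), for illustration only.
y_true = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y_pred = np.array([[0.2, 0.8], [0.9, 0.3], [0.7, 0.6], [0.1, 0.4]])

# Scenario 1: the metric is computed per task, yielding one row per task in the results table.
per_task = {f"task_{i}": roc_auc_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])}

# Scenario 2: a single metric aggregates across tasks (here a simple macro average),
# which could then be designated as the main metric used to compile the leaderboard.
macro_auc = float(np.mean(list(per_task.values())))

print(per_task, macro_auc)
```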

@cwognum cwognum merged commit 283fb48 into main Oct 19, 2023
@cwognum cwognum deleted the feat/new-results-datastructure branch October 19, 2023 14:41

Labels

feature: Annotates any PR that adds new features; used in the release process

Development

Successfully merging this pull request may close these issues.

Enhancement: Rethink the data-type for the results field in BenchmarkResults

4 participants