Enhancement: Rethink the data-type for the results field in BenchmarkResults #39

@cwognum

Context

The data type for the results field of the BenchmarkResults class is hard to parse.

Currently, the results field is a dict that may be nested up to 3 levels deep. The depth depends on whether the benchmark is single-task or multi-task and on whether it includes one or multiple test sets. As a result, the same level can hold different information for result objects coming from different benchmarks.

Because of this inconsistency, it's hard to parse the results downstream (e.g. to build the leaderboard or to serialize the field).
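
To make the ambiguity concrete, here is a minimal sketch of the four shapes the field could take. The key names (`test_iid`, `test_ood`, `solubility`, `logp`, `mae`), the scores, and the nesting order (test set, then target, then metric) are assumptions for illustration, not the actual Polaris output.

```python
# Hypothetical result shapes; names, values and nesting order are illustrative assumptions.
single_task_single_test = {"mae": 0.42}

multi_task_single_test = {
    "solubility": {"mae": 0.42},
    "logp": {"mae": 0.31},
}

single_task_multi_test = {
    "test_iid": {"mae": 0.42},
    "test_ood": {"mae": 0.55},
}

multi_task_multi_test = {
    "test_iid": {"solubility": {"mae": 0.42}, "logp": {"mae": 0.31}},
    "test_ood": {"solubility": {"mae": 0.55}, "logp": {"mae": 0.47}},
}


def depth(value):
    """Return the nesting depth of a dict (0 for a scalar or an empty dict)."""
    if not isinstance(value, dict) or not value:
        return 0
    return 1 + max(depth(v) for v in value.values())


# Two different benchmark types produce the same depth, so a parser cannot
# tell from the structure alone whether the outer keys are targets or test sets.
same_depth = depth(multi_task_single_test) == depth(single_task_multi_test)
```

In this sketch, a consumer that only sees a two-level dict has to rely on extra context about the benchmark to interpret the outer keys.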

Description

Consider the downstream use-cases for the results field and devise a new data-structure that is easy to parse and facilitates these use cases.
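
One possible direction, sketched here only as a starting point for the discussion, is a flat list of records in which every score carries its own test set, target and metric. The names `ResultRecord` and `flatten_results` are hypothetical and not part of Polaris; the sketch also assumes the nesting order test set, then target, then metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultRecord:
    """One score with explicit context (hypothetical type, not a Polaris class)."""

    test_set: Optional[str]  # None when the benchmark has a single test set
    target: Optional[str]    # None when the benchmark is single-task
    metric: str
    score: float


def flatten_results(results, multi_task, multi_test):
    """Convert a nested results dict into a flat list of ResultRecord objects.

    Assumes the nesting order test set -> target -> metric, with levels
    omitted for single-test-set or single-task benchmarks.
    """
    records = []
    # Wrap missing levels so the loops below always see three levels.
    by_test = results if multi_test else {None: results}
    for test_set, by_target in by_test.items():
        by_target = by_target if multi_task else {None: by_target}
        for target, by_metric in by_target.items():
            for metric, score in by_metric.items():
                records.append(ResultRecord(test_set, target, metric, float(score)))
    return records


nested = {"test_iid": {"solubility": {"mae": 0.42}, "logp": {"mae": 0.31}}}
records = flatten_results(nested, multi_task=True, multi_test=True)
```

With a shape like this, every row has the same columns regardless of the benchmark type, which may make both leaderboard construction and serialization simpler. Whether a flat structure is the right choice is exactly what this issue should decide.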

Acceptance Criteria

  • An informed decision has been made on how to revise the results data-structure.
  • The data-structure has been implemented in the Polaris library.

Links

Labels: enhancement
