Make scoring code consistent with scoring procedure #555

@priyakasimbeg

Currently the scoring code is not consistent with the scoring procedure.
The scoring procedure for a submission:

  1. For each workload, look at 5 trials.
  2. If >=3 trials have reached the target, use the best trial out of the 5 trials in terms of submission time.
  3. If <3 trials have reached the target, the submission did not reach the target on that workload.
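
A minimal sketch of the per-workload rule above, not the actual code in scoring.py. It assumes each trial is represented by its time to target, with `math.inf` standing in for a trial that never reached the target; `workload_score` and the two constants are hypothetical names chosen for illustration.

```python
import math

NUM_TRIALS = 5              # trials per workload, per the scoring procedure
MIN_SUCCESSFUL_TRIALS = 3   # trials that must reach the target


def workload_score(trial_times):
    """Return the best time to target for one workload, or math.inf.

    trial_times: one entry per trial; math.inf means the trial did not
    reach the target (an assumption of this sketch).
    """
    reached = [t for t in trial_times if math.isfinite(t)]
    if len(reached) < MIN_SUCCESSFUL_TRIALS:
        # Fewer than 3 successful trials: the workload counts as not reached.
        return math.inf
    # Otherwise the fastest trial is used to score the submission.
    return min(reached)
```

For example, `workload_score([10.0, 12.0, math.inf, math.inf, 8.0])` gives `8.0`, while the same list with only two finite entries gives `math.inf`.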

In scoring.py we have to make the following modifications:

  1. Print a warning if fewer than 5 trials are found for a workload submission.
  2. Check that at least 3 of the 5 trials have reached the target, and only then use the best trial to score the submission (bug fix).
  3. Print a warning if fewer than 8 workloads are found.
  4. Make sure the percentage of workloads that reached their targets is calculated over all 8 workloads, as opposed to the number of workloads in the submission directory. Currently, if the user scores a submission over an experiment with n workloads, it will use n as the denominator.
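
The modifications listed above could look roughly like the sketch below. This is an illustration under assumptions, not the existing scoring.py: `fraction_of_workloads_reached`, the constants, and the input layout (a dict mapping workload name to a list of per-trial times, with `math.inf` for a trial that missed the target) are all hypothetical. It uses `warnings.warn` where the issue says "print a warning"; a plain `print` or logging call would work equally well.

```python
import math
import warnings

NUM_WORKLOADS = 8           # workloads in the full benchmark
NUM_TRIALS = 5              # expected trials per workload
MIN_SUCCESSFUL_TRIALS = 3   # trials that must reach the target


def fraction_of_workloads_reached(results):
    """Fraction of all 8 workloads on which the submission reached the target.

    results: dict of workload name -> list of per-trial times to target,
    with math.inf for trials that did not reach it (sketch assumption).
    """
    if len(results) < NUM_WORKLOADS:
        warnings.warn(
            f"Found {len(results)} workloads, expected {NUM_WORKLOADS}.")

    num_reached = 0
    for workload, trial_times in results.items():
        if len(trial_times) < NUM_TRIALS:
            warnings.warn(
                f"Found {len(trial_times)} trials for {workload}, "
                f"expected {NUM_TRIALS}.")
        successes = sum(1 for t in trial_times if math.isfinite(t))
        if successes >= MIN_SUCCESSFUL_TRIALS:
            num_reached += 1

    # Divide by the fixed benchmark size, not by len(results).
    return num_reached / NUM_WORKLOADS
```

With this denominator, a submission directory that contains only 2 workloads, one of which reached its target, scores 1/8 rather than 1/2.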

Labels

P1 Launch 2023: High priority issues for October 2023 AlgoPerf Launch
🚀 Launch Blocker: Issues that are blocking launch of benchmark
