
Add an artifact that requires time series comparisons (e.g., from a line graph) to confirm results were reproduced #19

@bastoica

Description


The initial release of ArtEvalBench (v0.9, PR #15) contains a single artifact, Wasabi, whose results can ultimately be summarized as a single integer: the number of bugs triggered by the current attempt.

The goal of this feature request is to add an artifact that produces a more diverse set of results/outputs, including time series used for plots and figures. Confirming such results requires a more elaborate "results reproduced"/"experiment runs" evaluator oracle than a single integer comparison.
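To make the request more concrete, here is a minimal sketch of what a time series oracle could look like. It is not part of ArtEvalBench: the function name `series_reproduced`, the parameters `rel_tol` and `min_fraction`, and their default values are all hypothetical. It also assumes the reference and reproduced series are sampled at the same points.

```python
import math
from typing import Sequence


def series_reproduced(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 0.10,       # hypothetical default: 10% relative tolerance
    min_fraction: float = 0.95,  # hypothetical default: 95% of points must match
) -> bool:
    """Return True if `candidate` tracks `reference` closely enough.

    Assumes both series are sampled at the same points, so they can be
    compared index by index.
    """
    # Series of different lengths (or empty ones) cannot be compared pointwise.
    if len(reference) != len(candidate) or not reference:
        return False

    # Count the points whose values agree within the relative tolerance.
    # abs_tol keeps comparisons near zero from failing spuriously.
    within = sum(
        1
        for r, c in zip(reference, candidate)
        if math.isclose(r, c, rel_tol=rel_tol, abs_tol=1e-9)
    )

    # Allow a small share of outliers, since reruns are rarely identical.
    return within / len(reference) >= min_fraction


if __name__ == "__main__":
    published = [1.0, 2.0, 3.0, 4.0]   # illustrative values, not real results
    rerun = [1.05, 1.9, 3.1, 4.2]
    print(series_reproduced(published, rerun))
```

A pointwise tolerance check is only one option. An oracle for a real artifact may need to resample series with different time steps first, or compare trends (e.g., which configuration is fastest) rather than raw values.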


Labels

enhancement (New feature or request)
