Request gold compile.sh scripts

Hi there.

Really like this benchmark. I’m building a Modal-based pipeline to evaluate ProgramBench with higher parallelism, and I’d like to validate my implementation end-to-end without running a full agent each time.

To do that, I want to mock the agent generation phase with known-good submissions. The repo part is simple, since the commit hash is known, but the compile script is trickier. Would it be possible to release the compile.sh scripts used to build the gold/reference executables, or any equivalent reference build scripts?

I understand if these cannot be shared due to benchmark integrity concerns. In that case, is there a recommended way to create a small set of known-good mock submissions for validating the evaluation pipeline?
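
For context, here is a rough sketch of the kind of mock submission I have in mind. Everything in it is my own assumption rather than ProgramBench's actual format: the `MockSubmission` class, the `write_submission` helper, the `COMMIT` file, and the directory layout are all hypothetical names I made up for illustration.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MockSubmission:
    """A stand-in for agent output (hypothetical structure, not the benchmark's)."""
    task_id: str         # hypothetical task identifier
    commit: str          # known reference commit hash for the repo
    compile_script: str  # contents of the compile.sh to run


def write_submission(sub: MockSubmission, root: Path) -> Path:
    """Write the mock submission to root/<task_id>/ and return that directory."""
    out = root / sub.task_id
    out.mkdir(parents=True, exist_ok=True)
    # Record the commit so the pipeline can check out the right revision.
    (out / "COMMIT").write_text(sub.commit + "\n")
    script = out / "compile.sh"
    script.write_text(sub.compile_script)
    script.chmod(0o755)  # make the build script executable
    return out


if __name__ == "__main__":
    # Placeholder values only; a real run would use the gold commit and script.
    demo = MockSubmission(
        task_id="example-task",
        commit="0000000",
        compile_script="#!/bin/sh\nset -e\nmake\n",
    )
    with tempfile.TemporaryDirectory() as tmp:
        print(write_submission(demo, Path(tmp)))
```

With a handful of these on disk, the evaluation stage could be pointed at them directly instead of at real agent output, which is all I need to check the pipeline wiring.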

Request gold compile.sh scripts #23
