I'd like to run the ProgramBench evaluation with the same setup you used, i.e. with mini-swe-agent. The usage guide mentions that you'll release a baseline system for ProgramBench in mini-swe-agent. When do you plan to do this? I'm trying to replicate your results, so I want to make sure I'm using the exact same agent.
Awesome work and excited to use this!