Can't replicate SWE-Agent's results with the eval script #418

Closed
avisil opened this issue May 25, 2024 · 1 comment

Labels
❔question Further information is requested

Comments

avisil commented May 25, 2024

### Describe the bug

I downloaded the preds.jsonl for the official SWE-agent submission from the SWE-bench Lite leaderboard page; that submission is listed at 18% resolved. With 300 instances in the test set, 18% means SWE-agent should have resolved 54 issues. However, when I run the evaluation on this JSONL file I get resolved: 33, far fewer than 54. What am I missing here?

### Steps/commands/code to Reproduce

```
./run_eval.sh ../../swe-agent-public/all_preds.jsonl
```
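
Before running the evaluation, it can help to sanity-check the predictions file itself. The following is only a sketch and is not part of SWE-bench or SWE-agent: it assumes each JSONL line is an object with `instance_id` and `model_patch` keys (the usual SWE-bench predictions layout, not verified against this particular file), and the helper name `summarize_predictions` is made up for illustration. It counts entries, empty patches, and duplicated instance IDs.

```python
import json
from collections import Counter


def summarize_predictions(lines):
    """Summarize an iterable of JSONL lines holding SWE-bench-style predictions.

    Assumption: every record has an `instance_id` key and (optionally) a
    `model_patch` key. This layout is assumed, not checked against the
    leaderboard submission discussed in this issue.
    """
    ids = Counter()
    empty = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the file
        record = json.loads(line)
        ids[record["instance_id"]] += 1
        patch = record.get("model_patch")
        # Treat a missing, null, or whitespace-only patch as "no generation".
        if not patch or not patch.strip():
            empty += 1
    return {
        "entries": sum(ids.values()),
        "unique_instances": len(ids),
        "empty_patches": empty,
        "duplicate_ids": sorted(i for i, n in ids.items() if n > 1),
    }
```

A possible use is `summarize_predictions(open("all_preds.jsonl", encoding="utf-8"))`; if `entries` differs from `unique_instances`, or `empty_patches` is large, the file may not match what the leaderboard number was computed on.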

### Error message/results

```
Reference Report:
- no_generation: 18
- generated: 284
- with_logs: 251
- install_fail: 16
- reset_failed: 0
- no_apply: 0
- applied: 233
- test_errored: 0
- test_timeout: 1
- resolved: 33
- Wrote summary of run to ../../swe-agent-public/results.json
```
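
To make the gap concrete, the arithmetic can be checked with a short script. This is a minimal sketch, not part of the eval script: the names `parse_report`, `resolve_rate`, and `LITE_TEST_SIZE` are hypothetical, and the report text is simply the output pasted above. It parses the `- name: count` lines and compares the observed resolve rate with the 18% figure from the leaderboard.

```python
import re


def parse_report(text):
    """Extract `- name: count` pairs from a pasted evaluation report."""
    pattern = r"^[ \t]*-[ \t]*(\w+):[ \t]*(\d+)[ \t]*$"
    return {name: int(value) for name, value in re.findall(pattern, text, flags=re.MULTILINE)}


def resolve_rate(resolved, total):
    """Return the percentage of resolved instances out of `total`."""
    return 100.0 * resolved / total


# Report copied from this issue; the last line is not a counter and is skipped.
REPORT = """\
Reference Report:
- no_generation: 18
- generated: 284
- with_logs: 251
- install_fail: 16
- reset_failed: 0
- no_apply: 0
- applied: 233
- test_errored: 0
- test_timeout: 1
- resolved: 33
- Wrote summary of run to ../../swe-agent-public/results.json
"""

LITE_TEST_SIZE = 300  # test-set size as stated in the issue text

counts = parse_report(REPORT)
expected_resolved = round(0.18 * LITE_TEST_SIZE)  # 18% from the leaderboard
observed_rate = resolve_rate(counts["resolved"], LITE_TEST_SIZE)

print(f"expected resolved: {expected_resolved}")
print(f"observed resolved: {counts['resolved']} ({observed_rate:.1f}%)")
print(f"instances that produced no log: {counts['generated'] - counts['with_logs']}")
```

Under these numbers the run resolves 33 of 300 instances (11%) rather than the expected 54, and some generated predictions never produced a log at all, so the per-stage counters are a reasonable place to start looking.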

### System Information

`Linux internal_machine 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`

### Checklist

- [X] I'm running with the latest docker container/on the latest development version
- [X] I've searched the other issues for a duplicate
- [X] I have copied the full command/code that I ran (as text, not as screenshot!)
- [X] If applicable: I have copied the **full** log file/error message that was the result (as text, not as screenshot!)
- [X] I have enclosed code/log messages in triple backticks ([docs](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code)) and clicked "Preview" to make sure it's displayed correctly.

klieret added the ❔question (Further information is requested) label on May 27, 2024

ofirpress (Member) commented

This is more of a SWE-bench issue than a SWE-agent one, and it was probably solved by the new Dockerized SWE-bench version, so I'm closing this.
