### Describe the bug
I downloaded `preds.jsonl` from the SWE-bench Lite leaderboard web page for the official SWE-Agent submission, which is listed at 18% resolved. That means SWE-Agent must have resolved 54 of the 300 issues in the test set. However, when I run the evaluation on this jsonl file I get `resolved: 33`, which is far fewer than 54. What am I missing here?
Reference Report:
- no_generation: 18
- generated: 284
- with_logs: 251
- install_fail: 16
- reset_failed: 0
- no_apply: 0
- applied: 233
- test_errored: 0
- test_timeout: 1
- resolved: 33
- Wrote summary of run to ../../swe-agent-public/results.json
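
To make the gap concrete, here is a minimal sketch that redoes the arithmetic using only the counts from the report above, the 300-instance test set size, and the 18% leaderboard figure. The variable names are mine, and the numbers only suggest where to look; they do not establish the cause.

```python
# Counts copied verbatim from the evaluation report above.
report = {
    "no_generation": 18,
    "generated": 284,
    "with_logs": 251,
    "install_fail": 16,
    "applied": 233,
    "test_timeout": 1,
    "resolved": 33,
}

TOTAL_INSTANCES = 300      # size of the SWE-bench Lite test set
LEADERBOARD_PERCENT = 18   # resolve rate quoted on the leaderboard

# Integer arithmetic avoids floating-point rounding in the expected count.
expected_resolved = LEADERBOARD_PERCENT * TOTAL_INSTANCES // 100
shortfall = expected_resolved - report["resolved"]
observed_percent = 100 * report["resolved"] / TOTAL_INSTANCES

# Predictions that were generated but never produced an evaluation log.
missing_logs = report["generated"] - report["with_logs"]

print(f"expected resolved: {expected_resolved}")
print(f"observed resolved: {report['resolved']} ({observed_percent:.1f}%)")
print(f"shortfall: {shortfall}")
print(f"generated without logs: {missing_logs}")
print(f"install failures: {report['install_fail']}")
```

The run is 21 resolved instances short of the leaderboard figure, while 33 generated predictions have no logs and 16 installs failed, so those instances could not have been counted as resolved in this run.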
### System Information
`Linux internal_machine 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`
### Checklist
- [X] I'm running with the latest docker container/on the latest development version
- [X] I've searched the other issues for a duplicate
- [X] I have copied the full command/code that I ran (as text, not as screenshot!)
- [X] If applicable: I have copied the **full** log file/error message that was the result (as text, not as screenshot!)
- [X] I have enclosed code/log messages in triple backticks ([docs](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code)) and clicked "Preview" to make sure it's displayed correctly.
### Steps/commands/code to Reproduce
`./run_eval.sh ../../swe-agent-public/all_preds.jsonl`