Print trajectory path only at beginning/end #408

klieret · 2024-05-24T20:05:13Z

Closes #381

codecov · 2024-05-24T20:12:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.96%. Comparing base (2d0caa5) to head (9b71f2b).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
+ Coverage   75.94%   75.96%   +0.01%     
==========================================
  Files          18       18              
  Lines        2885     2887       +2     
==========================================
+ Hits         2191     2193       +2     
  Misses        694      694


…on-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding 
warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. * Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See 
https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to 
afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: 
Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: 
Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements (princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

…ton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: 
Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 
* Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to 
start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. 
Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291)


@mikanfactory

* pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for 
talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . 
CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation 
is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top 
bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> 
Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that 
traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. * Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about 
old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) 
updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: 
Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name 
Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements (princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

@mikanfactory

…laude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. 
Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. 
Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two 
Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 

@mikanfactory

Fixed None submission bug Fixed token-counting for older models with Bedrock anthropics/anthropic-sdk-python#353 Added max_tokens_to_sample for older models to avoid Bedrock val errors; Changed anthropic_history_to_messages output type Added missing rich_argparse pkg Change from claude 2 to claude 2.0 (see anthropics/anthropic-sdk-python#255) Changed alias name (claude --> claude-2) and target (claude-2.0 --> claude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 
* Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to 
start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

@mikanfactory

Add usage doc to run.py (princeton-nlp#243) Remove debug print statement with experimental communicate Update authors fix: TARGETARCH not set on some OS/docker setups (princeton-nlp#249) Add GPT4-turbo model (princeton-nlp#252) Update authors Add isolated flag to flake8 linting Add isolated flag to flake8 linting Fix typo - "doensn't" in templates (princeton-nlp#254) Fix typo - "succesfully" in templates (princeton-nlp#255) Catch one more docker error if docker isn't running (princeton-nlp#257) Refactor run.py main function into class with hook structure (princeton-nlp#253) * WIP * Refactor run.py into class with hook structure Closes princeton-nlp#170 * Add some more unit tests * Some more tests Added support for Bedrock-provided Claude models Refactored to AnthropicModel and BedrockModel to avoid code duplication; Added custom error messages Added Claude 3 Opus https://aws.amazon.com/blogs/aws/anthropics-claude-3-opus-model-on-amazon-bedrock/ Fixed model name logic and typing bugs; Added missing return statements Fixed None submission bug Fixed token-counting for older models with Bedrock anthropics/anthropic-sdk-python#353 Added max_tokens_to_sample for older models to avoid Bedrock val errors; Changed anthropic_history_to_messages output type Added missing rich_argparse pkg Change from claude 2 to claude 2.0 (see anthropics/anthropic-sdk-python#255) Changed alias name (claude --> claude-2) and target (claude-2.0 --> claude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! 
Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand 
Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7.

@mikanfactory

Care was taken to add any missing fields to not break with old datafiles. Revert "Make instance a dataclass" This reverts commit 97bf5e3. Do not introduce dataclass Fix: Throw ValueError if local repo is dirty Test replay of batch mode Mention local run in readme Bump version Fix opening PR from fork (princeton-nlp#229) Fix opening PR from fork Add changelog Tests to use fast experimental communication strategy (princeton-nlp#230) chore: update pre-commit hooks (princeton-nlp#231) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.3.7](astral-sh/ruff-pre-commit@v0.3.5...v0.3.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix pypi package installation command Update to evaluation logic Doc: missing 'no' in error message about --open_pr Better error handling for --open_pr (princeton-nlp#239) Closes princeton-nlp#237 Speed up testing with persistent containers & remove them end of session (princeton-nlp#238) Closes princeton-nlp#228 Closes princeton-nlp#201 Do not attempt to save patch with empty patch (princeton-nlp#242) * Fixed a potential error I've ran into this error several times, where it says model_patch can't be None and ending the entire program. 
* Do not attempt to save patch with empty patch --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Readme: GH token is optional Add usage doc to run.py (princeton-nlp#243) Remove debug print statement with experimental communicate Update authors fix: TARGETARCH not set on some OS/docker setups (princeton-nlp#249) Add GPT4-turbo model (princeton-nlp#252) Update authors Add isolated flag to flake8 linting Add isolated flag to flake8 linting Fix typo - "doensn't" in templates (princeton-nlp#254) Fix typo - "succesfully" in templates (princeton-nlp#255) Catch one more docker error if docker isn't running (princeton-nlp#257) Refactor run.py main function into class with hook structure (princeton-nlp#253) * WIP * Refactor run.py into class with hook structure Closes princeton-nlp#170 * Add some more unit tests * Some more tests Added support for Bedrock-provided Claude models Refactored to AnthropicModel and BedrockModel to avoid code duplication; Added custom error messages Added Claude 3 Opus https://aws.amazon.com/blogs/aws/anthropics-claude-3-opus-model-on-amazon-bedrock/ Fixed model name logic and typing bugs; Added missing return statements Fixed None submission bug Fixed token-counting for older models with Bedrock anthropics/anthropic-sdk-python#353 Added max_tokens_to_sample for older models to avoid Bedrock val errors; Changed anthropic_history_to_messages output type Added missing rich_argparse pkg Change from claude 2 to claude 2.0 (see anthropics/anthropic-sdk-python#255) Changed alias name (claude --> claude-2) and target (claude-2.0 --> claude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! 
Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand 
Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask 
scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 
* Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to 
start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

@mikanfactory

Remove left-over debug statements Test creation of persistent container (princeton-nlp#184) Typing fixes & improvements (princeton-nlp#187) Make github token fully optional (princeton-nlp#189) Closes princeton-nlp#152 Improve --help message option headers (princeton-nlp#192) The docstrings of the argument dataclasses are also used in the --help message. If they aren't set, the signature of the dataclass is shown instead. Update README nit: typos (princeton-nlp#212) Update README.md No need to specify platform in docker pull (princeton-nlp#210) Signed-off-by: 勇里 <yongli.zzp@antgroup.com> No need to specify platform in docker command Fix: undefined local var replay_task_instances_path Make patch note more noticeable (princeton-nlp#214) * WIP * More noticeable message about patch file being produced Closes princeton-nlp#206 test: add tests for parsing functions (princeton-nlp#218) * test: add tests for parsing functions * refactore: fix redundant arguments chore(models): simplify conditions and fix return types (princeton-nlp#216) * chore(models): simplify conditions and fix return types * undo formatting --------- Co-authored-by: pmprones <massimiliano.pronesti@amadeus.com> Rename is_from_github_url and minor typing fixes Add --problem_statement flag Allow to run on local repository Git apply patch if running locally Test running on local repo Use --data_path for local problem stmts and --repo_path for local repos Various fixes and improved tests for swe-env Make instance a dataclass Care was taken to add any missing fields to not break with old datafiles. Revert "Make instance a dataclass" This reverts commit 97bf5e3. 
Do not introduce dataclass Fix: Throw ValueError if local repo is dirty Test replay of batch mode Mention local run in readme Bump version Fix opening PR from fork (princeton-nlp#229) Fix opening PR from fork Add changelog Tests to use fast experimental communication strategy (princeton-nlp#230) chore: update pre-commit hooks (princeton-nlp#231) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.3.7](astral-sh/ruff-pre-commit@v0.3.5...v0.3.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix pypi package installation command Update to evaluation logic Doc: missing 'no' in error message about --open_pr Better error handling for --open_pr (princeton-nlp#239) Closes princeton-nlp#237 Speed up testing with persistent containers & remove them end of session (princeton-nlp#238) Closes princeton-nlp#228 Closes princeton-nlp#201 Do not attempt to save patch with empty patch (princeton-nlp#242) * Fixed a potential error I've ran into this error several times, where it says model_patch can't be None and ending the entire program. 
* Do not attempt to save patch with empty patch --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Readme: GH token is optional Add usage doc to run.py (princeton-nlp#243) Remove debug print statement with experimental communicate Update authors fix: TARGETARCH not set on some OS/docker setups (princeton-nlp#249) Add GPT4-turbo model (princeton-nlp#252) Update authors Add isolated flag to flake8 linting Add isolated flag to flake8 linting Fix typo - "doensn't" in templates (princeton-nlp#254) Fix typo - "succesfully" in templates (princeton-nlp#255) Catch one more docker error if docker isn't running (princeton-nlp#257) Refactor run.py main function into class with hook structure (princeton-nlp#253) * WIP * Refactor run.py into class with hook structure Closes princeton-nlp#170 * Add some more unit tests * Some more tests Added support for Bedrock-provided Claude models Refactored to AnthropicModel and BedrockModel to avoid code duplication; Added custom error messages Added Claude 3 Opus https://aws.amazon.com/blogs/aws/anthropics-claude-3-opus-model-on-amazon-bedrock/ Fixed model name logic and typing bugs; Added missing return statements Fixed None submission bug Fixed token-counting for older models with Bedrock anthropics/anthropic-sdk-python#353 Added max_tokens_to_sample for older models to avoid Bedrock val errors; Changed anthropic_history_to_messages output type Added missing rich_argparse pkg Change from claude 2 to claude 2.0 (see anthropics/anthropic-sdk-python#255) Changed alias name (claude --> claude-2) and target (claude-2.0 --> claude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

@mikanfactory

typo in `docker built -t sweagent/swe-agent-run:latest .` corrected to `build` [skip-CI] Doc style: Use GH markdown admonitions Doc: More installation hints Doc fix: Change wording (docker socket) Issues: Add 'question' label to questions; distinguish from bug Issue templates: 'question' label; disam from bugs Improve error handling of docker issues (princeton-nlp#165) Closes princeton-nlp#114 Closes princeton-nlp#123 Closes princeton-nlp#159 Fix: Correctly catch docker connection errors Allow to supply installation commands when running on gh issues (princeton-nlp#153) * Allow to supply installation commands when running on gh issue * Add doc for env specification Issue template: Two more checkboxes for dupes/version CI: Test OpenAI model (princeton-nlp#166) Minor improvements for models.py * refactor: Simple refactoring for clean code * change the fstring issue for flake8 * Fix up prefix matching issue * resolve conflicts * update the model list Fix warnings about simple_parsing import paths (princeton-nlp#176) Fix signature of ParseCommandDetailed (princeton-nlp#177) Simple typing improvements Use ruff and enable some more checks (princeton-nlp#174) * Check for unused imports and variables * Fix some issues * Remove some more unneeded imports * Switch to using ruff for checks * Remove two more imports Update evaluation to reflect swebench `get_model_report` Remove left-over debug statements Test creation of persistent container (princeton-nlp#184) Typing fixes & improvements (princeton-nlp#187) Make github token fully optional (princeton-nlp#189) Closes princeton-nlp#152 Improve --help message option headers (princeton-nlp#192) The docstrings of the argument dataclasses are also used in the --help message. If they aren't set, the signature of the dataclass is shown instead. 
Update README nit: typos (princeton-nlp#212) Update README.md No need to specify platform in docker pull (princeton-nlp#210) Signed-off-by: 勇里 <yongli.zzp@antgroup.com> No need to specify platform in docker command Fix: undefined local var replay_task_instances_path Make patch note more noticeable (princeton-nlp#214) * WIP * More noticeable message about patch file being produced Closes princeton-nlp#206 test: add tests for parsing functions (princeton-nlp#218) * test: add tests for parsing functions * refactore: fix redundant arguments chore(models): simplify conditions and fix return types (princeton-nlp#216) * chore(models): simplify conditions and fix return types * undo formatting --------- Co-authored-by: pmprones <massimiliano.pronesti@amadeus.com> Rename is_from_github_url and minor typing fixes Add --problem_statement flag Allow to run on local repository Git apply patch if running locally Test running on local repo Use --data_path for local problem stmts and --repo_path for local repos Various fixes and improved tests for swe-env Make instance a dataclass Care was taken to add any missing fields to not break with old datafiles. Revert "Make instance a dataclass" This reverts commit 97bf5e3. 
Do not introduce dataclass Fix: Throw ValueError if local repo is dirty Test replay of batch mode Mention local run in readme Bump version Fix opening PR from fork (princeton-nlp#229) Fix opening PR from fork Add changelog Tests to use fast experimental communication strategy (princeton-nlp#230) chore: update pre-commit hooks (princeton-nlp#231) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.3.7](astral-sh/ruff-pre-commit@v0.3.5...v0.3.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix pypi package installation command Update to evaluation logic Doc: missing 'no' in error message about --open_pr Better error handling for --open_pr (princeton-nlp#239) Closes princeton-nlp#237 Speed up testing with persistent containers & remove them end of session (princeton-nlp#238) Closes princeton-nlp#228 Closes princeton-nlp#201 Do not attempt to save patch with empty patch (princeton-nlp#242) * Fixed a potential error I've ran into this error several times, where it says model_patch can't be None and ending the entire program. 
* Do not attempt to save patch with empty patch --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Readme: GH token is optional Add usage doc to run.py (princeton-nlp#243) Remove debug print statement with experimental communicate Update authors fix: TARGETARCH not set on some OS/docker setups (princeton-nlp#249) Add GPT4-turbo model (princeton-nlp#252) Update authors Add isolated flag to flake8 linting Add isolated flag to flake8 linting Fix typo - "doensn't" in templates (princeton-nlp#254) Fix typo - "succesfully" in templates (princeton-nlp#255) Catch one more docker error if docker isn't running (princeton-nlp#257) Refactor run.py main function into class with hook structure (princeton-nlp#253) * WIP * Refactor run.py into class with hook structure Closes princeton-nlp#170 * Add some more unit tests * Some more tests Added support for Bedrock-provided Claude models Refactored to AnthropicModel and BedrockModel to avoid code duplication; Added custom error messages Added Claude 3 Opus https://aws.amazon.com/blogs/aws/anthropics-claude-3-opus-model-on-amazon-bedrock/ Fixed model name logic and typing bugs; Added missing return statements Fixed None submission bug Fixed token-counting for older models with Bedrock anthropics/anthropic-sdk-python#353 Added max_tokens_to_sample for older models to avoid Bedrock val errors; Changed anthropic_history_to_messages output type Added missing rich_argparse pkg Change from claude 2 to claude 2.0 (see anthropics/anthropic-sdk-python#255) Changed alias name (claude --> claude-2) and target (claude-2.0 --> claude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! 
Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand 
Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask 
scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 

@mikanfactory

Fix docker setup: updated image names (princeton-nlp#87) Add run via docker instructions to readme (princeton-nlp#90) * Add run via docker instructions to readme * Add note about windows * Add proper hint styling Small refactor: Add quicksart section (princeton-nlp#56) * Restructure readme: quickstart before eval * Remove mention of PR creation Small style fixes to readme Add note about windows with conda installation Update README.md Mention --open_pr flag Update README.md Update run.sh Update run_from_url.sh Update run.py default model arguments Update default model arguments - greedy decoding and 3.00 per instance cost Shell script highlighting in readme Update README.md Fix: Update default image name (princeton-nlp#102) Doc: Consolidate containerized run examples Add issue template fix: bad newline getting sent on windows (princeton-nlp#79) Signed-off-by: Chapman Pendery <cpendery@microsoft.com> Make sure that keys.cfg doesn't get copied to Docker Add templates for issues, pr Doc: Remove leftover "click to expand box" Fix release script: latest tag can already exist on dockerhub Add docs for how to write your own commands Mount keys.cfg within container Workaround for princeton-nlp#109 Doc: Missing backslash Improve bug report template (princeton-nlp#113) Add template workflow diagram Change doc_improvement to question Warning about containers being only for arm64 at the moment Code quality: Improve inference of return type Add flag to raise exceptions in run.py Forward unparsed arguments in run_replay.py to run.py Fix: Unbound local variable/name shadowing This probably only ran because of name shadowing Do not leave python when calling run.py This helps with debugging run_replay Separately save patch files + some typing cleanup (princeton-nlp#126) Closes princeton-nlp#41 Allow to configure openapi base url (princeton-nlp#118) --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Remove azure override of model name (princeton-nlp#127) Add 
pre-commit badge Add markdown link checker (princeton-nlp#129) * Add markdown link checker * Fix & ignore broken markdown links Add markdown link checker badge Add run_replay integration test Add CI with github actions Add CI badge Fix: Choosing TogetherAI models (princeton-nlp#130) Closes 101 Revert "Remove azure override of model name (princeton-nlp#127)" This reverts commit 311467c. See discussion in princeton-nlp#127 Advertise experimental amd64 docker builds Fix typo in server.py seperately -> separately Update README.md - move badges to bottom Improve bug template Improve bug report template Improve bug report template Improve bug report template Better link for issue formatting Upload coverage data to codecov (princeton-nlp#140) Add codecov config and badge chore: update pre-commit hooks (princeton-nlp#141) updates: - [github.com/pre-commit/pre-commit-hooks: v4.5.0 → v4.6.0](pre-commit/pre-commit-hooks@v4.5.0...v4.6.0) - [github.com/pycqa/flake8: 4.0.1 → 7.0.0](PyCQA/flake8@4.0.1...7.0.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> multiplatform docker builds (princeton-nlp#131) * Select the right conda path from within the container * Build multiplatform images Improve test coverage (princeton-nlp#142) Doc: Remove architecture notice for docker Update README.md - change LLM to LM :) Update README.md Add Ollama support section Update README.md Update README.md Update README.md Add ollama link Increase coverage of swe-env tests (princeton-nlp#154) Fix typo in README.md (princeton-nlp#155) typo in `docker built -t sweagent/swe-agent-run:latest .` corrected to `build` [skip-CI] Doc style: Use GH markdown admonitions Doc: More installation hints Doc fix: Change wording (docker socket) Issues: Add 'question' label to questions; distinguish from bug Issue templates: 'question' label; disam from bugs Improve error handling of docker issues (princeton-nlp#165) Closes princeton-nlp#114 Closes princeton-nlp#123 Closes 
princeton-nlp#159 Fix: Correctly catch docker connection errors Allow to supply installation commands when running on gh issues (princeton-nlp#153) * Allow to supply installation commands when running on gh issue * Add doc for env specification Issue template: Two more checkboxes for dupes/version CI: Test OpenAI model (princeton-nlp#166) Minor improvements for models.py * refactor: Simple refactoring for clean code * change the fstring issue for flake8 * Fix up prefix matching issue * resolve conflicts * update the model list Fix warnings about simple_parsing import paths (princeton-nlp#176) Fix signature of ParseCommandDetailed (princeton-nlp#177) Simple typing improvements Use ruff and enable some more checks (princeton-nlp#174) * Check for unused imports and variables * Fix some issues * Remove some more unneeded imports * Switch to using ruff for checks * Remove two more imports Update evaluation to reflect swebench `get_model_report` Remove left-over debug statements Test creation of persistent container (princeton-nlp#184) Typing fixes & improvements (princeton-nlp#187) Make github token fully optional (princeton-nlp#189) Closes princeton-nlp#152 Improve --help message option headers (princeton-nlp#192) The docstrings of the argument dataclasses are also used in the --help message. If they aren't set, the signature of the dataclass is shown instead. 
Update README nit: typos (princeton-nlp#212) Update README.md No need to specify platform in docker pull (princeton-nlp#210) Signed-off-by: 勇里 <yongli.zzp@antgroup.com> No need to specify platform in docker command Fix: undefined local var replay_task_instances_path Make patch note more noticeable (princeton-nlp#214) * WIP * More noticeable message about patch file being produced Closes princeton-nlp#206 test: add tests for parsing functions (princeton-nlp#218) * test: add tests for parsing functions * refactore: fix redundant arguments chore(models): simplify conditions and fix return types (princeton-nlp#216) * chore(models): simplify conditions and fix return types * undo formatting --------- Co-authored-by: pmprones <massimiliano.pronesti@amadeus.com> Rename is_from_github_url and minor typing fixes Add --problem_statement flag Allow to run on local repository Git apply patch if running locally Test running on local repo Use --data_path for local problem stmts and --repo_path for local repos Various fixes and improved tests for swe-env Make instance a dataclass Care was taken to add any missing fields to not break with old datafiles. Revert "Make instance a dataclass" This reverts commit 97bf5e3. 
Do not introduce dataclass Fix: Throw ValueError if local repo is dirty Test replay of batch mode Mention local run in readme Bump version Fix opening PR from fork (princeton-nlp#229) Fix opening PR from fork Add changelog Tests to use fast experimental communication strategy (princeton-nlp#230) chore: update pre-commit hooks (princeton-nlp#231) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.3.7](astral-sh/ruff-pre-commit@v0.3.5...v0.3.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix pypi package installation command Update to evaluation logic Doc: missing 'no' in error message about --open_pr Better error handling for --open_pr (princeton-nlp#239) Closes princeton-nlp#237 Speed up testing with persistent containers & remove them end of session (princeton-nlp#238) Closes princeton-nlp#228 Closes princeton-nlp#201 Do not attempt to save patch with empty patch (princeton-nlp#242) * Fixed a potential error I've ran into this error several times, where it says model_patch can't be None and ending the entire program. 
* Do not attempt to save patch with empty patch --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Readme: GH token is optional Add usage doc to run.py (princeton-nlp#243) Remove debug print statement with experimental communicate Update authors fix: TARGETARCH not set on some OS/docker setups (princeton-nlp#249) Add GPT4-turbo model (princeton-nlp#252) Update authors Add isolated flag to flake8 linting Add isolated flag to flake8 linting Fix typo - "doensn't" in templates (princeton-nlp#254) Fix typo - "succesfully" in templates (princeton-nlp#255) Catch one more docker error if docker isn't running (princeton-nlp#257) Refactor run.py main function into class with hook structure (princeton-nlp#253) * WIP * Refactor run.py into class with hook structure Closes princeton-nlp#170 * Add some more unit tests * Some more tests Added support for Bedrock-provided Claude models Refactored to AnthropicModel and BedrockModel to avoid code duplication; Added custom error messages Added Claude 3 Opus https://aws.amazon.com/blogs/aws/anthropics-claude-3-opus-model-on-amazon-bedrock/ Fixed model name logic and typing bugs; Added missing return statements Fixed None submission bug Fixed token-counting for older models with Bedrock anthropics/anthropic-sdk-python#353 Added max_tokens_to_sample for older models to avoid Bedrock val errors; Changed anthropic_history_to_messages output type Added missing rich_argparse pkg Change from claude 2 to claude 2.0 (see anthropics/anthropic-sdk-python#255) Changed alias name (claude --> claude-2) and target (claude-2.0 --> claude-2.1) pkg: merge all packaging stuff into pyproject.toml (princeton-nlp#256) * pkg: merge all packaging stuff into pyproject.toml * Add trivial test for packaging * Add Carlos' email to packaging --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Use legacy API for claude-2.1 Thanks to @mikanfactory for spotting this! 
Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand 
Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask 
scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 
* Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to 
start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand 
Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask 
scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 
* Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to 
start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

@mikanfactory

Add hooks to agent (princeton-nlp#258) * Add hooks to agent * Test hook & fix non-running other tests Update defaults.sh - scroll_down was misnamed Use a shorter timeout duration for tests (princeton-nlp#264) Adding more hooks to env and agent (princeton-nlp#265) Update defaults and add last_5_history configs chore: update pre-commit hooks (princeton-nlp#268) updates: - [github.com/astral-sh/ruff-pre-commit: v0.3.7 → v0.4.1](astral-sh/ruff-pre-commit@v0.3.7...v0.4.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Pass Python version to get_environment_yml This ensures that the `environment.yml` is correctly constructed with the specific Python version required for the instance. Update swe_env.py Replicates installation behavior from SWE-bench at https://github.com/princeton-nlp/SWE-bench/blob/cfb20092bbbee9683176177b2f59b85f522e7f27/swebench/harness/context_manager.py#L354-L376 Minor condition changes Update edit_linting.sh - fix grammar issue Update cursors_edit_linting.sh - fix grammar issue Fix Together model validation error (princeton-nlp#236) * test: add unit test for Together model * fix: deal with the new Together API * chore: specify together version * refactor: clean code * change together model versioning from ">=~" to ">=" and write comment * raise exception when together SDK version is below 1.1.0 * refactor: update unit test format * speficy max_tokens chore: update pre-commit hooks (princeton-nlp#282) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> WIP: Create GH codespaces Codespaces: Fix permissions for talking to docker daemon Codespaces: Pull swe-agent image; conda init Codespace: Automatically activate swe-agent env Codespaces: Fix: don't overwrite bashrc (princeton-nlp#288) [Skip-ci] Update README.md Codespaces: Run additional setup as onCreateCommand 
Update devcontainer.json Revert "Update devcontainer.json" This reverts commit c8542e7. Add helpful message about conda env activation (princeton-nlp#289) Codespaces: Use pip install instead of creating new conda env (princeton-nlp#291) Doc: Avoid invalid github token (princeton-nlp#292) [skip-ci] Improve codespace setup & documentation (princeton-nlp#293) [skip-CI] * Codespaces: Remove shell setting; fix extensions setting [skip-ci] * Codespaces: Copy sample keys.cfg [skip-ci] * Codespaces: Add codespace badge [skip-CI] Doc: Add codespace video Codespace: Add startup message to terminal (princeton-nlp#294) [skip-ci] CI: Use pip for installation instead conda (princeton-nlp#299) * CI: Use pip for installation instead conda * Make sure that python is set up docker ignore everything from gitignore [skip-ci] Setup: do not duplicate requirements (princeton-nlp#300) * WIP * Fix: Need to copy app first before pip install . CI: Add GHA to test running setup.sh (princeton-nlp#302) Fix readme badge links (princeton-nlp#303) Enh: Allow to directly specify problem statement (princeton-nlp#308) fix:typo Fix: Include demonstrations in dockerignore (princeton-nlp#311) [skip-ci] Update README.md chore: update pre-commit hooks (princeton-nlp#318) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.2 → v0.4.3](astral-sh/ruff-pre-commit@v0.4.2...v0.4.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> lint: use `typos` as precommit's hook (princeton-nlp#259) * lint: use typos as precommit hook * fixing typos Doc: Recommend pip install instead of conda (princeton-nlp#304) * Doc: Recommend pip install instead of conda * Fix numbering [skip-ci] * Doc: Make installation with pip the default Doc fix: Misleading comment about env vars with docker Comment out all keys in sample keys.cfg by default Update swe_env.py fix typo Doc: Fix links to installation issues section [skip-ci] Doc: Fix link to installation issues section Web: Lay flask 
scaffolding Do not use unix signal calls Web: Can start runs from flask Web: Split feed into two Web: Use agent hooks Web: Separate messages in feeds; markdown support WIP Web: Add prompts to feed Web: Switch to using jquery Web: Add step index and scroll to it Web: Moved most of the interface to react Web: Bring back highlighting minor changes for server and client endpoints to better handling cors Web Fix: Every message to appear only once Web feat: Restore scrolling behavior Web feat: Kill running computation Web: Rename folder web -> api Web: Remove files from flask prototype Web refactor: Split up server.py Web feat: Display log messages (partially broken) Unfortunately all threads share the same stdout, so it's not trivial at all to redirect different threads to different stdouts Web enh: Control button activity depending on run state Web enh: Auto-scroll log messages Web enh: Only scroll and highlight after computation is finished Web enh: Make sure that killing thread succeeds Web: Factor out Feed.js; fix highlighting of step == null Web WIP: Started to integrate swe-agent/demo parts Web WIP: Styling and refactoring Web WIP: Split up message types Web enh: Bring in some highlighting Web feat: Include the rest of the demo code minor refactor of the server to fix 403 code and also missing secret_key adding requirements.txt there are many version conflicts in the codebase, it's hard to run the server without having the correct version. 
Adding the requirements to standardize the future setup Web: Fix port of server for websocket Web: Redirect all relevant stderr & handle errors in thread Web: Rename feeds Web: Add warning message if server is not connected Web: Simple script to start web server Codespace: Install npm Web: Make sure that pm2 is found in cleanup method Web: Factor out run control Web: Allow different ways to specify PS; repo path; bootstrap Web: Place controls in accordion Web: Format test run checkbox as switch Web fix: Reset highlighted step after running Web: Add flask dependencies disabled bubbles' scrolling and text color Rearranged input elements removed unnecessary elements create copy function for log panel change color for highlighted messages Web: Replace accordion with tabs Web: Various Styling improvements Web fix: Checkbox default state not reflected Web fix: Highlighting in terminal (restore linebreaks) Web enh: Remove highlight if mouse leaves message Web enh: Add timeout to highlight/scroll Web enh: Run button layout; logo; remove header Web: Add link to github readme Web feat: Model selection Web enh: Fix spacing of code blocks Better messages for InstantEmptySubmitTestModel Web: Remove "Thought" and fix info msg styling Web enh: Add start message; style no connection error msg Web style: Remove three dots; move logos into window bars Web style: Descriptions for other text fields Web ref: Move CSS to appropriate files Web: Move swe-agent logo to top bar Web: Font-size adjustments Web: Minimize menu when run started Web: Only show "Copy to clipboard" after run Web: Show critical errors in top banner Web: Show explicit support for local PS or repos Web: Improve handling of container closing Web: Assume compute has finished when 20s no update Web: Always use experimental speedups Web: Add note about successful pitch; real example by default Web: Catch bug with empty observation Web: Reformat code with prettier Print helpful error message when flask isn't available 
Close environment when raising exception Web: Always raise exceptions Web: Switch to silver logos Web: Change title of agent feed Web feat: Allow to specify python version & req pkgs Web feat: Allow to specify path to shell script Web: Temporarily disable timeout-based setIsComputing Web feat: Set custom install command Web style fix: Position of logo for narrow screens Fix: Handling of long problem statements Style: Black format api code [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove typo/comment Fix: Handling gh issue URLs as problem statements Doc: Add gif of web interface [skip ci] Doc: Add web UI instructions [skip ci] Fix typo [skip ci] Fix: Catch container not found and retry after wait Fixes princeton-nlp#322 Doc: Add information of how to open correct browser window (princeton-nlp#324) [skip ci] Doc: Suggest starting web UI in GH codespaces Update README.md - slight rewording of a header Web: Fix script_path input (princeton-nlp#334) Closes princeton-nlp#333 [skip ci] Update README.md - updating bibtex Update README.md Update README.md Readme: Fix links [skip ci] Improve handling of incorrect repo_path configs (princeton-nlp#340) Always get base_commit hash (can be specified as tag/branch) (princeton-nlp#341) Fix: Don't print patch msg for exit_cost patch (princeton-nlp#343) Closes princeton-nlp#342 Add gpt-4o model (princeton-nlp#344) Co-authored-by: Ray Myers <rmyers@indeed.com> Fix: Do not request job control in bash (princeton-nlp#345) Closes princeton-nlp#331 It's unlikely that job control was ever granted. Currently we're getting ERROR Unexpected container setup output: /bin/bash: cannot set terminal process group (-1): Inappropriate ioctl for device /bin/bash: no job control in this shell Because of this. 
Fix: --base_commit not used for gh urls (princeton-nlp#346) chore: update pre-commit hooks (princeton-nlp#347) updates: - [github.com/crate-ci/typos: v1.20.7 → v1.21.0](crate-ci/typos@v1.20.7...v1.21.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.3 → v0.4.4](astral-sh/ruff-pre-commit@v0.4.3...v0.4.4) - [github.com/pre-commit/mirrors-prettier: → v4.0.0-alpha.8](pre-commit/mirrors-prettier@...v4.0.0-alpha.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix: Separate data path/traj dir cause exception (princeton-nlp#348) Readme: Shorten ACI text [skip ci] Update README.md Update README.md Remove duplicated abstract method (princeton-nlp#355) Web: Refactor state into one runConfig with use-immer (princeton-nlp#350) Web: Allow to specify commit hash (princeton-nlp#358) Closes princeton-nlp#336 CI: Use uv pip install (princeton-nlp#360) * CI: Use uv pip install * CI: Try with explicit virtuale_env Web: Shorten long error messages in banner (princeton-nlp#361) Closes princeton-nlp#330 Wait longer if processes still running (princeton-nlp#364) Closes princeton-nlp#363 Update default_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update default_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-full_history-1_demos.yaml - adding warning to experimental config Update xml_sys-env_cursors_window100-detailed_cmd_format-last_5_history-1_demos.yaml - adding warning to experimental config Update README.md - clarify that traj arg has to be absolute path Fix handling of not_generated/no_generation in inspector (princeton-nlp#332) * Fix typo in inspector server.py This leads to "Results format not recognized" error whenever viewing the eval report for a trajectory. 
* Fix: Consistently handle no_generation vs not_generated --------- Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> Inspector: Better labels for roles (princeton-nlp#368) Closes princeton-nlp#365 Change icons for trajectory viewer (princeton-nlp#370) Closes princeton-nlp#365 Move documentation to mkdocs (princeton-nlp#371) Docs: Add installation overview page (princeton-nlp#377) Docs: Add github button; edit feature Docs: Change color preferences Docs: Add next prev/buttons CI: Skip CI for PRs that only touch docs Docs: Switch to documentation Add default environment_setup config (princeton-nlp#351) [skip ci] Docs: Fix max-width tag of doc link [skip ci] Doc: Significantly expand CL tutorial Doc: Restore docs on starting web UI on GH codespaces Doc: Add copy button; highlight specific lines Doc/CI: Speed up documentation build Doc: Move config docs to mkdocs CI: Set VIRTUAL_ENV for uv Doc: Fix inclusion of image in config.md Doc: Attempt to use relative image path Doc: Add changelog Closes princeton-nlp#335 Docs: Add more READMEs to mkdocs Remind people not to use screenshots when reporting bugs Remind people not to use screenshots for error messages Upper bound request version to avoid docker-py bug (princeton-nlp#390) Closes princeton-nlp#379 Doc: Replace symlinks with markdown files with links (princeton-nlp#392) Closes princeton-nlp#388 Docs: Add search (princeton-nlp#393) Closes princeton-nlp#387 Search is added by default but must be manually added if any other plugins are configured See https://github.com/squidfunk/mkdocs-material/blob/master/docs/setup/setting-up-site-search.md Docs: Add code of conduct (princeton-nlp#394) [skip ci] Add nodejs to swe-agent-run container (princeton-nlp#396) Docs: Note about old images from the hub (princeton-nlp#395) Docs: Advice to update pip if unsuccessful (princeton-nlp#399) Show error log if web server fails (princeton-nlp#400) [skip ci] CI: Fix passing python path to uv (princeton-nlp#401) Docs: Detailed way to 
start the web server (princeton-nlp#402) Docs: Use grids for prettier selections (princeton-nlp#403) Doc: Avoid duplicate information Docs: Add footer with links to report bugs (princeton-nlp#404) Docs/CI: Install mkdocs-include-markdown-plugin Improve question issue template Update question issue template Update question issue template Update question issue template Doc: Typo fix Split between configuration and development (princeton-nlp#407) Remove requests upper bound, add docker-py lower bound (princeton-nlp#406) Closes princeton-nlp#391 deprecate action from get_submission (princeton-nlp#274) Doc: Fix links to website pages (princeton-nlp#411) Print trajectory path only at beginning/end (princeton-nlp#408) Closes princeton-nlp#381 Fix: IndexError when replaying incomplete trajectories (princeton-nlp#410) Closes princeton-nlp#124 Add dev dependencies (princeton-nlp#414) Add dev notes (princeton-nlp#415) Docs: Move contribution guide to root to help gh discover it CI: Use github token during CI operations (princeton-nlp#412) Fixes princeton-nlp#405 Make use case for discord clearer Enh: Suppress openai logging; improve formatting of stats (princeton-nlp#416) Closes princeton-nlp#382 Tweaks to use swe-agent web UI from docker (princeton-nlp#423) Speed up evaluation by caching task environments as docker images (princeton-nlp#317) * cache task environment as docker images with separate tags * save env vars inside the task image before docker commit, debug timing * increase docker api timeout to afford long commits * fix * fix * remove timing collection code * some cleanup * remove timings storage * use close func to stop container * address review comment, type hint chore: update pre-commit hooks (princeton-nlp#424) updates: - [github.com/astral-sh/ruff-pre-commit: v0.4.4 → v0.4.5](astral-sh/ruff-pre-commit@v0.4.4...v0.4.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Add test for caching of task envs Make cached image 
name depend only on relevant features Document --cache_task_images Doc: Port more content from readme to docs/ (princeton-nlp#427) * Doc: Port more content from readme to docs/ * Fix links Remove signal dependency (princeton-nlp#428) Do not use select if running on Windows (princeton-nlp#429) * Do not use select if running on Windows * Test on windows Ensure that uv is avialable in containers (princeton-nlp#431) Use custom Config class to support env and keys.cfg (princeton-nlp#430) * Use custom Config class to support env and keys.cfg * Fix patching * Doc: Document use of environment variables * Doc: swap out env reference Doc: Document running web server from docker container (princeton-nlp#426) * Doc: Document running web server from docker container * Fix link Fix: Correct path to keys.cfg Fix: Config doesn't take pathlib.Path (princeton-nlp#434) Strip trailing whitespace & black formatting Allow ruff to write fixes [skip ci] Sort imports Code quality: Convert to make use of PEP 585 and PEP 604 CI: Add pyupgrade via ruff Add more fixable ruff checks Fix compatibility with main branch Fix unittest by excluding test data from formatting Doc: Add note about running tests (princeton-nlp#435) Add flake8-errmsg to tests Some more ruff checks Format: Use trailing commas CI: Add pytest rules CI: Add flake8 simplify Code qual: Some one-off fixes Docs: Note about updates (princeton-nlp#438) Remove direct imports in __init__.py; improve error handling of keys_config (princeton-nlp#436) keys_config Doc: Add notes about merge-conflicts after formatting changes (princeton-nlp#439) [skip CI] Dev: Exclude format commits from showing up in git blame [skip ci] Bump version [skip ci] Doc: Update changelog (princeton-nlp#441) CI: Release to dockerhub via github actions (princeton-nlp#440) * CI: Release to dockerhub via github actions * Checkout code * Fix name [skip ci] * Run daily by midnight * Doc: remove notice about later docker images Doc: Add badge for container build Doc: 
Document keywords of run.py (princeton-nlp#443) Closes princeton-nlp#442 Doc: Fix links to paper Doc: Fix broken formatting Update README.md Resolve relative paths to demonstrations and commands (princeton-nlp#444) * Resolve relative paths to demonstrations Closes princeton-nlp#225 * Resolve more paths relative to REPO_ROOT * Allow to override config root * Document Docs: Links to good first issues/help wanted Docs: Add more prominent note about formatting merge conflicts Update citation Doc: Add placeholder for updating forks Docs: Add verbose notes about avoiding formatting merge conflicts (princeton-nlp#448) * Docs: Add verbose notes about avoiding formatting merge conflicts * Include report footer Doc: Fix link to migration Docs: Update link to fix formatting issues Doc: Pull correct image for updating Docs: Improve installation steps Chore: Fix whitespace error Update demonstrations.md Update and rename faq.md to usage_faq.md Improve landing page and add background section (princeton-nlp#458) * Docs: Improve navigation from front page * Docs: Improve landing page * Fix link to changelog Docs: Start to add API documentation (princeton-nlp#460) Doc: Fix formatting and links CI/Docs: Add mkdocstrings to dependencies CI: Only run test build containers if changed (princeton-nlp#462) Docs/CI: Fix docs build & run for PRs (princeton-nlp#461) * CI: Always run mkdocs for testing * Actually build * Need to install complete dev * Specify python root * Fix link Docs: Fix inclusion of code structure Doc: Format fix Ensure container_name is reset for non-persistent containers (princeton-nlp#463) * Ensure container_name is reset for non-persistent containers Might help with princeton-nlp#451 * Always draw new container name Docs: Bring back some more ACI text Fix: Raise unclassified exception; use from e (princeton-nlp#464) * Fix: Raise unclassified exception; use from e * Improve exception logging Change run return_type default to "info_trajectory"; doc improvements 
(princeton-nlp#466) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs add swe env docstrings (princeton-nlp#468) * Change run return_type default to "info_trajectory"; doc improvements * Doc: Ensure that all public methods have docstring stub Otherwise not shown in docs * Doc: Add SWEEnv docstrings

Print trajectory path only at beginning/end

9b71f2b

Closes #381
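
For readers skimming this PR, the behavior named in the title can be illustrated with a minimal sketch. This is not the code changed in this PR: `run_steps`, the logger name, and the message wording are hypothetical and only show the idea of reporting the trajectory path once before the step loop and once after it, instead of on every step.

```python
import logging
from pathlib import Path

# Hypothetical logger name, chosen for this illustration only.
logger = logging.getLogger("trajectory-demo")


def run_steps(traj_path: Path, steps: list[str]) -> list[str]:
    """Run all steps and return the emitted log messages.

    The trajectory path is mentioned exactly twice: once before the
    loop and once after it, never inside the loop.
    """
    messages: list[str] = []

    def log(msg: str) -> None:
        logger.info(msg)
        messages.append(msg)

    # Beginning: announce where the trajectory will be written.
    log(f"Trajectory will be saved to {traj_path}")

    for index, step in enumerate(steps):
        # Per-step output deliberately omits the path.
        log(f"Step {index}: {step}")

    # End: repeat the path once so it is easy to find after a long run.
    log(f"Trajectory saved to {traj_path}")
    return messages
```

Calling `run_steps(Path("trajectories/demo.traj"), ["edit", "test"])` under these assumptions yields four messages, of which only the first and last contain the path.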

klieret merged commit 63157e8 into main May 24, 2024
5 of 7 checks passed

klieret deleted the print-traj-path-only-beginning-end branch May 24, 2024 21:41


klieret commented May 24, 2024

codecov bot commented May 24, 2024

Codecov Report