
Add collate file and more tests from autogpt into testbed #915

Merged: 13 commits merged into main from testbed_autogpt on Dec 14, 2023

Conversation

@LeoLjl (Collaborator) commented Dec 8, 2023

Why are these changes needed?

Add more tests from autogpt and reorganize file structure.


@LeoLjl changed the title from Add collate file and more tests from autogpt to Add collate file and more tests from autogpt into testbed Dec 8, 2023
@codecov-commenter commented Dec 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1b4fb8f) 26.58% compared to head (2d762fe) 26.44%.
Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #915      +/-   ##
==========================================
- Coverage   26.58%   26.44%   -0.14%     
==========================================
  Files          28       28              
  Lines        3732     3777      +45     
  Branches      847      858      +11     
==========================================
+ Hits          992      999       +7     
- Misses       2667     2706      +39     
+ Partials       73       72       -1     
Flag        Coverage Δ
unittests   26.44% <ø> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.


@LeoLjl (Collaborator, Author) commented Dec 8, 2023

The pipeline is ready to go, just more tests to be added.

@afourney self-requested a review December 9, 2023 05:37
@LeoLjl marked this pull request as ready for review December 13, 2023 04:32
@LeoLjl (Collaborator, Author) commented Dec 13, 2023

@afourney PTAL. I've added 13 tests covering coding, scraping, and file I/O. The pipeline is similar to that of HumanEval. I added some packages to requirements.txt because they are fundamental to task solving; I don't think it's necessary to waste conversation turns having agents install required packages.
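(For illustration only, not code from this PR: if the testbed preinstalls from requirements.txt, a preflight step along these lines would do it; the path and entry point here are assumptions.)

```python
# Hypothetical preflight step: install the testbed's pinned dependencies
# up front so agents never spend conversation turns on pip installs.
import subprocess
import sys

def preinstall_requirements(path: str = "requirements.txt") -> None:
    # Install into the current interpreter's environment; raise on failure.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", path],
        check=True,
    )

if __name__ == "__main__":
    preinstall_requirements()
```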

@afourney (Member) commented

reviewing now

@afourney (Member) commented

> @afourney PTAL. I've added 13 tests covering coding, scraping, and file I/O. The pipeline is similar to that of HumanEval. I added some packages to requirements.txt because they are fundamental to task solving; I don't think it's necessary to waste conversation turns having agents install required packages.

I think the question about installing packages is an interesting one. If they are common packages, then yes, we should expect them to already be installed. If, however, they are pretty esoteric and specific to the problem (e.g., yfinance), then identifying and installing the library is arguably part of the problem the agents need to work through to succeed. This is actually why I designed the Testbed the way that I did... so that each run would face identical obstacles.

Probably we want to identify some common packages and actually have them pre-installed on a Docker image. Identifying a core set of packages before looking at the problem set is likely ideal and free of bias. Perhaps something like https://learnpython.com/blog/most-popular-python-packages/
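A minimal sketch of that idea, assuming the docker-py SDK and an illustrative (not agreed-upon) package list and image tag:

```python
# Sketch: bake a fixed set of common packages into a base image so every
# run faces identical obstacles. Package list and tag are placeholders.
import io
import docker  # docker-py: pip install docker

COMMON_PACKAGES = ["numpy", "pandas", "requests", "matplotlib", "beautifulsoup4"]

dockerfile = (
    "FROM python:3.11-slim\n"
    f"RUN pip install --no-cache-dir {' '.join(COMMON_PACKAGES)}\n"
).encode("utf-8")

client = docker.from_env()
image, _logs = client.images.build(
    fileobj=io.BytesIO(dockerfile),
    tag="testbed-base:latest",
    rm=True,
)
print("Built", image.tags)
```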

@afourney (Member) left a comment

Looks good to me. I was able to run it no problem, and the documentation is good.

As per #976 there may be more work to do, but I am happy to accept it as is, then submit my own PR to standardize things as per #976 and #973.

Also, I think the jury is still out on whether we want to allow the agents to continue after a failed test. This capability is no doubt extremely useful for seeing how the agents adapt to a feedback signal (the same is true for HumanEval), but it makes the benchmarks incomparable to other reported numbers. An option to stop after the first attempt is something we may want to invest in (and perhaps turn on by default). We should also at least make it clear that our implementation diverges in this way.
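For concreteness, a stop-after-the-first-attempt option might look like the sketch below; run_scenario and the parameter names are hypothetical, not the Testbed's actual API:

```python
# Hypothetical knob for the divergence described above: with
# stop_on_first_attempt=True, results line up with pass@1-style reporting;
# with it off, agents may keep adapting to the failure feedback signal.
from typing import Callable

def run_with_retries(
    run_scenario: Callable[[], bool],  # returns True when all checks pass
    max_attempts: int = 3,
    stop_on_first_attempt: bool = True,
) -> bool:
    attempts = 1 if stop_on_first_attempt else max_attempts
    for _ in range(attempts):
        if run_scenario():
            return True
    return False
```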

@qingyun-wu added this pull request to the merge queue Dec 14, 2023
Merged via the queue into main with commit 2ee944d Dec 14, 2023
16 checks passed
@afourney deleted the testbed_autogpt branch December 14, 2023 23:50
rlam3 pushed a commit to rlam3/autogen that referenced this pull request Dec 19, 2023

* Add collate file.

* Add requirements.txt, Fix typo, Add tests

* More tests.

* Update check.py

* Update scenario.py

* Update prepare_autogpt.py

* Update prepare_autogpt.py

* More tasks for testset.

* Add more tests.

* Update docs.

* Optimize file organize.
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
* workflow path->paths

* Apply suggestions from code review

Co-authored-by: Li Jiang <bnujli@gmail.com>

whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024 (same commit messages as the rlam3 commit above)