
Add collate file and more tests from autogpt into testbed #915

Merged: 13 commits merged into main from testbed_autogpt on Dec 14, 2023

Conversation

@LeoLjl (Collaborator) commented Dec 8, 2023

Why are these changes needed?

Add more tests from autogpt and reorganize file structure.


@LeoLjl changed the title from Add collate file and more tests from autogpt to Add collate file and more tests from autogpt into testbed Dec 8, 2023
@codecov-commenter commented Dec 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1b4fb8f) 26.58% compared to head (2d762fe) 26.44%.
Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #915      +/-   ##
==========================================
- Coverage   26.58%   26.44%   -0.14%     
==========================================
  Files          28       28              
  Lines        3732     3777      +45     
  Branches      847      858      +11     
==========================================
+ Hits          992      999       +7     
- Misses       2667     2706      +39     
+ Partials       73       72       -1     
Flag        Coverage Δ
unittests   26.44% <ø> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.


@LeoLjl (Collaborator, Author) commented Dec 8, 2023

The pipeline is ready to go, just more tests to be added.

@afourney self-requested a review December 9, 2023 05:37
@LeoLjl marked this pull request as ready for review December 13, 2023 04:32
@LeoLjl (Collaborator, Author) commented Dec 13, 2023

@afourney PTAL. I've added 13 tests covering coding, scraping, and file I/O. The pipeline is similar to that of HumanEval. I added some packages to requirements.txt because they are fundamental to task solving; I don't think it's necessary to waste conversation turns having agents install required packages.
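(For illustration only, not code from this PR: if the testbed preinstalls from requirements.txt, a preflight step along these lines would do it; the path and entry point here are assumptions.)

```python
# Hypothetical preflight step: install the testbed's pinned dependencies
# up front so agents never spend conversation turns on pip installs.
import subprocess
import sys

def preinstall_requirements(path: str = "requirements.txt") -> None:
    # Install into the current interpreter's environment; raise on failure.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", path],
        check=True,
    )

if __name__ == "__main__":
    preinstall_requirements()
```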

@afourney (Member) commented

reviewing now

@afourney (Member) commented

> @afourney PTAL. I've added 13 tests covering coding, scraping, and file I/O. The pipeline is similar to that of HumanEval. I added some packages to requirements.txt because they are fundamental to task solving; I don't think it's necessary to waste conversation turns having agents install required packages.

I think the question about installing packages is an interesting one. If they are common packages, then yes, we should expect them to already be installed. If, however, they are pretty esoteric and specific to the problem (e.g., yfinance), then identifying and installing the library is arguably part of the problem the agents need to work through to succeed. This is actually why I designed the Testbed the way that I did... so that each run would face identical obstacles.

Probably we want to identify some common packages and actually have them pre-installed on a Docker image. Identifying a core set of packages before looking at the problem set is likely ideal and free of bias. Perhaps something like https://learnpython.com/blog/most-popular-python-packages/
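A minimal sketch of that idea, assuming the docker-py SDK and an illustrative (not agreed-upon) package list and image tag:

```python
# Sketch: bake a fixed set of common packages into a base image so every
# run faces identical obstacles. Package list and tag are placeholders.
import io
import docker  # docker-py: pip install docker

COMMON_PACKAGES = ["numpy", "pandas", "requests", "matplotlib", "beautifulsoup4"]

dockerfile = (
    "FROM python:3.11-slim\n"
    f"RUN pip install --no-cache-dir {' '.join(COMMON_PACKAGES)}\n"
).encode("utf-8")

client = docker.from_env()
image, _logs = client.images.build(
    fileobj=io.BytesIO(dockerfile),
    tag="testbed-base:latest",
    rm=True,
)
print("Built", image.tags)
```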

@afourney (Member) left a comment

Looks good to me. I was able to run it no problem, and the documentation is good.

As per #976 there may be more work to do, but I am happy to accept it as is, then submit my own PR to standardize things as per #976 and #973.

Also, I think the jury is still out on whether we want to allow the agents to continue after a failed test. This capability is no doubt extremely useful for seeing how the agents adapt to a feedback signal (the same is true for HumanEval), but it makes the benchmarks incomparable to other reported numbers. An option to stop after the first attempt is something we may want to invest in (and perhaps turn on by default). We should also at least make it clear that our implementation diverges in this way.
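For concreteness, a stop-after-the-first-attempt option might look like the sketch below; run_scenario and the parameter names are hypothetical, not the Testbed's actual API:

```python
# Hypothetical knob for the divergence described above: with
# stop_on_first_attempt=True, results line up with pass@1-style reporting;
# with it off, agents may keep adapting to the failure feedback signal.
from typing import Callable

def run_with_retries(
    run_scenario: Callable[[], bool],  # returns True when all checks pass
    max_attempts: int = 3,
    stop_on_first_attempt: bool = True,
) -> bool:
    attempts = 1 if stop_on_first_attempt else max_attempts
    for _ in range(attempts):
        if run_scenario():
            return True
    return False
```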

@qingyun-wu added this pull request to the merge queue Dec 14, 2023
Merged via the queue into main with commit 2ee944d Dec 14, 2023
16 checks passed
@afourney deleted the testbed_autogpt branch December 14, 2023 23:50
rlam3 pushed a commit to rlam3/autogen that referenced this pull request Dec 19, 2023

* Add collate file.

* Add requirements.txt, Fix typo, Add tests

* More tests.

* Update check.py

* Update scenario.py

* Update prepare_autogpt.py

* Update prepare_autogpt.py

* More tasks for testset.

* Add more tests.

* Update docs.

* Optimize file organize.
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
* workflow path->paths

* Apply suggestions from code review

Co-authored-by: Li Jiang <bnujli@gmail.com>

whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024 (same commit messages as the rlam3 commit above)