[Bug]: No module named 'testbed_utils' (GAIA MultiAgent v0.1) #1943

SimonB97 · 2024-03-10T11:19:55Z

Describe the bug

Hello, I'm trying to get the recently published MultiAgent workflow to run, but the readme is sparse, and I'm having issues importing some module named 'testbed_utils', which I presume to be some custom module.

When running scenario.py, I get this error back:

ModuleNotFoundError: No module named 'testbed_utils'

I have installed the requirements included but this module is not listed.

Steps to reproduce

Clone the MultiAgent workflow (see link in description)
Create a venv with python==3.10 (I'm using conda)
Install requirements.txt
Run python scenario.py

Model Used

No response

Expected Behavior

Agents workflow should be run (based on prompt.txt)

Screenshots and logs

No response

Additional Information

Windows 11
pyautogen==0.2.17
Python 3.10.13

afourney · 2024-03-10T15:59:43Z

Thanks,

So that's our GAIA entry, and it expects to run in the AutoGenBench environment. From memory, I think the fastest way to get up and running is:

git clone git@github.com:microsoft/autogen.git
git checkout gaia_multiagent_v01_march_1st
cd autogen/samples/tools/autogenbench
install -e .
cd scenarios/GAIA
python Scripts/init_tasks.py
autogenbench run Tasks/gaia_validation_level_1__Orchestrator.jsonl

Lines 1-2 clone the repo and select the right branch.
Lines 3-4 navigate to autogenbench and install it.
Lines 5-6 navigate to the GAIA scenario and initialize the tasks (download them from Hugging Face and convert them to our format).
Line 7 runs the benchmark.
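
For anyone scripting these steps, here is a minimal sketch rather than an official installer. It assumes two adjustments to the commands as typed: the install step is `pip install -e .` (as worked out later in this thread), and `git checkout` is run from inside the cloned `autogen` directory. The function names `setup_steps` and `run_setup` are hypothetical.

```python
import subprocess
from pathlib import Path

REPO_URL = "git@github.com:microsoft/autogen.git"
BRANCH = "gaia_multiagent_v01_march_1st"


def setup_steps(root: Path) -> list[tuple[Path, list[str]]]:
    """Return (working directory, command) pairs mirroring the steps above."""
    repo = root / "autogen"
    bench = repo / "samples" / "tools" / "autogenbench"
    gaia = bench / "scenarios" / "GAIA"
    return [
        (root, ["git", "clone", REPO_URL]),
        # Assumption: the checkout has to happen inside the cloned repository.
        (repo, ["git", "checkout", BRANCH]),
        # Assumption: "install -e ." in the original means "pip install -e .".
        (bench, ["pip", "install", "-e", "."]),
        (gaia, ["python", "Scripts/init_tasks.py"]),
        (gaia, ["autogenbench", "run",
                "Tasks/gaia_validation_level_1__Orchestrator.jsonl"]),
    ]


def run_setup(root: Path) -> None:
    """Run each step in order and stop at the first failing command."""
    for cwd, cmd in setup_steps(root):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_setup(Path.cwd())` would execute the whole sequence; Docker still has to be running for the last step.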

AutoGenBench requires Docker to be running.

Now, if you are just trying to use this agent configuration for some other problem, and have no interest in running GAIA, I advise you to still do steps 1-2 (because the template depends on a version of autogen that diverges from main), and install it. Then you can find the testbed_utils file here: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/template/testbed_utils.py

Or just delete the references to it; testbed_utils isn't needed for anything except logging.
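
If editing every call site is inconvenient, one alternative is an import fallback. The sketch below assumes scenario.py only calls functions on `testbed_utils` and ignores their return values; the stub class `_NoOpModule` is hypothetical and not part of AutoGenBench.

```python
import types

try:
    import testbed_utils  # shipped with the AutoGenBench template
except ModuleNotFoundError:

    def _noop(*args, **kwargs):
        """Accept any arguments and do nothing (logging is simply skipped)."""
        return None

    class _NoOpModule(types.ModuleType):
        """Stand-in module: every attribute that is missing resolves to _noop."""

        def __getattr__(self, name):
            return _noop

    testbed_utils = _NoOpModule("testbed_utils")
```

With this at the top of scenario.py, whatever logging helpers the script calls (for example an `init()` or `finalize(...)` call; those names are assumed here) become no-ops when the real module is absent.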

Finally, you will need to run this in an appropriate environment. The Dockerfile we use is:
https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/res/Dockerfile

If you don't use Docker, then install the following requirements:

pip install python-docx pdfminer.six requests pillow easyocr python-pptx SpeechRecognition pandas openpyxl pydub mammoth puremagic youtube_transcript_api==0.6.0

And install the following command line tools

sudo apt-get install ffmpeg exiftool
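
A quick way to see what is still missing outside Docker is sketched below. The mapping from pip package names to import names is filled in from general knowledge of those packages, not from the branch itself, and the helper names are hypothetical.

```python
import importlib.util
import shutil

# pip package name -> importable module name (several differ)
REQUIRED_MODULES = {
    "python-docx": "docx",
    "pdfminer.six": "pdfminer",
    "requests": "requests",
    "pillow": "PIL",
    "easyocr": "easyocr",
    "python-pptx": "pptx",
    "SpeechRecognition": "speech_recognition",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "pydub": "pydub",
    "mammoth": "mammoth",
    "puremagic": "puremagic",
    "youtube_transcript_api": "youtube_transcript_api",
}
REQUIRED_TOOLS = ["ffmpeg", "exiftool"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return pip package names whose module cannot be found for import."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return command line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Printing `missing_modules()` and `missing_tools()` before launching a scenario gives a list of what to install; it does not check versions such as the `youtube_transcript_api==0.6.0` pin.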

I should probably add all this to the readme :)

Our plan is to do some ablation studies, and once we know which components are truly necessary for this entry, we can simplify all this and ship a more turnkey solution.

Oh, and I nearly forgot! Your OAI_CONFIG_LIST should look like this:

[
    {
        "model": "gpt-4-turbo-preview",
        "api_key": "REDACTED",
        "tags": ["llm"],
        "organization": "REDACTED",
        "max_retries": 65535
    },
    {
        "model": "gpt-4-vision-preview",
        "api_key": "REDACTED",
        "tags": ["mlm"],
        "max_tokens": 1000,
        "organization": "REDACTED",
        "max_retries": 65535
    }
]

Note the vision model has tag "mlm", and the language model has tag "llm".

You will also need a Bing API key for the web surfer. It reads this key from the OS environment variable "BING_API_KEY".
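
The configuration requirements above can be sanity-checked before a run. This is only a sketch using the standard library: it checks that some entry carries the "llm" tag, some entry carries the "mlm" tag, and that BING_API_KEY is set. The function name `check_config` is hypothetical and it does not validate the keys themselves.

```python
import json
import os


def check_config(config_text, env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    entries = json.loads(config_text)
    # Collect every tag used by any model entry.
    tags = {tag for entry in entries for tag in entry.get("tags", [])}
    for required in ("llm", "mlm"):
        if required not in tags:
            problems.append(f'no model entry is tagged "{required}"')
    if not env.get("BING_API_KEY"):
        problems.append("BING_API_KEY is not set")
    return problems
```

For example, `check_config(open("OAI_CONFIG_LIST").read())` would report a missing tag or a missing environment variable, assuming the file sits in the current directory.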

afourney · 2024-03-10T16:07:20Z

Let me just echo this point again: Once we've had a chance to do some ablation studies, we will simplify the installation process, and likely merge some components into main for general use. At present, this is all still very experimental.

SimonB97 · 2024-03-10T16:24:59Z

Thank you for the thorough walkthrough!

It's not a problem that it's experimental, I'm just looking into this for research purposes, not production. Still looking forward to your refined version, once that's ready!

I'm going to try the steps you provided for general use and come back to you with any further issues.

SimonB97 · 2024-03-11T21:44:03Z

When I run install -e . in the specified directory, I'm prompted to specify the location of a zip file named nvm-windows:

(gaiaagent) C:\Users\sbene\Projects\AutoGen-tests\gaia_multiagent_v1\autogen\samples\tools\autogenbench>install -e .
Enter the absolute path where the nvm-windows zip file is extracted/copied to:

Does this refer to a version of nvm-windows found here?

EDIT:
I guess this means pip install -e . instead, which worked.

SimonB97 · 2024-03-12T12:39:42Z

I'm facing an issue where the web_surfer agent seems unable to perform queries to bing web search. It claims "I'm going to search for X now ...", but when the orchestrator asks for a summary, it says it didn't perform a query yet, and this happens in a loop.

I'm sure my API key isn't the problem, I've verified that.
Any ideas on what could be the problem here?

...
{
    "content": "Start by researching hallucination detection methods for Large Language Models. Look for academic papers, articles, and books about artificial intelligence and machine learning that discuss this topic. Summarize the findings and provide the sources for reference.",
    "name": "orchestrator",
    "role": "assistant"
},
{
    "content": "Sure, I will start by searching for academic papers and articles on \"hallucination detection methods for Large Language Models\".",
    "name": "web_surfer",
    "role": "assistant"
},
{
    "content": "Please provide a summary of your findings on hallucination detection methods for Large Language Models, along with the sources you used for your research.",
    "name": "orchestrator",
    "role": "assistant"
},
{
    "content": "I apologize for the confusion, but I need to perform the search first. Let's start by searching for \"hallucination detection methods for Large Language Models\" in academic databases and scholarly articles.",
    "name": "web_surfer",
    "role": "assistant"
},
{
    "content": "Please provide a summary of your findings on hallucination detection methods for Large Language Models, along with the sources you used for your research.",
    "name": "orchestrator",
    "role": "assistant"
},
{
    "content": "I apologize for the confusion earlier. Let's start by searching for \"hallucination detection methods for Large Language Models\" in academic databases and scholarly articles.",
    "name": "web_surfer",
    "role": "assistant"
},
...
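
To confirm a loop like this from a saved message log, a small check for verbatim repeats can help. The sketch assumes the log is a list of dicts with "name" and "content" keys, as in the excerpt above; the function name `repeated_messages` is hypothetical.

```python
from collections import Counter


def repeated_messages(messages, min_repeats=2):
    """Map (agent name, content) to its count for messages sent verbatim
    at least min_repeats times."""
    counts = Counter((m["name"], m["content"]) for m in messages)
    return {key: n for key, n in counts.items() if n >= min_repeats}
```

In the excerpt, the orchestrator's request for a summary appears twice word for word, so it would be reported, while the web_surfer replies differ slightly and would not.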

afourney · 2024-03-14T20:31:28Z

That is weird. Can you share the console_log? It has more info.

franciscoabenza · 2024-04-19T21:35:13Z

I am stuck:
ModuleNotFoundError: No module named 'autogen.mdconvert'

afourney · 2024-04-19T23:58:14Z

mdconvert is a file that only exists in the gaia branch. Can you please provide more details about how you've set things up?

franciscoabenza · 2024-04-20T14:23:45Z

I just duplicated the file into scope. Now it all works.
I can see that the code is very much oriented toward running the tasks in that jsonl format. I am going to get rid of the "FILE_NAME" thingy. Are there any considerations for auto-selecting which template (BasicTwoAgents, BasicTwoAgentsFunctionCalling, Orchestrator, SocietyOfMind) is most appropriate for a given task? When running the GAIA benchmark, did you pick the best result out of each template? How was this handled? How much did it cost to run the whole thing 😵‍💫?

So many questions haha, perhaps I can catch you in the next Discord AutoGen call :P

Quoting afourney: "Let me just echo this point again: Once we've had a chance to do some ablation studies, we will simplify the installation process, and likely merge some components into main for general use. At present, this is all still very experimental."

Waiting eagerly

zhanwenchen · 2024-05-09T13:28:54Z

@afourney What should I do next? My guess was to run collate_results.py under Scripts, but it couldn't capture the answers:

(base) zhanwen@mini:~/.../scenarios/GAIA$ python Scripts/collate_results.py Results
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "27d5d136-8563-469e-92bf-fd103c28b57c",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "ec09fa32-d03f-4bf8-84b0-1f16922c3ae4",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "2d83110e-a098-4ebb-9987-066c06fa42d0",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "42576abe-0deb-4869-8c63-225c2d75a95a",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "cca530fc-4052-43b2-b130-b30968d8aa44",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "72e110e7-464c-453c-a309-90a95aed6538",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "8e867cd7-cff9-4e6c-867a-ff5ddc2550be",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "46719c30-f4c3-4cad-be07-d5cb21eee6bb",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "4b6bb5f7-f634-410e-815d-e673ab7f8632",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "dc28cf18-6431-458b-83ef-64b3ce566c10",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "5d0080cb-90d7-4712-bc33-848150e917d3",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "b415aba4-4b68-4fc6-9b89-2c812e55a3e1",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "e1fc63a2-da7a-432f-be78-7c4a95598703",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "a1e91b78-d3d8-4675-bb8d-62741b4b68a6",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "5cfb274c-0207-4aa7-9575-6ac0bd95d9b2",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "b816bfce-3d80-4913-a07d-69b752ce6377",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}

liususan091219 · 2024-05-29T21:12:36Z

Hi @afourney! I followed these steps and created a Docker image from the Dockerfile, but I'm stuck at this line: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/run_cmd.py#L487 with the following output:

===================================================================
sh: 0: cannot open run.sh: No such file
Running scenario Results/gaia_validation_level_1__Orchestrator/5a0c1adf-205e-4841-a666-7c3ef95def9d/0
Mounting:
[RW]	'Results/gaia_validation_level_1__Orchestrator/5a0c1adf-205e-4841-a666-7c3ef95def9d/0' => '/workspace'
[RW]	'/data/xliu127/projects/security/vulnerability/nday/autogen' => '/autogen'
===================================================================

Any chance you know how to solve this? Could you possibly upload your docker image? Thanks in advance 🙏

rysweet · 2024-10-18T20:56:19Z

@afourney - do we intend to fix?

SimonB97 added the bug label Mar 10, 2024

jackgerrits assigned afourney Mar 10, 2024

jackgerrits added the proj-autogenbench Issues related to AutoGenBench. label Mar 10, 2024

afourney mentioned this issue May 5, 2024

[Issue]: How to Repro GAIA Benchmark Results #2592

Open

liususan091219 mentioned this issue May 29, 2024

[Issue]: testbed_utils #2629

Open

rysweet added 0.2 Issues which were filed before re-arch to 0.4 needs-triage labels Oct 2, 2024

rysweet removed the needs-triage label Oct 18, 2024

fniedtner removed the bug label Oct 24, 2024
