[Bug]: No module named 'testbed_utils' (GAIA MultiAgent v0.1) #1943

Open
SimonB97 opened this issue Mar 10, 2024 · 12 comments
Labels
0.2 Issues which were filed before re-arch to 0.4 proj-autogenbench Issues related to AutoGenBench.

Comments

@SimonB97

Describe the bug

Hello, I'm trying to get the recently published MultiAgent workflow to run, but the readme is sparse, and I can't import a module named 'testbed_utils', which I presume is a custom module.

When running scenario.py, I get this error back:

ModuleNotFoundError: No module named 'testbed_utils'

I have installed the included requirements, but this module is not listed.

Steps to reproduce

  1. Clone the Multi Agent workflow (see link in description)
  2. Create a venv with python==3.10 (I'm using conda)
  3. Install requirements.txt
  4. Run python scenario.py

Model Used

No response

Expected Behavior

The agent workflow should run (based on prompt.txt)

Screenshots and logs

No response

Additional Information

Windows 11
pyautogen==0.2.17
Python 3.10.13

@SimonB97 SimonB97 added the bug label Mar 10, 2024
@jackgerrits jackgerrits added the proj-autogenbench Issues related to AutoGenBench. label Mar 10, 2024
@afourney
Member

afourney commented Mar 10, 2024

Thanks,

So that's our GAIA entry, and it expects to run in the AutoGenBench environment. From memory, I think the fastest way to get up and running is:

git clone git@github.com:microsoft/autogen.git
git checkout gaia_multiagent_v01_march_1st
cd autogen/samples/tools/autogenbench
install -e .
cd scenarios/GAIA
python Scripts/init_tasks.py
autogenbench run Tasks/gaia_validation_level_1__Orchestrator.jsonl

Lines 1-2 clone the repo and select the right branch.
Lines 3-4 navigate to autogenbench and install it.
Lines 5-6 navigate to the GAIA scenario and initialize the tasks (download them from Hugging Face and convert them to our format).
Line 7 runs the benchmark.

AutoGenBench requires Docker to be running.
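
The same sequence can be sketched as a small Python script. This is only an illustration: it assumes the checkout has to run inside the freshly cloned autogen directory, and it uses pip install -e . for the install step, which a later comment in this thread reports as the form that worked. The helper names build_steps and run_steps are made up for this sketch.

```python
import subprocess
from pathlib import Path

BRANCH = "gaia_multiagent_v01_march_1st"
REPO_URL = "git@github.com:microsoft/autogen.git"


def build_steps(workdir="."):
    """Return (cwd, argv) pairs mirroring the commands listed above."""
    root = Path(workdir)
    repo = root / "autogen"
    bench = repo / "samples" / "tools" / "autogenbench"
    gaia = bench / "scenarios" / "GAIA"
    return [
        (root, ["git", "clone", REPO_URL]),
        # Assumption: the checkout must run inside the cloned repository.
        (repo, ["git", "checkout", BRANCH]),
        # pip install -e ., not a bare `install -e .`.
        (bench, ["pip", "install", "-e", "."]),
        (gaia, ["python", "Scripts/init_tasks.py"]),
        (gaia, ["autogenbench", "run",
                "Tasks/gaia_validation_level_1__Orchestrator.jsonl"]),
    ]


def run_steps(steps):
    """Run each step in order and stop at the first failing command."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_steps(build_steps()) would execute the commands from the current directory; Docker still has to be running for the last step.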

Now, if you are just trying to use this agent configuration for some other problem, and have no interest in running GAIA, I advise you to still do steps 1-2 (because the template depends on a version of autogen that diverges from main), and install it. Then you can find the testbed_utils file here: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/template/testbed_utils.py

Or, just delete the references to it -- testbed_utils isn't needed for anything except logging.
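
If you would rather not edit every call site, one possible alternative is to fall back to a do-nothing stand-in when the module is missing. This is a minimal sketch; it assumes the scenario only calls testbed_utils.init() and testbed_utils.finalize(...), which should be checked against scenario.py. The helper name load_testbed_utils is hypothetical.

```python
import importlib
import types


def load_testbed_utils():
    """Return the real testbed_utils module if importable, else a no-op stand-in."""
    try:
        return importlib.import_module("testbed_utils")
    except ModuleNotFoundError:
        # Assumption: only init() and finalize(...) are used by the scenario.
        return types.SimpleNamespace(
            init=lambda *args, **kwargs: None,
            finalize=lambda *args, **kwargs: None,
        )


# Use this in place of `import testbed_utils` at the top of scenario.py.
testbed_utils = load_testbed_utils()
```

With the stand-in, the logging calls become no-ops and the rest of the scenario runs unchanged.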

Finally, you will need to run this in an appropriate environment. The Dockerfile we use is:
https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/res/Dockerfile

If you don't use Docker, then install the following requirements:

pip install python-docx pdfminer.six requests pillow easyocr python-pptx SpeechRecognition pandas openpyxl pydub mammoth puremagic youtube_transcript_api==0.6.0

And install the following command-line tools:

sudo apt-get install ffmpeg exiftool
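
A quick way to confirm that a non-Docker environment has everything listed above is a small preflight check. The sketch below is an illustration only; the mapping from pip package names to import names is supplied here (several packages import under a different name), and the function names are made up.

```python
import importlib.util
import shutil

# pip package name -> top-level import name.
PACKAGES = {
    "python-docx": "docx",
    "pdfminer.six": "pdfminer",
    "requests": "requests",
    "pillow": "PIL",
    "easyocr": "easyocr",
    "python-pptx": "pptx",
    "SpeechRecognition": "speech_recognition",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "pydub": "pydub",
    "mammoth": "mammoth",
    "puremagic": "puremagic",
    "youtube_transcript_api": "youtube_transcript_api",
}
TOOLS = ["ffmpeg", "exiftool"]


def missing_packages(packages=PACKAGES):
    """Return pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


def missing_tools(tools=TOOLS):
    """Return command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Printing missing_packages() and missing_tools() before the first run shows what still needs installing. Note that this does not check the pinned version of youtube_transcript_api.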

I should probably add all this to the readme :)

Our plan is to do some ablation studies, and once we know which components are truly necessary for this entry, we can simplify all this and ship a more turnkey solution.

Oh, and I nearly forgot! Your OAI_CONFIG_LIST should look like this:

[
    {
        "model": "gpt-4-turbo-preview",
        "api_key": "REDACTED",
        "tags": ["llm"],
        "organization": "REDACTED",
        "max_retries": 65535
    },
    {
        "model": "gpt-4-vision-preview",
        "api_key": "REDACTED",
        "tags": ["mlm"],
        "max_tokens": 1000,
        "organization": "REDACTED",
        "max_retries": 65535
    }
]

Note the vision model has tag "mlm", and the language model has tag "llm".

You will also need a Bing API key for the web surfer. It reads this key from the OS environment variable "BING_API_KEY".
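
The two requirements above (one entry tagged "llm", one tagged "mlm", plus BING_API_KEY in the environment) can be checked before a run. This is a minimal sketch that reads the config as plain JSON rather than through autogen's own loaders; check_setup is a hypothetical helper name.

```python
import json
import os


def check_setup(config_text, environ=os.environ):
    """Return a list of problems; an empty list means the setup looks complete."""
    problems = []
    entries = json.loads(config_text)
    # Collect every tag that appears on any config entry.
    tags = {tag for entry in entries for tag in entry.get("tags", [])}
    for required in ("llm", "mlm"):
        if required not in tags:
            problems.append(f"no config entry tagged '{required}'")
    if not environ.get("BING_API_KEY"):
        problems.append("BING_API_KEY is not set")
    return problems
```

For example, check_setup(open("OAI_CONFIG_LIST").read()) returns the missing pieces, assuming the file is in the current directory.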

@afourney
Member

Let me just echo this point again: Once we've had a chance to do some ablation studies, we will simplify the installation process, and likely merge some components into main for general use. At present, this is all still very experimental.

@SimonB97
Author

Thank you for the thorough walkthrough!

It's not a problem that it's experimental, I'm just looking into this for research purposes, not production. Still looking forward to your refined version, once that's ready!

I'm going to try the steps you provided for general use and come back to you with any further issues.

@SimonB97
Author

SimonB97 commented Mar 11, 2024

When I run install -e . in the specified directory, I'm prompted to specify the location of a zip file named nvm-windows:

(gaiaagent) C:\Users\sbene\Projects\AutoGen-tests\gaia_multiagent_v1\autogen\samples\tools\autogenbench>install -e .
Enter the absolute path where the nvm-windows zip file is extracted/copied to:

Does this refer to a version of nvm-windows found here?

EDIT:
I guess this means pip install -e . instead, which worked.

@SimonB97
Author

I'm facing an issue where the web_surfer agent seems unable to perform queries to Bing web search. It claims "I'm going to search for X now ...", but when the orchestrator asks for a summary, it says it hasn't performed a query yet, and this repeats in a loop.

I'm sure my API key isn't the problem; I've verified that.
Any ideas on what could be the problem here?

...
{
    "content": "Start by researching hallucination detection methods for Large Language Models. Look for academic papers, articles, and books about artificial intelligence and machine learning that discuss this topic. Summarize the findings and provide the sources for reference.",
    "name": "orchestrator",
    "role": "assistant"
},
{
    "content": "Sure, I will start by searching for academic papers and articles on \"hallucination detection methods for Large Language Models\".",
    "name": "web_surfer",
    "role": "assistant"
},
{
    "content": "Please provide a summary of your findings on hallucination detection methods for Large Language Models, along with the sources you used for your research.",
    "name": "orchestrator",
    "role": "assistant"
},
{
    "content": "I apologize for the confusion, but I need to perform the search first. Let's start by searching for \"hallucination detection methods for Large Language Models\" in academic databases and scholarly articles.",
    "name": "web_surfer",
    "role": "assistant"
},
{
    "content": "Please provide a summary of your findings on hallucination detection methods for Large Language Models, along with the sources you used for your research.",
    "name": "orchestrator",
    "role": "assistant"
},
{
    "content": "I apologize for the confusion earlier. Let's start by searching for \"hallucination detection methods for Large Language Models\" in academic databases and scholarly articles.",
    "name": "web_surfer",
    "role": "assistant"
},
...
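
A loop like the one in this excerpt can be spotted automatically in a saved message list. The sketch below is one possible diagnostic and does not explain the cause; it assumes messages are dicts with "name" and "content" keys as shown above, and longest_repeat is a made-up helper name.

```python
def longest_repeat(messages, speaker):
    """Length of the longest run of identical consecutive messages from one speaker."""
    best = 0
    run = 0
    previous = None
    for message in messages:
        if message.get("name") != speaker:
            continue  # only compare this speaker's own messages
        content = message.get("content")
        if run and content == previous:
            run += 1
        else:
            run = 1
            previous = content
        best = max(best, run)
    return best
```

A value of 2 or more for "orchestrator" means the same request was issued repeatedly without the conversation moving forward.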

@afourney
Member

That is weird. Can you share the console_log? It has more info.

@franciscoabenza

I am stuck:
ModuleNotFoundError: No module named 'autogen.mdconvert'

@afourney
Member

I am stuck: ModuleNotFoundError: No module named 'autogen.mdconvert'

mdconvert is a file that only exists in the gaia branch. Can you please provide more details about how you've set things up?

@franciscoabenza

franciscoabenza commented Apr 20, 2024

I just duplicated the file into scope. Now it all works.
I can see that the code is very oriented toward running the tasks in that jsonl format. I am going to get rid of the "FILE_NAME" thingy. Are there any considerations for auto-selecting which template (BasicTwoAgents, BasicTwoAgentsFunctionCalling, Orchestrator, SocietyOfMind) is most appropriate for a given task? When running the GAIA benchmark, did you pick the best result out of each template? How was this handled? How much did it cost to run the whole thing 😵‍💫?

So many questions, haha. Perhaps I can catch you in the next Discord AutoGen call :P

Let me just echo this point again: Once we've had a chance to do some ablation studies, we will simplify the installation process, and likely merge some components into main for general use. At present, this is all still very experimental.

Waiting eagerly

@zhanwenchen

@afourney What should I do next? I guessed I should run collate_results.py under Scripts, but it couldn't capture the answers:

(base) zhanwen@mini:~/.../scenarios/GAIA$ python Scripts/collate_results.py Results
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "27d5d136-8563-469e-92bf-fd103c28b57c",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "ec09fa32-d03f-4bf8-84b0-1f16922c3ae4",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "2d83110e-a098-4ebb-9987-066c06fa42d0",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "42576abe-0deb-4869-8c63-225c2d75a95a",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "cca530fc-4052-43b2-b130-b30968d8aa44",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "72e110e7-464c-453c-a309-90a95aed6538",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "8e867cd7-cff9-4e6c-867a-ff5ddc2550be",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "46719c30-f4c3-4cad-be07-d5cb21eee6bb",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "4b6bb5f7-f634-410e-815d-e673ab7f8632",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "dc28cf18-6431-458b-83ef-64b3ce566c10",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "5d0080cb-90d7-4712-bc33-848150e917d3",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "b415aba4-4b68-4fc6-9b89-2c812e55a3e1",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "e1fc63a2-da7a-432f-be78-7c4a95598703",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "a1e91b78-d3d8-4675-bb8d-62741b4b68a6",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "5cfb274c-0207-4aa7-9575-6ac0bd95d9b2",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
{
    "task_id": "gaia_validation_level_1__Orchestrator",
    "trial": "b816bfce-3d80-4913-a07d-69b752ce6377",
    "question": null,
    "is_correct": false,
    "model_answer": "",
    "expected_answer": "NOT PROVIDED !#!#",
    "reasoning_trace": []
}
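
The output above is a stream of JSON objects printed back to back, not a single JSON document, so json.loads on the whole text fails. A minimal sketch for reading such a stream and counting trials with no captured answer follows; it assumes the field names shown above, and the helper names are invented for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between consecutive objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize(text):
    """Count trials, and how many have neither an answer nor a reasoning trace."""
    records = list(iter_json_objects(text))
    empty = sum(
        1
        for record in records
        if not record.get("model_answer") and not record.get("reasoning_trace")
    )
    return {"trials": len(records), "empty": empty}
```

When every trial comes back empty, as here, the individual trial folders are worth inspecting before looking at the collation script itself.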

@liususan091219
Contributor

liususan091219 commented May 29, 2024

Hi @afourney! I followed these steps and created a Docker image from the Dockerfile, but I'm stuck at this line: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/run_cmd.py#L487 with the following output:

===================================================================
sh: 0: cannot open run.sh: No such file
Running scenario Results/gaia_validation_level_1__Orchestrator/5a0c1adf-205e-4841-a666-7c3ef95def9d/0
Mounting:
[RW]	'Results/gaia_validation_level_1__Orchestrator/5a0c1adf-205e-4841-a666-7c3ef95def9d/0' => '/workspace'
[RW]	'/data/xliu127/projects/security/vulnerability/nday/autogen' => '/autogen'
===================================================================

Any chance you know how to solve this? Could you possibly upload your docker image? Thanks in advance 🙏
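
The log does not say why run.sh is missing. One way to narrow it down, assuming run.sh is expected to exist in the host folder that gets mounted at /workspace, is to list the trial folders that lack it. The folder layout Results/<task>/<instance>/<n> is taken from the log above; trials_missing_file is a hypothetical helper.

```python
from pathlib import Path


def trials_missing_file(results_dir, filename="run.sh"):
    """Return trial folders (relative to results_dir) that do not contain `filename`."""
    root = Path(results_dir)
    return sorted(
        str(folder.relative_to(root))
        for folder in root.glob("*/*/*")
        if folder.is_dir() and not (folder / filename).is_file()
    )
```

If the file is present on the host but still not found in the container, the relative mount path shown in the log would be the next thing to check.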

@rysweet rysweet added 0.2 Issues which were filed before re-arch to 0.4 needs-triage labels Oct 2, 2024
@rysweet
Collaborator

rysweet commented Oct 18, 2024

@afourney - do we intend to fix?

@fniedtner fniedtner removed the bug label Oct 24, 2024
8 participants