# AutoBuild
AutoGen offers conversable agents powered by LLMs, tools, or humans, which can collectively perform tasks via automated chat. The framework supports tool use and human participation through multi-agent conversation.
Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).

In this notebook, we introduce a new class, `AgentBuilder`, that helps users build an automatic task-solving process powered by a multi-agent system. The main building pipeline consists of `build()` and `start()`. In `build()`, we prompt an LLM to create multiple participant agents, initialize a group chat, and decide whether the task needs programming to solve. After that, the user can call `start()` at a proper time to complete the task. AutoBuild also supports open-source LLMs via [vLLM](https://docs.vllm.ai/en/latest/index.html) and [FastChat](https://github.com/lm-sys/FastChat). Check the supported model list [here](https://docs.vllm.ai/en/latest/models/supported_models.html).

## Requirement

AutoBuild requires the latest version of AutoGen.
You can install AutoGen with the following command:

In [None]:
!pip install pyautogen

## Step 1: prepare configuration
Prepare a `config_path` for the assistant agents to limit the choice of LLMs you want to use in this task. This config can be a path to a JSON file or the name of an environment variable. A `default_llm_config` is also required to initialize LLM-specific settings such as seed and temperature.

In [1]:
config_path = 'YOUR CONFIG PATH'  # modify path
default_llm_config = {
    'temperature': 0
}
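
The two forms that `config_path` can take (a JSON file or an environment variable) can be illustrated with a small standard-library sketch. The helper `load_config_list`, the variable name `DEMO_CONFIG_LIST`, and the file name are hypothetical and exist only for this illustration; this is not AutoGen's own loading code, and the API key is a placeholder.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_config_list(source: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: read a config list from a JSON file path
    or from an environment variable that holds a JSON string."""
    if os.path.isfile(source):
        with open(source, "r", encoding="utf-8") as f:
            return json.load(f)
    if source in os.environ:
        return json.loads(os.environ[source])
    raise FileNotFoundError(
        f"{source!r} is neither a file nor an environment variable"
    )


# Placeholder entry; the model name matches the one used later in this notebook.
example = [{"model": "gpt-4-1106-preview", "api_key": "YOUR_API_KEY"}]

# Form 1: a path to a JSON file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "OAI_CONFIG_LIST.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    from_file = load_config_list(path)

# Form 2: the name of an environment variable holding the same JSON.
os.environ["DEMO_CONFIG_LIST"] = json.dumps(example)
from_env = load_config_list("DEMO_CONFIG_LIST")

print(from_file == from_env)
```

Either form yields the same list of model entries, which is what limits the LLMs available to the agents.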

## Step 2: create an AgentBuilder
Create an `AgentBuilder` with the specified `config_path`. By default, `AgentBuilder` uses GPT-4 to complete the whole process; you can change `builder_model` to another OpenAI model if you want. You can also specify an OpenAI or open-source LLM as the agent backbone; see the blog for more details.

In [3]:
from autogen.agentchat.contrib.agent_builder import AgentBuilder

builder = AgentBuilder(config_path=config_path, builder_model='gpt-4-1106-preview', agent_model='gpt-4-1106-preview')

## Step 3: specify a building task

Specify a building task with a general description. The building task helps the build manager (an LLM) decide which agents should be built.

In [4]:
building_task = "Find a paper on arxiv by programming, and analysis its application in some domain. For example, find a latest paper about gpt-4 on arxiv and find its potential applications in software."

## Step 4: build group chat agents
Use `build()` to let the build manager (the specified `builder_model`) generate the group chat agents. If you think coding is necessary for your task, you can pass `coding=True` to add a user proxy (an automatic code interpreter) to the agent list, like this:
```python
builder.build(building_task, default_llm_config, coding=True)
```
If `coding` is not specified, `AgentBuilder` will determine on its own, based on the task, whether the user proxy should be added.

In [4]:
builder.build(building_task, default_llm_config)

Generating agents...
Data_scientist,Research_analyst,Software_developer are generated.
Preparing configuration for Data_scientist...
Preparing configuration for Research_analyst...
Preparing configuration for Software_developer...
Creating agent Data_scientist with backbone gpt-4-1106-preview...
Creating agent Research_analyst with backbone gpt-4-1106-preview...
Creating agent Software_developer with backbone gpt-4-1106-preview...
Adding user console proxy...
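
The optional `coding` argument described above behaves like a three-state flag: explicitly on, explicitly off, or left unset so that the builder decides. A minimal sketch of that pattern follows. The names `resolve_coding` and `keyword_decider` are hypothetical, and the keyword check is only a stand-in for the LLM-based decision that the builder makes; it is not how `AgentBuilder` is implemented.

```python
from typing import Callable, Optional


def resolve_coding(
    coding: Optional[bool],
    decide: Callable[[str], bool],
    task: str,
) -> bool:
    """Return the explicit flag when given; otherwise defer to `decide`."""
    if coding is None:
        return decide(task)
    return coding


def keyword_decider(task: str) -> bool:
    # Stand-in for the LLM call: a crude keyword check, for illustration only.
    return any(word in task.lower() for word in ("program", "code", "script"))


task = "Find a paper on arxiv by programming"

print(resolve_coding(True, keyword_decider, task))   # explicit: always add the proxy
print(resolve_coding(False, keyword_decider, task))  # explicit: never add the proxy
print(resolve_coding(None, keyword_decider, task))   # unset: let the decider choose
```

Using `None` as the default keeps "not specified" distinct from `False`, so an explicit user choice is never overridden.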


## Step 5: execute task
Use agents generated in `build()` to complete the task collaboratively in a group chat.

In [6]:
builder.start(task="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.")

User_console_and_Python_code_interpreter (to chat_manager):

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------
Software_developer (to chat_manager):

To accomplish this task, we will follow these steps:

1. **Search arXiv for the latest papers on GPT-4**: We will use the `arxiv` Python library to search for the most recent papers related to GPT-4. If the `arxiv` library is not installed, you will need to install it using `pip install arxiv`.

2. **Extract Information**: From the search results, we will extract the title, authors, summary, and publication date of the latest paper.

3. **Analyze for Potential Applications in Software Development**: We will read through the summary to identify mentions of potential applications in software development.

4. **Present Findings**: We will print out the relevant information.

Let's start with step 1. Here

execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change


User_console_and_Python_code_interpreter (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Title: Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Authors: Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
Abstract: In the field of document understanding, significant advances have been made
in the fine-tuning of Multimodal Large Language Models (MLLMs) with
instruction-following data. Nevertheless, the potential of text-grounding
capability within text-rich scenarios remains underexplored. In this paper, we
present a text-grounding document understanding model, termed TGDoc, which
addresses this deficiency by enhancing MLLMs with the ability to discern the
spatial positioning of text within images. Empirical evidence suggests that
text-grounding improves the model's interpretation of textual content, thereby
elevating its proficiency in comprehending text-rich images. Specifically, we
compile a dataset containing

## Step 6 (Optional): clear all agents and prepare for the next task
You can clear all agents generated in this task with the following code if your task is completed or the next task is largely different from the current one. If an agent's backbone is an open-source LLM, this process will also shut down the endpoint server. If necessary, you can use `recycle_endpoint=False` to retain the previous open-source LLM's endpoint server.

In [7]:
builder.clear_all_agents(recycle_endpoint=True)

## Save & load configs

You can save all necessary information about the built group chat agents. Here is an example for the agents generated in the task above:
```json
{
    "building_task": "Find a paper on arxiv by programming, and analysis its application in some domain. For example, find a latest paper about gpt-4 on arxiv and find its potential applications in software.",
    "agent_configs": [
        {
            "name": "Data_scientist",
            "model": "gpt-4-1106-preview",
            "system_message": "As a Data Scientist, you are tasked with automating the retrieval and analysis of academic papers from arXiv. Utilize your Python programming acumen to develop scripts for gathering necessary information such as searching for relevant papers, downloading them, and processing their contents. Apply your analytical and language skills to interpret the data and deduce the applications of the research within specific domains.\n\n1. To compile information, write and implement Python scripts that search and interact with online resources, download and read files, extract content from documents, and perform other information-gathering tasks. Use the printed output as the foundation for your subsequent analysis.\n\n2. Execute tasks programmatically with Python scripts when possible, ensuring results are directly displayed. Approach each task with efficiency and strategic thinking.\n\nProgress through tasks systematically. In instances where a strategy is not provided, outline your plan before executing. Clearly distinguish between tasks handled via code and those utilizing your analytical expertise.\n\nWhen providing code, include only Python scripts meant to be run without user alterations. Users should execute your script as is, without modifications:\n\n```python\n# filename: <filename>\n# Python script\nprint(\"Your output\")\n```\n\nUsers should not perform any actions other than running the scripts you provide. Avoid presenting partial or incomplete scripts that require user adjustments. Refrain from requesting users to copy-paste results; instead, use the 'print' function when suitable to display outputs. Monitor the execution results they share.\n\nIf an error surfaces, supply corrected scripts for a re-run. If the strategy fails to resolve the issue, reassess your assumptions, gather additional details as needed, and explore alternative approaches.\n\nUpon successful completion of a task and verification of the results, confirm the achievement of the stated objective. Ensuring accuracy and validity of the findings is paramount. Evidence supporting your conclusions should be provided when feasible.\n\nUpon satisfying the user's needs and ensuring all tasks are finalized, conclude your assistance with \"TERMINATE\"."
        },
        {
            "name": "Research_analyst",
            "model": "gpt-4-1106-preview",
            "system_message": "As a Research Analyst, you are expected to be a proficient AI assistant possessing a strong grasp of programming, specifically in Python, and robust analytical capabilities. Your primary responsibilities will include:\n\n1. Conducting comprehensive searches and retrieving information autonomously through Python scripts, such as querying databases, accessing web services (like arXiv), downloading and reading files, and retrieving system information.\n2. Analyzing the content of the retrieved documents, particularly academic papers, and extracting insights regarding their application in specific domains, such as the potential uses of GPT-4 in software development.\n3. Presenting your findings in a clear, detailed manner, explaining the implications of the research and its relevance to the assigned task.\n4. Employing your programming skills to automate tasks where possible, ensuring the output is delivered through Python code with clear, executable instructions. Your code will be designed for the user to execute without amendment or additional input.\n5. Verifying the results of information gathering and analysis to ensure accuracy and completeness, providing evidence to support your conclusions when available.\n6. Communicating the completion of each task and confirming that the user's needs have been satisfied through a clear and conclusive statement, followed by the word \"TERMINATE\" to signal the end of the interaction."
        },
        {
            "name": "Software_developer",
            "model": "gpt-4-1106-preview",
            "system_message": "As a dedicated AI assistant for a software developer, your role involves employing your Python programming prowess and proficiency in natural language processing to facilitate the discovery and analysis of scholarly articles on arXiv. Your tasks include crafting Python scripts to automatically search, retrieve, and present information regarding the latest research, with a focus on applicable advancements in technology such as GPT-4 and its potential impact on the domain of software development.\n\n1. Utilize Python to programmatically seek out and extract pertinent data, for example, navigating or probing the web, downloading/ingesting documents, or showcasing content from web pages or files. When enough information has been accumulated to proceed, you will then analyze and interpret the findings.\n\n2. When there's a need to perform an operation programmatically, your Python code should accomplish the task and manifest the outcome. Progress through the task incrementally and systematically.\n\nProvide a clear plan outlining each stage of the task, specifying which components will be executed through Python coding and which through your linguistic capabilities. When proposing Python code, remember to:\n\n- Label the script type within the code block\n- Avoid suggesting code that the user would need to alter\n- Refrain from including more than one code block in your response\n- Circumvent requesting the user to manually transcribe any results; utilize 'print' statements where applicable\n- Examine the user's reported execution outcomes\n\nIf an error arises, your responsibility is to rectify the issue and submit the corrected script. Should an error remain unresolvable, or if the task remains incomplete post successful code execution, re-evaluate the scenario, gather any further required information, and formulate an alternative approach.\n\nUpon confirming that the task has been satisfactorily accomplished and the user's requirements have been met, indicate closure of the procedure with a concluding statement."
        }
    ],
    "manager_system_message": "Group chat manager.",
    "coding": true,
    "default_llm_config": {
        "temperature": 0
    }
}
```
This information will be saved in JSON format. You can provide a specific filename; otherwise, `AgentBuilder` will save the config to the current path with a generated filename of the form `save_config_TASK_MD5.json`.

In [8]:
saved_path = builder.save()

Building config saved to ./save_config_eb1be857faa608aeb4c5af11fe4ab245.json
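
A filename of the form `save_config_TASK_MD5.json` can be derived from a task description with the standard `hashlib` module. The sketch below assumes that the hashed input is the building task text encoded as UTF-8; that input, and the helper name `default_save_path`, are assumptions made for illustration rather than confirmed details of `AgentBuilder.save()`.

```python
import hashlib


def default_save_path(building_task: str, directory: str = ".") -> str:
    """Hypothetical helper: build a save path from the MD5 of the task text."""
    # Assumption: the digest is computed over the UTF-8 bytes of the task.
    digest = hashlib.md5(building_task.encode("utf-8")).hexdigest()
    return f"{directory}/save_config_{digest}.json"


path_a = default_save_path("Find a paper on arxiv by programming.")
path_b = default_save_path("Summarize a news article.")

print(path_a)
print(path_a != path_b)  # different tasks map to different filenames
```

Because the name depends only on the task text, saving the same task twice targets the same file, while different tasks do not collide in practice.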


After that, you can load the saved config and skip the building process. `AgentBuilder` will create agents from that information without prompting the build manager.

In [9]:
new_builder = AgentBuilder(config_path=config_path).load(saved_path)
new_builder.start(task="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.")
new_builder.clear_all_agents()

User_console_and_Python_code_interpreter (to chat_manager):

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------
Software_developer (to chat_manager):

To accomplish this task, we will follow these steps:

1. **Search arXiv for the latest papers on GPT-4**: We will use the `arxiv` Python library to search for the most recent papers related to GPT-4. If the `arxiv` library is not installed, you will need to install it using `pip install arxiv`.

2. **Extract Information**: From the search results, we will extract the title, authors, summary, and publication date of the latest paper.

3. **Analyze for Potential Applications in Software Development**: We will read through the summary to identify mentions of potential applications in software development.

4. **Present Findings**: We will print out the relevant information.

Let's start with step 1. Here

execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change


User_console_and_Python_code_interpreter (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Title: Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Authors: Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
Abstract: In the field of document understanding, significant advances have been made
in the fine-tuning of Multimodal Large Language Models (MLLMs) with
instruction-following data. Nevertheless, the potential of text-grounding
capability within text-rich scenarios remains underexplored. In this paper, we
present a text-grounding document understanding model, termed TGDoc, which
addresses this deficiency by enhancing MLLMs with the ability to discern the
spatial positioning of text within images. Empirical evidence suggests that
text-grounding improves the model's interpretation of textual content, thereby
elevating its proficiency in comprehending text-rich images. Specifically, we
compile a dataset containing
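
Because the saved configuration is plain JSON, it can be inspected with the standard library before it is handed to `load()`. The following sketch uses an abbreviated stand-in for the saved file (the system messages are shortened placeholders), and the helper `summarize_config` together with its set of expected keys is an assumption based on the example shown earlier, not a schema published by AutoGen.

```python
import json
from typing import Any, Dict, List

# Abbreviated stand-in for a saved config; system messages are placeholders.
SAVED_TEXT = """
{
    "building_task": "Find a paper on arxiv by programming.",
    "agent_configs": [
        {"name": "Data_scientist", "model": "gpt-4-1106-preview",
         "system_message": "(abbreviated)"},
        {"name": "Research_analyst", "model": "gpt-4-1106-preview",
         "system_message": "(abbreviated)"},
        {"name": "Software_developer", "model": "gpt-4-1106-preview",
         "system_message": "(abbreviated)"}
    ],
    "manager_system_message": "Group chat manager.",
    "coding": true,
    "default_llm_config": {"temperature": 0}
}
"""

# Assumption: these top-level keys are the ones seen in the example above.
EXPECTED_KEYS = {"building_task", "agent_configs", "coding", "default_llm_config"}


def summarize_config(config: Dict[str, Any]) -> List[str]:
    """Check the expected keys and return one 'name (model)' line per agent."""
    missing = EXPECTED_KEYS - set(config)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return [f'{agent["name"]} ({agent["model"]})' for agent in config["agent_configs"]]


config = json.loads(SAVED_TEXT)
summary = summarize_config(config)
for line in summary:
    print(line)
print("coding enabled:", config["coding"])
```

A check like this makes it easy to confirm which agents and backbones a saved file will recreate before starting a new chat.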

## Use GPTs

[GPTs](https://openai.com/blog/introducing-gpts) allow users to create an assistant from a simple instruction describing the task. They support plugins that let ChatGPT complete complex instructions, and can optionally update the assistant's instructions to adapt it to a new task or improve on the current one.
AutoBuild also supports the GPTs API: add `use_gpts=True` to the `build()` call.

In [11]:
new_builder = AgentBuilder(config_path=config_path)
new_builder.build(building_task, default_llm_config, use_gpts=True)  # Transfer to GPTs API.
new_builder.start(task="Find a latest paper about gpt-4 on arxiv and find its potential applications in software.")
new_builder.clear_all_agents()

assistant_id was None, creating a new assistant


Generating agents...
Data_scientist,Research_analyst,Software_developer are generated.
Preparing configuration for Data_scientist...
Preparing configuration for Research_analyst...
Preparing configuration for Software_developer...


assistant_id was None, creating a new assistant
assistant_id was None, creating a new assistant


User_console_and_Python_code_interpreter (to chat_manager):
Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------
Software_developer (to chat_manager):

To accomplish the task of finding the latest paper about GPT-4 on arXiv and identifying its potential applications in software development, we will break down the process into the following steps:

1. **Search for papers on arXiv**: We'll use the arXiv API to search for recent papers mentioning GPT-4. The arXiv API allows us to programmatically query their database.

2. **Retrieve and analyze the paper**: Once we have identified relevant papers, we will download the metadata of the latest paper. We will inspect the abstract and other available metadata for mentions of potential applications in software development.

3. **Present findings**: Finally, we will present a summary of our findings.

Here is a

execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change


User_console_and_Python_code_interpreter (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Title: Towards Improving Document Understanding: An Exploration on
  Text-Grounding via MLLMs
Authors: Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
Abstract: In the field of document understanding, significant advances have been made
in the fine-tuning of Multimodal Large Language Models (MLLMs) with
instruction-following data. Nevertheless, the potential of text-grounding
capability within text-rich scenarios remains underexplored. In this paper, we
present a text-grounding document understanding model, termed TGDoc, which
addresses this deficiency by enhancing MLLMs with the ability to discern the
spatial positioning of text within images. Empirical evidence suggests that
text-grounding improves the model's interpretation of textual content, thereby
elevating its proficiency in comprehending text-rich images. Specifically, we
compile a dataset containi