# Multiple Agent Chat

### Requirement

Auto Agent setup requires `vllm>=0.2.0` and `fastchat>=0.2.30`. Use the following command to install the latest versions of [vLLM](https://github.com/vllm-project/vllm) and [FastChat](https://github.com/lm-sys/FastChat).
```
pip install vllm "fastchat[model_worker]"
```
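
To confirm that the installed packages meet these minimums before launching anything, a quick check like the following may help. It is a minimal sketch: `version_tuple`, `installed_version`, and `meets_minimum` are hypothetical helpers, and the comparison only looks at the leading numeric components of each version. The distribution names are taken from the text above. FastChat may be published under a different distribution name (such as `fschat`), in which case the lookup returns `None` and the name needs adjusting.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '0.2.30' to (0, 2, 30), keeping only leading digits of each piece."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(package: str, minimum: str) -> bool:
    """True when the package is installed and at least as new as `minimum`."""
    found = installed_version(package)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


# Package names below are assumptions copied from the requirement text.
for name, minimum in [("vllm", "0.2.0"), ("fastchat", "0.2.30")]:
    print(name, installed_version(name), meets_minimum(name, minimum))
```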

In [None]:
!pip install vllm "fastchat[model_worker]"

In [2]:
import autogen
import subprocess as sp
import time
from typing import Dict, List, Optional, Tuple

In [6]:
class AgentCreator:
    """
    Creates AutoGen assistant agents and manages the local vLLM endpoint servers that back them.
    """

    agent_procs: Dict[str, sp.Popen] = {}
    agent_procs_assign: Dict[str, Tuple[autogen.AssistantAgent, str]] = {}
    openai_server_name: str = 'openai'

    def __init__(
            self,
            endpoint_building_timeout: Optional[int] = 180
    ):
        """
        Args:
            endpoint_building_timeout: timeout for building up an endpoint server.
        """
        self.endpoint_building_timeout = endpoint_building_timeout

    def get_config(
            self,
            model_name_or_hf_repo: str,
            world_size: Optional[int] = 1,
            host: Optional[str] = "127.0.0.1",
            port: Optional[str] = "8000",
            seed: Optional[int] = 42,
            temperature: Optional[float] = 0,
            max_tokens: Optional[int] = 200
    ) -> Tuple[List, str]:
        """
        Build the LLM config list for a model. For a non-OpenAI model, also start a local vLLM
        endpoint server unless one with the same server id is already running.

        Args:
            model_name_or_hf_repo: model name (e.g., "gpt-3.5-turbo", "gpt-4") or Hugging Face repository
                (e.g., "meta-llama/Llama-2-7b-chat-hf") used to start an endpoint.
            world_size: the tensor parallel size (in most cases, identical to the number of GPUs).
            host: the endpoint's API address.
            port: the port the endpoint listens on.
            seed: random seed for the agent.
            temperature: sampling temperature of the agent, which controls the randomness of its answers.
            max_tokens: the maximum number of tokens in an agent's reply.

        Returns:
            (agent config list, endpoint server id)
        """
        if "gpt-" in model_name_or_hf_repo:
            config = autogen.config_list_from_json(
                "/home/elpis_ubuntu/LLM/autogen/OAI_CONFIG_LIST",
                filter_dict={
                    "model": [model_name_or_hf_repo]
                },
            )
            server_id = self.openai_server_name
        else:
            model_name = model_name_or_hf_repo.split('/')[-1]
            config = autogen.get_config_list(
                api_keys=["EMPTY"],
                api_bases=[f"http://{host}:{port}/v1"],
            )
            config[0].update({
                "model": model_name_or_hf_repo,
                "max_tokens": max_tokens
            })
            server_id = f'{model_name}_{host}_{port}_{seed}_{temperature}'
            if self.agent_procs.get(server_id, None) is None:
                # Use vLLM to set up a server with OpenAI API support.
                agent_proc = sp.Popen(['python', '-m', 'vllm.entrypoints.openai.api_server',
                                       '--host', str(host),
                                       '--port', str(port),
                                       '--model', str(model_name_or_hf_repo),
                                       '--tensor-parallel-size', str(world_size)], stdout=sp.PIPE, stderr=sp.STDOUT)
                timeout_start = time.time()
                while True:
                    server_stdout = agent_proc.stdout.readline()
                    timeout_end = time.time()
                    if b"running" in server_stdout:
                        print(f"Running {model_name_or_hf_repo} endpoint on http://{host}:{port} with tensor parallel size {world_size}.")
                        break
                    elif timeout_end - timeout_start > self.endpoint_building_timeout:
                        raise RuntimeError(f'Timeout exceeded. Failed to set up the endpoint for '
                                           f'{model_name_or_hf_repo} on {host}:{port}.')
                self.agent_procs[server_id] = agent_proc

        return config, server_id

    def create_agent(
            self,
            agent_name: str,
            model_name_or_hf_repo: str,
            seed: int = 42,
            temperature: float = 0,
            **kwargs
    ) -> autogen.AssistantAgent:
        """
        Create an assistant agent backed by the given model and register it with this creator.

        Args:
            agent_name: the name that identifies the role of the agent (e.g., Coder, Product Manager, ...).
            model_name_or_hf_repo: model name (e.g., "gpt-3.5-turbo", "gpt-4") or Hugging Face repository
                (e.g., "meta-llama/Llama-2-7b-chat-hf") used to start an endpoint.
            seed: random seed for the agent.
            temperature: sampling temperature of the agent, which controls the randomness of its answers.
            **kwargs: additional arguments forwarded to `get_config` (e.g., `port`, `max_tokens`).

        Returns:
            agent: a set-up agent.
        """
        config, server_id = self.get_config(model_name_or_hf_repo=model_name_or_hf_repo,
                                            seed=seed,
                                            temperature=temperature,
                                            **kwargs)
        agent = autogen.AssistantAgent(name=agent_name,
                                       llm_config={
                                           "seed": seed,  # seed for caching and reproducibility
                                           "config_list": config,  # a list of OpenAI API configurations
                                           "temperature": temperature,  # temperature for sampling
                                       })
        self.agent_procs_assign[agent_name] = (agent, server_id)
        return agent

    def clear_agent(
            self,
            agent_name: str,
            recycle_endpoint: bool = True
    ):
        """
        Remove an agent and, optionally, shut down its endpoint server.

        Args:
            agent_name: the name of the agent to remove.
            recycle_endpoint: whether to recycle the endpoint server. If True, the endpoint is terminated
                once no remaining agent relies on it.
        """
        _, server_id = self.agent_procs_assign[agent_name]
        del self.agent_procs_assign[agent_name]
        if recycle_endpoint:
            if server_id == self.openai_server_name:
                return
            else:
                for _, iter_sid in self.agent_procs_assign.values():
                    if server_id == iter_sid:
                        return
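                # Drop the entry as well, so a later get_config() call starts a fresh server
                # instead of assuming the terminated one is still running.
                self.agent_procs.pop(server_id).terminate()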
    
    def clear_all(self):
        """
        Clear all cached agents.
        """
        for agent_name in list(self.agent_procs_assign.keys()):
            self.clear_agent(agent_name)

    def start(
            self,
            task: str,
            manager_config: dict,
            max_round: Optional[int] = 12,
            init_messages: Optional[List[dict]] = [],
            user_proxy: autogen.UserProxyAgent = None,
            initiate_agent_name: str = "user"
    ):
        """
        Start a group chat among all created agents to work on a task.

        Args:
            task: description of the task.
            manager_config: LLM configuration for the GroupChatManager.
            max_round: the maximum number of rounds.
            init_messages: messages supplied before the task starts. This can be the chat history of another
                group chat or background information for the task.
            user_proxy: a user proxy for human interaction.
            initiate_agent_name: the name of the agent that initiates the group chat.
        """
        agent_list = [agent for agent, _ in self.agent_procs_assign.values()]
        if user_proxy is not None:
            agent_list.append(user_proxy)
        # Copy the messages so the shared mutable default list is never modified by the chat.
        group_chat = autogen.GroupChat(agents=agent_list, messages=list(init_messages or []), max_round=max_round)
        manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=manager_config)

        if initiate_agent_name == "user" and user_proxy is not None:
            user_proxy.initiate_chat(manager, message=task)
        else:
            for agent in agent_list:
                if initiate_agent_name == agent.name:  # `name` is a property, not a method
                    agent.initiate_chat(manager, message=task)
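
The startup loop in `get_config` waits for the word "running" to appear in the server output. The sketch below isolates that wait in a hypothetical helper, `wait_until_ready`, and adds one behavior that is not in the class above: it also stops when the child process exits early, instead of waiting for the timeout. Like the original loop, it checks the deadline only when `readline` returns, so a server that stays silent can still block. The demo child process is a stand-in that prints a line containing the marker; it is not a real vLLM server.

```python
import subprocess as sp
import sys
import time


def wait_until_ready(proc: sp.Popen, marker: bytes = b"running", timeout: float = 180.0) -> None:
    """Block until `marker` appears in the process output, or raise RuntimeError."""
    deadline = time.time() + timeout
    while True:
        line = proc.stdout.readline()
        if marker in line:
            return
        if line == b"":
            # An empty read means the output stream was closed.
            if proc.poll() is not None:
                raise RuntimeError(
                    f"Server exited with code {proc.returncode} before becoming ready."
                )
            time.sleep(0.05)  # avoid spinning while the exit status settles
        if time.time() > deadline:
            proc.terminate()
            raise RuntimeError("Timed out waiting for the server to become ready.")


# Stand-in child process (an assumption for the demo, not a vLLM server).
demo = sp.Popen(
    [sys.executable, "-c", "print('Uvicorn running on http://127.0.0.1:8000')"],
    stdout=sp.PIPE,
    stderr=sp.STDOUT,
)
wait_until_ready(demo, timeout=30)
demo.wait()
print("demo server reported ready")
```

Failing fast on an early exit matters mostly when the model name is wrong or the GPU runs out of memory, since the server then terminates long before the timeout elapses.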
        

# Test
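
Before running the full group chat, the endpoint-recycling rule described in `clear_agent` (an endpoint is shut down only when no remaining agent relies on it) can be checked offline. The sketch below is an assumption-laden illustration, not the class itself: `FakeServer` and `release_agent` are hypothetical stand-ins, and the server id string is made up for the example.

```python
from typing import Dict


class FakeServer:
    """Hypothetical stand-in for a server process; it only records terminate()."""

    def __init__(self) -> None:
        self.terminated = False

    def terminate(self) -> None:
        self.terminated = True


def release_agent(agent_name: str,
                  assignments: Dict[str, str],
                  servers: Dict[str, FakeServer]) -> bool:
    """Remove an agent and stop its server if no other agent still uses it.

    Returns True when the server was stopped.
    """
    server_id = assignments.pop(agent_name)
    if server_id in assignments.values():
        return False  # another agent still relies on this endpoint
    server = servers.pop(server_id, None)
    if server is None:
        return False  # nothing to stop (e.g., a hosted API with no local process)
    server.terminate()
    return True


# Two agents share one endpoint; the id is a made-up example.
shared = FakeServer()
servers = {"opt-125m_127.0.0.1_8000": shared}
assignments = {
    "Coder": "opt-125m_127.0.0.1_8000",
    "Reviewer": "opt-125m_127.0.0.1_8000",
}

first = release_agent("Coder", assignments, servers)
second = release_agent("Reviewer", assignments, servers)
print(first, second, shared.terminated)
```

Removing the server entry at the same time as terminating it keeps the bookkeeping consistent, so a later request for the same model starts a new endpoint.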

In [3]:
administrator = AgentCreator()

# create agent; get config (manager); initialize user proxy; task start; clear agent
administrator.create_agent('Coder_gpt-3_5', 'gpt-3.5-turbo', seed=41)
administrator.create_agent('Product_manager_opt-125m', 'facebook/opt-125m', max_tokens=100, seed=41)
mg_config, _ = administrator.get_config('gpt-3.5-turbo', seed=41)
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE"
)

task = "Find a latest paper about gpt-4 on arxiv and find its potential applications in software."
administrator.start(
    task,
    mg_config[0],
    user_proxy=user_proxy,
    initiate_agent_name="user"
)
administrator.clear_all()


Running facebook/opt-125m endpoint on http://127.0.0.1:8000 with tensor parallel size 1.
User_proxy (to chat_manager):
Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------
Coder_gpt-3_5 (to chat_manager):

To find the latest paper about GPT-4 on arXiv, we can use the arXiv API to search for papers related to GPT-4. Then, we can analyze the paper to identify its potential applications in software.

Here is the Python code to accomplish this task:

```python
import requests

# Search for papers related to GPT-4 on arXiv
search_query = "GPT-4"
api_url = f"http://export.arxiv.org/api/query?search_query={search_query}&sortBy=submittedDate&sortOrder=descending&max_results=1"
response = requests.get(api_url)

# Parse the XML response and extract the paper details
xml_data = response.text
paper_title = xml_data.split("<title>")[2].split("</title>")[0]
paper_s
```

KeyboardInterrupt: 

In [8]:
administrator = AgentCreator()

# create agent; get config (manager); initialize user proxy; task start; clear agent
administrator.create_agent('Coder_gpt-3_5', 'gpt-3.5-turbo')
administrator.create_agent('Product_manager_gpt-4', 'gpt-4')
mg_config, _ = administrator.get_config('gpt-3.5-turbo', seed=41)
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE"
)

task = "Find a latest paper about gpt-4 on arxiv and find its potential applications in software."
administrator.start(
    task,
    mg_config[0],
    user_proxy=user_proxy,
    initiate_agent_name="user"
)
administrator.clear_all()

User_proxy (to chat_manager):

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------
Coder_gpt-3_5 (to chat_manager):

To find the latest paper about GPT-4 on arXiv, we can use the arXiv API to search for papers related to GPT-4. Then, we can analyze the abstract or content of the paper to identify its potential applications in software.

Here is the Python code to accomplish this task:

```python
import requests

# Search for papers related to GPT-4 on arXiv
search_query = "GPT-4"
api_url = f"http://export.arxiv.org/api/query?search_query={search_query}&sortBy=submittedDate&sortOrder=descending&max_results=1"
response = requests.get(api_url)

# Parse the XML response and extract the paper details
xml_data = response.text
paper_title = xml_data.split("<title>")[2].split("</title>")[0]
paper_abstract = xml_data.split("<summary>")[1].split("</summary>")[
```

execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change


User_proxy (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Latest Paper on GPT-4:
Title: Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Abstract:   With LLMs shifting their role from statistical modeling of language to
serving as general-purpose AI agents, how should LLM evaluations change?
Arguably, a key ability of an AI agent is to flexibly combine, as needed, the
basic skills it has learned. The capability to combine skills plays an
important role in (human) pedagogy and also in a paper on emergence phenomena
(Arora & Goyal, 2023).
  This work introduces Skill-Mix, a new evaluation to measure ability to
combine skills. Using a list of $N$ skills the evaluator repeatedly picks
random subsets of $k$ skills and asks the LLM to produce text combining that
subset of skills. Since the number of subsets grows like $N^k$, for even modest
$k$ this evaluation will, with high probability, require the LLM to produce
text significantly diff