<a href="https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Auto Generated Agent Chat: Group Chat with Retrieval Augmented Generation

AutoGen offers conversable agents powered by LLMs, tools, or humans, which can be used to perform tasks collectively via automated chat. This framework allows tool use and human participation through multi-agent conversation.
Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).

## Requirements

AutoGen requires `Python>=3.8`. To run this notebook example, please install:
```bash
pip install "pyautogen[retrievechat]"
```

In [None]:
%%capture --no-stderr
# %pip install "pyautogen[retrievechat]~=0.1.0"

## Set your API Endpoint

The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file.

In [None]:
import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": ["gpt-3.5-turbo", "gpt-35-turbo"],
    },
)

It first looks for the environment variable `OAI_CONFIG_LIST`, which needs to be a valid JSON string. If that variable is not found, it then looks for a JSON file named `OAI_CONFIG_LIST`. It filters the configs by model (you can filter by other keys as well). With the filter above, only the `gpt-3.5-turbo` models are kept in the list (`gpt-35-turbo` is the Azure OpenAI name for the same model). Adjust `filter_dict` to match the models in your own list.
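To make the lookup order and filtering concrete, here is a simplified, self-contained sketch of the behavior described above. It is not AutoGen's implementation: the helper `load_and_filter_configs` and the variable `DEMO_CONFIG_LIST` are hypothetical, and the sketch assumes each `filter_dict` value is a list of allowed values, as in the call above.

```python
import json
import os
from typing import Any, Dict, List, Optional


def load_and_filter_configs(
    env_or_file: str,
    file_location: str = ".",
    filter_dict: Optional[Dict[str, List[Any]]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper (not part of AutoGen) mirroring the lookup order described above."""
    # 1. Prefer an environment variable holding a JSON string.
    json_str = os.environ.get(env_or_file)
    if json_str is not None:
        configs = json.loads(json_str)
    else:
        # 2. Otherwise fall back to a JSON file with the same name in file_location.
        with open(os.path.join(file_location, env_or_file)) as f:
            configs = json.load(f)
    if not filter_dict:
        return configs
    # 3. Keep a config only if every filtered key has one of the allowed values.
    return [
        cfg
        for cfg in configs
        if all(cfg.get(key) in allowed for key, allowed in filter_dict.items())
    ]


# Usage with a hypothetical variable name, so a real OAI_CONFIG_LIST is left untouched.
os.environ["DEMO_CONFIG_LIST"] = json.dumps(
    [
        {"model": "gpt-4", "api_key": "key-a"},
        {"model": "gpt-35-turbo", "api_key": "key-b"},
    ]
)
kept = load_and_filter_configs(
    "DEMO_CONFIG_LIST", filter_dict={"model": ["gpt-3.5-turbo", "gpt-35-turbo"]}
)
print(kept)  # → [{'model': 'gpt-35-turbo', 'api_key': 'key-b'}]
```

A filter whose allowed values match none of your entries returns an empty list, which is why the filter and the config list need to agree on model names.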

The config list looks like the following:
```python
config_list = [
    {
        'model': 'gpt-4',
        'api_key': '<your OpenAI API key here>',
    },
    {
        'model': 'gpt-4',
        'api_key': '<your Azure OpenAI API key here>',
        'api_base': '<your Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },
    {
        'model': 'gpt-4-32k',
        'api_key': '<your Azure OpenAI API key here>',
        'api_base': '<your Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },
]
```
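If you supply the list through the `OAI_CONFIG_LIST` environment variable, its value has to be a JSON string. A minimal sketch, assuming you first build the list in Python: `json.dumps` produces valid JSON, whereas `str()` produces Python-style single quotes that a JSON parser rejects. The name `example_config_list` is illustrative, and the keys are placeholders.

```python
import json

# Placeholder entries in the same shape as the example above.
example_config_list = [
    {"model": "gpt-4", "api_key": "<your OpenAI API key here>"},
    {
        "model": "gpt-4",
        "api_key": "<your Azure OpenAI API key here>",
        "api_base": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2023-06-01-preview",
    },
]

# json.dumps yields a valid JSON string suitable for OAI_CONFIG_LIST.
config_json = json.dumps(example_config_list)

# str() uses single quotes, which a JSON parser rejects.
python_repr = str(example_config_list)
try:
    json.loads(python_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(config_json)
```

You could then export the printed string as `OAI_CONFIG_LIST` in your shell before starting the notebook.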

If you open this notebook in Colab, you can upload your files by clicking the file icon on the left panel and then choosing the "upload file" icon.

You can set the value of config_list in other ways you prefer, e.g., loading from a YAML file.

## Construct Agents

In [None]:
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
import chromadb

llm_config = {
    "request_timeout": 60,
    "seed": 42,
    "config_list": config_list,
}

autogen.ChatCompletion.start_logging()

raguserproxy = RetrieveUserProxyAgent(
    name="raguserproxy",
    human_input_mode="TERMINATE",
    system_message="A human admin.",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "code",
        "docs_path": "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md",
        "chunk_token_size": 2000,
        "model": config_list[0]["model"],
        "client": chromadb.PersistentClient(path="/tmp/chromadb"),
        "collection_name": "groupchat",
        "get_or_create": True,
    },
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
)

ragcoder = RetrieveAssistantAgent(
    name="ragcoder",
    system_message="You are a senior python engineer.",
    llm_config=llm_config,
)

pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",
)


def rag_chat():
    groupchat = autogen.GroupChat(
        agents=[raguserproxy, ragcoder, pm], messages=[], max_round=12
    )
    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

    # Start chatting with raguserproxy as this is the user proxy agent.
    raguserproxy.initiate_chat(
        manager,
        problem="How to do a regression AutoML task with FLAML and train with spark?",
        n_results=3,
    )


def norag_chat():
    groupchat = autogen.GroupChat(
        agents=[user_proxy, ragcoder, pm], messages=[], max_round=12
    )
    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

    # Start chatting with user_proxy as this is the user proxy agent.
    user_proxy.initiate_chat(
        manager,
        message="How to do a regression AutoML task with FLAML and train with spark?",
    )
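The two helper functions differ only in which proxy agent opens the conversation. Conceptually, a group chat keeps one shared message history, repeatedly picks the next speaker, and stops after `max_round` messages or when a reply signals termination. The toy loop below sketches that idea under simplifying assumptions: `ToyAgent`, `run_toy_groupchat`, and `round_robin` are hypothetical names, the scripted replies stand in for LLM calls, and the round-robin selector stands in for `GroupChatManager`, which asks the LLM to choose the next speaker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: turns the shared history into a reply."""
    name: str
    respond: Callable[[List[Message]], str]


def round_robin(last: ToyAgent, agents: List[ToyAgent], messages: List[Message]) -> ToyAgent:
    # Pick the agent after the last speaker, wrapping around the list.
    return agents[(agents.index(last) + 1) % len(agents)]


def run_toy_groupchat(
    agents: List[ToyAgent],
    first_speaker: ToyAgent,
    message: str,
    select_next: Callable[[ToyAgent, List[ToyAgent], List[Message]], ToyAgent],
    max_round: int,
) -> List[Message]:
    # One shared history that every agent reads from.
    messages: List[Message] = [{"name": first_speaker.name, "content": message}]
    speaker = first_speaker
    # The opening message counts as the first round.
    for _ in range(max_round - 1):
        speaker = select_next(speaker, agents, messages)
        reply = speaker.respond(messages)
        messages.append({"name": speaker.name, "content": reply})
        # Toy termination convention: stop once a reply mentions TERMINATE.
        if "TERMINATE" in reply:
            break
    return messages


# Usage: three scripted agents; the third ends the chat.
toy_agents = [
    ToyAgent("user", lambda m: "ok"),
    ToyAgent("coder", lambda m: "here is some code"),
    ToyAgent("pm", lambda m: "looks good. TERMINATE"),
]
history = run_toy_groupchat(
    toy_agents, toy_agents[0], "How do I train with Spark?", round_robin, max_round=12
)
print([m["name"] for m in history])  # → ['user', 'coder', 'pm']
```

Swapping `round_robin` for a different selector changes who speaks next without touching the loop, which is the same separation the real manager keeps between speaker selection and message broadcasting.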

## Start Chat

### UserProxyAgent doesn't get the correct code
[FLAML](https://github.com/microsoft/FLAML) has been open source since 2020, so ChatGPT knows about it. However, its Spark-related APIs were added in 2022 and are not in ChatGPT's training data. As a result, we end up with incorrect code.

In [None]:
norag_chat()

[33muser_proxy[0m (to chat_manager):

How to do a regression AutoML task with FLAML and train with spark?

--------------------------------------------------------------------------------
[33mragcoder[0m (to chat_manager):

To perform a regression AutoML task with FLAML and train with Spark, you can follow these steps:

1. Install FLAML and Spark on your machine.
2. Load your data into a Spark dataframe.
3. Create an `AutoML` object and set the `task` parameter to `'regression'`.
4. Set the `time_budget` parameter to the maximum amount of time (in seconds) that you want FLAML to spend on the AutoML search.
5. Call the `fit()` method on the `AutoML` object, passing in the Spark dataframe as the `data` parameter.
6. After the `fit()` method completes, you can access the best performing model using the `best_model()` method.
7. Use the b

### RetrieveUserProxyAgent gets the correct code
With RetrieveUserProxyAgent, we enable retrieval-augmented generation based on the given documentation file, so ChatGPT can generate the correct code for us!
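Conceptually, retrieval-augmented generation splits the documentation into chunks, retrieves the chunks most similar to the question, and packs as many as fit into the prompt context. The sketch below illustrates those three steps in plain Python under stated assumptions: `RetrieveUserProxyAgent` stores embeddings in a Chroma collection and sizes chunks in tokens (`chunk_token_size`), whereas this sketch counts words and scores chunks with bag-of-words cosine similarity. All function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, max_words: int) -> List[str]:
    # Group whitespace-separated words into chunks of at most max_words words
    # (a crude stand-in for token-based chunking).
    words = text.split()
    return [" ".join(words[i : i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text: str) -> Counter:
    # Lowercased alphanumeric terms and their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], n_results: int) -> List[Tuple[int, str]]:
    # Rank chunks by similarity to the question and keep the top n_results.
    q = bag_of_words(question)
    ranked = sorted(enumerate(chunks), key=lambda ic: cosine(q, bag_of_words(ic[1])), reverse=True)
    return ranked[:n_results]


def build_context(retrieved: List[Tuple[int, str]], max_context_words: int) -> str:
    # Add chunks in rank order, stopping once the next one would exceed the budget.
    parts: List[str] = []
    used = 0
    for _, chunk in retrieved:
        size = len(chunk.split())
        if used + size > max_context_words:
            break
        parts.append(chunk)
        used += size
    return "\n".join(parts)


docs = ["flaml spark regression automl", "cooking pasta recipes", "spark cluster setup"]
top = retrieve("How to do a regression AutoML task with FLAML and train with spark?", docs, n_results=2)
print([i for i, _ in top])  # → [0, 2]
print(build_context(top, max_context_words=5))  # → flaml spark regression automl
```

Here the second retrieved chunk is dropped by the budget. A context limit of this kind is one plausible reason the log below adds only two of the three retrieved chunks to the context.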

In [None]:
rag_chat()
# type exit to terminate the chat

Trying to create collection.


doc_ids:  [['doc_0', 'doc_10', 'doc_4']]
[32mAdding doc_id doc_0 to context.[0m
[32mAdding doc_id doc_10 to context.[0m
[33mraguserproxy[0m (to chat_manager):

You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
For code generation, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```

User's question is: How to do a regression AutoML task with FLAML and train with spark?

Context is: # Integrate - Spark

FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:
- Use Spark ML estimators for AutoML.
- Use Spark to run training in parallel spark jobs.

## Spark ML Estim