# Using RetrieveChat Powered by MongoDB Atlas for Retrieval-Augmented Code Generation and Question Answering

AutoGen offers conversable agents powered by LLMs, tools, or humans, which can collectively perform tasks via automated chat. This framework allows tool use and human participation through multi-agent conversation.
Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).

RetrieveChat is a conversational system for retrieval-augmented code generation and question answering. In this notebook, we demonstrate how to use RetrieveChat to generate code and answer questions based on custom documentation that is not present in the LLM's training dataset. RetrieveChat uses the `RetrieveAssistantAgent` and `RetrieveUserProxyAgent`, which are used much like `AssistantAgent` and `UserProxyAgent` in other notebooks (e.g., [Automated Task Solving with Code Generation, Execution & Debugging](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_auto_feedback_from_code_execution.ipynb)). Essentially, `RetrieveAssistantAgent` and `RetrieveUserProxyAgent` implement a different auto-reply mechanism corresponding to the RetrieveChat prompts.

## Table of Contents
We'll demonstrate an example of using RetrieveChat for code generation and question answering:

- [Example 1: Generate code based on docstrings without human feedback](#example-1)

````{=mdx}
:::info Requirements
Some extra dependencies are needed for this notebook, which can be installed via pip:

```bash
pip install pyautogen[retrievechat-mongodb] flaml[automl]
```

For more information, please refer to the [installation guide](/docs/installation/).
:::
````

Ensure you have a MongoDB Atlas instance.

If not, a test version can quickly be deployed using Docker.

`docker-compose.yml`

```yml
version: '3.9'

services:
  mongodb:
    image: mongodb/mongodb-atlas-local:latest
    restart: unless-stopped
    ports:
      - "27017:27017"
    environment:
      MONGODB_INITDB_ROOT_USERNAME: mongodb_user
      MONGODB_INITDB_ROOT_PASSWORD: mongodb_password
```
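With the Compose file above, the local instance listens on port 27017 with the given root credentials. As a minimal sketch, the connection string for the `db_config` used later could be assembled as follows. The host `localhost` and the `directConnection=true` option are assumptions for a single-node local deployment; adjust them for your environment.

```python
from urllib.parse import quote_plus

# Credentials and port taken from the docker-compose.yml above.
username = "mongodb_user"
password = "mongodb_password"
host = "localhost"  # assumption: the container runs on the local machine
port = 27017

# quote_plus escapes characters such as "@" or ":" that would otherwise break the URI.
connection_string = (
    f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"
    "?directConnection=true"  # assumption: connect directly to the single local node
)
print(connection_string)
```

Escaping the credentials matters once real passwords contain reserved characters; the placeholder values here pass through unchanged.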



## Set your API Endpoint

The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file.


In [1]:
import json
import os

import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# Accepted file formats that can be stored in
# a vector database instance
from autogen.retrieve_utils import TEXT_FORMATS

config_list = [
    {
        "model": "gpt-35-turbo",
        "base_url": "",
        "api_type": "azure",
        "api_version": "2023-07-01-preview",
        "api_key": "",
    },
]
assert len(config_list) > 0
print("models to use: ", [config_list[i]["model"] for i in range(len(config_list))])

models to use:  ['gpt-35-turbo']
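The cell above hard-codes `config_list` for brevity. The following is a simplified, standard-library-only sketch of the idea behind loading configurations from an environment variable or a JSON file. It is not AutoGen's implementation: the helper name `load_config_list` and its `models` filter are hypothetical, and `OAI_CONFIG_LIST` is used only as an example variable name.

```python
import json
import os


def load_config_list(env_or_file, models=None):
    """Hypothetical helper: read a JSON list of LLM configs and filter by model."""
    raw = os.environ.get(env_or_file)
    if raw is None:
        # Not an environment variable: treat the argument as a file path.
        with open(env_or_file, encoding="utf-8") as f:
            raw = f.read()
    elif os.path.isfile(raw):
        # The environment variable holds a path to the JSON file.
        with open(raw, encoding="utf-8") as f:
            raw = f.read()
    configs = json.loads(raw)
    if models is not None:
        configs = [c for c in configs if c.get("model") in models]
    return configs


# Example: store the configurations in an environment variable as a JSON string.
os.environ["OAI_CONFIG_LIST"] = json.dumps(
    [
        {"model": "gpt-35-turbo", "api_type": "azure", "api_key": ""},
        {"model": "gpt-4", "api_key": ""},
    ]
)
config_list_example = load_config_list("OAI_CONFIG_LIST", models={"gpt-35-turbo"})
print([c["model"] for c in config_list_example])
```

Keeping keys and endpoints in an environment variable or a separate file avoids committing secrets to the notebook itself.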


````{=mdx}
:::tip
Learn more about configuring LLMs for agents [here](/docs/topics/llm_configuration).
:::
````

## Construct agents for RetrieveChat

We start by initializing the `RetrieveAssistantAgent` and `RetrieveUserProxyAgent`. The system message needs to be set to "You are a helpful assistant." for RetrieveAssistantAgent. The detailed instructions are given in the user message. Later we will use the `RetrieveUserProxyAgent.message_generator` to combine the instructions and a retrieval augmented generation task for an initial prompt to be sent to the LLM assistant.
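Conceptually, the initial prompt is built by filling an instruction template with the retrieved context and the user's problem. The sketch below illustrates this with a hypothetical template and helper (`PROMPT_TEMPLATE`, `build_prompt`); the actual RetrieveChat prompts differ in wording and detail.

```python
# Hypothetical template; the real RetrieveChat prompt wording is different.
PROMPT_TEMPLATE = (
    "You are a retrieval-augmented coding assistant. Answer using the context.\n"
    "User's question is: {input_question}\n"
    "Context is: {input_context}\n"
)


def build_prompt(problem, retrieved_chunks):
    """Join retrieved chunks into one context string and fill the template."""
    context = "\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(input_question=problem, input_context=context)


prompt = build_prompt(
    "How do I enable Spark in FLAML?",
    ["Set `use_spark` to true.", "FLAML dispatches jobs via joblib-spark."],
)
print(prompt)
```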

In [2]:
print("Accepted file formats for `docs_path`:")
print(TEXT_FORMATS)

Accepted file formats for `docs_path`:
['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf']
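To illustrate how an extension list like this can be used, here is a minimal sketch that walks a directory and keeps only files whose extension appears in the accepted list. The helper `list_text_files` is hypothetical and only approximates the behavior described in this notebook, where extension filtering applies to files found under directories.

```python
import os
import tempfile

ACCEPTED = ["txt", "md", "json"]  # a shortened stand-in for TEXT_FORMATS


def list_text_files(directory, accepted_types):
    """Hypothetical helper: recursively collect files with an accepted extension."""
    accepted = {t.lower() for t in accepted_types}
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext in accepted:
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Create a throwaway directory with two accepted files and one rejected file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("guide.md", "notes.txt", "image.png"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("sample")
    found = [os.path.basename(p) for p in list_text_files(tmp, ACCEPTED)]
    print(found)
```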


In [None]:
# 1. create a RetrieveAssistantAgent instance named "assistant"
assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={
        "timeout": 600,
        "cache_seed": 42,
        "config_list": config_list,
    },
)

# 2. create the RetrieveUserProxyAgent instance named "ragproxyagent"
# By default, `human_input_mode` is "ALWAYS", which means the agent asks for human input at every step. We set it to "NEVER" here.
# `docs_path` is the path to the docs directory. It can also be the path to a single file or the URL of a single file. By default,
# it is set to None, which works only if the collection has already been created.
# `task` indicates the kind of task we're working on. In this example, it's a `code` task.
# `chunk_token_size` is the chunk token size for the retrieve chat. By default, it is set to `max_tokens * 0.6`; here we set it to 2000.
# `custom_text_types` is a list of file types to be processed. The default is `autogen.retrieve_utils.TEXT_FORMATS`.
# This only applies to files under the directories in `docs_path`. Explicitly included files and URLs are chunked regardless of their types.
# In this example, we set it to ["non-existent-type"]. Since `website/docs` contains no "non-existent-type" files,
# no files there will be processed. However, the explicitly included URLs will still be processed.
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "code",
        "docs_path": [
            "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md",
            "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Research.md",
            os.path.join(os.path.abspath(""), "..", "website", "docs"),
        ],
        "custom_text_types": ["non-existent-type"],
        "chunk_token_size": 2000,
        "model": config_list[0]["model"],
        "vector_db": "mongodb",  # MongoDB Atlas database
        "collection_name": "demo_collection",
        "db_config": {
            "connection_string": "",  # MongoDB Atlas connection string
            "database_name": "",  # MongoDB Atlas database
            "index_name": "vector_index",
        },
        "get_or_create": True,  # set to False if you don't want to reuse an existing collection
        "overwrite": True,  # set to True if you want to overwrite an existing collection
    },
    code_execution_config=False,  # set to False if you don't want to execute the code
)
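The `chunk_token_size` setting bounds how much text goes into each stored chunk. As a rough, dependency-free sketch of the idea, the code below packs lines into chunks of at most a given number of whitespace-separated words. Counting words instead of model tokens is a simplifying assumption, and `split_into_chunks` is a hypothetical helper, not the function AutoGen uses.

```python
def split_into_chunks(text, max_tokens):
    """Greedily pack lines into chunks of at most `max_tokens` words.

    A single line longer than `max_tokens` is kept whole in its own chunk.
    """
    chunks, current, count = [], [], 0
    for line in text.splitlines():
        n = len(line.split())
        # Start a new chunk when adding this line would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks


doc = "one two three\nfour five\nsix seven eight nine\nten"
chunks = split_into_chunks(doc, max_tokens=5)
print(chunks)
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; larger chunks do the opposite and consume more of the prompt budget.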

### Example 1

[Back to top](#table-of-contents)

Use RetrieveChat to help generate sample code, automatically run the code, and fix any errors.

Problem: Which API should I use if I want to use FLAML for a classification task and train the model in 30 seconds? Use Spark to parallelize the training, and force-cancel jobs if the time limit is reached.

In [4]:
# Reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# Given a problem, we use the ragproxyagent to generate a prompt that is sent to the assistant as the initial message.
# The assistant receives the message and generates a response, which is sent back to the ragproxyagent for processing.
# The conversation continues until the termination condition is met. In RetrieveChat, without a human in the loop, the conversation terminates when no code block is detected.
# With a human in the loop, the conversation continues until the user says "exit".
code_problem = "How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached."
chat_result = ragproxyagent.initiate_chat(
    assistant, message=ragproxyagent.message_generator, problem=code_problem, search_string="spark"
)  # search_string is used as an extra filter for the embeddings search, in this case, we only want to search documents that contain "spark".
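As a conceptual illustration of such a filter, the sketch below keeps only the chunks that contain the search string before they would be ranked. Case-insensitive matching and the helper name `filter_by_search_string` are assumptions made for this example; the MongoDB integration may apply the filter differently.

```python
def filter_by_search_string(chunks, search_string):
    """Hypothetical helper: keep chunks that mention `search_string`."""
    if not search_string:
        # No filter requested: return every chunk unchanged.
        return list(chunks)
    needle = search_string.lower()
    return [c for c in chunks if needle in c.lower()]


sample_chunks = [
    "FLAML can use Spark as the parallel backend.",
    "FLAML supports zero-shot AutoML.",
    "Set use_spark to true to enable joblib-spark.",
]
filtered = filter_by_search_string(sample_chunks, "spark")
print(filtered)
```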

Trying to create collection.


2024-07-01 08:50:43,934 - autogen.agentchat.contrib.vectordb.mongodb - INFO - Search index vector_index created successfully.
2024-07-01 08:50:44,612 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 2 chunks.
2024-07-01 08:50:45,064 - autogen.agentchat.contrib.vectordb.mongodb - INFO - Using index: [{'id': '6682a6042cf0e270602c0fe1', 'name': 'vector_index', 'type': 'vectorSearch', 'status': 'READY', 'queryable': True, 'latestDefinitionVersion': {'version': 0, 'createdAt': datetime.datetime(2024, 7, 1, 12, 50, 12, 109000)}, 'latestDefinition': {'fields': [{'type': 'vector', 'numDimensions': 384, 'path': 'embedding', 'similarity': 'cosine'}]}, 'statusDetail': [{'hostname': 'shared-shard-00-search-6xag8e', 'status': 'READY', 'queryable': True, 'mainIndex': {'status': 'READY', 'queryable': True, 'definitionVersion': {'version': 0, 'createdAt': datetime.datetime(2024, 7, 1, 12, 50, 12)}, 'definition': {'fields': [{'type': 'vector', 'path': 'embedding', 'numDimens

query_text How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached.
pipeline:  [{'$vectorSearch': {'index': 'vector_index', 'limit': 20, 'numCandidates': 200, 'queryVector': [-0.08256451040506363, -0.07900252193212509, -0.05290786176919937, 0.021982736885547638, 0.046406690031290054, 0.027769701555371284, -0.02768588438630104, -0.020102187991142273, -0.05407266318798065, -0.061684805899858475, -0.03940979018807411, -0.029285598546266556, -0.1118478998541832, -0.03136416897177696, -0.04099257290363312, -0.07897000014781952, -0.02522769570350647, 0.043702732771635056, -0.030820483341813087, -0.041595760732889175, 0.10552595555782318, 0.0023172772489488125, 0.08983399718999863, 0.10865391790866852, -0.06146957352757454, 0.04154617711901665, 0.015428234823048115, 0.016568025574088097, 0.013623313046991825, -0.06059451401233673, 0.08428270369768143, 0.009563339874148369, -0.002620439976453781

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
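The `pipeline:` entry in the log above is a MongoDB aggregation pipeline whose first stage is `$vectorSearch`. The sketch below builds a comparable pipeline as a plain Python structure. The `embedding` path and the 384-dimensional vector come from the index definition in the log; the helper name, the tenfold ratio between `numCandidates` and `limit` (matching the logged values of 200 and 20), and the trailing score stage are assumptions for illustration.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index", limit=20):
    """Hypothetical helper: assemble a $vectorSearch aggregation pipeline."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",  # field holding the stored vectors
                "queryVector": query_vector,
                "limit": limit,  # number of documents to return
                "numCandidates": limit * 10,  # neighbors considered before ranking
            }
        },
        # Expose the similarity score alongside each returned document.
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.0] * 384)
stage = pipeline[0]["$vectorSearch"]
print(stage["limit"], stage["numCandidates"])
```

With PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)` on a collection that has a matching vector search index.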


assistant (to ragproxyagent):

You can activate Spark as the parallel backend during parallel tuning in both AutoML and Hyperparameter Tuning, by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using joblib-spark. According to the documentation, to use FLAML with Spark, you need to prepare your data in pandas-on-spark format using the `flaml.automl.spark.utils.to_pandas_on_spark` function in the `flaml.automl.spark.utils` module. Then, you can pass pandas-on-spark data to FLAML as normal data using `dataframe` and `label`. For example, to use SparkML models for regression and train for 30 seconds with force cancel, you can use the following code snippet:

```python
import flaml
from flaml.automl.spark.utils import to_pandas_on_spark

# load your data into a pandas dataframe
train_data = ...

psdf = to_pandas_on_spark(train_data)

automl = flaml.AutoML()
settings = {
    "time_budget": 30,
    "metric": "r2",
    "task": "regress
```
