# Auto Generated Agent Chat: Using RetrieveChat for Retrieve Augmented Code Generation and Question Answering

RetrieveChat is a conversational framework for retrieve augmented code generation and question answering. In this notebook, we demonstrate how to use RetrieveChat to generate code and answer questions based on custom documentation that is not present in the LLM's training dataset. RetrieveChat uses the `RetrieveAssistantAgent` and `RetrieveUserProxyAgent`, which are used in much the same way as `AssistantAgent` and `UserProxyAgent` in other notebooks (e.g., [Automated Task Solving with Code Generation, Execution & Debugging](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_agentchat_auto_feedback_from_code_execution.ipynb)). Essentially, `RetrieveAssistantAgent` and `RetrieveUserProxyAgent` implement a different auto-reply mechanism that corresponds to the RetrieveChat prompts.

## Requirements

FLAML requires `Python>=3.8`. To run this notebook example, please install flaml with the [retrievechat] option.
```bash
pip install flaml[retrievechat]
```

In [1]:
# %pip install flaml[retrievechat]~=2.0.0rc4

## Set your API Endpoint

The [`config_list_from_json`](https://microsoft.github.io/FLAML/docs/reference/autogen/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file.


In [1]:
from flaml import autogen

config_list = autogen.config_list_from_json(
    env_or_file=".config.local",
    file_location=".",
    filter_dict={
        "model": {
            "gpt-4",
            "gpt4",
            "gpt-4-32k",
            "gpt-4-32k-0314",
            "gpt-35-turbo",
            "gpt-3.5-turbo",
        }
    },
)

assert len(config_list) > 0
print("models to use: ", [config_list[i]["model"] for i in range(len(config_list))])


models to use:  ['gpt-4']


It first looks for an environment variable with the name passed as `env_or_file`, which needs to hold a valid JSON string. If that variable is not found, it then looks for a JSON file with the same name under `file_location`. The name is commonly "OAI_CONFIG_LIST"; the cell above uses ".config.local". It then filters the configs by model (you can filter by other keys as well), so only the gpt-4 and gpt-3.5-turbo variants listed in `filter_dict` are kept.
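The lookup order described above can be sketched with the standard library alone. This is a minimal illustration, not FLAML's implementation: `load_config_list` and the variable name `DEMO_CONFIG_LIST` are hypothetical names introduced here.

```python
import json
import os


def load_config_list(env_or_file, file_location="."):
    """Hypothetical helper: read a JSON list from an env variable, else from a file."""
    raw = os.environ.get(env_or_file)
    if raw is not None:
        # The environment variable must hold a valid JSON string.
        return json.loads(raw)
    # Otherwise fall back to a JSON file with the same name.
    path = os.path.join(file_location, env_or_file)
    with open(path, "r") as f:
        return json.load(f)


# Demo: the variable name below is made up for this sketch.
os.environ["DEMO_CONFIG_LIST"] = json.dumps(
    [{"model": "gpt-4", "api_key": "<your OpenAI API key here>"}]
)
demo_configs = load_config_list("DEMO_CONFIG_LIST")
print([config["model"] for config in demo_configs])
```

Trying the environment variable first keeps API keys out of files that might be committed by accident, while the file fallback is convenient for local runs.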

The config list looks like the following:
```python
config_list = [
    {
        'model': 'gpt-4',
        'api_key': '<your OpenAI API key here>',
    },
    {
        'model': 'gpt-4',
        'api_key': '<your Azure OpenAI API key here>',
        'api_base': '<your Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },
    {
        'model': 'gpt-3.5-turbo',
        'api_key': '<your Azure OpenAI API key here>',
        'api_base': '<your Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },
]
```
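To make the filtering step concrete, here is a small sketch of how a `filter_dict` such as the one above could be applied to a config list. The helper `filter_configs` is a hypothetical stand-in written for this notebook, not the library's own function.

```python
def filter_configs(config_list, filter_dict):
    """Hypothetical helper: keep configs whose value for every key is in the allowed set."""
    return [
        config
        for config in config_list
        if all(config.get(key) in allowed for key, allowed in filter_dict.items())
    ]


sample_configs = [
    {"model": "gpt-4", "api_key": "<key>"},
    {"model": "gpt-3.5-turbo", "api_key": "<key>"},
    {"model": "text-davinci-003", "api_key": "<key>"},
]
kept = filter_configs(sample_configs, {"model": {"gpt-4", "gpt-3.5-turbo"}})
print([config["model"] for config in kept])
```

A config that lacks one of the filter keys is dropped in this sketch, because `config.get(key)` returns `None`, which is not in the allowed set.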

If you open this notebook in Colab, you can upload your files by clicking the file icon on the left panel and then choosing the "upload file" icon.

You can set the value of config_list in other ways you prefer, e.g., loading from a YAML file.

## Construct agents for RetrieveChat

We start by initializing the `RetrieveAssistantAgent` and `RetrieveUserProxyAgent`. The system message needs to be set to "You are a helpful assistant." for `RetrieveAssistantAgent`. The detailed instructions are given in the user message. Later we will use `RetrieveUserProxyAgent.generate_init_prompt` to combine the instructions and the user's problem into an initial prompt to be sent to the LLM assistant.

In [2]:
from flaml.autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from flaml.autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
import chromadb

autogen.ChatCompletion.start_logging()

# 1. create a RetrieveAssistantAgent instance named "assistant"
assistant = RetrieveAssistantAgent(
    name="assistant", 
    system_message="You are a helpful assistant.",
    llm_config={
        "request_timeout": 600,
        "seed": 42,
        "config_list": config_list,
    },
)

# 2. create the RetrieveUserProxyAgent instance named "ragproxyagent"
# By default, the human_input_mode is "ALWAYS", which means the agent will ask for human input at every step. We set it to "NEVER" here.
# `docs_path` is the path to the docs directory. By default, it is set to "./docs". Here we generated the documentation from FLAML's docstrings.
# To reproduce it, navigate to the website folder and run `pydoc-markdown`; it will generate the folder `reference` under `website/docs`.
# `chunk_token_size` is the chunk token size for the retrieve chat. By default, it is set to `max_tokens * 0.6`, here we set it to 2000.
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    retrieve_config={
        "docs_path": "../website/docs/reference",
        "chunk_token_size": 2000,
        "model": config_list[0]["model"],
        "client": chromadb.PersistentClient(path="/tmp/chromadb"),
        "embedding_model": "all-mpnet-base-v2",
    },
)
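The effect of `chunk_token_size` can be pictured with a simplified splitter. The sketch below is an assumption-laden illustration: it treats a whitespace-separated word as one token, whereas a real tokenizer for the chosen model would count differently, and `split_into_chunks` is a hypothetical name rather than the function the agent calls.

```python
def split_into_chunks(text, chunk_token_size):
    """Greedy chunking; a "token" here is a whitespace-separated word (an assumption)."""
    chunks, current = [], []
    for paragraph in text.split("\n"):
        words = paragraph.split()
        if not words:
            continue
        # Start a new chunk when this paragraph would overflow the current one.
        if current and len(current) + len(words) > chunk_token_size:
            chunks.append(" ".join(current))
            current = []
        # A paragraph longer than the budget is cut into budget-sized pieces.
        while len(words) > chunk_token_size:
            chunks.append(" ".join(words[:chunk_token_size]))
            words = words[chunk_token_size:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


demo_chunks = split_into_chunks("a b c\nd e\nf g h i j k l", chunk_token_size=4)
print(demo_chunks)
```

Smaller chunks let more distinct documents fit into one prompt, while larger chunks keep more of each document together; the value 2000 used above is one point on that trade-off.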

### Example 1

Use RetrieveChat to help generate sample code, automatically run the code, and fix errors if there are any.

Problem: Which API should I use if I want to use FLAML for a classification task and train the model in 30 seconds? Use Spark to parallelize the training. Force cancel jobs if the time limit is reached.

In [5]:
# reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# Given a problem, we use the ragproxyagent to generate a prompt to be sent to the assistant as the initial message.
# The assistant receives the message and generates a response. The response will be sent back to the ragproxyagent for processing.
# The conversation continues until the termination condition is met. In RetrieveChat, without a human in the loop,
# the conversation terminates when no code block is detected in the reply.
# With a human in the loop, the conversation continues until the user says "exit".
code_problem = "How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached."
# search_string is used as an extra filter for the embeddings search; in this case, we only want to search documents that contain "spark".
ragproxyagent.initiate_chat(assistant, problem=code_problem, search_string="spark")

ERROR:flaml.autogen.retrieve_utils:Collection flaml-docs already exists.


doc_ids:  [['doc_29', 'doc_34', 'doc_26', 'doc_11', 'doc_45', 'doc_44', 'doc_36', 'doc_15', 'doc_46', 'doc_43', 'doc_30', 'doc_63', 'doc_14', 'doc_32', 'doc_55']]
Adding doc_id doc_29 to context.
Adding doc_id doc_34 to context.
Adding doc_id doc_26 to context.
Adding doc_id doc_11 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user. You should follow the following steps to answer a question:
Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or
a QA task.
Step 2, you generate code or answer the question based on the intent.
You should leverage the context provided by the user as much as possible. If you think the context is not enough, you
can reply exactly "UPDATE CONTEXT" to ask the user to provide more contexts.
For code generation, you must obey the following rules:
You MUST NOT install any packages

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


23/08/08 07:35:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[flaml.automl.logger: 08-08 07:35:27] {1679} INFO - task = classification
[flaml.automl.logger: 08-08 07:35:27] {1690} INFO - Evaluation method: cv
[flaml.automl.logger: 08-08 07:35:27] {1788} INFO - Minimizing error metric: log_loss
[flaml.automl.logger: 08-08 07:35:27] {1900} INFO - List of ML learners in AutoML Run: ['xgboost', 'rf']


[I 2023-08-08 07:35:27,646] A new study created in memory with name: optuna
[I 2023-08-08 07:35:27,910] A new study created in memory with name: optuna


23/08/08 07:35:30 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.
[flaml.tune.tune: 08-08 07:35:30] {729} INFO - Number of trials: 1/1000000, 1 RUNNING, 0 TERMINATED


2023-08-08 07:35:44.881985: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


[flaml.tune.tune: 08-08 07:35:49] {749} INFO - Brief result: {'pred_time': 2.0863215128580728e-05, 'wall_clock_time': 24.607859134674072, 'metric_for_logging': {'pred_time': 2.0863215128580728e-05}, 'val_loss': 0.7099812857309977, 'trained_estimator': <flaml.automl.model.XGBoostSklearnEstimator object at 0x7fa7727a3070>}
[flaml.tune.tune: 08-08 07:35:49] {729} INFO - Number of trials: 2/1000000, 1 RUNNING, 1 TERMINATED


                                                                                

[flaml.tune.tune: 08-08 07:35:49] {749} INFO - Brief result: {'pred_time': 1.7093022664388023e-05, 'wall_clock_time': 24.926518440246582, 'metric_for_logging': {'pred_time': 1.7093022664388023e-05}, 'val_loss': 0.34762326869958377, 'trained_estimator': <flaml.automl.model.RandomForestEstimator object at 0x7fa7727a2800>}
[flaml.tune.tune: 08-08 07:35:49] {729} INFO - Number of trials: 3/1000000, 1 RUNNING, 2 TERMINATED
[flaml.tune.tune: 08-08 07:35:50] {749} INFO - Brief result: {'pred_time': 1.735210418701172e-05, 'wall_clock_time': 25.229209661483765, 'metric_for_logging': {'pred_time': 1.735210418701172e-05}, 'val_loss': 0.2581059209712818, 'trained_estimator': <flaml.automl.model.RandomForestEstimator object at 0x7fa7727a16f0>}
[flaml.tune.tune: 08-08 07:35:50] {729} INFO - Number of trials: 4/1000000, 1 RUNNING, 3 TERMINATED
[flaml.tune.tune: 08-08 07:35:50] {749} INFO - Brief result: {'pred_time': 1.658916473388672e-05, 'wall_clock_time': 25.52937912940979, 'metric_for_logging': {

Time exceeded, canceled jobs



[flaml.automl.logger: 08-08 07:35:55] {2493} INFO - selected model: None
[flaml.automl.logger: 08-08 07:35:55] {2627} INFO - retrain rf for 0.0s
[flaml.automl.logger: 08-08 07:35:55] {2630} INFO - retrained model: RandomForestClassifier(max_features=0.38633389177321886, max_leaf_nodes=10,
                       n_estimators=16, n_jobs=-1, random_state=12032022)
[flaml.automl.logger: 08-08 07:35:55] {1930} INFO - fit succeeded
[flaml.automl.logger: 08-08 07:35:55] {1931} INFO - Time taken to find the best model: 29.060681343078613
Best estimator found: rf
Best config found: {'n_estimators': 16, 'max_features': 0.38633389177321886, 'max_leaves': 10, 'criterion': 'gini'}
Best validation loss found: 0.13137661858336086
ragproxyagent (to assistant):

exitcode: 0 (execution succeeded)
Code output: 
You MUST NOT install any packages because all the packages needed are already installed.
None

--------------------------------------------------------------------------------
assistant (to ragpr
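The termination rule noted in the comments of Example 1 (stop when no code block is detected, or when a human says "exit") can be sketched as follows. This is a simplified illustration under stated assumptions: the regular expression and the helper names `extract_code_blocks` and `should_terminate` are hypothetical and may differ from what the agents actually use.

```python
import re

# Build the fence marker programmatically so this example nests cleanly in markdown.
FENCE = "`" * 3
CODE_BLOCK_PATTERN = re.compile(
    FENCE + r"[ \t]*(\w*)[ \t]*\r?\n(.*?)\r?\n[ \t]*" + FENCE, re.DOTALL
)


def extract_code_blocks(message):
    """Return (language, code) pairs for every fenced block in the message."""
    return CODE_BLOCK_PATTERN.findall(message)


def should_terminate(message, human_reply=None):
    """Assumed rule: a human ends the chat with "exit"; otherwise stop when no code is found."""
    if human_reply is not None:
        return human_reply.strip().lower() == "exit"
    return len(extract_code_blocks(message)) == 0


reply_with_code = "Try this:\n" + FENCE + "python\nprint('hi')\n" + FENCE
print(extract_code_blocks(reply_with_code))
print(should_terminate(reply_with_code), should_terminate("No code is needed here."))
```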

### Example 2

Use RetrieveChat to answer a question that is not related to code generation.

Problem: Who is the author of FLAML?

In [6]:
# reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

qa_problem = "Who is the author of FLAML?"
ragproxyagent.initiate_chat(assistant, problem=qa_problem)

doc_ids:  [['doc_3', 'doc_45', 'doc_29', 'doc_18', 'doc_34', 'doc_14', 'doc_4', 'doc_52', 'doc_20', 'doc_41', 'doc_58', 'doc_46', 'doc_21', 'doc_59', 'doc_54', 'doc_62', 'doc_42', 'doc_15', 'doc_44', 'doc_60']]
Adding doc_id doc_3 to context.
Adding doc_id doc_45 to context.
Adding doc_id doc_29 to context.
Adding doc_id doc_18 to context.
Adding doc_id doc_34 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user. You should follow the following steps to answer a question:
Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or
a QA task.
Step 2, you generate code or answer the question based on the intent.
You should leverage the context provided by the user as much as possible. If you think the context is not enough, you
can reply exactly "UPDATE CONTEXT" to ask the user to provide more contexts.
For code 
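The log above shows that only the first few retrieved `doc_ids` are added to the context. One plausible reading, sketched below, is that documents are added in ranked order until a token budget would be exceeded. The budget rule, the word-based token count, and the helper `build_context` are all assumptions made for illustration.

```python
def build_context(doc_ids, documents, max_tokens):
    """Add documents in ranked order until the (assumed) token budget would be exceeded."""
    added, parts, used = [], [], 0
    for doc_id in doc_ids:
        # Assumption: one whitespace-separated word counts as one token.
        n_tokens = len(documents[doc_id].split())
        if used + n_tokens > max_tokens:
            break
        print(f"Adding doc_id {doc_id} to context.")
        added.append(doc_id)
        parts.append(documents[doc_id])
        used += n_tokens
    return added, "\n".join(parts)


toy_documents = {
    "doc_3": "one two three",
    "doc_45": "four five six seven",
    "doc_29": "eight nine ten eleven twelve",
}
added_ids, context = build_context(["doc_3", "doc_45", "doc_29"], toy_documents, max_tokens=8)
print(added_ids)
```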

### Example 3

Use RetrieveChat to help generate sample code and ask for human-in-loop feedback.

Problem: How to build a time series forecasting model for stock price using FLAML?

In [7]:
# reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# set `human_input_mode` to be `ALWAYS`, so the agent will ask for human input at every step.
ragproxyagent.human_input_mode = "ALWAYS"
code_problem = "how to build a time series forecasting model for stock price using FLAML?"
ragproxyagent.initiate_chat(assistant, problem=code_problem)

doc_ids:  [['doc_31', 'doc_29', 'doc_51', 'doc_34', 'doc_50', 'doc_45', 'doc_3', 'doc_48', 'doc_33', 'doc_49', 'doc_36', 'doc_30', 'doc_14', 'doc_2', 'doc_4', 'doc_46', 'doc_27', 'doc_6', 'doc_28', 'doc_15']]
Adding doc_id doc_31 to context.
Adding doc_id doc_29 to context.
Adding doc_id doc_51 to context.
Adding doc_id doc_34 to context.
Adding doc_id doc_50 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user. You should follow the following steps to answer a question:
Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or
a QA task.
Step 2, you generate code or answer the question based on the intent.
You should leverage the context provided by the user as much as possible. If you think the context is not enough, you
can reply exactly "UPDATE CONTEXT" to ask the user to provide more contexts.
For code g
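The prompt lets the assistant reply exactly "UPDATE CONTEXT" when the supplied context is not enough. A minimal sketch of how a proxy could react is shown below: it hands over the next unused documents from the ranked list. The helper `more_doc_ids` and the `batch_size` parameter are hypothetical and only illustrate the idea.

```python
def more_doc_ids(reply, ranked_doc_ids, used_doc_ids, batch_size=2):
    """Return the next unused doc ids when the assistant asks for more context."""
    if reply.strip() != "UPDATE CONTEXT":
        return []
    unused = [doc_id for doc_id in ranked_doc_ids if doc_id not in used_doc_ids]
    return unused[:batch_size]


ranked = ["doc_31", "doc_29", "doc_51", "doc_34"]
used = {"doc_31", "doc_29"}
extra = more_doc_ids("UPDATE CONTEXT", ranked, used)
none_needed = more_doc_ids("Here is the answer.", ranked, used)
print(extra, none_needed)
```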

### Example 4

Use RetrieveChat to answer a question and ask for human-in-loop feedback.

Problem: Is there a function named `tune_automl` in FLAML?

In [8]:
# reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# set `human_input_mode` to be `ALWAYS`, so the agent will ask for human input at every step.
ragproxyagent.human_input_mode = "ALWAYS"
qa_problem = "Is there a function named `tune_automl` in FLAML?"
ragproxyagent.initiate_chat(assistant, problem=qa_problem)

doc_ids:  [['doc_29', 'doc_34', 'doc_14', 'doc_45', 'doc_3', 'doc_15', 'doc_13', 'doc_22', 'doc_28', 'doc_18', 'doc_5', 'doc_4', 'doc_41', 'doc_49', 'doc_39', 'doc_9', 'doc_20', 'doc_38', 'doc_51', 'doc_21']]
Adding doc_id doc_29 to context.
Adding doc_id doc_34 to context.
Adding doc_id doc_14 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user. You should follow the following steps to answer a question:
Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or
a QA task.
Step 2, you generate code or answer the question based on the intent.
You should leverage the context provided by the user as much as possible. If you think the context is not enough, you
can reply exactly "UPDATE CONTEXT" to ask the user to provide more contexts.
For code generation, you must obey the following rules:
You MUST NOT install

INFO:flaml.autogen.oai.completion:retrying in 10 seconds...
Traceback (most recent call last):
  File "/datadrive/FLAML/flaml/autogen/oai/completion.py", line 204, in _get_response
    response = openai_completion.create(**config)
  File "/home/lijiang1/anaconda3/envs/flaml-oss/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/home/lijiang1/anaconda3/envs/flaml-oss/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/home/lijiang1/anaconda3/envs/flaml-oss/lib/python3.10/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/home/lijiang1/anaconda3/envs/flaml-oss/lib/python3.10/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/home/lijiang1/anaconda

assistant (to ragproxyagent):

There doesn't seem to be a function named `tune_automl` in the FLAML library. However, there is a `tune.run` function available for hyperparameter optimization, and you can use the AutoML class from the FLAML library to perform automated machine learning tasks. If you have a specific question or need help regarding the usage of FLAML, feel free to ask.

--------------------------------------------------------------------------------


### Example 5

Use RetrieveChat to do QA on the Natural Questions dataset.


In [3]:
corpus_file = "/datadrive/FLAML/evaluation/retrievechat/NaturalQuestion/naturalquestionsshortqa/corpus.txt"

In [9]:
# Create a new collection for the Natural Questions dataset
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    retrieve_config={
        "docs_path": corpus_file,
        "chunk_token_size": 4900,
        "model": config_list[0]["model"],
        "client": chromadb.PersistentClient(path="/tmp/chromadb"),
        "collection_name": "natural-questions",
        "chunk_mode": "one_line",
        "embedding_model": "all-MiniLM-L6-v2",
    },
)
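The `"chunk_mode": "one_line"` setting suits a corpus file that stores one passage per line. As a rough sketch, and assuming this mode simply treats every non-empty line as its own chunk, the splitting could look like this (`chunk_one_line` is a hypothetical helper, not the library function):

```python
def chunk_one_line(text):
    """Assumed behaviour: every non-empty line becomes its own chunk."""
    return [line.strip() for line in text.splitlines() if line.strip()]


toy_corpus = "first passage\n\nsecond passage\n  third passage  \n"
line_chunks = chunk_one_line(toy_corpus)
print(line_chunks)
```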

In [10]:
import json

queries_file = "/datadrive/FLAML/evaluation/retrievechat/NaturalQuestion/naturalquestionsshortqa/queries.jsonl"

# Each line of the JSONL file holds one query together with its reference answers.
with open(queries_file, "r") as f:
    queries = [json.loads(line) for line in f if line.strip()]

questions = [q["text"] for q in queries]
answers = [q["metadata"]["answer"] for q in queries]

In [11]:
for i in range(5):
    print(questions[i], answers[i])

what is non controlling interest on balance sheet ["the portion of a subsidiary corporation 's stock that is not owned by the parent corporation"]
how many episodes are in chicago fire season 4 ['23']
who sings love will keep us alive by the eagles ['Timothy B. Schmit']
who is the leader of the ontario pc party ['Patrick Walter Brown']
where did the last name keith come from ['from Keith in East Lothian , Scotland', "from a nickname , derived from the Middle High German kīt , a word meaning `` sprout '' , `` offspring ''"]


In [12]:
# reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# set `human_input_mode` to be `NEVER`, so the agent will not ask for human input.
ragproxyagent.human_input_mode = "NEVER"
qa_problem = questions[0]
ragproxyagent.initiate_chat(assistant, problem=qa_problem, n_results=5)

doc_ids:  [['doc_0', 'doc_155', 'doc_19', 'doc_196', 'doc_140']]
Adding doc_id doc_0 to context.
Adding doc_id doc_155 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user. You should follow the following steps to answer a question:
Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or
a question answering task.
Step 2, you reply based on the intent.
You should leverage the context provided by the user as much as possible. If you need more context, you should reply 
"UPDATE CONTEXT".
For code generation task, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```

For question answering task, you must give as short an answer as possible.

Us
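The `n_results=5` argument limits retrieval to the five closest chunks, which is why `doc_ids` above has five entries. The idea of top-n retrieval can be sketched with toy vectors and cosine similarity. This is only an illustration: the real embeddings come from the configured embedding model, the vector store may use a different distance measure, and `top_n` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query_vector, doc_vectors, n_results):
    """Return the ids of the n_results documents most similar to the query."""
    ranked = sorted(
        doc_vectors,
        key=lambda doc_id: cosine_similarity(query_vector, doc_vectors[doc_id]),
        reverse=True,
    )
    return ranked[:n_results]


# Toy two-dimensional "embeddings", made up for this sketch.
toy_vectors = {"doc_0": [1.0, 0.0], "doc_1": [0.0, 1.0], "doc_2": [0.7, 0.7]}
closest = top_n([1.0, 0.1], toy_vectors, n_results=2)
print(closest)
```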

In [None]:
MY_PROMPT = """You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user. You must give as short an answer as possible.
"""

# reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# set `human_input_mode` to be `NEVER`, so the agent will not ask for human input.
ragproxyagent.human_input_mode = "NEVER"
ragproxyagent.customized_prompt = MY_PROMPT
qa_problem = questions[0]
ragproxyagent.initiate_chat(assistant, problem=qa_problem, n_results=5)