# RetrieveChat-based FinRobot-RAG

In this demo, we showcase the RAG use case of FinRobot, which inherits from autogen's RetrieveChat implementation.


Instead of using `RetrieveUserProxyAgent` directly, we register context retrieval as a function for our bots.
For the detailed implementation, refer to the [rag function](../finrobot/functional/rag.py) and to `SingleAssistantRAG` in the [rag workflow](../finrobot/agents/workflow.py).
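
To illustrate the idea of exposing retrieval as a callable function, here is a minimal, self-contained sketch. It is not FinRobot's or autogen's actual code: `CHUNKS`, `TOOLS`, `register_tool`, and `execute_tool_call` are hypothetical names, and the word-overlap ranking is a stand-in for a real vector search. Only the function name `retrieve_content` and its `message` / `n_results` arguments mirror the tool call that appears in this notebook's output.

```python
import json
from typing import Callable, Dict, List

# Hypothetical in-memory "document store"; the real workflow indexes PDF chunks.
CHUNKS: List[str] = [
    "Revenue increased 7% year over year, driven by Intelligent Cloud.",
    "Devices revenue includes Surface, HoloLens, and PC accessories.",
    "Xbox content and services revenue covers games and subscriptions.",
]

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., str]] = {}


def register_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Expose a function under its own name so a tool call can reach it."""
    TOOLS[func.__name__] = func
    return func


@register_tool
def retrieve_content(message: str, n_results: int = 1) -> str:
    """Return the n_results chunks sharing the most words with the query."""
    query_words = set(message.lower().split())
    ranked = sorted(
        CHUNKS,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return "\n".join(ranked[:n_results])


def execute_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a model-suggested tool call (name plus JSON arguments)."""
    return TOOLS[name](**json.loads(arguments_json))


result = execute_tool_call(
    "retrieve_content", '{"message": "xbox games revenue", "n_results": 1}'
)
print(result)
```

The point of the pattern is that the assistant decides when to call the function and with which query, rather than retrieval running automatically on every message.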

In [5]:
import autogen
from finrobot.agents.workflow import SingleAssistantRAG

For the OpenAI configuration, rename `OAI_CONFIG_LIST_sample` to `OAI_CONFIG_LIST` and replace the API keys with your own.

In [6]:
# Read OpenAI API keys from a JSON file
llm_config = {
    "config_list": autogen.config_list_from_json(
        "../OAI_CONFIG_LIST",
        filter_dict={"model": ["gpt-3.5-turbo"]},
    ),
    "timeout": 120,
    "temperature": 0,
}
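
As a rough illustration of what the `filter_dict` argument does, the sketch below filters a list of config entries by model name. It is a simplified stand-in, not autogen's implementation: `SAMPLE_CONFIG_JSON` and `filter_configs` are hypothetical, the sample entries and placeholder keys are assumptions, and autogen's real matching rules may differ.

```python
import json
from typing import Any, Dict, List

# Illustrative file contents; the entries and placeholder values are assumptions.
SAMPLE_CONFIG_JSON = """
[
    {"model": "gpt-4", "api_key": "<your-key>"},
    {"model": "gpt-3.5-turbo", "api_key": "<your-key>"}
]
"""


def filter_configs(
    configs: List[Dict[str, Any]], filter_dict: Dict[str, List[Any]]
) -> List[Dict[str, Any]]:
    """Keep entries whose value for every filter key is among the accepted values."""
    return [
        cfg
        for cfg in configs
        if all(cfg.get(key) in accepted for key, accepted in filter_dict.items())
    ]


configs = json.loads(SAMPLE_CONFIG_JSON)
selected = filter_configs(configs, {"model": ["gpt-3.5-turbo"]})
print([cfg["model"] for cfg in selected])
```

If the filter matches no entry, the resulting list is empty, so it is worth checking that the model name in `filter_dict` actually appears in your config file.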

From `finrobot.agents.workflow` we import `SingleAssistantRAG`, which takes a `retrieve_config` as input.
For `docs_path`, we first use the PDF report generated in [this notebook](./agent_annual_report.ipynb).

For more configuration options, refer to [autogen's documentation](https://microsoft.github.io/autogen/docs/reference/agentchat/contrib/retrieve_user_proxy_agent).
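
One of the `retrieve_config` options used in this notebook is `chunk_token_size`, which bounds how large each indexed piece of the document can be. The sketch below shows the general idea under a loud assumption: a "token" is approximated by a whitespace-separated word, and `chunk_by_tokens` is a hypothetical helper, not autogen's splitter (which uses a real tokenizer and can also respect line breaks).

```python
from typing import List


def chunk_by_tokens(text: str, chunk_token_size: int) -> List[str]:
    """Pack whitespace-separated words into chunks of at most chunk_token_size words."""
    if chunk_token_size <= 0:
        raise ValueError("chunk_token_size must be positive")
    words = text.split()
    return [
        " ".join(words[start : start + chunk_token_size])
        for start in range(0, len(words), chunk_token_size)
    ]


# A synthetic 2500-word document, split with the two sizes used in this notebook.
sample = "word " * 2500
small = chunk_by_tokens(sample, 1000)
large = chunk_by_tokens(sample, 2000)
print(len(small), len(large))
```

Smaller chunks give more, narrower retrieval candidates; larger chunks return more surrounding context per hit at the cost of a longer prompt.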

Then, let's do a simple Q&A.

In [7]:
assistant = SingleAssistantRAG(
    "Data_Analyst",
    llm_config,
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "vector_db": None,  # Set to None: autogen has a bug in this version
        "docs_path": [
            "../report/Microsoft_Annual_Report_2023.pdf",
        ],
        "chunk_token_size": 1000,
        "get_or_create": True,
        "collection_name": "msft_analysis",
        "must_break_at_empty_line": False,
    },
)
assistant.chat("How's msft's 2023 income? Provide with some analysis.")

[33mUser_Proxy[0m (to Data_Analyst):

How's msft's 2023 income? Provide with some analysis.

--------------------------------------------------------------------------------
[33mData_Analyst[0m (to User_Proxy):

[32m***** Suggested tool call (call_dD0rXF90Tpxb98cC9PZGRNo7): retrieve_content *****[0m
Arguments: 
{"message":"Microsoft's 2023 income analysis","n_results":1}
[32m*********************************************************************************[0m

--------------------------------------------------------------------------------
[35m
>>>>>>>> EXECUTING FUNCTION retrieve_content...
Call ID: call_dD0rXF90Tpxb98cC9PZGRNo7
Input arguments: {'message': "Microsoft's 2023 income analysis", 'n_results': 1}[0m


CropBox missing from /Page, defaulting to MediaBox


Trying to create collection.
doc_ids:  [['doc_0']]
Adding content of doc doc_0 to context.
User_Proxy (to Data_Analyst):

***** Response from calling tool (call_dD0rXF90Tpxb98cC9PZGRNo7) *****
Below is the context retrieved from the required file based on your query.
If you can't answer the question with or without the current context, you should try using a more refined search query according to your requirements, or ask for more contexts.

Your current query is: Microsoft's 2023 income analysis

Retrieved context is: Equity Research Report: Microsoft Corporation
FinRobot
https://ai4finance.org/
Income Summarization The company experienced a 7% Year-over-Year increase in revenue, driven by significant contributions from its Intelligent Cloud and Productivity and Business Processes segments, indicating a strong demand for cloud-based solutions and productivity software. Despite the revenue growth, the Cost of Goods Sold (COGS) increased by 5%, suggesting a ne

Next, we try a more complex case that uses MSFT's 10-K report as the document source.

Let's see how the agent works this out.

In [8]:
assistant = SingleAssistantRAG(
    "Data_Analyst",
    llm_config,
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "vector_db": None,  # Set to None: autogen has a bug in this version
        "docs_path": [
            "../report/2023-07-27_10-K_msft-20230630.htm.pdf",
        ],
        "chunk_token_size": 2000,
        "collection_name": "msft_10k",
        "get_or_create": True,
        "must_break_at_empty_line": False,
    },
    rag_description="Retrieve content from MSFT's 2023 10-K report for detailed question answering.",
)
assistant.chat("How's msft's 2023 income? Provide with some analysis.")

[33mUser_Proxy[0m (to Data_Analyst):

How's msft's 2023 income? Provide with some analysis.

--------------------------------------------------------------------------------
[33mData_Analyst[0m (to User_Proxy):

[32m***** Suggested tool call (call_8kLRWCtUCVLFVpHZrJUijl7F): retrieve_content *****[0m
Arguments: 
{"message":"Microsoft's 2023 income analysis","n_results":1}
[32m*********************************************************************************[0m

--------------------------------------------------------------------------------
[35m
>>>>>>>> EXECUTING FUNCTION retrieve_content...
Call ID: call_8kLRWCtUCVLFVpHZrJUijl7F
Input arguments: {'message': "Microsoft's 2023 income analysis", 'n_results': 1}[0m


CropBox missing from /Page, defaulting to MediaBox


Trying to create collection.



doc_ids:  [['doc_17']]
Adding content of doc doc_17 to context.
User_Proxy (to Data_Analyst):

***** Response from calling tool (call_8kLRWCtUCVLFVpHZrJUijl7F) *****
Below is the context retrieved from the required file based on your query.
If you can't answer the question with or without the current context, you should try using a more refined search query according to your requirements, or ask for more contexts.

Your current query is: Microsoft's 2023 income analysis

Retrieved context is: Revenue from Windows Commercial products and cloud services, comprising volume licensing of the Windows operating system, Windows cloud services, and other Windows commercial oﬀerings
Devices revenue growth
Revenue from Devices, including Surface, HoloLens, and PC accessories
Xbox content and services revenue growth
Revenue from Xbox content and services, comprising ﬁrst- and third-party content (including games and in-game content), Xbox Game Pass and other subscription