# RetrieveChat based FinRobot-RAG

In this demo, we showcase FinRobot's RAG use case, which inherits from AutoGen's RetrieveChat implementation.


Instead of using `RetrieveUserProxyAgent` directly, we register context retrieval as a function that our agents can call.
For implementation details, see the [rag function](../finrobot/functional/rag.py) and the `SingleAssistantRAG` class in [rag workflow](../finrobot/agents/workflow.py).
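The "retrieval as a function" pattern can be sketched in plain Python. The retrieval step is an ordinary function, and the agent only emits a tool name plus JSON arguments, like the `retrieve_content` calls in the outputs further down. This is a hypothetical, stdlib-only illustration: `CHUNKS`, `TOOLS`, `execute_tool_call` and the word-overlap ranking are all assumptions. FinRobot's real function lives in `rag.py` and builds on AutoGen's RetrieveChat.

```python
import json
from typing import Callable, Dict, List

# Hypothetical in-memory "collection": in FinRobot the chunks come from the
# report PDF and live in a vector DB; plain strings stand in for them here.
CHUNKS: List[str] = [
    "Revenue increased 7% year over year.",
    "Operating income decreased $4.0 billion or 20%.",
    "Gross margin decreased $4.2 billion or 13%.",
]


def retrieve_content(message: str, n_results: int = 3) -> str:
    """Toy retriever: rank chunks by word overlap with the query."""
    query_words = set(message.lower().split())
    ranked = sorted(
        CHUNKS,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    # Cap n_results at the collection size, similar to what the log reports
    # when the index holds fewer elements than requested.
    n_results = min(n_results, len(ranked))
    context = "\n".join(ranked[:n_results])
    return f"Your current query is: {message}\n\nRetrieved context is: {context}"


# Registry mapping tool names to callables: a stand-in for function registration.
TOOLS: Dict[str, Callable[..., str]] = {"retrieve_content": retrieve_content}


def execute_tool_call(name: str, arguments: str) -> str:
    """Dispatch a suggested tool call whose arguments arrive as a JSON string."""
    kwargs = json.loads(arguments)
    return TOOLS[name](**kwargs)


print(execute_tool_call("retrieve_content", '{"message": "operating income", "n_results": 1}'))
```

Because the agent only ever sees the function's name, description and JSON arguments, it can refine its query and call the tool again, which is the behavior the retrieval prompt in the outputs invites.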

In [1]:
import autogen
from finrobot.agents.workflow import SingleAssistantRAG

In [2]:
import sys
print(sys.path)

['C:\\Program Files\\Python311\\python311.zip', 'C:\\Program Files\\Python311\\DLLs', 'C:\\Program Files\\Python311\\Lib', 'C:\\Program Files\\Python311', 'c:\\Users\\I012859\\Documents\\Projects\\FinRobot\\venv', '', 'c:\\Users\\I012859\\Documents\\Projects\\FinRobot\\venv\\Lib\\site-packages', 'c:\\users\\i012859\\documents\\projects\\finrobot', 'c:\\Users\\I012859\\Documents\\Projects\\FinRobot\\venv\\Lib\\site-packages\\win32', 'c:\\Users\\I012859\\Documents\\Projects\\FinRobot\\venv\\Lib\\site-packages\\win32\\lib', 'c:\\Users\\I012859\\Documents\\Projects\\FinRobot\\venv\\Lib\\site-packages\\Pythonwin']


For the LLM configuration, use `OAI_CONFIG_LIST` and replace the API keys with your own.

In [None]:
# Load the LLM configuration (model entries and API keys) from a JSON file
llm_config = {
    "config_list": autogen.config_list_from_json(
        "../OAI_CONFIG_LIST",
        filter_dict={"model": ["gemma2-9b-it"]},
    ),
    "timeout": 120,
    "temperature": 0,
}
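Here is a hedged sketch of the input `config_list_from_json` works with: the `OAI_CONFIG_LIST` file is a JSON list of model entries, and `filter_dict={"model": [...]}` keeps only the matching ones. The placeholder keys, the endpoint field and the `filter_configs` helper are assumptions for illustration. AutoGen's own filtering handles more cases than this simplified version.

```python
import json

# Hypothetical OAI_CONFIG_LIST contents: a JSON list of model entries.
# Replace the placeholders with your own values; the second entry is illustrative.
OAI_CONFIG_LIST = """
[
    {"model": "gemma2-9b-it", "api_key": "<your-api-key>", "base_url": "<your-endpoint>"},
    {"model": "gpt-4", "api_key": "<your-openai-api-key>"}
]
"""


def filter_configs(configs, filter_dict):
    """Keep entries whose value for every filter key is in the allowed list.

    A simplified stand-in for what filter_dict does in
    autogen.config_list_from_json, not AutoGen's own code.
    """
    return [
        cfg
        for cfg in configs
        if all(cfg.get(key) in allowed for key, allowed in filter_dict.items())
    ]


config_list = filter_configs(json.loads(OAI_CONFIG_LIST), {"model": ["gemma2-9b-it"]})
print([cfg["model"] for cfg in config_list])  # ['gemma2-9b-it']
```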

From `finrobot.agents.workflow` we import `SingleAssistantRAG`, which takes a `retrieve_config` as input.
For `docs_path`, we first use the PDF report generated in [this notebook](./agent_annual_report.ipynb).

For more configuration options, refer to [AutoGen's documentation](https://microsoft.github.io/autogen/docs/reference/agentchat/contrib/retrieve_user_proxy_agent).
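Two `retrieve_config` keys used below control how documents are split before indexing. `chunk_token_size` bounds each chunk's size in tokens. With `must_break_at_empty_line=True`, a chunk may only end at an empty line. The toy splitter below is a hypothetical, stdlib-only illustration of that trade-off, not AutoGen's implementation. It approximates tokens by whitespace-separated words, whereas AutoGen counts model tokens.

```python
from typing import List


def split_into_chunks(text: str, chunk_token_size: int,
                      must_break_at_empty_line: bool) -> List[str]:
    """Toy greedy splitter (hypothetical helper, not AutoGen's implementation).

    Tokens are approximated by whitespace-separated words.
    """
    if must_break_at_empty_line:
        # Chunks may only end at an empty line, so whole paragraphs are the units.
        units, sep = [p for p in text.split("\n\n") if p.strip()], "\n\n"
    else:
        # Chunks may end at any line break.
        units, sep = [line for line in text.splitlines() if line.strip()], "\n"
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for unit in units:
        n_tokens = len(unit.split())
        # Start a new chunk when adding this unit would exceed the budget.
        if current and current_tokens + n_tokens > chunk_token_size:
            chunks.append(sep.join(current))
            current, current_tokens = [], 0
        current.append(unit)
        current_tokens += n_tokens
    if current:
        chunks.append(sep.join(current))
    return chunks


text = "a b c\nd e f\n\ng h i\nj k l"
print(split_into_chunks(text, 4, must_break_at_empty_line=False))  # four one-line chunks
print(split_into_chunks(text, 4, must_break_at_empty_line=True))   # two 6-word paragraphs
```

In this toy, `True` keeps paragraphs intact even when a paragraph overshoots the budget, while `False` keeps chunks within budget at the cost of splitting mid-paragraph.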

Then, let's do a simple Q&A.

In [4]:
assistant = SingleAssistantRAG(
    "Data_Analyst",
    llm_config,
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "vector_db": None,  # AutoGen has a bug in this version
        "docs_path": [
            "../report/Microsoft_Annual_Report_2023.pdf",
        ],
        "chunk_token_size": 1000,
        "get_or_create": True,
        "collection_name": "msft_analysis",
        "must_break_at_empty_line": False,
    },
)
assistant.chat("How's msft's 2023 income? Provide with some analysis.")

User_Proxy (to Data_Analyst):

How's msft's 2023 income? Provide with some analysis.

--------------------------------------------------------------------------------
Data_Analyst (to User_Proxy):

***** Suggested tool call (jxq60hnpa): retrieve_content *****
Arguments: 
{"message":"Microsoft's income in 2023","n_results":3}
*************************************************************

--------------------------------------------------------------------------------
>>>>>>>> EXECUTING FUNCTION retrieve_content...
Call ID: jxq60hnpa
Input arguments: {'message': "Microsoft's income in 2023", 'n_results': 3}
Trying to create collection.
Number of requested results 3 is greater than number of elements in index 1, updating n_results = 1

doc_ids:  [['doc_0']]
Adding content of doc doc_0 to context.
User_Proxy (to Data_Analyst):

***** Response from calling tool (jxq60hnpa) *****
Below is the context retrieved from the required file based on your query.
If you can't answer the question with or without the current context, you should try using a more refined search query according to your requirements, or ask for more contexts.

Your current query is: Microsoft's income in 2023

Retrieved context is: Equity Research Report: Microsoft Corporation
Income Summarization The company experienced a 7% Year-over-Year increase in revenue, driven by significant contributions from its Intelligent Cloud and Productivity and Business Processes segments, indicating a strong demand for cloud-based solutions and productivity software. Despite the revenue growth, the Cost of Goods Sold (COGS) increased by 5%, suggesting a need for closer cost control measures to improve cost efficiency and maintain profitabilit

Next, we try a more complex case, where the retrieval source is MSFT's 2023 10-K report.

Let's see how the agent works this out.

In [5]:
assistant = SingleAssistantRAG(
    "Data_Analyst",
    llm_config,
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "vector_db": None,  # AutoGen has a bug in this version
        "docs_path": [
            "../report/2023-07-27_10-K_msft-20230630.htm.pdf",
        ],
        "chunk_token_size": 2000,
        "collection_name": "msft_10k",
        "get_or_create": True,
        "must_break_at_empty_line": False,
    },
    rag_description="Retrieve content from MSFT's 2023 10-K report for detailed question answering.",
)
assistant.chat("How's msft's 2023 income? Provide with some analysis.")

User_Proxy (to Data_Analyst):

How's msft's 2023 income? Provide with some analysis.

--------------------------------------------------------------------------------
Data_Analyst (to User_Proxy):

***** Suggested tool call (q29f9wge3): retrieve_content *****
Arguments: 
{"message":"YoY comparisons of profit margin","n_results":3}
*************************************************************

--------------------------------------------------------------------------------
>>>>>>>> EXECUTING FUNCTION retrieve_content...
Call ID: q29f9wge3
Input arguments: {'message': 'YoY comparisons of profit margin', 'n_results': 3}
Trying to create collection.

doc_ids:  [['doc_18', 'doc_26', 'doc_19']]
Adding content of doc doc_18 to context.
Adding content of doc doc_26 to context.
Adding content of doc doc_19 to context.
User_Proxy (to Data_Analyst):

***** Response from calling tool (q29f9wge3) *****
Below is the context retrieved from the required file based on your query.
If you can't answer the question with or without the current context, you should try using a more refined search query according to your requirements, or ask for more contexts.

Your current query is: YoY comparisons of profit margin

Retrieved context is: Operating income decreased $4.0 billion or 20%.
Gross margin decreased $4.2 billion or 13% driven by declines in Windows and Devices. Gross margin percentage decreased driven by a decline in Devices.
Operating expenses decreased $195 million or 2% driven by a decline in Devices, offset in part by investments in Search and news advertising, including 2 points of growth from 

BadRequestError: Error code: 400 - {'error': {'message': 'Please reduce the length of the messages or completion.', 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
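The second run stops with `context_length_exceeded`. The three retrieved 10-K chunks (up to `chunk_token_size=2000` tokens each) plus the rest of the conversation likely exceeded the model's context window. The rough budget check below uses hypothetical helper names and assumed numbers: an 8,192-token window, 1,500 tokens of other messages, and 1,000 tokens reserved for the answer. It illustrates why lowering `chunk_token_size` or the number of results could help.

```python
def estimate_prompt_tokens(n_results: int, chunk_token_size: int, overhead_tokens: int) -> int:
    """Rough upper bound on prompt size: retrieved chunks plus everything else in the chat."""
    return n_results * chunk_token_size + overhead_tokens


def max_results_that_fit(context_window: int, chunk_token_size: int,
                         overhead_tokens: int, completion_tokens: int) -> int:
    """Largest n_results that leaves room for the other messages and the answer."""
    budget = context_window - overhead_tokens - completion_tokens
    return max(budget // chunk_token_size, 0)


# Assumed numbers, for illustration only.
CONTEXT_WINDOW = 8192  # assumed context window of the configured model
OVERHEAD = 1500        # assumed system prompt, question and tool-call messages
COMPLETION = 1000      # assumed room reserved for the answer

needed = estimate_prompt_tokens(3, 2000, OVERHEAD) + COMPLETION
print(needed > CONTEXT_WINDOW)                                           # True: 8500 > 8192
print(max_results_that_fit(CONTEXT_WINDOW, 2000, OVERHEAD, COMPLETION))  # 2
print(max_results_that_fit(CONTEXT_WINDOW, 1000, OVERHEAD, COMPLETION))  # 5
```

Under these assumptions, `chunk_token_size=2000` leaves room for about two results per query. With `chunk_token_size=1000`, as in the first run, about five results fit. Real counts depend on the tokenizer and on how long the conversation has grown.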