# DocumentAgent

This notebook demonstrates how to use the Document Agent, which can:
1. Ingest documents from a local file or a URL.
2. Answer questions about the ingested documents using retrieval-augmented generation (RAG).

### Installation

To get started with the document agent integration in AG2, install AG2 with the `rag` extra:
   ```bash
   pip install ag2[rag]
   ```

**Warning:**
   1. The current document agent only supports questions related to the ingested documents.
   2. Answers may be inaccurate for documents that could not be parsed correctly into Markdown.

You're all set! You can now start using the document agent feature in AG2.

In [1]:
import os

import autogen

config_list = autogen.config_list_from_json(
    "../OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4o"],
    },
)
os.environ["OPENAI_API_KEY"] = config_list[0]["api_key"]

llm_config = {
    "config_list": config_list,
}
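
The `filter_dict` argument above keeps only the configuration entries whose `model` is `gpt-4o`. As a minimal sketch of that filtering idea (not AG2's own implementation), the standalone example below applies the same rule to a hypothetical in-memory list. The function name `filter_configs`, the list `example_configs`, and the placeholder API keys are assumptions made for illustration.

```python
from typing import Any


def filter_configs(
    configs: list[dict[str, Any]],
    filter_dict: dict[str, list[Any]],
) -> list[dict[str, Any]]:
    """Keep entries whose value for every filter key is one of the accepted values."""
    return [
        cfg
        for cfg in configs
        if all(cfg.get(key) in accepted for key, accepted in filter_dict.items())
    ]


# Hypothetical entries shaped like an OAI_CONFIG_LIST file; the keys are placeholders.
example_configs = [
    {"model": "gpt-4o", "api_key": "sk-placeholder-1"},
    {"model": "gpt-4o-mini", "api_key": "sk-placeholder-2"},
]

selected = filter_configs(example_configs, {"model": ["gpt-4o"]})

# Guard against an empty result before indexing into the list.
if not selected:
    raise ValueError("No configuration entry matches the requested model")

print([cfg["model"] for cfg in selected])
```

If no entry matches the filter, indexing the result with `[0]`, as the cell above does with `config_list[0]`, raises an `IndexError`, so a guard like the one in this sketch may be worth adding.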



### Ingesting local documents and answering questions

In [2]:
from autogen.agents.experimental.document_agent.document_agent import DocumentAgent

document_agent = DocumentAgent(llm_config=llm_config)
document_agent.run(
    "could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?"
)

INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


user (to Document_Agent):

could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------
_User (to chat_manager):

could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------

Next speaker: DocumentTriageAgent

DocumentTriageAgent (to chat_manager):

{"ingestions":[{"path_or_url":"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],"queries":[{"query_type":"RAG_QUERY","query":"What is the fiscal year 2024 financial summary?"}]}

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'Quer

INFO:autogen.agents.experimental.document_agent.document_utils:Error when checking if ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf is a valid URL: Invalid URL.
INFO:autogen.agents.experimental.document_agent.document_utils:Detected file. Returning file path...
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.pipeline.base_pipeline:Processing document Toast_financial_report.pdf
INFO:docling.document_converter:Finished converting document Toast_financial_report.pdf in 16.33 sec.
INFO:autogen.agents.experimental.document_agent.parser_utils:Document converted in 16.33 seconds.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Collection docling-parsed-docs was created in the database.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Loading input doc: /workspaces/ag2/notebook/par

docling ingest: {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True} 
 {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_zloTpZZrxiNcvyflE0AHSjAw) *****
Data Ingestion Task Completed for ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2

ChatResult(chat_id=None, chat_history=[{'content': 'could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?', 'role': 'assistant', 'name': 'user'}, {'content': "The fiscal year 2024 financial summary for Toast, Inc. is as follows:\n\n- Total assets increased to $2,227 million by September 30, 2024, from $1,958 million at the end of 2023.\n- The company's total liabilities stood at $807 million.\n- Stockholders' equity was reported at $1,420 million.\n- Toast, Inc. generated total revenue of $3,622 million for the nine months ending September 30, 2024.\n- The company recorded a gross profit of $857 million.\n- Operating expenses amounted to $873 million, resulting in an operating loss of $16 million.\n- There was a net loss of $13 million for the period, with a basic and diluted loss per share of $0.02.", 'role': 'user', 'name': 'Document_Agent'}], summary="The fiscal year 2024 financial summary for Toast, Inc. is 
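
The `DocumentTriageAgent` message in the log above is plain JSON with two lists, `ingestions` and `queries`. As a rough illustration of that structure (not AG2's own parsing code), the sketch below loads the logged message with the standard library. `TriageResult` and `parse_triage_message` are hypothetical names introduced here.

```python
import json
from dataclasses import dataclass


@dataclass
class TriageResult:
    """Hypothetical container for the two lists in the triage message."""

    ingestions: list[str]           # paths or URLs to ingest
    queries: list[tuple[str, str]]  # (query_type, query) pairs


def parse_triage_message(raw: str) -> TriageResult:
    """Parse a triage message shaped like the one in the log above."""
    payload = json.loads(raw)
    ingestions = [item["path_or_url"] for item in payload.get("ingestions", [])]
    queries = [(q["query_type"], q["query"]) for q in payload.get("queries", [])]
    return TriageResult(ingestions=ingestions, queries=queries)


# The message copied from the log output above.
raw_message = (
    '{"ingestions":[{"path_or_url":'
    '"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],'
    '"queries":[{"query_type":"RAG_QUERY",'
    '"query":"What is the fiscal year 2024 financial summary?"}]}'
)

triage = parse_triage_message(raw_message)
print(triage.ingestions)
print(triage.queries)
```

Here the single prompt is split into one ingestion task and one `RAG_QUERY` task, which matches the `DocumentsToIngest` and `QueriesToRun` entries that appear in the `context_variables` lines of the log.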

### Fetching a webpage and answering questions

In [2]:
from autogen.agents.experimental.document_agent.document_agent import DocumentAgent

document_agent = DocumentAgent(llm_config=llm_config)
document_agent.run(
    "could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?"
)

INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


user (to Document_Agent):

could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?

--------------------------------------------------------------------------------
_User (to chat_manager):

could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?

--------------------------------------------------------------------------------

Next speaker: DocumentTriageAgent

DocumentTriageAgent (to chat_manager):

{"ingestions":[{"path_or_url":"https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html"}],"queries":[]}

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest

INFO:autogen.agents.experimental.document_agent.document_utils:Detected URL. Downloading content...
INFO:WDM:Get LATEST chromedriver version for google-chrome


KeyboardInterrupt: 
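
The two runs above log `Detected file. Returning file path...` for the local PDF and `Detected URL. Downloading content...` for the web page. The sketch below shows one simple way such a distinction could be made with the standard library. It is an assumption for illustration, not the logic in AG2's `document_utils`, and `classify_source` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def classify_source(path_or_url: str) -> str:
    """Return "url" for http(s) addresses with a host, otherwise "file"."""
    parsed = urlparse(path_or_url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"


# The two inputs used in this notebook.
sources = [
    "../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf",
    "https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html",
]

for source in sources:
    print(classify_source(source), source)
```

This check only inspects the string. It does not verify that the file exists or that the URL is reachable.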