# Fiscal Report Analysis

This notebook demonstrates how to load a PDF, chunk the text, generate embeddings, and use a custom retriever to answer questions about the document.

### Step 1: Setting Up the Environment

Here, we import the libraries and helper functions used throughout the notebook. This includes the `dotenv` utilities that Step 2 uses to load sensitive values, such as API keys, from environment variables instead of hard-coding them.

In [None]:
import asyncio
import os
import getpass
import yaml
import numpy as np
from dotenv import load_dotenv, find_dotenv, set_key
from IPython.display import display, Markdown, Code
from air import AIRefinery, DistillerClient, login
from air.utils import async_print
from library.data_processing import (
    extract_text_with_pdfplumber,
    extract_tables_with_pdfplumber,
    chunk_text,
    generate_document_embeddings,
    cache_embeddings,
    load_cached_embeddings,
)
from library.vector_store import InMemoryVectorStore

### Step 2: Securely Load Credentials

We'll check if your AI Refinery credentials are saved in a `.env` file. If not, you'll be prompted to enter them, and we'll save them for you.

In [None]:
load_dotenv()
env_path = find_dotenv()
if not env_path:
    with open(".env", "w") as f:
        pass
    env_path = find_dotenv()

account = os.getenv("ACCOUNT")
api_key = os.getenv("API_KEY")

if not account:
    account = getpass.getpass("Enter your AI Refinery Account ID: ")
    set_key(env_path, "ACCOUNT", account)

if not api_key:
    api_key = getpass.getpass("Enter your AI Refinery API Key: ")
    set_key(env_path, "API_KEY", api_key)

load_dotenv(override=True)
print("Credentials loaded successfully.")

### Step 3: Processing the Document

We take a PDF fiscal report, extract its text and tables, and split the combined content into smaller, more manageable chunks. We then convert each chunk into a numerical vector (an embedding) that can be compared against a question. The chunks and their embeddings are cached to disk, so later runs of the notebook can skip extraction and embedding.
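
The caching helpers imported from `library.data_processing` are not shown in this notebook. As a minimal sketch of the idea, assuming the cache is a single pickle file holding a dictionary with `documents` and `vectors` keys (the keys the cell below reads), it might look like this. The names `save_cache_sketch` and `load_cache_sketch` are hypothetical and are not the library's functions.

```python
import os
import pickle
import tempfile


def save_cache_sketch(documents, vectors, path):
    """Hypothetical helper: write chunks and their vectors to one pickle file."""
    with open(path, "wb") as f:
        pickle.dump({"documents": documents, "vectors": vectors}, f)


def load_cache_sketch(path):
    """Hypothetical helper: read back the dictionary written by save_cache_sketch."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy data standing in for real chunks and embeddings.
toy_documents = {"chunk_0": {"text": "first chunk"}, "chunk_1": {"text": "second chunk"}}
toy_vectors = [[0.1, 0.2], [0.3, 0.4]]

# Write to a temporary directory so the sketch does not touch the real cache.
cache_path = os.path.join(tempfile.mkdtemp(), "embeddings_sketch.pickle")
save_cache_sketch(toy_documents, toy_vectors, cache_path)
restored = load_cache_sketch(cache_path)
```

Pickle files can execute code when loaded, so a cache like this should only be read if it was written by a trusted process. Deleting the cache file forces the notebook to rebuild the embeddings, which is useful after changing the PDF.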

In [None]:
EMBEDDINGS_CACHE_PATH = "embeddings.pickle"
PDF_PATH = "data/acn-third-quarter-fiscal-2025-earnings-release.pdf"

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")
air_client = AIRefinery(**auth.openai(base_url=base_url))
embedding_client = air_client.embeddings

if os.path.exists(EMBEDDINGS_CACHE_PATH):
    cached_data = load_cached_embeddings(EMBEDDINGS_CACHE_PATH)
    documents_from_pdf = cached_data["documents"]
    document_vectors = cached_data["vectors"]
else:
    # Extract both text and tables using pdfplumber for better results
    plain_text = extract_text_with_pdfplumber(PDF_PATH)
    table_html = extract_tables_with_pdfplumber(PDF_PATH)
    combined_content = plain_text + "\n\n--- Extracted Tables ---\n" + table_html
    
    documents_from_pdf = chunk_text(combined_content)
    document_vectors = generate_document_embeddings(documents_from_pdf, embedding_client)
    if documents_from_pdf and document_vectors:
        cache_embeddings(documents_from_pdf, document_vectors, EMBEDDINGS_CACHE_PATH)

vector_store = InMemoryVectorStore(documents_from_pdf, document_vectors)

### Step 4: Why Do We Chunk Documents?

Large Language Models (LLMs) have a limited context window: they can only process a certain amount of text at once. To work with large documents, we break them into smaller, overlapping chunks, so that only the most relevant passages need to be passed to the model. The overlap reduces the chance that a sentence split at a chunk boundary loses its surrounding context. Below, you can see the first few chunks created from our document.
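
To make the idea of overlapping chunks concrete, here is a small sketch that splits text into fixed-size character windows. It is not the `chunk_text` implementation from `library.data_processing`, which is not shown here; the function name, the `chunk_size` and `overlap` parameters, and their default values are assumptions. The output shape (a dictionary keyed by `chunk_0`, `chunk_1`, ... with a `text` field) mirrors what the cell below reads.

```python
def chunk_with_overlap(text, chunk_size=1000, overlap=200):
    """Split text into character windows where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = {}
    for index, start in enumerate(range(0, len(text), step)):
        chunks[f"chunk_{index}"] = {"text": text[start:start + chunk_size]}
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = chunk_with_overlap(sample, chunk_size=10, overlap=4)
for chunk_id, chunk in demo_chunks.items():
    print(chunk_id, chunk["text"])
```

With these toy settings, the last four characters of each chunk reappear at the start of the next one. Real pipelines often split on sentence or token boundaries rather than raw characters.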

In [None]:
for i in range(3):
    chunk_id = f"chunk_{i}"
    if chunk_id in documents_from_pdf:
        print(f"--- {chunk_id.upper()} ---")
        print(documents_from_pdf[chunk_id]['text'])
        print("\n")

### Step 5: Creating a Custom Search Tool

This step defines our specialized search function. When we ask a question, this tool converts the question into the same numerical format as our document chunks. It then searches through the chunks to find the most relevant pieces of information to answer the question.
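
The search itself is delegated to `InMemoryVectorStore.search`, whose implementation is not shown in this notebook. One common way to rank chunks against a query vector is cosine similarity, sketched below with NumPy. The function name `top_k_cosine` and the parameter `k` are hypothetical, and the real store may use a different metric or return format.

```python
import numpy as np


def top_k_cosine(query_vector, document_vectors, k=3):
    """Return (row_index, similarity) pairs for the k rows most similar to the query."""
    query = np.asarray(query_vector, dtype=np.float32).reshape(-1)
    docs = np.asarray(document_vectors, dtype=np.float32)
    # Normalise both sides so that a dot product equals cosine similarity.
    query_unit = query / (np.linalg.norm(query) + 1e-12)
    docs_unit = docs / (np.linalg.norm(docs, axis=1, keepdims=True) + 1e-12)
    similarities = docs_unit @ query_unit
    best = np.argsort(-similarities)[:k]  # indices sorted by descending similarity
    return [(int(i), float(similarities[i])) for i in best]


# Toy 2-D vectors standing in for real embeddings.
toy_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ranked = top_k_cosine([[2.0, 0.0]], toy_vectors, k=2)
print(ranked)
```

In this toy example, row 0 points in the same direction as the query and ranks first, followed by row 2, which is at a 45-degree angle to it.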

In [None]:
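def _format_document_result(doc_id, doc, source_weight=1, retriever_name=""):
    """Format one search hit as a dict with a text summary and a weighted score."""
    base_score = doc.get("score", 0.0)
    final_score = float(base_score * source_weight)
    content_text = doc.get("content", {}).get("text", "")
    # Show at most 500 characters and add an ellipsis only when text was cut off.
    preview = content_text[:500] + ("..." if len(content_text) > 500 else "")
    formatted_result = f"Source: {retriever_name}\nID: {doc_id}\nContent: {preview}"
    return {"result": formatted_result, "score": final_score}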

async def custom_in_memory_vector_search(query: str):
    print(f"Received query for vector search: '{query}'")
    response = embedding_client.create(
        input=[query],
        model="nvidia/nv-embedqa-mistral-7b-v2",
        encoding_format="float",
        extra_body={"input_type": "query", "truncate": "NONE"},
    )
    query_vector = np.array(response.data[0].embedding, dtype=np.float32).reshape(1, -1)
    
    print("Searching for relevant documents in the vector store...")
    documents = vector_store.search(query_vector)

    if not documents:
        return [{"result": "There is no relevant document from the PDF.", "score": 0}]

    results = [
        _format_document_result(
            doc["id"], doc, source_weight=1, retriever_name="PDF-NumPy-Retriever"
        )
        for doc in documents
    ]
    return results

### Step 6: Initialize the Research Agent

Now, we set up the first AI agent that will use our custom search tool. We create a project for simple questions (`DocumentSearch`). This setup only needs to be run once.
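
The cell below registers the search function in a nested dictionary keyed first by agent name and then by retriever name. How the SDK consumes this dictionary internally is not shown here. The sketch below only illustrates the general pattern of looking up and awaiting a registered coroutine function; `fake_search`, `executor_sketch`, and `dispatch` are hypothetical names and not part of the AI Refinery API.

```python
import asyncio


async def fake_search(query: str):
    """Stand-in for a real retriever coroutine."""
    return [{"result": f"echo: {query}", "score": 1.0}]


# Same nesting as the notebook's executor dictionary:
# agent name -> retriever name -> coroutine function.
executor_sketch = {"Research Agent": {"Fiscal Reports Database": fake_search}}


async def dispatch(agent: str, tool: str, query: str):
    """Look up the registered coroutine function and await it with the query."""
    try:
        handler = executor_sketch[agent][tool]
    except KeyError as exc:
        raise LookupError(f"No executor registered for {agent!r} / {tool!r}") from exc
    return await handler(query)


# In a notebook cell, write `await dispatch(...)` instead, because the kernel
# already runs an event loop and asyncio.run() would raise an error there.
result = asyncio.run(dispatch("Research Agent", "Fiscal Reports Database", "revenue"))
print(result)
```

One practical consequence of this kind of lookup is that the keys must match the names used in the YAML configuration exactly, so a typo in either place will likely mean the search function is never called.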

In [None]:
distiller_client = DistillerClient(base_url=base_url)
uuid = os.getenv("UUID", "test_user_refactored")
research_config_path = "custom_vector_search.yaml"

# Create project for the simple Research Agent
research_project = "DocumentSearch"
distiller_client.create_project(
    config_path=research_config_path, project=research_project
)

# Define the executor dictionary for the agent
research_executor_dict = {
    "Research Agent": {
        "Fiscal Reports Database": custom_in_memory_vector_search,
    }
}

print("Research Agent project created successfully.")

### Step 7: Visualize the Research Agent Workflow

Here we can see a simple diagram of our Research Agent. The `Distiller Orchestrator` takes our query and passes it to the `Research Agent`.

In [None]:
with open(research_config_path, 'r') as f:
    config_data = yaml.safe_load(f)

orchestrator_name = "Distiller Orchestrator"
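# Fall back to a default name if the config has no (or an empty) utility_agents list
utility_agents = config_data.get('utility_agents') or [{}]
agent_name = utility_agents[0].get('agent_name', 'Research Agent')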

mermaid_code = f"""graph TD
    A[User Query] --> O(({orchestrator_name}));
    O --> B[{agent_name}];
"""
display(Markdown("```mermaid\n" + mermaid_code + "\n```"))

### Step 8: Answering Questions with the Research Agent

We can now use our 'Research Agent' and the custom vector database to ask a specific question. The agent uses our custom search tool to find the answer in the document. **You can change the `query` text in the cell below and rerun it to ask a new question.**

In [None]:
async def ask_simple_question(query: str):
    """Runs a query using the Research Agent."""
    async with distiller_client(
        project=research_project,
        uuid=uuid,
        executor_dict=research_executor_dict,
    ) as dc:
        print(f"----\nQuery: {query}")
        responses = await dc.query(query=query)
        async for response in responses:
            await async_print(f"Response: {response['content']}")

# Define your query here
simple_query = "What did Julie say in the report?"
await ask_simple_question(simple_query)

### Step 9: Initialize the Flow Super Agent

For more complex questions, we'll set up a more advanced 'Flow Super Agent'. This agent can perform multi-step reasoning to provide a comprehensive analysis.

In [None]:
flow_config_path = "flow_super_agent.yaml"

# Create project for the Flow Super Agent
flow_project = "FinancialAnalysisFlow"
distiller_client.create_project(
    config_path=flow_config_path, project=flow_project
)

# Define the executor dictionary for the agent
flow_executor_dict = {
    "Fiscal Report Researcher": {
        "Fiscal Reports Database": custom_in_memory_vector_search,
    }
}

print("Flow Super Agent project created successfully.")

### Step 10: Visualize the Flow Super Agent Workflow

This diagram shows the more complex workflow of our `FlowSuperAgent`. The `Distiller Orchestrator` initiates the `FlowSuperAgent`, which in turn orchestrates multiple specialized agents to build a final, comprehensive answer.

In [None]:
with open(flow_config_path, 'r') as f:
    config_data = yaml.safe_load(f)

orchestrator_name = "Distiller Orchestrator"
# Fall back to an empty config if the file has no (or an empty) super_agents list
super_agent_config = (config_data.get('super_agents') or [{}])[0]
super_agent_name = super_agent_config.get('agent_name', 'Super Agent')
agent_flow = super_agent_config.get('config', {}).get('agent_list', [])
entry_agent = agent_flow[0]['agent_name'] if agent_flow else ''

mermaid_lines = ["graph TD"]
mermaid_lines.append(f"    O(({orchestrator_name})) --> S[{super_agent_name}]")
mermaid_lines.append(f"    subgraph {super_agent_name}")
mermaid_lines.append("        direction LR")

for agent_step in agent_flow:
    from_agent = agent_step['agent_name']
    if 'next_step' in agent_step:
        for to_agent in agent_step['next_step']:
            mermaid_lines.append(f"        {from_agent.replace(' ', '_')}[{from_agent}] --> {to_agent.replace(' ', '_')}[{to_agent}]")
    else:
        mermaid_lines.append(f"        {from_agent.replace(' ', '_')}[{from_agent}]")

mermaid_lines.append("    end")
mermaid_code = "\n".join(mermaid_lines)

display(Markdown("```mermaid\n" + mermaid_code + "\n```"))

### Step 11: Performing a Deeper Analysis

Now we can use the 'Flow Super Agent' for questions that require reasoning and combining information. **You can change the `query` text in the cell below and rerun it to request a new analysis.**

In [None]:
async def perform_comparative_analysis(query: str):
    """Runs a query using the Flow Super Agent."""
    async with distiller_client(
        project=flow_project,
        uuid=uuid,
        executor_dict=flow_executor_dict,
    ) as dc:
        print(f"----\nQuery: {query}")
        responses = await dc.query(query=query)
        async for response in responses:
            await async_print(f"Role: {response.get('role', 'System')}\nContent: {response.get('content', '')}\n")

# Define your query here
analysis_query = "Provide a comparative analysis of the attached fiscal report."
await perform_comparative_analysis(analysis_query)

### Step 12: Customize the Flow Super Agent

Now it's your turn to experiment! You can customize the behavior of the `FlowSuperAgent` by editing the `flow_super_agent.yaml` file.

To do this, use the file browser on the left panel of your Jupyter environment to find and open `flow_super_agent.yaml`. After you save your changes, you can rerun this notebook from **Step 9** to see your new workflow in action.

Here are a few ideas to get you started:

1.  **Change the Goal:** Modify the `goal` in the `Financial Analyst` super agent's config to focus on a different aspect, like "risk assessment" or "growth opportunities."
2.  **Add a New Agent:** Introduce a new `utility_agent`, like the [CriticalThinker Agent](https://sdk.airefinery.accenture.com/distiller/agent-library/utility_agents/criticalthinker/), to challenge the assumptions of the `Comparative Analysis Writer` before the final report is generated. You can find a full list of available agents in the [Agent Library](https://sdk.airefinery.accenture.com/distiller/agent-library/).
3.  **Modify the Flow:** Change the `next_step` for the existing agents. For example, have the `Market Analyst` also feed information directly to the `Competitor Analyst`.
4.  **Customize Leading Questions:** Edit the `leading_questions` for the `Comparative Analysis Writer` to change the structure and focus of the final report. For instance, add a question about the company's sustainability initiatives.
5.  **Simplify the Flow:** For a quicker analysis, remove one of the agents, like the `Competitor Analyst`, and adjust the `next_step` pointers accordingly to create a more direct path to the `Comparative Analysis Writer`.
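
When you add or remove agents, it is easy to leave a `next_step` entry pointing at an agent that no longer exists. The following sketch checks for that before you rerun the notebook. It assumes the same layout that the Step 10 diagram code reads (a list of entries with `agent_name` and an optional `next_step` list of agent names), and it assumes that every target must also appear as an entry in the list. The function name and the sample flow are hypothetical.

```python
def find_dangling_steps(agent_list):
    """Return (agent_name, missing_target) pairs for next_step entries with no matching agent."""
    defined = {step["agent_name"] for step in agent_list}
    dangling = []
    for step in agent_list:
        for target in step.get("next_step", []):
            if target not in defined:
                dangling.append((step["agent_name"], target))
    return dangling


# Hypothetical flow after removing "Agent B" but forgetting to update one pointer.
edited_flow = [
    {"agent_name": "Agent A", "next_step": ["Agent B", "Agent C"]},
    {"agent_name": "Agent C"},
]
problems = find_dangling_steps(edited_flow)
print(problems)
```

To check your own file, load it with `yaml.safe_load` as in Step 10 and pass `config_data["super_agents"][0]["config"]["agent_list"]` to the function. An empty list means every pointer resolves to a defined agent.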