# Vector + Graph Retriever Agent

You will modify the agent to include an additional tool that:

1. Searches the documents using the vector index
2. Traverses the graph around the document to find other facts

***

Load the environment variables, import the required modules, connect to the Neo4j database, and set up configuration.

In [None]:
import sys
sys.path.insert(0, '../solutions')

from typing import Annotated

from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.schema import get_schema
from pydantic import Field

from agent_framework.azure import AzureAIClient
from azure.identity.aio import AzureCliCredential

from config import get_neo4j_driver, get_agent_config, get_embedder

In [None]:
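# Get configuration and connect to Neo4j
config = get_agent_config()
# Enter the driver's context manager manually so the connection stays open
# across notebook cells; it is closed in the cleanup cell at the end.
driver = get_neo4j_driver().__enter__()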

To use the vector index, you will need to create an embedding model to convert user queries into embeddings.

In [None]:
# Create the embedding model from Microsoft Foundry
embedder = get_embedder()

To retrieve data from the graph after documents have been found, you can define a `retrieval_query`.

This `retrieval_query` is appended to the Cypher query automatically generated by the neo4j-graphrag-python library's `VectorCypherRetriever`. The library first calls the vector index and yields `node` and `score` variables, which are then used by this query.

The conceptual flow is:

```cypher
CALL db.index.vector.queryNodes($index_name, $top_k, $embedding)
YIELD node, score
// ... then this retrieval_query is appended here ...
```

*   `node`: The specific `Chunk` node found by the vector search, that is, the text segment whose embedding is most similar to the embedding of your query.
*   `score`: The similarity score (0.0 to 1.0) of the vector match.
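
To make the "appended" behaviour concrete, here is a minimal sketch of the composition using plain string concatenation. It is illustrative only: `VECTOR_SEARCH_PREFIX` and `compose_query` are hypothetical names, and the library assembles its real query internally, so the exact text it generates may differ.

```python
# Illustrative only: the real query is assembled inside the library.
# VECTOR_SEARCH_PREFIX and compose_query are hypothetical names.
VECTOR_SEARCH_PREFIX = (
    "CALL db.index.vector.queryNodes($index_name, $top_k, $embedding)\n"
    "YIELD node, score\n"
)


def compose_query(retrieval_query: str) -> str:
    """Append a user-supplied retrieval query to the vector search prefix."""
    return VECTOR_SEARCH_PREFIX + retrieval_query.strip() + "\n"


# A deliberately small retrieval query that only uses `node` and `score`.
example_retrieval_query = """
RETURN node.text AS text, score
ORDER BY score DESC
"""

full_query = compose_query(example_retrieval_query)
print(full_query)
```

Because the retrieval query runs after `YIELD node, score`, it can use both variables without declaring them itself.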

In [None]:
retrieval_query = """
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)
OPTIONAL MATCH (company)-[:FACES_RISK]->(risk:RiskFactor)
WITH node, score, company, collect(risk.name) as risks
RETURN 
    node.text as text,
    score,
    {
        company: company.name,
        risks: risks
    } AS metadata
ORDER BY score DESC
"""

> This query retrieves the `Company` that the `Document` relates to, along with any `RiskFactor` nodes associated with that company.

***

Create the vector retriever to search the `chunkEmbeddings` index and include the `retrieval_query`.

In [None]:
# Create vector retriever with graph context
vector_retriever = VectorCypherRetriever(
    driver=driver,
    index_name="chunkEmbeddings",
    embedder=embedder,
    retrieval_query=retrieval_query,
)

Create the tools for the agent.

In the Microsoft Agent Framework, tools are simple Python functions with:
- A descriptive docstring (used by the agent to decide when to use the tool)
- `Annotated` type hints for parameters (describes the parameter to the agent)
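
The sketch below shows, using only the standard library, what information a plain function exposes through its name, docstring, and `Annotated` hints. It is not the Agent Framework's own mechanism. `lookup_company` and `describe_tool` are hypothetical names, and a plain string stands in for the pydantic `Field` metadata to keep the example dependency-free.

```python
import inspect
from typing import Annotated, get_type_hints


def lookup_company(
    name: Annotated[str, "The company name to look up"]
) -> str:
    """Return a short description of a company (hypothetical example tool)."""
    return f"No data available for {name}"


def describe_tool(func) -> dict:
    """Collect the name, docstring, and parameter descriptions of a function."""
    hints = get_type_hints(func, include_extras=True)
    params = {}
    for param_name, hint in hints.items():
        if param_name == "return":
            continue
        # Annotated types carry their extra metadata in __metadata__
        metadata = getattr(hint, "__metadata__", ())
        params[param_name] = metadata[0] if metadata else ""
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": params,
    }


tool_info = describe_tool(lookup_company)
print(tool_info)
```

This is why a clear function name, docstring, and parameter description matter: together they are the description of the tool that the model sees.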

In [None]:
# Define the schema tool
def get_graph_schema() -> str:
    """Get the schema of the graph database including node labels, relationships, and properties."""
    return get_schema(driver)

# Define a tool to retrieve financial documents
def retrieve_financial_documents(
    query: Annotated[str, Field(description="The search query to find relevant documents")]
) -> str:
    """Find details about companies in their financial documents using semantic search."""
    results = vector_retriever.search(query_text=query, top_k=3)
    if not results.items:
        return "No documents found matching the query."
    return "\n\n".join(item.content for item in results.items)

> The agent will use the tool's name and docstring to determine if it is needed.

***

Create the agent `tools` list and set up the `AzureAIClient`.

In [None]:
# Add the tools to a list
tools = [get_graph_schema, retrieve_financial_documents]

# Create credential and client
credential = AzureCliCredential()

client = AzureAIClient(
    project_endpoint=config.project_endpoint,
    model_deployment_name=config.model_name,
    async_credential=credential,
)

> The agent has access to the `get_graph_schema` and `retrieve_financial_documents` tools. The agent will pick between them when processing the user's query.

***

Create a query, run the agent, and stream the results.

In [None]:
query = "Summarise the risk factors mentioned in Apple's financial documents."

async def run_agent():
    async with client.create_agent(
        name="workshop-vector-graph-agent",
        instructions=(
            "You are a helpful assistant that can answer questions about "
            "a graph database containing financial documents. You can retrieve "
            "the schema and search for relevant documents."
        ),
        tools=tools,
    ) as agent:
        print(f"User: {query}\n")
        print("Assistant: ", end="", flush=True)
        
        async for update in agent.run_stream(query):
            if update.text:
                print(update.text, end="", flush=True)
        
        print("\n")

await run_agent()
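
If you want to keep the streamed answer rather than only print it, one option is to collect the text chunks into a string. The sketch below shows the pattern with a stand-in async generator, so it runs without Azure access. `FakeUpdate` and `fake_run_stream` are hypothetical placeholders for the agent's streamed updates, not part of the Agent Framework.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeUpdate:
    """Hypothetical stand-in for a streamed agent update."""
    text: Optional[str]


async def fake_run_stream(query: str) -> AsyncIterator[FakeUpdate]:
    """Yield a few updates, including one without text."""
    for piece in ["Risk ", None, "factors ", "summary"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield FakeUpdate(text=piece)


async def collect_response(query: str) -> str:
    """Accumulate the text of every update that carries text."""
    parts = []
    async for update in fake_run_stream(query):
        if update.text:
            parts.append(update.text)
    return "".join(parts)


response = asyncio.run(collect_response("example question"))
print(response)
```

Inside a notebook an event loop is already running, so you would use `await collect_response(...)` instead of `asyncio.run(...)`.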

Experiment with the agent by asking different questions about the documents and the graph schema, for example:

* Summarize the schema of the graph database.
* What are the main risk factors mentioned in the documents?
* Tell me about cybersecurity threats in financial services.
* What products does Microsoft mention in its financial documents?
* How are companies connected through their mentioned products?
* What type of questions can I ask about Apple using the graph database?

> The agent will pick different tools depending on the task.

***

Try modifying the `retrieval_query` to pull back additional data about the `Company` such as:

* Asset managers - `(company:Company)<-[:OWNS]-(manager:AssetManager)`
* Financial metrics - `(company:Company)-[:HAS_METRIC]->(metric:FinancialMetric)`
* Products - `(company:Company)-[:MENTIONS]->(product:Product)`

Including additional context will help the agent to create more specific responses.
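
As a starting point, here is one possible extension that adds products to the metadata. Treat it as a sketch under assumptions: the `MENTIONS` pattern comes from the list above, but the `product.name` property is assumed and should be checked against the graph schema.

```python
# Sketch only: `product.name` is an assumed property name; verify it
# against the schema returned by the get_graph_schema tool.
extended_retrieval_query = """
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)
OPTIONAL MATCH (company)-[:FACES_RISK]->(risk:RiskFactor)
WITH node, score, company, collect(DISTINCT risk.name) AS risks
OPTIONAL MATCH (company)-[:MENTIONS]->(product:Product)
WITH node, score, company, risks, collect(DISTINCT product.name) AS products
RETURN
    node.text AS text,
    score,
    {
        company: company.name,
        risks: risks,
        products: products
    } AS metadata
ORDER BY score DESC
"""

print(extended_retrieval_query)
```

Aggregating each `OPTIONAL MATCH` in its own `WITH` step keeps the risk and product lists from multiplying each other's rows. To try it, pass `extended_retrieval_query` as the `retrieval_query` argument when creating the `VectorCypherRetriever`.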

***

[View the complete code](../solutions/02_02_vector_graph_agent.py)

In [None]:
# Try a different query
query = "What products does Microsoft mention in its financial documents?"

async def run_experiment():
    # Create a fresh client for this experiment
    credential = AzureCliCredential()
    client = AzureAIClient(
        project_endpoint=config.project_endpoint,
        model_deployment_name=config.model_name,
        async_credential=credential,
    )
    
    async with client.create_agent(
        name="workshop-vector-graph-agent",
        instructions=(
            "You are a helpful assistant that can answer questions about "
            "a graph database containing financial documents. You can retrieve "
            "the schema and search for relevant documents."
        ),
        tools=tools,
    ) as agent:
        print(f"User: {query}\n")
        print("Assistant: ", end="", flush=True)
        
        async for update in agent.run_stream(query):
            if update.text:
                print(update.text, end="", flush=True)
        
        print("\n")
    
    await credential.close()

await run_experiment()

In [None]:
# Cleanup
driver.close()
await credential.close()