## From Knowledge Graph to AI Agent

### A hands-on walkthrough for demonstrating Agentic KG


## Introduction

Raw data is like a pile of LEGO bricks — useful, but static.
A knowledge graph transforms that pile into a well-designed LEGO city — everything connected, everything meaningful.
Now, imagine adding an AI agent to that city — not just someone who can describe it, but someone who can explore its streets, navigate its neighborhoods, and answer your questions in real time… even uncovering answers you didn’t think to ask for.

Today, we’re going from **Knowledge Graph** → **AI Agent** — a step-by-step journey where data becomes intelligence, and intelligence becomes interactive.

In this notebook, we'll:
1. Create semantic embeddings + a vector store for retrieval
2. Build a retriever from the vector store
3. Wrap the retriever into a tool
4. Create a tool for Text-to-Cypher query generation
5. Combine both tools into an AI agent
6. Interact with the agent to query and explore the knowledge graph

**Tools:**
- **Neo4j**: A leading graph database
- **OpenAI LLM & Embeddings**: for natural language understanding and vector representation
- **LangChain ecosystem**: for orchestrating the end-to-end generative AI pipeline


### Define LLM and Embeddings

In [12]:
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

DEPLOYMENT_NAME = "gpt-4.1"
EMBEDDING_DEPLOYMENT_NAME = "text-embedding-3-small"

embedding_model = OpenAIEmbeddings(
    model=EMBEDDING_DEPLOYMENT_NAME
)

llm = ChatOpenAI(model=DEPLOYMENT_NAME)

llm.invoke("What is 2+2?")

APIConnectionError: Connection error.
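
A connection error like the one above usually points to a network or proxy problem, but a missing API key is another common reason the first call fails. The sketch below is a small pre-flight check, not part of the original notebook. The helper name `missing_env_vars` is hypothetical; `OPENAI_API_KEY` is the environment variable the OpenAI client reads by default.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Check the environment before constructing the LLM and embedding clients.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Set these environment variables before calling the LLM: {missing}")
```

If the key is present and the call still fails, the next things to check are outbound network access and any proxy settings.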

### Neo4j Setup

In [None]:
import os

from neo4j import GraphDatabase

# Read connection settings from the environment, falling back to the local defaults.
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USERNAME", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "Neo4j@123")

AUTH = (NEO4J_USER, NEO4J_PASSWORD)

with GraphDatabase.driver(NEO4J_URI, auth=AUTH) as driver:
    driver.verify_connectivity()
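
`verify_connectivity()` only confirms that the server is reachable. Before building a vector index it can also help to confirm that the nodes to be indexed exist, for example nodes carrying the `Searchable` label used in the next section. The sketch below is one possible check; the helper name `count_nodes` is hypothetical, and it assumes a Neo4j Python driver recent enough to provide `driver.execute_query`.

```python
def count_nodes(driver, label):
    """Return how many nodes carry the given label."""
    # Labels cannot be passed as query parameters, so the label is interpolated.
    # Only pass trusted, hard-coded label names here.
    query = f"MATCH (n:`{label}`) RETURN count(n) AS node_count"
    records, _summary, _keys = driver.execute_query(query)
    return records[0]["node_count"]


# Usage (requires a reachable Neo4j instance):
# with GraphDatabase.driver(NEO4J_URI, auth=AUTH) as driver:
#     print(count_nodes(driver, "Searchable"))
```

A count of zero would suggest that the label is misspelled or that the graph has not been loaded yet.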

### Graph to vector index

In [3]:
from langchain_community.vectorstores.neo4j_vector import Neo4jVector

vectorstore = Neo4jVector.from_existing_graph(
    embedding=embedding_model,
    url=NEO4J_URI,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    index_name="adonis_embedding_index",
    node_label="Searchable",
    text_node_properties=["name", "description"],
    embedding_node_property="embedding",
    distance_strategy="COSINE"
)
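
To make the indexing step more concrete: the text that gets embedded for each node appears to be a concatenation of the properties listed in `text_node_properties`, one `key: value` pair per line. This matches the `name:` / `description:` layout visible in the documents retrieved later in this notebook. The function below is an approximation of that formatting for illustration, not the library's own code, and `node_to_text` is a hypothetical name.

```python
def node_to_text(properties, text_keys):
    """Join the selected node properties into one 'key: value' line per property."""
    # Missing or null properties become empty strings, so the key still appears.
    return "".join(f"\n{key}: {properties.get(key) or ''}" for key in text_keys)


# A node with a name but no description still produces both lines.
example_node = {"name": "R2R Revenue Specialist", "description": None}
print(node_to_text(example_node, ["name", "description"]))
```

Nodes with an empty `description` contribute very little text to their embedding, which may explain why some retrieved documents carry only a name.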


In [4]:
# Run a search query
query = "What should I do with zero value intangible assets?"

results = vectorstore.similarity_search(query, k=5)
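
The search above ranks nodes by cosine similarity between the query embedding and each node embedding. The sketch below shows that ranking on hypothetical two-dimensional vectors. Real embeddings have many more dimensions, and Neo4j performs this lookup through its vector index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k document names most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy vectors, not real embeddings.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], docs, k=2))
```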

In [5]:
retriever = vectorstore.as_retriever(
    search_type="similarity",            # or "mmr"
    search_kwargs={
        "k": 5,                           # top-k results
        "filter": {}                     # optional: metadata filter (e.g., node type)
    }
)

In [6]:
from langchain.tools.retriever import create_retriever_tool

adonis_graph_retriever_tool = create_retriever_tool(
    retriever=retriever,
    name="adonis_semantic_graph_search",
    description="Use this tool to answer questions about Finance Core processes and their flow charts. **Use this tool mainly for descriptive questions.**"
)

In [1]:
from datetime import datetime
formatted_datetime = datetime.now().strftime("%Y-%m-%d")
polite_instruction = """I'm working to understand your query better. Could you please try rephrasing your question with more details?"""

DG_REACT_PROMPT = """
{SYSTEM_PROMPT}

### GENERAL INSTRUCTIONS:
{GENERAL_INSTRUCTIONS}

### SPECIAL INSTRUCTIONS:
{SPECIAL_INSTRUCTIONS}

### AGENTS RESPONSE WORKFLOW:
Answer using all of these tools: {tools}. You must use ALL THE TOOLS SIMULTANEOUSLY.
Follow this format:

Question: {input}

Thought: {agent_scratchpad}
# SINGLE ACTION
Action: [tool name] - should be one of [{tool_names}]
Action Input: [input]
Observation: [result]

# MULTIPLE PARALLEL ACTIONS
Action 1: [tool name] - should be one of [{tool_names}]
Action Input 1: [input]
Action 2: [tool name] - should be one of [{tool_names}]
Action Input 2: [input]
# MANDATORY: Each Action N MUST be followed by Observation N. DO NOT skip any.
Observation 1: [result from Action 1]
Observation 2: [result from Action 2]

... (repeat as needed)

# ---  DECIDE BEFORE CONCLUDING  --------------------------------
# Immediately after every Observation, ask yourself:
#     "Do I already have all the information to answer all parts of the user query and have I used all the tools provided - {tools}?"
# • If No → write another `Thought:` line and continue the loop.
# • If Yes → jump to the Final Thought / Final Answer block below.
# ----------------------------------------------------------------

Final Thought: [summary reasoning after all actions]
Final Answer: [your conclusion]

**CRITICAL RULES**
1. Always follow the format above. Every `Thought` must be followed by one of the following sequences:
   - a single Action + Observation, OR
   - multiple Actions + corresponding Observations
   → Repeat as needed, until all tools are used and query is fully addressed.
2. If a user query involves multiple entities (e.g., multiple companies, years, policies, standards, sub questions, etc.), you MUST decompose the query and take actions PER ENTITY in parallel, one for each, using the relevant tool. Each entity must be treated as a separate Action/Observation pair.
3. Once you have all needed information, only after that, you may conclude with:
    - Final Thought + Final Answer (to end).
4. NEVER leave a `Thought:` line without an Action or a Final Answer.
5. If you use parallel Actions (Action 1, Action 2...), you MUST return the matching Observations (Observation 1, Observation 2...).
6. Maintain correct order when one Action’s result is needed by another.
7. ALWAYS use exact tool names from: `{tool_names}`
8. It is **MANDATORY to use ALL tools in `{tools}`** before reaching Final Thought.

### EXAMPLE 1 — PARALLEL TOOLS
Question: What is the current time in Tokyo, and what is 3^5?

Thought: I need both world_clock and python_repl tools to answer this in parallel.
Action 1: world_clock
Action Input 1: Tokyo
Action 2: python_repl
Action Input 2: 3**5
Observation 1: 2023-10-05 14:30 JST
Observation 2: 243
Final Thought: I now have both the time and the result of 3^5.
Final Answer: The current time in Tokyo is 14:30 JST. 3^5 = 243

# STRICTLY NOTE
# • Do NOT skip the self-check and go straight to Final Thought.
# • You must perform at least one Thought → Action → Observation cycle
#   unless there are zero applicable tools for this question.

# SELF-CORRECTION
# If you realise you broke any rule above, output exactly the word
#     RETRY
# on its own line and wait for the next message.

Begin!"""

SYSTEM_PROMPT = f"""
"""

General_Instructions = f"""
#### Answering and Formatting Instructions

1. **Markdown Formatting (MANDATORY):**
   - All responses must be formatted in Markdown.
   - Use bold text for all the headers and subheaders.
   - Use bullets, tables wherever applicable.
   - Do not use plain text or paragraphs without Markdown structure.
   - Ensure that you use hyphens (-) for list bullets. For sub-bullets, indent using 2 spaces (not tabs). Ensure proper nesting and consistent formatting.
   - Enhance the readability and clarity of the response by using relevant emojis where appropriate. Choose emojis dynamically based on the context — such as ✅ for confirmations, ❌ for errors, ⚠️ for warnings, 📌 for key points, 💡 for tips, and 📊 or tables for structured information. Use your judgment to decide where emojis can improve understanding or visual appeal.


2. **Citations (MANDATORY):**
    - Citations must be immediately placed after the relevant content.
    - Do not place citations at the end or in a separate references section. They should appear directly after the statement being referenced. **Place inline citations immediately after the relevant content**
    - Do not include tool names or retriever names in citations.


3. **Clarity and Precision:**
   - Break down your answer step-by-step, using bullet points for each logical step.
   - Do not include information not present in the provided context.
   - If the user mentions only a month without specifying a year, assume the most recent past or ongoing occurrence of that month based on the current date.
   - The current date and time is: **{formatted_datetime}**. Use this for time-sensitive queries.

4. **Conciseness:**
   - Keep explanations brief and to the point, but always use the required Markdown structure.

5. **Structured Responses**:
   - Answer questions **directly and concisely**. Start your response with a clear answer to the question asked.
   - The answer should directly refer to the question asked and **no extra information**.
"""

DG_Adonis_Special_Instructions = """
**Finance core**:
Your job is to provide accurate answers based solely on the information provided in the context. **Do not speculate or generate new information** beyond the given details. Failure to adhere to these instructions will result in severe penalties and you will be penalized with **$10 million and sentenced to life time imprisonment** if you do not follow the instructions and answer outside the context provided.

### Answering Instructions

1. **Structured Responses**:
   - Start your response with a concise and clear answer to the question asked
   - Provide a detailed answer and additional relevant context in subsequent paragraphs as needed.
   - **Make the answers readable**
     - Tabulate the results but only wherever applicable
     - Use bullets, markdowns and hyperlinks

2. **Clarity and Precision**:
   - Think step-by-step to ensure clarity and accuracy, strictly aligning with the provided context.
   - Do not expand abbreviations unless explicitly stated in the data

3. **Process Diagram Questions**:
   - Include the following in your answers:
     - **Process ID**: <Process ID> : <Process Diagram Name>
     - **No Speculation**: Do not create or speculate on process IDs or names. You will be penalized with **$10 million and sentenced to life time imprisonment** if you make up any process diagram id or name

4. **Implemented Controls (Controls)**:
   - Highlight any controls involved in specific steps of the process diagram:
     - **No Speculation**: Do not generate or speculate on NFCM controls. You will be penalized with **$10 million and sentenced to life time imprisonment** if you make up any controls
     - If asked about control ownership, refer to the **"Responsible for Execution"** section of the corresponding step where the control is specified. Always refer them as control owners in the answer.

5. **RACI Identification**:
   - For questions about RACI, detail the answer in separate sections for:
     - **Responsible**
     - **Accountable**
     - **Consulted**
     - **Informed**

6. **Responsibility Queries**:
   - Refer to the **"Responsible for Execution (Role)"** section for identifying responsible parties. Mention all individuals if multiple are responsible.

7. **Focused Responses**:
   - If the question pertains to specific steps in a process, concentrate your answer on those steps without including irrelevant information.
   - If you think the additional information will add value to the answer, keep it as a separate section in the answer with a meaningful heading.

8. **Timelines and Abbreviations**:
    - Be clear on abbreviations (e.g., "WD" = "Working Day").

9. **Terminology**:
    - Use "Start of the Process" and "End of the Process" instead of "start event" and "end event." if needed
    - Avoid using generic phrases like "Provide context", "as per context", or "based on the context provided".
"""
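
The prompt template mixes several kinds of placeholders. As far as I understand, `{tools}` and `{tool_names}` are filled in by `create_react_agent`, `{agent_scratchpad}` is managed by the agent loop, and the remaining ones (`{input}`, `{SYSTEM_PROMPT}`, `{GENERAL_INSTRUCTIONS}`, `{SPECIAL_INSTRUCTIONS}`) are supplied in each `invoke` call. The sketch below lists the placeholders a template expects, using only the standard library. The helper `template_variables` and the shortened `toy_prompt` are hypothetical.

```python
from string import Formatter


def template_variables(template):
    """Return the sorted, de-duplicated placeholder names in a format-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


# A shortened stand-in for the full agent prompt.
toy_prompt = (
    "{SYSTEM_PROMPT}\n{GENERAL_INSTRUCTIONS}\n"
    "Question: {input}\nThought: {agent_scratchpad}"
)
print(template_variables(toy_prompt))
```

Running a check like this against the real prompt is a quick way to see which keys must be present in the dictionary passed to the agent.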


In [8]:
from langchain_core.prompts import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    PromptTemplate,
)
from langchain.agents import (
    AgentExecutor,
    create_react_agent,
)
def get_simple_react_agent(llm, tools, system_prompt, verbose=False):
    """Helper function for creating agent executor"""
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            MessagesPlaceholder(variable_name="conversation_history", optional=True),
            HumanMessagePromptTemplate(
                prompt=PromptTemplate(input_variables=["input"], template="{input}")
            ),
            AIMessagePromptTemplate(
                prompt=PromptTemplate(
                    input_variables=["agent_scratchpad"], template="{agent_scratchpad}"
                )
            ),
        ]
    )
    agent = create_react_agent(llm, tools, prompt)
    return AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=verbose,
        stream_runnable=True,
        handle_parsing_errors=True,
        max_iterations=10,
        return_intermediate_steps=True,
    )

graph_react_agent = get_simple_react_agent(llm, [adonis_graph_retriever_tool], DG_REACT_PROMPT, verbose=True)
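
A ReAct executor works by parsing each model reply for either an `Action` / `Action Input` pair or a `Final Answer`. When neither is found, the reply counts as a parsing error. With `handle_parsing_errors=True` the error message is fed back to the model so it can try again. The sketch below is a simplified stand-in for that parsing step, not LangChain's actual parser, and `parse_react_step` is a hypothetical name.

```python
import re

# Matches "Action: <tool>" followed on a later line by "Action Input: <input>".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(?P<tool>.*?)\s*\n\s*Action Input\s*:\s*(?P<tool_input>.*)",
    re.DOTALL,
)


def parse_react_step(text):
    """Classify a model reply as a final answer, a tool call, or unparseable (None)."""
    if "Final Answer:" in text:
        return ("final", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_PATTERN.search(text)
    if match is None:
        return None  # the executor would report a format error here
    return ("action", match.group("tool").strip(), match.group("tool_input").strip())


print(parse_react_step("Thought: look it up\nAction: graph_tool\nAction Input: list diagrams"))
print(parse_react_step("Could you please rephrase your question?"))
```

A reply that is plain conversational text, with no `Action:` line, falls into the `None` branch, which is the situation a format error in the agent trace indicates.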

### Now ask questions

In [9]:
from IPython.display import display, Markdown
question = "What is the R2R Revenue specialist responsible for in the inventory return process?"

answer = graph_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    })
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to identify the responsibilities of the R2R Revenue specialist specifically within the inventory return process. I will search for the relevant process diagram and steps that mention the R2R Revenue specialist and their responsibilities.

Action: adonis_semantic_graph_search
Action Input: R2R Revenue specialist responsibility in inventory return process
name: R2R Revenue Specialist
description: 


name: Release sales return provision that is utilized by CN issued and Re-assess sales Return Provision balanceRelease sales return provision that is utilized by CN issued and Re-assess sales Return Provision balance
description: R2R Revenue Specialist shall re-assess the Sales Return Provision for any adjustment required once credit note issued.


name: Assess Sales Return Consider if provision is sufficient and recognize additional provision for return case if needed
description: Upon stock return 

**  
- The R2R Revenue specialist is responsible for:
  - Reassessing the Sales Return Provision for any adjustment required once a credit note is issued.
  - Assessing if the Sales Return Provision is sufficient and recognizing additional provision if needed upon stock return approval.
  - Working closely with the SCM/customer service team for any sales return occurrence.
  - Ensuring COPA vs FI reconciliation where applicable.
  - For more details, refer to [1.48.28, Establish Provisions for Revenue Deductions (R2R)](1.48.28,%20Establish%20Provisions%20for%20Revenue%20Deductions%20(R2R)).

**Key Steps:**
- Before reassessment, the specialist works with SCM/CS for sales return occurrence.
- After reassessment, the next step is to ensure COPA vs FI reconciliation.

In [10]:
question = "What are the controls around significant contracts? Who are the control owners?"

answer = graph_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to identify the controls around significant contracts and the control owners by searching for process steps and NFCM controls related to significant contracts.
Action: adonis_semantic_graph_search
Action Input: controls around significant contracts and control owners
name: Control Owner
description: 


name: Inform (Share APP / APM) other stakeholders as relevant e.g. FC&C, Statutory or Tax etc.  
description: FC&C team is considered a key stakeholder and needs to be informed on all significant / unusual contracts and transactions


name: NFCM.R2R.SIG.05
description: Review for Completeness of Significant Contracts and Significant Transactions. All significant contracts and transactions as per definition outlined in Finance Core in combination with the threshold defined in the Local Supplemental Information (LSI) are maintained in an organized listing.. CFO, Local Head of Legal and Head FRA re

**Controls Around Significant Contracts** 📌  
- **NFCM.R2R.SIG.05:**  
  - Review for completeness of significant contracts and transactions.
  - All significant contracts/transactions are maintained in an organized listing.
  - CFO, Local Head of Legal, and Head FRA review and sign-off the local list as attestation of completeness.
  - Expected evidence: Reviewed list and sign-off by the above roles.  
  - Reference: [1.48.42, Significant Contracts or Transactions - Identification (R2R)](https://novartis.sharepoint.com/sites/FinanceCore/ProcessDiagrams/Forms/AllItems.aspx?id=%2Fsites%2FFinanceCore%2FProcessDiagrams%2F1%2E48%2E42%5FSignificant%5FContracts%5FOr%5FTransactions%5FIdentification%2Epdf&parent=%2Fsites%2FFinanceCore%2FProcessDiagrams)

**Control Owners** 🧑‍💼  
- CFO
- Local Head of Legal
- Head FRA  
  - These are the control owners as per the "Responsible for Execution" for NFCM.R2R.SIG.05.
- R2R Manager/Team Lead also signs the final list.

**Supporting Process Step**  
- FRA Head holds a meeting with R2R, SPOCs, Legal Head, and CFO to review and confirm completeness. If an attendee can't join, written confirmation is required.  
- The final list must be signed by CFO, Legal Head, Head FRA, and R2R Manager/Team Lead.

**Relevant Process Diagram:**  
- [1.48.42, Significant Contracts or Transactions - Identification (R2R)](https://novartis.sharepoint.com/sites/FinanceCore/ProcessDiagrams/Forms/AllItems.aspx?id=%2Fsites%2FFinanceCore%2FProcessDiagrams%2F1%2E48%2E42%5FSignificant%5FContracts%5FOr%5FTransactions%5FIdentification%2Epdf&parent=%2Fsites%2FFinanceCore%2FProcessDiagrams)

In [11]:
question = "Who is responsible for revenue recognition?"

answer = graph_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
I'm working to understand your query better. Could you please try rephrasing your question with more details?
Invalid Format: Missing 'Action:' after 'Thought:'
Action: adonis_semantic_graph_search
Action Input: Who is responsible for revenue recognition?
name: Continuously or at least at month end, generate report to ensure proper reconciliation and revenue recognition (incl. review of open GI balance)
description: Continuous and at least during the month end close,perform  sales review to ensure correct revenue recognition.Review should include review of open GI(goods issued) and II (Invoice issued) balances.With review we should confirm that all deliveries were correctly invoiced and if POD was confirmed /met in order to recognize revenues.Revenue should be recognized at the point at which control is transferred .In some cases,invoice may be raised prior to control being transferred (e.g;due

**  
- **R2R Specialist** is responsible for revenue recognition, including contract review, revenue posting, and collaboration with Commercial and Market Access teams. For significant contracts, an APP/APM must be created as per **NAM Framework 9**. 📌  
- Refer to: [1.10.1.3 Revenue & Revenue Deductions (R2R)](1.10.1.3 Revenue & Revenue Deductions (R2R))

In [12]:
question = "What are the key review considerations for closing meeting FRA/CFO including BPA on WD2? Is this a NFCM control?"

answer = graph_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to identify the key review considerations for the closing meeting involving FRA/CFO and BPA on WD2, and determine if this is an NFCM control. I will search for the relevant process steps and control information.
Action: adonis_semantic_graph_search
Action Input: key review considerations closing meeting FRA CFO BPA WD2 NFCM control
name: NFCM.R2R.REP.25
description:  . Review of Balance Sheet and Income Statement. The Reporting Package based on ERP or FCRS extract for Balance Sheet and Income Statement is reviewed and authorized by the CFO as per FCRS Reporting Package Review Guidelines. Validation checks, reconciliation and additional review of documentation are performed where required (e.g., unusual trends and major/volatile movements)..  . Expected Evidence.  .  . 1. Closing Meeting checklist, minutes or briefing deck evidencing review by CFO, including indication of follow-up activities (

**

- **Key Review Considerations for Closing Meeting FRA/CFO including BPA on WD2:**  
  - Review and authorization of preliminary reporting data (Balance Sheet and Income Statement) by CFO.
  - Consideration of the full reporting entity (all BU/OU).
  - Review activities include:
    - Checking major/volatile movements.
    - Analyzing trends.
    - Comparing actuals vs. prior month, prior quarter, or budget.
  - Evidence: Meeting minutes or briefing deck evidencing review by CFO, including indication of follow-up activities (if applicable) 📌  
    [Closing meeting FRA/CFO including BPA on WD2 (R2R)]

- **Is this a NFCM control?**  
  - ✅ Yes, this is an NFCM control (NFCM.R2R.REP.25).  
    [NFCM.R2R.REP.25 (R2R)]

- **Supporting Document:**  
  - Please refer to the FCRS Reporting Package Review Guidelines for further details.

**Relevant Process Diagram:**  
- Closing meeting FRA/CFO including BPA on WD2 (R2R)

In [13]:
question = "List down the process diagram with more than 2 controls involved"

answer = graph_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to identify all process diagrams that have more than 2 controls involved. I will search for process diagrams and filter those with more than 2 controls.
Action: adonis_semantic_graph_search
Action Input: List all process diagrams with more than 2 controls involved
name: 1.10.1 R2R Process Diagrams
description: 


name: Process Landscape-ae31468f
description: 


name: Global Process Owner
description: 


name: 1.10.9 R2R Process Enabler
description: 


name: Receive updates for Global ICSI Master data
description: The Global Process Owner reviews and validates all global master data change requests...
Final Thought: The search did not return any process diagrams explicitly listing more than 2 controls involved.
Final Answer: ❌ No process diagrams with more than 2 controls involved were found in the available context.

> Finished chain.


❌ No process diagrams with more than 2 controls involved were found in the available context.
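
The semantic retriever comes back empty here because the question asks for a count across relationships, and similarity search over node text cannot compute that. The answer requires grouping and counting over the graph structure. The sketch below shows the aggregation involved, using hypothetical (diagram, control) pairs rather than data from the real graph.

```python
from collections import defaultdict

# Hypothetical (process diagram, control) pairs; the duplicate is deliberate.
edges = [
    ("Diagram A", "C1"),
    ("Diagram A", "C2"),
    ("Diagram A", "C3"),
    ("Diagram A", "C3"),
    ("Diagram B", "C1"),
]


def diagrams_with_more_than(edges, threshold):
    """Return (diagram, distinct control count) pairs above the threshold, largest first."""
    controls_by_diagram = defaultdict(set)
    for diagram, control in edges:
        controls_by_diagram[diagram].add(control)
    counts = {
        diagram: len(controls)
        for diagram, controls in controls_by_diagram.items()
        if len(controls) > threshold
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


print(diagrams_with_more_than(edges, 2))
```

A graph query language can express this kind of grouping and filtering directly, which is the motivation for adding a second, query-based tool.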

## Natural Language To Cypher using GraphCypherQAChain

In [14]:
from langchain.chains import RetrievalQA
from langchain.prompts import (
    PromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate
)

In [15]:
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain

In [16]:
graph = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
)



In [17]:
# graph.refresh_schema()

In [18]:
cypher_generation_template = """
Task:
Generate Cypher query for a Neo4j graph database.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.

Schema:
{schema}

Note:
Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything other than
for you to construct a Cypher statement. Do not include any text except
the generated Cypher statement. Make sure the direction of the relationship is
correct in your queries. Make sure you alias both entities and relationships
properly. Do not run any queries that would add to or delete from
the database. Make sure to alias all statements that follow as with
statement (e.g. WITH c as customer, o.orderID as order_id).
If you need to divide numbers, make sure to
filter the denominator to be non-zero.

Examples:
# Retrieve the total number of orders placed by each customer.
MATCH (c:Customer)-[o:ORDERED_BY]->(order:Order)
RETURN c.customerID AS customer_id, COUNT(o) AS total_orders
# List the top 5 products with the highest unit price.
MATCH (p:Product)
RETURN p.productName AS product_name, p.unitPrice AS unit_price
ORDER BY unit_price DESC
LIMIT 5
# Find all employees who have processed orders.
MATCH (e:Employee)-[r:PROCESSED_BY]->(o:Order)
RETURN e.employeeID AS employee_id, COUNT(o) AS orders_processed
String category values:
Use existing strings and values from the schema provided. 

The question is:
{question}
"""

In [19]:
cypher_generation_prompt = PromptTemplate(
    input_variables=["schema", "question"],
    template=cypher_generation_template
)

In [20]:
qa_generation_template_str = """
You are an assistant that takes the results from a Neo4j Cypher query and forms a human-readable response. The query results section contains the results of a Cypher query that was generated based on a user's natural language question. The provided information is authoritative; you must never question it or use your internal knowledge to alter it. Make the answer sound like a response to the question.

Query Results:
{context}
Question:
{question}
If the provided information is empty, respond by stating that you don't know the answer. Empty information is indicated by: []
If the information is not empty, you must provide an answer using the results. If the question involves a time duration, assume the query results are in units of days unless specified otherwise.
When names are provided in the query results, such as hospital names, be cautious of any names containing commas or other punctuation. For example, 'Jones, Brown and Murray' is a single hospital name, not multiple hospitals. Ensure that any list of names is presented clearly to avoid ambiguity and make the full names easily identifiable.
Never state that you lack sufficient information if data is present in the query results. Always utilize the data provided.
Helpful Answer:
"""

In [21]:
qa_generation_prompt = PromptTemplate(
    input_variables=["context", "question"], template=qa_generation_template_str
)

In [22]:
cypher_chain = GraphCypherQAChain.from_llm(
    top_k=10,
    graph=graph,
    verbose=True,
    validate_cypher=True,
    qa_prompt=qa_generation_prompt,
    cypher_prompt=cypher_generation_prompt,
    qa_llm=llm,
    cypher_llm=llm,
    allow_dangerous_requests=True
)
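
Setting `allow_dangerous_requests=True` means the chain will execute whatever Cypher the model generates, so the prompt's "do not add to or delete from the database" instruction is the only safeguard. One extra precaution is a keyword check on the generated statement before it runs. The sketch below is a conservative heuristic, not a real Cypher parser: it can reject harmless queries that mention these words in string literals, and it does not catch writes performed through procedure calls. The name `looks_read_only` is hypothetical. Connecting with a database user that has read-only privileges is a stronger option.

```python
import re

# Clauses that modify data or schema in Cypher.
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP")
_WRITE_PATTERN = re.compile(r"\b(" + "|".join(WRITE_CLAUSES) + r")\b", re.IGNORECASE)


def looks_read_only(cypher):
    """Return True when the statement contains none of the known write clauses."""
    return _WRITE_PATTERN.search(cypher) is None


print(looks_read_only("MATCH (n:Step) RETURN n.name LIMIT 5"))
print(looks_read_only("MATCH (n) DETACH DELETE n"))
```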

In [23]:
from langchain.tools import tool
@tool("graph_tool", return_direct=False)
def graph_tool(query: str) -> str:
    """Use this tool when the user asks a question that requires querying structured information 
    from a Neo4j graph database. Ideal for answering questions related to relationships, 
    hierarchies, dependencies, connections between entities, or insights stored in the graph.
	
    The graph contains entities such as:
    - Steps (which belong to process diagrams)
    - Process Diagrams (which are grouped under Subprocess Areas → Process Areas → Functions)
    - Roles (responsible or accountable for executing specific steps)
    - NFCM Controls (linked to steps via 'INVOLVED_IN' relationships)
    - Inter-diagram references (e.g., 'REFERENCED_EVENT', 'CROSS_REFERENCE', 'REFERENCED_SUBPROCESS')

    Use this tool when:
    - The user asks about who is responsible or accountable for a particular step or process
    - The question involves business process hierarchies, functional areas, or diagram structure
    - The user wants to find controls implemented in a process or step
    - The query relates to semantic or relational search across roles, functions, controls, or process flows
    - You need to traverse relationships or analyze dependencies (e.g., which diagrams a step connects to)

    Avoid using this tool for:
    - General or factual knowledge unrelated to business processes or the graph
    - Questions that don't require traversing or querying the graph database

    Always prefer this tool if the user query involves understanding structured workflows, responsibilities or control frameworks within a business process.
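    """
    # GraphCypherQAChain expects its input under the "query" key.
    response = cypher_chain.invoke({"query": query})

    # Fall back to an empty string so the tool always returns a str.
    return response.get("result", "")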

In [24]:
REACT_PROMPT = """
### AGENTS RESPONSE WORKFLOW:
Answer using all of these tools: {tools}. You must use ALL THE TOOLS SIMULTANEOUSLY.
Follow this format:

Question: {input}

Thought: {agent_scratchpad}
# SINGLE ACTION
Action: [tool name] - should be one of [{tool_names}]
Action Input: [input]
Observation: [result]

# MULTIPLE PARALLEL ACTIONS
Action 1: [tool name] - should be one of [{tool_names}]
Action Input 1: [input]
Action 2: [tool name] - should be one of [{tool_names}]
Action Input 2: [input]
# MANDATORY: Each Action N MUST be followed by Observation N. DO NOT skip any.
Observation 1: [result from Action 1]
Observation 2: [result from Action 2]

... (repeat as needed)

# ---  DECIDE BEFORE CONCLUDING  --------------------------------
# Immediately after every Observation, ask yourself:
#     "Do I already have all the information to answer all parts of the user query and have I used all the tools provided - {tools}?"
# • If No → write another `Thought:` line and continue the loop.
# • If Yes → jump to the Final Thought / Final Answer block below.
# ----------------------------------------------------------------

Final Thought: [summary reasoning after all actions]
Final Answer: [your complete final response to the question]

**CRITICAL RULES**
1. Always follow the format above. Every `Thought` must be followed by one of the following sequences:
   - a single Action + Observation, OR
   - multiple Actions + corresponding Observations
   → Repeat as needed, until all tools are used and query is fully addressed.
2. If a user query involves multiple entities (e.g., multiple companies, years, policies, standards, sub questions, etc.), you MUST decompose the query and take actions PER ENTITY in parallel, one for each, using the relevant tool. Each entity must be treated as a separate Action/Observation pair.
3. Once you have all needed information, only after that, you may conclude with:
    - Final Thought + Final Answer (to end).
4. NEVER leave a `Thought:` line without an Action or a Final Answer.
5. If you use parallel Actions (Action 1, Action 2...), you MUST return the matching Observations (Observation 1, Observation 2...).
6. Maintain correct order when one Action’s result is needed by another.
7. ALWAYS use exact tool names from: `{tool_names}`
8. It is **MANDATORY to use ALL tools in `{tools}`** before reaching Final Thought.

### EXAMPLE 1 — PARALLEL TOOLS
Question: What is the current time in Tokyo, and what is 3^5?

Thought: I need both world_clock and python_repl tools to answer this in parallel.
Action 1: world_clock
Action Input 1: Tokyo
Action 2: python_repl
Action Input 2: 3**5
Observation 1: 2023-10-05 14:30 JST
Observation 2: 243
Final Thought: I now have both the time and the result of 3^5.
Final Answer: The current time in Tokyo is 14:30 JST. 3^5 = 243

# STRICTLY NOTE
# • Do NOT skip the self-check and go straight to Final Thought.
# • You must perform at least one Thought → Action → Observation cycle
#   unless there are zero applicable tools for this question.

# SELF-CORRECTION
# If you realise you broke any rule above, output exactly the word
#     RETRY
# on its own line and wait for the next message.

Begin!"""
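
The prompt above asks the model to emit either a single `Action` / `Action Input` pair or numbered pairs (`Action 1`, `Action Input 1`, ...) for parallel calls. As a rough illustration of how such output could be split into tool calls, here is a minimal sketch. The `parse_actions` function and its regular expression are hypothetical and are not taken from `get_simple_react_agent`, which may parse the output differently. The sketch assumes each `Action Input` sits on the line directly after its `Action` and fits on one line.

```python
import re
from typing import List, Tuple

# Hypothetical parser (an assumption, not the notebook's implementation):
# matches "Action: x" / "Action Input: y" pairs, with or without the
# numeric suffix that the prompt uses for parallel actions.
_ACTION_RE = re.compile(
    r"^Action(?: \d+)?:[ \t]*(.+?)[ \t]*\n"
    r"Action Input(?: \d+)?:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)


def parse_actions(llm_output: str) -> List[Tuple[str, str]]:
    """Return (tool_name, tool_input) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in _ACTION_RE.finditer(llm_output)]


# The parallel example from the prompt.
sample = (
    "Thought: I need both world_clock and python_repl tools.\n"
    "Action 1: world_clock\n"
    "Action Input 1: Tokyo\n"
    "Action 2: python_repl\n"
    "Action Input 2: 3**5\n"
)
print(parse_actions(sample))
```

Multi-line tool inputs are not handled by this sketch; a production parser would need a stricter grammar.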


In [25]:
cypher_react_agent = get_simple_react_agent(llm, [graph_tool], DG_REACT_PROMPT, verbose=True)

In [26]:
question = "List down the process diagram with more than 2 controls involved"
answer = cypher_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to query the graph database to identify all process diagrams that have more than 2 NFCM controls involved.
Action: graph_tool
Action Input: List all process diagrams that have more than 2 NFCM controls involved.

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:ProcessDiagram)<-[:STEP_IN]-(s:Step)<-[:INVOLVED_IN]-(n:NFCMControl)
WITH d AS process_diagram, COUNT(DISTINCT n) AS nfcm_control_count
WHERE nfcm_control_count > 2
RETURN process_diagram.name AS process_diagram_name, nfcm_control_count
ORDER BY nfcm_control_count DESC
Full Context:
[{'process_diagram_name': '1.48.76, FCRS review', 'nfcm_control_count': 9}, {'process_diagram_name': '1.48.6, Posting Accruals / Provisions', 'nfcm_control_count': 5}, {'process_diagram_name': '1.48.79, Maintain and Review Blackline set up', 'nfcm_control_count': 5}, {'process_diagram_name': '1.48.17,Period End C

**Process diagrams with more than 2 controls involved are:** 📊

- **1.48.76, FCRS review** (9 controls)
- **1.48.6, Posting Accruals / Provisions** (5 controls)
- **1.48.79, Maintain and Review Blackline set up** (5 controls)
- **1.48.17, Period End Closing - Pre close** (5 controls)
- **1.48.5, Journal Entries** (4 controls)
- **1.48.68, OBA 3rd party Revenue sales** (4 controls)
- **1.48.9, Entity Legal Provision** (4 controls)
- **1.47.34, Addition of all new leases including embedded lease & sales & leaseback (except global fleet)** (3 controls)
- **1.47.14, Manage Period End Closing** (3 controls)
- **1.47.13, Intangible Asset impairment** (3 controls)

Please refer to the respective process diagrams for further details.

In [27]:
question = "Which role does most of the work in Asset Management?"
answer = cypher_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to identify which role is most frequently responsible for steps in the Asset Management process area. I will query the graph for roles and their frequency of responsibility within Asset Management.
Action: graph_tool
Action Input: Which role is responsible for the most steps in the Asset Management process area?

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (pa:ProcessArea {name: "Asset Management"})<-[:PART_OF]-(spa:SubProcessArea)<-[:PART_OF]-(pd:ProcessDiagram)<-[:STEP_IN]-(s:Step)<-[:RESPONSIBLE_FOR]-(r:Role)
WITH r.name AS role_name, COUNT(s) AS steps_count
ORDER BY steps_count DESC
LIMIT 1
RETURN role_name, steps_count
Full Context:
[{'role_name': 'Asset Specialist', 'steps_count': 124}]

> Finished chain.
The role responsible for the most steps in the Asset Management process area is the Asset Specialist, with 124 s

- **The Asset Specialist is responsible for most of the work in Asset Management, handling 124 steps.** ✅

In [28]:
question = "What are the key controls around balance sheet reconciliation?"
answer = cypher_react_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))



> Entering new AgentExecutor chain...
Thought: I need to identify the key controls implemented around balance sheet reconciliation by querying the process diagrams and NFCM controls related to this process.
Action: graph_tool
Action Input: What are the NFCM controls implemented for balance sheet reconciliation steps and which process diagrams do they belong to?

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (n:NFCMControl)-[inv:INVOLVED_IN]->(s:Step)
WHERE s.name = "balance sheet reconciliation"
MATCH (s)-[si:STEP_IN]->(pd:ProcessDiagram)
RETURN n.name AS nfcm_control_name, n.description AS nfcm_control_description, pd.name AS process_diagram_name, pd.summary AS process_diagram_summary
WITH n AS nfcm_control, pd AS process_diagram


CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Variable `n` not defined (line 5, column 6 (offset: 295))
"WITH n AS nfcm_control, pd AS process_diagram"
      ^}
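
The error above arises because the generated query places a `WITH` clause after `RETURN`. In Cypher, `RETURN` closes the query, so `n` is no longer in scope on the following line. One way to catch this class of failure before the query reaches Neo4j is a lightweight pre-check on the generated text. The sketch below is a heuristic only: `has_clause_after_return` is a hypothetical helper, not part of LangChain or the Neo4j driver, and it assumes each clause starts on its own line, as in the generated output shown here. It does not understand `CALL { ... }` subqueries.

```python
# Clause keywords that should not start a line once RETURN has been seen.
# "OPTIONAL" covers OPTIONAL MATCH.
_READING_CLAUSES = {"WITH", "MATCH", "OPTIONAL", "WHERE", "UNWIND"}


def has_clause_after_return(cypher: str) -> bool:
    """Heuristic check: does a reading clause follow a RETURN line?"""
    seen_return = False
    for raw_line in cypher.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue
        keyword = line.split(None, 1)[0].upper()
        if keyword == "RETURN":
            seen_return = True
        elif keyword == "UNION":
            # UNION starts a new query part, so reset the flag.
            seen_return = False
        elif seen_return and keyword in _READING_CLAUSES:
            return True
    return False


# The failing query from the run above.
bad_query = '''MATCH (n:NFCMControl)-[inv:INVOLVED_IN]->(s:Step)
WHERE s.name = "balance sheet reconciliation"
MATCH (s)-[si:STEP_IN]->(pd:ProcessDiagram)
RETURN n.name AS nfcm_control_name, pd.name AS process_diagram_name
WITH n AS nfcm_control, pd AS process_diagram'''

print(has_clause_after_return(bad_query))  # True
```

A caller could use the result to ask the LLM to regenerate the query rather than letting the cell fail with a `CypherSyntaxError`.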

## Hybrid Agent

In [None]:
hybrid_agent = get_simple_react_agent(llm, [graph_tool, adonis_graph_retriever_tool], DG_REACT_PROMPT, verbose=True)

In [None]:
question = "List down the process diagram with more than 2 controls involved"
answer = hybrid_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))

In [None]:
question = "What are the controls around significant contracts? Who are the control owners?"
answer = hybrid_agent.invoke(
    {
        "input": question,
        "SYSTEM_PROMPT": SYSTEM_PROMPT,
        "GENERAL_INSTRUCTIONS": General_Instructions,
        "SPECIAL_INSTRUCTIONS": DG_Adonis_Special_Instructions,
    }
)
display(Markdown(answer["output"]))