# Professional Services Retreat | GenAI Workshop

# Text2Cypher Graph Agent Demo

This notebook demonstrates how to build and use a Text2Cypher graph agent.

## Imports

In [1]:
import os
import sys
sys.path.append("../../")

from dotenv import load_dotenv
from IPython.display import display, Markdown
from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph

# Load the .env file once, before importing project code that may read it.
env_loaded = load_dotenv()

from src.ps_genai_agents.agents.graph.text2cypher import create_text2cypher_graph_agent

print(".env variables loaded!" if env_loaded else "Unable to load .env variables.")

.env variables loaded!


## LLM Connection

The LLM we define here is responsible for generating Cypher and summarizing the returned query results. In a custom implementation you may assign different tasks to different LLMs. For example, summarizing with a smaller, cheaper model can lower your application costs.

In [2]:
llm = ChatOpenAI(model="gpt-4o")
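
As a minimal sketch of the per-task idea above, the settings for each task could be kept in a small lookup table. The task names, the `ModelChoice` class, and the use of `gpt-4o-mini` for summarization are assumptions made for illustration. Note that `create_text2cypher_graph_agent` takes a single `chat_llm` in this notebook, so using two models would require changes to the agent code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    """Settings for one task (hypothetical helper, not part of the project)."""

    model: str
    temperature: float


# Hypothetical task names and model assignments, for illustration only.
MODEL_BY_TASK = {
    "text2cypher": ModelChoice(model="gpt-4o", temperature=0.0),
    "summarize": ModelChoice(model="gpt-4o-mini", temperature=0.0),
}


def choose_model(task: str) -> ModelChoice:
    """Return the settings for a task, falling back to the Cypher model."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["text2cypher"])


summary_settings = choose_model("summarize")
print(summary_settings.model)
```

The chosen settings could then be passed to a chat model constructor, for example `ChatOpenAI(model=summary_settings.model, temperature=summary_settings.temperature)`.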

## Graph Connection

The `LangChain` [`Neo4jGraph`](https://python.langchain.com/docs/integrations/providers/neo4j/#graphcypherqachain) class connects to our Aura instance. We use it to gather the graph schema automatically and to read from the database.

In [3]:
graph = Neo4jGraph(
    url=os.environ.get("IQS_NEO4J_URI"),
    username=os.environ.get("IQS_NEO4J_USERNAME"),
    password=os.environ.get("IQS_NEO4J_PASSWORD"),
    refresh_schema=True,
)
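
Because `os.environ.get` returns `None` for a missing variable, a typo in the `.env` file only surfaces when the connection fails. The following is a small sketch of a check that could run before the connection cell. The helper name `missing_env_vars` is hypothetical; the variable names are the ones used above.

```python
import os

REQUIRED_VARS = ("IQS_NEO4J_URI", "IQS_NEO4J_USERNAME", "IQS_NEO4J_PASSWORD")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing connection settings: " + ", ".join(missing))
else:
    print("All Neo4j connection settings are present.")
```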

## Example Queries

We store our example queries in a YAML file, which makes them easy to review and modify.

You may view the examples [here](../../data/iqs/queries/queries.yml).

In [4]:
QUERY_LOCATION = "../../data/iqs/queries/queries.yml"
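
To illustrate how question and Cypher pairs like these are typically used, here is a sketch that formats a list of examples into few-shot prompt text. The `question` and `cypher` keys and the `format_examples` helper are assumptions; the real file's structure and the project's prompt format may differ. The two pairs are adapted from queries that appear later in this notebook.

```python
from typing import Dict, List

# Hypothetical in-memory form of the examples (the real YAML keys may differ).
EXAMPLES: List[Dict[str, str]] = [
    {
        "question": "What are the total responses under seat23 for honda civic?",
        "cypher": (
            'MATCH (p:Problem {id: "SEAT23"})<-[:HAS_PROBLEM]-'
            '(v:Verbatim {make: "Honda", model: "Civic"}) '
            "RETURN COUNT(v) AS totalResponses"
        ),
    },
    {
        "question": "What is the problem for seat23?",
        "cypher": 'MATCH (p:Problem {id: "SEAT23"}) RETURN p.problem AS problem',
    },
]


def format_examples(examples: List[Dict[str, str]]) -> str:
    """Join the examples into one text block, separated by blank lines."""
    blocks = [f"Question: {ex['question']}\nCypher: {ex['cypher']}" for ex in examples]
    return "\n\n".join(blocks)


print(format_examples(EXAMPLES))
```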

## Construct Agent

Now we can use the `create_text2cypher_graph_agent` function from `ps_genai_agents` to construct a Text2Cypher agent.

The code to construct the graph agent is in [this](../../src/ps_genai_agents/agents/graph/text2cypher/text2cypher.py) file.

In [5]:
agent = create_text2cypher_graph_agent(chat_llm=llm, neo4j_graph=graph, example_queries_location=QUERY_LOCATION)

### Architecture

Let's take a look at the architecture of our agent.

We see that there are four nodes in our architecture (ignoring `__start__` and `__end__`):
* agent
  * This is the input node. At this step, the user's question is broken into subquestions and the agent decides whether the question is in scope.
* text2cypher
  * This node generates a Cypher query for the given question or subquestion and runs it against the database.
  * Results are gathered in the `state`.
* error
  * This node either explains why the agent can't answer a question or summarizes any results that the agent has gathered so far.
* final_answer
  * This node returns a `Response` object containing the information gathered during the request.



There are two types of edges in our architecture:
* conditional edges
  * These edges exist where a node may have several options for the next action. The LLM decides which path should be taken.
  * Drawn as dotted lines
* regular edges
  * These edges exist where only a single path is defined.
  * Drawn as solid lines


![text2cypher-graph-architecture](./images/text2cypher-graph-agent.png)

Notice that the Text2Cypher node has an edge to itself. When multiple subquestions are found, each one must be processed individually by the Text2Cypher node, which creates a sequence of Text2Cypher steps in the workflow.
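
The self-edge can be illustrated with a plain-Python simulation of the routing. This is a sketch under assumptions, not the project's LangGraph code: the function names are hypothetical, and sending an empty question list to the `error` node is a stand-in for an out-of-scope question.

```python
from collections import deque
from typing import List


def route_after_text2cypher(pending: deque) -> str:
    """Conditional edge: loop back while subquestions remain, otherwise finish."""
    return "text2cypher" if pending else "final_answer"


def run_workflow(sub_questions: List[str]) -> List[str]:
    """Return the sequence of node names visited for the given subquestions."""
    pending = deque(sub_questions)
    visited = ["agent"]
    # Assumption: with nothing to answer, the agent routes to the error node.
    next_node = "text2cypher" if pending else "error"
    while next_node == "text2cypher":
        visited.append("text2cypher")
        pending.popleft()  # stand-in for generating and running one Cypher query
        next_node = route_after_text2cypher(pending)
    visited.append(next_node)
    return visited


print(run_workflow(["q1", "q2", "q3"]))
# ['agent', 'text2cypher', 'text2cypher', 'text2cypher', 'final_answer']
```

Three subquestions produce three consecutive visits to `text2cypher` before the workflow moves on to `final_answer`.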

### Nodes

Each node outputs a dictionary with keys that map to attributes in the `state`. Only attributes that should be modified need to be provided in the `return` statement.

The Text2Cypher node is shown below. 

First, we take the first item in the `agent_outcome` queue and process the question it contains with Text2Cypher. We then return the outputs so they can be mapped onto our `state`.

The call to `execute_text2cypher` handles the actual tool call and result formatting.

The `tool_params` variable is a dictionary containing argument-value pairs to pass to the Text2Cypher tool.

```Python
from typing import Any, Dict

from langchain_core.agents import AgentAction

# `execute_text2cypher` is a project helper that is not shown here.


def text2cypher_node(data: Dict[str, Any]) -> Dict[str, Any]:
    # `agent_outcome` holds the queue of pending actions, one per subquestion.
    agent_action = data.get("agent_outcome")
    intermediate_steps = []

    # Process only the first action in the queue on this pass.
    tool_params = agent_action[0].tool_input

    output = execute_text2cypher(tool_params)
    intermediate_steps.append(output["intermediate_steps"][0])

    # Pass the remaining actions on. Once the queue is empty, hand off to `final_answer`.
    agent_outcome = (
        agent_action[1:]
        if len(agent_action) > 1
        else [
            AgentAction(
                tool="final_answer",
                tool_input="",
                log="No more actions to perform. Moving to summarization step.",
            )
        ]
    )

    # Return only the attributes of the `state` that should change.
    return {
        "agent_outcome": agent_outcome,
        "intermediate_steps": intermediate_steps,
        "cypher": output.get("cypher"),
        "cypher_result": output.get("cypher_result"),
    }
```

### State

The state variable is passed to each node and informs which action should be taken. Different agents will likely have different states. For example, the vector search agent will have an attribute for tracking node source IDs for citations.

We annotate attributes with `operator.add` to inform `LangGraph` that new values should be appended to the list, not replace the previous values. This allows us to track results from multi-hop workflows.

You may view the code for the `state` [here](../../src/ps_genai_agents/agents/graph/text2cypher/state.py).

```python
import operator
from typing import Annotated, Any, List, TypedDict, Union

from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.messages import BaseMessage


class AgentState(TypedDict):
    # The input string
    input: str
    # The sub questions identified in the user input
    sub_questions: Annotated[Union[List[str], None], operator.add]
    # The list of previous messages in the conversation
    chat_history: List[BaseMessage]
    # The outcome of a given call to the agent
    # Needs `None` as a valid type, since this is what this will start as
    agent_outcome: Union[AgentAction, AgentFinish, None]
    # List of actions and corresponding observations
    # Here we annotate this with `operator.add` to indicate that operations to
    # this state should be ADDED to the existing values (not overwrite it)
    intermediate_steps: Annotated[List[tuple[AgentAction, str]], operator.add]
    # The Cypher queries that were run
    cypher: Annotated[Union[List[str], None], operator.add]
    # The results of the Cypher queries
    cypher_result: Annotated[Union[List[Any], None], operator.add]
```
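
To make the append-versus-replace behavior concrete, here is a simplified imitation of how a partial update from a node could be merged into the state. It is a sketch for illustration only and is not LangGraph's implementation; `DemoState` and `apply_update` are hypothetical names.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class DemoState(TypedDict):
    input: str
    agent_outcome: Optional[str]
    cypher: Annotated[List[str], operator.add]


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial output into the state, honoring annotated reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types expose their extra arguments as `__metadata__`.
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and merged.get(key) is not None:
            merged[key] = reducer(merged[key], value)  # e.g. list + list
        else:
            merged[key] = value  # plain attributes are overwritten
    return merged


state = {"input": "q", "agent_outcome": None, "cypher": ["MATCH (a) RETURN a"]}
state = apply_update(
    state, {"cypher": ["MATCH (b) RETURN b"], "agent_outcome": "final_answer"}
)
print(state["cypher"])
# ['MATCH (a) RETURN a', 'MATCH (b) RETURN b']
```

The annotated `cypher` list grows with each update, while `agent_outcome` is replaced and `input`, which the update does not mention, is left unchanged.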

## Questions

In [6]:
# function to print markdown
def print_markdown(text):
    display(Markdown(text))

In [10]:
%%capture
# Suppress this cell's output. Remove the `%%capture` line to see each step of the agent.
res = agent.invoke(
    {
        "input": "What are the total responses under seat23 for honda civic, what is the male to female proportion for these responses and what is the problem for seat23?",
        "chat_history": [],
    }
)

In [12]:
res["agent_outcome"].display()


Question:
What are the total responses under seat23 for honda civic, what is the male to female proportion for these responses and what is the problem for seat23?

Sub Questions:
What are the total responses under seat23 for honda civic?
What is the male to female proportion for responses under seat23 for honda civic?
What is the problem for seat23?

            
Cypher:
MATCH (p:Problem {id: "SEAT23"})<-[:HAS_PROBLEM]-(v:Verbatim {make: "Honda", model: "Civic"})
WITH p, COUNT(v) AS totalResponses
RETURN totalResponses

MATCH (p:Problem {id: "SEAT23"})<-[:HAS_PROBLEM]-(v:Verbatim {make: "Honda", model: "Civic"})
WITH COUNT(v) AS totalResponses, 
     SUM(CASE WHEN v.gender = "Male" THEN 1 ELSE 0 END) AS males,
     SUM(CASE WHEN v.gender = "Female" THEN 1 ELSE 0 END) AS females
RETURN totalResponses, males, females, toFloat(males) / (CASE WHEN females = 0 THEN 1 ELSE females END) AS maleToFemaleRatio

MATCH (p:Problem {id: "SEAT23"})
RETURN p.problem AS problem

Cypher Result:

In [13]:
print_markdown(res.get("agent_outcome").answer)

- Total responses under SEAT23: 10
- Male to female proportion: 2.33 (7 males, 3 females)
- Problem: Seat materials scuff/soil easily
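
The 2.33 figure reported above is a male-to-female ratio rather than a share of all responses. The sketch below reproduces the arithmetic from the second Cypher query, including its guard against dividing by zero, and shows the corresponding share of male respondents for comparison. The function name is hypothetical.

```python
def male_to_female_ratio(males: int, females: int) -> float:
    """Mirror the query: divide by 1 instead of 0 when there are no females."""
    denominator = females if females != 0 else 1
    return males / denominator


males, females = 7, 3
print(round(male_to_female_ratio(males, females), 2))  # 2.33
print(round(males / (males + females), 2))  # 0.7, the share of male respondents
```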

In [14]:
%%capture
res = agent.invoke({"input": "What are the top 5 problems about seats for each age buckets for men over the age of 53?", "chat_history": list()})

In [15]:
res.get("agent_outcome").display()


Question:
What are the top 5 problems about seats for each age buckets for men over the age of 53?

Sub Questions:
What are the top 5 problems about seats for each age bucket for men over the age of 53?

            
Cypher:
MATCH (v:Verbatim)-[:HAS_PROBLEM]->(p:Problem)
WHERE toLower(v.verbatimText) CONTAINS 'seat' AND v.gender = 'Male' AND v.minAge > 53 AND v.ageBucket IS NOT NULL
WITH v.ageBucket AS ageBucket, p.problem AS problem, collect(v.verbatim) AS responses
WITH ageBucket, problem, size(responses) AS total, responses
WITH * ORDER BY ageBucket, total DESC
WITH ageBucket, COLLECT(problem) AS problems, COLLECT(total) AS totals, COLLECT(responses) AS responsesList
RETURN ageBucket, problems[..5] AS problem, totals[..5] AS total, responsesList[..5] AS responses
LIMIT 5




Cypher Result:
            
Final Response:
- **Age 55-59:**
  - Insufficient range of seat adjustment
  - Not enough power plugs/USB ports
  - Seats excessively uncomfortable
  - Seats squeak/rattle/loo

In [16]:
print_markdown(res.get("agent_outcome").answer)

- **Age 55-59:**
  - Insufficient range of seat adjustment
  - Not enough power plugs/USB ports
  - Seats excessively uncomfortable
  - Seats squeak/rattle/loose/abnormal noises
  - Other seat problems

- **Age 60-64:**
  - Insufficient range of seat adjustment
  - Heated seats do not heat fast enough/maintain temperature
  - Cooled/ventilated seats do not cool fast enough/maintain temperature/excessively noisy
  - Seat material imperfection
  - Seat materials scuff/soil easily

- **Age 65-69:**
  - Insufficient range of seat adjustment
  - Seat belt uncomfortable
  - Not enough power plugs/USB ports
  - Seat materials scuff/soil easily
  - Seat material imperfection

- **Age >=70:**
  - Insufficient range of seat adjustment
  - Seats excessively uncomfortable
  - Other seat problems
  - Seat belt uncomfortable
  - Seat adjustment controls difficult to use

In [17]:
%%capture
res = agent.invoke({"input": "Please summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. As an output, I want the summary, corresponding categories and their verbatims", "chat_history": list()})

In [18]:
res.get("agent_outcome").display()


Question:
Please summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. As an output, I want the summary, corresponding categories and their verbatims

Sub Questions:
Summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. Provide the summary, corresponding categories, and their verbatims.

            
Cypher:
MATCH (q:Question{id: 10})<-[:HAS_QUESTION]-(v:Verbatim)
WHERE v.model = 'RDX' AND v.verbatimText IS NOT NULL
WITH v, COLLECT(v.verbatim) AS verbatims
MATCH (v)-[:HAS_CATEGORY]->(c:Category)
RETURN c.id AS category, verbatims




Cypher Result:
[[{'category': 'Exterior', 'verbatims': ['when I move my foot under sensor sometimes the tailgate opens and sometimes it does not. It seems to be hit or miss.']}, {'category': 'Exterior', 'verbatims': ['foot release does not work consistently']}, {'category': 'Exterior', 'verbatims': ['At times

In [19]:
print_markdown(res.get("agent_outcome").answer)

**Summary:**

Many users report that the touch-free tailgate sensor on the 2023 RDX does not work consistently. They often have difficulty locating the sensor, and even when they do, it frequently requires multiple attempts to activate. Some suggest that a larger detection area or a visual marker could improve usability.

**Categories and Verbatims:**

- **Inconsistent Operation:**
  - "when I move my foot under sensor sometimes the tailgate opens and sometimes it does not. It seems to be hit or miss."
  - "foot release does not work consistently"
  - "this feature doesnt work consistently"
  - "Haven’t been able to get this feature to work consistently."
  - "sensor to open tailgate with foot does not work a good amount of the time"
  - "Using my leg to open the back tailgate had not been reliable."
  - "Sometimes it just does not work."
  - "My foot underneath doesn’t always open the trunk. I sometimes have to hit the button in order for it to open."
  - "Putting your foot under the back of the trunk only opens the lift gate 50% of the time"
  - "Wave foot under sensor multiple times and unable to open."
  - "When I kick under the tail gate there ocassions when it doesn’t open"
  - "Works some times"
  - "Touch free sensor doesn’t seem to be responsive. I’ve had to wave my foot around it three or four times and after a four second delay, it triggers the mechanism to open the hatch"
  - "When placing my foot under the sensor, it does not open/close. If it read a larger area it would be easier to use! Otherwise I use the button"
  - "doesn't work"

- **Difficulty Locating Sensor:**
  - "My wife and I have not found a consistent place to activate the sensor. We move our feet all along the rear of the car and sometimes it opens others it doesn’t."
  - "Hatch touch-free sensor doesn't work consistently. Perhaps a marking in sight would help indicate where the sensor is located."
  - "Hard to locate it and it takes several times to get it to work"
  - "The location of the latch. You have to be precise in touching the latch or it won't open."
  - "Getting used to using this feature. I may not have my foot in the correct location to activate the hatch."
  - "I can’t figure out this feature."
  - "Trunck/hatch/tailgate touch free sensor doesn't work consistently. I have to continually place my foot under the tailgate to activate. Sometimes it doesn't activate so I use the tailgate button to open."

In [20]:
%%capture
res = agent.invoke({"input": "What are the top 5 problems about seats for each age buckets for men over the age of 53?", "chat_history": []})

In [21]:
res.get("agent_outcome").display()


Question:
What are the top 5 problems about seats for each age buckets for men over the age of 53?

Sub Questions:
What are the top 5 problems about seats for each age buckets for men over the age of 53?

            
Cypher:

MATCH (v:Verbatim)-[:HAS_PROBLEM]->(p:Problem)
WHERE toLower(v.verbatimText) CONTAINS 'seat' AND v.gender = "Male" AND v.minAge > 53
WITH v.ageBucket AS ageBucket, p.problem AS problem, COLLECT(v.verbatim) AS responses
WITH ageBucket, problem, SIZE(responses) AS total, responses
WITH * ORDER BY ageBucket, total DESC
WITH ageBucket, COLLECT(problem) AS problems, COLLECT(total) AS totals, COLLECT(responses) AS responsesList
RETURN ageBucket, problems[..5] AS problem, totals[..5] AS total, responsesList[..5] AS reponses
LIMIT 5




Cypher Result:
            
Final Response:
- **55-59 Age Bucket:**
  - Seats have insufficient range of adjustment
  - Not enough power plugs/USB ports
  - Seat excessively uncomfortable
  - Seat squeaks/rattles/loose/abnormal noises
  

In [22]:
print_markdown(res.get("agent_outcome").answer)

- **55-59 Age Bucket:**
  - Seats have insufficient range of adjustment
  - Not enough power plugs/USB ports
  - Seat excessively uncomfortable
  - Seat squeaks/rattles/loose/abnormal noises
  - Other seat problems

- **60-64 Age Bucket:**
  - Seats have insufficient range of adjustment
  - Heated seats do not heat fast enough/maintain temperature
  - Cooled/ventilated seats do not cool fast enough/maintain temperature/excessively noisy
  - Seat material imperfection
  - Seat materials scuff/soil easily

- **65-69 Age Bucket:**
  - Seats have insufficient range of adjustment
  - Seat belt uncomfortable
  - Not enough power plugs/USB ports
  - Seat materials scuff/soil easily
  - Seat material imperfection

- **70 and Above Age Bucket:**
  - Seats have insufficient range of adjustment
  - Seat excessively uncomfortable
  - Other seat problems
  - Seat belt uncomfortable
  - Seat adjustment controls difficult to use