# Professional Services Retreat | GenAI Workshop

# LLM Chat Notebook

This notebook walks through building a simple LLM chat function.

## Imports

In [1]:
import os
import sys

sys.path.append("../../")

from dotenv import load_dotenv
from IPython.display import display, Markdown
from langchain_openai import ChatOpenAI
from langchain.chains.graph_qa.cypher import GraphCypherQAChain
from langchain.graphs import Neo4jGraph

from src.ps_genai_agents.prompts import create_graphqa_chain_cypher_prompt, create_final_summary_prompt
from src.ps_genai_agents.agents.graph.text2cypher.types.response import Response as Text2CypherResponse


if load_dotenv():
    print(".env variables loaded!")
else:
    print("Unable to load .env variables.")

.env variables loaded!


## Graph Connection

We use LangChain's [`Neo4jGraph`](https://python.langchain.com/docs/integrations/providers/neo4j/#graphcypherqachain) class to connect to our Aura instance. It automatically gathers the graph schema and reads from the database.

In [2]:
graph = Neo4jGraph(
    url=os.environ.get("IQS_NEO4J_URI"),
    username=os.environ.get("IQS_NEO4J_USERNAME"),
    password=os.environ.get("IQS_NEO4J_PASSWORD"),
    refresh_schema=True,
)
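
If any of the three connection variables is unset, `os.environ.get` returns `None`, and the failure may only surface later when the connection is attempted. Below is a minimal sketch of an up-front check. `require_env` is a hypothetical helper written for illustration; it is not part of `ps-genai-agents` or LangChain.

```python
import os


def require_env(names: list[str]) -> dict[str, str]:
    """Return the named environment variables, raising if any are unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env(["IQS_NEO4J_URI", "IQS_NEO4J_USERNAME", "IQS_NEO4J_PASSWORD"])` before constructing `Neo4jGraph` would raise a readable error that names the missing variables.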

## Prompt Creation

The `ps-genai-agents` project contains functions that make it easy to create Text2Cypher prompts. Since we'll be using LangChain's implementation of Text2Cypher, we only need to provide a file path to our query examples YAML file. The `ps-genai-agents` and `LangChain` libraries will handle formatting our examples and injecting them into the prompt.

We can view the code for these prompt creation functions [here](../../src/ps_genai_agents/prompts/cypher_prompts.py).

Note that we have two Cypher generation prompts: one for `neo4j_graphrag` and one for `LangChain`.

In [3]:
cypher_prompt = create_graphqa_chain_cypher_prompt(examples_yaml_path="../../data/iqs/queries/queries.yml")
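
To make the example-injection step more concrete, here is a rough sketch of how question/Cypher pairs might be rendered into a few-shot block. The field names (`question`, `cypher`), the example queries, and the output layout are all assumptions for illustration. The actual format is defined by `create_graphqa_chain_cypher_prompt` in `cypher_prompts.py`.

```python
def format_examples(examples: list[dict[str, str]]) -> str:
    """Render question/Cypher pairs as a plain-text few-shot block."""
    blocks = []
    for example in examples:
        blocks.append(
            f"Question: {example['question']}\nCypher: {example['cypher']}"
        )
    return "\n\n".join(blocks)


# Hypothetical examples, as they might look after loading a YAML file.
examples = [
    {
        "question": "How many vehicles are there?",
        "cypher": "MATCH (v:Vehicle) RETURN count(v) AS total",
    },
    {
        "question": "List all category ids.",
        "cypher": "MATCH (c:Category) RETURN c.id AS category",
    },
]

few_shot_block = format_examples(examples)
print(few_shot_block)
```

A block like this would be placed in the prompt ahead of the user's question, so the LLM can see how questions map onto the graph's labels and properties.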

## LLM Connection

We will use OpenAI LLMs for this workshop. You can try Text2Cypher with any LLM, but more recent LLMs will likely perform much better. Feel free to test older models such as `gpt-3.5` and compare results.

You may find a list of OpenAI models [here](https://platform.openai.com/docs/models).

In [4]:
llm = ChatOpenAI(model="gpt-4o")

## Text2Cypher

We will use LangChain's [`GraphCypherQAChain`](https://python.langchain.com/v0.1/docs/integrations/graphs/neo4j_cypher/) to handle our Text2Cypher workflow. This chain class automatically retrieves the current graph schema. It can also check relationship directions in the generated Cypher against the schema when `validate_cypher=True` is passed to `from_llm`; this notebook keeps the default.

A [chain](https://python.langchain.com/v0.1/docs/modules/chains/) refers to a sequence of calls. In this case, the calls are graph schema retrieval, Cypher generation, and querying Neo4j.
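
The idea of a chain as a sequence of calls can be sketched in plain Python. This is a conceptual illustration only, not how LangChain implements chains internally. Every function name and return value below is a placeholder.

```python
from typing import Callable


def run_chain(question: str, steps: list[Callable[[dict], dict]]) -> dict:
    """Pass a shared state dict through each step in order."""
    state = {"question": question}
    for step in steps:
        state = step(state)
    return state


# Stand-in steps: each one would call Neo4j or an LLM in a real chain.
def retrieve_schema(state: dict) -> dict:
    return {**state, "schema": "(:Verbatim)-[:HAS_PROBLEM]->(:Problem)"}


def generate_cypher(state: dict) -> dict:
    # A real implementation would prompt an LLM with the schema and question.
    return {**state, "cypher": "MATCH (p:Problem) RETURN count(p) AS total"}


def query_graph(state: dict) -> dict:
    # A real implementation would execute state["cypher"] against Neo4j.
    return {**state, "result": [{"total": 0}]}  # placeholder result


final_state = run_chain(
    "How many problems are there?",
    [retrieve_schema, generate_cypher, query_graph],
)
print(sorted(final_state))
```

Each step reads what earlier steps produced and adds its own output, which is why the order of the calls matters.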

In [5]:
chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_prompt,
    verbose=True,
    return_direct=True,
    return_intermediate_steps=True,
)

We can see how the schema is formatted with the `GraphCypherQAChain` property `graph_schema`.

In [6]:
print(chain.graph_schema)

Node properties are the following:
Customer {id: STRING, ageBucket: STRING, gender: STRING},Category {id: STRING},Problem {id: STRING, problem: STRING},Question {id: INTEGER, question: STRING},Vehicle {id: STRING, totalProblems: INTEGER},Verbatim {id: STRING, verbatim: STRING, verbatimText: STRING, severity: FLOAT, gender: STRING, make: STRING, model: STRING, adaEmbedding: LIST, titanEmbedding: LIST, ageBucket: STRING, minAge: INTEGER, maxAge: INTEGER}
Relationship properties are the following:

The relationships are the following:
(:Customer)-[:SUBMITTED]->(:Verbatim),(:Problem)-[:HAS_CATEGORY]->(:Category),(:Question)-[:HAS_PROBLEM]->(:Problem),(:Vehicle)-[:HAS_CATEGORY]->(:Category),(:Vehicle)-[:HAS_VERBATIM]->(:Verbatim),(:Verbatim)-[:HAS_CATEGORY]->(:Category),(:Verbatim)-[:HAS_PROBLEM]->(:Problem),(:Verbatim)-[:HAS_QUESTION]->(:Question)
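
The relationship line of this schema follows a regular `(:Source)-[:TYPE]->(:Target)` pattern, so it can be split into triples for quick inspection. The following is a minimal sketch; `parse_relationships` is a hypothetical helper, and the sample string is a subset of the output above.

```python
import re

RELATIONSHIP_PATTERN = re.compile(r"\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)")


def parse_relationships(schema_text: str) -> list[tuple[str, str, str]]:
    """Extract (source label, relationship type, target label) triples."""
    return RELATIONSHIP_PATTERN.findall(schema_text)


relationships_line = (
    "(:Customer)-[:SUBMITTED]->(:Verbatim),"
    "(:Problem)-[:HAS_CATEGORY]->(:Category),"
    "(:Verbatim)-[:HAS_PROBLEM]->(:Problem)"
)

for source, rel_type, target in parse_relationships(relationships_line):
    print(f"{source} -{rel_type}-> {target}")
```

Reading the schema this way helps when checking whether a generated query traverses a relationship in the right direction.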


## Chat Function

Here we define a simple chat function to make our lives easier.

This function will:
* Generate Cypher from the user question
* Query the Neo4j database
* Summarize the query results
* Return a Response object containing call information and results

In [7]:
def chat(question: str) -> Text2CypherResponse:
    # Generate Cypher and query Neo4j (`invoke` replaces the deprecated `chain(question)` call)
    r = chain.invoke({"query": question})
    print(r)

    # With return_direct=True, "result" holds the raw query results
    cypher_result = r.get("result")
    generated_cypher = r.get("intermediate_steps")[0].get("query")

    # Summarize the results with the LLM
    summary_prompt = create_final_summary_prompt(
        tool_execution_result=cypher_result, question=question
    )
    summary = llm.invoke(summary_prompt)

    return Text2CypherResponse(
        question=question,
        answer=summary.content,
        cypher=[generated_cypher],
        cypher_result=[cypher_result],
    )
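
To show the kind of object `chat` returns, here is a rough stand-in built with a dataclass. `SimpleResponse` and its `render` method are hypothetical; the real `Response` class lives in `ps_genai_agents` and may differ. Only the four field names are taken from the call above, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleResponse:
    """Hypothetical stand-in for the project's Text2Cypher response type."""

    question: str
    answer: str
    cypher: list[str] = field(default_factory=list)
    cypher_result: list[Any] = field(default_factory=list)

    def render(self) -> str:
        """Build a plain-text report of the call and its results."""
        cypher_text = "\n".join(self.cypher)
        return (
            f"Question:\n{self.question}\n\n"
            f"Cypher:\n{cypher_text}\n\n"
            f"Cypher Result:\n{self.cypher_result}\n\n"
            f"Final Response:\n{self.answer}"
        )


example = SimpleResponse(
    question="How many problems are there?",
    answer="(summary text from the LLM)",
    cypher=["MATCH (p:Problem) RETURN count(p) AS total"],
    cypher_result=[[{"total": 0}]],  # placeholder result
)
print(example.render())
```

Keeping the question, the generated Cypher, the raw result, and the summary together makes it easier to trace a poor answer back to the step that caused it.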

In [8]:
# function to print markdown
def print_markdown(text):
    display(Markdown(text))

## Questions

### Please summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. As an output, I want the summary, corresponding categories and their verbatims

In [9]:
response = chat("Please summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. As an output, I want the summary, corresponding categories and their verbatims")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (q:Question {id: 10})<-[:HAS_QUESTION]-(v:Verbatim {model: "RDX"})
WHERE v.verbatimText IS NOT NULL
WITH v, collect(v.verbatim) AS verbatims
MATCH (v)-[:HAS_CATEGORY]->(c:Category)
WITH c.id AS category, verbatims
RETURN category, verbatims, SIZE(verbatims) AS totalVerbatims
ORDER BY totalVerbatims DESC
LIMIT 5

> Finished chain.
{'query': 'Please summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. As an output, I want the summary, corresponding categories and their verbatims', 'result': [{'category': 'Exterior', 'verbatims': ['foot release does not work consistently'], 'totalVerbatims': 1}, {'category': 'Exterior', 'verbatims': ['At times, the leg action does not close the tailgate.  Other times, when i am loading/unloading the trunk and have my feet under the vehicle, the tailgate closes.'], 'totalVerbatims': 1}, {'category': 'Exterior', 'verbatims': ['My wife a

In [10]:
response.display()


Question:
Please summarize the verbatims for 2023 RDX for question 010 Trunk/TG Touch-Free Sensor DTU and create categories for the problems. As an output, I want the summary, corresponding categories and their verbatims

Cypher:

MATCH (q:Question {id: 10})<-[:HAS_QUESTION]-(v:Verbatim {model: "RDX"})
WHERE v.verbatimText IS NOT NULL
WITH v, collect(v.verbatim) AS verbatims
MATCH (v)-[:HAS_CATEGORY]->(c:Category)
WITH c.id AS category, verbatims
RETURN category, verbatims, SIZE(verbatims) AS totalVerbatims
ORDER BY totalVerbatims DESC
LIMIT 5




Cypher Result:
[[{'category': 'Exterior', 'verbatims': ['foot release does not work consistently'], 'totalVerbatims': 1}, {'category': 'Exterior', 'verbatims': ['At times, the leg action does not close the tailgate.  Other times, when i am loading/unloading the trunk and have my feet under the vehicle, the tailgate closes.'], 'totalVerbatims': 1}, {'category': 'Exterior', 'verbatims': ['My wife and I have not found a consistent place to acti

In [11]:
print_markdown(response.answer)

**Summary:**

The trunk/tailgate touch-free sensor for the 2023 RDX has been reported to have inconsistent performance. Users have experienced difficulty in activating the sensor reliably, leading to frustration. 

**Categories and Verbatims:**

- **Inconsistent Sensor Activation:**
  - "foot release does not work consistently"
  - "this feature doesnt work consistently"
  - "when I move my foot under sensor sometimes the tailgate opens and sometimes it does not. It seems to be hit or miss."

- **Unintended Tailgate Closure:**
  - "At times, the leg action does not close the tailgate.  Other times, when I am loading/unloading the trunk and have my feet under the vehicle, the tailgate closes."

- **Difficulty in Finding Sensor Activation Point:**
  - "My wife and I have not found a consistent place to activate the sensor. We move our feet all along the rear of the car and sometimes it opens others it doesn’t."

### What are the top 5 problems about seats for each age buckets for men over the age of 53?

In [12]:
response = chat("What are the top 5 problems about seats for each age buckets for men over the age of 53?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (v:Verbatim)-[:HAS_PROBLEM]->(p:Problem)
WHERE v.verbatimText CONTAINS 'seat' 
  AND v.gender = "Male" 
  AND v.minAge > 53 
  AND v.ageBucket IS NOT NULL
WITH v.ageBucket AS ageBucket, p.problem AS problem, COLLECT(v.verbatim) AS responses
WITH ageBucket, problem, SIZE(responses) AS total, responses
WITH * ORDER BY ageBucket, total DESC
WITH ageBucket, COLLECT(problem) AS problems, COLLECT(total) AS totals, COLLECT(responses) AS responsesList
RETURN ageBucket, problems[..5] AS problem, totals[..5] AS total, responsesList[..5] AS responses
LIMIT 5

> Finished chain.


In [13]:
response.display()


Question:
What are the top 5 problems about seats for each age buckets for men over the age of 53?

Cypher:

MATCH (v:Verbatim)-[:HAS_PROBLEM]->(p:Problem)
WHERE v.verbatimText CONTAINS 'seat' 
  AND v.gender = "Male" 
  AND v.minAge > 53 
  AND v.ageBucket IS NOT NULL
WITH v.ageBucket AS ageBucket, p.problem AS problem, COLLECT(v.verbatim) AS responses
WITH ageBucket, problem, SIZE(responses) AS total, responses
WITH * ORDER BY ageBucket, total DESC
WITH ageBucket, COLLECT(problem) AS problems, COLLECT(total) AS totals, COLLECT(responses) AS responsesList
RETURN ageBucket, problems[..5] AS problem, totals[..5] AS total, responsesList[..5] AS responses
LIMIT 5




Cypher Result:
            
Final Response:
### Age Bucket: 55-59
1. **Seats have insufficient range of adjustment**: 4
2. **Seat - Squeaks/rattles/loose/abnormal noises**: 4
3. **Seat - Excessively uncomfortable**: 4
4. **Other Seat Problem(s)**: 3
5. **Not enough power plugs/USB ports**: 4

### Age Bucket: 60-64
1. **Seats

In [14]:
print_markdown(response.answer)

### Age Bucket: 55-59
1. **Seats have insufficient range of adjustment**: 4
2. **Seat - Squeaks/rattles/loose/abnormal noises**: 4
3. **Seat - Excessively uncomfortable**: 4
4. **Other Seat Problem(s)**: 3
5. **Not enough power plugs/USB ports**: 4

### Age Bucket: 60-64
1. **Seats have insufficient range of adjustment**: 8
2. **Memory seats - Controls DTU/poorly located**: 2
3. **Other Seat Problem(s)**: 2
4. **Heated seats - Do not heat fast enough/maintain temperature**: 2
5. **Cooled/ventilated seats - Do not cool fast enough/maintain temperature/excessively noisy**: 2

### Age Bucket: 65-69
1. **Seats have insufficient range of adjustment**: 8
2. **Seat material imperfection**: 3
3. **Seat materials scuff/soil easily**: 3
4. **Seat belt uncomfortable**: 3
5. **Not enough power plugs/USB ports**: 3

### Age Bucket: >=70
1. **Seats have insufficient range of adjustment**: 29
2. **Seat - Excessively uncomfortable**: 9
3. **Other Seat Problem(s)**: 8
4. **Seat belt uncomfortable**: 6
5. **Seat adjustment - Controls DTU**: 6

### What are the total responses under seat23 for honda civic, what is the male to female proportion for these responses and what is the problem for seat23?

In [15]:
response = chat("What are the total responses under seat23 for honda civic, what is the male to female proportion for these responses and what is the problem for seat23?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Problem {id: "SEAT23"})<-[:HAS_PROBLEM]-(v:Verbatim {make: "Honda", model: "Civic"})
WITH p.problem AS problem, COUNT(v) AS totalResponses, 
SUM(CASE WHEN v.gender = "Male" THEN 1 ELSE 0 END) AS males,
SUM(CASE WHEN v.gender = "Female" THEN 1 ELSE 0 END) AS females
RETURN totalResponses, males, females, toFloat(males) /  (CASE WHEN females = 0 THEN 1 ELSE females END) AS maleToFemaleRatio, problem

> Finished chain.
{'query': 'What are the total responses under seat23 for honda civic, what is the male to female proportion for these responses and what is the problem for seat23?', 'result': [{'totalResponses': 10, 'males': 7, 'females': 3, 'maleToFemaleRatio': 2.3333333333333335, 'problem': 'SEAT23: Seat materials scuff/soil easily'}], 'intermediate_steps': [{'query': '\nMATCH (p:Problem {id: "SEAT23"})<-[:HAS_PROBLEM]-(v:Verbatim {make: "Honda", model: "Civic"})\nWITH p.problem AS 

In [16]:
response.display()


Question:
What are the total responses under seat23 for honda civic, what is the male to female proportion for these responses and what is the problem for seat23?

Cypher:

MATCH (p:Problem {id: "SEAT23"})<-[:HAS_PROBLEM]-(v:Verbatim {make: "Honda", model: "Civic"})
WITH p.problem AS problem, COUNT(v) AS totalResponses, 
SUM(CASE WHEN v.gender = "Male" THEN 1 ELSE 0 END) AS males,
SUM(CASE WHEN v.gender = "Female" THEN 1 ELSE 0 END) AS females
RETURN totalResponses, males, females, toFloat(males) /  (CASE WHEN females = 0 THEN 1 ELSE females END) AS maleToFemaleRatio, problem




Cypher Result:
[[{'totalResponses': 10, 'males': 7, 'females': 3, 'maleToFemaleRatio': 2.3333333333333335, 'problem': 'SEAT23: Seat materials scuff/soil easily'}]]
            
Final Response:
Total responses under SEAT23 for Honda Civic: 10  
Male to female proportion for these responses: 2.33  
Problem for SEAT23: Seat materials scuff/soil easily
        


In [17]:
print_markdown(response.answer)

Total responses under SEAT23 for Honda Civic: 10  
Male to female proportion for these responses: 2.33  
Problem for SEAT23: Seat materials scuff/soil easily
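
The question asks for a male-to-female proportion, and the generated Cypher answers with a ratio (males divided by females, with a guard that divides by 1 when there are no female responses). As a small worked example, the counts from the result above can be expressed either as a ratio or as percentage shares:

```python
males, females = 7, 3
total = males + females

ratio = males / females        # males per female, as computed in the Cypher
male_share = males / total     # fraction of all responses from men
female_share = females / total

print(f"ratio: {ratio:.2f}")                # ratio: 2.33
print(f"male share: {male_share:.0%}")      # male share: 70%
print(f"female share: {female_share:.0%}")  # female share: 30%
```

If percentage shares are the desired output, stating that explicitly in the question may steer the LLM toward generating them.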

## Your Questions

In [None]:
response = chat(...)

In [None]:
response.display()

In [None]:
print_markdown(response.answer)