# GraphRAG and Agents
## What This Notebook Does

This notebook is a **“Hello World” playground** for our Neo4j + Bedrock + LangChain setup.  
It shows:

1. How to connect Neo4j to Claude (the LLM) and Titan (the embedder) through LangChain.
2. How to use three retrieval patterns:
   - Pure vector search
   - Vector + graph context
   - Text-to-Cypher (GraphCypherQAChain)
3. How to build three simple agents that call these tools.

---

## 1. Setup and Connections

**What happens:**

- Environment variables are loaded from `.env` (Neo4j URI/user/password, AWS region).
- A low-level Neo4j `driver` is created and `verify_connectivity()` is called to confirm DB access.
- A `Neo4jGraph` wrapper is created for LangChain.  
- Two Bedrock models are initialized:
  - `ChatBedrock` → **Claude 3.5 Sonnet** as the LLM (`llm`).
  - `BedrockEmbeddings` → **Titan v2** as the embedder (`embedding_model`).

**Why:**

- `Neo4jGraph` is used by LangChain tools and chains.
- Claude answers questions and acts as the agent brain.
- Titan embeds each query so it can be matched against the `OpinionChunk` embeddings already stored in Neo4j.
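
Since every later cell depends on these settings, it can help to check them before creating the driver. The following is a minimal sketch, assuming the variable names used in this notebook's `.env`; the helper `missing_settings` is hypothetical and not part of any library.

```python
import os

# Names read from .env in this notebook; adjust if yours differ.
REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


# Usage: call this after load_dotenv() has populated os.environ.
missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Failing early here gives a clearer message than the connection error raised later by the driver.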

---

## 2. Vector Store from Existing Index

**What happens:**

- `Neo4jVector.from_existing_index(...)` connects to the Neo4j vector index:

  - Label: `OpinionChunk`
  - Text property: `text`
  - Embedding property: `embedding`
  - Index name: `chunkEmbeddings`

Two vector stores are created:

1. `opinion_vector`  
   - Only does **vanilla semantic search** over `OpinionChunk.text`.
   - No extra Cypher context.

2. `opinion_vector_context`  
   - Uses the same index, but with a custom `retrieval_query`:

     ```cypher
     RETURN
       node.text AS text,
       score,
       {
         node_element_id: elementId(node),
         case_id: node.case_id,
         opinion_type: node.opinion_type,
         opinion_author: node.opinion_author
       } AS metadata
     ORDER BY score DESC
     ```

   - This returns:
     - The chunk text
     - The similarity score
     - Metadata about the opinion (case id, type, author, element id)

**Why:**

- `opinion_vector` = simple **Vector Retriever** (chunk-level semantic search).
- `opinion_vector_context` = **Vector + Graph Retriever** (semantic search plus metadata read from each matched node; the query shown does not yet traverse relationships).
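
The custom `retrieval_query` has to return three columns: `text`, `score`, and `metadata`. The sketch below illustrates roughly how rows of that shape end up as documents with `page_content` and `metadata`. `Doc` and `rows_to_docs` are hypothetical stand-ins, the sample row is made up, and dropping `None` values is an assumption of this sketch rather than a description of the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_docs(rows):
    """Convert rows shaped like the retrieval_query output into (Doc, score) pairs."""
    pairs = []
    for row in rows:
        # Skip metadata entries that are None, e.g. a property missing on the node.
        metadata = {k: v for k, v in row["metadata"].items() if v is not None}
        pairs.append((Doc(page_content=row["text"], metadata=metadata), row["score"]))
    return pairs


# Made-up row for illustration only.
rows = [
    {"text": "chunk A", "score": 0.91, "metadata": {"case_id": 1, "opinion_author": None}},
]
pairs = rows_to_docs(rows)
```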

---

## 3. Three Retriever Examples

### 3.1 Vector Retriever (semantic search)

**Code idea:**

- Run `opinion_vector.similarity_search(query, k=5)`.
- Print snippets and metadata.

**What it shows:**

- How Titan + Neo4j vector index retrieve the most relevant opinion text chunks for a legal question.
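
Conceptually, the index ranks chunks by how close their embedding is to the query embedding. The sketch below shows that idea as a brute-force search, assuming cosine similarity and toy 3-dimensional vectors; the real index uses an approximate search over the stored embeddings, and its similarity function depends on how the index was created.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """chunks is a list of (text, vector); return the k best (text, score) pairs."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only.
chunks = [("a", [1, 0, 0]), ("b", [0, 1, 0]), ("c", [1, 1, 0])]
best = top_k([1, 0, 0], chunks, k=2)
```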

---

### 3.2 Vector + Cypher Retriever (semantic search + graph metadata)

**Code idea:**

- Use `opinion_vector_context.similarity_search(query, k=3)`.
- The custom `retrieval_query` adds opinion metadata to each result.

**What it shows:**

- Same semantic search, but now each result has:
  - `case_id`
  - `opinion_type`
  - `opinion_author`
  - `node_element_id`
- This is closer to the workshop’s “Vector + Cypher Retriever” pattern.
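
One practical use of the added metadata is grouping the retrieved chunks by case before summarizing them. A minimal sketch, assuming results shaped as dicts with `"text"` and `"metadata"` keys; `group_by_case` is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict


def group_by_case(results):
    """Group chunk texts by the case_id found in each result's metadata."""
    grouped = defaultdict(list)
    for item in results:
        case_id = item["metadata"].get("case_id")  # None if the key is missing
        grouped[case_id].append(item["text"])
    return dict(grouped)


# Made-up results for illustration only.
results = [
    {"text": "first chunk", "metadata": {"case_id": 10}},
    {"text": "second chunk", "metadata": {"case_id": 20}},
    {"text": "third chunk", "metadata": {"case_id": 10}},
]
by_case = group_by_case(results)
```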

---

### 3.3 Text-to-Cypher Retriever (GraphCypherQAChain)

**Code idea:**

- Define a `cypher_template` that instructs the LLM to output **only Cypher**.
- Wrap it in a `PromptTemplate`.
- Build `GraphCypherQAChain.from_llm(...)` with:
  - `graph=graph`
  - `llm` for answering
  - `cypher_llm` for generating Cypher
  - `cypher_prompt` to guide Cypher generation
  - `return_direct=True`, `verbose=True`, `allow_dangerous_requests=True`

- Call `cypher_qa.invoke({"query": "How many OpinionChunk nodes are in the graph?"})`.

**What it shows:**

- Claude generates a Cypher query based on the natural language question.
- Chain executes the Cypher against Neo4j and returns the result directly.
- This is your **Text2Cypher Retriever** analogue.
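
Because the chain runs whatever Cypher the model produces, a simple guard can reject obvious write statements before execution. The sketch below is a keyword heuristic, not a Cypher parser: `is_read_only` is a hypothetical helper, it does not catch writes performed through procedure calls, and it errs toward rejecting queries that use a write keyword as an identifier. A database user with read-only permissions is the stronger safeguard.

```python
import re

# Keywords that start or belong to write clauses in Cypher.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH)\b",
    re.IGNORECASE,
)

# Single- or double-quoted string literals, including escaped quotes.
STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"")


def is_read_only(cypher: str) -> bool:
    """Heuristic: True if no write keyword appears outside string literals."""
    without_strings = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(without_strings) is None


ok = is_read_only("MATCH (o:OpinionChunk) RETURN COUNT(o) AS OpinionChunkCount")
blocked = not is_read_only("MATCH (n) DETACH DELETE n")
```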

---

## 4. Three Agent “Hello World” Examples

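All agents use `create_react_agent` from **LangGraph** and tools defined with `@tool`. In LangGraph 1.0 and later this function is deprecated in favor of `create_agent` from `langchain.agents`, as the warnings in the cell outputs show.  
The agent chooses when to call each tool based on the tool name and docstring.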
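
To make the role of the name and docstring concrete, here is a toy registry that records both, the way a model-facing tool list is assembled. `toy_tool`, `TOOL_REGISTRY`, and `describe_tools` are hypothetical names for this sketch and do not reproduce how LangChain's `@tool` works internally; the returned schema string is a placeholder.

```python
TOOL_REGISTRY = {}


def toy_tool(name):
    """Hypothetical stand-in for @tool: record a function under a name with its docstring."""
    def decorator(func):
        TOOL_REGISTRY[name] = {
            "description": (func.__doc__ or "").strip(),
            "func": func,
        }
        return func
    return decorator


@toy_tool("Get-graph-database-schema")
def get_schema():
    """Get the schema of the graph database."""
    return "Case, Court, Jurisdiction, OpinionChunk"  # placeholder, not the real schema


def describe_tools():
    """Render the name and description lines a model would choose from."""
    return "\n".join(
        f"{name}: {spec['description']}" for name, spec in TOOL_REGISTRY.items()
    )


listing = describe_tools()
```

A vague docstring gives the model little to go on, so the description is worth writing as carefully as the function itself.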

### 4.1 Agent 1 – Schema-only Agent

**Tool:**

- `Get-graph-database-schema` → returns `graph.schema`.

**Agent:**

- `schema_agent = create_react_agent(llm, [get_schema])`.

**Usage:**

- Query: `"Summarize the schema of the graph database."`
- The agent decides to call the schema tool, reads the schema, and explains it in natural language.

---

### 4.2 Agent 2 – Schema + Vector/Graph Agent

**Tools:**

1. `Get-graph-database-schema` (same as above).
2. `Retrieve-opinion-chunks`:
   - Calls `opinion_vector_context.similarity_search(query, k=3)`.
   - Returns a list of dicts with `"text"` and `"metadata"`.

**Agent:**

- `vector_graph_agent = create_react_agent(llm, [get_schema, retrieve_opinion_chunks])`.

**Usage:**

- Query: `"Summarize the main ADA-related issues you can infer from the opinions."`
- The agent can:
  - Use vector+graph retrieval to pull relevant chunks and metadata.
  - Then synthesize a legal summary using Claude.
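
The retrieval tool returns full chunk texts, which can consume a large share of the model's context. One option is to shorten each chunk before returning it. This is a sketch only: `compact_chunks` and the 500-character limit are assumptions, not part of the notebook.

```python
def compact_chunks(results, max_chars=500):
    """Cut each chunk's text at max_chars characters, append a marker, keep metadata."""
    compact = []
    for item in results:
        text = item["text"]
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + " ..."
        compact.append({"text": text, "metadata": item["metadata"]})
    return compact


# Made-up results for illustration only.
results = [
    {"text": "x" * 600, "metadata": {"case_id": 1}},
    {"text": "short text", "metadata": {"case_id": 2}},
]
compact = compact_chunks(results)
```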

---

### 4.3 Agent 3 – Multi-tool Agent (Schema + Vector/Graph + Text2Cypher)

**Tools:**

1. `Get-graph-database-schema`
2. `Retrieve-opinion-chunks`
3. `Query-database`:
   - Wraps `GraphCypherQAChain` (`cypher_qa`).
   - Takes a natural language question, generates Cypher, executes it, and returns the result.

**Agent:**

- `multi_agent = create_react_agent(llm, [get_schema, retrieve_opinion_chunks, query_database])`.

**Usage:**

- For factual questions (counts, specific relationships), the agent can call `Query-database`.
- For open-ended or contextual questions, it can call `Retrieve-opinion-chunks`.
- For structural questions, it can call `Get-graph-database-schema`.
- The agent combines all three like the workshop’s **“all tools”** GraphRAG agent.
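
In the notebook the LLM, not a set of rules, decides which tool to call. When reviewing agent traces, a rough baseline of the expected choice can still be useful. The keyword heuristic below is a hypothetical sketch of the routing described above; `expected_tool` and its keyword lists are assumptions.

```python
def expected_tool(question: str) -> str:
    """Guess which tool a question should reach, using simple keyword rules."""
    q = question.lower()
    # Structural questions about the graph itself.
    if "schema" in q or "node labels" in q or "relationship types" in q:
        return "Get-graph-database-schema"
    # Counting and other factual lookups suit generated Cypher.
    if q.startswith("how many") or "count" in q or "number of" in q:
        return "Query-database"
    # Everything else is treated as open-ended and goes to semantic retrieval.
    return "Retrieve-opinion-chunks"


choice = expected_tool("How many OpinionChunk nodes are in the graph?")
```

Comparing this baseline with the tool calls in a streamed run highlights cases where the agent picked a different tool than expected.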

---

## Summary

This notebook validates that:

- **Embeddings + vector index** work (Titan + `chunkEmbeddings` on `OpinionChunk`).
- You can do:
  - Plain vector retrieval,
  - Vector + graph-aware retrieval,
  - Text-to-Cypher querying.
- You can wrap these into **agents** that:
  - Inspect the graph schema,
  - Retrieve opinion chunks with context,
  - Run Cypher queries generated from natural language.

This is your **first end-to-end GraphRAG “smoke test”** before building a more polished UI and production pipeline.


In [45]:
%pip install --quiet "neo4j>=5.23.0" "python-dotenv" \
    "langchain-aws>=0.1.0" "langchain-neo4j>=0.1.0" "langgraph>=0.2.0"


Note: you may need to restart the kernel to use updated packages.


In [89]:
import os
from dotenv import load_dotenv

from neo4j import GraphDatabase

from langchain_aws import ChatBedrock, BedrockEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector, GraphCypherQAChain
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder

load_dotenv("../.env", override=True)

# Neo4j
NEO4J_URI      = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
NEO4J_DATABASE = os.getenv("NEO4J_DATABASE", "neo4j")

# AWS / Bedrock
AWS_REGION       = os.getenv("AWS_REGION", "us-west-2")
CLAUDE_MODEL_ID  = "anthropic.claude-3-5-sonnet-20240620-v1:0"
TITAN_MODEL_ID   = "amazon.titan-embed-text-v2:0"

# Neo4j low-level driver (for quick checks if needed)
driver = GraphDatabase.driver(
    NEO4J_URI,
    auth=(NEO4J_USERNAME, NEO4J_PASSWORD)
)
driver.verify_connectivity()
print("✅ Connected to Neo4j")

✅ Connected to Neo4j


## LangChain graph + models

In [90]:
# LangChain graph wrapper
graph = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
    database=NEO4J_DATABASE,
)

# Claude as LLM (chat)
llm = ChatBedrock(
    model_id=CLAUDE_MODEL_ID,
    region_name=AWS_REGION,
)

# Titan as embedder for queries
embedding_model = BedrockEmbeddings(
    model_id=TITAN_MODEL_ID,
    region_name=AWS_REGION,
)

# Shared prompt
AGENT_SYSTEM_PROMPT = """You are a legal research assistant.

You can use tools to look up information, but you must NOT mention tools,
tool names, function names, or that you are calling a tool in your replies.

When you answer:
- Give a clear, concise answer to the user’s question.
- Do not talk about internal steps, tools, or API calls.
"""

react_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", AGENT_SYSTEM_PROMPT),
        # LangGraph passes the running conversation as `messages`
        MessagesPlaceholder(variable_name="messages"),
        # No agent_scratchpad placeholder is needed: tool calls and results travel in `messages`
    ]
)


print("✅ LangChain graph and models initialized")


✅ LangChain graph and models initialized


## Vector store from existing Neo4j index
Here we assume:

- Label: `OpinionChunk`
- Text property: `text`
- Embedding property: `embedding`
- Vector index: `chunkEmbeddings`

In [91]:
# Pure vector retriever (no extra Cypher yet)
opinion_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="chunkEmbeddings",
    text_node_property="text",
    embedding_node_property="embedding",
)

print("✅ Neo4jVector (opinion_vector) ready")


✅ Neo4jVector (opinion_vector) ready


## Three “retriever” examples
### 1. Vector retriever (semantic search over OpinionChunk)

In [92]:
# Simple vector search
query = "What does the court say about major life activities under the ADA?"

docs = opinion_vector.similarity_search(query, k=5)

for i, d in enumerate(docs, start=1):
    print(f"Result {i} – score not shown (handled inside Neo4j)")
    print("Text snippet:", d.page_content[:200].replace("\n", " "), "...")
    print("Metadata:", d.metadata)
    print("-" * 80)


Result 1 – score not shown (handled inside Neo4j)
Text snippet: Regulations codified by the Equal Employment Opportunity Commission (“EEOC”) indicate a person is substantially limited if he or she is (i) Unable to perform a major life activity that the average per ...
Metadata: {'opinion_author': 'Daughtrey, Gilman, Collier', 'chunk_index': 6, 'opinion_type': 'combined-opinion', 'embedding_dim': 1024, 'case_id': 773421, 'embedding_model': 'amazon.titan-embed-text-v2:0', 'embedding_updated_at': neo4j.time.DateTime(2025, 12, 2, 10, 39, 27, 2000000, tzinfo=<UTC>)}
--------------------------------------------------------------------------------
Result 2 – score not shown (handled inside Neo4j)
Text snippet: discharge of employees, employee compensation, job training, and other terms, conditions, and privileges of employment."3 The term "disability" as used in the ADA means: (A) a physical or mental impai ...
Metadata: {'opinion_author': 'Politz, Jolly, Benavides', 'chunk_index': 1, 'opinio

### 2. Vector + Cypher retriever (semantic search + graph metadata)

In [93]:
# Vector + Cypher Neo4jVector
retrieval_query = """
RETURN
  node.text AS text,
  score,
  {
    node_element_id: elementId(node),
    case_id: node.case_id,
    opinion_type: node.opinion_type,
    opinion_author: node.opinion_author
  } AS metadata
ORDER BY score DESC
"""


opinion_vector_context = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="chunkEmbeddings",
    text_node_property="text",
    embedding_node_property="embedding",
    retrieval_query=retrieval_query,
)

print("✅ Neo4jVector (opinion_vector_context) with graph traversal ready")


✅ Neo4jVector (opinion_vector_context) with graph traversal ready


In [94]:
# Run vector + Cypher retrieval
query = "How do courts describe the standard for summary judgment?"

docs = opinion_vector_context.similarity_search(query, k=3)

for i, d in enumerate(docs, start=1):
    print(f"Result {i}")
    print("Text snippet:", d.page_content[:200].replace("\n", " "), "...")
    print("Metadata:", d.metadata)
    print("-" * 80)


Result 1
Text snippet: Kiphart received the same rate and schedule of pay as any team member on sickness and accident leave, that is, 100 percent of his base pay for 30 days, 80 percent of his base pay for the next 30 days, ...
Metadata: {'opinion_author': 'Higgins', 'node_element_id': '4:ea5939dd-fe97-45c8-af09-045847700147:212', 'opinion_type': 'combined-opinion', 'case_id': 2424551}
--------------------------------------------------------------------------------
Result 2
Text snippet: at 23. Mimi Goings was aware of that Pat Hastings was making up her missed time by showing up early in the morning. Id., Ex. 5, Goings Dep. at 54. Principal Glover stated that it had been a policy of  ...
Metadata: {'opinion_author': 'Joseph F. Bataillon', 'node_element_id': '4:ea5939dd-fe97-45c8-af09-045847700147:115', 'opinion_type': 'combined-opinion', 'case_id': 2113004}
--------------------------------------------------------------------------------
Result 3
Text snippet: Plaintiff received a Dism

### 3. Text-to-Cypher retriever (GraphCypherQAChain)

In [95]:
# Create Text2Cypher chain
# Simple prompt to force Cypher-only generation for the cypher_llm
cypher_template = """Task: Generate a Cypher statement to query a Neo4j database.

Instructions:
- Use only labels, relationship types, and properties present in the schema.
- Do not invent labels or properties.
- Return ONLY a Cypher query, with no explanations or extra text.

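Schema:
{schema}

The user's question is:
{question}
"""

# GraphCypherQAChain supplies both `schema` and `question` when it formats this prompt
cypher_prompt = PromptTemplate(
    input_variables=["schema", "question"],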
    template=cypher_template,
)

# Use Claude for both Cypher generation and answer generation
cypher_qa = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    cypher_llm=llm,
    cypher_prompt=cypher_prompt,
    return_direct=True,
    verbose=True,
    allow_dangerous_requests=True,  # explicit opt-in: the chain executes LLM-generated Cypher; use a read-only DB user outside demos
)

print("✅ Text2Cypher-style GraphCypherQAChain ready")

✅ Text2Cypher-style GraphCypherQAChain ready


In [96]:
# Example Text2Cypher query
result = cypher_qa.invoke({
    "query": "How many OpinionChunk nodes are in the graph?"
})
print("Answer:", result)




> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (o:OpinionChunk)
RETURN COUNT(o) AS OpinionChunkCount

> Finished chain.
Answer: {'query': 'How many OpinionChunk nodes are in the graph?', 'result': [{'OpinionChunkCount': 714}]}


## Three agent “Hello World” examples
### Agent 1 – Simple schema agent

In [97]:
# Define schema tool and agent
@tool("Get-graph-database-schema")
def get_schema():
    """Get the schema of the graph database."""
    return graph.schema

schema_tools = [get_schema]

schema_agent = create_react_agent(
    llm,
    schema_tools,
    prompt=react_prompt,
)

print("✅ Schema-only agent ready")


✅ Schema-only agent ready


/tmp/ipykernel_6837/1349785925.py:9: LangGraphDeprecatedSinceV10: create_react_agent has been moved to `langchain.agents`. Please update your import to `from langchain.agents import create_agent`. Deprecated in LangGraph V1.0 to be removed in V2.0.
  schema_agent = create_react_agent(


In [98]:
# Run schema agent
query = "Summarize the schema of the graph database."

for step in schema_agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()



Summarize the schema of the graph database.

Certainly! I'll retrieve the schema of the graph database for you and provide a summary. Let me fetch that information.
Tool Calls:
  Get-graph-database-schema (toolu_bdrk_01Up45ieQCpaE1rYvQ6yjzDn)
 Call ID: toolu_bdrk_01Up45ieQCpaE1rYvQ6yjzDn
  Args:
Name: Get-graph-database-schema

Node properties:
Case {id: INTEGER, name: STRING, case_full_name: STRING, decision_date: DATE_TIME, docket_number: STRING, court_id: INTEGER, jurisdiction_id: INTEGER, court_name_abbreviation: STRING, court_name: STRING, jurisdiction_name: STRING, citation_pipe: STRING, file_name: STRING, court_listener_url: STRING, adah_case: BOOLEAN, opinion_summary: STRING, updated_at_utc: DATE_TIME, case_label: STRING, label_rationale: STRING, court_level_case_label_decision: STRING}
Court {id: INTEGER, name: STRING, name_abbreviation: STRING, court_level: INTEGER}
Jurisdiction {id: INTEGER, jurisdiction_name: STRING}
OpinionChunk {id: STRING, case_id: INTEGER, chunk_index:

### Agent 2 – Vector + Graph retrieval agent

In [100]:
# Define vector retrieval tool and agent
@tool("Retrieve-opinion-chunks")
def retrieve_opinion_chunks(query: str):
    """Find relevant opinion text chunks and related graph context."""
    docs = opinion_vector_context.similarity_search(query, k=3)
    # Return plain dicts so the agent can read them easily
    return [
        {
            "text": d.page_content,
            "metadata": d.metadata,
        }
        for d in docs
    ]

tools_vector_graph = [get_schema, retrieve_opinion_chunks]

vector_graph_agent = create_react_agent(
    llm,
    tools_vector_graph,
    prompt=react_prompt,
)

print("✅ Agent with schema + vector+graph tools ready")


✅ Agent with schema + vector+graph tools ready


/tmp/ipykernel_6837/2495610884.py:17: LangGraphDeprecatedSinceV10: create_react_agent has been moved to `langchain.agents`. Please update your import to `from langchain.agents import create_agent`. Deprecated in LangGraph V1.0 to be removed in V2.0.
  vector_graph_agent = create_react_agent(


In [101]:
# Run vector + graph agent
query = "Summarize the main ADA-related issues you can infer from the opinions."

for step in vector_graph_agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()



Summarize the main ADA-related issues you can infer from the opinions.

Certainly! I'll analyze the opinions related to the Americans with Disabilities Act (ADA) and provide you with a summary of the main issues. Let me retrieve the relevant information for you.
Tool Calls:
  Retrieve-opinion-chunks (toolu_bdrk_01F5H8T9YojSzkk69N8gWaMd)
 Call ID: toolu_bdrk_01F5H8T9YojSzkk69N8gWaMd
  Args:
    query: ADA-related issues in opinions
Name: Retrieve-opinion-chunks

[{"text": "And I agree that the error this decision corrects resulted from our prior failure to respect the words of the ADA and the rules of statutory construction that must govern our analysis. For the same reasons—failure to respect the words of the statute as a whole and failure to honor the tenets of statutory construction—I respectfully dissent from the majority’s determination that we now impose a “but for” standard upon those seeking protection under the ADA from discrimination “because of” disability. Applying the rule

### Agent 3 – Multi-tool agent (schema + vector+graph + text2cypher)

In [102]:
# Define Text2Cypher tool
@tool("Query-database")
def query_database(question: str):
    """Answer specific factual questions by generating and running Cypher."""
    result = cypher_qa.invoke({"query": question})
    # cypher_qa returns {'query': ..., 'result': [...]}; hand only the rows to the agent
    return result["result"]


In [103]:
# Build multi-tool agent
multi_tools = [get_schema, retrieve_opinion_chunks, query_database]

multi_agent = create_react_agent(
    llm,
    multi_tools,
    prompt=react_prompt,
)

print("✅ Multi-tool agent ready (schema + vector+graph + text2cypher)")


✅ Multi-tool agent ready (schema + vector+graph + text2cypher)


/tmp/ipykernel_6837/2194119314.py:4: LangGraphDeprecatedSinceV10: create_react_agent has been moved to `langchain.agents`. Please update your import to `from langchain.agents import create_agent`. Deprecated in LangGraph V1.0 to be removed in V2.0.
  multi_agent = create_react_agent(


In [104]:
# Run multi-tool agent
query = "How many opinions mention 'Monette v. Electronic Systems Corporation' and what do they say?"

for step in multi_agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()



How many opinions mention 'Monette v. Electronic Systems Corporation' and what do they say?

To answer your question about opinions mentioning 'Monette v. Electronic Systems Corporation', I'll need to search for relevant information. Let me do that for you.
Tool Calls:
  Retrieve-opinion-chunks (toolu_bdrk_012Qxor557trQxKgRGLJTuSk)
 Call ID: toolu_bdrk_012Qxor557trQxKgRGLJTuSk
  Args:
    query: Monette v. Electronic Systems Corporation
Name: Retrieve-opinion-chunks

[{"text": "90 F.3d 1173 65 USLW 2159, 5 A.D. Cases 1326, 18 A.D.D. 425, 8 NDLR P 224 Roger MONETTE and Doris Monette, Plaintiffs-Appellants, v. ELECTRONIC DATA SYSTEMS CORPORATION, Defendant-Appellee. No. 95-1114. United States Court of Appeals, Sixth Circuit. Argued March 19, 1996. Decided July 30, 1996. Charles W. Palmer (argued and briefed), Robb, Messing & Palmer, Taylor, MI, for Plaintiffs-Appellants. Brian B. Smith, Electronic Data Systems Corp., Troy, MI, Martin T. Wymer (argued and briefed), Duvin, Cahn & Hutton, 