# Vector + Cypher retriever

The chunks in the knowledge graph include vector embeddings that allow for similarity search based on vector distance.

You can create a vector retriever that uses these embeddings to find the most relevant chunks for a given query.

The retriever can then use the structured and unstructured data in the knowledge graph to provide additional context.
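At its core, a similarity search scores each chunk's embedding against the query embedding and keeps the closest ones. The brute-force sketch below shows that idea with cosine similarity on made-up three-dimensional vectors. Neo4j's vector index does not scan every vector like this; it uses an approximate nearest-neighbour structure. Treat the code as an illustration of the scoring idea only. The chunk texts, vectors and function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def top_k_chunks(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Brute force: score every chunk against the query, keep the k most similar."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional embeddings; text-embedding-ada-002 vectors have 1536 dimensions.
chunks = {
    "Knowledge graphs store entities and relationships.": [0.9, 0.1, 0.0],
    "Vector indexes support similarity search.": [0.2, 0.9, 0.1],
    "Cypher is the query language used by Neo4j.": [0.1, 0.2, 0.9],
}
query_embedding = [0.8, 0.3, 0.0]

for text, score in top_k_chunks(query_embedding, chunks, k=2):
    print(f"{score:.3f}  {text}")
```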

## Create the Vector Index

You will need to create a vector index on the `embedding` property of the `Chunk` nodes.

```python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from utils import execute_query

neo4j_uri = os.getenv("NEO4J_URI")
neo4j_user = os.getenv("NEO4J_USERNAME")
neo4j_pass = os.getenv("NEO4J_PASSWORD")
neo4j_db = os.getenv("NEO4J_DATABASE")

neo4j_driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_pass))

query = """
CREATE VECTOR INDEX chunkEmbedding IF NOT EXISTS
FOR (n:Chunk)
ON n.embedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}};
"""

execute_query(neo4j_driver, query)
```
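The `vector.dimensions` value must match the size of the stored embeddings (1536 for OpenAI's `text-embedding-ada-002`, the model used later in this lesson), and `vector.similarity_function` must be one Neo4j supports (`cosine` or `euclidean`). If you create indexes for other labels or embedding models, a small helper such as the hypothetical `build_vector_index_query` below keeps those settings in one place. It is a sketch: it only builds the statement string, which you would still run with `execute_query` or the driver.

```python
SIMILARITY_FUNCTIONS = {"cosine", "euclidean"}


def quote(name: str) -> str:
    """Backtick-quote a Cypher identifier, escaping any backticks inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_vector_index_query(index_name: str, label: str, prop: str,
                             dimensions: int, similarity: str = "cosine") -> str:
    """Hypothetical helper: build a CREATE VECTOR INDEX statement.

    Index, label and property names cannot be query parameters, so they are
    quoted and interpolated; dimensions and similarity are validated first.
    """
    if similarity not in SIMILARITY_FUNCTIONS:
        raise ValueError(f"unsupported similarity function: {similarity!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return (
        f"CREATE VECTOR INDEX {quote(index_name)} IF NOT EXISTS\n"
        f"FOR (n:{quote(label)})\n"
        f"ON n.{quote(prop)}\n"
        "OPTIONS {indexConfig: {\n"
        f" `vector.dimensions`: {dimensions},\n"
        f" `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )


# 1536 matches the output size of OpenAI's text-embedding-ada-002.
print(build_vector_index_query("chunkEmbedding", "Chunk", "embedding", 1536))
```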

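You can search the vector index from Cypher by encoding a search term as an embedding with `genai.vector.encode` (replace `sk-...` with your OpenAI API key) and passing it to `db.index.vector.queryNodes`: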

```cypher
WITH genai.vector.encode(
    "Retrieval Augmented Generation",
    "OpenAI",
    { token: "sk-..." }) AS userEmbedding
CALL db.index.vector.queryNodes('chunkEmbedding', 5, userEmbedding)
YIELD node, score
RETURN node.text, score
```
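The `score` yielded by `db.index.vector.queryNodes` is normalised to the range 0 to 1 rather than being the raw similarity. As we read the Neo4j vector index documentation, for `cosine` the score is `(1 + cos θ) / 2`, and for `euclidean` it is `1 / (1 + d²)`, where `d` is the Euclidean distance; check the documentation for your Neo4j version before relying on this. The sketch below converts between the two, which can help when choosing a minimum score. The function names, threshold and rows are hypothetical.

```python
def cosine_to_score(cosine: float) -> float:
    """Map a raw cosine similarity in [-1, 1] onto a score in [0, 1]."""
    return (1 + cosine) / 2


def score_to_cosine(score: float) -> float:
    """Invert cosine_to_score to recover the raw cosine similarity."""
    return 2 * score - 1


def euclidean_to_score(distance: float) -> float:
    """Map a Euclidean distance (>= 0) onto a score in (0, 1]."""
    return 1 / (1 + distance ** 2)


# A hypothetical cut-off: keep only hits whose raw cosine is at least 0.8.
min_score = cosine_to_score(0.8)
hits = [("chunk A", 0.93), ("chunk B", 0.85)]  # made-up (text, score) rows
print([text for text, score in hits if score >= min_score])
```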

## Create a Vector + Cypher GraphRAG pipeline

The `neo4j_graphrag` package includes a `VectorCypherRetriever` class that combines vector similarity search with Cypher retrieval.

You can use this retriever to create a GraphRAG pipeline to:

1. Perform a vector similarity search to find the most relevant chunks for a given query.

2. Use a Cypher query to add additional information to the context.

3. Pass the context to an LLM to generate a response to the original query.
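The three steps can be sketched in plain Python with stand-ins for each component, so the flow is visible without a database or API key. None of this is part of `neo4j_graphrag`: `vector_search`, `enrich`, `build_prompt` and `fake_llm` are hypothetical placeholders for the vector index, the retrieval query and `OpenAILLM`, and the prompt wording is an assumption, not the template `GraphRAG` actually uses.

```python
from typing import Callable


def vector_search(query: str, k: int) -> list[dict]:
    """Step 1 stand-in: pretend these rows came back from the vector index."""
    fake_hits = [
        {"text": "A knowledge graph organises data as nodes and relationships.", "score": 0.93},
        {"text": "Neo4j stores knowledge graphs natively.", "score": 0.88},
    ]
    return fake_hits[:k]


def enrich(hit: dict) -> dict:
    """Step 2 stand-in: a Cypher retrieval query would fetch this from the graph."""
    return {**hit, "concepts": ["Knowledge Graph"]}


def build_prompt(query: str, records: list[dict]) -> str:
    """Step 3: combine the enriched context with the user's question."""
    context = "\n".join(
        f"- {r['text']} (concepts: {', '.join(r['concepts'])})" for r in records
    )
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query: str, llm: Callable[[str], str], k: int = 5) -> str:
    """Run the three steps and return the LLM's reply."""
    records = [enrich(hit) for hit in vector_search(query, k)]
    return llm(build_prompt(query, records))


def fake_llm(prompt: str) -> str:
    """Offline stand-in for an LLM: reports how many context lines it received."""
    n = sum(line.startswith("- ") for line in prompt.splitlines())
    return f"(answer based on {n} context items)"


print(answer("Where can I learn more about knowledge graphs?", fake_llm))
```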


```python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")

# Define retrieval query
retrieval_query = """
RETURN node.text as text, score
"""

# Create retriever
retriever = VectorCypherRetriever(
    driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    index_name="chunkEmbedding",
    embedder=embedder,
    retrieval_query=retrieval_query,
)

# Create the LLM
llm = OpenAILLM(model_name="gpt-4o")

# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Search
query_text = "Where can I learn more about knowledge graphs?"

response = rag.search(
    query_text=query_text, 
    retriever_config={"top_k": 5},
    return_context=True
)

print(response.answer)

# Close the database connection
driver.close()
```

The retriever is configured to use the `chunkEmbedding` vector index you just created:

```python
retriever = VectorCypherRetriever(
    driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    index_name="chunkEmbedding",
    embedder=embedder,
    retrieval_query=retrieval_query,
)
```


When you run the code:

1. The VectorCypherRetriever uses the vector index to find chunks similar to the query:

    "Where can I learn more about knowledge graphs?"

2. The GraphRAG pipeline passes the text from those chunks as context to the LLM.

3. The response from the LLM is printed, for example:

    You can learn more about knowledge graphs in the Neo4j blog post linked here: What Is a Knowledge Graph?

You can print the context passed to the LLM by adding the following to the end of the code:

```python
print("CONTEXT:", response.retriever_result.items)
```

## Retrieval Cypher Query

The `VectorCypherRetriever` also allows you to define a Cypher query to retrieve additional context from the knowledge graph.

Extra context from the graph can help the LLM generate more accurate responses.

Update the `retrieval_query` to add additional information about the lessons, technologies, and concepts related to the chunks:

```python
retrieval_query = """
RETURN DISTINCT
    node.text as text, score,
    collect { MATCH (node)-[:FROM_DOCUMENT]->(d)-[:PDF_OF]->(lesson) RETURN lesson.url} as lesson_url,
    collect { MATCH (node)<-[:FROM_CHUNK]-(e:Technology) RETURN e.name } as technologies,
    collect { MATCH (node)<-[:FROM_CHUNK]-(e:Concept) RETURN e.name } as concepts
"""
```

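For each chunk found by the vector search, the retriever runs this Cypher query and adds the lesson URLs, technologies, and concepts it returns to the context.

Running the code again with the same query, "Where can I learn more about knowledge graphs?", should produce a more detailed response, such as: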

You can learn more about knowledge graphs in the Neo4j blog post linked here: What Is a Knowledge Graph? Additionally, you can explore further lessons on knowledge graphs on the GraphAcademy website, specifically in the course "GenAI Fundamentals," including the sections "What is a Knowledge Graph" and "Creating Knowledge Graphs."

The retrieval query includes additional context relating to technologies and concepts mentioned in the chunks.
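Each row returned by the retrieval query becomes one context item. If you want control over how those rows read in the prompt, `VectorCypherRetriever` accepts a `result_formatter` callable; as we understand the API, it receives a `neo4j.Record` and must return a `RetrieverResultItem`. The hypothetical `format_record` function below only builds the text part and runs on a plain dict (a `neo4j.Record` also supports `record["key"]` and `record.get("key")`), so you would still need to wrap its output, for example as `RetrieverResultItem(content=format_record(record))`, before passing it in. The sample row and URL are made up.

```python
def format_record(record) -> str:
    """Turn one row returned by the retrieval query into readable context.

    Works on a dict; a neo4j.Record also supports record["key"] and record.get("key").
    """
    parts = [record["text"]]
    for key, label in (
        ("lesson_url", "Lessons"),
        ("technologies", "Technologies"),
        ("concepts", "Concepts"),
    ):
        values = record.get(key) or []  # collect {...} returns a list, possibly empty
        if values:
            parts.append(f"{label}: {', '.join(values)}")
    return "\n".join(parts)


# Made-up row shaped like the retrieval query's output.
row = {
    "text": "A knowledge graph is a structured representation of data.",
    "score": 0.91,
    "lesson_url": ["https://example.com/lesson"],  # placeholder URL
    "technologies": ["Neo4j"],
    "concepts": ["Knowledge Graph", "Graph Database"],
}
print(format_record(row))
```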

Experiment with asking different questions about the knowledge graph, such as "What technologies and concepts support knowledge graphs?".

# Text to Cypher retriever

The `Text2CypherRetriever` lets you create GraphRAG pipelines that answer natural language questions by generating Cypher queries and executing them against the knowledge graph.

Text-to-Cypher retrieval can help you get precise information from the knowledge graph based on user questions, for example, how many lessons are in a course, which concepts a module covers, or how technologies relate to each other.

## Create a Text2CypherRetriever GraphRAG pipeline

```python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.retrievers import Text2CypherRetriever

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

llm = OpenAILLM(
    model_name="gpt-4o", 
    model_params={"temperature": 0}
)

# Cypher examples as input/query pairs
examples = [
    "USER INPUT: 'Find a node with the name $name?' QUERY: MATCH (node) WHERE toLower(node.name) CONTAINS toLower($name) RETURN node.name AS name, labels(node) AS labels",
]

# Build the retriever
retriever = Text2CypherRetriever(
    driver=driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    llm=llm,
    examples=examples,
)

rag = GraphRAG(
    retriever=retriever, 
    llm=llm
)

query_text = "How many technologies are mentioned in the knowledge graph?"

response = rag.search(
    query_text=query_text,
    return_context=True
)

print(response.answer)
print("CYPHER :", response.retriever_result.metadata["cypher"])
print("CONTEXT:", response.retriever_result.items)

driver.close()
```
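The `examples` passed to the retriever are plain strings. If you collect more question and query pairs, a small hypothetical helper such as `format_examples` keeps them in the same `USER INPUT: ... QUERY: ...` shape as the example above. The second pair is an illustration built from the `Technology` label and `name` property used in the retrieval query earlier; check it against your own schema before relying on it.

```python
def format_examples(pairs: list[tuple[str, str]]) -> list[str]:
    """Render (question, Cypher) pairs in the USER INPUT / QUERY style used above."""
    return [f"USER INPUT: '{question}' QUERY: {cypher}" for question, cypher in pairs]


examples = format_examples([
    (
        "Find a node with the name $name?",
        "MATCH (node) WHERE toLower(node.name) CONTAINS toLower($name) "
        "RETURN node.name AS name, labels(node) AS labels",
    ),
    (
        # Hypothetical extra example based on the Technology label.
        "List the technologies mentioned in the knowledge graph",
        "MATCH (t:Technology) RETURN t.name AS technology",
    ),
])

for example in examples:
    print(example)
```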