# Graph Builder
In this notebook, let's explore how to leverage generative AI to build and consume a knowledge graph in Neo4j.

## Setup
First, let's install the libraries we need for this lab and for the later notebooks that depend on it.  Once the install finishes, we'll want to restart the kernel.  To do that, go to the "Kernel" menu and click "Restart Kernel and Clear All Outputs."  That also clears everything the install statements printed, leaving us with a cleaner notebook to work with.

In [10]:
#%pip install --user langchain
#%pip install --user neo4j

Now restart the kernel. That will allow the Python environment to import the new packages.

## Start Neo4j
We're going to run Neo4j in Docker.  You can do so by running a `docker run` command in the terminal that starts Neo4j with APOC enabled.  We need APOC for LangChain to work.

To do: figure out a way to start Neo4j from inside the notebook.  Background processes aren't supported, and it's not clear that `screen` would help either.
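One possible approach, sketched here under assumptions rather than tested in this lab, is to start the container in detached mode so the notebook cell returns immediately. The image tag `neo4j:5`, the container name `neo4j-lab`, and the helper names are hypothetical choices. The `NEO4J_PLUGINS` variable is how the official Neo4j 5 image is asked to install APOC, and the password is assumed to match the `NEO4J_PASSWORD` used later in this notebook.

```python
import subprocess


def neo4j_docker_command(password="mypassword", image="neo4j:5", name="neo4j-lab"):
    """Build a `docker run` argument list that starts Neo4j detached with APOC."""
    return [
        "docker", "run",
        "--detach",                          # return right away instead of blocking the cell
        "--name", name,                      # hypothetical container name
        "--publish", "7474:7474",            # HTTP port for Neo4j Browser
        "--publish", "7687:7687",            # Bolt port used by the Python driver
        "--env", f"NEO4J_AUTH=neo4j/{password}",
        "--env", 'NEO4J_PLUGINS=["apoc"]',   # ask the image to install APOC at startup
        image,
    ]


def start_neo4j():
    """Run the command; requires Docker to be installed on this machine."""
    return subprocess.run(neo4j_docker_command(), check=True)


command = neo4j_docker_command()
print(" ".join(command))
# Call start_neo4j() to actually launch the container.
```

Building the command as a list avoids shell quoting problems with the JSON value in `NEO4J_PLUGINS`. The container needs a little time to start before the connection cells below will succeed.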

## Parse Data
Now we're going to use an LLM to parse a document and create a graph in Neo4j.

In [11]:
CYPHER_GENERATION_TEMPLATE = """You are an expert Neo4j Cypher translator who understands English and converts it to Cypher, strictly based on the Neo4j schema provided and following the instructions below:
1. Only generate Cypher queries that are compatible with Neo4j Version 5.
2. Do not use EXISTS, SIZE keywords in the cypher. Use alias when using the WITH keyword.
3. Do not use same variable names for different nodes and relationships in the query.
4. Always enclose the Cypher output inside 3 backticks
5. Always do a case-insensitive and fuzzy search for any properties related search. Eg: to search for a Company name use `toLower(c.name) contains 'neo4j'`
6. Candidate node is synonymous to Manager
7. Always use aliases to refer the node in the query
8. 'Answer' is not a Cypher keyword. Answer should never be used in a query.
9. Generate only one Cypher query per question. 
10. Cypher is not SQL. So, do not mix and match the syntaxes.
11. Every Cypher query always starts with a MERGE keyword.

Schema:
{schema}
Samples:
Question: Jeff took his dog, Lassie, for a walk.
Answer: MERGE (Jeff:Person)

Question: {question}
Answer: 
"""

In [12]:
from langchain.prompts.prompt import PromptTemplate

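# The template has two placeholders: {schema} is filled in by the chain from the
# graph, and {question} is the text we pass in.
CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=['schema', 'question'],
    validate_template=True,
    template=CYPHER_GENERATION_TEMPLATE
)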
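To see what the model actually receives, it can help to render a template by hand. The sketch below uses plain `str.format`, which behaves much like the default placeholder substitution in `PromptTemplate` as far as I know. `SHORT_TEMPLATE` and `render_prompt` are hypothetical stand-ins for illustration, not part of the lab.

```python
# Shortened, hypothetical stand-in for CYPHER_GENERATION_TEMPLATE.
SHORT_TEMPLATE = """Schema:
{schema}
Samples:
Answer: MERGE (p:Person {{name: 'Jeff'}})

Question: {question}
Answer:
"""


def render_prompt(template: str, schema: str, question: str) -> str:
    """Substitute the two placeholders, leaving doubled braces as literal braces."""
    return template.format(schema=schema, question=question)


rendered = render_prompt(
    SHORT_TEMPLATE,
    schema="(empty graph)",
    question="Jeff took his dog, Lassie, for a walk.",
)
print(rendered)
```

Note the doubled braces in the sample line. Cypher property maps use `{` and `}`, so any literal braces added to the samples would need to be escaped this way, otherwise they are read as placeholders.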

In [13]:
NEO4J_USERNAME = 'neo4j'
NEO4J_PASSWORD = 'mypassword'
NEO4J_URI = 'neo4j://localhost'

In [14]:
from langchain.graphs import Neo4jGraph

graph = Neo4jGraph(
    url=NEO4J_URI, 
    username=NEO4J_USERNAME, 
    password=NEO4J_PASSWORD
)

In [15]:
from langchain.chains import GraphCypherQAChain
from langchain.llms import VertexAI

chain = GraphCypherQAChain.from_llm(
    graph=graph,
    cypher_llm=VertexAI(model_name='code-bison@001', max_output_tokens=2048, temperature=0.0),
    qa_llm=VertexAI(model_name='text-bison', max_output_tokens=2048, temperature=0.0),
    cypher_prompt=CYPHER_GENERATION_PROMPT,
    verbose=True,
    return_intermediate_steps=True
)

In [16]:
r1 = chain("""Jeff took his dog, Lassie, for a walk.""")
print(f"Final answer: {r1['result']}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MERGE (Jeff:Person)
MERGE (Lassie:Dog)
MERGE (Jeff)-[:WALKED_WITH]->(Lassie)
Full Context:
[]

> Finished chain.
Final answer:  I don't know. The provided information does not mention Jeff, Lassie, or a walk.
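
The "I don't know" answer follows from the empty context: the generated `MERGE` statements have no `RETURN` clause, so the query returns no rows for the QA model to summarize. Because the chain was built with `return_intermediate_steps=True`, the generated query and the context should also be available on the result. The sketch below assumes the result carries an `intermediate_steps` list of small dicts keyed by `query` and `context`. `summarize_steps` and `example_result` are hypothetical, and the example dict is hand-written to mirror the run above rather than captured from it.

```python
def summarize_steps(result: dict) -> dict:
    """Collect the generated Cypher and the returned context from a chain result."""
    summary = {"query": None, "context": None}
    for step in result.get("intermediate_steps", []):
        for key in summary:
            if key in step:
                summary[key] = step[key]
    return summary


# Hand-written stand-in shaped like the run above (assumed structure).
example_result = {
    "result": "I don't know.",
    "intermediate_steps": [
        {"query": "MERGE (Jeff:Person)\nMERGE (Lassie:Dog)\nMERGE (Jeff)-[:WALKED_WITH]->(Lassie)"},
        {"context": []},
    ],
}

summary = summarize_steps(example_result)
print(summary["query"])
print("rows returned:", len(summary["context"]))
```

In the notebook, `summarize_steps(r1)` would be the equivalent call. Also note that in `MERGE (Jeff:Person)`, `Jeff` is a query variable rather than a stored property, so the node is created with a label but no name.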
