# Neo4j

Notes:

- References:
    - https://neo4j.com/developer/python/
    - https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/
    - https://neo4j.com/developer/graph-data-science/applied-graph-embeddings/
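- Vector index support is still in beta (at the time of these notes)
- Similarity metrics: Euclidean, cosine
- Index type: HNSW (Hierarchical Navigable Small World)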
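
To make the two metrics concrete, here is a minimal NumPy sketch that turns each one into a similarity score in [0, 1]. The normalisation formulas (`(1 + cos) / 2` for cosine, `1 / (1 + d²)` for Euclidean) are assumed from the Neo4j vector index documentation and should be checked against the version in use; the function names are illustrative only.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (assumed Neo4j-style normalisation)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float((1.0 + cos) / 2.0)


def euclidean_score(a, b):
    """Squared Euclidean distance mapped to (0, 1] (assumed Neo4j-style normalisation)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    squared_distance = np.sum((a - b) ** 2)
    return float(1.0 / (1.0 + squared_distance))
```

With both functions, identical vectors score 1.0, and the score falls as the vectors diverge.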


In [21]:
%pip install neo4j py2neo

Collecting py2neo
  Downloading py2neo-2021.2.3-py2.py3-none-any.whl (177 kB)
Collecting interchange~=2021.0.4 (from py2neo)
  Downloading interchange-2021.0.4-py2.py3-none-any.whl (28 kB)
Collecting pansi>=2020.7.3 (from py2neo)
  Downloading pansi-2020.7.3-py2.py3-none-any.whl (10 kB)
Installing collected packages: pansi, interchange, py2neo
Successfully installed interchange-2021.0.4 pansi-2020.7.3 py2neo-2021.2.3
Note: you may need to restart the kernel to use updated packages.


In [34]:
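import os

from dotenv import load_dotenv

# Load connection settings from a local .env file, overriding any variables already set.
load_dotenv(override=True)
neo4j_user = os.environ["NEO_USER"]
neo4j_pass = os.environ["NEO_PASSWORD"]
neo4j_host = os.environ["NEO_CONN"]  # connection URI, passed as `url` to Neo4jGraph below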

In [27]:
import pandas as pd
import numpy as np

In [23]:
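# `graph` is the Neo4jGraph connection defined further down (cell In [29]).
# Creating an index is a write operation, so the connection needs write access.
rs = graph.query("""CALL db.index.vector.createNodeIndex('abstract-embeddings', 'Abstract', 'embedding', 1536, 'cosine')""")
rs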

ClientError: [Statement.AccessMode] Writing in read access mode not allowed. Attempted write to neo4j

In [25]:
rs = graph.query("""SHOW INDEXES YIELD name, type, labelsOrTypes, properties, options
WHERE type = 'VECTOR'""")
print(rs)

 name                | type   | labelsOrTypes | properties    | options                                                                                                         
---------------------|--------|---------------|---------------|-----------------------------------------------------------------------------------------------------------------
 abstract-embeddings | VECTOR | ['Abstract']  | ['embedding'] | {indexProvider: 'vector-1.0', indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}} 



In [26]:
rs = graph.query("""
MATCH (title:Title)<--(:Paper)-->(abstract:Abstract)
WHERE toLower(title.text) = 'efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs'

CALL db.index.vector.queryNodes('abstract-embeddings', 10, abstract.embedding)
YIELD node AS similarAbstract, score

MATCH (similarAbstract)<--(:Paper)-->(similarTitle:Title)
RETURN similarTitle.text AS title, score
""")
print(rs)

(No data)
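
`db.index.vector.queryNodes` returns approximate nearest neighbours of the query vector. For a small collection, the exact answer can be computed by brute force, which is useful as a sanity check when a query like the one above returns nothing. A minimal sketch, assuming cosine similarity; the function name `top_k_cosine` and the toy vectors are made up for illustration and are not part of Neo4j.

```python
import numpy as np


def top_k_cosine(query, vectors, k):
    """Return the k (index, cosine similarity) pairs most similar to `query`, best first."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    # Cosine similarity of every row against the query in one vectorised step.
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [(int(i), float(sims[i])) for i in order]


# Toy data standing in for stored embeddings (real ones here would have 1536 dimensions).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
hits = top_k_cosine([1.0, 0.0], vectors, k=2)
```

This scans every vector, so its cost grows linearly with the collection size; an HNSW index trades exactness for much faster lookups.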


## LangChain

In [28]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain
from langchain.graphs import Neo4jGraph

In [29]:
graph = Neo4jGraph(
    url=neo4j_host, username=neo4j_user, password=neo4j_pass
)

In [30]:
graph.query(
    """
MERGE (m:Movie {name:"Top Gun"})
WITH m
UNWIND ["Tom Cruise", "Val Kilmer", "Anthony Edwards", "Meg Ryan"] AS actor
MERGE (a:Actor {name:actor})
MERGE (a)-[:ACTED_IN]->(m)
"""
)

[]

In [31]:
# Refresh the cached schema if the graph has changed since the connection was created
graph.refresh_schema()

In [32]:
print(graph.get_schema)


        Node properties are the following:
        [{'labels': 'Movie', 'properties': [{'property': 'name', 'type': 'STRING'}]}, {'labels': 'Actor', 'properties': [{'property': 'name', 'type': 'STRING'}]}]
        Relationship properties are the following:
        []
        The relationships are the following:
        ['(:Actor)-[:ACTED_IN]->(:Movie)']
        


In [35]:
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

In [36]:
chain.run("Who played in Top Gun?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})
RETURN a.name
Full Context:
[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]

> Finished chain.


'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'

In [37]:
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, top_k=2
)

In [38]:
chain.run("Who played in Top Gun?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})
RETURN a.name
Full Context:
[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}]

> Finished chain.


'Tom Cruise and Val Kilmer played in Top Gun.'

In [39]:
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_intermediate_steps=True
)

In [40]:
result = chain("Who played in Top Gun?")
print(f"Intermediate steps: {result['intermediate_steps']}")
print(f"Final answer: {result['result']}")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})
RETURN a.name
Full Context:
[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]

> Finished chain.
Intermediate steps: [{'query': "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\nRETURN a.name"}, {'context': [{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]}]
Final answer: Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.


In [41]:
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_direct=True
)

In [42]:
chain.run("Who played in Top Gun?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})
RETURN a.name

> Finished chain.


[{'a.name': 'Tom Cruise'},
 {'a.name': 'Val Kilmer'},
 {'a.name': 'Anthony Edwards'},
 {'a.name': 'Meg Ryan'}]
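
The options tried so far (`top_k`, `return_intermediate_steps`, `return_direct`) can be summarised with a simplified model of the chain's control flow. This is a sketch inferred from the outputs above, not LangChain's actual implementation. `run_chain`, `generate_cypher`, `run_query`, and `answer` are hypothetical stand-ins for the chain, the LLM calls, and the database call, and the default `top_k=10` is an assumption.

```python
def run_chain(question, generate_cypher, run_query, answer,
              top_k=10, return_direct=False, return_intermediate_steps=False):
    """Simplified model: question -> Cypher -> rows -> natural-language answer."""
    cypher = generate_cypher(question)      # step 1: an LLM writes the Cypher query
    context = run_query(cypher)[:top_k]     # step 2: run it, keep at most top_k rows
    if return_direct:
        final = context                     # skip the answer step, return raw rows
    else:
        final = answer(question, context)   # step 3: an LLM phrases the answer
    result = {"result": final}
    if return_intermediate_steps:
        result["intermediate_steps"] = [{"query": cypher}, {"context": context}]
    return result


# Stubs replacing the LLM and the database, so the sketch runs offline.
rows = [{"a.name": n} for n in ["Tom Cruise", "Val Kilmer", "Anthony Edwards", "Meg Ryan"]]
stub_generate = lambda q: "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'}) RETURN a.name"
stub_run = lambda cypher: rows
stub_answer = lambda q, ctx: ", ".join(r["a.name"] for r in ctx)

limited = run_chain("Who played in Top Gun?", stub_generate, stub_run, stub_answer, top_k=2)
direct = run_chain("Who played in Top Gun?", stub_generate, stub_run, stub_answer, return_direct=True)
```

In this model `top_k` truncates the rows before the answer step, which matches the two-actor answer seen with `top_k=2`.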

In [43]:
from langchain.prompts.prompt import PromptTemplate


CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.
Examples: Here are a few examples of generated Cypher statements for particular questions:
# How many people played in Top Gun?
MATCH (m:Movie {{name:"Top Gun"}})<-[:ACTED_IN]-()
RETURN count(*) AS numberOfActors

The question is:
{question}"""

CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)

chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, cypher_prompt=CYPHER_GENERATION_PROMPT
)

In [44]:
chain.run("How many people played in Top Gun?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Movie {name:"Top Gun"})<-[:ACTED_IN]-(:Actor)
RETURN count(*) AS numberOfActors
Full Context:
[{'numberOfActors': 4}]

> Finished chain.


'Four people played in Top Gun.'
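
The doubled braces in the example Cypher inside `CYPHER_GENERATION_TEMPLATE` are escapes: the template is filled in with Python-style format substitution, so a literal `{` or `}` has to be written as `{{` or `}}`, while single braces mark placeholders such as `{schema}` and `{question}`. A minimal sketch of the same escaping with plain `str.format`; the `{movie}` placeholder is made up for illustration.

```python
# Double braces survive formatting as literal braces; {movie} is substituted.
template = 'MATCH (m:Movie {{name: "{movie}"}}) RETURN m'
rendered = template.format(movie="Top Gun")
print(rendered)  # MATCH (m:Movie {name: "Top Gun"}) RETURN m
```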

In [45]:
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
    qa_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    verbose=True,
)

In [46]:
chain.run("Who played in Top Gun?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})
RETURN a.name
Full Context:
[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]

> Finished chain.


'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'