# Reset the Knowledge Graph

## Imports

### Import Python packages

To start, load the Python packages this notebook needs, including several from LangChain.
The "shared" notebook also sets up global constants such as the connection to Neo4j,
the embedding model, and the LLM used for chat.


In [2]:
%run 'shared.ipynb'

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv
Connecting to Neo4j at bolt://neo4j-1:7687 as neo4j
Using data from /home/jovyan/data/single
Embedding with ollama using mxbai-embed-large
Chatting with ollama using llama3
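
The shared notebook itself is not shown here. As a rough sketch of how such connection constants could be read from the environment using only the standard library: the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` appear later in this notebook, while the helper `load_neo4j_settings` and its fallback values are assumptions made for illustration.

```python
import os


def load_neo4j_settings(env=None):
    """Collect Neo4j connection settings from environment variables.

    Hypothetical helper: the defaults below are placeholders,
    not values taken from the shared notebook.
    """
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }


settings = load_neo4j_settings()
print(f"Connecting to Neo4j at {settings['uri']} as {settings['username']}")
```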


## Set up Neo4j

### Prepare a GraphDatabase driver

You can use a driver created through the Neo4j `GraphDatabase` interface to send queries to the graph database.

In [3]:
# expect `gdb` to be defined in shared notebook
# gdb = GraphDatabase.driver(uri=NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

result = gdb.execute_query("RETURN 'Hello, World!' AS message")

result.records[0].get('message')

'Hello, World!'

## Clean up the graph to remove any existing data and indexes

Use these queries to reset the current graph to a blank state,
with no indexes, constraints, or data.

In [4]:
# Quote each name with backticks so names containing special characters are handled.
for constraint in gdb.execute_query('SHOW CONSTRAINTS').records:
    print(f"Removing constraint {constraint['name']}:")
    gdb.execute_query(f"DROP CONSTRAINT `{constraint['name']}`")

Removing constraint unique_chunk:


In [5]:
for index in gdb.execute_query('SHOW INDEXES').records:
    print(f"Removing index {index['name']}:")
    gdb.execute_query(f"""
        DROP INDEX `{index['name']}`
    """)

Removing index index_2bc8b8e7:
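
Constraint and index names are interpolated straight into Cypher text, so a name containing a backtick or other special character could break the statement. A minimal sketch of a quoting helper follows, assuming the usual Cypher rule that a backtick inside a quoted identifier is escaped by doubling it. The function names `quote_identifier` and `drop_statements` are hypothetical, and plain dictionaries stand in for the driver's records.

```python
def quote_identifier(name):
    """Wrap a Cypher identifier in backticks, doubling any backticks it contains."""
    return "`" + name.replace("`", "``") + "`"


def drop_statements(kind, records):
    """Build one DROP statement per record; `kind` is 'CONSTRAINT' or 'INDEX'."""
    return [
        f"DROP {kind} {quote_identifier(record['name'])} IF EXISTS"
        for record in records
    ]


# Dictionaries stand in for the records returned by SHOW CONSTRAINTS / SHOW INDEXES.
for statement in drop_statements("INDEX", [{"name": "index_2bc8b8e7"}]):
    print(statement)
```

Each generated string could then be passed to `gdb.execute_query`. The `IF EXISTS` clause makes the statement a no-op when the item has already been removed.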


In [6]:
# Remove all data by matching any node, then "detach deleting" it,
# which means removing the node and all its relationships.
gdb.execute_query("""
        MATCH (all)
        DETACH DELETE all
""")

EagerResult(records=[], summary=<neo4j._work.summary.ResultSummary object at 0xffff42774890>, keys=[])
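
A single `MATCH ... DETACH DELETE` runs as one transaction, which can be heavy on a large graph. One possible alternative, sketched here under the assumption that `gdb` is the driver from the shared notebook, is to delete in batches until nothing is left. The helper name `delete_all_in_batches` and the default batch size are illustrative choices, not part of the original notebook.

```python
def delete_all_in_batches(gdb, batch_size=10_000):
    """Delete every node (and its relationships) in batches; return the total deleted."""
    total = 0
    while True:
        result = gdb.execute_query(
            """
            MATCH (n)
            WITH n LIMIT $batch_size
            DETACH DELETE n
            RETURN count(*) AS deleted
            """,
            batch_size=batch_size,  # passed to the query as the $batch_size parameter
        )
        deleted = result.records[0]["deleted"]
        total += deleted
        if deleted == 0:
            # Nothing matched in this round, so the graph holds no more nodes.
            return total
```

Calling `delete_all_in_batches(gdb)` would then replace the single query above.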

## Basic Cypher Queries

In [None]:
# `node` and `score` are supplied by the vector index search that
# Neo4jVector runs before this retrieval query.
retrieval_query_window = """
MATCH window=
    (:Chunk)-[:NEXT*0..1]->(node)-[:NEXT*0..1]->(:Chunk)
WITH node, score, window as longestWindow
  ORDER BY length(window) DESC LIMIT 1
WITH nodes(longestWindow) as chunkList, node, score
  UNWIND chunkList as chunkRows
WITH collect(chunkRows.text) as textList, node, score
RETURN apoc.text.join(textList, " \n ") as text,
    score,
    node {.source} AS metadata
"""

vector_store_window = Neo4jVector.from_existing_index(
    embedding=embeddings_api,
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
    database="neo4j",
    index_name=VECTOR_INDEX_NAME,
    text_node_property=VECTOR_SOURCE_PROPERTY,
    retrieval_query=retrieval_query_window
)

# Create a retriever from the vector store
retriever_window = vector_store_window.as_retriever()

# Create a chatbot Question & Answer chain from the retriever
chain_window = prettifyChain(RetrievalQAWithSourcesChain.from_chain_type(
    chat_api, 
    chain_type="stuff", 
    retriever=retriever_window
))
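
As a plain-Python analogy for what the window query does for a single matched chunk: it takes the chunk, up to one neighbor before it, and up to one neighbor after it along `NEXT`, then joins their text. The sketch below assumes the chunks sit in a list ordered the same way as the `NEXT` relationships. The function `window_text` is hypothetical and does not touch the database.

```python
def window_text(chunks, position):
    """Join the chunk at `position` with its immediate neighbors, if any."""
    start = max(position - 1, 0)           # at most one chunk before
    neighbors = chunks[start:position + 2]  # the chunk itself plus at most one after
    return " \n ".join(neighbors)


chunks = ["first chunk", "second chunk", "third chunk", "fourth chunk"]
print(window_text(chunks, 1))
```

At either end of the list the window simply shrinks, which mirrors the `*0..1` bounds on each side of the Cypher pattern.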

### Script: a helpful way to show the schema

In [31]:
def show_schema(gdb):
    """Print the node and relationship property schema of the graph."""
    nodes = gdb.execute_query("CALL db.schema.nodeTypeProperties()")
    print(nodes)
    relationships = gdb.execute_query("CALL db.schema.relTypeProperties()")
    print(relationships)

show_schema(gdb)

EagerResult(records=[], summary=<neo4j._work.summary.ResultSummary object at 0xffff594bd710>, keys=['nodeType', 'nodeLabels', 'propertyName', 'propertyTypes', 'mandatory'])
EagerResult(records=[], summary=<neo4j._work.summary.ResultSummary object at 0xffff594be390>, keys=['relType', 'propertyName', 'propertyTypes', 'mandatory'])
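
Printing the raw `EagerResult` is hard to read once the graph has data. A small sketch of a more compact view is shown below: it groups property names by node type, using the `nodeType` and `propertyName` keys visible in the output above. The helper `summarize_node_properties` is an illustrative addition, and it assumes each record can be indexed by key, as the driver's records can.

```python
from collections import defaultdict


def summarize_node_properties(records):
    """Group property names by node type from db.schema.nodeTypeProperties() rows."""
    summary = defaultdict(list)
    for record in records:
        properties = summary[record["nodeType"]]  # registers the node type even without properties
        if record["propertyName"] is not None:
            properties.append(record["propertyName"])
    return dict(summary)
```

It could be called as `summarize_node_properties(gdb.execute_query("CALL db.schema.nodeTypeProperties()").records)`. On the freshly reset graph it returns an empty dictionary.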


In [70]:
# Check the vector indexes in the graph
gdb.execute_query('SHOW VECTOR INDEXES').records

[<Record id=12 name='chunks_vector' state='ONLINE' populationPercent=100.0 type='VECTOR' entityType='NODE' labelsOrTypes=['Chunk'] properties=['embedding'] indexProvider='vector-2.0' owningConstraint=None lastRead=neo4j.time.DateTime(2024, 6, 2, 12, 39, 49, 442000000, tzinfo=<UTC>) readCount=17>]
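
Before querying a vector index it can be useful to confirm it has finished populating. The sketch below filters the records by the `state` and `populationPercent` fields shown in the output above. The helper name `indexes_not_ready` is hypothetical, and dictionaries stand in for the driver's records.

```python
def indexes_not_ready(records):
    """Return the names of indexes that are not ONLINE or not fully populated."""
    return [
        record["name"]
        for record in records
        if record["state"] != "ONLINE" or record["populationPercent"] < 100.0
    ]


# A dictionary stands in for a record returned by SHOW VECTOR INDEXES.
sample = [{"name": "chunks_vector", "state": "ONLINE", "populationPercent": 100.0}]
print(indexes_not_ready(sample))
```

An empty list means every listed index is ready to serve queries.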