# Chatting with the Knowledge Graph



### Script - intro to chatting

In this notebook, you will be exploring the knowledge graph a bit more.

First, you will use Cypher queries to explore the graph directly.

Then you will use LangChain to create a question-and-answer chat.

Finally, you will use the LLM to combine those techniques.

## Imports

### Script

You will start in the usual way, by importing some libraries.

In [9]:
from dotenv import load_dotenv
import os

import textwrap

# Langchain
from langchain_community.graphs import Neo4jGraph
from langchain.prompts.prompt import PromptTemplate
from langchain.chains import GraphCypherQAChain

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.chat_models import ChatOllama

## Set up Neo4j and Langchain

### Script - define global variables

Then define the same global variables you've seen before.

In [10]:
# Load from environment
load_dotenv(override=True)
NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
NEO4J_DATABASE = os.getenv('NEO4J_DATABASE') or 'neo4j'

# Global constants
VECTOR_INDEX_NAME = 'form_10k_chunks'
VECTOR_NODE_LABEL = 'Chunk'
VECTOR_SOURCE_PROPERTY = 'text'
VECTOR_EMBEDDING_PROPERTY = 'textEmbedding'

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
EMBEDDING_API = os.getenv('EMBEDDING_API') or 'openai'
EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL') or 'text-embedding-ada-002'
CHAT_API = os.getenv('CHAT_API') or 'openai'
CHAT_MODEL = os.getenv('CHAT_MODEL') or 'gpt-3.5-turbo'

print(f"Connecting to Neo4j at {NEO4J_URI} as {NEO4J_USERNAME}")
print(f"Embedding with {EMBEDDING_API} using {EMBEDDING_MODEL}")
print(f"Chat with {CHAT_API} using {CHAT_MODEL}")


Connecting to Neo4j at neo4j://localhost:7687 as neo4j
Embedding with ollama using nomic-embed-text
Chat with openai using gpt-3.5-turbo-instruct
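
Every setting above comes from the environment, so a missing variable tends to surface only later, when the graph connection is created. A minimal sketch of an early check follows. `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names introduced for illustration; they are not part of the notebook or of LangChain.

```python
import os

# Hypothetical list of settings this notebook cannot run without.
REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Check the real environment and report anything that is absent.
missing = missing_settings(os.environ)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present.")
```

Passing the mapping in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to try with a plain dictionary.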


In [11]:
# Create a knowledge graph using Langchain's Neo4j integration.
# This will be used for direct querying of the knowledge graph. 
kg = Neo4jGraph(
    url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=NEO4J_DATABASE
)


## Example Cypher queries

In [6]:
kg.refresh_schema()

# for line in kg.schema.splitlines():
#     print(textwrap.fill(line, 100), "\n")

print(kg.schema)

Node properties are the following:
Chunk {textEmbedding: LIST, f10kItem: STRING, chunkSeqId: INTEGER, text: STRING, cik: STRING, cusip6: STRING, names: LIST, formId: STRING, source: STRING, chunkId: STRING},Form {cusip6: STRING, names: LIST, formId: STRING, source: STRING},Company {cusip: STRING, names: LIST, companyName: STRING, cusip6: STRING},Manager {managerName: STRING, managerCik: STRING, managerAddress: STRING}
Relationship properties are the following:
SECTION {f10kItem: STRING},OWNS_STOCK_IN {shares: INTEGER, reportCalendarOrQuarter: STRING, value: FLOAT}
The relationships are the following:
(:Chunk)-[:NEXT]->(:Chunk),(:Chunk)-[:PART_OF]->(:Form),(:Form)-[:SECTION]->(:Chunk),(:Company)-[:FILED]->(:Form),(:Manager)-[:OWNS_STOCK_IN]->(:Company)
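
The schema prints as a few very long lines. The commented-out lines in the cell above hint at wrapping it with `textwrap`. Here is a small sketch of that idea, using a hypothetical helper named `wrap_schema` and a shortened sample string instead of a live `kg.schema`:

```python
import textwrap


def wrap_schema(schema: str, width: int = 100) -> str:
    """Wrap each line of a schema string to `width` columns.

    Wrapped lines are separated by a blank line so that the original
    line boundaries stay visible.
    """
    wrapped = [textwrap.fill(line, width) for line in schema.splitlines()]
    return "\n\n".join(wrapped)


# Shortened sample for illustration; in the notebook you would pass kg.schema.
sample_schema = (
    "Node properties are the following:\n"
    "Manager {managerName: STRING, managerCik: STRING, managerAddress: STRING}"
)
print(wrap_schema(sample_schema, width=40))
```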


## Cypher - queries about addresses


In [7]:
# Tell me about a manager named royal bank
kg.query("""
  CALL db.index.fulltext.queryNodes(
         "fullTextManagerNames", 
         "royal bank") YIELD node, score
  RETURN node.managerName, score LIMIT 1
""")

[{'node.managerName': 'Royal Bank of Canada', 'score': 4.431276321411133}]

In [8]:
# What is the location of royal bank?
kg.query("""
CALL db.index.fulltext.queryNodes(
         "fullTextManagerNames", 
         "royal bank"
  ) YIELD node, score
WITH node as mgr LIMIT 1
MATCH (mgr:Manager)-[:LOCATED_AT]->(addr:Address)
RETURN mgr.managerName, addr
""")

[]

In [None]:
# Which state has the most investment firms?
kg.query("""
  MATCH p=(:Manager)-[:LOCATED_AT]->(address:Address)
  RETURN address.state as state, count(address.state) as numManagers
    ORDER BY numManagers DESC
    LIMIT 10
""")

In [None]:
# Which state has the most public companies listed?
kg.query("""
  MATCH p=(:Company)-[:LOCATED_AT]->(address:Address)
  RETURN address.state as state, count(address.state) as numCompanies
    ORDER BY numCompanies DESC
""")

In [None]:
# What are the cities in California with the most investment firms?
kg.query("""
  MATCH p=(:Manager)-[:LOCATED_AT]->(address:Address)
         WHERE address.state = 'California'
  RETURN address.city as city, count(address.city) as numManagers
    ORDER BY numManagers DESC
    LIMIT 10
""")

In [None]:
# Which city in California has the most companies listed?
kg.query("""
  MATCH p=(:Company)-[:LOCATED_AT]->(address:Address)
         WHERE address.state = 'California'
  RETURN address.city as city, count(address.city) as numCompanies
    ORDER BY numCompanies DESC
""")

In [None]:
# What are top investment firms in San Francisco?
kg.query("""
  MATCH p=(mgr:Manager)-[:LOCATED_AT]->(address:Address),
         (mgr)-[owns:OWNS_STOCK_IN]->(:Company)
         WHERE address.city = "San Francisco"
  RETURN mgr.managerName as managerName, sum(owns.value) as totalInvestmentValue
    ORDER BY totalInvestmentValue DESC
    LIMIT 10
""")

In [None]:
# What companies are in Santa Clara?
kg.query("""
  MATCH (com:Company)-[:LOCATED_AT]->(address:Address)
         WHERE address.city = "Santa Clara"
  RETURN com.companyName
""")

### Script - cypher queries with cartesian distance

So far you've been exploring the graph using explicit relationships.

You can also find things based on their location coordinates.

This is like doing vector search, but within a 2-dimensional space,

measuring the distance between two points (in meters here) rather than cosine similarity.

But the idea is the same.
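
To get a feel for the thresholds used in the next queries, such as `< 10000`, it can help to compute a distance by hand. The sketch below assumes the stored locations are longitude/latitude points and uses the haversine formula on a spherical Earth. It is an illustration of the idea, not a statement of how `point.distance` is implemented. `haversine_m` and `EARTH_RADIUS_M` are names introduced here, and the two coordinates are manager locations that appear in the query output later in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two manager locations from the notebook output (longitude, latitude).
mountain_view = (-122.0842031, 37.3862077)
los_altos = (-122.105859, 37.397132)

meters = haversine_m(*mountain_view, *los_altos)
print(f"{meters:.0f} m, which is {int(meters / 1000)} km after truncation")
```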

In [12]:
# What companies are near Santa Clara?
kg.query("""
  MATCH (address:Address)
    WHERE address.city = "Santa Clara"
  MATCH (com:Company)-[:LOCATED_AT]->(companyAddress:Address)
    WHERE point.distance(address.location, companyAddress.location) < 10000
  RETURN com.companyName, com.companyAddress
""")

[{'com.companyName': 'PALO ALTO NETWORKS INC',
  'com.companyAddress': '3000 Tannery Way, Santa Clara, CA 95054, USA'},
 {'com.companyName': 'GSI TECHNOLOGY INC',
  'com.companyAddress': '1213 Elko Dr, Sunnyvale, CA 94089, USA'},
 {'com.companyName': 'NETAPP INC',
  'com.companyAddress': 'Headquarters Dr, San Jose, CA 95134, USA'},
 {'com.companyName': 'WESTERN DIGITAL CORP.',
  'com.companyAddress': '615 National Ave # 100, Mountain View, CA 94043, USA'},
 {'com.companyName': 'SEAGATE TECHNOLOGY',
  'com.companyAddress': '2445 Augustine Dr, Santa Clara, CA 95054, USA'},
 {'com.companyName': 'APPLE INC', 'com.companyAddress': 'Cupertino, CA, USA'}]

In [None]:
# What investment firms are near Santa Clara?
kg.query("""
  MATCH (address:Address)
    WHERE address.city = "Santa Clara"
  MATCH (mgr:Manager)-[:LOCATED_AT]->(managerAddress:Address)
    WHERE point.distance(address.location, managerAddress.location) < 10000
  RETURN mgr.managerName, mgr.managerAddress
""")

In [None]:
# Which investment firms are near Palo Aalto Networks?
kg.query("""
  CALL db.index.fulltext.queryNodes(
         "fullTextCompanyNames", 
         "Palo Aalto Networks"
         ) YIELD node, score
  WITH node as com
  MATCH (com)-[:LOCATED_AT]->(comAddress:Address),
    (mgr:Manager)-[:LOCATED_AT]->(mgrAddress:Address)
    WHERE point.distance(comAddress.location, mgrAddress.location) < 20000
  RETURN mgr, 
    toInteger(point.distance(comAddress.location, mgrAddress.location) / 1000) as distanceKm
    ORDER BY distanceKm ASC
    LIMIT 10
""")

### Script - pause and try out variations

This is a good time to pause the video and try out some variations
of those Cypher queries.

Try different distances, company names and cities to see what you get.
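
One way to try variations without editing the query text each time is to pass the city and distance as Cypher parameters. This is a sketch under a few assumptions: `NEARBY_COMPANIES_QUERY`, `nearby_params`, `$city` and `$maxMeters` are names introduced here, and the final call relies on the `params` argument of `Neo4jGraph.query`. That call is left commented out because it needs the live database.

```python
# The same "companies near a city" query as above, with the literals
# replaced by the parameters $city and $maxMeters.
NEARBY_COMPANIES_QUERY = """
  MATCH (address:Address)
    WHERE address.city = $city
  MATCH (com:Company)-[:LOCATED_AT]->(companyAddress:Address)
    WHERE point.distance(address.location, companyAddress.location) < $maxMeters
  RETURN com.companyName, com.companyAddress
"""


def nearby_params(city: str, max_km: float) -> dict:
    """Build the parameter map, converting kilometers to meters."""
    return {"city": city, "maxMeters": max_km * 1000}


# Print the parameter maps for a few variations.
for city, max_km in [("Santa Clara", 10), ("Santa Clara", 25), ("San Francisco", 5)]:
    print(nearby_params(city, max_km))

# With a live connection you could then run, for example:
# kg.query(NEARBY_COMPANIES_QUERY, params=nearby_params("Santa Clara", 10))
```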

## Cypher - few-shot learning

Now, to try something new, you can use a few-shot learning technique.

Previously, you had been using vector search by itself, or in combination
with a Cypher query, to provide enough context to answer a user question.

Another approach is to teach the LLM about the knowledge graph,
then ask it to generate Cypher queries for you.

This is called few-shot learning. You give the LLM a few examples
of natural language questions and their corresponding Cypher queries.

Then you can ask it to generate a Cypher query for a new question.
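
The pairing of questions and queries can be sketched as plain string assembly. In this sketch, `EXAMPLES` and `format_examples` are hypothetical names, and the two pairs are taken from the queries you ran earlier. It only illustrates the layout of a few-shot block, with a `#` question line followed by its Cypher.

```python
# Each example pairs a natural-language question with the Cypher that answers it.
EXAMPLES = [
    (
        "What companies are in Santa Clara?",
        "MATCH (com:Company)-[:LOCATED_AT]->(comAddress:Address)\n"
        "    WHERE comAddress.city = 'Santa Clara'\n"
        "RETURN com.companyName",
    ),
    (
        "What investment firms are in San Francisco?",
        "MATCH (mgr:Manager)-[:LOCATED_AT]->(mgrAddress:Address)\n"
        "    WHERE mgrAddress.city = 'San Francisco'\n"
        "RETURN mgr.managerName",
    ),
]


def format_examples(examples) -> str:
    """Render (question, cypher) pairs as '# question' followed by the query."""
    return "\n\n".join(f"# {question}\n{cypher}" for question, cypher in examples)


print(format_examples(EXAMPLES))
```

Keeping the examples in a list makes it easy to add or remove a pair and see how the generated Cypher changes.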

### Script - cypher generation template

Let's look at the prompt you will use to teach the LLM about the knowledge graph.

In [13]:
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.
Examples: Here are a few examples of generated Cypher statements for particular questions:

# What are the top investment firms in San Francisco?
MATCH (mgr:Manager)-[:LOCATED_AT]->(mgrAddress:Address)
    WHERE mgrAddress.city = 'San Francisco'
RETURN mgr.managerName

# What companies are in Santa Clara?
MATCH (com:Company)-[:LOCATED_AT]->(comAddress:Address)
    WHERE comAddress.city = 'Santa Clara'
RETURN com.companyName

# What investment firms are near Santa Clara?
  MATCH (address:Address)
    WHERE address.city = "Santa Clara"
  MATCH (mgr:Manager)-[:LOCATED_AT]->(managerAddress:Address)
    WHERE point.distance(address.location, managerAddress.location) < 20 * 1000
  RETURN mgr.managerName, mgr.managerAddress

# Which investment firms are near Palo Aalto Networks?
  CALL db.index.fulltext.queryNodes(
         "fullTextCompanyNames", 
         "Palo Aalto Networks"
         ) YIELD node, score
  WITH node as com
  MATCH (com)-[:LOCATED_AT]->(comAddress:Address),
    (mgr:Manager)-[:LOCATED_AT]->(mgrAddress:Address)
    WHERE point.distance(comAddress.location, mgrAddress.location) < 20 * 1000
  RETURN mgr, 
    toInteger(point.distance(comAddress.location, mgrAddress.location) / 1000) as distanceKm
    ORDER BY distanceKm ASC
    LIMIT 10
  
The question is:
{question}"""

In [14]:
CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)

cypherChain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=kg,
    verbose=True,
    cypher_prompt=CYPHER_GENERATION_PROMPT,
)

def prettyCypherChain(question: str) -> None:
    response = cypherChain.run(question)
    print(textwrap.fill(response, 60))
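
Recent LangChain releases mark `Chain.run` as deprecated in favor of `invoke`. A hedged sketch of the same helper written against `invoke` follows. It assumes the chain takes its question under the `"query"` key and returns its answer under `"result"`, which is how `GraphCypherQAChain` is documented. `ask` and `EchoChain` are hypothetical names, and `EchoChain` is only a stand-in so the sketch runs without Neo4j or an LLM.

```python
import textwrap


def ask(chain, question: str, width: int = 60) -> str:
    """Send `question` to a chain via invoke(), print the wrapped answer, and return it."""
    response = chain.invoke({"query": question})
    answer = response["result"]
    print(textwrap.fill(answer, width))
    return answer


class EchoChain:
    """Hypothetical stand-in that mimics the chain's input and output keys."""

    def invoke(self, inputs: dict) -> dict:
        return {"query": inputs["query"], "result": f"(answer to: {inputs['query']})"}


ask(EchoChain(), "What companies are in Santa Clara?")
# In the notebook you would pass the real chain instead: ask(cypherChain, "...")
```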


In [15]:
prettyCypherChain("What investment firms are in San Francisco?")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (mgr:Manager)-[:LOCATED_AT]->(mgrAddress:Address)
    WHERE mgrAddress.city = 'San Francisco'
RETURN mgr.managerName
Full Context:
[{'mgr.managerName': 'PARNASSUS INVESTMENTS, LLC'}, {'mgr.managerName': 'SKBA CAPITAL MANAGEMENT LLC'}, {'mgr.managerName': 'ROSENBLUM SILVERMAN SUTTON S F INC /CA'}, {'mgr.managerName': 'CHARLES SCHWAB INVESTMENT MANAGEMENT INC'}, {'mgr.managerName': 'WELLS FARGO & COMPANY/MN'}, {'mgr.managerName': 'Dodge & Cox'}, {'mgr.managerName': 'Strait & Sound Wealth Management LLC'}, {'mgr.managerName': 'Sonoma Private Wealth LLC'}, {'mgr.managerName': 'Fund Management at Engine No. 1 LLC'}, {'mgr.managerName': 'SELDON CAPITAL LP'}]

> Finished chain.
PARNASSUS INVESTMENTS, LLC and Dodge & Cox are investment
firms located in San Francisco.


In [16]:
prettyCypherChain("What companies are in Santa Clara?")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (com:Company)-[:LOCATED_AT]->(comAddress:Address)
WHERE comAddress.city = 'Santa Clara'
RETURN com.companyName
Full Context:
[{'com.companyName': 'PALO ALTO NETWORKS INC'}, {'com.companyName': 'SEAGATE TECHNOLOGY'}]

> Finished chain.
The companies in Santa Clara are PALO ALTO NETWORKS INC and
SEAGATE TECHNOLOGY.


In [17]:
prettyCypherChain("What investment firms are near Santa Clara?")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (address:Address)
    WHERE address.city = "Santa Clara"
MATCH (mgr:Manager)-[:LOCATED_AT]->(managerAddress:Address)
    WHERE point.distance(address.location, managerAddress.location) < 20 * 1000
RETURN mgr.managerName, mgr.managerAddress
Full Context:
[{'mgr.managerName': 'Roberts Wealth Advisors, LLC', 'mgr.managerAddress': '855 EL CAMINO REAL, #311 BUILDING 5, PALO ALTO, CA, 94301'}, {'mgr.managerName': 'Adero Partners, LLC', 'mgr.managerAddress': '306 CAMBRIDGE AVENUE, PALO ALTO, CA, 94306'}, {'mgr.managerName': 'Wealthfront Advisers LLC', 'mgr.managerAddress': '261 Hamilton Ave, Palo Alto, CA, 94301'}, {'mgr.managerName': 'Redwood Grove Capital, LLC', 'mgr.managerAddress': '530 LYTTON AVENUE, 2ND FLOOR, Palo Alto, CA, 94301'}, {'mgr.managerName': 'LIGHT STREET CAPITAL MANAGEMENT, LLC', 'mgr.managerAddress': '505 Hamilton Avenue, Suite 110, Palo Alto, CA, 94301'}, {'mgr.manag

In [18]:
prettyCypherChain("Which investment firms are near Palo Aalto Networks?")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
CALL db.index.fulltext.queryNodes(
     "fullTextCompanyNames", 
     "Palo Aalto Networks"
     ) YIELD node, score
WITH node as com
MATCH (com)-[:LOCATED_AT]->(comAddress:Address),
  (mgr:Manager)-[:LOCATED_AT]->(mgrAddress:Address)
WHERE point.distance(comAddress.location, mgrAddress.location) < 20 * 1000
RETURN mgr, 
  toInteger(point.distance(comAddress.location, mgrAddress.location) / 1000) as distanceKm
  ORDER BY distanceKm ASC
  LIMIT 10
Full Context:
[{'mgr': {'managerCik': '1611518', 'managerAddress': '800 WEST EL CAMINO REAL SUITE 201, MOUNTAIN VIEW, CA, 94040', 'location': POINT(-122.0842031 37.3862077), 'managerName': 'Wealth Architects, LLC'}, 'distanceKm': 7}, {'mgr': {'managerCik': '1630365', 'managerAddress': '4984 EL CAMINO REAL, SUITE 101, LOS ALTOS, CA, 94022', 'location': POINT(-122.105859 37.397132), 'managerName': 'AIMZ Investment Advisors, LLC'}, 'distanceKm': 1