# Chatbot
In this notebook, we'll build a chatbot. The chatbot interface uses Gradio. Underlying the chatbot is the Neo4j knowledge graph we built in previous labs. The chatbot uses generative AI and LangChain. Here is the flow:
1. User asks a question in natural language.
2. Generative AI converts it to Neo4j Cypher.
3. The Cypher query retrieves deeper contextual facts from the Neo4j knowledge graph.
4. Generative AI converts the database response to natural language.
5. Response is presented to the user.
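
The flow above can be sketched in plain Python. This is only an illustration of how the steps chain together, not the LangChain implementation used later in this notebook: `answer_question` and the three `fake_*` functions are hypothetical stand-ins for the LLM and database calls, so the sketch runs without any external service.

```python
from typing import Callable


def answer_question(
    question: str,
    to_cypher: Callable[[str], str],
    run_query: Callable[[str], list],
    to_text: Callable[[str, list], str],
) -> str:
    """Run steps 2-4 of the flow and return the text shown to the user (step 5)."""
    cypher = to_cypher(question)    # step 2: natural language -> Cypher
    rows = run_query(cypher)        # step 3: query the knowledge graph
    return to_text(question, rows)  # step 4: database rows -> natural language


# Hypothetical stand-ins so the sketch runs without an LLM or a database.
def fake_to_cypher(question: str) -> str:
    return "MATCH (m:Manager) RETURN m.managerName AS manager LIMIT 1"


def fake_run_query(cypher: str) -> list:
    return [{"manager": "Example Manager"}]


def fake_to_text(question: str, rows: list) -> str:
    names = ", ".join(row["manager"] for row in rows)
    return f"Managers found: {names}"


flow_answer = answer_question(
    "Which managers are in the graph?", fake_to_cypher, fake_run_query, fake_to_text
)
print(flow_answer)
```

In the rest of the notebook, the LLM on Bedrock plays the role of `to_cypher` and `to_text`, and the Neo4j graph plays the role of `run_query`.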

Let's begin by installing the packages needed for this lab.

In [None]:
# Install necessary packages 

# Core LangChain and related dependencies
%pip install -U 'langchain>=0.3.0,<0.4.0'  # Core LangChain (ensures latest compatible version)
%pip install -U 'langchain-neo4j'      # Neo4j integration for LangChain
%pip install -U 'langchain-aws'        # AWS Bedrock integration for LangChain
%pip install -U 'langchain-community>=0.3.20' # Langchain community packages.
%pip install -U 'langsmith>=0.1.125'     # Langsmith package.
%pip install -U 'openai'                   # langchain has a dependency on openai
%pip install -U 'tenacity>=9.0'          # Retry mechanism for robustness
%pip install -U 'anyio>=4.4.0'           # Asynchronous I/O support

# Data processing and UI
%pip install 'pandas<2.0.0'          # Specific Pandas version for compatibility (quoted so the shell does not treat < as a redirect)
%pip install gradio                   # Gradio for building web interfaces

# Utility packages
%pip install python-dotenv            # For managing environment variables

Now restart the kernel. That will allow the Python environment to import the new packages.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

## Base Example Without Grounding
Before grounding with Neo4j, let's set up a baseline that just uses an LLM to answer questions.

In [None]:
SERVICE_NAME = 'bedrock-runtime'
REGION_NAME = 'us-east-1'

import boto3
bedrock = boto3.client(
 service_name=SERVICE_NAME,
 region_name=REGION_NAME,
 endpoint_url=f'https://{SERVICE_NAME}.{REGION_NAME}.amazonaws.com'
)

In [None]:
from langchain_aws import BedrockLLM
base_chain = BedrockLLM(
                model_id="anthropic.claude-v2",
                client=bedrock,
                model_kwargs = {
                    "temperature":0,
                    "top_k":1, "top_p":0.1,
                    "anthropic_version":"bedrock-2023-05-31",
                    "max_tokens_to_sample": 2048
                }
            )

We can now ask a simple finance question.

In [None]:
base_response = base_chain.invoke("""What are the top 10 investments for Blackrock?""")
print(f"Final answer: {base_response}")

While this answer looks reasonable, we have no real way to know how the LLM came up with it, or where it was sourced from.

Here is a more complicated example where we expect the LLM to understand some more specific terminology.

In [None]:
base_response = base_chain.invoke("""Which managers own FAANG stocks?""")
print(f"Final answer: {base_response}")

In this case, it looks like the LLM understands the ubiquitous acronym FAANG but, unsurprisingly, the results indicate it doesn't understand manager within the context of our data model.  In your use case, you may have lots of specific terminology/ontology like this that you would need a chatbot to understand.

## Grounding LLMs with Neo4j
Now let's create a chatbot that is grounded with Neo4j. Below is the pattern we will follow with LangChain:

![](images/langchain.png)

## Cypher Generation
We have to use a prompt template that: 
1. Clearly states what schema to use 
2. Provides principles the chatbot should follow in generating responses
3. Demonstrates few-shot examples to help the chatbot be more accurate in its query generation.

In [None]:
CYPHER_GENERATION_TEMPLATE = """You are an expert Neo4j Cypher translator who understands the question in English and converts it to Cypher strictly based on the Neo4j Schema provided and following the instructions below:
<instructions>
* Use aliases to refer the node or relationship in the generated Cypher query
* Generate Cypher query compatible ONLY for Neo4j Version 5
* Do not use EXISTS, SIZE keywords in the cypher. Use alias when using the WITH keyword
* Use only Nodes and relationships mentioned in the schema
* Always enclose the Cypher output inside 3 backticks (```)
* Always do a case-insensitive and fuzzy search for any properties related search. Eg: to search for a Company name use `toLower(c.name) contains 'neo4j'`
* Use the relationship variable `o` to access the `shares` and `value` properties of the `OWNS` relationship when calculating the sums.
* Cypher is NOT SQL. So, do not mix and match the syntaxes
* Use the elementId() function instead of id() to compare node identifiers
</instructions>

Strictly use this Schema for Cypher generation:
<schema>
{schema}
</schema>

The samples below follow the instructions and the schema mentioned above. So, please follow the same when you generate the cypher:
<samples>
Human: Which fund manager owns most shares? What is the total portfolio value?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) RETURN m.managerName as manager, sum(distinct o.shares) as ownedShares, sum(o.value) as portfolioValue ORDER BY ownedShares DESC LIMIT 10```

Human: Which fund manager owns most companies? How many shares?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) RETURN m.managerName as manager, count(distinct c) as ownedCompanies, sum(distinct o.shares) as ownedShares ORDER BY ownedCompanies DESC LIMIT 10```

Human: What are the top 10 investments for Vanguard?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE toLower(m.managerName) contains "vanguard" RETURN c.companyName as Investment, sum(DISTINCT o.shares) as totalShares, sum(DISTINCT o.value) as investmentValue order by investmentValue desc limit 10```

Human: What other fund managers are investing in same companies as Vanguard?
Assistant: ```MATCH (m1:Manager) -[o1:OWNS]-> (c1:Company) <-[o2:OWNS]- (m2:Manager) WHERE toLower(m1.managerName) contains "vanguard" AND elementId(m1) <> elementId(m2) RETURN m2.managerName as manager, sum(DISTINCT o2.shares) as investedShares, sum(DISTINCT o2.value) as investmentValue ORDER BY investmentValue DESC LIMIT 10```

Human: What are the top 10 investments for rempart?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE toLower(m.managerName) contains "rempart" RETURN c.companyName as Investment, sum(DISTINCT o.shares) as totalShares, sum(DISTINCT o.value) as investmentValue order by investmentValue desc limit 10```

Human: What are the top investors for Apple?
Assistant: ```MATCH (m1:Manager) -[o:OWNS]-> (c1:Company) WHERE toLower(c1.companyName) contains "apple" RETURN distinct m1.managerName as manager, sum(o.value) as totalInvested ORDER BY totalInvested DESC LIMIT 10```

Human: What are the other top investments for fund managers investing in Apple?
Assistant: ```MATCH (c1:Company) <-[o1:OWNS]- (m1:Manager) -[o2:OWNS]-> (c2:Company) WHERE toLower(c1.companyName) contains "apple" AND elementId(c1) <> elementId(c2) RETURN DISTINCT c2.companyName as company, sum(o2.value) as totalInvested, sum(o2.shares) as totalShares ORDER BY totalInvested DESC LIMIT 10```

Human: What are the top investors in the last 3 months?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE date() > o.reportCalendarOrQuarter > date() - duration({{months:3}}) RETURN distinct m.managerName as manager, sum(o.value) as totalInvested, sum(o.shares) as totalShares ORDER BY totalInvested DESC LIMIT 10```

Human: What are top investments in last 6 months for Vanguard?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE toLower(m.managerName) contains "vanguard" AND date() > o.reportCalendarOrQuarter > date() - duration({{months:6}}) RETURN distinct c.companyName as company, sum(o.value) as totalInvested, sum(o.shares) as totalShares ORDER BY totalInvested DESC LIMIT 10```

Human: Who are Apple's top investors in last 3 months?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE toLower(c.companyName) contains "apple" AND date() > o.reportCalendarOrQuarter > date() - duration({{months:3}}) RETURN distinct m.managerName as investor, sum(o.value) as totalInvested, sum(o.shares) as totalShares ORDER BY totalInvested DESC LIMIT 10```

Human: Which fund manager under 200 million has similar investment strategy as Vanguard?
Assistant: ```MATCH (m1:Manager) -[o1:OWNS]-> (:Company) <-[o2:OWNS]- (m2:Manager) WHERE toLower(m1.managerName) CONTAINS "vanguard" AND elementId(m1) <> elementId(m2) WITH distinct m2 AS m2, sum(distinct o2.value) AS totalVal WHERE totalVal < 200000000 RETURN m2.managerName AS manager, totalVal*0.000001 AS totalVal ORDER BY totalVal DESC LIMIT 10```

Human: Who are common investors in Apple and Amazon?
Assistant: ```MATCH (c1:Company) <-[o1:OWNS]- (m:Manager) -[o2:OWNS]-> (c2:Company) WHERE toLower(c1.companyName) contains "apple" AND toLower(c2.companyName) CONTAINS "amazon" RETURN DISTINCT m.managerName LIMIT 50```

Human: What are Vanguard's top investments by shares for 2023?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE toLower(m.managerName) CONTAINS "vanguard" AND date({{year:2023}}) = date.truncate('year',o.reportCalendarOrQuarter) RETURN c.companyName AS investment, sum(o.shares) AS totalShares ORDER BY totalShares DESC LIMIT 10```

H: What are Vanguard's top investments by value for 2023?
Assistant: ```MATCH (m:Manager) -[o:OWNS]-> (c:Company) WHERE toLower(m.managerName) CONTAINS "vanguard" AND date({{year:2023}}) = date.truncate('year',o.reportCalendarOrQuarter) RETURN c.companyName AS investment, sum(o.value) AS totalValue ORDER BY totalValue DESC LIMIT 10```
</samples>

Human: {question}
Assistant: 
"""

Now let’s create a LangChain prompt template.  

This template defines the parameter inputs for the prompt sent to the Cypher generation bot.  In our example, the inputs will be schema and question.  The question comes from the end user.  The LangChain GraphCypherQAChain automatically inserts the schema via a built-in method to Neo4jGraph.
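
One detail of the template is worth a quick illustration: the doubled braces in the samples, such as `duration({{months:3}})`. The sketch below uses Python's built-in `str.format` on a toy template to show why they are needed. It assumes LangChain's default template format follows the same brace-escaping rule. The schema string is a made-up placeholder, not the schema `Neo4jGraph` actually produces.

```python
# Toy template: {schema} and {question} are placeholders to fill in,
# while {{months:3}} must survive as the literal Cypher map {months:3}.
toy_template = (
    "Schema: {schema}\n"
    "Example: WHERE o.reportCalendarOrQuarter > date() - duration({{months:3}})\n"
    "Human: {question}"
)

rendered = toy_template.format(
    schema="(:Manager)-[:OWNS]->(:Company)",  # placeholder schema, for illustration only
    question="Who are the top investors in the last 3 months?",
)
print(rendered)
```

With single braces, `{months:3}` would be read as a placeholder named `months` and formatting would fail, so any literal Cypher map inside the few-shot samples needs the doubled form.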

In [None]:
from langchain.prompts.prompt import PromptTemplate

CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=['schema','question'], validate_template=True, template=CYPHER_GENERATION_TEMPLATE
)

Now we'll load the Aura credentials from the credential file we created in Lab 6.

In [None]:
from dotenv import load_dotenv
import os
dotenv_file = "../aura_connection.txt"
load_dotenv(dotenv_file)
NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
print('NEO4J_URI:', NEO4J_URI)
print('NEO4J_USERNAME:', NEO4J_USERNAME)
print('NEO4J_PASSWORD:', 'set' if NEO4J_PASSWORD else 'missing')  # avoid echoing the secret into the notebook output

We need to connect to the graph via LangChain.

In [None]:
from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(
    url=NEO4J_URI, 
    username=NEO4J_USERNAME, 
    password=NEO4J_PASSWORD
)

We define our `chain` object (specifically a `GraphCypherQAChain`) using the Anthropic Claude V2 LLM.

`GraphCypherQAChain` also takes a `Neo4jGraph` so it can handle the chatbot process end-to-end: taking the user question, translating it to Cypher, executing the query, getting the results, translating them back to natural language, and returning the answer to the user.
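
Because our prompt asks the LLM to enclose the Cypher in three backticks, something has to pull the query out of the model's raw text before it can be executed. The chain takes care of this for us. The sketch below is not the library's code; `extract_fenced_cypher` is a hypothetical helper that shows one way such a step could work.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable


def extract_fenced_cypher(llm_output: str) -> str:
    """Return the text inside the first backtick fence, or the stripped input if no fence is found."""
    pattern = re.escape(FENCE) + r"(?:cypher)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, llm_output, re.DOTALL)
    return match.group(1).strip() if match else llm_output.strip()


# Made-up model output in the format our prompt requests.
sample_output = (
    "Here is the query:\n"
    + FENCE
    + "MATCH (m:Manager) RETURN m.managerName AS manager LIMIT 5"
    + FENCE
)
extracted = extract_fenced_cypher(sample_output)
print(extracted)
```

Falling back to the stripped input when no fence is present is a design choice for this sketch: it lets an unfenced but otherwise valid query still reach the database.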


In [None]:
from langchain_neo4j import Neo4jGraph, GraphCypherQAChain
from langchain_aws import BedrockLLM
import json

llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    client=bedrock,
    model_kwargs={
        "temperature": 0,
        "top_k": 1,
        "top_p": 0.1,
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens_to_sample": 2048,
    },
)

chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,  # Use Neo4jGraph directly
    cypher_prompt=CYPHER_GENERATION_PROMPT,
    verbose=True,
    return_direct=True,
    allow_dangerous_requests=True,
)

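def chat(que):
    # With return_direct=True, the chain skips its own answer-generation step:
    # r['query'] holds the user's question and r['result'] the raw query rows.
    r = chain.invoke(que)
    print(r)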
    summary_prompt_tpl = f"""Human: 
    Fact: {json.dumps(r['result'])}

    * Summarise the above fact as if you are answering this question "{r['query']}"
    * When the fact is not empty, assume the question is valid and the answer is true
    * Do not return helpful or extra text or apologies
    * Just return summary to the user. DO NOT start with Here is a summary
    * List the results in rich text format if there are more than one results
    Assistant:
    """
    return llm.invoke(summary_prompt_tpl)

Below we have a few examples of how we can get answers from the chatbot.

## Why Ground Your LLM?
Recall our base example, where we asked for the top 10 Blackrock investments. We got an answer that looked reasonable, but we couldn't validate it or track its sources. We also asked which managers own FAANG stocks, and for that we unsurprisingly received the wrong answers for our use case.

Let's try again, this time grounding with Neo4j.

In [None]:
r2 = chat("""What are the top 10 investments for Vanguard?""")
print(f"Final answer: {r2}")

Notice that, unlike in our base example, this time we have the Cypher logic used to obtain the answer from our database. This means we can trace how the answer was produced and adjust our database or prompt if needed.

Now let's try the FAANG question.

In [None]:
r3 = chat("""Which managers own FAANG stocks?""")
print(f"Final answer: {r3}")

Here again, we notice the traceability with Cypher, and because we engineered our prompt to include our schema, it understood what “manager” means in the context of our use case.

## Why Ground Your LLM with Neo4j?
There are 3 primary reasons to ground your LLM with Neo4j specifically:
1. __Grounding for more complex question handling__: Multi-hop knowledge retrieval across connected data. Connections between data points are calculated before query time. 
2. __Enterprise reliability and security__: Fine-grained security so the chatbot only accesses information the user has permission to. Autonomous clustering for horizontal scaling.  Fully managed service in the cloud through Aura. 
3. __Performance__: Fast queries with high concurrency for many users.

We can explore point 1 with more complex questions below.

The next question requires roughly 4 hops (these would be joins in the relational world). Having a knowledge graph with relationships calculated before query time allows us to answer the question quickly.
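
To make the traversal idea concrete, here is a small pure-Python sketch that walks from a company to the managers that own it and then out to their other holdings. It only illustrates the shape of the question. The `owns` list is toy data with made-up names and values, `other_investments` is a hypothetical helper, and Neo4j executes such traversals very differently from this loop.

```python
from collections import defaultdict

# Toy OWNS relationships as (manager, company, value); all names and numbers are made up.
owns = [
    ("Manager A", "Company X", 100),
    ("Manager A", "Company Y", 300),
    ("Manager B", "Company X", 50),
    ("Manager B", "Company Z", 200),
    ("Manager C", "Company Y", 500),
]


def other_investments(target_company: str) -> list:
    """Companies held by managers who also hold target_company, ranked by total value."""
    # Step 1: walk from the target company back to the managers that own it.
    managers = {m for m, c, _ in owns if c == target_company}
    # Step 2: walk from those managers out to their other holdings and sum the values.
    totals = defaultdict(int)
    for m, c, value in owns:
        if m in managers and c != target_company:
            totals[c] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = other_investments("Company X")
print(ranked)
```

Manager C's holding in Company Y is excluded because Manager C does not own Company X, which is the kind of connected filtering the grounded chatbot expresses in a single Cypher pattern.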

In [None]:
r4 = chat("""What are other top investments for fund managers investing in Lowes?""")
print(f"Final answer: {r4}")

Here is another multi-hop question.

In [None]:
r5 = chat("""Please get me common investors between Tesla and Costco""")
print(f"Final answer: {r5}")

## Grounded Chatbot
Now we will use Gradio to deploy a chat interface with our chain behind it.

The code below deploys a Gradio application. You can access the app via a local URL. A publicly shareable URL is also provided (shareable for 3 days).

In [None]:
import gradio as gr
from langchain.memory import ConversationBufferWindowMemory, ChatMessageHistory
from langchain.schema import SystemMessage, HumanMessage, AIMessage

chat_history = ChatMessageHistory()
memory = ConversationBufferWindowMemory(
    chat_memory=chat_history,  # Correct integration
    k=5,  # Number of interactions to keep in memory.
    memory_key="chat_history",
    return_messages=True,
    output_key='output'
)

def chat_response(input_text, history):
    try:
        return chat(input_text)
    except Exception:  # a bare except would also swallow KeyboardInterrupt and SystemExit
        return "I'm sorry, there was an error retrieving the information you requested."

interface = gr.ChatInterface(
    fn=chat_response,
    title="Investment Chatbot",
    description="powered by Neo4j",
    theme="soft",
    chatbot=gr.Chatbot(height=500, type="messages"),
    examples=[
        "What are the top 10 investments for Vanguard?",
        "Which manager owns FAANG stocks?",
        "What are other top investments for fund managers investing in Exxon?",
        "What are Rempart's top investments by value for 2023?",
        "Who are the common investors between Tesla and Costco?",
    ],
    additional_inputs=None,
    type="messages"
)

interface.launch(share=True)

## Conclusion
In this notebook, we went through the steps of connecting a LangChain chain to a Neo4j database and using it to generate Cypher queries in response to user requests via LLMs on Bedrock, thus grounding the LLM with a knowledge graph.

While we used the Anthropic `claude v2` model here, this approach can be generalized to other Bedrock LLMs.  This process can also be augmented with additional steps around the generation chain to customize the experience for specific use cases.  

The critical takeaway is the importance of a Neo4j knowledge graph as a grounding database to:
* Anchor your chatbot to reality as it generates responses and 
* Enable your LLM to provide answers enriched with relevant enterprise data.