# Neo4j + LLM Conversational Agent Notebook

This notebook demonstrates a conversational agent that:
- Uses LangChain, OpenAI, and Neo4j to answer questions about company risk factors.
- Tracks each conversation session and every message in Neo4j.
- Logs which company and risk factor nodes were involved in each answer.
- Maintains message order: Each message in a session is linked to the next message via a `:NEXT` relationship, allowing you to traverse the conversation in order.

**Data Model:**
- `Session` nodes represent a user's chat session.
- `Message` nodes represent each question/answer pair.
- Relationships:
    - `(:Session)-[:HAS_MESSAGE]->(:Message)` links sessions to messages.
    - `(:Message)-[:NEXT]->(:Message)` links each message to the next in the session.
    - `(:Message)-[:INVOLVES_COMPANY]->(:Company)` and `(:Message)-[:INVOLVES_RISK]->(:RiskFactor)` track which nodes were referenced in the answer.
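Every write below starts with `MERGE (s:Session {id: ...})`, so a uniqueness constraint on `Session.id` gives that lookup an index and keeps a session from being created twice. This is a minimal sketch that assumes Neo4j 4.4 or later (`CREATE CONSTRAINT ... IF NOT EXISTS ... REQUIRE` syntax). `ensure_session_constraint` and the constraint name are hypothetical, and `graph` is the `Neo4jGraph` object created later. Creating the constraint fails if duplicate `Session` ids already exist.

```python
# Hypothetical one-time schema setup for the conversation log.
# Assumes Neo4j 4.4+ constraint syntax.
SESSION_CONSTRAINT = (
    "CREATE CONSTRAINT session_id_unique IF NOT EXISTS "
    "FOR (s:Session) REQUIRE s.id IS UNIQUE"
)


def ensure_session_constraint(graph):
    """Create the Session.id uniqueness constraint; IF NOT EXISTS makes reruns harmless."""
    graph.query(SESSION_CONSTRAINT)
    return SESSION_CONSTRAINT
```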

---


In [None]:
# Environment and Dependency Setup
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool
from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_neo4j import Neo4jChatMessageHistory, Neo4jGraph
from uuid import uuid4
from pprint import pprint


## Configuration
Make sure your `.env` file contains your Neo4j and OpenAI credentials:

```
NEO4J_URI=neo4j+s://<your-instance>.databases.neo4j.io
NEO4J_USERNAME=your_username
NEO4J_PASSWORD=your_password
OPENAI_API_KEY=your_openai_key
```
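If one of these variables is missing, the problem otherwise shows up later as a connection or authentication error. A small check can run right after `load_dotenv()` and report the missing settings up front. `missing_env_vars` is a hypothetical helper and is not part of `python-dotenv`.

```python
import os

# Variable names taken from the .env example above
REQUIRED_ENV_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_ENV_VARS):
    """Return the required names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Usage after load_dotenv():
# missing = missing_env_vars()
# if missing:
#     raise RuntimeError(f"Missing settings in .env: {', '.join(missing)}")
```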


In [None]:
SESSION_ID = str(uuid4())

llm = ChatOpenAI(openai_api_key=os.getenv('OPENAI_API_KEY'))  # key loaded from .env by load_dotenv()
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URI'),
    username=os.getenv('NEO4J_USERNAME'),
    password=os.getenv('NEO4J_PASSWORD'),
)
print(f'Session ID: {SESSION_ID}')


## Agent Prompt and Tool Definition
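The cell below defines a system prompt that restricts answers to what is in the graph, and a tool that queries risk factors for a given company. Note that the agent is later built from the `hwchase17/react-chat` Hub prompt, so this `prompt` object is not actually passed to the agent as written.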


In [None]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert on SEC filings and company data. ALWAYS use the provided tools to answer questions. NEVER answer directly. If you don't know, say you don't know."),
    ("human", "{input}"),
])
last_query_nodes = {}  # node IDs from the most recent successful tool call, read by ask_agent

def company_query_tool(query: str) -> str:
    """
    Query Neo4j for risk factors associated with a company.
    Returns company name and a list of risk factor names and IDs.
    Also saves the involved node IDs for logging.
    """
    cypher = """
    MATCH (c:Company)-[:FACES_RISK]->(r:RiskFactor)
    WHERE toLower(c.name) CONTAINS toLower($query)
    RETURN elementId(c) AS company_id, c.name AS company_name, collect({id: elementId(r), text: r.name}) AS risks
    LIMIT 1
    """
    results = graph.query(cypher, params={"query": query})
    # Mutate the shared dict in place, so no `global` statement is needed.
    # Only the latest call's nodes are kept: a question that triggers several
    # tool calls logs the last company only.
    last_query_nodes.clear()
    if not results:
        return "No matching company or risk factors found."
    row = results[0]
    last_query_nodes.update({
        "company_id": row["company_id"],
        "risk_ids": [r["id"] for r in row["risks"]],
    })
    risk_lines = "\n".join(f"- {r['text']} (id: {r['id']})" for r in row["risks"])
    return f"Company: {row['company_name']} (id: {row['company_id']})\nRisk Factors:\n{risk_lines}"
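If you split the string formatting and the node-ID bookkeeping out of `company_query_tool`, you can test both without a database. This sketch mirrors the tool's logic and assumes the row shape returned by its Cypher query (`company_id`, `company_name`, and `risks` as a list of `{id, text}` maps). `format_risk_result` and `extract_node_ids` are hypothetical names.

```python
def format_risk_result(row):
    """Render one result row of the tool's Cypher query as the tool's reply text."""
    risk_lines = "\n".join(f"- {r['text']} (id: {r['id']})" for r in row["risks"])
    return (
        f"Company: {row['company_name']} (id: {row['company_id']})\n"
        f"Risk Factors:\n{risk_lines}"
    )


def extract_node_ids(row):
    """Collect the element IDs that are later logged to Neo4j for this answer."""
    return {
        "company_id": row["company_id"],
        "risk_ids": [r["id"] for r in row["risks"]],
    }
```

With these helpers, the success branch of the tool reduces to `last_query_nodes.update(extract_node_ids(results[0]))` followed by `return format_risk_result(results[0])`.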


## Logging: Store Session, Message, and Relationships
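Each question/answer pair is stored as a `Message` node and linked to its session. Messages are chained in order by the `:NEXT` relationship. `Neo4jChatMessageHistory`, used for agent memory below, also writes messages to Neo4j (by default under a `Session` node with the same `id`). Filter on `:HAS_MESSAGE` when you want only the messages logged here.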
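The ordering rule in the Cypher below can be mirrored in plain Python. The mirror shows why every message after the first gets exactly one incoming `:NEXT` edge. This sketch assumes a session is represented as a list of message dicts plus a list of `(from_index, to_index)` edge pairs. `append_message` is hypothetical.

```python
from datetime import datetime, timezone


def append_message(messages, next_edges, question, answer, timestamp=None):
    """Mirror the logging Cypher for one session held in plain Python lists.

    messages   -- list of dicts, one per (:Session)-[:HAS_MESSAGE]->(:Message)
    next_edges -- list of (from_index, to_index) pairs, one per :NEXT edge
    Returns the index of the new message.
    """
    # OPTIONAL MATCH (s)-[:HAS_MESSAGE]->(prev) ... ORDER BY prev.timestamp DESC LIMIT 1
    prev = max(range(len(messages)), key=lambda i: messages[i]["timestamp"], default=None)
    # CREATE (m:Message {...}) and MERGE (s)-[:HAS_MESSAGE]->(m)
    messages.append({
        "question": question,
        "answer": answer,
        "timestamp": timestamp if timestamp is not None else datetime.now(timezone.utc),
    })
    new = len(messages) - 1
    # FOREACH (... CASE WHEN prev IS NOT NULL ...) MERGE (prev)-[:NEXT]->(m)
    if prev is not None:
        next_edges.append((prev, new))
    return new
```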


In [None]:
def log_query_nodes_to_neo4j(session_id, question, answer, company_id, risk_ids):
    """
    Log the session, message, and involved nodes to Neo4j.
    Also chains messages in order via :NEXT relationship.
    """
    cypher = """
    MERGE (s:Session {id: $session_id})
    WITH s
    OPTIONAL MATCH (s)-[:HAS_MESSAGE]->(prev:Message)
    WITH s, prev
    ORDER BY prev.timestamp DESC
    LIMIT 1
    CREATE (m:Message {question: $question, answer: $answer, timestamp: datetime()})
    MERGE (s)-[:HAS_MESSAGE]->(m)
    FOREACH (_ IN CASE WHEN prev IS NOT NULL THEN [1] ELSE [] END |
        MERGE (prev)-[:NEXT]->(m)
    )
    WITH m
    MATCH (c) WHERE elementId(c) = $company_id
    MERGE (m)-[:INVOLVES_COMPANY]->(c)
    WITH m
    UNWIND $risk_ids AS rid
    MATCH (r) WHERE elementId(r) = rid
    MERGE (m)-[:INVOLVES_RISK]->(r)
    """
    graph.query(
        cypher,
        params={
            "session_id": session_id,
            "question": question,
            "answer": answer,
            "company_id": company_id,
            "risk_ids": risk_ids,
        }
    )


## Agent, Tool, and Memory Setup: Detailed Explanation

This section configures the conversational agent using LangChain, Neo4j, and OpenAI.  
Below is a breakdown of each component, its purpose, and links to official documentation or source code.

- **Tool.from_function**  
  Registers a custom Python function as a tool the agent can use.  
  [Docs: Custom Tools](https://python.langchain.com/docs/modules/agents/tools/custom_tools/)

- **tools**  
  A list of all tools available to the agent.  
  [Docs: Tools for Agents](https://python.langchain.com/docs/modules/agents/tools/)

- **hub.pull("hwchase17/react-chat")**  
  Loads the community `react-chat` prompt template for ReAct agents from the [LangChain Hub](https://python.langchain.com/docs/hub/).  
  [Prompt Example](https://smith.langchain.com/hub/hwchase17/react-chat)

- **create_react_agent**  
  Creates a ReAct-style agent that can use tools, reason, and answer questions.  
  [Docs: ReAct Agent](https://python.langchain.com/docs/modules/agents/agent_types/react/)

- **AgentExecutor**  
  Wraps the agent and tools into an executable interface.  
  [Docs: AgentExecutor](https://python.langchain.com/docs/modules/agents/agent_executor/)

- **Neo4jChatMessageHistory**  
  Provides persistent, session-based chat memory in Neo4j.  
  [Docs: Message History](https://python.langchain.com/docs/modules/memory/message_history/)

- **RunnableWithMessageHistory**  
  Wraps the agent executor so that each call loads the session's stored messages into the `chat_history` prompt variable and saves the new question and answer afterward.  
  [Docs: RunnableWithMessageHistory](https://python.langchain.com/docs/modules/memory/message_history/)

**Summary Table**

| Component                      | Purpose/Role                                         | Docs/Source                                                                                      |
|---------------------------------|-----------------------------------------------------|--------------------------------------------------------------------------------------------------|
| Tool.from_function              | Register function as a tool for the agent           | [Docs](https://python.langchain.com/docs/modules/agents/tools/custom_tools/)                     |
| tools                           | List of agent tools                                 | [Docs](https://python.langchain.com/docs/modules/agents/tools/)                                  |
| hub.pull                        | Load community prompt template                      | [Docs](https://python.langchain.com/docs/hub/) [Prompt](https://smith.langchain.com/hub/hwchase17/react-chat) |
| create_react_agent              | Create a ReAct agent                                | [Docs](https://python.langchain.com/docs/modules/agents/agent_types/react/)                      |
| AgentExecutor                   | Run agent and tools                                 | [Docs](https://python.langchain.com/docs/modules/agents/agent_executor/)                         |
| Neo4jChatMessageHistory         | Store/retrieve session chat history in Neo4j        | [Docs](https://python.langchain.com/docs/modules/memory/message_history/)                        |
| RunnableWithMessageHistory      | Add memory to agent for context-aware responses     | [Docs](https://python.langchain.com/docs/modules/memory/message_history/)                        |
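`RunnableWithMessageHistory` needs a callable that maps a session ID to a history object, and `get_memory` fills that role with `Neo4jChatMessageHistory`. The in-memory stand-in below shows that factory contract without a database. `InMemoryHistory` and `get_history` are hypothetical. They are not a drop-in replacement, because LangChain expects a `BaseChatMessageHistory` subclass. The `window` parameter loosely models the fact that `Neo4jChatMessageHistory` returns only a window of recent messages; check the default in your installed version.

```python
from typing import Dict, List, Tuple


class InMemoryHistory:
    """Hypothetical stand-in for a chat-history object (not a LangChain class)."""

    def __init__(self, window: int = 3):
        self.window = window  # how many recent question/answer exchanges to expose
        self._messages: List[Tuple[str, str]] = []

    def add_exchange(self, question: str, answer: str) -> None:
        # Store one turn as a (role, text) pair for each side
        self._messages.append(("human", question))
        self._messages.append(("ai", answer))

    @property
    def messages(self) -> List[Tuple[str, str]]:
        # Expose only the last `window` exchanges (two messages per exchange)
        if self.window <= 0:
            return []
        return self._messages[-2 * self.window:]


_histories: Dict[str, InMemoryHistory] = {}


def get_history(session_id: str) -> InMemoryHistory:
    """Session-keyed factory: the same session id always returns the same history."""
    if session_id not in _histories:
        _histories[session_id] = InMemoryHistory()
    return _histories[session_id]
```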


In [None]:
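# Wrap the Cypher lookup as a tool the ReAct agent can call
company_tool = Tool.from_function(
    name="Company Info",
    description="Look up the risk factors linked to a company in the graph. Input should be a company name, such as 'Apple'.",
    func=company_query_tool,
)
tools = [company_tool]

# ReAct chat prompt from LangChain Hub; it expects `input` and `chat_history` variables
agent_prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(llm, tools, agent_prompt)
# handle_parsing_errors=True sends malformed LLM output back to the agent instead of raising
agent_executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)

def get_memory(session_id):
    # One Neo4j-backed chat history per session id
    return Neo4jChatMessageHistory(session_id=session_id, graph=graph)

# Load and save history around each call, keyed by the session_id passed in the config
chat_agent = RunnableWithMessageHistory(
    agent_executor,
    get_memory,
    input_messages_key="input",
    history_messages_key="chat_history",
)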


## Conversation Handling and Logging

This section defines how user questions are sent to the agent, how responses are handled, and how the conversation is logged both in Python and in Neo4j.

### `conversation = []`
- This is a Python list that stores the full history of the current session in memory.
- Each entry is a dictionary with `"question"` and `"response"` keys, preserving the order of interaction.
- Useful for quickly reviewing the conversation in the notebook, independent of Neo4j.

### `def ask_agent(question):`
This function is the main interface for interacting with the agent. Here’s what happens step-by-step:

1. **Send the Question to the Agent**
    - Calls `chat_agent.invoke(...)` with the user’s question and the current session ID.
    - The wrapper loads the session's stored chat history into the prompt, so follow-up questions such as "What about Microsoft?" have context.

2. **Capture the Answer**
    - Extracts the agent’s answer from the response.

3. **Append to Local Conversation**
    - Adds a dictionary with the question and response to the `conversation` list.
    - This allows you to review the session in the notebook.

4. **Log to Neo4j**
    - If `last_query_nodes` is non-empty (i.e., the company tool found matching nodes), the function calls `log_query_nodes_to_neo4j` to:
        - Store the session, message, and relationships (company, risk factors) in Neo4j.
        - Chain messages in order via the `:NEXT` relationship.
    - Otherwise, nothing is logged for this question.

5. **Print and Return the Answer**
    - Prints the agent’s answer for immediate feedback in the notebook.
    - Returns the answer so it can be used in further processing if needed.
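You can express the five steps with the agent and the Neo4j logger passed in as plain callables, which makes the control flow testable without OpenAI or Neo4j. This is a sketch: `ask_agent_sketch` and its parameters are hypothetical stand-ins for `chat_agent.invoke(...)["output"]`, `log_query_nodes_to_neo4j`, and `last_query_nodes`. It clears the node IDs before each question, so an answer that did not use the tool is never logged against the previous question's nodes.

```python
from typing import Callable, Dict, List


def ask_agent_sketch(
    question: str,
    invoke: Callable[[str], str],
    log: Callable[[str, str, Dict], None],
    nodes: Dict,
    conversation: List[Dict],
) -> str:
    """Steps 1-5 above with the agent and logger injected.

    `nodes` is filled in by the tool while `invoke` runs.
    """
    nodes.clear()                      # drop node IDs left over from an earlier question
    answer = invoke(question)          # steps 1-2: ask the agent, capture the answer
    conversation.append({"question": question, "response": answer})  # step 3
    if nodes:                          # step 4: log only if the tool matched nodes
        log(question, answer, dict(nodes))
    print(answer)                      # step 5: show and return the answer
    return answer
```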

---

**Why is this design useful?**
- **Session persistence:** Every question/answer pair for which the tool matched nodes is stored as a `Message` node, so those exchanges, and their order, can be reconstructed at any time.
- **Node traceability:** Each message is linked to the specific company and risk factor nodes it referenced, enabling downstream graph analysis.
- **Notebook review:** The `conversation` list allows you to see the session history without querying Neo4j.
- **Auditability:** Each logged message carries a timestamp and links to the nodes it used, so answers can be reviewed after the fact.

**Example usage:**
```python
ask_agent("What are the risk factors for Apple?")
ask_agent("What about Microsoft?")
print(conversation)
```


In [None]:
conversation = []  # in-memory record: [{"question": ..., "response": ...}, ...]

def ask_agent(question):
    """
    Ask the agent a question. Logs the message and involved nodes in Neo4j.
    Maintains message order within the session.
    """
    # Forget nodes from the previous question so they are not logged against this one
    last_query_nodes.clear()
    response = chat_agent.invoke(
        {"input": question},
        {"configurable": {"session_id": SESSION_ID}},
    )
    answer = response["output"]
    conversation.append({"question": question, "response": answer})
    # Log only when the company tool matched nodes while answering this question
    if last_query_nodes:
        log_query_nodes_to_neo4j(
            SESSION_ID,
            question,
            answer,
            last_query_nodes.get("company_id"),
            last_query_nodes.get("risk_ids", []),
        )
    pprint(answer)
    return answer


---
**Usage:**
- Use `ask_agent("your question")` to interact with the agent.
- Each message for which the tool matched nodes is logged and chained in order within the session.
- You can query Neo4j to reconstruct the full, ordered conversation for any session.
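To reconstruct a session, start at the logged message that has no incoming `:NEXT` and follow the chain. This sketch assumes the schema written by `log_query_nodes_to_neo4j`. `ORDERED_CONVERSATION_QUERY` and `fetch_conversation` are hypothetical names. Ordering by `m.timestamp` would normally give the same result; walking `:NEXT` uses the chain itself.

```python
# Walks the :NEXT chain of messages attached via :HAS_MESSAGE.
# Pattern predicates in WHERE are accepted by Neo4j 4.x and 5.x.
ORDERED_CONVERSATION_QUERY = """
MATCH (s:Session {id: $session_id})-[:HAS_MESSAGE]->(first:Message)
WHERE NOT (:Message)-[:NEXT]->(first)
MATCH path = (first)-[:NEXT*0..]->(m:Message)
RETURN length(path) AS position, m.question AS question, m.answer AS answer
ORDER BY position
"""


def fetch_conversation(graph, session_id):
    """Return [(position, question, answer), ...] for one session, first message first."""
    rows = graph.query(ORDERED_CONVERSATION_QUERY, params={"session_id": session_id})
    return [(row["position"], row["question"], row["answer"]) for row in rows]


# Usage in this notebook:
# for position, question, answer in fetch_conversation(graph, SESSION_ID):
#     print(position, question)
```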


In [None]:
ask_agent("What are the risk factors associated with Apple?")

In [None]:
ask_agent("What are the risk factors associated with microsoft?")

In [None]:
ask_agent("Which risk factors are shared by Microsoft and Apple?")
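`company_query_tool` returns a single company per call (`LIMIT 1`), so to answer this last question the agent has to call it once per company and compare the two lists itself. To check its answer, you can compute the overlap directly in Cypher. This sketch assumes the same `Company`/`FACES_RISK`/`RiskFactor` schema the tool uses; `shared_risk_factors` is hypothetical. It only finds risk factors that are the *same* `RiskFactor` node for both companies. If your graph stores separate nodes with identical names, compare `r.name` instead.

```python
SHARED_RISKS_QUERY = """
MATCH (a:Company)-[:FACES_RISK]->(r:RiskFactor)<-[:FACES_RISK]-(b:Company)
WHERE toLower(a.name) CONTAINS toLower($first)
  AND toLower(b.name) CONTAINS toLower($second)
  AND a <> b
RETURN DISTINCT r.name AS risk_factor
ORDER BY risk_factor
"""


def shared_risk_factors(graph, first, second):
    """Names of risk factors linked to both companies (case-insensitive name match)."""
    rows = graph.query(SHARED_RISKS_QUERY, params={"first": first, "second": second})
    return [row["risk_factor"] for row in rows]


# Usage in this notebook:
# shared_risk_factors(graph, "Microsoft", "Apple")
```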

## Analyzing Conversation Messages in Neo4j

With every message logged in Neo4j (and chained by session), you can perform rich analyses, such as:

- **Conversation statistics:** How many messages per session? What is the average/median message length?
- **Most discussed companies:** Which companies are most frequently referenced?
- **Risk factor trends:** Which risk factors are most often discussed?
- **Session timelines:** How does conversation flow over time?
- **Message chains:** How are messages ordered and connected?

Below are some example analyses and queries you can use.

The following code cells use the Neo4j Python driver (via the LangChain `Neo4jGraph` object `graph`) to run Cypher queries and analyze the conversation/message data.

In [None]:
# Count the number of messages in each session
cypher = """
MATCH (s:Session)-[:HAS_MESSAGE]->(m:Message)
RETURN s.id AS session_id, count(m) AS message_count
ORDER BY message_count DESC
"""
results = graph.query(cypher)
for row in results:
    print(f"Session {row['session_id']} has {row['message_count']} messages")
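The session-timelines idea from the list above can start from the same pattern: aggregate each session's first and last message timestamps. This sketch assumes the `timestamp` property that `log_query_nodes_to_neo4j` sets with `datetime()`. `session_timelines` is hypothetical, and the span is computed in Cypher with `duration.inSeconds`.

```python
SESSION_TIMELINE_QUERY = """
MATCH (s:Session)-[:HAS_MESSAGE]->(m:Message)
WITH s, min(m.timestamp) AS started, max(m.timestamp) AS ended, count(m) AS message_count
RETURN s.id AS session_id, started, message_count,
       duration.inSeconds(started, ended).seconds AS span_seconds
ORDER BY started
"""


def session_timelines(graph):
    """Return (session_id, started, message_count, span_seconds) per session, oldest first."""
    rows = graph.query(SESSION_TIMELINE_QUERY)
    return [
        (row["session_id"], row["started"], row["message_count"], row["span_seconds"])
        for row in rows
    ]


# Usage in this notebook:
# for session_id, started, count, span in session_timelines(graph):
#     print(f"{session_id}: {count} messages over {span} s, started {started}")
```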

In [None]:
# Find the most mentioned companies across all messages
cypher = """
MATCH (m:Message)-[:INVOLVES_COMPANY]->(c:Company)
RETURN c.name AS company, count(m) AS mentions
ORDER BY mentions DESC
LIMIT 10
"""
results = graph.query(cypher)
print("Top 10 most mentioned companies:")
for row in results:
    print(f"{row['company']}: {row['mentions']} mentions")

In [None]:
# Find the most mentioned risk factors
cypher = """
MATCH (m:Message)-[:INVOLVES_RISK]->(r:RiskFactor)
RETURN r.name AS risk_factor, count(m) AS mentions
ORDER BY mentions DESC
LIMIT 10
"""
results = graph.query(cypher)
print("Top 10 most mentioned risk factors:")
for row in results:
    print(f"{row['risk_factor']}: {row['mentions']} mentions")
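The overview above also mentions average and median message length, which none of the cells compute. In this sketch, Cypher measures string lengths with `size()` and Python's `statistics` module aggregates them. `message_length_stats` is hypothetical, and lengths are in characters.

```python
import statistics

MESSAGE_LENGTH_QUERY = """
MATCH (:Session)-[:HAS_MESSAGE]->(m:Message)
RETURN size(m.question) AS question_length, size(m.answer) AS answer_length
"""


def message_length_stats(graph):
    """Mean and median question/answer length (characters) across logged messages."""
    rows = graph.query(MESSAGE_LENGTH_QUERY)
    if not rows:
        return {}
    stats = {}
    for key in ("question_length", "answer_length"):
        values = [row[key] for row in rows]
        stats[key] = {"mean": statistics.mean(values), "median": statistics.median(values)}
    return stats


# Usage in this notebook:
# pprint(message_length_stats(graph))
```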