# 📓 The GenAI Revolution Cookbook

**Title:** How to Build a Knowledge Graph Chatbot with Neo4j, Chainlit, GPT-4o

**Description:** Ship a Python knowledge graph chatbot using Neo4j, Chainlit, and GPT-4o to auto-generate Cypher, visualize results, and answer complex data questions accurately.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



Knowledge graphs excel at modeling complex relationships, but querying them usually requires Cypher, which creates a barrier for non\-technical users. This tutorial shows you how to build a production\-ready chatbot that translates natural language questions into Cypher queries, executes them against Neo4j, and returns both tabular results and interactive visualizations. You'll wire together GPT\-4\.1 for query generation, Neo4j for graph storage, CrewAI for agent orchestration, and Chainlit for a conversational UI. By the end, you'll have a working system that handles multi\-hop reasoning, enforces read\-only safety, and manages errors gracefully. It addresses common pain points: LLM hallucinations, brittle SQL generation, and UI plumbing. The build is fully runnable in a Colab notebook, with incremental validation at each step.

We will use the same graph that you built in a previous article. If you need a refresher on constructing the initial knowledge graph and embedding pipeline, you can revisit our [step\-by\-step guide to building a Knowledge Graph RAG system with Neo4j and embeddings](/article/how-to-build-a-knowledge-graph-rag-pipeline-with-neo4j-embeddings-2).

---

## Why This Approach Works

**Graphs over tables.** Relational databases struggle with multi\-hop queries like friends\-of\-friends or recommendation paths. Neo4j's native graph traversal makes those queries fast and expressive.

**GPT\-4\.1 for Cypher generation.** GPT\-4\.1 has strong language understanding and produces reliable structured output when given schema context and examples. This removes the need for hand\-coded query templates. It adapts well if your schema evolves.
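
A minimal sketch of what "schema context and examples" can look like if you assemble the prompt yourself. The function `build_cypher_prompt` and the question/Cypher example pairs are hypothetical illustrations; in this tutorial the equivalent instructions live in the CrewAI agent and task YAML files instead.

```python
from typing import List, Tuple

# Hypothetical few-shot pairs (question, Cypher) for the movie schema used later on
EXAMPLES: List[Tuple[str, str]] = [
    ("Which movies did Tom Hanks act in?",
     "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title"),
    ("How many movies were released each year?",
     "MATCH (m:Movie) RETURN m.year AS year, count(m) AS movies ORDER BY year"),
]


def build_cypher_prompt(question: str, schema: str,
                        examples: List[Tuple[str, str]] = EXAMPLES) -> str:
    """Assemble a prompt that gives the model the schema plus worked examples."""
    parts = [
        "You translate questions into read-only Cypher for Neo4j.",
        "Use only the labels, relationship types, and properties in this schema:",
        schema.strip(),
        "",
        "Examples:",
    ]
    for example_question, example_cypher in examples:
        parts.append(f"Question: {example_question}\nCypher: {example_cypher}")
    # The open-ended final slot is what the model completes
    parts.append(f"Question: {question}\nCypher:")
    return "\n".join(parts)


schema = "Nodes: Person(name, age), Movie(title, year)\nRelationships: (Person)-[:ACTED_IN]->(Movie)"
print(build_cypher_prompt("Who acted in movies released after 2010?", schema))
```

Restricting the model to the listed labels and properties is what keeps it from inventing relationship types that do not exist in the graph.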

**CrewAI for orchestration.** CrewAI offers a lightweight agent framework. It manages tool invocation and conversation flow without forcing you to build a custom orchestration layer. For a deeper dive into orchestrating multi\-agent systems, see our [tutorial on building multi\-agent AI systems with CrewAI and YAML](/article/how-to-build-multi-agent-ai-systems-with-crewai-and-yaml-2).

**Chainlit for UI.** Chainlit gives you a chat interface with minimal boilerplate. It supports streaming, tables, and charts out of the box. You focus on backend logic, not frontend plumbing.

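**Read\-only enforcement.** By allowing only read clauses such as MATCH, RETURN, and CALL, rejecting write keywords such as CREATE, MERGE, and DELETE, and connecting with a read\-only Neo4j role, you prevent accidental or malicious writes. The role is the real safeguard, because CALL can invoke procedures that write data and a text filter cannot tell those apart from read\-only ones. Together, these layers make the system much safer to expose to end users.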
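
The allow-list idea can be sketched in plain Python. `is_read_only_cypher` is a hypothetical helper, not code from this project: it blanks out string literals so a title such as 'Set It Up' is not mistaken for a SET clause, checks that the query starts with a read clause, and rejects whole-word write keywords. Treat it as a heuristic filter, not a security boundary.

```python
import re

# Clauses that modify data or schema, matched as whole words, case-insensitively
WRITE_CLAUSES = re.compile(
    r"\b(create|merge|delete|detach|set|remove|drop|foreach|load\s+csv)\b", re.IGNORECASE
)
# Single- or double-quoted string literals, allowing backslash escapes
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
# Clauses a read-only query may start with
ALLOWED_FIRST = ("match", "optional", "with", "unwind", "return", "call")


def is_read_only_cypher(query: str) -> bool:
    """Heuristic check: True if the query looks read-only."""
    # Replace literals with '' so their contents cannot trigger false positives
    stripped = STRING_LITERAL.sub("''", query)
    # Drop // line comments, then surrounding whitespace
    stripped = re.sub(r"//[^\n]*", "", stripped).strip()
    if not stripped:
        return False
    first_word = stripped.split(None, 1)[0].lower()
    if first_word not in ALLOWED_FIRST:
        return False
    return WRITE_CLAUSES.search(stripped) is None


print(is_read_only_cypher("MATCH (m:Movie {title: 'Set It Up'}) RETURN m"))  # True
print(is_read_only_cypher("MATCH (n) DETACH DELETE n"))  # False
```

A write procedure whose name avoids all of these keywords would still pass, which is why the read-only database role remains essential.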

---

## Setup \& Installation

Run this cell to install the dependencies. Only `requests` is pinned here; pin the other packages to exact versions if you need reproducible builds:

In [1]:
%pip install -q neomodel "crewai[tools]" openai chainlit python-dotenv plotly pandas tabulate requests==2.32.4

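Next, make sure all required environment variables are set before moving forward. This cell reads them from Colab's Secrets tab, writes them to a local `.env` file, and verifies that they load.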

In [2]:
import os
from dotenv import load_dotenv
from google.colab import userdata # Import userdata

required_keys = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY", "OPENAI_MODEL"]

env_file_path = '.env'

# Check for missing keys in userdata first
missing_in_userdata = [k for k in required_keys if not userdata.get(k)]
if missing_in_userdata:
    raise EnvironmentError(
        f"Missing required environment variables in Colab User Secrets (userdata): {', '.join(missing_in_userdata)}\n"
        "Please add them to Colab's 'Secrets' tab and re-run this cell."
    )

# Create or update .env file with values from userdata
print(f"Creating/updating {env_file_path} with values from Colab User Secrets.")
with open(env_file_path, 'w') as f:
    for key in required_keys:
        value = userdata.get(key)
        f.write(f"{key}={value}\n")

# Load environment variables from .env file
load_dotenv(dotenv_path=env_file_path)

# Final check to ensure all required variables are loaded into the environment
missing_after_load = [k for k in required_keys if not os.getenv(k)]
if missing_after_load:
    raise EnvironmentError(f"Failed to load required environment variables from {env_file_path}: {', '.join(missing_after_load)}")

print(f"All required environment variables found and loaded from {env_file_path}.")

Creating/updating .env with values from Colab User Secrets.
All required environment variables found and loaded from .env.


---

## Step\-by\-Step Implementation

### Step 1: Connect to Neo4j and Validate

Create the directory structure and write the Neo4j connection module. This module normalizes the URI, initializes the connection, and provides a test function:

In [3]:
!mkdir -p tools utils config

In [4]:
%%writefile tools/query_knowledge_graph.py
import os
import re
from typing import Any, Dict, List, Tuple, Optional
from neomodel import db, config as neo_config
from dotenv import load_dotenv
from urllib.parse import unquote, urlparse  # URL decoding and parsing
from neo4j import GraphDatabase  # official Neo4j Python driver

# Load environment variables from .env
load_dotenv()

required_keys = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]
missing = [k for k in required_keys if not os.getenv(k)]
if missing:
    raise EnvironmentError(
        f"Missing required environment variables: {', '.join(missing)}\n"
        "Please set them in .env before running the application."
    )

print("All required Neo4j environment variables found.")

def _normalize_neo4j_uri(uri: str) -> str:
    print(f"DEBUG: _normalize_neo4j_uri - Original URI: {uri}")
    # URL-decode first to handle encoded characters such as '%3A'
    decoded_uri = unquote(uri)
    print(f"DEBUG: _normalize_neo4j_uri - Decoded URI: {decoded_uri}")

    # Parse the URI components; fall back to the default Bolt port 7687
    parsed_uri = urlparse(decoded_uri)
    hostname = parsed_uri.hostname
    port = parsed_uri.port if parsed_uri.port else 7687

    # Build a bolt:// URI without credentials; auth and encryption are passed to the driver separately
    final_bolt_uri = f"bolt://{hostname}:{port}"
    print(f"DEBUG: _normalize_neo4j_uri - Final canonical Bolt URI without credentials: {final_bolt_uri}")
    return final_bolt_uri


def init_neo4j_connection() -> None:
    uri = os.getenv("NEO4J_URI")
    user = os.getenv("NEO4J_USERNAME")
    pwd = os.getenv("NEO4J_PASSWORD")

    print(f"DEBUG: init_neo4j_connection - NEO4J_URI (from env): {uri}")
    print(f"DEBUG: init_neo4j_connection - NEO4J_USERNAME (from env): {user}")
    # Never print the password, not even partially

    if not uri or not user or not pwd:
        raise RuntimeError("Missing Neo4j credentials.")

    # A "+s" scheme (e.g. Neo4j Aura) means TLS; record it before the scheme is rewritten
    is_encrypted = uri.startswith(("neo4j+s://", "bolt+s://"))
    print(f"DEBUG: init_neo4j_connection - Is encrypted connection: {is_encrypted}")

    # Normalize URI to bolt://host:port format (without credentials)
    clean_bolt_uri = _normalize_neo4j_uri(uri)
    print(f"DEBUG: init_neo4j_connection - Clean Bolt URI for neomodel and driver: {clean_bolt_uri}")

    # Set DATABASE_URL with clean URI for neomodel's internal parsing
    neo_config.DATABASE_URL = clean_bolt_uri
    neo_config.MAX_POOL_SIZE = 20
    # These are not strictly necessary if driver is passed, but keeping for clarity
    neo_config.NEO4J_USERNAME = user
    neo_config.NEO4J_PASSWORD = pwd
    neo_config.ENCRYPTED = is_encrypted

    # Explicitly create the neo4j driver, passing auth separately from the URI
    driver = GraphDatabase.driver(clean_bolt_uri, auth=(user, pwd), encrypted=is_encrypted)

    # Hand the driver to neomodel so db.cypher_query uses it
    db.set_connection(driver=driver)
    print("Neo4j connection initialized.")


def test_connection() -> Tuple[List[List[Any]], List[str]]:
    return db.cypher_query("RETURN 'Connection successful' AS message")


def safe_init() -> None:
    try:
        init_neo4j_connection()
        res, _ = test_connection()
        print(f"Neo4j: {res[0][0]}")
    except Exception as e:
        raise RuntimeError(f"Neo4j connection failed: {e}") from e


# Write clauses matched as whole words, so names such as "offset" or "created_at" are not flagged
UNSAFE_CYPHER = re.compile(r"\b(create|merge|delete|detach|set|remove|drop|load\s+csv)\b", re.IGNORECASE)


def run_cypher(cypher: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # First line of defense only; a read-only Neo4j role is the real guarantee
    if UNSAFE_CYPHER.search(cypher):
        return {"error": "Unsafe Cypher detected. Only read-only queries allowed.", "columns": [], "rows": [], "data": []}

    try:
        # neomodel returns (results, meta); meta is the list of column names
        results, columns = db.cypher_query(cypher, params or {})
        rows = [list(r) for r in results]
        data = [dict(zip(columns, row)) for row in rows]
        return {"columns": columns, "rows": rows, "data": data}
    except Exception as e:
        print(f"Cypher execution error: {e}")
        return {"error": str(e), "columns": [], "rows": [], "data": []}


def query_knowledge_graph(nl_query: str, cypher: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    result = run_cypher(cypher, params)
    result["nl_query"] = nl_query
    result["cypher"] = cypher
    return result

Overwriting tools/query_knowledge_graph.py
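
To see what `_normalize_neo4j_uri` does without a database, here is a standalone copy of the same parsing steps (`normalize_demo` is an illustrative name only). Note that the `neo4j+s` scheme and any embedded credentials are discarded, which is why `init_neo4j_connection` derives `is_encrypted` from the original URI and passes auth to the driver separately.

```python
from urllib.parse import unquote, urlparse


def normalize_demo(uri: str) -> str:
    """Same steps as _normalize_neo4j_uri, minus the debug prints."""
    parsed = urlparse(unquote(uri))              # decode %3A etc., then split into parts
    port = parsed.port if parsed.port else 7687  # default Bolt port
    return f"bolt://{parsed.hostname}:{port}"    # scheme and credentials are dropped


print(normalize_demo("neo4j+s://02fd3f4e.databases.neo4j.io"))  # bolt://02fd3f4e.databases.neo4j.io:7687
print(normalize_demo("neo4j%3A%2F%2Flocalhost%3A7688"))         # bolt://localhost:7688
print(normalize_demo("bolt://neo4j:secret@example.com:7687"))   # bolt://example.com:7687
```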


Test the connection:

In [5]:
from tools.query_knowledge_graph import safe_init, run_cypher

safe_init()
print(run_cypher("RETURN 1 AS one"))
print(run_cypher("MATCH (n) RETURN count(n) AS nodes LIMIT 1"))

All required Neo4j environment variables found.
DEBUG: init_neo4j_connection - NEO4J_URI (from env): neo4j+s://02fd3f4e.databases.neo4j.io
DEBUG: init_neo4j_connection - NEO4J_USERNAME (from env): neo4j
DEBUG: init_neo4j_connection - Is encrypted connection: True
DEBUG: _normalize_neo4j_uri - Original URI: neo4j+s://02fd3f4e.databases.neo4j.io
DEBUG: _normalize_neo4j_uri - Decoded URI: neo4j+s://02fd3f4e.databases.neo4j.io
DEBUG: _normalize_neo4j_uri - Final canonical Bolt URI without credentials: bolt://02fd3f4e.databases.neo4j.io:7687
DEBUG: init_neo4j_connection - Clean Bolt URI for neomodel and driver: bolt://02fd3f4e.databases.neo4j.io:7687
Neo4j connection initialized.
Neo4j: Connection successful
{'columns': ['one'], 'rows': [[1]], 'data': [{'one': 1}]}
{'columns': ['nodes'], 'rows': [[86]], 'data': [{'nodes': 86}]}


Expected output: `{'columns': ['one'], 'rows': [[1]], 'data': [{'one': 1}]}` and a node count.
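
The dictionary shape comes from a small reshaping step inside `run_cypher`: neomodel's `db.cypher_query` returns a list of result rows plus a list of column names, and the function zips them into one dict per row. The sketch below replays that step on made-up sample values (no database needed), so you can see how `columns`, `rows`, and `data` relate. `shape_result` is an illustrative name.

```python
from typing import Any, Dict, List


def shape_result(results: List[List[Any]], columns: List[str]) -> Dict[str, Any]:
    """Mirror of the reshaping done in run_cypher."""
    rows = [list(r) for r in results]
    data = [dict(zip(columns, row)) for row in rows]  # one dict per row, keyed by column name
    return {"columns": columns, "rows": rows, "data": data}


# Hypothetical sample rows, shaped like db.cypher_query output
sample = shape_result(
    [["Keanu Reeves", "The Matrix"], ["Carrie-Anne Moss", "The Matrix"]],
    ["actor", "title"],
)
print(sample["data"])
```

`rows` is convenient for positional access, while `data` maps directly onto a pandas DataFrame, which is how the Chainlit app renders results later.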

---

### Step 2: Define Agent and Task Configuration

Write the agent and task YAML files. The agent uses GPT\-4\.1 and the knowledge graph tool to generate and execute Cypher queries:

In [6]:
%%writefile config/agents.yaml
cypher_agent:
  role: "Knowledge Graph Query Specialist"
  goal: "Generate accurate Cypher queries from natural language and return results."
  backstory: |
    You are an expert in Neo4j and Cypher. Given a schema and a question, you produce a valid Cypher query.
    You always return the query in a fenced code block tagged cypher, followed by any chart
    configuration in a fenced code block tagged json if visualization is requested.

Writing config/agents.yaml


In [8]:
%%writefile config/tasks.yaml
generate_and_execute_query:
  description: |
    Given the question: {question}
    And the schema: {schema}
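    Generate a Cypher query to answer it. Return the query in a fenced code block tagged cypher.
    If a chart would help, follow it with a fenced code block tagged json containing the chart config (type, x, y, title).
  expected_output: |
    A fenced cypher code block, optionally followed by a fenced json code block with the chart config.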
  agent: cypher_agent

Overwriting config/tasks.yaml
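
The `{question}` and `{schema}` placeholders in `tasks.yaml` must be filled before the agent sees the task. CrewAI can interpolate them when you pass `inputs` to `crew.kickoff`; an explicit alternative is plain `str.format`, sketched here with the YAML replaced by an abbreviated Python dict so the snippet runs on its own.

```python
# Abbreviated stand-in for config/tasks.yaml after yaml.safe_load
tasks_config = {
    "generate_and_execute_query": {
        "description": "Given the question: {question}\nAnd the schema: {schema}\nGenerate a Cypher query to answer it.\n",
        "expected_output": "A fenced cypher code block.\n",
    }
}

template = tasks_config["generate_and_execute_query"]["description"]
filled = template.format(
    question="Who acted in movies released after 2010?",
    schema="Nodes: Person(name, age), Movie(title, year)",
)
print(filled)
```

One caveat with `str.format`: if you later add a literal JSON example to the description, double its braces (`{{` and `}}`), or formatting will fail.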


---

### Step 3: Build the CrewAI Agent and Tool

Create a utility to load YAML configuration:

In [9]:
%%writefile utils/yaml_loader.py
import yaml
from pathlib import Path

def load_yaml(file_path: str):
    with open(Path(file_path), "r") as f:
        return yaml.safe_load(f)

Writing utils/yaml_loader.py


Define the CrewAI tool wrapper for the knowledge graph query function:

In [10]:
%%writefile tools/__init__.py
from crewai.tools import tool  # recent CrewAI releases expose the decorator here
from .query_knowledge_graph import query_knowledge_graph

@tool("Query Knowledge Graph")
def query_knowledge_graph_tool(nl_query: str, cypher: str) -> dict:
    """
    Execute a Cypher query against the Neo4j knowledge graph.
    Args:
        nl_query (str): The natural language question.
        cypher (str): The Cypher query to execute.
    Returns:
        dict: Query results with columns, rows, and data.
    """
    return query_knowledge_graph(nl_query, cypher)

Writing tools/__init__.py


Assemble the agent and task. This cell creates the agent, attaches the tool, and defines the task:

In [11]:
%%writefile agent_setup.py
import os
from crewai import Agent, Task, LLM
from tools import query_knowledge_graph_tool
from utils.yaml_loader import load_yaml
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

agents_config = load_yaml("config/agents.yaml")
tasks_config = load_yaml("config/tasks.yaml")

# CrewAI's own LLM wrapper; the model name comes from OPENAI_MODEL (e.g. gpt-4.1)
llm = LLM(model=os.getenv("OPENAI_MODEL"), temperature=0, api_key=os.getenv("OPENAI_API_KEY"))

cypher_agent = Agent(
    config=agents_config["cypher_agent"],
    tools=[query_knowledge_graph_tool],
    llm=llm,
    verbose=True
)

def create_query_task(question: str, schema: str) -> Task:
    task_config = tasks_config["generate_and_execute_query"]
    # Task.context expects a list of upstream Tasks, not a dict,
    # so fill the {question} and {schema} placeholders directly.
    return Task(
        description=task_config["description"].format(question=question, schema=schema),
        expected_output=task_config["expected_output"],
        agent=cypher_agent,
    )

Writing agent_setup.py


---

### Step 4: Parse Agent Output and Execute Tool

Write a parser to extract Cypher and chart JSON from the agent's response:

In [12]:
%%writefile utils/parser.py
import re
import json
from typing import Optional

# `{3} matches a run of three backticks, i.e. a Markdown code fence
CYPHER_BLOCK = re.compile(r"`{3}cypher\s+(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)
JSON_BLOCK = re.compile(r"`{3}json\s+(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)


def extract_cypher(text: str) -> Optional[str]:
    """Return the contents of the first cypher-tagged code block, if any."""
    match = CYPHER_BLOCK.search(text)
    return match.group(1).strip() if match else None


def extract_chart_config(text: str) -> Optional[dict]:
    """Parse the first json-tagged code block into a dict, if present and valid."""
    match = JSON_BLOCK.search(text)
    if match:
        try:
            return json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            return None
    return None

Writing utils/parser.py

Test the agent and parser with a sample question:

In [None]:
from agent_setup import cypher_agent, create_query_task
from crewai import Crew
from utils.parser import extract_cypher, extract_chart_config

schema = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie), (Person)-[:DIRECTED]->(Movie)
"""

question = "Who acted in movies released after 2010?"
task = create_query_task(question, schema)
crew = Crew(agents=[cypher_agent], tasks=[task], verbose=True)

result = crew.kickoff()
print("Agent output:", result)

cypher = extract_cypher(str(result))
chart_config = extract_chart_config(str(result))
print("Extracted Cypher:", cypher)
print("Chart config:", chart_config)

Expected output: A Cypher query like `MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.year > 2010 RETURN p.name, m.title` and optionally a chart config JSON.
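
Because LLM calls cost money and vary between runs, it helps to check the fence-matching regexes against a canned response first. The sketch below applies patterns of the kind `utils/parser.py` relies on to a hand-written sample reply (hypothetical text, not real model output).

```python
import json
import re

FENCE = "`" * 3  # build the fence programmatically so this snippet nests cleanly in Markdown
sample_reply = (
    "Here is the query:\n"
    f"{FENCE}cypher\n"
    "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.year > 2010\n"
    "RETURN m.year AS year, count(p) AS actors ORDER BY year\n"
    f"{FENCE}\n"
    f"{FENCE}json\n"
    '{"type": "bar", "x": "year", "y": "actors", "title": "Actors per year"}\n'
    f"{FENCE}\n"
)

# Non-greedy capture stops at the first closing fence, so the json block is not swallowed
cypher_match = re.search(r"`{3}cypher\s+(.*?)\s*`{3}", sample_reply, re.DOTALL | re.IGNORECASE)
json_match = re.search(r"`{3}json\s+(.*?)\s*`{3}", sample_reply, re.DOTALL | re.IGNORECASE)

cypher = cypher_match.group(1).strip() if cypher_match else None
chart_config = json.loads(json_match.group(1)) if json_match else None
print(cypher)
print(chart_config)
```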

---

### Step 5: Wire the Chainlit UI

Create the Chainlit app. This app maintains conversation history, sends questions to the agent, extracts and executes Cypher, and renders results as tables and charts:

In [None]:
%%writefile chat.py
import os
import chainlit as cl
from crewai import Crew
from agent_setup import cypher_agent, create_query_task
from tools.query_knowledge_graph import query_knowledge_graph, safe_init
from utils.parser import extract_cypher, extract_chart_config
import plotly.graph_objects as go
import pandas as pd

safe_init()

SCHEMA = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie), (Person)-[:DIRECTED]->(Movie)
"""

@cl.on_chat_start
async def start():
    cl.user_session.set("history", [])
    await cl.Message(content="Ask me anything about the knowledge graph!").send()

@cl.on_message
async def main(message: cl.Message):
    question = message.content
    history = cl.user_session.get("history")
    history.append({"role": "user", "content": question})

    task = create_query_task(question, SCHEMA)
    crew = Crew(agents=[cypher_agent], tasks=[task], verbose=False)

    try:
        # kickoff() is blocking; run it in a worker thread so the UI stays responsive
        result = await cl.make_async(crew.kickoff)()
        agent_output = str(result)
        history.append({"role": "assistant", "content": agent_output})
        cl.user_session.set("history", history[-10:])

        cypher = extract_cypher(agent_output)
        chart_config = extract_chart_config(agent_output)

        if not cypher:
            await cl.Message(content="Could not extract a valid Cypher query.").send()
            return

        query_result = query_knowledge_graph(question, cypher)

        if "error" in query_result:
            await cl.Message(content=f"Query error: {query_result['error']}").send()
            return

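        df = pd.DataFrame(query_result["data"])
        if df.empty:
            await cl.Message(content="The query ran but returned no rows.").send()
            return
        table_md = df.to_markdown(index=False)  # requires the tabulate package
        await cl.Message(content=f"**Results:**\n\n{table_md}").send()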

        if chart_config and not df.empty:
            chart_type = chart_config.get("type", "bar")
            x_col = chart_config.get("x")
            y_col = chart_config.get("y")
            title = chart_config.get("title", "Chart")

            if x_col in df.columns and y_col in df.columns:
                if chart_type == "bar":
                    fig = go.Figure(data=[go.Bar(x=df[x_col], y=df[y_col])])
                elif chart_type == "line":
                    fig = go.Figure(data=[go.Scatter(x=df[x_col], y=df[y_col], mode='lines+markers')])
                else:
                    fig = go.Figure(data=[go.Bar(x=df[x_col], y=df[y_col])])

                fig.update_layout(title=title, xaxis_title=x_col, yaxis_title=y_col)
                await cl.Message(content="", elements=[cl.Plotly(name="chart", figure=fig)]).send()

    except Exception as e:
        await cl.Message(content=f"Error: {str(e)}").send()
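
In `chat.py`, the chart branch trusts whatever JSON the model emits and silently skips the chart when a column name does not match. Pulling that decision into a small pure function makes it explicit and easy to unit-test without Chainlit or Plotly. `normalize_chart_config` is a hypothetical helper, not part of the app above; it lowercases the chart type and falls back to a bar chart for unsupported types, mirroring the `else` branch in `chat.py`.

```python
from typing import Any, Dict, Iterable, Optional

SUPPORTED_TYPES = {"bar", "line"}  # the chart types chat.py draws explicitly


def normalize_chart_config(config: Optional[Dict[str, Any]],
                           columns: Iterable[str]) -> Optional[Dict[str, Any]]:
    """Return a cleaned chart config, or None if it cannot be drawn from these columns."""
    if not isinstance(config, dict):
        return None
    available = set(columns)
    x, y = config.get("x"), config.get("y")
    if x not in available or y not in available:
        return None  # the chart refers to columns the query did not return
    chart_type = str(config.get("type", "bar")).lower()
    return {
        "type": chart_type if chart_type in SUPPORTED_TYPES else "bar",  # unknown types fall back to bar
        "x": x,
        "y": y,
        "title": str(config.get("title", "Chart")),
    }


print(normalize_chart_config({"type": "Line", "x": "year", "y": "actors"}, ["year", "actors"]))
```

In the app you would pass `df.columns` as `columns` and draw the figure only when the helper returns a dict.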

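Run the Chainlit app. The command keeps running, so the cell will not finish until you interrupt it. In Colab, the server is not directly reachable from your browser; use a tunnel service such as ngrok to expose port 8000: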

In [None]:
!chainlit run chat.py --host 0.0.0.0 --port 8000

For local development, simply run `chainlit run chat.py` and open the provided URL.

---

### Step 6: Validate End\-to\-End

Test the system with a sample question. This cell simulates the agent workflow without the UI:

In [None]:
from agent_setup import cypher_agent, create_query_task
from crewai import Crew
from tools.query_knowledge_graph import query_knowledge_graph
from utils.parser import extract_cypher

schema = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie)
"""

question = "List all actors and their movies"
task = create_query_task(question, schema)
crew = Crew(agents=[cypher_agent], tasks=[task], verbose=False)

result = crew.kickoff()
cypher = extract_cypher(str(result))
print("Generated Cypher:", cypher)

if cypher:
    query_result = query_knowledge_graph(question, cypher)
    print("Query result:", query_result)

Expected output: A Cypher query and a result dictionary with `columns`, `rows`, and `data`.

---

## Production Considerations

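**Read\-only enforcement.** Connect with a Neo4j user whose role grants read access only. The keyword filter in `run_cypher` is a useful first line of defense, but it can be bypassed (for example, by a write procedure invoked through CALL), so the database role is what actually enforces read\-only access.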

**Error handling and retries.** Wrap OpenAI and Neo4j calls in retry logic with exponential backoff. Use libraries like `tenacity` to handle transient 429 or network errors.
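
`tenacity` is the more complete option; to show the mechanics, here is a dependency-free sketch of exponential backoff with jitter. The decorator name `retry_with_backoff` and its parameters are illustrative assumptions, and in real code you would narrow `retry_on` to the OpenAI and Neo4j exceptions you consider transient.

```python
import functools
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry a call on transient errors, doubling the wait after each failure."""

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # base_delay, 2x, 4x, ... capped at max_delay, plus up to 10% random jitter
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(delay + random.uniform(0, delay * 0.1))
            raise ValueError("max_attempts must be at least 1")

        return wrapper

    return decorator


# Usage: a hypothetical flaky call that succeeds on the third try.
# `sleep=waits.append` records the delays instead of actually waiting.
calls = {"n": 0}
waits = []


@retry_with_backoff(sleep=waits.append)
def flaky_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


print(flaky_call(), calls["n"], len(waits))  # ok 3 2
```

Jitter spreads retries from concurrent users apart, so a rate-limited API is not hit by synchronized bursts.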

**Token and cost control.** Limit conversation history to the last 10 messages. Prune or summarize older context. Monitor token usage per request. Set budget alerts in your OpenAI dashboard.
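
Keeping the last 10 messages is a count-based cap; adding a token budget adapts better when message lengths vary. This sketch keeps the newest messages until an estimated total would exceed the budget. The four-characters-per-token estimate is a rough heuristic assumption, not a real tokenizer; swap in `tiktoken` or your provider's counter for accurate numbers. `trim_history` and `estimate_tokens` are illustrative names.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token for English text."""
    return max(1, len(text) // 4)


def trim_history(history: List[Message], max_messages: int = 10,
                 max_tokens: int = 2000) -> List[Message]:
    """Keep the most recent messages that fit both the count and the token budget."""
    kept: List[Message] = []
    total = 0
    for message in reversed(history[-max_messages:]):  # newest first
        cost = estimate_tokens(message["content"])
        if total + cost > max_tokens:
            break  # this and all older messages would exceed the budget
        kept.append(message)
        total += cost
    return list(reversed(kept))  # restore chronological order


history = [{"role": "user", "content": "x" * 400} for _ in range(12)]  # ~100 tokens each
print(len(trim_history(history, max_tokens=350)))  # 3
```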

**Logging and observability.** Add correlation IDs to each request. Log Cypher queries, parameters, execution time, and errors. Use structured logging (JSON) for easy parsing and alerting.
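
A minimal sketch of structured logging with a per-request correlation ID, using only the standard library. The logger name, the field names, and the `log_query` helper are illustrative assumptions; a library such as `structlog` offers the same idea with less code.

```python
import json
import logging
import time
import uuid
from typing import Any, Dict, Optional


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload: Dict[str, Any] = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields arrive via logger.info(..., extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, default=str)


logger = logging.getLogger("kg_chatbot")
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)


def log_query(correlation_id: str, cypher: str, params: Dict[str, Any],
              elapsed_ms: float, error: Optional[str] = None) -> None:
    """Emit one structured log line for a Cypher execution."""
    logger.info("cypher_executed", extra={"fields": {
        "correlation_id": correlation_id,
        "cypher": cypher,
        "params": params,
        "elapsed_ms": round(elapsed_ms, 1),
        "error": error,
    }})


# One ID per incoming chat message, reused for every log line that request produces
correlation_id = uuid.uuid4().hex
start = time.perf_counter()
# ... execute the query here ...
log_query(correlation_id, "MATCH (n) RETURN count(n) AS nodes", {}, (time.perf_counter() - start) * 1000)
```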

**Schema versioning.** Store schema definitions in version control. Update prompts when the schema changes. Test queries against a staging graph before you deploy to production.

**Caching.** Cache normalized Cypher queries with a hash key. Return saved results for repeated questions. Use an LRU cache with TTL. For deeper optimization, consider adding semantic caching with Redis Vector to reuse similar queries using embeddings.
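
A minimal in-process sketch of the LRU-with-TTL idea, keyed by a hash of the normalized Cypher and its parameters. The class name `QueryCache`, the normalization rule (collapsing whitespace only, so string literals keep their case), and the default sizes are assumptions; with several worker processes you would move the cache to a shared store such as Redis.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class QueryCache:
    """LRU cache with a per-entry time-to-live, keyed by normalized Cypher + params."""

    def __init__(self, max_size: int = 256, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    @staticmethod
    def make_key(cypher: str, params: Optional[Dict[str, Any]] = None) -> str:
        # Collapse whitespace runs (inside string literals too, a deliberate simplification)
        normalized = " ".join(cypher.split())
        raw = normalized + "|" + json.dumps(params or {}, sort_keys=True, default=str)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock(), value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry


# Usage with a fake clock so expiry can be shown without waiting
now = [0.0]
cache = QueryCache(max_size=2, ttl_seconds=60, clock=lambda: now[0])
key = QueryCache.make_key("MATCH (m:Movie)\n  RETURN m.title")
cache.put(key, {"rows": [["The Matrix"]]})
print(cache.get(QueryCache.make_key("MATCH (m:Movie) RETURN m.title")))  # {'rows': [['The Matrix']]}
now[0] = 61.0
print(cache.get(key))  # None (expired)
```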

**Testing.** Write unit tests for `run_cypher`, `extract_cypher`, and `extract_chart_config`. Create integration tests that validate end\-to\-end flows with seed data. Assert expected column names, row counts, and chart configs.

For more guidance on prompt reliability and deterministic outputs, especially when using LLMs for structured tasks, see our [guide to prompt engineering with LLM APIs](/article/prompt-engineering-with-llm-apis-how-to-get-reliable-outputs-3). If your use case involves extracting structured data or building pipelines that require zero hallucinations, our [structured data extraction with LLMs pipeline tutorial](/article/structured-data-extraction-with-llms-how-to-build-a-pipeline-3) provides practical patterns and code.

---

## Conclusion and Next Steps

You've built a knowledge graph chatbot that translates natural language into Cypher, executes queries safely, and visualizes results. The system uses GPT\-4\.1 for query generation, CrewAI for orchestration, and Chainlit for a conversational UI. You've implemented read\-only enforcement, error handling, and incremental validation at each step.

**Next steps:**

* **Swap LLMs.** You can replace GPT\-4\.1 with another model such as Claude or Llama by changing the `llm` parameter in `agent_setup.py`. Test prompt compatibility and adjust schema examples if needed. For a hands\-on walkthrough of building LLM agents from scratch, including reasoning and action patterns, check out our [tutorial on building an LLM agent with GPT\-4 ReAct](/article/how-to-build-an-llm-agent-from-scratch-with-gpt-4-react-5).
* **Add RAG for schema context.** If your schema is large or dynamic, embed schema documentation and retrieve relevant subsets before query generation. See the [guide to structured data extraction with LLMs](/article/structured-data-extraction-with-llms-how-to-build-a-pipeline-3) for best practices on deterministic outputs.
* **Integrate with external APIs.** Extend the agent with additional tools (for example REST API calls or database lookups) to enrich query results or trigger actions based on graph insights.
* **Deploy to production.** Containerize the app with Docker. Set up environment\-specific configs. Deploy to a cloud platform like AWS, GCP, or Azure. Use a reverse proxy such as nginx and enable HTTPS.
* **Build stateful workflows.** For more complex agent behaviors, explore [building a stateful AI agent with LangGraph](/article/how-to-build-a-stateful-ai-agent-with-langgraph-step-by-step-5) to manage multi\-turn conversations and conditional logic.
