# Create MCP Toolbox Configuration for Podcast Episodes
This notebook creates the tools and resources needed to replicate this workflow in other clients via MCP.

Specifically, this will create the YAML configuration for an [MCP Toolbox server](https://neo4j.com/blog/developer/ai-agents-gen-ai-toolbox/). We will include tools for searching and analyzing podcast episode data from the Neo4j graph database.

Think of these as expert tools: predefined query templates with descriptions that can take optional query parameters. We can later combine them with more generic MCP servers such as [mcp-neo4j-cypher](https://github.com/neo4j-contrib/mcp-neo4j/tree/main/servers/mcp-neo4j-cypher), which let agents write and execute their own Cypher queries based on the graph schema and user input.
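
To make the idea concrete, here is a minimal sketch of one expert tool as a Python dict, using the same fields as the YAML generated later in this notebook (`kind`, `source`, `statement`, `description`, `parameters`). The helper `undeclared_params` is hypothetical and not part of MCP Toolbox; it only checks that every `$name` placeholder in the Cypher statement has a matching declared parameter.

```python
import re

# One "expert tool": a fixed Cypher template plus a description and declared parameters.
# Field names mirror the YAML generated later in this notebook.
example_tool = {
    "kind": "neo4j-cypher",
    "source": "podcast-episode-graph",
    "statement": (
        "MATCH (p:Person)-[r]-(e:Episode) "
        "WHERE toLower(p.name) CONTAINS toLower($question) "
        "RETURN e.name AS episode_name LIMIT 10"
    ),
    "description": "Search for episodes that feature specific people.",
    "parameters": [{"name": "question", "type": "string"}],
}


def undeclared_params(tool: dict) -> set:
    """Return $placeholders used in the statement but missing from `parameters`.

    Hypothetical helper for illustration; not an MCP Toolbox API.
    """
    used = set(re.findall(r"\$(\w+)", tool["statement"]))
    declared = {p["name"] for p in tool["parameters"]}
    return used - declared


# An empty set means every placeholder in the statement is declared.
print(undeclared_params(example_tool))
```

A check like this can catch a renamed parameter before the YAML is handed to the server, where the mismatch would otherwise only surface at query time.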


In [18]:
import getpass
import os
from dotenv import load_dotenv

# load environment variables
load_dotenv('podcast-gds.env', override=True)

if not os.environ.get('NEO4J_URI'):
    os.environ['NEO4J_URI'] = getpass.getpass('NEO4J_URI:\n')
if not os.environ.get('NEO4J_USERNAME'):
    os.environ['NEO4J_USERNAME'] = getpass.getpass('NEO4J_USERNAME:\n')
if not os.environ.get('NEO4J_PASSWORD'):
    os.environ['NEO4J_PASSWORD'] = getpass.getpass('NEO4J_PASSWORD:\n')

NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')


## Create Meta Context Node
We will save our former system prompt as a "MetaContext" node. This lets us store and manage it as a single source of truth inside the database.


In [19]:
meta_context = """
This knowledge graph, and corresponding tools, provide all the information you need to act as a podcast episode assistant who helps with finding and analyzing podcast episodes from the Data Engineering Podcast.

Corresponding tools retrieve data from internal knowledge on podcast episodes based on their topics, people, concepts, technologies, and reference links.

Try to prioritize expert tools (those other than `read_neo4j_cypher`) as appropriate since they have expert approved logic for accessing data. Though you may need to directly access data afterwards to pull more details.

When you need more flexible logic for aggregations, follow-up or anything else, you can access the knowledge (in a graph database) directly. ALWAYS get the schema first with `get_schema` and keep it in memory. Only use node labels, relationship types, and property names, and patterns in that schema to generate valid Cypher queries using the `read_neo4j_cypher` tool with proper parameter syntax ($parameter). If you get errors or empty results check the schema and try again at least up to 3 times.

Also never return embedding properties in Cypher queries. This will result in delays and errors.

When responding to the user:
- if your response includes episodes, include their names, numbers, and links. Never just their IDs.
- You must explain your retrieval logic and where the data came from. You must say exactly how relevance, similarity, etc. was inferred during search

Use information from previous queries when possible instead of asking the user again.

The graph contains the following node types:
- Episode: Individual podcast episodes with name, number, link, and description
- Topic: Topics discussed in episodes
- Person: People who appear in episodes (hosts, guests, listeners)
- Concept: Concepts covered in episodes
- Technology: Technologies mentioned in episodes
- ReferenceLink: Reference links from episodes
- Chunk: Transcript chunks from episodes

The graph contains the following relationship types:
- HAS_TOPIC: Episode -> Topic
- COVERED_BY_EPISODE: Topic -> Episode
- COVERS_CONCEPT: Topic -> Concept
- COVERS_TECHNOLOGY: Topic -> Technology
- HAS_REFERENCE_LINK: Episode -> ReferenceLink
- HAS_CHUNK: Episode -> Chunk
- BELONGS_TO_EPISODE: Chunk -> Episode
- Various person relationships: IS_A_HOST, IS_A_GUEST, LISTENS_TO_EPISODE, etc.
"""


In [20]:
from neo4j import GraphDatabase

#instantiate driver
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# create or update the MetaContext node (this also verifies the Neo4j connection)
driver.execute_query("""
MERGE(n:__MetaContext__ {version:1, useCase:'podcastEpisodeAssistant'})
SET n.context = $context
RETURN n
""", context=meta_context)


EagerResult(records=[<Record n=<Node element_id='4:3cd7c5fe-3740-49ae-8afd-e31e0760f712:28' labels=frozenset({'__MetaContext__'}) properties={'useCase': 'podcastEpisodeAssistant', 'context': '\nThis knowledge graph, and corresponding tools, provide all the information you need to act as a podcast episode assistant who helps with finding and analyzing podcast episodes from the Data Engineering Podcast.\n\nCorresponding tools retrieve data from internal knowledge on podcast episodes based on their topics, people, concepts, technologies, and reference links.\n\nTry to prioritize expert tools (those other than `read_neo4j_cypher`) as appropriate since they have expert approved logic for accessing data. Though you may need to directly access data afterwards to pull more details.\n\nWhen you need more flexible logic for aggregations, follow-up or anything else, you can access the knowledge (in a graph database) directly. ALWAYS get the schema first with `get_schema` and keep it in memory. On
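
The result above prints the whole node. As a minimal sketch, the stored prompt text alone can be read back with a small function. It assumes the `driver` created above and the same `version` and `useCase` keys used in the `MERGE`; `fetch_meta_context` is a hypothetical helper name.

```python
def fetch_meta_context(driver, version=1, use_case="podcastEpisodeAssistant"):
    """Return the stored context string, or None if no matching node exists.

    `driver` is expected to be a neo4j.Driver; any object with a compatible
    `execute_query` method works.
    """
    # execute_query returns (records, summary, keys); extra keyword
    # arguments are passed to the query as parameters.
    records, _summary, _keys = driver.execute_query(
        """
        MATCH (n:__MetaContext__ {version: $version, useCase: $use_case})
        RETURN n.context AS context
        """,
        version=version,
        use_case=use_case,
    )
    return records[0]["context"] if records else None
```

Calling `fetch_meta_context(driver)` in this notebook should return the same text that was assigned to `meta_context`.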

## Create MCP Toolbox Definitions
This will create the YAML configuration for an [MCP Toolbox server](https://neo4j.com/blog/developer/ai-agents-gen-ai-toolbox/) so we can expose the podcast episode tools to any MCP client.


In [21]:
# Create YAML configuration for podcast episode tools
tool_yaml = f"""
sources:
    podcast-episode-graph:
        kind: "neo4j"
        uri: "{NEO4J_URI}"
        user: "{NEO4J_USERNAME}"
        password: "{NEO4J_PASSWORD}"
    cortex-os-mentalmodel:
        kind: http
        baseUrl: "http://localhost:8000"
tools:

  get_context:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH(n:__MetaContext__ {{version:1, useCase:'podcastEpisodeAssistant'}})
        RETURN n.context AS context
    description: |
        Gets the context for how to use & access podcast episode data. Always run this first and store in your memory.
    parameters: []

  get_tool_statistics:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (e:Episode)
        OPTIONAL MATCH (e)-[:HAS_TOPIC]->(t:Topic)
        OPTIONAL MATCH (e)-[:HAS_REFERENCE_LINK]->(r:ReferenceLink)
        OPTIONAL MATCH (e)-[:HAS_CHUNK]->(c:Chunk)
        RETURN count(DISTINCT e) AS total_episodes,
               count(DISTINCT t) AS total_topics,
               count(DISTINCT r) AS total_reference_links,
               count(DISTINCT c) AS total_chunks
    description: |
        Get statistics about episodes in the database.
        Returns counts of episodes, topics, reference links, and transcript chunks.
    parameters: []

  find_episodes_by_people:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (p:Person)-[r]-(e:Episode)
        WHERE toLower(p.name) CONTAINS toLower($question)
        RETURN DISTINCT p.name AS person_name,
               type(r) AS relationship_type,
               e.name AS episode_name,
               e.number AS episode_number,
               e.link AS episode_link,
               $question AS matched_term
        ORDER BY e.number DESC
        LIMIT 10
    description: |
        Search for episodes that feature specific people (hosts, guests, or listeners).
        Searches for people whose names contain the given question string
        and returns all episodes where they appear, along with their relationship type
        to the episode (e.g., IS_A_HOST, IS_A_GUEST, LISTENS_TO_EPISODE, etc.).
    parameters:
      - name: question
        type: string
        description: The name or partial name of the person to search for. Case-insensitive search that matches any part of the person's name.

  find_episodes_by_concept:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (e:Episode)-[:HAS_TOPIC]->(t:Topic)-[:COVERS_CONCEPT]->(c:Concept)
        WHERE toLower(c.name) CONTAINS toLower($question) OR 
              toLower(c.description) CONTAINS toLower($question)
        RETURN DISTINCT e.name AS episode_name,
               e.number AS episode_number,
               e.link AS episode_link,
               t.name AS topic_name,
               c.name AS concept_name,
               c.description AS concept_description,
               $question AS matched_term
        ORDER BY e.number DESC
        LIMIT 10
    description: |
        Search for episodes that discuss specific concepts or ideas.
        Performs a case-insensitive search on both concept names and descriptions
        to find relevant episodes.
    parameters:
      - name: question
        type: string
        description: The concept name or description to search for. Can be a single word or phrase.

  find_episodes_by_topic:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (e:Episode)-[:HAS_TOPIC]->(t:Topic)
        WHERE toLower(t.name) CONTAINS toLower($question) OR 
              toLower(e.name) CONTAINS toLower($question) OR
              toLower(e.description) CONTAINS toLower($question)
        RETURN DISTINCT e.name AS episode_name,
               e.number AS episode_number,
               e.link AS episode_link,
               e.description AS description,
               collect(t.name) AS topics,
               $question AS matched_term
        ORDER BY e.number DESC
        LIMIT 10
    description: |
        Search for episodes that contain specific topics or keywords.
        Performs a case-insensitive search across episode names, descriptions,
        and topic names to find episodes that match the given question.
    parameters:
      - name: question
        type: string
        description: The search term to look for in topics, episode names, or descriptions. Can be a single word or phrase.

  find_episodes_by_technology:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (e:Episode)-[:HAS_TOPIC]->(t:Topic)-[:COVERS_TECHNOLOGY]->(tech:Technology)
        WHERE toLower(tech.name) CONTAINS toLower($question)
        RETURN DISTINCT e.name AS episode_name,
               e.number AS episode_number,
               e.link AS episode_link,
               t.name AS topic_name,
               tech.name AS technology_name,
               $question AS matched_term
        ORDER BY e.number DESC
        LIMIT 10
    description: |
        Search for episodes that discuss specific technologies or tools.
        Performs a case-insensitive search on technology names to find relevant episodes
        that discuss or mention the technology.
    parameters:
      - name: question
        type: string
        description: The technology name to search for. Can be a single word or phrase.

  find_episodes_by_reference:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (e:Episode)-[:HAS_REFERENCE_LINK]->(r:ReferenceLink)
        WHERE toLower(r.url) CONTAINS toLower($reference_string) OR 
              toLower(r.text) CONTAINS toLower($reference_string)
        RETURN e.name AS episode_name,
               e.number AS episode_number,
               e.link AS episode_link,
               e.description AS description,
               r.url AS reference_url,
               r.text AS reference_text,
               $reference_string AS matched_term
        ORDER BY e.number DESC
    description: |
        Find episodes that have reference links containing the input string.
        Searches for episodes that are connected to reference links
        through the HAS_REFERENCE_LINK relationship, where the reference URL or text
        contains the provided string. It performs a case-insensitive search.
    parameters:
      - name: reference_string
        type: string
        description: String to search for in reference URLs or text.

  find_episodes_by_mentions:
    kind: neo4j-cypher
    source: podcast-episode-graph
    statement: |
        MATCH (e:Episode)-[:HAS_REFERENCE_LINK]->(r:ReferenceLink)
        WHERE toLower(r.url) CONTAINS toLower($search_terms) OR 
              toLower(r.text) CONTAINS toLower($search_terms)
        RETURN e.name AS episode_name,
               e.number AS episode_number,
               e.link AS episode_link,
               e.description AS description,
               r.url AS reference_url,
               r.text AS reference_text,
               $search_terms AS matched_term
        ORDER BY e.number DESC
    description: |
        Find episodes that mention the input search term in their reference links.
        Performs a case-insensitive search on reference URLs and text to find relevant episodes.
        Returns episodes with the matched reference link and which search term was matched.
    parameters:
      - name: search_terms
        type: string
        description: Search term to look for in reference links. Can be a URL, partial URL, or keyword that might appear in reference text.

  search_episodes_gds_by_question_tool:
    kind: http
    source: cortex-os-mentalmodel
    method: POST
    path: "/search_episodes_gds_by_question_tool"
    description: |
        Extended search that combines vector search seed episodes with precomputed KNN similarities.
        Accepts a natural-language question and internally generates embeddings.
        Parameters: question (string), k (integer, default: 5), limit (integer, default: 10).
        Note: Requires cortex-os-mentalmodel HTTP server to be running on localhost:8000.
    requestBody: |
        {{
            "question": "{{{{.question}}}}",
            "k": {{{{.k}}}},
            "limit": {{{{.limit}}}}
        }}
    bodyParams:
      - name: question
        description: The user's natural-language question.
        type: string
      - name: k
        description: Number of nearest neighbor chunks to retrieve.
        type: integer
      - name: limit
        description: Total number of rows to return.
        type: integer
"""

with open('podcast-episode-tools.yaml', 'w') as file:
    file.write(tool_yaml)
print("podcast-episode-tools.yaml file created successfully.")


podcast-episode-tools.yaml file created successfully.
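
Because `tool_yaml` is an f-string, literal braces have to be doubled. The short sketch below shows how the escaping used in the cell above renders: `{{` becomes `{`, and `{{{{.question}}}}` becomes the `{{.question}}` placeholder that ends up in the written YAML. The variable names are illustrative only.

```python
# Doubled braces in an f-string render as single literal braces,
# which is how the Cypher map {version:1, ...} survives formatting.
cypher_map = f"MATCH (n:__MetaContext__ {{version:1}})"

# Four braces per side render as two, leaving a {{.question}} placeholder
# in the output instead of triggering a Python substitution.
body_line = f'"question": "{{{{.question}}}}"'

print(cypher_map)  # MATCH (n:__MetaContext__ {version:1})
print(body_line)   # "question": "{{.question}}"
```

If the generated file ever shows single braces in `requestBody`, or a `KeyError`/`NameError` is raised while building the string, a missing pair of braces in the template is the likely cause.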


## Deploy MCP Toolbox
You can do this locally or, if you have a GCP account, deploy remotely via Cloud Run.

For local deployment, see `deploy-toolbox-local.sh`. For remote deployment on GCP, see `deploy-toolbox-gcp.sh`.

Optional: Once the Toolbox MCP server is running you can test with [MCP Inspector](https://modelcontextprotocol.io/legacy/tools/inspector).
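
MCP clients talk to the server using JSON-RPC 2.0 messages. As a rough illustration of what a client such as MCP Inspector sends after the initialization handshake, the sketch below builds `tools/list` and `tools/call` request bodies for one of the tools defined above. It only constructs the payloads; the server URL and transport depend on how Toolbox is deployed, so no request is sent. `jsonrpc_request` is a hypothetical helper, and the search term is a placeholder.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # each request in a session gets its own id


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object (hypothetical helper)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name, passing its declared parameter.
call_tool = jsonrpc_request(
    "tools/call",
    {
        "name": "find_episodes_by_people",
        "arguments": {"question": "some guest name"},  # placeholder search term
    },
)

print(json.dumps(call_tool, indent=2))
```

The `arguments` keys must match the parameter names declared for the tool in `podcast-episode-tools.yaml`, here `question`.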
