# Graph Construction & Retrieval
This notebook demonstrates how constructing a simple starter knowledge graph from documents can help with agent retrieval tools and improve responses.

We can start by thinking through what type of data model we want, given our use case questions:
![](img/data-model-concept.png)

We can flesh that out a bit and target the schema below. This natural-language-like data model connects things and concepts, and helps surface important relationships for our agent.

![](img/graph-data-model.png)

Document extraction is performed in [extract-resumes-to-people.py](extract-resumes-to-people.py), which stages structured people with skills and accomplishments in the [extracted-people-data.json](extracted-people-data.json) file. This is done ahead of time for the workshop, for convenience and to avoid OpenAI request throttling.



In [1]:
# set up the environment
import getpass
import os
from dotenv import load_dotenv

# load variables from nb.env, overriding any that are already set
load_dotenv('nb.env', override=True)

if not os.environ.get('NEO4J_URI'):
    os.environ['NEO4J_URI'] = getpass.getpass('NEO4J_URI:\n')
if not os.environ.get('NEO4J_USERNAME'):
    os.environ['NEO4J_USERNAME'] = getpass.getpass('NEO4J_USERNAME:\n')
if not os.environ.get('NEO4J_PASSWORD'):
    os.environ['NEO4J_PASSWORD'] = getpass.getpass('NEO4J_PASSWORD:\n')

NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')

## Graph Construction

In [2]:
import json
from person import Person, SkillName

#read json models back
with open('extracted-people-data.json', 'r') as file:
    people_json = json.load(file)
people = [Person(**person) for person in people_json]
people[0]

Person(id='MhzMrjwz', name='Robert Johnson', email='robert.johnson@email.com', current_title='Security Engineer', department=<Department.ENGINEERING: 'Engineering'>, level=<Level.SENIOR: 'Senior'>, hire_date=None, skills=[HasSkill(skill=Skill(name=<SkillName.PYTHON: 'Python'>), proficiency=4, years_experience=4, context='Programming for security automation and scripting', is_primary=False), HasSkill(skill=Skill(name=<SkillName.AWS: 'AWS'>), proficiency=3, years_experience=3, context='Cloud security architecture and compliance', is_primary=False)], accomplishments=[Accomplishment(type=<AccomplishmentType.BUILT: 'BUILT'>, thing=Thing(name='security_monitoring_system_MhzMrjwz', type=<WorkType.SYSTEM: 'SYSTEM'>, domain=<Domain.SECURITY: 'SECURITY'>), impact_description='Detected and prevented 95% of attempted cyber attacks', year=2022, role='Senior Security Engineer', duration=None, team_size=None, context='CyberDefense Corp'), Accomplishment(type=<AccomplishmentType.BUILT: 'BUILT'>, thing

In [3]:
from neo4j import GraphDatabase

# load into People nodes in Neo4j

#instantiate driver
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

#test neo4j connection
driver.execute_query("MATCH(n) RETURN count(n)")

EagerResult(records=[<Record count(n)=30>], summary=<neo4j._work.summary.ResultSummary object at 0x1054cebd0>, keys=['count(n)'])
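
The connection test above can be wrapped in a small helper that fails fast and returns a plain number. This is a minimal sketch: `check_connection` is a hypothetical name, not part of the notebook, and it assumes the driver's `verify_connectivity()` method and that the `EagerResult` from `execute_query` unpacks into `(records, summary, keys)`.

```python
def check_connection(driver) -> int:
    """Fail fast if the server is unreachable, then return the node count."""
    # Raises an exception if the driver cannot reach the server.
    driver.verify_connectivity()
    # EagerResult is a named tuple, so it can be unpacked directly.
    records, _summary, _keys = driver.execute_query(
        "MATCH (n) RETURN count(n) AS node_count"
    )
    return records[0]["node_count"]
```

Called as `check_connection(driver)`, this would be expected to return the same count that the raw `EagerResult` above reports.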

In [4]:
from neo4j import RoutingControl

# create node key constraints (key properties must exist and be unique) if they do not exist
driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Person) REQUIRE (n.id) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Skill) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Thing) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Domain) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:WorkType) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)


EagerResult(records=[], summary=<neo4j._work.summary.ResultSummary object at 0x1239bc350>, keys=[])
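
The five constraint statements above differ only in label and key property, so they could also be generated from a mapping. The sketch below is one possible arrangement; `NODE_KEYS` and `node_key_statements` are hypothetical names introduced for illustration. Labels are interpolated into the query text because Cypher does not accept them as parameters here, which is only safe because the mapping is a fixed literal rather than user input.

```python
# Label -> key property, taken from the constraints created above.
NODE_KEYS = {
    "Person": "id",
    "Skill": "name",
    "Thing": "name",
    "Domain": "name",
    "WorkType": "name",
}


def node_key_statements(node_keys: dict) -> list:
    """Build one CREATE CONSTRAINT statement per label."""
    return [
        f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) "
        f"REQUIRE (n.{prop}) IS NODE KEY"
        for label, prop in node_keys.items()
    ]


statements = node_key_statements(NODE_KEYS)
for statement in statements:
    print(statement)
```

Each statement could then be passed to `driver.execute_query(statement, routing_=RoutingControl.WRITE)` in a loop.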

In [5]:
# merge people
def chunks(xs, n=10):
    n = max(1, n)
    return [xs[i:i + n] for i in range(0, len(xs), n)]

for chunk in chunks(people_json):
    records = driver.execute_query(
        """
        UNWIND $records AS rec
        MERGE(person:Person {id:rec.id})
        SET person.name = rec.name,
            person.email = rec.email,
            person.current_title = rec.current_title,
            person.department = rec.department,
            person.level = rec.level,
            person.years_experience = rec.years_experience,
            person.location = rec.location
        RETURN count(rec) AS records_upserted
        """,
        #database_=DATABASE,
        routing_=RoutingControl.WRITE,
        result_transformer_= lambda r: r.data(),
        records = chunk
    )
    print(records)

[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
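
The three batches of 10 reported above follow directly from how `chunks` slices its input. The standalone sketch below repeats the helper so it runs on its own and uses plain integers as stand-ins for the 30 people records.

```python
def chunks(xs, n=10):
    # Guard against n <= 0, which would make range() fail or loop incorrectly.
    n = max(1, n)
    return [xs[i:i + n] for i in range(0, len(xs), n)]


# 30 placeholder records split into batches of 10.
sizes = [len(batch) for batch in chunks(list(range(30)))]
print(sizes)  # [10, 10, 10]

# A length that is not a multiple of n leaves a shorter final batch.
print([len(batch) for batch in chunks(list(range(25)))])  # [10, 10, 5]
```

Sending one batch per query keeps each transaction small while still avoiding a round trip per record.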


In [6]:
# flatten skills and accomplishments into lists, tagging each record with its person id
skills = []
accomplishments = []
for person in people_json:

    # extend skills list
    tmp_skills = person['skills'].copy()
    for skill in tmp_skills:
        skill['personId'] = person['id']
    skills.extend(tmp_skills)

    # extend accomplishments list
    tmp_accomplishments = person['accomplishments'].copy()
    for accomplishment in tmp_accomplishments:
        accomplishment['personId'] = person['id']
    accomplishments.extend(tmp_accomplishments)



In [7]:
skills[:3]

[{'skill': {'name': 'Python'},
  'proficiency': 4,
  'years_experience': 4,
  'context': 'Programming for security automation and scripting',
  'is_primary': False,
  'personId': 'MhzMrjwz'},
 {'skill': {'name': 'AWS'},
  'proficiency': 3,
  'years_experience': 3,
  'context': 'Cloud security architecture and compliance',
  'is_primary': False,
  'personId': 'MhzMrjwz'},
 {'skill': {'name': 'Swift'},
  'proficiency': 2,
  'years_experience': 1,
  'context': 'Used for developing banking mobile app and iOS applications during internship and bootcamp.',
  'is_primary': True,
  'personId': '5BiANRmk'}]

In [8]:
accomplishments[:2]

[{'type': 'BUILT',
  'thing': {'name': 'security_monitoring_system_MhzMrjwz',
   'type': 'SYSTEM',
   'domain': 'SECURITY'},
  'impact_description': 'Detected and prevented 95% of attempted cyber attacks',
  'year': 2022,
  'role': 'Senior Security Engineer',
  'duration': None,
  'team_size': None,
  'context': 'CyberDefense Corp',
  'personId': 'MhzMrjwz'},
 {'type': 'BUILT',
  'thing': {'name': 'zero_trust_authentication_system_MhzMrjwz',
   'type': 'SYSTEM',
   'domain': 'SECURITY'},
  'impact_description': 'Implemented for 10,000+ employees using modern identity protocols',
  'year': 2022,
  'role': 'Senior Security Engineer',
  'duration': None,
  'team_size': None,
  'context': 'CyberDefense Corp',
  'personId': 'MhzMrjwz'}]
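
One detail of the flattening step worth knowing: `list.copy()` is a shallow copy, so the `personId` key is also added to the dictionaries inside `people_json`. That is harmless for this load, but it matters if the original records are reused. The sketch below shows the difference using made-up minimal records, not the workshop data.

```python
import copy

# Shallow copy: a new list that holds the same dict objects.
person = {"id": "p1", "skills": [{"skill": {"name": "Python"}}]}
shallow = person["skills"].copy()
shallow[0]["personId"] = person["id"]
print("personId" in person["skills"][0])  # True: the original dict was mutated

# Deep copy: independent dicts, so the original stays untouched.
other = {"id": "p2", "skills": [{"skill": {"name": "AWS"}}]}
deep = copy.deepcopy(other["skills"])
deep[0]["personId"] = other["id"]
print("personId" in other["skills"][0])  # False
```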

In [9]:
for chunk in chunks(skills):
    records = driver.execute_query(
        """
        UNWIND $records AS rec
        MATCH(person:Person {id:rec.personId})
        MERGE(skill:Skill {name:rec.skill.name})
        MERGE(person)-[r:KNOWS]->(skill)
        SET r.proficiency = rec.proficiency,
            r.years_experience = rec.years_experience,
            r.context  = rec.context,
            r.is_primary = rec.is_primary
        RETURN count(rec) AS records_upserted
        """,
        #database_=DATABASE,
        routing_=RoutingControl.WRITE,
        result_transformer_= lambda r: r.data(),
        records = chunk
    )
    print(records)

[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 2}]


In [10]:
for chunk in chunks(accomplishments):
    records = driver.execute_query(
        """
        UNWIND $records AS rec

        //match people
        MATCH(person:Person {id:rec.personId})

        //merge accomplishments
        MERGE(thing:Thing {name:rec.thing.name})
        MERGE(person)-[r:$(rec.type)]->(thing)
        SET r.impact_description = rec.impact_description,
            r.year = rec.year,
            r.role  = rec.role,
            r.duration = rec.duration,
            r.team_size = rec.team_size,
            r.context  = rec.context

        //merge domain and work type
        MERGE(Domain:Domain {name:rec.thing.domain})
        MERGE(thing)-[:IN]->(Domain)
        MERGE(WorkType:WorkType {name:rec.thing.type})
        MERGE(thing)-[:OF]->(WorkType)

        RETURN count(rec) AS records_upserted
        """,
        #database_=DATABASE,
        routing_=RoutingControl.WRITE,
        result_transformer_= lambda r: r.data(),
        records = chunk
    )
    print(records)

[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 1}]
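
The accomplishments load above takes the relationship type from the data via `$(rec.type)`. If dynamic relationship types are not available on a given server, or if the type values should be checked before they reach the database, one option is to group records by type in Python and validate each type against an allow-list. This is a sketch under those assumptions: `ALLOWED_TYPES` is copied from the accomplishment relationship types reported in the schema output in this notebook, `group_by_type` is a hypothetical helper, and the third sample record is invented.

```python
from collections import defaultdict

# Accomplishment relationship types reported in the graph schema.
ALLOWED_TYPES = {"BUILT", "WON", "SHIPPED", "PUBLISHED", "OPTIMIZED", "LED", "MANAGED"}


def group_by_type(records):
    """Group accomplishment records by type, rejecting unknown types."""
    grouped = defaultdict(list)
    for rec in records:
        rel_type = rec["type"]
        if rel_type not in ALLOWED_TYPES:
            raise ValueError(f"unexpected accomplishment type: {rel_type!r}")
        grouped[rel_type].append(rec)
    return dict(grouped)


sample = [
    {"type": "BUILT", "thing": {"name": "security_monitoring_system_MhzMrjwz"}},
    {"type": "BUILT", "thing": {"name": "zero_trust_authentication_system_MhzMrjwz"}},
    {"type": "LED", "thing": {"name": "example_team"}},  # invented record
]
grouped = group_by_type(sample)
print({rel_type: len(recs) for rel_type, recs in grouped.items()})
```

Each group could then be loaded with a query whose relationship type is a fixed literal for that group, since the type has already been validated.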


### Result

![](img/graph-sample-after-ner.png)

## Agent With Text Cypher Retrieval Via MCP

In [11]:
from AgentRunner import AgentRunner
# build adk agent with neo4j mcp
from person import Domain, WorkType, SkillName
from google.adk.models.lite_llm import LiteLlm
from google.adk.agents import Agent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StdioServerParameters

database_agent = Agent(
    name="graph_database_agent",
    # model="gemini-2.0-flash-exp",
    model=LiteLlm(model="openai/gpt-4.1"),
    # model=LiteLlm(model="anthropic/claude-sonnet-4-20250514"),
    description="""
    Agent to access knowledge graph stored in graph database
    """,
    instruction=f"""
    You are a human resources assistant who helps with skills analysis, talent search, and team formation at Cyberdyne Systems.

    You can access the knowledge (in a graph database) on Cyberdyne Systems employees based on their resumes and profiles directly. ALWAYS get the schema first with `get_neo4j_schema` and keep it in memory. Only use node labels, relationship types, property names, and patterns in that schema to generate valid Cypher queries using the `read_neo4j_cypher` tool with proper parameter syntax ($parameter). If you get errors or empty results, check the schema and try again, up to 3 times.

    For domain knowledge, use these standard values:
    - Domains: {[i.value for i in Domain]}
    - Work Types: {[i.value for i in WorkType]}
    - Skills: {[i.value for i in SkillName]}

    Also never return embedding properties in Cypher queries. This will result in delays and errors.

    When responding to the user:
    - If your response includes people, include their names and IDs. Never just their IDs.
    - You must explain your retrieval logic and where the data came from. You must say exactly how relevance, similarity, etc. was inferred during search.

    Use information from previous queries when possible instead of asking the user again.
    """,
    tools=[MCPToolset(
        connection_params=StdioServerParameters(
            command='uvx',
            args=[
                "mcp-neo4j-cypher",
            ],
            env={ k: os.environ[k] for k in ["NEO4J_URI","NEO4J_USERNAME","NEO4J_PASSWORD"] }
        ),
        tool_filter=['get_neo4j_schema','read_neo4j_cypher']
    )]
)

db_agent_runner = AgentRunner(app_name='db_agent', user_id='Mr. Ed', agent=database_agent)
await db_agent_runner.start_session()

Session started successfully with ID: 7f58e263-8b3e-4142-8428-d4e7ea40b7ee


True

### How many Python developers

In [12]:
res = await db_agent_runner.run("How many Python developers do I have?")

None id='call_lsQ8WF1Dm79JVfRDPBwfbLw9' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_lsQ8WF1Dm79JVfRDPBwfbLw9' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [13]:
from IPython.display import Markdown, display

display(Markdown(res))

There are 28 Python developers at Cyberdyne Systems.

Retrieval logic: I queried employees who have a direct relationship (KNOWS) to the skill "Python" in the company's knowledge graph. This means these individuals have Python explicitly listed in their skills, ensuring relevance and accuracy from their profile data.

### Who is most similar to Lucas Martinez

In [14]:
res = await db_agent_runner.run("Who is most similar to Lucas Martinez and why?")

None id='call_9Brk00YRccBVAVIJnnjI1QDR' args={'query': 'MATCH (p:Person {name: "Lucas Martinez"})-[:KNOWS]->(s:Skill)\nWITH p, collect(s.name) as lucas_skills\nMATCH (other:Person)-[:KNOWS]->(otherSkill:Skill)\nWHERE other.name <> "Lucas Martinez"\nWITH other, lucas_skills, collect(otherSkill.name) as other_skills\nWITH other, apoc.coll.intersection(lucas_skills, other_skills) as shared_skills, lucas_skills, other_skills\nWITH other, size(shared_skills) as num_shared, size(lucas_skills) as lucas_count, size(other_skills) as other_count, shared_skills\nORDER BY num_shared DESC, other.name ASC\nRETURN other.name AS similar_person_name, other.id AS similar_person_id, num_shared, lucas_count, other_count, shared_skills LIMIT 1'} name='read_neo4j_cypher' None
None None will_continue=None scheduling=None id='call_9Brk00YRccBVAVIJnnjI1QDR' name='read_neo4j_cypher' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"similar_person_name": "Sarah Chen", "simi

In [15]:
display(Markdown(res))

The most similar person to Lucas Martinez is Sarah Chen (ID: xRPBlhk9).

Reasoning and explanation:
- I searched for people in the knowledge graph who are not Lucas Martinez and compared their skill sets.
- I calculated similarity based on the number of overlapping skills (using set intersection).
- Sarah Chen has the highest number of shared skills with Lucas Martinez—6 in total: Python, Natural Language Processing, Docker, Machine Learning, AWS, and Computer Vision.
- Lucas has 11 skills, Sarah has 9, and they overlap on 6, making the overlap significant.

This inference is directly from the database by finding whose skills most closely match Lucas's, measuring similarity by the count of shared skills.
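
The agent's query ranks people by the raw number of shared skills, which tends to favor people who simply list more skills. A common alternative is to normalize the overlap, for example with the Jaccard index. The sketch below is illustrative rather than what the agent ran: the six shared skills are the ones named in the response, while the remaining skills are placeholders chosen only to reproduce the reported totals of 11 and 9.

```python
def overlap_and_jaccard(a: set, b: set):
    """Return (shared count, Jaccard index) for two skill sets."""
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


shared_skills = {
    "Python", "Natural Language Processing", "Docker",
    "Machine Learning", "AWS", "Computer Vision",
}
# Placeholder names stand in for the skills that are not shared.
lucas = shared_skills | {f"lucas_only_{i}" for i in range(5)}  # 11 skills
sarah = shared_skills | {f"sarah_only_{i}" for i in range(3)}  # 9 skills

count, jaccard = overlap_and_jaccard(lucas, sarah)
print(count, round(jaccard, 3))  # 6 0.429
```

With 6 shared skills out of 14 distinct skills between them, the normalized score is 6/14, roughly 0.43.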

### Summarize my technical talent and skills distribution

In [16]:
await db_agent_runner.restart_session()

Session 7f58e263-8b3e-4142-8428-d4e7ea40b7ee ended successfully
Session started successfully with ID: 408e68a0-0af4-4062-b719-3429e34616ff


In [17]:
res = await db_agent_runner.run("Summarize my technical talent and skills distribution?")

None id='call_jHxuxwo0d8gyzqfIwO2X7nYV' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_jHxuxwo0d8gyzqfIwO2X7nYV' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [18]:
display(Markdown(res))

Summary of Technical Talent and Skills Distribution

I analyzed the skills distribution among all technical employees at Cyberdyne Systems by counting the number of people who "KNOW" each skill. This was found in the knowledge graph, which directly connects Person nodes with their respective Skill nodes.

Main findings:
- The most common skills are programming languages and cloud infrastructure: Python (28 people), SQL (16), AWS (15).
- Leadership and collaborative skills are also highly present: Team Management (14), Project Management (11), Leadership (6).
- Data and AI skills are well-represented: Machine Learning (11), Data Engineering (8), Computer Vision (7), Deep Learning (6), Natural Language Processing (6), TensorFlow (6).
- DevOps skills like Docker (10) and Kubernetes (9) are common, indicating readiness for modern deployment practices.
- Frontend and backend technologies such as Java (6), JavaScript (6), React (5), Go (4), and Node.js (2) show full-stack capabilities.

The data came from direct relationship counts in the graph database, reflecting how many employees have indicated each skill in their profiles. This provides a high-quality, accurate measure of talent concentration in each technical area, based on explicit skill declarations.

If you'd like a breakdown by department, level, or experience range, let me know!
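
Counts like the ones the agent reports can be cross-checked in Python against the flattened `skills` list built during graph construction. The sketch below assumes rows shaped like that list; the first three rows are taken from the sample shown earlier and the fourth is an invented extra row so that one skill has more than one person. `skill_distribution` is a hypothetical helper.

```python
from collections import Counter

skills = [
    {"skill": {"name": "Python"}, "personId": "MhzMrjwz"},
    {"skill": {"name": "AWS"}, "personId": "MhzMrjwz"},
    {"skill": {"name": "Swift"}, "personId": "5BiANRmk"},
    {"skill": {"name": "Python"}, "personId": "5BiANRmk"},  # invented row
]


def skill_distribution(skill_rows):
    """Count distinct people per skill name."""
    # De-duplicate (person, skill) pairs so repeated rows are not double counted.
    pairs = {(row["personId"], row["skill"]["name"]) for row in skill_rows}
    return Counter(name for _person_id, name in pairs)


dist = skill_distribution(skills)
print(dist.most_common(1))  # [('Python', 2)]
```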

In [19]:
await db_agent_runner.end_session()

Session 408e68a0-0af4-4062-b719-3429e34616ff ended successfully


True

## Adding Expert Tools

### Tool - Finding Similar People
![](img/similarity-score-via-graph-traversal.png)

In [20]:
def find_similar_people(person_id: str):
    """
    Returns people who are potentially similar to the given person, based on shared skills and on the types and domains of their accomplishments. Use this as a starting point for similarity scores, then use follow-up tools and queries to collect more information.
    :param person_id: the id of the person to find similar people for
    :return: a list of person ids for similar candidates, ordered by score, where the score is the count of shared skills and accomplishment types and domains.
    """
    # Note: the pattern does not exclude p1, so the input person can appear in their own results.
    res = driver.execute_query(
        '''
        MATCH p=(p1:Person {id:$personId})--()
                 ((:!Person)--() ){0,3}
                 (p2:Person)
        RETURN count(*) AS score, p2.id AS person_id
        ORDER BY score DESC LIMIT $limit
         ''',
        personId=person_id,
        limit=5, #just hard code for now
        result_transformer_= lambda r: r.data())

    return res


find_similar_people("3ffr8dYb")

[{'score': 77, 'person_id': 'eOIAxtcB'},
 {'score': 54, 'person_id': '3ffr8dYb'},
 {'score': 37, 'person_id': '8wvf1psS'},
 {'score': 31, 'person_id': 'LUUCJ14S'},
 {'score': 31, 'person_id': 'Q1ZkhCBu'}]
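
To make the score concrete, the traversal can be imitated on a toy graph in plain Python. The sketch below is an approximation of the query's matching behavior, not the Cypher engine itself: it assumes a path may use each relationship at most once, may pass only through non-Person nodes, and ends when it reaches a Person within four relationships. The graph, names, and `path_scores` helper are all invented for illustration.

```python
from collections import defaultdict


def path_scores(edges, labels, start, max_edges=4):
    """Count paths from `start` to each Person via non-Person nodes only."""
    adjacency = defaultdict(list)
    for edge_id, (a, b) in enumerate(edges):
        adjacency[a].append((edge_id, b))
        adjacency[b].append((edge_id, a))

    scores = defaultdict(int)

    def walk(node, used_edges, length):
        for edge_id, neighbor in adjacency[node]:
            if edge_id in used_edges:
                continue  # each relationship is used at most once per path
            if labels[neighbor] == "Person":
                scores[neighbor] += 1  # a path ends when it reaches a Person
            elif length + 1 < max_edges:
                walk(neighbor, used_edges | {edge_id}, length + 1)

    walk(start, frozenset(), 0)
    return dict(scores)


labels = {
    "alice": "Person", "bob": "Person",
    "Python": "Skill", "SQL": "Skill",
    "thing_a": "Thing", "thing_a2": "Thing", "thing_b": "Thing",
    "AI": "Domain",
}
edges = [
    ("alice", "Python"), ("bob", "Python"),
    ("alice", "SQL"), ("bob", "SQL"),
    ("alice", "thing_a"), ("alice", "thing_a2"), ("bob", "thing_b"),
    ("thing_a", "AI"), ("thing_a2", "AI"), ("thing_b", "AI"),
]

scores = path_scores(edges, labels, "alice")
print(scores)  # {'bob': 4, 'alice': 2}
```

Here bob scores 4: two shared skills plus one path through the shared domain for each of alice's two things. alice also reaches herself twice, by going out through one of her things and back through the other, which is consistent with the input person appearing in their own results in the output above.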

### Tool - Find Similarities Between Two People

![](img/similarity-between-people.png)

In [21]:
def find_similarities_between_people(person1_id: str, person2_id: str):
    """
    Returns potential similarities between two people in the form of skill and accomplishment paths. Use this as a starting point to find similarities, then query the graph further using the various name fields and person ids.
    :param person1_id: the id of the first person to compare
    :param person2_id: the id of the second person to compare
    :return: a list of paths between the two people. Each path is a compact ASCII string representation that reflects the patterns in the graph schema.
    """
    res = driver.execute_query(
        '''
        MATCH p=(p1:Person {id:$person1_id})--()
                 ((:!Person)--() ){0,3}
                 (p2:Person{id:$person2_id})
        WITH p, nodes(p) as path_nodes, relationships(p) as path_rels, p1, p2
        RETURN
          "(" + labels(path_nodes[0])[0] + " {name: \\"" + path_nodes[0].name + "\\" id: \\"" + path_nodes[0].id + "\\"})" +
          reduce(chain = "", i IN range(0, size(path_rels)-1) |
            chain +
            "-[" + type(path_rels[i]) + "]-" +
            "(" + labels(path_nodes[i+1])[0] + " {name: \\"" + path_nodes[i+1].name +
            CASE WHEN "Person" IN labels(path_nodes[i+1])
                 THEN "\\" id: \\"" + path_nodes[i+1].id +"\\""
                 ELSE "\\"" END + "})"
          ) as paths ORDER BY p1.id, p2.id
         ''',
        person1_id=person1_id,
        person2_id=person2_id,
        result_transformer_= lambda r: r.values())

    return res


find_similarities_between_people("3ffr8dYb", "5RGDw14z")


[['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Python"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Java"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Data Engineering"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Team Management"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[OPTIMIZED]-(Thing {name: "trading_db_systems_3ffr8dYb"})-[OF]-(WorkType {name: "SYSTEM"})-[OF]-(Thing {name: "disaster_recovery_system_5RGDw14z"})-[BUILT]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[BUILT]-(Thing {name: "db_monitoring_system_3ffr8dYb"})-[OF]-(WorkType {name: "SYSTEM"})-[OF]-(Thing {name: "disaster_recovery_
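
The string format produced by the `reduce` expression in the query can be easier to follow as ordinary Python. The sketch below rebuilds the same compact representation from a list of nodes and relationship types; `format_path` and the dictionary shape of the nodes are assumptions made for illustration, not part of the notebook.

```python
def format_path(nodes, rel_types):
    """Render a path as a compact ASCII string; only Person nodes show an id."""

    def fmt(node):
        text = f'({node["label"]} {{name: "{node["name"]}"'
        if node["label"] == "Person":
            text += f' id: "{node["id"]}"'
        return text + "})"

    out = fmt(nodes[0])
    # Relationships are rendered without direction, as in the query.
    for rel_type, node in zip(rel_types, nodes[1:]):
        out += f"-[{rel_type}]-" + fmt(node)
    return out


path = format_path(
    [
        {"label": "Person", "name": "Kai Wong", "id": "3ffr8dYb"},
        {"label": "Skill", "name": "Python"},
        {"label": "Person", "name": "Benjamin Clark", "id": "5RGDw14z"},
    ],
    ["KNOWS", "KNOWS"],
)
print(path)
```

This prints the first path from the output above. Keeping paths as short strings gives the agent the graph pattern and the key names without sending full node properties.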

In [22]:
def get_person_resume(person_id: str):
    """
    Gets the full resume of a person
    :param person_id: the id of the person
    :return: resume text and person name
    """
    res = driver.execute_query(
        '''
        MATCH (n:Person {id: $personId})
        RETURN n.text as resume, n.name AS name
         ''',
        personId=person_id,
        result_transformer_= lambda r: r.data())

    return res

get_person_resume("3ffr8dYb")

[{'resume': 'Kai Wong\nDatabase Performance Engineer\nEmail: kai.wong@email.com\nLocation: Hong Kong\nExperience: 7 years\nProfessional Summary\nDatabase performance specialist with 7 years experience optimizing high-scale database systems.\nExpert in SQL optimization, distributed databases, and Python automation.\nProfessional Experience\nSenior Database Performance Engineer | Financial Trading Platform | 2021 - Present\n- Optimized trading database systems handling 1M+ transactions per second using advanced SQL\ntechniques\n- Built database monitoring system using Python detecting performance issues before customer impact\n- Led database engineering team of 5 optimizing distributed PostgreSQL clusters\nDatabase Engineer | E-commerce Platform | 2019 - 2021\n- Implemented database sharding strategy supporting 100x user growth using PostgreSQL and Python\n- Developed automated database backup and recovery system achieving 99.99% data durability\n- Built query optimization framework redu

In [23]:
def get_person_name(person_id: str):
    """
    Gets a person name given their id
    :param person_id: the unique id of the person
    :return: person name
    """
    res = driver.execute_query(
        '''
        MATCH (n:Person {id: $personId})
        RETURN n.name
         ''',
        personId=person_id,
        result_transformer_= lambda r: r.values())

    return res

get_person_name("3ffr8dYb")

[['Kai Wong']]

In [24]:
def get_person_ids_from_name(person_name: str):
    """
    Gets the ids of all people who have the provided name
    :param person_name: the name to look up person ids with
    :return: person ids that can be used for other tools and queries.  Note that names aren't guaranteed to be unique so you may get more than one person.
    """
    res = driver.execute_query(
        '''
        MATCH (n:Person {name: $personName})
        RETURN n.id
         ''',
        personName=person_name,
        result_transformer_= lambda r: r.values())

    return res

get_person_ids_from_name("Kai Wong")

[['3ffr8dYb']]

In [25]:
talent_agent = Agent(
    name="talent_agent",
    # model="gemini-2.0-flash-exp",
    model=LiteLlm(model="openai/gpt-4.1"),
    # model=LiteLlm(model="anthropic/claude-sonnet-4-20250514"),
    description="""
    Knowledge assistant for skills analysis, search, and team formation
    """,
    instruction=f"""
    You are a human resources assistant who helps with skills analysis, talent search, and team formation at Cyberdyne Systems.

    Your tools retrieve data from internal knowledge on Cyberdyne Systems employees based on their resumes and profiles.

    Prioritize expert tools (those other than `read_neo4j_cypher`) where appropriate, since they have expert-approved logic for accessing data. You may still need to access data directly afterwards to pull more details.


    When you need more flexible logic for aggregations, follow-ups, or anything else, you can access the knowledge (in a graph database) directly. ALWAYS get the schema first with `get_neo4j_schema` and keep it in memory. Only use node labels, relationship types, property names, and patterns in that schema to generate valid Cypher queries using the `read_neo4j_cypher` tool with proper parameter syntax ($parameter). If you get errors or empty results, check the schema and try again, up to 3 times.

    For domain knowledge, use these standard values:
    - Domains: {[i.value for i in Domain]}
    - Work Types: {[i.value for i in WorkType]}
    - Skills: {[i.value for i in SkillName]}

    Also never return embedding properties in Cypher queries. This will result in delays and errors.

    When responding to the user:
    - If your response includes people, include their names and IDs. Never just their IDs.
    - You must explain your retrieval logic and where the data came from. You must say exactly how relevance, similarity, etc. was inferred during search.

    Use information from previous queries when possible instead of asking the user again.
    """,
    tools=[find_similar_people,
           find_similarities_between_people,
           get_person_name,
           get_person_resume,
           get_person_ids_from_name,
           MCPToolset(
        connection_params=StdioServerParameters(
            command='uvx',
            args=[
                "mcp-neo4j-cypher",
            ],
            env={ k: os.environ[k] for k in ["NEO4J_URI","NEO4J_USERNAME","NEO4J_PASSWORD"] }
        ),
        tool_filter=['get_neo4j_schema','read_neo4j_cypher']
    )]
)

talent_agent_runner = AgentRunner(app_name='talent_agent', user_id='Mr. Ed', agent=talent_agent)
await talent_agent_runner.start_session()

Session started successfully with ID: fbfd1558-1534-4b48-a57a-3ddc2f82ba85


True

### How many Python developers

In [26]:
res = await talent_agent_runner.run("How many Python developers do I have?")

None id='call_p5gWePtmlr2uIhS8lLlK3TiC' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_p5gWePtmlr2uIhS8lLlK3TiC' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [27]:
display(Markdown(res))

There are 28 Python developers at Cyberdyne Systems.

Retrieval logic:
- I queried the internal graph database for all people (Person nodes) with a relationship (KNOWS) to the Python skill (Skill node with name "Python"). 
- Counting these distinct people returns the number of Python developers.
- Data is based on explicit skill relationships in employee profiles.

### Who is most similar to Lucas Martinez

In [28]:
res = await talent_agent_runner.run("Who is most similar to Lucas Martinez and why?")

None id='call_mqN1jYk0jHT8uZJt3myX2w9j' args={'person_name': 'Lucas Martinez'} name='get_person_ids_from_name' None
None None will_continue=None scheduling=None id='call_mqN1jYk0jHT8uZJt3myX2w9j' name='get_person_ids_from_name' response={'result': [['HowfM0O2']]}
None id='call_w2VwSGukUnoS5sTzfSop0eAs' args={'person_id': 'HowfM0O2'} name='find_similar_people' None
None None will_continue=None scheduling=None id='call_w2VwSGukUnoS5sTzfSop0eAs' name='find_similar_people' response={'result': [{'score': 72, 'person_id': 'LUUCJ14S'}, {'score': 66, 'person_id': 'MpQCrNqA'}, {'score': 61, 'person_id': '8hvI9MCT'}, {'score': 58, 'person_id': 'Yvhy6A21'}, {'score': 55, 'person_id': 'Y7Dbiku6'}]}
None id='call_7zurgIGneu397v87dgi6C0Eo' args={'person_id': 'LUUCJ14S'} name='get_person_name' None
None id='call_vrG4TVKna6ave79BaCfEKMyN' args={'person1_id': 'HowfM0O2', 'person2_id': 'LUUCJ14S'} name='find_similarities_between_people' None
None None will_continue=None scheduling=None id='call_7zurgIGn

In [29]:
display(Markdown(res))

The person most similar to Lucas Martinez is Elena Popov (ID: LUUCJ14S).

Reason for similarity:
- Both Lucas Martinez and Elena Popov know several key skills: Python, AWS, Machine Learning, and Computer Vision.
- They have accomplishments in highly aligned domains, especially AI.
- Both have built, shipped, or published products and systems in the AI domain, such as personalized learning platforms, medical image analysis webapps, adaptive algorithms, and reinforcement learning and computer vision projects.
- Their project types are similar, involving PRODUCT and SYSTEM work types, and even research and published work.

How this was determined:
- I used an expert similarity tool that ranks people based on counts of common skills, accomplishment domains, and work types.
- Elena Popov had the highest similarity score to Lucas Martinez.
- I further confirmed this similarity by inspecting direct relationship paths, showing strong overlaps in skills and the nature of their projects within AI and Machine Learning.

All inference is based on the skills and accomplishment relationships mapped in the internal knowledge graph.

### Summarize my technical talent and skills distribution

In [30]:
await talent_agent_runner.restart_session()

Session fbfd1558-1534-4b48-a57a-3ddc2f82ba85 ended successfully
Session started successfully with ID: b352afb9-6998-4cd1-b5f1-abe409dc87ed


In [31]:
res = await talent_agent_runner.run("Summarize Cyberdyne's technical talent and skills distribution")

None id='call_sWYE2hPDLWhljPvkRT9HJgdM' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_sWYE2hPDLWhljPvkRT9HJgdM' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [32]:
display(Markdown(res))

Here is a summary of Cyberdyne Systems' technical talent and skills distribution, based on an analysis of internal employee skill and domain data:

1. Most Popular Technical Skills:
   - The largest number of employees have proficiency in Python (28 people), followed by SQL (16), AWS (15), and Team Management (14).
   - Other notable in-demand skills include Machine Learning and Project Management (11 each), Docker (10), and Kubernetes (9).
   - Artificial Intelligence (AI) skills are widely distributed, with strong representation in Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing.
   - Additional skills with a moderate presence include Data Engineering, Data Analysis, TensorFlow, PyTorch, Java, JavaScript, Leadership, React, Statistics, and Cloud-related technologies (Google Cloud Platform, AWS, Azure).

2. Distribution Across Technical Domains:
   - Domains with the most associated activities and accomplishments (such as delivered projects or recognized work) are:
     - AI (84 activities)
     - Analytics (24)
     - Data Engineering (21)
     - Security (20)
     - Database and Web (18 each)
     - Cloud (14), DevOps (13), Mobile (11), Microservices (10), Platform (8)

3. Breadth of Employee Skillsets:
   - The majority of employees possess between 6 and 11 technical skills (with a few having as many as 11 skill tags).
   - This suggests a workforce with both depth (some with extensive expertise) and versatility (many with broad, cross-domain capabilities).

Retrieval logic and explanation:
- The skills distribution and popularity were gathered by counting the number of employees connected to each skill in the database (via the Person-KNOWS-Skill relationship).
- Domain counts represent the number of technical accomplishments or activities (Thing nodes) linked to each top-level domain.
- The breadth of skills per employee was determined by tallying the number of unique skills each person knows.

This analysis provides a clear view of both the most common technical specializations and the organizational capacity for cross-functional work. If you'd like further breakdowns (by department, by trends over time, or for a specific subset), let me know!

In [33]:
await talent_agent_runner.end_session()

Session b352afb9-6998-4cd1-b5f1-abe409dc87ed ended successfully


True