# Graph Construction & Retrieval
This notebook demonstrates how building a simple starter knowledge graph from documents can support agent retrieval tools and improve their responses.

Document extraction is handled separately in [extract-resumes-to-people.py](extract-resumes-to-people.py), which stages structured people, with their skills and accomplishments, in the [extracted-people-data.json](extracted-people-data.json) file. This is done ahead of time for the workshop to avoid throttling of OpenAI requests.

We are targeting the schema below, which surfaces important relationships symbolically for our agent use cases.

![](img/graph-data-model.png)



In [1]:
#get env setup
import getpass
import os
from dotenv import load_dotenv

#load variables from nb.env, overriding any that are already set
load_dotenv('nb.env', override=True)

if not os.environ.get('NEO4J_URI'):
    os.environ['NEO4J_URI'] = getpass.getpass('NEO4J_URI:\n')
if not os.environ.get('NEO4J_USERNAME'):
    os.environ['NEO4J_USERNAME'] = getpass.getpass('NEO4J_USERNAME:\n')
if not os.environ.get('NEO4J_PASSWORD'):
    os.environ['NEO4J_PASSWORD'] = getpass.getpass('NEO4J_PASSWORD:\n')

NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')

## Graph Construction

In [2]:
import json
from person import Person, SkillName

#read json models back
with open('extracted-people-data.json', 'r') as file:
    people_json = json.load(file)
people = [Person(**person) for person in people_json]
people[0]

Person(id='MhzMrjwz', name='Robert Johnson', email='robert.johnson@email.com', current_title='Security Engineer', department=<Department.ENGINEERING: 'Engineering'>, level=<Level.SENIOR: 'Senior'>, hire_date=None, skills=[HasSkill(skill=Skill(name=<SkillName.PYTHON: 'Python'>), proficiency=4, years_experience=4, context='Programming for security automation and scripting', is_primary=False), HasSkill(skill=Skill(name=<SkillName.AWS: 'AWS'>), proficiency=3, years_experience=3, context='Cloud security architecture and compliance', is_primary=False)], accomplishments=[Accomplishment(type=<AccomplishmentType.BUILT: 'BUILT'>, thing=Thing(name='security_monitoring_system_MhzMrjwz', type=<WorkType.SYSTEM: 'SYSTEM'>, domain=<Domain.SECURITY: 'SECURITY'>), impact_description='Detected and prevented 95% of attempted cyber attacks', year=2022, role='Senior Security Engineer', duration=None, team_size=None, context='CyberDefense Corp'), Accomplishment(type=<AccomplishmentType.BUILT: 'BUILT'>, thing

In [3]:
from neo4j import GraphDatabase

# load into People nodes in Neo4j

#instantiate driver
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

#test neo4j connection
driver.execute_query("MATCH(n) RETURN count(n)")

EagerResult(records=[<Record count(n)=327>], summary=<neo4j._work.summary.ResultSummary object at 0x1340755d0>, keys=['count(n)'])

In [4]:
from neo4j import RoutingControl

#create node key constraints (unique and non-null keys) if they do not exist
driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Person) REQUIRE (n.id) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Skill) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Thing) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:Domain) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)

driver.execute_query(
    'CREATE CONSTRAINT IF NOT EXISTS FOR (n:WorkType) REQUIRE (n.name) IS NODE KEY',
    #database_=DATABASE,
    routing_=RoutingControl.WRITE
)


EagerResult(records=[], summary=<neo4j._work.summary.ResultSummary object at 0x1340984d0>, keys=[])
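
The five constraint calls above differ only in the node label and key property, so the statements could be generated from a mapping. The following is a minimal sketch under that assumption; `NODE_KEYS` and `node_key_statements` are hypothetical names, and the code only builds the Cypher strings, so it runs without a database.

```python
# Hypothetical helper: builds the same node key statements as the cell above.
NODE_KEYS = {
    "Person": "id",
    "Skill": "name",
    "Thing": "name",
    "Domain": "name",
    "WorkType": "name",
}

def node_key_statements(node_keys):
    """Return one CREATE CONSTRAINT statement per (label, property) pair."""
    return [
        f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) REQUIRE (n.{prop}) IS NODE KEY"
        for label, prop in node_keys.items()
    ]

statements = node_key_statements(NODE_KEYS)
for statement in statements:
    print(statement)
```

Each statement could then be passed to `driver.execute_query(statement, routing_=RoutingControl.WRITE)` in a loop, as in the cell above.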

In [5]:
# merge people
def chunks(xs, n=10):
    n = max(1, n)
    return [xs[i:i + n] for i in range(0, len(xs), n)]

for chunk in chunks(people_json):
    records = driver.execute_query(
        """
        UNWIND $records AS rec
        MERGE(person:Person {id:rec.id})
        SET person.name = rec.name,
            person.email = rec.email,
            person.current_title = rec.current_title,
            person.department = rec.department,
            person.level = rec.level,
            person.years_experience = rec.years_experience,
            person.location = rec.location
        RETURN count(rec) AS records_upserted
        """,
        #database_=DATABASE,
        routing_=RoutingControl.WRITE,
        result_transformer_= lambda r: r.data(),
        records = chunk
    )
    print(records)

[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
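
The `chunks` helper slices a list into batches of ten, which is why three batches are reported above. As a hedged alternative, the sketch below batches any iterable lazily; `iter_chunks` is a hypothetical name, and the range of 30 items simply mirrors the three batches of ten printed above.

```python
from itertools import islice

def iter_chunks(items, size=10):
    """Yield successive lists of at most `size` items from any iterable."""
    size = max(1, size)
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# 30 stand-in records produce three full batches.
batch_sizes = [len(batch) for batch in iter_chunks(range(30), size=10)]
print(batch_sizes)
```

For a list that already fits in memory, such as `people_json`, the slicing version in the cell above is just as suitable.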


In [6]:
# flatten skills and accomplishments, tagging each record with its person id
skills = []
accomplishments = []
for person in people_json:

    # extend skills list
    tmp_skills = person['skills'].copy()
    for skill in tmp_skills:
        skill['personId'] = person['id']
    skills.extend(tmp_skills)

    # extend accomplishments list
    tmp_accomplishments = person['accomplishments'].copy()
    for accomplishment in tmp_accomplishments:
        accomplishment['personId'] = person['id']
    accomplishments.extend(tmp_accomplishments)
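
Note that `list.copy()` is a shallow copy, so the loop above also adds `personId` to the dictionaries inside `people_json`. That is harmless here, but a non-mutating variant may be preferable if the source records are reused. The sketch below assumes a hypothetical helper `flatten_with_person_id` and a small hand-made sample shaped like the records in this notebook.

```python
def flatten_with_person_id(people, key):
    """Collect the nested `key` records of every person, tagging each with `personId`.

    A new dict is built per record, so the input structures are left untouched.
    """
    return [
        {**item, "personId": person["id"]}
        for person in people
        for item in person[key]
    ]

# Hand-made sample data; the ids and values are placeholders.
sample_people = [
    {"id": "p1", "skills": [{"skill": {"name": "Python"}, "proficiency": 4}]},
    {"id": "p2", "skills": [{"skill": {"name": "AWS"}, "proficiency": 3},
                            {"skill": {"name": "Swift"}, "proficiency": 2}]},
]

flat_skills = flatten_with_person_id(sample_people, "skills")
print(flat_skills)
```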



In [7]:
skills[:3]

[{'skill': {'name': 'Python'},
  'proficiency': 4,
  'years_experience': 4,
  'context': 'Programming for security automation and scripting',
  'is_primary': False,
  'personId': 'MhzMrjwz'},
 {'skill': {'name': 'AWS'},
  'proficiency': 3,
  'years_experience': 3,
  'context': 'Cloud security architecture and compliance',
  'is_primary': False,
  'personId': 'MhzMrjwz'},
 {'skill': {'name': 'Swift'},
  'proficiency': 2,
  'years_experience': 1,
  'context': 'Used for developing banking mobile app and iOS applications during internship and bootcamp.',
  'is_primary': True,
  'personId': '5BiANRmk'}]

In [8]:
accomplishments[:2]

[{'type': 'BUILT',
  'thing': {'name': 'security_monitoring_system_MhzMrjwz',
   'type': 'SYSTEM',
   'domain': 'SECURITY'},
  'impact_description': 'Detected and prevented 95% of attempted cyber attacks',
  'year': 2022,
  'role': 'Senior Security Engineer',
  'duration': None,
  'team_size': None,
  'context': 'CyberDefense Corp',
  'personId': 'MhzMrjwz'},
 {'type': 'BUILT',
  'thing': {'name': 'zero_trust_authentication_system_MhzMrjwz',
   'type': 'SYSTEM',
   'domain': 'SECURITY'},
  'impact_description': 'Implemented for 10,000+ employees using modern identity protocols',
  'year': 2022,
  'role': 'Senior Security Engineer',
  'duration': None,
  'team_size': None,
  'context': 'CyberDefense Corp',
  'personId': 'MhzMrjwz'}]

In [9]:
for chunk in chunks(skills):
    records = driver.execute_query(
        """
        UNWIND $records AS rec
        MATCH(person:Person {id:rec.personId})
        MERGE(skill:Skill {name:rec.skill.name})
        MERGE(person)-[r:KNOWS]->(skill)
        SET r.proficiency = rec.proficiency,
            r.years_experience = rec.years_experience,
            r.context  = rec.context,
            r.is_primary = rec.is_primary
        RETURN count(rec) AS records_upserted
        """,
        #database_=DATABASE,
        routing_=RoutingControl.WRITE,
        result_transformer_= lambda r: r.data(),
        records = chunk
    )
    print(records)

[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 2}]


In [10]:
for chunk in chunks(accomplishments):
    records = driver.execute_query(
        """
        UNWIND $records AS rec

        //match people
        MATCH(person:Person {id:rec.personId})

        //merge accomplishments
        MERGE(thing:Thing {name:rec.thing.name})
        MERGE(person)-[r:$(rec.type)]->(thing)
        SET r.impact_description = rec.impact_description,
            r.year = rec.year,
            r.role  = rec.role,
            r.duration = rec.duration,
            r.team_size = rec.team_size,
            r.context  = rec.context

        //merge domain and work type
        MERGE(Domain:Domain {name:rec.thing.domain})
        MERGE(thing)-[:IN]->(Domain)
        MERGE(WorkType:WorkType {name:rec.thing.type})
        MERGE(thing)-[:OF]->(WorkType)

        RETURN count(rec) AS records_upserted
        """,
        #database_=DATABASE,
        routing_=RoutingControl.WRITE,
        result_transformer_= lambda r: r.data(),
        records = chunk
    )
    print(records)

[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 10}]
[{'records_upserted': 1}]
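
The `$(rec.type)` form in the query above sets the relationship type dynamically from the data, which we believe is only supported on recent Neo4j versions. Where that syntax is unavailable, one possible workaround is to group the records by type in Python and write a validated type into the query text. The sketch below illustrates the idea; `group_by_type` and `merge_statement` are hypothetical helpers, the allow-list is taken from the relationship types in the schema output shown later in this notebook, and the sample records are placeholders.

```python
from collections import defaultdict

# Relationship types listed in the schema output of this notebook.
ALLOWED_TYPES = {"BUILT", "WON", "SHIPPED", "PUBLISHED", "OPTIMIZED", "LED", "MANAGED"}

def group_by_type(records, allowed=ALLOWED_TYPES):
    """Group accomplishment records by `type`, rejecting unexpected values."""
    grouped = defaultdict(list)
    for rec in records:
        rel_type = rec["type"]
        if rel_type not in allowed:
            raise ValueError(f"Unexpected relationship type: {rel_type!r}")
        grouped[rel_type].append(rec)
    return dict(grouped)

def merge_statement(rel_type):
    """Build a MERGE statement with the already validated type written literally."""
    return (
        "UNWIND $records AS rec "
        "MATCH (person:Person {id: rec.personId}) "
        "MERGE (thing:Thing {name: rec.thing.name}) "
        f"MERGE (person)-[r:{rel_type}]->(thing)"
    )

# Placeholder records shaped like the accomplishments above.
sample = [
    {"type": "BUILT", "personId": "p1", "thing": {"name": "system_a"}},
    {"type": "LED", "personId": "p1", "thing": {"name": "team_b"}},
    {"type": "BUILT", "personId": "p2", "thing": {"name": "system_c"}},
]
grouped = group_by_type(sample)
for rel_type, recs in grouped.items():
    print(rel_type, len(recs), merge_statement(rel_type))
```

Validating against an allow-list matters here because the type is interpolated into the query string rather than passed as a parameter.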


## Agent With Text Cypher Retrieval Via MCP

In [11]:
from AgentRunner import AgentRunner
# build adk agent with neo4j mcp
from person import Domain, WorkType, SkillName
from google.adk.models.lite_llm import LiteLlm
from google.adk.agents import Agent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StdioServerParameters

database_agent = Agent(
    name="graph_database_agent",
    # model="gemini-2.0-flash-exp",
    model=LiteLlm(model="openai/gpt-4.1"),
    # model=LiteLlm(model="anthropic/claude-sonnet-4-20250514"),
    description="""
    Agent to access knowledge graph stored in graph database
    """,
    instruction=f"""
    You are a human resources assistant who helps with skills analysis, talent search, and team formation at Skynet.

    You can access the knowledge (in a graph database) on Skynet employees based on their resume and profiles directly. ALWAYS get the schema first with `get_neo4j_schema` and keep it in memory. Only use node labels, relationship types, property names, and patterns in that schema to generate valid Cypher queries using the `read_neo4j_cypher` tool with proper parameter syntax ($parameter). If you get errors or empty results, check the schema and try again up to 3 times.

    For domain knowledge, use these standard values:
    - Domains: {[i.value for i in Domain]}
    - Work Types: {[i.value for i in WorkType]}
    - Skills: {[i.value for i in SkillName]}

    Also never return embedding properties in Cypher queries. This will result in delays and errors.

    When responding to the user:
    - If your response includes people, include their names and IDs. Never just their IDs.
    - You must explain your retrieval logic and where the data came from. You must say exactly how relevance, similarity, etc. was inferred during search.

    Use information from previous queries when possible instead of asking the user again.
    """,
    tools=[MCPToolset(
        connection_params=StdioServerParameters(
            command='uvx',
            args=[
                "mcp-neo4j-cypher",
            ],
            env={ k: os.environ[k] for k in ["NEO4J_URI","NEO4J_USERNAME","NEO4J_PASSWORD"] }
        ),
        tool_filter=['get_neo4j_schema','read_neo4j_cypher']
    )]
)

db_agent_runner = AgentRunner(app_name='db_agent', user_id='Mr. Ed', agent=database_agent)
await db_agent_runner.start_session()

Session started successfully with ID: 638cad48-db57-4e91-8d2b-d1e8593b5b7f


True

### How many Python developers

In [12]:
res = await db_agent_runner.run("How many Python developers do I have?")

None id='call_hYePFe0VU9L2KWXEwYd01h8T' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_hYePFe0VU9L2KWXEwYd01h8T' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [13]:
from IPython.display import Markdown, display

display(Markdown(res))

You have 28 Python developers at Skynet.

Retrieval logic:
- I searched for people (Person nodes) who have a relationship (KNOWS) with the skill "Python" (Skill node) in the database.
- The result count (28) includes all people recognized as knowing Python as a skill.

This information was directly obtained by counting Person nodes linked to the Skill node named "Python," as per your skills taxonomy. Let me know if you'd like a list of their names or further filtering!

### Who is most similar to Lucas Martinez

In [14]:
res = await db_agent_runner.run("Who is most similar to Lucas Martinez and why?")

None id='call_Hq2M4koTwD8mS1SRiK2nbMpv' args={'query': 'MATCH (ref:Person {name: "Lucas Martinez"})-[:KNOWS]->(refSkill:Skill)\nWITH ref, collect(refSkill.name) AS refSkills\nMATCH (other:Person)-[:KNOWS]->(otherSkill:Skill)\nWHERE other.name <> "Lucas Martinez"\nWITH ref, refSkills, other, collect(DISTINCT otherSkill.name) AS otherSkills\nWITH other, apoc.coll.intersection(refSkills, otherSkills) AS overlap, apoc.coll.union(refSkills, otherSkills) AS unionSkills\nWITH other, size(overlap) AS intersection_size, size(unionSkills) AS union_size\nORDER BY (1.0 * intersection_size) / union_size DESC, intersection_size DESC\nLIMIT 1\nRETURN other.name AS most_similar_name, other.id AS most_similar_id, intersection_size, union_size'} name='read_neo4j_cypher' None
None None will_continue=None scheduling=None id='call_Hq2M4koTwD8mS1SRiK2nbMpv' name='read_neo4j_cypher' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"most_similar_name": "Sarah Chen", "mos

In [15]:
display(Markdown(res))

The most similar person to Lucas Martinez is Sarah Chen (ID: xRPBlhk9).

Explanation:
- The similarity was calculated by comparing the set of skills (KNOWS relationships to Skill nodes) of Lucas Martinez to other people in the database.
- The Jaccard similarity (intersection/union of skills) was used to rank similarity.
- Sarah Chen shares 6 skills with Lucas Martinez out of a combined set of 14 unique skills.

This makes Sarah Chen the most similar person to Lucas Martinez in terms of skills at Skynet. If you want to see the specific overlapping skills or get more details, let me know!
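
The agent's query ranks people by Jaccard similarity of their skill sets. As a worked example of the arithmetic, the sketch below reproduces the reported counts of 6 shared skills out of 14 unique skills. The skill names are placeholders, and the even split of the 8 non-shared skills between the two people is an assumption, since only the totals are reported.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union.

    Defined here as 0.0 when both sets are empty.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder skill names chosen only to match the reported counts.
shared = {f"shared_{i}" for i in range(6)}
lucas_skills = shared | {f"lucas_only_{i}" for i in range(4)}
sarah_skills = shared | {f"sarah_only_{i}" for i in range(4)}

score = jaccard(lucas_skills, sarah_skills)
print(round(score, 3))  # 6 / 14, about 0.429
```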

### Summarize my technical talent and skills distribution

In [16]:
await db_agent_runner.restart_session()

Session 638cad48-db57-4e91-8d2b-d1e8593b5b7f ended successfully
Session started successfully with ID: f0ec11be-57ad-4b95-8751-2e351b6477e6


In [17]:
res = await db_agent_runner.run("Summarize my technical talent and skills distribution?")

None id='call_szUkYVx1voviLmWJP8l8To7r' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_szUkYVx1voviLmWJP8l8To7r' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [18]:
display(Markdown(res))

Here’s a summary of your organization's technical talent and skills distribution based on the knowledge graph:

- The data was retrieved by counting how many people (Person nodes) are connected to each skill (Skill nodes) via the KNOWS relationship in the database.
- This approach reveals which skills are most commonly held, indicating both talent concentration and potential gaps.

Top technical skills (with number of people possessing them):

1. Python (28 people)
2. SQL (16)
3. AWS (15)
4. Team Management (14)
5. Machine Learning (11)
6. Project Management (11)
7. Docker (10)
8. Kubernetes (9)
9. Data Engineering (8)
10. Computer Vision (7)
11. Deep Learning (6)
12. Natural Language Processing (6)
13. TensorFlow (6)
14. Data Analysis (6)
15. Java (6)
16. JavaScript (6)
17. Leadership (6)

Other notable skills present, but with fewer people, include Go, PyTorch, React, Google Cloud Platform, and Business Intelligence. There’s at least one person for each of Data Science, Scikit-learn, and Scala, among others.

In summary, your strongest technical skills are in Python, SQL, cloud technologies (especially AWS), containerization (Docker, Kubernetes), and machine learning/AI frameworks. You also have a notable concentration in leadership and management capabilities, showing both technical breadth and organizational expertise.

This summary was constructed by aggregating how many employees are associated with each skill, directly from the employee profiles stored in your knowledge graph. If you'd like a breakdown by department, role, or additional analysis, please specify!

In [19]:
await db_agent_runner.end_session()

Session f0ec11be-57ad-4b95-8751-2e351b6477e6 ended successfully


True

## Adding Expert Tools

In [34]:
def find_similar_people(person_id: str):
    """
    Returns people who are potentially similar to the given person, based on shared skills and on the types and domains of their accomplishments. Use this as a starting point for similarity scores, then follow up with other tools and queries to collect more information.
    :param person_id: the id of the person to find similar people for
    :return: a list of person ids for similar candidates, ordered by score, which is the count of shared skills and accomplishment types and domains.
    """
    res = driver.execute_query(
        '''
        MATCH p=(p1:Person {id:$personId})--()
                 ((:!Person)--() ){0,3}
                 (p2:Person)
        RETURN count(*) AS score, p2.id AS person_id
        ORDER BY score DESC LIMIT $limit
         ''',
        personId=person_id,
        limit=5, #just hard code for now
        result_transformer_= lambda r: r.data())

    return res


find_similar_people("3ffr8dYb")

[{'score': 77, 'person_id': 'eOIAxtcB'},
 {'score': 54, 'person_id': '3ffr8dYb'},
 {'score': 37, 'person_id': '8wvf1psS'},
 {'score': 31, 'person_id': 'LUUCJ14S'},
 {'score': 31, 'person_id': 'Q1ZkhCBu'}]
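
As we read the query above, the score counts paths of one to four relationships that start at the given person, pass only through non-Person nodes, and end at a Person. The output also shows that the queried person (`3ffr8dYb`) appears in its own results, because the query does not exclude it. The following in-memory sketch illustrates the same counting idea on a toy graph and skips the source person. The function name, node names, and edges are all hypothetical, and this is not a substitute for the Cypher query.

```python
from collections import Counter, defaultdict

def count_paths_to_people(edges, labels, source, max_hops=4):
    """Count paths from `source` to each other Person node.

    Paths use at most `max_hops` relationships, pass only through non-Person
    intermediate nodes, and never reuse a relationship.
    """
    neighbours = defaultdict(list)
    for edge_id, (u, v) in enumerate(edges):
        neighbours[u].append((edge_id, v))
        neighbours[v].append((edge_id, u))

    scores = Counter()

    def walk(node, used_edges, hops):
        for edge_id, nxt in neighbours[node]:
            if edge_id in used_edges:
                continue
            if labels[nxt] == "Person":
                if nxt != source:  # skip paths that loop back to the source
                    scores[nxt] += 1
            elif hops + 1 < max_hops:
                walk(nxt, used_edges | {edge_id}, hops + 1)

    walk(source, frozenset(), 0)
    return scores.most_common()

# Toy graph: placeholder people, skills, things, and one domain.
labels = {
    "alice": "Person", "bob": "Person", "carol": "Person",
    "python": "Skill", "aws": "Skill",
    "thing_a": "Thing", "thing_b": "Thing", "thing_c": "Thing",
    "ai": "Domain",
}
edges = [
    ("alice", "python"), ("bob", "python"), ("carol", "python"),
    ("alice", "aws"), ("bob", "aws"),
    ("alice", "thing_a"), ("bob", "thing_b"), ("carol", "thing_c"),
    ("thing_a", "ai"), ("thing_b", "ai"),
]

similar = count_paths_to_people(edges, labels, "alice")
print(similar)
```

In this toy graph, bob is reached through two shared skills twice and through one shared domain once more than carol, so bob ranks first.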

In [21]:
def find_similarities_between_people(person1_id: str, person2_id: str):
    """
    This function will return potential similarities between people in the form of skill and accomplishment paths.  You can use this as a starting point to find similarities and query the graph further using the various name fields and person ids.
    :param person1_id: the id of the first person to compare
    :param person2_id: the id of the second person to compare
    :return: a list of paths between the two people, each path is a compact ascii string representation.  It should reflect the patterns in the graph schema
    """
    res = driver.execute_query(
        '''
        MATCH p=(p1:Person {id:$person1_id})--()
                 ((:!Person)--() ){0,3}
                 (p2:Person{id:$person2_id})
        WITH p, nodes(p) as path_nodes, relationships(p) as path_rels, p1, p2
        RETURN
          "(" + labels(path_nodes[0])[0] + " {name: \\"" + path_nodes[0].name + "\\" id: \\"" + path_nodes[0].id + "\\"})" +
          reduce(chain = "", i IN range(0, size(path_rels)-1) |
            chain +
            "-[" + type(path_rels[i]) + "]-" +
            "(" + labels(path_nodes[i+1])[0] + " {name: \\"" + path_nodes[i+1].name +
            CASE WHEN "Person" IN labels(path_nodes[i+1])
                 THEN "\\" id: \\"" + path_nodes[i+1].id +"\\""
                 ELSE "\\"" END + "})"
          ) as paths ORDER BY p1.id, p2.id
         ''',
        person1_id=person1_id,
        person2_id=person2_id,
        result_transformer_= lambda r: r.values())

    return res


find_similarities_between_people("3ffr8dYb", "5RGDw14z")


[['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Python"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Java"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Data Engineering"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[KNOWS]-(Skill {name: "Team Management"})-[KNOWS]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[OPTIMIZED]-(Thing {name: "trading_db_systems_3ffr8dYb"})-[OF]-(WorkType {name: "SYSTEM"})-[OF]-(Thing {name: "disaster_recovery_system_5RGDw14z"})-[BUILT]-(Person {name: "Benjamin Clark" id: "5RGDw14z"})'],
 ['(Person {name: "Kai Wong" id: "3ffr8dYb"})-[BUILT]-(Thing {name: "db_monitoring_system_3ffr8dYb"})-[OF]-(WorkType {name: "SYSTEM"})-[OF]-(Thing {name: "disaster_recovery_
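
The Cypher `reduce` expression above concatenates each node and relationship into a compact ASCII path. The same formatting can be sketched in plain Python to make the output shape easier to follow; `format_path` is a hypothetical helper, and the input reuses the first path from the output above.

```python
def format_path(nodes, rel_types):
    """Render a path as a compact ASCII string.

    `nodes` is a list of dicts with `label`, `name` and, for people, `id`;
    `rel_types` holds the relationship type between consecutive nodes.
    """
    def format_node(node):
        text = f'({node["label"]} {{name: "{node["name"]}"'
        if node["label"] == "Person":
            text += f' id: "{node["id"]}"'
        return text + "})"

    parts = [format_node(nodes[0])]
    for rel_type, node in zip(rel_types, nodes[1:]):
        parts.append(f"-[{rel_type}]-")
        parts.append(format_node(node))
    return "".join(parts)

path = format_path(
    nodes=[
        {"label": "Person", "name": "Kai Wong", "id": "3ffr8dYb"},
        {"label": "Skill", "name": "Python"},
        {"label": "Person", "name": "Benjamin Clark", "id": "5RGDw14z"},
    ],
    rel_types=["KNOWS", "KNOWS"],
)
print(path)
```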

In [22]:
def get_person_resume(person_id: str):
    """
    Gets the full resume of a person
    :param person_id: the id of the person
    :return: resume text and person name
    """
    res = driver.execute_query(
        '''
        MATCH (n:Person {id: $personId})
        RETURN n.text as resume, n.name AS name
         ''',
        personId=person_id,
        result_transformer_= lambda r: r.data())

    return res

get_person_resume("3ffr8dYb")

[{'resume': 'Kai Wong\nDatabase Performance Engineer\nEmail: kai.wong@email.com\nLocation: Hong Kong\nExperience: 7 years\nProfessional Summary\nDatabase performance specialist with 7 years experience optimizing high-scale database systems.\nExpert in SQL optimization, distributed databases, and Python automation.\nProfessional Experience\nSenior Database Performance Engineer | Financial Trading Platform | 2021 - Present\n- Optimized trading database systems handling 1M+ transactions per second using advanced SQL\ntechniques\n- Built database monitoring system using Python detecting performance issues before customer impact\n- Led database engineering team of 5 optimizing distributed PostgreSQL clusters\nDatabase Engineer | E-commerce Platform | 2019 - 2021\n- Implemented database sharding strategy supporting 100x user growth using PostgreSQL and Python\n- Developed automated database backup and recovery system achieving 99.99% data durability\n- Built query optimization framework redu

In [23]:
def get_person_name(person_id: str):
    """
    Gets a person name given their id
    :param person_id: the unique id of the person
    :return: person name
    """
    res = driver.execute_query(
        '''
        MATCH (n:Person {id: $personId})
        RETURN n.name
         ''',
        personId=person_id,
        result_transformer_= lambda r: r.values())

    return res

get_person_name("3ffr8dYb")

[['Kai Wong']]

In [24]:
def get_person_ids_from_name(person_name: str):
    """
    Gets all the unique people ids who have the provided name
    :param person_name: the name to look up person ids with
    :return: person ids that can be used for other tools and queries.  Note that names aren't guaranteed to be unique so you may get more than one person.
    """
    res = driver.execute_query(
        '''
        MATCH (n:Person {name: $personName})
        RETURN n.id
         ''',
        personName=person_name,
        result_transformer_= lambda r: r.values())

    return res

get_person_ids_from_name("Kai Wong")

[['3ffr8dYb']]

In [25]:
talent_agent = Agent(
    name="talent_agent",
    # model="gemini-2.0-flash-exp",
    model=LiteLlm(model="openai/gpt-4.1"),
    # model=LiteLlm(model="anthropic/claude-sonnet-4-20250514"),
    description="""
    Knowledge assistant for skills analysis, search, and team formation
    """,
    instruction=f"""
    You are a human resources assistant who helps with skills analysis, talent search, and team formation at Skynet.

    Your tools retrieve data from internal knowledge on Skynet employees based on their resume and profiles.

    Try to prioritize expert tools (those other than `read_neo4j_cypher`) as appropriate, since they use expert-approved logic for accessing data. You may still need to access the data directly afterwards to pull more details.


    When you need more flexible logic for aggregations, follow-ups, or anything else, you can access the knowledge (in a graph database) directly. ALWAYS get the schema first with `get_neo4j_schema` and keep it in memory. Only use node labels, relationship types, property names, and patterns in that schema to generate valid Cypher queries using the `read_neo4j_cypher` tool with proper parameter syntax ($parameter). If you get errors or empty results, check the schema and try again up to 3 times.

    For domain knowledge, use these standard values:
    - Domains: {[i.value for i in Domain]}
    - Work Types: {[i.value for i in WorkType]}
    - Skills: {[i.value for i in SkillName]}

    Also never return embedding properties in Cypher queries. This will result in delays and errors.

    When responding to the user:
    - If your response includes people, include their names and IDs. Never just their IDs.
    - You must explain your retrieval logic and where the data came from. You must say exactly how relevance, similarity, etc. was inferred during search.

    Use information from previous queries when possible instead of asking the user again.
    """,
    tools=[find_similar_people,
           find_similarities_between_people,
           get_person_name,
           get_person_resume,
           get_person_ids_from_name,
           MCPToolset(
        connection_params=StdioServerParameters(
            command='uvx',
            args=[
                "mcp-neo4j-cypher",
            ],
            env={ k: os.environ[k] for k in ["NEO4J_URI","NEO4J_USERNAME","NEO4J_PASSWORD"] }
        ),
        tool_filter=['get_neo4j_schema','read_neo4j_cypher']
    )]
)

talent_agent_runner = AgentRunner(app_name='talent_agent', user_id='Mr. Ed', agent=talent_agent)
await talent_agent_runner.start_session()

Session started successfully with ID: 31a29cfe-109f-4ac8-a22f-00ae0cf6d040


True

### How many Python developers

In [26]:
res = await talent_agent_runner.run("How many Python developers do I have?")

None id='call_6ssxmmAg52GDgnnyqO4jbKig' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_6ssxmmAg52GDgnnyqO4jbKig' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [27]:
display(Markdown(res))

You currently have 28 Python developers in your organization. 

Retrieval Logic:
- I queried the database for all people (Person nodes) who have a "KNOWS" relationship to the "Python" skill (Skill node with name "Python").
- The count only includes unique individuals who know Python.

If you need a list of their names or further breakdown (e.g., by department or experience), let me know!

### Who is most similar to Lucas Martinez

In [28]:
res = await talent_agent_runner.run("Who is most similar to Lucas Martinez and why?")

None id='call_2AA5FoFlzUPR7EbQ6blM9yf9' args={'person_name': 'Lucas Martinez'} name='get_person_ids_from_name' None
None None will_continue=None scheduling=None id='call_2AA5FoFlzUPR7EbQ6blM9yf9' name='get_person_ids_from_name' response={'result': [['HowfM0O2']]}
None id='call_Q4sacatV148llI8UiGh25mSU' args={'person_id': 'HowfM0O2'} name='find_similar_people' None
None None will_continue=None scheduling=None id='call_Q4sacatV148llI8UiGh25mSU' name='find_similar_people' response={'result': [{'score': 72, 'person_id': 'LUUCJ14S'}, {'score': 66, 'person_id': 'MpQCrNqA'}, {'score': 61, 'person_id': '8hvI9MCT'}, {'score': 58, 'person_id': 'Yvhy6A21'}, {'score': 55, 'person_id': 'Y7Dbiku6'}]}
None id='call_tzw0SRN1pkz0kSe4WzV3bkMI' args={'person_id': 'LUUCJ14S'} name='get_person_name' None
None id='call_rqQpN1i2VlRUGvFOyRQZgrGp' args={'person1_id': 'HowfM0O2', 'person2_id': 'LUUCJ14S'} name='find_similarities_between_people' None
None None will_continue=None scheduling=None id='call_tzw0SRN1

In [29]:
display(Markdown(res))

The person most similar to Lucas Martinez (ID: HowfM0O2) is Elena Popov (ID: LUUCJ14S).

Reason for similarity:
- Relevance was inferred using an expert search tool that measures the count of shared skills and accomplishment domains/types.
- Both Lucas and Elena have key technical overlaps: they know Python, AWS, Machine Learning, and Computer Vision.
- They both have built, shipped, or published projects and products within the AI domain, such as personalized learning platforms, reinforcement learning projects, and medical image analysis tools.
- Their work aligns in work types (PRODUCT and SYSTEM) and involves similar themes like deep learning optimization, ML evaluation frameworks, and real-time inference infrastructure.

This strong overlap in technical skills, domains (especially AI), and similar types of high-level accomplishments makes Elena Popov the most similar employee to Lucas Martinez in your organization.

### Summarize my technical talent and skills distribution

In [40]:
await talent_agent_runner.restart_session()

Session started successfully with ID: 406f58ed-ec29-4de8-a85a-0eee2817c049


True

In [41]:
res = await talent_agent_runner.run("Summarize Skynet's technical talent and skills distribution?")

None id='call_4IahobwfPLiJu5ckKfbenZh0' args={} name='get_neo4j_schema' None
None None will_continue=None scheduling=None id='call_4IahobwfPLiJu5ckKfbenZh0' name='get_neo4j_schema' response={'result': CallToolResult(meta=None, content=[TextContent(type='text', text='[{"label": "Person", "attributes": {"id": "STRING indexed", "current_title": "STRING", "text": "STRING", "level": "STRING", "location": "STRING", "email": "STRING", "department": "STRING", "name": "STRING", "years_experience": "INTEGER", "embedding": "LIST"}, "relationships": {"BUILT": "Thing", "WON": "Thing", "SHIPPED": "Thing", "KNOWS": "Skill", "PUBLISHED": "Thing", "OPTIMIZED": "Thing", "LED": "Thing", "MANAGED": "Thing"}}, {"label": "Skill", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "Thing", "attributes": {"name": "STRING indexed"}, "relationships": {"IN": "Domain", "OF": "WorkType"}}, {"label": "Domain", "attributes": {"name": "STRING indexed"}, "relationships": {}}, {"label": "WorkType

In [42]:
display(Markdown(res))

Here’s a summary of Skynet’s technical talent and skills distribution, based on direct analysis of structured employee profiles in Skynet’s internal database:

Skill Distribution:
- The most widely held technical skills are Python (28 people), SQL (16), AWS (15), Docker (10), Kubernetes (9), and Data Engineering (8). These indicate strength in data handling, backend, and cloud-native development.
- Key advanced/data/AI skills also show strong presence: Machine Learning (11 people), Deep Learning (6), Computer Vision (7), Natural Language Processing (6), and TensorFlow (6). PyTorch, Data Analysis, and Data Science are also represented.
- Notable supporting competencies include Project Management (11), Team Management (14), Leadership (6), and Communication across several people.
- Popular software, programming languages, and frameworks also represented: Java (6), JavaScript (6), React (5), Go (4), Google Cloud Platform (4), Node.js, Scala, Django, and Flask.
- Additional specialties: Cloud Architecture, Business Intelligence, Product Management, and Product Strategy are covered within small groups.

Domain Distribution:
- The largest areas of domain expertise (from people’s major accomplishments) are AI (14 people), Analytics (7), Data Engineering (7), Web (6), and Cloud (6).
- There is also competency in Database, Microservices, DevOps, Security, Mobile, and Platform domains.

Retrieval Logic:
- For technical skills, I queried for all skill connections from people and ranked by the number of people with each skill.
- For domains, I examined accomplishments (projects built, shipped, led, etc.) and identified domain tags from those works.

This distribution shows that Skynet has broad and deep technical strength in core programming, cloud infrastructure, AI/ML, data engineering, and modern development workflows, with a concentration in AI and analytics domains and wide support from related disciplines.

In [33]:
await talent_agent_runner.end_session()

Session 02d716db-49dc-45be-82fa-ca7428caaae7 ended successfully


True