# Controlled Vocabularies: Resolving Natural Language to Standardized Terms

An agent resolves free-text biomedical terms into controlled vocabulary codes using the VocabularyConnector. We demonstrate two strategies: Tree, for medium vocabularies (up to roughly 1K terms), which uses exact and fuzzy matching, and RAG, for large vocabularies (1K+ terms), which uses semantic search over vector embeddings. Both produce standardized term codes suitable for downstream SQL queries against clinical databases.

In [None]:
from agentic_patterns.agents.vocabulary import create_vocabulary_agent
from agentic_patterns.core.connectors.vocabulary.connector import VocabularyConnector
from agentic_patterns.core.connectors.vocabulary.registry import (
    register_vocabulary,
    reset,
)
from agentic_patterns.core.connectors.vocabulary.strategy_tree import StrategyTree
from agentic_patterns.core.connectors.vocabulary.strategy_rag import StrategyRag
from agentic_patterns.core.connectors.vocabulary.toy_data import (
    get_toy_tree_terms,
    get_toy_rag_terms,
)

## Setting Up Vocabularies

We register two toy vocabularies directly using in-memory data. In production, vocabularies are loaded from files (OBO, OWL, etc.) via `vocabularies.yaml`. The Tree strategy stores terms in an adjacency list; the RAG strategy indexes them into a vector database for semantic search.

In [None]:
reset()

# Tree strategy: Sequence Ontology (~20 terms, exact + fuzzy matching)
tree_backend = StrategyTree(name="sequence_ontology", terms=get_toy_tree_terms())
register_vocabulary("sequence_ontology", tree_backend)

# RAG strategy: Gene Ontology (~15 terms, semantic vector search)
rag_backend = StrategyRag(name="gene_ontology", collection="gene_ontology_demo")
for term in get_toy_rag_terms():
    rag_backend.add_term(term)
register_vocabulary("gene_ontology", rag_backend)

print(f"Tree backend: {tree_backend.info()}")
print(f"RAG backend: {rag_backend.info()}")
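
To make the Tree strategy's "exact, then fuzzy" lookup concrete, here is a minimal standalone sketch. It does not use `StrategyTree` and is not the library's implementation: the `TERMS` table, the `T:` codes, and the `lookup` helper are hypothetical, and fuzzy matching is approximated with the standard library's `difflib`.

```python
from difflib import get_close_matches

# Hypothetical toy vocabulary: code -> (label, parent codes).
# The parent lists form the adjacency list that hierarchy navigation would use.
TERMS = {
    "T:001": ("sequence feature", []),
    "T:002": ("region", ["T:001"]),
    "T:003": ("coding sequence", ["T:002"]),
    "T:004": ("sequence variant", ["T:001"]),
}


def lookup(text, cutoff=0.6):
    """Return (code, label, match_type) tuples: exact match first, else fuzzy."""
    labels = {label.lower(): code for code, (label, _) in TERMS.items()}
    key = text.strip().lower()
    if key in labels:
        return [(labels[key], key, "exact")]
    # Fall back to string-similarity matching on the labels.
    close = get_close_matches(key, list(labels), n=3, cutoff=cutoff)
    return [(labels[label], label, "fuzzy") for label in close]


print(lookup("Coding Sequence"))
print(lookup("codng sequence"))
```

A lookup that matches nothing above the cutoff returns an empty list, which a caller can turn into a "no match" message.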

## Using the Connector Directly

The VocabularyConnector wraps strategy backends with a uniform API. Every method returns a formatted string, so the agent never handles raw objects.

In [None]:
connector = VocabularyConnector()

# Tree: search by text, navigate hierarchy
print("--- Tree: search for 'coding sequence' ---")
print(connector.search("sequence_ontology", "coding sequence"))

print("\n--- Tree: ancestors of CDS (SO:0000316) ---")
print(connector.ancestors("sequence_ontology", "SO:0000316"))

print("\n--- Tree: validate a mistyped code (SO:0000317 instead of SO:0000316) ---")
print(connector.validate("sequence_ontology", "SO:0000317"))
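
The hierarchy and validation calls above can be pictured with a small standalone sketch. It assumes a child-to-parents adjacency list and returns a plain string from validation, mirroring the connector's string-returning style. The `PARENTS` table, the `T:` codes, and both helper functions are hypothetical and are not the connector's actual code.

```python
from difflib import get_close_matches

# Hypothetical adjacency list: each code maps to its direct parents.
PARENTS = {
    "T:001": [],
    "T:002": ["T:001"],
    "T:003": ["T:002"],
}


def ancestors(code):
    """Walk parent links breadth-first and return ancestors, nearest first."""
    seen = []
    queue = list(PARENTS.get(code, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(PARENTS.get(current, []))
    return seen


def validate(code):
    """Return a human-readable verdict, with a suggestion for near misses."""
    if code in PARENTS:
        return f"{code} is valid"
    near = get_close_matches(code, list(PARENTS), n=1, cutoff=0.8)
    hint = f"; did you mean {near[0]}?" if near else ""
    return f"{code} is not a valid code{hint}"


print(ancestors("T:003"))
print(validate("T:003"))
print(validate("T:004"))
```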

In [None]:
# RAG: semantic search -- finds related terms even with different wording
print("--- RAG: search for 'cell death' ---")
print(connector.search("gene_ontology", "cell death", max_results=3))

print("\n--- RAG: suggest terms for 'inflammation in tissues' ---")
print(connector.suggest("gene_ontology", "inflammation in tissues", max_results=3))
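
The ranking step behind a semantic search can be sketched without a vector database. The example below is only an illustration of the mechanics: it swaps real embeddings for word-count vectors, so it matches on shared words rather than meaning. The `DOCS` table, the `G:` codes, and the `embed`, `cosine`, and `search` helpers are hypothetical and are not part of `StrategyRag`.

```python
import math
from collections import Counter

# Hypothetical indexed documents: code -> label plus a short definition.
DOCS = {
    "G:001": "apoptotic process programmed cell death",
    "G:002": "inflammatory response tissue injury",
    "G:003": "dna repair damage",
}


def embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, max_results=3):
    """Rank documents by similarity to the query and keep the top hits."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), code) for code, doc in DOCS.items()]
    scored.sort(reverse=True)
    return [(code, round(score, 3)) for score, code in scored[:max_results] if score > 0]


print(search("cell death"))
```

With a real embedding model, a query such as "cell death" could also rank a term whose text shares no words with it, which is the advantage the RAG strategy relies on.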

## Vocabulary Agent

The vocabulary agent wraps these connector methods as tools. Given a natural language query, it decides which vocabulary to search, resolves terms, and navigates hierarchies autonomously.

In [None]:
from agentic_patterns.core.agents import run_agent

agent = create_vocabulary_agent(vocab_names=["sequence_ontology", "gene_ontology"])

query = (
    "I need the controlled vocabulary code for 'programmed cell death' in Gene Ontology. "
    "Also find the code for 'single nucleotide variant' in Sequence Ontology "
    "and show me its parent terms."
)

# Top-level await works in a notebook; in a script, wrap the call in asyncio.run().
agent_run, nodes = await run_agent(agent, query, verbose=True)
print(f"\nAgent output:\n{agent_run.result.output}")