Semantic Search
gaia skills search lets you find named skills by name or description. For local embedding-based search across the full generic skill graph, the embeddings pipeline is available for registry development installs.
You can perform simple search operations natively using the Gaia CLI:
# Substring and fuzzy keyword search across named implementations
gaia skills search "autonomous research agent"
gaia skills search "code debugging"
# List all local/global installed named skills
gaia skills list
# View detailed metadata for a specific named skill
gaia skills info karpathy/autoresearch
This works out of the box with any standard installation of the gaia-cli and does not require local neural network inference.
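To make "substring and fuzzy keyword search" concrete, here is a minimal sketch of one way such matching can work. It is not Gaia's actual implementation: the `score` and `search` helpers, the `acme/debugger` entry, and both descriptions are hypothetical, and the `0.5` cutoff is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical catalogue: skill id -> description (illustrative values only).
SKILLS = {
    "karpathy/autoresearch": "Autonomous research agent",
    "acme/debugger": "Interactive code debugging helper",
}

def score(query: str, text: str) -> float:
    """Return 1.0 on a substring hit, else the mean best per-token fuzzy ratio."""
    q, t = query.lower(), text.lower()
    if q in t:
        return 1.0
    query_tokens, text_tokens = q.split(), t.split()
    if not query_tokens or not text_tokens:
        return 0.0
    # For each query token, keep its closest match among the text tokens.
    best = [
        max(SequenceMatcher(None, qt, tt).ratio() for tt in text_tokens)
        for qt in query_tokens
    ]
    return sum(best) / len(best)

def search(query: str, skills: dict, min_score: float = 0.5) -> list:
    """Rank skills whose id or description matches the query (assumed cutoff)."""
    hits = []
    for skill_id, description in skills.items():
        s = score(query, f"{skill_id} {description}")
        if s >= min_score:
            hits.append((skill_id, round(s, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

if __name__ == "__main__":
    print(search("code debugging", SKILLS))
    print(search("debuging", SKILLS))  # a typo still finds the debugging skill
```

Matching of this kind only compares characters, so a query phrased with different words than the skill description will not match. That gap is what the embeddings pipeline is meant to close.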
For semantic similarity matching across the entire generic and named skill registry, developers can install the embeddings extension package inside a local clone of the registry:
# Install the CLI package with embeddings dependencies
pip install -e ".[embeddings]"
This installs sentence-transformers and numpy. The pre-trained model (all-MiniLM-L6-v2, ~90 MB) is downloaded automatically upon first execution and cached locally in your home directory.
To compute pairwise similarity scores and update the search index, run the compilation script from the root of the repository:
# Recompute pairwise cosine similarities and output index
python3 scripts/computeSimilarity.py
Tip: The git pre-commit hook runs this verification automatically for registry modifications if you have initialized the hooks via bash scripts/install-git-hooks.sh.
- Text Encoding: Each skill node in the registry is compiled into a single textual representation formatted as "{name}: {description}".
- Embedding Generation: The text is encoded into a 384-dimensional vector using the local all-MiniLM-L6-v2 transformer model.
- Similarity Calculation: Pairwise cosine similarity is computed between all vectors. Pairs scoring above a configurable threshold (default 0.3) are written to the index.
- Graph Persistence: The similarity index is written directly to registry/similarity.json.
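The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the contents of scripts/computeSimilarity.py: the function names, the `id` field, and the `{"a", "b", "score"}` output layout are assumptions, and `toy_embed` is a stand-in so the example runs without downloading the model. In the real pipeline the `embed` argument would presumably be something like `SentenceTransformer("all-MiniLM-L6-v2").encode`.

```python
import json
import math
from itertools import combinations

def node_text(node: dict) -> str:
    """Step 1: flatten a node into the "{name}: {description}" form."""
    return f"{node['name']}: {node['description']}"

def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_index(nodes, embed, threshold: float = 0.3) -> list:
    """Steps 2-3: embed every node, then keep pairs scoring above the threshold."""
    vectors = {node["id"]: embed(node_text(node)) for node in nodes}
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        s = cosine(vectors[a], vectors[b])
        if s > threshold:
            pairs.append({"a": a, "b": b, "score": round(s, 4)})
    return pairs

def write_index(pairs, path: str) -> None:
    """Step 4: persist the index as JSON (file layout is an assumption)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(pairs, fh, indent=2)

# Stand-in embedding: keyword counts over a tiny vocabulary (NOT the real model).
VOCAB = ["research", "agent", "code", "debugging"]

def toy_embed(text: str) -> list:
    return [text.lower().count(word) for word in VOCAB]

# Hypothetical nodes used only for this demonstration.
NODES = [
    {"id": "research-agent", "name": "research agent", "description": "autonomous research"},
    {"id": "paper-agent", "name": "paper agent", "description": "research summaries"},
    {"id": "debugger", "name": "debugger", "description": "code debugging"},
]

if __name__ == "__main__":
    print(build_index(NODES, toy_embed))
```

Because every pair is compared, the work grows quadratically with the number of nodes, which is one reason to compute the index ahead of time and store only the pairs above the threshold.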
The MCP server and standard CLI queries read this pre-computed registry/similarity.json file at runtime to perform instantaneous semantic lookups without incurring neural network inference latency.
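A runtime lookup against a pre-computed file needs only JSON parsing, which is why no model has to be loaded. The sketch below assumes a hypothetical file layout, a JSON list of `{"a": ..., "b": ..., "score": ...}` objects. The actual schema of registry/similarity.json is not described here, so treat the field names and both helper functions as illustrative.

```python
import json
from pathlib import Path

def load_neighbors(index_path) -> dict:
    """Build a skill id -> [(other id, score), ...] map from the pair list."""
    pairs = json.loads(Path(index_path).read_text(encoding="utf-8"))
    neighbors = {}
    for pair in pairs:
        # Similarity is symmetric, so register the pair in both directions.
        neighbors.setdefault(pair["a"], []).append((pair["b"], pair["score"]))
        neighbors.setdefault(pair["b"], []).append((pair["a"], pair["score"]))
    for entries in neighbors.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return neighbors

def similar_to(neighbors: dict, skill_id: str, limit: int = 5) -> list:
    """Return up to `limit` most similar skills, or an empty list if unknown."""
    return neighbors.get(skill_id, [])[:limit]

if __name__ == "__main__":
    index = load_neighbors("registry/similarity.json")
    print(similar_to(index, "karpathy/autoresearch"))
```

A skill that was added after the file was last generated simply has no entry, so the lookup returns an empty list rather than an error.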
The embedding vectors are generated using the following local model configuration:
| Property | Specification / Value |
|---|---|
| Model Name | all-MiniLM-L6-v2 |
| Output Dimensions | 384 |
| Max Sequence Length | 256 tokens |
| Disk Footprint | ~90 MB (cached locally after first fetch) |
| Device Target | Runs locally on CPU, or on CUDA/MPS if available |
| Network Dependency | None (100% offline after the initial model download) |
gaia skills search returns no results:
The standard skills search command uses fuzzy substring matching. For full semantic search capabilities, make sure you have installed the [embeddings] extra and that registry/similarity.json has been successfully compiled.
Stale Embeddings (New skills do not appear in searches):
The search indexes are statically generated files. When adding new generic nodes under registry/nodes/ or named markdown implementations under registry/named/, you must regenerate the indexes using the build command:
# Recompile the registry graph and regenerate embeddings
gaia dev build
This automatically invokes the embedding generation script and updates registry/embeddings.json and registry/similarity.json.
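If you are unsure whether a rebuild is needed, one rough heuristic is to compare file modification times. The helper below is a hypothetical sketch, not part of the Gaia CLI. It assumes the directory layout named above and treats the index as stale when any file under registry/nodes/ or registry/named/ is newer than registry/similarity.json.

```python
from pathlib import Path

def newest_mtime(directory) -> float:
    """Latest modification time of any file under `directory` (0.0 if none)."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0.0
    return max(
        (p.stat().st_mtime for p in directory.rglob("*") if p.is_file()),
        default=0.0,
    )

def index_is_stale(registry_root="registry") -> bool:
    """True if the similarity index is missing or older than its source files."""
    root = Path(registry_root)
    index = root / "similarity.json"
    if not index.exists():
        return True
    source_mtime = max(newest_mtime(root / "nodes"), newest_mtime(root / "named"))
    return source_mtime > index.stat().st_mtime

if __name__ == "__main__":
    if index_is_stale():
        print("Index looks stale; consider running: gaia dev build")
    else:
        print("Index is newer than all registry sources.")
```

Modification times can be misleading after a fresh clone or checkout, since git does not preserve them, so a positive result is a hint to rebuild and not proof that the index is out of date.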