Oliver Atkinson edited this page May 31, 2026 · 2 revisions

Search

Lexicon search has four user-facing modes. Treat them as lenses over the same search entries rather than separate indexes.

Modes

| Mode | Use it for |
| --- | --- |
| hybrid | Default search. Combines path/name token matches, phrase matches, metadata, references, and semantic scores when embeddings are available. |
| semantic | Natural-language meaning search. Best with --embedding-provider mlx and a cached embedding index. |
| token | Deterministic path, name, type, default, and reference lookup. Useful for editor tooling and exact-ish IDs. |
| lexical | Phrase and fuzzy text search over IDs, notes, comments, defaults, and metadata without synonym rules. |
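
Because the modes are lenses over the same entries, running one query through each of them is a quick way to see how ranking differs. The following is a minimal sketch, not part of the Lexicon CLI: the compare_modes function and the LEXICON_CMP variable are hypothetical names, and it uses only the --mode and --limit flags shown elsewhere on this page. It assumes a POSIX-compatible shell.

```shell
# Hypothetical helper, not shipped with Lexicon.
# LEXICON_CMP holds the command prefix; set LEXICON_CMP=echo for a dry run.
LEXICON_CMP="${LEXICON_CMP:-swift run lexicon}"

# Usage: compare_modes FILE QUERY...
compare_modes() {
    file=$1
    shift
    for mode in hybrid semantic token lexical; do
        printf '== %s ==\n' "$mode"
        # $LEXICON_CMP is left unquoted on purpose so it splits into words.
        $LEXICON_CMP search "$file" "$@" --mode "$mode" --limit 5
    done
}
```

For example, compare_modes Examples/search-demo.lexicon submit order prints four labelled result lists, one per mode. The semantic pass is most informative when an embedding provider is available.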

Scopes

| Scope | Search space |
| --- | --- |
| own | The declared document graph only. Fastest and most predictable. |
| live | Search own first, then expand likely hits through resolved lemma context. Good for interactive search over inherited context. |
| full | Materialize the resolved lemma search space, then search it. Traversal counts depth and budget at each child edge and does not re-expand repeated inherited structure, so recursive types cannot expand forever. |

The --depth, --candidates, and --budget options bound traversal in the live and full scopes.
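
To illustrate how the three bounds might be passed together, here is a minimal sketch that wraps a full-scope search in a shell function. The bounded_search function and the LEXICON variable are hypothetical, and passing plain integers to --depth, --candidates, and --budget is an assumption; this page does not document their value format or defaults.

```shell
# Hypothetical wrapper, not shipped with Lexicon.
# LEXICON holds the command prefix; set LEXICON=echo for a dry run.
LEXICON="${LEXICON:-swift run lexicon}"

# Usage: bounded_search FILE DEPTH CANDIDATES BUDGET QUERY...
bounded_search() {
    file=$1 depth=$2 candidates=$3 budget=$4
    shift 4
    # Assumption: each bound takes an integer value.
    $LEXICON search "$file" "$@" \
        --scope full \
        --depth "$depth" \
        --candidates "$candidates" \
        --budget "$budget"
}
```

With LEXICON=echo, calling bounded_search Examples/search-demo.lexicon 3 50 500 submit order prints the composed command line instead of running it, which is a cheap way to check the flags before starting a full traversal.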

Deterministic Search

swift run lexicon search Examples/search-demo.lexicon submit order \
	--mode hybrid \
	--limit 5 \
	--embedding-provider none

Useful deterministic examples:

| Query | Expected kind of match |
| --- | --- |
| submit order | API operation, such as demo.api.order.submit. |
| payment decline | API or UI error concepts around declined payments. |
| free shipping | Campaign or feature terms. |
| checkout button | UI button terms. |
| purchase completed event | Analytics event terms. |

Semantic Search

On Apple platforms, the base build can use NaturalLanguage sentence embeddings. For MLX-backed semantic search, build the CLI with the MLXSearch package trait:

swift run --traits MLXSearch lexicon search Examples/search-demo.lexicon \
	"customer asks for money back after purchase" \
	--mode semantic \
	--embedding-provider mlx \
	--embedding-model TaylorAI/bge-micro-v2

The first MLX semantic search builds a document embedding cache and logs indexing progress to stderr. Pass --embedding-cache to choose a cache path, or --rebuild-embeddings to force regeneration.
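
As a sketch of how the cache options could fit into a repeatable command, the function below pins the cache location and lets extra flags such as --rebuild-embeddings be appended. The semantic_search function, the LEXICON_MLX variable, and the cache path in the usage example are hypothetical; that --embedding-cache takes a path as its value is inferred from the description above rather than confirmed.

```shell
# Hypothetical wrapper, not shipped with Lexicon.
# LEXICON_MLX holds the command prefix; set LEXICON_MLX=echo for a dry run.
LEXICON_MLX="${LEXICON_MLX:-swift run --traits MLXSearch lexicon}"

# Usage: semantic_search FILE CACHE_PATH QUERY [EXTRA_FLAGS...]
semantic_search() {
    file=$1 cache=$2 query=$3
    shift 3
    # Assumption: --embedding-cache is followed by the cache path.
    $LEXICON_MLX search "$file" "$query" \
        --mode semantic \
        --embedding-provider mlx \
        --embedding-model TaylorAI/bge-micro-v2 \
        --embedding-cache "$cache" \
        "$@"
}
```

A first run might look like semantic_search Examples/search-demo.lexicon ./embeddings.cache "card issuer rejected transaction"; appending --rebuild-embeddings to the same call forces regeneration.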

Natural-language queries are useful when the user's phrase does not share tokens with the vocabulary:

| Query | Likely match |
| --- | --- |
| customer asks for money back after purchase | refund request terms |
| late delivery after carrier delay | delivery delay terms |
| card issuer rejected transaction | payment decline terms |
| basket threshold delivery cost | free shipping campaign terms |
| stock availability nearly sold out | inventory stock terms |

Names-Only Search

swift run lexicon search commerce.lexicon checkout button \
	--mode token \
	--names-only

Use --names-only for editor completion-style workflows where notes, comments, defaults, and references should not influence ranking.

Source-Only Search

swift run lexicon search commerce.lexicon submit \
	--source-only

By default, search composes imports. Use --source-only when you need to inspect only the file being edited.

Practical Search Workflow

  1. Start with --mode hybrid --scope own.
  2. If a concept is inherited through types, retry with --scope live.
  3. If a query is natural language and shares few tokens with the vocabulary, use --mode semantic.
  4. If ranking seems noisy, use --names-only or switch to token.
  5. If results are unexpectedly missing, run lexicon validate and check composition diagnostics first.
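
Steps 1, 2, and 5 of the workflow above can be sketched as a small escalation script. This is an illustration under stated assumptions, not documented behavior: the search_escalating function and the LEXICON_WF variable are hypothetical, empty standard output is assumed to mean "no results", and lexicon validate is assumed to accept the file path as its argument.

```shell
# Hypothetical helper, not shipped with Lexicon.
# LEXICON_WF holds the command prefix; override it to test without the CLI.
LEXICON_WF="${LEXICON_WF:-swift run lexicon}"

# Usage: search_escalating FILE QUERY...
search_escalating() {
    file=$1
    shift
    # Steps 1 and 2: try the own scope first, then widen to live.
    for scope in own live; do
        result=$($LEXICON_WF search "$file" "$@" --mode hybrid --scope "$scope")
        # Assumption: empty output means the search found nothing.
        if [ -n "$result" ]; then
            printf '%s\n' "$result"
            return 0
        fi
    done
    # Step 5: still nothing, so surface validation diagnostics on stderr.
    $LEXICON_WF validate "$file" >&2
    return 1
}
```

Steps 3 and 4 are left out on purpose, because choosing semantic mode or --names-only depends on judging the query and the ranking, which is hard to automate reliably.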
