feat: add remote embedding via proxy with cache layer #79

Merged: samcm merged 6 commits into master from jolly-cow-748 on Mar 17, 2026

samcm commented Mar 17, 2026

Move embedding computation from the local ONNX model to a remote API (OpenRouter) accessed through the proxy, with a cache layer to avoid redundant calls. The server checks embedding availability from the proxy's /datasources response and uses the remote embedder when available, falling back to local ONNX otherwise.

samcm added 6 commits March 17, 2026 11:37
Extract Embedder interface from the concrete ONNX struct, add a
RemoteEmbedder that calls the proxy's new /embed endpoint, and a
generic cache package (memory + Redis) so the proxy can avoid
redundant OpenRouter API calls.
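A minimal sketch of what the extracted interface and a remote implementation could look like, run against a stub proxy. The method signature, the JSON field names (`texts`, `embeddings`), and the `demoRemoteEmbed` helper are assumptions for illustration, not the PR's actual wire format or code.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Embedder is the abstraction callers depend on (method set is assumed).
type Embedder interface {
	Embed(ctx context.Context, texts []string) ([][]float32, error)
}

// Hypothetical request/response bodies for the proxy's /embed endpoint.
type embedRequest struct {
	Texts []string `json:"texts"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// RemoteEmbedder posts texts to <BaseURL>/embed and decodes the vectors.
type RemoteEmbedder struct {
	BaseURL string
	Client  *http.Client
}

func (r *RemoteEmbedder) Embed(ctx context.Context, texts []string) ([][]float32, error) {
	body, err := json.Marshal(embedRequest{Texts: texts})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, r.BaseURL+"/embed", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := r.Client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embed: unexpected status %d", resp.StatusCode)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) != len(texts) {
		return nil, fmt.Errorf("embed: got %d vectors for %d texts", len(out.Embeddings), len(texts))
	}
	return out.Embeddings, nil
}

// demoRemoteEmbed runs the embedder against a stub proxy that returns a
// one-dimensional "vector" holding each text's length.
func demoRemoteEmbed() ([][]float32, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in embedRequest
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		out := embedResponse{}
		for _, t := range in.Texts {
			out.Embeddings = append(out.Embeddings, []float32{float32(len(t))})
		}
		json.NewEncoder(w).Encode(out)
	}))
	defer srv.Close()

	var e Embedder = &RemoteEmbedder{BaseURL: srv.URL, Client: srv.Client()}
	return e.Embed(context.Background(), []string{"eip", "proxy"})
}

func main() {
	vecs, err := demoRemoteEmbed()
	fmt.Println(vecs, err)
}
```

Depending on the interface rather than a concrete struct is what lets tests swap in a stub embedder without touching callers.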

The server checks embedding availability from the proxy's /datasources
response and uses the remote embedder when available, falling back to
local ONNX otherwise.

Drop LocalEmbedder and the hugot/ONNX dependency entirely; embedding
is handled by the proxy's remote service. Simplify searchruntime.Build
to only use the remote embedder.

Add tests:
- pkg/cache: memory cache unit tests + Redis integration tests (testcontainers)
- pkg/embedding: RemoteEmbedder tests with mocked proxy endpoint
- pkg/proxy: EmbeddingService tests with mocked OpenRouter API
- pkg/resource: rewrite EIP index test to use a stub embedder

RemoteEmbedder now sends hashes to /embed/check first, then only sends
text for uncached items to /embed. Single-item embeds skip the check
and go directly to /embed.
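The check-then-embed flow described above might be sketched as follows. The `proxyAPI` interface, the SHA-256 hashing, and the assumption that the check call returns cached vectors keyed by hash are all illustrative guesses. The commit message only states that hashes go to /embed/check first and that text is sent for uncached items.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// proxyAPI is a hypothetical stand-in for the two proxy endpoints.
type proxyAPI interface {
	Check(hashes []string) (map[string][]float32, error) // stands in for /embed/check
	Embed(texts []string) ([][]float32, error)           // stands in for /embed
}

func hashText(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// embedBatch sends hashes first and only ships text for cache misses.
// A single item skips the check, since it would cost an extra round trip.
func embedBatch(p proxyAPI, texts []string) ([][]float32, error) {
	if len(texts) == 0 {
		return nil, nil
	}
	if len(texts) == 1 {
		return p.Embed(texts)
	}
	hashes := make([]string, len(texts))
	for i, t := range texts {
		hashes[i] = hashText(t)
	}
	cached, err := p.Check(hashes)
	if err != nil {
		return nil, err
	}
	out := make([][]float32, len(texts))
	var missIdx []int
	var missTexts []string
	for i, h := range hashes {
		if v, ok := cached[h]; ok {
			out[i] = v
			continue
		}
		missIdx = append(missIdx, i)
		missTexts = append(missTexts, texts[i])
	}
	if len(missTexts) == 0 {
		return out, nil
	}
	fresh, err := p.Embed(missTexts)
	if err != nil {
		return nil, err
	}
	if len(fresh) != len(missTexts) {
		return nil, fmt.Errorf("embed: got %d vectors for %d texts", len(fresh), len(missTexts))
	}
	for j, i := range missIdx {
		out[i] = fresh[j]
	}
	return out, nil
}

// fakeProxy is an in-process stub that records how much work each call did.
type fakeProxy struct {
	store      map[string][]float32
	checkCalls int
	textsSent  int
}

func (f *fakeProxy) Check(hashes []string) (map[string][]float32, error) {
	f.checkCalls++
	hits := make(map[string][]float32)
	for _, h := range hashes {
		if v, ok := f.store[h]; ok {
			hits[h] = v
		}
	}
	return hits, nil
}

func (f *fakeProxy) Embed(texts []string) ([][]float32, error) {
	f.textsSent += len(texts)
	out := make([][]float32, len(texts))
	for i, t := range texts {
		out[i] = []float32{float32(len(t))}
		f.store[hashText(t)] = out[i]
	}
	return out, nil
}

// demoCheckFlow embeds three batches and reports texts sent and check calls.
func demoCheckFlow() (textsSent, checkCalls int, err error) {
	p := &fakeProxy{store: map[string][]float32{}}
	for _, batch := range [][]string{{"a", "b"}, {"a", "b", "c"}, {"d"}} {
		if _, err = embedBatch(p, batch); err != nil {
			return 0, 0, err
		}
	}
	return p.textsSent, p.checkCalls, nil
}

func main() {
	fmt.Println(demoCheckFlow())
}
```

In the demo, the second batch sends text only for "c", and the single-item batch never calls the check.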
Cache:
- InMemoryCache and RedisCache now accept a TTL at construction time
- Embedding cache uses a 30-day TTL so orphaned entries expire
- Expired in-memory entries are skipped on read (lazy expiry)
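As an illustration of the TTL and lazy-expiry behaviour listed above, here is a small sketch of an in-memory cache that takes its TTL at construction and skips expired entries on read. The type name `memoryCache`, its methods, and the injectable clock are hypothetical; the PR's InMemoryCache may be shaped differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value     []byte
	expiresAt time.Time
}

// memoryCache is a hypothetical TTL cache; the TTL is fixed at construction.
type memoryCache struct {
	mu    sync.RWMutex
	ttl   time.Duration
	now   func() time.Time // injectable clock so expiry can be tested
	items map[string]entry
}

func newMemoryCache(ttl time.Duration, now func() time.Time) *memoryCache {
	return &memoryCache{ttl: ttl, now: now, items: make(map[string]entry)}
}

// Set stores the value with an expiry of now + TTL.
func (c *memoryCache) Set(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expiresAt: c.now().Add(c.ttl)}
}

// Get skips entries whose TTL has passed (lazy expiry). It does not delete
// them; eviction is left out of this sketch.
func (c *memoryCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.items[key]
	if !ok || !c.now().Before(e.expiresAt) {
		return nil, false
	}
	return e.value, true
}

// demoLazyExpiry reads a key before and after a 30-day TTL has elapsed.
func demoLazyExpiry() (beforeExpiry, afterExpiry bool) {
	clock := time.Date(2026, time.March, 17, 0, 0, 0, 0, time.UTC)
	c := newMemoryCache(30*24*time.Hour, func() time.Time { return clock })
	c.Set("hash-1", []byte("vector bytes"))
	_, beforeExpiry = c.Get("hash-1")
	clock = clock.Add(31 * 24 * time.Hour) // move the fake clock past the TTL
	_, afterExpiry = c.Get("hash-1")
	return beforeExpiry, afterExpiry
}

func main() {
	fmt.Println(demoLazyExpiry())
}
```

Lazy expiry keeps reads cheap and needs no background sweeper, at the cost of expired entries occupying memory until something removes them.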

Metrics (panda_proxy_embedding_*):
- embedding_requests_total{status} — OpenRouter API call count
- embedding_request_duration_seconds — API call latency
- embedding_tokens_total{type} — prompt/total token consumption
- embedding_cost_usd — cumulative estimated cost in USD
- embedding_items_total{source} — cache_hit vs cache_miss counts
- extractDatasourceType now recognizes /embed paths

Config:
- cost_per_token field on EmbeddingConfig (default $0.02/1M tokens)

When cost_per_token is not set in config, the EmbeddingService
queries the API's /models endpoint at startup and extracts the
prompt cost for the configured model. Falls back gracefully if
the fetch fails.
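A rough sketch of the cost resolution described above: use the configured value if set, otherwise read the prompt price from a /models response, otherwise fall back. The response shape (`data[].id`, `data[].pricing.prompt` as a decimal string in USD per token), the fallback to the $0.02/1M default, the model name, and all function names here are assumptions, not confirmed details of the PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// defaultCostPerToken is $0.02 per 1M tokens, expressed per token.
const defaultCostPerToken = 0.02 / 1_000_000

// modelsResponse models only the fields this sketch needs (assumed shape).
type modelsResponse struct {
	Data []struct {
		ID      string `json:"id"`
		Pricing struct {
			Prompt string `json:"prompt"`
		} `json:"pricing"`
	} `json:"data"`
}

// promptCost extracts the per-token prompt price for the given model.
func promptCost(body []byte, model string) (float64, error) {
	var resp modelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	for _, m := range resp.Data {
		if m.ID == model {
			return strconv.ParseFloat(m.Pricing.Prompt, 64)
		}
	}
	return 0, fmt.Errorf("model %q not listed", model)
}

// resolveCostPerToken prefers config, then the API price, then the default.
func resolveCostPerToken(configured float64, fetch func() ([]byte, error), model string) float64 {
	if configured > 0 {
		return configured
	}
	body, err := fetch()
	if err != nil {
		return defaultCostPerToken // fetch failed: fall back quietly
	}
	cost, err := promptCost(body, model)
	if err != nil || cost <= 0 {
		return defaultCostPerToken
	}
	return cost
}

// estimateCostUSD converts a token count into an estimated cost.
func estimateCostUSD(tokens int, costPerToken float64) float64 {
	return float64(tokens) * costPerToken
}

// demoCosts resolves the price from the API, from the fallback, and from config.
func demoCosts() (fromAPI, fromFallback, fromConfig float64) {
	const model = "example/embedding-model" // hypothetical model ID
	ok := func() ([]byte, error) {
		return []byte(`{"data":[{"id":"example/embedding-model","pricing":{"prompt":"0.00000005"}}]}`), nil
	}
	failing := func() ([]byte, error) {
		return nil, errors.New("models endpoint unreachable")
	}
	return resolveCostPerToken(0, ok, model),
		resolveCostPerToken(0, failing, model),
		resolveCostPerToken(1e-7, ok, model)
}

func main() {
	fmt.Println(demoCosts())
	fmt.Println(estimateCostUSD(1_000_000, defaultCostPerToken))
}
```

Passing the fetch step in as a function keeps the resolution logic testable without a network call.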
samcm merged commit 88b6dc6 into master on Mar 17, 2026
7 checks passed