feat: add remote embedding via proxy with cache layer #79
Merged
Extract Embedder interface from the concrete ONNX struct, add a RemoteEmbedder that calls the proxy's new /embed endpoint, and a generic cache package (memory + Redis) so the proxy can avoid redundant OpenRouter API calls. The server checks embedding availability from the proxy's /datasources response and uses the remote embedder when available, falling back to local ONNX otherwise.
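A minimal sketch of the selection logic this commit describes, assuming the extracted interface exposes a single `Embed` method. The method signature, `stubEmbedder`, and `chooseEmbedder` are hypothetical names used for illustration; the PR's actual types may differ.

```go
package main

import (
	"context"
	"errors"
)

// Embedder is an assumed shape for the extracted interface:
// one vector per input text.
type Embedder interface {
	Embed(ctx context.Context, texts []string) ([][]float32, error)
}

// stubEmbedder is a hypothetical stand-in for a remote or local
// implementation. It returns zero vectors of a fixed dimension.
type stubEmbedder struct {
	name string
	dim  int
}

func (s stubEmbedder) Embed(_ context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i := range texts {
		out[i] = make([]float32, s.dim)
	}
	return out, nil
}

// chooseEmbedder prefers the remote embedder when the proxy reports
// embedding as available, and otherwise falls back to the local one.
func chooseEmbedder(remoteAvailable bool, remote, local Embedder) (Embedder, error) {
	if remoteAvailable && remote != nil {
		return remote, nil
	}
	if local != nil {
		return local, nil
	}
	return nil, errors.New("no embedder available")
}
```

Callers depend only on `Embedder`, so tests can pass in a stub instead of a real model or proxy.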
Drop LocalEmbedder and the hugot/ONNX dependency entirely; embedding is now handled by the proxy's remote service. Simplify searchruntime.Build to only use the remote embedder.
Add tests:
- pkg/cache: memory cache unit tests + Redis integration tests (testcontainers)
- pkg/embedding: RemoteEmbedder tests with mocked proxy endpoint
- pkg/proxy: EmbeddingService tests with mocked OpenRouter API
- pkg/resource: rewrite EIP index test to use a stub embedder
RemoteEmbedder now sends hashes to /embed/check first, then only sends text for uncached items to /embed. Single-item embeds skip the check and go directly to /embed.
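The check-then-embed flow can be sketched as below. This is an illustration under assumptions, not the PR's code: the hash function (SHA-256 here), the `item` type, and `buildItems` are hypothetical, and the `check` callback stands in for the HTTP call to /embed/check.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// item is an assumed request element for /embed: the hash is always
// sent, the text only when the proxy does not already have the vector.
type item struct {
	Hash string
	Text string
}

// hashText derives a cache key from the text. SHA-256 is an assumption;
// the source does not say which hash is used.
func hashText(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// buildItems prepares the /embed payload. For batches it first asks
// check which hashes are cached and blanks the text for those items.
// A single item skips the check and is sent with its text directly.
func buildItems(texts []string, check func([]string) (map[string]bool, error)) ([]item, error) {
	items := make([]item, len(texts))
	for i, t := range texts {
		items[i] = item{Hash: hashText(t), Text: t}
	}
	if len(items) <= 1 {
		return items, nil
	}
	hashes := make([]string, len(items))
	for i, it := range items {
		hashes[i] = it.Hash
	}
	cached, err := check(hashes)
	if err != nil {
		return nil, err
	}
	for i := range items {
		if cached[items[i].Hash] {
			items[i].Text = ""
		}
	}
	return items, nil
}
```

Skipping the check for single items saves a round trip, since the extra request would cost about as much as sending the text itself.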
Cache:
- InMemoryCache and RedisCache now accept a TTL at construction time
- Embedding cache uses a 30-day TTL so orphaned entries expire
- Expired in-memory entries are skipped on read (lazy expiry)
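A rough sketch of a TTL-at-construction cache with lazy expiry, as the list above describes. The `ttlCache` type, its fields, and the injectable `clock` are hypothetical and exist only to illustrate the idea; the PR's InMemoryCache may be structured differently.

```go
package main

import (
	"sync"
	"time"
)

// embeddingTTL mirrors the 30-day TTL mentioned for the embedding cache.
const embeddingTTL = 30 * 24 * time.Hour

// clock is injectable so expiry can be exercised without sleeping.
type clock func() time.Time

// fixedClock returns a clock that always reports t.
func fixedClock(t time.Time) clock {
	return func() time.Time { return t }
}

type cacheEntry struct {
	value     []byte
	expiresAt time.Time
}

// ttlCache is a hypothetical in-memory cache whose TTL is fixed at
// construction time.
type ttlCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	now     clock
	entries map[string]cacheEntry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, now: time.Now, entries: make(map[string]cacheEntry)}
}

// Set stores value under key with an expiry of now + ttl.
func (c *ttlCache) Set(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cacheEntry{value: value, expiresAt: c.now().Add(c.ttl)}
}

// Get skips entries whose expiry has passed (lazy expiry). The entry
// is not deleted here; it is simply treated as a miss.
func (c *ttlCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[key]
	if !ok || !c.now().Before(e.expiresAt) {
		return nil, false
	}
	return e.value, true
}
```

Lazy expiry keeps reads cheap and needs no background sweeper, but expired entries stay in memory until they are overwritten or removed by some other mechanism.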
Metrics (panda_proxy_embedding_*):
- embedding_requests_total{status} — OpenRouter API call count
- embedding_request_duration_seconds — API call latency
- embedding_tokens_total{type} — prompt/total token consumption
- embedding_cost_usd — cumulative estimated cost in USD
- embedding_items_total{source} — cache_hit vs cache_miss counts
- extractDatasourceType now recognizes /embed paths
Config:
- cost_per_token field on EmbeddingConfig (default $0.02/1M tokens)
When cost_per_token is not set in config, the EmbeddingService queries the API's /models endpoint at startup and extracts the prompt cost for the configured model. Falls back gracefully if the fetch fails.
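The resolution order described above might look like the following sketch. The function names, the `fetch` callback standing in for the /models lookup, and the choice to fall back to the documented default when the fetch fails are assumptions; the source only says the service falls back gracefully.

```go
package main

import "errors"

// defaultCostPerToken is the documented default of $0.02 per 1M tokens.
const defaultCostPerToken = 0.02 / 1_000_000

// errModelsUnavailable is a hypothetical error a fetcher could return
// when the /models lookup fails.
var errModelsUnavailable = errors.New("models endpoint unavailable")

// resolveCostPerToken picks the per-token cost in this order:
// explicit config value, value fetched from the API, default.
// Falling back to the default on fetch failure is an assumption.
func resolveCostPerToken(configured float64, fetch func() (float64, error)) float64 {
	if configured > 0 {
		return configured
	}
	if fetch != nil {
		if cost, err := fetch(); err == nil && cost > 0 {
			return cost
		}
	}
	return defaultCostPerToken
}

// estimateCostUSD converts a token count into an estimated cost in USD.
func estimateCostUSD(tokens int, costPerToken float64) float64 {
	return float64(tokens) * costPerToken
}
```

With the default rate, one million tokens come out at roughly $0.02, which is the kind of value the embedding_cost_usd metric would accumulate.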
Move embedding computation from the local ONNX model to a remote API (OpenRouter) accessed through the proxy, with a cache layer to avoid redundant calls. The server checks embedding availability from the proxy's /datasources response and uses the remote embedder when available, falling back to local ONNX otherwise.