feat(embed): configurable embedding dim + ollama timeout via env #5
Merged
yhyyz merged 1 commit on May 18, 2026
Conversation
The openai-compatible embedder hardcodes `dims: 1024` at construction and uses a hardcoded `Duration::from_secs(10)` reqwest timeout. Two problems this surfaces in practice:

1. Any embed model whose native dim != 1024 produces vectors the Lance store can't hold. `nomic-embed-text` (768), `bge-small` (384), `all-MiniLM-L6-v2` (384) all break currently.
2. On CPU-only / older hardware, a single embed call against a large model (e.g. mxbai-embed-large, 334M params) routinely takes 11-15s, blowing the 10s timeout. omem-server returns 500; ollama keeps computing and discards the result. Useless work, no successful writes.

Add two `OmemConfig` fields populated from env vars:

- `OMEM_EMBED_DIM` (default 1024): passed into `OpenAICompatEmbedder::dims`, returned by `EmbedService::dimensions()`
- `OMEM_EMBED_TIMEOUT_SECS` (default 10): used to build the `reqwest::Client` timeout

Both fall back to the existing defaults when unset or zero, so behavior for existing deployments is unchanged.

Validated end-to-end on a 2013-vintage Xeon E5-2697 v2 host running ollama with `all-minilm-512` (384-dim, num_ctx=512). Embed calls dropped from 11-15s on mxbai-embed-large (1024) to 200-500ms, allowing a 185-chunk memory migration to complete with zero failures where the previous configuration failed at 41-67%.

Three new openai_compat tests cover:

- `embed_dim_from_config_overrides_default`: dims() returns the configured value
- `embed_dim_zero_falls_back_to_1024`: guards against misconfig producing zero-length vectors
- `embed_timeout_zero_falls_back_to_default`: guards against zero-timeout reqwest behavior

This complements PR (schema-flexible vector dim): that PR makes the Lance store accept any dim; this one lets the active embedder report it. Together they make non-1024 embedding models actually usable on the openai-compat path.
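To make the first problem concrete, here is a minimal sketch of the kind of length check a fixed-width vector column effectively enforces. `check_dim` and `DimMismatch` are hypothetical names invented for illustration; they are not omem-server or Lance APIs, and the real store may surface the mismatch differently.

```rust
use std::fmt;

/// Hypothetical error: an embedding whose length differs from the
/// width the store's vector column was created with.
#[derive(Debug, PartialEq)]
pub struct DimMismatch {
    pub expected: usize,
    pub actual: usize,
}

impl fmt::Display for DimMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "embedding has {} dims, store expects {}",
            self.actual, self.expected
        )
    }
}

impl std::error::Error for DimMismatch {}

/// Reject an embedding up front instead of failing deep inside the write path.
/// `expected` would be whatever dimension the embedder reports.
pub fn check_dim(expected: usize, embedding: &[f32]) -> Result<(), DimMismatch> {
    if embedding.len() == expected {
        Ok(())
    } else {
        Err(DimMismatch {
            expected,
            actual: embedding.len(),
        })
    }
}
```

With a hardcoded expectation of 1024, a 384-dim vector from `all-MiniLM-L6-v2` fails this check; once the embedder reports 384, the same vector passes.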
Merged and deployed! Thanks @doctatortot 🙏 This pairs perfectly with #4: together they unlock end-to-end support for smaller/faster embedding models on the openai-compatible path. The zero-value fallback guards are a nice defensive touch, and the timeout fix is a real pain-point solver for CPU-only / Ollama setups. Both PRs are now live on api.ourmem.ai. Great work!
Summary
Makes two values in the openai-compat embedder configurable instead of hardcoded:
- `OMEM_EMBED_DIM` (default 1024): drives `OpenAICompatEmbedder::dims` / `EmbedService::dimensions()`
- `OMEM_EMBED_TIMEOUT_SECS` (default 10): drives the `reqwest::Client` timeout for the embed call
Both default to the existing hardcoded values when unset or zero, so behavior for any existing deployment is unchanged.
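A minimal sketch of the "unset or zero falls back to the default" rule, using only the standard library. The env var names and defaults come from this PR; `EmbedSettings`, `parse_or_default`, `settings_from`, and `settings_from_env` are hypothetical names, and treating an unparsable value the same as an unset one is an assumption of the sketch, not something the PR states.

```rust
use std::time::Duration;

const DEFAULT_EMBED_DIM: u64 = 1024;
const DEFAULT_EMBED_TIMEOUT_SECS: u64 = 10;

/// Hypothetical stand-in for the two new `OmemConfig` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct EmbedSettings {
    pub embed_dim: usize,
    pub embed_timeout: Duration,
}

/// Unset, unparsable (assumption), or zero input yields `default`.
pub fn parse_or_default(raw: Option<&str>, default: u64) -> u64 {
    match raw.and_then(|s| s.trim().parse::<u64>().ok()) {
        Some(0) | None => default,
        Some(n) => n,
    }
}

/// Build settings from any key lookup, so the fallback logic can be
/// exercised without mutating the process environment.
pub fn settings_from<F>(lookup: F) -> EmbedSettings
where
    F: Fn(&str) -> Option<String>,
{
    let dim = parse_or_default(lookup("OMEM_EMBED_DIM").as_deref(), DEFAULT_EMBED_DIM);
    let secs = parse_or_default(
        lookup("OMEM_EMBED_TIMEOUT_SECS").as_deref(),
        DEFAULT_EMBED_TIMEOUT_SECS,
    );
    EmbedSettings {
        embed_dim: dim as usize,
        embed_timeout: Duration::from_secs(secs),
    }
}

/// Convenience wrapper that reads the real environment.
pub fn settings_from_env() -> EmbedSettings {
    settings_from(|key| std::env::var(key).ok())
}
```

Passing the lookup as a closure is a design choice of the sketch: tests can supply a fixed map of values, while production code calls `settings_from_env()`.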
Motivation
Two problems running omem-server with non-default models or on slower hardware:
1. Hardcoded `dims: 1024` at `OpenAICompatEmbedder::new()` means any model whose native dim != 1024 produces vectors the Lance store can't store. Affects `nomic-embed-text` (768), `bge-small-en-v1.5` (384), `all-MiniLM-L6-v2` (384), and others. Without this change, all of these fail at first write with an Arrow length-mismatch panic.
2. Hardcoded 10s reqwest timeout kills inference on CPU-only / older hardware. A single embed call against `mxbai-embed-large` (334M-param BERT) on a 2013 Xeon E5-2697 v2 routinely takes 11-15s under back-to-back load. omem-server returns 500 to the caller; ollama keeps computing and discards the eventual result. Lots of wasted CPU and no writes land.
Validated end-to-end
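On a 2013 Xeon E5-2697 v2 (no AVX2/AVX-512), running ollama with a custom `all-minilm-512` model (384-dim, num_ctx=512), embed calls dropped from 11-15s on `mxbai-embed-large` (1024-dim) to 200-500ms, and a 185-chunk memory migration completed with zero failures where the previous configuration failed at 41-67%.
The migration script that exercised this is just one `POST /v1/memories` per chunk, sent back-to-back with a 6s spacing.
Relationship to the schema-flexible dim PR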
That PR (still open) makes `LanceStore`'s vector column accept any dim from the embedder. This PR lets the openai-compat embedder actually report a non-1024 dim. Together they unlock smaller/faster embedding models on the openai-compat path. They can land in either order; behavior is back-compat in both directions.
Back-compat
`OmemConfig::default()` keeps the existing 1024-dim / 10s-timeout values.
Tests
Three new tests on `OpenAICompatEmbedder::new`:
- `embed_dim_from_config_overrides_default`: confirms `dims()` returns the configured value
- `embed_dim_zero_falls_back_to_1024`: guards misconfig
- `embed_timeout_zero_falls_back_to_default`: guards misconfig
Checklist
`OmemConfig::default()` unchanged, zero values fall back)
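For readers who want a feel for what the three tests listed under Tests assert, here is a rough sketch. `EmbedderSketch` is a hypothetical stand-in for `OpenAICompatEmbedder`; the real constructor's signature and the `timeout()` accessor are assumptions, since this description does not show them. Only the test names and the 1024 / 10s defaults come from the PR.

```rust
use std::time::Duration;

const DEFAULT_DIMS: usize = 1024;
const DEFAULT_TIMEOUT_SECS: u64 = 10;

/// Hypothetical stand-in for `OpenAICompatEmbedder`.
pub struct EmbedderSketch {
    dims: usize,
    timeout: Duration,
}

impl EmbedderSketch {
    /// Zero for either argument is treated as misconfiguration and
    /// replaced with the pre-existing default.
    pub fn new(embed_dim: usize, timeout_secs: u64) -> Self {
        let dims = if embed_dim == 0 { DEFAULT_DIMS } else { embed_dim };
        let secs = if timeout_secs == 0 {
            DEFAULT_TIMEOUT_SECS
        } else {
            timeout_secs
        };
        Self {
            dims,
            timeout: Duration::from_secs(secs),
        }
    }

    pub fn dims(&self) -> usize {
        self.dims
    }

    /// Assumed accessor; the real embedder presumably hands this value
    /// to its HTTP client builder instead of exposing it.
    pub fn timeout(&self) -> Duration {
        self.timeout
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn embed_dim_from_config_overrides_default() {
        assert_eq!(EmbedderSketch::new(384, 10).dims(), 384);
    }

    #[test]
    fn embed_dim_zero_falls_back_to_1024() {
        assert_eq!(EmbedderSketch::new(0, 10).dims(), 1024);
    }

    #[test]
    fn embed_timeout_zero_falls_back_to_default() {
        assert_eq!(
            EmbedderSketch::new(384, 0).timeout(),
            Duration::from_secs(10)
        );
    }
}
```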