
feat(embed): configurable embedding dim + ollama timeout via env #5

Merged
yhyyz merged 1 commit into ourmem:main from doctatortot:upstream-pr/configurable-embed-dim-and-timeout on May 18, 2026

@doctatortot
Contributor

Summary

Makes two values in the openai-compat embedder configurable instead of hardcoded:

  • OMEM_EMBED_DIM (default 1024) — drives OpenAICompatEmbedder::dims / EmbedService::dimensions()
  • OMEM_EMBED_TIMEOUT_SECS (default 10) — drives the reqwest::Client timeout for the embed call

Both default to the existing hardcoded values when unset or zero, so behavior for any existing deployment is unchanged.
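
The fallback rule above can be sketched roughly as follows. This is an illustration, not the code from this PR: the helper names `parse_or_default` and `embed_settings_from_env` are hypothetical, and treating an unparsable value like an unset one is an assumption (the description only mentions unset and zero).

```rust
use std::env;

/// Defaults matching the values that were previously hardcoded.
const DEFAULT_EMBED_DIM: u64 = 1024;
const DEFAULT_EMBED_TIMEOUT_SECS: u64 = 10;

/// Hypothetical helper: interpret a raw env value, falling back to `default`
/// when it is unset or zero. Falling back on unparsable input as well is an
/// assumption made for this sketch.
fn parse_or_default(raw: Option<&str>, default: u64) -> u64 {
    match raw.and_then(|s| s.trim().parse::<u64>().ok()) {
        Some(0) | None => default,
        Some(n) => n,
    }
}

/// Hypothetical helper: read both settings from the process environment and
/// return `(embedding dim, timeout in seconds)`.
fn embed_settings_from_env() -> (usize, u64) {
    let dim = parse_or_default(
        env::var("OMEM_EMBED_DIM").ok().as_deref(),
        DEFAULT_EMBED_DIM,
    );
    let timeout_secs = parse_or_default(
        env::var("OMEM_EMBED_TIMEOUT_SECS").ok().as_deref(),
        DEFAULT_EMBED_TIMEOUT_SECS,
    );
    (dim as usize, timeout_secs)
}
```

With this shape, a deployment that sets neither variable gets `(1024, 10)`, which is the pre-existing behavior.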

Motivation

Two problems running omem-server with non-default models or on slower hardware:

  1. Hardcoded dims: 1024 at OpenAICompatEmbedder::new() means any model whose native dim != 1024 produces vectors the Lance store can't hold. This affects nomic-embed-text (768), bge-small-en-v1.5 (384), all-MiniLM-L6-v2 (384), and others; without this change they all fail at first write with an Arrow length-mismatch panic.

  2. Hardcoded 10s reqwest timeout kills inference on CPU-only / older hardware. A single embed call against mxbai-embed-large (334M-param BERT) on a 2013 Xeon E5-2697 v2 routinely takes 11-15s under back-to-back load. omem-server returns 500 to the caller; ollama keeps computing and discards the eventual result. The CPU time is wasted and no writes land.
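
To make the first failure mode concrete, here is a minimal sketch of an explicit length check that would surface the mismatch as an error instead of a panic at write time. It is not part of this PR; `DimMismatch` and `check_dim` are hypothetical names introduced only for illustration.

```rust
use std::fmt;

/// Hypothetical error: the embedding length disagrees with the configured dim.
#[derive(Debug, PartialEq)]
struct DimMismatch {
    expected: usize,
    actual: usize,
}

impl fmt::Display for DimMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "embedding has {} dims, expected {}",
            self.actual, self.expected
        )
    }
}

/// Hypothetical guard: compare a returned vector against the configured dim
/// before handing it to the store.
fn check_dim(vector: &[f32], expected: usize) -> Result<(), DimMismatch> {
    if vector.len() == expected {
        Ok(())
    } else {
        Err(DimMismatch {
            expected,
            actual: vector.len(),
        })
    }
}
```

For example, a 384-dim vector from all-MiniLM-L6-v2 checked against the old hardcoded 1024 would yield `DimMismatch { expected: 1024, actual: 384 }`.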

Validated end-to-end

On a 2013 Xeon E5-2697 v2 (no AVX2/AVX-512), running ollama with a custom all-minilm-512 model (384-dim, num_ctx=512):

  • Embed call latency: 11-15s (mxbai-embed-large @ 1024) → 200-500ms (all-minilm @ 384)
  • Memory migration of 185 chunks: previously saw a 41-67% failure rate (timeouts), now completes with zero failures

The migration script that exercised this simply issues one POST /v1/memories per chunk, sequentially, with 6s spacing between requests.

Relationship to the schema-flexible dim PR

The schema-flexible dim PR (still open) makes LanceStore's vector column accept any dim from the embedder. This PR lets the openai-compat embedder actually report a non-1024 dim. Together they unlock smaller/faster embedding models on the openai-compat path. They can land in either order; each is backward-compatible on its own.

Back-compat

  • OmemConfig::default() keeps the existing 1024-dim / 10s-timeout values.
  • Existing tests pass unchanged.
  • Zero values for either env var fall back to the defaults (guards against misconfig producing zero-length vectors or zero-timeout reqwest behavior).

Tests

Three new tests on OpenAICompatEmbedder::new:

  • embed_dim_from_config_overrides_default — confirms dims() returns the configured value
  • embed_dim_zero_falls_back_to_1024 — guards misconfig
  • embed_timeout_zero_falls_back_to_default — guards misconfig
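
The shape of these tests might look roughly like the sketch below. The test names come from this PR, but the bodies are guesses, and `SketchEmbedder` is a hypothetical stand-in: the real `OpenAICompatEmbedder::new` takes its own arguments and builds a `reqwest::Client`, which is omitted here so the example stays dependency-free.

```rust
use std::time::Duration;

const DEFAULT_DIMS: usize = 1024;
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(10);

/// Hypothetical stand-in for the embedder; it only keeps the two settings.
struct SketchEmbedder {
    dims: usize,
    timeout: Duration,
}

impl SketchEmbedder {
    /// Zero for either argument falls back to the previous hardcoded value.
    fn new(dim: usize, timeout_secs: u64) -> Self {
        let dims = if dim == 0 { DEFAULT_DIMS } else { dim };
        let timeout = if timeout_secs == 0 {
            DEFAULT_TIMEOUT
        } else {
            Duration::from_secs(timeout_secs)
        };
        Self { dims, timeout }
    }

    fn dims(&self) -> usize {
        self.dims
    }

    fn timeout(&self) -> Duration {
        self.timeout
    }
}

#[test]
fn embed_dim_from_config_overrides_default() {
    assert_eq!(SketchEmbedder::new(384, 10).dims(), 384);
}

#[test]
fn embed_dim_zero_falls_back_to_1024() {
    assert_eq!(SketchEmbedder::new(0, 10).dims(), 1024);
}

#[test]
fn embed_timeout_zero_falls_back_to_default() {
    assert_eq!(SketchEmbedder::new(384, 0).timeout(), DEFAULT_TIMEOUT);
}
```

Doing the zero check in the constructor, as assumed here, means every caller gets the guard regardless of how the config was populated.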

Checklist

  • Existing openai_compat tests still pass
  • New behavior covered
  • Back-compat preserved (OmemConfig::default() unchanged, zero values fall back)
  • No new dependencies

The openai-compatible embedder hardcodes `dims: 1024` at construction
and uses a hardcoded `Duration::from_secs(10)` reqwest timeout. Two
problems this surfaces in practice:

1. Any embed model whose native dim != 1024 produces vectors the
   Lance store can't hold. `nomic-embed-text` (768), `bge-small`
   (384), `all-MiniLM-L6-v2` (384) all currently break.
2. On CPU-only / older hardware, a single embed call against a
   large model (e.g. mxbai-embed-large, 334M params) routinely
   takes 11-15s, blowing the 10s timeout. omem-server returns 500;
   ollama keeps computing and discards the result. Useless work,
   no successful writes.

Add two `OmemConfig` fields populated from env vars:

- `OMEM_EMBED_DIM` (default 1024) — passed into
  `OpenAICompatEmbedder::dims`, returned by `EmbedService::dimensions()`
- `OMEM_EMBED_TIMEOUT_SECS` (default 10) — used to build the
  `reqwest::Client` timeout

Both fall back to the existing defaults when unset or zero, so
behavior for existing deployments is unchanged.

Validated end-to-end on a 2013-vintage Xeon E5-2697 v2 host running
ollama with `all-minilm-512` (384-dim, num_ctx=512). Embed calls
dropped from 11-15s on mxbai-embed-large (1024) to 200-500ms,
allowing a 185-chunk memory migration to complete with zero failures
where the previous configuration failed at 41-67%.

Three new openai_compat tests cover:
- `embed_dim_from_config_overrides_default` — dims() returns the
  configured value
- `embed_dim_zero_falls_back_to_1024` — guards against misconfig
  producing zero-length vectors
- `embed_timeout_zero_falls_back_to_default` — guards against
  zero-timeout reqwest behavior

This complements PR (schema-flexible vector dim) — that PR makes the
Lance store accept any dim; this one lets the active embedder report
it. Together they make non-1024 embedding models actually usable on
the openai-compat path.
@doctatortot
Contributor Author

Companion to #4 (schema-flexible LanceStore vector dim). #4 makes the store accept any dim from the embedder; this PR lets the embedder produce one. Each is back-compat with existing defaults so they can land in either order.

@yhyyz
Contributor

yhyyz commented May 18, 2026

Merged and deployed! Thanks @doctatortot 🙏

This pairs perfectly with #4 — together they unlock end-to-end support for smaller/faster embedding models on the openai-compatible path. The zero-value fallback guards are a nice defensive touch, and the timeout fix is a real pain-point solver for CPU-only / Ollama setups.

Both PRs are now live on api.ourmem.ai. Great work!
