Implement dynamic Ollama embedding dimension resolution with server probing #237

Merged: lucaseduoli merged 7 commits into main from copilot/dynamic-embedding-dimension-resolution on Oct 9, 2025

Copilot AI commented Oct 9, 2025

Summary

Replaces hardcoded Ollama embedding dimensions with dynamic resolution that probes the running Ollama server to determine the actual vector dimension for the selected embedding model. Maintains static maps and default fallback for unknown models or failure cases.

Motivation

Currently, OLLAMA_EMBEDDING_DIMENSIONS in src/config/settings.py contains a hardcoded subset of model→dimension mappings. This approach is brittle and quickly becomes stale as new models are released or custom models are used. Ollama exposes an embeddings API that returns the embedding vector, allowing us to infer the true dimension at runtime.

Changes

src/utils/embeddings.py (+120 lines)

New async functions:

  • _probe_ollama_embedding_dimension(endpoint, model_name) - Probes the Ollama server's /api/embeddings endpoint with a test string. Tries modern API format ({model, input}) first, then falls back to legacy format ({model, prompt}). Returns the embedding dimension from the response or 0 on failure.

  • resolve_embedding_dimension(embedding_model, provider, endpoint) - Main resolution function. It probes Ollama only when provider is "ollama" and an endpoint is provided, and uses the probed value if it is greater than 0. Otherwise it falls back to the static maps via get_embedding_dimensions(), which returns VECTOR_DIM (1536) for unknown models.

Modified function:

  • create_dynamic_index_body(embedding_model, provider, endpoint) - Changed from sync to async and added optional provider and endpoint parameters. Now calls await resolve_embedding_dimension() instead of direct static lookup.
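A sketch of the resolution order and the async index body builder, under stated assumptions. The static map contents, the `chunk_embedding` field name and the `probe` parameter are hypothetical: the probe is injected so the fallback chain can be tested without a server, and `get_embedding_dimensions` is a simplified stand-in. The mapping is trimmed to the one value that depends on the model, the knn_vector `dimension`.

```python
from typing import Awaitable, Callable, Dict, Optional

VECTOR_DIM = 1536  # default dimension named in the PR

# Hypothetical stand-ins for the static maps
# (OLLAMA_EMBEDDING_DIMENSIONS lives in src/config/settings.py).
OPENAI_EMBEDDING_DIMENSIONS: Dict[str, int] = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}
OLLAMA_EMBEDDING_DIMENSIONS: Dict[str, int] = {"nomic-embed-text": 768}

# (endpoint, model_name) -> dimension, or 0 on failure
Probe = Callable[[str, str], Awaitable[int]]


def get_embedding_dimensions(model_name: str) -> int:
    """Static lookup; unknown models fall back to VECTOR_DIM."""
    for table in (OPENAI_EMBEDDING_DIMENSIONS, OLLAMA_EMBEDDING_DIMENSIONS):
        if model_name in table:
            return table[model_name]
    return VECTOR_DIM


async def resolve_embedding_dimension(
    embedding_model: str,
    provider: Optional[str] = None,
    endpoint: Optional[str] = None,
    probe: Optional[Probe] = None,  # hypothetical injection point for the HTTP probe
) -> int:
    # Only Ollama with a configured endpoint is probed; everything else uses the maps.
    if provider == "ollama" and endpoint and probe is not None:
        dimension = await probe(endpoint, embedding_model)
        if dimension > 0:
            return dimension
    return get_embedding_dimensions(embedding_model)


async def create_dynamic_index_body(
    embedding_model: str,
    provider: Optional[str] = None,
    endpoint: Optional[str] = None,
    probe: Optional[Probe] = None,
) -> dict:
    dimension = await resolve_embedding_dimension(embedding_model, provider, endpoint, probe)
    # Hypothetical, trimmed-down OpenSearch body; only "dimension" depends on the model.
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "chunk_embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }
```

Injecting the probe keeps resolve_embedding_dimension a small decision function: Ollama plus an endpoint triggers a probe, a result of 0 falls through to the static map, and unknown models end at VECTOR_DIM.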

src/main.py (+7 lines)

Modified init_index():

# Before:
dynamic_index_body = create_dynamic_index_body(embedding_model)

# After:
dynamic_index_body = await create_dynamic_index_body(
    embedding_model,
    provider=config.provider.model_provider,
    endpoint=config.provider.endpoint
)

Behavior

Ollama with endpoint configured

  1. Attempts to probe server at {endpoint}/api/embeddings during index creation
  2. If successful and dimension > 0: uses probed dimension
  3. If probe fails: falls back to static map
  4. If model unknown: falls back to VECTOR_DIM (1536)

Non-Ollama providers (OpenAI, watsonx)

  • No probing attempted
  • Uses static dimension maps as before
  • Falls back to VECTOR_DIM for unknown models

Error handling

All errors (network timeouts, refused connections, invalid JSON, missing fields) are logged and trigger the fallback. No exceptions propagate, so a failed probe cannot cause index creation to fail.
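One way to get the "no exceptions propagate" guarantee is to wrap the probe call in a catch-all that logs and returns 0. The sketch below uses a hypothetical helper named `probe_or_zero` and an outer asyncio.wait_for deadline. The PR does not say whether the implementation bounds the probe this way or relies solely on the HTTP client's timeout.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def probe_or_zero(
    probe: Callable[[], Awaitable[int]],
    timeout_s: float = 10.0,  # assumed deadline, not taken from the PR
) -> int:
    """Run a dimension probe, turning any failure into 0 so callers fall back."""
    try:
        dimension = await asyncio.wait_for(probe(), timeout=timeout_s)
    except asyncio.TimeoutError:
        logger.warning("Embedding dimension probe timed out after %.2fs", timeout_s)
        return 0
    except Exception:  # connection refused, bad JSON, missing fields, ...
        logger.exception("Embedding dimension probe failed; using static fallback")
        return 0
    # Treat non-integer or non-positive results as "unknown" too.
    if not isinstance(dimension, int) or dimension <= 0:
        return 0
    return dimension
```

Catching `Exception` rather than `BaseException` still lets task cancellation (`asyncio.CancelledError`) propagate, so shutdown is not swallowed by the fallback path.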

Benefits

  • Supports custom models: Automatically detects dimensions for any model available on the Ollama server
  • Future-proof: No need to update static maps when new models are released
  • No per-request overhead: Probing happens only once, during index creation
  • Backwards compatible: Optional parameters, existing behavior preserved for non-Ollama providers
  • Graceful degradation: Falls back to static maps and defaults on any error

Testing

Validated behavior across 7 scenarios:

  • Ollama with known model (server available) → uses probed dimension
  • Ollama with custom model (server available) → uses probed dimension
  • Ollama with server unavailable → falls back to static map
  • Ollama without endpoint → falls back to static map
  • OpenAI provider → uses static map (no probing)
  • watsonx provider → uses static map (no probing)
  • Unknown model → falls back to default (1536)

Deployment

  • No new dependencies (httpx already in project)
  • No configuration changes required
  • No database migrations needed
  • Safe for immediate deployment (graceful fallback on errors)
Original prompt

Summary
Replace the hardcoded Ollama embedding dimensions with a dynamic resolution that probes the running Ollama server to determine the actual vector dimension for the selected embedding model. Maintain static maps and default fallback for unknown models or failure cases. Ensure the OpenSearch index is created with the correct knn_vector.dimension based on the resolved size.

Rationale
Currently, OLLAMA_EMBEDDING_DIMENSIONS in src/config/settings.py contains a hardcoded subset of model→dimension mappings. This is brittle and quickly becomes stale. Ollama exposes an embeddings API that returns the embedding vector, allowing us to infer the true dimension at runtime. We should use that during index creation, while preserving fallbacks.

Scope

  • Implement a probe that calls the Ollama embeddings endpoint and infers vector length for the configured embedding model.
  • Update create_dynamic_index_body to resolve dimensions dynamically using the probe when provider == "ollama", falling back to static maps and then VECTOR_DIM when probing fails.
  • Update main.py to pass provider and endpoint into the dynamic index body creation and await the async function.
  • Do not change the legacy _ensure_opensearch_index path.

Technical Plan

  1. src/utils/embeddings.py
  • Add async helper _probe_ollama_embedding_dimension(endpoint, model_name) using httpx.AsyncClient that POSTs to {endpoint}/api/embeddings with a short string. Support both {model, input} and legacy {model, prompt} payloads. Infer dimension from len(embedding) or len(embeddings[0]).
  • Add async resolve_embedding_dimension(embedding_model, provider?, endpoint?) that:
    • If provider == "ollama" and endpoint provided → probe; if successful and > 0, return it
    • Else return the existing static-maps result via get_embedding_dimensions(); if unknown, that returns VECTOR_DIM
  • Make create_dynamic_index_body async, and compute dimensions via resolve_embedding_dimension(...). Keep the returned mapping identical except for using the resolved dimension.
  2. src/main.py
  • Update init_index() to await create_dynamic_index_body(embedding_model, provider=config.provider.model_provider, endpoint=config.provider.endpoint)
  • Ensure imports and async usage are correct; no changes to _ensure_opensearch_index or other flows.

Acceptance Criteria

  • When provider is ollama and a valid endpoint is configured, the backend issues a single POST to /api/embeddings for the configured embedding model during index creation, and sets knn_vector.dimension to the returned vector length.
  • If the probe fails or the server returns unexpected JSON, the system falls back to static maps and then VECTOR_DIM=1536.
  • Non-Ollama providers (OpenAI, watsonx) behave exactly as before.
  • init_index runs without raising, and the index mapping contains the resolved dimension value.

Notes

  • This executes only at index creation time (after onboarding), so no per-request overhead.
  • We can optionally add in-memory caching later if desired.
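If caching is added later, a small in-memory cache keyed by (endpoint, model) would be one option. This is purely a hypothetical sketch; the `DimensionCache` class is not part of this PR. It caches only successful probes, so a server that was down is probed again next time, and it holds a lock across the probe so concurrent callers do not issue duplicate requests.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# (endpoint, model_name) -> dimension, or 0 on failure
Probe = Callable[[str, str], Awaitable[int]]


class DimensionCache:
    """Hypothetical in-memory cache of probed embedding dimensions."""

    def __init__(self, probe: Probe) -> None:
        self._probe = probe
        self._values: Dict[Tuple[str, str], int] = {}
        self._lock = asyncio.Lock()

    async def get(self, endpoint: str, model: str) -> int:
        key = (endpoint, model)
        # Holding the lock during the probe serializes concurrent lookups,
        # so only the first caller for a key actually hits the server.
        async with self._lock:
            if key in self._values:
                return self._values[key]
            dimension = await self._probe(endpoint, model)
            # Cache only successful probes; failures are retried on the next call.
            if dimension > 0:
                self._values[key] = dimension
            return dimension
```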

Files to modify

  • src/utils/embeddings.py: implement probe + async create_dynamic_index_body and helpers.
  • src/main.py: pass provider/endpoint and await new async function.

No images are attached.

This pull request was created as a result of the prompt above from Copilot chat.


Co-authored-by: phact <1313220+phact@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update embedding dimensions to be dynamic from Ollama API" to "Implement dynamic Ollama embedding dimension resolution with server probing" on Oct 9, 2025
Copilot AI requested a review from phact on October 9, 2025 at 01:33
lucaseduoli marked this pull request as ready for review on October 9, 2025 at 17:28
lucaseduoli self-requested a review on October 9, 2025 at 18:41

lucaseduoli left a comment

LGTM

lucaseduoli merged commit 140d246 into main on Oct 9, 2025
1 check passed