Implement dynamic Ollama embedding dimension resolution with server probing by Copilot · Pull Request #237 · langflow-ai/openrag

Copilot · 2025-10-09T01:18:56Z

Summary

Replaces hardcoded Ollama embedding dimensions with dynamic resolution that probes the running Ollama server to determine the actual vector dimension for the selected embedding model. Keeps the static maps and the default dimension as fallbacks for unknown models and probe failures.

Motivation

Currently, OLLAMA_EMBEDDING_DIMENSIONS in src/config/settings.py contains a hardcoded subset of model→dimension mappings. This approach is brittle and quickly becomes stale as new models are released or custom models are used. Ollama exposes an embeddings API that returns the embedding vector, allowing us to infer the true dimension at runtime.

Changes

src/utils/embeddings.py (+120 lines)

New async functions:

_probe_ollama_embedding_dimension(endpoint, model_name) - Probes the Ollama server's /api/embeddings endpoint with a test string. Tries modern API format ({model, input}) first, then falls back to legacy format ({model, prompt}). Returns the embedding dimension from the response or 0 on failure.
resolve_embedding_dimension(embedding_model, provider, endpoint) - Main resolution function that conditionally probes Ollama when provider and endpoint are provided. Falls back to static maps via get_embedding_dimensions(), then to VECTOR_DIM (1536) for unknown models.
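The dimension inference described above can be sketched as a pure function. This is only an illustration, not the code from this PR: the helper name `infer_dimension` is hypothetical, and it assumes the response JSON carries either an `embedding` list or an `embeddings` list of lists, as the original prompt describes.

```python
from typing import Any


def infer_dimension(payload: Any) -> int:
    """Return the vector length found in an embeddings response, or 0."""
    if not isinstance(payload, dict):
        return 0

    # Single-vector shape: {"embedding": [0.1, 0.2, ...]}
    single = payload.get("embedding")
    if isinstance(single, list) and single:
        return len(single)

    # Batch shape: {"embeddings": [[0.1, 0.2, ...], ...]}
    batch = payload.get("embeddings")
    if isinstance(batch, list) and batch and isinstance(batch[0], list):
        return len(batch[0])

    # Missing fields or unexpected shapes count as a failed probe.
    return 0
```

Returning 0 for every unexpected shape keeps the caller's check simple: any value that is not greater than 0 means "use the fallback".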

Modified function:

create_dynamic_index_body(embedding_model, provider, endpoint) - Changed from sync to async and added optional provider and endpoint parameters. Now calls await resolve_embedding_dimension() instead of direct static lookup.
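For context, the resolved dimension ends up in the `knn_vector` field of the OpenSearch mapping. The sketch below shows only that part. The helper name `build_index_body` and the field name `chunk_embedding` are assumptions for illustration; the real body returned by create_dynamic_index_body contains other fields that this description does not list.

```python
def build_index_body(dimension: int, field: str = "chunk_embedding") -> dict:
    """Hypothetical helper: a minimal OpenSearch k-NN index body."""
    return {
        # k-NN search must be enabled on the index for knn_vector fields.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # The dimension must match the length of the vectors
                # produced by the embedding model.
                field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }
```

A mismatch between this value and the model's real output length is what the probe is meant to prevent.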

src/main.py (+7 lines)

Modified init_index():

# Before:
dynamic_index_body = create_dynamic_index_body(embedding_model)

# After:
dynamic_index_body = await create_dynamic_index_body(
    embedding_model,
    provider=config.provider.model_provider,
    endpoint=config.provider.endpoint
)

Behavior

Ollama with endpoint configured

Attempts to probe server at {endpoint}/api/embeddings during index creation
If successful and dimension > 0: uses probed dimension
If probe fails: falls back to static map
If the model is also absent from the static map: falls back to VECTOR_DIM (1536)

Non-Ollama providers (OpenAI, watsonx)

No probing attempted
Uses static dimension maps as before
Falls back to VECTOR_DIM for unknown models
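The resolution order for both cases can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `resolve_dimension`, `STATIC_DIMENSIONS`, and `fake_probe` are hypothetical names, the probe is injected so the example runs without a server, and the map holds only two example entries.

```python
import asyncio
from typing import Awaitable, Callable, Optional

VECTOR_DIM = 1536  # default named in this PR description

# Hypothetical stand-in for the project's static maps.
STATIC_DIMENSIONS = {"text-embedding-3-small": 1536, "nomic-embed-text": 768}

Probe = Callable[[str, str], Awaitable[int]]


async def resolve_dimension(
    model: str,
    provider: Optional[str] = None,
    endpoint: Optional[str] = None,
    probe: Optional[Probe] = None,
) -> int:
    # Step 1: probe only for Ollama with an endpoint configured.
    if provider == "ollama" and endpoint and probe is not None:
        probed = await probe(endpoint, model)
        if probed > 0:
            return probed
    # Steps 2 and 3: static map first, then the default.
    return STATIC_DIMENSIONS.get(model, VECTOR_DIM)


async def fake_probe(endpoint: str, model: str) -> int:
    # Pretend the server knows one custom model and fails for everything else.
    return 1024 if model == "my-custom-model" else 0
```

With these stand-ins, `asyncio.run(resolve_dimension("my-custom-model", "ollama", "http://localhost:11434", fake_probe))` returns the probed value 1024, while the same call with provider `"openai"` never invokes the probe and returns the default 1536.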

Error handling

All errors (network timeout, connection refused, invalid JSON, missing fields) are logged and result in a graceful fallback. No exception propagates, so a failed probe cannot cause index creation to fail.
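One way to get this behavior is to wrap the probe so that any failure is logged and reported as 0. The sketch below is an assumption about the shape of such a wrapper, not the code in this PR; `probe_or_zero` and the three sample probes are hypothetical names.

```python
import asyncio
import json
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def probe_or_zero(probe: Callable[[], Awaitable[int]]) -> int:
    """Run a probe coroutine function; log and return 0 on any failure."""
    try:
        dimension = await probe()
    except Exception as exc:  # broad on purpose: any failure means "fall back"
        logger.warning("Embedding probe failed, using fallback: %s", exc)
        return 0
    # Treat non-positive or non-integer results as a failed probe too.
    return dimension if isinstance(dimension, int) and dimension > 0 else 0


async def refused() -> int:
    # Simulates a server that is not reachable.
    raise ConnectionRefusedError("server not reachable")


async def bad_json() -> int:
    # Simulates a response body that is not valid JSON.
    return len(json.loads("not json")["embedding"])


async def healthy() -> int:
    # Simulates a successful probe.
    return 768
```

Catching `Exception` rather than `BaseException` leaves task cancellation untouched, so shutdown still works while the probe is in flight.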

Benefits

Supports custom models: Automatically detects dimensions for any model available on the Ollama server
Future-proof: No need to update static maps when new models are released
No per-request overhead: Probing happens only once, during index creation
Backwards compatible: Optional parameters, existing behavior preserved for non-Ollama providers
Graceful degradation: Falls back to static maps and defaults on any error

Testing

Validated behavior across 7 scenarios:

Ollama with known model (server available) → uses probed dimension
Ollama with custom model (server available) → uses probed dimension
Ollama with server unavailable → falls back to static map
Ollama without endpoint → falls back to static map
OpenAI provider → uses static map (no probing)
watsonx provider → uses static map (no probing)
Unknown model → falls back to default (1536)

Deployment

No new dependencies (httpx already in project)
No configuration changes required
No database migrations needed
Safe for immediate deployment (graceful fallback on errors)

Original prompt

Summary
Replace the hardcoded Ollama embedding dimensions with a dynamic resolution that probes the running Ollama server to determine the actual vector dimension for the selected embedding model. Maintain static maps and default fallback for unknown models or failure cases. Ensure the OpenSearch index is created with the correct knn_vector.dimension based on the resolved size.

Rationale
Currently, OLLAMA_EMBEDDING_DIMENSIONS in src/config/settings.py contains a hardcoded subset of model→dimension mappings. This is brittle and quickly becomes stale. Ollama exposes an embeddings API that returns the embedding vector, allowing us to infer the true dimension at runtime. We should use that during index creation, while preserving fallbacks.

Scope

Implement a probe that calls the Ollama embeddings endpoint and infers vector length for the configured embedding model.

Update create_dynamic_index_body to resolve dimensions dynamically using the probe when provider == "ollama", falling back to static maps and then VECTOR_DIM when probing fails.

Update main.py to pass provider and endpoint into the dynamic index body creation and await the async function.

Do not change the legacy _ensure_opensearch_index path.

Technical Plan

src/utils/embeddings.py

Add async helper _probe_ollama_embedding_dimension(endpoint, model_name) using httpx.AsyncClient that POSTs to {endpoint}/api/embeddings with a short string. Support both {model, input} and legacy {model, prompt} payloads. Infer dimension from len(embedding) or len(embeddings[0]).

Add async resolve_embedding_dimension(embedding_model, provider?, endpoint?) that:

If provider == "ollama" and endpoint provided → probe; if successful and > 0, return it

Else return the existing static-maps result via get_embedding_dimensions(); if unknown, that returns VECTOR_DIM

Make create_dynamic_index_body async, and compute dimensions via resolve_embedding_dimension(...). Keep the returned mapping identical except for using the resolved dimension.

src/main.py

Update init_index() to await create_dynamic_index_body(embedding_model, provider=config.provider.model_provider, endpoint=config.provider.endpoint)

Ensure imports and async usage are correct; no changes to _ensure_opensearch_index or other flows.

Acceptance Criteria

When provider is ollama and a valid endpoint is configured, the backend issues a single POST to /api/embeddings for the configured embedding model during index creation, and sets knn_vector.dimension to the returned vector length.

If the probe fails or the server returns unexpected JSON, the system falls back to static maps and then VECTOR_DIM=1536.

Non-Ollama providers (OpenAI, watsonx) behave exactly as before.

init_index runs without raising, and the index mapping contains the resolved dimension value.

Notes

This executes only at index creation time (after onboarding), so no per-request overhead.

We can optionally add in-memory caching later if desired.

Files to modify

src/utils/embeddings.py: implement probe + async create_dynamic_index_body and helpers.

src/main.py: pass provider/endpoint and await new async function.

No images are attached.

*This pull request was created as a result of the prompt above from Copilot chat.*

Co-authored-by: phact <1313220+phact@users.noreply.github.com>

lucaseduoli

LGTM

Initial plan

59738f9

Copilot AI assigned Copilot and phact Oct 9, 2025

Copilot started work on behalf of phact October 9, 2025 01:19

phact mentioned this pull request Oct 9, 2025

Dynamically get embedding sizes for ollama models #238

Closed

Implement dynamic Ollama embedding dimension resolution with probing

c05061c

Co-authored-by: phact <1313220+phact@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Update embedding dimensions to be dynamic from Ollama API" to "Implement dynamic Ollama embedding dimension resolution with server probing" Oct 9, 2025

Copilot AI requested a review from phact October 9, 2025 01:33

Copilot finished work on behalf of phact October 9, 2025 01:33

lucaseduoli added 2 commits October 9, 2025 14:02

Merge remote-tracking branch 'origin/main' into copilot/dynamic-embed…

389ded6

…ding-dimension-resolution

Fix Ollama probing

3f94dfc

lucaseduoli marked this pull request as ready for review October 9, 2025 17:28

phact and others added 3 commits October 9, 2025 13:43

raise instead of dims 0

1327a56

Show better error

72f2fd3

Run embedding probe before saving settings so that user can update

bc3752a

lucaseduoli self-requested a review October 9, 2025 18:41

lucaseduoli approved these changes Oct 9, 2025

phact approved these changes Oct 9, 2025

lucaseduoli merged commit 140d246 into main Oct 9, 2025
1 check passed

lucaseduoli merged 7 commits into main from copilot/dynamic-embedding-dimension-resolution
