
Add non-batch memory embedding concurrency control#57733

Closed
itilys wants to merge 1 commit into openclaw:main from itilys:feature/memorysearch-ollama-nonbatch-concurrency

Conversation


@itilys itilys commented Mar 30, 2026

Summary

This PR adds agents.defaults.memorySearch.remote.nonBatchConcurrency to control indexing concurrency when memory embeddings run through the non-batch path.

This is especially useful for slower or local providers such as Ollama, where aggressive parallel indexing can cause timeout-heavy reindex failures on small machines like Raspberry Pi or low-resource VPS hosts.

What changed

  • added memorySearch.remote.nonBatchConcurrency config support
  • propagated the setting through resolved memory config
  • used it in non-batch indexing concurrency selection
  • added a conservative Ollama fallback that uses concurrency 1
  • documented the new config key
  • added config merge coverage in tests
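Based on the description above, the concurrency-selection logic can be sketched as follows. This is a minimal illustration of the resolution order (explicit config, then the Ollama fallback, then the global default); the function signature and type names are assumptions, not the project's actual code.

```typescript
// Sketch of the resolution order; names besides nonBatchConcurrency and
// EMBEDDING_INDEX_CONCURRENCY are illustrative.
const EMBEDDING_INDEX_CONCURRENCY = 4; // existing global default

interface ResolvedRemote {
  nonBatchConcurrency?: number;
}

function getIndexConcurrency(
  remote: ResolvedRemote | undefined,
  providerId: string | undefined,
): number {
  const configured = remote?.nonBatchConcurrency;
  // Explicit config wins: clamp to a finite integer >= 1.
  if (typeof configured === "number" && Number.isFinite(configured)) {
    return Math.max(1, Math.floor(configured));
  }
  // Conservative fallback for Ollama, which often runs on constrained hardware.
  if (providerId === "ollama") {
    return 1;
  }
  return EMBEDDING_INDEX_CONCURRENCY;
}
```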

Why

Batch controls currently help only providers that expose batch embedding APIs.

Providers that fall back to non-batch indexing, especially Ollama, can still hit timeout failures because indexing concurrency remains too high for constrained hardware.

This change makes non-batch indexing tunable and improves local-first reliability.
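For intuition, the non-batch path effectively runs embedding tasks through a bounded worker pool, so the cap directly bounds how many requests hit the provider at once. A minimal illustrative sketch of such a pool (not the project's implementation):

```typescript
// Illustrative only: a minimal concurrency-limited mapper. A cap of 1
// serializes all requests; the default of 4 runs four at a time.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index, then await the task
      results[i] = await fn(items[i]);
    }
  }
  const workers = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workers }, worker));
  return results;
}
```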

Real-world motivation

In a Raspberry Pi-class environment running local Ollama:

  • memory reindex repeatedly timed out
  • reducing chunk size helped but was not sufficient
  • forcing effective non-batch concurrency to 1 made indexing complete successfully
  • semantic search worked correctly after reindex

Example config

{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "ollama",
        "remote": {
          "baseUrl": "http://127.0.0.1:11434",
          "apiKey": "ollama",
          "nonBatchConcurrency": 1
        }
      }
    }
  }
}

@openclaw-barnacle openclaw-barnacle Bot added extensions: memory-core Extension: memory-core agents Agent runtime and tooling size: XS labels Mar 30, 2026

greptile-apps Bot commented Mar 30, 2026

Greptile Summary

This PR adds a new agents.defaults.memorySearch.remote.nonBatchConcurrency configuration key to control the degree of parallelism used during non-batch embedding indexing. The change is targeted at local providers like Ollama that can be overwhelmed by the default concurrency of 4 on resource-constrained hardware (Raspberry Pi, low-memory VPS, etc.).

What changed:

  • nonBatchConcurrency?: number added to MemorySearchConfig (config type), ResolvedMemorySearchConfig (resolved type), and the mergeConfig function, following the exact same override-then-default pattern used for other remote fields.
  • getIndexConcurrency() in MemoryManagerEmbeddingOps now reads the resolved setting, clamps it to an integer ≥ 1, and falls back to a hardcoded 1 for the ollama provider before falling through to EMBEDDING_INDEX_CONCURRENCY (4).
  • All schema housekeeping files (labels, help text, generated schema, quality-test target list) are updated consistently.
  • A new test confirms that nonBatchConcurrency propagates correctly through the default→agent merge.

Notable behavior change: The Ollama fallback (return 1 when provider.id === "ollama" and no explicit config is present) silently reduces indexing concurrency from 4 to 1 for all Ollama users, not just those on constrained hardware. This is deliberately conservative and is overridable via the new key, but high-resource Ollama setups will see a 4× throughput reduction without any warning or log message.

Confidence Score: 5/5

Safe to merge — no logic errors; the only concern is a silent performance regression for Ollama users on capable machines, which is a deliberate conservative tradeoff and is fully overridable.

All findings are P2. The implementation is correct, well validated (finite-number check + floor + max(1, …)), consistently propagated through the type and config system, and covered by a new test. The Ollama hardcoded default is a deliberate design choice documented in the PR description; the concern about users on powerful machines is a quality suggestion, not a bug.

extensions/memory-core/src/memory/manager-embedding-ops.ts — review the Ollama fallback and consider adding a diagnostic log line so users on high-resource machines can discover the new knob.

Important Files Changed

extensions/memory-core/src/memory/manager-embedding-ops.ts: getIndexConcurrency updated to respect the nonBatchConcurrency config and add an Ollama-specific conservative default of 1; the logic is correct, but the Ollama fallback silently reduces concurrency for all Ollama users.
src/agents/memory-search.ts: nonBatchConcurrency added to the ResolvedMemorySearchConfig type and correctly propagated in mergeConfig using the same override-then-default ?? pattern as other remote fields.
src/config/types.tools.ts: nonBatchConcurrency?: number added to the remote block of MemorySearchConfig with a clear JSDoc comment.
src/agents/memory-search.test.ts: New test verifies that nonBatchConcurrency from defaults survives an agent-level override of other remote fields; the expectMergedRemoteConfig helper is extended cleanly with an optional extras parameter.
src/config/schema.base.generated.ts: New schema entry for nonBatchConcurrency added with label, help text, and performance tag, consistent with adjacent entries.
src/config/schema.help.ts: Help text for the new key added in alphabetical position between headers and batch.enabled.
src/config/schema.labels.ts: Label for the new key added consistently with surrounding entries.
src/config/schema.help.quality.test.ts: New key added to the TARGET_KEYS coverage list so it is covered by the schema help-quality tests.

Comment on lines +550 to +552

if (this.provider?.id === "ollama") {
  return 1;
}

P2 Ollama default is a silent performance regression for high-resource setups

The hardcoded return 1 for Ollama changes the previous default from EMBEDDING_INDEX_CONCURRENCY (4) to 1 for all Ollama users, not only those on constrained hardware. Anyone running Ollama on a powerful machine (a local workstation with plenty of resources, for example) will see indexing throughput drop to 25% of its previous level with no warning.

Since nonBatchConcurrency is already checked first, the conservative default is effectively the right call for fragile setups and is overridable — but it would be worth a log-level note (debug or info) when this path is taken, so users can discover the setting if they notice slow indexing. Alternatively, a comment in the code documenting the tradeoff would help future maintainers understand why only Ollama gets this treatment.
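A concise version of the suggested diagnostic might look like the sketch below. The logger shape and the function boundary are assumptions for illustration; in the actual code this would live inside getIndexConcurrency's Ollama branch.

```typescript
// Sketch of the suggested fix: log once when the conservative Ollama
// default is applied, so users can discover the override knob.
// The Logger interface is an assumption, not the project's logging API.
interface Logger {
  debug(msg: string): void;
}

function resolveOllamaFallback(
  providerId: string | undefined,
  log: Logger,
): number | undefined {
  if (providerId === "ollama") {
    log.debug(
      "memory-index: defaulting non-batch embedding concurrency to 1 for Ollama; " +
        "set memorySearch.remote.nonBatchConcurrency to raise it.",
    );
    return 1;
  }
  return undefined; // caller falls through to the global default
}
```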



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82b23db749


Comment thread: src/config/types.tools.ts
apiKey?: SecretInput;
headers?: Record<string, string>;
/** Max concurrent non-batch embedding tasks during indexing. Useful for slower local providers such as Ollama. */
nonBatchConcurrency?: number;

P1 Add nonBatchConcurrency to runtime config schema

This commit introduces memorySearch.remote.nonBatchConcurrency in the config types and merge logic, but MemorySearchSchema.remote in src/config/zod-schema.agent-runtime.ts is still strict and does not include the field, so OpenClawSchema.safeParse will reject configs that set the new option as an unknown key. This makes the documented feature unusable and can fail config validation at startup for users who follow the new setting. The runtime schema needs a matching nonBatchConcurrency number field (with the intended constraints) for both defaults and agent overrides.


@steipete
Contributor

Thanks @itilys. I carried this forward on main in 802f13a.

What landed:

  • memorySearch.remote.nonBatchConcurrency for inline/non-batch embedding indexing
  • separate batch vs non-batch concurrency handling
  • conservative Ollama default of 1 concurrent non-batch embedding request
  • config schema/help/docs updates, including Ollama memory-search examples
  • regression coverage for config merge and concurrency resolution

Verification on the landed commit path:

  • pnpm test src/agents/memory-search.test.ts extensions/memory-core/src/memory/manager-embedding-timeout.test.ts
  • live Ollama lane with OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_OLLAMA=1 ... pnpm test:live -- extensions/ollama/ollama.live.test.ts
  • pnpm check:changed

Closing this PR as superseded-by-main while preserving your credit in the changelog. Thanks again.

@steipete steipete closed this Apr 27, 2026
