
Conversation

@waleedlatif1
Collaborator

Summary

  • Add configurable concurrency to chunk processing; large documents are processed up to 22x faster by handling chunks concurrently instead of serially

Type of Change

  • Performance

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Jan 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project   Deployment   Review    Updated (UTC)
docs      Skipped      Skipped   Jan 5, 2026 8:24am

@waleedlatif1 waleedlatif1 changed the title improvement(kb): add configurable concurrency to chunks processing, sped up 22x for large docs improvement(kb): add configurable concurrency to chunks processing, speed up to 22x for large docs Jan 5, 2026
@waleedlatif1 waleedlatif1 changed the title improvement(kb): add configurable concurrency to chunks processing, speed up to 22x for large docs improvement(kb): add configurable concurrency to chunks processing, speedup up to 22x for large docs Jan 5, 2026
@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

Replaced serial batch processing with a concurrent worker-pool pattern in embedding generation, achieving up to a 22x speedup for large documents.

Key Changes:

  • Added processWithConcurrency helper function that implements a worker pool pattern to process embedding batches in parallel
  • Changed default KB_CONFIG_CONCURRENCY_LIMIT from 20→50 concurrent API calls
  • Changed default KB_CONFIG_BATCH_SIZE from 20→2000 chunks per batch
  • Removed KB_CONFIG_DELAY_BETWEEN_BATCHES delay (100ms→0ms) for maximum speed
  • Updated generateEmbeddings to use concurrent processing instead of sequential for loop
  • Made MAX_EMBEDDING_BATCH configurable via the KB_CONFIG_BATCH_SIZE environment variable (a configuration sketch follows this list)
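
As a rough sketch of how these environment-driven defaults can be read (the KB_CONFIG_* variable names and new defaults come from this PR; the envInt helper and the exported object are hypothetical, not the project's actual env module):

```typescript
// Illustrative only: read the KB_CONFIG_* variables named above with the new
// defaults, falling back when a variable is unset or not a valid integer.
// The real project reads these through apps/sim/lib/core/config/env.ts.
function envInt(name: string, fallback: number): number {
  const raw = process.env[name]
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10)
  return Number.isNaN(parsed) ? fallback : parsed
}

export const KB_CONFIG = {
  CONCURRENCY_LIMIT: envInt('KB_CONFIG_CONCURRENCY_LIMIT', 50), // previously 20
  BATCH_SIZE: envInt('KB_CONFIG_BATCH_SIZE', 2000), // previously 20
  DELAY_BETWEEN_BATCHES_MS: envInt('KB_CONFIG_DELAY_BETWEEN_BATCHES', 0), // previously 100
}
```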

How It Works:
The worker pool spawns up to 50 concurrent workers that pull batches from a shared queue and process them in parallel. Each worker processes embedding API calls with retry logic for rate limiting (429) and server errors (5xx). Results are collected in order and flattened before returning.
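
The sketch below shows one way such a worker pool can be written in TypeScript. The processWithConcurrency name and the order-preserving behavior come from this summary; the exact signature in apps/sim/lib/knowledge/embeddings.ts may differ, and callEmbeddingAPI in the usage comment is only referenced, not reproduced here.

```typescript
// Minimal worker-pool sketch: up to `concurrency` workers pull items from a
// shared index and write results back by position, preserving input order.
async function processWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  processItem: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let nextIndex = 0

  async function worker(): Promise<void> {
    while (true) {
      const index = nextIndex++ // claim the next unprocessed item
      if (index >= items.length) return
      results[index] = await processItem(items[index], index)
    }
  }

  const workerCount = Math.min(concurrency, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// Hypothetical usage mirroring the flow described above:
// const batchResults = await processWithConcurrency(batches, 50, (batch) => callEmbeddingAPI(batch))
// const embeddings = batchResults.flat()
```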

Performance Impact:
Processing moves from O(n) serial time to O(n/concurrency) parallel time, with actual speedup depending on network latency and API response times. The 22x speedup indicates the bottleneck was primarily I/O wait time rather than computation.
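
As a rough, hypothetical illustration: if a large document yields 100 batches and each embeddings call spends about one second waiting on the network, serial processing needs roughly 100 seconds, while 50 workers need about ceil(100 / 50) × 1 s ≈ 2 s of wall time. The ideal speedup is bounded by the concurrency limit; the observed 22x sits below that bound, consistent with rate limiting, retries, and uneven batch latencies.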

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is clean and well-structured. The worker pool pattern is correctly implemented with proper index tracking to maintain result order. Existing retry logic with exponential backoff handles rate limiting gracefully. Configuration changes are reasonable defaults that can be overridden via environment variables. No breaking changes to API or behavior.
  • No files require special attention

Important Files Changed

Filename                                      Overview
apps/sim/lib/knowledge/embeddings.ts          Added concurrent batch processing with a worker-pool pattern; processes embedding batches in parallel instead of serially
apps/sim/lib/core/config/env.ts               Updated configuration defaults: concurrency from 20→50, batch size from 20→2000, delay from 100ms→0ms for max speed
apps/sim/lib/knowledge/documents/service.ts   Updated MAX_EMBEDDING_BATCH to use the configurable KB_CONFIG_BATCH_SIZE environment variable

Sequence Diagram

sequenceDiagram
    participant Client
    participant generateEmbeddings
    participant processWithConcurrency
    participant Worker1
    participant Worker2
    participant WorkerN
    participant callEmbeddingAPI
    participant OpenAI/Azure

    Client->>generateEmbeddings: texts[], model, workspaceId
    generateEmbeddings->>generateEmbeddings: batchByTokenLimit(texts, 8000 tokens)
    Note over generateEmbeddings: Split into batches by token limit
    
    generateEmbeddings->>processWithConcurrency: batches[], concurrency=50
    Note over processWithConcurrency: Create worker pool (min of 50 or batch count)
    
    par Concurrent Workers
        processWithConcurrency->>Worker1: Process batch[0]
        processWithConcurrency->>Worker2: Process batch[1]
        processWithConcurrency->>WorkerN: Process batch[n]
    end
    
    par Parallel API Calls
        Worker1->>callEmbeddingAPI: batch texts
        Worker2->>callEmbeddingAPI: batch texts
        WorkerN->>callEmbeddingAPI: batch texts
    end
    
    par API Requests with Retry
        callEmbeddingAPI->>OpenAI/Azure: POST /embeddings
        OpenAI/Azure-->>callEmbeddingAPI: embeddings data
        Note over callEmbeddingAPI: Retry on 429/5xx with exponential backoff
    end
    
    Worker1-->>processWithConcurrency: batch embeddings
    Worker2-->>processWithConcurrency: batch embeddings
    WorkerN-->>processWithConcurrency: batch embeddings
    
    processWithConcurrency->>processWithConcurrency: Collect all results
    processWithConcurrency-->>generateEmbeddings: all batch results
    
    generateEmbeddings->>generateEmbeddings: batchResults.flat()
    generateEmbeddings-->>Client: embeddings[][]
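
The "Retry on 429/5xx with exponential backoff" note in the diagram refers to existing behavior in callEmbeddingAPI; the sketch below shows the general shape of such a retry loop, with illustrative parameter names and delays rather than the project's actual values.

```typescript
// Generic retry-with-exponential-backoff sketch for HTTP calls that may hit
// rate limits (429) or transient server errors (5xx). Values are illustrative.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, init)
    const retryable = response.status === 429 || response.status >= 500
    if (!retryable || attempt >= maxRetries) return response
    // Back off: 500ms, 1s, 2s, 4s, ... before retrying.
    const delay = baseDelayMs * 2 ** attempt
    await new Promise((resolve) => setTimeout(resolve, delay))
  }
}
```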

@waleedlatif1 waleedlatif1 merged commit 0977ed2 into staging Jan 5, 2026
11 checks passed
@waleedlatif1 waleedlatif1 deleted the improvement/kb branch January 5, 2026 08:29
waleedlatif1 added a commit that referenced this pull request Jan 8, 2026