
Conversation

@waleedlatif1
Collaborator

Summary

  • Add configurable concurrency to chunk processing; large documents are processed up to 22x faster by handling chunks concurrently instead of serially

Type of Change

  • Performance

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Jan 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project   Deployment   Review    Updated (UTC)
docs      Skipped      Skipped   Jan 5, 2026 8:24am

@waleedlatif1 waleedlatif1 changed the title improvement(kb): add configurable concurrency to chunks processing, sped up 22x for large docs improvement(kb): add configurable concurrency to chunks processing, speed up to 22x for large docs Jan 5, 2026
@waleedlatif1 waleedlatif1 changed the title improvement(kb): add configurable concurrency to chunks processing, speed up to 22x for large docs improvement(kb): add configurable concurrency to chunks processing, speedup up to 22x for large docs Jan 5, 2026
@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

Replaced serial batch processing with a concurrent worker-pool pattern in embedding generation, achieving up to a 22x speedup for large documents.

Key Changes:

  • Added processWithConcurrency helper function that implements a worker pool pattern to process embedding batches in parallel
  • Changed default KB_CONFIG_CONCURRENCY_LIMIT from 20→50 concurrent API calls
  • Changed default KB_CONFIG_BATCH_SIZE from 20→2000 chunks per batch
  • Removed KB_CONFIG_DELAY_BETWEEN_BATCHES delay (100ms→0ms) for maximum speed
  • Updated generateEmbeddings to use concurrent processing instead of sequential for loop
  • Made MAX_EMBEDDING_BATCH configurable via the KB_CONFIG_BATCH_SIZE environment variable (a configuration sketch follows this list)
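
As a rough sketch of how these environment-driven defaults can be read (the KB_CONFIG_* variable names and new defaults come from this PR; the envInt helper and the exported object are hypothetical, not the project's actual env module):

```typescript
// Illustrative only: read the KB_CONFIG_* variables named above with the new
// defaults, falling back when a variable is unset or not a valid integer.
// The real project reads these through apps/sim/lib/core/config/env.ts.
function envInt(name: string, fallback: number): number {
  const raw = process.env[name]
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10)
  return Number.isNaN(parsed) ? fallback : parsed
}

export const KB_CONFIG = {
  CONCURRENCY_LIMIT: envInt('KB_CONFIG_CONCURRENCY_LIMIT', 50), // previously 20
  BATCH_SIZE: envInt('KB_CONFIG_BATCH_SIZE', 2000), // previously 20
  DELAY_BETWEEN_BATCHES_MS: envInt('KB_CONFIG_DELAY_BETWEEN_BATCHES', 0), // previously 100
}
```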

How It Works:
The worker pool spawns up to 50 concurrent workers that pull batches from a shared queue and process them in parallel. Each worker processes embedding API calls with retry logic for rate limiting (429) and server errors (5xx). Results are collected in order and flattened before returning.
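
The sketch below shows one way such a worker pool can be written in TypeScript. The processWithConcurrency name and the order-preserving behavior come from this summary; the exact signature in apps/sim/lib/knowledge/embeddings.ts may differ, and callEmbeddingAPI in the usage comment is only referenced, not reproduced here.

```typescript
// Minimal worker-pool sketch: up to `concurrency` workers pull items from a
// shared index and write results back by position, preserving input order.
async function processWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  processItem: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let nextIndex = 0

  async function worker(): Promise<void> {
    while (true) {
      const index = nextIndex++ // claim the next unprocessed item
      if (index >= items.length) return
      results[index] = await processItem(items[index], index)
    }
  }

  const workerCount = Math.min(concurrency, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// Hypothetical usage mirroring the flow described above:
// const batchResults = await processWithConcurrency(batches, 50, (batch) => callEmbeddingAPI(batch))
// const embeddings = batchResults.flat()
```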

Performance Impact:
Processing moves from O(n) serial time to O(n/concurrency) parallel time, with actual speedup depending on network latency and API response times. The 22x speedup indicates the bottleneck was primarily I/O wait time rather than computation.
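
As a rough, hypothetical illustration: if a large document yields 100 batches and each embeddings call spends about one second waiting on the network, serial processing needs roughly 100 seconds, while 50 workers need about ceil(100 / 50) × 1 s ≈ 2 s of wall time. The ideal speedup is bounded by the concurrency limit; the observed 22x sits below that bound, consistent with rate limiting, retries, and uneven batch latencies.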

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is clean and well-structured. The worker pool pattern is correctly implemented with proper index tracking to maintain result order. Existing retry logic with exponential backoff handles rate limiting gracefully. Configuration changes are reasonable defaults that can be overridden via environment variables. No breaking changes to API or behavior.
  • No files require special attention

Important Files Changed

Filename                                      Overview
apps/sim/lib/knowledge/embeddings.ts          Added concurrent batch processing with a worker-pool pattern; processes embedding batches in parallel instead of serially
apps/sim/lib/core/config/env.ts               Updated configuration defaults: concurrency from 20→50, batch size from 20→2000, delay from 100ms→0ms for max speed
apps/sim/lib/knowledge/documents/service.ts   Updated MAX_EMBEDDING_BATCH to use the configurable KB_CONFIG_BATCH_SIZE environment variable

Sequence Diagram

sequenceDiagram
    participant Client
    participant generateEmbeddings
    participant processWithConcurrency
    participant Worker1
    participant Worker2
    participant WorkerN
    participant callEmbeddingAPI
    participant OpenAI/Azure

    Client->>generateEmbeddings: texts[], model, workspaceId
    generateEmbeddings->>generateEmbeddings: batchByTokenLimit(texts, 8000 tokens)
    Note over generateEmbeddings: Split into batches by token limit
    
    generateEmbeddings->>processWithConcurrency: batches[], concurrency=50
    Note over processWithConcurrency: Create worker pool (min of 50 or batch count)
    
    par Concurrent Workers
        processWithConcurrency->>Worker1: Process batch[0]
        processWithConcurrency->>Worker2: Process batch[1]
        processWithConcurrency->>WorkerN: Process batch[n]
    end
    
    par Parallel API Calls
        Worker1->>callEmbeddingAPI: batch texts
        Worker2->>callEmbeddingAPI: batch texts
        WorkerN->>callEmbeddingAPI: batch texts
    end
    
    par API Requests with Retry
        callEmbeddingAPI->>OpenAI/Azure: POST /embeddings
        OpenAI/Azure-->>callEmbeddingAPI: embeddings data
        Note over callEmbeddingAPI: Retry on 429/5xx with exponential backoff
    end
    
    Worker1-->>processWithConcurrency: batch embeddings
    Worker2-->>processWithConcurrency: batch embeddings
    WorkerN-->>processWithConcurrency: batch embeddings
    
    processWithConcurrency->>processWithConcurrency: Collect all results
    processWithConcurrency-->>generateEmbeddings: all batch results
    
    generateEmbeddings->>generateEmbeddings: batchResults.flat()
    generateEmbeddings-->>Client: embeddings[][]
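
The "Retry on 429/5xx with exponential backoff" note in the diagram refers to existing behavior in callEmbeddingAPI; the sketch below shows the general shape of such a retry loop, with illustrative parameter names and delays rather than the project's actual values.

```typescript
// Generic retry-with-exponential-backoff sketch for HTTP calls that may hit
// rate limits (429) or transient server errors (5xx). Values are illustrative.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, init)
    const retryable = response.status === 429 || response.status >= 500
    if (!retryable || attempt >= maxRetries) return response
    // Back off: 500ms, 1s, 2s, 4s, ... before retrying.
    const delay = baseDelayMs * 2 ** attempt
    await new Promise((resolve) => setTimeout(resolve, delay))
  }
}
```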

@waleedlatif1 waleedlatif1 merged commit 0977ed2 into staging Jan 5, 2026
11 checks passed
@waleedlatif1 waleedlatif1 deleted the improvement/kb branch January 5, 2026 08:29
waleedlatif1 added a commit that referenced this pull request Jan 8, 2026