I'm looking to scale ingestion to millions of messages and I'm hitting a throughput bottleneck in `extract_knowledge_from_text_batch`.
The call site in `semrefindex.py` passes `len(text_batch)` as concurrency, so within a batch it does parallelize. But batches themselves are sequential in `_add_llm_knowledge_incremental`:
```python
for text_location_batch in batches:
    await semrefindex.add_batch_to_semantic_ref_index_from_list(...)
```
With `batch_size=50`, that's 50 concurrent extraction calls, then a wait for all of them to finish before the next 50 start. A single 15-chunk message (88K chars) took 317 seconds and produced 793 semrefs; quality is great, throughput is rough.
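To make the batch-level bottleneck concrete, here's a rough sketch of one option: overlap batches behind a bounded semaphore instead of fully awaiting each one. `run_batches_overlapped`, `process_batch`, and `max_inflight` are placeholder names, not anything that exists today, and this assumes concurrent calls into `add_batch_to_semantic_ref_index_from_list` are actually safe (semref ordering and index writes would need checking).

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")

# Sketch only: overlap batches behind a bounded semaphore instead of
# awaiting each batch of 50 before the next starts. process_batch stands in
# for the existing add_batch_to_semantic_ref_index_from_list call; whether
# concurrent index writes are safe would need to be verified first.
async def run_batches_overlapped(
    batches: Iterable[T],
    process_batch: Callable[[T], Awaitable[None]],
    max_inflight: int = 4,
) -> None:
    sem = asyncio.Semaphore(max_inflight)

    async def run_one(batch: T) -> None:
        async with sem:
            await process_batch(batch)

    await asyncio.gather(*(run_one(b) for b in batches))
```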
Some initial ideas, but wanted to get the team's input on the right approach:
- Exposing extraction concurrency as a configurable setting rather than hardcoding it to `len(text_batch)` (the `max_inflight` knob in the sketch above is the same idea)
- Pipelining the embedding step with extraction; right now embedding finishes completely before extraction starts (see the first sketch after this list)
- Honoring `max_chars_per_chunk` on `KnowledgeExtractor`: it's defined, with the TODO on line 27 of `convknowledge.py`, but isn't read anywhere yet. Without it, large messages hit the embedding model's 8K token limit, so I had to add chunking on the ingestion side (see the second sketch after this list)
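For the pipelining idea, a minimal sketch of the shape I'm picturing. `embed_batch` and `extract_batch` are hypothetical stand-ins for the real embedding and extraction steps; the bounded queue is what lets extraction of batch N overlap embedding of batch N+1.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Sketch of the pipelining idea; embed_batch and extract_batch are
# hypothetical stand-ins for the real embedding and knowledge-extraction
# steps. A bounded queue lets extraction of batch N run while batch N+1
# is still being embedded, instead of finishing all embedding first.
async def pipeline(
    batches: list[Any],
    embed_batch: Callable[[Any], Awaitable[None]],
    extract_batch: Callable[[Any], Awaitable[None]],
) -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)

    async def producer() -> None:
        for batch in batches:
            await embed_batch(batch)
            await queue.put(batch)
        await queue.put(None)  # sentinel: embedding is done

    async def consumer() -> None:
        while (batch := await queue.get()) is not None:
            await extract_batch(batch)

    await asyncio.gather(producer(), consumer())
```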
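And for the chunking point, a minimal sketch of the kind of ingestion-side split I mean; the function name and default are placeholders, and it splits on raw character count, so a token-aware or paragraph-aware split against the actual 8K token cap would be better if this logic moves into `KnowledgeExtractor`.

```python
# Illustrative sketch only, not the actual ingestion-side code. If
# KnowledgeExtractor read max_chars_per_chunk itself, splitting could move
# into convknowledge.py and ideally use token or paragraph boundaries
# rather than raw character offsets.
def split_into_chunks(text: str, max_chars_per_chunk: int = 4096) -> list[str]:
    return [
        text[i : i + max_chars_per_chunk]
        for i in range(0, len(text), max_chars_per_chunk)
    ]
```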
Open to guidance on what would be most useful to tackle first.