Completely rewrite add_messages_streaming#277
Conversation
Split embedding strategy (uncached chunk, cached related terms).
Add precomputed-embedding write paths for message and related-term indexes, introducing explicit *_with_embeddings methods in interfaces and both memory/SQLite implementations. Refactor existing add methods to compute embeddings once and delegate, enabling pipeline commit paths to reuse worker-generated embeddings without recomputation.
Previously typechat.Failure from the extractor was a soft error: the message was still committed (with no knowledge) and the failure recorded. Since LLM responses are non-deterministic, a Failure is just as unreliable as a raised exception, so both now stop the pipeline at the failing message and propagate the error. - Remove extraction_failure_msg from ChunkProcessingResult and _ChunkCommitResult; simplify _commit_batch_from_chunk_results - Keep stop_state.exception in sync with stop_at_message_id so it always reflects the lowest-ordinal failing message - Update tests accordingly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace nested try/except* handling with ExceptionGroup handling. - Preserve producer_state and stop_state exceptions and raise a combined ExceptionGroup when multiple distinct failures occur. - Complete ChunkProcessingResult docstring with all class fields and clarify success semantics.
…er ConversationSettings)
…ssages are added.
|
@KRRT7 If you still care about typeagent-py I'd appreciate your review! |
|
I'll review it in the morning |
|
|
||
| # Step 2: Generate embeddings (only if extraction succeeded) | ||
| try: | ||
| result.chunk_embedding = await embedding_model.get_embedding_nocache(chunk_text) |
There was a problem hiding this comment.
chunk_embedding is computed but never written to the message text index
There was a problem hiding this comment.
That's a deep one! Need to dig more for this.
gvanrossum
left a comment
There was a problem hiding this comment.
@bmerkle It's ready for your re-review. I fixed the chunk embeddings issue. Also had added one more PR since your review came in.
…remental_with_embeddings Chunk embeddings are already consumed by messages.extend(); passing them through to this function served no purpose. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the success-check early return, extracted_knowledge is guaranteed non-None; an assert communicates this to pyright and catches programmer errors at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing] Both the IMessageCollection protocol and the MemoryMessageCollection concrete class now use the specific type. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add index_messages=False flag to IMessageCollection.extend() so callers can skip message text index population. Deserialization uses it because message_index.deserialize() will replace the index from the sidecar file anyway. Also change generate_embeddings(cache=True) to cache=False since indexing embeddings should not be cached. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@bmerkle Claude and I addressed all your feedback. Please re-review. |
Review findingsAll 734 tests pass on the branch. The prior-round issues (type narrowing, dead param, type tightening, deserialization wasted work) look good. The following are new observations not covered in earlier threads. Correctness
Silent embedding = chunk_embedding_map.get((msg_ord, chunk_ord))
if embedding is not None:
chunk_embeddings.append(embedding)The validation loop above this already raises
If Missing comment on replace-not-append semantics (
PerformanceUnnecessary list allocations in hot path (
for action in list(knowledge.actions) + list(knowledge.inverse_actions):This allocates three lists (two coercions + one concatenation) to iterate once. Use Extraction and chunk embedding are sequential (
knowledge_result, chunk_embedding = await asyncio.gather(
knowledge_extractor.extract(chunk_text),
embedding_model.get_embedding_nocache(chunk_text),
)At
Semaphore held through try:
...compute result...
await result_queue.put(result) # can block when queue full
finally:
sem.release() # held until put() completesWhen the reassembler is blocked on a commit, |
… embedding and extraction+related-embeddings with semaphore
|
@KRRT7, I think Claude and I have addressed your concerns in the most recent 8 commits (one per issue). |
Aside from these, the PR looks solid and ready to go. |
|
Thanks! Fixed. |
|
LGTM now |
|
I have one failing tests, however not really stable fail. output appended below should be proceed with this PR and file a new bug ? |
|
additional failing test... |
|
I think those tests are inherently flaky -- they depend on the LLM service being up, speedy, and returning the right answer. You can file a bug for that and in the meantime merge this PR. |
|
Thanks! |
The throughput is now much higher -- e.g. with concurrency 10 and batch size 10, the Adrian podcast ingest in 40 seconds, compared to 90 on main (with the previous pipelining implementation).
A consequence of the new design is that the message index is now populated at the time messages are added -- the secondary index building no longer needs to do this.