Fix Collection.insert silent failure bug #7
Merged
sushanthpy merged 6 commits into main on Jan 10, 2026
- Collection.insert_batch now persists data via KV layer transactions
- Fixed Collection.get to retrieve documents from storage
- Fixed Collection.count to scan prefix and return actual count
- Fixed Collection.delete to remove documents from storage
- Added _vector_key() and _vectors_prefix() helper methods

Bug: insert() validated dimensions but never stored data (stub implementation).
All Collection tests now passing.
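
A minimal sketch of the storage pattern this commit describes, not the actual implementation. The `InMemoryKV` class, the key layout, and the JSON record format are all assumptions made for illustration; the real code goes through KV layer transactions, which this sketch leaves out.

```python
import json


class InMemoryKV:
    """Hypothetical stand-in for the KV layer: a dict with prefix scanning."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

    def scan_prefix(self, prefix: bytes):
        return [(k, v) for k, v in sorted(self._data.items()) if k.startswith(prefix)]


class Collection:
    def __init__(self, kv, namespace: str, name: str, dimension: int):
        self.kv = kv
        self.namespace = namespace
        self.name = name
        self.dimension = dimension

    def _vectors_prefix(self) -> bytes:
        # Assumed key layout; the real prefix format is not shown in the PR.
        return f"{self.namespace}/collections/{self.name}/vectors/".encode()

    def _vector_key(self, doc_id) -> bytes:
        return self._vectors_prefix() + str(doc_id).encode()

    def insert(self, doc_id, vector, metadata=None) -> None:
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")
        record = {"id": doc_id, "vector": list(vector), "metadata": metadata or {}}
        # The step the stub skipped: write the record to storage.
        self.kv.put(self._vector_key(doc_id), json.dumps(record).encode())

    def get(self, doc_id):
        raw = self.kv.get(self._vector_key(doc_id))
        return json.loads(raw) if raw is not None else None

    def count(self) -> int:
        # Count by scanning the prefix rather than trusting an in-memory counter.
        return len(self.kv.scan_prefix(self._vectors_prefix()))

    def delete(self, doc_id) -> bool:
        key = self._vector_key(doc_id)
        existed = self.kv.get(key) is not None
        self.kv.delete(key)
        return existed
```

Deriving `count()` from a prefix scan keeps it consistent with what is actually stored, at the cost of a scan per call.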
- _vector_search: Cosine similarity on KV-stored vectors
- _keyword_search: Simple text matching with term frequency
- _hybrid_search: RRF fusion of vector and keyword results
- Added _matches_filter helper for metadata filtering

All search methods now return actual results from stored data.
This is a baseline implementation - can be replaced with FFI for HNSW.
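
The three scoring pieces named above can be sketched roughly as follows. The function names, the whitespace tokenizer, and the RRF constant `k=60` (a commonly used default) are assumptions; the PR does not state which values or tokenization the real methods use.

```python
import math
from collections import Counter


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query: str, text: str) -> int:
    """Term-frequency score: how often the query terms occur in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rrf_fuse(rankings, k: int = 60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=lambda doc_id: -scores[doc_id])
```

RRF only needs ranks, not raw scores, so the vector and keyword results can be fused without putting their scores on a common scale.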
namespace.py:
- create_collection: Persists config to KV storage
- get_collection: Loads from storage if not in cache
- list_collections: Scans storage prefix for all collections
- delete_collection: Removes config and all vectors from storage

database.py:
- stats(): Returns placeholder dict (accurate count needs FFI)
- checkpoint(): Safely calls FFI if available, no-op otherwise

Collections now persist across db close/reopen.
- cache_put: Added KV fallback that stores JSON with embedding, TTL, timestamp
- cache_get: Added KV fallback with cosine similarity matching
- Both try FFI first, fall back to KV storage if unavailable
- Handles TTL expiration, threshold filtering
- Test shows 100% cache hit rate with similar queries
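
A rough sketch of what a KV-backed fallback along these lines could look like. The `KVSemanticCache` class, its key format, the JSON field names, and the convention that a TTL of 0 means "never expires" are assumptions for illustration, and the FFI-first path is omitted.

```python
import json
import math
import time


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class KVSemanticCache:
    """Hypothetical fallback cache; a dict stands in for KV storage."""

    def __init__(self, clock=time.time):
        self._kv = {}
        self._clock = clock  # injectable so expiry can be tested

    def put(self, key: str, value: str, embedding, ttl_seconds: float = 0) -> None:
        entry = {
            "value": value,
            "embedding": list(embedding),
            "ttl": ttl_seconds,
            "timestamp": self._clock(),
        }
        self._kv[f"cache/{key}"] = json.dumps(entry)

    def get(self, query_embedding, threshold: float = 0.85):
        now = self._clock()
        best_value, best_score = None, threshold
        for raw in self._kv.values():
            entry = json.loads(raw)
            # Skip expired entries (ttl == 0 is treated as "no expiry" here).
            if entry["ttl"] > 0 and now - entry["timestamp"] > entry["ttl"]:
                continue
            score = cosine_similarity(query_embedding, entry["embedding"])
            if score >= best_score:
                best_value, best_score = entry["value"], score
        return best_value
```

The lookup is a linear scan over all cached entries, which is reasonable for a fallback path but would not scale to a large cache.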
- Vector search now normalizes cosine similarity from [-1,1] to [0,1]
- Cache get also normalized to match
- Fixes threshold comparisons (0.85-0.9 now meaningful)
- Prevents cache miss due to similarity range mismatch
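
One plausible way to do this rescaling is the linear map `(s + 1) / 2`. The commit message does not say which mapping the code uses, so the function below, including its name and the clamping step, is an assumption.

```python
def normalize_cosine(similarity: float) -> float:
    """Map a cosine similarity from [-1, 1] onto [0, 1] linearly."""
    # Clamp first so floating-point drift slightly outside [-1, 1] stays in range.
    clamped = max(-1.0, min(1.0, similarity))
    return (clamped + 1.0) / 2.0
```

Under this particular map, a normalized threshold of 0.85 corresponds to a raw cosine of 0.7, so both sides of a comparison need to use the same scale for the threshold to mean what the caller expects.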
- Add ffi_collection_search and ffi_collection_keyword_search to Database.
- Update Collection to prefer FFI search methods with Python fallback.
- Standardize FFI handle usage and fix toondb_query_temporal_graph signature.
- Add gRPC dependencies to pyproject.toml.
- Ignore _bin/ directory.
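
The "prefer FFI, fall back to Python" pattern might look something like the sketch below. Only the method name `ffi_collection_search` comes from the commit message; its argument list, the `None`-means-unavailable convention, the caught exception type, and the `search_with_fallback` helper itself are assumptions.

```python
def search_with_fallback(db, collection_name, query_vector, k, python_search):
    """Try the native search first; use the pure-Python path if it is unavailable."""
    ffi_search = getattr(db, "ffi_collection_search", None)
    if callable(ffi_search):
        try:
            # Assumed signature: (collection_name, query_vector, k).
            results = ffi_search(collection_name, query_vector, k)
            if results is not None:
                return results
        except OSError:
            # e.g. the native library failed to load or the call failed.
            pass
    return python_search(query_vector, k)
```

Passing the Python implementation in as a callable keeps the helper independent of how the baseline search is written.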