Fix Collection.insert silent failure bug #7

sushanthpy merged 6 commits into main from release/0.3.7 on Jan 10, 2026

@sushanthpy

- Collection.insert_batch now persists data via KV layer transactions
- Fixed Collection.get to retrieve documents from storage
- Fixed Collection.count to scan prefix and return actual count
- Fixed Collection.delete to remove documents from storage
- Added _vector_key() and _vectors_prefix() helper methods

Bug: insert() validated dimensions but never stored data (stub implementation).
All Collection tests now passing.
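
To make the fix above concrete, here is a minimal sketch of a collection that actually writes to a key-value store after validating dimensions. It is not the code from this PR: the plain `dict` standing in for the KV layer, the key layout, and the JSON record shape are all assumptions. Only the helper names `_vector_key()` and `_vectors_prefix()` come from the description.

```python
import json
from typing import Optional, Sequence


class Collection:
    """Toy collection backed by a dict that stands in for the KV layer."""

    def __init__(self, name: str, dimension: int, kv: dict):
        self.name = name
        self.dimension = dimension
        self._kv = kv  # assumption: maps str keys to bytes values

    def _vectors_prefix(self) -> str:
        # Hypothetical key layout; the real prefix format is not shown in the PR.
        return f"collections/{self.name}/vectors/"

    def _vector_key(self, doc_id: str) -> str:
        return self._vectors_prefix() + doc_id

    def insert(self, doc_id: str, vector: Sequence[float],
               metadata: Optional[dict] = None) -> None:
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        record = {"id": doc_id, "vector": list(vector), "metadata": metadata or {}}
        # The step a validate-only stub skips: persist the record.
        self._kv[self._vector_key(doc_id)] = json.dumps(record).encode("utf-8")

    def get(self, doc_id: str) -> Optional[dict]:
        raw = self._kv.get(self._vector_key(doc_id))
        return json.loads(raw) if raw is not None else None

    def count(self) -> int:
        # Count by scanning the prefix instead of trusting an in-memory counter.
        prefix = self._vectors_prefix()
        return sum(1 for key in self._kv if key.startswith(prefix))

    def delete(self, doc_id: str) -> bool:
        return self._kv.pop(self._vector_key(doc_id), None) is not None
```

Counting by prefix scan is linear in the number of stored keys, which is acceptable for a baseline but would likely need an index or a maintained counter at scale.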

- _vector_search: Cosine similarity on KV-stored vectors
- _keyword_search: Simple text matching with term frequency
- _hybrid_search: RRF fusion of vector and keyword results
- Added _matches_filter helper for metadata filtering

All search methods now return actual results from stored data.
This is a baseline implementation; it can be replaced with an FFI-backed HNSW implementation.

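
For readers unfamiliar with the fusion step in _hybrid_search, the sketch below shows reciprocal rank fusion (RRF) over two ranked lists of document IDs. The function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the alphabetical tie-break are assumptions; the PR does not state which constant or tie-break the actual method uses.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60,
             top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


vector_hits = ["a", "b", "c"]   # e.g. ordered by cosine similarity
keyword_hits = ["b", "d", "a"]  # e.g. ordered by term frequency
print([doc_id for doc_id, _ in rrf_fuse([vector_hits, keyword_hits])])
```

RRF only needs ranks, not raw scores, so the vector and keyword scorers do not have to produce comparable score scales.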
namespace.py:
- create_collection: Persists config to KV storage
- get_collection: Loads from storage if not in cache
- list_collections: Scans storage prefix for all collections
- delete_collection: Removes config and all vectors from storage

database.py:
- stats(): Returns placeholder dict (accurate count needs FFI)
- checkpoint(): Safely calls FFI if available, no-op otherwise

Collections now persist across database close/reopen.

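
The namespace changes above follow a familiar pattern: write the config through to storage on create, consult the in-memory cache first on read, and fall back to storage on a cache miss. A rough sketch under stated assumptions: the `dict` KV stand-in, the `ns/<namespace>/collections/<name>/...` key layout, and the config fields are all hypothetical and not taken from namespace.py.

```python
import json
from typing import List, Optional


class Namespace:
    """Toy namespace whose collection configs survive a 'reopen'."""

    def __init__(self, name: str, kv: dict):
        self.name = name
        self._kv = kv            # assumption: str keys -> bytes values
        self._cache: dict = {}   # per-instance cache, empty after reopen

    def _collections_prefix(self) -> str:
        return f"ns/{self.name}/collections/"

    def _collection_prefix(self, collection: str) -> str:
        return f"{self._collections_prefix()}{collection}/"

    def _config_key(self, collection: str) -> str:
        return self._collection_prefix(collection) + "config"

    def create_collection(self, collection: str, dimension: int) -> dict:
        config = {"name": collection, "dimension": dimension}
        self._kv[self._config_key(collection)] = json.dumps(config).encode("utf-8")
        self._cache[collection] = config
        return config

    def get_collection(self, collection: str) -> Optional[dict]:
        if collection in self._cache:
            return self._cache[collection]
        raw = self._kv.get(self._config_key(collection))
        if raw is None:
            return None
        config = json.loads(raw)
        self._cache[collection] = config  # warm the cache for later calls
        return config

    def list_collections(self) -> List[str]:
        prefix = self._collections_prefix()
        names = []
        for key in self._kv:
            if not key.startswith(prefix):
                continue
            name, _, tail = key[len(prefix):].partition("/")
            if tail == "config":  # skip vector keys under the same collection
                names.append(name)
        return sorted(names)

    def delete_collection(self, collection: str) -> bool:
        # Remove the config and every vector stored under the collection prefix.
        prefix = self._collection_prefix(collection)
        doomed = [key for key in self._kv if key.startswith(prefix)]
        for key in doomed:
            del self._kv[key]
        self._cache.pop(collection, None)
        return bool(doomed)
```

Creating a second `Namespace` over the same store simulates a close/reopen: its cache starts empty, so `get_collection` has to load from storage.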
- cache_put: Added KV fallback that stores JSON with embedding, TTL, timestamp
- cache_get: Added KV fallback with cosine similarity matching
- Both try FFI first, fall back to KV storage if unavailable
- Handles TTL expiration, threshold filtering
- Test shows a 100% cache hit rate for similar queries

- Vector search now normalizes cosine similarity from [-1, 1] to [0, 1]
- cache_get similarity is normalized the same way so the two ranges match
- Fixes threshold comparisons (thresholds of 0.85-0.9 are now meaningful)
- Prevents cache misses caused by a similarity range mismatch

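
The KV fallback for the semantic cache and the similarity normalization can be sketched together. This is an illustration, not the PR's code: the function signatures, the `cache/` key prefix, the JSON field names, the injectable `now` argument, and the convention that a TTL of 0 means "never expires" are assumptions, and the FFI-first path is omitted. The normalization maps a cosine value s in [-1, 1] to (s + 1) / 2 in [0, 1].

```python
import json
import math
import time
from typing import Optional, Sequence


def normalized_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity mapped from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat zero vectors as no match
    return (dot / (norm_a * norm_b) + 1.0) / 2.0


def cache_put(kv: dict, key: str, value: str, embedding: Sequence[float],
              ttl_seconds: float = 0.0, now: Optional[float] = None) -> None:
    entry = {
        "value": value,
        "embedding": list(embedding),
        "ttl": ttl_seconds,
        "timestamp": time.time() if now is None else now,
    }
    kv[f"cache/{key}"] = json.dumps(entry).encode("utf-8")


def cache_get(kv: dict, query_embedding: Sequence[float],
              threshold: float = 0.85, now: Optional[float] = None) -> Optional[str]:
    now = time.time() if now is None else now
    best_value, best_score = None, threshold
    for key, raw in kv.items():
        if not key.startswith("cache/"):
            continue
        entry = json.loads(raw)
        if entry["ttl"] > 0 and now - entry["timestamp"] > entry["ttl"]:
            continue  # expired entry
        score = normalized_similarity(query_embedding, entry["embedding"])
        if score >= best_score:
            best_value, best_score = entry["value"], score
    return best_value
```

On the normalized scale, orthogonal vectors score 0.5 and opposite vectors score 0.0, so a threshold such as 0.9 only admits embeddings that point in nearly the same direction.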
- Add ffi_collection_search and ffi_collection_keyword_search to Database.
- Update Collection to prefer FFI search methods with Python fallback.
- Standardize FFI handle usage and fix toondb_query_temporal_graph signature.
- Add gRPC dependencies to pyproject.toml.
- Ignore _bin/ directory.
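
The "prefer FFI, fall back to Python" behaviour can be illustrated with a small dispatcher. Everything here is hypothetical: the real signatures of ffi_collection_search and ffi_collection_keyword_search are not shown in this PR, so the sketch assumes a native search callable that takes `(query, k)` and returns `None` when the native library is unavailable.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Hit = Tuple[str, float]  # (document id, score)
SearchFn = Callable[[Sequence[float], int], Optional[List[Hit]]]


def search_prefer_ffi(query: Sequence[float], k: int,
                      ffi_search: Optional[SearchFn],
                      python_search: SearchFn) -> List[Hit]:
    """Use the native search when present and usable, else the Python path."""
    if ffi_search is not None:
        hits = ffi_search(query, k)
        if hits is not None:  # assumption: None signals "FFI unavailable"
            return hits
    result = python_search(query, k)
    return result if result is not None else []
```

Keeping the fallback behind one function means callers do not need to know which backend produced the hits.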
@sushanthpy sushanthpy marked this pull request as ready for review January 10, 2026 04:31
@sushanthpy sushanthpy merged commit 09faed7 into main Jan 10, 2026