Fix Collection.insert silent failure bug by sushanthpy · Pull Request #7 · sochdb/sochdb-python-sdk

sushanthpy · 2026-01-09T19:35:53Z

Collection.insert_batch now persists data via KV layer transactions
Fixed Collection.get to retrieve documents from storage
Fixed Collection.count to scan prefix and return actual count
Fixed Collection.delete to remove documents from storage
Added _vector_key() and _vectors_prefix() helper methods

Bug: insert() validated dimensions but never stored data (stub implementation). All Collection tests now pass.
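To make the failure mode concrete, here is a minimal sketch of a collection that validates dimensions and then actually writes to a key-value store. It is not the SDK's code: `ToyCollection`, the dict standing in for the KV layer, the JSON record shape, and the key layout returned by `_vectors_prefix()` are all assumptions made for illustration. Only the helper names `_vector_key()` and `_vectors_prefix()` come from the PR.

```python
import json


class ToyCollection:
    """Illustrative stand-in for a collection backed by a key-value store."""

    def __init__(self, name, dimension, kv=None):
        self.name = name
        self.dimension = dimension
        # Hypothetical KV layer: a plain dict instead of real transactions.
        self.kv = kv if kv is not None else {}

    def _vectors_prefix(self):
        # Assumed key layout; the real SDK may use a different scheme.
        return f"collections/{self.name}/vectors/"

    def _vector_key(self, doc_id):
        return self._vectors_prefix() + str(doc_id)

    def insert(self, doc_id, vector, metadata=None):
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        # The step a validate-only stub would skip: write the record.
        record = {"id": doc_id, "vector": list(vector), "metadata": metadata or {}}
        self.kv[self._vector_key(doc_id)] = json.dumps(record)

    def get(self, doc_id):
        raw = self.kv.get(self._vector_key(doc_id))
        return json.loads(raw) if raw is not None else None

    def count(self):
        # Scan the prefix and count matching keys.
        prefix = self._vectors_prefix()
        return sum(1 for key in self.kv if key.startswith(prefix))

    def delete(self, doc_id):
        # True if a record existed and was removed.
        return self.kv.pop(self._vector_key(doc_id), None) is not None


col = ToyCollection("docs", dimension=3)
col.insert("a", [0.1, 0.2, 0.3], {"lang": "en"})
col.insert("b", [0.4, 0.5, 0.6])
print(col.count())  # 2
```

With a validate-only stub, the same calls would return without error while `count()` stayed at 0, which is why the failure was silent.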


- _vector_search: Cosine similarity on KV-stored vectors
- _keyword_search: Simple text matching with term frequency
- _hybrid_search: RRF fusion of vector and keyword results
- Added _matches_filter helper for metadata filtering

All search methods now return actual results from stored data. This is a baseline implementation that can be replaced with FFI for HNSW.
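The three search strategies named above can be sketched in a few lines of plain Python. This is an illustration of the general techniques, not the SDK's implementation: the function names, the whitespace tokenizer, and the RRF constant `k=60` (a commonly used default) are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query, text):
    """Term frequency: how often the query terms occur in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical stored documents: id -> (vector, text)
docs = {
    "a": ([1.0, 0.0], "red apple pie"),
    "b": ([0.0, 1.0], "apple apple tart"),
    "c": ([0.9, 0.1], "green pear"),
}
query_vec, query_text = [1.0, 0.0], "apple"

vector_ranking = sorted(
    docs, key=lambda d: -cosine_similarity(query_vec, docs[d][0])
)
keyword_hits = {d: keyword_score(query_text, docs[d][1]) for d in docs}
keyword_ranking = sorted(
    (d for d in docs if keyword_hits[d] > 0), key=lambda d: -keyword_hits[d]
)
fused = rrf_fuse([vector_ranking, keyword_ranking])
print(fused)  # ['a', 'b', 'c']
```

RRF only looks at ranks, so the vector and keyword scores never need to be put on a common scale before fusing. A brute-force scan like this is linear in the number of stored vectors, which is the cost an HNSW index would avoid.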

namespace.py:
- create_collection: Persists config to KV storage
- get_collection: Loads from storage if not in cache
- list_collections: Scans storage prefix for all collections
- delete_collection: Removes config and all vectors from storage

database.py:
- stats(): Returns placeholder dict (accurate count needs FFI)
- checkpoint(): Safely calls FFI if available, no-op otherwise

Collections now persist across db close/reopen.
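The cache-then-storage pattern described for namespace.py might look roughly like the sketch below. `ToyNamespace`, the dict used as storage, and both key prefixes are hypothetical; the real key layout and config format are not shown in this PR.

```python
import json


class ToyNamespace:
    """Illustrative namespace that persists collection configs in a KV store."""

    CONFIG_PREFIX = "_collections/"  # assumed layout for config records

    def __init__(self, kv):
        self.kv = kv        # hypothetical persistent store (a dict here)
        self._cache = {}    # in-memory configs, lost when the db is closed

    def create_collection(self, name, dimension):
        config = {"name": name, "dimension": dimension}
        self.kv[self.CONFIG_PREFIX + name] = json.dumps(config)
        self._cache[name] = config
        return config

    def get_collection(self, name):
        if name in self._cache:
            return self._cache[name]
        # Cache miss: fall back to storage, then remember the result.
        raw = self.kv.get(self.CONFIG_PREFIX + name)
        if raw is None:
            raise KeyError(name)
        config = json.loads(raw)
        self._cache[name] = config
        return config

    def list_collections(self):
        # Scan the config prefix rather than trusting the in-memory cache.
        n = len(self.CONFIG_PREFIX)
        return sorted(k[n:] for k in self.kv if k.startswith(self.CONFIG_PREFIX))

    def delete_collection(self, name):
        self._cache.pop(name, None)
        self.kv.pop(self.CONFIG_PREFIX + name, None)
        vec_prefix = f"vectors/{name}/"  # assumed layout for vector records
        for key in [k for k in self.kv if k.startswith(vec_prefix)]:
            del self.kv[key]


storage = {}
ns = ToyNamespace(storage)
ns.create_collection("docs", dimension=3)
storage["vectors/docs/a"] = json.dumps([0.1, 0.2, 0.3])

# Simulate close/reopen: a new namespace object over the same storage.
reopened = ToyNamespace(storage)
print(reopened.list_collections())  # ['docs']
```

The reopened namespace starts with an empty cache, so `get_collection` and `list_collections` only work because the config was written to storage at creation time.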

- cache_put: Added KV fallback that stores JSON with embedding, TTL, timestamp
- cache_get: Added KV fallback with cosine similarity matching
- Both try FFI first, fall back to KV storage if unavailable
- Handles TTL expiration, threshold filtering
- Test shows 100% cache hit rate with similar queries
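A KV-only semantic cache along these lines can be sketched as follows. It leaves out the FFI-first path entirely and uses assumed names throughout: `ToySemanticCache`, the `cache/` key prefix, the JSON field names, and the injectable `clock` parameter are illustrative, not taken from the SDK.

```python
import json
import math
import time


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToySemanticCache:
    """Illustrative KV-backed cache keyed by embedding similarity."""

    def __init__(self, kv=None, clock=time.time):
        self.kv = kv if kv is not None else {}  # hypothetical KV store
        self.clock = clock                      # injectable for testing

    def put(self, key, value, embedding, ttl_seconds):
        entry = {
            "value": value,
            "embedding": list(embedding),
            "ttl": ttl_seconds,
            "timestamp": self.clock(),
        }
        self.kv["cache/" + key] = json.dumps(entry)

    def get(self, query_embedding, threshold=0.9):
        now = self.clock()
        best_value, best_score = None, threshold
        for key, raw in list(self.kv.items()):
            if not key.startswith("cache/"):
                continue
            entry = json.loads(raw)
            if now - entry["timestamp"] > entry["ttl"]:
                del self.kv[key]  # expired: drop it and move on
                continue
            score = cosine(query_embedding, entry["embedding"])
            if score >= best_score:
                best_value, best_score = entry["value"], score
        return best_value  # None means a cache miss


now = [1000.0]
cache = ToySemanticCache(clock=lambda: now[0])
cache.put("q1", "cached answer", [1.0, 0.0], ttl_seconds=60)

print(cache.get([0.99, 0.05]))  # similar query: hit
print(cache.get([0.0, 1.0]))    # unrelated query: miss (None)
now[0] += 61                    # advance the fake clock past the TTL
print(cache.get([1.0, 0.0]))    # expired: miss (None)
```

This sketch compares raw cosine values against the threshold; whatever scale is chosen, the lookup and the search path need to agree on it for a given threshold to mean the same thing in both places.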

- Vector search now normalizes cosine similarity from [-1,1] to [0,1]
- Cache get also normalized to match
- Fixes threshold comparisons (0.85-0.9 now meaningful)
- Prevents cache miss due to similarity range mismatch
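The commit does not spell out the mapping it uses. The simplest way to take [-1,1] to [0,1] is the linear map (s + 1) / 2, sketched below as an assumption; `normalize_cosine` is a hypothetical name, and the clamp is added here only to absorb floating-point drift.

```python
def normalize_cosine(similarity):
    """Map a cosine similarity from [-1, 1] onto [0, 1] linearly."""
    score = (similarity + 1.0) / 2.0
    # Clamp in case rounding pushed the raw cosine slightly out of range.
    return max(0.0, min(1.0, score))


for raw in (-1.0, 0.0, 0.7, 0.8, 1.0):
    print(raw, normalize_cosine(raw))
```

Under this mapping a raw cosine of 0.7 lands at about 0.85 and 0.8 at about 0.9, so a 0.85 threshold applied to normalized scores is much more permissive than the same number applied to raw scores. If search and cache used different scales, the same threshold would accept a match in one path and reject it in the other, which is the mismatch this change removes.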

- Add ffi_collection_search and ffi_collection_keyword_search to Database.
- Update Collection to prefer FFI search methods with Python fallback.
- Standardize FFI handle usage and fix toondb_query_temporal_graph signature.
- Add gRPC dependencies to pyproject.toml.
- Ignore _bin/ directory.

sushanthpy added 6 commits January 9, 2026 11:35

Normalize cosine similarity to [0,1] range

834fbb2


sushanthpy marked this pull request as ready for review January 10, 2026 04:31

sushanthpy merged commit 09faed7 into main Jan 10, 2026

sushanthpy merged 6 commits into main from release/0.3.7
