Python: Adding USearch memory connector #2358

alexbarev · 2023-08-08T15:38:28Z

Motivation and Context

The integration of USearch as a memory connector to Semantic Kernel (SK).

Description

The USearch Index does not natively support multiple collections, and it stores only embeddings, not the other attributes of a MemoryRecord.

The USearchMemoryStore class encapsulates these capabilities. It uses the USearch Index to store a collection of embeddings under unique IDs, with original collection names mapped to those IDs. Other MemoryRecord attributes are stored in a pyarrow.Table, which is mapped to each collection.

It's important to note the current behavior when a user removes a record or upserts a new one with an existing ID: the old row is not removed from the pyarrow.Table. This is done for performance reasons but could lead to the table growing in size.
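To make the trade-off concrete, here is a minimal sketch of the append-only behavior described above. It is not the code from this PR: `AppendOnlyMetadata`, `rows`, and `live_row` are hypothetical names, a plain list stands in for the pyarrow.Table, and a dict stands in for the ID lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AppendOnlyMetadata:
    """Toy stand-in for a per-collection metadata table (hypothetical, not the PR's code)."""

    rows: List[dict] = field(default_factory=list)          # plays the role of the pyarrow.Table
    live_row: Dict[str, int] = field(default_factory=dict)  # record ID -> index of its current row

    def upsert(self, record_id: str, text: str) -> None:
        # Always append; an earlier row with the same ID stays in `rows`.
        self.rows.append({"id": record_id, "text": text})
        self.live_row[record_id] = len(self.rows) - 1

    def remove(self, record_id: str) -> None:
        # Only the lookup entry is dropped; the row itself is left behind.
        self.live_row.pop(record_id, None)

    def get(self, record_id: str) -> Optional[dict]:
        idx = self.live_row.get(record_id)
        return None if idx is None else self.rows[idx]


table = AppendOnlyMetadata()
table.upsert("a", "first version")
table.upsert("a", "second version")  # same ID: the old row is kept
table.remove("a")
stale_rows = len(table.rows) - len(table.live_row)  # rows no longer reachable by ID
```

In this toy model, upserts and removals never rewrite the table, which is cheap, but every overwritten or removed record leaves a stale row behind.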

By default, USearchMemoryStore operates as an in-memory store. To enable persistence, you must enable persist mode when calling __init__ by supplying a path to the directory for the persisted files. For each collection, two files are created: {collection_name}.usearch and {collection_name}.parquet. Changes are written to disk only when close_async is called. Because of the interface provided by the base class MemoryStoreBase, this happens implicitly when the store is used as a context manager; close_async can also be called explicitly.
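The "write only on close" behavior can be illustrated with a small stand-in class. This is a sketch under assumptions, not USearchMemoryStore itself: `ToyPersistentStore`, `put_async`, and the `records.json` file are hypothetical, and the `__aenter__`/`__aexit__` pair is assumed to mirror what MemoryStoreBase provides. Only the name `close_async` comes from the description.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class ToyPersistentStore:
    """Hypothetical store that touches the disk only in close_async."""

    def __init__(self, persist_directory: Path) -> None:
        self._path = Path(persist_directory) / "records.json"
        self._records = {}

    async def put_async(self, key: str, value: str) -> None:
        # Changes are kept in memory; nothing is written yet.
        self._records[key] = value

    async def close_async(self) -> None:
        # All pending changes are written in one step.
        self._path.write_text(json.dumps(self._records))

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info) -> None:
        # Leaving the `async with` block triggers the write implicitly.
        await self.close_async()


async def demo(directory: Path):
    target = directory / "records.json"
    async with ToyPersistentStore(directory) as store:
        await store.put_async("id1", "hello")
        existed_before_close = target.exists()
    existed_after_close = target.exists()
    return existed_before_close, existed_after_close


with tempfile.TemporaryDirectory() as tmp:
    before, after = asyncio.run(demo(Path(tmp)))
```

In this sketch the file does not exist while the `async with` block is still running and appears once the block exits, so a process that ends without closing the store would lose its changes.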

Since collection names are used to store files on disk, all names are converted to lowercase.
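As a rough illustration of the naming scheme, the helper below derives the two file paths for a collection. `collection_files` and the `memory` directory are hypothetical names used only for this sketch; the file extensions and the lowercasing follow the description above.

```python
from pathlib import Path
from typing import Tuple


def collection_files(persist_directory: Path, collection_name: str) -> Tuple[Path, Path]:
    """Return the (index file, metadata file) paths for a collection."""
    name = collection_name.lower()  # names are normalized to lowercase
    base = Path(persist_directory)
    return base / f"{name}.usearch", base / f"{name}.parquet"


index_file, table_file = collection_files(Path("memory"), "MyNotes")
# Names that differ only by case resolve to the same pair of files.
same_files = collection_files(Path("memory"), "mynotes") == (index_file, table_file)
```

Under this scheme, names such as `MyNotes` and `mynotes` map to the same files, so they would refer to the same collection.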

To release the memory held by the indexes and metadata, call close_async when you are done with the store.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

ashvardanian · 2023-08-08T15:54:28Z

This is exciting! We are also working on C# bindings for USearch to allow broader integration with SK 🤗 cc @dluc

Fix: removing cast to `str` due to patch in USearch

alexbarev · 2023-08-08T16:51:45Z

@microsoft-github-policy-service agree

Refactor: method naming Docs: update to fit changes

Docs: clarification

dluc · 2023-08-14T23:12:50Z

awesome, thank you @ashvardanian - I'll take a look asap (FYI, there's a quick git conflict to fix when you have a chance)

ashvardanian · 2023-08-16T12:08:06Z

Hey, @dluc! @AleksandrKent has updated the poetry file. It seems to be the only collision. But it will re-appear as soon as you have any other dependency updates, so we should try merging this sooner. Please let us know if anything has to be polished.

awharrison-28

Thank you for this contribution :)

dluc

missing file headers

python/tests/integration/connectors/memory/test_usearch.py

python/semantic_kernel/connectors/memory/usearch/__init__.py

python/semantic_kernel/connectors/memory/usearch/usearch_memory_store.py

Co-authored-by: Abby Harrison <abby.harrison@microsoft.com>
Co-authored-by: Abby Harrison <54643756+awharrison-28@users.noreply.github.com>
Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>

Adding USearch memory connector

f644702

shawncal added the python and memory connector labels Aug 8, 2023

shawncal changed the title ~~Adding USearch memory connector~~ Python: Adding USearch memory connector Aug 8, 2023

Fix: free Index and metadata in close_async

4e1abdb

Fix: removing cast to `str` due to patch in USearch

alexbarev added 3 commits August 9, 2023 14:07

Fix: previous freeing logic of collections

b3e469b

Refactor: method naming Docs: update to fit changes

Refactor: pyarrow.schema

e13e310

Docs: clarification

Merge remote-tracking branch 'upstream/main' into usearch_py

4194d7f

alexbarev marked this pull request as ready for review August 9, 2023 13:50

alexbarev requested a review from a team as a code owner August 9, 2023 13:50

dluc self-assigned this Aug 14, 2023

Merge branch 'main' into usearch_py

22e0432

resolved conflicts with main and fixed spelling error

255ca6e

awharrison-28 approved these changes Aug 23, 2023

Merge branch 'main' into usearch_py

0fe2691

dluc reviewed Aug 23, 2023

python/tests/integration/connectors/memory/test_usearch.py (resolved)

python/semantic_kernel/connectors/memory/usearch/__init__.py (resolved)

python/semantic_kernel/connectors/memory/usearch/usearch_memory_store.py (resolved)

dluc added 3 commits August 23, 2023 15:58

Update python/tests/integration/connectors/memory/test_usearch.py

5500434

Update python/semantic_kernel/connectors/memory/usearch/__init__.py

d806b65

Update python/semantic_kernel/connectors/memory/usearch/usearch_memor…

3c90d65

…y_store.py

dluc approved these changes Aug 23, 2023

dluc enabled auto-merge August 23, 2023 22:59

Merge branch 'main' into usearch_py

2200a5a

dluc mentioned this pull request Aug 23, 2023

.Net: dotnet build pipelines should not run when merging a python PR #2556

Closed

dluc disabled auto-merge August 23, 2023 23:18

dluc merged commit 3881a31 into microsoft:main Aug 23, 2023
29 checks passed

alexbarev deleted the usearch_py branch September 12, 2023 12:26
