Python: Adding USearch memory connector #2358

Merged
merged 12 commits into from Aug 23, 2023

alexbarev
Contributor

Motivation and Context

This PR integrates USearch (https://github.com/unum-cloud/usearch) as a memory connector for Semantic Kernel (SK).

Description

The USearch Index does not natively support multiple collections, and it stores only embeddings, not the other attributes of a MemoryRecord.

The USearchMemoryStore class adds these capabilities. It uses the USearch Index to store a collection of embeddings under unique IDs, with original collection names mapped to those IDs. The other MemoryRecord attributes are stored in a pyarrow.Table, with one table mapped to each collection.
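The split described above can be illustrated with a toy sketch that uses plain Python containers in place of the USearch Index and the pyarrow.Table. All class and attribute names here are hypothetical, and the sketch assumes that each record's string key is mapped to an integer label; it is not the connector's actual code and does not handle re-upserting an existing key.

```python
from typing import Dict, List, Optional


class ToyCollection:
    """One collection: embeddings under integer labels, attributes in a side table."""

    def __init__(self) -> None:
        self.index: Dict[int, List[float]] = {}  # stand-in for a USearch Index
        self.table: List[dict] = []              # stand-in for a pyarrow.Table
        self.key_to_label: Dict[str, int] = {}   # assumed mapping: record key -> label

    def upsert(self, key: str, embedding: List[float], text: str) -> int:
        label = len(self.table)                  # the row number doubles as the label
        self.index[label] = embedding            # the vector goes to the index
        self.table.append({"key": key, "text": text})  # everything else goes to the table
        self.key_to_label[key] = label
        return label

    def get(self, key: str) -> Optional[dict]:
        label = self.key_to_label.get(key)
        if label is None:
            return None
        # Rejoin the two halves of the record.
        return {**self.table[label], "embedding": self.index[label]}


class ToyStore:
    """Routes each collection name to its own ToyCollection."""

    def __init__(self) -> None:
        self.collections: Dict[str, ToyCollection] = {}

    def collection(self, name: str) -> ToyCollection:
        if name not in self.collections:
            self.collections[name] = ToyCollection()
        return self.collections[name]
```

Two collections can hold the same key without interfering, because each has its own index, table, and key mapping.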

Note the current behavior when a user removes a record or upserts a new one with an existing ID: the old row is not removed from the pyarrow.Table. This is done for performance reasons, but it means the table can keep growing in size.
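The trade-off can be seen in a small append-only sketch. It is an illustration under assumptions, not the PR's implementation: the table is a plain list, and `compact` is a hypothetical clean-up step that the PR does not describe.

```python
from typing import Dict, List


class AppendOnlyTable:
    """Rows are only appended; a key always resolves to its newest row."""

    def __init__(self) -> None:
        self.rows: List[dict] = []
        self.latest: Dict[str, int] = {}  # key -> position of the live row

    def upsert(self, key: str, text: str) -> None:
        self.latest[key] = len(self.rows)  # an older row, if any, becomes stale
        self.rows.append({"key": key, "text": text})

    def remove(self, key: str) -> None:
        self.latest.pop(key, None)  # cheap: the row itself stays in place

    def stale_rows(self) -> int:
        return len(self.rows) - len(self.latest)

    def compact(self) -> None:
        """Hypothetical clean-up: rebuild the table from live rows only."""
        live = [self.rows[i] for i in sorted(self.latest.values())]
        self.rows = live
        self.latest = {row["key"]: i for i, row in enumerate(live)}
```

Removal and upsert stay cheap because no row is ever rewritten or deleted; the cost is that stale rows accumulate until something like `compact` rebuilds the table.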

By default, USearchMemoryStore operates as an in-memory store. To enable persistence, turn on persist mode when calling __init__ by supplying a path to the directory for the persisted files. For each collection, two files are created: {collection_name}.usearch and {collection_name}.parquet. Changes are written to disk only when close_async is called. Because of the interface provided by the base class MemoryStoreBase, this happens implicitly when the store is used as a context manager; otherwise, call close_async explicitly.
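The deferred write can be sketched with a toy store whose only disk access happens in close_async. This is a minimal illustration under assumptions: the class, the `persist_directory` parameter name, and the `add` method are hypothetical; the context manager is assumed to be asynchronous; and one JSON file per collection stands in for the `.usearch` and `.parquet` pair.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class ToyPersistentStore:
    """In-memory by default; touches the disk only in close_async."""

    def __init__(self, persist_directory: Optional[Path] = None) -> None:
        self.persist_directory = persist_directory  # None means purely in-memory
        self.collections: Dict[str, List[dict]] = {}

    def add(self, collection: str, row: dict) -> None:
        self.collections.setdefault(collection, []).append(row)  # memory only

    async def close_async(self) -> None:
        if self.persist_directory is None:
            return
        for name, rows in self.collections.items():
            # One JSON file per collection in this sketch; the PR describes
            # a .usearch file and a .parquet file per collection instead.
            (self.persist_directory / f"{name}.json").write_text(json.dumps(rows))

    async def __aenter__(self) -> "ToyPersistentStore":
        return self

    async def __aexit__(self, *exc_info) -> None:
        await self.close_async()  # leaving the block triggers the flush


async def demo(directory: Path) -> List[str]:
    async with ToyPersistentStore(directory) as store:
        store.add("docs", {"key": "a", "text": "hello"})
        before = sorted(p.name for p in directory.iterdir())  # nothing written yet
    after = sorted(p.name for p in directory.iterdir())  # written on exit
    return before + after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(demo(Path(tmp))))
```

Inside the `async with` block the directory is still empty; the file appears only after the block exits. A caller that does not use the context manager has to await close_async itself, or the changes never reach the disk.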

Since collection names are used to store files on disk, all names are converted to lowercase.
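A short sketch of how a collection name could turn into the two file paths. The helper name `collection_files` is hypothetical; only the lowercasing and the two file extensions come from the description above.

```python
from pathlib import Path
from typing import Tuple


def collection_files(directory: Path, collection_name: str) -> Tuple[Path, Path]:
    """Return the (index file, table file) paths for a collection."""
    name = collection_name.lower()  # normalize before the name touches the disk
    return directory / f"{name}.usearch", directory / f"{name}.parquet"
```

One consequence is that names differing only in case, such as "Docs" and "docs", resolve to the same pair of files and therefore to the same collection.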

To ensure efficient use of memory, you should call close_async.

Contribution Checklist

@shawncal shawncal added python Pull requests for the Python Semantic Kernel memory connector labels Aug 8, 2023
@shawncal shawncal changed the title Adding USearch memory connector Python: Adding USearch memory connector Aug 8, 2023
@ashvardanian

This is exciting! We are also working on C# bindings for USearch to allow broader integration with SK 🤗 cc @dluc

Fix: removing cast to `str` due to patch in USearch
@alexbarev
Contributor Author

@microsoft-github-policy-service agree

@alexbarev alexbarev marked this pull request as ready for review August 9, 2023 13:50
@alexbarev alexbarev requested a review from a team as a code owner August 9, 2023 13:50
@dluc dluc self-assigned this Aug 14, 2023
@dluc
Collaborator

dluc commented Aug 14, 2023

awesome, thank you @ashvardanian - I'll take a look asap (FYI, there's a quick git conflict to fix when you have a chance)

@ashvardanian

Hey, @dluc! @AleksandrKent has updated the poetry file. It seems to be the only collision. But it will re-appear as soon as you have any other dependency updates, so we should try merging this sooner. Please let us know if anything has to be polished.

Contributor

@awharrison-28 awharrison-28 left a comment

Thank you for this contribution :)

Collaborator

@dluc dluc left a comment

missing file headers

@dluc dluc enabled auto-merge August 23, 2023 22:59
@dluc dluc disabled auto-merge August 23, 2023 23:18
@dluc dluc merged commit 3881a31 into microsoft:main Aug 23, 2023
29 checks passed
@alexbarev alexbarev deleted the usearch_py branch September 12, 2023 12:26
SOE-YoungS pushed a commit to SOE-YoungS/semantic-kernel that referenced this pull request Nov 1, 2023
Co-authored-by: Abby Harrison <abby.harrison@microsoft.com>
Co-authored-by: Abby Harrison <54643756+awharrison-28@users.noreply.github.com>
Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
Labels
memory connector python Pull requests for the Python Semantic Kernel
5 participants