Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net: Change VolatileMemoryStore.CreateCollectionAsync to not throw when collection already exists #2300

Merged

Conversation

dehoward
Copy link
Contributor

@dehoward dehoward commented Aug 3, 2023

Motivation and Context

today the VolatileMemoryStore will throw when creating a collection if it already exists. other stores (e.g. Postgres, DuckDB) follow a different pattern and return gracefully. throwing an error can create a problem for parallel writes to the memory store, which is being added in ChatCopilot here.

Description

Contribution Checklist

updates the volatile memory store to only throw for valid exceptions
when calling `CreateCollectionAsync`. `TryAdd` is expected to return
false when the key already exists, and throwing for this case can
prevent concurrent memory adds.
@dehoward dehoward added the PR: ready for review All feedback addressed, ready for reviews label Aug 3, 2023
@dehoward dehoward self-assigned this Aug 3, 2023
@dehoward dehoward requested a review from a team as a code owner August 3, 2023 12:55
@shawncal shawncal added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel kernel.core labels Aug 3, 2023
@dsgrieve
Copy link
Member

dsgrieve commented Aug 3, 2023

I think this is a good change but it is contrary to the contract of CreateCollectionAsync in IMemoryStore. If you're going to do this, then I think you need to change the contract of CreateCollectionAsync and make this change to all implementations of IMemoryStore.

@dehoward
Copy link
Contributor Author

dehoward commented Aug 3, 2023

I think this is a good change but it is contrary to the contract of CreateCollectionAsync in IMemoryStore. If you're going to do this, then I think you need to change the contract of CreateCollectionAsync and make this change to all implementations of IMemoryStore.

@dsgrieve contrary in what way? the contract of IMemoryStore.CreateCollectionAsync doesn't specify what the behavior should be. DuckDB, Postgres, and Redis already implement this while Pinecone, Redis, and Chroma don't, so there doesn't seem to be a clear pattern here.

with that said, I think I could update this change to check to see if the collection already exists before throwing the error, which is more similar to what we're doing in other places.

@dsgrieve
Copy link
Member

dsgrieve commented Aug 3, 2023

I think this is a good change but it is contrary to the contract of CreateCollectionAsync in IMemoryStore. If you're going to do this, then I think you need to change the contract of CreateCollectionAsync and make this change to all implementations of IMemoryStore.

@dsgrieve contrary in what way? the contract of IMemoryStore.CreateCollectionAsync doesn't specify what the behavior should be. DuckDB, Postgres, and Redis already implement this while Pinecone, Redis, and Chroma don't, so there doesn't seem to be a clear pattern here.

with that said, I think I could update this change to check to see if the collection already exists before throwing the error, which is more similar to what we're doing in other places.

At the time I ported IMemoryStore.cs to Java, I recall the code throwing an exception if the collection already existed. Now, it seems that the .NET implementation has drifted quite far from the Java implementation and I have some catching up to do.

Sorry for the noise.

Copy link
Member

@dmytrostruk dmytrostruk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DuckDB, Postgres, and Redis already implement this while Pinecone, Redis, and Chroma don't, so there doesn't seem to be a clear pattern here

@dehoward @dsgrieve I think it is expected behavior because different DBs have different API. It depends how consumers will use IMemoryStore interface. For example, SemanticTextMemory has a check if collection exists before its creation:

if (!(await this._storage.DoesCollectionExistAsync(collection, cancellationToken).ConfigureAwait(false)))
{
await this._storage.CreateCollectionAsync(collection, cancellationToken).ConfigureAwait(false);
}

The only difference would be when you use concrete DB directly instead of using it through Kernel.

In case of VolatileMemoryStore, since it's ConcurrentDictionary, I wouldn't throw an exception, just to make CreateCollection operation idempotent.

@dmytrostruk dmytrostruk added the PR: ready to merge PR has been approved by all reviewers, and is ready to merge. label Aug 3, 2023
@shawncal
Copy link
Member

shawncal commented Aug 3, 2023

I think this is a good change but it is contrary to the contract of CreateCollectionAsync in IMemoryStore. If you're going to do this, then I think you need to change the contract of CreateCollectionAsync and make this change to all implementations of IMemoryStore.

@dsgrieve contrary in what way? the contract of IMemoryStore.CreateCollectionAsync doesn't specify what the behavior should be. DuckDB, Postgres, and Redis already implement this while Pinecone, Redis, and Chroma don't, so there doesn't seem to be a clear pattern here.
with that said, I think I could update this change to check to see if the collection already exists before throwing the error, which is more similar to what we're doing in other places.

At the time I ported IMemoryStore.cs to Java, I recall the code throwing an exception if the collection already existed. Now, it seems that the .NET implementation has drifted quite far from the Java implementation and I have some catching up to do.

Sorry for the noise.

Not noise. This is a good consideration, and was also my instinct too. I'm not sure this is what that function should do, but whatever it does, we should document it and standardize it everywhere (all connectors, all languages).

We may want to change method to (or add a new one) that's named more clearly, like CreateCollectionIfNotExists or
GetCollection(bool createIfNotExists)

@shawncal
Copy link
Member

shawncal commented Aug 3, 2023

Member

My 2c: these functions are not really meant for the C# developer; they're for the AI to call. Throwing exceptions will just inevitably break AI-driven scenarios, so we should make SKFunctions as resilient as possible. When in doubt, defer to the AI's likely intent of calling the function, and make it do the logical thing, not relying on exceptions as control logic. In this case, the intent is clearly to create the collection -- throwing because it already exists it a technicality that makes sense from a strictly-code Collections library, but cannot possibly be productive for the AI that just wants to make something happen.

I agree with the removal of the exception, and I would also like to see the Memory Skill docs updated describe this expected behavior, assuming we all agree. Then we can propagate that behavior and documentation to all the connectors and languages.

@shawncal shawncal added this pull request to the merge queue Aug 3, 2023
Merged via the queue into microsoft:main with commit da44ac3 Aug 3, 2023
13 checks passed
SOE-YoungS pushed a commit to SOE-YoungS/semantic-kernel that referenced this pull request Nov 1, 2023
… when collection already exists (microsoft#2300)

### Motivation and Context

<!-- Thank you for your contribution to the semantic-kernel repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->
today the `VolatileMemoryStore` will throw when creating a collection if
it already exists. other stores (e.g. Postgres, DuckDB) follow a
different pattern and return gracefully. throwing an error can create a
problem for parallel writes to the memory store, which is being added in
ChatCopilot [here](microsoft/chat-copilot#83).

### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->

### Contribution Checklist


<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel Issues or pull requests impacting the core kernel .NET Issue or Pull requests regarding .NET code PR: ready for review All feedback addressed, ready for reviews PR: ready to merge PR has been approved by all reviewers, and is ready to merge.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants