Add wrappers for the Points APIs #11

roji · 2023-10-19T20:02:24Z

@russcam here's a follow-up PR to #10, adding wrappers for the Qdrant Point APIs. This continues to follow the Python client, and should cover most of the remaining API surface (there's still the snapshot management APIs which was can do).

This is still very much open in terms of exact shape, e.g. splitting across to different client classes (i.e. I'm happy to do any changes you feel are necessary). I also have some additional minor work planned around creating the client, timeout management etc. but these will come as separate PRs.

roji · 2023-10-19T20:05:51Z

tests/Qdrant.Client.Tests/PointTests.cs

+		Assert.Equal("goodbye", payloadKeyValue.Value.StringValue);
+	}
+
+	[Fact(Skip = "Fails with HTTP CANCEL")]


Note this failing in a pretty bad way, with an HTTP-level CANCEL error (RecommendBatch fails in exactly the same way). This presumably works via Python so I'm probably doing something wrong - did you see this working via .NET?

Panic backtrace in qdrant

2023-10-24 22:02:23 2023-10-24T12:02:23.289892Z ERROR qdrant::startup: Panic backtrace: 2023-10-24 22:02:23 0: qdrant::startup::setup_panic_hook::{{closure}} 2023-10-24 22:02:23 1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:2007:9 2023-10-24 22:02:23 2: std::panicking::rust_panic_with_hook 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:709:13 2023-10-24 22:02:23 3: std::panicking::begin_panic::{{closure}} 2023-10-24 22:02:23 4: std::sys_common::backtrace::__rust_end_short_backtrace 2023-10-24 22:02:23 5: std::panicking::begin_panic 2023-10-24 22:02:23 6: validator::types::ValidationErrors::merge 2023-10-24 22:02:23 7: <alloc::vec::Vec<V> as api::grpc::validate::ValidateExt>::validate 2023-10-24 22:02:23 8: <api::grpc::qdrant::SearchBatchPoints as validator::traits::Validate>::validate 2023-10-24 22:02:23 9: qdrant::tonic::api::validate 2023-10-24 22:02:23 10: <qdrant::tonic::api::points_api::PointsService as api::grpc::qdrant::points_server::Points>::search_batch::{{closure}} 2023-10-24 22:02:23 11: <<api::grpc::qdrant::points_server::PointsServer<T> as tower_service::Service<http::request::Request<B>>>::call::SearchBatchSvc<T> as tonic::server::service::UnaryService<api::grpc::qdrant::SearchBatchPoints>>::call::{{closure}} 2023-10-24 22:02:23 12: <api::grpc::qdrant::points_server::PointsServer<T> as tower_service::Service<http::request::Request<B>>>::call::{{closure}} 2023-10-24 22:02:23 13: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll 2023-10-24 22:02:23 14: <tonic::transport::service::router::RoutesFuture as core::future::future::Future>::poll 2023-10-24 22:02:23 15: <qdrant::tonic::tonic_telemetry::TonicTelemetryService<S> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}} 2023-10-24 22:02:23 16: <qdrant::tonic::logging::LoggingMiddleware<S> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}} 2023-10-24 22:02:23 17: <tonic::transport::server::SvcFuture<F> as core::future::future::Future>::poll 2023-10-24 22:02:23 18: <hyper::proto::h2::server::H2Stream<F,B> as core::future::future::Future>::poll 2023-10-24 22:02:23 19: tokio::runtime::task::core::Core<T,S>::poll 2023-10-24 22:02:23 20: tokio::runtime::task::raw::poll 2023-10-24 22:02:23 21: tokio::runtime::scheduler::multi_thread::worker::Context::run_task 2023-10-24 22:02:23 22: tokio::runtime::task::raw::poll 2023-10-24 22:02:23 23: std::sys_common::backtrace::__rust_begin_short_backtrace 2023-10-24 22:02:23 24: core::ops::function::FnOnce::call_once{{vtable.shim}} 2023-10-24 22:02:23 25: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:1993:9 2023-10-24 22:02:23 26: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:1993:9 2023-10-24 22:02:23 27: std::sys::unix::thread::Thread::new::thread_start 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys/unix/thread.rs:108:17 2023-10-24 22:02:23 28: <unknown> 2023-10-24 22:02:23 29: clone 2023-10-24 22:02:23 2023-10-24 22:02:23 2023-10-24T12:02:23.289933Z ERROR qdrant::startup: Panic occurred in file /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/validator-0.16.1/src/types.rs at line 156: Attempt to replace non-empty ValidationErrors entry

The test passes when the collection name is also specified for each SearchPoints in the array (WithPayload is also needed to return the payload):

[Fact] public async Task SearchBatch() { await CreateAndSeedCollection("collection_1"); var batchResults = await _client.SearchBatchAsync( "collection_1", new SearchPoints[] { new() { CollectionName = "collection_1", Vector = { 10.4f, 11.4f }, Limit = 1, WithPayload = new WithPayloadSelector { Enable = true }}, new() { CollectionName = "collection_1", Vector = { 3.4f, 4.4f }, Limit = 1, WithPayload = new WithPayloadSelector { Enable = true } } }); Assert.Collection( batchResults, br => { var point = Assert.Single(br.Result); Assert.Equal(9ul, point.Id); var payloadKeyValue = Assert.Single(point.Payload); Assert.Equal("foo", payloadKeyValue.Key); Assert.Equal("goodbye", payloadKeyValue.Value.StringValue); }, br => { var point = Assert.Single(br.Result); Assert.Equal(8ul, point.Id); var payloadKeyValue = Assert.Single(point.Payload); Assert.Equal("foo", payloadKeyValue.Key); Assert.Equal("hello", payloadKeyValue.Value.StringValue); }); }

It looks to me like it may be a bug, and a consequence of the validation on SearchPoints for collection name invoked when validating SearchBatchPoints

Thanks for looking into this! I've modified the test so that it passes.

This indeed seems like a bug to me (especially the way in which it crashes). Given that these batch methods accept a collection outside of the batch, I'm assuming that referencing different collections within the same batch isn't supported.. So requiring the collection name inside the batch doesn't make sense.

Do you want to open an issue in Qdrant to flag this?

I've opened qdrant/qdrant#2880

This indeed seems like a bug to me (especially the way in which it crashes). Given that these batch methods accept a collection outside of the batch, I'm assuming that referencing different collections within the same batch isn't supported.. So requiring the collection name inside the batch doesn't make sense.

Yep, agreed. What do you think about not requiring the consumer to specify the collection name on each item, and instead, assigning the collection name to each item in the batch method in the high level client? In the scenario where a user assigns a different collection name to one or more of the items, it would mean we overwrite it, but the collection name of each item is ignored on the server side anyway- if the collection name is changed to some non-existent collection, the test still passes 🙂 So, the only requirement currently is to provide a value of at least length 1, perhaps the collection name is a sensible default?

Good idea! At least as a temporary workaround for the Qdrant-side bug... Hopefully they'll fix it on their side soon.

(pushed)

roji · 2023-10-19T20:10:43Z

src/Qdrant.Client/QdrantClient.cs

+	/// </param>
+	public async Task<IReadOnlyList<ScoredPoint>> SearchAsync(
+		string collectionName,
+		ReadOnlyMemory<float> vector,


Note us accepting a ReadOnlyMemory<float> here.. We're generally trying to standardize over ROM<float> .NET as the representation of embeddings as much as possible.

Unfortunately, since we're not wrapping everything, other APIs which deal with embeddings still accept only the protobuf-generated RepeatedField<float>; there's not much we can do about that unless more flexibility is added to the protobuf code generation, or if we decide to go all-in and wrap everything.

Using ROM<float> is therefore a bit inconsistent. On the other hand, accepting a RepeatedField<float> here would be kinda senseless, since we'd need to copy it into the RepeatedField property on the gRPC-generated SearchPoints type inside (RepeatedField properties aren't assignable; you can only add into them).

ROM<float> does have the advantage of accepting an array or a slice of an array, and if it's an array there's an optimization for the copying internally inside RepeatedField. So while not perfect, I think this seems like a reasonable compromise... Let me know your opinion @russcam.

/cc @stephentoub

Sounds reasonable to me. I'd be for extending protobuf code generation in future if this is the general direction .NET is taking

Note that there's Marc Gravell's protobuf-net as an alternative to the Google Protobuf package; it generally generates more idiomatic C# code, e.g. using List<T> or arrays for repeated values instead of RepeatedField<T>. It also optionally allows these to have setters, which can allow users to just set the list directly rather than copy values in (which is required in the Google RepeatedField<T> case and hurts performance).

I'm guessing it's much more likely that we can get Marc to add support for ReadOnlyMemory<T> as an additional alternative to list/array, than to make the same change in the Google package. I actually experimented with switching over to protobuf-net, and it works well (I have a branch with that work). Let me know if you're interested in this - if we want we can make that switch before publishing the first Qdrant SDK package (to avoid breaking changes later).

I'd be keen to see that. Do you think it's a net improvement overall?

I think so... protobuf-net generally just does a better job generating idiomatic C# code, and is more perf-oriented. I don't think it's a huge deal, but it may be worth it.

Take a look at this branch; IIRC I got it to a point where tests passed. I'm happy to make it into a PR if you're into it.

Unfortunately can't see your private fork. Do I need to be added as a collaborator on it?

Makes sense, you should receive an invite.

russcam

Had a first pass through and have some comments

russcam · 2023-10-24T11:30:50Z

src/Qdrant.Client/QdrantClient.cs

+	public Task<UpdateResult> SetPayloadAsync(
+		string collectionName,
+		IReadOnlyDictionary<string, Value> payload,
+		IReadOnlyList<string> guidIds,


think this (and anywhere where point id is a UUID) should be

Suggested change

IReadOnlyList<string> guidIds,

IReadOnlyList<Guid> guidIds,

On the qdrant side, these map to Uuid

Yeah, I did this initially. The problem is that protobuf doesn't have a GUID/UUID type - PointId.uuid is typed as a string. It's easy enough to accept a .NET Guid in this API and convert it to a string internally (another example of where a wrapper for gRPC/protobuf makes sense), but other APIs/types which we're not wrapping will continue to work with strings, leading to an inconsistency (e.g. the protobuf-generated RetrievedPoint returned by RetrieveAsync will continue to expose a string, unless we decide to deep-wrap).

On the other hand, it seems like we're OK with a bit of inconsistency e.g. accepting ReadOnlyMemory<float> in SearchAsync, but having it RepeatedField<float> in various other contexts. So I'm making this change here as proposed.

It's a hat tip to the idea of deep wrapping responses too, just concerned about how much effort that might be to maintain!

I agree... I initially approached this project with the idea of wrapping everything, but given the amount of stuff it may be prudent to not go down that route, at least at first... This is all prerelease, so we can also decide to go all-in later if we want to.

src/Qdrant.Client/Grpc/WriteOrdering.cs

src/Qdrant.Client/QdrantClient.cs

russcam · 2023-10-24T11:58:14Z

tests/Qdrant.Client.Tests/CollectionTests.cs

-		=> _client.CreateCollectionAsync("collection_1", new VectorParams { Size = 4, Distance = Distance.Cosine });
+	public async Task CreateCollection()
+	{
+		await _client.CreateCollectionAsync("collection_1", new VectorParams { Size = 4, Distance = Distance.Cosine });


The collection name clashes with GrpcCollectionTests::CanCreateCollection. Suggest renaming the collection in CanCreateCollection

I've gone with the same solution done in CollectionTests and the other suites - delete all collections in Qdrant before every test. This also makes things more resilient in case e.g. a test crashes and leaves the collection behind.

Note that at the moment all test classes are annotated with [Collection("Qdrant")], preventing any tests from running in parallel and interfering with one another. If this ever becomes a problem and we want parallelization, we can switch to having test class-scoped collection names or similar.

test class scoped collection names sounds like a good change to make in a follow up

russcam · 2023-10-24T12:19:40Z

tests/Qdrant.Client.Tests/PointTests.cs

+		Assert.Equal("goodbye", payloadKeyValue.Value.StringValue);
+	}
+
+	[Fact(Skip = "Fails with HTTP CANCEL")]


Panic backtrace in qdrant

2023-10-24 22:02:23 2023-10-24T12:02:23.289892Z ERROR qdrant::startup: Panic backtrace: 2023-10-24 22:02:23 0: qdrant::startup::setup_panic_hook::{{closure}} 2023-10-24 22:02:23 1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:2007:9 2023-10-24 22:02:23 2: std::panicking::rust_panic_with_hook 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:709:13 2023-10-24 22:02:23 3: std::panicking::begin_panic::{{closure}} 2023-10-24 22:02:23 4: std::sys_common::backtrace::__rust_end_short_backtrace 2023-10-24 22:02:23 5: std::panicking::begin_panic 2023-10-24 22:02:23 6: validator::types::ValidationErrors::merge 2023-10-24 22:02:23 7: <alloc::vec::Vec<V> as api::grpc::validate::ValidateExt>::validate 2023-10-24 22:02:23 8: <api::grpc::qdrant::SearchBatchPoints as validator::traits::Validate>::validate 2023-10-24 22:02:23 9: qdrant::tonic::api::validate 2023-10-24 22:02:23 10: <qdrant::tonic::api::points_api::PointsService as api::grpc::qdrant::points_server::Points>::search_batch::{{closure}} 2023-10-24 22:02:23 11: <<api::grpc::qdrant::points_server::PointsServer<T> as tower_service::Service<http::request::Request<B>>>::call::SearchBatchSvc<T> as tonic::server::service::UnaryService<api::grpc::qdrant::SearchBatchPoints>>::call::{{closure}} 2023-10-24 22:02:23 12: <api::grpc::qdrant::points_server::PointsServer<T> as tower_service::Service<http::request::Request<B>>>::call::{{closure}} 2023-10-24 22:02:23 13: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll 2023-10-24 22:02:23 14: <tonic::transport::service::router::RoutesFuture as core::future::future::Future>::poll 2023-10-24 22:02:23 15: <qdrant::tonic::tonic_telemetry::TonicTelemetryService<S> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}} 2023-10-24 22:02:23 16: <qdrant::tonic::logging::LoggingMiddleware<S> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}} 2023-10-24 22:02:23 17: <tonic::transport::server::SvcFuture<F> as core::future::future::Future>::poll 2023-10-24 22:02:23 18: <hyper::proto::h2::server::H2Stream<F,B> as core::future::future::Future>::poll 2023-10-24 22:02:23 19: tokio::runtime::task::core::Core<T,S>::poll 2023-10-24 22:02:23 20: tokio::runtime::task::raw::poll 2023-10-24 22:02:23 21: tokio::runtime::scheduler::multi_thread::worker::Context::run_task 2023-10-24 22:02:23 22: tokio::runtime::task::raw::poll 2023-10-24 22:02:23 23: std::sys_common::backtrace::__rust_begin_short_backtrace 2023-10-24 22:02:23 24: core::ops::function::FnOnce::call_once{{vtable.shim}} 2023-10-24 22:02:23 25: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:1993:9 2023-10-24 22:02:23 26: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:1993:9 2023-10-24 22:02:23 27: std::sys::unix::thread::Thread::new::thread_start 2023-10-24 22:02:23 at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys/unix/thread.rs:108:17 2023-10-24 22:02:23 28: <unknown> 2023-10-24 22:02:23 29: clone 2023-10-24 22:02:23 2023-10-24 22:02:23 2023-10-24T12:02:23.289933Z ERROR qdrant::startup: Panic occurred in file /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/validator-0.16.1/src/types.rs at line 156: Attempt to replace non-empty ValidationErrors entry

The test passes when the collection name is also specified for each SearchPoints in the array (WithPayload is also needed to return the payload):

[Fact] public async Task SearchBatch() { await CreateAndSeedCollection("collection_1"); var batchResults = await _client.SearchBatchAsync( "collection_1", new SearchPoints[] { new() { CollectionName = "collection_1", Vector = { 10.4f, 11.4f }, Limit = 1, WithPayload = new WithPayloadSelector { Enable = true }}, new() { CollectionName = "collection_1", Vector = { 3.4f, 4.4f }, Limit = 1, WithPayload = new WithPayloadSelector { Enable = true } } }); Assert.Collection( batchResults, br => { var point = Assert.Single(br.Result); Assert.Equal(9ul, point.Id); var payloadKeyValue = Assert.Single(point.Payload); Assert.Equal("foo", payloadKeyValue.Key); Assert.Equal("goodbye", payloadKeyValue.Value.StringValue); }, br => { var point = Assert.Single(br.Result); Assert.Equal(8ul, point.Id); var payloadKeyValue = Assert.Single(point.Payload); Assert.Equal("foo", payloadKeyValue.Key); Assert.Equal("hello", payloadKeyValue.Value.StringValue); }); }

It looks to me like it may be a bug, and a consequence of the validation on SearchPoints for collection name invoked when validating SearchBatchPoints

tests/Qdrant.Client.Tests/PointTests.cs

russcam · 2023-10-24T12:24:43Z

src/Qdrant.Client/QdrantClient.cs

+	/// </param>
+	public async Task<IReadOnlyList<ScoredPoint>> SearchAsync(
+		string collectionName,
+		ReadOnlyMemory<float> vector,


Sounds reasonable to me. I'd be for extending protobuf code generation in future if this is the general direction .NET is taking

roji · 2023-10-25T16:21:51Z

@russcam addressed your comments - take a look and tell me what you think.

russcam

LGTM! 🎉

Add wrappers for the Points APIs

c707ab9

roji requested a review from russcam October 19, 2023 20:02

roji commented Oct 19, 2023

View reviewed changes

roji mentioned this pull request Oct 20, 2023

Add simpler instantiation options for Qdrant clientsSetup #12

Merged

russcam reviewed Oct 24, 2023

View reviewed changes

roji added 5 commits October 25, 2023 17:18

Accept Guid instead of string for point IDs

d2d6021

Remove KeyValuePair deconstruction

82e4fcf

Delete collections when starting GrpcCollectionTests

e8485da

Updated example for WriteOrdering

48bb6cf

Enable tests SearchBatch and RecommendBatch

e559e7b

roji requested a review from russcam October 25, 2023 16:21

Work around Qdrant bug in SearchBatch/RecommendBatch

d1c3503

russcam approved these changes Oct 26, 2023

View reviewed changes

roji merged commit 328cd1a into qdrant:main Oct 26, 2023

roji deleted the PointsWrappers branch October 26, 2023 13:42

This was referenced Oct 26, 2023

Switch to class-scoped collection names in the tests #13

Open

Consider switching from the Google Protobuf library to protobuf-net #14

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add wrappers for the Points APIs #11

Add wrappers for the Points APIs #11

roji commented Oct 19, 2023

roji Oct 19, 2023

russcam Oct 24, 2023

roji Oct 25, 2023

russcam Oct 26, 2023

russcam Oct 26, 2023

roji Oct 26, 2023 •

edited

Loading

roji Oct 19, 2023

russcam Oct 24, 2023

roji Oct 25, 2023

russcam Oct 26, 2023

roji Oct 26, 2023

russcam Oct 26, 2023

roji Oct 26, 2023

russcam left a comment

russcam Oct 24, 2023

roji Oct 25, 2023

russcam Oct 26, 2023

roji Oct 26, 2023

russcam Oct 24, 2023

roji Oct 25, 2023

russcam Oct 26, 2023

russcam Oct 24, 2023

russcam Oct 24, 2023

roji commented Oct 25, 2023

russcam left a comment

Add wrappers for the Points APIs #11

Add wrappers for the Points APIs #11

Conversation

roji commented Oct 19, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

roji Oct 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

russcam left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

roji commented Oct 25, 2023

russcam left a comment

Choose a reason for hiding this comment

roji Oct 26, 2023 •

edited

Loading