Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Query iterator method #80

Merged
merged 2 commits into from
Jul 10, 2024

Conversation

BlackGad
Copy link
Contributor

Added Query iterator implementation that behave in same manner as python Query operator.

It would be pleasure to see this PR merged and released.

This was referenced Apr 19, 2024
@BlackGad
Copy link
Contributor Author

Any updates on this?

@roji
Copy link
Collaborator

roji commented Apr 22, 2024

@BlackGad will get to it soon, hang tight.

@BlackGad
Copy link
Contributor Author

BlackGad commented Jun 7, 2024

so? any progress so far?

@roji
Copy link
Collaborator

roji commented Jun 7, 2024

@BlackGad apologies, I'll try to set aside a bit of time next week for this SDK.

@BlackGad
Copy link
Contributor Author

so?)

@BlackGad
Copy link
Contributor Author

BlackGad commented Jul 5, 2024

I understand that this is just a side project for you @roji, but could you consider approving the request? This missing functionality is crucial for us to dump data into our internal cold storage, and it's quite cumbersome to maintain an internal fork. Additionally, I believe the community would benefit from having this helpful functionality in C#.

@weianweigan
Copy link
Collaborator

@BlackGad Thanks! I'll review and approve this PR. Perhaps @roji is currently busy.

@weianweigan weianweigan merged commit 30199ee into milvus-io:main Jul 10, 2024
2 checks passed
Copy link
Collaborator

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not having had time to look at this earlier - but sometimes it can take a while as other things take priority.

I know this was already merged, but I can't say I'm very happy about it given that the PR has zero tests for the new API. I'd be against releasing a new version in this state.

Also see below for some comments which I'd like to see us fix before releasing.

Finally, can you please point to the corresponding Python code which I assume this is based on?

object? pkLastValue;
int processedDuringIterationCount;
var pkFieldsData = response.FieldsData.Single(x => x.FieldId == pkField.FieldID);
if (pkField.DataType == DataType.VarChar)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a switch here, with a default case that throws, in case Milvus add support for a 3rd PK type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I don't think it makes sense to keep code a certain way just because the python implementation looks a certain way. This is a different project in a different language, and we should be doing things in the best way possible here regardless of what the Python driver looks like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

new Grpc.KeyValuePair {Key = Constants.Iterator, Value = "True"},
new Grpc.KeyValuePair {Key = Constants.ReduceStopForBest, Value = "True"},
new Grpc.KeyValuePair {Key = Constants.BatchSize, Value = batchSize.ToString(CultureInfo.InvariantCulture)},
new Grpc.KeyValuePair {Key = Constants.Offset, Value = 0.ToString(CultureInfo.InvariantCulture)},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not just "0"? Are you sure we even need to set Offset at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line you refer to doesn't seem to have anything to do with this - it just checks that offset is zero; whereas here I'm asking why not just "0" as simpler code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh do you mean use constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

new Grpc.KeyValuePair {Key = Constants.Limit, Value = Math.Min(batchSize, userLimit).ToString(CultureInfo.InvariantCulture)});

var processedItemsCount = 0;
while (!cancellationToken.IsCancellationRequested)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cancellation token should be checked internally in InvokeAsync which is called just below - is there any reason for the additional check here? If not, it should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then it will be while(true) so it is defensive strategy so I will be sure that loop will broke.

I prefer to leave this. What do you think?

Copy link
Collaborator

@roji roji Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the check doesn't ensure that the loop will be broken any more than a simple while (true) (i.e. the cancellation token mat never get triggered). The thing that guarantees that the loop will break is the yield break inside, when there are no more results. And since the cancellation token is checked within InvokeAsync anyway, I think this should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

QueryParameters? parameters = null,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
if ((parameters?.Offset ?? 0) != 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason not to support Offset? It could simply be set on the very first call, and from that point on just continue with the batching as usual?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same python logic port. From my tests milvus behave strange when you try to setup offset.

https://github.com/milvus-io/pymilvus/blob/6625af7fd0bfdf75fd596e7a9a871528cb539ef7/pymilvus/orm/iterator.py#L435

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to understand better if this is an actual Milvus limitation or just an unimplemented feature at the client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think clients should behave the same way. I don't have time to look into it right now, but I saw some strange results when I changed the offset to a non-zero value. I need this code merged into the master branch so I can use it and not rely on a fork. Maybe we should delay this proof of concept to future releases for those who really need it?

Milvus.Client/MilvusCollection.Entity.cs Show resolved Hide resolved
Milvus.Client/MilvusCollection.Entity.cs Show resolved Hide resolved
new Grpc.KeyValuePair {Key = Constants.ReduceStopForBest, Value = "True"},
new Grpc.KeyValuePair {Key = Constants.BatchSize, Value = batchSize.ToString(CultureInfo.InvariantCulture)},
new Grpc.KeyValuePair {Key = Constants.Offset, Value = 0.ToString(CultureInfo.InvariantCulture)},
new Grpc.KeyValuePair {Key = Constants.Limit, Value = Math.Min(batchSize, userLimit).ToString(CultureInfo.InvariantCulture)});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about Limit vs. BatchSize here... If Limit is bigger than BatchSize, will have the same value (because of the Min here). If Limit is smaller than BatchSize, BatchSize isn't reduced... What's the logic here supposed to be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limit and batch size must be equal here. Will fix

// If user expression is not null, we should use it
{userExpression: not null} => userExpression,
// If user expression is null and pk field is string
{pkField.DataType: DataType.VarChar} => $"{pkField.Name} != ''",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this non-empty string check - and the below < {long.MaxValue} check needed? Why not just leave Expr empty if the user didn't specify an expression?

Copy link
Collaborator

@roji roji Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlackGad what about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to guaranty order on first request when user expression is absent. Again it is port from python :)

@BlackGad
Copy link
Contributor Author

BlackGad commented Jul 13, 2024

Finally, can you please point to the corresponding Python code which I assume this is based on?

Here it is https://github.com/milvus-io/pymilvus/blob/6625af7fd0bfdf75fd596e7a9a871528cb539ef7/pymilvus/orm/iterator.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants