[Feature]: Distinct field values #25343

Open
1 task done
filip-halt opened this issue Jul 5, 2023 · 15 comments
Labels: kind/feature (Issues related to feature request from users)

Comments

@filip-halt (Contributor)

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Find all the unique field values in a collection without having to iterate through all data.

Describe the solution you'd like.

Something equivalent to the SQL query: SELECT DISTINCT field_name FROM mytable
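
For reference, the requested semantics can be shown with SQLite from Python's standard library. This is only a sketch of what SELECT DISTINCT returns: the table and column names come from the request above, the sample rows are invented, and nothing here is a Milvus API.

```python
import sqlite3


def distinct_values(conn, table, column):
    """Return the sorted distinct values of one column."""
    # Identifiers cannot be bound as parameters, so they are quoted here.
    # This simple quoting assumes trusted table and column names.
    query = f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1'
    return [row[0] for row in conn.execute(query)]


# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, field_name TEXT)")
conn.executemany(
    "INSERT INTO mytable (field_name) VALUES (?)",
    [("sport",), ("news",), ("sport",), ("tech",), ("news",)],
)

values = distinct_values(conn, "mytable", "field_name")
print(values)  # ['news', 'sport', 'tech']
```

Five rows collapse to three values, which is the result the feature request asks Milvus to produce without a full client-side scan.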

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

filip-halt added the kind/feature label Jul 5, 2023

@xiaofan-luan (Contributor)

Is there a specific use case for the distinct clause?

@xiaofan-luan (Contributor)

Is it needed in search, or only in query?

@faileon commented Sep 25, 2023

Are there any plans for this? My use case is to present the user with a list of available values that can be used for filtering in future queries. Without this, I have to maintain a list of distinct values myself elsewhere.

@xiaofan-luan (Contributor)

Can you describe your data model and the specific use case so I can give more advice?

@faileon commented Sep 26, 2023

Let me try. I let my users store arbitrary documents in Milvus, with a separate collection for each tenant. Users define which fields of the original documents are used to make embeddings and which are stored as metadata for filtering. Say one of my users has a collection of "articles" and defined "category" as a metadata field that can hold any string value ("sport", "news", ...). I would like to get the distinct values of that "category" field. Is that possible within Milvus?
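
Until something like this exists, the workaround mentioned earlier (keeping the distinct values elsewhere) could look roughly like the sketch below. `DistinctValueIndex` is a hypothetical helper, not part of Milvus or pymilvus; it assumes rows are plain dicts and that it is called on every insert. It does not handle deletions.

```python
from collections import defaultdict


class DistinctValueIndex:
    """Hypothetical side index of distinct metadata values per (collection, field)."""

    def __init__(self):
        self._values = defaultdict(set)

    def record(self, collection, rows, fields):
        """Remember the values of the given metadata fields for newly inserted rows."""
        for row in rows:
            for field in fields:
                if field in row:
                    self._values[(collection, field)].add(row[field])

    def distinct(self, collection, field):
        """Return the sorted distinct values seen so far (empty list if none)."""
        return sorted(self._values.get((collection, field), set()))


index = DistinctValueIndex()
index.record(
    "articles",
    [
        {"title": "Cup final", "category": "sport"},
        {"title": "Election night", "category": "news"},
        {"title": "Transfer window", "category": "sport"},
    ],
    fields=["category"],
)
categories = index.distinct("articles", "category")
print(categories)  # ['news', 'sport']
```

The cost of this approach is exactly what the request wants to avoid: a second store that has to be kept in sync with the collection.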

@cardoso-neto

I also couldn't find how to do this.

@xiaofan-luan (Contributor)

I think the group-by feature is what you are looking for.
You can group by a field name and get the top-k most relevant groups rather than the top-k entities.
Is that what you are looking for, @cardoso-neto?
This feature will be released in 2.4.
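
To make the difference from a plain distinct listing concrete, here is a small pure-Python illustration of "top-k groups instead of top-k entities". It is not Milvus code and not how Milvus implements grouping; the hit structure and the assumption that a smaller distance means a closer match are made up for the example.

```python
def top_k_groups(hits, group_by_field, k):
    """Keep the closest hit per group value, then return the k closest groups."""
    best = {}
    for hit in hits:
        key = hit[group_by_field]
        # Assumption: smaller distance means a better match.
        if key not in best or hit["distance"] < best[key]["distance"]:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h["distance"])[:k]


# Invented search hits for a single query vector.
hits = [
    {"id": 1, "category": "sport", "distance": 0.10},
    {"id": 2, "category": "sport", "distance": 0.12},
    {"id": 3, "category": "news", "distance": 0.30},
    {"id": 4, "category": "tech", "distance": 0.55},
]

groups = top_k_groups(hits, "category", k=2)
print([h["category"] for h in groups])  # ['sport', 'news']
```

With k=2 the "tech" group is not returned, so a grouped search only lists every distinct value if k is at least the number of groups and every group is reachable from the query.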

@cardoso-neto

This would work indeed. Looking forward.

@cardoso-neto

My use case is reading all unique values of a Milvus collection column. More specifically, the column I use as the partition key. Since Milvus "maps" that to a standardized name (_default_i), I couldn't use Collection().partitions for that.

@xiaofan-luan (Contributor)

_default_i

So you are saying you want to know how many partition keys there are in total?

@xiaofan-luan (Contributor)

Which means counting the distinct partition keys.

@xiaofan-luan (Contributor)

/assign @jaime0815
Sounds like something we need to work on.

@lehotskysamuel

I have a similar use case: I take a book, split it into chunks, and then store the book title in a scalar column for each chunk. I then process n books. When doing the vector search, I want to filter by a book (or multiple).

With this functionality I could:

  1. Query Milvus to get all distinct values from the column (all book titles) --> THIS IS WHAT THIS TICKET IS ABOUT
  2. Display the list in the user interface and let the user pick a list of books to search across
  3. Do the vector search with filtering based on the selected books
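
The three steps above can be sketched client-side as follows. This is a hedged illustration, not a Milvus feature: step 1 is done by deduplicating rows that were already fetched (the iteration this ticket hopes to avoid), the field name `book_title` is hypothetical, and the generated string is assumed to match Milvus's `field in [...]` filter expression syntax.

```python
import json


def distinct_titles(rows, field="book_title"):
    """Step 1 (workaround): deduplicate a scalar field over fetched rows."""
    return sorted({row[field] for row in rows if field in row})


def build_in_filter(field, selected):
    """Step 3: build a filter expression restricting a search to chosen values."""
    if not selected:
        raise ValueError("select at least one value")
    # json.dumps gives double-quoted, escaped string literals.
    quoted = ", ".join(json.dumps(value, ensure_ascii=False) for value in selected)
    return f"{field} in [{quoted}]"


# Invented rows standing in for chunks fetched from the collection.
rows = [
    {"chunk_id": 1, "book_title": "Dune"},
    {"chunk_id": 2, "book_title": "Dune"},
    {"chunk_id": 3, "book_title": "Emma"},
    {"chunk_id": 4, "book_title": "Ulysses"},
]

titles = distinct_titles(rows)   # step 1
selected = titles[:2]            # step 2: stands in for the user's choice
expr = build_in_filter("book_title", selected)
print(expr)  # book_title in ["Dune", "Emma"]
```

The resulting expression would then be passed as the filter of the vector search; only step 1 depends on the feature requested here.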

@Izukimat

I second @lehotskysamuel.
In a RAG application, all of steps 1-3 are essential. I wonder if we could achieve this without maintaining another database.

@LeoHemamou

Does anyone know whether this has been solved?
