Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring back sparse_vector mapping #98104

Closed
benwtrent opened this issue Aug 1, 2023 · 6 comments
Closed

Bring back sparse_vector mapping #98104

benwtrent opened this issue Aug 1, 2023 · 6 comments
Assignees
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@benwtrent
Copy link
Member

Description

We have rank_features and its useful as is. However, when comparing with new retrieval techniques, the name rank_features isn't expected.

Users expect sparse_vector as a partner with dense_vector.

So, we should bring back sparse_vector. It may initially be a simple copy of rank_features.

See history: #48368, #48781

@benwtrent benwtrent added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types labels Aug 1, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Aug 1, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@ioanatia ioanatia self-assigned this Aug 4, 2023
@liranabn liranabn assigned carlosdelest and unassigned ioanatia Aug 16, 2023
@kderusso
Copy link
Member

Unassigning this for now, as first priority once someone is free.

@kderusso kderusso assigned kderusso and unassigned kderusso Aug 23, 2023
@carlosdelest carlosdelest self-assigned this Aug 29, 2023
@carlosdelest
Copy link
Member

This is the work that I'll start doing for this task:

  • Change SparseVectorFieldMapper implementation, so it:
    • is no longer deprecated
    • checks for 7.x and versions less than 8.11 and warn / error appropriately
    • extends from RankFeaturesFieldMapper, and just changes the CONTENT_TYPE. We could make a complete copy, but I think we can save some duplicated code this way (and we might revisit this decision later)
  • Create sparse_vector docs
  • Change ELSER docs to refer to sparse_vector field type instead of rank_features. We can add compatibility notes to ensure users understand they can use ELSER with rank_features and sparse_vector interchangeably - but that sparse_vector is the field to be used onwards to refer to it.

Anything I'm missing?

Some questions I would like your opinion on:

  • Should we address explicitly that sparse_vector is equivalent to rank_features? Or at least mention they are compatible? (I'd say yes)
  • Should we bring back script vector functions that operate on sparse_vectors? (I'd say yes as they can be used on scripts and feel natural to provide a simplified way to calculate cosine similarity for example)
  • Should we add back sparse_vector feature usage? (I'd say yes so we can compare adoption of this field type vs rank_features)

@benwtrent
Copy link
Member Author

extends from RankFeaturesFieldMapper, and just changes the CONTENT_TYPE. We could make a complete copy, but I think we can save some duplicated code this way (and we might revisit this decision later)

Please, just copy paste. Especially since rankfeaturesfieldmapper is in a module and sparsevector will not be.

Change ELSER docs to refer to sparse_vector field type instead of rank_features.

We should't do that until the ELSER queries are updated and such. Don't do anything related to ELSER in the sparse_vector work. It should be done in follow up PRs to ensure we focus on the correct things.

Should we address explicitly that sparse_vector is equivalent to rank_features?

IMO, no, we should not. They do the same thing for now but they might diverge and sparse_vectors may support things that rank_features may not and vice versa.

Should we bring back script vector functions that operate on sparse_vectors?

Maybe in a separate PR. This is not as important as getting the field mapping back and switching ELSER.

Should we add back sparse_vector feature usage? (I'd say yes so we can compare adoption of this field type vs rank_features)

Yes, we should

@carlosdelest
Copy link
Member

Thanks for the feedback, @benwtrent !

I created this draft PR. Can you please take a look to ensure it's in line with your thoughts?

@carlosdelest
Copy link
Member

Completed per #98996

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

6 participants