[FEATURE] Support for raw sparse vectors input in the neural sparse query #608

Closed
brusic opened this issue Feb 21, 2024 · 5 comments

Comments

@brusic

brusic commented Feb 21, 2024

Is your feature request related to a problem?

Neural sparse search

Currently the neural search query accepts only a model ID alongside the text to be encoded, which requires a model to be registered into a pipeline. The query should also accept a vector directly, bypassing the pipeline phase. Client-side encoding is useful for several reasons: ad hoc analysis, unit testing, and custom or unsupported models.

What solution would you like?

Accept a vector, similar to knn search

GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "vector": {"a": 2, "b": 3, "c": 5, "d": 6},
        "k": 5
      }
    }
  }
}


What alternatives have you considered?

rank_features is a close alternative, but can only rank (boost) other query clauses.

Do you have any additional context?

ES will soon have a weighted_tokens query, which is analogous to their text_expansion query.

@zhichao-aws
Member

Do you mean the neural_sparse query clause? It seems you're searching with query tokens and weights, and you pasted a link about sparse search, but the neural query uses dense models.

@brusic
Author

brusic commented Feb 22, 2024

Correct, sorry for the confusion. I used the wrong query in my example, probably because I have never used the neural_sparse query. I updated the example and added a link to the sparse search.

@vamshin removed the untriaged label Mar 21, 2024
@model-collapse changed the title from "[FEATURE] Support for vectors as parameters in the neural search query" to "[FEATURE] Support for raw sparse vectors input in the neural sparse query" Apr 2, 2024
@model-collapse
Collaborator

@brusic I changed the title to a more accurate one.

@zhichao-aws
Member

Hi @brusic, our enhancement has been merged and will be released in version 2.14. Users can now use the neural sparse query with raw tokens. Sample query:

"neural_sparse": {
  "<vector_field>": {
    "query_tokens": {
       "token 1": 1.1,
       "token 2": 2.2,
       "token 3": 3.3
    }
  }
}
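
For clients that do their own encoding, the request body above can be assembled programmatically. The following is a minimal sketch in Python, assuming the `query_tokens` shape shown in the sample query; `build_neural_sparse_query` is a hypothetical helper written for illustration, not part of any OpenSearch client, and the field name `passage_embedding` is borrowed from the example earlier in this issue.

```python
import json


def build_neural_sparse_query(field, token_weights):
    """Build a search body for a neural_sparse query from raw token weights.

    `field` is the name of the sparse vector field and `token_weights`
    maps each token to its weight. Hypothetical helper for illustration.
    """
    if not token_weights:
        raise ValueError("token_weights must contain at least one token")

    # Normalise keys to strings and weights to floats so the body serialises
    # to the same JSON shape as the sample query.
    query_tokens = {str(token): float(weight) for token, weight in token_weights.items()}

    return {
        "query": {
            "neural_sparse": {
                field: {
                    "query_tokens": query_tokens,
                }
            }
        }
    }


if __name__ == "__main__":
    body = build_neural_sparse_query("passage_embedding", {"a": 2, "b": 3, "c": 5, "d": 6})
    # The printed JSON can be sent as the body of GET /my-nlp-index/_search.
    print(json.dumps(body, indent=2))
```

The helper only builds the dictionary; sending it is left to whichever HTTP or OpenSearch client is already in use.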

@zhichao-aws
Member

Closing this issue as the feature is finished. Feel free to reopen it if there is more to discuss.

Projects
OpenSearch Project Roadmap
2.14.0 (Release window opens April 30...