[FEATURE] Support for raw sparse vectors input in the neural sparse query #608

brusic · 2024-02-21T21:59:18Z

Is your feature request related to a problem?

Currently, the neural search query accepts only a model ID alongside the text to be encoded, which requires a model to be registered into a pipeline. The query should also support passing in the vector directly, bypassing the pipeline phase. Client-side encoding can be beneficial for several reasons: ad hoc analysis, unit testing, and custom or unsupported models.

What solution would you like?

Accept a vector, similar to knn search:

GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "vector": {"a": 2, "b": 3, "c": 5, "d": 6},
        "k": 5
      }
    }
  }
}

What alternatives have you considered?

rank_features is a close alternative, but can only rank (boost) other query clauses.

Do you have any additional context?

ES will soon have a weighted_tokens query, which is analogous to their text_expansion query.


zhichao-aws · 2024-02-22T02:32:49Z

Do you mean the neural_sparse query clause? It seems you're searching with query tokens and weights, and you pasted a link about sparse search, but the neural query uses dense models.

brusic · 2024-02-22T18:17:29Z

Correct, sorry for the confusion. I used the wrong query in my example, probably because I have never used the neural_sparse query. I updated the example and added a link to the sparse search.

model-collapse · 2024-04-02T23:35:46Z

@brusic I changed the title to a more accurate one.

zhichao-aws · 2024-04-23T03:30:38Z

Hi @brusic, our enhancement has been merged and will be released in version 2.14. Users can now run a neural sparse query with raw tokens. Sample query:

"neural_sparse": {
  "<vector_field>": {
    "query_tokens": {
       "token 1": 1.1,
       "token 2": 2.2,
       "token 3": 3.3
    }
  }
}
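
For clients that do their own encoding, the request body can be assembled from a plain token-to-weight mapping. The following is a minimal sketch, assuming the body shape matches the sample query above; the helper name `build_neural_sparse_query` is hypothetical, and the field name and token weights are taken from the example in the original request. Sending the body to the cluster is left out.

```python
import json


def build_neural_sparse_query(field, token_weights):
    """Build a search body that queries `field` with raw sparse tokens.

    Hypothetical helper: it only mirrors the sample query shape shown
    above and does not validate anything against the server.
    """
    if not token_weights:
        raise ValueError("token_weights must not be empty")
    return {
        "query": {
            "neural_sparse": {
                field: {
                    # Token -> weight map, as in the "query_tokens" sample.
                    "query_tokens": {
                        str(token): float(weight)
                        for token, weight in token_weights.items()
                    }
                }
            }
        }
    }


# Token weights from the original request, encoded on the client side.
body = build_neural_sparse_query(
    "passage_embedding", {"a": 2, "b": 3, "c": 5, "d": 6}
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a `_search` request against the index.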

zhichao-aws · 2024-04-30T03:53:12Z

Closing this issue as we have finished the feature. Feel free to re-open it if there is more to discuss.

brusic added enhancement untriaged labels Feb 21, 2024

model-collapse self-assigned this Feb 22, 2024

model-collapse added this to 2.14.0 in OpenSearch Project Roadmap Mar 6, 2024

vamshin removed the untriaged label Mar 21, 2024

zhichao-aws assigned zhichao-aws and unassigned model-collapse Mar 26, 2024

model-collapse changed the title ~~[FEATURE] Support for vectors as parameters in the neural search query~~ [FEATURE] Support for raw sparse vectors input in the neural sparse query Apr 2, 2024

zhichao-aws mentioned this issue Apr 16, 2024

enhancements: support neural_sparse query by tokens #693

Merged

5 tasks

bbarani added the v2.14.0 label Apr 22, 2024

zhichao-aws mentioned this issue Apr 25, 2024

[DOC] add query_by_tokens option in Neural Sparse Search opensearch-project/documentation-website#7027

Closed

4 tasks

zhichao-aws closed this as completed Apr 30, 2024
