Display error for text_expansion if the queried field does not have the right type #105581

saikatsarkar056 · 2024-02-15T23:31:00Z

This PR checks that the field queried by text_expansion is mapped and has the right type, and returns an error if it is not.

Search Query

PUT my-index
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      }
    }
  }
}

PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}


POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "my-index",
    "pipeline": "elser-v2-test"
  }
}

GET _tasks/CoTcIc_XTBSPEbBJxhysSQ:20048

GET /my-index/_search


GET my-index/_search
{
   "query":{
      "text_expansion":{
         "content2":{
            "model_id":".elser_model_2",
            "model_text":"How to avoid muscle soreness after running?"
         }
      }
   }
}

Response

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "[content2] is not a mapped field"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my-index",
        "node": "CoTcIc_XTBSPEbBJxhysSQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: [content2] is not a mapped field",
          "index_uuid": "tKlP30fURvqLQHIPm_V_ww",
          "index": "my-index",
          "caused_by": {
            "type": "parse_exception",
            "reason": "[content2] is not a mapped field"
          }
        }
      }
    ],
    "caused_by": {
      "type": "parse_exception",
      "reason": "[content2] is not a mapped field"
    }
  },
  "status": 400
}
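
For illustration only, the check behind this response can be sketched in plain Java. This is not the code changed in this PR: the class name, the `validate` helper, and the use of a simple `Map` in place of the index mappings are all hypothetical, and the sketch assumes `sparse_vector` is the only accepted type. Only the "is not a mapped field" message comes from the response above; the wording of the wrong-type message is an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the mapping lookup; not the Elasticsearch API.
final class TextExpansionFieldCheck {
    // Assumption for this sketch: only sparse_vector fields are accepted.
    static final String EXPECTED_TYPE = "sparse_vector";

    /** Returns an error message if the field cannot be queried, or empty if it can. */
    static Optional<String> validate(Map<String, String> mappings, String fieldName) {
        String type = mappings.get(fieldName);
        if (type == null) {
            // Same message as in the response above.
            return Optional.of("[" + fieldName + "] is not a mapped field");
        }
        if (!EXPECTED_TYPE.equals(type)) {
            // Message wording is assumed, not taken from the PR.
            return Optional.of("[" + fieldName + "] must be of type [" + EXPECTED_TYPE + "] but is [" + type + "]");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Field names and types mirror the my-index mapping above.
        Map<String, String> mappings = Map.of("content_embedding", "sparse_vector", "content", "text");
        System.out.println(validate(mappings, "content2"));          // unmapped field
        System.out.println(validate(mappings, "content"));           // mapped, wrong type
        System.out.println(validate(mappings, "content_embedding")); // valid field
    }
}
```

Failing fast with a descriptive message, rather than silently returning no hits, is the design choice the response above reflects.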

.java-version

...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/text_expansion_search.yml

kderusso

Changes look good overall, waiting for tests to pass. Please make sure to add the appropriate labels to this PR, thank you.

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java

...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java

saikatsarkar056 · 2024-02-21T18:25:01Z

I am trying to add a Team label to this PR. However, elasticsearchmachine keeps removing it from the labels.

@kderusso

kderusso

Nice iterations. I have some non-blocking suggestions.

...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java

kderusso · 2024-02-22T13:41:20Z

...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java

This is nice and clean. I think we could consider another optimization here: right now WeightedTokensQueryBuilder.toToQuery pulls the field document count and calculates the token frequency ratio for every query. Since token pruning is an opt-in feature for text expansion queries, we might want to update that method in WeightedTokensQueryBuilder to short-circuit this and only get these values if we have a non-null token pruning configuration. WDYT?

@kderusso I think you mean the doToQuery method here. My understanding is that we should only calculate the token frequency ratio and return a BooleanQuery if we have a non-null token pruning configuration. Am I right? If so, we should go in the following direction:

if (this.tokenPruningConfig == null) {
    return new MatchNoDocsQuery("The \"" + getName() + "\" query does not have any pruning configuration");
}
var qb = new BooleanQuery.Builder();
int fieldDocCount = context.getIndexReader().getDocCount(fieldName);
...

Please let me know if my understanding is correct about token pruning.

You're right, I had a typo - doToQuery - here's the link to the method I'm talking about.

Your suggestion to return a MatchNoDocsQuery is not what we want here. If we did that, every time we sent in a text expansion query without a pruning configuration, no documents would ever be returned. Since the pruning configuration is opt-in and optional, this is very undesirable behavior.

No, what I'm suggesting is altering how we determine whether we want to keep tokens in the WeightedTokensQueryBuilder.

Today, we do the following:

1. Get the document count for the field name.

2. Calculate the best weight for each token.

3. Get the average token frequency ratio.

4. Start building a boolean query, determining if we should keep each token.

If the pruning configuration is null, two things are true:

1. We want to keep every token, because we don't want any tokens to be pruned.

2. Because we want to keep every token, there is no need to calculate the counts or frequency ratios.

So I propose we short-circuit this and only calculate those ratios if there exists a pruning configuration.
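
A minimal, self-contained sketch of that short-circuit follows. It is not the code merged in this PR: `PruningShortCircuit`, `WeightedToken`, `PruningConfig`, the two statistics callbacks, and the simplified pruning rule (drop a token only when it is both frequent and low-weight) are hypothetical stand-ins for the real builder and its index statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.ToIntFunction;

// Hypothetical illustration of skipping statistics lookups when pruning is off.
final class PruningShortCircuit {
    record WeightedToken(String token, float weight) {}

    // Simplified, assumed configuration: both thresholds are illustrative.
    record PruningConfig(float maxFrequencyRatio, float weightThreshold) {}

    static List<WeightedToken> tokensToQuery(
        List<WeightedToken> tokens,
        PruningConfig config,
        IntSupplier fieldDocCount,      // stands in for the field document count lookup
        ToIntFunction<String> docFreq   // stands in for the per-token document frequency lookup
    ) {
        if (config == null) {
            // No pruning configured: keep every token and never touch the statistics.
            return List.copyOf(tokens);
        }
        int docCount = fieldDocCount.getAsInt();
        List<WeightedToken> kept = new ArrayList<>();
        for (WeightedToken t : tokens) {
            float ratio = docCount == 0 ? 0f : (float) docFreq.applyAsInt(t.token()) / docCount;
            boolean prune = ratio > config.maxFrequencyRatio() && t.weight() < config.weightThreshold();
            if (!prune) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<WeightedToken> tokens = List.of(new WeightedToken("the", 0.1f), new WeightedToken("soreness", 1.5f));
        // Without a pruning configuration, all tokens are kept.
        System.out.println(tokensToQuery(tokens, null, () -> 100, t -> 90));
        // With one, frequent low-weight tokens are dropped.
        System.out.println(tokensToQuery(tokens, new PruningConfig(0.5f, 0.4f), () -> 100, t -> 90));
    }
}
```

The point of the sketch is the early return: a query without a pruning configuration still matches on every token, and the count and ratio lookups happen only when a configuration is present.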

Does this make sense to you?

Thank you for the explanation. Now I understand the optimization. I will change the code and let you know when it is ready for another review.

@kderusso I did some optimization and code clean-up around pruning configuration. Can you please review the changes again? Thank you.

kderusso

Changes look good, Saikat, thanks for iterating. I have two minor comments and then I think this looks good.

...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java

saikatsarkar056 marked this pull request as draft February 15, 2024 23:31

elasticsearchmachine added the v8.14.0 label Feb 15, 2024

saikatsarkar056 mentioned this pull request Feb 15, 2024

Display error for text_expansion if the queried field does not have the right type #98098

Closed

kderusso reviewed Feb 20, 2024

saikatsarkar056 force-pushed the text_expansion_error branch from ab44344 to 17c7cbd February 21, 2024 02:19

saikatsarkar056 marked this pull request as ready for review February 21, 2024 18:09

elasticsearchmachine added the needs:triage label Feb 21, 2024

saikatsarkar056 requested review from a team and kderusso February 21, 2024 18:09

kderusso reviewed Feb 21, 2024

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java (outdated)

...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java (outdated)

saikatsarkar056 added the Team:Search label Feb 21, 2024

elasticsearchmachine removed the Team:Search label Feb 21, 2024

saikatsarkar056 added the >non-issue and Team:Search labels and removed the needs:triage label Feb 21, 2024

elasticsearchmachine added the needs:triage label and removed the Team:Search label Feb 21, 2024

saikatsarkar056 added the Team:Enterprise Search label and removed the needs:triage label Feb 21, 2024

elasticsearchmachine added the needs:triage label and removed the Team:Enterprise Search label Feb 21, 2024

saikatsarkar056 self-assigned this Feb 21, 2024

saikatsarkar056 added the Team:Enterprise Search label Feb 21, 2024

elasticsearchmachine removed the Team:Enterprise Search label Feb 21, 2024

saikatsarkar056 removed the needs:triage label Feb 21, 2024

elasticsearchmachine added the needs:triage label Feb 21, 2024

saikatsarkar056 force-pushed the text_expansion_error branch from ed30007 to 644fc68 February 21, 2024 19:02

saikatsarkar056 added the :EnterpriseSearch/Application and Team:Enterprise Search labels Feb 21, 2024

saikatsarkar056 requested a review from a team February 21, 2024 23:55

kderusso reviewed Feb 22, 2024

saikatsarkar056 force-pushed the text_expansion_error branch 2 times, most recently from d508615 to dc23e1b February 22, 2024 21:33

kderusso reviewed Feb 23, 2024

kderusso approved these changes Feb 23, 2024

saikatsarkar056 added 21 commits February 23, 2024 09:11

Display error for text_expansion if the queried field does not have the right type

eaa3054

Display error for text_expansion if the queried field does not have the right type

496aa0c

Display error for text_expansion if the queried field does not have the right type

3601fa1

Display error for text_expansion if the queried field does not have the right type

d287e38

Display error for text_expansion if the queried field does not have the right type

c29a17e

Display error for text_expansion if the queried field does not have the right type

df7d7ed

Display error for text_expansion if the queried field does not have the right type

023a5c7

Display error for text_expansion if the queried field does not have the right type

fc6f437

Display error for text_expansion if the queried field does not have the right type

454f3c6

Display error for text_expansion if the queried field does not have the right type

b2f7b0d

Display error for text_expansion if the queried field does not have the right type

f7b821c

Display error for text_expansion if the queried field does not have the right type

d256853

Display error for text_expansion if the queried field does not have the right type

762409d

Display error for text_expansion if the queried field does not have the right type

f4c2c33

Display error for text_expansion if the queried field does not have the right type

91f0a73

Clean up the code

6113c8c

Optimize the code for pruning config

d536e07

Optimize the code for pruning config

3fb95e7

Write findBestWeightFor for clear code

cb071b6

Run Spotless

ee3233b

Remove findBestWeightFor method

8969c31

saikatsarkar056 force-pushed the text_expansion_error branch from 82171bc to 8969c31 February 23, 2024 16:11

saikatsarkar056 merged commit 2e9e8f8 into elastic:main Feb 23, 2024

kderusso mentioned this pull request Mar 4, 2024

Bugfix for mixed version cluster queries using text expansion #105912

Merged
