Contributor

@saikatsarkar056 commented Feb 15, 2024

This PR adds a check that the field targeted by a text_expansion query is mapped and has the right type.

Search Query
PUT my-index
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      }
    }
  }
}

PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}


POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "my-index",
    "pipeline": "elser-v2-test"
  }
}

GET _tasks/CoTcIc_XTBSPEbBJxhysSQ:20048

GET /my-index/_search


GET my-index/_search
{
   "query":{
      "text_expansion":{
         "content2":{
            "model_id":".elser_model_2",
            "model_text":"How to avoid muscle soreness after running?"
         }
      }
   }
}
Response
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "[content2] is not a mapped field"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my-index",
        "node": "CoTcIc_XTBSPEbBJxhysSQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: [content2] is not a mapped field",
          "index_uuid": "tKlP30fURvqLQHIPm_V_ww",
          "index": "my-index",
          "caused_by": {
            "type": "parse_exception",
            "reason": "[content2] is not a mapped field"
          }
        }
      }
    ],
    "caused_by": {
      "type": "parse_exception",
      "reason": "[content2] is not a mapped field"
    }
  },
  "status": 400
}
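
For illustration, the kind of check described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: the class `FieldTypeCheckSketch`, the method `validateField`, the plain `Map` used as a stand-in for the index mapping, and the wording of the wrong-type message are all hypothetical. The set of accepted types is passed in by the caller because the description only says "the right type"; `sparse_vector` is used in the example because that is what the mapping above declares.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class FieldTypeCheckSketch {

    /**
     * Throws if the field is missing from the mapping or has an unexpected type.
     * mappedFieldTypes is a hypothetical stand-in for the index mapping (field name to type name).
     */
    static void validateField(String fieldName, Map<String, String> mappedFieldTypes, Set<String> allowedTypes) {
        String type = mappedFieldTypes.get(fieldName);
        if (type == null) {
            // Same wording as the error shown in the response above.
            throw new IllegalArgumentException("[" + fieldName + "] is not a mapped field");
        }
        if (!allowedTypes.contains(type)) {
            // Hypothetical message; the real wording is not shown in this thread.
            throw new IllegalArgumentException(
                "[" + fieldName + "] has type [" + type + "] but expected one of " + allowedTypes
            );
        }
    }

    public static void main(String[] args) {
        // Mirrors the my-index mapping from the description.
        Map<String, String> mapping = new HashMap<>();
        mapping.put("content_embedding", "sparse_vector");
        mapping.put("content", "text");
        Set<String> allowed = Collections.singleton("sparse_vector"); // assumption

        validateField("content_embedding", mapping, allowed); // passes silently
        try {
            validateField("content2", mapping, allowed);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing before any query is built keeps the error message specific to the field name that was actually sent.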

@saikatsarkar056 marked this pull request as ready for review February 21, 2024 18:09
@elasticsearchmachine added the needs:triage label Feb 21, 2024
@saikatsarkar056 requested review from a team and kderusso February 21, 2024 18:09
Member

@kderusso left a comment

Changes look good overall; waiting for tests to pass. Please make sure to add the appropriate labels to this PR. Thank you.

@saikatsarkar056 added the Team:Search label Feb 21, 2024
@elasticsearchmachine removed the Team:Search label Feb 21, 2024
@saikatsarkar056 added the >non-issue and Team:Search labels and removed the needs:triage label Feb 21, 2024
@elasticsearchmachine added the needs:triage label and removed the Team:Search label Feb 21, 2024
@saikatsarkar056 added the Team:Enterprise Search label and removed the needs:triage label Feb 21, 2024
@elasticsearchmachine added the needs:triage label and removed the Team:Enterprise Search label Feb 21, 2024
@saikatsarkar056 self-assigned this Feb 21, 2024
@saikatsarkar056 added the Team:Enterprise Search label Feb 21, 2024
@elasticsearchmachine removed the Team:Enterprise Search label Feb 21, 2024
@saikatsarkar056
Contributor Author

I am trying to add a Team label to this PR, but elasticsearchmachine keeps removing it.

@kderusso

@saikatsarkar056 removed the needs:triage label Feb 21, 2024
@elasticsearchmachine added the needs:triage label Feb 21, 2024
@saikatsarkar056 added the :EnterpriseSearch/Application and Team:Enterprise Search labels Feb 21, 2024
@saikatsarkar056 requested a review from a team February 21, 2024 23:55
Member

@kderusso left a comment

Nice iterations. I have some non-blocking suggestions.

Member

This is nice and clean. I think we could consider another optimization here: right now WeightedTokensQueryBuilder.toToQuery pulls the field document count and calculates the token frequency ratio for every query. Since token pruning is an opt-in feature for text expansion queries, we might want to update that method in the WeightedTokensQueryBuilder to short-circuit this and only get these values if we have a non-null token pruning configuration. WDYT?

Contributor Author

@saikatsarkar056 Feb 22, 2024

@kderusso I think you mean the doToQuery method here. My understanding is that we should only calculate the token frequency ratio and return a BooleanQuery if we have a non-null token pruning configuration. Is that right? If so, we would go in the following direction:

if (this.tokenPruningConfig == null) {
   return new MatchNoDocsQuery("The \"" + getName() + "\" query does not have any pruning configuration");
}

var qb = new BooleanQuery.Builder();
int fieldDocCount = context.getIndexReader().getDocCount(fieldName);
...

Please let me know if my understanding is correct about token pruning.

Member

You're right, I had a typo - doToQuery - here's the link to the method I'm talking about.

Your suggestion to return a MatchNoDocsQuery is not what we want here. If we did that, every text expansion query sent without a pruning configuration would return no documents. Since the pruning configuration is opt-in and optional, this is very undesirable behavior.

No, what I'm suggesting is altering how we determine whether we want to keep tokens in the WeightedTokensQueryBuilder.

Today, we do the following:

  1. Get the document count for the field name
  2. Calculate the best weight for each token
  3. Get the average token frequency ratio
  4. Start building a boolean query, determining if we should keep each token.

If the pruning configuration is null, two things are true:

  1. We want to keep every token, because we don't want any tokens to be pruned.
  2. Because we want to keep every token, there is no need to calculate the counts or frequency ratios.

So I propose we short-circuit this and only calculate those ratios if there exists a pruning configuration.
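
As a rough illustration of the proposed short-circuit, here is a minimal, self-contained sketch. It is not the actual WeightedTokensQueryBuilder code: `PruningShortCircuitSketch`, `Token`, `PruningConfig`, `tokensToKeep`, and the `freqRatioLookup` callback (standing in for the index statistics) are hypothetical names, and the keep/prune rule in the second half is a simplified placeholder assumed for this example only. The point is the early return: with a null configuration, every token is kept and no statistics are requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class PruningShortCircuitSketch {

    /** Hypothetical stand-in for a token and its weight. */
    static final class Token {
        final String text;
        final float weight;
        Token(String text, float weight) { this.text = text; this.weight = weight; }
    }

    /** Hypothetical stand-in for the optional pruning configuration. */
    static final class PruningConfig {
        final double freqRatioThreshold;
        final double weightThreshold;
        PruningConfig(double freqRatioThreshold, double weightThreshold) {
            this.freqRatioThreshold = freqRatioThreshold;
            this.weightThreshold = weightThreshold;
        }
    }

    static List<Token> tokensToKeep(List<Token> tokens, PruningConfig config, ToDoubleFunction<String> freqRatioLookup) {
        // Short-circuit: no pruning configuration means every token is kept,
        // so the statistics lookup is never invoked.
        if (config == null) {
            return new ArrayList<>(tokens);
        }

        // Only with a pruning configuration do we pay for the statistics.
        float bestWeight = 0f;
        double ratioSum = 0d;
        double[] ratios = new double[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            bestWeight = Math.max(bestWeight, tokens.get(i).weight);
            ratios[i] = freqRatioLookup.applyAsDouble(tokens.get(i).text);
            ratioSum += ratios[i];
        }
        double averageRatio = tokens.isEmpty() ? 0d : ratioSum / tokens.size();

        // Placeholder rule (assumption): drop tokens that are both unusually
        // frequent and low-weight relative to the best token.
        List<Token> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean frequent = ratios[i] > config.freqRatioThreshold * averageRatio;
            boolean lowWeight = tokens.get(i).weight < config.weightThreshold * bestWeight;
            if (!(frequent && lowWeight)) {
                kept.add(tokens.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(new Token("run", 2.0f), new Token("the", 0.1f), new Token("sore", 1.5f));
        ToDoubleFunction<String> lookup = t -> t.equals("the") ? 0.9 : 0.1; // made-up ratios
        System.out.println(tokensToKeep(tokens, null, lookup).size());
        System.out.println(tokensToKeep(tokens, new PruningConfig(2.0, 0.4), lookup).size());
    }
}
```

Unlike returning a MatchNoDocsQuery, the null branch still yields a query over all tokens; it only skips the statistics work.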

Does this make sense to you?

Contributor Author

Thank you for the explanation. I now understand the optimization. I will update the code and let you know when it is ready for another review.

Contributor Author

@kderusso I made some optimizations and cleaned up the code around the pruning configuration. Can you please review the changes again? Thank you.

@saikatsarkar056 force-pushed the text_expansion_error branch 2 times, most recently from d508615 to dc23e1b February 22, 2024 21:33
Member

@kderusso left a comment

Changes look good, Saikat. Thanks for iterating. I have two minor comments, and then I think this looks good.

Labels

:EnterpriseSearch/Application (Enterprise Search), >non-issue, Team:Enterprise Search (Meta label for Enterprise Search team), v8.14.0

3 participants