-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Display error for text_expansion if the queried field does not have the right type #105581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display error for text_expansion if the queried field does not have the right type #105581
Conversation
...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java
Outdated
Show resolved
Hide resolved
...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java
Outdated
Show resolved
Hide resolved
...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/text_expansion_search.yml
Outdated
Show resolved
Hide resolved
ab44344 to
17c7cbd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good overall, waiting for tests to pass. Please make sure to add the appropriate labels to this PR, thank you.
...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java
Outdated
Show resolved
Hide resolved
...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java
Outdated
Show resolved
Hide resolved
|
I am trying to assign :Team to this PR. However, elasticsearchmachine is removing the Team from the labels. |
ed30007 to
644fc68
Compare
kderusso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice iterations. I have some non-blocking suggestions.
...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java
Outdated
Show resolved
Hide resolved
...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice and clean. I think we could think about another optimization here - Right now WeightedTokensQueryBuilder.toToQuery pulls the field document count and calculates the token frequency ratio for every query. Since token pruning is an opt in feature for text expansion queries, we might want to update that method in the WeightedTokenBuilder to short-circuit this and only get these values if we have a non-null token pruning configuration. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kderusso I think you mean doToQuery method here. My understanding is that we should only calculate the token frequency ratio and return BooleanQuery if we have non-null token pruning configuration. Am I right? So, we should go to the following direction:
if (this.tokenPruningConfig == null) {
return new MatchNoDocsQuery("The \"" + getName() + "\" query does not have any pruning configuration");
}
var qb = new BooleanQuery.Builder();
int fieldDocCount = context.getIndexReader().getDocCount(fieldName);
...
Please let me know if my understanding is correct about token pruning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I had a typo - doToQuery - here's the link to the method I'm talking about.
Your suggestion to return a MatchNoDocsQuery is not what we want here. If we did that, every time we sent in a text expansion query without a pruning configuration, no documents would ever be returned. Since pruning configuration is opt-in and optional this is very undesireable behavior.
No, what I'm suggesting is altering how we determine whether we want to keep tokens in the WeightedTokensQueryBuilder.
Today, we do the following:
- Get the document count for the field name
- Calculate the best weight for each token
- Get the average token frequency ratio
- Start building a boolean query, determining if we should keep each token.
If the pruning configuration is null, two things are true:
- We want to keep every token, because we don't want any tokens to be pruned
- Because we want to keep every token there is no need to calculate the counts or frequency ratios.
So I propose we short-circuit this and only calculate those ratios if there exists a pruning configuration.
Does this make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation. Now, I got the idea about this optimization. I will change the code and notify you for another review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kderusso I did some optimization and code clean-up around pruning configuration. Can you please review the changes again? Thank you.
d508615 to
dc23e1b
Compare
kderusso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good Saikat, thanks for iterating. I have two minor comments and then I think this looks good.
...ck/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/TextExpansionQueryBuilder.java
Outdated
Show resolved
Hide resolved
...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java
Outdated
Show resolved
Hide resolved
...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/queries/WeightedTokensQueryBuilder.java
Outdated
Show resolved
Hide resolved
82171bc to
8969c31
Compare
In this PR, we check whether text_expansion has the queried field of the right type.
Search Query
Response