Add a More Like This query routing requirement check (#29678) #33974
Routing requirements for requests (get, update, bulk, terms and explain) are checked from the transport action layer to fail fast (see o.e.c.m.MetaData.routingRequired API usages). But in the particular case of MLT queries, the shard search request source has to be parsed to retrieve routing attribute values from MLT like items. To keep failing as fast as possible for MLT queries, routing requirement checks are done in the search service after parsing source and before DFS, query and fetch phases.
Pinging @elastic/es-search-aggs |
Hi @cbismuth , thanks for your contribution. There is a problem I think with your fix, which is that a more like this query could also be part of a compound query, while your code only looks at the top-level query. Also, I don't love having to check instanceof queries and do something depending on that. I think that an easier fix is possible here. Like items are retrieved using the multi term vectors API, which does the proper validation when routing is required yet not provided. The problem is that we currently ignore any error returned by such API when retrieving the items, see https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/query/MoreLikeThisQueryBuilder.java#L1113 . Do you see what I mean? |
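To illustrate the first objection above, here is a minimal sketch of why a top-level `instanceof` check misses a more like this query wrapped in a compound query. The `Query`, `MoreLikeThis` and `Bool` types are hypothetical stand-ins written for this example, not Elasticsearch classes.

```java
import java.util.Collections;
import java.util.List;

public class CompoundQuerySketch {

    /** Hypothetical query marker type (not an Elasticsearch interface). */
    interface Query {}

    /** Hypothetical MLT query holding the ids of its like items. */
    static class MoreLikeThis implements Query {
        final List<String> likeIds;

        MoreLikeThis(List<String> likeIds) {
            this.likeIds = likeIds;
        }
    }

    /** Hypothetical compound query wrapping other queries. */
    static class Bool implements Query {
        final List<Query> clauses;

        Bool(List<Query> clauses) {
            this.clauses = clauses;
        }
    }

    /** Inspects only the outermost query, so nested MLT queries go unnoticed. */
    static boolean topLevelIsMlt(Query query) {
        return query instanceof MoreLikeThis;
    }

    /** Walks the whole query tree, which is what a type-based check would need. */
    static boolean containsMlt(Query query) {
        if (query instanceof MoreLikeThis) {
            return true;
        }
        if (query instanceof Bool) {
            for (Query clause : ((Bool) query).clauses) {
                if (containsMlt(clause)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Query mlt = new MoreLikeThis(Collections.singletonList("1"));
        Query nested = new Bool(Collections.singletonList(mlt));
        System.out.println("top-level check: " + topLevelIsMlt(nested));
        System.out.println("recursive check: " + containsMlt(nested));
    }
}
```

Even the recursive variant would have to know about every compound query type, which is one reason validating where the like items are actually fetched is the simpler option.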
Thank you @javanna. I see what you mean, that would be far better. I'll update my PR accordingly. |
This reverts commit 2065aaf
@javanna should any exception be propagated (e.g. date parsing in this test) or should we restrict this fix only to missing I think we should restrict only to missing |
Actual exception is |
hi @cbismuth good catch, I think the exception should always be |
Hi @javanna, thank you, I've updated this PR based on your recommendations. In this proposal I've only checked Here is below an enhanced version in which any exception but Would you recommend this version below?

    private static void checkResponseException(MultiTermVectorsItemResponse response) throws IOException {
        Exception cause = response.getFailure().getCause();
        // A missing document is not treated as an error: nothing is thrown in that case.
        if (ExceptionsHelper.unwrap(cause, DocumentMissingException.class) == null) {
            // Rethrow an IOException as-is, wrap any other failure in one.
            if (cause instanceof IOException) {
                throw (IOException) cause;
            } else {
                throw new IOException(cause);
            }
        }
    }

|
hi @cbismuth thanks for updating, and sorry for the long wait. I would only address the routing missing problem in this PR. May I ask you to add a test for the multi_get fix too? Thanks! |
test this please |
You're welcome @javanna. Sure, I'll have a look and add a test with multiple MLT items. I think you forgot to ping @elasticmachine in your previous comment. |
Hi @javanna, I've added an assertion for an MLT query that contains multiple items (some with a routing attribute and some without). Is that what you meant by "multi_get fix too"? Thank you. |
hi @cbismuth , no I meant that given you have made changes to the multi_get API as well, those should be tested too. Do you see what I mean? |
Oh yes, thank you @javanna, I'll add a test to cover this change. |
retest this please |
I've rebased with |
…g_attribute # Conflicts: # server/src/main/java/org/elasticsearch/action/termvectors/TransportMultiTermVectorsAction.java
Hi @javanna, PR is up-to-date with latest |
retest this please |
Yes! Green build 😉 |
Hi @javanna, I'm sure you're quite busy, so here is a quick follow up after your last review:
It would be totally awesome if we could get this PR merged, thanks a lot! |
thanks @cbismuth for your work. Looks good, I will merge this soon.
Thanks a lot @javanna 👍 |
I've merged |
retest this please |
We have a green build 😄 |
thanks for the pings @cbismuth , and sorry about the wait. It's merged now ;) |
No worries @javanna, thank you for your help 😉 |
The more like this query accepts identifiers of documents to be retrieved as like/unlike items. An error can be thrown at retrieval time, for instance because of a missing routing value when `_routing` is set as required in the mapping. Instead of ignoring such an error and returning no documents for the query, the error should be re-thrown and returned to users. As part of this change, mget and mtermvectors are also unified in the way they throw this exception, consistent with other places, so that a `RoutingMissingException` is raised. Closes #29678
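The behaviour change described above can be sketched roughly as follows. This is an illustration only: `ItemResponse` and the nested `RoutingMissingException` are hypothetical stand-ins defined here, not the Elasticsearch classes, and the choice to re-throw only routing errors while still skipping other failures is an assumption based on the review comment asking to address only the missing routing problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LikeItemFetchSketch {

    /** Hypothetical stand-in for the routing error discussed in this thread. */
    static class RoutingMissingException extends RuntimeException {
        RoutingMissingException(String id) {
            super("routing is required for [" + id + "]");
        }
    }

    /** Hypothetical per-item response: either a payload or a failure. */
    static class ItemResponse {
        final String id;
        final String payload;
        final RuntimeException failure;

        ItemResponse(String id, String payload, RuntimeException failure) {
            this.id = id;
            this.payload = payload;
            this.failure = failure;
        }

        boolean isFailed() {
            return failure != null;
        }
    }

    /** Previous behaviour: every failed item is silently skipped. */
    static List<String> collectIgnoringFailures(List<ItemResponse> responses) {
        List<String> payloads = new ArrayList<>();
        for (ItemResponse response : responses) {
            if (response.isFailed()) {
                continue;
            }
            payloads.add(response.payload);
        }
        return payloads;
    }

    /** Sketched new behaviour: a routing failure is surfaced to the caller. */
    static List<String> collectRethrowingRoutingErrors(List<ItemResponse> responses) {
        List<String> payloads = new ArrayList<>();
        for (ItemResponse response : responses) {
            if (response.isFailed()) {
                if (response.failure instanceof RoutingMissingException) {
                    throw response.failure;
                }
                continue; // assumption: other failures are still skipped
            }
            payloads.add(response.payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        List<ItemResponse> responses = Arrays.asList(
            new ItemResponse("1", "term vectors for 1", null),
            new ItemResponse("2", null, new RoutingMissingException("2")));
        System.out.println(collectIgnoringFailures(responses).size());
        try {
            collectRethrowingRoutingErrors(responses);
        } catch (RoutingMissingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the first method a query whose only like item lacks routing quietly matches nothing. With the second, the caller sees the reason.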