Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a More Like This query routing requirement check (#29678) #33974

Merged
merged 26 commits into from Nov 26, 2018
Merged

Add a More Like This query routing requirement check (#29678) #33974

merged 26 commits into from Nov 26, 2018

Conversation

cbismuth
Copy link
Contributor

Routing requirements for requests (get, update, bulk, terms and explain) are checked from the transport action layer to fail fast (see o.e.c.m.MetaData.routingRequired API usages).

But in the particular case of MLT queries, the shard search request source has to be parsed to retrieve routing attribute values from MLT like items.

To keep failing as fast as possible for MLT queries, routing requirement checks are done in the search service after parsing source and before DFS, query and fetch phases.

Routing requirements for requests (get, update, bulk, terms and explain)
are checked from the transport action layer to fail fast
(see o.e.c.m.MetaData.routingRequired API usages).

But in the particular case of MLT queries, the shard search request
source has to be parsed to retrieve routing attribute values from MLT
like items.

To keep failing as fast as possible for MLT queries, routing requirement
checks are done in the search service after parsing source and before
DFS, query and fetch phases.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@javanna
Copy link
Member

javanna commented Oct 2, 2018

Hi @cbismuth , thanks for your contribution.

There is a problem I think with your fix, which is that a more like this query could also be part of a compound query, while your code only looks at the top-level query. Also, I don't love having to check instanceof queries and do something depending on that.

I think that an easier fix is possible here. Like items are retrieved using the multi term vectors API, which does the proper validation when routing is required yet not provided. The problem is that we currently ignore any error returned by such API when retrieving the items, see https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/query/MoreLikeThisQueryBuilder.java#L1113 . Do you see what I mean?

@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 2, 2018

Thank you @javanna. I see what you mean, it will be far more better, I'll update my PR accordingly.

@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 3, 2018

@javanna should any exception be propagated (e.g. date parsing in this test) or should we restrict this fix only to missing routing attribute?

I think we should restrict only to missing routing attribute at first to avoid unexpected breaking changes.

@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 3, 2018

Actual exception is java.lang.IllegalArgumentException: routing is required for [index1]/[type1]/[1]. I'll try to change it to a more convenient RoutingMissingException to ease exception checking in the MoreLikeThisQueryBuilder#getFieldsFor API.

@javanna
Copy link
Member

javanna commented Oct 3, 2018

hi @cbismuth good catch, I think the exception should always be RoutingMissingException, and having a different exception complicates the catching, so it needs to be fixed here too. On whether we want to catch other exceptions, we want to leave things as they are when it comes to documents not found, but do raise when the routing is missing, even if only one item includes such exception and others are properly found.

@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 3, 2018

Hi @javanna, thank you, I've updated this PR based on your recommandations. In this proposal I've only checked RoutingMissingException.

Here is below an enhanced version in which any exception but DocumentMissingException is checked. It breaks MoreLikeThisIT#testMoreLikeThisMalformedArtificialDocs due to intended bad date format, but I can fix it, no worries.

Would you recommend this version below?

private static void checkResponseException(MultiTermVectorsItemResponse response) throws IOException {
    Exception cause = response.getFailure().getCause();
    if (ExceptionsHelper.unwrap(cause, DocumentMissingException.class) == null) {
        if (cause instanceof IOException) {
            throw (IOException) cause;
        } else {
            throw new IOException(cause);
        }
    }
}

@javanna
Copy link
Member

javanna commented Oct 22, 2018

hi @cbismuth thanks for updating, and sorry for the long wait. I would only address the routing missing problem in this PR. May I ask you to add a test for the multi_get fix too? Thanks!

@javanna
Copy link
Member

javanna commented Oct 22, 2018

test this please

@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 22, 2018

You're welcome @javanna. Sure, I'll have a look and add a test with multiple MLT items.

I think you forgot to ping @elasticmachine in your previous comment.

@cbismuth
Copy link
Contributor Author

Hi @javanna, I've added an assertion when an MLT query contains multiple items (some with a routing attribute and some without).

Is it what you asked for when you said multi_get fix too? Thank you.

@javanna
Copy link
Member

javanna commented Oct 23, 2018

hi @cbismuth , no I meant that given you have made changes to the multi_get API as well, those should be tested too. Do you see what I mean?

@cbismuth
Copy link
Contributor Author

Oh yes, thank you @javanna, I'll add a test to cover this change.

@javanna
Copy link
Member

javanna commented Nov 8, 2018

retest this please

@cbismuth
Copy link
Contributor Author

I've rebased with master and bwc tests are green on my side, could you please try to rerun tests? Thanks a lot!

…g_attribute

# Conflicts:
#	server/src/main/java/org/elasticsearch/action/termvectors/TransportMultiTermVectorsAction.java
@cbismuth
Copy link
Contributor Author

Hi @javanna, PR is up-to-date with latest master and ready for build, thanks 👍

@javanna
Copy link
Member

javanna commented Nov 19, 2018

retest this please

@cbismuth
Copy link
Contributor Author

Yes! Green build 😉

@cbismuth
Copy link
Contributor Author

Hi @javanna, I'm sure you're quite busy, so here is a quick follow up after your last review:

  • Integration tests have been converted to unit tests with all required dependencies,
  • Build is all green.

It would be totally awesome if we could get this PR merged, thanks a lot!

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @cbismuth for your work. Looks good, I will merge this soon.

@cbismuth
Copy link
Contributor Author

Thanks a lot @javanna 👍

@cbismuth
Copy link
Contributor Author

I've merged master into current to fix the build. Could you please trigger another CI build? Thanks a lot 👍

@javanna
Copy link
Member

javanna commented Nov 26, 2018

retest this please

@cbismuth
Copy link
Contributor Author

We have a green build 😄

@javanna javanna merged commit 04ebc63 into elastic:master Nov 26, 2018
@javanna
Copy link
Member

javanna commented Nov 26, 2018

thanks for the pings @cbismuth , and sorry about the wait. It's merged now ;)

@cbismuth
Copy link
Contributor Author

No worries @javanna, thank you for your help 😉

@cbismuth cbismuth deleted the 29678_more_like_this_query_required_routing_attribute branch November 26, 2018 13:01
javanna pushed a commit that referenced this pull request Nov 27, 2018
More like this query allows to provide identifiers of documents to be retrieved as like/unlike items.
It can happen that at retrieval time an error is thrown, for instance caused by missing routing value when `_routing` is set required in the mapping.
Instead of ignoring such error and returning no documents for the query, the error should be re-thrown and returned to users. As part of this
change also mget and mtermvectors are unified in the way they throw such exception like it happens in other places, so that a `RoutingMissingException` is raised.

Closes #29678
javanna pushed a commit that referenced this pull request Nov 27, 2018
More like this query allows to provide identifiers of documents to be retrieved as like/unlike items.
It can happen that at retrieval time an error is thrown, for instance caused by missing routing value when `_routing` is set required in the mapping.
Instead of ignoring such error and returning no documents for the query, the error should be re-thrown and returned to users. As part of this
change also mget and mtermvectors are unified in the way they throw such exception like it happens in other places, so that a `RoutingMissingException` is raised.

Closes #29678
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v6.5.2 v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants