Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate reference to _type in lookup queries #37016

Conversation

mayya-sharipova
Copy link
Contributor

Relates to #35190

@mayya-sharipova mayya-sharipova added >deprecation :Search Foundations/Mapping Index mappings, including merging and defining field types labels Dec 28, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@jtibshirani jtibshirani changed the title Deprecate reference to _type in lookup queries and aggs Deprecate reference to _type in lookup queries Jan 3, 2019
@jtibshirani jtibshirani force-pushed the deprecate-reference-to_type-aggregations-retrieving-fields branch 3 times, most recently from 0d159a3 to ab26db4 Compare January 4, 2019 01:26
mayya-sharipova and others added 2 commits January 4, 2019 11:27
* Add integration test coverage for typeless lookup queries.
* Fix a bug around typeless terms lookup queries.
* Make sure to provide non-deprecated methods in QueryBuilders.
* Make the deprecation messages more accurate.
* Update the query DSL documentation.
* Avoid creating duplicate query builder tests.

Don't use 'ids' when generating random queries to avoid types warnings.
@jtibshirani jtibshirani force-pushed the deprecate-reference-to_type-aggregations-retrieving-fields branch from 3b25c64 to d7b0cd8 Compare January 4, 2019 19:27
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtibshirani I left a few comments, let me know what you think. Since I haven't done so many type removal related reviews I think it would be good to also get a second reviewer on this.

if (out.getVersion().onOrAfter(Version.V_7_0_0)) {
out.writeOptionalString(type);
} else {
out.writeString(type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might break if type is "null". We should write some dummy type to the stream then?

Copy link
Contributor

@jtibshirani jtibshirani Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that before this change, type could never be null in this class (it is marked as final, and the constructor validates that it is not null). I will add a comment here to clarify.

This also reminds me of something I was hoping to get your thoughts on. With this implementation, it is not possible to use a typeless terms lookup query in a mixed 6.x and 7.0 cluster. The error will occur when a 6.x nodes tries to decode the optional string that the 7.0 node is sending here. Does this seem reasonable in terms of behavior + error messaging (not sure if there is any precedent for such a change)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error will occur when a 6.x nodes tries to decode the optional string that the 7.0 node is sending here.

Exactly what I thought, with the exception that the problem happens then a 7.0 node sends an object with a null-type I think we should make sure this cannot happen though. If we want to keep the current decision in this PR to allow null-references for types in the POJOs we need to send some dummy type when serializing to a 6.x node. We should also add unit tests for this case.

Copy link
Contributor

@jtibshirani jtibshirani Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I understand your concern now. I had assumed that passing a dummy type would actually cause more confusion than simply erroring out. For example, say a user had an index with document type my_type, and a mixed-node cluster. Then a typeless terms lookup query will always succeed if the lookup document is on a 7.0 node, but show no error and not find the document if it lives on a 6.x node. This is because the type name _doc will be used for the 6.x lookup, and we will fail to find the lookup document without any indication as to why. Using _doc in get requests to mean 'any type' is functionality we only support on 7.0.

I'm now thinking it would be clearest if I explicitly checked for a null type here, and errored with the message "Typeless terms lookup queries are not supported if any node is running a version before 7.0."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the type name _doc will be used for the 6.x lookup, and we will fail to find the lookup document without any indication as to why.

Is there a chance to use the dummy "_doc" type in order to not break serialization but then infer the "right" type name when on the shard for this case? As far as i understand the types removal process all 6.x indices should only have one type left, even if it is named differently?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be possible if we backported the part of #35790 that applied to 'get' requests. I would prefer not to do that if possible because it is not so straightfoward to backport, especially since 6.x can contain some indices with multiple types, as it supports indices created in 5.x. I also don't think that using typeless terms lookups in a mixed-node cluster will be a common enough occurrence to warrant the extra work + complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining, that is unfortunate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 36549 PR for the IndexRequest class I opted to serialize using the result of a call to type() which will return _doc if null. This made 7.x+6.x nodes happy to talk to each other.
It was important that IndexRequest kept null types in the calling client because type==null was the condition I used to know if a blank request should inherit a choice of type from a BulkRequest's choice of default.

@jtibshirani
Copy link
Contributor

Thanks @cbuescher for the helpful comments, I've pushed some commits to address them. I understand your thoughts around a second reviewer, just wanted to note for context that (1) these changes are quite different from the types deprecation work we've done so far, and (2) both Mayya and I have taken a close look at them.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor nit, but I think the mixed cluster serialization issue thats left open needs adressing and also probably a unit test that checks we can send/receive query builders from a 6.x node.

@jtibshirani
Copy link
Contributor

I think this is ready for another look.

} else {
if (type == null) {
throw new IllegalArgumentException("Typeless [terms] lookup queries are not supported if any " +
"node is running a version before 7.0.");
Copy link
Member

@cbuescher cbuescher Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like throwing an IAE on the serialization layer, but I see that we don't have the information about the output streams version elsewhere. I will try to look into how this error propagates to e.g. the user making a rest call and I'd like to discuss the alternative of using a "dummy" type (with the downside that this then won't match lookup items with a non-default 6.x type name) again. In any case we should add something to the docs or some "known issues" document to capture this edge case.

Copy link
Contributor

@jtibshirani jtibshirani Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I can also take a look at this by trying it out in REST tests. I will ping you offline to further discuss the idea of using a 'dummy' type name.

}
if (indexedShapeId != null && indexedShapeType == null) {
throw new IllegalArgumentException("indexedShapeType is required if indexedShapeId is specified");
throw new IllegalArgumentException("either shapeBytes or indexedShapeId is required");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"shapeBytes" should be "shape"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -912,9 +964,18 @@ public static MoreLikeThisQueryBuilder fromXContent(XContentParser parser) throw
if (stopWords != null) {
moreLikeThisQueryBuilder.stopWords(stopWords);
}

if (!moreLikeThisQueryBuilder.isTypeless()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer == false to !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if (out.getVersion().onOrAfter(Version.V_7_0_0)) {
out.writeOptionalString(type);
} else {
out.writeString(type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 36549 PR for the IndexRequest class I opted to serialize using the result of a call to type() which will return _doc if null. This made 7.x+6.x nodes happy to talk to each other.
It was important that IndexRequest kept null types in the calling client because type==null was the condition I used to know if a blank request should inherit a choice of type from a BulkRequest's choice of default.

assertThat(query, instanceOf(MoreLikeThisQueryBuilder.class));

MoreLikeThisQueryBuilder mltQuery = (MoreLikeThisQueryBuilder) query;
if (!mltQuery.isTypeless()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

==false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

assertThat(query, CoreMatchers.instanceOf(TermsQueryBuilder.class));

TermsQueryBuilder termsQuery = (TermsQueryBuilder) query;
if (termsQuery.termsLookup() != null && termsQuery.termsLookup().type() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add an isTypeless() abstraction like you did for MLT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jtibshirani
Copy link
Contributor

Thanks to both of you for the review. I pushed changes to address @markharwood's suggestions. I looked into changing the strategy or handling mixed-cluster serialization, and my intuition is to stick with the current approach:

  • There is precedent for throwing an IllegalArgumentException within a writeTo method. Even within query builder classes, there are a couple examples (PercolateQueryBuilder and GeoShapeQueryBuilder).
  • I tried a refactor where TermsLookup used _doc instead of a null type to indicate it was typeless. However, this made it hard to tell later whether it was constructed in a truly typeless manner, or just by setting type: _doc. I'm not able to avoid this issue by moving the deprecation check into TermsLookup parsing, because of subtleties of how TermsQueryBuilderTests works. In particular, if it were moved into TermsLookup, then we could emit a deprecation warning but fail parsing in testUnknownFields, resulting in an unexpected warning that is never covered by assertWarnings.
  • We could introduce a more complex structure where _doc is used for serialization, but we separately track that the lookup is 'typeless', but I think this results in more complex code.
  • I'm not actually able to replicate a situation where a terms lookup query is serialized to a 6.x node, because I think the coordinating node always fetches the document and rewrites it to a normal terms query before sending the search request to other nodes. So I'm not sure this case can even be hit outside of tests, and it seems good to take the option that results in the simplest code.

@cbuescher
Copy link
Member

My intuition is to stick with the current approach

Then I'm fine with that, as far as I understand user will still have the option to use types if anything on their side doesn't work without types in a mixed-cluster environment, they'd only have to live with the deprecation warnings. Is that correct?
In any case, I'd at least like to document this somewhere as a "known issue" (as in: terms lookups don't work in mixed clusters without types). I'm not sure if the documentation is the right place, maybe we have some extra documentation around types removal where we can put this? I don't know if we had a "known issues" document somewhere at some point, I cannot seem to find it but maybe you can check?

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jtibshirani
Copy link
Contributor

as far as I understand user will still have the option to use types if anything on their side doesn't work without types in a mixed-cluster environment, they'd only have to live with the deprecation warnings. Is that correct?

Yes, that's correct.

I can't find a 'known issues' document, but I will create another section under removal_of_types.asciidoc about 6.7 and the mixed cluster set-up. I've made a note on the meta issue to remember to do this.

@jtibshirani
Copy link
Contributor

@elasticmachine run gradle build tests 1

@jtibshirani jtibshirani merged commit ec32e66 into elastic:master Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>deprecation :Search Foundations/Mapping Index mappings, including merging and defining field types v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants