Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search hit refactoring #41656

Merged
merged 7 commits into from Mar 30, 2020
Merged

Conversation

sandmannn
Copy link
Contributor

@sandmannn sandmannn commented Apr 29, 2019

This is the second PR to deal with the issue discussed in #24422. In the first PR #38373 the GetResult is adjusted while here we concentrate on SearchHit

@colings86 colings86 added the :Search/Search Search-related issues that do not fall into other categories label Apr 30, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@sandmannn sandmannn closed this May 26, 2019
@sandmannn sandmannn deleted the search-hit-pr branch May 26, 2019 21:01
@sandmannn sandmannn restored the search-hit-pr branch May 26, 2019 21:01
@sandmannn sandmannn reopened this Jun 5, 2019
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing this. I have a few comments. The most important one is to ensure we don't break the json response structure.

} else if (size == 1) {
DocumentField hitField = DocumentField.readDocumentField(in);
fields = singletonMap(hitField.getName(), hitField);
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 7.3.0, since that is currently 7.x, where this would be backported to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the last rebasing I adapted to 8.0.0 again; @mayya-sharipova should I maybe use an earlier version instead, e.g. 7.7.0?

}
}

private void splitFieldsByMetadata(Map<String, DocumentField> fields, Map<String, DocumentField> outOther,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outOther -> outDocument? to parallel documentFields that it writes out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

public void fields(Map<String, DocumentField> fields) {
this.fields = fields;
this.metaFields.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clear? We should just set to maps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a way to make sure that the new content coming from fields map does not get mixed with old content of the maps. If I got you right, this will be replaced with creation of 2 new empty hash maps here.

this.fields = fields;
this.metaFields.clear();
this.documentFields.clear();
if (fields == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While null is allowed right now, I don't think we ever actually pass it. I see 3 non-test uses of this method, all of which pass a non-null map. I think you can add a requiresNonNull here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added Objects.requireNonNull

@@ -624,9 +675,16 @@ public XContentBuilder toInnerXContent(XContentBuilder builder, Params params) t
if (source != null) {
XContentHelper.writeRawField(SourceFieldMapper.NAME, source, builder, params);
}
if (!otherFields.isEmpty()) {
builder.startObject(Fields.FIELDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't break the json response like this. The union of the document and metadata fields should still exist under "fields".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected. Looks like the metadata fields were just on the root object, while only document fields were under fields. Corrected to what it has been before.

It is slightly concerning that we have a dependency to static MapperService.getAllMetaFields() in the json deserialization part, which contradicts our strategic plan of separating the metadata and document fields in different maps on SearchHit level. This is not the objective of this PR, but not sure how can we deal with this in future

@sandmannn sandmannn force-pushed the search-hit-pr branch 2 times, most recently from 269f3b8 to 8745b55 Compare June 10, 2019 13:10
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed you had updated this PR. In the future, it is best to ping when there are more updates to review, as it is difficult to notice status updates on a PR vs watching for pings.

Are you still interested in getting this in? i left a few comments.

@@ -543,6 +605,8 @@ public void setInnerHits(Map<String, SearchHits> innerHits) {
static final String _PRIMARY_TERM = "_primary_term";
static final String _SCORE = "_score";
static final String FIELDS = "fields";
static final String DOCUMENT_FIELDS = "document_fields";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the original issue, modifying the json output structure is not something we want to do, since it would be a breaking change and affect basically every user of elasticsearch. We should continue writing to xcontent in the old structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @rjernst , I think the issue I struggled with last time and tried to describe in #41656 (comment) is that it is difficult to discern metafields from regular fields during deserialization if we store both of them under the same key "fields". Then the only source of information would be MapperService.getAllMetaFields() which we are trying to refactor away. Do you see any other way to separate those groups of fields during deserialization if json structure remains the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are 2 cases here: rest (json) responses, and internal node to node communication. The latter is handled by the stream input/output version checks.

For rest/json response, we currently recommend users upgrade Elasticsearch before upgrading the HLRC, which means the rest layer here must stay the same for compatibility with the older client during this upgrade period.

Only document fields are output within the "fields" object. Meta fields are output at the root level (the same level as "fields"). We should be able to catch the rest of the special fields we did not parse specially (ie if a field wasn't parsed explicitly like _id, then it is a meta field). Currently we register each meta field in the object parser. Instead, we should now be using an UnknownFieldConsumer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rjernst, I was reading through code again - looks like that concern of mine is no big issue after all.

I think there are 2 cases here: rest (json) responses, and internal node to node communication. The latter is handled by the stream input/output version

agree. The node to node communication was already addressed before in this PR, we are only discussing json responses here

For rest/json response, we currently recommend users upgrade Elasticsearch before upgrading the HLRC, which means the rest layer here must stay the same for compatibility with the older client during this upgrade period.

I previously though it would be difficult to have it compatible, however after having another look, I don't see problems with that. Currently this PR should not be introducing breaking changes to rest/json api.

Only document fields are output within the "fields" object.

Is preserved.

Meta fields are output at the root level (the same level as "fields").

Is preserved as well.

We should be able to catch the rest of the special fields we did not parse specially (ie if a field wasn't parsed explicitly like _id, then it is a meta field).

Exactly this is happening in the declareMetaDataFields method which already existed before this PR. Currently this method is mainly remaining intact in the PR, I only change where in the object parser map does it store the metafields

Currently we register each meta field in the object parser. Instead, we should now be using an UnknownFieldConsumer.

This is the only part I have questions about. Looks like declareMetaDataFields is doing the job already, why should we move from object parser to UnknownFieldConsumer?

Also addressed part of your comments in the last commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova I was referring to the last question here.

} else if (size == 1) {
DocumentField hitField = DocumentField.readDocumentField(in);
fields = singletonMap(hitField.getName(), hitField);
if (in.getVersion().onOrAfter(Version.V_7_3_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the corresponding read will need to be updated to 7.4 now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to Version.V_8_0_0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova could you please confirm that Version.V_8_0_0 should be used for this PR (and not e.g. some 7.x.x. version)

@@ -264,6 +271,44 @@ public void writeTo(StreamOutput out) throws IOException {
}
}

private Map<String, DocumentField> readFields(StreamInput in) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading and writing the document fields directly shoudln't be necessary. Instead we can use in.readMap(StreamInput::readString, DocumentField::new) and for writing out.writeMap(map, StreamOutput::writeString, DocumentField::writeTo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added readMap/writeMap for communication with server versions 8.0.0+; for communication with the older versions the old serialization using explicit DocumentField parsing remains as before to avoid potential backward compatibility bugs.

return fields.values().iterator();
// need to join the fields and metadata fields
Map<String, DocumentField> allFields = this.getFields();
if (allFields == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can never be null, right? We always create a new map in getFields() currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

Map<String, DocumentField> searchFields = getSearchFields(context, fieldsVisitor, subDocId,
storedToRequestedFields, subReaderContext);

Map<String, DocumentField> metaFields = new HashMap<>(),
documentFields = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please use a separate line for declaration of documentFields, no need for a continuation of the type.

@mayya-sharipova
Copy link
Contributor

@sandmannn Sorry that it has been long since we were doing review of this PR. We would you like to continue to review it and drive it to the end in earnest.
Would you like to continue working on it and resolve the conflicts? Or would you like to me to continue working on it?

@sandmannn
Copy link
Contributor Author

@mayya-sharipova nice to hear from you. I can work on it during the next 2 weeks. I plan to have limited availability after that, so it is reasonable to make a handover after that.

For now, I will concentrate on addressing the merge conflicts and getting the tests back to green. In addition, there is one open question from the previous review cycle regarding usage of UnknownFieldConsumer- I will tag you there, maybe you can add a comment while I am dealing with merge conflicts

@sandmannn sandmannn force-pushed the search-hit-pr branch 2 times, most recently from 8745b55 to 77c4daf Compare March 4, 2020 23:27
@sandmannn
Copy link
Contributor Author

sandmannn commented Mar 5, 2020

@mayya-sharipova to my current understanding, the PR is rebased on master and does not provide any serialization inconsistency neither with JSON nor in binary. Yet the question of migration from explicitly enumerated metafields to using UnknownFieldConsumer is still open - looks like it may be a quite complicated step that may be better addressable in a separate PR as this one is getting large. We need to make the decision about the scope of PR and for that I need input from you.

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandmannn Thanks for rebasing your PR. The PR looks overall very good to me. I left some comments

About using UnknownFieldConsumer instead of declareMetaDataFields it depends on you if you want to tackle this in this PR or you would like to leave it to a new PR.
I don't expect many changes for using UnknownFieldConsumer. The code should be something:

private static ObjectParser.UnknownFieldConsumer<Map<String, Object>> unknownMetaFieldConsumer = (map, fieldName, fieldValue) -> {
        Map<String, DocumentField> fieldMap = (Map<String, DocumentField>) map.computeIfAbsent(METADATA_FIELDS, v -> new HashMap<String, DocumentField>());
        fieldMap.put(fieldName, new DocumentField(fieldName, Collections.singletonList(fieldValue)));
    };

private static final ObjectParser<Map<String, Object>, Void> MAP_PARSER = new ObjectParser<>("innerHitParser", unknownMetaFieldConsumer,
        HashMap::new); 

But we can also leave this to a new PR. In this case, can you please put TODO comment to replace declareMetaDataFields with UnknownFieldConsumer.

SearchHit searchHit = new SearchHit(docId, fieldsVisitor.id(), searchFields);
Map<String, DocumentField> metaFields = new HashMap<>();
Map<String, DocumentField> documentFields = new HashMap<>();
splitSearchHitFields(searchFields, documentFields, metaFields);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use SearchHit::splitFieldsByMetadata instead of introducing a new function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sandmannn
Copy link
Contributor Author

But we can also leave this to a new PR. In this case, can you please put TODO comment to replace declareMetaDataFields with UnknownFieldConsumer.

TODO added.

@mayya-sharipova
Copy link
Contributor

@elasticmachine test this please

@mayya-sharipova
Copy link
Contributor

@sandmannn Thanks for your changes, can you please address failing tests.

@sandmannn
Copy link
Contributor Author

@sandmannn Thanks for your changes, can you please address failing tests.

I am on it, will need some time.

@sandmannn
Copy link
Contributor Author

sandmannn commented Mar 25, 2020

@sandmannn Thanks for your changes, can you please address failing tests.

I am on it, will need some time.

adjusted the way new DocumentFields were added to an existing SearchHit instance which was previously causing the integration test to fail; @mayya-sharipova please retrigger the CI job

@mayya-sharipova
Copy link
Contributor

@elasticmachine test this please

@mayya-sharipova
Copy link
Contributor

@sandmannn Congratulations, your PR has been merger to master! And it looks like we have not broken anything!
Would you like to create a backport PR against 7.x? There should not be much changes from this PR -- mainly to ensure proper index versions while reading and writing to stream?

@sandmannn
Copy link
Contributor Author

sandmannn commented Mar 31, 2020

@sandmannn Congratulations, your PR has been merger to master! And it looks like we have not broken anything!
Would you like to create a backport PR against 7.x? There should not be much changes from this PR -- mainly to ensure proper index versions while reading and writing to stream?

Hi @mayya-sharipova as it should mainly consist of downporting the changes to another branch, I think I can do it but please expect some delay since, as we discussed above, I will have less opportunity to contribute starting from end of March/beginning of April due to some extra load related to my employment.

I guess I should open a new PR against https://github.com/elastic/elasticsearch/tree/7.7 ?

@mayya-sharipova
Copy link
Contributor

@sandmannn Thank you. Please let me know if you don't have time to do this, I can do the backport myself.
If you have some time, then please open the PR against 7.x

@mayya-sharipova
Copy link
Contributor

@sandmannn No worries about backport, I will do it myself tomorrow. We want backports to happen fast.

@sandmannn
Copy link
Contributor Author

@mayya-sharipova thanks for taking it over; I doubt I would have found time for it today.

mayya-sharipova pushed a commit to mayya-sharipova/elasticsearch that referenced this pull request Apr 1, 2020
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue elastic#24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: elastic#38373
Relates to issue elastic#24422
mayya-sharipova added a commit that referenced this pull request Apr 1, 2020
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: #38373
Relates to issue #24422

Co-authored-by: sandmannn <bohdanpukalskyi@gmail.com>
mayya-sharipova added a commit that referenced this pull request Apr 1, 2020
Charge wire version for SearchHit fields to 7.8

related to #41656, #54584
yyff pushed a commit to yyff/elasticsearch that referenced this pull request Apr 17, 2020
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue elastic#24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: elastic#38373
Relates to issue elastic#24422
yyff pushed a commit to yyff/elasticsearch that referenced this pull request Apr 17, 2020
Charge wire version for SearchHit fields to 7.8

related to elastic#41656, elastic#54584
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 14, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

This PR also refactors DocumentField to add a new parameter
to the constructor of DocumentField isMetadataField
which describes whether this is metadata field or not.
This refactoring is necessary as there is no more statiic
method to check if a field is meta-field, but DocumentField
objects are always created from the contexts  where this
information is available, and will be passed to DocumentField
constructor for further usage.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 14, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

This PR also refactors DocumentField to add a new parameter
to the constructor of DocumentField isMetadataField
which describes whether this is metadata field or not.
This refactoring is necessary as there is no more statiic
method to check if a field is meta-field, but DocumentField
objects are always created from the contexts  where this
information is available, and will be passed to DocumentField
constructor for further usage.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 15, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

This PR also refactors DocumentField to add a new parameter
to the constructor of DocumentField isMetadataField
which describes whether this is metadata field or not.
This refactoring is necessary as there is no more statiic
method to check if a field is meta-field, but DocumentField
objects are always created from the contexts  where this
information is available, and will be passed to DocumentField
constructor for further usage.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 29, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 29, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit that referenced this pull request Jun 5, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jun 5, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit that referenced this pull request Jun 8, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jun 23, 2020
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in elastic#41656.
jimczi added a commit that referenced this pull request Jun 25, 2020
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
jimczi added a commit that referenced this pull request Jun 25, 2020
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
jimczi added a commit that referenced this pull request Jun 25, 2020
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories v7.8.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants