Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing metadata information in GetResult #38373

Merged
merged 23 commits into from May 23, 2019

Conversation

Projects
None yet
7 participants
@sandmannn
Copy link
Contributor

commented Feb 4, 2019

This PR adds a new parameter to the constructor of document field isMetadataField which describes whether the DocumentField in question is metadata field or not. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.

@@ -350,15 +352,15 @@ public static GetResult fromXContentEmbedded(XContentParser parser, String index
}
} else if (FIELDS.equals(currentFieldName)) {
while(parser.nextToken() != XContentParser.Token.END_OBJECT) {
DocumentField getField = DocumentField.fromXContent(parser);
DocumentField getField = DocumentField.fromXContent(parser, false);

This comment has been minimized.

Copy link
@sandmannn

sandmannn Feb 4, 2019

Author Contributor

@rjernst This DocumentField is not inside of _source, but I assumed that it is not a metadata field based on the way the FIELDS component is filled during serialization https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L277-L282

@elasticmachine

This comment has been minimized.

Copy link

commented Feb 5, 2019

@mayya-sharipova

This comment has been minimized.

Copy link
Contributor

commented Feb 15, 2019

@sandmannn Thank you for your PR. Can you please sing CLA, so after that we can start CI test for this PR.

@sandmannn

This comment has been minimized.

Copy link
Contributor Author

commented Feb 15, 2019

Hi, I actually signed it several times, but the check is still red. Should I provide a Transaction ID?

@javanna

This comment has been minimized.

Copy link
Member

commented Feb 25, 2019

sorry @sandmannn it looks like we had some technical problems at the time that you signed the CLA, we apologize. Would you mind signing it once again? Hopefully this time it will work.

@sandmannn

This comment has been minimized.

Copy link
Contributor Author

commented Feb 25, 2019

Hi @javanna , I just did it again. looks like that check is passed now.

@mayya-sharipova

This comment has been minimized.

Copy link
Contributor

commented Feb 25, 2019

@elasticmachine test this please

@elasticcla

This comment has been minimized.

Copy link

commented Feb 26, 2019

Hi @sandmannn, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@sandmannn sandmannn force-pushed the sandmannn:documentfield branch from 88c2a71 to e657574 Feb 26, 2019

@mayya-sharipova

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

@elasticmachine test this please

1 similar comment
@mayya-sharipova

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

@elasticmachine test this please

@mayya-sharipova

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

@elasticmachine test this please

@mayya-sharipova mayya-sharipova self-requested a review Feb 28, 2019

@@ -47,14 +46,16 @@
public class DocumentField implements Streamable, ToXContentFragment, Iterable<Object> {

private String name;
private Boolean isMetadataField;

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova Feb 28, 2019

Contributor

can this be final? If you are adding a new member variable, you should modify all corresponding methods to incorporate it, methods such as readFrom, writeTo, equals, hashCode, toString. For readFrom and writeTo the serialization should be a version dependent (an example here)

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova Feb 28, 2019

Contributor

We need to think more how to organize DocumentField class. Usually fromXContent -> toXContent ->fromXContent should produce an equal object (and this is what we test in DocumentFieldTests). The way how you organized a DocumentField class, it doesn't happen as toXContent is using isMetadataField, and fromXContent is not using it.

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova Feb 28, 2019

Contributor

if it requires a lot of changes to use isMetadataField in toXContent (we need to investigate this first), then we should document why we are not including it in toXContent, and also exclude this field from equals, hashCode - also documenting why isMetadataField is not participating in these functions

This comment has been minimized.

Copy link
@sandmannn

sandmannn Mar 6, 2019

Author Contributor

Thanks for the review!

Regarding consistency of fromXContent -> toXContent ->fromXContent we had a relevant discussion in the issue description. There was a suggestion of using different json structures depending when serializing from new versions, while still being able to deserialize json from old versions, as suggested in #24422 (comment) . In such case it would have been possible to have enough information in the json to be able to make the conversions consistent without context. Yet there was a strong argument to avoid as mentioned here #24422 (comment)
as changing such json structure may result in issues when serializing it in one version and parsing in another.

In short, looks like it is not a lot of changes, but a bwc concern that prevents us from storing the isMetadataField in toXContent. It would be great if you share your thoughts on the matter.

This comment has been minimized.

Copy link
@sandmannn

sandmannn Mar 6, 2019

Author Contributor

here we need to make a trade-off between a)extending serialized json schema for document field, additional coding to guarantee backward compatibility for serialized content and b)guessing the meaning of json fields depending on the context, which may potentially introduce some bugs if our assumptions are incorrect in some case, e.g. it is not completely obvious if it is a right approach here https://github.com/elastic/elasticsearch/pull/38373/files/314c58a64e5c4f2fb3fc9ec6e3366a0209597dd5#r253675646

what is the process for deciding between a and b here?

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova Mar 7, 2019

Contributor

@sandmannn Thanks for clarification. Ok, lets leave your code for json serialization for DocumentField at it is now, but just put comments why isMetadataField is a special field, why it is not participating in equals, hashCode, toString, and toXContent.
You still need to modify readFrom and writeTo methods to include this field depending on the version.

@mayya-sharipova
Copy link
Contributor

left a comment

@sandmannn thanks for your PR, here is the 1st round of requested changes.

@@ -132,7 +133,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
return builder;
}

public static DocumentField fromXContent(XContentParser parser) throws IOException {
public static DocumentField fromXContent(XContentParser parser, boolean inMetadataArea) throws IOException {

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova Feb 28, 2019

Contributor

inMetadataArea why not call this parameter the same isMetadataField?

This comment has been minimized.

Copy link
@sandmannn

sandmannn Mar 6, 2019

Author Contributor

this parameter naming follows the intent to emphasize and make it explicit that we decide whether the field is metadata or not depending on whether it is located in metadata area of xcontent or not.

I don't really have a strong opinion here, we can replace it with isMetadataField and add some comments, if you prefer that.

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova Mar 7, 2019

Contributor

thanks for the explanation, isMetadataField sounds better for consistency with the rest of the code. And making a field being a part of medata makes it a metadata field.

@rjernst
Copy link
Member

left a comment

After looking through this in more detail, I wonder if DocumentField should not have any isMetadataField() method at all. Currently we determine whether a field is a meta field or not through this static list in the MapperService. Yet we want this to be through the registration of the underlying Mapper (which we already have, it is just not exposed to most parts of the code). DocumentField itself is a POJO of sorts, having no knowledge of the node it is being serialized/deserialized on. Instead of trying to force this information into the DocumentField, I think we can modify the two callers of DocumentField.isMetadataField() to look elsewhere (ie the MapperService) to determine whether a field is a meta field or not. This would have the same underlying effect: the node the DocumentField is being used on would control where in the structure the field ends up. But this is no worse than the node level static list we have today, and matches with the node level registration we have.

@sandmannn

This comment has been minimized.

Copy link
Contributor Author

commented Mar 10, 2019

Thanks for the suggestion @rjernst , it might result in a cleaner solution. Yet currently it returns us to the old question from the issue discussion #24422 (comment) i.e. what can be the right way of getting the right instance of MapperService in the xContent serialization functions of GetResult and SearchHit where we need this information. Is there any way to directly access the current Node and/or MapperService ? Otherwise we may need to inject MapperService in all constructors of GetResult and SearchHit, which is also quite messy.

@rjernst

This comment has been minimized.

Copy link
Member

commented Mar 13, 2019

@sandmannn You are right my suggestion raises other concerns. I looked over this with @mayya-sharipova and we came up with a slightly different approach I think will be much cleaner. The idea is to change the places storing DocumentField to have two maps, one for fields and one for metadata fields. In all the DocumentField construction sites you changed here, the object is placed into a Map which later gets read when serializing xcontent in GetResult and SearchHit. By those two classes using two maps, the call sites you changed here can decide which map to place the newly constructed DocumentField into, rather than stashing this knowledge inside the DocumentField. I don't think it would be a very large change to this PR, but if it turns out to be, adding the new map could be a done as a first isolated PR.

@sandmannn

This comment has been minimized.

Copy link
Contributor Author

commented Mar 14, 2019

Thanks for aligning @rjernst , @mayya-sharipova ! Just to make sure I got the idea right, let me paraphrase it. Lets focus for a moment at GetResult as it already has a construct that looks somewhat similar. When we are serializing GetResult we are already separating the metadata fields from other fields (currently not in maps, but in lists) here https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L254-L257

If I understanding the suggestion correctly, the idea is to do this separation even earlier, i.e. when we are receiving the fields in the constructor, we would separate them into something like this.metadataFieldsMap and this.otherFieldsMap instead of filling the this.fields directly
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L93-L95

In such case, I think one piece is still not completely clear for me: how will we separate between metadata and non-metadata fields in constructor of GetResult? Using the same field.isMetadataField() (which I believe is not our intent) or via something else? Once we have this separating information I think we will be able to keep it during serialization/deserialization, but getting this separation in the first place may be a bit tricky.

@rjernst

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

We wouldn't be doing the split within the GetResult constructor. It would be at the locations calling the constructor. In the case of GetResult, this is deserialization, and ShardGetService. For the first, we need to update the serialization format for GetResult to store two separate maps, and in deserialization we would use the existing static method (it will have to remain in the code until we no longer support rolling upgrades to whichever version this is added in, ie 8.0). For ShardGetService, we can have the MapperService.

Doc review1 (#10)
* Added version dependent binary stream serialization

* Added comments for not used functions

* Updated constructor of GetResult

* Removed old changes in documentField

* clean tests

* Clean up document field changes

* more cleanup

* Addjusted dependent tests for GetResult

* Minor cleanup
@@ -76,14 +76,15 @@ public void testToXContent() throws IOException {
{
GetResult getResult = new GetResult("index", "type", "id", 0, 1, 1, true, new BytesArray("{ \"field1\" : " +
"\"value1\", \"field2\":\"value2\"}"), singletonMap("field1", new DocumentField("field1",
singletonList("value1"))));
singletonList("value1"))),singletonMap("field1", new DocumentField("metafield",

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova May 9, 2019

Contributor

nit. space before singletonMap

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 11, 2019

Author Contributor

done

return fields;
}

public static void splitFieldsByMetadata(Map<String, DocumentField> fields, Map<String, DocumentField> outOther,

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova May 9, 2019

Contributor

should be drop public modifier here?

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 11, 2019

Author Contributor

Unfortunately this method is required in tests e.g. https://github.com/elastic/elasticsearch/pull/38373/files#diff-219c73a7a0b84de3e96bdda87ec92270R218 when we are building random getResult using the builders of random DocumentFields and need to understand where to put each of the randomly generated fields. Not sure how to go around it.

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova May 21, 2019

Contributor

@sandmannn removing public modifier will make this method package-private which will still allow it to be used in GetResultTests class.

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 21, 2019

Author Contributor

thanks, done

@mayya-sharipova
Copy link
Contributor

left a comment

Thanks @sandmannn.
I have left some small comments, and provided they are addressed, and the build is green, this PR looks good to me.

@rjernst
Copy link
Member

left a comment

Thank you for persevering on this issue @sandmannn! I just have a few more minor comments, which should be easy changes.

if (fields == null) {
// need to join the fields and metadata fields
Map<String, DocumentField> allFields = this.getFields();
if (allFields == null) {

This comment has been minimized.

Copy link
@rjernst

rjernst May 21, 2019

Member

I dont' think this is possible? getFields() never returns null

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 21, 2019

Author Contributor

done

}
}
for (DocumentField field : metaFields) {
for (DocumentField field : metaFields.values()) {

This comment has been minimized.

Copy link
@rjernst

rjernst May 21, 2019

Member

Using .values() can result in changes in ordering. However, it looks like we already had this problem. I opened #42307 to followup.

int size = in.readVInt();
if (size == 0) {
fields = emptyMap();
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {

This comment has been minimized.

Copy link
@rjernst

rjernst May 21, 2019

Member

I think this should be 7.3.0, since we intend to backport to 7.x?

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 21, 2019

Author Contributor

at the moment of writing this, the latest among 7.x.x is 7.2.0 : https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/Version.java#L110
Should we add 7.3.0 to that file?

This comment has been minimized.

Copy link
@rjernst

rjernst May 21, 2019

Member

Sorry I mispoke. 7.2.0 is good.

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 21, 2019

Author Contributor

hm, what is the regular sequence of actions to create a downport here?

As expected, without adjustments in serialization for 7.2 the tests for the mixed cluster fail. Should I open a similar downport PR against https://github.com/elastic/elasticsearch/tree/7.x, get it merged, and then expect the build here to be green?

This comment has been minimized.

Copy link
@rjernst

rjernst May 21, 2019

Member

We handle this by temporarily disabling bwc tests. I just pushed the disabling to your branch. Tomorrow I will test the bwc behavior locally before pushing.

This comment has been minimized.

Copy link
@mayya-sharipova

mayya-sharipova May 22, 2019

Contributor

@rjernst Looks like 7.2 has already passed feature freeze. Should this PR target 7.3 and master?

This comment has been minimized.

Copy link
@rjernst

rjernst May 22, 2019

Member

Yes, 7.2 was branched, but the 7.3 constant hasn't been pushed yet. I will update this branch today once the constant exists.

@@ -201,9 +202,14 @@ private GetResult innerGetLoadFromStoredFields(String type, String id, String[]

if (!fieldVisitor.fields().isEmpty()) {
fieldVisitor.postProcess(mapperService);
fields = new HashMap<>(fieldVisitor.fields().size());
nonMetaDataFields = new HashMap<>();

This comment has been minimized.

Copy link
@rjernst

rjernst May 21, 2019

Member

we should use documentFields for consistency

This comment has been minimized.

Copy link
@sandmannn

sandmannn May 21, 2019

Author Contributor

done

sandmannn and others added some commits May 21, 2019

@rjernst

This comment has been minimized.

Copy link
Member

commented May 22, 2019

I've tested backcompat after locally backporting and it looks good. I'll merge this once CI completes.

@rjernst rjernst merged commit 8177e71 into elastic:master May 23, 2019

8 checks passed

CLA All commits in pull request signed
Details
elasticsearch-ci/1 Build finished.
Details
elasticsearch-ci/2 Build finished.
Details
elasticsearch-ci/bwc Build finished.
Details
elasticsearch-ci/default-distro Build finished.
Details
elasticsearch-ci/docbldesx Build finished.
Details
elasticsearch-ci/oss-distro-docs Build finished.
Details
elasticsearch-ci/packaging-sample Build finished.
Details
@rjernst

This comment has been minimized.

Copy link
Member

commented May 23, 2019

Thanks for pushing through on this @sandmannn!!

rjernst added a commit to rjernst/elasticsearch that referenced this pull request May 23, 2019

Split document and metadata fields in GetResult (elastic#38373)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in elastic#24422.
@sandmannn

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

thanks for merging! Now what is a process about the actual downport? Do we need a new PR against 7.x?

@sandmannn sandmannn deleted the sandmannn:documentfield branch May 23, 2019

rjernst added a commit that referenced this pull request May 23, 2019

Split document and metadata fields in GetResult (#38373) (#42456)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.

rjernst added a commit to rjernst/elasticsearch that referenced this pull request May 23, 2019

Reenable bwc tests
This commit reenables bwc tests now that the backport of elastic#38373 is
complete.

@rjernst rjernst referenced this pull request May 23, 2019

Merged

Reenable bwc tests #42478

rjernst added a commit that referenced this pull request May 23, 2019

Reenable bwc tests (#42478)
This commit reenables bwc tests now that the backport of #38373 is
complete.

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019

Split document and metadata fields in GetResult (elastic#38373)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in elastic#24422.

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019

Reenable bwc tests (elastic#42478)
This commit reenables bwc tests now that the backport of elastic#38373 is
complete.
@rjernst

This comment has been minimized.

Copy link
Member

commented May 31, 2019

@sandmannn I have already backported, see a49bafc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.