Storing metadata information in GetResult #38373

Merged
merged 23 commits into from May 23, 2019
Changes from 1 commit
Commits
23 commits
e657574
Storing metadata information in DocumentField
sandmannn Feb 4, 2019
3f0bca8
Corrected constructor in test
sandmannn Feb 26, 2019
3233957
Merge remote-tracking branch 'upstream/master' into documentfield
sandmannn Feb 27, 2019
314c58a
Refined ml tests
sandmannn Feb 27, 2019
2abfc80
Doc review1 (#10)
sandmannn Mar 27, 2019
5d80874
Adjusted collection used for iterator (#11)
sandmannn Apr 3, 2019
5f52b9e
Adjusted namings of fields member (#12)
sandmannn Apr 23, 2019
6146d71
Merge branch 'master' into documentfield
sandmannn Apr 29, 2019
9f3308e
Adjusted deserialization of _ignored metafield
sandmannn May 8, 2019
e1f42c6
Fixed merge issues
sandmannn May 8, 2019
748b8fc
Adjusted depending tests in other projects
sandmannn May 8, 2019
f4fabd1
Removed interim hashmap wrappers
sandmannn May 9, 2019
c4d2cdf
Merge branch 'master' into documentfield
sandmannn May 9, 2019
fa60aa0
Adjusted constructor calls in tests
sandmannn May 9, 2019
e1e871e
Removed unused variable
sandmannn May 11, 2019
3d2109a
Changed access level of helper method
sandmannn May 21, 2019
ed39246
Merge remote-tracking branch 'upstream/master' into documentfield
sandmannn May 21, 2019
69db08e
Removed unnecessary code
sandmannn May 21, 2019
c7335c2
disable bwc tests for backport
rjernst May 21, 2019
94c6206
Merge branch 'master' into documentfield
rjernst May 22, 2019
d9b2639
Merge branch 'master' into documentfield
rjernst May 22, 2019
0550d7d
Merge branch 'master' into documentfield
rjernst May 22, 2019
75e2519
update to 7.3.0 constant
rjernst May 22, 2019
@@ -33,6 +33,7 @@
import org.elasticsearch.Version;
import org.elasticsearch.common.document.DocumentField;
import org.elasticsearch.common.lucene.search.Queries;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.fetch.FetchSubPhase;
import org.elasticsearch.search.internal.SearchContext;
@@ -110,7 +111,8 @@ static void innerHitsExecute(Query mainQuery, IndexSearcher indexSearcher, Searc
hit.fields(fields);
}
IntStream slots = convertTopDocsToSlots(topDocs, rootDocsBySlot);
fields.put(fieldName, new DocumentField(fieldName, slots.boxed().collect(Collectors.toList())));
boolean isMetadataField = MapperService.isMetadataField(fieldName);
fields.put(fieldName, new DocumentField(fieldName, slots.boxed().collect(Collectors.toList()), isMetadataField));
}
}
}
@@ -26,7 +26,6 @@
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.index.get.GetResult;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.search.SearchHit;

import java.io.IOException;
@@ -47,14 +46,16 @@
public class DocumentField implements Streamable, ToXContentFragment, Iterable<Object> {

private String name;
private Boolean isMetadataField;
Contributor:
Can this be final? If you are adding a new member variable, you should modify all corresponding methods to incorporate it, such as readFrom, writeTo, equals, hashCode, and toString. For readFrom and writeTo the serialization should be version dependent (an example here).
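A minimal sketch of the version-gated wire serialization being asked for, written as the two Streamable methods inside DocumentField. The Version.V_7_3_0 gate and the MapperService fallback for older nodes are assumptions for illustration (the backport commit mentions updating to the 7.3.0 constant), not the code from this PR:

```java
@Override
public void readFrom(StreamInput in) throws IOException {
    name = in.readString();
    if (in.getVersion().onOrAfter(Version.V_7_3_0)) {
        // Newer nodes send the flag explicitly on the wire.
        isMetadataField = in.readBoolean();
    } else {
        // Older nodes do not know about the flag, so derive it from the field name.
        isMetadataField = MapperService.isMetadataField(name);
    }
    int size = in.readVInt();
    values = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
        values.add(in.readGenericValue());
    }
}

@Override
public void writeTo(StreamOutput out) throws IOException {
    out.writeString(name);
    if (out.getVersion().onOrAfter(Version.V_7_3_0)) {
        // Only include the flag when the receiving node can read it.
        out.writeBoolean(isMetadataField);
    }
    out.writeVInt(values.size());
    for (Object value : values) {
        out.writeGenericValue(value);
    }
}
```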

Contributor:
We need to think more about how to organize the DocumentField class. Usually fromXContent -> toXContent -> fromXContent should produce an equal object (and this is what we test in DocumentFieldTests). The way you have organized the DocumentField class, this does not happen, because toXContent is using isMetadataField and fromXContent is not using it.
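As a concrete illustration of the round trip in question, a hedged sketch of a test in the style of DocumentFieldTests; the method name and setup are illustrative, and it assumes an ESTestCase subclass that provides createParser:

```java
public void testXContentRoundTripKeepsMetadataFlag() throws IOException {
    DocumentField original = new DocumentField("_routing", Collections.singletonList("routing1"), true);

    XContentBuilder builder = XContentFactory.jsonBuilder();
    builder.startObject();
    original.toXContent(builder, ToXContent.EMPTY_PARAMS); // emits "_routing":["routing1"], no flag
    builder.endObject();

    try (XContentParser parser = createParser(builder)) {
        parser.nextToken(); // START_OBJECT
        parser.nextToken(); // FIELD_NAME
        // The parser cannot tell from the json alone whether this was a metadata field;
        // the caller has to supply that context, which is the asymmetry under discussion.
        DocumentField parsed = DocumentField.fromXContent(parser, true);
        assertEquals(original.getName(), parsed.getName());
        assertEquals(original.getValues(), parsed.getValues());
        assertEquals(original.isMetadataField(), parsed.isMetadataField());
    }
}
```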

Contributor:
If it requires a lot of changes to use isMetadataField in toXContent (we need to investigate this first), then we should document why we are not including it in toXContent, and also exclude this field from equals and hashCode, again documenting why isMetadataField does not participate in these functions.
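A hedged sketch of what that exclusion-plus-documentation could look like inside DocumentField (illustrative only, not code from this PR):

```java
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
        return false;
    }
    DocumentField other = (DocumentField) obj;
    // isMetadataField is deliberately left out: it is derived from context (the metadata
    // area of the xContent, or the mapper service), not from the serialized content itself,
    // so including it would break the fromXContent -> toXContent -> fromXContent round trip.
    return Objects.equals(name, other.name) && Objects.equals(values, other.values);
}

@Override
public int hashCode() {
    // isMetadataField intentionally omitted, see equals()
    return Objects.hash(name, values);
}
```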

Contributor Author:
Thanks for the review!

Regarding the consistency of fromXContent -> toXContent -> fromXContent, we had a relevant discussion in the issue description. There was a suggestion to use a different json structure when serializing from new versions, while still being able to deserialize json from old versions, as suggested in #24422 (comment). In that case the json would have carried enough information to make the conversions consistent without context. Yet there was a strong argument to avoid this, as mentioned in #24422 (comment), since changing the json structure may cause issues when it is serialized in one version and parsed in another.

In short, it looks like it is not a lot of changes but a bwc concern that prevents us from storing isMetadataField in toXContent. It would be great if you could share your thoughts on the matter.

Contributor Author:
Here we need to make a trade-off between a) extending the serialized json schema for DocumentField, which means additional coding to guarantee backward compatibility of the serialized content, and b) guessing the meaning of json fields from context, which may introduce bugs if our assumptions turn out to be wrong in some case; for example, it is not completely obvious that this is the right approach here: https://github.com/elastic/elasticsearch/pull/38373/files/314c58a64e5c4f2fb3fc9ec6e3366a0209597dd5#r253675646

What is the process for deciding between a and b here?

Contributor:
@sandmannn Thanks for the clarification. OK, let's leave your code for the json serialization of DocumentField as it is now, but add comments explaining why isMetadataField is a special field and why it does not participate in equals, hashCode, toString, and toXContent.
You still need to modify the readFrom and writeTo methods to include this field depending on the version.
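An illustrative sketch (not part of this diff) of the kind of in-code documentation being requested on the new member; the wording is an assumption:

```java
/**
 * Whether this field is a metadata field.
 *
 * This flag is derived from context (the metadata area of the xContent, or the mapper
 * service) rather than from the serialized json itself, so it is intentionally not written
 * by toXContent and does not participate in equals, hashCode, or toString.
 */
private Boolean isMetadataField;
```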

private List<Object> values;

private DocumentField() {
}

public DocumentField(String name, List<Object> values) {
public DocumentField(String name, List<Object> values, boolean isMetadataField) {
this.name = Objects.requireNonNull(name, "name must not be null");
this.values = Objects.requireNonNull(values, "values must not be null");
this.isMetadataField = isMetadataField;
}

/**
@@ -85,7 +86,7 @@ public List<Object> getValues() {
* @return The field is a metadata field
*/
public boolean isMetadataField() {
return MapperService.isMetadataField(name);
return this.isMetadataField;
}

@Override
@@ -132,7 +133,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
return builder;
}

public static DocumentField fromXContent(XContentParser parser) throws IOException {
public static DocumentField fromXContent(XContentParser parser, boolean inMetadataArea) throws IOException {
Contributor:
inMetadataArea: why not call this parameter isMetadataField, the same as elsewhere?

Contributor Author:
This parameter name is meant to emphasize and make explicit that we decide whether the field is metadata based on whether it is located in the metadata area of the xContent.

I don't really have a strong opinion here; we can replace it with isMetadataField and add some comments, if you prefer that.

Contributor:
Thanks for the explanation. isMetadataField sounds better for consistency with the rest of the code, and making a field part of the metadata is what makes it a metadata field.

ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.currentToken(), parser::getTokenLocation);
String fieldName = parser.currentName();
XContentParser.Token token = parser.nextToken();
@@ -141,7 +142,7 @@ public static DocumentField fromXContent(XContentParser parser) throws IOExcepti
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
values.add(parseFieldsValue(parser));
}
return new DocumentField(fieldName, values);
return new DocumentField(fieldName, values, inMetadataArea);
}

@Override
@@ -338,7 +338,9 @@ public static GetResult fromXContentEmbedded(XContentParser parser, String index
} else if (FOUND.equals(currentFieldName)) {
found = parser.booleanValue();
} else {
fields.put(currentFieldName, new DocumentField(currentFieldName, Collections.singletonList(parser.objectText())));
// This field is in the metadata area of the xContent, thus it should be treated as metadata
fields.put(currentFieldName, new DocumentField(currentFieldName,
Collections.singletonList(parser.objectText()), true));
}
} else if (token == XContentParser.Token.START_OBJECT) {
if (SourceFieldMapper.NAME.equals(currentFieldName)) {
@@ -350,15 +352,15 @@
}
} else if (FIELDS.equals(currentFieldName)) {
while(parser.nextToken() != XContentParser.Token.END_OBJECT) {
DocumentField getField = DocumentField.fromXContent(parser);
DocumentField getField = DocumentField.fromXContent(parser, false);
Contributor Author:
@rjernst This DocumentField is not inside _source, but I assumed that it is not a metadata field based on the way the FIELDS component is filled during serialization: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L277-L282 (see the illustrative sketch after this hunk).

fields.put(getField.getName(), getField);
}
} else {
parser.skipChildren(); // skip potential inner objects for forward compatibility
}
} else if (token == XContentParser.Token.START_ARRAY) {
if (IgnoredFieldMapper.NAME.equals(currentFieldName)) {
fields.put(currentFieldName, new DocumentField(currentFieldName, parser.list()));
fields.put(currentFieldName, new DocumentField(currentFieldName, parser.list(), false));
} else {
parser.skipChildren(); // skip potential inner arrays for forward compatibility
}
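For context on the two areas being parsed above, a hedged sketch of the shape GetResult produces, with a metadata field at the top level and a regular stored field nested under the fields object; the values and the commented output are illustrative, not copied from a real response:

```java
Map<String, DocumentField> fields = new HashMap<>();
fields.put("_routing", new DocumentField("_routing", Collections.singletonList("routing1"), true));
fields.put("title", new DocumentField("title", Collections.singletonList("Book title"), false));

GetResult getResult = new GetResult("index", "type", "id", 0, 1, 1, true,
        new BytesArray("{\"title\":\"Book title\"}"), fields);

// Roughly: {"_index":"index","_type":"type","_id":"id",...,"_routing":"routing1",
//           "found":true,"_source":{...},"fields":{"title":["Book title"]}}
// i.e. "_routing" sits in the metadata area while "title" is nested under "fields".
System.out.println(Strings.toString(getResult));
```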
@@ -202,7 +202,8 @@ private GetResult innerGetLoadFromStoredFields(String type, String id, String[]
fieldVisitor.postProcess(mapperService);
fields = new HashMap<>(fieldVisitor.fields().size());
for (Map.Entry<String, List<Object>> entry : fieldVisitor.fields().entrySet()) {
fields.put(entry.getKey(), new DocumentField(entry.getKey(), entry.getValue()));
boolean isMetadataField = MapperService.isMetadataField(entry.getKey());
Contributor:
Looks like the isMetadataField variable is never used.

Contributor Author:
done

fields.put(entry.getKey(), new DocumentField(entry.getKey(), entry.getValue(), isMetadataField));
}
}
}
@@ -323,7 +323,8 @@ private static Fields generateTermVectorsFromDoc(IndexShard indexShard, TermVect
seenFields.add(field.name());
}
String[] values = doc.getValues(field.name());
documentFields.add(new DocumentField(field.name(), Arrays.asList((Object[]) values)));
boolean isMetadataField = MapperService.isMetadataField(field.name());
documentFields.add(new DocumentField(field.name(), Arrays.asList((Object[]) values), isMetadataField));
}
return generateTermVectors(indexShard,
XContentHelper.convertToMap(parsedDocument.source(), true, request.xContentType()).v2(), documentFields,
6 changes: 3 additions & 3 deletions server/src/main/java/org/elasticsearch/search/SearchHit.java
@@ -799,7 +799,7 @@ private static void declareMetaDataFields(ObjectParser<Map<String, Object>, Void
@SuppressWarnings("unchecked")
Map<String, DocumentField> fieldMap = (Map<String, DocumentField>) map.computeIfAbsent(Fields.FIELDS,
v -> new HashMap<String, DocumentField>());
DocumentField field = new DocumentField(metadatafield, list);
DocumentField field = new DocumentField(metadatafield, list, true);
fieldMap.put(field.getName(), field);
}, (p, c) -> parseFieldsValue(p),
new ParseField(metadatafield));
@@ -809,7 +809,7 @@ private static void declareMetaDataFields(ObjectParser<Map<String, Object>, Void
Map<String, DocumentField> fieldMap = (Map<String, DocumentField>) map.computeIfAbsent(Fields.FIELDS,
v -> new HashMap<String, DocumentField>());
fieldMap.put(field.getName(), field);
}, (p, c) -> new DocumentField(metadatafield, Collections.singletonList(parseFieldsValue(p))),
}, (p, c) -> new DocumentField(metadatafield, Collections.singletonList(parseFieldsValue(p)), true),
new ParseField(metadatafield), ValueType.VALUE);
}
}
@@ -819,7 +819,7 @@ private static void declareMetaDataFields(ObjectParser<Map<String, Object>, Void
private static Map<String, DocumentField> parseFields(XContentParser parser) throws IOException {
Map<String, DocumentField> fields = new HashMap<>();
while (parser.nextToken() != XContentParser.Token.END_OBJECT) {
DocumentField field = DocumentField.fromXContent(parser);
DocumentField field = DocumentField.fromXContent(parser, false);
fields.put(field.getName(), field);
}
return fields;
@@ -243,10 +243,13 @@ private Map<String, DocumentField> getSearchFields(SearchContext context,

if (storedToRequestedFields.containsKey(storedField)) {
for (String requestedField : storedToRequestedFields.get(storedField)) {
searchFields.put(requestedField, new DocumentField(requestedField, storedValues));
boolean isMetadataField = MapperService.isMetadataField(requestedField);

searchFields.put(requestedField, new DocumentField(requestedField, storedValues, isMetadataField));
}
} else {
searchFields.put(storedField, new DocumentField(storedField, storedValues));
boolean isMetadataField = MapperService.isMetadataField(storedField);
searchFields.put(storedField, new DocumentField(storedField, storedValues, isMetadataField));
}
}
return searchFields;
@@ -31,6 +31,7 @@
import org.elasticsearch.index.fielddata.SortedBinaryDocValues;
import org.elasticsearch.index.fielddata.SortedNumericDoubleValues;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.fetch.FetchSubPhase;
@@ -124,7 +125,8 @@ public void hitsExecute(SearchContext context, SearchHit[] hits) throws IOExcept
}
DocumentField hitField = hit.getFields().get(field);
if (hitField == null) {
hitField = new DocumentField(field, new ArrayList<>(2));
boolean isMetadataField = MapperService.isMetadataField(field);
hitField = new DocumentField(field, new ArrayList<>(2), isMetadataField);
hit.getFields().put(field, hitField);
}
final List<Object> values = hitField.getValues();
@@ -23,6 +23,7 @@
import org.apache.lucene.index.ReaderUtil;
import org.elasticsearch.common.document.DocumentField;
import org.elasticsearch.common.util.CollectionUtils;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.script.FieldScript;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.fetch.FetchSubPhase;
Expand Down Expand Up @@ -84,7 +85,8 @@ public void hitsExecute(SearchContext context, SearchHit[] hits) throws IOExcept
} else {
values = Collections.singletonList(value);
}
hitField = new DocumentField(scriptFieldName, values);
boolean isMetadataField = MapperService.isMetadataField(scriptFieldName);
hitField = new DocumentField(scriptFieldName, values, isMetadataField);
hit.getFields().put(scriptFieldName, hitField);
}
}
@@ -68,7 +68,7 @@ protected ExplainResponse createTestInstance() {
0, 1, randomNonNegativeLong(),
true,
RandomObjects.randomSource(random()),
singletonMap(fieldName, new DocumentField(fieldName, values)));
singletonMap(fieldName, new DocumentField(fieldName, values, false)));
return new ExplainResponse(index, type, id, exist, explanation, getResult);
}

@@ -85,7 +85,7 @@ public void testToXContent() throws IOException {
Explanation explanation = Explanation.match(1.0f, "description", Collections.emptySet());
GetResult getResult = new GetResult(null, null, null, 0, 1, -1, true, new BytesArray("{ \"field1\" : " +
"\"value1\", \"field2\":\"value2\"}"), singletonMap("field1", new DocumentField("field1",
singletonList("value1"))));
singletonList("value1"), false)));
ExplainResponse response = new ExplainResponse(index, type, id, exist, explanation, getResult);

XContentBuilder builder = XContentFactory.contentBuilder(XContentType.JSON);
@@ -94,7 +94,7 @@ public void testToXContent() {
{
GetResponse getResponse = new GetResponse(new GetResult("index", "type", "id", 0, 1, 1, true, new BytesArray("{ \"field1\" : " +
"\"value1\", \"field2\":\"value2\"}"), Collections.singletonMap("field1", new DocumentField("field1",
Collections.singletonList("value1")))));
Collections.singletonList("value1"), false))));
String output = Strings.toString(getResponse);
assertEquals("{\"_index\":\"index\",\"_type\":\"type\",\"_id\":\"id\",\"_version\":1,\"_seq_no\":0,\"_primary_term\":1," +
"\"found\":true,\"_source\":{ \"field1\" : \"value1\", \"field2\":\"value2\"},\"fields\":{\"field1\":[\"value1\"]}}",
@@ -110,7 +110,7 @@ public void testToString() {
public void testToString() {
GetResponse getResponse = new GetResponse(new GetResult("index", "type", "id", 0, 1, 1, true,
new BytesArray("{ \"field1\" : " + "\"value1\", \"field2\":\"value2\"}"),
Collections.singletonMap("field1", new DocumentField("field1", Collections.singletonList("value1")))));
Collections.singletonMap("field1", new DocumentField("field1", Collections.singletonList("value1"), false))));
assertEquals("{\"_index\":\"index\",\"_type\":\"type\",\"_id\":\"id\",\"_version\":1,\"_seq_no\":0,\"_primary_term\":1," +
"\"found\":true,\"_source\":{ \"field1\" : \"value1\", \"field2\":\"value2\"},\"fields\":{\"field1\":[\"value1\"]}}",
getResponse.toString());
@@ -104,7 +104,7 @@ void sendExecuteMultiSearch(MultiSearchRequest request, SearchTask task, ActionL
};

SearchHits hits = new SearchHits(new SearchHit[]{new SearchHit(1, "ID", new Text("type"),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(collapseValue))))},
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(collapseValue), false)))},
new TotalHits(1, TotalHits.Relation.EQUAL_TO), 1.0F);
InternalSearchResponse internalSearchResponse = new InternalSearchResponse(hits, null, null, null, false, null, 1);
AtomicReference<SearchResponse> reference = new AtomicReference<>();
@@ -158,9 +158,9 @@ void sendExecuteMultiSearch(MultiSearchRequest request, SearchTask task, ActionL
};

SearchHits hits = new SearchHits(new SearchHit[]{new SearchHit(1, "ID", new Text("type"),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(collapseValue)))),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(collapseValue), false))),
new SearchHit(2, "ID2", new Text("type"),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(collapseValue))))},
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(collapseValue), false)))},
new TotalHits(1, TotalHits.Relation.EQUAL_TO), 1.0F);
InternalSearchResponse internalSearchResponse = new InternalSearchResponse(hits, null, null, null, false, null, 1);
AtomicReference<SearchResponse> reference = new AtomicReference<>();
@@ -190,9 +190,9 @@ void sendExecuteMultiSearch(MultiSearchRequest request, SearchTask task, ActionL
};

SearchHits hits = new SearchHits(new SearchHit[]{new SearchHit(1, "ID", new Text("type"),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(null)))),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(null), false))),
new SearchHit(2, "ID2", new Text("type"),
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(null))))},
Collections.singletonMap("someField", new DocumentField("someField", Collections.singletonList(null), false)))},
new TotalHits(1, TotalHits.Relation.EQUAL_TO), 1.0F);
InternalSearchResponse internalSearchResponse = new InternalSearchResponse(hits, null, null, null, false, null, 1);
AtomicReference<SearchResponse> reference = new AtomicReference<>();
@@ -560,7 +560,7 @@ public void testRoutingExtraction() throws Exception {
assertNull(UpdateHelper.calculateRouting(getResult, indexRequest));

Map<String, DocumentField> fields = new HashMap<>();
fields.put("_routing", new DocumentField("_routing", Collections.singletonList("routing1")));
fields.put("_routing", new DocumentField("_routing", Collections.singletonList("routing1"), false));

// Doc exists and has the parent and routing fields
getResult = new GetResult("test", "type", "1", 0, 1, 0, true, null, fields);
@@ -69,8 +69,8 @@ public void testToXContent() throws IOException {
{
BytesReference source = new BytesArray("{\"title\":\"Book title\",\"isbn\":\"ABC-123\"}");
Map<String, DocumentField> fields = new HashMap<>();
fields.put("title", new DocumentField("title", Collections.singletonList("Book title")));
fields.put("isbn", new DocumentField("isbn", Collections.singletonList("ABC-123")));
fields.put("title", new DocumentField("title", Collections.singletonList("Book title"), false));
fields.put("isbn", new DocumentField("isbn", Collections.singletonList("ABC-123"), false));

UpdateResponse updateResponse = new UpdateResponse(new ReplicationResponse.ShardInfo(3, 2),
new ShardId("books", "books_uuid", 2), "book", "1", 7, 17, 2, UPDATED);