New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ignored field values to synthetic source #107567
Conversation
Hi @kkrik-es, I've created a changelog YAML for you. |
…o fix/synthetic-source/object
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, i have left some suggestions. One note i have is that it feels kind of incidental that IgnoredValuesFieldMapper
is actually a mapper. I wonder if we can implement it as some kind of helper instead and it would be simpler. I don't have a good suggestion here though.
...pi-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
# Conflicts: # server/src/main/java/module-info.java # server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification
@@ -345,6 +364,17 @@ public final boolean addDynamicMapper(Mapper mapper) { | |||
int additionalFieldsToAdd = getNewFieldsSize() + mapperSize; | |||
if (indexSettings().isIgnoreDynamicFieldsBeyondLimit()) { | |||
if (mappingLookup.exceedsLimit(indexSettings().getMappingTotalFieldsLimit(), additionalFieldsToAdd)) { | |||
if (indexSettings().getMode().isSyntheticSourceEnabled() || SourceFieldMapper.isSynthetic(mappingLookup)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea to only apply ignored values for when maximum number of fields have been exceeded in this pr? Or also doing this for when for example object field has been disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we add more cases incrementally, to keep this one short.
@@ -96,6 +96,14 @@ private static SourceFieldMapper toType(FieldMapper in) { | |||
return (SourceFieldMapper) in; | |||
} | |||
|
|||
public static boolean isSynthetic(MappingLookup mappingLookup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed? I think mappingLookup.isSourceSynthetic()
can used instead?
server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java
Outdated
Show resolved
Hide resolved
new IgnoredValuesFieldMapper.Values(mapper.name(), parentOffset, FieldDataParseHelper.encodeToken(parser())) | ||
); | ||
} catch (IOException e) { | ||
throw new IllegalArgumentException("failed to parse field [" + mapper.name() + " ]", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to throwing an exception here, fyi.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public void postParse(DocumentParserContext context) { | ||
for (Values values : context.getIgnoredFieldValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add an assert here that check that if synthetic source is disabled then there are no ignored field values?
Something like context.mappingLookup().isSourceSynthetic() || (context.mappingLookup().isSourceSynthetic() == false && context.getIgnoredFieldValues().isEmpty())
* This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating | ||
* if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable. | ||
*/ | ||
public class IgnoredValuesFieldMapper extends MetadataFieldMapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the purpose of the class is to store field values that would be ignored if synthetic source is enabled, maybe IgnoredSyntheticSourceValues
is a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignored fields and values are only used for synthetic source, by definition. For instance, IgnoreMalformedStoredValues
and IgnoredFieldMapper
don't mention synthetic source, even though they're tied to it. I'd say we leave it as is, for simplicity and consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IgnoredFieldMapper
can be used outside the context of synthetic source. Also when source is stored.
This enumerates the fields that have not been indexed, but are available in the source.
IgnoreMalformedStoredValues
isn't a field mapper, but more of a helper class to deal reading/writing malformed field values. It is only used in the context of synthetic source. I think MalformedSyntheticSourceValues
is a better name for this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that field mappers can generally support synthetic source. Adding that to the class name feels leaky; how the values get used is up to the callers of this class. I'm also not a big fan of longer names unless they really help with disambiguation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another attempt :), what about IgnoredSourceFieldMapper
? It is shorter than the other name proposal, and ties it to source, this field mapper keeps track of pieces of the _source that ended being ignored.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java
Show resolved
Hide resolved
} | ||
|
||
public void trackObjectsWithIgnoredFields() { | ||
if (values == null || values.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just use null
for signaling that nothing needs to happen? That way the values
field doesn't need to be initialized with en empty list and there is nu need to set values
to null here at line 158?
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
…o fix/synthetic-source/object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating here. I think it is getting close.
import java.nio.charset.StandardCharsets; | ||
import java.util.Arrays; | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FieldData part of the name of this class is confusing with the field data abstraction that was use in search for scripting, sorting and aggregations. Maybe XContentDataHelper
is a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think we should make this class package protected. Its only users are in the org.elasticsearch.index.mapper
package.
byte[] nameBytes = values.name.getBytes(StandardCharsets.UTF_8); | ||
byte[] bytes = new byte[4 + nameBytes.length + values.value.length]; | ||
ByteUtils.writeIntLE(values.name.length() + PARENT_OFFSET_IN_NAME_OFFSET * values.parentOffset, bytes, 0); | ||
System.arraycopy(nameBytes, 0, bytes, 4, nameBytes.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could reuse the name from _ignored
doc values field (Salvatore's pr will store it as doc values instead of stored fields). We end up storing it in _ignored too if number of fields is exceeded.
I think this is tricky, because then we don't know which value from _ignored
field belongs to a value from_ignored_values
field. In case of multi values for the same field, doc values store in alphabetic order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there's certain duplication here.. I added a note about this in the javadoc. I think this is good for now, let's get some mileage for this and optimize if we find out it's an issue in practice - ignored fields should be the exception after initial setup..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we can keep this in the back of our minds, and consider solutions to reduce storage for the two meta fields.
* This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating | ||
* if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable. | ||
*/ | ||
public class IgnoredValuesFieldMapper extends MetadataFieldMapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another attempt :), what about IgnoredSourceFieldMapper
? It is shorter than the other name proposal, and ties it to source, this field mapper keeps track of pieces of the _source that ended being ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments, LGTM otherwise.
// (N % PARENT_OFFSET_IN_NAME_OFFSET) | ||
private static final int PARENT_OFFSET_IN_NAME_OFFSET = 1 << 16; | ||
|
||
public static final String NAME = "_ignored_values"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also rename _ignored_values
to _ignored_source
?
|
||
public static final TypeParser PARSER = new FixedTypeParser(context -> INSTANCE); | ||
|
||
static final NodeFeature TRACK_IGNORED_VALUES = new NodeFeature("mapper.track_ignored_values"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and rename this constant as well?
server/src/test/java/org/elasticsearch/index/mapper/XContentDataHelperTests.java
Outdated
Show resolved
Hide resolved
…83110) ## Summary Fixes #182837. Fixes #182514. The number of fields returned by the field caps API is different across ES versions in forward compatibility tests. In ES 8.15.0, the `_ignored_source` field was added (elastic/elasticsearch#107567). This fixes the API integration test for field caps to assert the correct number of fields across versions. Note that in Kibana `8.15` we refactored away from using fields caps directly in this way and removed the corresponding API endpoint and tests (#182588), that's why there's this dedicated `7.17` PR to just fix the assertions on the existing test. To test this locally, the following commands for the functional tests server and runner can be used to run the tests in different forward compatibility scenarios: ``` # 7.17 tests server node scripts/functional_tests_server.js --config x-pack/test/api_integration/config.ts # 7.17 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts # 8.14 tests server ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.14.0/manifest-latest-verified.json" node scripts/functional_tests_server.js # 8.14 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.14.0-SNAPSHOT # 8.15 tests server ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.15.0/manifest-latest-verified.json" node scripts/functional_tests_server.js # 8.15 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.15.0-SNAPSHOT ``` Note in `7.17` the API integration tests are not split up yet into several configs so the commands above will run ALL Kibaan API integration tests. The command to run the tests server for a specific ES version is also shared in the buildkite reports, for example: https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-15-forward-compatibility/builds/20#annotation-es-snapshot-manifest The versions the compatibility tests will currently run against can be found here: https://github.com/elastic/kibana/blob/main/versions.json ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.
This change introduces
IgnoredSourceFieldMapper
to track ignored field names and values, and extendsObjectMapper
to access these while constructing synthetic source.IgnoredSourceFieldMapper
relies on the generic logic for parsing and writing various supported tokens. This logic is moved toXContentDataHelper
to be properly shared withIgnoreMalformedStoredValues
.Related to #106825