Add ignored field values to synthetic source #107567

kkrik-es · 2024-04-17T14:02:06Z

This change introduces IgnoredSourceFieldMapper to track ignored field names and values, and extends ObjectMapper to access these while constructing synthetic source.

IgnoredSourceFieldMapper relies on the generic logic for parsing and writing various supported tokens. This logic is moved to XContentDataHelper to be properly shared with IgnoreMalformedStoredValues.

Related to #106825

elasticsearchmachine · 2024-04-17T14:02:52Z

Hi @kkrik-es, I've created a changelog YAML for you.


elasticsearchmachine · 2024-04-18T10:42:24Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

lkts

Overall LGTM, I have left some suggestions. One note I have is that it feels kind of incidental that IgnoredValuesFieldMapper is actually a mapper. I wonder if we could implement it as some kind of helper instead, which would be simpler. I don't have a good suggestion here though.

...pi-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml

server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

# Conflicts: # server/src/main/java/module-info.java # server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification

martijnvg · 2024-04-23T09:34:17Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java

@@ -345,6 +364,17 @@ public final boolean addDynamicMapper(Mapper mapper) {
            int additionalFieldsToAdd = getNewFieldsSize() + mapperSize;
            if (indexSettings().isIgnoreDynamicFieldsBeyondLimit()) {
                if (mappingLookup.exceedsLimit(indexSettings().getMappingTotalFieldsLimit(), additionalFieldsToAdd)) {
+                    if (indexSettings().getMode().isSyntheticSourceEnabled() || SourceFieldMapper.isSynthetic(mappingLookup)) {


Is the idea in this PR to apply ignored values only when the maximum number of fields has been exceeded? Or also in other cases, for example when an object field has been disabled?

I'd say we add more cases incrementally, to keep this one short.

martijnvg · 2024-04-23T09:35:56Z

server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java

@@ -96,6 +96,14 @@ private static SourceFieldMapper toType(FieldMapper in) {
        return (SourceFieldMapper) in;
    }

+    public static boolean isSynthetic(MappingLookup mappingLookup) {


I don't think this is needed? I think mappingLookup.isSourceSynthetic() can be used instead?

server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java

kkrik-es · 2024-04-23T10:26:48Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java

+                                new IgnoredValuesFieldMapper.Values(mapper.name(), parentOffset, FieldDataParseHelper.encodeToken(parser()))
+                            );
+                        } catch (IOException e) {
+                            throw new IllegalArgumentException("failed to parse field [" + mapper.name() + " ]", e);


Switched to throwing an exception here, fyi.

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

martijnvg · 2024-04-25T08:13:51Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

+
+    @Override
+    public void postParse(DocumentParserContext context) {
+        for (Values values : context.getIgnoredFieldValues()) {


Maybe add an assert here that checks that if synthetic source is disabled then there are no ignored field values?
Something like context.mappingLookup().isSourceSynthetic() || (context.mappingLookup().isSourceSynthetic() == false && context.getIgnoredFieldValues().isEmpty())
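
The suggested condition can be shortened, because the right-hand side of `||` is only evaluated when the left-hand side is false. The sketch below is a standalone illustration and not the Elasticsearch code: the class name, the method names and the use of a plain `List<String>` in place of `context.getIgnoredFieldValues()` are all assumptions made for the example. It checks that both forms agree on every combination.

```java
import java.util.List;

// Hypothetical stand-in for the invariant discussed above; these are not Elasticsearch classes.
final class IgnoredValuesInvariantSketch {

    /** The condition as written in the review comment. */
    static boolean asSuggested(boolean sourceSynthetic, List<String> ignoredValues) {
        return sourceSynthetic || (sourceSynthetic == false && ignoredValues.isEmpty());
    }

    /** Shorter form: the second operand is only reached when sourceSynthetic is false. */
    static boolean simplified(boolean sourceSynthetic, List<String> ignoredValues) {
        return sourceSynthetic || ignoredValues.isEmpty();
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(List.of(), List.of("a.b"));
        for (boolean synthetic : new boolean[] { true, false }) {
            for (List<String> values : samples) {
                // Both forms must agree on all four combinations.
                if (asSuggested(synthetic, values) != simplified(synthetic, values)) {
                    throw new AssertionError("forms disagree");
                }
            }
        }
        System.out.println("both forms agree on all four cases");
    }
}
```

In the mapper itself this would presumably reduce to an `assert` over the two calls named in the comment, `isSourceSynthetic()` and `getIgnoredFieldValues().isEmpty()`.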

martijnvg · 2024-04-25T08:15:21Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

+ * This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating
+ * if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable.
+ */
+public class IgnoredValuesFieldMapper extends MetadataFieldMapper {


Given that the purpose of the class is to store field values that would be ignored if synthetic source is enabled, maybe IgnoredSyntheticSourceValues is a better name?

Ignored fields and values are only used for synthetic source, by definition. For instance, IgnoreMalformedStoredValues and IgnoredFieldMapper don't mention synthetic source, even though they're tied to it. I'd say we leave it as is, for simplicity and consistency?

IgnoredFieldMapper can be used outside the context of synthetic source. Also when source is stored.
This enumerates the fields that have not been indexed, but are available in the source.

IgnoreMalformedStoredValues isn't a field mapper, but more of a helper class to deal with reading/writing malformed field values. It is only used in the context of synthetic source. I think MalformedSyntheticSourceValues is a better name for this class.

It seems to me that field mappers can generally support synthetic source. Adding that to the class name feels leaky; how the values get used is up to the callers of this class. I'm also not a big fan of longer names unless they really help with disambiguation.

Another attempt :), what about IgnoredSourceFieldMapper? It is shorter than the other name proposal, and ties it to source: this field mapper keeps track of pieces of the _source that ended up being ignored.

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java

martijnvg · 2024-04-25T08:32:20Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

+        }
+
+        public void trackObjectsWithIgnoredFields() {
+            if (values == null || values.isEmpty()) {


Maybe just use null for signaling that nothing needs to happen? That way the values field doesn't need to be initialized with an empty list and there is no need to set values to null here at line 158?

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java


martijnvg

Thanks for iterating here. I think it is getting close.

martijnvg · 2024-04-26T07:30:34Z

server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java

+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+
+/**


The FieldData part of the name of this class is easy to confuse with the field data abstraction that is used in search for scripting, sorting and aggregations. Maybe XContentDataHelper is a better name?

Also I think we should make this class package-private. Its only users are in the org.elasticsearch.index.mapper package.

martijnvg · 2024-04-26T07:44:25Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

+        byte[] nameBytes = values.name.getBytes(StandardCharsets.UTF_8);
+        byte[] bytes = new byte[4 + nameBytes.length + values.value.length];
+        ByteUtils.writeIntLE(values.name.length() + PARENT_OFFSET_IN_NAME_OFFSET * values.parentOffset, bytes, 0);
+        System.arraycopy(nameBytes, 0, bytes, 4, nameBytes.length);


It would be nice if we could reuse the name from the _ignored doc values field (Salvatore's PR will store it as doc values instead of stored fields). We end up storing it in _ignored too if the field limit is exceeded.

I think this is tricky, because then we don't know which value from the _ignored field belongs to which value from the _ignored_values field. In the case of multiple values for the same field, doc values are stored in alphabetical order.

Yeah, there's some duplication here. I added a note about this in the javadoc. I think this is good for now; let's get some mileage out of this and optimize if we find out it's an issue in practice. Ignored fields should be the exception after initial setup.

Agreed, we can keep this in the back of our minds, and consider solutions to reduce storage for the two meta fields.
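
The byte layout in the diff above packs the name length and the parent offset into one little-endian int, followed by the name bytes and then the encoded value. A minimal round-trip sketch of that layout follows. It is an illustration under assumptions, not the PR's code: the class and method names are hypothetical, `java.nio.ByteBuffer` stands in for `ByteUtils.writeIntLE`, the value is assumed to be copied directly after the name, and the header uses the UTF-8 byte count where the diff uses `values.name.length()` (the two only coincide for ASCII names).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the header layout; not the IgnoredSourceFieldMapper implementation.
final class IgnoredValueLayoutSketch {
    // Same constant as in the diff: header = nameLength + parentOffset * 65536.
    static final int PARENT_OFFSET_IN_NAME_OFFSET = 1 << 16;

    static final class Decoded {
        final String name;
        final int parentOffset;
        final byte[] value;

        Decoded(String name, int parentOffset, byte[] value) {
            this.name = name;
            this.parentOffset = parentOffset;
            this.value = value;
        }
    }

    static byte[] encode(String name, int parentOffset, byte[] value) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        // Both parts must fit in their half of the int, otherwise decoding is ambiguous.
        if (nameBytes.length >= PARENT_OFFSET_IN_NAME_OFFSET) {
            throw new IllegalArgumentException("name too long");
        }
        if (parentOffset < 0 || parentOffset > Short.MAX_VALUE) {
            throw new IllegalArgumentException("parent offset out of range");
        }
        ByteBuffer buffer = ByteBuffer.allocate(4 + nameBytes.length + value.length).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(nameBytes.length + PARENT_OFFSET_IN_NAME_OFFSET * parentOffset);
        buffer.put(nameBytes);
        buffer.put(value);
        return buffer.array();
    }

    static Decoded decode(byte[] bytes) {
        int header = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
        int nameLength = header % PARENT_OFFSET_IN_NAME_OFFSET;   // low 16 bits
        int parentOffset = header / PARENT_OFFSET_IN_NAME_OFFSET; // remaining high bits
        String name = new String(bytes, 4, nameLength, StandardCharsets.UTF_8);
        byte[] value = Arrays.copyOfRange(bytes, 4 + nameLength, bytes.length);
        return new Decoded(name, parentOffset, value);
    }

    public static void main(String[] args) {
        Decoded decoded = decode(encode("a.b.c", 2, new byte[] { 7, 8 }));
        System.out.println(decoded.name + " " + decoded.parentOffset + " " + Arrays.toString(decoded.value));
    }
}
```

Because the name is stored inline with each value, the entry is self-describing, which is the duplication with _ignored that the thread accepts for now.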

martijnvg · 2024-04-26T07:48:13Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java

+ * This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating
+ * if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable.
+ */
+public class IgnoredValuesFieldMapper extends MetadataFieldMapper {


Another attempt :), what about IgnoredSourceFieldMapper? It is shorter than the other name proposal, and ties it to source: this field mapper keeps track of pieces of the _source that ended up being ignored.

martijnvg

Left a few minor comments, LGTM otherwise.

martijnvg · 2024-04-26T08:40:42Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredSourceFieldMapper.java

+    // (N % PARENT_OFFSET_IN_NAME_OFFSET)
+    private static final int PARENT_OFFSET_IN_NAME_OFFSET = 1 << 16;
+
+    public static final String NAME = "_ignored_values";


Also rename _ignored_values to _ignored_source?

martijnvg · 2024-04-26T08:42:04Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredSourceFieldMapper.java

+
+    public static final TypeParser PARSER = new FixedTypeParser(context -> INSTANCE);
+
+    static final NodeFeature TRACK_IGNORED_VALUES = new NodeFeature("mapper.track_ignored_values");


and rename this constant as well?

server/src/test/java/org/elasticsearch/index/mapper/XContentDataHelperTests.java

Relates #107567

…83110)

## Summary

Fixes #182837. Fixes #182514.

The number of fields returned by the field caps API is different across ES versions in forward compatibility tests. In ES 8.15.0, the `_ignored_source` field was added (elastic/elasticsearch#107567). This fixes the API integration test for field caps to assert the correct number of fields across versions.

Note that in Kibana `8.15` we refactored away from using field caps directly in this way and removed the corresponding API endpoint and tests (#182588); that's why there's this dedicated `7.17` PR to just fix the assertions on the existing test.

To test this locally, the following commands for the functional tests server and runner can be used to run the tests in different forward compatibility scenarios:

```
# 7.17 tests server
node scripts/functional_tests_server.js --config x-pack/test/api_integration/config.ts

# 7.17 tests runner
node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts

# 8.14 tests server
ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.14.0/manifest-latest-verified.json" node scripts/functional_tests_server.js

# 8.14 tests runner
node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.14.0-SNAPSHOT

# 8.15 tests server
ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.15.0/manifest-latest-verified.json" node scripts/functional_tests_server.js

# 8.15 tests runner
node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.15.0-SNAPSHOT
```

Note that in `7.17` the API integration tests are not yet split up into several configs, so the commands above will run ALL Kibana API integration tests.

The command to run the tests server for a specific ES version is also shared in the buildkite reports, for example: https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-15-forward-compatibility/builds/20#annotation-es-snapshot-manifest

The versions the compatibility tests will currently run against can be found here: https://github.com/elastic/kibana/blob/main/versions.json

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.

Add ignored field values to synthetic source

16cb2ee

kkrik-es added >enhancement Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings labels Apr 17, 2024

kkrik-es self-assigned this Apr 17, 2024

elasticsearchmachine added the v8.14.0 label Apr 17, 2024

Update docs/changelog/107567.yaml

5e24638

kkrik-es added 3 commits April 17, 2024 17:45

initialize map

b4a74c3

Merge remote-tracking branch 'origin/fix/synthetic-source/object' int…

8bec60d

…o fix/synthetic-source/object

yaml fix

9e43764

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

kkrik-es added 2 commits April 18, 2024 11:22

add node feature

528986e

add comments

6dd4080

kkrik-es requested review from martijnvg and lkts April 18, 2024 10:41

kkrik-es marked this pull request as ready for review April 18, 2024 10:42

Merge branch 'refs/heads/main' into fix/synthetic-source/object

2925e65

lkts reviewed Apr 18, 2024


kkrik-es added 4 commits April 22, 2024 14:20

small fixes

3bd67c1

missing cluster feature in yaml

d3caea1

Merge branch 'refs/heads/main' into fix/synthetic-source/object

cfb2601

# Conflicts: # server/src/main/java/module-info.java # server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification

constants for chars, stored fields

d74b7fa

martijnvg reviewed Apr 23, 2024


kkrik-es added 2 commits April 23, 2024 13:16

remove duplicate method

4c4f0a7

throw exception on parse failure

a210b7a

kkrik-es commented Apr 23, 2024


lkts reviewed Apr 23, 2024


server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java (outdated)

remove Base64 encoding

09f2f77

martijnvg reviewed Apr 24, 2024


kkrik-es added 2 commits April 24, 2024 17:02

add assert on IgnoredValuesFieldMapper::write

6200b23

Merge branch 'refs/heads/main' into fix/synthetic-source/object

efddbe7

lkts approved these changes Apr 24, 2024


Merge branch 'elastic:main' into fix/synthetic-source/object

50d11aa

martijnvg reviewed Apr 25, 2024


kkrik-es added 4 commits April 25, 2024 14:18

changes from review

5389e58

Merge remote-tracking branch 'origin/fix/synthetic-source/object' int…

5940b5b

…o fix/synthetic-source/object

simplify logic

e2f1f69

add comment

ff8896c

martijnvg reviewed Apr 26, 2024


rename classes

cb85495

martijnvg approved these changes Apr 26, 2024


kkrik-es and others added 3 commits April 26, 2024 12:02

rename _ignored_values to _ignored_source

ace0559

rename _ignored_values to _ignored_source

3cc62dc

Merge branch 'elastic:main' into fix/synthetic-source/object

c05d4c9

kkrik-es merged commit 3183e6d into elastic:main Apr 26, 2024
14 checks passed

dnhatn mentioned this pull request Apr 26, 2024

Mute synthetic source YAML tests #107958

Merged

dnhatn added a commit that referenced this pull request Apr 26, 2024

Mute synthetic source YAML tests (#107958)

01cc967

Relates #107567

kkrik-es deleted the fix/synthetic-source/object branch April 29, 2024 11:54

kkrik-es mentioned this pull request Apr 30, 2024

Support ignore_dynamic_beyond_limit for all fields using synthetic source #106487

Closed

lkts mentioned this pull request May 2, 2024

Add generic fallback implementation for synthetic source #108222

Merged

walterra mentioned this pull request May 10, 2024

[ML] Anomaly Detection: Fix API integration tests for field caps. elastic/kibana#183110

Merged

2 tasks

lkts mentioned this pull request May 21, 2024

Support fields that use fallback synthetic source in ESQL #108883

Open
