Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ignored field values to synthetic source #107567

Merged
merged 26 commits into from Apr 26, 2024

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Apr 17, 2024

This change introduces IgnoredSourceFieldMapper to track ignored field names and values, and extends ObjectMapper to access these while constructing synthetic source.

IgnoredSourceFieldMapper relies on the generic logic for parsing and writing various supported tokens. This logic is moved to XContentDataHelper to be properly shared with IgnoreMalformedStoredValues.

Related to #106825

@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es kkrik-es requested review from martijnvg and lkts April 18, 2024 10:41
@kkrik-es kkrik-es marked this pull request as ready for review April 18, 2024 10:42
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, i have left some suggestions. One note i have is that it feels kind of incidental that IgnoredValuesFieldMapper is actually a mapper. I wonder if we can implement it as some kind of helper instead and it would be simpler. I don't have a good suggestion here though.

# Conflicts:
#	server/src/main/java/module-info.java
#	server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification
@@ -345,6 +364,17 @@ public final boolean addDynamicMapper(Mapper mapper) {
int additionalFieldsToAdd = getNewFieldsSize() + mapperSize;
if (indexSettings().isIgnoreDynamicFieldsBeyondLimit()) {
if (mappingLookup.exceedsLimit(indexSettings().getMappingTotalFieldsLimit(), additionalFieldsToAdd)) {
if (indexSettings().getMode().isSyntheticSourceEnabled() || SourceFieldMapper.isSynthetic(mappingLookup)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea to only apply ignored values for when maximum number of fields have been exceeded in this pr? Or also doing this for when for example object field has been disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we add more cases incrementally, to keep this one short.

@@ -96,6 +96,14 @@ private static SourceFieldMapper toType(FieldMapper in) {
return (SourceFieldMapper) in;
}

public static boolean isSynthetic(MappingLookup mappingLookup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed? I think mappingLookup.isSourceSynthetic() can used instead?

new IgnoredValuesFieldMapper.Values(mapper.name(), parentOffset, FieldDataParseHelper.encodeToken(parser()))
);
} catch (IOException e) {
throw new IllegalArgumentException("failed to parse field [" + mapper.name() + " ]", e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to throwing an exception here, fyi.


@Override
public void postParse(DocumentParserContext context) {
for (Values values : context.getIgnoredFieldValues()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add an assert here that check that if synthetic source is disabled then there are no ignored field values?
Something like context.mappingLookup().isSourceSynthetic() || (context.mappingLookup().isSourceSynthetic() == false && context.getIgnoredFieldValues().isEmpty())

* This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating
* if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable.
*/
public class IgnoredValuesFieldMapper extends MetadataFieldMapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the purpose of the class is to store field values that would be ignored if synthetic source is enabled, maybe IgnoredSyntheticSourceValues is a better name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignored fields and values are only used for synthetic source, by definition. For instance, IgnoreMalformedStoredValues and IgnoredFieldMapper don't mention synthetic source, even though they're tied to it. I'd say we leave it as is, for simplicity and consistency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IgnoredFieldMapper can be used outside the context of synthetic source. Also when source is stored.
This enumerates the fields that have not been indexed, but are available in the source.

IgnoreMalformedStoredValues isn't a field mapper, but more of a helper class to deal reading/writing malformed field values. It is only used in the context of synthetic source. I think MalformedSyntheticSourceValues is a better name for this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that field mappers can generally support synthetic source. Adding that to the class name feels leaky; how the values get used is up to the callers of this class. I'm also not a big fan of longer names unless they really help with disambiguation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another attempt :), what about IgnoredSourceFieldMapper? It is shorter than the other name proposal, and ties it to source, this field mapper keeps track of pieces of the _source that ended being ignored.

}

public void trackObjectsWithIgnoredFields() {
if (values == null || values.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use null for signaling that nothing needs to happen? That way the values field doesn't need to be initialized with en empty list and there is nu need to set values to null here at line 158?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating here. I think it is getting close.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FieldData part of the name of this class is confusing with the field data abstraction that was use in search for scripting, sorting and aggregations. Maybe XContentDataHelper is a better name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think we should make this class package protected. Its only users are in the org.elasticsearch.index.mapper package.

byte[] nameBytes = values.name.getBytes(StandardCharsets.UTF_8);
byte[] bytes = new byte[4 + nameBytes.length + values.value.length];
ByteUtils.writeIntLE(values.name.length() + PARENT_OFFSET_IN_NAME_OFFSET * values.parentOffset, bytes, 0);
System.arraycopy(nameBytes, 0, bytes, 4, nameBytes.length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could reuse the name from _ignored doc values field (Salvatore's pr will store it as doc values instead of stored fields). We end up storing it in _ignored too if number of fields is exceeded.

I think this is tricky, because then we don't know which value from _ignored field belongs to a value from_ignored_values field. In case of multi values for the same field, doc values store in alphabetic order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there's certain duplication here.. I added a note about this in the javadoc. I think this is good for now, let's get some mileage for this and optimize if we find out it's an issue in practice - ignored fields should be the exception after initial setup..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can keep this in the back of our minds, and consider solutions to reduce storage for the two meta fields.

* This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating
* if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable.
*/
public class IgnoredValuesFieldMapper extends MetadataFieldMapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another attempt :), what about IgnoredSourceFieldMapper? It is shorter than the other name proposal, and ties it to source, this field mapper keeps track of pieces of the _source that ended being ignored.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments, LGTM otherwise.

// (N % PARENT_OFFSET_IN_NAME_OFFSET)
private static final int PARENT_OFFSET_IN_NAME_OFFSET = 1 << 16;

public static final String NAME = "_ignored_values";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also rename _ignored_values to _ignored_source?


public static final TypeParser PARSER = new FixedTypeParser(context -> INSTANCE);

static final NodeFeature TRACK_IGNORED_VALUES = new NodeFeature("mapper.track_ignored_values");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and rename this constant as well?

@kkrik-es kkrik-es merged commit 3183e6d into elastic:main Apr 26, 2024
14 checks passed
dnhatn added a commit that referenced this pull request Apr 26, 2024
@kkrik-es kkrik-es deleted the fix/synthetic-source/object branch April 29, 2024 11:54
walterra added a commit to elastic/kibana that referenced this pull request May 10, 2024
…83110)

## Summary

Fixes #182837.
Fixes #182514.

The number of fields returned by the field caps API is different across
ES versions in forward compatibility tests. In ES 8.15.0, the
`_ignored_source` field was added
(elastic/elasticsearch#107567). This fixes the
API integration test for field caps to assert the correct number of
fields across versions. Note that in Kibana `8.15` we refactored away
from using fields caps directly in this way and removed the
corresponding API endpoint and tests
(#182588), that's why there's this
dedicated `7.17` PR to just fix the assertions on the existing test.

To test this locally, the following commands for the functional tests
server and runner can be used to run the tests in different forward
compatibility scenarios:

```
# 7.17 tests server
node scripts/functional_tests_server.js --config x-pack/test/api_integration/config.ts
# 7.17 tests runner
node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts

# 8.14 tests server
ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.14.0/manifest-latest-verified.json" node scripts/functional_tests_server.js
# 8.14 tests runner
node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.14.0-SNAPSHOT

# 8.15 tests server
ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.15.0/manifest-latest-verified.json" node scripts/functional_tests_server.js
# 8.15 tests runner
node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.15.0-SNAPSHOT
```

Note in `7.17` the API integration tests are not split up yet into
several configs so the commands above will run ALL Kibaan API
integration tests.

The command to run the tests server for a specific ES version is also
shared in the buildkite reports, for example:
https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-15-forward-compatibility/builds/20#annotation-es-snapshot-manifest

The versions the compatibility tests will currently run against can be
found here: https://github.com/elastic/kibana/blob/main/versions.json

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
lkts added a commit that referenced this pull request May 21, 2024
This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants