Skip to content

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Oct 11, 2024

Here we check for the existence of a host.name field in index sort settings
when the index mode is logsdb and decide to inject the field in the mapping
depending on whether it exists or not. By default host.name is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the host.name field only if strictly
required. A user who wants to override index sort settings without including
a host.name field would be able to do so without finding an additional
host.name field in the mappings (injected automatically). If users override the
sort settings and a host.name field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:

  • the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for logsdb)
  • the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings index.sort.* are final which means doing this
check once, when mappings are merged at template composition time, is enough.

@salvatore-campagna salvatore-campagna self-assigned this Oct 11, 2024
@salvatore-campagna salvatore-campagna changed the title Inject the host.name field only if required for sorting for logsdb index mode. Inject the host.name field only if required for sorting when using logsdb index mode. Oct 11, 2024
@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @salvatore-campagna.

return DEFAULT_LOGS_TIMESTAMP_MAPPING;
public CompressedXContent getDefaultMapping(final IndexSettings indexSettings) {
final IndexSortConfig.FieldSortSpec[] fieldSortSpecs = indexSettings.getIndexSortConfig().sortSpecs;
if (fieldSortSpecs == null || fieldSortSpecs.length == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this check because the index sort for logsdb should already be injected in the IndexSortConfig constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we need this for the case if index.sort is set to null?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be extra cautious and make sure we always inject something if there is no sort configuration...but I think this is needed in case there is no sort configuration. EVen if it might not make sense a user can specify empty sort config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a test where index sort config is an empty array

}

public static final CompressedXContent DEFAULT_TIME_SERIES_TIMESTAMP_MAPPING;
private static CompressedXContent createMapping(boolean includeHostName) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we add default to the method name? I think createMapping is quite generic and might cause confusion.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a testing comment, LGTM otherwise.

return DEFAULT_LOGS_TIMESTAMP_MAPPING;
public CompressedXContent getDefaultMapping(final IndexSettings indexSettings) {
final IndexSortConfig.FieldSortSpec[] fieldSortSpecs = indexSettings.getIndexSortConfig().sortSpecs;
if (fieldSortSpecs == null || fieldSortSpecs.length == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we need this for the case if index.sort is set to null?

- match: { .$backing_index.mappings.properties.agent.properties.id.type: keyword }
- match: { .$backing_index.mappings.properties.host.properties.name.type: text }
- match: { .$backing_index.mappings.properties.host.properties.id.type: long }
- match: { .$backing_index.mappings.properties.host.properties.name.ignore_above: null }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host.* fields are introduced here via dynamic mapping updates, right? Maybe remove host.name field from the data and assert that no host.name exist?


- match: { error.type: "illegal_argument_exception" }
- match: { error.reason: "composable template [logsdb-index-template] template after composition with component templates [logsdb-mappings] is invalid" }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also introduce a test case where we only sort by @timestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...some integration like APM does it.

}

for (final IndexSortConfig.FieldSortSpec fieldSortSpec : fieldSortSpecs) {
if (HOST_NAME.equals(fieldSortSpec.field)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user provides an override with "hostname" or "attributes.host.name"?

I think we should just skip injecting anything other than timestamp when index sorting is explicitly set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tested this right? Can you be more precise...what do you mean with override?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An override is an explicit setting of index sorting. I'm trying to understand why this loop is needed, is it to pick the default setting from IndexSortConfig, as mentioned above?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop is required because the sort field is normally a list of fields...all index.sort.* settings are used as "parallel" arrays. The host.name or any other field could be the first, second,...you can look at IndexSortConfig.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used to merge default mappings with the result of template composition...which happens taking into account priorities and fields defined by our standard templates or custom templates. We need to provide them anyway because then whether they are used or not will depend on the composition logic. Also, injecting the host.name field is done already if no sort configuration is provided by the user. There is a difference between injecting a sort field and injecting its mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will go for the original plan got the host.name field...we can't really detect in the getDefault method of IndexMode if the field was added to index sorting by us (injecting it in IndexSortConfig or if it was the user explicitly adding it by means of index.sort.field .
In IndexSortConfig we already inject the host.name field if the sort config is empty...later on in getDefault we just make sure the field is mapped because otherwise sorting would not be possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way. But could we do something like this?

if (indexSettings.getSettings().hasValue(IndexSortConfig.INDEX_SORT_FIELD_SETTING.getKey())) {
    return DEFAULT_LOGS_TIMESTAMP_MAPPING;
} else {
    return DEFAULT_LOGS_TIMESTAMP_MAPPING_WITH_HOSTNAME;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like that works. We don't actually inject an index setting in LogsDB, we just apply a default in IndexSortConfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way. But could we do something like this?

if (indexSettings.getSettings().hasValue(IndexSortConfig.INDEX_SORT_FIELD_SETTING.getKey())) {
    return DEFAULT_LOGS_TIMESTAMP_MAPPING;
} else {
    return DEFAULT_LOGS_TIMESTAMP_MAPPING_WITH_HOSTNAME;
}

I think that would not work if we have something like

            settings:
              index:
                sort.field: [ ]
                sort.order: [ ]
                mode: logsdb

I imagine that this is kind of a corner case...but I would be careful just in case this is automatically generated in some way and an empty array is used for some reason.

@felixbarny
Copy link
Member

I hope this can be back ported to 8.16.0 as it is causing issues with being able to pass through resource.attributes.host.name for OTel.

@salvatore-campagna salvatore-campagna requested a review from a team as a code owner October 16, 2024 09:28
@salvatore-campagna salvatore-campagna changed the title Inject the host.name field only if required for sorting when using logsdb index mode. Inject the host.name field mapping only if required when using logsdb index mode. Oct 16, 2024
@salvatore-campagna salvatore-campagna changed the title Inject the host.name field mapping only if required when using logsdb index mode. Inject the host.name field mapping only if required for logsdb index mode Oct 16, 2024
@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna salvatore-campagna added v8.16.0 auto-backport Automatically create backport pull requests when merged labels Oct 16, 2024
@salvatore-campagna salvatore-campagna merged commit 9bf6e3b into elastic:main Oct 16, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 114573

salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this pull request Oct 16, 2024
…dex mode (elastic#114573)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.

(cherry picked from commit 9bf6e3b)
@salvatore-campagna
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

javanna pushed a commit to javanna/elasticsearch that referenced this pull request Oct 16, 2024
…dex mode (elastic#114573)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.
salvatore-campagna added a commit that referenced this pull request Oct 16, 2024
…dex mode (#114573) (#114916)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.

(cherry picked from commit 9bf6e3b)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@felixbarny
Copy link
Member

Unfortunately, the back port was merged into the 8.x branch after the 8.16 branch has been created. Could we create another back port for the 8.16 branch, given that this fixes the bug of not being able to use the host.name passthrough for OTel?

salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this pull request Oct 16, 2024
…dex mode (elastic#114573)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.

(cherry picked from commit 9bf6e3b)
@salvatore-campagna
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.16

Questions ?

Please refer to the Backport tool documentation

salvatore-campagna added a commit that referenced this pull request Oct 16, 2024
…dex mode (#114573) (#114936)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.

(cherry picked from commit 9bf6e3b)
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
…dex mode (elastic#114573)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
…dex mode (elastic#114573)

Here we check for the existence of a `host.name` field in index sort settings
when the index mode is `logsdb` and decide to inject the field in the mapping
depending on whether it exists or not. By default `host.name` is required for
sorting in LogsDB. This reduces the chances for errors at mapping or template
composition time as a result of injecting the `host.name` field only if strictly
required. A user who wants to override index sort settings without including
a `host.name` field would be able to do so without finding an additional
`host.name` field in the mappings (injected automatically). If users override the
sort settings and a `host.name` field is not included we don't need
to inject such field since sorting does not require it anymore.

As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings

Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending >non-issue :StorageEngine/Logs You know, for Logs Team:StorageEngine v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants