Fingerprint/composite field types #84282
Comments
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
I've tagged both search and analytics teams as this touches on areas covered by both. Each team can discuss and leave their thoughts here. |
Do you anticipate a need for multiple fingerprints per document? If not this is what we are basically doing with |
@imotov yeah, I think so. Eg for the service inventory we might only need service name + env, but then when drilling down into the service detail page, we'd like to add transaction type and maybe host name. |
I'd be curious to see a picture of the thing you are building with the results here. We sure can build fingerprint fields if it's the right thing. But maybe the right thing is to make the multi-field terms agg faster. |
@nik9000 the thing that started this discussion was that we are experimenting with populating the service inventory (our landing page that has a list of all APM services) with the terms enum API to speed up perceived performance. However, one drawback there is that we'd like to filter on/group by environment, and the terms enum API will only return values for a single field. That is something that the multi terms agg cannot solve, I think, though I am all in favor of a speed boost for the multi terms agg. We do have some cases where we use a nested terms agg instead of multi terms because the former is a lot faster, even though multi terms should, in theory, be the more appropriate agg. |
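For readers less familiar with the three request shapes compared in this comment, here is a rough sketch of the request bodies as Python dicts. The aggregation names (`services`, `environments`, `service_env`) and the prefix string are illustrative assumptions, not taken from the APM app; nothing is sent to a cluster.

```python
# Request bodies only; nothing here talks to a cluster.
# Aggregation names and the "op" prefix are illustrative assumptions.

# Nested terms agg: one bucket per service, each with environment sub-buckets.
nested_terms_body = {
    "size": 0,
    "aggs": {
        "services": {
            "terms": {"field": "service.name"},
            "aggs": {
                "environments": {"terms": {"field": "service.environment"}}
            },
        }
    },
}

# multi_terms agg: one bucket per (service.name, service.environment) pair.
multi_terms_body = {
    "size": 0,
    "aggs": {
        "service_env": {
            "multi_terms": {
                "terms": [
                    {"field": "service.name"},
                    {"field": "service.environment"},
                ]
            }
        }
    },
}

# Terms enum body (POST /<index>/_terms_enum): it takes a single field,
# which is the limitation described above.
terms_enum_body = {"field": "service.name", "string": "op"}
```

The first two bodies group by the same pair of fields; the terms enum body has no place to name a second field.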
Another thing I'm wondering about is: suppose we have such a field, on three different fields, e.g. on service.name, service.environment and transaction.type - I'd like to run a terms agg on two of the fields, which would mean that ES would have to merge buckets in the reduce phase - is something like that a reasonable thing to do w/ a field type like this? Maybe that's more of a TSDB thing though. |
ES could merge the buckets when reading, I think. I'm not sure of the exact mechanics, but it should be possible.
|
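The bucket merge discussed above can be sketched client-side as follows. This only illustrates the roll-up idea (summing doc counts of composite buckets that agree on the kept fields); it is not how Elasticsearch would implement it, and the function name and sample buckets are made up for illustration.

```python
from collections import defaultdict


def merge_buckets(buckets, fields, keep):
    """Roll up composite buckets to the subset of fields listed in `keep`.

    `buckets` is a list of (key_tuple, doc_count) pairs, where each key_tuple
    holds one value per entry in `fields`, in the same order.
    """
    positions = [fields.index(name) for name in keep]
    merged = defaultdict(int)
    for key, doc_count in buckets:
        reduced_key = tuple(key[i] for i in positions)
        merged[reduced_key] += doc_count
    return dict(merged)


# Hypothetical sample data: buckets keyed on three fields.
fields = ("service.name", "service.environment", "transaction.type")
buckets = [
    (("opbeans-java", "production", "request"), 10),
    (("opbeans-java", "production", "page-load"), 5),
    (("opbeans-java", "staging", "request"), 2),
]

# Aggregate on two of the three fields by merging buckets that share them.
merged = merge_buckets(buckets, fields, ("service.name", "service.environment"))
```

Summing works for doc counts; other metrics would need their own merge rule.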
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
In the APM app (and probably in Observability in general) we sometimes use the composite of multiple fields as "keys" for a certain timeseries. E.g., we might use a nested terms aggregation on service.name + service.environment. There are several downsides to this approach currently:
We also use the terms enum API to get a list of service names fast. However, we cannot use this for multiple fields.
One workaround would be to add an ingest processor that "fingerprints" values from multiple fields into a single keyword, and use that to aggregate over this field. However, this comes with the downside of us having to come up with a serialization/deserialization logic.
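To make the serialization/deserialization burden concrete, here is a minimal sketch of what such logic might look like on the client side. Encoding the values as a compact JSON array is an assumption chosen for illustration, not something proposed in this issue; the function names are hypothetical.

```python
import json


def encode_fingerprint(values):
    """Serialize an ordered list of field values into one keyword string.

    A JSON array is used so that values containing a would-be delimiter
    (e.g. "|" or ":") and missing values (None) still round-trip.
    """
    return json.dumps(list(values), separators=(",", ":"))


def decode_fingerprint(fingerprint):
    """Recover the original field values from a fingerprint string."""
    return tuple(json.loads(fingerprint))


# Hypothetical values for service.name and service.environment.
fingerprint = encode_fingerprint(["opbeans-java", "production"])
service_name, environment = decode_fingerprint(fingerprint)
```

Both the ingest side and every reader have to agree on this encoding and on the field order, which is the maintenance cost described above.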
Ideally, ES can help us here by adding a field type for this purpose. I'm using fingerprint here because a composite field type is already a thing in ES, but the name is probably not the best. The mapping could look as follows:
Suppose that we run a terms aggregation on service.id:
Elasticsearch would return the composite values as follows:
Or, when we call the terms enum API (which would have to be a breaking change, I guess?):