-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Change ES tag schema to support tag search in Kibana #980
Conversation
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
We might choose different replacing character. Kibana uses
|
This mapping allows storing ~476 unique tag keys with the default EDIT @mabn @kacper-jackiewicz @Monnoroch do you know if raw field can be disabled? Or any possible workaround? |
Performance results https://github.com/pavolloffay/jaeger-perf-tests, 300k spans different limit parameter for jaeger-query
Limit 50000 https://pastebin.com/7tKiQnuC, https://pastebin.com/sbz2y9JP with node statistics Query time of limit 50000: mean 13363.90 milliseconds Report time with multiple queries: 191265.00 milliseconds 3.18 min The number of tags does not seem to affect query time. Results with model using tags as nested objects and tagsMap as object. The query runs on both fields. |
Performance of master/head of the same perf test Limit 50000 https://pastebin.com/BatPgLf6, https://pastebin.com/1pEV6PmY with node statistics Query time of limit 50000: mean = 12683.73 milliseconds Report time: The number of tags does not seem to affect query time |
@mabn the benchmark results does not indicate performance improvement when using object datatype instead of nested datatype for tags. |
tags := map[string]string{} | ||
for _, tag := range kvs { | ||
if strings.Contains(tag.Key, ":") { | ||
s.logger.Warn("Tag key contains \\':\\', at the query time it will be transformed to \\'.\\'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will create a ton of spammy logs, one for each span. Perhaps, make it a Debug, or debounce it somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Monnoroch this is just POC to get some perf results, do not nit review please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, in that case the implementation makes sense apart from my question below. The only thing I would like is to have a way to configure the whitelisted/blacklisted tags somehow. The scenario I have in mind is you first would whitelist all of them and then if your service grows enough to have more and more tags you can blacklist some of them, and eventually move to a whitelist if there are too many of them. Though I realize that it's not trivial to reconfigure in a way that will not break indexing. Or maybe it is and just needs to be confirmed? I would think that ES was designed with logs that change over time in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you reach the limit you are not able to store more data in the given index. You could perhaps change the mapping and reindex but probably nobody would want to do that.
Blacklisting does not make much sense for me. If you are above the limit it might be very inconvenient to list all blacklisted tags.
I want something that works OOTB for all users especially with high cardinality tags. The other thing is that kibana is not primarily supported system for us.
for i := range queries { | ||
queries[i] = s.buildNestedQuery(tagFieldList[i], k, v) | ||
queries := make([]elastic.Query, len(objectTagFieldList)+1) | ||
kd := strings.Replace(k, ".", ":", -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This replace is duplicated in many places. Any way it can be extracted as a helper method with a nice name?
@@ -139,6 +142,9 @@ var ( | |||
} | |||
} | |||
}, | |||
"tagsMap": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your example in the description says:
tags: {
"http:method": "GET",
}
However the code says "tagsMap". Either there's a mistake or I'm missing something.
Tags []KeyValue `json:"tags"` | ||
ServiceName string `json:"serviceName"` | ||
Tags []KeyValue `json:"tags"` | ||
TagsMap map[string]string `json:"tagsMap,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a maintainer, but if it was me, I would have a separate "span" model that is independent of the backend and then I would map it to a backend specific model in the backend specific package. It does not look right to me that you have to modify the common model just to change ES representation.
Resolves #906
Example of the index:
Significant changes
:
Limitations:
index.mapping.total_fields.limit
https://www.elastic.co/guide/en/elasticsearch/reference/master/mapping.html#mapping-limit-settings. The count includes also meta fields Doc: Clarify count for index.mapping.total_fields.limit elastic/elasticsearch#24096:
TODO
index.mapping.total_fields.limit
- we could count unique tag names, wait for failed PUT and then store following tags differently - array/nested object