WIP: Change ES tag schema to support tag search in Kibana #980

Closed

@pavolloffay pavolloffay commented Aug 9, 2018

Resolves #906

Example of the index:

tags: {
  "http:method": "GET",
}

Significant changes

Limitations:

TODO

  • performance comparison.
  • fix index.mapping.total_fields.limit: we could count unique tag names, wait for a failed PUT, and then store the following tags differently (as an array/nested object)
  • store type of the tag
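
The fallback idea from the TODO list can be sketched roughly as below. This is only an illustration under stated assumptions: `tagSplitter`, `newTagSplitter`, `split` and the `maxKeys` budget are hypothetical names, the sketch counts keys up front rather than waiting for a failed PUT, and it is not safe for concurrent use.

```go
package main

import "fmt"

// KeyValue mirrors a span tag; simplified for this sketch.
type KeyValue struct {
	Key   string
	Value string
}

// tagSplitter (hypothetical) tracks how many unique tag keys were already
// stored as object fields and routes new keys to a nested list once the
// budget is used up.
type tagSplitter struct {
	maxKeys int
	seen    map[string]struct{}
}

func newTagSplitter(maxKeys int) *tagSplitter {
	return &tagSplitter{maxKeys: maxKeys, seen: map[string]struct{}{}}
}

// split returns the tags that may be stored as object fields and the tags
// that should fall back to the array/nested representation.
func (t *tagSplitter) split(tags []KeyValue) (map[string]string, []KeyValue) {
	object := map[string]string{}
	var nested []KeyValue
	for _, kv := range tags {
		if _, known := t.seen[kv.Key]; !known {
			if len(t.seen) >= t.maxKeys {
				// Budget exhausted: a new key would add a mapping field.
				nested = append(nested, kv)
				continue
			}
			t.seen[kv.Key] = struct{}{}
		}
		object[kv.Key] = kv.Value
	}
	return object, nested
}

func main() {
	s := newTagSplitter(2)
	object, nested := s.split([]KeyValue{{"a", "1"}, {"b", "2"}, {"c", "3"}})
	fmt.Println(len(object), len(nested))
}
```

The budget would have to stay below index.mapping.total_fields.limit after accounting for the other mapped fields; the right value is not determined here.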

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

pavolloffay commented Aug 13, 2018

We might choose a different replacement character. Kibana's query string syntax uses : as the field/value separator (like =): https://www.elastic.co/guide/en/elasticsearch/reference/5.6/query-dsl-query-string-query.html#_field_names

A : inside a field name can be escaped, e.g. tags.a\:a:foo
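
A minimal sketch of that escaping, assuming tags live under a `tags` object as in the description. `kibanaTagQuery` is a hypothetical helper name, and escaping of special characters in the value is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// kibanaTagQuery (hypothetical) builds a query-string clause for a tag key.
// Every ':' in the key is escaped so that only the final ':' is read as the
// field/value separator.
func kibanaTagQuery(key, value string) string {
	escaped := strings.Replace(key, ":", `\:`, -1)
	return "tags." + escaped + ":" + value
}

func main() {
	fmt.Println(kibanaTagQuery("a:a", "foo")) // prints tags.a\:a:foo
}
```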

pavolloffay commented Aug 13, 2018

This mapping allows storing ~476 unique tag keys with the default index.mapping.total_fields.limit of 1000. There are some meta fields such as _index and _id (12), and since we use dynamic mapping each field also creates a .raw field, hence I was only able to create 476 unique tags. See elastic/elasticsearch#24096 (comment).

EDIT
If we define a default mapping we can remove the .raw field: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/default-mapping.html. An example: https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/org.ovirt.viaq-collectd.template.json#L5

@mabn @kacper-jackiewicz @Monnoroch do you know if the .raw field can be disabled? Or is there any possible workaround?

@pavolloffay pavolloffay changed the title WIP: Change ES tag schema from nested to object WIP: Change ES tag schema to support tag search in Kibana Aug 13, 2018

pavolloffay commented Aug 14, 2018

Performance results from https://github.com/pavolloffay/jaeger-perf-tests: 300k spans, with different limit parameters for jaeger-query.

QUERY_FROM=jaeger-query JAEGER_QUERY_ASYNC=true NUMBER_OF_SPANS=50000 NUM_OF_TRACERS=6 JAEGER_QUERY_LIMIT=50000 mvn clean package exec:java

Limit 50000 https://pastebin.com/7tKiQnuC, https://pastebin.com/sbz2y9JP with node statistics
Limit 1500 https://pastebin.com/vxY7dTEB

Query time of
http://localhost:16686/api/traces?service=perf-test-thread-0&limit=50000&lookback=1h&tags={"fooo.bar*?%http.d6cconald":"hehuhoh$?ij","fooo.ba2sar":"true","fooo.ba4342r":"1","fooo.bar1":"fobarhax*+??","fooo.bar*?%http.do**2nald":"goobarRAXbaz","fooo.bar*?%http.don(a44ld":"goobarRAXbaz","fooo.ba24r*?%":"hehe"}

limit 50000: mean = 13363.90 milliseconds
limit 1500: mean = 441.96 milliseconds
limit 20: mean = 17.88 milliseconds

Report time with multiple queries: 191265.00 milliseconds (about 3.19 min)

The number of tags does not seem to affect query time.

Results with the model that stores tags as nested objects and tagsMap as an object; the query runs on both fields.
limit 50000: mean = 13397.66 milliseconds
limit 1500: mean = 436.02 milliseconds
limit 20: mean = 19.01 milliseconds
Report time with multiple queries: mean = 252512.00 milliseconds

pavolloffay commented Aug 14, 2018

Performance of master (HEAD) on the same perf test

Limit 50000 https://pastebin.com/BatPgLf6, https://pastebin.com/1pEV6PmY with node statistics
Limit 1500 https://pastebin.com/nxgNM7nk

Query time of
http://localhost:16686/api/traces?service=perf-test-thread-0&limit=50000&lookback=1h&tags={"fooo.bar*?%http.d6cconald":"hehuhoh$?ij","fooo.ba2sar":"true","fooo.ba4342r":"1","fooo.bar1":"fobarhax*+??","fooo.bar*?%http.do**2nald":"goobarRAXbaz","fooo.bar*?%http.don(a44ld":"goobarRAXbaz","fooo.ba24r*?%":"hehe"}

limit 50000: mean = 12683.73 milliseconds
limit 1500: mean = 405.27 milliseconds
limit 20: mean = 26.40 milliseconds

Report time:
206348.00 milliseconds = 3.4 min

The number of tags does not seem to affect query time

@pavolloffay

@mabn the benchmark results do not indicate any performance improvement when using the object datatype instead of the nested datatype for tags.

tags := map[string]string{}
for _, tag := range kvs {
	if strings.Contains(tag.Key, ":") {
		s.logger.Warn("Tag key contains ':', at the query time it will be transformed to '.'")

This will create a ton of spammy logs, one for each span. Perhaps, make it a Debug, or debounce it somehow?
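
One way to debounce the warning is to emit it only the first time a given tag key is seen. The following is a rough sketch, not code from this PR: `onceWarner` and its `warn` callback are hypothetical, and the set of seen keys grows without bound, which may matter with high-cardinality tag keys.

```go
package main

import (
	"fmt"
	"sync"
)

// onceWarner (hypothetical) emits a warning at most once per distinct key.
type onceWarner struct {
	seen sync.Map          // keys that have already triggered a warning
	warn func(msg string)  // stand-in for the real logger call
}

// warnOnce calls warn only the first time it sees key.
func (w *onceWarner) warnOnce(key, msg string) {
	if _, loaded := w.seen.LoadOrStore(key, struct{}{}); !loaded {
		w.warn(msg)
	}
}

func main() {
	w := &onceWarner{warn: func(msg string) { fmt.Println(msg) }}
	for i := 0; i < 3; i++ {
		// Simulates three spans carrying the same offending tag key.
		w.warnOnce("http:method", "tag key contains ':'")
	}
}
```

Running this prints the message a single time even though three spans carried the key.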

Member Author

@Monnoroch this is just a POC to get some perf results; please do not nit-review it.

@Monnoroch Monnoroch Aug 17, 2018

Sorry, in that case the implementation makes sense apart from my question below. The only thing I would like is a way to configure whitelisted/blacklisted tags. The scenario I have in mind: you first allow all tags; as your service grows and accumulates more and more tags, you blacklist some of them; eventually, if there are too many, you move to a whitelist. I realize it may not be trivial to reconfigure this in a way that does not break indexing. Or maybe it is, and it just needs to be confirmed? I would think that ES was designed with logs that change over time in mind.

Member Author

Once you reach the limit you are not able to store more data in the given index. You could perhaps change the mapping and reindex but probably nobody would want to do that.

Blacklisting does not make much sense to me. If you are above the limit, it might be very inconvenient to list all blacklisted tags.

I want something that works OOTB for all users, especially with high-cardinality tags. The other thing is that Kibana is not a primary supported system for us.

for i := range queries {
queries[i] = s.buildNestedQuery(tagFieldList[i], k, v)
queries := make([]elastic.Query, len(objectTagFieldList)+1)
kd := strings.Replace(k, ".", ":", -1)

This replace is duplicated in many places. Any way it can be extracted as a helper method with a nice name?
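
A possible shape for such a helper, sketched under the assumption that the only transformation is swapping "." and ":" as in the hunk above. The names `tagKeyToFieldName` and `fieldNameToTagKey` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tagKeyToFieldName (hypothetical) converts a tag key into the name used for
// the object field, replacing '.' with ':' so that dots are not interpreted
// as object paths.
func tagKeyToFieldName(key string) string {
	return strings.Replace(key, ".", ":", -1)
}

// fieldNameToTagKey (hypothetical) is the reverse mapping used when reading.
// A key that originally contained ':' comes back with '.' instead, which is
// the lossy case the warning in the write path refers to.
func fieldNameToTagKey(field string) string {
	return strings.Replace(field, ":", ".", -1)
}

func main() {
	fmt.Println(tagKeyToFieldName("http.method")) // prints http:method
	fmt.Println(fieldNameToTagKey("http:method")) // prints http.method
}
```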

@@ -139,6 +142,9 @@ var (
}
}
},
"tagsMap": {
@Monnoroch Monnoroch Aug 17, 2018

Your example in the description says:

tags: {
  "http:method": "GET",
}

However the code says "tagsMap". Either there's a mistake or I'm missing something.

Tags []KeyValue `json:"tags"`
ServiceName string `json:"serviceName"`
Tags []KeyValue `json:"tags"`
TagsMap map[string]string `json:"tagsMap,omitempty"`

I'm not a maintainer, but if it was me, I would have a separate "span" model that is independent of the backend and then I would map it to a backend specific model in the backend specific package. It does not look right to me that you have to modify the common model just to change ES representation.
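
The separation suggested here could look roughly like the sketch below. It is an illustration only: the type names `Span` and `esSpan` and the function `toESSpan` are hypothetical and do not reflect how the project is actually organized; the JSON field names are taken from the hunk above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// KeyValue is a simplified tag.
type KeyValue struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// Span is a backend-independent model (hypothetical, heavily reduced).
type Span struct {
	ServiceName string
	Tags        []KeyValue
}

// esSpan is the Elasticsearch-specific representation; only this type knows
// about the tagsMap field.
type esSpan struct {
	ServiceName string            `json:"serviceName"`
	Tags        []KeyValue        `json:"tags"`
	TagsMap     map[string]string `json:"tagsMap,omitempty"`
}

// toESSpan maps the common model to the ES model, filling tagsMap with
// '.' replaced by ':' in the keys.
func toESSpan(s Span) esSpan {
	out := esSpan{ServiceName: s.ServiceName, Tags: s.Tags}
	if len(s.Tags) > 0 {
		out.TagsMap = make(map[string]string, len(s.Tags))
		for _, kv := range s.Tags {
			out.TagsMap[strings.Replace(kv.Key, ".", ":", -1)] = kv.Value
		}
	}
	return out
}

func main() {
	doc := toESSpan(Span{
		ServiceName: "frontend",
		Tags:        []KeyValue{{Key: "http.method", Value: "GET"}},
	})
	b, err := json.Marshal(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

With this split, a change to the ES representation touches only `esSpan` and `toESSpan`, not the shared model.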

Development

Successfully merging this pull request may close these issues.

Span format is not well suited for ES
2 participants