Tempo - Review performance #6844

Closed · Tracked by #6538
josunect opened this issue Nov 14, 2023 · 15 comments

Assignees: josunect
Labels: backlog (Triaged Issue added to backlog), bug (Something isn't working)

@josunect
Contributor

It looks like, when running queries that return a large number of results, the frontend-query pod dies and Kiali returns a 503 error.

There are not many of these calls - maybe 2 or 3.

josunect added the bug label Nov 14, 2023
josunect self-assigned this Nov 29, 2023
@josunect
Contributor Author

josunect commented Nov 29, 2023

Tested the deployment of Tempo with Tanka on kind with 2 nodes, and the following resources:

[screenshot: resource configuration]

It takes a bit (1.94 seconds), but it works for the maximum range of 7 days:
[screenshot: query result]

@jshaughn
Collaborator

Any comparison with Jaeger?

@josunect
Contributor Author

josunect commented Nov 30, 2023

I made this table so we can compare different calls. This is with the bookinfo demo in minikube.

|  | Jaeger | Tempo | cURL (Jaeger) | cURL (Tempo) |
|---|---|---|---|---|
| Traces 1m | 104 ms | 370 ms | 66.522 ms | 54.215 ms |
| Traces 10m | 195 ms | 1.26 s | - | - |
| Traces 7d | 155 ms | 712 ms | 167.162 ms | 314.332 ms |
| Traces 7d gRPC | 219 ms | 365 ms | - | - |
| Get Trace | 10 ms | 16 ms | 4.772 ms | 10.129 ms |
| Traces (Graph) | 47 ms | 458 ms | - | - |

The Tempo resources were increased:

  resources:
    total:
      limits:
        memory: 8Gi
        cpu: 4000m
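
For reference, the cURL columns above are single-request timings (the same kind of measurement as the `curl -w '%{time_total}'` commands shown later in this thread). Below is a minimal Go sketch, not part of Kiali, that reproduces that kind of measurement against the Jaeger and Tempo endpoints used in this issue; the localhost ports assume the same port-forwards, and the query parameters are placeholder values taken from the thread:

```go
// Minimal sketch (not Kiali code): time a single HTTP GET against the Jaeger and
// Tempo query endpoints, similar to curl -w '%{time_total}'.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func timeGet(url string) (time.Duration, error) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the measurement includes the full response transfer.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	urls := []string{
		// Jaeger traces search (service/limit/start/end values are placeholders).
		"http://localhost:16686/jaeger/api/traces?service=productpage.bookinfo&limit=100&start=1703177199000000&end=1703177208000000",
		// Tempo TraceQL search (same shape as the queries logged in this issue).
		"http://localhost:3200/api/search?q=%7B+.service.name+%3D+%22productpage.bookinfo%22+%7D&limit=100&spss=10&start=1703179679&end=1703179696",
	}
	for _, u := range urls {
		elapsed, err := timeGet(u)
		if err != nil {
			fmt.Println(u, "error:", err)
			continue
		}
		fmt.Printf("%s -> %v\n", u, elapsed)
	}
}
```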

@jshaughn
Collaborator

Wow, Tempo is significantly slower. Thanks for the table comparison!

@josunect
Contributor Author

josunect commented Dec 5, 2023

> Wow, Tempo is significantly slower. Thanks for the table comparison!

It is also true that we are processing the traces and transforming them into the Jaeger format to be used internally. So, in the future, maybe we can switch internally to the OTel format. I think this will require some effort.

jshaughn added the backlog label Dec 6, 2023
@josunect
Contributor Author

I was reviewing the performance, and I see (from the graph view) that there are many concurrent calls to the Tracing API with just one click on the graph:

2023-12-13T18:13:50Z INF [2023-12-13 18:12:46 - 2023-12-13 18:13:50] http://localhost:16686/jaeger/api/traces?end=1702491230000000&limit=15&service=reviews.bookinfo&start=1702491166000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:12:46 - 2023-12-13 18:13:50] http://localhost:16686/jaeger/api/traces?end=1702491230000000&limit=15&service=reviews.bookinfo&start=1702491166000000
2023-12-13T18:13:50Z INF [2023-12-13 18:12:58 - 2023-12-13 18:13:05] http://localhost:16686/jaeger/api/traces?end=1702491185000000&limit=1&service=reviews.bookinfo&start=1702491178000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:18 - 2023-12-13 18:13:24] http://localhost:16686/jaeger/api/traces?end=1702491204000000&limit=1&service=reviews.bookinfo&start=1702491198000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:37 - 2023-12-13 18:13:43] http://localhost:16686/jaeger/api/traces?end=1702491223000000&limit=1&service=reviews.bookinfo&start=1702491217000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:12:52 - 2023-12-13 18:12:58] http://localhost:16686/jaeger/api/traces?end=1702491178000000&limit=1&service=reviews.bookinfo&start=1702491172000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:24 - 2023-12-13 18:13:30] http://localhost:16686/jaeger/api/traces?end=1702491210000000&limit=1&service=reviews.bookinfo&start=1702491204000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:05 - 2023-12-13 18:13:11] http://localhost:16686/jaeger/api/traces?end=1702491191000000&limit=1&service=reviews.bookinfo&start=1702491185000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:30 - 2023-12-13 18:13:37] http://localhost:16686/jaeger/api/traces?end=1702491217000000&limit=1&service=reviews.bookinfo&start=1702491210000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:12:46 - 2023-12-13 18:12:52] http://localhost:16686/jaeger/api/traces?end=1702491172000000&limit=1&service=reviews.bookinfo&start=1702491166000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:11 - 2023-12-13 18:13:18] http://localhost:16686/jaeger/api/traces?end=1702491198000000&limit=1&service=reviews.bookinfo&start=1702491191000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:13:43 - 2023-12-13 18:13:50] http://localhost:16686/jaeger/api/traces?end=1702491230000000&limit=1&service=reviews.bookinfo&start=1702491223000000&tags=%7B%22cluster%22%3A%22Kubernetes%22%7D
2023-12-13T18:13:50Z INF [2023-12-13 18:12:58 - 2023-12-13 18:13:05] http://localhost:16686/jaeger/api/traces?end=1702491185000000&limit=1&service=reviews.bookinfo&start=1702491178000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:18 - 2023-12-13 18:13:24] http://localhost:16686/jaeger/api/traces?end=1702491204000000&limit=1&service=reviews.bookinfo&start=1702491198000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:11 - 2023-12-13 18:13:18] http://localhost:16686/jaeger/api/traces?end=1702491198000000&limit=1&service=reviews.bookinfo&start=1702491191000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:37 - 2023-12-13 18:13:43] http://localhost:16686/jaeger/api/traces?end=1702491223000000&limit=1&service=reviews.bookinfo&start=1702491217000000
2023-12-13T18:13:50Z INF [2023-12-13 18:12:46 - 2023-12-13 18:12:52] http://localhost:16686/jaeger/api/traces?end=1702491172000000&limit=1&service=reviews.bookinfo&start=1702491166000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:30 - 2023-12-13 18:13:37] http://localhost:16686/jaeger/api/traces?end=1702491217000000&limit=1&service=reviews.bookinfo&start=1702491210000000
2023-12-13T18:13:50Z INF [2023-12-13 18:12:52 - 2023-12-13 18:12:58] http://localhost:16686/jaeger/api/traces?end=1702491178000000&limit=1&service=reviews.bookinfo&start=1702491172000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:43 - 2023-12-13 18:13:50] http://localhost:16686/jaeger/api/traces?end=1702491230000000&limit=1&service=reviews.bookinfo&start=1702491223000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:24 - 2023-12-13 18:13:30] http://localhost:16686/jaeger/api/traces?end=1702491210000000&limit=1&service=reviews.bookinfo&start=1702491204000000
2023-12-13T18:13:50Z INF [2023-12-13 18:13:05 - 2023-12-13 18:13:11] http://localhost:16686/jaeger/api/traces?end=1702491191000000&limit=1&service=reviews.bookinfo&start=1702491185000000

https://github.com/kiali/kiali/blob/master/business/tracing.go#L141

The limit is 1; I'm not sure how optimal this code is in this example (the time range is just 1m). For a large time range it probably helps to get a wider spread of traces, based on the number of results.

It also looks like the queries are duplicated with the cluster tag.
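
For illustration, this is roughly the pattern visible in the log above (hypothetical code, not the actual implementation behind business/tracing.go#L141): one limit=15 query over the whole interval plus one limit=1 query per sub-interval, all issued concurrently:

```go
// Sketch of the fan-out pattern shown in the log (hypothetical, not Kiali's code):
// the selected interval is split into a fixed number of sub-intervals, and one
// limit=1 search is fired concurrently per sub-interval, in addition to a
// limit=15 search over the whole range.
package main

import (
	"fmt"
	"sync"
	"time"
)

const buckets = 10 // fixed fan-out, matching the value discussed in this issue

func queryTraces(start, end time.Time, limit int) {
	// Placeholder for the real HTTP call to /jaeger/api/traces or /api/search.
	fmt.Printf("query start=%d end=%d limit=%d\n", start.Unix(), end.Unix(), limit)
}

func fetchSpread(start, end time.Time) {
	var wg sync.WaitGroup

	// One query over the full interval.
	wg.Add(1)
	go func() {
		defer wg.Done()
		queryTraces(start, end, 15)
	}()

	// Plus one limit=1 query per sub-interval, all concurrent.
	step := end.Sub(start) / buckets
	for i := 0; i < buckets; i++ {
		subStart := start.Add(time.Duration(i) * step)
		subEnd := subStart.Add(step)
		wg.Add(1)
		go func(s, e time.Time) {
			defer wg.Done()
			queryTraces(s, e, 1)
		}(subStart, subEnd)
	}
	wg.Wait()
}

func main() {
	end := time.Now()
	fetchSpread(end.Add(-time.Minute), end)
}
```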


@josunect
Contributor Author

josunect commented Dec 14, 2023

For Tempo, there are many requests, as it has an API to check the tags, and it is possible to get this information in advance. Anyway, this is an example of the resources:

  resources:
    total:
      limits:
        memory: 1Gi
        cpu: 2000m

11 queries are run concurrently before the pod dies (this is one click on a node in the graph plus the Traces tab):

2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550070&limit=15&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550004
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550070&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550063
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550017&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550010
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550043&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550037
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550057&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550050
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550037&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550030
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550030&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550023
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550010&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550004
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550023&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550017
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550050&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550043
2023-12-14T10:34:30Z INF http://localhost:3200/api/search?end=1702550063&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550057
2023-12-14T10:34:31Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550023&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550017": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550023&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550017]
2023-12-14T10:34:31Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550037&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550030": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550037&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550030]
2023-12-14T10:34:31Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550017&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550010": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550017&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550010]
2023-12-14T10:34:31Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550030&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550023": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550030&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550023]
2023-12-14T10:34:31Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550010&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550004": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550010&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550004]
2023-12-14T10:34:31Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550057&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550050": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550057&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550050]
2023-12-14T10:34:31Z ERR Traces split & join failed: Get "http://localhost:3200/api/search?end=1702550057&limit=1&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550050": EOF
2023-12-14T10:34:40Z INF http://localhost:3200/api/search?end=1702550080&limit=100&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550020
2023-12-14T10:34:41Z ERR Tempo API query error: Get "http://localhost:3200/api/search?end=1702550080&limit=100&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550020": EOF [code: 0, URL: http://localhost:3200/api/search?end=1702550080&limit=100&q=%7B+.service.name+%3D+%22reviews.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702550020]

[screenshot]

The same happens when Tempo has collected many traces, with a single query:

"http://localhost:3200/api/search?end=1702551729&limit=100&q=%7B+.service.name+%3D+%22productpage.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702551669"

[screenshot]

Increasing the resources to the following:

  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 4000m

There are still some issues (increasing the range to the last 3h, the pod is killed by the OOM killer).
The system is more stable with this config:

  resources:
    total:
      limits:
        memory: 4Gi
        cpu: 8000m

But that still looks like a large amount of resources for a dev environment.

cURL query: curl -G -s http://localhost:3200/api/search --data-urlencode 'q={ .service.name = "productpage.bookinfo"} && { } | select("status", ".service_name", ".node_id", ".component", ".upstream_cluster", ".http.method", ".response_flags")' --data-urlencode 'spss=10' --data-urlencode 'limit=100' --data-urlencode 'start=1701948096' --data-urlencode 'end=1702552896' | jq

@josunect
Contributor Author

  • It looks like spss=10 (what makes it possible to build the heatmap and the bubble size) has a high cost. Maybe we could add an option to hide the heatmap in the tooltip to improve performance (and this could be configurable, to be enabled when the system is provided with more resources) - a rough sketch of the idea is below.
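
A hypothetical sketch of that option (the flag and helper names are invented, not existing Kiali configuration): build the same kind of Tempo search URL that appears in the logs, but drop the spans-per-spanset (spss) parameter when heatmap data is not needed:

```go
// Hypothetical sketch (invented names, not existing Kiali code or configuration):
// construct the Tempo /api/search URL with or without the spss parameter.
package main

import (
	"fmt"
	"net/url"
)

func buildTempoSearchURL(service string, start, end int64, includeHeatmapData bool) string {
	// TraceQL query shaped like the ones logged in this issue.
	q := fmt.Sprintf(`{ .service.name = "%s" } && { } | select(status, .service_name, .node_id, .component, .upstream_cluster, .http.method, .response_flags)`, service)

	params := url.Values{}
	params.Set("q", q)
	params.Set("limit", "100")
	params.Set("start", fmt.Sprint(start))
	params.Set("end", fmt.Sprint(end))
	if includeHeatmapData {
		// spss=10 asks Tempo for up to 10 spans per span set, which this comment
		// identifies as costly; leaving it out falls back to Tempo's lower default.
		params.Set("spss", "10")
	}
	return "http://localhost:3200/api/search?" + params.Encode()
}

func main() {
	fmt.Println(buildTempoSearchURL("reviews.bookinfo", 1702550004, 1702550070, true))
	fmt.Println(buildTempoSearchURL("reviews.bookinfo", 1702550004, 1702550070, false))
}
```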

@jshaughn
Collaborator

> I was reviewing the performance, and I see (from the graph view) that there are many concurrent calls to the Tracing API with just one click on the graph:

Sorry, I was not keeping up with this discussion. So, one request from the graph results in many concurrent backend calls to the Tempo API? What determines the number of concurrent calls, the number of spans? And are you saying that if the number of goroutines is too high, the pod may crash?

@josunect
Contributor Author

> > I was reviewing the performance, and I see (from the graph view) that there are many concurrent calls to the Tracing API with just one click on the graph:
>
> Sorry, I was not keeping up with this discussion. So, one request from the graph results in many concurrent backend calls to the Tempo API? What determines the number of concurrent calls, the number of spans? And are you saying that if the number of goroutines is too high, the pod may crash?

Yes, the Tempo pod might crash, killed by the OOM killer. And that is right: one request from the graph might result in 10 concurrent calls to the backend, to spread the query through the time interval and get a sample from different times: https://github.com/kiali/kiali/blob/master/business/tracing.go#L237 (for example, if we want 15 traces from the last hour, it would return a trace for each minute instead of returning all the traces from the last 10 minutes).

This is done for Tempo and Jaeger, but I haven't seen any issue with Jaeger.

The number of concurrent calls is a fixed value in the function (https://github.com/kiali/kiali/blob/master/business/tracing.go#L237); it is set to 10.

The number of spans is independent of the concurrent calls, but it is another point of improvement in the Tempo query. Jaeger returns all the spans, while Tempo limits the number of spans to 3 by default, because returning them has a higher cost.

@josunect
Contributor Author

  • Jaeger query last 1m:
    curl 'http://localhost:16686/jaeger/api/traces?end=1703177208000000&limit=100&service=productpage.bookinfo&start=1703177199000000' -w 'Total: %{time_total}s\n'
    Total: 0.066522s

  • Jaeger query last 7d:
    curl 'http://localhost:16686/jaeger/api/traces?end=1703177676000000&limit=100&service=productpage.bookinfo&start=1702572876000000' -w 'Total: %{time_total}s\n'
    Total: 0.167162s

  • Jaeger query one trace:
    curl 'http://localhost:16686/jaeger/api/traces/e3b70ccde315af7a772aaa41b786b368' -w 'Total: %{time_total}s\n'
    Total: 0.004772s

  • Tempo query last 1m:
    curl -G -s http://localhost:3200/api/search --data-urlencode 'q={ .service.name = "productpage.bookinfo"} && { } | select("status", ".service_name", ".node_id", ".component", ".upstream_cluster", ".http.method", ".response_flags")' --data-urlencode 'spss=10' --data-urlencode 'limit=100' --data-urlencode 'start=1703179679' --data-urlencode 'end=1703179696' -w 'Total: %{time_total}s\n'
    Total: 0.054215s

  • Tempo query last 7d:
    curl 'http://localhost:3200/api/search?end=1703179433&limit=100&q=%7B+.service.name+%3D+%22productpage.bookinfo%22++%7D+%26%26+%7B++%7D+%7C+select%28status%2C+.service_name%2C+.node_id%2C+.component%2C+.upstream_cluster%2C+.http.method%2C+.response_flags%29&spss=10&start=1702574633' -w 'Total: %{time_total}s\n'
    Total: 0.314332s

  • Tempo query one trace:
    curl 'http://localhost:3200/api/traces/b483155fb47d1d17627c8ee102daea67' -w 'Total: %{time_total}s\n'
    Total: 0.010129s

@josunect
Contributor Author

josunect commented Dec 22, 2023

Based on these results:

  • I think the performance of the Tempo API can be similar to Jaeger's, and there is some overhead in processing every trace for the transformation. I don't think it is a lot, but as a future improvement, Kiali could be changed to work with the OTEL format (it might be a big change, I think).
  • The main issue with Tempo is that the pod crashes (killed by the OOM killer). The resources have been increased to keep the system stable, but it still seems like a lot (that was reported).
  • Some improvements could be, based on a configuration option, hiding the heat map in the tooltip and performing a different query:
    count() > 0 | { resource.service.name="<some svc>" } | avg(duration)

@josunect
Contributor Author

josunect commented Jan 4, 2024

Based on this analysis, I think this issue can be closed, highlighting some points of improvement in case they are needed:

  • Improve the search query used by Kiali (this will require some research)
  • Use traces in OTEL format in Kiali (backend and frontend changes)
  • Performance of the Tempo operator (decrease the resources used)


josunect closed this as completed Jan 4, 2024
