extended tracing setup by elasticsearch instance #241
Conversation
Force-pushed from a9034af to faa4bc0.
```
},
nodes: [
  {
    nodeCount: 1,
```
Is this maybe a good candidate for a template parameter? I.e., do we want to run single-node ES in production too?
Indeed, but scaling only the node count is not useful, as roles and redundancy policies can differ.
In consensus with @pavolloffay, we decided to keep it non-configurable for now and revisit this point later.
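For illustration (a sketch, not taken from the PR; names and counts are assumptions): node groups in the Elasticsearch CR can carry different roles, so a single node count can't express, e.g., dedicated masters plus combined client/data nodes:

```yaml
# Hypothetical Elasticsearch CR excerpt; values are illustrative assumptions.
nodes:
- nodeCount: 3
  roles:
  - master
- nodeCount: 2
  roles:
  - client
  - data
```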
@pavolloffay I can certainly speak from my experience with AppSRE's onboarding procedures: this is going to raise a blocker. Anything running in AppSRE's production requires at minimum an HA setup.
Thanks @periklis
Added a way to configure node types and count in e64f58b. I am not familiar with configuring ES and would appreciate feedback. (I didn't test it yet; still waiting for the RHOSDT 2.4 release.)
I will come back to this on Monday after I return from PTO. The number of master nodes is not independently configurable from the total number of nodes. If done wrong, you get stuck with a failure condition in your ES CR status without a cluster being scheduled at all.
I am not an ES expert, but it seems to work with the split configuration.
Here is what I tested:

```shell
$ oc-4.10 process -f resources/services/observatorium-traces-template.yaml \
    -p NAMESPACE=observatorium-traces-testing \
    -p ELASTICSEARCH_NODES_ALLINONE=0 \
    -p ELASTICSEARCH_NODES_MASTER=1 \
    -p ELASTICSEARCH_NODES_CLIENT=1 \
    -p ELASTICSEARCH_NODES_DATA=1 | oc-4.10 apply -f -
```

And what I got:

```
NAME                                      READY   STATUS    RESTARTS   AGE
shared-es-c-e71w31o2-0                    2/2     Running   0          96s
shared-es-d-0cizwmwr-1-5cd6b86759-nxsb4   2/2     Running   0          96s
shared-es-m-fk21lbua-0                    2/2     Running   0          96s
```
With a named instance (name=test123):

```shell
$ oc-4.10 process -f resources/services/observatorium-traces-template.yaml \
    -p NAMESPACE=observatorium-traces-testing \
    -p ELASTICSEARCH_NODES_ALLINONE=0 \
    -p ELASTICSEARCH_NODES_MASTER=1 \
    -p ELASTICSEARCH_NODES_CLIENT=1 \
    -p ELASTICSEARCH_NODES_DATA=1 \
    -p ELASTICSEARCH_NAME=test123 | oc-4.10 create -f -
$ oc-4.10 get pods -n observatorium-traces-testing
NAME                                                        READY   STATUS    RESTARTS        AGE
observatorium-jaeger-rhobs-collector-89b9c6bcc-cksgd        1/1     Running   2 (3m2s ago)    3m24s
observatorium-jaeger-rhobs-query-7c585fd78c-tpk8r           3/3     Running   2 (3m1s ago)    3m24s
observatorium-jaeger-telemeter-collector-786bf6d969-tgk6k   1/1     Running   3 (2m45s ago)   3m21s
observatorium-jaeger-telemeter-query-656ff698b8-wgbcx       3/3     Running   3 (2m45s ago)   3m21s
observatorium-otel-collector-58df976b46-wjhkt               1/1     Running   0               3m31s
test123-c-nugjiuje-0                                        2/2     Running   0               3m19s
test123-d-rocohl6k-1-6fbf999c-st7tm                         2/2     Running   0               3m19s
test123-m-g6g26a5j-0                                        2/2     Running   0               3m19s
```
@frzifus @pavolloffay Is there a reason why you need explicit client/master ES nodes (`test123-c-nugjiuje-0` and `test123-m-g6g26a5j-0`) instead of three nodes using all roles client/data/master? We usually use this scheme in OpenShift Logging, as it is easier to upgrade and easier to maintain long-term. Something like this:

```yaml
spec:
  ...
  nodes:
  - nodeCount: 3
    roles:
    - client
    - data
    - master
    storage:
      size: 80Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy
```
In addition, it makes using the right master count easy: https://github.com/openshift/elasticsearch-operator/blob/master/internal/elasticsearch/defaults.go#L24:2
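As an aside (this is standard Elasticsearch practice, not quoted from the linked file): the safe master count follows the majority-quorum rule, so it falls out of the total number of master-eligible nodes. A small sketch:

```python
# Illustrative sketch of the standard Elasticsearch quorum rule:
# to avoid split-brain, require a strict majority of master-eligible nodes.
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1

for n in (1, 3, 5):
    print(n, "->", minimum_master_nodes(n))  # 1 -> 1, 3 -> 2, 5 -> 3
```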
@pavolloffay @periklis All right, the node count for its master/client/data nodes is now configurable using `ELASTICSEARCH_NODE_COUNT`.
Force-pushed from cf39622 to 58882ac.
Force-pushed from eb6b0b2 to 9678195.
Force-pushed from 9678195 to ab190a8.
```yaml
        cpu: ${ELASTICSEARCH_REQUEST_CPU}
        memory: ${ELASTICSEARCH_REQUEST_MEMORY}
    nodes:
    - nodeCount: ${{ELASTICSEARCH_NODE_COUNT}}
```
Not sure if `${{}}` is correct.
Got it from here: openshift/origin#3946 (comment)
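For context (my understanding of OpenShift template substitution, worth double-checking against the docs): `${PARAM}` always substitutes as a string, while `${{PARAM}}` inserts the parameter value as-is, which is what lets `nodeCount` end up as an integer. A sketch, with an assumed default value:

```yaml
# Hypothetical template excerpt illustrating the two substitution forms.
parameters:
- name: ELASTICSEARCH_NODE_COUNT
  value: "3"            # template parameter values are declared as strings
objects:
# ...
    nodes:
    - nodeCount: ${{ELASTICSEARCH_NODE_COUNT}}  # substituted as-is -> integer 3
      # "${ELASTICSEARCH_NODE_COUNT}" would instead yield the string "3"
```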
Force-pushed from ab190a8 to 7426ba7.
@periklis what do you think?
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Force-pushed from 7426ba7 to a4ac18e.
Rebased.
Looks good to me! Only double-checking on the index-prefixes.
```yaml
options:
  es:
    index-prefix: rhobs
```
Are these extra index prefixes working with our present OpenDistro Roles for Jaeger?
The roles are independent of the index-prefix, right?
As far as I understood, each instance gets its own curator and Jaeger role certs.
The roles per se have nothing to do with the certs. I am talking about authorization on the OpenDistro level. AFAICS your Jaeger definitions are `*jaeger-..*`, thus if you prefix your indices with `rhobs-jaeger...` you should be fine.
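To illustrate the point above (a sketch only: the exact OpenDistro pattern syntax and the index names here are assumptions, approximated as a glob), a prefixed Jaeger index still matches a `*jaeger-*`-style pattern:

```python
# Illustrative only: glob-style check that rhobs-prefixed Jaeger index
# names still fall under a "*jaeger-*" pattern. Names are hypothetical.
import fnmatch

pattern = "*jaeger-*"
indices = [
    "rhobs-jaeger-span-2022-06-16",
    "test123-jaeger-service-2022-06-16",
]
print(all(fnmatch.fnmatch(name, pattern) for name in indices))  # True
```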
Signed-off-by: Benedikt Bongartz bongartz@klimlive.de
based on the standalone ES tested here: jaegertracing/jaeger-operator#1937 (comment)
Used for testing:

```shell
$ oc-4.10 process -f resources/services/observatorium-traces-template.yaml -p NAMESPACE=observatorium-traces-testing | oc-4.10 create -f -
```
Blocked by: #242
Update: We need to wait until Red Hat OpenShift distributed tracing platform 2.4 is released. It's scheduled for today (16.06.2022).