Skip to content

extended tracing setup by elasticsearch instance #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 23, 2022

Conversation

frzifus
Copy link
Member

@frzifus frzifus commented Jun 14, 2022

Signed-off-by: Benedikt Bongartz bongartz@klimlive.de

based on the standalone ES tested here: jaegertracing/jaeger-operator#1937 (comment)

Used for testing.

oc-4.10 process -f  resources/services/observatorium-traces-template.yaml -p NAMESPACE=observatorium-traces-testing  | oc-4.10 create -f -

Blocked by: #242


Update

We need to wait until Red Hat OpenShift distributed tracing platform 2.4 is released. Its scheduled for today (16.06.2022)

@frzifus frzifus self-assigned this Jun 14, 2022
@frzifus frzifus marked this pull request as draft June 14, 2022 12:47
@frzifus frzifus force-pushed the add_elastic_instance branch from a9034af to faa4bc0 Compare June 14, 2022 13:57
},
nodes: [
{
nodeCount: 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this maybe a goof candidate for a template parameter? I.e. do we want run single node ES on production too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, but scaling only the nodes is not useful, as roles and redundancy policies can differ.
In consensus with @pavolloffay, we decided to keep it non-configurable for now and revisit this point later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavolloffay I can certainly speak from my experience with AppSRE's onboard procedures that this is going to raise a blocker. Anything running on AppSRE's production requires at minimum an HA setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @periklis

Copy link
Member Author

@frzifus frzifus Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some way to configure node types and count in e64f58b. Iam not familiar with configuring ES and would appreciate feedback. (didnt test it yet, still waiting for the RHOSDT 2.4 release)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will come back on this on Monday after I return from PTO. The number of master nodes is not independent configurable from the total number of nodes. If done wrong you get stuck with a failure condition on your ES CR status w/o a cluster being scheduled at all

Copy link
Member Author

@frzifus frzifus Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iam not an ES expert, but it seems to work with the split configuration.

Here is what i tested.

oc-4.10 process -f  resources/services/observatorium-traces-template.yaml -p NAMESPACE=observatorium-traces-testing -p ELASTICSEARCH_NODES_ALLINONE=0 -p ELASTICSEARCH_NODES_MASTER=1 -p ELASTICSEARCH_NODES_CLIENT=1 -p ELASTICSEARCH_NODES_DATA=1  | oc-4.10 apply -f -

And what i got:

NAME                                                              READY   STATUS             RESTARTS      AGE
shared-es-c-e71w31o2-0                                            2/2     Running            0             96s
shared-es-d-0cizwmwr-1-5cd6b86759-nxsb4                           2/2     Running            0             96s
shared-es-m-fk21lbua-0                                            2/2     Running            0             96s

With named instance name=test123:

$ oc-4.10 process -f  resources/services/observatorium-traces-template.yaml -p NAMESPACE=observatorium-traces-testing -p ELASTICSEARCH_NODES_ALLINONE=0 -p ELASTICSEARCH_NODES_MASTER=1 -p ELASTICSEARCH_NODES_CLIENT=1 -p ELASTICSEARCH_NODES_DATA=1 -p ELASTICSEARCH_NAME=test123  | oc-4.10 create -f -
$ oc-4.10 get pods -n observatorium-traces-testing                                      
NAME                                                        READY   STATUS    RESTARTS        AGE
observatorium-jaeger-rhobs-collector-89b9c6bcc-cksgd        1/1     Running   2 (3m2s ago)    3m24s
observatorium-jaeger-rhobs-query-7c585fd78c-tpk8r           3/3     Running   2 (3m1s ago)    3m24s
observatorium-jaeger-telemeter-collector-786bf6d969-tgk6k   1/1     Running   3 (2m45s ago)   3m21s
observatorium-jaeger-telemeter-query-656ff698b8-wgbcx       3/3     Running   3 (2m45s ago)   3m21s
observatorium-otel-collector-58df976b46-wjhkt               1/1     Running   0               3m31s
test123-c-nugjiuje-0                                        2/2     Running   0               3m19s
test123-d-rocohl6k-1-6fbf999c-st7tm                         2/2     Running   0               3m19s
test123-m-g6g26a5j-0                                        2/2     Running   0               3m19s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frzifus @pavolloffay Is there a reason why you need explicit client/master ES nodes (test123-c-nugjiuje-0 and test123-m-g6g26a5j-0) instead of three nodes using all roles client/data/master. We usually use this scheme in OpenShift Logging at it is easier to upgrade and easier to maintain long-term. Something like this:

spec:
...
  nodes:
  - nodeCount: 3
    roles:
    - client
    - data
    - master
    storage:
      size: 80Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavolloffay @periklis all right, now the node count for it's master/client/data nodes is configurable using ELASTICSEARCH_NODE_COUNT.

@frzifus frzifus force-pushed the add_elastic_instance branch 2 times, most recently from cf39622 to 58882ac Compare June 15, 2022 13:34
@frzifus frzifus requested a review from periklis June 15, 2022 13:37
@frzifus frzifus force-pushed the add_elastic_instance branch 5 times, most recently from eb6b0b2 to 9678195 Compare June 20, 2022 10:57
@frzifus frzifus marked this pull request as ready for review June 20, 2022 11:00
@frzifus frzifus requested a review from pavolloffay June 20, 2022 15:09
@frzifus frzifus force-pushed the add_elastic_instance branch from 9678195 to ab190a8 Compare June 20, 2022 17:03
cpu: ${ELASTICSEARCH_REQUEST_CPU}
memory: ${ELASTICSEARCH_REQUEST_MEMORY}
nodes:
- nodeCount: ${{ELASTICSEARCH_NODE_COUNT}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if {{}} is correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frzifus frzifus force-pushed the add_elastic_instance branch from ab190a8 to 7426ba7 Compare June 21, 2022 14:05
@frzifus frzifus enabled auto-merge (squash) June 21, 2022 14:07
@frzifus
Copy link
Member Author

frzifus commented Jun 21, 2022

@periklis what do you think?

frzifus added 6 commits June 22, 2022 14:26
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
@frzifus frzifus force-pushed the add_elastic_instance branch from 7426ba7 to a4ac18e Compare June 22, 2022 12:30
@frzifus
Copy link
Member Author

frzifus commented Jun 22, 2022

rebased.

Copy link
Contributor

@periklis periklis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Only double-checking on the index-prefixes.

Comment on lines +50 to +52
options:
es:
index-prefix: rhobs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these extra index prefixes working with our present OpenDistro Roles for Jaeger?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the roles are independent of the index-prefix, right?
As far as i understood, all instances gets its own curator and jaeger role certs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The roles have per se nothing to do with the certs. I am talking about authorization on the OpenDistro level. AFAICS your Jaeger definitions are *jaeger-..*, thus if you prefix your indices with rhobs-jaeger... you should be fine.

@frzifus frzifus merged commit e4cf80c into rhobs:main Jun 23, 2022
@frzifus frzifus deleted the add_elastic_instance branch June 23, 2022 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants