extended tracing setup by elasticsearch instance #241
Conversation
Force-pushed from a9034af to faa4bc0.
```
},
nodes: [
  {
    nodeCount: 1,
```
Is this maybe a good candidate for a template parameter? I.e., do we want to run single-node ES in production too?
Indeed, but scaling only the node count is not useful, as roles and redundancy policies can differ.
In consensus with @pavolloffay, we decided to keep it non-configurable for now and revisit this point later.
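For illustration (a sketch, not taken from the PR; names and counts are assumptions): node groups in the Elasticsearch CR can carry different roles, so a single node count can't express, e.g., dedicated masters plus combined client/data nodes:

```yaml
# Hypothetical Elasticsearch CR excerpt; values are illustrative assumptions.
nodes:
- nodeCount: 3
  roles:
  - master
- nodeCount: 2
  roles:
  - client
  - data
```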
@pavolloffay I can certainly speak from my experience with AppSRE's onboarding procedures: this is going to raise a blocker. Anything running in AppSRE's production requires at minimum an HA setup.
Thanks @periklis
Added a way to configure node types and count in e64f58b. I am not familiar with configuring ES and would appreciate feedback. (I didn't test it yet; still waiting for the RHOSDT 2.4 release.)
I will come back to this on Monday after I return from PTO. The number of master nodes is not independently configurable from the total number of nodes. If done wrong, you get stuck with a failure condition in your ES CR status without a cluster being scheduled at all.
I am not an ES expert, but it seems to work with the split configuration.
Here is what I tested:

```shell
$ oc-4.10 process -f resources/services/observatorium-traces-template.yaml \
    -p NAMESPACE=observatorium-traces-testing \
    -p ELASTICSEARCH_NODES_ALLINONE=0 \
    -p ELASTICSEARCH_NODES_MASTER=1 \
    -p ELASTICSEARCH_NODES_CLIENT=1 \
    -p ELASTICSEARCH_NODES_DATA=1 | oc-4.10 apply -f -
```

And what I got:

```
NAME                                      READY   STATUS    RESTARTS   AGE
shared-es-c-e71w31o2-0                    2/2     Running   0          96s
shared-es-d-0cizwmwr-1-5cd6b86759-nxsb4   2/2     Running   0          96s
shared-es-m-fk21lbua-0                    2/2     Running   0          96s
```
With a named instance (name=test123):

```shell
$ oc-4.10 process -f resources/services/observatorium-traces-template.yaml \
    -p NAMESPACE=observatorium-traces-testing \
    -p ELASTICSEARCH_NODES_ALLINONE=0 \
    -p ELASTICSEARCH_NODES_MASTER=1 \
    -p ELASTICSEARCH_NODES_CLIENT=1 \
    -p ELASTICSEARCH_NODES_DATA=1 \
    -p ELASTICSEARCH_NAME=test123 | oc-4.10 create -f -
$ oc-4.10 get pods -n observatorium-traces-testing
NAME                                                        READY   STATUS    RESTARTS        AGE
observatorium-jaeger-rhobs-collector-89b9c6bcc-cksgd        1/1     Running   2 (3m2s ago)    3m24s
observatorium-jaeger-rhobs-query-7c585fd78c-tpk8r           3/3     Running   2 (3m1s ago)    3m24s
observatorium-jaeger-telemeter-collector-786bf6d969-tgk6k   1/1     Running   3 (2m45s ago)   3m21s
observatorium-jaeger-telemeter-query-656ff698b8-wgbcx       3/3     Running   3 (2m45s ago)   3m21s
observatorium-otel-collector-58df976b46-wjhkt               1/1     Running   0               3m31s
test123-c-nugjiuje-0                                        2/2     Running   0               3m19s
test123-d-rocohl6k-1-6fbf999c-st7tm                         2/2     Running   0               3m19s
test123-m-g6g26a5j-0                                        2/2     Running   0               3m19s
```
@frzifus @pavolloffay Is there a reason why you need explicit client/master ES nodes (`test123-c-nugjiuje-0` and `test123-m-g6g26a5j-0`) instead of three nodes using all roles client/data/master? We usually use this scheme in OpenShift Logging, as it is easier to upgrade and easier to maintain long-term. Something like this:

```yaml
spec:
  ...
  nodes:
  - nodeCount: 3
    roles:
    - client
    - data
    - master
    storage:
      size: 80Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy
```
In addition, it makes using the right master count easy: https://github.com/openshift/elasticsearch-operator/blob/master/internal/elasticsearch/defaults.go#L24:2
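As an aside (this is standard Elasticsearch practice, not quoted from the linked file): the safe master count follows the majority-quorum rule, so it falls out of the total number of master-eligible nodes. A small sketch:

```python
# Illustrative sketch of the standard Elasticsearch quorum rule:
# to avoid split-brain, require a strict majority of master-eligible nodes.
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1

for n in (1, 3, 5):
    print(n, "->", minimum_master_nodes(n))  # 1 -> 1, 3 -> 2, 5 -> 3
```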
@pavolloffay @periklis All right, the node count for its master/client/data nodes is now configurable using `ELASTICSEARCH_NODE_COUNT`.
Force-pushed from cf39622 to 58882ac.
Force-pushed from eb6b0b2 to 9678195.
Force-pushed from 9678195 to ab190a8.
```yaml
        cpu: ${ELASTICSEARCH_REQUEST_CPU}
        memory: ${ELASTICSEARCH_REQUEST_MEMORY}
    nodes:
    - nodeCount: ${{ELASTICSEARCH_NODE_COUNT}}
```
Not sure if `${{}}` is correct.
Got it from here: openshift/origin#3946 (comment)
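For context (my understanding of OpenShift template substitution, worth double-checking against the docs): `${PARAM}` always substitutes as a string, while `${{PARAM}}` inserts the parameter value as-is, which is what lets `nodeCount` end up as an integer. A sketch, with an assumed default value:

```yaml
# Hypothetical template excerpt illustrating the two substitution forms.
parameters:
- name: ELASTICSEARCH_NODE_COUNT
  value: "3"            # template parameter values are declared as strings
objects:
# ...
    nodes:
    - nodeCount: ${{ELASTICSEARCH_NODE_COUNT}}  # substituted as-is -> integer 3
      # "${ELASTICSEARCH_NODE_COUNT}" would instead yield the string "3"
```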
Force-pushed from ab190a8 to 7426ba7.
@periklis what do you think?
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Force-pushed from 7426ba7 to a4ac18e.
Rebased.
Looks good to me! Only double-checking on the index-prefixes.
```yaml
options:
  es:
    index-prefix: rhobs
```
Are these extra index prefixes working with our present OpenDistro Roles for Jaeger?
The roles are independent of the index-prefix, right?
As far as I understood, each instance gets its own curator and Jaeger role certs.
The roles per se have nothing to do with the certs. I am talking about authorization on the OpenDistro level. AFAICS your Jaeger definitions are `*jaeger-..*`, thus if you prefix your indices with `rhobs-jaeger...` you should be fine.
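To illustrate the point above (a sketch only: the exact OpenDistro pattern syntax and the index names here are assumptions, approximated as a glob), a prefixed Jaeger index still matches a `*jaeger-*`-style pattern:

```python
# Illustrative only: glob-style check that rhobs-prefixed Jaeger index
# names still fall under a "*jaeger-*" pattern. Names are hypothetical.
import fnmatch

pattern = "*jaeger-*"
indices = [
    "rhobs-jaeger-span-2022-06-16",
    "test123-jaeger-service-2022-06-16",
]
print(all(fnmatch.fnmatch(name, pattern) for name in indices))  # True
```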
Signed-off-by: Benedikt Bongartz bongartz@klimlive.de
based on the standalone ES tested here: jaegertracing/jaeger-operator#1937 (comment)
Used for testing:

```shell
$ oc-4.10 process -f resources/services/observatorium-traces-template.yaml -p NAMESPACE=observatorium-traces-testing | oc-4.10 create -f -
```
Blocked by: #242
Update: We need to wait until Red Hat OpenShift distributed tracing platform 2.4 is released. It's scheduled for today (16.06.2022).