Log 1098 - Playbook for Critical Alerts #673
```
oc logs <elasticsearch_node_name> -c elasticsearch -n openshift-logging
```
We should probably have a follow-up step here, but this starts to get really tricky. @lukas-vlcek can you think of some steps we can take here? Do we just want to try restarting the nodes that haven't joined? If there's a cert issue, though, we'd need to figure out which node has the correct certs. The operator should also already be doing something there.
@jcantrill can you also look through some of these steps based on your past customer experiences?
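One possible first check before restarting anything is to confirm which nodes have actually joined the cluster. The following is only a sketch, assuming `es_util` is available in the pod as in the other commands in this playbook; the function name and the `ES_POD` variable are illustrative and not part of the PR.

```shell
#!/usr/bin/env bash
# Placeholder pod name; replace with a real Elasticsearch pod (assumption).
ES_POD="${ES_POD:-<elasticsearch_pod_name>}"

# Standard Elasticsearch endpoints: node list and overall cluster health.
NODES_QUERY='_cat/nodes?v'
HEALTH_QUERY='_cluster/health?pretty'

# Hypothetical helper: prints the nodes that joined, then the cluster health.
check_joined_nodes() {
  oc exec -n openshift-logging -c elasticsearch "$ES_POD" -- \
    es_util --query="$NODES_QUERY"
  oc exec -n openshift-logging -c elasticsearch "$ES_POD" -- \
    es_util --query="$HEALTH_QUERY"
}

# Usage (not run automatically): ES_POD=<pod> check_joined_nodes
```

Comparing the node list against the expected node count shows which nodes are missing; their logs can then be inspected with the `oc logs` command above.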
```
oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name> -X DELETE
```
I think we also want to add a step to unlock all the indices once disk usage is back below the watermark threshold. ES locks the indices automatically, but on ES6 it will not unlock them.
@ewolinetz what I found is that ES locks the indices only on reaching the flood watermark level, not the low or high one. That's why I mentioned a step to unlock the indices in the flood watermark troubleshooting.
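For reference, the unlock step discussed here could look roughly like the sketch below. It assumes the block in question is Elasticsearch's `index.blocks.read_only_allow_delete` setting, which is applied at the flood-stage watermark; the function name and the `ES_POD` and `INDEX` variables are illustrative.

```shell
#!/usr/bin/env bash
# Placeholders; replace with real values (assumptions, not from the PR).
ES_POD="${ES_POD:-<elasticsearch_pod_name>}"
INDEX="${INDEX:-<elasticsearch_index_name>}"

# Setting the block to null removes it from the index settings.
UNBLOCK_PAYLOAD='{"index.blocks.read_only_allow_delete": null}'

# Hypothetical helper: clears the read-only block on one index.
unblock_index() {
  oc exec -n openshift-logging -c elasticsearch "$ES_POD" -- \
    es_util --query="${INDEX}/_settings?pretty" -X PUT -d "$UNBLOCK_PAYLOAD"
}

# Usage (not run automatically): ES_POD=<pod> INDEX=<index> unblock_index
```

This only makes sense after disk usage has dropped below the flood-stage watermark; otherwise Elasticsearch will apply the block again.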
```
oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name>/_settings?pretty
```
- Identify the number of replicas from the output of the above command.
- Lower the number of replicas if possible:
This would need to be cluster-wide. I believe the operator would try to adjust the replica count again afterwards.
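A cluster-wide variant of the replica change might be sketched as follows. It assumes the `_all` index pattern is acceptable here and uses `0` purely as an illustrative replica count; the function name and `ES_POD` variable are hypothetical, and the operator may revert the value on its next reconcile.

```shell
#!/usr/bin/env bash
# Placeholder pod name; replace with a real Elasticsearch pod (assumption).
ES_POD="${ES_POD:-<elasticsearch_pod_name>}"

# Illustrative target replica count; pick a value that fits the cluster.
REPLICAS="${REPLICAS:-0}"
REPLICA_PAYLOAD="{\"index\":{\"number_of_replicas\":${REPLICAS}}}"

# Hypothetical helper: applies the replica count to every index via _all.
lower_replicas_cluster_wide() {
  oc exec -n openshift-logging -c elasticsearch "$ES_POD" -- \
    es_util --query='_all/_settings?pretty' -X PUT -d "$REPLICA_PAYLOAD"
}

# Usage (not run automatically): ES_POD=<pod> REPLICAS=0 lower_replicas_cluster_wide
```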
@openshift/sre-alert-sme could you also take a look through some of these and comment?
/assign @RiRa12621 I'll check it out tomorrow, unless one of the APAC folks has time before that
Sorry for the delay, lgtm from my perspective.
/retest
@sasagarw: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ewolinetz, sasagarw
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thank you @ewolinetz!
Description
This PR:
/cc @lukas-vlcek
/assign @ewolinetz
Links