
INTLY-4129 Pod Monitor implementation #114

Merged
merged 2 commits into keycloak:master from davidkirwan:add-pod-monitor
Jan 7, 2020

Conversation

davidkirwan
Contributor

@davidkirwan davidkirwan commented Dec 17, 2019

JIRA ID

https://issues.redhat.com/browse/INTLY-4129

Additional Information

This PR adds another monitor resource: PodMonitor. It is useful when there are multiple Keycloak pods running at the same time in a cluster (HA mode).
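
A PodMonitor is a monitoring.coreos.com/v1 resource that tells the Prometheus Operator to scrape matching pods directly, rather than going through a Service. As a minimal sketch of the kind of resource this PR creates (the resource name, pod labels, port name and metrics path below are illustrative assumptions, not the exact values produced by the operator):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: keycloak-pod-monitor        # illustrative name
  namespace: keycloak
  labels:
    monitoring-key: middleware      # label used by the application monitoring stack
spec:
  selector:
    matchLabels:
      app: keycloak                 # assumed pod label; must match the Keycloak pods
  podMetricsEndpoints:
    - port: keycloak                # assumed container port name
      path: /auth/realms/master/metrics   # assumed path exposed by the keycloak-metrics-spi extension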

Verification Steps

  1. Check out the add-pod-monitor branch on my fork: git@github.com:davidkirwan/keycloak-operator.git
  2. Apply all the CRDs etc.: make cluster/prepare
  3. Deploy the operator locally: operator-sdk up local --namespace "keycloak"
  4. Modify the Keycloak CR to increase the instances to 3:
diff --git a/deploy/examples/keycloak/keycloak.yaml b/deploy/examples/keycloak/keycloak.yaml
index d5d9d24..5d23118 100644
--- a/deploy/examples/keycloak/keycloak.yaml
+++ b/deploy/examples/keycloak/keycloak.yaml
@@ -5,8 +5,8 @@ metadata:
   labels:
     app: sso
 spec:
-  instances: 1
+  instances: 3
   extensions:
     - https://github.com/aerogear/keycloak-metrics-spi/releases/download/1.0.4/keycloak-metrics-spi-1.0.4.jar
   externalAccess:
-    enabled: True
\ No newline at end of file
+    enabled: True
  5. Label the keycloak namespace: oc label namespace keycloak monitoring-key=middleware
  6. Install the application monitoring stack - https://github.com/integr8ly/application-monitoring-operator
  7. Apply the Keycloak CR: oc apply -f deploy/examples/keycloak/keycloak.yaml
  8. Apply the Keycloak Realm CR: oc apply -f deploy/examples/realm/basic_realm.yaml
  9. Verify that the created PodMonitor has the monitoring-key: middleware label (see the sketch after this list)
  10. Check that Keycloak-related metrics, e.g. keycloak_request_duration_bucket, are available in the Prometheus instance created by the application monitoring operator
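
For the PodMonitor check above, the created resource should carry the monitoring label so that the application monitoring stack's Prometheus selects it; an abridged sketch of the expected metadata (the resource name is an assumption):

metadata:
  name: keycloak-pod-monitor        # assumed name
  namespace: keycloak
  labels:
    monitoring-key: middleware

With instances set to 3, the Prometheus targets page should then list one scrape endpoint per Keycloak pod.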

Expect log messages such as the following:

{"level":"info","ts":1576850553.5705435,"logger":"controller_keycloak","msg":"Reconciling Keycloak","Request.Namespace":"keycloak","Request.Name":"example-keycloak"}
{"level":"info","ts":1576850553.6047091,"logger":"action_runner","msg":"(    0)    SUCCESS Update Keycloak admin secret"}
{"level":"info","ts":1576850553.640096,"logger":"action_runner","msg":"(    1)    SUCCESS update keycloak prometheus rule"}
{"level":"info","ts":1576850553.6730623,"logger":"action_runner","msg":"(    2)    SUCCESS update keycloak service monitor"}
{"level":"info","ts":1576850553.7066567,"logger":"action_runner","msg":"(    3)    SUCCESS update keycloak pod monitor"}
{"level":"info","ts":1576850553.7443936,"logger":"action_runner","msg":"(    4)    SUCCESS update keycloak grafana dashboard"}
{"level":"info","ts":1576850553.7768626,"logger":"action_runner","msg":"(    5)    SUCCESS Update Database Secret"}
{"level":"info","ts":1576850553.812751,"logger":"action_runner","msg":"(    6)    SUCCESS Update Postgresql PersistentVolumeClaim"}
{"level":"info","ts":1576850553.846172,"logger":"action_runner","msg":"(    7)    SUCCESS Update Postgresql Deployment"}
{"level":"info","ts":1576850553.8792517,"logger":"action_runner","msg":"(    8)    SUCCESS Update Postgresql KeycloakService"}
{"level":"info","ts":1576850553.9124768,"logger":"action_runner","msg":"(    9)    SUCCESS Update keycloak Service"}
{"level":"info","ts":1576850553.9456978,"logger":"action_runner","msg":"(   10)    SUCCESS Update keycloak Discovery Service"}
{"level":"info","ts":1576850553.9791303,"logger":"action_runner","msg":"(   11)    SUCCESS Update Keycloak Deployment (StatefulSet)"}
{"level":"info","ts":1576850554.0188434,"logger":"action_runner","msg":"(   12)    SUCCESS Update Keycloak Route"}
{"level":"info","ts":1576850554.0523052,"logger":"controller_keycloak","msg":"desired cluster state met"}

Should see a PodMonitor created in the keycloak namespace:
Screenshot from 2019-12-20 14-04-58

Should see metrics from the multiple keycloak pods:
Screenshot from 2019-12-20 14-10-01

Checklist:

  • Verified by team member
  • Comments where necessary
  • Automated Tests
  • Documentation changes if necessary

Additional Notes

@slaskawi
Contributor

Travis seems to be sad... very sad :)

@davidkirwan davidkirwan mentioned this pull request Dec 17, 2019
@coveralls

coveralls commented Dec 17, 2019

Coverage Status

Coverage decreased (-0.4%) to 41.962% when pulling e5b6264 on davidkirwan:add-pod-monitor into 7ed3df1 on keycloak:master.

@davidkirwan davidkirwan changed the title from "[WIP] Wei's changes minus the vendor changes" to "[WIP] Wei's add support for podmonitor" Dec 17, 2019
@slaskawi
Contributor

slaskawi commented Dec 18, 2019

@davidkirwan Overall, it looks good to me. Please also implement/change the following bits:

  • I added one small comment - please address it.
  • Please rename the commit messages to something more descriptive, like: INTLY-4129 Pod Monitor implementation
  • Please provide a short description (in the PR) of how it should work and how to test it.
  • Please convert this PR to a normal one (not a draft).
  • Please let @stianst and @abstractj know that this one is ready to be merged.

Just FYI - I'll be on sick leave and then on PTO, so I probably won't be able to look more into it. But it looks fine, I'd like to have it in 8.0.1.

@david-martin
Contributor

@davidkirwan 2 questions.

  • Should the ServiceMonitor no longer be created now that a PodMonitor is being created?
  • Should the dashboard and any alerts be updated to allow for aggregating metrics from multiple pods?

@davidkirwan davidkirwan changed the title from "[WIP] Wei's add support for podmonitor" to "INTLY-4129 Pod Monitor implementation" Dec 19, 2019
@davidkirwan davidkirwan marked this pull request as ready for review December 20, 2019 14:10
@davidkirwan
Contributor Author

Hi @stianst and @abstractj, I think this PR is ready to be looked at again!

@david-martin I think there will be some fallout related to the switch from ServiceMonitors to PodMonitors; I don't know enough at this point to say how much work it will be to modify our RHMI dashboards, though.

@wei-lee

wei-lee commented Dec 31, 2019

@david-martin

Should the ServiceMonitor no longer be created now that a PodMonitor is being created?

No, I don't think so. They monitor different objects: one is for the service (a load balancer), and the other is for the pods.
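
To make the distinction concrete: a ServiceMonitor selects a Service by its labels and scrapes the endpoints behind it, while a PodMonitor selects the pods directly. A minimal ServiceMonitor sketch for comparison (name, labels and port are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keycloak-service-monitor    # illustrative name
  labels:
    monitoring-key: middleware
spec:
  selector:
    matchLabels:
      app: keycloak                 # matches the Service's labels, not the pods'
  endpoints:
    - port: keycloak                # named port on the Service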

Should the dashboard and any alerts be updated to allow for aggregating metrics from multiple pods?

Yes, I think so.

@davidffrench
Contributor

@davidkirwan @wei-lee Which one of you is actively working on this ticket now? It looks like there are some additional updates to be made to the alerts and dashboards.

The addition of the new Keycloak operator to RHMI is currently blocked until there is a new release of the Keycloak operator which includes #112. One option would be to release as is and then include these changes in a future release. @stianst @slaskawi What are the dates for future Keycloak releases that include the operator, or are they ad hoc when needed?

@davidkirwan
Contributor Author

@davidffrench
Contributor

Excellent, sounds good @davidkirwan

Updated grafana operator imports to v3
Renamed imports for grafana operator to grafanav1alpha1
@davidkirwan
Contributor Author

@davidffrench @wei-lee @david-martin

The Grafana dashboard bundled with the Keycloak Operator appears to work OK from what I can see. The Prometheus rules also work for the most part, with 2 exceptions:

Alert: "KeycloakInstanceNotAvailable"
Alert: "KeycloakDatabaseNotAvailable"

These two alerts will only function correctly when the Prometheus/Grafana monitoring stack has access to kube-state-metrics.
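
To illustrate that dependency, a minimal sketch of the shape of such an alert, assuming kube-state-metrics is being scraped (the actual expressions shipped in the operator's PrometheusRule may differ):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keycloak-alerts             # illustrative name
spec:
  groups:
    - name: keycloak.rules
      rules:
        - alert: KeycloakInstanceNotAvailable
          # kube_pod_status_ready is exported by kube-state-metrics; without it
          # this expression returns no data and the alert can never fire.
          expr: sum(kube_pod_status_ready{namespace="keycloak", condition="true", pod=~"keycloak-.*"}) < 1
          for: 5m
          labels:
            severity: critical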

@davidffrench
Contributor

@davidkirwan That is correct, but it really should not be the case if at all possible. I was the one who wrote these, and it was very presumptuous of me to assume kube-state-metrics would always be available. However, that is probably outside the scope of your PR unless you think it can be done easily.

@davidkirwan
Contributor Author

@davidffrench Yep, this is something which could maybe be added as a configuration option to the application monitoring operator stack at install time, but yes, it is outside the scope of this PR.

I think this PR is good to go now, just need verification/approval please: @stianst , @abstractj @slaskawi

@abstractj abstractj self-assigned this Jan 6, 2020
@slaskawi
Contributor

slaskawi commented Jan 7, 2020

@abstractj @davidkirwan @davidffrench @stianst This one LGTM - ready to be merged. Once it gets in, please tag the 8.0.1 release (remember to run the set-version.sh script).
