
INTLY-4129 Pod Monitor implementation #114

Merged
merged 2 commits into keycloak:master from davidkirwan:add-pod-monitor
Jan 7, 2020

Conversation

davidkirwan
Contributor

@davidkirwan davidkirwan commented Dec 17, 2019

JIRA ID

https://issues.redhat.com/browse/INTLY-4129

Additional Information

This PR adds another monitor resource: PodMonitor. It is useful when there are multiple Keycloak pods running at the same time in a cluster (HA mode).
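
A PodMonitor is a monitoring.coreos.com/v1 resource that tells the Prometheus Operator to scrape matching pods directly, rather than going through a Service. As a minimal sketch of the kind of resource this PR creates (the resource name, pod labels, port name and metrics path below are illustrative assumptions, not the exact values produced by the operator):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: keycloak-pod-monitor        # illustrative name
  namespace: keycloak
  labels:
    monitoring-key: middleware      # label used by the application monitoring stack
spec:
  selector:
    matchLabels:
      app: keycloak                 # assumed pod label; must match the Keycloak pods
  podMetricsEndpoints:
    - port: keycloak                # assumed container port name
      path: /auth/realms/master/metrics   # assumed path exposed by the keycloak-metrics-spi extension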

Verification Steps

  1. Check out the add-pod-monitor branch on my fork: git@github.com:davidkirwan/keycloak-operator.git
  2. Apply all the CRDs etc.: make cluster/prepare
  3. Deploy the operator locally: operator-sdk up local --namespace "keycloak"
  4. Modify the Keycloak CR to increase the instances to 3:
diff --git a/deploy/examples/keycloak/keycloak.yaml b/deploy/examples/keycloak/keycloak.yaml
index d5d9d24..5d23118 100644
--- a/deploy/examples/keycloak/keycloak.yaml
+++ b/deploy/examples/keycloak/keycloak.yaml
@@ -5,8 +5,8 @@ metadata:
   labels:
     app: sso
 spec:
-  instances: 1
+  instances: 3
   extensions:
     - https://github.com/aerogear/keycloak-metrics-spi/releases/download/1.0.4/keycloak-metrics-spi-1.0.4.jar
   externalAccess:
-    enabled: True
\ No newline at end of file
+    enabled: True
  5. Label the keycloak namespace: oc label namespace keycloak monitoring-key=middleware
  6. Install the application monitoring stack - https://github.com/integr8ly/application-monitoring-operator
  7. Apply the Keycloak CR: oc apply -f deploy/examples/keycloak/keycloak.yaml
  8. Apply the Keycloak Realm CR: oc apply -f deploy/examples/realm/basic_realm.yaml
  9. Verify that the created PodMonitor has the monitoring-key: middleware label (see the sketch after this list)
  10. Check that Keycloak-related metrics, e.g. keycloak_request_duration_bucket, are available in the Prometheus instance created by the application monitoring operator
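
For the PodMonitor check above, the created resource should carry the monitoring label so that the application monitoring stack's Prometheus selects it; an abridged sketch of the expected metadata (the resource name is an assumption):

metadata:
  name: keycloak-pod-monitor        # assumed name
  namespace: keycloak
  labels:
    monitoring-key: middleware

With instances set to 3, the Prometheus targets page should then list one scrape endpoint per Keycloak pod.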

Expect log messages such as the following:

{"level":"info","ts":1576850553.5705435,"logger":"controller_keycloak","msg":"Reconciling Keycloak","Request.Namespace":"keycloak","Request.Name":"example-keycloak"}
{"level":"info","ts":1576850553.6047091,"logger":"action_runner","msg":"(    0)    SUCCESS Update Keycloak admin secret"}
{"level":"info","ts":1576850553.640096,"logger":"action_runner","msg":"(    1)    SUCCESS update keycloak prometheus rule"}
{"level":"info","ts":1576850553.6730623,"logger":"action_runner","msg":"(    2)    SUCCESS update keycloak service monitor"}
{"level":"info","ts":1576850553.7066567,"logger":"action_runner","msg":"(    3)    SUCCESS update keycloak pod monitor"}
{"level":"info","ts":1576850553.7443936,"logger":"action_runner","msg":"(    4)    SUCCESS update keycloak grafana dashboard"}
{"level":"info","ts":1576850553.7768626,"logger":"action_runner","msg":"(    5)    SUCCESS Update Database Secret"}
{"level":"info","ts":1576850553.812751,"logger":"action_runner","msg":"(    6)    SUCCESS Update Postgresql PersistentVolumeClaim"}
{"level":"info","ts":1576850553.846172,"logger":"action_runner","msg":"(    7)    SUCCESS Update Postgresql Deployment"}
{"level":"info","ts":1576850553.8792517,"logger":"action_runner","msg":"(    8)    SUCCESS Update Postgresql KeycloakService"}
{"level":"info","ts":1576850553.9124768,"logger":"action_runner","msg":"(    9)    SUCCESS Update keycloak Service"}
{"level":"info","ts":1576850553.9456978,"logger":"action_runner","msg":"(   10)    SUCCESS Update keycloak Discovery Service"}
{"level":"info","ts":1576850553.9791303,"logger":"action_runner","msg":"(   11)    SUCCESS Update Keycloak Deployment (StatefulSet)"}
{"level":"info","ts":1576850554.0188434,"logger":"action_runner","msg":"(   12)    SUCCESS Update Keycloak Route"}
{"level":"info","ts":1576850554.0523052,"logger":"controller_keycloak","msg":"desired cluster state met"}

Should see a PodMonitor created in the keycloak namespace:
Screenshot from 2019-12-20 14-04-58

Should see metrics from the multiple keycloak pods:
Screenshot from 2019-12-20 14-10-01

Checklist:

  • Verified by team member
  • Comments where necessary
  • Automated Tests
  • Documentation changes if necessary

Additional Notes

@slaskawi
Contributor

Travis seems to be sad... very sad :)

@davidkirwan davidkirwan mentioned this pull request Dec 17, 2019
@coveralls

coveralls commented Dec 17, 2019

Coverage Status

Coverage decreased (-0.4%) to 41.962% when pulling e5b6264 on davidkirwan:add-pod-monitor into 7ed3df1 on keycloak:master.

@davidkirwan davidkirwan changed the title from "[WIP] Wei's changes minus the vendor changes" to "[WIP] Wei's add support for podmonitor" Dec 17, 2019
@slaskawi
Contributor

slaskawi commented Dec 18, 2019

@davidkirwan Overall, it looks good to me. Please also implement/change the following bits:

  • I added one small comment - please address it.
  • Please rename the commit messages to something more descriptive, like: INTLY-4129 Pod Monitor implementation
  • Please provide a short description (in the PR) of how it should work and how to test it.
  • Please convert this PR to a normal one (not a draft).
  • Please let @stianst and @abstractj know that this one is ready to be merged.

Just FYI - I'll be on sick leave and then on PTO, so I probably won't be able to look more into it. But it looks fine, I'd like to have it in 8.0.1.

@david-martin
Contributor

@davidkirwan 2 questions.

  • Should the ServiceMonitor no longer be created now that a PodMonitor is being created?
  • Should the dashboard and any alerts be updated to allow for aggregating metrics from multiple pods?

@davidkirwan davidkirwan changed the title from "[WIP] Wei's add support for podmonitor" to "INTLY-4129 Pod Monitor implementation" Dec 19, 2019
@davidkirwan davidkirwan marked this pull request as ready for review December 20, 2019 14:10
@davidkirwan
Contributor Author

Hi @stianst and @abstractj, I think this PR is ready to be looked at again!

@david-martin I think there will be some fallout related to the switch from ServiceMonitors to PodMonitors; I don't know enough at this point to say how much work it will be to modify our RHMI dashboards, though.

@wei-lee

wei-lee commented Dec 31, 2019

@david-martin

Should the ServiceMonitor no longer be created now that a PodMonitor is being created?

No, I don't think so. They monitor different objects: one is for the service (a load balancer), and the other is for the pods.
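
To make the distinction concrete: a ServiceMonitor selects a Service by its labels and scrapes the endpoints behind it, while a PodMonitor selects the pods directly. A minimal ServiceMonitor sketch for comparison (name, labels and port are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keycloak-service-monitor    # illustrative name
  labels:
    monitoring-key: middleware
spec:
  selector:
    matchLabels:
      app: keycloak                 # matches the Service's labels, not the pods'
  endpoints:
    - port: keycloak                # named port on the Service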

Should the dashboard and any alerts be updated to allow for aggregating metrics from multiple pods?

Yes, I think so.

@davidffrench
Contributor

@davidkirwan @wei-lee Which one of you is actively working on this ticket now? It looks like there are some additional updates to be made to the alerts and dashboards.

The addition of the new Keycloak operator to RHMI is currently blocked until there is a new release of the Keycloak operator which includes #112. One option would be to release as is and then include these changes in a future release. @stianst @slaskawi What are the dates for future Keycloak releases that include the operator, or are they ad hoc when needed?

@davidkirwan
Contributor Author

@davidffrench
Contributor

Excellent, sounds good @davidkirwan

Updated grafana operator imports to v3
Renamed imports for grafana operator to grafanav1alpha1
@davidkirwan
Contributor Author

@davidffrench @wei-lee @david-martin

The Grafana dashboard bundled with the Keycloak Operator appears to work OK from what I can see. The Prometheus rules also work for the most part, with 2 exceptions:

Alert: "KeycloakInstanceNotAvailable"
Alert: "KeycloakDatabaseNotAvailable"

These two alerts will only function correctly when the Prometheus/Grafana monitoring stack has access to kube-state-metrics.
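
To illustrate that dependency, a minimal sketch of the shape of such an alert, assuming kube-state-metrics is being scraped (the actual expressions shipped in the operator's PrometheusRule may differ):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keycloak-alerts             # illustrative name
spec:
  groups:
    - name: keycloak.rules
      rules:
        - alert: KeycloakInstanceNotAvailable
          # kube_pod_status_ready is exported by kube-state-metrics; without it
          # this expression returns no data and the alert can never fire.
          expr: sum(kube_pod_status_ready{namespace="keycloak", condition="true", pod=~"keycloak-.*"}) < 1
          for: 5m
          labels:
            severity: critical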

@davidffrench
Contributor

@davidkirwan That is correct, but it really should not be the case if at all possible. I was the one who wrote these, and it was very presumptuous of me to assume kube-state-metrics would always be available. However, that is probably outside the scope of your PR unless you think it can be done easily.

@davidkirwan
Contributor Author

@davidffrench Yep, this is something which could maybe be added as a configuration option to the application monitoring operator stack at install time, but yes, it is outside the scope of this PR.

I think this PR is good to go now, just need verification/approval please: @stianst , @abstractj @slaskawi

@abstractj abstractj self-assigned this Jan 6, 2020
@slaskawi
Contributor

slaskawi commented Jan 7, 2020

@abstractj @davidkirwan @davidffrench @stianst This one LGTM - ready to be merged. Once it gets in, please tag the 8.0.1 release (remember to run the set-version.sh script).
