Configure JGroups like DB, in entrypoint #100
Conversation
fails to load database driver. Might require a module update.
http://www.jgroups.org/manual/index.html#_jdbc_ping cautions against this because it increases database traffic, but JGroups is designed for thousands of replicas, and that will rarely be the case with something like Keycloak.
as it won't be worth the trouble of adding module jars, and maintaining duplicate connection strings. JNDI reuse is probably quite robust now despite the failing shutdown hook, thanks to clear_table_on_view_change
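For reference, clear_table_on_view_change is a standard JGroups JDBC_PING property. Below is a minimal sketch of how an entrypoint could patch it into standalone-ha.xml in the same sed-on-XML style as change-database.sh; the element layout, the JNDI name, and the assumption that the stock config carries a self-closing <protocol type="PING"/> element are mine, not this PR's actual script:

# Sketch only: swap the default PING discovery protocol for JDBC_PING,
# reusing the container's existing datasource via JNDI so no extra driver
# modules or duplicate connection strings are needed.
CONFIG=/opt/jboss/keycloak/standalone/configuration/standalone-ha.xml
JDBC_PING='<protocol type="JDBC_PING"><property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS</property><property name="clear_table_on_view_change">true</property></protocol>'
sed -i "s|<protocol type=\"PING\"/>|$JDBC_PING|" "$CONFIG"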
@@ -25,5 +25,10 @@ else
   echo "[KEYCLOAK DOCKER IMAGE] Using the embedded H2 database"
 fi

+if [ "$JGROUPS_SETUP" != "" ]; then
+  echo "[KEYCLOAK DOCKER IMAGE] Using custom JGroups setup $JGROUPS_SETUP"
+  /bin/sh /opt/jboss/keycloak/bin/change-jgroups.sh $JGROUPS_SETUP
This won't forward OS signals: I gave it a try and didn't configure the DNS properly, and the Docker container got stuck in the "Looking up initial hosts for DNS name" loop. It couldn't be stopped with a SIGTERM, since bash doesn't forward signals to child processes.
I exchanged this line with the following lines and was able to stop the container properly when stuck in the initial discovery phase.
# Forward SIGTERM/SIGINT to the child, so the container can be stopped even
# while change-jgroups.sh is blocked in the DNS discovery loop.
trap 'kill -TERM $CHILD_PID; exit 1' TERM INT
/opt/jboss/keycloak/bin/change-jgroups.sh "$JGROUPS_SETUP" &
CHILD_PID=$!
# Returns when the child exits or when a signal interrupts the wait.
wait $CHILD_PID
# Normal exit: clear the trap and reap the child.
trap - TERM INT
wait $CHILD_PID
Good point. I think I just copied this from how change-database.sh is invoked. I'd suggest a switch to /bin/bash, but I would prioritize similarity with change-database.
Running this solution in production for a month or two now, with three Keycloak instances on average, without any issue.
Hello Staffan, is it intentional that you only set the cache owners attribute for the caches
Also, I found that we did not have to expose ports 9090 (management), 4712 (txn-recovery-ev) and 4713 (txn-status-mgr) to get the cluster working. Is that correct?
Got this from #92 (comment). I haven't investigated further.
Regarding 9090, it depends on whether you want to do management from other pods. For the two txn ports I don't know. I started out exposing every port to eliminate that as a source of JGroups errors, so it's quite likely that those are not needed.
From the manual http://www.keycloak.org/docs/3.4/server_installation/index.html#cache it sounds like the caches should all be replicated (and have multiple owners).
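For illustration, here is a hedged sketch of how an entrypoint could raise the owners count on the distributed caches. The cache names are taken from Keycloak's standalone-ha.xml; the CACHE_OWNERS variable is hypothetical and not part of this PR:

# Sketch: add owners="2" to each distributed cache so session data
# survives the loss of a single instance.
CACHE_OWNERS="${CACHE_OWNERS:-2}"
CONFIG=/opt/jboss/keycloak/standalone/configuration/standalone-ha.xml
for CACHE in sessions authenticationSessions offlineSessions loginFailures; do
  sed -i "s|<distributed-cache name=\"$CACHE\"|& owners=\"$CACHE_OWNERS\"|" "$CONFIG"
done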
Thanks a lot for this! This has been working great for me for months (upgraded to Keycloak 3.4.3 of course). Using the following Deployment yaml on GKE with the nginx ingress controller handling HTTPS:
The liveness probe looks for a deadlock we've been having about once a week. Haven't been able to find the cause of it yet, but it seems to occur in server-to-server OIDC communication as we're using one realm as an identity provider for another realm on the same Keycloak server.
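Not the actual manifest, but a minimal sketch of the liveness-probe idea as a strategic merge patch; the deployment name, probe path, and thresholds are all assumptions:

# Sketch: have Kubernetes restart a Keycloak pod whose HTTP endpoint stops
# answering, which also catches a deadlocked instance.
kubectl patch deployment keycloak --patch "$(cat <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: keycloak
        livenessProbe:
          httpGet:
            path: /auth/
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          timeoutSeconds: 5
EOF
)"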
@MCalverley Do you run with the additional cache_owners=2 attribute that @hannesrohde suggests? We've also been in production for a couple of months. Good that you've added memory limits and reduced the number of ports to expose. In
@solsson, yes, I set cache owners as part of the environment variables:
I think it works correctly since I at least see it rebalancing to two cache owners when it restarts because of that deadlock bug I mentioned.
A bit of an update from the Keycloak team. We finally have managed to set aside some time to start looking at this issue. I like having both DNS_PING and KUBE_PING available and this is a great starting point for that. We are also working on upgrading to WildFly 13 and hopefully Keycloak 4.0.0.Final will be upgraded to that. That should make it much easier to also provide KUBE_PING support. Will get around to reviewing this PR in detail soon. Hopefully next week.
Good news, good work @stianst. I guess this means you'll only need some kind of kubernetes manifest example? |
What I'm thinking is to review this PR, make sure it gets to a mergeable state, and merge that. Then add KUBE_PING as well and we can have the option to switch between the two like we do for DBs today. I'd like to wait until Keycloak is on WildFly 13 first, though, so we can check it works properly there. Also, we need to make sure the README file covers all the env values for clustering setup.
Very excited about this support, if it gets merged. It's pretty much the only blocker to being able to use this official Keycloak image in HA mode. Without this, you need to roll your own image to enable the *_PING support.
Some more news from the Keycloak team. WildFly 13 did not make it into 4.0.0.Final. It's currently targeted for 4.1.0.Final. There are also PTOs coming up soon, so most likely it will take at least another month until we can pick up this work and review it properly.
Oh well. At least it's easy to roll your own image. For what it's worth, I de-conflicted @solsson's branch here: https://github.com/pete-woods/keycloak/tree/server-jgroups-setup
An update from the Keycloak team with regards to clustering support in the Docker image. Keycloak 4.4.0.Final will be upgraded to WildFly 13 Final, which has a number of improvements around discovery mechanisms and will make it much simpler to enable clustering. We also now have a former Infinispan team member on our team who will be looking at this after 4.4 is out. The aim will be to have easy support for clustering on standalone Docker, Kubernetes and OpenShift out of the box.
From @slaskawi it looks like DNS_PING will handle both OpenShift and Kubernetes use-cases. What we do for standalone Docker is still to be decided.
Something that works well in AWS's ECS would be ideal. I've been using the
I'm guessing DNS_PING doesn't work in ECS unless you're running on top of Kubernetes or OpenShift?
When I was looking at setting this up originally, it looked like getting DNS_PING would require more work on the AWS side, whereas AWS's database tech (RDS and RDS clusters) worked with JDBC_PING out of the box with no config changes.
If you are interested, an alternative on AWS is to use S3_PING. It is better to use S3_PING instead of JDBC_PING, because JDBC_PING will put load on your database with many queries.
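For context, S3_PING stores the discovery records in an S3 bucket instead of a database table. Here is a rough sketch of the JGroups 3 stanza, patched in with the same sed approach as above; the bucket name and credentials are placeholders:

# Sketch: point discovery at an S3 bucket; each member writes its own
# address there and reads the others' at startup.
S3_PING='<protocol type="S3_PING"><property name="location">my-keycloak-ping-bucket</property><property name="access_key">AKIA...</property><property name="secret_access_key">...</property></protocol>'
sed -i "s|<protocol type=\"PING\"/>|$S3_PING|" /opt/jboss/keycloak/standalone/configuration/standalone-ha.xml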
Thanks for the push in the right direction, folks. Seems like ECS supports DNS-based service discovery: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service-discovery.html
Thanks a lot for the contribution! We decided to refactor the HA part a bit. In #151 we proposed a universal mechanism for using different discovery protocols. In the Pull Request we uploaded a test image so that you can play with it. Please have a look at the Pull Request and put your comments there.
This PR is based on my interpretation of the discussion in #92, in particular the remarks from @stianst regarding the jboss/keycloak image.

The structure for JGroups is heavily inspired by #96. Thanks @rayscunningham, I think the two PRs could easily be merged.
Now, my chief insight from #92 and #94 was that DNS_PING is significantly less complex than both JDBC_PING and KUBE_PING. Here I've actually implemented something similar with JGroups 3 in a couple of lines of bash. To use it:
Set the JGROUPS_SETUP env to TCPPING, which basically only triggers the execution of getent hosts $JGROUPS_DNS_NAME. I seem to get faster cluster joins with this strategy, and no records left behind at instance shutdown.
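The core of the idea, as a hedged sketch — the port and the way the list is injected are assumptions based on this description, not the PR's exact code:

# Sketch: resolve the headless-service DNS name to member IPs and hand the
# list to TCPPING as initial_hosts.
IPS=$(getent hosts "$JGROUPS_DNS_NAME" | awk '{print $1 "[7600]"}' | paste -sd, -)
echo "[KEYCLOAK DOCKER IMAGE] TCPPING initial_hosts: $IPS"
# 7600 is WildFly's default jgroups-tcp port; the list can then be injected
# via a system property such as -Djgroups.tcpping.initial_hosts=$IPS.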
For those who want to stick with JDBC_PING, set the JGROUPS_SETUP env to JDBC_PING. It's slightly improved compared to #92, with less waiting for JGroups to ping exited instances, thanks to clear_table_on_view_change.

Note that #95 hasn't been merged, so logging to console in this PR is always at or above INFO level.