[stable/redis-ha] improvement: refactor of redis-ha #7323
Conversation
Force-pushed from a662a78 to 46b8dd5
/assign @unguiculus
In my opinion there is no need for a masterGroupName different from previous installations (mymaster); keeping it unchanged saves a bit of configuring in applications that make use of it. From your PR I have made a local fork to test whether it solves our issues. I will update if I run into any new issues, but so far it seems to be running well. Thanks for your efforts so far!
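For illustration, keeping the historical default would look something like this (assuming the group name is exposed as the `redis.masterGroupName` value; that name is my reading of this PR, not confirmed documentation):

```bash
# Install the chart while pinning the sentinel master group name to the
# pre-refactor default so existing clients need no reconfiguration.
helm install stable/redis-ha --set redis.masterGroupName=mymaster
```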
Thanks for the PR, but it doesn't seem to work well in my cluster.
Fixes issues: race condition with masters; announce service no longer working; PVC possibilities; redis-ha doesn't fail over properly. Signed-off-by: Salim <salim.salaues@scality.com>
Signed-off-by: Salim <salim.salaues@scality.com>
Force-pushed from dcf1be7 to dfb6fdc
@KoviaX You are totally right; I had this configured for my own testing and accidentally left it in. I just pushed updates to fix this, rebased to sign off the commits, and fixed the merge conflicts. @ceshihao It looks like your slaves are down for some reason, which is why you're getting that error.
Yes, 3 pods were scheduled, and I cannot find any reason in the redis master/slave logs.
Redis slave log and redis master log attached.
@ceshihao From the logs, it looks to be working as intended; they are replicating to one another.
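For anyone who wants to verify replication the same way, it can be checked from any pod (pod and container names below are illustrative, not the chart's exact ones):

```bash
# Exec into a Redis pod and inspect its replication state.
kubectl exec -it my-release-redis-ha-server-0 -c redis -- redis-cli info replication
# A healthy master reports "role:master" with a nonzero "connected_slaves";
# replicas report "role:slave" and "master_link_status:up".
```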
Couple of problems we came across in testing:
@xsm74 In my experience, client libraries are typically able to discover the Redis master via the Sentinels. Since the Sentinels keep track of the Redis master, they can be queried for the current master, and in the case of a failover they will update the clients accordingly. And regarding the base64 discrepancy, it seems whoever originally wrote the instructions must have been running on a Mac.
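As a quick sketch of that discovery (assuming the default sentinel port 26379 and the default master group name mymaster), any sentinel can be asked for the current master:

```bash
# Ask a sentinel for the ip/port of the current master; after a failover,
# the same query returns the newly promoted master.
redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
```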
command: ["redis-cli", "-p", "{{ .Values.sentinel.port }}", "ping"] | ||
initialDelaySeconds: 15 | ||
periodSeconds: 5 | ||
readiness: |
This should be `readinessProbe`
command: ["redis-cli", "ping"] | ||
initialDelaySeconds: 15 | ||
periodSeconds: 5 | ||
readiness: |
Ditto
@unguiculus I removed them for a couple of reasons:
If it makes it easier to merge, then I will drop that commit, as that is not the objective of this PR and was just an attempt to help more in the charts community.
OK, I guess it's fine then, given they haven't objected. After all, this PR has been open for quite some time.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ssalaues, unguiculus

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Thanks @unguiculus
@lakano Yeah, I just did some basic functionality tests with Redis 5.0 and it seems to be working as expected.
* improvement: refactor of redis-ha. Fixes issues: race condition with masters; announce service no longer working; PVC possibilities; redis-ha doesn't fail over properly. Signed-off-by: Salim <salim.salaues@scality.com>
* cleanup and default to mymaster. Signed-off-by: Salim <salim.salaues@scality.com>
* fixes: requested changes (readiness typo, move security context to pod level, and remove -x flag from init script). Signed-off-by: Salim <salim.salaues@scality.com>
* fix sentinel var name. Signed-off-by: Salim <salim.salaues@scality.com>
* docs: upgrade notes and refinements. Signed-off-by: Salim <salim.salaues@scality.com>
* improvement: update docs, various fixes, simpler init script, better resiliency. Fix notes; force failover in the event that the existing master is not accessible; simpler script, better failover, values update; fix corner cases where upgrades can fail; fix auth issue; clean up unused code; default to auth disabled to prevent a new password being generated on each upgrade. Signed-off-by: Salim <salim.salaues@scality.com>
* Fixed auth issues. Switched 'exit 0' to 'return 0' so that auth can be configured correctly. Fixed `SENTINEl` typo. Signed-off-by: Salim <salim.salaues@scality.com>
* fixes: update with best practices. Signed-off-by: Salim <salim.salaues@scality.com>
* updates: improved doc, consistent style, and added options for custom configmap files. Signed-off-by: Salim <salim.salaues@scality.com>
* add auth info to README. Signed-off-by: Salim <salim.salaues@scality.com>

Signed-off-by: Patrick Montanari <patrick.montanari@gmail.com>
The fact that we can't access the master directly via a service is a bit problematic, as @xsm74 mentioned, since we should be using services, rather than the pods themselves, to connect to the master for writes. @ssalaues, is there a way to get our clients to work well while connecting to the service?
Hello,
Apologies @pmontanari, I mis-copied; fixed now.
@alexvicegrab While there is no "master only" service after this PR, have you tried pointing your application at the sentinel port of the Redis service? Most Redis libraries support the use of sentinels and simply need to be pointed at the sentinel port instead of the Redis port. If the library natively supports Sentinel, it will query any sentinel to return the ip:port of the current master. This allows for failover at the application level, not only the Kubernetes level. I still like the idea of the "master only" service, but it will need some more thought before implementing, as there were some issues with the prior implementation.
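Beyond a one-shot query, a client or sidecar can also subscribe to Sentinel's event channels to react to failovers as they happen (the service name below is illustrative):

```bash
# Subscribe to failover announcements on any sentinel. On failover, Sentinel
# publishes the old and new master addresses on the +switch-master channel:
#   mymaster <old-ip> 6379 <new-ip> 6379
redis-cli -h my-release-redis-ha -p 26379 subscribe +switch-master
```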
How about a binary (maybe in Go) that listens for events from sentinel and changes the label on a pod, coupled with a service that targets the labelled pod (similar to v2 of this chart)? But I agree that the load-balanced kube service in front of the sentinel instances is the preferred way.
Thanks @ssalaues, I'll try to test with the Sentinels.
@ssalaues If clients running outside Kubernetes want to access Redis inside Kubernetes, the internal addresses of K8s pods returned by the sentinels cannot be used by those clients. Is such a use case supported? Clients can easily interact with K8s services from outside.
@alexvicegrab @rainhacker We have been using an HAProxy-based K8s service to expose the Redis cluster to external clients and also to provide master-only or slave-only access for "dumb" clients. You can see the chart in this fork. If it's worthy, I can look into making a PR.
@rainhacker Currently there is no support within the chart for this use case, but I think the HAProxy or a similar method would be a good approach.
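For reference, the usual HAProxy approach to master-only routing is a TCP health check that only passes on the node reporting role:master. This is a minimal sketch with illustrative server names (for example, per-pod announce services), not the linked chart's exact config:

```bash
# Write a minimal HAProxy backend that health-checks each Redis endpoint and
# routes traffic only to the instance currently reporting role:master.
cat > haproxy.cfg <<'EOF'
backend redis_master
    mode tcp
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    server redis-0 redis-ha-announce-0:6379 check inter 1s
    server redis-1 redis-ha-announce-1:6379 check inter 1s
    server redis-2 redis-ha-announce-2:6379 check inter 1s
EOF
```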
@prodriguezdefino @ssalaues I was trying to use the haproxy-redis chart that @prodriguezdefino linked, but without any success so far. My knowledge of how HAProxy works is currently limited, so sorry for this question in advance; I just want to make sure I'm going in the right direction. I guess the haproxy-redis chart is not meant to be used as-is and was built for your specific needs, right (because sentinel doesn't seem to be exposed in it)? Could you let me know if I've got the correct idea in order to have it work with the current redis-ha implementation? My main point being that I don't have to talk to the master/slaves directly; my Redis client knows how to talk to sentinel:
Does that make sense? I think it would be worth including some documentation for this use case; in my experience this is a common way to set up Redis, especially on large projects with geo-redundancy (and I guess when people look at redis-ha it's usually because of specific needs for a resilient, large-scale architecture). And by the way, thanks for this refactor; I was using the chart in its initial version, and the way it works now is much better in my opinion.
Let me see if I can answer your questions, @Albi34: "I guess the haproxy-redis chart is not meant to be used as-is and was built for your specific needs, right (because sentinel doesn't seem to be exposed in it)? Could you let me know if I've got the correct idea in order to have it work with the current redis-ha implementation?" Of course this should work when configured correctly, so if you want to, you can open an issue on my repo and I can help you there =).
What this PR does / why we need it:
There are many issues with this chart, and the simple fact is that in reality it is not highly available as stated in the chart description/name. This refactor brings a simpler approach to a Redis master/slave configuration with Sentinel management, as the sentinels are simply deployed as sidecar containers alongside each Redis instance.
This provides native Redis management, failover, and election.
This also removes dependencies on very specific Redis images, thus allowing the use of any Redis image. All init scripting is now managed from a ConfigMap within this chart (see the sketch below).
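To give a feel for the approach, here is a rough sketch of the kind of bootstrap logic such an init script performs; the variable names, hostnames, and the mymaster group are illustrative, not the chart's exact script:

```bash
#!/bin/sh
# Ask any sentinel (reached via a service; SENTINEL_HOST is an assumption
# for this sketch) whether a master is already known.
MASTER="$(redis-cli -h "$SENTINEL_HOST" -p 26379 \
  sentinel get-master-addr-by-name mymaster | head -n 1)"

if [ -n "$MASTER" ]; then
    # A master exists: configure this instance as its replica.
    echo "slaveof $MASTER 6379" >> /data/conf/redis.conf
elif [ "$(hostname)" = "redis-ha-server-0" ]; then
    # No master known yet and this is the first pod: bootstrap as master.
    echo "pod 0 starting as initial master"
else
    # No master yet and not pod 0: fail and let Kubernetes restart us
    # until the sentinels have learned of a master.
    exit 1
fi
```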
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #5441, fixes #3197, fixes #3403, fixes #2780, fixes #8240, fixes #8062, fixes #7968
Special notes for your reviewer:
Clarification edit:
~~
From my understanding of the redis-ha chart in its current state, it uses Docker images built from the repo here https://github.com/smileisak/docker-images/tree/master/redis/alpine, which uses an assortment of scripts to update labels, find masters/slaves, and start elections/promotions via kubectl commands, which require the appropriate roles and accounts to function in a typical RBAC-enabled environment. This is super cool in theory but has been the source of many issues (multiple masters being labeled and no failover being the primary ones I personally encountered).
The approach I took in this refactor is more of a Redis-native approach, where I tried to remove much of the complexity of the scripts and allow all election/promotion to be done through the Redis sentinels, with a small init script hosted here as a ConfigMap. I feel this also makes the chart easier to maintain and more dynamic, as it can be used with the official Redis image or really any image. While I don't think it's perfect, I think this puts the chart in a much more stable category than its current state.
As a result, I removed the RBAC roles and accounts; however, if they are necessary for other aspects that I did not encounter or immediately see, please feel free to point them out.
~~