This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/redis-ha] improvement: refactor of redis-ha #7323

Merged
merged 11 commits into helm:master from redis-ha-refactor on Oct 31, 2018

Conversation

ssalaues
Collaborator

@ssalaues ssalaues commented Aug 23, 2018

What this PR does / why we need it:
There are many issues with this chart, and the simple fact is that in reality it is not highly available as stated in the chart description/name. This refactor brings a simpler approach to a Redis master/slave configuration with Sentinel management, as the sentinels are deployed as sidecar containers alongside each Redis instance.

This provides native Redis management, failover, and election.

This also removes dependencies on very specific Redis images, allowing any Redis image to be used. All init scripting is now managed from a ConfigMap within this chart.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
fixes #5441, fixes #3197, fixes #3403, fixes #2780, fixes #8240, fixes #8062, fixes #7968

Special notes for your reviewer:

Clarification edit:
~~
From my understanding of the redis-ha chart in its current state, it uses Docker images built from the repo here https://github.com/smileisak/docker-images/tree/master/redis/alpine, which uses an assortment of scripts to update labels, find masters/slaves, and start elections/promotions via kubectl commands; these require the appropriate roles and service accounts to function in a typical RBAC-enabled environment. That is super cool in theory but has been the source of many issues (multiple masters being labeled and no failover being the primary ones I personally encountered).

The approach I took in this refactor is more Redis-native: I tried to remove much of the complexity of the scripts and allow all election/promotion to be done through the Redis Sentinels, with a small init script hosted in this chart as a ConfigMap. I feel this also makes the chart easier to maintain and more dynamic, as it can be used with the official Redis image or really any image. While I don't think it's perfect, I think this puts the chart in a much more stable category than its current state.
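
To illustrate the idea, a minimal sketch of querying the sentinel sidecars once the chart is up (the release name "myrelease", the sidecar container name "sentinel", and the master group name "mymaster" are assumptions for this example, not guarantees of the chart; 26379 is the standard sentinel port):

kubectl exec myrelease-redis-ha-server-0 -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
# prints the ip and port of the currently elected master; election, promotion, and
# failover are handled by the sentinels themselves, with no kubectl scripting involved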

As a result I removed the RBAC roles and accounts; however, if they are necessary for other aspects that I did not encounter or immediately see, please feel free to point them out.
~~

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 23, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 23, 2018
@ssalaues ssalaues force-pushed the redis-ha-refactor branch 3 times, most recently from a662a78 to 46b8dd5 on August 23, 2018 23:48
@ssalaues ssalaues changed the title [stable/redis-ha] improvment: refactor of redis-ha [stable/redis-ha] improvement: refactor of redis-ha Aug 23, 2018
@ssalaues
Collaborator Author

/assign @unguiculus
/assign @scottrigby

@KoviaX
Contributor

KoviaX commented Aug 27, 2018

In my opinion there is no need to use a different masterGroupName than previous installations (mymaster); keeping it unchanged saves a bit of configuration in applications that make use of it.

From your PR I have made a local fork to test if it solves our issues. I will update if I run into any new issues, but so far it seems to be running well. Thanks for your efforts so far!
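
For reference, keeping the previous group name while testing a fork could look something like this (just a sketch; the redis.masterGroupName values key is an assumption about the refactored values layout and may differ):

helm install ./redis-ha --name myredis --set redis.masterGroupName=mymaster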

@mattfarina mattfarina added the Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). label Aug 27, 2018
@ceshihao
Contributor

ceshihao commented Sep 3, 2018

Thanks for the PR.

But it doesn't seem to work well in my cluster.
I installed the chart with helm install stable/redis-ha -n dayu-test, but the slaves are s_down and there is No suitable slave to promote.

kubectl exec -it dayu-test-redis-ha-server-0 bash -n default
Defaulting container name to redis.
Use 'kubectl describe pod/dayu-test-redis-ha-server-0 -n default' to see all of the containers in this pod.
I have no name!@dayu-test-redis-ha-server-0:/data$ redis-cli -p 26379
127.0.0.1:26379> sentinel master mymaster
(error) ERR No such master with that name
127.0.0.1:26379> sentinel master zenko
 1) "name"
 2) "zenko"
 3) "ip"
 4) "10.18.39.6"
 5) "port"
 6) "6379"
 7) "runid"
 8) "a3f918a1e9f7d9455d6fe2e04de31404a6e66e0f"
 9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "640"
19) "last-ping-reply"
20) "640"
21) "down-after-milliseconds"
22) "10000"
23) "info-refresh"
24) "741"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "291756"
29) "config-epoch"
30) "0"
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "180000"
39) "parallel-syncs"
40) "5"
127.0.0.1:26379> sentinel slaves zenko
1)  1) "name"
    2) "10.18.57.0:6379"
    3) "ip"
    4) "10.18.57.0"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave,disconnected"
   11) "link-pending-commands"
   12) "3"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "290992"
   17) "last-ok-ping-reply"
   18) "290992"
   19) "last-ping-reply"
   20) "290992"
   21) "s-down-time"
   22) "280933"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1535968470392"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "290992"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
2)  1) "name"
    2) "10.18.40.0:6379"
    3) "ip"
    4) "10.18.40.0"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave,disconnected"
   11) "link-pending-commands"
   12) "3"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "290994"
   17) "last-ok-ping-reply"
   18) "290994"
   19) "last-ping-reply"
   20) "290994"
   21) "s-down-time"
   22) "280934"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1535968470393"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "290994"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
127.0.0.1:26379> sentinel failover zenko
(error) NOGOODSLAVE No suitable slave to promote
127.0.0.1:26379> exit

@ey-bot ey-bot removed the Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). label Sep 4, 2018
Salim added 2 commits September 4, 2018 10:34
Fixes issues:
Race condition with masters
Announced service no longer working
PVC possibilities
redis-ha doesn't failover properly

Signed-off-by: Salim <salim.salaues@scality.com>
Signed-off-by: Salim <salim.salaues@scality.com>
@ey-bot ey-bot added the Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). label Sep 4, 2018
@ssalaues
Collaborator Author

ssalaues commented Sep 4, 2018

@KoviaX You are totally right, I had this configured for my own testing and accidentally left it in. I just pushed updates to fix this.

Also rebased to sign off the commits and fixed merge conflicts.

@ceshihao yeah it looks like your slaves are down for some reason which is why you're getting NOGOODSLAVE No suitable slave to promote. Have all 3 pods been scheduled (at least for default replica count)? What do the logs say for the two slave pods?
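
If it helps, these are the kinds of checks I would start with (assuming the sentinel sidecar container is named "sentinel" next to the "redis" container):

kubectl logs dayu-test-redis-ha-server-1 -c redis      # slave replication log
kubectl logs dayu-test-redis-ha-server-1 -c sentinel   # that sentinel's view of the group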

@ceshihao
Contributor

ceshihao commented Sep 5, 2018

@ssalaues

Yes, 3 pods were scheduled, and I cannot find any obvious cause in the Redis master/slave logs.

helm status dayu-test
LAST DEPLOYED: Wed Sep  5 03:07:12 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME                TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)             AGE
dayu-test-redis-ha  ClusterIP  None        <none>       6379/TCP,26379/TCP  34m

==> v1/StatefulSet
NAME                       DESIRED  CURRENT  AGE
dayu-test-redis-ha-server  3        3        34m

==> v1beta1/PodDisruptionBudget
NAME                    MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
dayu-test-redis-ha-pdb  N/A            1                1                    34m

==> v1/Pod(related)
NAME                         READY  STATUS   RESTARTS  AGE
dayu-test-redis-ha-server-0  2/2    Running  0         34m
dayu-test-redis-ha-server-1  2/2    Running  0         34m
dayu-test-redis-ha-server-2  2/2    Running  0         34m

==> v1/ConfigMap
NAME                          DATA  AGE
dayu-test-redis-ha-configmap  3     34m


NOTES:
Redis cluster can be accessed via port 6379 on the following DNS name from within your cluster:
dayu-test-redis-ha.default.svc.cluster.local

And the Redis slave log:

kubectl logs dayu-test-redis-ha-server-1 redis
1:C 05 Sep 03:07:56.176 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Sep 03:07:56.199 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Sep 03:07:56.199 # Configuration loaded
1:S 05 Sep 03:07:56.200 # Not listening to IPv6: unsupproted
1:S 05 Sep 03:07:56.201 * Running mode=standalone, port=6379.
1:S 05 Sep 03:07:56.201 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 05 Sep 03:07:56.201 # Server initialized
1:S 05 Sep 03:07:56.201 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 05 Sep 03:07:56.201 * Ready to accept connections
1:S 05 Sep 03:07:56.201 * Connecting to MASTER 10.18.112.4:6379
1:S 05 Sep 03:07:56.201 * MASTER <-> SLAVE sync started
1:S 05 Sep 03:07:56.202 * Non blocking connect for SYNC fired the event.
1:S 05 Sep 03:07:56.203 * Master replied to PING, replication can continue...
1:S 05 Sep 03:07:56.203 * Partial resynchronization not possible (no cached master)
1:S 05 Sep 03:08:02.953 * Full resync from master: 0d0678eb209be726961e8ba3fbdc45ace192a4fd:859
1:S 05 Sep 03:08:02.955 * MASTER <-> SLAVE sync: receiving streamed RDB from master
1:S 05 Sep 03:08:02.955 * MASTER <-> SLAVE sync: Flushing old data
1:S 05 Sep 03:08:02.955 * MASTER <-> SLAVE sync: Loading DB in memory
1:S 05 Sep 03:08:02.955 * MASTER <-> SLAVE sync: Finished with success

And the Redis master log:

kubectl logs dayu-test-redis-ha-server-0 redis
1:C 05 Sep 03:07:30.858 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Sep 03:07:30.887 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Sep 03:07:30.887 # Configuration loaded
1:M 05 Sep 03:07:30.889 # Not listening to IPv6: unsupproted
1:M 05 Sep 03:07:30.890 * Running mode=standalone, port=6379.
1:M 05 Sep 03:07:30.890 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Sep 03:07:30.890 # Server initialized
1:M 05 Sep 03:07:30.890 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Sep 03:07:30.890 * Ready to accept connections
1:M 05 Sep 03:07:56.203 * Slave 10.18.14.0:6379 asks for synchronization
1:M 05 Sep 03:07:56.203 * Full resync requested by slave 10.18.14.0:6379
1:M 05 Sep 03:07:56.204 * Delay next BGSAVE for diskless SYNC
1:M 05 Sep 03:08:02.952 * Starting BGSAVE for SYNC with target: slaves sockets
1:M 05 Sep 03:08:02.953 * Background RDB transfer started by pid 30
30:C 05 Sep 03:08:02.954 * RDB: 6 MB of memory used by copy-on-write
1:M 05 Sep 03:08:03.054 * Background RDB transfer terminated with success
1:M 05 Sep 03:08:03.054 # Slave 10.18.14.0:6379 correctly received the streamed RDB file.
1:M 05 Sep 03:08:03.054 * Streamed RDB transfer with slave 10.18.14.0:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 05 Sep 03:08:03.215 * Synchronization with slave 10.18.14.0:6379 succeeded
1:M 05 Sep 03:08:13.695 * Slave 10.18.5.0:6379 asks for synchronization
1:M 05 Sep 03:08:13.695 * Full resync requested by slave 10.18.5.0:6379
1:M 05 Sep 03:08:13.695 * Delay next BGSAVE for diskless SYNC
1:M 05 Sep 03:08:19.990 * Starting BGSAVE for SYNC with target: slaves sockets
1:M 05 Sep 03:08:19.990 * Background RDB transfer started by pid 46
46:C 05 Sep 03:08:19.992 * RDB: 8 MB of memory used by copy-on-write
1:M 05 Sep 03:08:20.091 * Background RDB transfer terminated with success
1:M 05 Sep 03:08:20.091 # Slave 10.18.5.0:6379 correctly received the streamed RDB file.
1:M 05 Sep 03:08:20.091 * Streamed RDB transfer with slave 10.18.5.0:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 05 Sep 03:08:20.708 * Synchronization with slave 10.18.5.0:6379 succeeded

@ssalaues
Collaborator Author

ssalaues commented Sep 5, 2018

@ceshihao from the logs, it looks to be working as intended. They are replicating to one another.

@xsm74

xsm74 commented Sep 6, 2018

Couple of problems we came across in testing:

  1. How do we connect to the master? The previous version had a separate service. Tried running this, and 2/3 of the time we connected to a slave and were unable to write.

  2. "base64 -d" not "base64 -D" on Linux

@ssalaues
Collaborator Author

ssalaues commented Sep 6, 2018

@xsm74 In my experience, client libraries are typically able to discover the Redis master via the Sentinels. Since the Sentinels keep track of the Redis master, they can be queried for the current master and, in the case of a failover, will update the clients accordingly.
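
As a rough illustration of that flow against this chart's service (service name taken from the helm status output earlier in this thread; the master group name mymaster is assumed):

# ask any sentinel behind the chart's service for the current master's address
redis-cli -h dayu-test-redis-ha -p 26379 sentinel get-master-addr-by-name mymaster
# then connect to the returned ip:port for writes; after a failover the sentinels
# start returning the newly promoted master instead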

And regarding the base64 discrepancy, it seems whoever originally wrote the instructions must have been running on a Mac.

command: ["redis-cli", "-p", "{{ .Values.sentinel.port }}", "ping"]
initialDelaySeconds: 15
periodSeconds: 5
readiness:
Contributor


This should be readinessProbe

command: ["redis-cli", "ping"]
initialDelaySeconds: 15
periodSeconds: 5
readiness:
Contributor


Ditto

@ey-bot ey-bot removed the Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). label Sep 7, 2018
@ssalaues
Collaborator Author

@unguiculus I removed them for a couple reasons:

  1. Neither seems to be active in the charts community, which is fine, but there have been plenty of issues in which they were tagged but unavailable to respond. Currently I am more active and available to help.

  2. As a result of this PR, the chart no longer relies on any code that they originally wrote and maintained through their personal repositories (specifically smileisak). Since there is not much resemblance to the chart they originally maintained other than in name, it makes sense that they wouldn't be the most relevantly informed for future issues or reviews (not saying they couldn't be).

If it makes it easier to merge, I will drop that commit, as it is not the objective of this PR and was just an attempt to help more in the charts community.

@unguiculus
Member

OK, I guess it's fine then given they haven't objected. After all, this PR has been open for quite some time.

@unguiculus
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ssalaues, unguiculus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2018
@k8s-ci-robot k8s-ci-robot merged commit d73ba50 into helm:master Oct 31, 2018
@ssalaues
Collaborator Author

Thanks @unguiculus

@ssalaues
Collaborator Author

@lakano yeah, I just did some basic functionality tests with Redis 5.0 and it seems to be working as expected.

@ssalaues ssalaues deleted the redis-ha-refactor branch October 31, 2018 21:33
pmontanari pushed a commit to pmontanari/charts that referenced this pull request Nov 2, 2018
* improvement: refactor of redis-ha

Fixes issues:
Race condition with masters
Announced service no longer working
PVC possibilities
redis-ha doesn't failover properly

Signed-off-by: Salim <salim.salaues@scality.com>

* cleanup and default to mymaster

Signed-off-by: Salim <salim.salaues@scality.com>

* fixes: requested changes

readiness typo, move security context to pod level, and remove -x flag from init script

Signed-off-by: Salim <salim.salaues@scality.com>

* fix sentinel var name

Signed-off-by: Salim <salim.salaues@scality.com>

* docs: upgrade notes and refinements

Signed-off-by: Salim <salim.salaues@scality.com>

* improvement: update docs, various fixes, simpler init script, better resiliency

fix notes

force failover in the event that the existing master is not accessible

simpler script, better failover, values update

fix corner cases where upgrades can fail

fix auth issue

Cleanup unused code

Default to auth disabled to prevent a new password to be generated on each upgrade

Signed-off-by: Salim <salim.salaues@scality.com>

* Fixed auth issues

Switched 'exit 0' to 'return 0' so that auth can be configured correctly.

Fixed `SENTINEl` typo

Signed-off-by: Salim <salim.salaues@scality.com>

* fixes: update with best practices

Signed-off-by: Salim <salim.salaues@scality.com>

* updates: improved doc, consistent style, and added options for custom configmap files

Signed-off-by: Salim <salim.salaues@scality.com>

* add auth info to README

Signed-off-by: Salim <salim.salaues@scality.com>
Signed-off-by: Patrick Montanari <patrick.montanari@gmail.com>
@alexvicegrab
Collaborator

alexvicegrab commented Nov 5, 2018

The fact that we can't access the master directly via a service is a bit problematic, as @xsm74 mentioned, since we should be using services, rather than the pods themselves, to connect to the master for writes.

@ssalaues, is there a way to get our clients to work well while connecting to the service?

@pmontanari

The fact we can't access the master directly, via a service, is a bit problematic, as @xsm74 mentioned. Since we should be using the services, rather than the pods themselves, to connect to the master to write.

@pmontanari, is there a way to get our clients to work well while connecting to the service?

Hello,
Sorry, my commit (PR) was not related to redis-ha but to prometheus-operator.
Unfortunately I messed up somewhere and had a lot of other changes merged into my PR, so I closed it.

@alexvicegrab
Collaborator

Apologies @pmontanari, mis-copied, fixed now

@ssalaues
Collaborator Author

ssalaues commented Nov 6, 2018

@alexvicegrab While there is no "master only" service after this PR, have you tried to point your application to the sentinel port of the Redis service? Most redis libraries support the use of sentinels and simply need to be pointed to the sentinel port itself instead of the redis. If the library natively supports sentinel, it will simply query any sentinel to return the ip:port of the current master. This can allow for failover at the application level and not only the Kubernetes level.

I still like the idea of the "master only" service but will need some more thought before implementing as there were some issues with the prior implementation.

@shantanuthatte

How about a binary (maybe in Go) that listens for events from sentinel and changes the label on a pod, coupled with a service that targets the labeled pod (similar to v2 of this chart)?

But I agree that using the load-balanced kube service in front of the sentinel instances is the preferred way.
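
A very rough sketch of that idea (purely hypothetical, not part of this chart, and it would again need RBAC for kubectl): subscribe to the sentinels' +switch-master channel and move a label that a master-only Service could select on:

redis-cli -h dayu-test-redis-ha -p 26379 subscribe +switch-master | while read -r line; do
  # the event payload is "<master-name> <old-ip> <old-port> <new-ip> <new-port>";
  # map <new-ip> back to a pod and relabel it, e.g.
  # kubectl label pod <new-master-pod> redis-role=master --overwrite
  echo "$line"
done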

@alexvicegrab
Collaborator

Thanks @ssalaues, I'll try to test with the Sentinels.

@rainhacker

rainhacker commented Nov 6, 2018

@alexvicegrab While there is no "master only" service after this PR, have you tried to point your application to the sentinel port of the Redis service? Most redis libraries support the use of sentinels and simply need to be pointed to the sentinel port itself instead of the redis. If the library natively supports sentinel, it will simply query any sentinel to return the ip:port of the current master. This can allow for failover at the application level and not only the Kubernetes level.

I still like the idea of the "master only" service but will need some more thought before implementing as there were some issues with the prior implementation.

@ssalaues If clients running outside Kubernetes want to access Redis inside Kubernetes, the internal addresses of K8s pods returned by the sentinels cannot be used by those clients. Is such a use case supported, given that clients can easily reach K8s services from outside?

@prodriguezdefino

prodriguezdefino commented Nov 6, 2018

@alexvicegrab @rainhacker we have been using an HAProxy-based K8s service to expose the Redis cluster to external clients and also to provide master-only or slave-only access for "dumb" clients. You can see the chart in this fork. If it's worthwhile I can make a PR.

@ssalaues
Collaborator Author

ssalaues commented Nov 7, 2018

@rainhacker currently there is no support within the chart for this use case, but I think the HAProxy or a similar method would be a good approach.

@jeremy-albuixech

jeremy-albuixech commented Nov 8, 2018

@prodriguezdefino @ssalaues
Hi !
My main cluster is part of a cluster federation and I need to access the redis sentinel and the redis master services from several other clusters, all in the same GCP network.

I was trying to use the haproxy-redis chart that @prodriguezdefino linked, but without any success so far. My knowledge of how HAProxy works is currently limited, so sorry for this question in advance; I just want to make sure I'm going in the right direction.

I guess the haproxy-redis chart is not meant to be used as-is and was built for your specific needs, right (because sentinel doesn't seem to be exposed in it)?

Could you let me know if I've got the correct idea in order to have it work with the current redis-ha implementation? My main point being that I don't have to talk to the master/slaves directly; my Redis client knows how to talk to sentinel:

  1. Need to have a haproxy frontend for the master, slave and sentinel services.
  2. Point my redis client to the haproxy IP and the sentinel port
  3. Sentinel will tell my client the address of the master redis
  4. My client will go through the haproxy IP for the correct master redis service

Does that make sense? I think it could be worth including some sort of documentation for this use case; from my current experience this is a common way to set up Redis, especially on large projects with geo redundancy (and I guess when people are looking at Redis-HA it's usually due to some specific need for a resilient and large-scale architecture).

And by the way, thanks for this refactor; I was using the initial version of the chart and the way it works now is much better in my opinion.

@prodriguezdefino

Let me see if I can answer your questions @Albi34 :

I guess the haproxy-redis chart is not meant to be used as-is and was built for your specific needs, right (because sentinel doesn't seem to be exposed in it)?
The sentinel is not exposed, that is correct; adding it should be very simple, but I didn't have a reason to. In this chart's case the slaves and the master are exposed through different services (each of them on different IPs); HAProxy checks should be able to take care of discovering which of the underlying Redis instances is elected as master and which are slaves.

Could you let me know if I've got the correct idea in order to have it work with the current redis-ha implementation ?
If you are going to access the Redis cluster from Kubernetes, then using the sentinels directly and the IPs they return should be sufficient.
If you are going to access the Redis cluster from the outside world and you use the haproxy chart I built, then you should only connect to the master service (if you need read/write ops) or the slave service (only for reads) and you should be good to go.

Of course this should work when configured correctly, so maybe if you want to, you can open an issue on my repo and I can help you there =).

@ssalaues ssalaues mentioned this pull request Nov 13, 2018
bendrucker pushed a commit to bendrucker/charts that referenced this pull request Nov 26, 2018
wgiddens pushed a commit to wgiddens/charts that referenced this pull request Jan 18, 2019