Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle redis cluster via redis load balancer URL #13

Closed
zhangruiskyline opened this issue Apr 11, 2018 · 31 comments
Closed

Handle redis cluster via redis load balancer URL #13

zhangruiskyline opened this issue Apr 11, 2018 · 31 comments

Comments

@zhangruiskyline
Copy link

Hi

I am using Microsoft Azure redis and it dose not provide each cluster node individual IP address, rather it will only provide the redis cluster URL(which is actually a load balancer, not real redis nodes). Can this lib handle this? if so, I should put this redis URL in startupNodes or in dialoption's address?

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 11, 2018

Hello Rui,

I'm not familiar with Azure's redis cluster support, but reading this page: https://docs.microsoft.com/en-us/azure/redis-cache/cache-how-to-premium-clustering , it looks like it should work. They do mention using a redis client with cluster support, so that means it should expose the necessary commands to query a cluster. I'm a bit concerned by the Can I directly connect to the individual shards of my cache? question in the FAQ, where they say this is not officially supported, because that's basically what a redis cluster client must do (e.g. to follow MOVE replies, direct the commands to the right node, etc.).

If you do try it out, I'd be very interested in hearing about your experience, what worked, what didn't and how I could update the package to better support azure!

Thanks,
Martin

@zhangruiskyline
Copy link
Author

zhangruiskyline commented Apr 12, 2018

Thanks Martin @mna

I have tried and it seems working. although connect to individual shard is not officially support, but I can check the redis node's IP via cluster nodes and get back information.

so in cluster setup, I use all internal nodes' address in "StartupNodes" and external URL for dial. One follow up questions, Azure Redis support master/slave mode and master and slave will have different ports, should I put master and slave in StartupNodes or only need to put master node address/port is enough?

Also Dose this redisc follow MOVE command and redirect?

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 12, 2018

Hello,

You can put either masters or replicas in StartupNodes, the client will automatically figure out the topology based on cluster nodes responses. I'm not sure what you're doing is correct for Dial, though. If I understand correctly, you use a redigo.DialNetDial option to specify a dial function that ignores the address it receives as parameter and always uses the external URL instead?

If so, I believe that would defeat the goal of using redisc. The dial function should always try to connect to the address provided, that's the one the client identified as the right address to connect to the required node.

Regarding your last question, to follow MOVE replies, redisc gives you complete control - you can either use a basic connection that will return a RedirError for MOVE replies, or you can wrap that basic connection in a RetryConn that will automatically follow those redirections (see https://godoc.org/github.com/mna/redisc#RetryConn).

Hope this helps!
Martin

@zhangruiskyline
Copy link
Author

Hi Martin @mna

regarding "The dial function should always try to connect to the address provided", you mean it will automatically fetch {IP:port} from StartupNodes and connect?

I define a CreatePool func which will be put like

		cluster := &redisc.Cluster{
			StartupNodes: redisClusterStartupNode,
			DialOptions:  redisDialOptions,
			CreatePool:   createPool,
		}
func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
	return &redis.Pool{
		MaxActive:   redisMaxActive,
		MaxIdle:     redisMaxIdle,
		IdleTimeout: time.Duration(redisIdleTimeout) * time.Second,
		Dial: func() (redis.Conn, error) {
			c, err := redis.Dial("tcp", redisURL, opts...)

			if err != nil {
				log.Errorf(err.Error())
				return nil, err
			}
			return c, nil
		},
	}, nil
}

but I indeed find can not pass the into createPool so use some hack solution to directly use redisURL in createPool , is that not what we should do? what do you suggest the best way to use this lib for cluster connection purpose

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 12, 2018

Ok, thanks for the details, the Dial function in createPool should try to connect to the addr parameter passed to createPool, so that it returns connections connected to the right node.

@zhangruiskyline
Copy link
Author

@mna ,in my case, The Addr should be load balancer URL for Azure redis, not startupnodes IP addresses for real redis node?

Thanks
Rui

@zhangruiskyline
Copy link
Author

@mna , let me refine my question in this way. what is the difference between StartupNodes in Cluster data structure, and addr in dail function of a connection pool? should they be same? or different?

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 12, 2018

Okay so here's how the redis cluster works. The startup nodes are used to build the topology of the cluster, figure out the masters and replicas. This may change during the execution of your app - e.g. maybe a node will go down, another one will replace it, etc. So that's why those are called "startup" nodes, those are the nodes to use when the app (server, typically) starts, but after that it may change. The client stays up-to-date with those changes.

Another thing that the client does is keeping track of which node in the cluster serves which key hash. There are 16384 hash values, and each hash value has a corresponding node. When you get a connection from the cluster and you execute a command, the client will identify which node should be contacted to execute this command based on the key passed to the command (e.g. GET a will contact the node that serves the hash of key "a").

Now, about the addr that is passed to the Cluster.CreatePool function. When you try to get the value of key "a", the Cluster will try to get a connection from the corresponding node's pool. If there is no pool yet for this node, it will call the Cluster.CreatePool function to create a pool that manages connections to that specific node. So as you can see, it is important that the Dial function of this pool really returns connection to that specific node, not the generic load balancer URL, otherwise you lose a lot of the optimization of the redis cluster client (the "smart routing" part).

In a stable cluster - typically, the huge majority of the time - the client will always select the right node to contact, so you won't receive MOVE replies and will avoid having to do multiple network calls to get the response from the correct node.

In other words (in code), you should do this:

func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
	return &redis.Pool{
                // other fields omitted...
		Dial: func() (redis.Conn, error) {
			c, err := redis.Dial("tcp", addr, opts...) // <<-- IMPORTANT! use "addr" here!
			if err != nil {
				log.Errorf(err.Error())
				return nil, err
			}
			return c, nil
		},
	}, nil
}

Hope that makes it clearer!
Martin

@zhangruiskyline
Copy link
Author

Thanks @mna for your detailed information.

follow up on " it is important that the Dial function of this pool really returns connection to that specific node, not the generic load balancer URL". for cluster case, I will have bunch of different {IP:ports} of different redis nodes, which one should I use to pass into Addr field?

Right now, in Azure redis, In createPool I always Dial in external URL(which is actually a load balancer in Azure redis case), and in StartupNodes in list all {IP:port}s(I got these IP:port by issuing cluster nodes in redis command line). And looks like it worked

but according to your suggestion, this sounds to be a Hack. not recommended. But I can not find another way, since Azure want to hide all cluster information and only expose one external URL to outside, and let redis client to manage all cluster/pool itself.

Can you provide a sample code what is your recommend way to implement the cluster init stage?

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 13, 2018

Hm I think I see where the confusion comes from, you never call createCuster directly yourself, redisc will call it when needed, with the right address as parameter. You can take a look at example_test.go in the redisc package for some example usage.

In your code, you simply call Custer.Get() to retrieve a pooled connection, e.g.:

	// create the cluster
	cluster := redisc.Cluster{
		StartupNodes: ..., // your startup nodes
		DialOptions: ..., // whatever dial options you want to use
		CreatePool:   createPool,
	}
	defer cluster.Close()

	// initialize its mapping before use, recommended
	if err := cluster.Refresh(); err != nil {
		// handle error
	}

        // And then elsewhere, when you need a connection
	// grab a connection from the pool
	conn := cluster.Get()
	defer conn.Close()

So you never actually call createPool.

Hope this helps!
Martin

@zhangruiskyline
Copy link
Author

Thanks @mna , If redisc calls the createPool instead of myself handling it, then where should I pass the redis URL at starting point? Another question, is it possible for redisc to handle all internal nodes IP:port without specifying them in startupNodes section(this seems to be what Azure redis recommend, so application should not know internal redis node's IP)

Thanks
Rui

@zhangruiskyline
Copy link
Author

FYI, @mna , I have contacted Azure redis dev team,

and their response is like:

Most Redis clients that understand clustering do the following:

  1. Connect to the host name for the cache using port 6379 (non-ssl) or 6380 (SSL).
  2. Ask Redis if it is clustered. If so, ask for the configuration of the cluster. This includes the IP Addresses and ports of all nodes (masters and slaves) in the cluster. This also includes the slot assignments handled by each node.
  3. When getting or setting a key from redis, the client will calculate the hash of the key, then lookup which node owns the slot that hash, then send the request to the specific node for that key.

Another dev points out:

Normally, you can just put redis URL as startupNode. But you should be aware that our redis URL is actually a load balancer, rather than a real node. Whether this will cause issues depend on concrete implementation. As I know, there is one issue caused by passing load balancer as startup node in Java Lettuce client. You can refer this. redis/lettuce#712

So right now it seems to me only the hack way works in Azure redis

  1. to put redisURL in createPool addr field
  2. use "cluster nodes" in redis command and get back {ip:port}, then assign these {ip:port} into StartupNodes,

I am not sure whether this is best what I can do with redisc, or some other piece I should improve. please let me know your comments, that will be great helpful

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 13, 2018

Hello,

Yep, that's pretty much what a redis cluster client will do. It's a good recommendation to pass the redis URL as startup node, as it probably ensures that it can connect to a valid node in the cluster, and then redisc will discover the actual addresses of each active node.

However, in createPool, as I said, you have to pass the addr argument you received, as-is, not the redis URL. That's for the point 3 that the azure devs told you about:

When getting or setting a key from redis, the client will calculate the hash of the key, then lookup which node owns the slot that hash, then send the request to the specific node for that key.

So the createPool has to create a connection to the specific node that it requested in the addr parameter.

Martin

@zhangruiskyline
Copy link
Author

zhangruiskyline commented Apr 13, 2018

Thanks for reply @mna , but it dose not work in my case, I always get "redisc: failed to get a connection",here is my connection type

		cluster := &redisc.Cluster{
			StartupNodes: redisClusterStartupNode,  //here I pass: "xxx..redis.cache.windows.net:6380" as startup nodes
			DialOptions: redisDialOptions,
			CreatePool: createPool,
		}

and in createPool I use what you reommended

func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
	return &redis.Pool{
		MaxActive:   redisMaxActive,
		MaxIdle:     redisMaxIdle,
		IdleTimeout: time.Duration(redisIdleTimeout) * time.Second,
		Dial: func() (redis.Conn, error) {
			c, err := redis.Dial("tcp", addr, opts...)

			if err != nil {
				log.Errorf(err.Error())
				return nil, err
			}
			return c, nil
		},
	}, nil
}

is there any way I can completely get rid of startup nodes or any internal information, just provide the external URL(Load Balancer URL) and make it work? I guess it is because redisc need to have a real redis node as startupnodes? while I provide only load balancer?

Thanks
Rui

@zhangruiskyline zhangruiskyline changed the title Handle cluster URL as load balancer Handle redis cluster via redis load balancer URL Apr 13, 2018
@mna
Copy link
Owner

mna commented Apr 13, 2018

Where is your app running from? Does it have access to the internal nodes (IP address and port, not the entrypoint URL of the load balancer)? E.g. if you try to connect from one of the internal node addresses from the server where the app runs using redis-cli, can you successfully connect and run commands?

My guess is that your app doesn't have access to the cluster nodes' addresses from where it runs.

@zhangruiskyline
Copy link
Author

I am running it in my local machine, but it should have cert and SSL configured, I am able to connect to redisURL:6380 . and the hack works(put this redisURL:6380 in creatPool addr)

but when putting redisURL: into startupNodes, it dose not work, I try both 6380 or 1500x ports, neither work

Thanks
Rui

@zhangruiskyline
Copy link
Author

zhangruiskyline commented Apr 14, 2018

Hi @mna

some more update on debug: if I only put redisURL:6380 in startupnode, and use addr in createPool, seems to me the correct redis node address is passed(according to debug), but somehow dial function can not establish the connection due to SSL handle error.

I tried two different ways:

  1. Put the auth in createpool:
func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
	return &redis.Pool{
		MaxActive:   redisMaxActive,
		MaxIdle:     redisMaxIdle,
		IdleTimeout: time.Duration(redisIdleTimeout) * time.Second,
		Dial: func() (redis.Conn, error) {
			c, err := redis.Dial("tcp", addr, opts...)

			if err != nil {
				log.Errorf(err.Error())
				return nil, err
			}
                          //Auth here in Dial, which is what I did for redigo and worked
			if _, err := c.Do("AUTH", redisAuth); err != nil {
				log.Errorf(err.Error())
				c.Close()
				return nil, err
			}
			return c, nil
		},
	}, nil
}
  1. Another one is use redis.DialPassword(redisAuth) in dial option,

but both returns "redisc: failed to get a connection", Not sure what is best way I should dial in SSL case for cluster.

It looks like to me problem is:

When redisc first establish SSL connection, it will call redisURL:6380, which is a load balancer as indicated before. so it can establish security connections.

but later when it tries to find other nodes in redis cluster to connect, and when createpool try to establish the SSL connection directly to new redis node, it fails. probably all security handle is on load balancer side for Azure redis, rather than each cluster node level.

Is there any solution to this?

Thanks
Rui

@zhangruiskyline
Copy link
Author

zhangruiskyline commented Apr 14, 2018

Update again,

if I disable the Azure redis cache SSL and expose none SSL port, only use PWD to Auth, it works!! so looks like my former assumption of SSL auth problem is correct

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 14, 2018 via email

@zhangruiskyline
Copy link
Author

zhangruiskyline commented Apr 14, 2018

Yes, I am aware of using redigo to do SSL, but the question I have is when we createpool to some other redis node rather than Loadbalancer(redisURL:6380), it seems redisc will try to create same SSL session pool directly to redis node, but will be rejected since only load balancer is able to handle the SSL stuff. even if I change redisTLSconfig.ServerName = redisURL:6380 in createpool, it dose not work.

any better way I can force SSL connection to load balancer instead of addr field when I try to createpool?

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 14, 2018

Ah I see, yeah so you don't need SSL in createPool on the internal nodes, only for the load balancer URL... There's no obvious way to do this, one thing you could do is not set the DialTLSConfig and DialUseTLS on Cluster.DialOptions, and only set it in createPool when the addr is the load balancer's address, e.g.:

func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
        if addr == redisURL { // or redisURL + ":6380", don't know exactly what you've set redisURL to
            opts = append(opts, redis.DialUseTLS(true)) 
            opts = append(opts, redis.DialTLSConfig(...))
        }
	return &redis.Pool{
                // ...
		Dial: func() (redis.Conn, error) {
			c, err := redis.Dial("tcp", addr, opts...)
                        // ...
        }
}

@zhangruiskyline
Copy link
Author

I think this dose not work also, when redisc dial into some other redis nodes. it still need to go to SSL connection, juts it won't SSL directly to redis node, but load balancer. in your code, it will not establish the SSL at all

@mna
Copy link
Owner

mna commented Apr 15, 2018

I'm afraid I don't understand what is your issue then. You either need TLS and thus have it configured in the dial options, or you don't and thus you set it on/off in the dial options when needed. I'm not familiar with Azure's redis, but I'd expect the external URL to require TLS config so that the endpoint is not publicly accessible from outside the VPN, but the internal nodes to be accessed from inside the VPN (where your binary should run) without TLS.

@zhangruiskyline
Copy link
Author

Let me double check again. My question is when dial into some redis cluster node, we still need SSL, but not SSL directly to node, rather it needs to dial to Redis load balancer URL. In your former approach, it will directly dial into redis node without SSL, right?

@zhangruiskyline
Copy link
Author

zhangruiskyline commented Apr 16, 2018

Hi @mna

I think I got some feedback from Azure redis team, so client can directly connect to redis nodes(not necessarily load Balancer), but in Azure redis, different nodes and load balancer shares same IP, but different ports.(via a NAT way)

but in dial, seems we split the ports, so we lost port in SSL creation? anyway, it could be a question for redigo.

		if tlsConfig.ServerName == "" {
			host, _, err := net.SplitHostPort(address)
			if err != nil {
				netConn.Close()
				return nil, err
			}
			tlsConfig.ServerName = host
		}

Thanks
Rui

@zhangruiskyline
Copy link
Author

I think the problem is that Azure redis uses NAT in cluster mode, so every node has same IP but different ports, somehow redisc can not bind rc, _, err := c.bind(cmdSlot(cmd, args)), find some strange slot and can not find correct node.

investigating more, but if you have some insight, please share

Thanks
Rui

@zhangruiskyline
Copy link
Author

One question, the createPool will be called only in startup stage? basically if I only provide load balancer node address as startupNodes option, will it run "cluster nodes" command in redis cache and get back all nodes and create SSLs for all of them at starting point?

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 19, 2018

No, createPool is created whenever a pool is needed for one node (i.e. at least once per node in the cluster + startup nodes).

@zhangruiskyline
Copy link
Author

@mna

I got some feedback again from Azure redis team about the recommendation how the cluster client should work.

Quote as:

    1. Connect to the Load Balancer port and ask for the cluster configuration via the CLUSTER NODES command
    1. Parse the output of that command to discover the IP address and ports of the nodes in the cluster.
    1. Configure your client to know about each node in the cluster.

I am not sure whether redisc did same? If not, how can I use redisc to minic the Azure redis behavior?

Thanks
Rui

@zhangruiskyline
Copy link
Author

OK, finally I got the client works in SSL mode. here is configuration

in StartupNodes, only put the redisURL:6380(Loadbalancer URL) , in dial option, use redis.DialUseTLS(true), and redis.DialTLSSkipVerify(true), The key point put redis.DialTLSSkipVerify(true), I am still not 100% sure why put this works.

Thanks
Rui

@mna
Copy link
Owner

mna commented Apr 20, 2018

Ok, looks like you don't have a trusted certificate. Glad you got it working, closing as this is a connection configuration issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants