Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for more Redis Cluster commands #10434

Merged
merged 18 commits into from
Mar 2, 2022
Merged

Conversation

jakule
Copy link
Contributor

@jakule jakule commented Feb 17, 2022

This PR adds support for more Redis command in cluster mode. When connecting to Redis through Teleport client should always connect in a standalone mode (redis-cli without -c flag for example). If Redis works in the cluster mode then the Teleport responsibility is to "emulate" standalone behaviour. The main implication of that is that Teleport need to send GET/SET command to the correct instance to retrieve/set keys. go-redis already implements most of the functionality for us, but not all commands are supported.
I was looking on the list commands supported by Envoy https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_protocols/redis#redis-cluster-support which works similar to Teleport in this case.

This PR should get as close to all commands supported by Envoy, with a few exceptions:
MSET - I don't think we should support it. MSET allows to set multiple keys as once. Envoy internally splits the requested keys and sends multiple MSET requests to multiple servers. The problem here is error handling. If one of the commands fails (only on one server for example), Teleport could return an error, but that leaves the cluster in inconsistent state where only some keys are updated. Currently we just pass the MSET command the the server. It will work only if all keys are on the server.

Redis transactions are blocked. Similar as above. Currently a transaction will fail if it affects multiple servers. I think that clocking them explicitly is safer and less confusing for users.

This PR doesn't contain any tests as currently we don't have a way to simulate Redis Cluster behaviour in tests. I'm working on getting Redis Cluster to our system tests, but I'll create a separate PR for that when it's ready.

@jakule jakule marked this pull request as ready for review February 17, 2022 21:12
@jakule jakule requested a review from r0mant February 17, 2022 21:12
@github-actions github-actions bot added the database-access Database access related issues and PRs label Feb 17, 2022
@jakule jakule requested review from smallinsky and removed request for lxea and nklaassen February 17, 2022 21:12
lib/srv/db/redis/client.go Outdated Show resolved Hide resolved
lib/srv/db/redis/client.go Outdated Show resolved Hide resolved
}
}

// Process add supports for additional cluster commands.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Explain why it is required like you did in this PR's description.

func (c *clusterClient) Process(ctx context.Context, inCmd redis.Cmder) error {
cmd, ok := inCmd.(*redis.Cmd)
if !ok {
return trace.BadParameter("failed to cast Redis command")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include %T in the message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

return trace.Wrap(err)
}

mtx.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is locking needed? Does ForEachMaster execute functions concurrently? If so, explain that with a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ForEachMaster is called in parallel. I'll add a comment.

default:
// We've checked that while validating the config, but checking again can help with regression.
return trace.BadParameter("incorrect connection mode %s", connectionOptions.mode)
connectionAddr := fmt.Sprintf("%s:%s", connectionOptions.address, connectionOptions.port)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use net.JoinHostPort instead of Sprintf.

Comment on lines +193 to +199
cmd.SetVal(singleCmd.Val())
cmd.SetErr(singleCmd.Err())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

race condition c.ForEachMaster calls the ForEachMaster in for each master asynchronously in new goroutine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can talk about this 😅 But I think that there is a different issue here. If one of the commands returns error we should return that error and not override it later with "OK" value. I'll fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added mutex and error check above.

Comment on lines 166 to 168
mtx.Lock()
resultsKeys = append(resultsKeys, redis.Nil)
mtx.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lock is needed here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This is not called in parallel. I'll remove it. Thanks

Comment on lines 177 to 179
mtx.Lock()
resultsKeys = append(resultsKeys, result.Val())
mtx.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I'm not sure based on code to see why this lock is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Thanks.

@jakule jakule requested a review from r0mant February 25, 2022 06:20
@@ -99,6 +99,10 @@ func WriteCmd(wr *redis.Writer, vals interface{}) error {
if err := writeStringSlice(wr, val); err != nil {
return trace.Wrap(err)
}
case []bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the following case case []interface{}' should already handle the bool array ? Based on the code it looks that writeSliceand writeBoolSlice` functions are the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are the same except the function signature. Go distinguish between slices of different types, so generic []interface{} won't be called when []bool is passed: https://go.dev/play/p/betqiX7vl1L
The same with the function call; []bool cannot be passed as []interface{}.

Go 1.18 would be helpful here: https://go.dev/play/p/c6WIHvhAN2n?v=gotip

For now the best what I can do is to use reflection (and add a UT).

Comment on lines 33 to 45
// Commands with additional processing in Teleport when using cluster mode.
const (
multiCmd = "multi"
execCmd = "exec"
watchCmd = "watch"
dbsizeCmd = "dbsize"
scanCmd = "scan"
keysCmd = "keys"
mgetCmd = "mget"
flushallCmd = "flushall"
flushdbCmd = "flushdb"
scriptCmd = "script"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says that this commands have additional processing during cluster mode but based on the but some of them are not supported right now in cluster mode like: execCmd https://github.com/gravitational/teleport/pull/10434/files#diff-6230c3f30596bb3b46e45d42bc9da09c2fac0fb440e69ed818e6daf73570b55bR138

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I move them inside the case at some point and forgot to move them here.

return nil
}

cmd.SetVal(singleCmd.Val())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if in case where the flow is skipped by the continue call when cmd.Err() != nil the cmd.Val and cmd.Err can be both nil.
In that case processSeverResponse can return nil, nil so further flow can panic.

@jakule jakule enabled auto-merge (squash) March 2, 2022 16:40
@jakule jakule merged commit 705f042 into master Mar 2, 2022
@jakule jakule deleted the jakule/redis-cluster-cmds branch March 2, 2022 16:44
jakule added a commit that referenced this pull request Mar 2, 2022
* Add support for more Redis commands in Cluster mode.

* Add reconnect on AUTH logic to Redis.

* Redis code refactoring.

* Update list of unsupported Redis commands in Cluster mode.
jakule added a commit that referenced this pull request Mar 4, 2022
* Add support for more Redis commands in Cluster mode.

* Add reconnect on AUTH logic to Redis.

* Redis code refactoring.

* Update list of unsupported Redis commands in Cluster mode.

Backport #10434.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database-access Database access related issues and PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants