Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NEW] Functions should be able declare additional keys #10951

Open
madolson opened this issue Jul 7, 2022 · 17 comments
Open

[NEW] Functions should be able declare additional keys #10951

madolson opened this issue Jul 7, 2022 · 17 comments
Projects

Comments

@madolson
Copy link
Contributor

madolson commented Jul 7, 2022

The problem/use-case that the feature addresses

One pattern that functions are suppose to allow is to "abstract" away the underlying details of a command implementation. One such way that might be accomplished is to access multiple keys to perform an operation. However, Functions require all keys to be declared, which might change.

Description of the feature

A function should be able to declare a function that selects which keys will be touched, similar to how functions registration works.

Alternatives you've considered

Require end uses to specify all the keys. This couples the underlying implementation with the keys accessed.

@oranagra
Copy link
Member

oranagra commented Jul 8, 2022

It also bothered me at some point, but doing that could violate cluster routing since the client won't see these keys.
One way out is maybe to add some APIs to allow the function to easily generate other key names that fit the same hash slot of the declared input key.

@MeirShpilraien @yossigo WDYT?

@oranagra oranagra added this to Backlog in 7.0 via automation Jul 8, 2022
@oranagra oranagra moved this from Backlog to To Do in 7.0 Jul 8, 2022
@yossigo
Copy link
Member

yossigo commented Jul 9, 2022

@oranagra One easy (but maybe not very memory efficient) way to do that without any additional functions is to use {BaseKey}DerivedKey to name derived keys off base keys.

@MeirShpilraien
Copy link
Collaborator

@madolson I agree with the problem you raise and I believe its even deeper then what you just mentioned. I believe that there are use-cases that today are not even solvable on a cluster and not just because the function should abstract the keys names. For example, consider the use-case where we have hashes representing persons and for each person we have a field with the amount of kids the person have. We want to maintain a counter that will give us the total number of kids. For a single shard, it is easy to implement, you create a function to change the kids field and maintain an outside counter that keeps the total number of kids. For a cluster, I do not see a way to do it, what will be the name of the counter key? How do you maintain it on each shard? How to handle resharding? I believe that such a use-case is a valid use-case for functions and we do not have a solution for that on cluster (unless there is some easy solution that I missed?).

@ranshid
Copy link
Collaborator

ranshid commented Jul 10, 2022

@oranagra One easy (but maybe not very memory efficient) way to do that without any additional functions is to use {BaseKey}DerivedKey to name derived keys off base keys.

I also like this solution - "allow the function to easily generate other key names that fit the same hash slot of the declared input key" However I would like to ask if using the BaseKey is the correct method?
for one, the user might use the curly brackets on the basekey, which will require to parse the basekey to make sure we keep align with the slot mapping. also it is possible (but probably less common) to image a function which accepts no keys but will need to maintain internal key access (for saving iterator location for example). Also the fact that we will use the basekey as prefix, might expose it to user operations. for example in case the user will want to delete all keys which has the substring 'basekey' it is not sure we would like to erase the "derived keys" in this case.

IMO allow adding internal keys in the function metadata is structured and we can think of supporting some identifier like "$slot" in order to make sure the key will be mapped to the specific slot the function operates in.
this way we can even drop the need for a specific configuration to enable undeclared keys and decide on the validity of a key access based on the function metadata.

@oranagra
Copy link
Member

For example, consider the use-case where we have hashes representing persons and for each person we have a field with the amount of kids the person have. We want to maintain a counter that will give us the total number of kids.

Note that the function may want to update one counter for many different keys it works on, so if we go the curly brackets way, it should be {hashslot($KEYS[1])}_counter.
But then when another function needs to collect the total count, it may need to sum 16k counters.

in case the user will want to delete all keys which has the substring 'basekey' it is not sure we would like to erase the "derived keys" in this case.

With functions philosophy, users don't delete keys, they invoke functions to delete logical objects, so that shouldn't be a problem.

@madolson
Copy link
Contributor Author

We want to maintain a counter that will give us the total number of kids. For a single shard, it is easy to implement, you create a function to change the kids field and maintain an outside counter that keeps the total number of kids. For a cluster, I do not see a way to do it, what will be the name of the counter key?

This is strictly an anti-pattern for Redis cluster, and we need to think deep and hard if we really want to work to resolve it. Right now Redis only provides linearizable (for it's definition of durability) on a slot, and you are suggesting we should have some form of linearizable across slots (or some other weaker guarantee). This requires cross shard coordination in the form of distributed transactions or conflict resolvable data types. This is an interesting problem, but I think it's distinct from the problem being discussed here.

@oranagra
Copy link
Member

I agree it's a slightly different topic than (sorry for hogging your issue).
It's just that your opening statement made me realize we missed a big issue with cluster in the philosophy of functions.

Regarding the issue you raised, and the solution you suggested, I think it's problematic to run an auxiliary function before starting the actual function.
If your concern was ACL, to have the function start only after validating access to all keys, then in some way it's already achievable with an API call.

@zuiderkwast
Copy link
Contributor

Redis Cluster puts the responsibility on the client to store each key on the correct shard. Redis could have been responsible for forwarding/proxying the commands to the correct node, but it isn't. It has been suggested though...

If we want to compare the code in a function to the logic in a client, we can allow functions to make calls to any node in the cluster, e.g. the redis.call function could be turned into a cluster aware client.

@madolson
Copy link
Contributor Author

@zuiderkwast It'll be a lot more complex than just allowing a client to forward a command to another node, since we still have cross shard linearizability issues. Let's say you have a script that atomically updates two bank account numbers. If those scripts are executed on two separate nodes, they still need to appear as if they happened in some order, we can't have the commands between the two transactions intermingled.

@zuiderkwast
Copy link
Contributor

@madolson Two bank accounts on different shards: We don't support atomic operations across slots, whether or not functions are used.

Functions can only do things that regular clients can do. Are we trying to give them more power than that?

@oranagra
Copy link
Member

going though this again i'd like to summarize and suggest what i think should be done.
i'd like to use Ran's idea of adding a metadata field to the function (not a callback) with a list of template key names.
i think being declarative rather than procedural will be better for both the users who write scripts, and also for redis (although a bit less flexible).

these names should support two forms of macros:

redis.register_function('f1', f1, keys=['{$keys[0]}_suffix', '{$slots[0]}_sum'])

i.e. the $slots[0] macro will return a name of a key that will fall into the same hash slot as the first key (one constant string for each hash slot, regardless of the actual key name).
I think we should also support keys[*] and slots[*] so that we automatically generate keys for all input keys provided by the caller.
and I do think we should require that as soon as the keys argument is provided, the user should explicitly also mention $keys[*].

This way we support both the use case were someone wanna maintain a specific auxiliary key for each of the main keys.
But also in some way support the the summing use case mentioned above (it'll generate up to 16k auxiliary keys, one for each hash slot, and the user can implement a function to sum them.

we'll also need to add some APIs that will let the function code expand the slots macro thing into a key name that it can pass to a redis command.
something like redis.get_key_representing_slot(keyname), and maybe also a way to know the slots of the current node and iterate on them (to create a function that can sum a result from the 16k keys)

I wanna argue that users can already probably achieve some of that (the specific auxiliary key for each of the main keys) by manually naming these keys correctly with curly braces, and the main disadvantage they have compared to what we'll offer is that they'll need to manually use redis.acl_check_cmd() at the beginning of the function in order to make sure it doesn't abort half way in.
so the two things we add here are convenience / validation, and also a solution for the summing use case.
please correct me if i'm wrong.

@madolson
Copy link
Contributor Author

@zuiderkwast I think that is what people use functions for, they expect it to be atomic.

@oranagra I agree with most of the proposal, but I don't think we should solve the $slot problem the way that was identified, by generating a compatible key per slot. I think long term we need some "slot" independent data, and we can provide access to that slot data. I've wanted to modify the hash tag semantic in the past, perhaps with a v2, that allows specifying the slot on the client side.

@oranagra
Copy link
Member

@madolson by slot independent data you mean some per-slot data storage (i.e. the function will be able to store some info in slot 0 and some info in slot 1)? or do you mean to save some global (out of keyspace) data in redis (which we'll need to figure out how to reshard)?
I suppose you refer to the first one, but i think it'll just complicate things.
i.e. you'll still need to let different users / functions to store things independently, and support all the data types (counter, stream, etc). i think generating a curly bracket based prefix string for a key name that falls into each slot is simpler and still lets all the other mechanisms in redis implicitly work.

@madolson
Copy link
Contributor Author

madolson commented Jul 20, 2022

I want to support something like {:1235}foo that maps the key into slot 1235. This will require a second iteration of the key routing, but I think it will be much easy for developers and end to understand compared to 'generating' a valid key per slot. I'm okay with this not being backwards compatible since it's mostly new functionality.

@madolson
Copy link
Contributor Author

I also independently want the {:1235} syntax so the clients can deliberately place items in a slot, which we've seen with some AWS users. Some user has some extremely hot key, and they want to route it specially. It also allows custom client side routing at the cost of added key sizes.

@oranagra
Copy link
Member

oranagra commented Jul 21, 2022

ok, i can live with that.. it does make things a little bit more complex (more steps), and adds backwards compatibility issues, but arguably makes things clearer.
so a function can have these APIs:

redis.get_slot_range() -- returns an array with two elements representing start and end slots
redis.get_slot_of_key(keyname) -- returns an integer

then they can do something like

slot_key = '{'..redis.get_slot_of_key(KEYS[0])..'}_sum'
redis.call('incr', slot_key)
range = redis.get_slot_range()
for i=range[0],range[1] do redis.call('get',  '{'..i..'}_sum') end

i still think that the declarative key binding could be like:

redis.register_function('f1', f1, keys=['{$keys[0]}_suffix', '{$slots[0]}_sum'])

@oranagra
Copy link
Member

oranagra commented Jul 26, 2022

discussed in core-team meeting.
we wanna tackle these 3 separate topics together as part of 7.2.
the get_slot_range() thing, might be an anti-pattern, we should try to think if we can come up with a different (more high level) solution for a map-reduce case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: To Do
7.0
To Do
Development

No branches or pull requests

6 participants