[NEW] Functions should be able declare additional keys #10951

madolson · 2022-07-07T21:21:10Z

The problem/use-case that the feature addresses

One pattern that functions are suppose to allow is to "abstract" away the underlying details of a command implementation. One such way that might be accomplished is to access multiple keys to perform an operation. However, Functions require all keys to be declared, which might change.

Description of the feature

A function should be able to declare a function that selects which keys will be touched, similar to how functions registration works.

Alternatives you've considered

Require end uses to specify all the keys. This couples the underlying implementation with the keys accessed.

oranagra · 2022-07-08T06:49:03Z

It also bothered me at some point, but doing that could violate cluster routing since the client won't see these keys.
One way out is maybe to add some APIs to allow the function to easily generate other key names that fit the same hash slot of the declared input key.

@MeirShpilraien @yossigo WDYT?

yossigo · 2022-07-09T17:43:06Z

@oranagra One easy (but maybe not very memory efficient) way to do that without any additional functions is to use {BaseKey}DerivedKey to name derived keys off base keys.

MeirShpilraien · 2022-07-10T15:09:54Z

@madolson I agree with the problem you raise and I believe its even deeper then what you just mentioned. I believe that there are use-cases that today are not even solvable on a cluster and not just because the function should abstract the keys names. For example, consider the use-case where we have hashes representing persons and for each person we have a field with the amount of kids the person have. We want to maintain a counter that will give us the total number of kids. For a single shard, it is easy to implement, you create a function to change the kids field and maintain an outside counter that keeps the total number of kids. For a cluster, I do not see a way to do it, what will be the name of the counter key? How do you maintain it on each shard? How to handle resharding? I believe that such a use-case is a valid use-case for functions and we do not have a solution for that on cluster (unless there is some easy solution that I missed?).

ranshid · 2022-07-10T18:31:46Z

@oranagra One easy (but maybe not very memory efficient) way to do that without any additional functions is to use {BaseKey}DerivedKey to name derived keys off base keys.

I also like this solution - "allow the function to easily generate other key names that fit the same hash slot of the declared input key" However I would like to ask if using the BaseKey is the correct method?
for one, the user might use the curly brackets on the basekey, which will require to parse the basekey to make sure we keep align with the slot mapping. also it is possible (but probably less common) to image a function which accepts no keys but will need to maintain internal key access (for saving iterator location for example). Also the fact that we will use the basekey as prefix, might expose it to user operations. for example in case the user will want to delete all keys which has the substring 'basekey' it is not sure we would like to erase the "derived keys" in this case.

IMO allow adding internal keys in the function metadata is structured and we can think of supporting some identifier like "$slot" in order to make sure the key will be mapped to the specific slot the function operates in.
this way we can even drop the need for a specific configuration to enable undeclared keys and decide on the validity of a key access based on the function metadata.

oranagra · 2022-07-10T19:27:33Z

For example, consider the use-case where we have hashes representing persons and for each person we have a field with the amount of kids the person have. We want to maintain a counter that will give us the total number of kids.

Note that the function may want to update one counter for many different keys it works on, so if we go the curly brackets way, it should be {hashslot($KEYS[1])}_counter.
But then when another function needs to collect the total count, it may need to sum 16k counters.

in case the user will want to delete all keys which has the substring 'basekey' it is not sure we would like to erase the "derived keys" in this case.

With functions philosophy, users don't delete keys, they invoke functions to delete logical objects, so that shouldn't be a problem.

madolson · 2022-07-11T17:33:01Z

We want to maintain a counter that will give us the total number of kids. For a single shard, it is easy to implement, you create a function to change the kids field and maintain an outside counter that keeps the total number of kids. For a cluster, I do not see a way to do it, what will be the name of the counter key?

This is strictly an anti-pattern for Redis cluster, and we need to think deep and hard if we really want to work to resolve it. Right now Redis only provides linearizable (for it's definition of durability) on a slot, and you are suggesting we should have some form of linearizable across slots (or some other weaker guarantee). This requires cross shard coordination in the form of distributed transactions or conflict resolvable data types. This is an interesting problem, but I think it's distinct from the problem being discussed here.

oranagra · 2022-07-11T17:52:47Z

I agree it's a slightly different topic than (sorry for hogging your issue).
It's just that your opening statement made me realize we missed a big issue with cluster in the philosophy of functions.

Regarding the issue you raised, and the solution you suggested, I think it's problematic to run an auxiliary function before starting the actual function.
If your concern was ACL, to have the function start only after validating access to all keys, then in some way it's already achievable with an API call.

zuiderkwast · 2022-07-11T18:33:29Z

Redis Cluster puts the responsibility on the client to store each key on the correct shard. Redis could have been responsible for forwarding/proxying the commands to the correct node, but it isn't. It has been suggested though...

If we want to compare the code in a function to the logic in a client, we can allow functions to make calls to any node in the cluster, e.g. the redis.call function could be turned into a cluster aware client.

madolson · 2022-07-19T03:40:04Z

@zuiderkwast It'll be a lot more complex than just allowing a client to forward a command to another node, since we still have cross shard linearizability issues. Let's say you have a script that atomically updates two bank account numbers. If those scripts are executed on two separate nodes, they still need to appear as if they happened in some order, we can't have the commands between the two transactions intermingled.

zuiderkwast · 2022-07-19T09:40:16Z

@madolson Two bank accounts on different shards: We don't support atomic operations across slots, whether or not functions are used.

Functions can only do things that regular clients can do. Are we trying to give them more power than that?

oranagra · 2022-07-19T15:36:25Z

going though this again i'd like to summarize and suggest what i think should be done.
i'd like to use Ran's idea of adding a metadata field to the function (not a callback) with a list of template key names.
i think being declarative rather than procedural will be better for both the users who write scripts, and also for redis (although a bit less flexible).

these names should support two forms of macros:

redis.register_function('f1', f1, keys=['{$keys[0]}_suffix', '{$slots[0]}_sum'])

i.e. the $slots[0] macro will return a name of a key that will fall into the same hash slot as the first key (one constant string for each hash slot, regardless of the actual key name).
I think we should also support keys[*] and slots[*] so that we automatically generate keys for all input keys provided by the caller.
and I do think we should require that as soon as the keys argument is provided, the user should explicitly also mention $keys[*].

This way we support both the use case were someone wanna maintain a specific auxiliary key for each of the main keys.
But also in some way support the the summing use case mentioned above (it'll generate up to 16k auxiliary keys, one for each hash slot, and the user can implement a function to sum them.

we'll also need to add some APIs that will let the function code expand the slots macro thing into a key name that it can pass to a redis command.
something like redis.get_key_representing_slot(keyname), and maybe also a way to know the slots of the current node and iterate on them (to create a function that can sum a result from the 16k keys)

I wanna argue that users can already probably achieve some of that (the specific auxiliary key for each of the main keys) by manually naming these keys correctly with curly braces, and the main disadvantage they have compared to what we'll offer is that they'll need to manually use redis.acl_check_cmd() at the beginning of the function in order to make sure it doesn't abort half way in.
so the two things we add here are convenience / validation, and also a solution for the summing use case.
please correct me if i'm wrong.

madolson · 2022-07-19T21:02:46Z

@zuiderkwast I think that is what people use functions for, they expect it to be atomic.

@oranagra I agree with most of the proposal, but I don't think we should solve the $slot problem the way that was identified, by generating a compatible key per slot. I think long term we need some "slot" independent data, and we can provide access to that slot data. I've wanted to modify the hash tag semantic in the past, perhaps with a v2, that allows specifying the slot on the client side.

oranagra · 2022-07-20T05:25:26Z

@madolson by slot independent data you mean some per-slot data storage (i.e. the function will be able to store some info in slot 0 and some info in slot 1)? or do you mean to save some global (out of keyspace) data in redis (which we'll need to figure out how to reshard)?
I suppose you refer to the first one, but i think it'll just complicate things.
i.e. you'll still need to let different users / functions to store things independently, and support all the data types (counter, stream, etc). i think generating a curly bracket based prefix string for a key name that falls into each slot is simpler and still lets all the other mechanisms in redis implicitly work.

madolson · 2022-07-20T22:12:31Z

I want to support something like {:1235}foo that maps the key into slot 1235. This will require a second iteration of the key routing, but I think it will be much easy for developers and end to understand compared to 'generating' a valid key per slot. I'm okay with this not being backwards compatible since it's mostly new functionality.

madolson · 2022-07-20T23:20:53Z

I also independently want the {:1235} syntax so the clients can deliberately place items in a slot, which we've seen with some AWS users. Some user has some extremely hot key, and they want to route it specially. It also allows custom client side routing at the cost of added key sizes.

oranagra · 2022-07-21T05:29:09Z

ok, i can live with that.. it does make things a little bit more complex (more steps), and adds backwards compatibility issues, but arguably makes things clearer.
so a function can have these APIs:

redis.get_slot_range() -- returns an array with two elements representing start and end slots
redis.get_slot_of_key(keyname) -- returns an integer

then they can do something like

slot_key = '{'..redis.get_slot_of_key(KEYS[0])..'}_sum'
redis.call('incr', slot_key)
range = redis.get_slot_range()
for i=range[0],range[1] do redis.call('get',  '{'..i..'}_sum') end

i still think that the declarative key binding could be like:

redis.register_function('f1', f1, keys=['{$keys[0]}_suffix', '{$slots[0]}_sum'])

oranagra · 2022-07-26T07:30:40Z

discussed in core-team meeting.
we wanna tackle these 3 separate topics together as part of 7.2.
the get_slot_range() thing, might be an anti-pattern, we should try to think if we can come up with a different (more high level) solution for a map-reduce case.

oranagra added this to Backlog in 7.0 via automation Jul 8, 2022

oranagra moved this from Backlog to To Do in 7.0 Jul 8, 2022

oranagra mentioned this issue Jul 10, 2022

Redis should support enforcing scripts only accessing declared keys #10958

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NEW] Functions should be able declare additional keys #10951

[NEW] Functions should be able declare additional keys #10951

madolson commented Jul 7, 2022 •

edited

oranagra commented Jul 8, 2022

yossigo commented Jul 9, 2022

MeirShpilraien commented Jul 10, 2022

ranshid commented Jul 10, 2022

oranagra commented Jul 10, 2022

madolson commented Jul 11, 2022

oranagra commented Jul 11, 2022

zuiderkwast commented Jul 11, 2022

madolson commented Jul 19, 2022

zuiderkwast commented Jul 19, 2022

oranagra commented Jul 19, 2022

madolson commented Jul 19, 2022

oranagra commented Jul 20, 2022

madolson commented Jul 20, 2022 •

edited

madolson commented Jul 20, 2022

oranagra commented Jul 21, 2022 •

edited

oranagra commented Jul 26, 2022 •

edited

[NEW] Functions should be able declare additional keys #10951

[NEW] Functions should be able declare additional keys #10951

Comments

madolson commented Jul 7, 2022 • edited

oranagra commented Jul 8, 2022

yossigo commented Jul 9, 2022

MeirShpilraien commented Jul 10, 2022

ranshid commented Jul 10, 2022

oranagra commented Jul 10, 2022

madolson commented Jul 11, 2022

oranagra commented Jul 11, 2022

zuiderkwast commented Jul 11, 2022

madolson commented Jul 19, 2022

zuiderkwast commented Jul 19, 2022

oranagra commented Jul 19, 2022

madolson commented Jul 19, 2022

oranagra commented Jul 20, 2022

madolson commented Jul 20, 2022 • edited

madolson commented Jul 20, 2022

oranagra commented Jul 21, 2022 • edited

oranagra commented Jul 26, 2022 • edited

madolson commented Jul 7, 2022 •

edited

madolson commented Jul 20, 2022 •

edited

oranagra commented Jul 21, 2022 •

edited

oranagra commented Jul 26, 2022 •

edited