Add new commands ZDIFF and ZDIFFSTORE #7961

felipou · 2020-10-25T17:53:13Z

Related to issue #446.

Simply adds the new commands ZDIFF and ZDIFFSTORE. For now, there are 3 things missing:

- More tests (for now I just used the two basic tests from the reference PR)
- Add documentation with a PR for redis-doc
- Include "algorithm 2", as in the SDIFF command. I've started something here. It works, but it doesn't look very good to me right now; it seems rather "fragile". But if this is wanted, I can add it to this PR and work on it.
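For context, "algorithm 2" in SDIFF copies every element of the first set into the destination and then removes every element that appears in any of the other sets. Below is a minimal sketch of that idea applied to sorted sets. The `toy_entry`/`toy_zset` types and `toy_zdiff_alg2` are hypothetical stand-ins (plain arrays with linear search), not the real zset encodings or the code in this PR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy entry: a member name and its score. Real Redis uses
 * listpack or skiplist+dict encodings, not plain arrays. */
typedef struct {
    const char *member;
    double score;
} toy_entry;

typedef struct {
    const toy_entry *entries;
    size_t len;
} toy_zset;

/* Remove `member` from out[0..*outlen) if present, preserving order. */
static void toy_remove(toy_entry *out, size_t *outlen, const char *member) {
    for (size_t i = 0; i < *outlen; i++) {
        if (strcmp(out[i].member, member) == 0) {
            memmove(&out[i], &out[i + 1], (*outlen - i - 1) * sizeof(*out));
            (*outlen)--;
            return;
        }
    }
}

/* "Algorithm 2" style diff: copy the whole first set (scores included) into
 * `out`, then delete every member that appears in any of the other sets.
 * `out` must have room for sets[0].len entries. Returns the result size. */
static size_t toy_zdiff_alg2(const toy_zset *sets, size_t setnum, toy_entry *out) {
    size_t outlen = 0;
    if (setnum == 0) return 0;
    for (size_t i = 0; i < sets[0].len; i++)
        out[outlen++] = sets[0].entries[i];
    /* Stop early once the result is empty: more removals can't change it. */
    for (size_t j = 1; j < setnum && outlen > 0; j++)
        for (size_t i = 0; i < sets[j].len; i++)
            toy_remove(out, &outlen, sets[j].entries[i].member);
    return outlen;
}
```

The scores in the result always come from the first set; the other sets only contribute membership. The early exit is a design choice of this sketch.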

Side question: I noticed that the ZINTER and ZUNION commands are missing from the help.c file (only the "STORE" version is there), is that as expected, or should I fix it (here or in another PR)? I added both the ZDIFF and ZDIFFSTORE here.

oranagra

i see t_set.c saying this:

            /* With algorithm 1 it is better to order the sets to subtract
             * by decreasing size, so that we are more likely to find
             * duplicated elements ASAP. */
            qsort(sets+1,setnum-1,sizeof(robj*),
                qsortCompareSetsByRevCardinality);

maybe we need to do the same in zset?
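Algorithm 1, by contrast, walks the first set and probes each of the other sets for every element. The sketch below illustrates why the decreasing-size sort helps: the inner loop stops at the first set that contains the element, and a larger set is assumed to be more likely to contain it. The `toy_set` type and all function names are hypothetical; the comparator only mirrors the intent of `qsortCompareSetsByRevCardinality`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy set: a plain array of member names. */
typedef struct {
    const char **members;
    size_t len;
} toy_set;

static int toy_contains(const toy_set *s, const char *m) {
    for (size_t i = 0; i < s->len; i++)
        if (strcmp(s->members[i], m) == 0) return 1;
    return 0;
}

/* Larger sets first (reverse cardinality). */
static int cmp_rev_cardinality(const void *a, const void *b) {
    const toy_set *sa = *(const toy_set *const *)a;
    const toy_set *sb = *(const toy_set *const *)b;
    if (sa->len < sb->len) return 1;
    if (sa->len > sb->len) return -1;
    return 0;
}

/* Algorithm 1: keep an element of the first set only if no other set
 * contains it. Sorting sets[1..] by decreasing size makes the probe loop
 * more likely to stop early. Returns the number of members written to out. */
static size_t toy_diff_alg1(const toy_set **sets, size_t setnum, const char **out) {
    size_t outlen = 0;
    if (setnum == 0) return 0;
    qsort(sets + 1, setnum - 1, sizeof(*sets), cmp_rev_cardinality);
    for (size_t i = 0; i < sets[0]->len; i++) {
        const char *m = sets[0]->members[i];
        int found = 0;
        for (size_t j = 1; j < setnum && !found; j++)
            found = toy_contains(sets[j], m);
        if (!found) out[outlen++] = m;
    }
    return outlen;
}
```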

i don't see any way to reduce the "fragility", i suppose you mean it's a lot of code and repeated twice? i guess the way to combat the fragility is to make sure there's a good test coverage for it.
i'll try to comment on that commit.

redis.conf

src/help.h

It's the same algorithm used in SDIFF, modified to work with zsets in zdiff/zdiffstore.

felipou · 2020-10-28T02:06:25Z

Just pushed some changes, mainly:

- Adding the commit with algorithm 2 from the other branch.
- Modifying the search for the max element on algorithm 2, as suggested by @oranagra.
- Changes to the things @oranagra pointed out.

Still missing more tests, and the PR on redis-doc.

felipou · 2020-10-28T02:09:18Z

Two things I currently need advice on:

- Should I move dictGetMaxElementLength, which I created for algorithm 2, to the dict.c file?
- On algorithm 2, inside the else block where I remove the elements from the dstzset, I just copied that code from a part of zsetDel, including the comments. I should probably create a function for that, right?

redis.conf

src/t_zset.c

oranagra · 2020-10-28T07:38:32Z

src/t_zset.c

+                de = dictUnlink(dstzset->dict,tmp);
+                if (de != NULL) {
+                    /* Get the score in order to delete from the skiplist later. */
+                    score = *(double*)dictGetVal(de);
+
+                    /* Delete from the hash table and later from the skiplist.
+                    * Note that the order is important: deleting from the skiplist
+                    * actually releases the SDS string representing the element,
+                    * which is shared between the skiplist and the hash table, so
+                    * we need to delete from the skiplist as the final step. */
+                    dictFreeUnlinkedEntry(dstzset->dict,de);
+
+                    /* Delete from skiplist. */
+                    int retval = zslDelete(dstzset->zsl,score,tmp,NULL);
+                    serverAssert(retval);


i agree with your suggestion, let's move this into a common function that serves both this one and zsetDel, and returns a value so the caller knows whether the entry was deleted and can carry out the remaining tasks if it was.

Done. I'm not sure the name of the new function is really appropriate though, could use some help with it. I called it zsetslDel, seeing as it is a function to delete an element from a zset encoded as a skiplist (plus dict).

i don't have any definite answer. i think i would like to avoid creating a new type of prefix.
i would suggest to just use a zset prefix, and good top comment.
but zsetDel is taken 8-).
maybe call it zsetRemoveFromSkiplist ?
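A sketch of the shape being discussed: one helper that removes an element from both structures and returns whether it was found. Here the "dict" is an array of borrowed pointers and the "skiplist" is a sorted linked list that owns the shared string. `toy_zs`, `toy_add` and `toy_remove` are hypothetical names; the real helper works on the zset's dict and `zskiplist`. Following the comment quoted above, the borrowing structure is cleared first and the owning one last, because the last step frees the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for a skiplist node: it OWNS the member string. */
typedef struct toy_node {
    char *member;
    double score;
    struct toy_node *next;
} toy_node;

/* Hypothetical zset: "dict" slots only BORROW node->member. */
typedef struct {
    char *slot[TOY_SLOTS];
    double slot_score[TOY_SLOTS];
    toy_node *head; /* sorted by score */
} toy_zs;

/* Add `ele` (assumed not already present). Returns 1 on success. */
static int toy_add(toy_zs *zs, const char *ele, double score) {
    int free_slot = -1;
    for (int i = 0; i < TOY_SLOTS && free_slot < 0; i++)
        if (zs->slot[i] == NULL) free_slot = i;
    if (free_slot < 0) return 0;
    toy_node *n = malloc(sizeof(*n));
    if (n == NULL) return 0;
    size_t len = strlen(ele) + 1;
    if ((n->member = malloc(len)) == NULL) { free(n); return 0; }
    memcpy(n->member, ele, len);
    n->score = score;
    toy_node **pp = &zs->head;
    while (*pp && (*pp)->score < score) pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
    zs->slot[free_slot] = n->member;
    zs->slot_score[free_slot] = score;
    return 1;
}

/* Remove `ele` from both structures. Returns 1 if it was found and deleted,
 * 0 otherwise, so the caller can decide what follow-up work to do. */
static int toy_remove(toy_zs *zs, const char *ele) {
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (zs->slot[i] == NULL || strcmp(zs->slot[i], ele) != 0) continue;
        double score = zs->slot_score[i];
        /* 1) Drop the borrowed reference first; the string stays alive. */
        zs->slot[i] = NULL;
        /* 2) Remove from the owning list last; this frees the string. */
        for (toy_node **pp = &zs->head; *pp; pp = &(*pp)->next) {
            toy_node *n = *pp;
            if (n->score == score && strcmp(n->member, ele) == 0) {
                *pp = n->next;
                free(n->member);
                free(n);
                return 1;
            }
        }
        assert(0 && "dict and list out of sync");
        return 0;
    }
    return 0;
}
```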

If any other set is equal to the first set in zdiff, we're subtracting the set from itself, so we already know we'll have an empty set as a result.
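A minimal sketch of that shortcut, assuming "equal" means the same key object shows up again among the sets to subtract (e.g. `ZDIFF 2 myzset myzset`, where both lookups should yield the same object), so a pointer comparison is enough. `toy_subtracts_itself` is a hypothetical name.

```c
#include <stddef.h>

/* Returns 1 if sets[0] appears again in sets[1..setnum), meaning the diff
 * is known to be empty without scanning any element. */
static int toy_subtracts_itself(const void *const *sets, size_t setnum) {
    for (size_t j = 1; j < setnum; j++)
        if (sets[j] == sets[0]) return 1;
    return 0;
}
```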

felipou · 2020-10-31T19:03:47Z

In addition to addressing everything pointed out, I pushed a minor optimization, which could also be applied to the SDIFF command (but in another pull request, maybe?).

felipou · 2020-10-31T19:41:51Z

Added some more tests, including a copy of the "SDIFF fuzzing" test.

oranagra

@felipou while looking at the complexity formulas with @inbaryuval (conclusions tomorrow) we noticed something to improve about the use of zslInsert (see new comments).

oranagra · 2020-11-04T19:16:01Z

src/t_zset.c

+
+    memset(&zval, 0, sizeof(zval));
+    zuiInitIterator(&src[0]);
+    while (zuiNext(&src[0],&zval)) {


going over the list backwards will make zslInsert more efficient, see:

redis/src/rdb.c

Lines 855 to 862 in d8fbd3a

/* We save the skiplist elements from the greatest to the smallest
 * (that's trivial since the elements are already ordered in the
 * skiplist): this improves the load process, since the next loaded
 * element will always be the smaller, so adding to the skiplist
 * will always immediately stop at the head, making the insertion
 * O(1) instead of O(log(N)). */
zskiplistNode *zn = zsl->tail;
while (zn != NULL) {

alternatively, maybe we can improve zslInsert to be optimized for this case too?
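To illustrate why insertion order matters, the sketch below inserts into a plain sorted linked list. This is level 0 of a skiplist only, so it exaggerates the gap: a real skiplist pays O(log(N)) rather than O(N) per out-of-order insert. Inserting in descending order, as the rdb.c comment describes, never walks past a node. `lnode`, `sorted_insert` and `total_insert_steps` are hypothetical names.

```c
#include <stddef.h>

typedef struct lnode {
    double score;
    struct lnode *next;
} lnode;

/* Insert into an ascending list; returns how many nodes were walked past. */
static size_t sorted_insert(lnode **head, lnode *n) {
    size_t steps = 0;
    lnode **pp = head;
    while (*pp && (*pp)->score < n->score) {
        pp = &(*pp)->next;
        steps++;
    }
    n->next = *pp;
    *pp = n;
    return steps;
}

/* Build a fresh list from `scores` (in the given order) using pool[0..n),
 * returning the total number of nodes walked past across all inserts. */
static size_t total_insert_steps(const double *scores, size_t n, lnode *pool) {
    lnode *head = NULL;
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        pool[i].score = scores[i];
        total += sorted_insert(&head, &pool[i]);
    }
    return total;
}
```

With scores 1..5, ascending order walks 0+1+2+3+4 = 10 nodes in total, while descending order walks none.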

Ok, I just analyzed the code, and it's a bit more complicated than I thought. The referenced code already knows that it's a skiplist. In the case of zdiff, we have to cover all cases, since the src values can have any encoding. I've checked that zuiInitIterator and zuiNext are only used in zinter/zunion/zdiff, so I thought about changing them to zuiInitBackIterator and zuiPrev. I'll have to study the data structure a bit more to understand how to do that, but it shouldn't be too hard. I'll try to finish this tomorrow.

oranagra · 2020-11-04T19:16:45Z

src/t_zset.c

+        zuiInitIterator(&src[j]);
+        while (zuiNext(&src[j],&zval)) {


same, going backwards will make zslInsert faster

Here I had an extra concern: the impact that this will have on zsetslDel. Should we iterate forward when deleting the elements? This would be simple: we could just move the loop inside the if that checks whether it's the first set, iterating backwards in one case and forward in the other.

you mean zslDelete? looking at it, and considering we can't know the order of elements between different sets, i'm not sure if it makes any principal difference.

maybe for the (possibly common) case where we have just two sets and the majority of the elements are being deleted, it's best to iterate from head to tail (constantly deleting the first element, with no need for long searches).
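The same effect can be sketched for deletion: if the elements to delete are visited in ascending order and most of them are present, each one is found at the head of the list. Again a plain sorted list stands in for the skiplist, and `dnode`, `build_list` and `sorted_delete` are hypothetical names.

```c
#include <stddef.h>

typedef struct dnode {
    double score;
    struct dnode *next;
} dnode;

/* Link pool[0..n) into a list; `scores` must already be ascending. */
static dnode *build_list(dnode *pool, const double *scores, size_t n) {
    for (size_t i = 0; i < n; i++) {
        pool[i].score = scores[i];
        pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
    }
    return n ? &pool[0] : NULL;
}

/* Unlink the first node whose score equals `score` (sets *found).
 * Returns how many nodes had to be walked past before reaching it. */
static size_t sorted_delete(dnode **head, double score, int *found) {
    size_t steps = 0;
    dnode **pp = head;
    while (*pp && (*pp)->score < score) {
        pp = &(*pp)->next;
        steps++;
    }
    *found = (*pp != NULL && (*pp)->score == score);
    if (*found) *pp = (*pp)->next;
    return steps;
}
```

Deleting 1..5 from the list 1..5 in ascending order walks past no nodes; deleting them in descending order walks past 4+3+2+1+0 = 10.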

maybe instead of changing the ZDIFF code, it's better to try to have zslInsert detect an insertion to the tail and make it efficient for that case too.
this way no one else will have to worry about scanning insertions in the efficient direction.

Seems like a better idea indeed. I'll try to look into it, but I'll have to better understand the skiplist data structure, because I read the zslInsert code and couldn't see an easy way to optimize for that case. The only idea I had was to add a "hint" so that it assumes the element is greater than all, but I don't think it's possible to just reverse the loops, since it would change the way the "rank" array in the function is calculated.

It just occurred to me: since this is an optimization that affects other commands, maybe it should be in a different pull request?

let's make it in a different PR. this one has already been going on too long, and the change isn't gonna affect the API, and might even be in a different area of the code, with implications for other parts of redis...

@felipou will you have time to try to handle that soon?
i wanna make ZDIFF efficient for 6.2 one way or another (we have about 2 weeks before an RC).

I'll try to come up with something today, at least an initial PR that we can build on top of, that way if I don't have much time in the next weeks someone can pick up where I left. But I believe I can get something working by the next weekend.

I just reviewed the insert algorithm for skip lists, and if I understand correctly, there is no way to efficiently insert backwards. To insert, we need to update the forward pointers in all levels, so we need to traverse the list anyway. The backwards pointer is only in the first level, and is only an optimization for ZREVRANGE to traverse the list backwards. If we created backward pointers for all levels, we would end up having to traverse the list both ways when inserting (and deleting and updating).
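A minimal skiplist insert makes the point concrete: the `update[]` array holds, for each level, the node whose forward pointer must be rewired, and finding those nodes requires walking forward from the header on every level. This sketch omits the span/rank bookkeeping, the backward pointer and the member tie-breaking of the real `zslInsert`, and it takes the node level as a parameter (a hypothetical simplification) instead of choosing it randomly.

```c
#include <stdlib.h>

#define SL_MAXLEVEL 8

typedef struct slnode {
    double score;
    struct slnode *forward[]; /* one forward pointer per level */
} slnode;

typedef struct {
    slnode *header;
    int level; /* number of levels currently in use */
} skiplist;

static slnode *sl_node(int level, double score) {
    slnode *n = malloc(sizeof(*n) + (size_t)level * sizeof(slnode *));
    if (n == NULL) return NULL;
    n->score = score;
    for (int i = 0; i < level; i++) n->forward[i] = NULL;
    return n;
}

static int sl_init(skiplist *sl) {
    sl->header = sl_node(SL_MAXLEVEL, 0);
    sl->level = 1;
    return sl->header != NULL;
}

/* Insert `score` as a node with `level` levels (1..SL_MAXLEVEL). */
static int sl_insert(skiplist *sl, double score, int level) {
    slnode *update[SL_MAXLEVEL];
    slnode *x = sl->header;
    /* Walk forward from the header on every level, top to bottom. */
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] && x->forward[i]->score < score)
            x = x->forward[i];
        update[i] = x;
    }
    if (level > sl->level) {
        for (int i = sl->level; i < level; i++) update[i] = sl->header;
        sl->level = level;
    }
    slnode *n = sl_node(level, score);
    if (n == NULL) return 0;
    for (int i = 0; i < level; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    return 1;
}

static void sl_free(skiplist *sl) {
    slnode *x = sl->header->forward[0];
    while (x) {
        slnode *next = x->forward[0];
        free(x);
        x = next;
    }
    free(sl->header);
}
```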

I'll try to come up with the backward equivalents of zuiInitIterator and zuiNext instead, so that we can modify the ZDIFF code directly.

Just finished this PR: #8105

src/t_zset.c

oranagra · 2020-11-04T19:39:00Z

one other random thought (maybe for another PR), maybe when ZINTER / ZUNION / ZDIFF are used without WITHSCORES, we want to avoid building the skiplist altogether? i suppose users expect the result to be sorted even if provided without scores. but maybe some use cases don't need that and will benefit from not paying that cost?

this concern is new (introduced when #7794 added ZUNION / ZINTER)
@yangbodong22011 @itamarhaber do you have any feedback on that?

p.s. i guess the WITHSCORES argument should be rejected as invalid syntax when dstkey is non-NULL.
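One way to express that rule, as a hypothetical parser for the trailing options: WITHSCORES is only accepted when there is no destination key, and anything else is a syntax error. `parse_zdiff_options` and the `PARSE_*` codes are illustrative names; `strcasecmp` comes from POSIX `<strings.h>`.

```c
#include <strings.h>

/* Hypothetical return codes. */
enum { PARSE_OK = 0, PARSE_SYNTAX_ERR = -1 };

/* Parse trailing options of a ZDIFF/ZDIFFSTORE-like command.
 * `is_store` is nonzero for the *STORE variant (which has a dstkey). */
static int parse_zdiff_options(int argc, const char **argv, int is_store,
                               int *withscores) {
    *withscores = 0;
    for (int i = 0; i < argc; i++) {
        if (!is_store && strcasecmp(argv[i], "withscores") == 0) {
            *withscores = 1;
        } else {
            /* Unknown option, or WITHSCORES given to the *STORE variant. */
            return PARSE_SYNTAX_ERR;
        }
    }
    return PARSE_OK;
}
```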

oranagra

@felipou i got feedback from my "math doctor" and tried to suggest the changes that are both correct, and hopefully easy to understand.
please merge them, or let me know if you see any problem.

other than that there's the reverse iteration issue i posted yesterday that needs to be changed.

src/t_zset.c

madolson

API LGTM

oranagra · 2020-11-10T20:24:15Z

@felipou what's left is two small changes:

- changing the order of iteration to be more optimal for insertion
- accepting my suggested comments about complexity (if you concur)

can you push this through, or would you rather i handle it?

felipou · 2020-11-11T03:48:31Z

Sorry for taking so long, I'll go over everything now.

This check was copied from the zunion/zinter commands code, but is not needed for zdiff since it doesn't have the WEIGHTS argument.
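For context on why such a check exists in ZUNION/ZINTER: with WEIGHTS the score is multiplied, and under IEEE 754 an infinite score times a zero weight is NaN, so the result has to be clamped. ZDIFF has no WEIGHTS and just copies the first set's scores, so there is no product to guard. This sketch assumes the removed check was an `isnan` clamp to 0; `weighted_score` is a hypothetical name.

```c
#include <math.h>

/* Weighted score as a ZUNION/ZINTER-style command would compute it.
 * inf * 0 yields NaN, which is clamped to 0 here. */
static double weighted_score(double score, double weight) {
    double v = score * weight;
    if (isnan(v)) v = 0.0;
    return v;
}
```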

felipou · 2020-11-11T04:43:21Z

p.s. i guess the WITHSCORES argument should be rejected as invalid syntax when dstkey is non-NULL.

Ok, will do that!

Review suggestions by oranagra Co-authored-by: Oran Agra <oran@redislabs.com>

felipou · 2020-11-13T01:32:15Z

Just merged the suggested comments, and added the rejection of WITHSCORES in the case of the STORE commands.

The only thing remaining is the optimization of zslInsert (or making the insert loop backwards on zdiff), I'll focus on this now.

oranagra · 2020-11-13T08:05:42Z

@redis/core-team this one is ready to be merged, please approve.
it adds ZDIFF and ZDIFFSTORE which work similarly to SDIFF and SDIFFSTORE.

besides that, it makes sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE.

oranagra · 2020-11-15T12:14:45Z

Merged. Thank you @felipou !

- Add ZDIFF and ZDIFFSTORE, which work similarly to SDIFF and SDIFFSTORE
- Make sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE

Co-authored-by: Oran Agra <oran@redislabs.com>

felipou added 2 commits October 25, 2020 13:46

Add new commands ZDIFF and ZDIFFSTORE

965ece1

Merge remote-tracking branch 'upstream/unstable' into zdiffstore

ee7ca2f

felipou marked this pull request as draft October 25, 2020 17:59

felipou mentioned this pull request Oct 25, 2020

zdiffstore #446

Closed

oranagra reviewed Oct 25, 2020

redis.conf (outdated)

src/help.h (outdated)

oranagra added this to In progress in 6.2 Oct 27, 2020

oranagra linked an issue Oct 27, 2020 that may be closed by this pull request

zdiffstore #446

Closed

felipou added 7 commits October 27, 2020 22:25

Add algorithm 2 option to zdiff

c292f36

It's the same algorithm used in SDIFF, modified to work with zsets in zdiff/zdiffstore.

Add sort step to algorithm 1 of zdiff

b39b3be

Add extra cardinality check to zdiff algorithm 2

64b561f

Use dict iteration directly to find max element length

95c78ed

Revert changes to help.h adding zdiff/zdiffstore

8ed99fd

Remove unused variables from zdiffAlgorithm2

d147091

Adjusted the note about the error when lacking keys for eviction

e7d226c

oranagra reviewed Oct 28, 2020

felipou added 10 commits October 31, 2020 13:37

Make all new internal functions static

1efe956

Extract function zsetslDel from zsetDel and zdiffAlgorithm2

0abf6c8

Change eviction write commands note to text suggested by oranagra

72c51f8

Fix leak by properly releasing dict iterator

d521048

Rename new zset internal functions

58d541b

Fix internal function call names after name change

fc05b83

Fix zdiff algorithms complexity in code comments

f7444fc

Minor optimization to zdiff

8d631da

If any other set is equal to the first set in zdiff, we're subtracting the set from itself, so we already know we'll have an empty set as a result.

Add extra comment about algorithm cost in zdiff algorithm 2

2e4b312

Remove comment that became useless in zdiff algorithm 1

0472a32

Add more tests for zdiff

714e08e

yossigo approved these changes Nov 4, 2020

oranagra reviewed Nov 4, 2020

oranagra reviewed Nov 5, 2020

src/t_zset.c (outdated)

src/t_zset.c (outdated)

mp911de mentioned this pull request Nov 8, 2020

Add support for ZDIFF and ZDIFFSTORE commands redis/lettuce#1507

Closed

madolson self-requested a review November 9, 2020 22:28

madolson previously approved these changes Nov 9, 2020

yossigo previously approved these changes Nov 10, 2020

Remove check for nan

c7132a0

This check was copied from the zunion/zinter commands code, but is not needed for zdiff since it doesn't have the WEIGHTS argument.

felipou dismissed stale reviews from yossigo, madolson, itamarhaber, and oranagra via c7132a0 November 11, 2020 04:42

felipou and others added 2 commits November 12, 2020 22:15

Update algorithm complexities as per review suggestions

dc119ed

Review suggestions by oranagra Co-authored-by: Oran Agra <oran@redislabs.com>

Syntax error for z(diff/inter/union)store if using WITHSCORES argument

540a78b

oranagra approved these changes Nov 13, 2020

itamarhaber approved these changes Nov 13, 2020

oranagra merged commit d8fd48c into redis:unstable Nov 15, 2020

oranagra moved this from In progress to Done in 6.2 Nov 15, 2020

jiekun mentioned this pull request Dec 21, 2020

Redis 6.2 New Commands Check List redis/redis-py#1434

Closed

18 tasks

oranagra mentioned this pull request Jan 13, 2021

Redis 6.2 RC1. #8187

Merged

sundb mentioned this pull request Feb 5, 2021

RAND* commands: fix risk of OOM panic in hash and zset, use fair random in hash, and add tests for even distribution to all #8429

Merged

ShooterIT mentioned this pull request Jun 25, 2021

Added ZSET command ZDIFFSTORE #448

Closed
