Add SINTERCARD/ZINTERCARD Commands #8946
@oranagra @yossigo @itamarhaber Thoughts on this? At present, it's resulting in a ~4x performance gain for the project I'm working on. The Lua script below is the benchmark I used:
local function microtime ()
    -- TIME returns { seconds, microseconds }. Combine them numerically so a
    -- microsecond value with fewer than six digits is not misread.
    local ts = redis.call('time');
    return (tonumber(ts[1]) + (tonumber(ts[2]) / 1000000));
end

-- Intersection cardinality by fetching the whole result and counting it.
local function sinter (s1, s2)
    return #(redis.call('sinter', s1, s2));
end

-- Intersection cardinality by storing the result in a temporary key.
local function sinterstore (s1, s2)
    return redis.call('sinterstore', 'tempset', s1, s2);
end

-- Intersection cardinality using the proposed command.
local function sintercard (s1, s2)
    return redis.call('sintercard', s1, s2);
end

-- Jaccard index: |A n B| / |A u B|, where |A u B| = |A| + |B| - |A n B|.
local function jaccard (s1, s2, interfunc)
    local ic = interfunc(s1, s2);
    local uc = ((redis.call('scard', s1) + redis.call('scard', s2)) - ic);
    if (0 == uc) then
        return tostring(0.0);
    else
        return tostring(ic / uc);
    end
end

local function main (
    argc,
    argv
)
    if (2 ~= argc) then
        return redis.error_reply('invalid number of arguments');
    end

    local num_iterations = 1000;
    local impl = {
        ['sinter'] = sinter,
        ['sinterstore'] = sinterstore,
        ['sintercard'] = sintercard
    };
    local result = {};

    -- Time each implementation and record its average duration in seconds.
    for k, v in pairs(impl)
    do
        local sum = 0;
        for ii = 1, num_iterations
        do
            local t1 = microtime();
            jaccard(argv[1], argv[2], v);
            sum = (sum + (microtime() - t1));
        end
        local avg = (sum / num_iterations);
        result[k] = avg;
        print(string.format('%s %8.6fs avg', k, avg));
    end

    -- Compare sintercard against each of the existing approaches.
    local comparators = { 'sinter', 'sinterstore' };
    for ii = 1, #comparators
    do
        local comparator = comparators[ii];
        print(string.format('COMPARED TO %s', comparator));
        print(string.format('increase/decrease...... %8.6f %%',
            ((result['sintercard'] - result[comparator]) / result[comparator]) * 100.0));
        print(string.format('performance increase... %8.6f %%',
            ((result[comparator] - result['sintercard']) / result['sintercard']) * 100.0));
        print(string.format('times faster........... %8.6f',
            (result[comparator] / result['sintercard'])));
    end
end

return main(#KEYS, KEYS);
Seems useful enough to me. We've had an ongoing discussion about trying to reduce the number of commands that exist, but it seems like this is best left as a separate command since it returns a very different return type.
Some options like "WITHSCORES" also don't really make sense with zintercard, and should be blocked.
src/t_zset.c
dictAdd(dstzset->dict,tmp,&znode->score);
if (sdslen(tmp) > maxelelen) maxelelen = sdslen(tmp);
if (!cardinality_only) {
tmp = zuiNewSdsFromValue(&zval);
nit: this spacing is off
Agreed. Fixed.
Still missing the blockage of irrelevant options (WITHSCORES, WEIGHTS, ...)
Agreed. While the store variant of zunionInterDiffGenericCommand similarly returns only the cardinality of the resulting zset, there wasn't a good way for me to fit this concept into any of the current commands that made sense - with the exception of adding a new intersection-only option which seemed nasty and out of place.
Definitely. That's the main thing I'd like to clean up if there is a desire to move this forward.
I also feel that a new command is probably better, but I wanna note that in some sense WITHSCORES also changes the response type (more clearly on RESP3), and that mutually-exclusive arguments are also common (NX and XX), so in that sense a CARDONLY argument might have been ok too. I think the reasoning for a separate command may be that there aren't many other arguments in common (other than the two inputs).
Sorry for the delay. Generally LGTM, a few minor suggestions.
I'd like to hear @itamarhaber's feedback on the command and whether anything similar was discussed in the past.
New command and syntax LGTM.
I upvote the new API and the use case.
@jonahharris do you want to see it through, or shall I pick it up?
@yossigo please review my changes.
doc PR: redis/redis-doc#1610
Agreeing with that.
@oranagra Thanks for picking this up, man. I don't know why GMail only pushes some of these to my Inbox where I can see them, but it never seems to be the right ones :(
@redis/core-team please approve the two new commands.
Code LGTM
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Sorry for the delay, I missed the update on this. Code and API LGTM.
Add SINTERCARD and ZINTERCARD commands that are similar to ZINTER and SINTER but only return the cardinality with minimum processing and memory overheads.
Co-authored-by: Oran Agra <oran@redislabs.com>
For set-based operations, needing only the cardinality of an intersection currently carries a substantial memory overhead: the entire resulting set must either be returned to the client or stored in another key. This adds SINTERCARD/ZINTERCARD commands, which return only the resulting cardinality and avoid materializing the result. With these commands, performing Jaccard-type calculations on two sets is substantially faster and less resource-intensive - it's simply an SCARD of both sets and one SINTERCARD.

Unfortunately, there is no easy way to implement a similar cardinality for unions given the underlying implementation. ZINTERCARD is kinda nasty from a factoring perspective given zunionInterDiffGenericCommand's handling of all use-cases.
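
To illustrate the Jaccard calculation described above, here is a minimal C sketch that counts an intersection while keeping only a counter, then derives the index from three cardinalities. It is not the code from this PR: sorted, duplicate-free `int` arrays stand in for Redis sets, and `contains_sorted`, `intersect_card`, and `jaccard_from_cards` are hypothetical names used only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: binary search for `needle` in a sorted array. */
static int contains_sorted(const int *set, size_t len, int needle) {
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (set[mid] == needle) return 1;
        if (set[mid] < needle) lo = mid + 1;
        else hi = mid;
    }
    return 0;
}

/* Count the members common to two sorted, duplicate-free arrays.
 * Iterates the smaller input and probes the larger one; no result
 * set is built, only a counter is kept. */
static size_t intersect_card(const int *a, size_t alen,
                             const int *b, size_t blen) {
    if (alen > blen) {
        /* Swap so that `a` is always the smaller input. */
        const int *tmp = a; a = b; b = tmp;
        size_t tmplen = alen; alen = blen; blen = tmplen;
    }
    size_t count = 0;
    for (size_t i = 0; i < alen; i++) {
        if (contains_sorted(b, blen, a[i])) count++;
    }
    return count;
}

/* Jaccard index from three cardinalities:
 * |A n B| / |A u B|, with |A u B| = |A| + |B| - |A n B|.
 * Returns 0.0 when both sets are empty, as the benchmark script does. */
static double jaccard_from_cards(size_t card_a, size_t card_b, size_t inter) {
    size_t uni = card_a + card_b - inter;
    return uni == 0 ? 0.0 : (double)inter / (double)uni;
}
```

For A = {1, 3, 5, 7} and B = {3, 4, 5, 6, 7, 8}, `intersect_card` yields 3, so the index is 3 / (4 + 6 - 3) = 3/7. On the Redis side the three inputs would come from two SCARD calls and one SINTERCARD call.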
Anyway, I'm interested in thoughts on this, as these commands are required for a good amount of recommendation system work and, while they could be done with modules, it seems nasty to copy the logic of redis core out into a module. If there's a pro-command response, I'll clean up the zset variant some.