Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SINTERCARD/ZINTERCARD Commands #8946

Merged
merged 8 commits into from
Aug 3, 2021

Conversation

jonahharris
Copy link
Contributor

@jonahharris jonahharris commented May 14, 2021

Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.


For set-based operations, a painful consequence of requiring only the resulting cardinality is a substantial memory overhead in either returning the entire resulting set or storing it in another key. This adds SINTERCARD/ZINTERCARD commands, which have zero memory overhead and return only the resulting cardinality. With these commands, performing Jaccard-type calculations on two sets is substantially faster and less resource-intensive - it's simply an SCARD of both sets and one SINTERCARD. Unfortunately, there is no easy way to implement a similar cardinality for unions given the underlying implementation. ZINTERCARD is kinda nasty from a factoring perspective given zunionInterDiffGenericCommand's handling of all use-cases.

Anyway, interested in thoughts on this as they are required for a good amount of recommendation system work and, while they could be done with modules, it seems nasty to copy out the logic of redis core into a module. If there's a pro-command response, I'll clean-up the zset variant some.

@jonahharris
Copy link
Contributor Author

@oranagra @yossigo @itamarhaber Thoughts on this? At present, it's resulting in ~4x performance gain for the project I'm working on.

@jonahharris
Copy link
Contributor Author

jonahharris commented May 14, 2021

local function microtime ()
  local ts = redis.call('time');
  return tonumber(string.format('%s.%s', ts[1], ts[2]));
end

local function sinter (s1, s2)
  return #(redis.call('sinter', s1, s2));
end

local function sinterstore (s1, s2)
  return redis.call('sinterstore', 'tempset', s1, s2);
end

local function sintercard (s1, s2)
  return redis.call('sintercard', s1, s2);
end

local function jaccard (s1, s2, interfunc)
  local ic = interfunc(s1, s2);
  local uc = ((redis.call('scard', s1) + redis.call('scard', s2)) - ic);

  if (0 == uc) then
    return tostring(0.0);
  else
    return tostring(ic / uc);
  end 
end

local function main (
  argc,
  argv
)
  if (2 ~= argc) then
    return redis.error_reply('invalid number of arguments');
  end

  local num_iterations = 1000;
  local impl = {
    ['sinter'] = sinter,
    ['sinterstore'] = sinterstore,
    ['sintercard'] = sintercard
  };
  local result = {};
  for k, v in pairs(impl)
  do
    local sum = 0;
    for ii = 1, num_iterations
    do
      local t1 = microtime();
      jaccard(argv[1], argv[2], v);
      sum = (sum + (microtime() - t1));
    end
    local avg = (sum / num_iterations);
    result[k] = avg;
    print(string.format('%s %8.6fs avg', k, avg));
  end

  local comparators = { 'sinter', 'sinterstore' };
  for ii = 1, #comparators
  do
    local comparator = comparators[ii];

    print(string.format('COMPARED TO %s', comparator));
    print(string.format('increase/decrease...... %8.6f %%',
      ((result['sintercard'] - result[comparator]) / result[comparator]) * 100.0));
    print(string.format('performance increase... %8.6f %%',
      ((result[comparator] - result['sintercard']) / result['sintercard']) * 100.0));
    print(string.format('times faster........... %8.6f',
      (result[comparator] / result['sintercard'])));
  end
end

return main(#KEYS, KEYS);
jharris-laptop02:redis jharris$ redis-cli --eval jaccard.lua s1 s2
sinter 0.002803s avg
sintercard 0.000908s avg
sinterstore 0.005039s avg
COMPARED TO sinter
increase/decrease...... -67.620561 %
performance increase... 208.837964 %
times faster........... 3.088380
COMPARED TO sinterstore
increase/decrease...... -81.986088 %
performance increase... 455.126494 %
times faster........... 5.551265

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems useful enough to me. We've had an ongoing discussion about trying to reduce the number of commands that exist, but it seems like this is best left as a separate command since it returns a very different return type.

Some options like "WITHSCORES" also don't really make sense with zintercard, and should be blocked.

src/t_zset.c Outdated
dictAdd(dstzset->dict,tmp,&znode->score);
if (sdslen(tmp) > maxelelen) maxelelen = sdslen(tmp);
if (!cardinality_only) {
tmp = zuiNewSdsFromValue(&zval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this spacing is off

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing the blockage of irrelevant options (WITHSCORES, WEIGHTS, ...)

@jonahharris
Copy link
Contributor Author

jonahharris commented May 14, 2021

Seems useful enough to me. We've had an ongoing discussion about trying to reduce the number of commands that exist, but it seems like this is best left as a separate command since it returns a very different return type.

Agreed. While the store variant of zunionInterDiffGenericCommand similarly returns only the cardinality of the resulting zset, there wasn't a good way for me to fit this concept into any of the current commands that made sense - with the exception of adding a new intersection-only option which seemed nasty and out of place.

Some options like "WITHSCORES" also don't really make sense with zintercard, and should be blocked.

Definitely. That's the main thing I'd like to clean-up if there is a desire to move this forward.

@oranagra
Copy link
Member

oranagra commented May 19, 2021

i also feel that new command is probably better, but i wanna note that in some sense WITHSCORE also changes the response type (more clearly on RESP3), and that mutually-exclusive arguments are also common (NX and XX), so in that sense a CARDONLY argument might have been ok too.

I think the reasoning may be that when there are no tons of common other arguments (other than the two inputs).
i.e. the WEIGHTS and AGGREGATE arguments for ZDIFF and ZUNION don't apply here, it's just a matter of defining the operation (cmd name), and two inputs.
unlike other commands we unified in which if we kept them apart, extending one, would mean we need to extend the others (e.g. GEORADIUS vs GEOBOX, and SETEX vs GETSET)

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay. generally LGTM, few minor suggestions.
i'd like to hear @itamarhaber feedback on the command and if there was anything similar discussed in the past.

src/server.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
@yossigo
Copy link
Member

yossigo commented Jun 15, 2021

New command and syntax LGTM.

@oranagra oranagra added state:major-decision Requires core team consensus release-notes indication that this issue needs to be mentioned in the release notes approval-needed Waiting for core team approval to be merged state:needs-doc-pr requires a PR to redis-doc repository labels Jun 30, 2021
@itamarhaber
Copy link
Member

I upvote the new API and the use case.

@oranagra
Copy link
Member

@jonahharris do you want to see it though, or shall i pick it up?

@oranagra
Copy link
Member

@yossigo please review my changes.
p.s. note that unlike ZINTER and ZINTERSTORE, which takes several flag argument, ZINTERCARD takes none.
so in theory we can change it from ZINTERCARD <num-keys> <key1> <key2> ... to be similar to SINTERCARD (SINTERCARD <key1> <key2>).
however i suppose that being consistent with ZINTER and ZINTERSTORE is more important for most users of this command.

@oranagra
Copy link
Member

doc PR: redis/redis-doc#1610

@itamarhaber
Copy link
Member

@oranagra

however i suppose that being consistent with ZINTER and ZINTERSTORE is more important for most users of this command.

Agreeing with that.

@jonahharris
Copy link
Contributor Author

@oranagra Thanks for picking this up, man. I don't know why GMail only pushes some of these to my Inbox where I can see them, but it never seems to be the right ones :(

@oranagra
Copy link
Member

@redis/core-team please approve the two new commands.

itamarhaber
itamarhaber previously approved these changes Jul 26, 2021
Copy link
Member

@itamarhaber itamarhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

src/t_set.c Outdated Show resolved Hide resolved
src/t_zset.c Show resolved Hide resolved
Co-authored-by: Itamar Haber <itamar@redislabs.com>
@madolson madolson self-requested a review August 3, 2021 06:17
madolson
madolson previously approved these changes Aug 3, 2021
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I missed the update on this. Code and API LGTM.

@oranagra oranagra merged commit 432c92d into redis:unstable Aug 3, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.

Co-authored-by: Oran Agra <oran@redislabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants