Add support for set difference #14

Closed
wants to merge 2 commits into from

ideasasylum

This adds support for Redis' SDIFF and SDIFFSTORE operations, allowing us to efficiently calculate a set difference and, optionally, store the result in another Kredis key.

It looks like this:

jamie = Kredis.set 'set1', typed: :string
jam = Kredis.set 'set2', typed: :string

jamie << %w(j a m i e) 
jam << %w(j a m) 

# SDIFF
jamie - jam #=> ["i", "e"]
jamie.diff jam #=> ["i", "e"]

# SDIFFSTORE which avoids the need to return the result to ruby
result = Kredis.set 'resultset'
jamie.diff jam, store: result
result.members #=> ["i", "e"]

I'm not sure if this is a direction you'd like Kredis to go in, but it would be easy enough to add SUNION (set + otherset) and SINTER (set & otherset) operations to mirror the same operations in Array. I'm happy to write PRs for those operations too, or to expand this one.
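
For comparison, here is a small sketch in plain Ruby (no Redis or Kredis involved) of how the corresponding operators behave on Array and on the standard library's Set. One detail worth keeping in mind: Array#+ concatenates and keeps duplicates, so the closest Array analogue of SUNION is Array#|, whereas Set#+ is an alias for Set#|. The variable names are only illustrative.

```ruby
require "set"

jamie = %w(j a m i e)
jam   = %w(j a m)

# Array operators
difference   = jamie - jam  # like SDIFF:  ["i", "e"]
intersection = jamie & jam  # like SINTER: ["j", "a", "m"]
union        = jamie | jam  # like SUNION: ["j", "a", "m", "i", "e"]
concatenated = jamie + jam  # NOT a union: duplicates are kept (8 elements)

# Set operators from the standard library
jamie_set = Set.new(jamie)
jam_set   = Set.new(jam)

set_difference    = (jamie_set - jam_set).to_a                # ["i", "e"]
set_union_matches = (jamie_set + jam_set) == (jamie_set | jam_set) # true
```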

@ideasasylum
Author

FWIW, I just knocked out the union (+) and intersection (&) operators to see what they looked like.

@kaspth
Contributor

kaspth commented Feb 13, 2021

Thanks for looking at this! We generally don't want to just match Redis commands in Ruby, but rather base the types on use cases from within Ruby. That's why we have, e.g., UniqueList, which uses multiple commands under the hood.

I think the store option breaks our conceptualization; it feels like it's masking a potential other type. But let's start from the basics: what's the use case?

@ideasasylum
Author

Hey @kaspth, thanks for taking a look. I enjoyed hacking on Kredis so good job so far!

Here's what I'm working on, which might explain my use case: I need to sync data with a rather terrible API (basically a mailing list). I need to grab all the email addresses from the API, grab the user list from our database, and then work out which emails to delete from the API and which to add so I can keep the two in sync. The current script is plain Ruby and uses the Array difference operator to determine emails_to_add and emails_to_delete.
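
As a rough sketch of that plain-Ruby approach (the sample addresses are made up and stand in for the API response and the database query, neither of which is shown in this thread):

```ruby
# Hypothetical sample data: in the real script these would come from
# the mailing-list API and from the application's database.
api_contacts     = %w(a@example.com b@example.com c@example.com)
paying_customers = %w(b@example.com c@example.com d@example.com)

# Contacts the API still has but that are no longer paying customers.
emails_to_delete = api_contacts - paying_customers

# Paying customers that the API does not know about yet.
emails_to_add = paying_customers - api_contacts
```

With this sample data, emails_to_delete holds only a@example.com and emails_to_add holds only d@example.com.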

Things are getting a little slow now that there are tens of thousands of entries, and I'd like to split this up into separate jobs. Instead of storing the entries in a Ruby array, I'll insert them into a Redis set to coordinate the data from the various jobs. Then, just like the Ruby code, I'll use set difference to determine which email addresses to add and which to delete.

In Kredis, using this PR, that looks like:

paying_customers = Kredis.set 'customers'
api_contacts = Kredis.set 'contacts'

emails_to_delete = api_contacts - paying_customers
emails_to_delete.each { |e| DeleteContactJob.perform_later e }

emails_to_add = paying_customers - api_contacts
emails_to_add.each { |e| AddContactJob.perform_later e }

Whenever I use a set data structure, it's almost always because I need set operations, especially diff and intersect, perhaps with some uniqueness constraints; otherwise, I'd just use a list. I'd really like to see those operations here in Kredis too. I think it would be a powerful use of the data structure and would maintain compatibility with the familiar Array interface. With these operators, you could take pure Ruby code using Arrays and convert it to Redis-backed sets using Kredis without changing the interface.

TBH, I wasn't aware of the *STORE versions, and I just thought that was a neat optimisation if you wanted the resulting difference to be stored back in Redis. It's not strictly necessary for my use case, and it did feel a bit out of place, especially as it required creating both a - method and a diff method to accommodate the store option.

@kaspth
Contributor

kaspth commented Feb 16, 2021

Heyo, I've been doing some more thinking about this, and I've been wondering whether we should support Set operations right now. We'd essentially be opening the door to cross-type comparisons, as shown by the need to call key, which feels like it slightly breaks Kredis' conceptualization of interfacing with Redis on a one-type basis. I'm not saying we can't expand, just that right now this feels premature. Yes, the code itself isn't that much, but I'm speaking conceptually here. I'd like us to see more use cases from real apps before we try to extract something here.

Meanwhile, you can make your example work with this:

emails_to_delete = api_contacts.proxy.sdiff paying_customers.key

Thanks for the suggestion! You're more than welcome to share more use cases here or in other issues as you add Kredis to your app 🙏
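
To make the workaround concrete without needing a Redis server, here is a minimal sketch that assumes the proxy forwards a command with the set's own key as the first argument (which is what the one-liner above implies). FakeRedis, FakeProxy, FakeKredisSet and fake_set are hypothetical stand-ins written for this illustration; they are not part of Kredis or redis-rb.

```ruby
require "set"

# Hypothetical in-memory stand-in for a Redis connection (SADD and SDIFF only).
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, *members)
    @sets[key].merge(members.flatten)
  end

  # SDIFF key [key ...]: members of the first set that appear in none of the others.
  def sdiff(*keys)
    first, *rest = keys.flatten
    rest.reduce(@sets[first].dup) { |result, key| result - @sets[key] }.to_a
  end
end

# Hypothetical stand-in for the proxy: prepends its own key to every command.
class FakeProxy
  def initialize(redis, key)
    @redis = redis
    @key = key
  end

  def method_missing(name, *args)
    @redis.public_send(name, @key, *args)
  end

  def respond_to_missing?(name, include_private = false)
    @redis.respond_to?(name, include_private) || super
  end
end

# Hypothetical stand-in for a Kredis set, exposing only key and proxy.
FakeKredisSet = Struct.new(:key, :proxy)

def fake_set(redis, key, members)
  redis.sadd(key, members)
  FakeKredisSet.new(key, FakeProxy.new(redis, key))
end

redis            = FakeRedis.new
api_contacts     = fake_set(redis, "contacts",  %w(a@example.com b@example.com))
paying_customers = fake_set(redis, "customers", %w(b@example.com c@example.com))

# Same shape as the workaround, applied in both directions.
emails_to_delete = api_contacts.proxy.sdiff paying_customers.key
emails_to_add    = paying_customers.proxy.sdiff api_contacts.key
```

The order matters: the receiver's key comes first, so swapping the receiver and the argument swaps which side of the difference is returned.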

@kaspth kaspth closed this Feb 16, 2021
@ideasasylum
Author

Gotcha! I think the hardest part of starting an open source project is figuring out what it is and what it's not.

The cross-type comparisons are something I considered and would probably need some type-checking to raise an exception if you tried to perform set difference with a non-set type. I'm not sure how Redis handles that situation either.
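
For reference, Redis itself rejects SDIFF against a key that holds a non-set value with a WRONGTYPE error. A client-side guard of the kind mentioned above could look roughly like the sketch below; KredisSetLike, KredisListLike and ensure_set! are hypothetical names used only for illustration and are not Kredis classes or methods.

```ruby
# Hypothetical stand-ins for a set type and a non-set type.
KredisSetLike  = Struct.new(:key)
KredisListLike = Struct.new(:key)

# Returns the operand unchanged if it is set-like; otherwise raises
# before any command would be sent to Redis.
def ensure_set!(other)
  return other if other.is_a?(KredisSetLike)

  raise ArgumentError, "expected a set, got #{other.class}"
end

checked = ensure_set!(KredisSetLike.new("set1"))

begin
  ensure_set!(KredisListLike.new("list1"))
rescue ArgumentError => error
  rejection_message = error.message
end
```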

To be honest, I'll probably go back to plain Redis commands instead of kredis.proxy… 😞
