Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved version of UniqueList: OrderedSet #76

Merged
merged 19 commits into from
Jun 18, 2023

Conversation

lewispb
Copy link
Contributor

@lewispb lewispb commented Feb 25, 2022

Closes #74

Using a benchmark script, we can see the improvement in performance by using a Sorted Set in some scenarios:

              user       system     total    real
ordered_set 0.131565   0.018613   0.150178 (  0.174106)
unique_list 1.109980   0.133186   1.243166 (  4.544614)

Initially, we can introduce this as a new datatype. But to follow on, a migration mechanism will be introduced so that we can replace unique_list.

A sorted set is optimal for this type of functionality because we can
use fewer, more-performant Redis calls.
In order to ease the transition from UniqueList being backed by
a Redis list, we can fallback to the legacy implementation for read
operations. For write operations we first migrate to a sorted set,
then retry.
legacy_elements = legacy_list.elements
legacy_list.del
append(legacy_elements)
end

This comment was marked as resolved.

@@ -0,0 +1,29 @@
# You'd normally call this a set, but Redis already has another data type for that
class Kredis::Types::UniqueListLegacy < Kredis::Types::List
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the existing UniqueList to UniqueListLegacy.

end

def current_nanoseconds(negative:)
"%10.9f" % (negative ? -Time.now.to_f : Time.now.to_f)

This comment was marked as resolved.

end

multi do
zadd(elements_with_scores)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider LT / GT options for ordering safety, avoiding problems with clock skew.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not possible due to Redis server compatibility target being version 4.

end

def basis_score_components
[ redis_time + Time.now.utc.to_i, Time.now.utc.nsec ]

This comment was marked as resolved.

@dhh dhh requested a review from jeremy February 26, 2022 09:33
Use the Redis server time as a basis. Then add the current process time.
Then add to that the index of the inserted elements as a microsecond
component.
@lewispb lewispb force-pushed the unique_list_with_sorted_set branch from f7af6e0 to f28ff6d Compare March 1, 2022 10:39
Copy link
Member

@jeremy jeremy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice approach!

I'd introduce this as a distinct data type rather than migrating in place. Consider e.g. ordered_set which reflects the zset usage and scoring. Then, separately, introduce a pathway to auto-migrate from unique lists to ordered sets.

It'd be interesting to consider a migration abstraction for other types, too. Could be a proxy wrapper with declared fallback vs migrate behavior, e.g. migrates from: :OldType, to: NewType, fallback: [ methods ], migrate: [ methods ].

end

def process_start_time
@process_start_time ||= redis.time.join(".").to_f - process_uptime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this may be used in replicated and clustered setups. Flagging just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you envisage any problems if this were used in such a setup?

end

def process_uptime
Process.clock_gettime(Process::CLOCK_MONOTONIC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about our spread of platform support, but this is limited to POSIX (so, probably not JRuby).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Process.clock_gettime is OK in JRuby 9 +

https://blog.dnsimple.com/2018/03/elapsed-time-with-ruby-the-right-way/

end

def trim(from_beginning:)
return unless limit&.positive?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are negative/zero limits supported? If not, would fail earlier, e.g. in a #limit= method, and only check for nil limit here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

legacy_elements = legacy_list.elements
legacy_list.rename([ legacy_list.key, "_backup" ].join)
append(legacy_elements)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be within an atomic MULTI…EXEC?

Like a UniqueList, but backed by a Redis sorted set rather than a list
* main:
  Bump version for 1.2.0
  Use pipeline for migration too
  Ensure to pass the pipeline to super
  Use block parameter to pipeline in Redis#multi (rails#68)
  Note tested / supported versions of Redis in the Readme (rails#79)
  Return counter value after incrementing/decrementing
  Enum Bang setter (rails#82)
  Add reset to Cycle after_change callbacks
  Add example in Readme.md
  Support configuring custom keys via method invocation
  Use bundle add instead (rails#81)
@lewispb
Copy link
Contributor Author

lewispb commented May 26, 2022

@jeremy thanks for all the feedback. This is much simpler now. I like the idea of splitting out the migration into a follow-up PR.

@lewispb lewispb changed the title Back UniqueList with a Sorted Set Improved version of UniqueList: OrderedSet May 26, 2022
Copy link
Contributor

@intrip intrip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lewispb Great approach! I've just left a note for your consideration

return if elements.empty?

elements_with_scores = types_to_strings(elements, typed).map.with_index do |element, index|
score = generate_base_score(negative: prepending) + index / 100000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate what index / 10000 is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redis sorted sets use a double 64-bit floating point number to represent the score. In all the architectures we support, this is represented as an IEEE 754 floating point number

I was trying to store the maximum unique precision possible in Redis, to store scores (times). By dividing a large integer (seconds) by 100000 we get a float. e.g:

>> Time.now.to_i
=> 1653898149
>> Time.now.to_f / 100000
=> 16538.98132595902
>> 3.times { p Time.now.to_f / 100000 }
16538.98262287975
16538.982622880398
16538.98262288052

Even though we lose some precision, each iteration produces a higher score than the previous one, perfect for our use case of appending.

Comment on lines +48 to +52
def generate_base_score(negative:)
current_time = process_start_time + process_uptime

negative ? -current_time : current_time
end
Copy link
Contributor

@intrip intrip May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever approach 👍!

Just a note for your consideration:

What would happen in case of skewed Redis clocks / concurrent writes? Is possible to face the scenario where appending 2 arrays would lead to mixing their elements?

# In process A
ordered_set.append([1, 2, 3])
# In process B
ordered_set.append([4, 5, 6])
# Is this possible?
ordered_set.elements
# => [1, 2, 4, 3, 5, 6]

Maybe a less clever approach that relies getting the max/min (with an optimistic lock) would be more resilient?

WATCH zset
# get max
element = ZRANGE zset -1 -1
# compute scores max +1 for each element
MULTI
# add elements
ZADD zset elements
# trim
EXEC

Copy link
Member

@jeremy jeremy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great.

Let's consider how to guide folks to the improved data structure. Perhaps remove the unique list example from the README, or move to the bottom and mark as deprecated? Could do this in a separate PR, since it also begs "well, how do I migrate?"

@dhh dhh merged commit a0062d1 into rails:main Jun 18, 2023
dhh added a commit that referenced this pull request Jun 18, 2023
dhh added a commit that referenced this pull request Jun 18, 2023
lewispb added a commit to lewispb/kredis that referenced this pull request Jun 19, 2023
* main: (21 commits)
  Bump version for 1.4.0
  Update nokogiri for compatibility
  Revert "Improved version of UniqueList: OrderedSet (rails#76)" (rails#111)
  Add `last` to lists (rails#97)
  Improved version of UniqueList: OrderedSet (rails#76)
  Return Time objects instead of deprecated DateTime (rails#106)
  Fix possible deserialization of untrusted data
  Typecast return of Set#take (rails#105)
  Declare Active Model dependency (rails#107)
  Address LogSubscriber deprecation (rails#98)
  Account for time zones in DateTime serializations (rails#102)
  Add sample to set (rails#100)
  Bump version for 1.3.0
  Allow Redis 5.x
  Add ltrim to lists
  Coalesce "current pipeline or redis" into the redis method itself
  Pefer a thread_mattr_accessor over a thread local variable
  Delete list of keys in batch (rails#90)
  Use a thread-local variable for pipeline
  Revert "Use block parameter to pipeline in Redis#multi (rails#68)"
  ...
lewispb added a commit to basecamp/kredis that referenced this pull request Jul 8, 2023
…tialize

* origin/main: (22 commits)
  Add kredis_ordered_set for OrderedSet usage in models
  Add a development console
  Bump version for 1.5.0
  Fix ordered set prepend bug (rails#115)
  Unique list with sorted set (rails#114)
  Eliminating Ruby Warnings (rails#112)
  CI against Redis 7, Ruby 3.1, and Ruby 3.2 (rails#113)
  Bump version for 1.4.0
  Update nokogiri for compatibility
  Revert "Improved version of UniqueList: OrderedSet (rails#76)" (rails#111)
  Add `last` to lists (rails#97)
  Improved version of UniqueList: OrderedSet (rails#76)
  Return Time objects instead of deprecated DateTime (rails#106)
  Fix possible deserialization of untrusted data
  Typecast return of Set#take (rails#105)
  Declare Active Model dependency (rails#107)
  Address LogSubscriber deprecation (rails#98)
  Account for time zones in DateTime serializations (rails#102)
  Add sample to set (rails#100)
  Bump version for 1.3.0
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Use a sorted set for unique list functionality
4 participants