New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a Schwartzian transform with custom sorting #6342

Merged
merged 7 commits into from Sep 2, 2017

Conversation

Projects
None yet
5 participants
@parkr
Member

parkr commented Sep 1, 2017

Hey!

I was reading through the Enumerable#sort_by documentation and came across an interesting passage about how it gets its speed-up:

A more efficient technique is to cache the sort keys (modification times in this case)
before the sort. Perl users often call this approach a Schwartzian transform, after
Randal Schwartz. We construct a temporary array, where each element is an array
containing our sort key along with the filename. We sort this array, and then extract
the filename from the result.

This is exactly what sort_by does internally.

Whoa! A hidden gem: a recommendation about getting the most out of #sort by reducing object allocations. I'm in!

I tested it on the _docs collection in our jekyllrb.com site and it is quite impressive:

Yeah, ok, correctness all checks out for property "redirect_from"
Yeah, ok, correctness all checks out for property "title"
Warming up --------------------------------------
sort_by_property_directly with sparse property
                        11.000  i/100ms
schwartzian_transform with sparse property
                        76.000  i/100ms
Calculating -------------------------------------
sort_by_property_directly with sparse property
                        114.388  (± 8.7%) i/s -      1.144k in  10.087658s
schwartzian_transform with sparse property
                        824.113  (± 5.2%) i/s -      8.284k in  10.083133s

Comparison:
schwartzian_transform with sparse property:      824.1 i/s
sort_by_property_directly with sparse property:      114.4 i/s - 7.20x  slower

Warming up --------------------------------------
sort_by_property_directly with non-sparse property
                       932.000  i/100ms
schwartzian_transform with non-sparse property
                         1.772k i/100ms
Calculating -------------------------------------
sort_by_property_directly with non-sparse property
                          9.885k (± 6.3%) i/s -     98.792k in  10.040320s
schwartzian_transform with non-sparse property
                         17.436k (± 7.5%) i/s -    173.656k in  10.023797s

Comparison:
schwartzian_transform with non-sparse property:    17435.8 i/s
sort_by_property_directly with non-sparse property:     9885.2 i/s - 1.76x  slower

Based on these numbers, I switched Jekyll::Filters#sort_input to use this method.

🎉 🐎

parkr added some commits Sep 1, 2017

@parkr parkr requested review from jekyll/build Sep 1, 2017

parkr added some commits Sep 1, 2017

@ashmaroli

This comment has been minimized.

Show comment
Hide comment
@ashmaroli

ashmaroli Sep 1, 2017

Member

nice implementation Parker 👍 I did come across this transformation while reading up for another PR, but never bothered to bench an implementation..

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

Member

ashmaroli commented Sep 1, 2017

nice implementation Parker 👍 I did come across this transformation while reading up for another PR, but never bothered to bench an implementation..

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

parkr added some commits Sep 1, 2017

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Sep 1, 2017

Member

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

Updated. I had to ensure I was copying the docs array for each new try (otherwise everything would be faster after the first iteration which sorted in place). Now showing much higher gains: 7x faster and 1.75x faster.

Member

parkr commented Sep 1, 2017

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

Updated. I had to ensure I was copying the docs array for each new try (otherwise everything would be faster after the first iteration which sorted in place). Now showing much higher gains: 7x faster and 1.75x faster.

@ashmaroli

This comment has been minimized.

Show comment
Hide comment
@ashmaroli

ashmaroli Sep 1, 2017

Member

Now showing much higher gains: 7x faster and 1.75x faster

WOW!! Très cool!!!

Member

ashmaroli commented Sep 1, 2017

Now showing much higher gains: 7x faster and 1.75x faster

WOW!! Très cool!!!

@ashmaroli

This comment has been minimized.

Show comment
Hide comment
@ashmaroli

ashmaroli Sep 1, 2017

Member

@parkr IMO, this change should ideally be made into a public Utility method which can then be adapted for your suggestion to #5904

# lib/jekyll/utils.rb

def schwartzian_transform(input, property, order)
  [...]
end
Member

ashmaroli commented Sep 1, 2017

@parkr IMO, this change should ideally be made into a public Utility method which can then be adapted for your suggestion to #5904

# lib/jekyll/utils.rb

def schwartzian_transform(input, property, order)
  [...]
end
@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Sep 1, 2017

Member

this change should ideally be made into a public Utility method

I don't think we can do that, because each implementation will pull the property from input through different means each time, e.g. using input.data[property] or input[property] or item_property(input, property) in the filters.

Member

parkr commented Sep 1, 2017

this change should ideally be made into a public Utility method

I don't think we can do that, because each implementation will pull the property from input through different means each time, e.g. using input.data[property] or input[property] or item_property(input, property) in the filters.

apple_property <=> orange_property
input.map { |item| [item_property(item, property), item] }
.sort! do |apple_info, orange_info|
apple_property = apple_info.first

This comment has been minimized.

@mattr-

mattr- Sep 2, 2017

Member

Based on the benchmark results, Array#[] was faster than Array#first correct? Should we switch to that or did I interpret the benchmark results incorrectly?

@mattr-

mattr- Sep 2, 2017

Member

Based on the benchmark results, Array#[] was faster than Array#first correct? Should we switch to that or did I interpret the benchmark results incorrectly?

@mattr-

This comment has been minimized.

Show comment
Hide comment
@mattr-

mattr- Sep 2, 2017

Member

What happens if we switch to Enumerable#sort_by directly? What's the reason that I'm missing that keeps us from doing:

input.sort_by { |item| item_property(item, property) }
Member

mattr- commented Sep 2, 2017

What happens if we switch to Enumerable#sort_by directly? What's the reason that I'm missing that keeps us from doing:

input.sort_by { |item| item_property(item, property) }
@Crunch09

This comment has been minimized.

Show comment
Hide comment
@Crunch09

Crunch09 Sep 2, 2017

Member

What happens if we switch to Enumerable#sort_by directly?

@mattr- : @parkr added a comment about this in the benchmark script, sort_by doesn't like nils

["foo", nil, "bar"].sort_by{|el| el}
# => ArgumentError: comparison of String with nil failed
Member

Crunch09 commented Sep 2, 2017

What happens if we switch to Enumerable#sort_by directly?

@mattr- : @parkr added a comment about this in the benchmark script, sort_by doesn't like nils

["foo", nil, "bar"].sort_by{|el| el}
# => ArgumentError: comparison of String with nil failed
@Crunch09

👏 Really nice change!

@mattr-

mattr- approved these changes Sep 2, 2017

@mattr-

This comment has been minimized.

Show comment
Hide comment
@mattr-

mattr- Sep 2, 2017

Member

@parkr added a comment about this in the benchmark script

Always good to know that I can't read very well. 🤣 Thanks @Crunch09!

Member

mattr- commented Sep 2, 2017

@parkr added a comment about this in the benchmark script

Always good to know that I can't read very well. 🤣 Thanks @Crunch09!

@mattr-

This comment has been minimized.

Show comment
Hide comment
@mattr-

mattr- Sep 2, 2017

Member

@jekyllbot: merge +minor

Member

mattr- commented Sep 2, 2017

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit 6ce912e into master Sep 2, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jekyllbot jekyllbot deleted the schwartzian-transform branch Sep 2, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment