Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize single-argument methods #210

Merged
merged 1 commit into from Sep 22, 2021
Merged

Optimize single-argument methods #210

merged 1 commit into from Sep 22, 2021

Conversation

JacobEvelyn
Copy link
Member

@JacobEvelyn JacobEvelyn commented Aug 27, 2021

This commit adds a performance optimization to one-argument
methods; these methods now have their memoization hashes
stored in an array rather than in an outer hash. This offers
a slight speedup because array accesses are faster than
hash lookups.

Before merging:

  • Copy the table printed at the end of the latest benchmark results into the README.md and update this PR
  • If this change merits an update to CHANGELOG.md, add an entry following Keep a Changelog guidelines with semantic versioning
  • Manually re-confirm that this code is consistently faster than the previous no-array implementation

@JacobEvelyn JacobEvelyn mentioned this pull request Aug 27, 2021
2 tasks
@codecov
Copy link

codecov bot commented Aug 27, 2021

Codecov Report

Merging #210 (01b4fb3) into main (4a7b5cc) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 01b4fb3 differs from pull request most recent head 6bab0b0. Consider uploading reports for the commit 6bab0b0 to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##              main      #210   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         2           
  Lines          164       204   +40     
=========================================
+ Hits           164       204   +40     
Impacted Files Coverage Δ
lib/memo_wise.rb 100.00% <100.00%> (ø)
lib/memo_wise/internal_api.rb 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a7b5cc...6bab0b0. Read the comment docs.

@JacobEvelyn JacobEvelyn force-pushed the array-optimization-3 branch 3 times, most recently from c4db0b7 to 3fa4cf1 Compare August 27, 2021 15:34
lib/memo_wise.rb Outdated
# another_single_arg_method_name: 0
# }
# }
@_memo_wise_indices ||= Hash.new { |h, k| h[k] = {} }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we avoided Hash default procs elsewhere due to the Marshal issue, but I can't see that being an issue here. Please double-check my reasoning though!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ms-ati I'd love your thoughts on this as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacobEvelyn Is your reasoning that this is safe for Marshal, because the Hash with default proc lives in a Class vs in an instance? Do we have tests of the Marshal case that verify this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is more one of usage—I find it unlikely that someone would start loading a class (e.g. memo_wise one method), then Marshal it, then load it, and then memo_wise another method. But maybe I should just change this so we don't worry about it? Adding a test for it feels like it may be overkill though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I realized this was no longer needed and have removed it.

lib/memo_wise.rb Outdated
end
end
END_OF_METHOD
else # :splat, :double_splat
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of annoying, but throughout this PR I use else when I really want to just be explicit about one specific case. Unfortunately, if this isn't else, our code coverage fails because it says we have no test coverage of the "else" condition, since it doesn't realize that our explicit options are in fact exhaustive.

In the end I decided to use else to appease code coverage, but leave comments indicating the original cases I had.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to hear more about why use else to mean :splat, :double_splat instead of making an else no-op (or a better version of raise "Impossible") case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then to get to 100% code coverage we need to add a test that exercises the else case, which is impossible/very difficult because our case cases are exhaustive. Does that make sense?

lib/memo_wise.rb Outdated
when :one_required_positional, :one_required_keyword
hash = (@_memo_wise_single_argument[api.index(method_name)] ||= {})
hash[method_arguments == :one_required_positional ? args.first : kwargs.first.last] = yield
when :splat, :double_splat
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many parts of this PR, I went back and forth on when to combine different cases and when to separate them. I'm very open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah reading this code (before this comment) I was wondering why not keep them as separate cases. I think it's slightly cleaner that way instead of using the ternary operator within the switch statement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 works for me

# @return [Integer] the array index in `@_memo_wise_single_argument` to use
# to find the memoization data for the given method
def index(method_name)
indices = target_class.instance_variable_get(:@_memo_wise_indices) ||
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this search of three different objects (two are searched in target_class) to find the instance variable too janky? I couldn't come up with a better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know another way to do it either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ms-ati I'd love your thoughts on this as well. I'm still worried about edge cases in which multiple @_memo_wise_indices instance vars exist on e.g. both target_class and target and cause incorrect behavior when multiple memoized methods share the same name, but I'm not sure what additional tests I can add that I haven't already added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this doesn't seem like the correct pattern @JacobEvelyn. Should we schedule a pairing session to review the code, and look for ways to have a single code path that knows "where" to look?

I may be misunderstanding, but it seems like we should know whether we are in a "def", "def self.", or "class << self" code path, shouldn't we?

obj.instance_variable_set(:@_memo_wise, {}) unless obj.instance_variables.include?(:@_memo_wise)
unless obj.instance_variables.include?(:@_memo_wise_single_argument)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting—if we similarly optimize the no-args case, I'd probably rename this array @_memo_wise and have the hash be @_memo_wise_multiple_arguments or something similar. I'm open to naming suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it might be worth renaming the hash to @_memo_wise_multiple_arguments already within this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not do that now since at this point that hash also contains 0-arity methods, and I think it's possible that this optimization won't end up being faster for 0-arity methods because of the sentinel issue. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me

# @param method_name [Symbol] the name of the memoized method
# @return [Integer] the array index in `@_memo_wise_single_argument` to use
# to find the memoization data for the given method
def index(method_name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using a more descriptive name but it got annoying. 🤷‍♀️

it "does not memoize the class methods" do
expect(Array.new(4) { class_with_memo.with_positional_args(1, 2) }).
to all eq("class_with_positional_args: a=1, b=2")
context "for methods with no arguments" do
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about bugs in my logic manifesting when there are multiple memoized methods with the same name, so I added a handful of methods testing that, here and elsewhere in the specs below. I couldn't come up with easy ways to generalize these or test them more broadly/in all cases (modules, etc.) but am definitely open to doing so if y'all have ideas here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ms-ati I'd love your thoughts on DRYing up these tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I think I came up with a DRY approach in the Round 2 changes commit.

@@ -388,6 +388,48 @@
expect(instance2.no_args_counter).to eq(1)
end
end

context "when method name is the same as a memoized class method" do
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add similar tests to preset_memo_wise_spec? I mostly viewed these as a way to test that InternalAPI#index works correctly, but open to other thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to either leave a comment explaining this decision or to add similar tests to preset_memo_wise_spec, otherwise they're asymmetrical without a documented explanation for why.

lib/memo_wise/internal_api.rb Show resolved Hide resolved
lib/memo_wise.rb Outdated
end
end
END_OF_METHOD
else # :splat, :double_splat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to hear more about why use else to mean :splat, :double_splat instead of making an else no-op (or a better version of raise "Impossible") case?

lib/memo_wise.rb Outdated
when :one_required_positional, :one_required_keyword
hash = (@_memo_wise_single_argument[api.index(method_name)] ||= {})
hash[method_arguments == :one_required_positional ? args.first : kwargs.first.last] = yield
when :splat, :double_splat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah reading this code (before this comment) I was wondering why not keep them as separate cases. I think it's slightly cleaner that way instead of using the ternary operator within the switch statement.

#
# `@_memo_wise_single_argument` looks like:
# [
# { arg1 => :memoized_result, ... }, # For method 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a nitpick, can we do For method 0 and For method 1 in the comments here? I find that easier to follow with the indexing we have to use here

# { arg1 => :memoized_result, ... }, # For method 1
# { arg1 => :memoized_result, ... }, # For method 2
# ]
# This is essentially a faster alternative to:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove essentially here?

obj.instance_variable_set(:@_memo_wise, {}) unless obj.instance_variables.include?(:@_memo_wise)
unless obj.instance_variables.include?(:@_memo_wise_single_argument)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it might be worth renaming the hash to @_memo_wise_multiple_arguments already within this PR?

"#{name}#{':' if type == :keyreq}"
end.join(", ")
else
raise ArgumentError, "Unexpected arguments for #{method.name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just considering what I said re the else comment above - I think we should try stay consistent with how we treat this within the codebase.

@@ -388,6 +388,48 @@
expect(instance2.no_args_counter).to eq(1)
end
end

context "when method name is the same as a memoized class method" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to either leave a comment explaining this decision or to add similar tests to preset_memo_wise_spec, otherwise they're asymmetrical without a documented explanation for why.

spec/reset_memo_wise_spec.rb Show resolved Hide resolved
obj.instance_variable_set(:@_memo_wise, {}) unless obj.instance_variables.include?(:@_memo_wise)
unless obj.instance_variables.include?(:@_memo_wise_single_argument)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me

# @return [Integer] the array index in `@_memo_wise_single_argument` to use
# to find the memoization data for the given method
def index(method_name)
indices = target_class.instance_variable_get(:@_memo_wise_indices) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know another way to do it either

Copy link
Contributor

@ms-ati ms-ati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This commit adds a performance optimization to one-argument
methods; these methods now have their memoization hashes
stored in an array rather than in an outer hash. This offers
a slight speedup because array accesses are faster than
hash lookups.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants