Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by 10% #1260

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
3 participants
@Confusion
Copy link

commented Jul 4, 2019

Inspired by a blog post [1] on the 'allocation_stats' gem I looked into
yard's memory use (using the 'memory_profiler' gem) and found some hotspots
that the attached patch removes. It reduces memory allocations by ~25%,
while increasing performance by ~10%, measured on the test_doc Rake task of
Yard itself.

[1] https://blog.presidentbeef.com/blog/2018/11/28/finding-ruby-performance-hotspots-via-allocation-stats/

Description

Describe your pull request and problem statement here.

Completed Tasks

  • I have read the Contributing Guide.
  • The pull request is complete (implemented / written).
  • Git commits have been cleaned up (squash WIP / revert commits).
  • I wrote tests and ran bundle exec rake locally (if code is attached to PR).
@coveralls

This comment has been minimized.

Copy link

commented Jul 4, 2019

Coverage Status

Coverage decreased (-0.008%) to 93.788% when pulling cc0ba69 on Confusion:master into 0d10c06 on lsegal:master.

@lsegal
Copy link
Owner

left a comment

Thanks for the optimization PR!

A few notes to help get this through:

  • All new methods will need documentation, especially public ones (as these are).
  • The suffix regex should be replaced by match to be consistent with other methods in that module.
  • It would be better to move these methods and their caching into RegistryResolver directly, since methods like split_on_separator_match and starts_with_separator_match are not general purpose enough to truly belong inside NamespaceMapper.
    • That said, in order to do this, you would need to hook into NamespaceMapper in ways that may require weird monkeypatching or new constructs to be added to notify subclasses of invalidation. The latter is the better option if you're going to take this on.

The general feedback here (besides the specific points) is that this new addition seems to break expected boundaries of the NamespaceMapper module. It seems odd to me to add methods to an outer module that are only used by one including class.

It would be nice to have a more general purpose abstraction so that it's more obvious how all of these new methods fit together, otherwise it will be very easy to forego them in the future and regress on the performance benefits-- i.e. I could see another developer forgetting to use these cached matches and relying on separators_match directly again, or worse yet, these methods being pulled out in a substantial refactor. Documentation would help here, but so would providing a more solid separation of concerns (like putting caching in RegistryResolver directly).

@@ -68,6 +73,14 @@ def separators_match
NamespaceMapper.map_match
end

def starts_with_separator_regex

This comment has been minimized.

Copy link
@lsegal

lsegal Jul 4, 2019

Owner

These two methods have a possible caching bug-- if NamespaceMapper.invalidate is called, these will not be invalidated along with map_match.

Improve performance by 10%
Inspired by a blog post [1] on the 'allocation_stats' gem I looked into
yard's memory use (using the 'memory_profiler' gem) and found some hotspots
that the attached patch removes. It reduces memory allocations by ~25%,
while increasing performance by ~10%, measured on the test_doc Rake task of
Yard itself.

[1] https://blog.presidentbeef.com/blog/2018/11/28/finding-ruby-performance-hotspots-via-allocation-stats/

@Confusion Confusion force-pushed the Confusion:master branch from 8ead323 to cc0ba69 Jul 4, 2019

@Confusion

This comment has been minimized.

Copy link
Author

commented Jul 4, 2019

I renamed the methods, moved them to RegistryResolver as private methods (but documented them anyway) and hooked into invalidate and default_seperator to invalidate the memoizations when necessary.

This of course creates additional coupling between the RegistryResolver and the NamespaceMapper through the 'super' calls (which at least will break on renames and changes in number of parameters, so chance of silently losing functionality seems slim), but I don't think there is any way to avoid that.

Performance gain is still ~10% (actually it seems more like 15-20%, but I do not sufficiently trust my measurement methodology to claim that much)

@lsegal
Copy link
Owner

left a comment

Thanks for following up! A few more notes but with the following other comments addressed this looks good to go.

This of course creates additional coupling between the RegistryResolver and the NamespaceMapper through the 'super' calls

I'm not too concerned about this-- the Mapper->Resolver coupling already exists through the inclusion of the Mapper module by Resolver and its reliance on included Mapper methods. That is normal and expected coupling, and (ideally but I haven't fully confirmed this) tests start failing if any of those super methods change.

@@ -185,5 +185,35 @@ def collect_namespaces(object)

nss
end

# @see NamespaceMapper#invalidate
def invalidate

This comment has been minimized.

Copy link
@lsegal

lsegal Jul 4, 2019

Owner

Unfortunately this won't trigger as Namespacemapper.invalidate is a module function.

The good news is I think you can/should override #clear_separators instead, since NamespaceMapper's singleton stuff internal and only expected to be used through / exposed by those instance methods.

Suggested change
def invalidate
def clear_separators

(Need to change the docstring too)

This comment has been minimized.

Copy link
@Confusion

Confusion Jul 5, 2019

Author

👍

I now realize my previous comment wasn't true though: if clear_separators is renamed, this won't fail: it just silently won't be called. Do you want me to add a spec for that?

This comment has been minimized.

Copy link
@lsegal

lsegal Jul 5, 2019

Owner

Sure a spec would help.

This comment has been minimized.

Copy link
@Confusion

Confusion Jul 5, 2019

Author

I just noticed that invalidate is also called via register_separator, so register_separator won't result in the memoizations being cleared, which seems necessary. So we need to extend invalidate after all?

What we could do is add a line

invalidate if respond_to?(:invalidate)

to register_separator, after NamespaceMapper.invalidate so users of NamespaceMapper can define an instance method :invalidate for additional resets. What do you think about that?

@@ -185,5 +185,35 @@ def collect_namespaces(object)

nss
end

This comment has been minimized.

Copy link
@lsegal

lsegal Jul 4, 2019

Owner

You should probably mark all these methods below as private since they are not part of the public API.

Suggested change
private

This comment has been minimized.

Copy link
@Confusion

Confusion Jul 5, 2019

Author

There's a private marker on line 93 that covers these methods, so that's already taken care of?

This comment has been minimized.

Copy link
@lsegal

lsegal Jul 5, 2019

Owner

You're right about that, I missed it! You can ignore this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.