
Added new versions of methods to reduce allocations #14

Merged
merged 20 commits into master from wbc-fewer-allocations on Sep 9, 2021

Conversation

wblakecaldwell
Contributor

@wblakecaldwell commented Aug 31, 2021

This incorporates the great work from https://github.com/pforemski (long enough ago that I'm embarrassed to say), here: #12, then takes it a bit further. The method signatures have changed, so this isn't backwards compatible - the always-nil errors were removed.

Changes:

  • Removed always-nil error return values
  • New methods that take slices to append to: FindTagsAppend, FindTagsWithFilterAppend, FindDeepestTagsAppend
  • New DeleteWithBuffer method, which accepts a slice to use as a buffer

Benchmarks between the self-allocating FindTags and the slice-reusing FindTagsAppend:

cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkFindTags-5                     	 4138029	       246.7 ns/op	     112 B/op	       3 allocs/op
BenchmarkFindTagsAppend-5               	32779479	        32.25 ns/op	       0 B/op	       0 allocs/op

Of course, we're just moving the slice allocation outside the method, and these gains are only realized if you can reuse the slice across successive calls. For applications that can, this should make quite an improvement in performance.
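
Roughly, the reuse pattern looks like this (a sketch against the generated string_tree package; treat the exact calls as illustrative and check the generated code for the real signatures):

package main

import (
    "fmt"

    "github.com/kentik/patricia"
    "github.com/kentik/patricia/string_tree"
)

func main() {
    tree := string_tree.NewTreeV4()

    v4, _, _ := patricia.ParseIPFromString("192.168.0.0/16")
    tree.Add(*v4, "private", nil)

    // Allocate the result buffer once...
    buf := make([]string, 0, 8)

    for _, q := range []string{"192.168.1.1", "192.168.2.2"} {
        addr, _, _ := patricia.ParseIPFromString(q)

        // ...and reuse it on every call: reset the length, keep the capacity.
        buf = tree.FindTagsAppend(buf[:0], *addr)
        fmt.Println(q, buf)
    }
}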

For those unfamiliar with this repo, look in the template directory. All of the *_tree directories are generated from that code with make generate.

pforemski and others added 10 commits September 11, 2020 14:35
These errors were always nil - keep things clean
Added FindTagsAppend(), which reduces allocations by accepting a slice
to use for the return value.
The `Find` methods now accept a slice to put results into.

`Delete` now accepts a slice to use as a buffer.
Benchmark results:

cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
OLD: BenchmarkFindTags-5                     	 3721257	       294.3 ns/op	     160 B/op	       6 allocs/op
NEW: BenchmarkFindTags-5                     	41273214	        28.12 ns/op	       0 B/op	       0 allocs/op
@wblakecaldwell
Contributor Author

FYI - I'm going to rename these new methods, and put back the old names so the caller can choose between allocating and reusing, or just not worrying about it.

Renamed the following to better describe their function:
- FindTags -> FindTagsAppend
- FindDeepestTags -> FindDeepestTagsAppend
- Delete -> DeleteWithBuffer

Created new methods with the original names that allocate slices and
pass them into the *Append/*WithBuffer methods. This makes migrating to
the new code easier and simplifies use cases that can't reuse buffers.
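
In sketch form, the wrapper pattern is roughly this (GeneratedType, IPv4Address, and TreeV4 below are stand-ins for the template/generated types, and the trie walk itself is elided):

package sketch

// Stand-ins for the generated types; the real definitions come from the template.
type GeneratedType = string
type IPv4Address struct{}

type TreeV4 struct{}

// FindTagsAppend appends matching tags to ret and returns the updated slice.
// (The real implementation walks the trie; elided here.)
func (t *TreeV4) FindTagsAppend(ret []GeneratedType, address IPv4Address) []GeneratedType {
    // ... ret = append(ret, tag) for each tag found along the address's path ...
    return ret
}

// FindTags keeps the original allocating signature by delegating to the append version.
func (t *TreeV4) FindTags(address IPv4Address) []GeneratedType {
    return t.FindTagsAppend(make([]GeneratedType, 0), address)
}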
@wblakecaldwell changed the title from "Changed method signatures to reduce allocations" to "Added new versions of methods to reduce allocations" on Aug 31, 2021

@aaronfuj left a comment


It makes sense and looks good to me, but before signing off I had a question about the logic changes for FindTagsWithFilter (now FindTagsWithFilterAppend) and whether they might negatively impact performance (or if it even matters).

if address.Length == 0 {
// caller just looking for root tags
return ret, nil
if len(ret) == retPos || filterFunc == nil {


Adding this comment primarily for myself, but it may be useful for anyone else reviewing. This logic is essentially saying:

  • "if we didn't find any new tags, return what we have"
  • "if we have no filtering function, return whatever we found (regardless of finding new tags)"

nodeIndex = node.Left
} else {
nodeIndex = node.Right
// filter in place


With this new filter-in-place approach (filtering after finding all matching values) instead of the previous approach of filtering while searching for matches, I think it definitely cleans up the code and makes it easier to follow and read. However, it comes at the cost of potentially needing to allocate more capacity for the slice up front, which may not be necessary if the values just get filtered out anyway. If we end up allocating extra capacity up front that isn't actually used, how do you think this would impact overall performance, since it theoretically creates more garbage to clean up later? Will this really matter?

Contributor Author


This still allocates less than before, where a new slice was allocated for the pre-filtered results and then thrown away after the filtered results were accepted. If you're reusing the same buffer over and over, the allocation happens once and the buffer is reused after that. But yeah, I think it's worth running the filter func earlier in the stack. I'll take a look, thanks.

Contributor Author


Implemented this in 94f2b4b



Yeah, my main concern was whether passing around the filter function and doing these extra checks would slow things down. Based on offline discussion, it doesn't seem like that's the case, and the new changes look good to me.

The methods that take a filter no longer allocate first and then filter
out; results are now only added to the slice if they match. Did this by
putting the main logic in the *Filter methods and having the non-Filter
methods call them with a nil filter.
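
The core of the filter-at-append-time idea, as a standalone sketch (appendIfMatch is a hypothetical helper name for illustration, not the generated code's):

package sketch

// Stand-ins for the generated types.
type GeneratedType = string
type FilterFunc func(tag GeneratedType) bool

// appendIfMatch shows the idea: a tag is only appended when it passes the
// filter, so there is no throwaway pre-filter slice. A nil filter accepts
// everything, which is how the non-Filter methods call in.
func appendIfMatch(ret []GeneratedType, tag GeneratedType, filterFunc FilterFunc) []GeneratedType {
    if filterFunc == nil || filterFunc(tag) {
        ret = append(ret, tag)
    }
    return ret
}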

@aaronfuj aaronfuj left a comment


lgtm!


@alistairking left a comment


Very nice.

func (t *TreeV4) Delete(address patricia.IPv4Address, matchFunc MatchesFunc, matchVal GeneratedType) (int, error) {
// - use DeleteWithBuffer if you can reuse slices, to cut down on allocations
func (t *TreeV4) Delete(address patricia.IPv4Address, matchFunc MatchesFunc, matchVal GeneratedType) int {
return t.DeleteWithBuffer(make([]GeneratedType, 0), address, matchFunc, matchVal)


It really doesn't matter much, but I think it works to just pass nil in these cases instead of making a zero-length slice?

Contributor Author


ya good call - updated

Contributor Author


Could also be done in the other non-buffering methods, but I worry the caller will be expecting non-nil, so I'll keep it as-is for now. If you're interested in reducing allocations, you can use the buffering versions.
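
For anyone wondering why nil works as the buffer here: appending to a nil slice allocates lazily, so nothing is wasted when there are no results. A quick standalone illustration:

package main

import "fmt"

func main() {
    var buf []string                            // nil: no backing array allocated yet
    fmt.Println(buf == nil, len(buf), cap(buf)) // true 0 0

    buf = append(buf, "tag-a")                  // the first append allocates
    fmt.Println(buf == nil, len(buf), buf)      // false 1 [tag-a]
}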

@wblakecaldwell merged commit 2160333 into master Sep 9, 2021
@ingwarsw deleted the wbc-fewer-allocations branch March 23, 2022