Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: expand capacity of map by more than 1 at a time? #56182

Open
lizthegrey opened this issue Oct 12, 2022 · 7 comments
Open

runtime: expand capacity of map by more than 1 at a time? #56182

lizthegrey opened this issue Oct 12, 2022 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@lizthegrey
Copy link

lizthegrey commented Oct 12, 2022

Follow-on to #52157 and #54454

Right now the reference code in /x/exp/maps.Copy will potentially call runtime.growWork_faststr and runtime.hashGrow many times if src and dst are mostly disjoint and len(src) is large.

There should be some smarter logic to be able to not just pre-allocate a map with a certain capacity at creation time, but to expand an existing map by a certain amount (e.g. by len(src)) -- there already exists this idea with variadic append() for concatenating slices.

Short of that, maybe Copy() should make a decision based on the sizes of dst and src & capacities available in dst on whether it's more efficient to create a new map with capacity = len(src)+len(dst) and then copy dst and src both into the new map.

(you could then Shrink() the map if you were confident it were going to be read-only after that, to avoid overshooting usage by too much)

The real-world pain here is that telemetry libraries like OpenTelemetry and Honeycomb Beelines like to keep maps of attributes associated with a given trace span, and merge them together before sending into a final map; a lot of work is done to growWork_faststr and hashGrow if many attributes are added one at a time if map merging/copying via iteration.

Parallels in other languages: Rust

@gopherbot gopherbot added this to the Proposal milestone Oct 12, 2022
@seankhliao seankhliao changed the title proposal: language: expand capacity of map by more than 1 at a time? runtime: expand capacity of map by more than 1 at a time? Oct 12, 2022
@seankhliao seankhliao removed this from the Proposal milestone Oct 12, 2022
@seankhliao seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Oct 12, 2022
@seankhliao
Copy link
Member

seankhliao commented Oct 12, 2022

cc @golang/runtime

@randall77
Copy link
Contributor

randall77 commented Oct 12, 2022

Where does the allocation of dst happen? Can you give it a size hint there? Or can you use Clone instead?

I'm not really sure why/how you're seeing extra work here. maps are grown by factors of 2, so asymptotically there isn't any extra work. If you would need to grow the map twice (to 4x the size), the 1x->2x grow is kinda pointless, but it is at most half the work of the 2x->4x grow, so it's not that much extra work.

@lizthegrey
Copy link
Author

lizthegrey commented Oct 12, 2022

Where does the allocation of dst happen? Can you give it a size hint there? Or can you use Clone instead?

dst is allocated when the trace span is created, any number of attributes may be added during execution, but then we have to add in finisher attributes at the end.

That's the thing that I don't have the information to decide, because cap(dst) doesn't exist, and it might have a cap of 5 elements or it might have a cap of 100 elements. I guess I can look to the current len() as a guide though. Therefore I don't know if I'm better off running Clone() to guarantee the capacity (and re-copying everything) or whether I'm better just inserting into the map and trusting the capacity to be there.

I'm not really sure why/how you're seeing extra work here. maps are grown by factors of 2, so asymptotically there isn't any extra work. If you would need to grow the map twice (to 4x the size), the 1x->2x grow is kinda pointless, but it is at most half the work of the 2x->4x grow, so it's not that much extra work.

I wish I knew as well, but that's what the profiling data says -- that we're spending a HUGE amount of time expanding maps rather than doing useful work :(

@randall77
Copy link
Contributor

randall77 commented Oct 12, 2022

If there's lots of growing during that final Copy, then when you allocate dst use the size of src (the finisher attributes) as the size hint.

I'm not sure how having the map capacity would help here. It lets you know whether the map needs to be grown, but you still have to do the grow in any case (by yourself by allocating a new map, maybe, but that's going to cost just as much).

It sounds like the finisher attributes are large compared to the regular trace attibutes, so maybe just doing

m := maps.Clone(finisherAttributes)
m.Copy(traceAttributes)
... assign m back to traceAttributes ...

at the end should work?

@lizthegrey
Copy link
Author

lizthegrey commented Oct 12, 2022

If there's lots of growing during that final Copy, then when you allocate dst use the size of src (the finisher attributes) as the size hint.
It sounds like the finisher attributes are large compared to the regular trace attibutes, so maybe just doing

Correct, while we don't know for sure what the "record immediately prior to send" attributes will be or what their values are (since it's evaluated just before sending the span), the list of keys usually don't change dynamically so we could pre-allocate a capacity of len(starter_attributes)+len(finisher_attributes), even if things might change and there might be slightly more or less attributes than expected. Thanks for the suggestion.

MikeGoldsmith pushed a commit to honeycombio/beeline-go that referenced this issue Oct 17, 2022
## Which problem is this PR solving?

- Similar to honeycombio/libhoney-go#197, profiling shows we spend a lot of time in github.com/honeycombio/beeline-go/trace.(*Span).send generating the map.

## Short description of the changes

- define the map capacity in advance before returning it.

a subsequent follow-up might be allowing passing a map in its entirety to github.com/honeycombio/libhoney-go.(*Event).AddField so that the map entries do not need to be added one at a time when merging the maps together.

Or, better yet, golang/go#56182 could prevent this problem by giving us a more sensible API for mass additions to a map.
@thepudds
Copy link

thepudds commented Oct 19, 2022

Hi @lizthegrey, I've been looking at map performance from a few different angles, and thought I'd share a quick benchmark that might be in the ballpark of what you initially reported.

In this benchmark, keys are 20 byte strings, dst size starts at 2, src is between 92 to 108 elems, and src gets copied into dst. Those are not going to be your exact numbers, but hopefully demonstrate a similar phenomena. The size ranges were chosen to sweep across 105, which is a point that a map grow happens.

BenchmarkMapCopy
func BenchmarkMapCopy(b *testing.B) {
	bms := []struct {
		presize      bool
		dstStartSize int
	}{
		{presize: false, dstStartSize: 2}, // dst map starts with 1 bucket
		{presize: true, dstStartSize: 2},
		{presize: false, dstStartSize: 14}, // dst map starts with 4 buckets
		{presize: true, dstStartSize: 14},
	}
	// a grow happens at a map size of 105, so here we sweep srcSize +/-20 around 105
	for srcSize := 85; srcSize < 125; srcSize++ {
		for _, bm := range bms {
			b.Run(fmt.Sprintf("src_%d_dst_start_%d_dst_final_%d_presize_%v", srcSize, bm.dstStartSize, srcSize+bm.dstStartSize, bm.presize), func(b *testing.B) {
				var src, dst map[string]string
				src = make(map[string]string)
				populate := func(m map[string]string, count int) {
					for i := 0; i < count; i++ {
						k := make([]byte, 20)
						v := make([]byte, 20)
						rand.Read(k)
						rand.Read(v)
						m[string(k)] = string(v)
					}
				}
				populate(src, srcSize)

				for i := 0; i < b.N; i++ {
					b.StopTimer()
					if bm.presize {
						dst = make(map[string]string, srcSize+bm.dstStartSize)
					} else {
						dst = make(map[string]string)
					}
					populate(dst, bm.dstStartSize)
					b.StartTimer()

					// copy src into dst
					for k, v := range src {
						dst[k] = v
					}
				}
			})
		}
	}
}

Here are some sample results, where 'old' is without presizing dst, and 'new' is presizing dst. The names src_XX_dst_YY show the len of src and the final len of dst after the copy.

name                  old time/op  new time/op  delta
Copy/src_92_dst_94    19.0µs ± 3%   8.4µs ± 2%  -55.6%  (p=0.000 n=14+15)
Copy/src_93_dst_95    19.2µs ± 6%   8.6µs ± 3%  -55.1%  (p=0.000 n=15+15)
Copy/src_94_dst_96    19.3µs ± 4%   8.7µs ± 3%  -54.7%  (p=0.000 n=15+15)
Copy/src_95_dst_97    19.7µs ± 3%   8.8µs ± 2%  -55.3%  (p=0.000 n=15+13)
Copy/src_96_dst_98    19.7µs ± 4%   8.9µs ± 2%  -54.7%  (p=0.000 n=15+13)
Copy/src_97_dst_99    19.6µs ± 4%   9.1µs ± 3%  -53.6%  (p=0.000 n=15+15)
Copy/src_98_dst_100   19.6µs ± 3%   9.2µs ± 3%  -53.0%  (p=0.000 n=15+14)
Copy/src_99_dst_101   19.6µs ± 3%   9.2µs ± 3%  -53.2%  (p=0.000 n=14+14)
Copy/src_100_dst_102  20.5µs ± 4%   9.4µs ± 3%  -54.0%  (p=0.000 n=15+15)
Copy/src_101_dst_103  20.1µs ± 4%   9.4µs ± 5%  -53.0%  (p=0.000 n=15+15)
Copy/src_102_dst_104  20.1µs ± 3%   9.6µs ± 3%  -52.4%  (p=0.000 n=15+15)
Copy/src_103_dst_105  24.3µs ± 3%   8.7µs ± 4%  -64.2%  (p=0.000 n=15+12) // dst: extra grow
Copy/src_104_dst_106  24.6µs ± 2%   8.7µs ± 3%  -64.6%  (p=0.000 n=15+15)
Copy/src_105_dst_107  28.6µs ± 5%  13.9µs ± 2%  -51.5%  (p=0.000 n=14+13) // src: mid-move iter
Copy/src_106_dst_108  29.4µs ± 3%  13.7µs ± 3%  -53.3%  (p=0.000 n=15+15)
Copy/src_107_dst_109  30.4µs ± 4%  13.1µs ± 5%  -56.9%  (p=0.000 n=14+15)
Copy/src_108_dst_110  29.9µs ± 4%  13.0µs ± 5%  -56.4%  (p=0.000 n=13+14)

Two notable jumps in performance:

  1. Copy/src_103_dst_105 -- For the non-presized case ('old'), this dst final size of 105 is the first time in this particular range of sizes that an extra grow of dst happens during the copy, and hence is more expensive. For the presized case ('new'), it is likely benefiting here from a 2x larger capacity presized dst map with a lower fill factor compared to the prior line.

  2. Copy/src_105_dst_107 -- For both the non-presized and presized cases, this is when src at size 105 is "stuck" mid-growth during the range operation, which is more expensive (runtime: consider growing map on iteration #51410).

I think those results generally seem explainable by the current runtime map implementation...

@CAFxX
Copy link
Contributor

CAFxX commented Oct 21, 2022

I know we don't like adding APIs for this kind of thing, but ISTM that a maps.Grow(m, hint) (just a strawman based on slices.Grow) that could be used to ask the runtime to increase the size of the map - as if it had been created with make(map[K]V, len(m)+hint) - would allow Copy callers that know that the set of keys is mostly disjunct to avoid the problem:

maps.Grow(dst, len(src))
maps.Copy(dst, src)

I don't think maps.Copy should do this internally because in the case the keys in src are mostly already included in dst it may cause an unneeded growth of the dst map.

@seankhliao seankhliao added this to the Unplanned milestone Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

6 participants