Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: add Map.Len method? #20680

Open
F21 opened this issue Jun 15, 2017 · 55 comments
Open

sync: add Map.Len method? #20680

F21 opened this issue Jun 15, 2017 · 55 comments

Comments

@F21
Copy link

@F21 F21 commented Jun 15, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9-beta1

What operating system and processor architecture are you using (go env)?

Windows 10 64-bit

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Work
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config

It would be really useful to have a Length() method on sync.Map that can return the number of items in a map. Currently, I need to do something like this, which is quite tedious and not as readable:

length := 0

myMap.Range(func(_, _ interface{}) bool {
	length++
	
	return true
})
@randall77
Copy link
Contributor

@randall77 randall77 commented Jun 15, 2017

@bcmills
Copy link
Member

@bcmills bcmills commented Jun 15, 2017

What's the use-case?

@F21
Copy link
Author

@F21 F21 commented Jun 15, 2017

I am storing items filled by goroutines processing data into 2 separate maps. Once this is done, I need to compare the length of these 2 maps to perform a quick sanity check and decide on which branch to proceed for further processing.

Previously I was using github.com/orcaman/concurrent-map and there was a Count() method to do this.

@bradfitz bradfitz changed the title Length() for sync.Map sync: add Map.Length method? Jun 15, 2017
@bradfitz bradfitz added this to the Go1.9Maybe milestone Jun 15, 2017
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jun 15, 2017

This would normally be a Go 1.10 thing, but since this type is new in Go 1.9, I'll let @bcmills decide.

The implementation might be hairy enough to warrant Go 1.10 anyway, especially if the representation needs to change and other code gets modified.

@cespare
Copy link
Contributor

@cespare cespare commented Jun 15, 2017

Isn't the number you get out going to be a best-effort guess anyway? (Like len(channel)). Seems like an easy user workaround would be to maintain a parallel atomic int64 count.

(I don't know if that'd also be a reasonable internal implementation or not.)

@F21
Copy link
Author

@F21 F21 commented Jun 15, 2017

In my case, I am retrieving the length of the map after all goroutines have finished.

@bcmills
Copy link
Member

@bcmills bcmills commented Jun 15, 2017

sync.Map is optimized for long-lived, mostly-write workloads, for which a Len method would either be misleading (under-counting keys) or inefficient (introducing cache contention on the length counter).

The Range workaround at least has the benefit of appearing to be as expensive as it actually is.

I'm not opposed to the idea of adding a Len method, but I think we would need to show that the use-case is common enough to merit an API with such subtleties. At the very least, I don't think we should add it in 1.9.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jun 15, 2017
@rsc rsc changed the title sync: add Map.Length method? sync: add Map.Len method? Jun 26, 2017
@AlexStocks
Copy link

@AlexStocks AlexStocks commented Jul 6, 2017

@bcmills Do not you think it's very rediculous that Go has a built in func len for map while sync.Map does not have a corresponding function func (Map)Len() int to get its size?

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 6, 2017

@AlexStocks No, I don't think it's "very [ridiculous]". Concurrent data structures are not the same as unsynchronized data structures. The built-in map type doesn't have (and doesn't need) a LoadOrStore, for example.

We should decide the API of each type based on its own tradeoffs. Consistency is a benefit, but there are costs to weigh it against.

@rfyiamcool
Copy link

@rfyiamcool rfyiamcool commented Jul 13, 2017

I thank go1.9 sync.map length feature should be required. Of course, I'm just a suggestion.
don't range all entry, Extend a field as a atomic counter ..

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 16, 2017

@rfyiamcool

Extend a field as a atomic counter

Delete calls (and Store calls with previously-deleted keys) on disjoint keys in the read-only part of the map do not contend in the current implementation. An atomic counter would reintroduce contention for those calls.

We didn't omit Len just to be stubborn. It really is a subtle problem.

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 17, 2017

#21035 has more detail on some of the optimizations that might complicate an efficient implementation of Len.

@GoLangsam
Copy link

@GoLangsam GoLangsam commented Jul 26, 2017

@bcmills May I suggest to have some remark in the source comments saying something like
"As of now, Len() is intentionally not implemented/provided due to ..." with a brief and concrete rationale.

This would ease understanding and adjust expectations of future users (who much more likely read source comments than old issues).

( And this should also be applied to other methods suggested elsewhere (such as UpdateOrStore) and currently not reasonably implementable. )

@maj-o
Copy link

@maj-o maj-o commented Sep 7, 2017

I looked up the sourcecode of Range() and I think (knowing that the result could be wrong, see remark on Range() and above) this would be the solution for Len()

return len(read.m) instead of the for loop in line 328

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 7, 2017

@maj-o Promoting the read map makes Len an O(N) operation if interleaved with Store. By convention, Len methods on types in the Go standard library are O(1).

@protosam
Copy link

@protosam protosam commented Sep 15, 2017

I believe that having a .Count() or .Len() or even just adding a count integer would be a good addition for syncmap.Map. Personally I'm using sync.Map to track connections in an asynchronous p2p API that is heavily built around propagating network writes. The count is necessary for artificial connection limits and I had to switch from maps, because of raise conditions in goroutines.

Hope this shows some use-cases for you. I can think of various situations where this simple functionality would reduce code by many lines.

@cznic
Copy link
Contributor

@cznic cznic commented Sep 15, 2017

The count is necessary for artificial connection limits and I had to switch from maps, because of raise conditions in goroutines.

I think that does not justify adding a method to sync.Map, because it's easy to make your own derived type which just additionaly keeps the count/len information if you think such information is useful.

@maj-o
Copy link

@maj-o maj-o commented Sep 16, 2017

If dirty has length oft N.
But 1sr : nobody is interested in dirty - if this blocks You, leave it out - O(1)
2nd : we can asume, that dirty is somehow const and small against the total amount - O(1) + const = O(1)
I know that in worst case (map is empty and dort is full) we could habe O(N) - this can be legt out or ignorred, because in real life it dies not matter. In real live the map is bigger then dirty and the amount of dirty is constant.. So, if it hurts You, comment the dirty loop out or leave it, because it does not matter in real live. And there it is O(1)

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 19, 2017

@maj-o

If dirty has length oft N.

That assumes that the internal representation of sync.Map always includes a single dirty map. Nothing in the API or documentation guarantees that to be the case, and in fact some optimizations (such as #21035) may require the opposite.

@dhui
Copy link

@dhui dhui commented Oct 4, 2017

My usecase for an O(1) Len() func is to pre-allocate a slice of data (an optimization) to hold a filtered subset of the the map values, populated by a call to Range(). FYI, my primary use for sync.Map is concurrent O(1) lookups.

So for my usecase, Len() doesn't need to be consistent since the worse case is an under-allocated slice (resulting in inconsistent performance) or an over-allocated slice (resulting in a bit more memory consumed).

@protosam
Copy link

@protosam protosam commented Oct 5, 2017

Hey guys,

I just want to clarify on something here. What would you say the scope of requirements adding such functionally be?

Like, to make a decision to add this utility function a part of the API, are you looking for a large quantity of developer need/want for the feature?

Or are we just aiming to decide how it should work?

I'm the former situation I believe that a count feature would just be expected by developers considering that the map type can be used in len()

In the latter situation if it's something that we all want added, but comes down to deciding how it should operate, I think that leads to two potential situations:

1 - a somewhat ballpark count of the map would be necessary for an application. This could be non blocking and the result just needs to be close to the current state.

2 - a blocking function that needs to be timed for an absolute accurate count at the given moment.

A utility function could be made for both scenarios.

I think it is best to make at least the boilerplate functionality for developers to help evade bad practices due to niavity when using the API. Best practices with concurrency is not obvious to new developers and devs trying to adopt go. I do foresee newbies at least reading a godoc and being able to choose based on need.

@Inkeliz
Copy link

@Inkeliz Inkeliz commented Sep 5, 2018

I think the Len (or Count) is usefull when a maximum number of values need to be enforced. I don't know if has a workaround. I use the sync.Map to store information about nodes/peers in the network, however one single client maybe don't like to store everything and only keep N values. Using the map is possible to do a simple if len(...) > and then ignore the insert, it can't be done directly with sync.Map.

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 5, 2018

We could, in theory, keep track of the count of items in the map using some sort of sharded counter (as proposed in #18802), but that would still be pure overhead for applications that don't need it.

That suggests that @protosam's idea to wrap the sync.Map with a separate counter may be the right direction: you can always implement the same method set in the wrapper, and then you would only pay the overhead of tracking the length for call sites that actually use that wrapper.

@mojinfu
Copy link

@mojinfu mojinfu commented Feb 27, 2019

Hello, I am currently developing an application which matches up pairs of individual records in a stream of input data. I have hash-partitioned the data stream such that each of my go routines will use disjoint sets of keys in my sync.Map. However, sometimes a record won't have a corresponding match in the data stream. Over time, these records accumulate in the map. I'd like to routinely get an estimate of the map's size to trigger an eviction policy carried out by Range. This keeps the RAM usage of the program fairly constant without degrading performance.

Currently I am looking at implementing a solution similar to @protosam and maintaining a count myself using a derived type from sync.Map

I'm not sure what I'm asking for should be called Len, but it would certainly be useful.

Hi, you can use my package, https://github.com/mojinfu/cmap, i think it is what you want.

@mojinfu
Copy link

@mojinfu mojinfu commented Feb 27, 2019

We could, in theory, keep track of the count of items in the map using some sort of sharded counter (as proposed in #18802), but that would still be pure overhead for applications that don't need it.

That suggests that @protosam's idea to wrap the sync.Map with a separate counter may be the right direction: you can always implement the same method set in the wrapper, and then you would only pay the overhead of tracking the length for call sites that actually use that wrapper.

Hi, you can use my package, https://github.com/mojinfu/cmap, i think it is what you want.

@mojinfu
Copy link

@mojinfu mojinfu commented Feb 27, 2019

@rfyiamcool

Extend a field as a atomic counter

Delete calls (and Store calls with previously-deleted keys) on disjoint keys in the read-only part of the map do not contend in the current implementation. An atomic counter would reintroduce contention for those calls.

We didn't omit Len just to be stubborn. It really is a subtle problem.

i still cant understand,
When Delete called , and go in the lock free path , delete a normal key ( turn point into nil) , it will return true , then I call atomic.AddInt64(&m.length, -1) which is thread safe func to reduce length.
(you can see it in https://github.com/mojinfu/cmap/blob/master/cmap.go line:353, and store logic line 153)
it would not reintroduce contention .

Which step of my understanding is incorrect?

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 27, 2019

@mojinfu, atomic operations are not contention-free: they just move the locking from the application layer to the CPU.

(If you have N CPU cores working with the same data concurrently, the atomic.AddInt64 operations are executed sequentially, so each operation takes O(N) time. Each core first acquires exclusive access to the cache line, then copies the value from the core that previously owned that line, and finally updates the cached value. On an Intel CPU, that process takes about 40ns per atomic op.)

@mojinfu
Copy link

@mojinfu mojinfu commented Feb 27, 2019

@mojinfu, atomic operations are not contention-free: they just move the locking from the application layer to the CPU.

(If you have N CPU cores working with the same data concurrently, the atomic.AddInt64 operations are executed sequentially, so each operation takes O(N) time. Each core first acquires exclusive access to the cache line, then copies the value from the core that previously owned that line, and finally updates the cached value. On an Intel CPU, that process takes about 40ns per atomic op.)

you are right!
after fixed logic bugs,the result of len method in 'cmap' package is working perfectly
then i found
benchmark : 100 times Store(i, i) and Delete(i) (goos: darwin goarch: amd64)
sync.Map: 21230 ns/op 5600 B/op 499 allocs/op
CMap: 24243 ns/op 5600 B/op 499 allocs/op

it means each Store or Delete action will take another 15ns
i think cmap will be useful in some scene.

@protosam
Copy link

@protosam protosam commented Mar 4, 2019

@mojinfu in regards to using your package, this is really a discussion about getting .Len added to the existing package sync/map.

I have a function that works as I've described in prior comments to approximate the length of the map. It just seems self evident to me that it should be a built in function for the library.

There's a couple of caveats in regards to adding a new .Len method:

  • how it should work
  • what are the trade-offs we want between functionality or performance
  • is this wanted enough to actually add to the package
@mojinfu
Copy link

@mojinfu mojinfu commented Mar 18, 2019

@protosam thank you for using my package.
After I implement this method , I found that it's reasonable to remove Len method from package sync/map.
you can see it in the discussion between @bcmills and me.
"An atomic counter will slow down "create" and "delete" method 15ns when method be called "

As you said :“what are the trade-offs we want between functionality or performance”
right , so also it can be understood like that " In different scenarios, the weights of trade-offs are different"
in my test , when program runs on a "one CPU" computer , it works More efficient than sync/map.
Especially you got a big map (len bigger than 10000 ), Every call saves 0.17 ms.

You can see it in the readme table.

@protosam
Copy link

@protosam protosam commented Mar 21, 2019

I'm not using your package....

@elagergren-spideroak
Copy link

@elagergren-spideroak elagergren-spideroak commented Jun 13, 2019

I just stumbled on this issue while looking to see if there were plans for a method that reports whether it's deleted the key, so I figure I'll add my $0.02.

A current use case I finished writing ~5 minutes ago is extracting data from the Map via Range. I want to pre-size the buffer where I'm storing data because data extraction is a common operation. (I won't run into the corner case where, after pre-sizing the data, a competing goroutine clears the Map.)

Since whenever I need to use sync.Map I create a wrapper type with type-safe methods, I easily added a guess int64 field. This allowed me to have a "best guess" at how much memory I needed to pre-allocate.

I was replacing a normal map (which was using len(m)) with sync.Map, and the lack of a Len method caused me to think and realize that there's no way to 100% replicate the behavior of a Mutex-locked map.

Anyway, I think a Len method could be the source of subtle bugs and perhaps hamper future optimizations.

@larytet
Copy link

@larytet larytet commented Jul 31, 2019

My use case. I want to print a sample from the map - a single arbitrary object. I want to print a nice "No data in the map" message if the map is empty. Without the map.Size() API I need two more lines of code - declare a variable and set the variable in the call to Range().

To be completely frank I keep a counter of objects in all my maps for debug. However I think that map without a Size() API is an unusual animal.

@jessicayuen
Copy link

@jessicayuen jessicayuen commented Apr 28, 2020

I'd like to add another use case, and am seeking alternative suggestions. My use case is with testing. I recently moved from a conventional map + RWMutex to the sync.Map model due support for concurrent locks on a per-key basis. The value is of type chan. In various of my tests, I'd like to validate that the map is populated and cleaned up appropriately when server shutdown handlers are called. The specifics of the channel contents aren't important for my use case.

With maps I can easily check assert.Equal(t, 1, len(myMap)) before and assert.Equal(t, 0, len(myMap)) after, but it's not so simple with sync.Map.

I'm getting around this currently by wrapping the check in a function that iterates .Range().

@bcmills Has there been any reconsideration for adding native len or.Length() support?

@larytet
Copy link

@larytet larytet commented Apr 29, 2020

@jessicayuen , add an atomic counter after every add/delete. If the atomic is zero the map is empty. This approach has a benefit. You can add a simple performance monitor (see the idea https://github.com/larytet-go/accumulator) on top of the atomic and watch the load of the map in the real-time.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 30, 2020

@jessicayuen, if the test fails, presumably you're going to need to know more about the failure mode than just the length of the map. (Which key or keys failed to be removed? Are the channels associated with those keys nil, or empty, or buffered with non-empty buffer contents?)

So you likely need that Range call to obtain enough information to diagnose the failure anyway, and given that, the Len method is superfluous.

@ocket8888
Copy link

@ocket8888 ocket8888 commented Jul 22, 2020

My use-case for this, since one was asked for earlier, is for efficient allocation of space in a regular map when reading values out of the sync.Map. Operations are mostly write, but every once in a while (on request) I dump the contents to a regular map to serve out of an API. Currently it looks like:

mystuff := map[interface{}]interface{}{}
syncMap.Range(func (key, value interface{}) bool {
	mystuff[key] = value
	return true
})

and these maps get big, so that does a lot of reallocation. It'd be an improvement to be able to do:

mystuff := make(map[interface{}]interface{}, syncMap.Len())

even if that's misleading because other things are added before the Range because it'll still help avoid reallocs.

@larytet
Copy link

@larytet larytet commented Jul 22, 2020

and these maps get big, so that does a lot of reallocation. It'd be an improvement to be able to do:

mystuff := make(map[interface{}]interface{}, syncMap.Len())

even if that's misleading because other things are added before the Range because it'll still help avoid reallocs.

@ocket8888 ,
I think that in this specific situation it is better for the application to keep a counter, atomic or not, of calls to Put(). I would go even further suggesting to prepare the regular map "mystuff" ahead of time if the memory allows.
If the map is large a problem can have system wide impact. it makes sense to watch the map closely.

@ocket8888
Copy link

@ocket8888 ocket8888 commented Jul 22, 2020

The actual sync.Map is a private member of a wrapping structure, and doesn't expose Put, only Store and LoadOrStore, but I suppose it would be simple enough to count those uses just the same.

Preparing the map ahead of time is out of the question for a litany of reasons, not the least of which being "I would have no idea how to even begin implementing that in this pile of fettucine code, and the very thought gives me shakes and dry heaves".

Thanks for the suggestion, I think I'll probably do that at some point.

@blacklat

This comment was marked as disruptive content.

@cristaloleg
Copy link

@cristaloleg cristaloleg commented Sep 21, 2020

Operations are mostly write

@ocket8888 afaik sync.Map works better under ready-heavy workloads. It's also stated in the documentation https://pkg.go.dev/sync#Map Probably simple map with RWMutex might be better for your case.

@ocket8888
Copy link

@ocket8888 ocket8888 commented Sep 23, 2020

From the sync.Map docs:

"The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. "

My case is the second. There are many goroutines reading, writing, and especially overwriting entries for disjoint sets of keys. Actually, to be specific, there are many goroutines writing and overwriting, and only one reading.

@larytet
Copy link

@larytet larytet commented Sep 23, 2020

@ocket8888 May be a custom lock free hashtable makes sense in your case?
For example/inspiration check this https://github.com/larytet-go/hashtable
and https://github.com/larytet/mcachego

@protosam
Copy link

@protosam protosam commented Sep 26, 2020

Can someone clarify if the stance changed on this? Revisiting this after over a year, I think this entire issue needs to be closed. We're beating a dead horse that's going nowhere.

This became evident to me over time as I realized this feature doesn't matter. It's not a blocker for anything, coding around it is easy, and the only people wanting this are just asking for a convenience tool (I will admit that was the case that led me to being in this issue).

That leads me to the solution. The blocker has already been expressed in this issue of "who needs it?" The way I see it, if we really want it, we have to go make it, market it, and maybe one day we might be able to show the interest it takes to get it into the standard library.

Absent of that, this is a dead horse. Good luck comrades. I'm unsubscribing.

@pioz
Copy link

@pioz pioz commented Oct 1, 2020

What's the use-case?

Testing, I need to test if the elements on the map are exactly what I expect.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Oct 1, 2020

Then you’d want to iterate over the map to asset that it’s contents are identical.

@larytet
Copy link

@larytet larytet commented Nov 10, 2020

When I implement a counter of stored keys internally I occasionally do the following:

	if _, ok = m.MyLoad(key); !ok {
		// do my stuff 
		m.MyStore(key, value)
	}

In the call to MyStore() I increment the length counter m.len and call to sync.Map.Store(). In a multi-thread environment I need to call sync.Map.Lock() around the key existence test. Otherwise I would count calls to Map.Store() instead of the new keys.

Is there a chance to add a return value to the sync.Map.Store which returns Ok if the key did not exist in the map before the call?

@jeremycochoy
Copy link

@jeremycochoy jeremycochoy commented Jun 13, 2021

Just call it "BlockingLen" or "SlowLen" or "DontUseItLen" if you are worried users don't realize it is costly, but I don't think that hundreds (looks at the upvotes...) of peoples having to re-implement len by hand is very efficient...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet