sync: add Map.Len method? #20680

Open
F21 opened this issue Jun 15, 2017 · 40 comments

Comments

@F21
commented Jun 15, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9-beta1

What operating system and processor architecture are you using (go env)?

Windows 10 64-bit

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Work
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config

It would be really useful to have a Length() method on sync.Map that can return the number of items in a map. Currently, I need to do something like this, which is quite tedious and not as readable:

length := 0

myMap.Range(func(_, _ interface{}) bool {
	length++
	
	return true
})
@randall77

Contributor

commented Jun 15, 2017

@bcmills

Member

commented Jun 15, 2017

What's the use-case?

@F21

Author

commented Jun 15, 2017

I am storing items filled by goroutines processing data into 2 separate maps. Once this is done, I need to compare the length of these 2 maps to perform a quick sanity check and decide on which branch to proceed for further processing.

Previously I was using github.com/orcaman/concurrent-map and there was a Count() method to do this.

@bradfitz changed the title from "Length() for sync.Map" to "sync: add Map.Length method?" Jun 15, 2017

@bradfitz added this to the Go1.9Maybe milestone Jun 15, 2017

@bradfitz

Member

commented Jun 15, 2017

This would normally be a Go 1.10 thing, but since this type is new in Go 1.9, I'll let @bcmills decide.

The implementation might be hairy enough to warrant Go 1.10 anyway, especially if the representation needs to change and other code gets modified.

@cespare

Contributor

commented Jun 15, 2017

Isn't the number you get out going to be a best-effort guess anyway? (Like len(channel)). Seems like an easy user workaround would be to maintain a parallel atomic int64 count.

(I don't know if that'd also be a reasonable internal implementation or not.)

@F21

Author

commented Jun 15, 2017

In my case, I am retrieving the length of the map after all goroutines have finished.

@bcmills

Member

commented Jun 15, 2017

sync.Map is optimized for long-lived, mostly-write workloads, for which a Len method would either be misleading (under-counting keys) or inefficient (introducing cache contention on the length counter).

The Range workaround at least has the benefit of appearing to be as expensive as it actually is.

I'm not opposed to the idea of adding a Len method, but I think we would need to show that the use-case is common enough to merit an API with such subtleties. At the very least, I don't think we should add it in 1.9.

@bradfitz changed the milestone from Go1.9Maybe to Go1.10 Jun 15, 2017

@rsc changed the title from "sync: add Map.Length method?" to "sync: add Map.Len method?" Jun 26, 2017

@AlexStocks

commented Jul 6, 2017

@bcmills Don't you think it's very ridiculous that Go has a built-in func len for map while sync.Map does not have a corresponding method func (m *Map) Len() int to get its size?

@bcmills

Member

commented Jul 6, 2017

@AlexStocks No, I don't think it's "very [ridiculous]". Concurrent data structures are not the same as unsynchronized data structures. The built-in map type doesn't have (and doesn't need) a LoadOrStore, for example.

We should decide the API of each type based on its own tradeoffs. Consistency is a benefit, but there are costs to weigh it against.

@rfyiamcool

commented Jul 13, 2017

I think a length feature for the Go 1.9 sync.Map should be required. Of course, this is just a suggestion.
Don't range over every entry; extend the struct with a field that acts as an atomic counter.

@bcmills

Member

commented Jul 16, 2017

@rfyiamcool

Extend a field as a atomic counter

Delete calls (and Store calls with previously-deleted keys) on disjoint keys in the read-only part of the map do not contend in the current implementation. An atomic counter would reintroduce contention for those calls.

We didn't omit Len just to be stubborn. It really is a subtle problem.

@bcmills

Member

commented Jul 17, 2017

#21035 has more detail on some of the optimizations that might complicate an efficient implementation of Len.

@GoLangsam

commented Jul 26, 2017

@bcmills May I suggest to have some remark in the source comments saying something like
"As of now, Len() is intentionally not implemented/provided due to ..." with a brief and concrete rationale.

This would ease understanding and adjust expectations of future users (who much more likely read source comments than old issues).

( And this should also be applied to other methods suggested elsewhere (such as UpdateOrStore) and currently not reasonably implementable. )

@maj-o

commented Sep 7, 2017

I looked up the source code of Range() and I think (knowing that the result could be wrong, see the remark on Range() and above) this would be the solution for Len():

return len(read.m) instead of the for loop in line 328

@bcmills

Member

commented Sep 7, 2017

@maj-o Promoting the read map makes Len an O(N) operation if interleaved with Store. By convention, Len methods on types in the Go standard library are O(1).

@protosam

commented Sep 15, 2017

I believe that having a .Count() or .Len() or even just adding a count integer would be a good addition for syncmap.Map. Personally I'm using sync.Map to track connections in an asynchronous p2p API that is heavily built around propagating network writes. The count is necessary for artificial connection limits and I had to switch from maps, because of race conditions in goroutines.

Hope this shows some use-cases for you. I can think of various situations where this simple functionality would reduce code by many lines.

@cznic

Contributor

commented Sep 15, 2017

The count is necessary for artificial connection limits and I had to switch from maps, because of race conditions in goroutines.

I think that does not justify adding a method to sync.Map, because it's easy to make your own derived type which just additionally keeps the count/len information if you think such information is useful.

@maj-o

commented Sep 16, 2017

If dirty has length of N.
But first: nobody is interested in dirty. If this blocks you, leave it out: O(1).
Second: we can assume that dirty is roughly constant and small compared to the total amount: O(1) + const = O(1).
I know that in the worst case (map is empty and dirty is full) we could have O(N). This can be left out or ignored, because in real life it does not matter. In real life the map is bigger than dirty and the size of dirty is constant. So, if it hurts you, comment the dirty loop out or leave it, because it does not matter in real life. And there it is O(1).

@bcmills

Member

commented Sep 19, 2017

@maj-o

If dirty has length of N.

That assumes that the internal representation of sync.Map always includes a single dirty map. Nothing in the API or documentation guarantees that to be the case, and in fact some optimizations (such as #21035) may require the opposite.

@dhui

commented Oct 4, 2017

My use case for an O(1) Len() func is to pre-allocate a slice of data (an optimization) to hold a filtered subset of the map values, populated by a call to Range(). FYI, my primary use for sync.Map is concurrent O(1) lookups.

So for my use case, Len() doesn't need to be consistent, since the worst case is an under-allocated slice (resulting in inconsistent performance) or an over-allocated slice (resulting in a bit more memory consumed).

@protosam

commented Oct 5, 2017

Hey guys,

I just want to clarify something here. What would you say is the scope of requirements for adding such functionality?

Like, to make a decision to add this utility function as part of the API, are you looking for a large quantity of developer need/want for the feature?

Or are we just aiming to decide how it should work?

In the former situation, I believe that a count feature would just be expected by developers, considering that the map type can be used in len().

In the latter situation if it's something that we all want added, but comes down to deciding how it should operate, I think that leads to two potential situations:

1 - a somewhat ballpark count of the map would be necessary for an application. This could be non blocking and the result just needs to be close to the current state.

2 - a blocking function that needs to be timed for an absolute accurate count at the given moment.

A utility function could be made for both scenarios.

I think it is best to make at least the boilerplate functionality for developers to help evade bad practices due to naivety when using the API. Best practices with concurrency are not obvious to new developers and devs trying to adopt Go. I do foresee newbies at least reading a godoc and being able to choose based on need.

@bcmills

Member

commented Oct 9, 2017

are you looking for a large quantity of developer need/want for the feature?

Yes, and use-cases that it would (or would not) improve.

Or are we just aiming to decide how it should work?

Not at this time.

I think it is best […] to help evade bad practices due to [naivety] when using the API. Best practices with concurrency is not obvious to new developers and devs trying to adopt go.

I agree that it is important to structure the API to encourage best practices. One best practice for using a concurrent data structure is to avoid depending on global properties of it (such as size), because computing those global properties can incur a surprising cost. That is perhaps the strongest reason why sync.Map does not have a Len method today.

@henrylee2cn

This comment was marked as off-topic.

commented Oct 13, 2017

From #22247

func (m *Map) Len() int

The following is a reference:

https://github.com/henrylee2cn/goutil/blob/master/map.go#L541

@ychen11

This comment was marked as off-topic.

commented Mar 7, 2018

So what's the status of this issue after this argument? Personally, I just see a million developers trying to persuade an arrogant golang maintainer that their requirement needs to be noticed.
Feel bad for this issue, typically on top of Golang.

@josharian

This comment was marked as off-topic.

Contributor

commented Mar 7, 2018

@ychen11 please be polite

@golang deleted a comment from c-Monster Mar 7, 2018

@golang locked as too heated and limited conversation to collaborators Mar 7, 2018

@golang unlocked this conversation Jul 13, 2018

@bcmills

Member

commented Jul 13, 2018

Unlocking to allow for further comments regarding concrete examples and usage statistics.
(Ideally, please reference experience reports addressing concrete problems and the available workarounds.)

“Me too” comments that do not address the technical considerations up-thread are off topic and will be hidden or deleted.

@protosam

commented Jul 19, 2018

To be fair, I don't think many projects even need sync.Map at this time. Still, I have 4 separate projects in which I am clumsily iterating over my maps to get counts. I would like to see something more standardized, planned out, and discussed, and eventually implemented as a feature. That way I don't have to second-guess later whether my code is bad.

@bcmills in regard to your comment from Oct 9, 2017, I can definitely see what you're getting at. In this particular case, considering what sync.Map is, I think not exposing a way to track length is a bit of an oversight.

Hypothetically, the applications that will use (or can benefit from) sync.Map are likely to be concurrent tasks that want managed access to a map. In server-side implementations where the server relays information between its clients, it is quite common to want a count of things. If someone used this to concurrently load data sources into memory, a resulting count may be what the application is after.

Something I've been considering for a while is to copy sync.Map, add an integer variable and a function that returns its value, and then add to or subtract from it every time Store() or Delete() completes successfully.

This is starting to look more appealing than iterating the entire map to confirm the count. I'm in a situation where the amount of time it takes to count the entire map is slowing things down noticeably and I think that just giving the small compute time it takes to just do this plan will alleviate my problem.

@Zalgo2462

commented Jul 19, 2018

Hello, I am currently developing an application which matches up pairs of individual records in a stream of input data. I have hash-partitioned the data stream such that each of my go routines will use disjoint sets of keys in my sync.Map. However, sometimes a record won't have a corresponding match in the data stream. Over time, these records accumulate in the map. I'd like to routinely get an estimate of the map's size to trigger an eviction policy carried out by Range. This keeps the RAM usage of the program fairly constant without degrading performance.

Currently I am looking at implementing a solution similar to @protosam and maintaining a count myself using a derived type from sync.Map

I'm not sure what I'm asking for should be called Len, but it would certainly be useful.

@Inkeliz

commented Sep 5, 2018

I think Len (or Count) is useful when a maximum number of values needs to be enforced. I don't know if there is a workaround. I use sync.Map to store information about nodes/peers in the network; however, a single client may not want to store everything and may keep only N values. With a plain map it is possible to do a simple if len(...) > check and then ignore the insert; that can't be done directly with sync.Map.

@bcmills

Member

commented Sep 5, 2018

We could, in theory, keep track of the count of items in the map using some sort of sharded counter (as proposed in #18802), but that would still be pure overhead for applications that don't need it.

That suggests that @protosam's idea to wrap the sync.Map with a separate counter may be the right direction: you can always implement the same method set in the wrapper, and then you would only pay the overhead of tracking the length for call sites that actually use that wrapper.
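A rough sketch of such a wrapper follows. It is not taken from the thread: `CountedMap` is a hypothetical name, and the sketch relies on `Swap` (Go 1.20) and `LoadAndDelete` (Go 1.15), both added to sync.Map after this comment was written, to learn whether a call actually added or removed a key.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CountedMap wraps sync.Map and tracks the number of keys in a side counter.
type CountedMap struct {
	n int64 // first field so it stays 64-bit aligned on 32-bit platforms
	m sync.Map
}

func (c *CountedMap) Load(key interface{}) (interface{}, bool) {
	return c.m.Load(key)
}

// Store counts the key only when it was not already present.
func (c *CountedMap) Store(key, value interface{}) {
	if _, loaded := c.m.Swap(key, value); !loaded {
		atomic.AddInt64(&c.n, 1)
	}
}

// Delete uncounts the key only when it was actually removed.
func (c *CountedMap) Delete(key interface{}) {
	if _, loaded := c.m.LoadAndDelete(key); loaded {
		atomic.AddInt64(&c.n, -1)
	}
}

func (c *CountedMap) Range(f func(key, value interface{}) bool) {
	c.m.Range(f)
}

// Len returns the counter. It is exact once all writers have finished and
// only approximate (possibly briefly negative) while they are running.
func (c *CountedMap) Len() int {
	return int(atomic.LoadInt64(&c.n))
}

func main() {
	var c CountedMap
	c.Store("a", 1)
	c.Store("a", 2) // overwrite: the count does not change
	c.Store("b", 3)
	c.Delete("missing") // absent key: nothing to subtract
	c.Delete("a")
	fmt.Println(c.Len())
}
```

Only code that goes through the wrapper pays for the counter update, which is the trade-off the comment above points at.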

@mojinfu

commented Feb 27, 2019

Hello, I am currently developing an application which matches up pairs of individual records in a stream of input data. I have hash-partitioned the data stream such that each of my go routines will use disjoint sets of keys in my sync.Map. However, sometimes a record won't have a corresponding match in the data stream. Over time, these records accumulate in the map. I'd like to routinely get an estimate of the map's size to trigger an eviction policy carried out by Range. This keeps the RAM usage of the program fairly constant without degrading performance.

Currently I am looking at implementing a solution similar to @protosam and maintaining a count myself using a derived type from sync.Map

I'm not sure what I'm asking for should be called Len, but it would certainly be useful.

Hi, you can use my package, https://github.com/mojinfu/cmap; I think it is what you want.

@mojinfu

commented Feb 27, 2019

We could, in theory, keep track of the count of items in the map using some sort of sharded counter (as proposed in #18802), but that would still be pure overhead for applications that don't need it.

That suggests that @protosam's idea to wrap the sync.Map with a separate counter may be the right direction: you can always implement the same method set in the wrapper, and then you would only pay the overhead of tracking the length for call sites that actually use that wrapper.

Hi, you can use my package, https://github.com/mojinfu/cmap; I think it is what you want.

@mojinfu

commented Feb 27, 2019

@rfyiamcool

Extend a field as a atomic counter

Delete calls (and Store calls with previously-deleted keys) on disjoint keys in the read-only part of the map do not contend in the current implementation. An atomic counter would reintroduce contention for those calls.

We didn't omit Len just to be stubborn. It really is a subtle problem.

I still can't understand.
When Delete is called and takes the lock-free path to delete a normal key (setting the pointer to nil), it returns true; then I call atomic.AddInt64(&m.length, -1), which is a thread-safe function, to reduce the length.
(You can see it in https://github.com/mojinfu/cmap/blob/master/cmap.go at line 353, and the Store logic at line 153.)
It would not reintroduce contention.

Which step of my understanding is incorrect?

@bcmills

Member

commented Feb 27, 2019

@mojinfu, atomic operations are not contention-free: they just move the locking from the application layer to the CPU.

(If you have N CPU cores working with the same data concurrently, the atomic.AddInt64 operations are executed sequentially, so each operation takes O(N) time. Each core first acquires exclusive access to the cache line, then copies the value from the core that previously owned that line, and finally updates the cached value. On an Intel CPU, that process takes about 40ns per atomic op.)

@mojinfu

commented Feb 27, 2019

@mojinfu, atomic operations are not contention-free: they just move the locking from the application layer to the CPU.

(If you have N CPU cores working with the same data concurrently, the atomic.AddInt64 operations are executed sequentially, so each operation takes O(N) time. Each core first acquires exclusive access to the cache line, then copies the value from the core that previously owned that line, and finally updates the cached value. On an Intel CPU, that process takes about 40ns per atomic op.)

You are right!
After fixing some logic bugs, the Len method in the 'cmap' package works perfectly.
Then I ran a benchmark of 100 calls each to Store(i, i) and Delete(i) (goos: darwin, goarch: amd64):
sync.Map: 21230 ns/op 5600 B/op 499 allocs/op
CMap: 24243 ns/op 5600 B/op 499 allocs/op

That means each Store or Delete call takes about 15 ns longer.
I think cmap will still be useful in some scenarios.

@protosam

commented Mar 4, 2019

@mojinfu in regard to using your package, this is really a discussion about getting .Len added to the existing sync.Map.

I have a function that works as I've described in prior comments to approximate the length of the map. It just seems self evident to me that it should be a built in function for the library.

There's a couple of caveats in regards to adding a new .Len method:

  • how it should work
  • what are the trade-offs we want between functionality or performance
  • is this wanted enough to actually add to the package
@mojinfu

commented Mar 18, 2019

@protosam thank you for using my package.
After implementing this method, I found that it is reasonable for sync.Map to leave out a Len method.
You can see why in the discussion between @bcmills and me:
"An atomic counter slows down each "create" and "delete" call by about 15 ns."

As you said: "what are the trade-offs we want between functionality or performance".
Right, and it can also be understood as "in different scenarios, the trade-offs carry different weights".
In my test, when the program runs on a single-CPU computer, it is more efficient than sync.Map.
Especially with a big map (len bigger than 10000), every call saves 0.17 ms.

You can see it in the readme table.

@protosam

commented Mar 21, 2019

I'm not using your package....

@elagergren-spideroak

commented Jun 13, 2019

I just stumbled on this issue while looking to see if there were plans for a method that reports whether it's deleted the key, so I figure I'll add my $0.02.

A current use case I finished writing ~5 minutes ago is extracting data from the Map via Range. I want to pre-size the buffer where I'm storing data because data extraction is a common operation. (I won't run into the corner case where, after pre-sizing the data, a competing goroutine clears the Map.)

Since whenever I need to use sync.Map I create a wrapper type with type-safe methods, I easily added a guess int64 field. This allowed me to have a "best guess" at how much memory I needed to pre-allocate.

I was replacing a normal map (which was using len(m)) with sync.Map, and the lack of a Len method caused me to think and realize that there's no way to 100% replicate the behavior of a Mutex-locked map.

Anyway, I think a Len method could be the source of subtle bugs and perhaps hamper future optimizations.

@larytet

commented Jul 31, 2019

My use case. I want to print a sample from the map - a single arbitrary object. I want to print a nice "No data in the map" message if the map is empty. Without the map.Size() API I need two more lines of code - declare a variable and set the variable in the call to Range().

To be completely frank I keep a counter of objects in all my maps for debug. However I think that map without a Size() API is an unusual animal.
