
sync: enhance Map to provide `UpdateOrStore` #20884

Open
RussellLuo opened this issue Jul 2, 2017 · 17 comments


@RussellLuo

commented Jul 2, 2017

Motivation

Per src/sync/map.go, Map already provides the following operations:

  • Load
  • Store
  • LoadOrStore
  • Delete
  • Range

But there are also many cases where we need to:

  1. Update the value of a key, if present, based on its existing value.
  2. Otherwise, store a value for the missing key.

For example, if we want to increment the value of an existing key in a concurrency-safe way, we cannot do that with Map today.
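
To make the race concrete, here is a sketch of the naive Load-then-Store increment (the incr helper is hypothetical); the window between Load and Store is exactly what UpdateOrStore would close:

func incr(m *sync.Map, key string) {
	// Racy read-modify-write: two goroutines can both Load the same old
	// value, and one of the increments is then lost when both Store.
	var n int64
	if v, ok := m.Load(key); ok {
		n = v.(int64)
	}
	m.Store(key, n+1) // not atomic with the Load above
}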

Possible Solution

To satisfy the need mentioned above, we could enhance Map with one more operation, such as UpdateOrStore, following the naming convention of LoadOrStore.

One possible implementation might look like this:

// UpdateOrStore updates the value of the key, if present, to the value
// returned by f, which is called with the existing value of the key, and
// returns the updated value.
//
// Otherwise, it stores and returns the value returned by f, which is
// called with nil.
//
// The updated result is true if the value was updated, false if stored.
func (m *Map) UpdateOrStore(key interface{}, f func(interface{}) interface{}) (final interface{}, updated bool) {
	read, _ := m.read.Load().(readOnly)
	if e, ok := read.m[key]; ok {
		if final, updated, ok = e.tryUpdateOrStore(f); ok {
			return final, updated
		}
	}

	m.mu.Lock()
	read, _ = m.read.Load().(readOnly)
	if e, ok := read.m[key]; ok {
		if e.unexpungeLocked() {
			// The entry was previously expunged, which implies that there is a
			// non-nil dirty map and this entry is not in it.
			m.dirty[key] = e
		}
		final, updated, _ = e.tryUpdateOrStore(f)
	} else if e, ok := m.dirty[key]; ok {
		final, updated, _ = e.tryUpdateOrStore(f)
	} else {
		if !read.amended {
			// We're adding the first new key to the dirty map.
			// Make sure it is allocated and mark the read-only map as incomplete.
			m.dirtyLocked()
			m.read.Store(readOnly{m: read.m, amended: true})
		}
		value := f(nil)
		m.dirty[key] = newEntry(value)
		final, updated = value, false
	}
	m.mu.Unlock()

	return final, updated
}

// tryUpdateOrStore updates or stores a value if the entry has not been expunged.
//
// If the entry is expunged, tryUpdateOrStore returns false and leaves the
// entry unchanged.
func (e *entry) tryUpdateOrStore(f func(interface{}) interface{}) (final interface{}, updated, ok bool) {
	p := atomic.LoadPointer(&e.p)
	if p == expunged {
		return nil, false, false
	}
	var value interface{}
	for {
		if p == nil {
			value = f(nil)
		} else {
			value = f(*(*interface{})(p))
		}
		if atomic.CompareAndSwapPointer(&e.p, p, unsafe.Pointer(&value)) {
			return value, p != nil, true
		}
		p = atomic.LoadPointer(&e.p)
		if p == expunged {
			return nil, false, false
		}
	}
}

Usage

package main

import (
	"fmt"
	"sync"
)

func main() {
	m := &sync.Map{}
	incr := func(old interface{}) interface{} {
		if old == nil {
			return int64(1) // initialize to 1
		}
		return old.(int64) + 1 // increment by 1
	}
	final, updated := m.UpdateOrStore("counter", incr)
	fmt.Println("final:", final, "updated:", updated)
}
@ianlancetaylor

Contributor

commented Jul 2, 2017

CC @bcmills

I don't think an API based on the sync code calling a user-supplied function while holding a lock is a good idea.

Can you describe a situation where this functionality is necessary? We don't want to add features in the abstract, we want to have specific use cases. Your example suggests using this to keep a counter in the map, but as far as I can see that can also be done by storing a *uint32 in the map using LoadOrStore and using atomic.AddUint32.
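
A minimal sketch of that alternative (the incr helper is hypothetical):

func incr(m *sync.Map, key string) uint32 {
	// A fresh *uint32 is allocated on every call, but only the first call
	// for a given key actually stores it; after that, the increment is a
	// single atomic add on the shared pointer.
	v, _ := m.LoadOrStore(key, new(uint32))
	return atomic.AddUint32(v.(*uint32), 1)
}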

@RussellLuo

Author

commented Jul 2, 2017

@ianlancetaylor Thanks for your reply!

My original intention was to manage the increment of a counter in an intuitive way, while keeping the counter in Map. For the particular case of a counter, your suggestion is a great alternative.

But if the value stored in Map is nontrivial, e.g. a struct type that cannot be handled atomically by the atomic package, the functionality mentioned above may be needed.

To make the following two Map round trips atomic:

  1. Load the old value first
  2. Calculate the new value based on the old one, and then Store the new value

I can't think of any solution other than a user-defined function. (As for the risk of blocking, maybe we can make the risk explicit to users, who would then be responsible for their own actions?)

@OneOfOne

Contributor

commented Jul 2, 2017

I actually had to fork sync.Map and add a similar modification for the same reasons; it'd be nice if this were in the stdlib.

@bcmills

Member

commented Jul 5, 2017

@RussellLuo

if the value stored in Map is nontrivial, e.g. a struct type that cannot be handled atomically by the atomic package

I think I'm missing something about your use-case. Why can't/wouldn't you make the map values pointers to structs containing a sync.Mutex?

A more realistic example might help. Your usage example looks a lot like (*expvar.Map).Add, which is currently implemented in terms of the existing sync.Map API.
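
The mutex-in-value pattern might look roughly like this (a sketch; counter and add are hypothetical names):

type counter struct {
	mu sync.Mutex
	n  int64
}

func add(m *sync.Map, key string, delta int64) int64 {
	// LoadOrStore installs a zero counter on first use; afterwards all
	// contention is on the per-value mutex, not on the map.
	v, _ := m.LoadOrStore(key, &counter{})
	c := v.(*counter)
	c.mu.Lock()
	c.n += delta
	n := c.n
	c.mu.Unlock()
	return n
}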

@bcmills

Member

commented Jul 5, 2017

Beyond that, I have several concerns with the proposed implementation.

  1. The API as proposed seems prone to lock-ordering bugs: if f calls another Map method while m.mu is held, the program may deadlock. (My initial implementation of Range had the same problem, but the last round of improvements eliminated the callback in the locked section, at the cost of making Range O(N) even if it only executes a constant number of iterations.) A sketch of such a deadlock follows this list.

  2. The sync.Map methods have been somewhat optimized to avoid unnecessary allocations, particularly in compare-and-swap loops. In your proposal, the interface returned by f escapes, so it must be heap-allocated, and the assignment to value on each iteration will drop the previous value: the amount of allocation overhead will thus be proportional to the number of CAS iterations.

  3. The fact that the current sync.Map implementation uses an extra pointer indirection is, in my opinion, somewhat unfortunate. I'm a bit concerned that UpdateOrStore would close off potential improvements in that direction. (I haven't thought about that very much, so it could turn out not to be a big deal.)
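
For concern 1, a sketch of how such a deadlock could arise (assuming the UpdateOrStore proposed above):

func deadlocks(m *sync.Map) {
	m.UpdateOrStore("a", func(old interface{}) interface{} {
		// If "a" misses the read-only map, this callback runs while m.mu
		// is held. Store on a missing key also needs m.mu, and sync.Mutex
		// is not reentrant, so the goroutine deadlocks against itself.
		m.Store("b", 1)
		return 1
	})
}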

@RussellLuo

Author

commented Jul 7, 2017

@bcmills

I think I'm missing something about your use-case. Why can't/wouldn't you make the map values pointers to structs containing a sync.Mutex?

As I mentioned above, my original use case is just a counter.

While a simple counter involves only an int value, the counter case can easily be generalized to a more complicated one that uses a value of a complicated type (e.g. a struct type).

That is to say, the complicated case with a struct-type value is hypothetical.

As for the solutions to the following two specific cases:

  1. For an int value, just "store a *uint32 in the map using LoadOrStore and using atomic.AddUint32" (from @ianlancetaylor)
  2. For a struct value, just "make the map values pointers to structs containing a sync.Mutex" (from you :-)

I think both of the above solutions are practical and really simple. But they also remind me of using a plain map with sync.RWMutex, which is likewise simple but not efficient enough (which may be one of the motivations for sync.Map?).

And I also think using sync.Map with an additional mutex (or an additional atomic operation) is something of a hybrid approach. So I wonder whether there is a purer way to address these cases, such as enhancing sync.Map itself to handle them.

But I haven't thought much about it; maybe it's not easy, or maybe sync.Map just isn't meant for these more complicated cases?

@RussellLuo

Author

commented Jul 7, 2017

As for the concerns:

Beyond that, I have several concerns with the proposed implementation.

I totally agree with you, @bcmills.

Since the user-defined function f, which calculates the new value from the old one, must be called each time after the actual old value is loaded (either from m.read or from m.dirty) or after the key is found to be missing, it's hard to address the issues you mentioned:

  1. If the key is in m.dirty, or if it's missing, then the old value of the key can be loaded and updated only in a context where m.mu is held, so f is prone to deadlock bugs. (If f is minimal and carefully written, the risk can be eliminated. But as you can see, it depends.)
  2. As far as I know, the heap-allocation overhead here is caused by the interface{} type, which the current Go compiler cannot handle cleverly, and I haven't figured out a way to avoid it after thinking it over :-(
  3. As for the extra pointer indirection, to my knowledge it's mainly constrained by the atomic package, which can only operate on integers and pointers.

All of the above is my humble opinion, maybe some folks have good ideas?

@bcmills

Member

commented Jul 7, 2017

these solutions also remind me of using map with sync.RWMutex, which is also simple, but not efficient enough.

The reason we added sync.Map in the first place was to address cache contention in various places in the standard library, but if you have a read-modify-write operation on a particular value you likely already have contention on that value.

And note that you can use more subtle synchronization within the values: for example, you could use atomic.Value or atomic.CompareAndSwapPointer instead of sync.RWMutex to keep all of the read paths contention-free.
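
A sketch of that idea, assuming values are published as immutable snapshots (all names here are hypothetical):

type snapshot struct{ n int64 } // immutable once published

type slot struct {
	mu sync.Mutex   // serializes writers only
	v  atomic.Value // holds the current *snapshot
}

func newSlot() *slot {
	s := &slot{}
	s.v.Store(&snapshot{})
	return s
}

// load never blocks: readers just grab the current snapshot.
func (s *slot) load() *snapshot { return s.v.Load().(*snapshot) }

// add is a read-modify-write under the writer lock; it publishes a fresh
// snapshot, so concurrent readers stay contention-free.
func (s *slot) add(delta int64) {
	s.mu.Lock()
	old := s.v.Load().(*snapshot)
	s.v.Store(&snapshot{n: old.n + delta})
	s.mu.Unlock()
}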


the complicated case with a struct-type value is hypothetical.

It's much easier to add to an API than to remove from it, so I'd prefer to wait to expand sync.Map until we're sure it addresses concrete use-cases.

Fortunately, forks of sync.Map are fairly benign: sync.Map shouldn't appear in your exported API anyway, so the cost of a fork is perhaps a little bit of code bloat. That's great, because it gives everyone more room to experiment! You can try out variations on the API and implementation in real code, see the impact on benchmarks and readability, and build evidence to support proposed API changes, all without being constrained by the Go release cycle.

@OneOfOne

Contributor

commented Jul 7, 2017

The problem with

For int value, just "store a *uint32 in the map using LoadOrStore and using atomic.AddUint32"

is that it requires passing new(uint32) on every call to LoadOrStore, rather than, say, LoadOrXXXX("key", func() interface{} { return new(uint32) }).

While that's not really a big deal for a small value, imagine a struct that's somewhat bigger than that being allocated on every single call.

One of the packages that moved to sync.Map in the stdlib (not sure which; I can't look right now) worked around that by calling Load first and then, only when that fails, calling LoadOrStore. That works, but it's also more expensive than just having a LoadOrStoreFn(key string, fn func() interface{}) that automatically calls fn if key doesn't exist, which only needs to happen once.
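
That workaround looks roughly like this (a sketch; newValue stands in for whatever constructor the package uses):

func loadOrCreate(m *sync.Map, key string) interface{} {
	// Fast path: a hit costs no allocation at all.
	if v, ok := m.Load(key); ok {
		return v
	}
	// Slow path: allocate and race to install; LoadOrStore returns
	// whichever value won.
	v, _ := m.LoadOrStore(key, newValue())
	return v
}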

I have a very simple implementation; I can push a PR if needed. It simply adds a new function that does exactly that, and it makes life much easier.

Example:

type Data struct {
	m sync.Mutex
	// ... big complex struct ...
}

type S struct {
	m sync.Map
	// .....
}

func (s *S) createValue() interface{} {
	return &Data{
		// ... init fields
	}
}

func (s *S) GetOrUpdate(key string) *Data {
	di, _ := s.m.LoadOrStoreFn(key, s.createValue)
	return di.(*Data) // real code should check the type assertion
}
@bcmills

Member

commented Jul 7, 2017

@OneOfOne

One of the packages that moved to sync.Map in the stdlib [calls] .Load first […], that works but it's also more expensive

I suspect you're thinking of expvar.Map, which turned out to be one of the more interesting examples in the standard library (and is the reason for the current double-indirection implementation).

You're right that LoadOrStore is often more expensive than a LoadOrStoreFn could be, particularly if the value is a flat struct, but one of the things I found in making the standard-library changes is that escape-analysis often makes the performance tradeoffs diverge from what you might expect. For example, if your fn closes over other values, those values might end up heap-allocated whereas before they could have been on the stack.
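
A sketch of that effect, using the LoadOrStoreFn shape discussed here (get and the 64-byte buffer are hypothetical):

func get(m *syncmap.Map, key string) interface{} {
	buf := make([]byte, 64)
	// Because the closure may be called later, buf can be forced to escape
	// to the heap even on the hit path, where fn is never invoked.
	v, _ := m.LoadOrStoreFn(key, func() interface{} { return buf })
	return v
}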

In practice, the extra Load often isn't a big deal anyway: if the map hasn't been amended (or if the key already existed), the Load can be serviced out of the local core's cache, which has very little cost compared to the Store part of the LoadOrStore (which currently does introduce cross-core contention).

Concrete benchmark results on a machine with many cores would help make the case one way or the other.


I have a very simple implementation [that] adds a new function that does exactly that

Does it address the issues I raised above? If so, how?

@OneOfOne

Contributor

commented Jul 7, 2017

Here's an extremely simple benchmark:

BenchmarkLoadOrStoreAddOnValue/*syncmap.Map-8           50000000                23.9 ns/op            15 B/op          1 allocs/op
BenchmarkLoadOrStoreFnAddOnValue/*syncmap.Map-8         100000000               20.1 ns/op             7 B/op          0 allocs/op

https://gist.github.com/OneOfOne/7661de2ba606661d79ae91a9fcbc64b3

I worked around it by only ever calling the function once:

func (e *entry) tryLoadOrStoreFn(i func() interface{}) (actual interface{}, loaded, ok bool) {
	p := atomic.LoadPointer(&e.p)
	if p == expunged {
		return nil, false, false
	}
	if p != nil {
		return *(*interface{})(p), true, true
	}

	// Copy the interface after the first load to make this method more amenable
	// to escape analysis: if we hit the "load" path or the entry is expunged, we
	// shouldn't bother heap-allocating.
	ic := i()
	for {
		if atomic.CompareAndSwapPointer(&e.p, nil, unsafe.Pointer(&ic)) {
			return ic, false, true
		}
		p = atomic.LoadPointer(&e.p)
		if p == expunged {
			return nil, false, false
		}
		if p != nil {
			return *(*interface{})(p), true, true
		}
	}
}
@OneOfOne

Contributor

commented Jul 7, 2017

@bcmills Here are the numbers with only 100 unique keys per map, calling atomic.AddUint64 on one of the big-struct fields.

➤ go get github.com/OneOfOne/syncmap
➤ go test -bench=BigStruct -benchmem -benchtime=3s github.com/OneOfOne/syncmap
goos: linux
goarch: amd64
pkg: github.com/OneOfOne/syncmap
BenchmarkLoadOrStoreBigStruct/*syncmap.Map-8            100000000               44.4 ns/op           183 B/op          1 allocs/op
BenchmarkLoadOrStoreFnBigStruct/*syncmap.Map-8          200000000               19.6 ns/op             7 B/op          0 allocs/op
PASS
ok      github.com/OneOfOne/syncmap     10.609
@bcmills

Member

commented Jul 7, 2017

3.8 ns/op seems like a pretty minuscule difference, and that microbenchmark probably represents the most favorable conditions for demonstrating it. (Also, 8 cores is not really enough to hit the knee of the sync.Map curve: the point is to drop operations from O(N) to O(1) with N cores, so you want to measure with N ≫ 1.)

What's the impact on more realistic code, like expvar.Map or whatever your actual use-case is?

@OneOfOne

Contributor

commented Jul 7, 2017

For real code (the previous benchmark; I just copied the struct I was using and modified it a little), it's over 2x faster and uses less memory.

For expvar.Map.AddFn it was rather odd; the result was the opposite of my app's benchmark:

BenchmarkMapAddDifferentSteadyState-8           30000000                45.0 ns/op             0 B/op          0 allocs/op
BenchmarkMapAddFnDifferentSteadyState-8         20000000                94.9 ns/op            64 B/op          4 allocs/op

func newInt() interface{} { return new(Int) }

func (v *Map) AddFn(key string, delta int64) {
	i, dup := v.m.LoadOrStoreFn(key, newInt)
	if !dup {
		v.addKey(key)
	}

	// Add to Int; ignore otherwise.
	if iv, ok := i.(*Int); ok {
		iv.Add(delta)
	}
}

I'll investigate more when I have some time later today. I've pushed my modifications to https://github.com/OneOfOne/syncmap

@bcmills

Member

commented Jul 7, 2017

For real code (the previous benchmark; I just copied the struct I was using and modified it a little), it's over 2x faster and uses less memory.

Compared to which alternative? I would expect the extra-Load approach to perform better than LoadOrStore with an extra allocation, but it would be useful to see both of those points for comparison.

@OneOfOne

Contributor

commented Jul 7, 2017

@bcmills You are correct; the extra-Load approach is actually (slightly) faster for my use case.

BenchmarkLoadOrStoreBigStruct/*syncmap.Map-8            100000000               41.3 ns/op           183 B/op          1 allocs/op
BenchmarkLoadThenLoadOrStoreBigStruct/*syncmap.Map-8          300000000               17.7 ns/op             7 B/op          0 allocs/op
BenchmarkLoadOrStoreFnBigStruct/*syncmap.Map-8                  300000000               18.2 ns/op             7 B/op          0 allocs/op

https://github.com/OneOfOne/syncmap/blob/master/map_bench_test.go#L168

@pcostanza


commented Jul 28, 2017

CompareAndSwap or mutexes on the values are not necessarily safe, because they don't see concurrent Deletes.
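
For example, a sketch of that hazard with a *uint64 counter (racyAdd is hypothetical):

func racyAdd(m *sync.Map, key string) {
	v, ok := m.Load(key)
	if !ok {
		return
	}
	// Another goroutine may m.Delete(key) here (and even Store a fresh
	// counter), so the add below lands on a value that is no longer
	// reachable through the map, and the increment is effectively lost.
	atomic.AddUint64(v.(*uint64), 1)
}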

@bcmills bcmills added this to the Unplanned milestone Apr 9, 2018
