Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fmt: improve or avoid panic when printing map that is changing as it is being printed #33275

Closed
J-CIC opened this issue Jul 25, 2019 · 29 comments
Closed

Comments

@J-CIC
Copy link

@J-CIC J-CIC commented Jul 25, 2019

What version of Go are you using (go version)?

$ go version
1.12

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/yaotianjia/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/yaotianjia/goworkspace"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build822647916=/tmp/go-build -gno-record-gcc-switches"

What did you do?

using fmt.Sprintf("%+v",arg) to print a map, and it panics

What did you expect to see?

It should shows the content of the map

What did you see instead?

runtime panic: runtime error: index out of range

runtime panic: runtime error: index out of range
......
panic(0x320a360, 0x6a3f2b0)
    /usr/local/go/src/runtime/panic.go:522 +0x1b5
internal/fmtsort.Sort(0x31d94a0, 0xc01b3a7ab0, 0x195, 0x1)
    /usr/local/go/src/internal/fmtsort/sort.go:60 +0x34f
fmt.(*pp).printValue(0xc03c81dc80, 0x31d94a0, 0xc01b3a7ab0, 0x195, 0x76, 0x2)
    /usr/local/go/src/fmt/print.go:759 +0xd20
fmt.(*pp).printValue(0xc03c81dc80, 0x360b300, 0xc01b3a7680, 0x199, 0xc000000076, 0x1)
    /usr/local/go/src/fmt/print.go:796 +0x1b52
fmt.(*pp).printValue(0xc03c81dc80, 0x30f0ba0, 0xc01b3a7680, 0x16, 0xc000000076, 0x0)
    /usr/local/go/src/fmt/print.go:866 +0x1956
fmt.(*pp).printArg(0xc03c81dc80, 0x30f0ba0, 0xc01b3a7680, 0x76)
    /usr/local/go/src/fmt/print.go:702 +0x2ba
fmt.(*pp).doPrintf(0xc03c81dc80, 0x3757e58, 0x3c, 0xc00fa93cf8, 0x2, 0x2)
    /usr/local/go/src/fmt/print.go:1016 +0x158
fmt.Sprintf(0x3757e58, 0x3c, 0xc00fa93cf8, 0x2, 0x2, 0x1e, 0x363fa6c)
    /usr/local/go/src/fmt/print.go:214 +0x66
......

the reason of the panic

https://github.com/golang/go/blob/release-branch.go1.12/src/internal/fmtsort/sort.go#L59

the built-in function is shown as follow

func Sort(mapValue reflect.Value) *SortedMap {
	if mapValue.Type().Kind() != reflect.Map {
		return nil
	}
	key := make([]reflect.Value, mapValue.Len())
	value := make([]reflect.Value, len(key))
	iter := mapValue.MapRange()
	for i := 0; iter.Next(); i++ {
		key[i] = iter.Key()
		value[i] = iter.Value()
	}
	sorted := &SortedMap{
		Key:   key,
		Value: value,
	}
	sort.Stable(sorted)
	return sorted
}

The variable key and value is pre-calculated and then the code iterates by iter, when another goroutine add a key to map, the length of key is shorter than the iter's range. Therefore, it panics.

My solutions

I think the code should consider the range of key, maybe it's better looks like this:

for i := 0; iter.Next() && i < len(key) ; i++ {
    ...
}
@av86743
Copy link

@av86743 av86743 commented Jul 25, 2019

When you mutate an object and at the same time access its content without any synchronization, behavior of the program is not specified.

In short: Your program does not exist.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

When you mutate an object and at the same time access its content without any synchronization, behavior of the program is not specified.

In short: Your program does not exist.

I know that concurrent behaviors of map result in undefined behavior, but maybe is not suitable to panic in fmt? This can be prevented easily by check whether the index i is in the range of key

J-CIC added a commit to J-CIC/go that referenced this issue Jul 25, 2019
this modification prevents the fmt panicking with reason index out of range

Fixes golang#33275

Change-Id: I81fbd1997b8cd1f5f30549eb4640b516105d1dcc
@gopherbot
Copy link

@gopherbot gopherbot commented Jul 25, 2019

Change https://golang.org/cl/187577 mentions this issue: internal/fmtsort: restrict the for-range by len(key)

@martisch
Copy link
Contributor

@martisch martisch commented Jul 25, 2019

If there is a race: just adding a check without synchronisation might catch some of the hazards but not all. If the map changes while reading from it, it can still panic. The solution is not to have a race which requires a syncronisation like a mutex to not change the map while printing it with fmt.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

If there is a race: just adding a check without synchronisation might catch some of the hazards but not all. If the map changes just one instruction after i < len(key) it will still panic. The solution is not to have a race which requires a syncronisation like a mutex to not change the map while printing it with fmt.

Actually it panics not because of the race of codes, it panics because the index out of the range.

@martisch
Copy link
Contributor

@martisch martisch commented Jul 25, 2019

Ok. can you run your program with the race detector and report whether it passes the race detector?

The internal map code does not protect against concurrent read and writes and can still panic (it just might be harder to trigger) or give the wrong results. If we need to not concurrently change the map to fix all the issues then fmtsort does not need to prevent these either as it should not happen in well behaving programs.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

Ok. can you run your program with the race detector and report whether it passes the race detector? The internal map code does not protect against concurrent read and writes and can still panic (it just might be harder to trigger) or give the wrong results. If we need to not concurrently change the map to fix all the issues then fmtsort does not need to prevent these either as it should not happen in well behaving programs.

I'm agree with you that it's hard to trigger, but I'm afraid that i can't run it on PROD environment to reproduce the problems with --race.

@martisch
Copy link
Contributor

@martisch martisch commented Jul 25, 2019

I think the fix here is not to make it harder to trigger by adding more checks to fmtsort but to fix the issue that the program should not concurrently read and write to the map.

@av86743
Copy link

@av86743 av86743 commented Jul 25, 2019

https://golang.org/cl/187577

Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

I think the fix here is not to make it harder to trigger by adding more checks to fmtsort but to fix the issue that the program should not concurrently read and write to the map.

I agree that we should handle correctly on concurrent behavior on map, that's how I fixed the program. But this panic message("index out of range") may confuse other programmer ?

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

https://golang.org/cl/187577

Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

sorry that it may be a little hard for me to generate the test

@av86743
Copy link

@av86743 av86743 commented Jul 25, 2019

https://golang.org/cl/187577
Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

sorry that it may be a little hard for me to generate the test

Then you fail.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

https://golang.org/cl/187577
Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

sorry that it may be a little hard for me to generate the test

Then you fail.

It's impossible for me to apply this patch on PROD environment, so that i can't create such concurrent environment to prove it. But it happens serval times on PROD environment before I fixed it.

And the patch is just a suggestion, when you encounter with such a panic and it tells you: index out of range, it may mislead you to another way to solve it. Unless you dive down into the source code, you won't find that it's actually a concurrent read and write on a map.

What I mean is only that the panic error should be clear, we can do nothing or just check in fmtsort and if it doesn't satisfy the situation(length or other), then we panic with correct message.

@av86743
Copy link

@av86743 av86743 commented Jul 25, 2019

... Unless you dive down into the source code, you won't find that it's actually a concurrent read and write on a map.

If you produce code which does not exist, this is an issue for a different repository.

@martisch
Copy link
Contributor

@martisch martisch commented Jul 25, 2019

I agree that we should handle correctly on concurrent behavior on map, that's how I fixed the program. But this panic message("index out of range") may confuse other programmer ?

The main way to detect these problems is to run tests and some production workload with the race detector that is meant to find these problems. The proposed change hides this problem from the programmer as no panic would trigger anymore. I do not think hiding the problem of the race that can surface in multiple other ways would be preferred. Racy programs can fail in many ways many are confusing and actually triggering a panic is one of the better observable failures vs silent data corruption.

If you have a suggestion for an improvement that helps in a non racy program or a way to better detect races with the race detector please add a new issue. Otherwise I would suggest to close this issue as I do think It is not clear what the proposed addition would improve and its based on working around a non well formed go program.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 25, 2019

I agree that we should handle correctly on concurrent behavior on map, that's how I fixed the program. But this panic message("index out of range") may confuse other programmer ?

The main way to detect these problems is to run tests and some production workload with the race detector that is meant to find these problems. The proposed change hides this problem from the programmer as no panic would trigger anymore. I do not think hiding the problem of the race that can surface in multiple other ways would be preferred. Racy programs can fail in many ways many are confusing and actually triggering a panic is one of the better observable failures vs silent data corruption.

If you have a suggestion for an improvement that helps in a non racy program or a way to better detect races with the race detector please add a new issue. Otherwise I would suggest to close this issue as I do think It is not clear what the proposed addition would improve and its based on working around a non well formed go program.

Agree

@J-CIC J-CIC closed this Jul 25, 2019
@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 25, 2019

I do think there is something to be fixed here. At the very least, the panic could be improved.
I see the options as

  1. Leave the code as is (the status quo)
  2. Panic, but with a better error message
  3. Don't panic, but add something to the string output to indicate a data race happened
  4. Ignore the error and proceed

Note that we do 4 in reflect.MapKeys. Maybe that's not the right choice, but it is precedent.

I'd argue that fmt is kind of special and warrants considering doing either 3 or 4. Unlike reflect.MapKeys, for example, the results of fmt.Printf probably aren't being used programmatically, but printed or logged. In that case, it's almost certainly better to allow the print to proceed without a panic. The print/log might be happening as part of handling a panic, and triggering a panic while handling a panic is almost always a bad idea.

@randall77 randall77 reopened this Jul 25, 2019
@martisch
Copy link
Contributor

@martisch martisch commented Jul 25, 2019

For 4: Another option might be to append map keys and values (to a pre-alloced slice) instead of cutting of the length so the resulting slice is neither to small nor to big.

If we generally want to harden fmt against data races/panics I think it would be good to think about a more general approach to recover from panics like the handler functions in fmt do with catchPanic:

func (p *pp) catchPanic(arg interface{}, verb rune, method string) {

@av86743
Copy link

@av86743 av86743 commented Jul 25, 2019

@randall77

... The print/log might be happening as part of handling a panic, and triggering a panic while handling a panic is almost always a bad idea.

And how it would possible to print/log map "as part of handling a panic"? As your intention concerns removal only map-related panic.

@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 25, 2019

And how it would possible to print/log map "as part of handling a panic"? As your intention concerns removing only map-related panic.

func f(m map[int]int) {
    defer func() {
        fmt.Printf("f failed on map %v\n", m)
    }()
    ... do some complex calculation with m...
}

Then if the complex calculation panics for whatever reason, we can print (or log) the map used for that calculation.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 25, 2019

@av86743 Please avoid saying things like "then you fail". That is unfriendly. The same sentiment could be phrased in a less destructive way, for example by saying "unfortunately we are unlikely to accept such a change without a test". Please see https://golang.org/conduct. Thanks.

@ianlancetaylor ianlancetaylor changed the title Panic while using fmt.Sprintf in concurrent environment (go version 1.12) fmt: improve or avoid panic when printing map that is changing as it is being printed Jul 25, 2019
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jul 25, 2019
@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 26, 2019

I do think there is something to be fixed here. At the very least, the panic could be improved.
I see the options as

  1. Leave the code as is (the status quo)
  2. Panic, but with a better error message
  3. Don't panic, but add something to the string output to indicate a data race happened
  4. Ignore the error and proceed

Note that we do 4 in reflect.MapKeys. Maybe that's not the right choice, but it is precedent.

I'd argue that fmt is kind of special and warrants considering doing either 3 or 4. Unlike reflect.MapKeys, for example, the results of fmt.Printf probably aren't being used programmatically, but printed or logged. In that case, it's almost certainly better to allow the print to proceed without a panic. The print/log might be happening as part of handling a panic, and triggering a panic while handling a panic is almost always a bad idea.

Actually I use fmt in logger and I think it's better not to panic explicitly in a logger. I prefer the option 3/4 for the reason that we have already noted that concurrent read and write on a map caused undefined behavior(in https://blog.golang.org/go-maps-in-action). It's programmer's duty to ensure the program won't encouter a race condition. Also it makes us upset and confused when the program panic with "index out of range" while using fmt to log something.

@av86743
Copy link

@av86743 av86743 commented Jul 26, 2019

... It's programmer's duty to ensure the program won't encouter a race condition. ...

Please explain yourself. My google translate does not work for this.

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 26, 2019

... It's programmer's duty to ensure the program won't encouter a race condition. ...

Just simply means that you should take care of the cocurrent condition.

@av86743

This comment was marked as off-topic.

@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 27, 2019

In trying to make a test for this, it's very hard to get this path to trigger. I almost always get an error from the runtime

fatal error: concurrent map iteration and map write

This is a throw, not even a panic, so it is uncatchable.

Not sure what that means for the overall issue. Possibly that any of 2,3, 4 are fine.

@J-CIC do you have a program that can reproduce this somewhat reliably?

@J-CIC
Copy link
Author

@J-CIC J-CIC commented Jul 28, 2019

In trying to make a test for this, it's very hard to get this path to trigger. I almost always get an error from the runtime

fatal error: concurrent map iteration and map write

This is a throw, not even a panic, so it is uncatchable.

Not sure what that means for the overall issue. Possibly that any of 2,3, 4 are fine.

@J-CIC do you have a program that can reproduce this somewhat reliably?

@randall77
Yes, the code is as follow:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	Info := map[string]string{}
	startSec := time.Now().UnixNano()
	//generate a large map
	for i := 0; i < 10000000; i++ {
		key := fmt.Sprintf("key%d", i)
		Info[key] = "2"
	}
	wg := sync.WaitGroup{}
	wg.Add(2)
	endSec := time.Now().UnixNano()
	fmt.Println("time used to generate large map: ", (endSec-startSec)/time.Millisecond.Nanoseconds())
	// start simulate
	go modifyMap(&wg, Info)
	go printMap(&wg, Info)
	wg.Wait()
}

func printMap(wg *sync.WaitGroup, item map[string]string) {
	_ = fmt.Sprintf("%+v\n", item)
	wg.Done()
}
func modifyMap(wg *sync.WaitGroup, item map[string]string) {
	// you may have to custom the time.Sleep Millisecond here
	time.Sleep(time.Millisecond * 10)
	item["testPanic"] = "test"
	wg.Done()
}

Expected to see

time used to generate large map:  6784
panic: runtime error: index out of range

goroutine 21 [running]:
internal/fmtsort.Sort(0x10ad440, 0xc00008c000, 0x15, 0x0)
	/usr/local/go/src/internal/fmtsort/sort.go:60 +0x34f
fmt.(*pp).printValue(0xc0047680c0, 0x10ad440, 0xc00008c000, 0x15, 0x76, 0x0)
	/usr/local/go/src/fmt/print.go:759 +0xd20
fmt.(*pp).printArg(0xc0047680c0, 0x10ad440, 0xc00008c000, 0x76)
	/usr/local/go/src/fmt/print.go:702 +0x2ba
fmt.(*pp).doPrintf(0xc0047680c0, 0x10c8d4d, 0x4, 0xc000077fb0, 0x1, 0x1)
	/usr/local/go/src/fmt/print.go:1016 +0x158
fmt.Sprintf(0x10c8d4d, 0x4, 0xc00002efb0, 0x1, 0x1, 0x0, 0x0)
	/usr/local/go/src/fmt/print.go:214 +0x66
main.printMap(0xc054f807a0, 0xc00008c000)
	/Users/yaotianjia/go/src/test/test.go:27 +0x74
created by main.main
	/Users/yaotianjia/go/src/test/test.go:22 +0x2fa
exit status 2

Also my go version is

go version go1.12.7 darwin/amd64
@JAicewizard
Copy link

@JAicewizard JAicewizard commented Jul 31, 2019

idea for testing this (just a noobs brilliant idea):
can you use atomic read/writes to more precisely time the read and write to get more accurate results?

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 21, 2019

Change https://golang.org/cl/191197 mentions this issue: internal/fmtsort: don't out-of-bounds panic if there's a race condition

@gopherbot gopherbot closed this in 579c69a Aug 30, 2019
tomocy added a commit to tomocy/go that referenced this issue Sep 1, 2019
Raising an out-of-bounds panic is confusing. There's no indication
that the underlying problem is a race.

The runtime already does a pretty good job of detecting this kind of
race (modification while iterating). We might as well just reorganize
a bit to avoid the out-of-bounds panic.

Fixes golang#33275

Change-Id: Icdd337ad2eb3c84f999db0850ec1d2ff2c146b6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/191197
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
t4n6a1ka added a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
Raising an out-of-bounds panic is confusing. There's no indication
that the underlying problem is a race.

The runtime already does a pretty good job of detecting this kind of
race (modification while iterating). We might as well just reorganize
a bit to avoid the out-of-bounds panic.

Fixes golang#33275

Change-Id: Icdd337ad2eb3c84f999db0850ec1d2ff2c146b6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/191197
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
@golang golang locked and limited conversation to collaborators Aug 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

7 participants
You can’t perform that action at this time.