runtime: maps behave inconsistently on arm64 and amd64 architectures when the -race detection is not enabled #69652
That is strange.
I'm not sure what you mean here by a "memory access violation". Do you mean something similar to what the race detector reports?
It's not easy to trigger a memory access violation in Go, but it can be done in C++. Below are two C++ programs analogous to the Go code's unsynchronized map replacement. The first one produces a segmentation fault; adding a read/write lock solves the problem! @randall77
#include <map>
#include <string>
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> threads(11);
    std::map<std::string, int> kvs{{"1", 1}};
    // Ten reader threads access the shared map with no synchronization.
    for (int x = 0; x < 10; x++) {
        threads[x] = std::thread([&]() {
            for (int i = 0; i < 100000; i++) {
                auto _ = kvs["1"];
            }
        });
    }
    // One writer thread repeatedly builds a clone and replaces the map,
    // racing with the readers above.
    threads[10] = std::thread([&]() {
        for (int i = 0; i < 100000; i++) {
            std::map<std::string, int> clone_kvs{};
            // copy
            clone_kvs.insert(kvs.begin(), kvs.end());
            // set
            clone_kvs["2"] = 2;
            // replace
            kvs = std::move(clone_kvs);
        }
    });
    for (auto& thread : threads) {
        thread.join();
    }
}
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

int main() {
    std::shared_mutex rw_lock{};
    std::vector<std::thread> threads(11);
    std::map<std::string, int> kvs{{"1", 1}};
    // Ten reader threads, each holding a shared (read) lock around the access.
    for (int x = 0; x < 10; x++) {
        threads[x] = std::thread([&]() {
            for (int i = 0; i < 100000; i++) {
                std::shared_lock<std::shared_mutex> rlock{rw_lock};
                // at() rather than operator[], since operator[] may insert
                // (a write) and must not run under a shared lock.
                auto _ = kvs.at("1");
            }
        });
    }
    // One writer thread: read under the shared lock, mutate the private
    // clone without any lock, then publish under the exclusive lock.
    threads[10] = std::thread([&]() {
        for (int i = 0; i < 100000; i++) {
            std::map<std::string, int> clone_kvs{};
            // safe read
            {
                std::shared_lock<std::shared_mutex> rlock{rw_lock};
                clone_kvs.insert(kvs.begin(), kvs.end());
            }
            // set clone kvs (thread-local, no lock needed)
            clone_kvs["2"] = 2;
            // safe replace
            {
                std::unique_lock<std::shared_mutex> wlock{rw_lock};
                kvs = std::move(clone_kvs);
            }
        }
    });
    for (auto& thread : threads) {
        thread.join();
    }
}
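(A build note, not from the thread: both programs require C++17 for std::shared_mutex and a threading flag, e.g. g++ -std=c++17 -pthread race.cpp, where race.cpp is an illustrative file name. The first program typically segfaults as described; the second runs to completion.)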
I'm not sure what you are proposing then. We could of course cause a segfault instead of reporting a "rw error". But I don't think that's very helpful. Why is the former better when the latter provides a more accurate description of the user's error?
Because from the code's perspective, it is very difficult to tell that this problem is caused by concurrent access to the map; an "rw error" doesn't explain that. I hope the runtime can report more explicit error information.
As written, this code doesn't directly race on the internals of the map. It does explicitly race on the map variable itself. That is, in program order, the replacement map is fully written before being assigned to the map variable. On a system with a total store order (TSO, where all stores become visible in program order, i.e., amd64), the reader can therefore never observe a partially published map. On a system without TSO (arm64), I suspect something like this occurs: the store that publishes the new map pointer becomes visible to the reader before the earlier stores that initialized the new map, including the store that cleared its write flag.
What happens here is that without TSO, one CPU clears the map's write flag while another CPU may still observe the stale flag through the newly published pointer, so the runtime reports a concurrent read/write even though, in program order, the write finished first.

As for why not segfault, I agree with @randall77 in not really understanding why we should do so. The fact that your C++ code segfaults is a coincidence of implementation. C++ certainly does not guarantee that racy accesses will segfault. Our code could theoretically segfault on arm64 if the reader observed a torn map pointer. In practice, on x86 I'm almost certain that 8-byte stores never tear, so readers will always see a valid pointer, though which one they see is not guaranteed. I believe arm64 is the same, though maybe not for unaligned stores? Given these semantics, I don't even think it would be possible to guarantee a fault on racy access without something like the full race detector in every program.
Thanks, I was able to solve this issue using atomic.Value or sync.RWMutex. However, with the proliferation of Apple's M-series chips, such cases are indeed hard to understand: code that previously had no issues becomes problematic after switching to an M-series chip. @prattmic
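For reference, a minimal sketch of the atomic.Value approach applied to the same copy-then-replace pattern (the names kvs, read, and replace are illustrative, not from the original report):

package main

import "sync/atomic"

// kvs holds an immutable map[string]int snapshot; atomic.Value provides
// the publish/observe ordering that a plain assignment lacks on arm64.
var kvs atomic.Value

func init() {
	kvs.Store(map[string]int{"1": 1})
}

// read loads the current snapshot; safe from any goroutine.
func read(key string) int {
	return kvs.Load().(map[string]int)[key]
}

// replace copies the current snapshot, mutates the private copy, and
// publishes it. The clone is never written to after Store.
func replace(key string, value int) {
	old := kvs.Load().(map[string]int)
	clone := make(map[string]int, len(old)+1)
	for k, v := range old {
		clone[k] = v
	}
	clone[key] = value
	kvs.Store(clone)
}

func main() {
	replace("2", 2)
	_ = read("2")
}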
Indeed, this is the issue. You can use the following program to reproduce it.

package main

import "time"

type Data struct {
	flag int
}

func main() {
	var data = &Data{flag: 1}
	// Reader: panics if it ever observes a Data whose flag is not 1.
	go func() {
		for {
			if data.flag != 1 {
				panic(`error`)
			}
		}
	}()
	// Writer: in program order, clone.flag is 1 before the pointer is
	// published; without TSO the reader may still observe flag == 2.
	go func() {
		for {
			clone := &Data{flag: 2}
			clone.flag = 1
			data = clone
		}
	}()
	time.Sleep(time.Second * 5)
}
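For contrast, a sketch of the same program with the store and load made atomic, which restores the intended ordering on arm64 (an illustration, not from the thread):

package main

import (
	"sync/atomic"
	"time"
)

type Data struct {
	flag int
}

func main() {
	var data atomic.Pointer[Data]
	data.Store(&Data{flag: 1})
	go func() {
		for {
			// Load observes all writes made before the corresponding
			// Store, so flag is always 1 here.
			if data.Load().flag != 1 {
				panic(`error`)
			}
		}
	}()
	go func() {
		for {
			clone := &Data{flag: 2}
			clone.flag = 1
			data.Store(clone) // publishes the fully written clone
		}
	}()
	time.Sleep(time.Second * 5)
}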
Indeed, this is the classic pain point when porting applications from amd64 to arm64 (or anything else, really: amd64 has the strictest memory ordering semantics; see the table at https://en.wikipedia.org/wiki/Memory_ordering#In_symmetric_multiprocessing_(SMP)_microprocessor_systems). I'm going to close this, as I don't think there is anything we can do here. Note that the race detector should catch these cases (if it does not, please reopen). If you have suggestions on how we can improve the situation, we're open to hearing them.
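As a usage note (mine, not from the thread): the repro above is exactly what the race detector is for. Running it as

go run -race main.go

should print a "WARNING: DATA RACE" report identifying the conflicting read and write (main.go being an illustrative file name for the program above).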
Go version
go version go1.20 linux/amd64
go version go1.20 darwin/arm64
Output of go env in your module/workspace:

What did you do?
I have a piece of code that behaves inconsistently on a Mac (Apple M3 Pro) and on Linux amd64 when -race detection is not enabled. On the Apple M3 Pro it directly reports a map rw error ("concurrent map read and map write"), but on Linux amd64 it doesn't.
What did you see happen?
What did you expect to see?