runtime/race: MemoryRangeSet is slow #20139

Closed
josharian opened this Issue Apr 26, 2017 · 9 comments

Comments

Projects
None yet
3 participants
@josharian
Contributor

josharian commented Apr 26, 2017

The race-enabled compiler (go install -race cmd/compile) spends 40% of its CPU time in __tsan MemoryRangeSet. Looking at the implementation of MemoryRangeSet in tsan_rtl.cc, I see (roughly):

if (kGoMode || size < 64*1024) {
    // Go pointer by pointer.
  } else {
    // Do a bulk clear.
  }

The compiler does lots of large, bulk memory clears. I wonder whether the kGoMode || part of the conditional could be removed.

cc @dvyukov

@dvyukov

This comment has been minimized.

Show comment
Hide comment
@dvyukov

dvyukov Apr 27, 2017

Member

We can try to replace kGoMode (which is now SANITIZER_GO) with SANITIZER_WINDOWS.
Also can add a more efficient memory clearing routine.

Member

dvyukov commented Apr 27, 2017

We can try to replace kGoMode (which is now SANITIZER_GO) with SANITIZER_WINDOWS.
Also can add a more efficient memory clearing routine.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Apr 27, 2017

Contributor

Great! If I want to try that out, are there docs anywhere for how to update the race detector?

I'm looking for basic things like: where to get the tsan code, which branch/commit to use, pointer to docs about how to build the tsan Go support libs, where to find the build artifacts, where to put them in the Go tree, how to test.

Contributor

josharian commented Apr 27, 2017

Great! If I want to try that out, are there docs anywhere for how to update the race detector?

I'm looking for basic things like: where to get the tsan code, which branch/commit to use, pointer to docs about how to build the tsan Go support libs, where to find the build artifacts, where to put them in the Go tree, how to test.

@dvyukov

This comment has been minimized.

Show comment
Hide comment
@dvyukov

dvyukov Apr 27, 2017

Member

src/runtime/race/README is meant to drive you in the right direction
However, racebuild does not support fetching from custom git repo (it only works with upstream llvm miror). We probably need to extend it to accept a custom repo. Then you could commit to your private github copy and direct racebuild there.

Member

dvyukov commented Apr 27, 2017

src/runtime/race/README is meant to drive you in the right direction
However, racebuild does not support fetching from custom git repo (it only works with upstream llvm miror). We probably need to extend it to accept a custom repo. Then you could commit to your private github copy and direct racebuild there.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Apr 27, 2017

Contributor

Thanks. I'm busy fixing races at the moment, but will give it a try soon. :)

Contributor

josharian commented Apr 27, 2017

Thanks. I'm busy fixing races at the moment, but will give it a try soon. :)

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 2, 2017

Contributor

Thanks for the pointer, @dvyukov. Switching to SANITIZER_WINDOWS helps a little with performance, but not much:

name       old time/op       new time/op       delta
Template         2.46s ± 1%        2.42s ± 0%  -1.61%  (p=0.008 n=5+5)
Unicode          1.73s ± 0%        1.71s ± 1%  -0.80%  (p=0.032 n=4+5)
GoTypes          6.07s ± 2%        5.95s ± 2%  -1.97%  (p=0.032 n=5+5)
Compiler         25.9s ± 0%        25.3s ± 1%  -2.31%  (p=0.008 n=5+5)
SSA              52.0s ± 1%        50.6s ± 0%  -2.86%  (p=0.008 n=5+5)
Flate            1.95s ± 0%        1.92s ± 0%  -1.36%  (p=0.008 n=5+5)
GoParser         2.23s ± 1%        2.21s ± 1%    ~     (p=0.151 n=5+5)
Reflect          4.01s ± 1%        3.94s ± 1%  -1.87%  (p=0.016 n=5+5)
Tar              1.84s ± 1%        1.82s ± 1%  -1.06%  (p=0.008 n=5+5)
XML              2.78s ± 2%        2.74s ± 1%  -1.42%  (p=0.016 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         1.57s ± 2%        1.51s ± 1%  -3.94%  (p=0.008 n=5+5)
Unicode          735ms ± 1%        720ms ± 2%  -2.00%  (p=0.032 n=4+5)
GoTypes          5.51s ± 2%        5.38s ± 2%  -2.45%  (p=0.032 n=5+5)
Compiler         26.7s ± 1%        26.1s ± 1%  -2.52%  (p=0.008 n=5+5)
SSA              56.6s ± 0%        55.2s ± 0%  -2.53%  (p=0.008 n=5+5)
Flate            993ms ± 1%        964ms ± 1%  -3.00%  (p=0.008 n=5+5)
GoParser         1.28s ± 1%        1.25s ± 2%  -2.51%  (p=0.016 n=5+5)
Reflect          3.19s ± 1%        3.12s ± 1%  -2.04%  (p=0.008 n=5+5)
Tar              889ms ± 1%        864ms ± 1%  -2.88%  (p=0.008 n=5+5)
XML              1.85s ± 2%        1.82s ± 2%    ~     (p=0.056 n=5+5)

So I think I'll send a patch upstream but not bother trying to get the race libraries updated ahead of schedule. (When is schedule, anyway?)

Contributor

josharian commented May 2, 2017

Thanks for the pointer, @dvyukov. Switching to SANITIZER_WINDOWS helps a little with performance, but not much:

name       old time/op       new time/op       delta
Template         2.46s ± 1%        2.42s ± 0%  -1.61%  (p=0.008 n=5+5)
Unicode          1.73s ± 0%        1.71s ± 1%  -0.80%  (p=0.032 n=4+5)
GoTypes          6.07s ± 2%        5.95s ± 2%  -1.97%  (p=0.032 n=5+5)
Compiler         25.9s ± 0%        25.3s ± 1%  -2.31%  (p=0.008 n=5+5)
SSA              52.0s ± 1%        50.6s ± 0%  -2.86%  (p=0.008 n=5+5)
Flate            1.95s ± 0%        1.92s ± 0%  -1.36%  (p=0.008 n=5+5)
GoParser         2.23s ± 1%        2.21s ± 1%    ~     (p=0.151 n=5+5)
Reflect          4.01s ± 1%        3.94s ± 1%  -1.87%  (p=0.016 n=5+5)
Tar              1.84s ± 1%        1.82s ± 1%  -1.06%  (p=0.008 n=5+5)
XML              2.78s ± 2%        2.74s ± 1%  -1.42%  (p=0.016 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         1.57s ± 2%        1.51s ± 1%  -3.94%  (p=0.008 n=5+5)
Unicode          735ms ± 1%        720ms ± 2%  -2.00%  (p=0.032 n=4+5)
GoTypes          5.51s ± 2%        5.38s ± 2%  -2.45%  (p=0.032 n=5+5)
Compiler         26.7s ± 1%        26.1s ± 1%  -2.52%  (p=0.008 n=5+5)
SSA              56.6s ± 0%        55.2s ± 0%  -2.53%  (p=0.008 n=5+5)
Flate            993ms ± 1%        964ms ± 1%  -3.00%  (p=0.008 n=5+5)
GoParser         1.28s ± 1%        1.25s ± 2%  -2.51%  (p=0.016 n=5+5)
Reflect          3.19s ± 1%        3.12s ± 1%  -2.04%  (p=0.008 n=5+5)
Tar              889ms ± 1%        864ms ± 1%  -2.88%  (p=0.008 n=5+5)
XML              1.85s ± 2%        1.82s ± 2%    ~     (p=0.056 n=5+5)

So I think I'll send a patch upstream but not bother trying to get the race libraries updated ahead of schedule. (When is schedule, anyway?)

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 2, 2017

Contributor

Blarg. Can't figure out where and how to send compiler-rt patches. @dvyukov the patch would be going to you anyway. Can you just apply it from here? (Or recreate it locally?) It's pretty trivial.

From d3c800fb4e7b475f6aa6a284676a503e5735cca3 Mon Sep 17 00:00:00 2001
From: Josh Bleecher Snyder <josharian@gmail.com>
Date: Tue, 2 May 2017 07:33:31 -0700
Subject: [PATCH] tsan: allow fast large MemoryRangeSet on non-Windows Go

See https://github.com/golang/go/issues/20139
for background and motivation.
---
 lib/tsan/rtl/tsan_rtl.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/tsan/rtl/tsan_rtl.cc b/lib/tsan/rtl/tsan_rtl.cc
index 70393037e..fa60f3247 100644
--- a/lib/tsan/rtl/tsan_rtl.cc
+++ b/lib/tsan/rtl/tsan_rtl.cc
@@ -866,9 +866,8 @@ static void MemoryRangeSet(ThreadState *thr, uptr pc, uptr addr, uptr size,
   // Don't want to touch lots of shadow memory.
   // If a program maps 10MB stack, there is no need reset the whole range.
   size = (size + (kShadowCell - 1)) & ~(kShadowCell - 1);
-  // UnmapOrDie/MmapFixedNoReserve does not work on Windows,
-  // so we do it only for C/C++.
-  if (SANITIZER_GO || size < common_flags()->clear_shadow_mmap_threshold) {
+  // UnmapOrDie/MmapFixedNoReserve does not work on Windows.
+  if (SANITIZER_WINDOWS || size < common_flags()->clear_shadow_mmap_threshold) {
     u64 *p = (u64*)MemToShadow(addr);
     CHECK(IsShadowMem((uptr)p));
     CHECK(IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1)));
-- 
2.12.2
Contributor

josharian commented May 2, 2017

Blarg. Can't figure out where and how to send compiler-rt patches. @dvyukov the patch would be going to you anyway. Can you just apply it from here? (Or recreate it locally?) It's pretty trivial.

From d3c800fb4e7b475f6aa6a284676a503e5735cca3 Mon Sep 17 00:00:00 2001
From: Josh Bleecher Snyder <josharian@gmail.com>
Date: Tue, 2 May 2017 07:33:31 -0700
Subject: [PATCH] tsan: allow fast large MemoryRangeSet on non-Windows Go

See https://github.com/golang/go/issues/20139
for background and motivation.
---
 lib/tsan/rtl/tsan_rtl.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/tsan/rtl/tsan_rtl.cc b/lib/tsan/rtl/tsan_rtl.cc
index 70393037e..fa60f3247 100644
--- a/lib/tsan/rtl/tsan_rtl.cc
+++ b/lib/tsan/rtl/tsan_rtl.cc
@@ -866,9 +866,8 @@ static void MemoryRangeSet(ThreadState *thr, uptr pc, uptr addr, uptr size,
   // Don't want to touch lots of shadow memory.
   // If a program maps 10MB stack, there is no need reset the whole range.
   size = (size + (kShadowCell - 1)) & ~(kShadowCell - 1);
-  // UnmapOrDie/MmapFixedNoReserve does not work on Windows,
-  // so we do it only for C/C++.
-  if (SANITIZER_GO || size < common_flags()->clear_shadow_mmap_threshold) {
+  // UnmapOrDie/MmapFixedNoReserve does not work on Windows.
+  if (SANITIZER_WINDOWS || size < common_flags()->clear_shadow_mmap_threshold) {
     u64 *p = (u64*)MemToShadow(addr);
     CHECK(IsShadowMem((uptr)p));
     CHECK(IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1)));
-- 
2.12.2

chapuni pushed a commit to llvm-project/llvm-project that referenced this issue May 2, 2017

tsan: allow fast large MemoryRangeSet on non-Windows Go
The fast reset for large memory regions is not working
only on windows. So enable it for Go/linux/darwin/freebsd.

See golang/go#20139
for background and motivation.

Based on idea by Josh Bleecher Snyder.
@dvyukov

This comment has been minimized.

Show comment
Hide comment

chapuni pushed a commit to llvm-project/compiler-rt that referenced this issue May 2, 2017

tsan: allow fast large MemoryRangeSet on non-Windows Go
The fast reset for large memory regions is not working
only on windows. So enable it for Go/linux/darwin/freebsd.

See golang/go#20139
for background and motivation.

Based on idea by Josh Bleecher Snyder.



git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@301927 91177308-0d34-0410-b5e6-96231b3b80d8

chapuni pushed a commit to llvm-project/llvm-project-submodule that referenced this issue May 2, 2017

tsan: allow fast large MemoryRangeSet on non-Windows Go
The fast reset for large memory regions is not working
only on windows. So enable it for Go/linux/darwin/freebsd.

See golang/go#20139
for background and motivation.

Based on idea by Josh Bleecher Snyder.

jyknight pushed a commit to jyknight/llvm-monorepo-old1 that referenced this issue May 2, 2017

dvyukov
tsan: allow fast large MemoryRangeSet on non-Windows Go
The fast reset for large memory regions is not working
only on windows. So enable it for Go/linux/darwin/freebsd.

See golang/go#20139
for background and motivation.

Based on idea by Josh Bleecher Snyder.

llvm-svn=301927
@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 2, 2017

Contributor

You tested it, right?

Yep, insofar as I know how.

Contributor

josharian commented May 2, 2017

You tested it, right?

Yep, insofar as I know how.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 2, 2017

Contributor

Thanks! I'll close this now.

Contributor

josharian commented May 2, 2017

Thanks! I'll close this now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.