Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/race: MemoryRangeSet is slow #20139

Closed
josharian opened this issue Apr 26, 2017 · 9 comments
Closed

runtime/race: MemoryRangeSet is slow #20139

josharian opened this issue Apr 26, 2017 · 9 comments

Comments

@josharian
Copy link
Contributor

The race-enabled compiler (go install -race cmd/compile) spends 40% of its CPU time in __tsan MemoryRangeSet. Looking at the implementation of MemoryRangeSet in tsan_rtl.cc, I see (roughly):

if (kGoMode || size < 64*1024) {
    // Go pointer by pointer.
  } else {
    // Do a bulk clear.
  }

The compiler does lots of large, bulk memory clears. I wonder whether the kGoMode || part of the conditional could be removed.

cc @dvyukov

@dvyukov
Copy link
Member

dvyukov commented Apr 27, 2017

We can try to replace kGoMode (which is now SANITIZER_GO) with SANITIZER_WINDOWS.
Also can add a more efficient memory clearing routine.

@josharian
Copy link
Contributor Author

Great! If I want to try that out, are there docs anywhere for how to update the race detector?

I'm looking for basic things like: where to get the tsan code, which branch/commit to use, pointer to docs about how to build the tsan Go support libs, where to find the build artifacts, where to put them in the Go tree, how to test.

@dvyukov
Copy link
Member

dvyukov commented Apr 27, 2017

src/runtime/race/README is meant to drive you in the right direction
However, racebuild does not support fetching from custom git repo (it only works with upstream llvm miror). We probably need to extend it to accept a custom repo. Then you could commit to your private github copy and direct racebuild there.

@josharian
Copy link
Contributor Author

Thanks. I'm busy fixing races at the moment, but will give it a try soon. :)

@josharian
Copy link
Contributor Author

Thanks for the pointer, @dvyukov. Switching to SANITIZER_WINDOWS helps a little with performance, but not much:

name       old time/op       new time/op       delta
Template         2.46s ± 1%        2.42s ± 0%  -1.61%  (p=0.008 n=5+5)
Unicode          1.73s ± 0%        1.71s ± 1%  -0.80%  (p=0.032 n=4+5)
GoTypes          6.07s ± 2%        5.95s ± 2%  -1.97%  (p=0.032 n=5+5)
Compiler         25.9s ± 0%        25.3s ± 1%  -2.31%  (p=0.008 n=5+5)
SSA              52.0s ± 1%        50.6s ± 0%  -2.86%  (p=0.008 n=5+5)
Flate            1.95s ± 0%        1.92s ± 0%  -1.36%  (p=0.008 n=5+5)
GoParser         2.23s ± 1%        2.21s ± 1%    ~     (p=0.151 n=5+5)
Reflect          4.01s ± 1%        3.94s ± 1%  -1.87%  (p=0.016 n=5+5)
Tar              1.84s ± 1%        1.82s ± 1%  -1.06%  (p=0.008 n=5+5)
XML              2.78s ± 2%        2.74s ± 1%  -1.42%  (p=0.016 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         1.57s ± 2%        1.51s ± 1%  -3.94%  (p=0.008 n=5+5)
Unicode          735ms ± 1%        720ms ± 2%  -2.00%  (p=0.032 n=4+5)
GoTypes          5.51s ± 2%        5.38s ± 2%  -2.45%  (p=0.032 n=5+5)
Compiler         26.7s ± 1%        26.1s ± 1%  -2.52%  (p=0.008 n=5+5)
SSA              56.6s ± 0%        55.2s ± 0%  -2.53%  (p=0.008 n=5+5)
Flate            993ms ± 1%        964ms ± 1%  -3.00%  (p=0.008 n=5+5)
GoParser         1.28s ± 1%        1.25s ± 2%  -2.51%  (p=0.016 n=5+5)
Reflect          3.19s ± 1%        3.12s ± 1%  -2.04%  (p=0.008 n=5+5)
Tar              889ms ± 1%        864ms ± 1%  -2.88%  (p=0.008 n=5+5)
XML              1.85s ± 2%        1.82s ± 2%    ~     (p=0.056 n=5+5)

So I think I'll send a patch upstream but not bother trying to get the race libraries updated ahead of schedule. (When is schedule, anyway?)

@josharian
Copy link
Contributor Author

Blarg. Can't figure out where and how to send compiler-rt patches. @dvyukov the patch would be going to you anyway. Can you just apply it from here? (Or recreate it locally?) It's pretty trivial.

From d3c800fb4e7b475f6aa6a284676a503e5735cca3 Mon Sep 17 00:00:00 2001
From: Josh Bleecher Snyder <josharian@gmail.com>
Date: Tue, 2 May 2017 07:33:31 -0700
Subject: [PATCH] tsan: allow fast large MemoryRangeSet on non-Windows Go

See https://github.com/golang/go/issues/20139
for background and motivation.
---
 lib/tsan/rtl/tsan_rtl.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/tsan/rtl/tsan_rtl.cc b/lib/tsan/rtl/tsan_rtl.cc
index 70393037e..fa60f3247 100644
--- a/lib/tsan/rtl/tsan_rtl.cc
+++ b/lib/tsan/rtl/tsan_rtl.cc
@@ -866,9 +866,8 @@ static void MemoryRangeSet(ThreadState *thr, uptr pc, uptr addr, uptr size,
   // Don't want to touch lots of shadow memory.
   // If a program maps 10MB stack, there is no need reset the whole range.
   size = (size + (kShadowCell - 1)) & ~(kShadowCell - 1);
-  // UnmapOrDie/MmapFixedNoReserve does not work on Windows,
-  // so we do it only for C/C++.
-  if (SANITIZER_GO || size < common_flags()->clear_shadow_mmap_threshold) {
+  // UnmapOrDie/MmapFixedNoReserve does not work on Windows.
+  if (SANITIZER_WINDOWS || size < common_flags()->clear_shadow_mmap_threshold) {
     u64 *p = (u64*)MemToShadow(addr);
     CHECK(IsShadowMem((uptr)p));
     CHECK(IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1)));
-- 
2.12.2

@dvyukov
Copy link
Member

dvyukov commented May 2, 2017

jyknight pushed a commit to jyknight/llvm-monorepo-old1 that referenced this issue May 2, 2017
The fast reset for large memory regions is not working
only on windows. So enable it for Go/linux/darwin/freebsd.

See golang/go#20139
for background and motivation.

Based on idea by Josh Bleecher Snyder.

llvm-svn=301927
@josharian
Copy link
Contributor Author

You tested it, right?

Yep, insofar as I know how.

@josharian
Copy link
Contributor Author

Thanks! I'll close this now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants