Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/gc: reduce init load #2297

Closed
rsc opened this issue Sep 23, 2011 · 7 comments
Closed

cmd/gc: reduce init load #2297

rsc opened this issue Sep 23, 2011 · 7 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Sep 23, 2011

init should not be doing so much work.

Forwarded conversation
Subject: [golang-dev] 386 GC problems
------------------------

From: Dmitry Vyukov <dvyukov@google.com>
Date: Tue, Aug 30, 2011 at 03:50
To: golang-dev@googlegroups.com
Cc: ushakov@google.com


Hi,
I've created a testing program that imports all standard packages, then allocates 2 GB
in equally sized blocks, then executes GC and then checks how many of the blocks are
freed (source code is below, but it also requires patching mgc0.c to not do GC until
manually triggered).
On Mac/386 it says that 90%(!) of 1MB blocks are not freed (there may be some
measurement error, but I think not that big).
I've investigated the problem, and it turned out that basically all blocks are pinned by
various tables in data segment of the program (after adding some debug output to mgc0.c
and merging it with nm output I can see what symbols cause problems).
I think most cases can be fixed by turning static arrays into dynamic slices. For
example, after applying the following patch:
------------------------------------------------------------
diff -r 7fec8679f10d src/pkg/math/pow10.go
--- a/src/pkg/math/pow10.go Fri Aug 26 17:45:19 2011 -0400
+++ b/src/pkg/math/pow10.go Tue Aug 30 11:29:47 2011 +0400
@@ -4,12 +4,18 @@
 
 package math
 
+import (
+ "sync"
+)
+
 // This table might overflow 127-bit exponent representations.
 // In that case, truncate it after 1.0e38.
-var pow10tab [70]float64
+var pow10tab []float64
+var pow10tabOnce sync.Once
 
 // Pow10 returns 10**e, the base-10 exponential of e.
 func Pow10(e int) float64 {
+ pow10tabOnce.Do(pow10tabInit)
  if e <= -325 {
  return 0
  } else if e > 309 {
@@ -26,7 +32,8 @@
  return Pow10(m) * Pow10(e-m)
 }
 
-func init() {
+func pow10tabInit() {
+ pow10tab = make([]float64, 70)
  pow10tab[0] = 1.0e0
  pow10tab[1] = 1.0e1
  for i := 2; i < len(pow10tab); i++ {
------------------------------------------------------------
3 additional 1MB blocks get freed. Or when I use 16-byte blocks, 70(!) additional blocks
get freed, that basically means that every float64 pins a heap object.
Before submitting any changes, I would like to know as to whether you think it (turning
static arrays into dynamic slices) is the right approach or not. Potentially it also
reduces exec size and/or startup times, fortunately sync.Once is quite fast now :)

------------------------------------------------------------
// the testing utility
package main

import (
"flag"
"fmt"
"os"
"runtime"
)
import (
_ "container/heap"
_ "container/list"
_ "container/ring"
_ "container/vector"
_ "flag"
_ "json"
_ "path"
_ "strings"
_ "unsafe"
_ "crypto"
_ "fmt"
_ "log"
_ "rand"
_ "sync"
_ "url"
_ "archive/tar"
_ "archive/zip"
_ "csv"
_ "go/ast"
_ "go/build"
_ "go/doc"
_ "go/parser"
_ "go/printer"
_ "go/scanner"
_ "go/token"
_ "go/typechecker"
_ "go/types"
_ "mail"
_ "reflect"
_ "syscall"
_ "utf16"
_ "asn1"
_ "debug/dwarf"
_ "debug/elf"
_ "debug/gosym"
_ "debug/macho"
_ "debug/pe"
_ "gob"
_ "math"
_ "regexp"
_ "syslog"
_ "utf8"
_ "big"
_ "hash"
_ "mime"
_ "rpc"
_ "tabwriter"
_ "websocket"
_ "bufio"
_ "ebnf"
_ "html"
_ "net"
_ "runtime"
_ "template"
_ "xml"
_ "encoding/ascii85"
_ "encoding/base32"
_ "encoding/base64"
_ "encoding/binary"
_ "encoding/git85"
_ "encoding/hex"
_ "encoding/pem"
_ "http"
_ "netchan"
_ "scanner"
_ "testing"
_ "bytes"
_ "exec"
_ "image"
_ "old/template"
_ "smtp"
_ "time"
_ "cmath"
_ "exp/datafmt"
_ "exp/gui"
_ "exp/gui/x11"
_ "exp/norm"
_ "exp/regexp/syntax"
_ "exp/template/html"
_ "index/suffixarray"
_ "os"
_ "sort"
_ "try"
_ "compress/bzip2"
_ "compress/flate"
_ "compress/gzip"
_ "compress/lzw"
_ "compress/zlib"
_ "expvar"
_ "io"
_ "patch"
_ "strconv"
_ "unicode"
)
func main() {
flagSize := flag.Int("size", 0, "alloc block size in bytes (must be a
power of 2)")
flagMem := flag.Int("mem", 0, "total mem to allocate in MB")
flag.Parse()
if *flagSize <= 0 || *flagSize&(*flagSize-1) != 0 || *flagMem <= 0 {
flag.PrintDefaults()
os.Exit(1)
}
sz := uintptr(*flagSize)
cnt := *flagMem * 1024 * 1024 / (*flagSize)
a0 := runtime.MemStats.Mallocs - runtime.MemStats.Frees
for i := 0; i < cnt; i++ {
p := make([]byte, sz)
func(p []byte) {
}(p)
}
a1 := runtime.MemStats.Mallocs - runtime.MemStats.Frees - a0
runtime.GC()
a2 := runtime.MemStats.Mallocs - runtime.MemStats.Frees - a0
fmt.Printf("%.2f%% pinned (%d)\n", float64(a2)*100/float64(a1), a2)
fmt.Printf("%dMB wasted\n", a2*100*uint64(*flagMem)/a1)
}

----------
From: Russ Cox <rsc@golang.org>
Date: Tue, Aug 30, 2011 at 07:57
To: golang-dev@googlegroups.com
Cc: ushakov@google.com


You don't have to use sync.Once to move things into the heap.
It would suffice to change one line:

var pow10tab [70]float64

to

var pow10tab = make([]float64, 70)

However, this is not the right approach anyway.  You're letting
an implementation concern limit the way you use the language.
That's almost never the right long-term strategy.  It's definitely
a problem we need to address but making people change their
code to work around it is not the way.

Russ

----------
From: Dmitry Vyukov <dvyukov@google.com>
Date: Tue, Aug 30, 2011 at 08:29
To: rsc@golang.org
Cc: golang-dev@googlegroups.com, ushakov@google.com


I understand your point, and that's why I am asking.
However, I guess there is no simple way to fix it on GC level
(otherwise it would be already fixed). Moreover, it's not about user's
code, it's about standard library code (I perfectly understand that
users can create similar tables, but I think it's not quite common for
Go programs to create static tables of floats). Moreover, maybe it's a
better way to do it GC problem aside, data segment of the program is
250k + increased GC time for scanning of it + increased startup time
(I see ±15ms startup penalty for the program). In either case, the
issue renders Go basically useless on 386 for a lot of programs.
WDYT?

----------
From: Russ Cox <rsc@golang.org>
Date: Tue, Aug 30, 2011 at 09:02
To: Dmitry Vyukov <dvyukov@google.com>
Cc: golang-dev@googlegroups.com, ushakov@google.com


I think we need a better solution than avoiding
certain parts of the Go language.

Russ

----------
From: Ian Lance Taylor <iant@google.com>
Date: Tue, Aug 30, 2011 at 09:21
To: Dmitry Vyukov <dvyukov@google.com>
Cc: rsc@golang.org, golang-dev@googlegroups.com, ushakov@google.com


I'm not Russ but I think this should just be fixed in the
compiler/linker.  In gccgo I don't scan the entire data segment for
pointers.  Instead, at program startup time all global variables which
may contain pointers are registered, and the GC scans those.  This
approach is not optimal, but it does avoid this problem, since of course
a variable of type [70]float need not be registered.

Using 6l/8l we can do better.  There is no implied ordering of global
variables, so the linker can order all the global variables which may
contain pointers together, and define symbols to mark the start and end
of this memory area.  Then we change mark in runtime/mgc0.c to use those
symbols rather than scanning the entire data segment (minus mheap) as it
does today.

Ian

----------
From: Dmitry Vyukov <dvyukov@google.com>
Date: Tue, Aug 30, 2011 at 10:45
To: rsc@golang.org
Cc: golang-dev@googlegroups.com, ushakov@google.com


Nothing to object to.

----------
From: Dmitry Vyukov <dvyukov@google.com>
Date: Tue, Aug 30, 2011 at 10:53
To: Ian Lance Taylor <iant@google.com>
Cc: rsc@golang.org, golang-dev@googlegroups.com, ushakov@google.com


If we move in the direction of preciser GC, for heap objects we can use the GC bitmap in
some way (per-word bitNoPointers). But that still does not solve the problem for static
data (it is not covered by the bitmap), so we need a separate solution for static data
in either case. Initially I was thinking about a separate data section, but it's
essentially the same as yours. Both are beyond my capabilities.
@rsc
Copy link
Contributor Author

rsc commented Dec 9, 2011

Comment 1:

Labels changed: added priority-later.

@gopherbot
Copy link

gopherbot commented Jun 24, 2012

Comment 3:

I updated source code from the conversation so that it compiles with the tip of Go
repository (attachment: issue2297.go). 6114046 (http://golang.org/cl/6114046/)
provides a complete fix for this issue:
# Tip
./a-tip -mem=800 -size=1048576
68.46% pinned (548 MiB)
# Tip patched with 6114046
./a-newgc -mem=800 -size=1048576
0.13% pinned (1 MiB)

Attachments:

  1. issue2297.go (2958 bytes)

@rsc
Copy link
Contributor Author

rsc commented Sep 12, 2012

Comment 4:

Labels changed: added go1.1.

@rsc
Copy link
Contributor Author

rsc commented Dec 10, 2012

Comment 6:

Labels changed: added size-xl.

@rsc
Copy link
Contributor Author

rsc commented Mar 11, 2013

Comment 7:

Didn't have time.

Labels changed: added go1.2, removed go1.1.

@rsc
Copy link
Contributor Author

rsc commented Jul 30, 2013

Comment 8:

Labels changed: added feature.

@dvyukov
Copy link
Member

dvyukov commented Jul 31, 2013

Comment 9:

Here is updated program:
http://play.golang.org/p/Hy1H0ZXHC9
$ GOARCH=386 go run /tmp/gc.go -mem=1200 -size=1048576
0.08% pinned (1)
1MB wasted
It is fixed by precise GC data for globals.

Status changed to Fixed.

@rsc rsc added fixed labels Jul 31, 2013
@rsc rsc self-assigned this Jul 31, 2013
@rsc rsc added this to the Go1.2 milestone Apr 14, 2015
@rsc rsc removed the go1.2 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants