Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: language: lazy init imports to possibly import without side effects #38450

Open
bradfitz opened this issue Apr 14, 2020 · 9 comments
Open

Comments

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Apr 14, 2020

I've been doing a bunch of work on binary size reduction and a common problem that comes up is that the linker's dead code elimination can remove all the references to a package, but the one remaining reference is to the package's inittask symbol, which is then heavy & brings it a bunch more.

That is:

package main
import "crypto/x509"
func main() {
    if false {
        NewCertPool()
    }
}

Is equivalent today to:

package main
import _ "crypto/x509"
func main() {}

... which slurps in a ton of packages and their init load and extra 1 MiB binary size increase, from 1.3 MiB to 2.3 MiB:

dev:big $ go build --ldflags=--dumpdep x.go 2>&1 | grep inittask
runtime.main -> runtime..inittask
runtime.main -> main..inittask
main..inittask -> crypto/x509..inittask
crypto/x509..inittask -> encoding/pem..inittask
crypto/x509..inittask -> errors..inittask
crypto/x509..inittask -> crypto/aes..inittask
crypto/x509..inittask -> crypto/cipher..inittask
crypto/x509..inittask -> crypto/des..inittask
crypto/x509..inittask -> crypto/md5..inittask
crypto/x509..inittask -> encoding/hex..inittask
crypto/x509..inittask -> io..inittask
crypto/x509..inittask -> strings..inittask
crypto/x509..inittask -> sync..inittask
crypto/x509..inittask -> crypto/rsa..inittask
crypto/x509..inittask -> encoding/asn1..inittask
crypto/x509..inittask -> math/big..inittask
crypto/x509..inittask -> crypto/ecdsa..inittask
crypto/x509..inittask -> crypto/ed25519..inittask
crypto/x509..inittask -> crypto/x509/pkix..inittask
crypto/x509..inittask -> fmt..inittask
crypto/x509..inittask -> io/ioutil..inittask
crypto/x509..inittask -> os..inittask
crypto/x509..inittask -> crypto/elliptic..inittask
crypto/x509..inittask -> bytes..inittask
crypto/x509..inittask -> net..inittask
crypto/x509..inittask -> net/url..inittask
crypto/x509..inittask -> reflect..inittask
crypto/x509..inittask -> time..inittask
crypto/x509..inittask -> crypto..inittask
crypto/x509..inittask -> crypto/dsa..inittask
crypto/x509..inittask -> crypto/sha1..inittask
crypto/x509..inittask -> crypto/sha256..inittask
crypto/x509..inittask -> crypto/sha512..inittask
crypto/x509..inittask -> strconv..inittask
crypto/x509..inittask -> vendor/golang.org/x/crypto/cryptobyte..inittask
crypto/x509..inittask -> crypto/x509.init
encoding/pem..inittask -> encoding/base64..inittask
encoding/pem..inittask -> sort..inittask
encoding/pem..inittask -> encoding/base64..inittask
encoding/pem..inittask -> sort..inittask
encoding/pem..inittask -> encoding/base64..inittask
encoding/pem..inittask -> sort..inittask
errors..inittask -> internal/reflectlite..inittask
errors..inittask -> errors.init
runtime..inittask -> internal/bytealg..inittask
runtime..inittask -> runtime.init
runtime..inittask -> runtime.init.0
runtime..inittask -> runtime.init.3
runtime..inittask -> runtime.init.4
runtime..inittask -> runtime.init.5
runtime..inittask -> runtime.init.6
crypto/aes..inittask -> encoding/binary..inittask
crypto/aes..inittask -> crypto/aes.init
crypto/cipher..inittask -> crypto/cipher.init
crypto/md5..inittask -> hash..inittask
crypto/md5..inittask -> crypto/md5.init.0
encoding/hex..inittask -> encoding/hex.init
io..inittask -> io.init
strings..inittask -> unicode..inittask
sync..inittask -> sync.init
sync..inittask -> sync.init.0
sync..inittask -> sync.init.1
crypto/rsa..inittask -> crypto/internal/randutil..inittask
crypto/rsa..inittask -> crypto/rand..inittask
crypto/rsa..inittask -> math..inittask
crypto/rsa..inittask -> crypto/rsa.init
encoding/asn1..inittask -> encoding/asn1.init
math/big..inittask -> math/rand..inittask
math/big..inittask -> math/big.init
crypto/ecdsa..inittask -> crypto/ecdsa.init
crypto/ed25519..inittask -> crypto/ed25519/internal/edwards25519..inittask
crypto/x509/pkix..inittask -> crypto/x509/pkix.init
fmt..inittask -> internal/fmtsort..inittask
fmt..inittask -> fmt.init
io/ioutil..inittask -> path/filepath..inittask
io/ioutil..inittask -> io/ioutil.init
os..inittask -> syscall..inittask
os..inittask -> internal/oserror..inittask
os..inittask -> internal/poll..inittask
os..inittask -> internal/syscall/execenv..inittask
os..inittask -> internal/syscall/unix..inittask
os..inittask -> os.init
os..inittask -> os.init.0
bytes..inittask -> bytes.init
net..inittask -> context..inittask
net..inittask -> vendor/golang.org/x/net/dns/dnsmessage..inittask
net..inittask -> internal/singleflight..inittask
net..inittask -> net.init
net..inittask -> net.init.0
reflect..inittask -> reflect.init
time..inittask -> time.init
crypto..inittask -> crypto.init
crypto/dsa..inittask -> crypto/dsa.init
crypto/sha1..inittask -> crypto/sha1.init
crypto/sha1..inittask -> crypto/sha1.init.0
crypto/sha256..inittask -> crypto/sha256.init
crypto/sha256..inittask -> crypto/sha256.init.0
crypto/sha512..inittask -> crypto/sha512.init
crypto/sha512..inittask -> crypto/sha512.init.0
strconv..inittask -> strconv.init
vendor/golang.org/x/crypto/cryptobyte..inittask -> vendor/golang.org/x/crypto/cryptobyte.init
encoding/base64..inittask -> encoding/base64.init
internal/bytealg..inittask -> internal/bytealg.init.0
encoding/binary..inittask -> encoding/binary.init
unicode..inittask -> unicode.init
crypto/rand..inittask -> bufio..inittask
crypto/rand..inittask -> crypto/rand.init
crypto/rand..inittask -> crypto/rand.init.0
crypto/rand..inittask -> crypto/rand.init.1
crypto/rand..inittask -> crypto/rand.init.2
math..inittask -> math.init
math/rand..inittask -> math/rand.init
path/filepath..inittask -> path/filepath.init
syscall..inittask -> syscall.init
internal/oserror..inittask -> internal/oserror.init
internal/poll..inittask -> internal/poll.init
context..inittask -> context.init
context..inittask -> context.init.0
vendor/golang.org/x/net/dns/dnsmessage..inittask -> vendor/golang.org/x/net/dns/dnsmessage.init
bufio..inittask -> bufio.init

The only way to get around that is with build tags and more conditional compilation, which is gross.

What I'd like instead is a way to declare to the toolchain that for a given imported package that I'm fine with that package's init-time side effects (like normal) if I need them, but I'm also cool with omitting them if the linker decides that's fine.

That is, I want something like this this strawman syntax:

package main

import (
      "crypto/x509" // go:lazyinit
      "fmt"
)

func main() {
    if false {
        NewCertPool()
    }
    fmt.Println("hi")
}

... so the x509 init (nor its deps) is never run if the toolchain's DCE doesn't want to.

The go:lazyinit is saying that I'm not depending on any init-time work happening there for the import on that line. (Or perhaps it should be on the line before to be consistent with other //go: comments, or it shouldn't use comments)

(Arguably this should be the default behavior and imports with side effects would need the declaration, but we can't for backwards compatibility anyway, so not worth considering.)

/cc @ianlancetaylor @griesemer

@gopherbot gopherbot added this to the Proposal milestone Apr 14, 2020
@gopherbot gopherbot added the Proposal label Apr 14, 2020
@bradfitz bradfitz changed the title proposal: language: lazy init imports to possibly import with side effects proposal: language: lazy init imports to possibly import without side effects Apr 14, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 14, 2020

Would it be correct to say that you mean something like: if there are no references to any exported names of this package, then there is no need to run any initializers?

I certainly understand that the current situation is frustrating, but it seems possible to implement this entirely in the compiler without changing the language. Basically we need to separate the initialization of each global variable into a separate init function. We put those init functions in the deadcode graph along with everything else, with a reference from the variable to the init function. The package init function then calls only the undiscarded init functions.

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Apr 15, 2020

Would it be correct to say that you mean something like: if there are no references to any exported names of this package, then there is no need to run any initializers?

Yes.

but it seems possible to implement this entirely in the compiler without changing the language

In many cases, yes, but not all. For one example that's only partially contrived:

https://play.golang.org/p/2GLltT92h1t

You can't do that automatically without changing language semantics.

Or you can't get rid of imports to net/http/pprof (even if the caller is only using its exported symbols) because of its side effects of registering itself with http.DefaultServeMux, even if you don't use that.

But, yes, if the various bugs like #19533 #14840 were fixed, it's unlikely I'd be filing this.

I just think such toolchain magic continues to be years away and I wonder if a more explicit mechanism has a better chance of being implemented.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 15, 2020

How do you end up with those kinds of if false statements in production code in general?

(Are these due to debugging constants, or paths rendered unreachable due to build tags?)

@rsc
Copy link
Contributor

@rsc rsc commented Apr 15, 2020

It seems like it would make sense to start with first handling the cases that don't require changes to language semantics, and then evaluate what's left.

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Apr 15, 2020

How do you end up with those kinds of if false statements in production code in general?

They arise in a dozen different ways. That was just an example for brevity. Sometimes build tags, sometimes GOOS comparisons, sometimes just statically unreachable code.

@rasky
Copy link
Member

@rasky rasky commented Apr 15, 2020

I have a very initial implementation for #19533 here:
rasky@50db447

This splits generated init functions into separate symbols (using a relocation to bind them to the symbols they initialize), and the linker deadcode pass is then able to remove them in "some cases". In fact, it's not sufficient for the linker to see that the symbol is not used: it also needs to know that the init function itself is "side-effect free". For instance:

var x int = FunctionCall()

might not be side-effect free, depending on the body of FunctionCall. So, in addition to what it's done above, we also need a pass that does "side effect" detection in the compiler: we need to analyze all functions in call graph order (similar to escape detection) and check whether it's function is side-effect free. This flag must also be stored in export data, so that we're able to cross package boundaries.

By the way, this also require to agree on the exact definition of "side-effect free"; for instance, many things are visible through a debugger. os.Getenv might possibly be considered side-effect free for the purpose of removing a init function, but it's visible with strace/ptrace.

It would also be possible to have a compiler annotation for std functions which are commonly used in init functions, and we can't otherwise prove that are side effects free (even in the best possible world where we have a fully working side effect detection pass, they might still end up calling a syscall which needs to be somehow annotated anyway).

So yes, this is actually a complicated issue that requires many development hours. It's possibly a task bigger than I can afford in my spare contribution time, so I'm not sure I'll get around completing it.

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Apr 15, 2020

@rsc, here's a more concrete example distilled from above.

crypto/x509 imports crypto/md5, but only because the never-used-in-std x509.EncryptPEMBlock might use it.

If you don't use EncryptPEMBlock, the linker can GC everything about md5, except the crypto/x509..inittask -> crypto/md5..inittask edge, which then brings in all of md5 forever because:

https://golang.org/src/crypto/md5/md5.go#L21

func init() {
	crypto.RegisterHash(crypto.MD5, New)
}

What I'd like to do from crypto/x509 is do a lazy/weak/conditional import of crypto/md5, as we don't really care about that that crypto.RegisterHash side effect.

In fact, the only use of https://golang.org/pkg/crypto/#Hash.Available in x509 is:

switch hashType {
case crypto.Hash(0):
        if pubKeyAlgo != Ed25519 {
                return ErrUnsupportedAlgorithm
        }
case crypto.MD5:
        return InsecureAlgorithmError(algo)
default:
        if !hashType.Available() {
                return ErrUnsupportedAlgorithm
        }
        h := hashType.New()
        h.Write(signed)
        signed = h.Sum(nil)
}

... where it only uses the crypto.Hash registration mechanism after it's determined that it's not MD5.

@magical
Copy link
Contributor

@magical magical commented Apr 22, 2020

What I'd like to do from crypto/x509 is do a lazy/weak/conditional import of crypto/md5, as we don't really care about that that crypto.RegisterHash side effect.

Are side effects from indirectly-imported packages covered by the Go 1 compatibility promise? The following code technically works today...

https://play.golang.org/p/lw5P2Vb5NXH

package main

import (
	"crypto"
	"fmt"

	_ "crypto/x509"
)

func main() {
	h := crypto.MD5.New()
	fmt.Println(h.Sum(nil))
}
@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Apr 25, 2020

Are side effects from indirectly-imported packages covered by the Go 1 compatibility promise?

No clear rule either way. I'm sure we've even already broken it in the past without knowing since the fix is so easy for people. It's not something we automatically test for at least. We'd definitely avoid trying to break people if possible.

@bradfitz bradfitz added the binary-size label May 1, 2020
bradfitz added a commit to tailscale/go that referenced this issue Jun 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.