Skip to content

Commit

Permalink
[dev.typeparams] cmd/compile: unified IR construction
Browse files Browse the repository at this point in the history
This CL adds a new unified IR construction mode to the frontend.  It's
purely additive, and all files include "UNREVIEWED" at the top, like
how types2 was initially imported. The next CL adds a -d=unified flag
to actually enable unified IR mode.

See below for more details, but some highlights:

1. It adds ~6kloc (excluding enum listings and stringer output), but I
estimate it will allow removing ~14kloc (see CL 324670, including its
commit message);

2. When enabled by default, it passes more tests than -G=3 does (see
CL 325213 and CL 324673);

3. Without requiring any new code, it supports inlining of more code
than the current inliner (see CL 324574; contrast CL 283112 and CL
266203, which added support for inlining function literals and type
switches, respectively);

4. Aside from dictionaries (which I intend to add still), its support
for generics is more complete (e.g., it fully supports local types,
including local generic types within generic functions and
instantiating generic types with local types; see
test/typeparam/nested.go);

5. It supports lazy loading of types and objects for types2 type
checking;

6. It supports re-exporting of types, objects, and inline bodies
without needing to parse them into IR;

7. The new export data format has extensive support for debugging with
"sync" markers, so mistakes during development are easier to catch;

8. When compiling with -d=inlfuncswithclosures=0, it enables "quirks
mode" where it generates output that passes toolstash -cmp.

--

The new unified IR pipeline combines noding, stenciling, inlining, and
import/export into a single, shared code path. Previously, IR trees
went through multiple phases of copying during compilation:

1. "Noding": the syntax AST is copied into the initial IR form. To
support generics, there's now also "irgen", which implements the same
idea, but takes advantage of types2 type-checking results to more
directly construct IR.

2. "Stenciling": generic IR forms are copied into instantiated IR
forms, substituting type parameters as appropriate.

3. "Inlining": the inliner made backup copies of inlinable functions,
and then copied them again when inlining into a call site, with some
modifications (e.g., updating position information, rewriting variable
references, changing "return" statements into "goto").

4. "Importing/exporting": the exporter wrote out the IR as saved by
the inliner, and then the importer read it back as to be used by the
inliner again. Normal functions are imported/exported "desugared",
while generic functions are imported/exported in source form.

These passes are all conceptually the same thing: make a copy of a
function body, maybe with some minor changes/substitutions. However,
they're all completely separate implementations that frequently run
into the same issues because IR has many nuanced corner cases.

For example, inlining currently doesn't support local defined types,
"range" loops, or labeled "for"/"switch" statements, because these
require special handling around Sym references. We've recently
extended the inliner to support new features like inlining type
switches and function literals, and they've had issues. The exporter
only knows how to export from IR form, so when re-exporting inlinable
functions (e.g., methods on imported types that are exposed via
exported APIs), these functions may need to be imported as IR for the
sole purpose of being immediately exported back out again.

By unifying all of these modes of copying into a single code path that
cleanly separates concerns, we eliminate many of these possible
issues. Some recent examples:

1. Issues #45743 and #46472 were issues where type switches were
mishandled by inlining and stenciling, respectively; but neither of
these affected unified IR, because it constructs type switches using
the exact same code as for normal functions.

2. CL 325409 fixes an issue in stenciling with implicit conversion of
values of type-parameter type to variables of interface type, but this
issue did not affect unified IR.

Change-Id: I5a05991fe16d68bb0f712503e034cb9f2d19e296
Reviewed-on: https://go-review.googlesource.com/c/go/+/324573
Trust: Matthew Dempsky <mdempsky@google.com>
Trust: Robert Griesemer <gri@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
  • Loading branch information
mdempsky committed Jun 15, 2021
1 parent ea438bd commit 79cd168
Show file tree
Hide file tree
Showing 12 changed files with 6,164 additions and 0 deletions.
126 changes: 126 additions & 0 deletions src/cmd/compile/internal/noder/codes.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
// UNREVIEWED

// Copyright 2021 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package noder

type code interface {
marker() syncMarker
value() int
}

type codeVal int

func (c codeVal) marker() syncMarker { return syncVal }
func (c codeVal) value() int { return int(c) }

const (
valBool codeVal = iota
valString
valInt64
valBigInt
valBigRat
valBigFloat
)

type codeType int

func (c codeType) marker() syncMarker { return syncType }
func (c codeType) value() int { return int(c) }

const (
typeBasic codeType = iota
typeNamed
typePointer
typeSlice
typeArray
typeChan
typeMap
typeSignature
typeStruct
typeInterface
typeUnion
typeTypeParam
)

type codeObj int

func (c codeObj) marker() syncMarker { return syncCodeObj }
func (c codeObj) value() int { return int(c) }

const (
objAlias codeObj = iota
objConst
objType
objFunc
objVar
objStub
)

type codeStmt int

func (c codeStmt) marker() syncMarker { return syncStmt1 }
func (c codeStmt) value() int { return int(c) }

const (
stmtEnd codeStmt = iota
stmtLabel
stmtBlock
stmtExpr
stmtSend
stmtAssign
stmtAssignOp
stmtIncDec
stmtBranch
stmtCall
stmtReturn
stmtIf
stmtFor
stmtSwitch
stmtSelect

// TODO(mdempsky): Remove after we don't care about toolstash -cmp.
stmtTypeDeclHack
)

type codeExpr int

func (c codeExpr) marker() syncMarker { return syncExpr }
func (c codeExpr) value() int { return int(c) }

// TODO(mdempsky): Split expr into addr, for lvalues.
const (
exprNone codeExpr = iota
exprConst
exprType // type expression
exprLocal // local variable
exprName // global variable or function
exprBlank
exprCompLit
exprFuncLit
exprSelector
exprIndex
exprSlice
exprAssert
exprUnaryOp
exprBinaryOp
exprCall

// TODO(mdempsky): Handle in switchStmt directly instead.
exprTypeSwitchGuard
)

type codeDecl int

func (c codeDecl) marker() syncMarker { return syncDecl }
func (c codeDecl) value() int { return int(c) }

const (
declEnd codeDecl = iota
declFunc
declMethod
declVar
declOther
)
243 changes: 243 additions & 0 deletions src/cmd/compile/internal/noder/decoder.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
// UNREVIEWED

// Copyright 2021 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package noder

import (
"encoding/binary"
"fmt"
"go/constant"
"go/token"
"math/big"
"os"
"strings"

"cmd/compile/internal/base"
)

type pkgDecoder struct {
pkgPath string

elemEndsEnds [numRelocs]uint32
elemEnds []uint32
elemData string
}

func newPkgDecoder(pkgPath, input string) pkgDecoder {
pr := pkgDecoder{
pkgPath: pkgPath,
}

// TODO(mdempsky): Implement direct indexing of input string to
// avoid copying the position information.

r := strings.NewReader(input)

assert(binary.Read(r, binary.LittleEndian, pr.elemEndsEnds[:]) == nil)

pr.elemEnds = make([]uint32, pr.elemEndsEnds[len(pr.elemEndsEnds)-1])
assert(binary.Read(r, binary.LittleEndian, pr.elemEnds[:]) == nil)

pos, err := r.Seek(0, os.SEEK_CUR)
assert(err == nil)

pr.elemData = input[pos:]
assert(len(pr.elemData) == int(pr.elemEnds[len(pr.elemEnds)-1]))

return pr
}

func (pr *pkgDecoder) numElems(k reloc) int {
count := int(pr.elemEndsEnds[k])
if k > 0 {
count -= int(pr.elemEndsEnds[k-1])
}
return count
}

func (pr *pkgDecoder) totalElems() int {
return len(pr.elemEnds)
}

func (pr *pkgDecoder) absIdx(k reloc, idx int) int {
absIdx := idx
if k > 0 {
absIdx += int(pr.elemEndsEnds[k-1])
}
if absIdx >= int(pr.elemEndsEnds[k]) {
base.Fatalf("%v:%v is out of bounds; %v", k, idx, pr.elemEndsEnds)
}
return absIdx
}

func (pr *pkgDecoder) dataIdx(k reloc, idx int) string {
absIdx := pr.absIdx(k, idx)

var start uint32
if absIdx > 0 {
start = pr.elemEnds[absIdx-1]
}
end := pr.elemEnds[absIdx]

return pr.elemData[start:end]
}

func (pr *pkgDecoder) stringIdx(idx int) string {
return pr.dataIdx(relocString, idx)
}

func (pr *pkgDecoder) newDecoder(k reloc, idx int, marker syncMarker) decoder {
r := pr.newDecoderRaw(k, idx)
r.sync(marker)
return r
}

func (pr *pkgDecoder) newDecoderRaw(k reloc, idx int) decoder {
r := decoder{
common: pr,
k: k,
idx: idx,
}

// TODO(mdempsky) r.data.Reset(...) after #44505 is resolved.
r.data = *strings.NewReader(pr.dataIdx(k, idx))

r.sync(syncRelocs)
r.relocs = make([]relocEnt, r.len())
for i := range r.relocs {
r.sync(syncReloc)
r.relocs[i] = relocEnt{reloc(r.len()), r.len()}
}

return r
}

type decoder struct {
common *pkgDecoder

relocs []relocEnt
data strings.Reader

k reloc
idx int
}

func (r *decoder) checkErr(err error) {
if err != nil {
base.Fatalf("unexpected error: %v", err)
}
}

func (r *decoder) sync(m syncMarker) {
if debug {
pos, err0 := r.data.Seek(0, os.SEEK_CUR)
x, err := r.data.ReadByte()
r.checkErr(err)
if x != byte(m) {
// TODO(mdempsky): Revisit this error message, and make it more
// useful (e.g., include r.p.pkgPath).
base.Fatalf("data sync error: found %v at %v (%v) in (%v:%v), but expected %v", syncMarker(x), pos, err0, r.k, r.idx, m)
}
}
}

func (r *decoder) bool() bool {
r.sync(syncBool)
x, err := r.data.ReadByte()
r.checkErr(err)
assert(x < 2)
return x != 0
}

func (r *decoder) int64() int64 {
r.sync(syncInt64)
x, err := binary.ReadVarint(&r.data)
r.checkErr(err)
return x
}

func (r *decoder) uint64() uint64 {
r.sync(syncUint64)
x, err := binary.ReadUvarint(&r.data)
r.checkErr(err)
return x
}

func (r *decoder) len() int { x := r.uint64(); v := int(x); assert(uint64(v) == x); return v }
func (r *decoder) int() int { x := r.int64(); v := int(x); assert(int64(v) == x); return v }
func (r *decoder) uint() uint { x := r.uint64(); v := uint(x); assert(uint64(v) == x); return v }

func (r *decoder) code(mark syncMarker) int {
r.sync(mark)
return r.len()
}

func (r *decoder) reloc(k reloc) int {
r.sync(syncUseReloc)
idx := r.len()

e := r.relocs[idx]
assert(e.kind == k)
return e.idx
}

func (r *decoder) string() string {
r.sync(syncString)
return r.common.stringIdx(r.reloc(relocString))
}

func (r *decoder) strings() []string {
res := make([]string, r.len())
for i := range res {
res[i] = r.string()
}
return res
}

func (r *decoder) rawValue() constant.Value {
isComplex := r.bool()
val := r.scalar()
if isComplex {
val = constant.BinaryOp(val, token.ADD, constant.MakeImag(r.scalar()))
}
return val
}

func (r *decoder) scalar() constant.Value {
switch tag := codeVal(r.code(syncVal)); tag {
default:
panic(fmt.Sprintf("unexpected scalar tag: %v", tag))

case valBool:
return constant.MakeBool(r.bool())
case valString:
return constant.MakeString(r.string())
case valInt64:
return constant.MakeInt64(r.int64())
case valBigInt:
return constant.Make(r.bigInt())
case valBigRat:
num := r.bigInt()
denom := r.bigInt()
return constant.Make(new(big.Rat).SetFrac(num, denom))
case valBigFloat:
return constant.Make(r.bigFloat())
}
}

func (r *decoder) bigInt() *big.Int {
v := new(big.Int).SetBytes([]byte(r.string()))
if r.bool() {
v.Neg(v)
}
return v
}

func (r *decoder) bigFloat() *big.Float {
v := new(big.Float).SetPrec(512)
assert(v.UnmarshalText([]byte(r.string())) == nil)
return v
}
Loading

0 comments on commit 79cd168

Please sign in to comment.