This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Invalid runtime symbol while bolting go binary #154

Closed
nmdis1999 opened this issue May 12, 2021 · 14 comments
@nmdis1999

nmdis1999 commented May 12, 2021

Hello, I was trying to run BOLT on a Go binary and I am seeing a very weird message. I followed these steps:

$ export CGO_CFLAGS="-fno-reorder-blocks-and-partition -Wl,--emit-relocs"

$ go build -ldflags="-linkmode external -extldflags '-fno-reorder-blocks-and-partition -Wl,--emit-relocs'" hello.go

$ perf record -b ./hello

$ perf2bolt -p perf.data -o perf.fdata ./hello

message from perf2bolt:

PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read branch events
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: f137ed238db11440f03083b1c88b7ffc0f4af65e
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x600000, offset 0x200000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling strict relocation mode for aggregation purposes
BOLT-WARNING: non-standard function reference (e.g. bitmask) detected against function crosscall2 from data section at 0x4d3cb0
BOLT-WARNING: non-standard function reference (e.g. bitmask) detected against function crosscall2 from data section at 0x547f98
BOLT-INFO: pre-processing profile using perf data aggregator
BOLT-INFO: binary build-id is:     cbacf7ea82ee539d7d0574774bcba4936d52205b
PERF2BOLT: spawning perf job to read buildid list
PERF2BOLT: matched build-id and file name
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 1 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: parse branch events...
PERF2BOLT: read 73 samples and 328 LBR entries
PERF2BOLT: 0 samples (0.0%) were ignored
PERF2BOLT: traces mismatching disassembled function contents: 0 (0.0%)
PERF2BOLT: out of range traces involving unknown regions: 12 (26.7%)
PERF2BOLT: processing branch events...
PERF2BOLT: wrote 24 objects and 0 memory objects to perf.fdata
$ llvm-bolt hello -o hello.bolt -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort -split-functions=2 -split-all-cold -split-eh -dyno-stats
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: f137ed238db11440f03083b1c88b7ffc0f4af65e
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x600000, offset 0x200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-WARNING: non-standard function reference (e.g. bitmask) detected against function crosscall2 from data section at 0x4d3cb0
BOLT-WARNING: non-standard function reference (e.g. bitmask) detected against function crosscall2 from data section at 0x547f98
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: 5 out of 1840 functions in the binary (0.3%) have non-empty execution profile
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: the input contains 2 (dynamic count : 1) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: basic block reordering modified layout of 4 (0.21%) functions
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: splitting separates 539 hot bytes from 1278 cold bytes (29.66% of split functions is hot).
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

                  52 : executed forward branches
                   7 : taken forward branches
                  22 : executed backward branches
                  20 : taken backward branches
                   1 : executed unconditional branches
                   9 : all function calls
                   2 : indirect calls
                   1 : PLT calls
                 362 : executed instructions
                  86 : executed load instructions
                  23 : executed store instructions
                   0 : taken jump table branches
                   0 : taken unknown indirect branches
                  75 : total branches
                  28 : taken branches
                  47 : non-taken conditional branches
                  27 : taken conditional branches
                  74 : all conditional branches

                  54 : executed forward branches (+3.8%)
                   0 : taken forward branches (-100.0%)
                  20 : executed backward branches (-9.1%)
                  18 : taken backward branches (-10.0%)
                   1 : executed unconditional branches (=)
                   9 : all function calls (=)
                   2 : indirect calls (=)
                   1 : PLT calls (=)
                 365 : executed instructions (+0.8%)
                  86 : executed load instructions (=)
                  23 : executed store instructions (=)
                   0 : taken jump table branches (=)
                   0 : taken unknown indirect branches (=)
                  75 : total branches (=)
                  19 : taken branches (-32.1%)
                  56 : non-taken conditional branches (+19.1%)
                  18 : taken conditional branches (-33.3%)
                  74 : all conditional branches (=)

BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: padding code to 0xa00000 to accommodate hot text
BOLT-INFO: setting _end to 0x54f9dc
BOLT-INFO: setting _end to 0x54f9dc
BOLT-INFO: setting __hot_start to 0x800000
BOLT-INFO: setting __hot_end to 0x80022f
BOLT-INFO: patched build-id (flipped last bit)

Error after running the executable

$ ./hello.bolt
function symbol table not sorted by program counter: 0x8000dc runtime.(*mTreap).removeNode > 0x800056 runtime.(*mTreap).remove
	0x402300 internal/cpu.initialize
	0x402350 internal/cpu.processOptions
	0x402580 internal/cpu.indexByte
	0x4025c0 internal/cpu.doinit
	0x4029e0 internal/cpu.cpuid
	0x402a00 internal/cpu.xgetbv
	0x402a20 type..hash.internal/cpu.arm64
	0x402a80 type..eq.internal/cpu.arm64
	0x402ae0 type..hash.internal/cpu.option
	0x402b60 type..eq.internal/cpu.option
	0x402bf0 type..hash.internal/cpu.x86
	0x402c50 type..eq.internal/cpu.x86
	0x402cb0 type..hash.[15]internal/cpu.option
	0x402d20 type..eq.[15]internal/cpu.option
	0x402df0 sync/atomic.(*Value).Store
	0x402f30 sync/atomic.CompareAndSwapUintptr
	0x402f40 sync/atomic.StoreUint32
	0x402f50 sync/atomic.StoreUintptr
	0x402f60 runtime/internal/atomic.Cas64
	0x402f80 runtime/internal/atomic.Casuintptr
	0x402f90 runtime/internal/atomic.Storeuintptr
	0x402fa0 runtime/internal/atomic.Store
	0x402fb0 runtime/internal/atomic.Store64
	0x402fc0 internal/bytealg.init.0
	0x402ff0 internal/bytealg.init
	0x403050 runtime.cmpstring
	0x403070 cmpbody
	0x4032b0 runtime.memequal
	0x4032e0 runtime.memequal_varlen
	0x403310 memeqbody
	0x403450 internal/bytealg.IndexByteString
	0x403470 indexbytebody
	0x403590 runtime.memhash0
	0x4035a0 runtime.memhash8
	0x403600 runtime.memhash16
	0x403660 runtime.memhash128
	0x4036c0 runtime.memhash_varlen
	0x403710 runtime.strhash
	0x403770 runtime.f32hash
	0x403890 runtime.f64hash
	0x4039b0 runtime.c64hash
	0x403a20 runtime.c128hash
	0x403a90 runtime.interhash
	0x403c10 runtime.nilinterhash
	0x403d90 runtime.memequal0
	0x403da0 runtime.memequal8
	0x403dc0 runtime.memequal16
	0x403de0 runtime.memequal32
	0x403e00 runtime.memequal64
	0x403e20 runtime.memequal128
	0x403e50 runtime.f32equal
	0x403e80 runtime.f64equal
	0x403eb0 runtime.c64equal
	0x403ef0 runtime.c128equal
	0x403f30 runtime.strequal
	0x403fa0 runtime.interequal
	0x404010 runtime.nilinterequal
	0x404080 runtime.efaceeq
	0x4041c0 runtime.ifaceeq
	0x404300 runtime.alginit
	0x4043e0 runtime.initAlgAES
	0x4044b0 runtime.atomicwb
	0x404540 runtime.atomicstorep
	0x404590 runtime.casp
	0x4045f0 sync/atomic.StorePointer
	0x404650 sync/atomic.CompareAndSwapPointer
	0x4046d0 runtime.mmap
	0x404840 runtime.munmap
	0x4048e0 runtime.sigaction
	0x404a40 runtime.cgocall
	0x404b30 runtime.cgocallbackg
	0x404ca0 runtime.cgocallbackg1
	0x404f10 runtime.unwindm
	0x404fd0 runtime.cgoIsGoPointer
	0x4050c0 runtime._cgo_panic_internal
	0x405140 runtime.cgoCheckWriteBarrier
	0x405220 runtime.cgoCheckMemmove
	0x4052c0 runtime.cgoCheckSliceCopy
	0x4053a0 runtime.cgoCheckTypedBlock
	0x405730 runtime.cgoCheckBits
	0x405820 runtime.cgoCheckUsingType
	0x405a80 runtime.makechan
	0x405cf0 runtime.chansend1
	0x405d30 runtime.chansend
	0x406320 runtime.send
	0x406440 runtime.sendDirect
	0x4064d0 runtime.recvDirect
	0x406560 runtime.closechan
	0x4067f0 runtime.chanrecv1
	0x406830 runtime.chanrecv
	0x406ea0 runtime.recv
	0x407060 reflect.chanlen
	0x407080 runtime.(*waitq).dequeue
	0x407170 runtime.init.0
	0x4071e0 runtime.(*cpuProfile).add
	0x407300 runtime.(*cpuProfile).addNonGo
	0x4073f0 runtime.(*cpuProfile).addExtra
	0x407640 runtime.GOMAXPROCS
	0x407700 runtime.debugCallCheck
	0x4077f0 runtime.debugCallWrap
	0x4078a0 runtime.gogetenv
	0x4079d0 runtime.(*TypeAssertionError).Error
	0x407e20 runtime.errorString.Error
	0x407ea0 runtime.plainError.Error
	0x407ec0 runtime.typestring
	0x407f10 runtime.printany
	0x408700 runtime.panicwrap
	0x408b00 runtime.Caller
	0x408e00 runtime.GOROOT
	0x408e90 runtime.float64frombits
	0x408ea0 runtime.memhash
	0x4091a0 runtime.memhash32
	0x409210 runtime.memhash64
	0x409280 runtime.getitab
	0x409610 runtime.(*itabTableType).find
	0x409670 runtime.itabAdd
	0x4097c0 runtime.(*itabTableType).add
	0x409820 runtime.(*itab).init
	0x409c80 runtime.itabsinit
	0x409d70 runtime.panicdottypeE
	0x409e60 runtime.panicdottypeI
	0x409f60 runtime.convT2E
	0x409ff0 runtime.convT2E32
	0x40a070 runtime.convT2Estring
	0x40a110 runtime.convT2Eslice
	0x40a1c0 runtime.convT2I64
	0x40a240 runtime.assertE2I
	0x40a320 runtime.assertE2I2
	0x40a3d0 reflect.ifaceE2I
	0x40a460 runtime.iterate_itabs
	0x40a4e0 runtime.(*lfstack).push
	0x40a640 runtime.(*lfstack).pop
	0x40a680 runtime.lfnodeValidate
	0x40a740 runtime.lock
	0x40a8e0 runtime.unlock
	0x40a9b0 runtime.notewakeup
	0x40aa70 runtime.notesleep
	0x40ab70 runtime.notetsleep_internal
	0x40ad20 runtime.notetsleep
	0x40adb0 runtime.notetsleepg
	0x40ae50 runtime.mallocinit
	0x40b0a0 runtime.(*mheap).sysAlloc
	0x40b720 runtime.sysReserveAligned
	0x40b860 runtime.(*mcache).nextFree
	0x40bad0 runtime.mallocgc
	0x40c480 runtime.largeAlloc
	0x40c610 runtime.newobject
	0x40c670 reflect.unsafe_New
	0x40c6d0 runtime.newarray
	0x40c7c0 runtime.profilealloc
	0x40c820 runtime.nextSample
	0x40c870 runtime.fastexprand
	0x40c9f0 runtime.persistentalloc
	0x40ca90 runtime.persistentalloc1
	0x40cd70 runtime.(*linearAlloc).alloc
	0x40ce50 runtime.(*hmap).incrnoverflow
	0x40cee0 runtime.(*hmap).newoverflow
	0x40d1a0 runtime.makemap_small
	0x40d230 runtime.makemap
	0x40d430 runtime.makeBucketArray
	0x40d660 runtime.mapaccess2
	0x40d890 runtime.mapaccessK
	0x40dab0 runtime.mapassign
	0x40e040 runtime.mapiterinit
	0x40e300 runtime.mapiternext
	0x40e840 runtime.hashGrow
	0x40ea70 runtime.growWork
	0x40eb20 runtime.evacuate
	0x40f190 runtime.advanceEvacuationMark
	0x40f270 reflect.mapaccess
	0x40f2e0 reflect.mapiterinit
	0x40f350 reflect.mapiternext
	0x40f390 reflect.mapiterkey
	0x40f3a0 reflect.maplen
	0x40f3c0 runtime.mapaccess1_fast32
	0x40f560 runtime.mapaccess2_fast32
	0x40f720 runtime.mapassign_fast32
	0x40fa50 runtime.growWork_fast32
	0x40fb00 runtime.evacuate_fast32
	0x40ff20 runtime.mapaccess1_fast64
	0x4100d0 runtime.mapaccess2_fast64
	0x410290 runtime.mapassign_fast64ptr
	0x4105f0 runtime.growWork_fast64
	0x4106a0 runtime.evacuate_fast64
	0x410b50 runtime.mapassign_faststr
	0x410f60 runtime.growWork_faststr
	0x411010 runtime.evacuate_faststr
	0x411490 runtime.typedmemmove
	0x411560 reflect.typedmemmove
	0x4115b0 reflect.typedmemmovepartial
	0x4116f0 runtime.reflectcallmove
	0x411790 runtime.typedslicecopy
	0x4118d0 runtime.typedmemclr
	0x411940 reflect.typedmemclr
	0x411990 runtime.memclrHasPointers
	0x4119f0 runtime.(*mspan).refillAllocCache
	0x411a10 runtime.(*mspan).nextFreeIndex
	0x411b90 runtime.markBitsForAddr
	0x411c70 runtime.findObject
	0x412040 runtime.heapBits.nextArena
	0x4120b0 runtime.heapBits.forward
	0x412160 runtime.heapBits.forwardOrBoundary
	0x412220 runtime.bulkBarrierPreWrite
	0x412720 runtime.bulkBarrierBitmap
	0x4128e0 runtime.typeBitsBulkBarrier
	0x412bc0 runtime.heapBits.initSpan
	0x412dd0 runtime.heapBits.initCheckmarkSpan
	0x412f10 runtime.heapBits.clearCheckmarkSpan
	0x412fc0 runtime.(*mspan).countAlloc
	0x413050 runtime.heapBitsSetType
	0x413a90 runtime.heapBitsSetTypeGCProg
	0x413e00 runtime.progToPointerMask
	0x413f30 runtime.runGCProg
	0x414580 runtime.allocmcache
	0x414630 runtime.freemcache
	0x414690 runtime.(*mcache).refill
	0x4147d0 runtime.(*mcache).releaseAll
	0x414870 runtime.(*mcentral).cacheSpan
	0x414cd0 runtime.(*mcentral).uncacheSpan
	0x414df0 runtime.(*mcentral).freeSpan
	0x414f80 runtime.(*mcentral).grow
	0x415100 runtime.sysAlloc
	0x415210 runtime.sysUnused
	0x415390 runtime.sysUsed
	0x415410 runtime.sysFree
	0x415460 runtime.sysFault
	0x4154c0 runtime.sysReserve
	0x415550 runtime.sysMap
	0x415630 runtime.queuefinalizer
	0x415870 runtime.wakefing
	0x415910 runtime.createfing
	0x415980 runtime.runfinq
	0x415d90 runtime.SetFinalizer
	0x416590 runtime.(*fixalloc).alloc
	0x416700 runtime.gcinit
	0x4167d0 runtime.readgogc
	0x416880 runtime.gcenable
	0x416910 runtime/debug.setGCPercent
	0x4169e0 runtime.(*gcControllerState).startCycle
	0x416d60 runtime.(*gcControllerState).revise
	0x416e70 runtime.(*gcControllerState).endCycle
	0x417330 runtime.(*gcControllerState).enlistWorker
	0x4174b0 runtime.(*gcControllerState).findRunnableGCWorker
	0x417720 runtime.pollFractionalWorkerExit
	0x4177e0 runtime.gcSetTriggerRatio
	0x417be0 runtime.gcWaitOnMark
	0x417ca0 runtime.gcStart
	0x418230 runtime.gcMarkDone
	0x4184b0 runtime.gcMarkTermination
	0x418ff0 runtime.gcBgMarkStartWorkers
	0x4190c0 runtime.gcBgMarkWorker
	0x4195a0 runtime.gcMark
	0x419890 runtime.gcSweep
	0x419a40 runtime.gcResetMarkState
	0x419af0 sync.runtime_registerPoolCleanup
	0x419b50 runtime.clearpools
	0x419cb0 runtime.gchelper
	0x419db0 runtime.gchelperstart
	0x419e40 runtime.itoaDiv
	0x419f20 runtime.fmtNSAsMS
	0x41a0b0 runtime.(*mTreap).insert
	0x8000dc runtime.(*mTreap).removeNode
fatal error: invalid runtime symbol table
runtime: panic before malloc heap initialized

runtime stack:
runtime.throw(0x4ba94d, 0x1c)
	/usr/lib/go-1.11/src/runtime/panic.go:608 +0x72 fp=0x7ffcabc888a0 sp=0x7ffcabc88870 pc=0x428712
runtime.moduledataverify1(0x547f40)
	/usr/lib/go-1.11/src/runtime/symtab.go:587 +0x5ad fp=0x7ffcabc889a8 sp=0x7ffcabc888a0 pc=0x44285d
runtime.moduledataverify()
	/usr/lib/go-1.11/src/runtime/symtab.go:555 +0x34 fp=0x7ffcabc889c8 sp=0x7ffcabc889a8 pc=0x442284
runtime.schedinit()
	/usr/lib/go-1.11/src/runtime/proc.go:543 +0x6d fp=0x7ffcabc88a30 sp=0x7ffcabc889c8 pc=0x42b1fd
runtime.rt0_go(0x7ffcabc88b38, 0x1, 0x7ffcabc88b38, 0x0, 0x7fa78ff5a09b, 0x0, 0x7ffcabc88b38, 0x100040000, 0x450010, 0x0, ...)
	/usr/lib/go-1.11/src/runtime/asm_amd64.s:195 +0x11a fp=0x7ffcabc88a38 sp=0x7ffcabc88a30 pc=0x45013a

Not sure what's happening here, can you help?

@yota9
Contributor

yota9 commented May 12, 2021

Hello. BOLT doesn't currently support golang.

@aaupov
Contributor

aaupov commented May 12, 2021

@nmdis1999 thanks for the detailed report! BOLT should be working with golang binaries; we'll take a look.

@aaupov aaupov self-assigned this May 12, 2021
@yota9
Contributor

yota9 commented May 12, 2021

@aaupov Golang relies heavily on function pointer deltas, which BOLT doesn't support. In addition, some problems, such as the symbol table sorting, must be resolved specifically for golang binaries.
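
To make the symbol table sorting problem concrete: the stack trace in the report points at runtime.moduledataverify1, and the error line suggests it rejects a function table in which an entry's address is greater than the address of the entry that follows it. A minimal Python sketch of that invariant is below. It assumes the check simply compares adjacent entries; the helper name `first_unsorted` is hypothetical, and the three entries are copied from the crash output above.

```python
# Entries copied from the crash output in this issue (tail of the dump plus the
# successor named in the error line). Each pair is (entry address, symbol name).
FTAB = [
    (0x41A0B0, "runtime.(*mTreap).insert"),
    (0x8000DC, "runtime.(*mTreap).removeNode"),
    (0x800056, "runtime.(*mTreap).remove"),
]


def first_unsorted(ftab):
    """Return the index of the first entry whose address is greater than the
    next entry's address, or None if the table is sorted by address."""
    for i in range(len(ftab) - 1):
        if ftab[i][0] > ftab[i + 1][0]:
            return i
    return None


if __name__ == "__main__":
    bad = first_unsorted(FTAB)
    if bad is not None:
        (pc1, name1), (pc2, name2) = FTAB[bad], FTAB[bad + 1]
        print(f"not sorted by program counter: {pc1:#x} {name1} > {pc2:#x} {name2}")
```

Both offending addresses sit just above 0x800000, the value the llvm-bolt log reports for __hot_start, which is consistent with the hot functions having been moved without the table being reordered to match.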

@aaupov
Contributor

aaupov commented May 12, 2021

@yota9 Thanks for the context. Is it relying on function pointers for every function? Can we strategically blacklist e.g. runtime functions? Symbol table sorting might perhaps be worked around with -use-old-text, but let me see.

@yota9
Contributor

yota9 commented May 12, 2021

@aaupov It is hard for me to say right now whether it applies to every function, but it does at least for interface functions, so it may not be limited to the runtime. -use-old-text won't help, but you can save the list of symbols using nm :) However, the function sizes will change, so the deltas would be broken anyway. That is still not the biggest issue: there are a few static tables belonging to each function that record offsets from the beginning of the function to specific instructions. For example, the pcsp table records the offsets of the SP-changing instructions. That is the real challenge here :)
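
A rough illustration of why such per-function offset tables break when code moves inside a function. This is only a sketch under simplifying assumptions: the table layout, the offsets, and the helper names (`sp_delta`, `shift_table`) are made up for illustration and do not reflect the real encoding of the Go pcsp table.

```python
import bisect

# Hypothetical, simplified stand-in for a pcsp-style table. Each entry means:
# "from this byte offset within the function onward, the stack-pointer delta is V".
PCSP = [(0, 0), (4, 24), (40, 0)]  # made-up offsets and values


def sp_delta(table, offset):
    """Look up the stack-pointer delta that applies at a byte offset."""
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, offset) - 1
    return table[i][1]


def shift_table(table, insert_at, nbytes):
    """Rebuild the table after nbytes of code are inserted at offset insert_at:
    every range that starts at or after the insertion point moves by nbytes."""
    return [(s + nbytes if s >= insert_at else s, v) for s, v in table]


if __name__ == "__main__":
    # Suppose a rewriter inserts 3 bytes at offset 2, before the SP change.
    updated = shift_table(PCSP, insert_at=2, nbytes=3)
    # At offset 5 the SP change has not happened yet in the new layout,
    # but the stale table still claims it has.
    print("stale table:  ", sp_delta(PCSP, 5))
    print("updated table:", sp_delta(updated, 5))
```

A rewriter that changes instruction offsets would have to regenerate every such table for every function it touches, which is the kind of work described above.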

@maksfb
Contributor

maksfb commented May 12, 2021

@yota9 That's good to know.

We can disable function reordering by removing the -reorder-functions= option, or perhaps even by enforcing -relocs=0. Still, I don't think it will solve all issues.

While we have had some limited success rewriting golang binaries, it appears to depend on the specific version of the runtime and the options used. Overall, you are correct in stating "BOLT doesn't support golang". The generated code and runtime depend heavily on function and instruction locations. For proper support, BOLT will need to understand all these dependencies and update them accordingly.
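
One way to picture the location dependency is the function pointer deltas mentioned earlier in the thread. The sketch below assumes a delta is a precomputed difference between two code addresses stored in a data section; the addresses, names, and the `resolve` helper are hypothetical and chosen only for illustration.

```python
def resolve(base_addr, stored_delta):
    """Recover a target address the way a delta-based table would: base + delta."""
    return base_addr + stored_delta


# Hypothetical layout before rewriting.
OLD = {"caller": 0x401000, "target": 0x401200}

# Delta baked into a data section when the binary was linked.
STORED_DELTA = OLD["target"] - OLD["caller"]

# Hypothetical layout after rewriting: only "target" was moved.
NEW = {"caller": 0x401000, "target": 0x800000}

if __name__ == "__main__":
    # Before rewriting, base + delta lands on the target.
    print(hex(resolve(OLD["caller"], STORED_DELTA)))
    # After rewriting, the same stored delta still points at the old location.
    print(hex(resolve(NEW["caller"], STORED_DELTA)), "vs", hex(NEW["target"]))
```

Unless the rewriter knows where each stored delta lives and what it refers to, it cannot patch the value after moving code.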

@nmdis1999
Author

Can we expect BOLT to support golang in the future?

@aaupov
Contributor

aaupov commented May 13, 2021

@nmdis1999: currently we don't have specific plans on supporting golang binaries.
Our more immediate plans include upstreaming BOLT to LLVM umbrella, split function support (default since GCC8) with stripped binaries (default in Linux distros), and improving support for Rust code.

@thomasdullien

Hey @aaupov cool, thanks for the clarification. @maksfb / @yota9 -- is there some stuff I can read to understand better what assumptions Go breaks to make it interact poorly with Bolt? :)

@yota9
Contributor

yota9 commented May 14, 2021

@thomasdullien Hello! I don't think so. Since the golang runtime is mostly maintained by Google, it has poor documentation, and the only way I know is to read the runtime code and the objdump output of the binary :)

@Sameeranjoshi
Contributor

> @nmdis1999: currently we don't have specific plans on supporting golang binaries.
> Our more immediate plans include upstreaming BOLT to LLVM umbrella, split function support (default since GCC8) with stripped binaries (default in Linux distros), and improving support for Rust code.

Hello, is there a list of open projects to work on?
I am interested in contributing.
It would be helpful if someone could guide me. Do you have a mailing list or a similar discussion forum?

@andreybokhanko

andreybokhanko commented May 17, 2021

Hi All,

I'm glad to announce that we (Advanced Software Technology Lab, Huawei) are working on introducing Go support in BOLT. @yota9 (aka @yota9-huawei 😄) is a member of our team (well, actually he is the key member of this project). Another member of the team is Alexey Moksyakov, who also committed a patch recently.

An "almost" product-quality version is ready internally, and it demonstrates pretty impressive performance gains on real-world Go applications. There is nothing in BOLT that prevents its usage for Go, but being a pretty distinct runtime environment (with roots going back to Plan9, not Unix) Go requires some pretty complex changes -- both in BOLT and in the Go compiler itself (which we're also working on).

We plan to upstream everything (both the BOLT and Go compiler parts) soon; but, as usual, we first need to pass all the internal regulations. Thus, I can't make specific promises; but we definitely plan to upstream and continue development in a completely open way, as a part of an open-source community (LLVM at that time? -- we'll see).

Hope to share more news soon!

Yours,
Andrey
===
Director
Advanced Software Technology Lab
Huawei

@maksfb
Contributor

maksfb commented May 17, 2021

That's amazing! Looking forward to your team's patches. Will also be interested in hearing details on the challenges you've faced and on Go runtime internals.

@aaupov
Contributor

aaupov commented Jan 19, 2023

Marking as closed, the support is added in https://reviews.llvm.org/D141234, to be upstreamed soon.

@aaupov aaupov closed this as completed Jan 19, 2023