cmd/link,cmd/compile: linktime InlMark #77093

@Jorropo

In CL 733845, these simple lines of code:

a := byteorder.NativeUint(y)
b := byteorder.NativeUint(x)
selected := constanttime.Select(v, a, b)
byteorder.PutNativeUint(x, selected)

compile to (this is after pruning them):

    v28 (+81) = InlMark <void> [21] v499
    v29 (+55) = InlMark <void> [24] v499
    v30 (+46) = InlMark <void> [27] v499
    v33 (+87) = InlMark <void> [30] v499
    v158 (+82) = InlMark <void> [22] v499
    v159 (+55) = InlMark <void> [25] v499
    v160 (+46) = InlMark <void> [28] v499
    v163 (+87) = InlMark <void> [31] v499
    v278 (+86) = InlMark <void> [23] v499
    v279 (+58) = InlMark <void> [26] v499
    v280 (+50) = InlMark <void> [29] v499
    v285 (+98) = InlMark <void> [32] v499
    v485 (+53) = MOVQload <uint64> v85 v499
    v486 (+53) = MOVQload <uint64> v220 v499
    v289 (+84) = TESTQ <flags> v7 v7
    v275 (+84) = CMOVQNE <uint> v486 v485 v289 (byteorder.v[uint64], byteorder.v[uint], byteorder.v[uint], byteorder.v[uint], selected[uint])
    v388 (+59) = MOVQstore <mem> v220 v275 v499
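
To look at a similar dump outside the CL, here is a rough stand-in that uses only public packages. It is a sketch, not the code from the CL: `condAssign` and `selectUint64` are hypothetical names, `encoding/binary` replaces the internal byteorder package, and the hand-written mask select only approximates what `constanttime.Select` does. The number and placement of InlMark values it produces will likely differ from the dump above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// selectUint64 returns a if v == 1 and b if v == 0, without branching.
// Hypothetical helper standing in for constanttime.Select.
func selectUint64(v int, a, b uint64) uint64 {
	mask := -uint64(v & 1) // all ones when v == 1, zero when v == 0
	return b ^ (mask & (a ^ b))
}

// condAssign overwrites the first 8 bytes of x with those of y when v == 1
// and leaves x unchanged when v == 0. Every callee here is small enough
// that the compiler is expected to inline it.
func condAssign(v int, x, y []byte) {
	a := binary.NativeEndian.Uint64(y)
	b := binary.NativeEndian.Uint64(x)
	selected := selectUint64(v, a, b)
	binary.NativeEndian.PutUint64(x, selected)
}

func main() {
	x := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	y := []byte{8, 7, 6, 5, 4, 3, 2, 1}
	condAssign(1, x, y)
	fmt.Println(x)
}
```

Building with `GOSSAFUNC=condAssign go build` writes an `ssa.html` file that shows the InlMark values at each compiler pass.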

This is a somewhat ridiculous number of inline marks. NOPs are cheap, although not free:

  • ICACHE / binary size
  • They still go through the decoder, which on my CPU is limited to 4 instructions per cycle.
    • The decoder can also fuse a NOP with the following instruction, but my CPU can only fuse two instructions into one, and we do not even attempt to interleave InlMarks into the regular execution stream, so they consume decoder slots at a floor(N_inlmarks / 2) ratio.
    • (very minor) It steals other fusing opportunities, like mov rax, rbx; add rax, rcx → add rax, rbx, rcx (decoded as a 3-operand instruction, which doesn't exist on amd64, rather than going through the register renaming unit).

I have other examples where removing the inline marks speeds things up, although in most of them it is because the inline mark is the drop that overflows the instructions into the next cache line, or because of bad luck with loop alignment.


AFAIK inline marks exist because runtime and tracing internals need a placeholder PC to use for the inlined function in backtraces.

I think it would make more sense if the linker generated inlmarks (with cooperation from the compiler).
There would be one inline mark symbol for each function definition (so all functions inlining the same function would point to the same symbol).
They would be PC Quantum sized objects in a zero-tripped ro-^x segment to minimize memory usage (still significant debug info / pclntab overhead ...).
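
To make the sharing concrete, here is a toy model of that layout. It is only a sketch under assumptions: `markSegment`, `markFor`, and the base address are hypothetical and do not correspond to any existing linker code, `pcQuantum` is set to 1 as on amd64, and "ro-^x" is read as read-only and non-executable.

```go
package main

import "fmt"

// pcQuantum is the smallest PC step on the target: 1 on amd64.
// Other architectures use a larger value (for example 4 on arm64).
const pcQuantum = 1

// markSegment models a read-only, non-executable segment that holds one
// placeholder per inlinable function definition. Hypothetical type.
type markSegment struct {
	base  uintptr
	marks map[string]uintptr // callee symbol name -> placeholder PC
}

func newMarkSegment(base uintptr) *markSegment {
	return &markSegment{base: base, marks: make(map[string]uintptr)}
}

// markFor returns the placeholder PC for callee, allocating a new
// pcQuantum-sized slot only the first time the callee is seen.
func (s *markSegment) markFor(callee string) uintptr {
	if pc, ok := s.marks[callee]; ok {
		return pc
	}
	pc := s.base + uintptr(len(s.marks))*pcQuantum
	s.marks[callee] = pc
	return pc
}

// size reports how many bytes the segment occupies.
func (s *markSegment) size() uintptr {
	return uintptr(len(s.marks)) * pcQuantum
}

func main() {
	seg := newMarkSegment(0x1000)
	// Two different callers inlining the same callee share one placeholder.
	fromCallerA := seg.markFor("byteorder.NativeUint")
	fromCallerB := seg.markFor("byteorder.NativeUint")
	other := seg.markFor("constanttime.Select")
	fmt.Println(fromCallerA == fromCallerB, fromCallerA != other, seg.size())
}
```

In this model the segment grows with the number of distinct inlined functions rather than with the number of inlining sites, so no NOP has to be emitted in the instruction stream of the caller.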

Labels: NeedsInvestigation, Performance, ToolProposal, compiler/runtime