Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: large number of NOP instructions in crypto/md5.blockGeneric #31545

Closed
mundaym opened this issue Apr 18, 2019 · 9 comments

Comments

Projects
None yet
4 participants
@mundaym
Copy link
Member

commented Apr 18, 2019

There seem to be a large number of contiguous NOP instructions inserted into the main loop inside md5.blockGeneric. Looks like they are markers for inlined binary.LittleEndian.UInt32 calls. I haven't checked the performance implications (probably doesn't make a huge difference) but it would still be good to remove these if we can:

v72
    00020 (+27) XCHGL AX, AX
v153
    00021 (+28) XCHGL AX, AX
v232
    00022 (+29) XCHGL AX, AX
v311
    00023 (+30) XCHGL AX, AX
v390
    00024 (+31) XCHGL AX, AX
v469
    00025 (+32) XCHGL AX, AX
v548
    00026 (+33) XCHGL AX, AX
v627
    00027 (+34) XCHGL AX, AX
v706
    00028 (+35) XCHGL AX, AX
v785
    00029 (+36) XCHGL AX, AX
v864
    00030 (+37) XCHGL AX, AX
v943
    00031 (+38) XCHGL AX, AX
v1022
    00032 (+39) XCHGL AX, AX
v1101
    00033 (+40) XCHGL AX, AX
v1180
    00034 (+41) XCHGL AX, AX
v1259
    00035 (+42) XCHGL AX, AX

/cc @randall77

@mundaym mundaym added this to the Unplanned milestone Apr 18, 2019

@dgryski

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Dup of #29571 ?

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Yes, this is a dup of #29571. I had hoped CL 170445 would fix instances like this, but it looks like it hasn't.

@randall77 randall77 closed this Apr 18, 2019

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2019

The reason CL 170445 didn't work is that these nops really are needed for correct tracebacks.
A binary.LittleEndian.Uint32 call reduces to a single instruction, which has a line number inside the inlined Uint32 function. If we took a profiling sample at that instruction, the nop is there to represent the location of the call in the parent function (blockGeneric, in this case). There aren't any other instructions in the parent frame which could be used as a substitute.

We could give the single instruction the line number of the parent, which would then obviate the need for the nop. I'm not sure what a principled method for making that happen would be, though. At the time of inlining, it isn't obvious that the inlined body will reduce to a single instruction (and it won't, on architectures that can't do unaligned loads).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2019

In DWARF a single instruction can represent a stack of inlined function calls. So a profiling sample at that instruction would carry both the location in binary.LittleEndian.Uint32 and in the caller. Not sure if that makes any sense in the Go context.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2019

The Go inline markers can only represent a single frame.
That's not the problem here, though. Each of the inline marks represents a different call site.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 20, 2019

I'm sure I'm misunderstanding this, but if each inline mark represents a different call site, then doesn't that imply that the inlined call was completely eliminated? I don't see a reason to keep an instruction around to represent a call that was optimized away.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 20, 2019

The body of the inlined call compiles to a single instruction. That instruction has a pc/line of the body of the inlined call. That instruction has a "parent" pointer to a nop instruction, which has a pc/line of the call site.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 22, 2019

It seems to me that if we could attach a call stack to an instruction, then the single instruction of the inlined call would describe both the location of the inlined function and the location of the call to the inlined function, and the separate nop instruction would not be required.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 22, 2019

True, one way to fix this would be to allow multiple locations to be attached to a single instruction. That's currently not possible.
For profiling maybe we could just do it in DWARF? For things like runtime.Callers we'd need to do it ourselves in the pcln tables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.