Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inline marker targets not reachable after assembly on ppc64x #40689

Closed
mundaym opened this issue Aug 11, 2020 · 11 comments
Closed

cmd/compile: inline marker targets not reachable after assembly on ppc64x #40689

mundaym opened this issue Aug 11, 2020 · 11 comments
Milestone

Comments

@mundaym
Copy link
Member

@mundaym mundaym commented Aug 11, 2020

It looks like issue #40473 also affects ppc64x based on this trybot run for CL 247697: https://storage.googleapis.com/go-build-log/8548211f/linux-ppc64le-buildlet_059f3c62.log. It should be relatively easy to fix with a code change similar to that done for s390x in CL 247697.

@mundaym
Copy link
Member Author

@mundaym mundaym commented Aug 11, 2020

@laboger
Copy link
Contributor

@laboger laboger commented Aug 11, 2020

@mundaym I don't think the test in your CL is correct for ppc64x. If you look in cmd/compile/internal/ppc64/ggen.go function ginsop it is not generating a NOP but OR R0,R0,R0 for the inline marker. So this instruction won't get removed by PPC64 assembler and we shouldn't have the same problem as on S390x.

Correction: I see where a NOP could be generated for an inlMark. I can make a change to stop removing NOPs in the assembler but I want to verify it's not leaving a lot of them. I'll have to read through the issue again because I'm not sure why this causes a hang and why no other platforms are hitting this problem?

@mundaym
Copy link
Member Author

@mundaym mundaym commented Aug 11, 2020

It's a tricky one to explain. I think of it as a 'use-after-free' bug:

  1. The compiler produces a Prog value representing an instruction and saves a pointer to it for use later as an inlining mark.
  2. The assembler rewrites the Prog into a GOT lookup and replaces the original Prog with a NOP (here).
  3. The assembler removes that NOP Prog from the Text linked list (here and here). Since this happens before assembly it does not assign the Prog a valid PC value.
  4. The linker reads from the pointer to the (now dead and unreachable) Prog that the compiler saved and uses it to produce invalid inlining table data.

The invalid inlining table data only appears to cause an infinite loop during traceback when the invalid PC value in the unscheduled Prog happens to correspond to an instruction that is part of the same inlined function (therefore the traceback never escapes the inlined function). This appears to be a rare situation.

It should be OK for the assembler to leave NOPs in the Text linked list since they do not produce actual code. Keeping them in the linked list however means they are scheduled and their PC value is valid (though really it points to the next instruction).

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 11, 2020

Change https://golang.org/cl/247842 mentions this issue: cmd/asm,cmd/internal/obj/ppc64: don't remove NOP in assembler

@gopherbot gopherbot closed this in 7d7bd5a Aug 13, 2020
@laboger
Copy link
Contributor

@laboger laboger commented Aug 13, 2020

gopherbot please backport to 1.15 and 1.14.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 13, 2020

@laboger What are the effects of this bug? That is, why should the CL be backported? I can't quite figure out what the actual problem is here. Thanks.

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 13, 2020

This is related to #40473. Tracebacks hang because inline marker information is corrupted.

@laboger
Copy link
Contributor

@laboger laboger commented Aug 13, 2020

The change for s390x depends on this one. In the s390x fix, code was added to detect when the inline marker information was out of sync and that check fails on ppc64x. This change is needed to prevent that check from failing, as well as preventing other errors that could occur when they are out of sync.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 13, 2020

@gopherbot Please backport to 1.14 and 1.15

This bug can cause tracebacks to hang. There is no workaround.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 13, 2020

Backport issue(s) opened: #40766 (for 1.14), #40767 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 13, 2020

Change https://golang.org/cl/248381 mentions this issue: cmd/internal/obj/ppc64: don't remove NOP in assembler

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.