runtime: rare SIGBUS in runtime.handoff (tie-off) #28180

Closed
rc-matthew-l-weber opened this issue Oct 12, 2018 · 5 comments

@rc-matthew-l-weber
commented Oct 12, 2018

I can confirm that this bug still occurs in some cases when building Go 1.11 on Linux kernel 3.13.0-43-generic #72-Ubuntu (Ubuntu 14.04). It is intermittent: roughly 50% of builds fail with a page fault.

Some additional data points:

  • GCC 4.8.4
  • The kernel sometimes logs "BUG: Bad page map in process compile" and at other times the same message for go_bootstrap
  • No system memory limits being hit

Originally posted by @aclements in #16705 (comment)

I don't believe any action is required: many builds run by the Buildroot autobuilder on newer kernels have verified that the issue does not occur there. The original theory about 3.13 looks valid.

@rc-matthew-l-weber

Author

commented Oct 12, 2018

Example failed builds are listed at the link below. The "end log" link on the right of each entry shows a snippet of the failing build output.
http://autobuild.buildroot.net/?reason=host-go-1.11

Matching kernel BUG reports from the build host:

Sep 29 03:05:29 largo kernel: [40621545.969586] BUG: Bad page map in process compile  pte:f000ff54f000def8 pmd:283254b067
Sep 29 03:05:29 largo kernel: [40621545.969757] addr:000000c002402020 vm_flags:08100073 anon_vma:ffff881e2d83ca00 mapping:          (null) index:c002402
Sep 29 03:05:29 largo kernel: [40621545.969929] CPU: 51 PID: 52879 Comm: compile Tainted: G    B         3.13.0-43-generic #72-Ubuntu
Sep 29 03:05:29 largo kernel: [40621545.969938] Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.2.1 09/12/2013
Sep 29 03:05:29 largo kernel: [40621545.969945]  ffff880aee2d9b00 ffff880e5da0dd40 ffffffff81720bf6 000000c002402020
Sep 29 03:05:29 largo kernel: [40621545.970127]  ffff880e5da0dd88 ffffffff811757f3 f000ff54f000def8 000000000c002402
Sep 29 03:05:29 largo kernel: [40621545.970150]  000000c002402020 ffff880fecea8090 ffff880aee2d9b00 ffff880873e37000
Sep 29 03:05:29 largo kernel: [40621545.970167] Call Trace:
Sep 29 03:05:29 largo kernel: [40621545.970194]  [<ffffffff81720bf6>] dump_stack+0x45/0x56
Sep 29 03:05:29 largo kernel: [40621545.970222]  [<ffffffff811757f3>] print_bad_pte+0x1a3/0x250
Sep 29 03:05:29 largo kernel: [40621545.970240]  [<ffffffff8117a74b>] handle_mm_fault+0xebb/0xf00
Sep 29 03:05:29 largo kernel: [40621545.970262]  [<ffffffff810d7e28>] ? get_futex_key+0x1d8/0x2c0
Sep 29 03:05:29 largo kernel: [40621545.970277]  [<ffffffff8172cc14>] __do_page_fault+0x184/0x560
Sep 29 03:05:29 largo kernel: [40621545.970343]  [<ffffffff810db15a>] ? do_futex+0x10a/0x760
Sep 29 03:05:29 largo kernel: [40621545.970368]  [<ffffffff810a0255>] ? set_next_entity+0x95/0xb0
Sep 29 03:05:29 largo kernel: [40621545.970380]  [<ffffffff810a02cf>] ? pick_next_task_fair+0x5f/0x1b0
Sep 29 03:05:29 largo kernel: [40621545.970390]  [<ffffffff8109d4f5>] ? sched_clock_cpu+0xb5/0x100
Sep 29 03:05:29 largo kernel: [40621545.970400]  [<ffffffff8172d00a>] do_page_fault+0x1a/0x70
Sep 29 03:05:29 largo kernel: [40621545.970429]  [<ffffffff81729468>] page_fault+0x28/0x30
Sep 29 20:17:36 largo kernel: [40683534.934197] BUG: Bad page map in process go_bootstrap  pte:f000ff54f000def8 pmd:00000000
Sep 29 20:17:36 largo kernel: [40683534.934292] addr:000000c000802000 vm_flags:08100073 anon_vma:ffff880ff384e800 mapping:          (null) index:c000802
Sep 29 20:17:36 largo kernel: [40683534.934415] CPU: 48 PID: 52035 Comm: go_bootstrap Tainted: G    B         3.13.0-43-generic #72-Ubuntu
Sep 29 20:17:36 largo kernel: [40683534.934419] Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.2.1 09/12/2013
Sep 29 20:17:36 largo kernel: [40683534.934422]  ffff881eeb008780 ffff881ff478fd40 ffffffff81720bf6 000000c000802000
Sep 29 20:17:36 largo kernel: [40683534.934460]  ffff881ff478fd88 ffffffff811757f3 f000ff54f000def8 000000000c000802
Sep 29 20:17:36 largo kernel: [40683534.934480]  000000c000802000 ffff881fd8b3f020 ffff881eeb008780 ffff88183d017700
Sep 29 20:17:36 largo kernel: [40683534.934496] Call Trace:
Sep 29 20:17:36 largo kernel: [40683534.934508]  [<ffffffff81720bf6>] dump_stack+0x45/0x56
Sep 29 20:17:36 largo kernel: [40683534.934522]  [<ffffffff811757f3>] print_bad_pte+0x1a3/0x250
Sep 29 20:17:36 largo kernel: [40683534.934530]  [<ffffffff8117a74b>] handle_mm_fault+0xebb/0xf00
Sep 29 20:17:36 largo kernel: [40683534.934540]  [<ffffffff8109a88a>] ? try_to_wake_up+0x1fa/0x2c0
Sep 29 20:17:36 largo kernel: [40683534.934547]  [<ffffffff8172cc14>] __do_page_fault+0x184/0x560
Sep 29 20:17:36 largo kernel: [40683534.934554]  [<ffffffff810db15a>] ? do_futex+0x10a/0x760
Sep 29 20:17:36 largo kernel: [40683534.934562]  [<ffffffff810a0255>] ? set_next_entity+0x95/0xb0
Sep 29 20:17:36 largo kernel: [40683534.934568]  [<ffffffff810a02cf>] ? pick_next_task_fair+0x5f/0x1b0
Sep 29 20:17:36 largo kernel: [40683534.934574]  [<ffffffff8109d4f5>] ? sched_clock_cpu+0xb5/0x100
Sep 29 20:17:36 largo kernel: [40683534.934580]  [<ffffffff8172d00a>] do_page_fault+0x1a/0x70
Sep 29 20:17:36 largo kernel: [40683534.934586]  [<ffffffff81729468>] page_fault+0x28/0x30
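
For anyone who wants to check whether their own build host is hitting the same thing, a rough sketch of scanning kernel log text for these reports follows. The type `badPageMap` and the function `findBadPageMaps` are hypothetical names made up for this example, and the regular expression assumes the line format shown in the log above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// badPageMap holds the fields pulled from one "Bad page map" kernel log line.
// (Hypothetical type, for illustration only.)
type badPageMap struct {
	Process string // e.g. "compile" or "go_bootstrap"
	PTE     string // raw pte value as printed by the kernel (hex, no 0x prefix)
}

// badPageMapRE assumes lines shaped like the ones in the log above:
//   ... BUG: Bad page map in process compile  pte:f000ff54f000def8 pmd:...
var badPageMapRE = regexp.MustCompile(`BUG: Bad page map in process (\S+)\s+pte:([0-9a-f]+)`)

// findBadPageMaps scans log text line by line and returns every match.
func findBadPageMaps(log string) []badPageMap {
	var out []badPageMap
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := badPageMapRE.FindStringSubmatch(sc.Text()); m != nil {
			out = append(out, badPageMap{Process: m[1], PTE: m[2]})
		}
	}
	return out
}

func main() {
	// Two report lines copied from the log above, plus one unrelated line.
	log := `Sep 29 03:05:29 largo kernel: [40621545.969586] BUG: Bad page map in process compile  pte:f000ff54f000def8 pmd:283254b067
Sep 29 03:05:29 largo kernel: [40621545.970167] Call Trace:
Sep 29 20:17:36 largo kernel: [40683534.934197] BUG: Bad page map in process go_bootstrap  pte:f000ff54f000def8 pmd:00000000`

	for _, b := range findBadPageMaps(log) {
		fmt.Printf("process=%s pte=%s\n", b.Process, b.PTE)
	}
}
```

In practice the input would come from something like the saved syslog rather than a string literal; the literal is used here only to keep the sketch self-contained.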

@bcmills added this to the Go1.12 milestone Oct 23, 2018

@bcmills

Member

commented Oct 23, 2018

@randall77

Contributor

commented Oct 23, 2018

This doesn't sound like a Go bug. I don't think there is anything we can do about a kernel bug, unless there's something specific we're doing that is triggering that bug.

Does anyone have evidence for or against @aclements' comment on #16705? This is also a 3.13 report, which lends credence to his theory.

@rc-matthew-l-weber

Author

commented Oct 23, 2018

Confirmed: switching to a kernel newer than 3.13 resolved the issue.
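
For hosts where this matters, a small check like the one below could flag the kernel series before starting a Go build. This is only a sketch: `parseMajorMinor` and `isReportedKernel` are hypothetical helper names, and it treats only 3.13.x as suspect because that is the only series reported in this thread. Whether other releases are affected is not established here.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor numbers from a kernel release
// string such as "3.13.0-43-generic". (Hypothetical helper.)
func parseMajorMinor(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor field may carry a suffix (e.g. "13-rc1"); keep leading digits only.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// isReportedKernel reports whether release belongs to the 3.13 series,
// the only series tied to the failure in this thread (an assumption of
// this sketch, not a statement about which kernels are actually affected).
func isReportedKernel(release string) bool {
	major, minor, err := parseMajorMinor(release)
	return err == nil && major == 3 && minor == 13
}

func main() {
	// On Linux the running kernel release is exposed in this procfs file.
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("could not read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	if isReportedKernel(release) {
		fmt.Println("warning: kernel", release, "is in the 3.13 series reported in this issue")
	} else {
		fmt.Println("kernel", release, "is not in the 3.13 series")
	}
}
```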

@randall77

Contributor

commented Oct 23, 2018

OK, then I will close.
We can reopen if someone finds a usable workaround on the Go side, but 3.13 is old (6+ years) and I don't expect it would be worth anyone's time.

@randall77 closed this Oct 23, 2018
