Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: spanBytesAlloc underflow on windows 32bit #18043

Closed
fmzquant opened this issue Nov 25, 2016 · 24 comments

Comments

Projects
None yet
6 participants
@fmzquant
Copy link

commented Nov 25, 2016

bug

this happened in windows 32 bit some times, never happen in linux 64 bit,
Very difficult to reproduce.

@fmzquant

This comment has been minimized.

Copy link
Author

commented Nov 25, 2016

go version is 1.7.3

@odeke-em odeke-em changed the title spanBytesAlloc underflow under windows 32 bit runtime: spanBytesAlloc underflow on windows 32bit Nov 25, 2016

@odeke-em

This comment has been minimized.

Copy link
Member

commented Nov 26, 2016

@aclements aclements added this to the Go1.8 milestone Nov 29, 2016

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 1, 2016

@botvs, thanks for the report. I think I know what's going on. You mentioned that this is "very difficult to reproduce", which suggests that you have at least been able to reproduce it. If I sent you a patch to add some additional debugging to the runtime to confirm my suspicion, how hard do you think it would be for you to test?

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 1, 2016

As for what I think it going on: if mcentral.cacheSpan is preempted by mark termination between the call to deductSweepCredit and the call to reimbuseSweepCredit, then mheap_.spanBytesAlloc will be zeroed and reimbuseSweepCredit will underflow. Probably the right fix is for reimbuseSweepCredit to check if we're still in the same GC cycle as the deductSweepCredit it's supposed to be balanced with and just do nothing if we're not.

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 1, 2016

On the other hand, mcentral.cacheSpan happens while g.m.locks != 0, so it's not, in fact, preemptible.

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 1, 2016

Another plausible but incorrect theory is that mheap_.spanBytesAlloc wasn't 8-byte aligned, since 1.7.3's runtime/internal/atomic.Xadd64 didn't check this (it does on master), but it appears it is in fact aligned. (And I'm pretty sure Xadd64 genuinely doesn't require alignment on 386 like the other 64-bit atomics do, though that's hard to confirm.)

@fmzquant

This comment has been minimized.

Copy link
Author

commented Dec 2, 2016

I'm sorry, I just saw this. How to patch it ?

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 2, 2016

@botvs, are you set up to compile your own Go toolchain? If not, that's the first step to patching. If so, I can upload a change to the code review tool and you can patch that in pretty easily. I'll have to figure out exactly what debug info I want, though, since none of my theories so far have actually panned out.

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 13, 2016

@botvs, ping.

@aclements aclements modified the milestones: Go1.8Maybe, Go1.8 Dec 13, 2016

@nathanjackson

This comment has been minimized.

Copy link

commented Dec 14, 2016

Hello, I am facing the exact same issue in Go 1.6.3 installed from the Fedora EPEL on CentOS 7.3.1611 64-bit. Any chance your patch might fix it? Its really not clear where the issue is coming from on my end.

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 14, 2016

@nathanjackson, can you post more details, such as the error and stack trace?

I don't have a patch to fix this yet because I don't know what the problem is and haven't been able to reproduce this locally. If I made a patch against Go 1.6.3 to add more debugging information, would you be able to apply it and reproduce the issue?

@nathanjackson

This comment has been minimized.

Copy link

commented Dec 14, 2016

@aclements I should be able to do that. I'll see what I can gather and post.

If it helps at all, setting GODEBUG to gcstoptheworld=2 prevents the crash.

@nathanjackson

This comment has been minimized.

Copy link

commented Dec 14, 2016

@aclements Is there any way I can e-mail you my stack trace? I don't think I can post it publicly.

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 14, 2016

@nathanjackson, you can email it to austin <at> google.com. Or feel free to redact it. I really only need to see the frames in package runtime.

@nathanjackson

This comment has been minimized.

Copy link

commented Dec 14, 2016

Alright, so I have to get approval in order to send the stack trace unfortunately. However, I can say it looks nearly identical to the above stack trace from OP's screenshot.

@aclements

This comment has been minimized.

Copy link
Member

commented Dec 20, 2016

@botvs or @nathanjackson, if either of you could try to reproduce this with https://go-review.googlesource.com/34612 applied to your Go tree, it could really help me figure this out. It's based on current master, but I'm pretty sure you can cherry-pick it cleanly to any 1.7 checkout using the following command: git fetch https://go.googlesource.com/go refs/changes/12/34612/1 && git cherry-pick FETCH_HEAD

@gopherbot

This comment has been minimized.

Copy link

commented Dec 20, 2016

CL https://golang.org/cl/34612 mentions this issue.

@nathanjackson

This comment has been minimized.

Copy link

commented Dec 20, 2016

@aclements You should have an e-mail.

I will see what I can do with your patch. Something interesting that I found on my end, if I use 1.7.3 or better, the error does go away, but one of three things happens, randomly.

  1. The program executes completes without error.
  2. Panic with a leftover defer error.
  3. Deadlock, the program seems to just stop executing.

All of these issues go away with GODEBUG=gcstoptheworld=1.

@aclements

This comment has been minimized.

Copy link
Member

commented Jan 9, 2017

@nathanjackson, thanks for the traceback. Does your application use unsafe, cgo, or fail in -race mode? I ask because the original report shows a reasonable argument to reimburseSweepCredit (even though it then crashes because the global state is wrong), but your traceback shows a huge argument to reimbuseSweepCredit, which suggests a different root cause in which the span state (or possibly the stack) got corrupted. Incorrect use of unsafe, cgo, or having races could also cause the other failures you're seeing.

@nathanjackson

This comment has been minimized.

Copy link

commented Jan 10, 2017

@aclements Sorry for not getting back with you sooner. I looked into this issue and it turns out that one of the libraries I was using was incorrectly using cgo\unsafe. Writing my own interface around the C library corrected the issue in 1.6.3 and 1.7.4. Thank you for your help.

@aclements

This comment has been minimized.

Copy link
Member

commented Jan 10, 2017

@nathanjackson, no worries. Sorry for taking so long to look at your traceback. Thanks for the update!

@aclements

This comment has been minimized.

Copy link
Member

commented Jan 10, 2017

@botvs, were you able to try the patch that I posted? Also, does your application use unsafe, cgo, or fail in -race mode (I should have asked this at the beginning).

gopherbot pushed a commit that referenced this issue Jan 10, 2017

runtime: debug prints for spanBytesAlloc underflow
Updates #18043.

Change-Id: I24e687fdd5521c48b672987f15f0d5de9f308884
Reviewed-on: https://go-review.googlesource.com/34612
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@aclements

This comment has been minimized.

Copy link
Member

commented Jan 17, 2017

Moving to Go 1.9. Hopefully we'll get some more diagnostics from Go 1.8.

@aclements aclements modified the milestones: Go1.9, Go1.8Maybe Jan 17, 2017

@gopherbot

This comment has been minimized.

Copy link

commented Apr 6, 2017

CL https://golang.org/cl/39832 mentions this issue.

@gopherbot gopherbot closed this in a5eb3dc Apr 21, 2017

@golang golang locked and limited conversation to collaborators Apr 21, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.