cmd/link: unexpected fault address (when low on disk space?) #37310
Yeah, this is writing to mmap'd memory backed by the output file. SIGBUS may occur if a write to mmap'd memory fails (maybe because it cannot be flushed to disk?). It would be good if we could fail nicely, but I don't know how. Maybe write (using file IO) to the last byte of the output file before mmap? If that write fails, we can exit with a nice error. Does it work? cc @aclements |
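A minimal sketch of the "write the last byte before mmap" probe suggested in the comment above. The names `probeLastByte` and `probeDemo` are hypothetical and this is not the linker's code. On filesystems with sparse files the single write can succeed without reserving the blocks in between, so this probe may miss a disk that is nearly full.

```go
package main

import (
	"fmt"
	"os"
)

// probeLastByte (hypothetical helper) extends f to size bytes by writing one
// zero byte at offset size-1 with ordinary file I/O. A failure is reported as
// an error value instead of a fault on a later write to mapped memory.
// Caveat: with sparse files this does not reserve the intermediate blocks.
func probeLastByte(f *os.File, size int64) error {
	if size <= 0 {
		return fmt.Errorf("invalid output size %d", size)
	}
	if _, err := f.WriteAt([]byte{0}, size-1); err != nil {
		return fmt.Errorf("cannot extend output file to %d bytes: %w", size, err)
	}
	return nil
}

// probeDemo runs the probe on a temporary file and returns the file size
// observed afterwards.
func probeDemo(size int64) (int64, error) {
	f, err := os.CreateTemp("", "probe-*.out")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if err := probeLastByte(f, size); err != nil {
		return 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

func main() {
	n, err := probeDemo(1 << 20)
	if err != nil {
		fmt.Fprintln(os.Stderr, "probe failed:", err)
		os.Exit(1)
	}
	fmt.Println("output file size after probe:", n)
}
```

If the probe returns an error, the caller can print it and exit before mapping the file.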
I haven't looked at the code, but if we're going to |
Change https://golang.org/cl/228385 mentions this issue: |
What is the story for platforms that don't support fallocate? Should there be pre-zeroing or something else? |
We definitely considered pre-zeroing, but it adds 50–100% to the IO cost, and considering we've had one report of this in 10 years, it doesn't seem worth significantly slowing down every link for such a rare case. For reference, here are the benchmark numbers for allocating a 100 MiB file on three different machines using different techniques (source by @jeremyfaller and myself):
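For illustration, pre-zeroing as discussed above could look roughly like the sketch below. The helper names `preZero` and `preZeroDemo` are hypothetical and this is not code from the linker. Every byte of the output is written once up front with ordinary writes, which is where the extra IO cost comes from. In exchange, an out-of-space condition is returned as an error from `Write` rather than raised as a fault later.

```go
package main

import (
	"fmt"
	"os"
)

// preZero (hypothetical helper) writes size zero bytes to f in 1 MiB chunks
// using ordinary writes, so a full disk is reported as a Write error.
func preZero(f *os.File, size int64) error {
	buf := make([]byte, 1<<20)
	for size > 0 {
		n := int64(len(buf))
		if size < n {
			n = size
		}
		if _, err := f.Write(buf[:n]); err != nil {
			return fmt.Errorf("pre-zeroing output: %w", err)
		}
		size -= n
	}
	return nil
}

// preZeroDemo pre-zeroes a temporary file and returns its resulting size.
func preZeroDemo(size int64) (int64, error) {
	f, err := os.CreateTemp("", "prezero-*.out")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if err := preZero(f, size); err != nil {
		return 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

func main() {
	n, err := preZeroDemo(8 << 20)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("pre-zeroed bytes:", n)
}
```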
Actually, this is somewhat unfair since the linker didn't use mmap until a couple years ago. I don't believe we've had reports from other uses of mmap, but I don't know how commonly used it is.
Also EXCEPTION_IN_PAGE_ERROR on Windows. |
If we don't fail nicely when out of disk I fear we'll get lots of bogus issues filed. Perhaps cmd/go can check disk space if (and only if) cmd/link returns a failed exit status. |
When the linker uses
The fallocate calls will lower the chances of SIGBUS in the linker, but it might still happen on other unsupported platforms and filesystems.

Darwin cmd/compile stats:

Munmap     16.0ms ± 8%   0.8ms ± 3%  -95.19%  (p=0.000 n=8+10)
TotalTime   484ms ± 2%   462ms ± 2%   -4.52%  (p=0.000 n=10+9)

Updates #37310

Change-Id: I41c6e490adec26fa1ebee49a5b268828f5ba05e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/228385
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
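As a rough, Linux-only illustration of the preallocation idea in the commit message above (not the code from the CL), the sketch below reserves space with `syscall.Fallocate` before the file would be mapped. The names `preallocate` and `preallocDemo` are hypothetical. Falling back to `Truncate` when the filesystem does not support fallocate is an assumption that mirrors the "best effort" described in this thread; in that case a later fault is still possible.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
)

// preallocate (hypothetical helper) asks the filesystem to reserve size bytes
// for f. ENOSPC and similar failures come back as an error here, before any
// write to mapped memory could fault.
func preallocate(f *os.File, size int64) error {
	for {
		err := syscall.Fallocate(int(f.Fd()), 0, 0, size)
		switch err {
		case nil:
			return nil
		case syscall.EINTR:
			continue // interrupted by a signal; retry
		case syscall.EOPNOTSUPP, syscall.ENOSYS:
			// Best effort only: this sets the size but reserves nothing.
			return f.Truncate(size)
		default:
			return fmt.Errorf("preallocating %d bytes: %w", size, err)
		}
	}
}

// preallocDemo preallocates a temporary file and returns its resulting size.
func preallocDemo(size int64) (int64, error) {
	f, err := os.CreateTemp("", "prealloc-*.out")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if err := preallocate(f, size); err != nil {
		return 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

func main() {
	n, err := preallocDemo(16 << 20)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("preallocated bytes:", n)
}
```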
At least in glibc, On Windows, as far as I can tell, there's no equivalent of |
@jeremyfaller @aclements is this fixed by CL https://go-review.googlesource.com/c/go/+/228385, or are there still things to be done? I guess we haven't closed this because we currently pre-allocate only on some platforms, not all? |
As @rsc pointed out at #41466 (comment), on OSes where we don't preallocate, we could use SetPanicOnFault to make it fail gracefully, if we do #41155. |
Sorry, missed this discussion. I believe we are fixed as well as we can be without prezeroing. On OSes that don't support fallocate, we currently make a best effort but could still SIGBUS. I like the solution to |
Just to be clear, we didn't use SetPanicOnFault because it doesn't catch SIGBUS, only SIGSEGV. But now there is ongoing discussion on #41155 . If we do that, I think we want to revisit SetPanicOnFault. |
I don't see any dissents on #41155 and it's marked NeedsFix already, so I won't comment there, but replying to the "if we do #41155" above: I added SetPanicOnFault for exactly this specific use case, during an earlier, aborted rewrite of the linker. If an mmap fault is producing SIGBUS then that signal should be included, no question. |
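A hedged sketch of how `runtime/debug.SetPanicOnFault` might wrap the output-writing step so that a fault becomes an error instead of a crash. The names `withFaultPanics` and `store` are hypothetical. The demo uses a nil-pointer store as a stand-in, which panics regardless of the setting; whether a real mmap fault delivered as SIGBUS is covered depends on the outcome of #41155, as noted in the comments above.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withFaultPanics (hypothetical helper) runs write with panic-on-fault enabled
// for the current goroutine and converts a resulting panic into an error.
func withFaultPanics(write func()) (err error) {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old) // restore the previous setting

	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("writing output failed: %v", r)
		}
	}()

	write()
	return nil
}

// store writes through p; it stands in for a store into mapped output memory.
func store(p *int) { *p = 1 }

func main() {
	// Stand-in for a faulting write. A real case would be a store into an
	// mmap'd region whose backing file cannot be extended.
	err := withFaultPanics(func() { store(nil) })
	fmt.Println("error reported:", err != nil)

	// A write that succeeds returns a nil error.
	x := 0
	err = withFaultPanics(func() { store(&x) })
	fmt.Println("error reported:", err != nil)
}
```

The setting applies only to the calling goroutine, so the wrapper has to run on the goroutine that performs the mapped writes.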
I doubt it is related to disk space; I encounter this issue in a rather non-trivial environment. We need to build RPMs, and quite often we build Go code from RPM spec files. Originally, I thought the issue was related to the AFS file system we use, so I adjusted the spec to use the /tmp area, but regardless of my attempts I always get this issue when I run the RPM build, while I have no issue when I compile the code from a local file system. The issue appears either at
The code is very small and does not have any external dependencies. We use Go 1.16, and yet I'm seeing the same issue as discussed in this ticket, and none of the possible explanations in this and other tickets seem to apply here. I build without CGO, I used a local cache, I used the /tmp area, and yet the build still fails when it is invoked from the RPM build environment, but it is just fine when I do not use the RPM build. |
@vkuznet the original issue is specifically about the faults when disk space is low in the linker. Your issue is different, and it is the |
@cherrymui , as I wrote I tried different approaches, and in fact I can reproduce issue with |
From your stack trace it is the |
Change https://go.dev/cl/445835 mentions this issue: |
I just got some mysterious linker errors. I suspect they're because this machine only has 20MB of disk free (which I just noticed).
And another, which looks like the same: