runtime: running Go code on OpenBSD gomote fails when not running as root #35568

Open
ianlancetaylor opened this issue Nov 13, 2019 · 10 comments

Comments

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Nov 13, 2019

I don't know what is going on here, but I'm recording it since something is wrong.

When I use gomote run with the openbsd-amd64-62 gomote, everything works as expected. When I use gomote ssh to ssh into the gomote, the go tool consistently fails with the following stack trace.

The only obvious difference is that gomote run runs as root but gomote ssh does not.

CC @mknyszek @aclements @bradfitz

fatal error: failed to reserve page bitmap memory

runtime stack:
runtime.throw(0xa4504f, 0x24)
        /tmp/workdir/go/src/runtime/panic.go:1106 +0x72 fp=0x7f7ffffd5018 sp=0x7f7ffffd4fe8 pc=0x4331a2
runtime.(*pageAlloc).init(0xeb7b88, 0xeb7b80, 0xecff58)
        /tmp/workdir/go/src/runtime/mpagealloc.go:239 +0x162 fp=0x7f7ffffd5060 sp=0x7f7ffffd5018 pc=0x428c12
runtime.(*mheap).init(0xeb7b80)
        /tmp/workdir/go/src/runtime/mheap.go:694 +0x274 fp=0x7f7ffffd5088 sp=0x7f7ffffd5060 pc=0x425de4
runtime.mallocinit()
        /tmp/workdir/go/src/runtime/malloc.go:471 +0xff fp=0x7f7ffffd50b8 sp=0x7f7ffffd5088 pc=0x40c5af
runtime.schedinit()
        /tmp/workdir/go/src/runtime/proc.go:545 +0x60 fp=0x7f7ffffd5110 sp=0x7f7ffffd50b8 pc=0x436700
runtime.rt0_go(0x7f7ffffd5148, 0x1, 0x7f7ffffd5148, 0x0, 0x0, 0x1, 0x7f7ffffd5238, 0x0, 0x7f7ffffd523b, 0x7f7ffffd5254, ...)
        /tmp/workdir/go/src/runtime/asm_amd64.s:214 +0x125 fp=0x7f7ffffd5118 sp=0x7f7ffffd5110 pc=0x45eff5
@mknyszek

Contributor

@mknyszek mknyszek commented Nov 13, 2019

If I were to guess, there's an RLIMIT_AS set for non-root users or something. I'll look into this now.

@bradfitz

Member

@bradfitz bradfitz commented Nov 13, 2019

People who like this bug also like #10719.

/cc @bcmills

@ianlancetaylor

Contributor Author

@ianlancetaylor ianlancetaylor commented Nov 13, 2019

For root, bash's ulimit -v reports 33562624. For non-root, it reports 1576960.

@mknyszek

Contributor

@mknyszek mknyszek commented Nov 13, 2019

@ianlancetaylor I investigated this with someone who knows OpenBSD a bit.

The limit the kernel checks for a PROT_NONE anonymous mapping is RLIMIT_DATA, which login.conf restricts for non-root users. We can fix this on our builders by making datasize-cur and datasize-max unlimited, but if OpenBSD ships a low default, that's a problem: Go no longer works out of the box on a newly installed OpenBSD image.

Perhaps there's a workaround here but I need to give it some thought.

It's a little bit weird to me that a PROT_NONE mapping counts toward this on any platform. Linux does this too, but its default for everyone for virtual address space is unlimited and not 768 MiB.
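To see which data-size limit a given login class actually ends up with, a process can ask the kernel directly. The following is a minimal sketch, not anything from the runtime: `dataLimit` and `formatLimit` are hypothetical helper names, and treating very large values as "unlimited" is an assumption made to sidestep the fact that RLIM_INFINITY is encoded differently across systems.

```go
package main

import (
	"fmt"
	"syscall"
)

// dataLimit returns the soft and hard RLIMIT_DATA values in bytes.
// (Hypothetical helper name; it only wraps syscall.Getrlimit.)
func dataLimit() (cur, max uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_DATA, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

// formatLimit renders a limit in MiB. Assumption: any value of 2^62 or
// more is treated as "unlimited", because the exact RLIM_INFINITY
// encoding differs between operating systems.
func formatLimit(v uint64) string {
	if v >= 1<<62 {
		return "unlimited"
	}
	return fmt.Sprintf("%d MiB", v>>20)
}

func main() {
	cur, max, err := dataLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Println("RLIMIT_DATA soft:", formatLimit(cur))
	fmt.Println("RLIMIT_DATA hard:", formatLimit(max))
}
```

Note that a binary built with the affected runtime would crash before reaching `main` under a low limit, so a check like this is only useful when built with a toolchain that does not make the large reservation.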

@ianlancetaylor

Contributor Author

@ianlancetaylor ianlancetaylor commented Nov 13, 2019

Just a note that the same problem still happens on OpenBSD 6.4.

@bradfitz

Member

@bradfitz bradfitz commented Nov 14, 2019

/cc @mdempsky as FYI just because he likes OpenBSD issues.

@mdempsky

Member

@mdempsky mdempsky commented Nov 14, 2019

It's been a while since I've looked at OpenBSD's mmap implementation. I seem to recall it took the stance that PROT_NONE mappings still map data (and thus count towards RLIMIT_DATA), just without read/write access.

I see that http://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/lang/go/Makefile?rev=1.75 uses ulimit -d $(ulimit -H -d) to raise the RLIMIT_DATA soft limit to the hard limit. But maybe that only works because ports builders usually have login class "staff" or "pbuild" (which set datasize-max=infinity), whereas gomote ssh is just giving a default login class (with datasize-max=768M)?

/cc @4a6f656c

@gopherbot

@gopherbot gopherbot commented Nov 15, 2019

Change https://golang.org/cl/207497 mentions this issue: runtime: convert page allocator bitmap to sparse array

gopherbot pushed a commit that referenced this issue Dec 3, 2019
Currently the page allocator bitmap is implemented as a single giant
memory mapping which is reserved at init time and committed as needed.
This causes problems on systems that don't handle large uncommitted
mappings well, or institute low virtual address space defaults as a
memory limiting mechanism.

This change modifies the implementation of the page allocator bitmap
away from a directly-mapped set of bytes to a sparse array in the same
vein as mheap.arenas. This will hurt performance a little but the biggest
gains are from the lockless allocation possible with the page allocator,
so the impact of this extra layer of indirection should be minimal.

In fact, this is exactly what we see:
    https://perf.golang.org/search?q=upload:20191125.5

This reduces the amount of mapped (PROT_NONE) memory needed on systems
with 48-bit address spaces to ~600 MiB down from almost 9 GiB. The bulk
of this remaining memory is used by the summaries.

Go processes with 32-bit address spaces now always commit to 128 KiB of
memory for the bitmap. Previously it would only commit the pages in the
bitmap which represented the range of addresses (lowest address to
highest address, even if there are unused regions in that range) used by
the heap.

Updates #35568.
Updates #35451.

Change-Id: I0ff10380156568642b80c366001eefd0a4e6c762
Reviewed-on: https://go-review.googlesource.com/c/go/+/207497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
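
The idea in the commit message above, replacing one huge directly-mapped bitmap with a sparse array, can be illustrated with a small two-level structure. This is a simplified sketch only, not the runtime's code: the type and method names (`sparseBitmap`, `set`, `isSet`, `backed`) and all the sizes are made up for illustration and are far smaller than anything the runtime uses. The point is that second-level blocks are only allocated once a page in their range is touched, so nothing has to be reserved for the whole address space up front.

```go
package main

import "fmt"

// Illustrative sizes only; the runtime's real constants differ.
const (
	pagesPerChunk = 512 // pages tracked by one chunk bitmap
	l2Bits        = 10  // chunks per second-level block = 1 << l2Bits
	l1Bits        = 8   // first-level entries = 1 << l1Bits
	maxPages      = pagesPerChunk << (l1Bits + l2Bits)
)

// chunk holds one bit per page.
type chunk [pagesPerChunk / 64]uint64

// sparseBitmap is a two-level sparse array: the first level is a small
// fixed array of pointers, and each second-level block is allocated the
// first time a page in its range is marked.
type sparseBitmap struct {
	l1 [1 << l1Bits]*[1 << l2Bits]chunk
}

// locate splits a page number into first-level index, second-level
// index, word index within the chunk, and bit index within the word.
func locate(page uint64) (i1, i2, word uint64, bit uint) {
	ci := page / pagesPerChunk
	off := page % pagesPerChunk
	return ci >> l2Bits, ci & (1<<l2Bits - 1), off / 64, uint(off % 64)
}

// set marks a page as allocated and reports whether it was in range.
func (b *sparseBitmap) set(page uint64) bool {
	if page >= maxPages {
		return false
	}
	i1, i2, w, bit := locate(page)
	if b.l1[i1] == nil {
		b.l1[i1] = new([1 << l2Bits]chunk) // allocate on first use
	}
	b.l1[i1][i2][w] |= 1 << bit
	return true
}

// isSet reports whether a page has been marked as allocated.
func (b *sparseBitmap) isSet(page uint64) bool {
	if page >= maxPages {
		return false
	}
	i1, i2, w, bit := locate(page)
	if b.l1[i1] == nil {
		return false
	}
	return b.l1[i1][i2][w]&(1<<bit) != 0
}

// backed counts how many second-level blocks have been allocated.
func (b *sparseBitmap) backed() int {
	n := 0
	for _, p := range b.l1 {
		if p != nil {
			n++
		}
	}
	return n
}

func main() {
	var b sparseBitmap
	b.set(3)
	b.set(pagesPerChunk<<l2Bits + 5) // lands in a different second-level block
	fmt.Println(b.isSet(3), b.isSet(4), b.backed())
}
```

The cost is the extra pointer dereference on every lookup, which is the "extra layer of indirection" the commit message refers to.
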
@4a6f656c

Contributor

@4a6f656c 4a6f656c commented Dec 4, 2019

The login class will be the reason why the allocation worked when run as root, but not when run as a normal user. However, even when a user has datasize-max=infinity (and increases the data size soft limit), there is still an upper bound based on the amount of memory available to the machine - as such, on a laptop with 4GB of RAM, the ~8.5GB PROT_NONE allocation was still failing, even if run as a user with a login class of staff (or as root). It presumably only worked on the builders as root due to a significant amount of memory being available.

The same failure could also be triggered under Linux via ulimit -v (some code removed around 2017 had a comment noting that sysReserve could fail on 64-bit systems, either due to kernel enforced constraints or ulimit -v).

Obviously this was a pretty significant regression from Go 1.13 - I can confirm that with the sparse array change, I can once again build Go with a data size limit of 2GB (although there appears to have been another regression with the memory requirements for compiling cmd/compile/internal/ssa - 1.5GB is sufficient for Go 1.13.4) and run a Go binary with a data size limit of 768MB.

@ianlancetaylor

Contributor Author

@ianlancetaylor ianlancetaylor commented Dec 5, 2019

@mknyszek Is there anything else to do on this issue or do we think that it is fixed?
