Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: checkfds breaks on AIX and Solaris #61584

Closed
rolandshoemaker opened this issue Jul 25, 2023 · 12 comments
Closed

runtime: checkfds breaks on AIX and Solaris #61584

rolandshoemaker opened this issue Jul 25, 2023 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.

Comments

@rolandshoemaker
Copy link
Member

http://go.dev/cl/509020 broke the aix-ppc64 and solaris-amd64-oraclerel builders.

For AIX it appears that the secureFDs was previously not working as intended, as something at another level (presumably in the kernel) was already reopening FDs when the binary was SUID, meaning our calls to fcntl(F_GETFD) always returned the expected value with no error. Doing this when not in SUID returns -1 with an error of 0, which breaks our assumption that EBADF will be returned (also violates the IBM documentation, possibly suggesting we are not properly getting the error code).

For Solaris, I've not been able to acquire a buildlet to test yet, but the Oracle docs suggest nothing particularly obvious. Possible there is a similar thing happening as AIX?

Build logs:

cc @ianlancetaylor @heschi

@rolandshoemaker rolandshoemaker added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 25, 2023
@rolandshoemaker rolandshoemaker self-assigned this Jul 25, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 25, 2023
@bcmills
Copy link
Contributor

bcmills commented Jul 25, 2023

Based on the platforms affected, I suspect a bug in syscall/exec_libc.go.

@rolandshoemaker
Copy link
Member Author

Solaris appears to have the same behavior (returning -1, 0) as AIX.

@bcmills is probably right, although I don't immediately see anything obvious.

@ianlancetaylor
Copy link
Contributor

Argh, the AIX gomote does not support ssh.

@bcmills
Copy link
Contributor

bcmills commented Jul 25, 2023

Ah, here we go. https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html:

If file descriptor 0, 1, or 2 would otherwise be closed after a successful call to one of the exec family of functions, implementations may open an unspecified file for the file descriptor in the new process image.

aix and solaris probably do that.

@bcmills
Copy link
Contributor

bcmills commented Jul 25, 2023

(That does violate the POSIX requirement that “[t]he return value shall not be negative” for F_GETFD, but maybe this is an old bug that was overlooked because it happens so rarely.)

@ianlancetaylor
Copy link
Contributor

But fcntl is returning -1. That definitely suggests that it is failing. https://www.ibm.com/docs/en/i/7.4?topic=ssw_ibm_i_74/apis/fcntl.html . But for some reason we are seeing an errno value of 0.

@bcmills
Copy link
Contributor

bcmills commented Jul 25, 2023

Right. fcntl(…, F_GETFD, …) gets the flags associated with the file descriptor, but perhaps that fails because nothing explicitly associated any flags with it to begin with (it was opened implicitly by execve).

And then maybe it forgets to set errno on that path because the FD number is actually valid despite the lack of flags.

Perhaps we could confirm (or refute) that by trying some other operation on the descriptor?

@ianlancetaylor
Copy link
Contributor

Hmmm, calling read on the descriptor does return 0.

@ianlancetaylor
Copy link
Contributor

But calling open of /dev/null also returns 0. That is, it succeeds. I think we are indeed failing to return the correct errno value.

@ianlancetaylor
Copy link
Contributor

Oh, the problem is that checkfds is called before minit. The call to minit is required to fetch the correct errno value.

That is also the problem on Solaris.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/513215 mentions this issue: runtime2: don't check fcntl errno in checkfds on AIX and Solaris

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/513535 mentions this issue: runtime: call miniterrno on m0 on AIX and Solaris

gopherbot pushed a commit that referenced this issue Jul 27, 2023
AIX and Solaris call into libc for syscalls, and expect M.mOS.perrno
to point to the thread-local errno value for the current M.
We initialize that field in miniterrno called from mstart.
However, this means that any libc calls before mstart will not
return the correct errno value.

This caused trouble in checkfds, which runs very early, before mstart.
We worked around that in 513215. This CL reverts 513215 in favor
of a better workaround: call miniterrno for m0 earlier (we will
still wind up calling miniterrno again from mstart, which does
no harm).

This is a better workaround because it means that if we add future
syscalls before mstart, they will behave as expected.

Fixes #61584

Change-Id: Ib6a0d3c53d2c8214cc339a5019f9d4f71a746f0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/513535
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants