Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime, internal/poll: darwin: ensure that no thread is consumed, nor a syscall.Read if FD isn't yet ready for I/O [1.12 backport] #34713

Closed
gopherbot opened this issue Oct 6, 2019 · 11 comments

Comments

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 6, 2019

@odeke-em requested issue #33953 to be considered for backport to the next 1.12 minor release.

Some great news, @minux's CL https://go-review.googlesource.com/c/go/+/197938 fixed this issue in that change from entersyscallblock() to entersyscall() so we'll need to backport Minux's CL to both Go1.12 and Go1.13 as per https://groups.google.com/d/msg/golang-dev/UYDTDWHJsC8/a1_4KiKnCwAJ.

@gopherbot please backport this issue to Go1.12 and Go1.13.

@toothrot

This comment has been minimized.

Copy link
Contributor

@toothrot toothrot commented Oct 9, 2019

This backport is not approved. Upgrading to 1.13 will be a valid workaround after #34712 is released.

@toothrot toothrot closed this Oct 9, 2019
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Oct 9, 2019

@toothrot

This comment has been minimized.

Copy link
Contributor

@toothrot toothrot commented Oct 9, 2019

@odeke-em First of all, thank you for your help in filing and investigating these issues. The rationale we try to use in these scenarios is captured in https://github.com/golang/go/wiki/MinorReleases:

A “serious” problem is one that prevents a program from working at all. "Use a more recent stable version" is a valid workaround, so very few fixes will be backported to both previous issues.

In this case, upgrading to Go 1.13 will be a valid workaround. Is there a reason that is not the case for your project?

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Oct 9, 2019

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Oct 9, 2019

The backports policy is rather odd; I've proposed a change in #34622

@toothrot fwiw the patch doesn't even apply cleanly to 1.12.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Oct 9, 2019

This seems like the sort of issue that would show up pretty readily during testing and release qualification: an affected binary should demonstrate a much larger footprint during load-testing.

Moreover, IIUC this bug has been present in 1.12 from the time it was released (late February), but nobody seems to have noticed it until I spotted a test flake in May, and from there we only had one known-affected user up until very recently.

That suggests one of three possibilities:

  1. Very few users were affected.
  2. Affected users have already applied workarounds for 1.12.
  3. Affected users decided to skip 1.12 entirely.

Case (1) would not meet the “serious” qualification. Case (2) would not meet the “no workaround” qualification. For case (3), the backport is approved for 1.13 and users would be able to upgrade to 1.13.2 when it is released.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Oct 9, 2019

This seems like the sort of issue that would show up pretty readily during testing and release qualification: an affected binary should demonstrate a much larger footprint during load-testing.

Not really, this issue is the kind that requires careful examination and has been missed between releases, until careful observation like the issue reported at https://groups.google.com/d/msg/golang-dev/UYDTDWHJsC8/x7huS4ctBwAJ which resulted in the investigation as to slow syscalls and subsequently CL https://go-review.googlesource.com/c/go/+/197938/
Given that we maintain the past 3 Go releases and only released Go1.13 a month+ ago, we have Go1.11, Go1.12 and Go1.13. If this hadn't been noticed and performance with I/O would have just continued like that, we'd consider that the norm 5 releases from now and that'd be the valid case for not backporting, but Go1.12 that was only 1+ releases ago, reported even before Go1.13 was released.

That suggests one of three possibilities:

  1. Very few users were affected.
  2. Affected users have already applied workarounds for 1.12.
  3. Affected users decided to skip 1.12 entirely.

To the 3 postulations, I disagree and I answer:
a) The test that we have in code for os.TestPipeThreads uses 50 threads, so a very high watermark but really should have a lesser value which as I documented in #33953 (comment) should have used between 8 to 10 threads on an 8 core machine, thus this issue went for long without being caught, until Michael T Jones run it on his 256 core machine -- an extraordinary case that then showed the problem with what was running on the common case for everyone.
b) to 1, 2, 3, again, the netpoller's interaction with threads is not easy to directly limit in normal web applications unless perhaps if there are sandboxed limits but even then asking users to count the number of threads that'll be used gets too technical, most users will just running out of resources errors in the extreme case and just upgrade their machines, but that shouldn't discount the bug's nature because of a lack of knowing
c) We maintain only the past 3 Go releases so Go1.11, Go1.12 and Go1.13, having Go1.11 and Go1.13 patched but Go1.12 unpatched for an important thing like I/O seems like an inconsistent state of affairs to me. With the logic previously used to disqualify this from the Go1.12 backport, I'd apply the same to say, then let's not backport it to Go1.13.

Thank you.

@FiloSottile

This comment has been minimized.

Copy link
Member

@FiloSottile FiloSottile commented Oct 9, 2019

at that point I’d encourage everyone running a server to skip using the version that’s not backported to.

We do encourage everyone who is updating to update to Go 1.13. The point of maintaining Go 1.12 with security and OS compatibility updates is not to force someone who is not in the process of updating to have to suddenly prioritize it.

we maintain the past 3 Go releases and only released Go1.13 a month+ ago, we have Go1.11, Go1.12 and Go1.13

We never maintained 3 releases AFAIK. It's always been 2. Go 1.11 is now unmantained.

@FiloSottile

This comment has been minimized.

Copy link
Member

@FiloSottile FiloSottile commented Oct 9, 2019

Anyway, this specific issue was a clear application of the current policy, which is to only backport to Go 1.12 things that would force a user which is already successfully running Go 1.12 to suddenly have to move to Go 1.13 (like a security issue, or incompatibility with a new OS version).

If you would like that policy to change, the discussion belongs in #34622.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Oct 9, 2019

Thank you @toothrot @bcmills @networkimprov and @FiloSottile for the discussion and for the correction! Got it, the qualification process and release risks associated with backporting would render this issue more of a performance than a security issue but also less release/assurance engineering for new backports, so I am now on board with the decision not to backport to Go1.12.

Thank you again for the discourse and for the spending time on this.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Nov 22, 2019

As a heads up, over in #34536 (comment) and the next comment, I suggest that we backport this change after all, unless it is completely infeasible or inappropriate. "Upgrading to 1.13 will be a valid workaround" is inconsistent with the current release policy. Discussion should probably be on #34536 though, not here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.