Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

network: catch SIGPIPEs via SO_NOSIGPIPE on Darwin #12039

Merged
merged 12 commits into from Jul 17, 2020

Conversation

rebello95
Copy link
Contributor

@rebello95 rebello95 commented Jul 10, 2020

With Envoy Mobile, we have started seeing some SIGPIPE fatal crashes in our iOS application. This is a bit strange considering Envoy attempts to suppress SIGPIPE errors. However, this is done at the global process-wide level, meaning something else could be re-setting it.

It's worth noting that:

  • We only started seeing this after updating Envoy Mobile's Envoy submodule with this diff. Binary searches unfortunately were not of much help since this is a pretty rare crash that we've only seen reported from production users' devices (no reproducible steps).
  • The above diff also includes some socket changes (and a few others), which could potentially be related to why we just now started seeing this issue (cc @florincoras in case you have ideas!).
  • The earliest build where we started seeing this issue was built with Xcode 11.5 rather than Xcode 11.3.1. Though unlikely, it is possible that this affected the effectiveness of Envoy's ignoring of SIGPIPE if something changed there.

This fix takes advantage of Apple's SO_NOSIGPIPE setting on a per-socket level to catch the SIGPIPEs that we're now seeing. It's being set on the upstream and listener socket creation codepaths in line with Envoy's existing setup to catch all SIGPIPEs.

SO_NOSIGPIPE is an option that prevents SIGPIPE from being raised when a write fails on a socket to which there is no reader; instead, the write to the socket returns with the error EPIPE when there is no reader.

Risk Level: Low, Envoy was already attempting to suppress SIGPIPEs
Testing: Existing tests
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Michael Rebello me@michaelrebello.com
Co-authored-by: Jose Nino jnino@lyft.com

@rebello95 rebello95 changed the title fix sigpipe issue on darwin network: catch SIGPIPEs via SO_NOSIGPIPE on Darwin Jul 10, 2020
@rebello95
Copy link
Contributor Author

cc @junr03 @goaway @florincoras

@florincoras
Copy link
Member

@rebello95 interesting. As far as I know, we are not trying to re-set SIGPIPE (and a quick code search seems to confirm that). So, I guess that SIGPIPE suppression never worked but something else changed, maybe event scheduling/handling, and this is now surfacing.

@rebello95
Copy link
Contributor Author

Definitely possible. Unfortunately I haven't been able to confirm the exact root cause since it hasn't been reproducible.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@junr03
Copy link
Member

junr03 commented Jul 13, 2020

@rebello95 I think the master merge is missing DCO

@rebello95
Copy link
Contributor Author

@junr03 yep, rebased and fixed DCO just now

@junr03
Copy link
Member

junr03 commented Jul 13, 2020

@ggreenway I was wondering if you could take a look at this. I feel this is up your alley.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95
Copy link
Contributor Author

I think CI failures are unrelated - updated with master 🤷‍♂️

@mattklein123 mattklein123 self-assigned this Jul 14, 2020
Signed-off-by: Michael Rebello <me@michaelrebello.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few small comments, thanks.

/wait

source/common/network/socket_option_factory.h Outdated Show resolved Hide resolved
Comment on lines 105 to 106
Network::Socket::appendOptions(cluster_options,
Network::SocketOptionFactory::buildSocketSigpipeOptions());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have this here and on the listener side, do we still need the process-wide signal() call that attempts to ignore SIGPIPE? I suppose that would be a riskier change, but can you add a TODO above that call to consider removing it and maybe also add some comments here and on the listener side of why we set this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll always need the process-wide signal() call since this additional socket option only applies on certain OSes, as gated by #ifdef SO_NOSIGPIPE in this PR, so it probably doesn't make sense to have a todo. I updated with additional documentation as requested.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM please check format.

/wait

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95
Copy link
Contributor Author

@mattklein123 fixed

@rebello95
Copy link
Contributor Author

And thank you for the reviews!

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@mattklein123
Copy link
Member

CI failures look legit.

/wait

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95
Copy link
Contributor Author

Fixed most of the tests, but still working with @junr03 to resolve the rest

Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95
Copy link
Contributor Author

Should be good now!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 7d8b9ac into envoyproxy:master Jul 17, 2020
@rebello95 rebello95 deleted the sigpipe-fix branch July 17, 2020 16:32
rebello95 added a commit to envoyproxy/envoy-mobile that referenced this pull request Jul 17, 2020
Bumping upstream to include iOS fixes for SIGPIPE error handling: envoyproxy/envoy#12039

Signed-off-by: Michael Rebello <me@michaelrebello.com>
rebello95 added a commit to envoyproxy/envoy-mobile that referenced this pull request Jul 17, 2020
Bumping upstream to include iOS fixes for SIGPIPE error handling: envoyproxy/envoy#12039

Signed-off-by: Michael Rebello <me@michaelrebello.com>
KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
Signed-off-by: Michael Rebello <me@michaelrebello.com>
Co-authored-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
Signed-off-by: Michael Rebello <me@michaelrebello.com>
Co-authored-by: Jose Nino <jnino@lyft.com>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Bumping upstream to include iOS fixes for SIGPIPE error handling: #12039

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Bumping upstream to include iOS fixes for SIGPIPE error handling: #12039

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants