original_src: Fix ephemeral port exhaustion by setting IP_BIND_ADDRESS_NO_PORT#38288

Merged
mattklein123 merged 3 commits into
envoyproxy:mainfrom
jronak:fix/original-src
Feb 10, 2025
Conversation

@jronak
Contributor

@jronak jronak commented Feb 2, 2025

Commit Message: The original_src filter binds the upstream socket to the original source IP address by invoking the bind syscall. This works correctly, but we observed ephemeral port exhaustion in production when the original_src filter was enabled.

When a socket binds to a non-zero IP with a zero port, the kernel assigns an ephemeral port immediately. This port remains unavailable for reuse because the kernel does not know if the socket will eventually connect or listen.

To address this, the kernel provides the IP_BIND_ADDRESS_NO_PORT socket option, which disables immediate ephemeral port reservation for sockets intended for connection. Using this option helps prevent ephemeral port exhaustion.
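For illustration, the difference can be observed from user space. This is a minimal sketch, not the Envoy change itself: it assumes a Linux kernel that supports the option (4.2 or newer), uses the loopback address as a stand-in for the original source IP, and falls back to the value 24 from `<linux/in.h>` when the Python `socket` module does not expose the constant. The helper name `bound_port` is made up for this example.

```python
import socket

# Value of IP_BIND_ADDRESS_NO_PORT in <linux/in.h>; used as a fallback when
# the running Python version does not export the constant.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def bound_port(defer_port: bool) -> int:
    """Bind a TCP socket to 127.0.0.1 with port 0 and return the local port
    the kernel reports immediately after bind()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if defer_port:
            # Ask the kernel not to reserve an ephemeral port at bind() time.
            sock.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("plain bind reserved port:", bound_port(defer_port=False))
    print("port after bind with IP_BIND_ADDRESS_NO_PORT:", bound_port(defer_port=True))
```

On a supporting kernel, the plain bind should report a non-zero ephemeral port, while the bind with the option set should still report port 0.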

Additional Description: N/A
Risk Level: Low
Testing: Unit test
Docs Changes: N/A
Release Notes: N/A

@jronak jronak requested a review from mattklein123 as a code owner February 2, 2025 01:58
@repokitteh-read-only

Hi @jronak, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #38288 was opened by jronak.

see: more, trace.

@adisuissa
Contributor

cc @klarose @mattklein123 as codeowners.
Assigning Matt as code-owner reviewer.
/assign @mattklein123

Contributor

@adisuissa adisuissa left a comment

drive-by comment:
IIUC this PR modifies the current behavior. If so, the feature should either be configured by a config-knob (i.e., adding this to the filter's API), or if the feature is more of a bugfix then it should be runtime guarded.

Member

@mattklein123 mattklein123 left a comment

Couple of questions:

  1. What kernel versions is this option available on? Does it work on all kernels we would expect Envoy to run on with other options?
  2. Presumably it's still possible to fail later when a port is actually bound? Do we correctly handle that behavior?
  3. Agree with @adisuissa that this should probably be at least runtime guarded if not feature driven.

/wait

@repokitteh-read-only

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #38288 was synchronize by jronak.

see: more, trace.

The original_src filter binds the upstream socket to the original source IP
address by invoking the bind syscall. This works correctly in principle, but
in production, we observed ephemeral port exhaustion when the original_src
filter was enabled.

When a socket is bound to a non-zero IP with a zero port, the kernel assigns
an ephemeral port immediately. This port remains unavailable for reuse because
the kernel does not know if the socket will eventually connect or listen.

To address this, the kernel provides the IP_BIND_ADDRESS_NO_PORT socket option,
which disables immediate ephemeral port reservation for sockets intended for
connect. Using this option helps prevent ephemeral port exhaustion.

Signed-off-by: Ronak Jain <ronakjainc@gmail.com>
@jronak
Contributor Author

jronak commented Feb 9, 2025

  1. What kernel versions is this option available on? Does it work on all kernels we would expect Envoy to run on with other options?

This socket option has been available since Linux kernel 4.2, so it is present on both SLTS (4.4) and LTS (5.4+) kernels.
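As a rough illustration of the kernel-support question, a caller can probe for the option and fall back to a plain bind when the kernel rejects it. This is a hedged sketch, not how Envoy handles unsupported options: the helper names `try_enable_no_port` and `probe` are invented here, and treating `ENOPROTOOPT` and `EINVAL` as "not supported" is an assumption about how older kernels or sandboxes respond.

```python
import errno
import socket

# Value of IP_BIND_ADDRESS_NO_PORT in <linux/in.h>; fallback for Python
# versions that do not export the constant.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def try_enable_no_port(sock: socket.socket) -> bool:
    """Try to set IP_BIND_ADDRESS_NO_PORT; return False if the kernel does
    not recognize the option, re-raise any other error."""
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
    except OSError as exc:
        # Assumption: these errno values indicate an unsupported option.
        if exc.errno in (errno.ENOPROTOOPT, errno.EINVAL):
            return False
        raise
    return True


def probe() -> bool:
    """Report whether the running kernel accepts the option on a TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return try_enable_no_port(sock)


if __name__ == "__main__":
    print("IP_BIND_ADDRESS_NO_PORT supported:", probe())
```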

  2. Presumably it's still possible to fail later when a port is actually bound? Do we correctly handle that behavior?

With this option set, bind no longer allocates an ephemeral port on these sockets. Instead, as with regular sockets that never call bind, the kernel allocates the ephemeral port during the connect syscall. No special handling is required on our end, and our existing connect error handling already covers ephemeral port exhaustion errors.
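The connect-time allocation can be sketched in user space as follows. This is an illustrative example under assumptions, not Envoy code: it uses a loopback listener as a stand-in upstream, assumes a Linux kernel that supports the option, and the helper names `connect_from` and `demo` are invented. If no ephemeral port were available, `connect()` would raise `OSError` (typically with `EADDRNOTAVAIL`), which is the same failure path as for a socket that never called bind.

```python
import socket

# Value of IP_BIND_ADDRESS_NO_PORT in <linux/in.h>; fallback for Python
# versions that do not export the constant.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def connect_from(source_ip: str, dest: tuple) -> tuple:
    """Bind to source_ip with port 0, connect to dest, and return the local
    port seen after bind() and after connect()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        sock.bind((source_ip, 0))
        port_after_bind = sock.getsockname()[1]
        # The kernel picks the ephemeral port here; failure surfaces as OSError.
        sock.connect(dest)
        port_after_connect = sock.getsockname()[1]
        return port_after_bind, port_after_connect
    finally:
        sock.close()


def demo() -> tuple:
    """Run connect_from against a temporary loopback listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        return connect_from("127.0.0.1", listener.getsockname())


if __name__ == "__main__":
    before, after = demo()
    print("port after bind:", before, "port after connect:", after)
```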

  3. Agree with @adisuissa that this should probably be at least runtime guarded if not feature driven.

I agree this needs to be behind a runtime guard. I have updated the PR to use a reloadable runtime feature.
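The idea of guarding the new behavior can be sketched generically. This is only an illustration of the pattern: the flag name `example.original_src_bind_address_no_port`, the dictionary-based flag store, and the helper names are all hypothetical and do not reflect Envoy's runtime API or the actual flag added in this PR.

```python
import socket

# Value of IP_BIND_ADDRESS_NO_PORT in <linux/in.h>; fallback for Python
# versions that do not export the constant.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)

# Hypothetical flag name, for illustration only.
FLAG = "example.original_src_bind_address_no_port"


def bind_source(sock: socket.socket, source_ip: str, flags: dict) -> int:
    """Bind to source_ip with port 0, setting the option only when the guard
    is enabled (default on), and return the port reported after bind()."""
    if flags.get(FLAG, True):
        sock.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
    sock.bind((source_ip, 0))
    return sock.getsockname()[1]


def bound_port_with_flags(flags: dict) -> int:
    """Create a TCP socket, apply bind_source on loopback, and close it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return bind_source(sock, "127.0.0.1", flags)


if __name__ == "__main__":
    print("guard off:", bound_port_with_flags({FLAG: False}))
    print("guard on (default):", bound_port_with_flags({}))
```

Defaulting the guard to on while allowing it to be switched off keeps the fix active by default and leaves a way to restore the old bind behavior without a new release.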

Thanks for the review @adisuissa @mattklein123
