Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1927942: pkg/etcdenvvar: enable SO_REUSEADDR #553

Merged
merged 1 commit into from
Apr 1, 2021

Conversation

hexfusion
Copy link
Contributor

@hexfusion hexfusion commented Mar 10, 2021

This PR adds support for setting socket options SO_REUSEADDR[1] to etcd listeners via ListenConfig. What we have found is during etcd process restart there can be a considerable time waiting for the port to release by TIME_WAIT which for OpenShift is 60 seconds..

$ ps aux | grep etcd | wc -l
0

$  ss --numeric -o state time-wait 
tcp    0       0                 10.0.138.12:34642            10.0.138.12:2380   timer:(timewait,50sec,0)                                                       
tcp    0       0                 10.0.138.12:34470            10.0.138.12:2380   timer:(timewait,35sec,0)                                                       
tcp    0       0                 10.0.138.12:34086            10.0.138.12:2380   timer:(timewait,10sec,0)                                                       
tcp    0       0                 10.0.138.12:34232            10.0.138.12:2380   timer:(timewait,20sec,0)                                                       
tcp    0       0                 10.0.138.12:34694            10.0.138.12:2380   timer:(timewait,55sec,0)                                                       
tcp    0       0                 10.0.138.12:34332            10.0.138.12:2380   timer:(timewait,25sec,0)                                                       
tcp    0       0                 10.0.138.12:33966            10.0.138.12:2380   timer:(timewait,845ms,0)                                                       
tcp    0       0                 10.0.138.12:34040            10.0.138.12:2380   timer:(timewait,5.845ms,0)                                                     
tcp    0       0                 10.0.138.12:34184            10.0.138.12:2380   timer:(timewait,15sec,0)                                                       
tcp    0       0                 10.0.138.12:34570            10.0.138.12:2380   timer:(timewait,45sec,0)                                                       
tcp    0       0                 10.0.138.12:34512            10.0.138.12:2380   timer:(timewait,40sec,0)                                                       
tcp    0       0                 10.0.138.12:34398            10.0.138.12:2380   timer:(timewait,30sec,0)  

So we can wait for many seconds even after etcd process is long dead for client and peer ports to become available.

2021-02-19 13:57:25.117257 C | etcdmain: listen tcp 10.0.138.12:2380: bind: address already in use

A similar approach has been taken with the addition of these options in the kube-apiserver[2][3]

[1] https://man7.org/linux/man-pages/man7/socket.7.html
[2] kubernetes/kubernetes#93861
[3] openshift/kubernetes#309

depends on openshift/etcd#70

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Mar 10, 2021
@openshift-ci-robot
Copy link

@hexfusion: This pull request references Bugzilla bug 1927942, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @geliu2016

In response to this:

Bug 1927942: pkg/etcdenvvar: enable SO_REUSEADDR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2021
@hexfusion
Copy link
Contributor Author

/hold

depends on openshift/etcd#70

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 10, 2021
@openshift-ci-robot
Copy link

@hexfusion: This pull request references Bugzilla bug 1927942, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @geliu2016

In response to this:

Bug 1927942: pkg/etcdenvvar: enable SO_REUSEADDR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

2 similar comments
@openshift-ci-robot
Copy link

@hexfusion: This pull request references Bugzilla bug 1927942, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @geliu2016

In response to this:

Bug 1927942: pkg/etcdenvvar: enable SO_REUSEADDR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@hexfusion: This pull request references Bugzilla bug 1927942, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @geliu2016

In response to this:

Bug 1927942: pkg/etcdenvvar: enable SO_REUSEADDR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@retroflexer
Copy link
Contributor

Since this is just adding an environment variable, should not really be dependent on the changes to openshift/etcd.

Leaving the hold on for Sam to remove it when he is happy with the tests.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 11, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, retroflexer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hexfusion
Copy link
Contributor Author

/test all

@hexfusion
Copy link
Contributor Author

/test e2e-agnostic

@hexfusion
Copy link
Contributor Author

/retest

1 similar comment
@hexfusion
Copy link
Contributor Author

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 1, 2021

@hexfusion: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-disruptive b85eb57 link /test e2e-aws-disruptive
ci/prow/e2e-gcp-disruptive b85eb57 link /test e2e-gcp-disruptive

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hexfusion
Copy link
Contributor Author

/hold cancel

These needs soak and if we have issues this PR is the toggle.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2021
@openshift-merge-robot openshift-merge-robot merged commit 0f30e4e into openshift:master Apr 1, 2021
@openshift-ci-robot
Copy link

@hexfusion: All pull requests linked via external trackers have merged:

Bugzilla bug 1927942 has been moved to the MODIFIED state.

In response to this:

Bug 1927942: pkg/etcdenvvar: enable SO_REUSEADDR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants