Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: dns-controller: 3999 port address already in use #9404

Merged
merged 1 commit into from
Jun 23, 2020

Conversation

vgunapati
Copy link
Contributor

@vgunapati vgunapati commented Jun 19, 2020

Fixes #4877

The main problem is using the default rolling upgrade strategy. If the dns-controller is created on the same node during the upgrade we are seeing this issue as this is hostNetwork pod

panic: listen tcp4 0.0.0.0:3998: bind: address already in use

example:

dns-controller-5f88fdd576-d8d6m     0/1     CrashLoopBackOff   11         35m   172.16.45.40        ip-172-16-45-40.us-west-2.compute.internal    <none>           <none>
dns-controller-68f76b5848-kdfhm     1/1     Running            0          51m   172.16.45.40        ip-172-16-45-40.us-west-2.compute.internal    <none>           <none>

This PR will use Recreate strategy to avoid this issue

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 19, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @vgunapati. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 19, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 19, 2020
@vgunapati
Copy link
Contributor Author

/assign @rdrgmnzs

@hakman
Copy link
Member

hakman commented Jun 19, 2020

/cc @johngmyers @olemarkus

@hakman
Copy link
Member

hakman commented Jun 19, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 19, 2020
@olemarkus
Copy link
Member

This probably makes sense. It will mean that dns-controller doesn't run between termination and recreation, but I think that is fine.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2020
@johngmyers
Copy link
Member

Yeah, normally I'd think pod anti-affinity, but dns-controller doesn't have strong availability requirements.
/lgtm

@vgunapati
Copy link
Contributor Author

vgunapati commented Jun 19, 2020

Yeah, normally I'd think pod anti-affinity, but dns-controller doesn't have strong availability requirements.
/lgtm

As we only have one dns-controller pod and did not see any leader election in dns-controller logs nor impact to cluster during Recreate of dns-controller pod I have used Recreate strategy
We can do something like following, for example

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - dns-controller
        topologyKey: kubernetes.io/hostname

Please let me know if you want me to change to use podAntiAffinity

@johngmyers
Copy link
Member

No, this is good as-is.

@olemarkus
Copy link
Member

This one could be a candidate for backport. I guess dns-controller will go into crashloop on every kops upgrade otherwise.

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 20, 2020
@rifelpet
Copy link
Member

I agree this should probably be backported

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2020
@rifelpet
Copy link
Member

Travis CI issues... @vgunapati are you able to retry these jobs? https://travis-ci.org/github/kubernetes/kops/builds/700018759 if not can you ammend your commit message and force push back to your branch? we just need to trigger another travis build somehow.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2020
@vgunapati
Copy link
Contributor Author

vgunapati commented Jun 23, 2020

@rifelpet I have amended my commit without changes to code. Looks like new travis build just triggered
https://travis-ci.org/github/kubernetes/kops/builds/701094721

@rifelpet
Copy link
Member

hopefully that does the trick, thanks!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet, vgunapati

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 9708057 into kubernetes:master Jun 23, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jun 23, 2020
k8s-ci-robot added a commit that referenced this pull request Jun 23, 2020
…04-upstream-release-1.18

Automated cherry pick of #9404: Fix: dns-controller: 3999 port address already in use
k8s-ci-robot added a commit that referenced this pull request Jun 24, 2020
…04-upstream-release-1.17

Automated cherry pick of #9404: Fix: dns-controller: 3999 port address already in use
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dns-controller: :3999: bind: address already in use
7 participants