
Bug 1538389 - Allow node IP change to update Host IP in HostSubnet resource #18281

Merged

Conversation

pravisankar

  • Node status addresses may contain both the old and the new IP address, and by
    validating against these addresses we may not be able to update the HostSubnet
    with the new node IP. The OpenShift node service keeps waiting for its HostSubnet
    resource to carry the new Host IP and eventually fails (see the sketch after
    this description).

  • This change fixes the above issue by reverting the change for node flip/flop.
    The recommended/correct way to handle the node flip/flop case is to specify the
    nodeIP config option in node-config.yaml.
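To make the first bullet concrete, here is a minimal Go sketch of the flip/flop scenario. It is illustrative only: the IP values, the hostIP variable, and the use of the upstream corev1 types are assumptions, not the actual master code. It shows why a check of the form "is the recorded Host IP still among the node's status addresses?" keeps accepting the stale IP and never forces a HostSubnet update.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical node whose status still lists the old IP alongside the
	// new one after an address change (values are made up).
	node := corev1.Node{
		Status: corev1.NodeStatus{
			Addresses: []corev1.NodeAddress{
				{Type: corev1.NodeInternalIP, Address: "10.0.0.5"},  // old IP
				{Type: corev1.NodeInternalIP, Address: "10.0.0.42"}, // new IP
			},
		},
	}

	hostIP := "10.0.0.5" // stale HostIP recorded in the HostSubnet

	// A check of the form "is the recorded HostIP still one of the node's
	// addresses?" still succeeds, so the HostSubnet is never updated to the
	// new IP and the node service keeps waiting for it.
	for _, addr := range node.Status.Addresses {
		if addr.Address == hostIP {
			fmt.Println("recorded HostIP still matches a status address; no update triggered")
		}
	}
}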

@openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 25, 2018
@openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 25, 2018
@pravisankar added kind/bug Categorizes issue or PR as related to a bug. component/networking sig/networking labels Jan 25, 2018
@pravisankar
Author

@openshift/sig-networking PTAL

Contributor

imcsk8 commented Jan 25, 2018

LGTM

@pravisankar
Author

/retest


var nodeIP string
if len(node.Status.Addresses) > 0 && node.Status.Addresses[0].Address != "" {
	nodeIP = node.Status.Addresses[0].Address
Contributor

Do we want to check on NodeAddressType here now, e.g. to prefer NodeInternalIP? I'm assuming that netutils.GetNodeIP() would most likely return the private/internal address for the node, given it's a DNS lookup and a cloud provider would likely return the private/internal address rather than the public one.

Otherwise LGTM; better to be more explicit in the node config than rely on "magic" in the master to figure out this stuff.

Author

It is possible that the node address obtained from a DNS lookup differs from what the cloud provider returns. Currently the kubelet populates the first node address as the NodeInternalIP, but I agree that we should not depend on kubelet internals.
Yes, I will make it explicit here.

Contributor

knobunc commented Jan 29, 2018

@rajatchopra PTAL

Contributor

@knobunc left a comment

The approach seems reasonable to me too. Thanks.

Ravi Sankar Penta added 2 commits January 29, 2018 11:56
- If the IP address is not populated in the node object, the master should not
  try to resolve the node address, because it may not match the exact IP
  address used by the node.
- We can safely ignore the event in that case; when the IP is populated in
  the node status object we will get another event, and that is the right
  time to create/update the HostSubnet.
- Node status addresses may contain both the old and the new IP address, and by
  validating against these addresses we may not be able to update the HostSubnet
  with the new node IP. The OpenShift node service waits for its HostSubnet
  resource to carry the new Host IP and eventually fails.

- This change fixes the above issue by reverting the change for node flip/flop.
  The recommended/correct way to handle the node flip/flop case is to specify the
  nodeIP config option in node-config.yaml.
@pravisankar
Author

Updated: the address with type NodeInternalIP is now used instead of node.Status.Addresses[0].Address to fetch the node IP.
@dcbw PTAL
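For reference, a minimal sketch of that selection, illustrative only: it uses the upstream corev1 (k8s.io/api/core/v1) types rather than the exact types in this repository, and the helper name nodeInternalIP is made up. The idea is to pick the first non-empty address of type NodeInternalIP instead of blindly taking Addresses[0].

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// nodeInternalIP returns the first non-empty address of type NodeInternalIP,
// instead of relying on node.Status.Addresses[0] being the internal address.
func nodeInternalIP(node *corev1.Node) string {
	for _, addr := range node.Status.Addresses {
		if addr.Type == corev1.NodeInternalIP && addr.Address != "" {
			return addr.Address
		}
	}
	return ""
}

func main() {
	node := &corev1.Node{
		Status: corev1.NodeStatus{
			Addresses: []corev1.NodeAddress{
				{Type: corev1.NodeHostName, Address: "node-1.example.com"},
				{Type: corev1.NodeInternalIP, Address: "10.0.0.42"},
			},
		},
	}
	fmt.Println(nodeInternalIP(node)) // prints 10.0.0.42
}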

	}
}
if len(nodeIP) == 0 {
	glog.Errorf("Node IP is not set for node %s, skipping %s event, node: %v", node.Name, eventType, node)
Contributor

This entire file needs to use utilruntime.HandleError. Any kube code that eats an error (that uses glog.Errorf) should use utilruntime.HandleError instead. That ensures that we get rate limiting, that errors are reported to sentry, and also that hotloop requests get processed.

This can be a follow-up PR, but please correct it before we ship 3.9.
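A rough sketch of the suggested pattern, assuming the standard k8s.io/apimachinery/pkg/util/runtime import; reportMissingNodeIP is a made-up helper, not code from this PR. Instead of eating the error with glog.Errorf, the error is handed to utilruntime.HandleError so it is rate limited and passed to the registered error handlers.

package main

import (
	"fmt"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// reportMissingNodeIP is a hypothetical helper showing the pattern: rather
// than logging and dropping the error, route it through HandleError.
func reportMissingNodeIP(nodeName, eventType string) {
	utilruntime.HandleError(fmt.Errorf("node IP is not set for node %s, skipping %s event", nodeName, eventType))
}

func main() {
	reportMissingNodeIP("node-1", "ADD")
}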

Author

Created follow-up PR: #18343

Contributor

@rajatchopra left a comment

good to go
/lgtm
/approve

@openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 30, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pravisankar, rajatchopra

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@pravisankar
Author

/retest

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot merged commit 48a0122 into openshift:master Jan 30, 2018