WIP: nodeipam: fix hot loop on non-existent node #475

@sjenning sjenning commented Dec 2, 2020

opening for testing

@openshift-ci-robot

@sjenning: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

WIP: nodeipam: fix hot loop on non-existent node

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Dec 2, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 2, 2020
@openshift-merge-robot

@sjenning: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/verify-commits | af19f7a | link | /test verify-commits |
| ci/prow/verify | af19f7a | link | /test verify |
| ci/prow/e2e-aws-ovn | af19f7a | link | /test e2e-aws-ovn |
| ci/prow/e2e-aws | af19f7a | link | /test e2e-aws |
| ci/prow/e2e-aws-csi | af19f7a | link | /test e2e-aws-csi |
| ci/prow/e2e-agnostic-cmd | af19f7a | link | /test e2e-agnostic-cmd |
| ci/prow/e2e-gcp | af19f7a | link | /test e2e-gcp |
| ci/prow/e2e-agnostic-upgrade | af19f7a | link | /test e2e-agnostic-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sjenning
Author

sjenning commented Dec 3, 2020

it's working

W1203 03:40:04.467203       1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="node1" does not exist
E1203 03:40:04.560644       1 controller_utils.go:201] unable to taint [&Taint{Key:node.kubernetes.io/unschedulable,Value:,Effect:NoSchedule,TimeAdded:2020-12-03 03:40:04.463809578 +0000 UTC m=+6183.288125616,}] unresponsive Node "node1": nodes "node1" not found
E1203 03:40:04.561549       1 node_lifecycle_controller.go:601] Failed to taint NoSchedule on node <node1>, requeue it: failed to swap taints of node &Node{ObjectMeta:{node1   /api/v1/nodes/node1 3c00adda-7e86-48a2-89c6-fe0e7f791f8e 6916770 0 2020-12-03 03:40:04 +0000 UTC <nil> <nil> map[] map[] [] []  [{openshift-tests Update v1 2020-12-03 03:40:04 +0000 UTC FieldsV1 {"f:spec":{"f:unschedulable":{}}}}]},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:true,Taints:[]Taint{Taint{Key:node.kubernetes.io/not-ready,Value:,Effect:NoSchedule,TimeAdded:<nil>,},},ConfigSource:nil,PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{},Allocatable:ResourceList{},Phase:,Conditions:[]NodeCondition{},Addresses:[]NodeAddress{},DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:0,},},NodeInfo:NodeSystemInfo{MachineID:,SystemUUID:,BootID:,KernelVersion:,OSImage:,ContainerRuntimeVersion:,KubeletVersion:,KubeProxyVersion:,OperatingSystem:,Architecture:,},Images:[]ContainerImage{},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:nil,},}
E1203 03:40:04.649208       1 range_allocator.go:408] Failed to update node node1 PodCIDR to [10.128.7.0/24] after multiple attempts: failed to patch node CIDR: nodes "node1" not found
I1203 03:40:04.650332       1 controller_utils.go:185] Recording status change CIDRAssignmentFailed event message for node node1
E1203 03:40:04.650872       1 range_allocator.go:414] CIDR assignment for node node1 failed: failed to patch node CIDR: nodes "node1" not found. Releasing allocated CIDR
E1203 03:40:04.652022       1 range_allocator.go:197] Error updating CIDR for {[%!q(*net.IPNet=&{[10 128 7 0] [255 255 255 0]})] "node1"}: failed to patch node CIDR: nodes "node1" not found
E1203 03:40:04.652539       1 range_allocator.go:231] Cannot get retryParams for "node1" as entry does not exist
E1203 03:40:04.652995       1 range_allocator.go:206] Exceeded retry count for {[%!q(*net.IPNet=&{[10 128 7 0] [255 255 255 0]})] "node1"}, dropping from queue
I1203 03:40:04.654602       1 event.go:291] "Event occurred" object="node1" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node node1 status is now: CIDRAssignmentFailed"
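
For readers unfamiliar with the pattern, the log above ends with the item being dropped from the queue after its retry budget is exhausted, rather than being requeued forever. The following is a minimal Go sketch of that general idea only. It is not the patch in this PR and not the range allocator's actual code; `retryQueue`, `maxRetries`, and `errNotFound` are hypothetical names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API server's
// `nodes "node1" not found` error seen in the log.
var errNotFound = errors.New("node not found")

// maxRetries is an assumed retry budget, not a value taken from the PR.
const maxRetries = 3

// retryQueue is a toy work queue that counts failed attempts per node.
type retryQueue struct {
	attempts map[string]int
	pending  []string
}

func newRetryQueue() *retryQueue {
	return &retryQueue{attempts: make(map[string]int)}
}

// add appends a node name to the end of the pending list.
func (q *retryQueue) add(node string) {
	q.pending = append(q.pending, node)
}

// process drains the queue. A failed item is requeued until it has
// failed maxRetries times, then it is dropped instead of looping forever.
// It returns the number of update calls made and the dropped node names.
func (q *retryQueue) process(update func(node string) error) (calls int, dropped []string) {
	for len(q.pending) > 0 {
		node := q.pending[0]
		q.pending = q.pending[1:]

		calls++
		if err := update(node); err == nil {
			delete(q.attempts, node) // success: forget the retry history
			continue
		}

		q.attempts[node]++
		if q.attempts[node] >= maxRetries {
			delete(q.attempts, node) // budget exhausted: drop from queue
			dropped = append(dropped, node)
			continue
		}
		q.add(node) // still within budget: try again later
	}
	return calls, dropped
}

func main() {
	q := newRetryQueue()
	q.add("node1")
	// Simulate a node that never exists, as in the log above.
	calls, dropped := q.process(func(string) error { return errNotFound })
	fmt.Println("update calls:", calls, "dropped:", dropped)
}
```

Without the retry cap, the loop in `process` would never terminate for a node that does not exist, which is the hot-loop shape the PR title refers to.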

@sjenning closed this on Dec 3, 2020
@sjenning
Author

sjenning commented Dec 3, 2020

/reopen

@openshift-ci-robot

@sjenning: Failed to re-open PR: state cannot be changed. The fix-range-allocator-hotloop-4.6 branch was force-pushed or recreated.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

3 participants