AAP failed/stuck job due to pod networking problem #187
Comments
Same issue when creating a new VM with the virt-launcher pod, example in ns "stormshift-microshift"
Looks like we have a general problem with the ucs56/57 nodes:
=> https://access.redhat.com/solutions/7042208 (old KCS) did not help...
Let's try: https://hackmd.io/@mjace/H1fJuv5Ap?utm_source=preview-mode&utm_medium=rec
Solved
The pods above are from my HCP playground; we can ignore them for now.
Same problem again today with ucs56 - trying the workaround....
...by deleting the control plane pods AND the ovnkube-node pods on ucs56 and ucs57
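The pod-deletion workaround above could be sketched roughly as follows. This is a hedged sketch, not what was literally run: it assumes cluster-admin access via `oc`, and assumes the OVN-K pod labels of a recent OCP release (`app=ovnkube-control-plane`, `app=ovnkube-node`; older releases used `app=ovnkube-master` for the control plane).

```shell
# Sketch of the workaround: restart OVN-K control plane and per-node pods.
# Label names are assumptions based on recent OCP/OVN-Kubernetes layouts.

# Restart the OVN-K control plane pods (the DaemonSet/Deployment recreates them):
oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-control-plane

# Restart the ovnkube-node pods on the two affected nodes only:
oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node \
  --field-selector spec.nodeName=ucs56
oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node \
  --field-selector spec.nodeName=ucs57
```

Deleting the pods is safe in the sense that the managing DaemonSet recreates them, but it briefly interrupts OVN-K programming on those nodes.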
Feels like ovnk is in an inconsistent state, e.g. this event in openshift-console when trying to restart the console: "4m42s Warning ErrorUpdatingResource pod/downloads-54777dd798-vxmhz addLogicalPort failed for openshift-console/downloads-54777dd798-vxmhz: timed out waiting for logical switch in logical switch cache "ucs57" subnet: error getting logical switch ucs57: switch not in logical switch cache"
Trying to drain and reboot UCS56.... |
... that helped, the cluster looks way better now. I also needed to disable/re-enable the CNV console plugin.
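The drain/reboot step above can be sketched like this (a sketch only, assuming cluster-admin access; the exact flags used on this cluster are not recorded in the thread):

```shell
# Cordon and drain the node, evicting workloads (DaemonSet pods stay,
# emptyDir data is discarded - hence the explicit flags):
oc adm drain ucs56 --ignore-daemonsets --delete-emptydir-data --force

# Reboot the node via a debug pod chrooted into the host:
oc debug node/ucs56 -- chroot /host systemctl reboot

# Once the node is Ready again, allow scheduling:
oc adm uncordon ucs56
```

A drain plus reboot recreates all per-node state, including the OVN-K logical switch port programming, which is consistent with the "switch not in logical switch cache" symptom clearing afterwards.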
Still wondering what the root cause is/was - might we need to regularly reboot nodes? Closing for now.
I am trying to run the job template "stormshift-update-template-vms" on ISAR AAP.
The job fails; the automation-job pod in NS "ansible-automation-platform" is stuck in state "ContainerCreating".
Event log shows error messages:
addLogicalPort failed for ansible-automation-platform/automation-job-252-jcjdd: failed to assign pod addresses for pod default/ansible-automation-platform/automation-job-252-jcjdd on switch: ucs57, err: range is full
and
failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_automation-job-252-jcjdd_ansible-automation-platform_fbcce7b4-1feb-48d0-8067-21a2f69ab074_0(000b3c811b66882860ad874f24cbf77dafeca43b201ede10c99c6748000a1b5d): error adding pod ansible-automation-platform_automation-job-252-jcjdd to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:000b3c811b66882860ad874f24cbf77dafeca43b201ede10c99c6748000a1b5d Netns:/var/run/netns/b5ad18e1-7e53-4ac6-95ff-01737e7ae193 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=ansible-automation-platform;K8S_POD_NAME=automation-job-252-
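The "range is full" error means OVN-K could not allocate a pod IP from the node's per-node subnet on switch ucs57. A hedged diagnostic sketch (assumes `oc` access; the `k8s.ovn.org/node-subnets` annotation is the standard place OVN-Kubernetes records the per-node pod subnet):

```shell
# Show the pod subnet OVN-K assigned to the node - if it is e.g. a /23,
# it caps how many pod IPs the node can hold:
oc get node ucs57 \
  -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/node-subnets}'

# Count pods currently placed on the node, to compare against that range
# (completed/errored pods can still hold addresses in a leaky state):
oc get pods -A --field-selector spec.nodeName=ucs57 -o name | wc -l
```

If the pod count is far below the subnet capacity and allocation still fails, that points to stale IP allocations inside OVN-K rather than genuine exhaustion, which would fit the inconsistent-state observations in the comments above.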
@rbo , can you please advise?