CI: Azure AKS deploy sometimes fails to find the resource group #8989
Labels: area/ci (Issues affecting the continuous integration), bug (Incorrect behaviour), needs-review (Needs to be assessed by the team.)
Comments
wainersm added the bug, needs-review, and area/ci labels on Feb 1, 2024.
wainersm added a commit to wainersm/kata-containers that referenced this issue on Feb 1, 2024:
To provision k8s on Azure (AKS), a temporary resource group must be created first. The script sends the request to create it but doesn't wait for the operation to finish, so sometimes it tries to use a resource group that doesn't exist yet and then bails out. This adds some `az group wait` checkpoints. Even on deletion we want to ensure there won't be dangling resource groups. Fixes kata-containers#8989 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
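A minimal sketch of the checkpoints this commit describes, assuming bash and the Azure CLI. The function names `create_rg_and_wait` and `delete_rg_and_wait`, and the interval/timeout values, are hypothetical and are not the actual helpers in tests/gha-run-k8s-common.sh. `az group wait` with `--created` or `--deleted` is the Azure CLI command that polls a resource group until it reaches that state.

```shell
# Hypothetical helpers, not the real code in tests/gha-run-k8s-common.sh.

create_rg_and_wait() {
    local rg="$1" location="$2"
    # Request the resource group...
    az group create --name "$rg" --location "$location"
    # ...then block until Azure reports it as created, so that the next
    # step (creating the AKS cluster) does not race against it.
    # Interval/timeout values (seconds) are illustrative assumptions.
    az group wait --name "$rg" --created --interval 10 --timeout 600
}

delete_rg_and_wait() {
    local rg="$1"
    # --no-wait returns immediately; the explicit wait below makes sure
    # no dangling resource group is left behind when the job ends.
    az group delete --name "$rg" --yes --no-wait
    az group wait --name "$rg" --deleted --interval 10 --timeout 1200
}
```

Waiting explicitly with `--created` keeps the create call itself unchanged and only adds a synchronization point, which is the least invasive way to close the race described above.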
sprt added a commit to sprt/kata-containers that referenced this issue on Feb 1, 2024:
This addresses an internal AKS issue that intermittently prevents clusters from getting created. The fix has been rolled out to eastus but not yet eastus2, so we unblock the CI by switching. No downsides in general. This supersedes kata-containers#8990. Fixes: kata-containers#8989 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
wainersm added a commit to wainersm/kata-containers that referenced this issue on Feb 2, 2024:
delete_cluster() has tried to delete the az resource group regardless of whether it exists. In some cases the result of that operation is ignored (i.e., a failure because the resource group is not found), but the log messages get a little dirty. Let's delete the RG only if it exists then. Fixes kata-containers#8989 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
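A minimal sketch of the existence check this commit describes, assuming bash and the Azure CLI. `delete_rg_if_exists` is a hypothetical helper, not the real body of delete_cluster(). `az group exists` prints `true` or `false`, so the script can compare its output before issuing the delete.

```shell
# Hypothetical helper, not the actual delete_cluster() implementation.
delete_rg_if_exists() {
    local rg="$1"
    # `az group exists` prints "true" or "false".
    if [ "$(az group exists --name "$rg")" = "true" ]; then
        az group delete --name "$rg" --yes
    else
        # Skip the delete so the log has no "resource group not found" error.
        echo "Resource group $rg not found; nothing to delete"
    fi
}
```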
This was referenced Feb 2, 2024
wainersm added a commit to wainersm/kata-containers that referenced this issue on Feb 2, 2024:
delete_cluster() has tried to delete the az resource group regardless of whether it exists. In some cases the result of that operation is ignored (i.e., a failure because the resource group is not found), but the log messages get a little dirty. Let's delete the RG only if it exists then. Fixes kata-containers#8989 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
c3d pushed a commit to c3d/kata-containers that referenced this issue on Feb 23, 2024:
This addresses an internal AKS issue that intermittently prevents clusters from getting created. The fix has been rolled out to eastus but not yet eastus2, so we unblock the CI by switching. No downsides in general. This supersedes kata-containers#8990. Fixes: kata-containers#8989 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
c3d pushed a commit to c3d/kata-containers that referenced this issue on Feb 23, 2024:
delete_cluster() has tried to delete the az resource group regardless of whether it exists. In some cases the result of that operation is ignored (i.e., a failure because the resource group is not found), but the log messages get a little dirty. Let's delete the RG only if it exists then. Fixes kata-containers#8989 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Recently I've noticed some CI jobs that rely on AKS provisioning failing because the Azure resource group (which is created on demand before the cluster is created) is not found.
For example, in https://github.com/kata-containers/kata-containers/actions/runs/7724334694/job/21074189970?pr=8839 and https://github.com/kata-containers/kata-containers/actions/runs/7722583170/job/21076489139?pr=8974 you will see an error similar to:
At https://github.com/kata-containers/kata-containers/blob/main/tests/gha-run-k8s-common.sh#L68 the script requests the creation of the aforementioned resource group, but it doesn't wait for the resource to actually exist. That might be the problem.