Kyverno stuck (#230)
* Add ops recipe for Kyverno being stuck in upgrade pending

pipo02mix committed Feb 26, 2024
1 parent 1f16782 commit c0d104a
Showing 3 changed files with 25 additions and 7 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- New recipe for Kyverno stuck in upgrade pending.

### Changed

- Update docsy to v0.9.0
@@ -0,0 +1,14 @@
---
title: "Kyverno Stuck In Pending Upgrade"
owner:
- https://github.com/orgs/giantswarm/teams/team-shield
confidentiality: public
---

During some cluster upgrades (for example, AWS v18 to v19), the Kyverno migration logic has taken longer than the default `app-operator` installation timeout. As a result, Kyverno can get stuck in the Helm `pending-upgrade` state and require manual intervention.
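To confirm the symptom before acting, one option is to list only the releases that Helm reports as pending in the cluster namespace. This is a minimal sketch, assuming the releases live in the namespace named after the cluster ID (as in the rollback command); `list_pending_releases` is a hypothetical helper, not existing tooling.

```shell
# Hypothetical helper: print the names of Helm releases in a namespace whose
# latest revision is pending (pending-install, pending-upgrade or pending-rollback).
list_pending_releases() {
  local namespace="$1"
  # --pending limits the listing to pending releases; -q prints only release names.
  helm list -n "$namespace" --pending -q
}

# Usage: CLUSTER_ID=XXXXX; list_pending_releases "$CLUSTER_ID"
```

If Kyverno is affected, the Kyverno-related release would be expected to appear in this list.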

To force a resolution, the recommended approach is to roll back to the previous revision. This causes `app-operator` to re-reconcile the App and refresh the stuck Helm charts.

```
CLUSTER_ID=XXXXX  # replace with the ID of the affected cluster
helm rollback -n "$CLUSTER_ID" "$CLUSTER_ID-security-bundle" "$(helm ls -n "$CLUSTER_ID" -f "$CLUSTER_ID-security-bundle" -o yaml | yq '.[].revision')" --force
```
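After the rollback, it is worth confirming that the release has left `pending-upgrade`. A minimal sketch using `helm history`; the helper name `show_security_bundle_history` is hypothetical, and the release name follows the `<cluster-id>-security-bundle` pattern from the rollback command.

```shell
# Hypothetical helper: show the five most recent revisions of the security
# bundle release, including the status of each revision.
show_security_bundle_history() {
  local cluster_id="$1"
  # Release naming follows the rollback command: <cluster-id>-security-bundle.
  helm history -n "$cluster_id" "${cluster_id}-security-bundle" --max 5
}

# Usage: show_security_bundle_history "$CLUSTER_ID"
```

Once the rollback has succeeded, the newest revision should no longer be listed as `pending-upgrade`; `app-operator` may then create further revisions as it re-reconciles the App.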
@@ -5,7 +5,7 @@ owner:
confidentiality: public
---

We offer GitOps as an interface for our customers. Here we collect tips on how to troubleshoot problems that can occur.

# Table of Contents
1. [Identify which kustomization owns a resource](#identify-which-kustomization-owns-a-resource)
@@ -23,14 +23,14 @@ We are offering GitOps as interface for our customers, here we collect tips on h
kustomize.toolkit.fluxcd.io/name: gorilla-clusters-rfjh2
kustomize.toolkit.fluxcd.io/namespace: default
```

From the Kustomization, one can identify the source Git repository by looking at its `spec.sourceRef` field.

2. Use the `flux` CLI. Its `trace` subcommand shows how an object is delivered by Flux, including the managing Kustomization and its source:

```
» flux trace app/alfred-app -n alfred-ns
Object: App/alfred-app
Namespace: rfjh2
Status: Managed by Flux
@@ -43,7 +43,7 @@ We are offering GitOps as interface for our customers, here we collect tips on h
Namespace: default
...
```
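Both approaches above can also be reproduced with plain `kubectl`: read the Kustomization labels from the resource, then read `spec.sourceRef` from that Kustomization. A minimal sketch; the helper names `kustomization_of` and `kustomization_source` are hypothetical, and the example arguments are taken from the snippets above.

```shell
# Hypothetical helper: print the name and namespace of the Kustomization that
# manages a resource, read from its kustomize.toolkit.fluxcd.io labels.
kustomization_of() {
  local resource="$1" namespace="$2"
  # Dots inside label keys must be escaped in kubectl JSONPath.
  kubectl get "$resource" -n "$namespace" \
    -o jsonpath='{.metadata.labels.kustomize\.toolkit\.fluxcd\.io/name}{" "}{.metadata.labels.kustomize\.toolkit\.fluxcd\.io/namespace}{"\n"}'
}

# Hypothetical helper: print the spec.sourceRef of a Flux Kustomization.
kustomization_source() {
  local name="$1" namespace="$2"
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io "$name" -n "$namespace" \
    -o jsonpath='{.spec.sourceRef}{"\n"}'
}

# Usage:
#   kustomization_of app/alfred-app alfred-ns
#   kustomization_source gorilla-clusters-rfjh2 default
```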

__Note__: If the resource has no such labels (or `flux trace` returns `object not managed by Flux`), the object was not produced by Helm or Kustomize, but it could still be owned by a higher-level resource. For example, a *pod* may lack the labels while its parent *deployment* has them.
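To find that higher-level resource, one can follow `metadata.ownerReferences` upwards. Note that a pod created by a deployment is owned by a ReplicaSet, which is in turn owned by the deployment, so the lookup usually has to be repeated. A minimal sketch; `owner_of` and `owner_chain` are hypothetical helpers that only follow the first owner reference.

```shell
# Hypothetical helper: print Kind/name of the first ownerReference of a resource.
owner_of() {
  local resource="$1" namespace="$2"
  kubectl get "$resource" -n "$namespace" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'
}

# Hypothetical helper: print every owner up the chain, one per line.
owner_chain() {
  local resource="$1" namespace="$2" owner
  # Stop when kubectl fails or the resource has no owner (output is "/" or empty).
  while owner="$(owner_of "$resource" "$namespace")" \
      && [ -n "$owner" ] && [ "$owner" != "/" ]; do
    printf '%s\n' "$owner"
    resource="$owner"
  done
}

# Usage: owner_chain pod/<pod-name> <namespace>
```

The last resource printed is the one to check for Flux labels or to pass to `flux trace`.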

## Download the Git Repository source
@@ -70,8 +70,8 @@ Remember to notify the customer of this change.

## Customer Communication

After stopping reconciliation, please notify the customer of the change via the Slack support channel, so that they can review it and make the necessary changes on the following business day.

If an issue cannot be fixed by stopping reconciliation and fixing it manually, a silence may be required. In this case, please notify the customer via the Slack support channel that a) we are being alerted about the situation and cannot help because the resource is customer-owned and we have no access, and b) we will silence the alert until the next business day.

In urgent situations, or when pausing reconciliation does not fix the issue and the customer needs to be notified before the next business day, please follow the customer-specific escalation matrix found in the intranet. Following it notifies the customer of the situation, that Giant Swarm has no way to fix the problem, and that Giant Swarm will therefore silence the alert. `urgent@giantswarm.io` remains available for additional help within the Giant Swarm scope, but can only be useful after the customer has applied their fix.
