115 changes: 112 additions & 3 deletions docs/upgrade/v1-1-2-to-v1-2-0.md
@@ -298,7 +298,7 @@ If you notice the upgrade is stuck in the **Upgrading System Service** state for
1. Check if the `prometheus-rancher-monitoring-prometheus-0` pod is stuck with the status `Terminating`.

```
$ kubectl -n cattle-monitoring-system get pods
NAME READY STATUS RESTARTS AGE
prometheus-rancher-monitoring-prometheus-0 0/3 Terminating 0 19d
```
@@ -399,15 +399,16 @@ If an upgrade is stuck in an `Upgrading System Service` state for an extended period

---

### 8. The `registry.suse.com/harvester-beta/vmdp:latest` image is not available in an air-gapped environment

As of v1.1.0, Harvester no longer packages the `registry.suse.com/harvester-beta/vmdp:latest` image in the ISO file. Windows VMs created before v1.1.0 use this image as a container disk. However, kubelet may remove old images to free up disk space, and once the image is removed, Windows VMs in an air-gapped environment can't pull it again. You can fix this issue by changing the image to `registry.suse.com/suse/vmdp/vmdp:2.5.4.2` and restarting the Windows VMs.
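
A minimal sketch of the fix, assuming a Windows VM named `windows-vm` in the `default` namespace (both names are placeholders) and that the `virtctl` client is available:

```
# Open the VirtualMachine definition and, under spec.template.spec.volumes,
# change the containerDisk image to:
#   image: registry.suse.com/suse/vmdp/vmdp:2.5.4.2
kubectl -n default edit vm windows-vm

# Restart the VM so it boots with the new container disk
virtctl -n default restart windows-vm
```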

- Related issue:
- [[BUG] VMDP Image wrong after upgrade to Harvester 1.2.0](https://github.com/harvester/harvester/issues/4534)

---

### 9. An upgrade is stuck in the Post-draining state

The node might be stuck in the OS upgrade process if you encounter the **Post-draining** state, as shown below.

@@ -483,3 +484,111 @@ After performing the steps above, you should pass post-draining with the next reboot.
- [A potential bug in NewElementalPartitionsFromList which caused upgrade error code 33](https://github.com/rancher/elemental-toolkit/issues/1827)
- Workaround:
- https://github.com/harvester/harvester/issues/4526#issuecomment-1732853216

---

### 10. An upgrade is stuck in the Upgrading System Service state due to the `customer provided SSL certificate without IP SAN` error in `fleet-agent`

If an upgrade is stuck in the **Upgrading System Service** state for an extended period, follow these steps to investigate the issue:

1. Find the pods related to the upgrade:

```
kubectl get pods -A | grep upgrade
```

Example output:

```
# kubectl get pods -A | grep upgrade
cattle-system system-upgrade-controller-5685d568ff-tkvxb 1/1 Running 0 85m
harvester-system hvst-upgrade-vq4hl-apply-manifests-65vv8 1/1 Running 0 87m // waiting for managedchart to be ready
..
```

2. The pod `hvst-upgrade-vq4hl-apply-manifests-65vv8` repeatedly logs the following:

```
Current version: 102.0.0+up40.1.2, Current state: WaitApplied, Current generation: 23
Sleep for 5 seconds to retry
```
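
You can follow this log yourself. The pod name below is taken from the example output in step 1; substitute the `hvst-upgrade-*-apply-manifests-*` pod name from your own cluster:

```
kubectl -n harvester-system logs -f hvst-upgrade-vq4hl-apply-manifests-65vv8
```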

3. Check the status of all bundles. Note that a couple of bundles are `OutOfSync`:

```
# kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
...
fleet-local mcc-local-managed-system-upgrade-controller 1/1
fleet-local mcc-rancher-logging 0/1 OutOfSync(1) [Cluster fleet-local/local]
fleet-local mcc-rancher-logging-crd 0/1 OutOfSync(1) [Cluster fleet-local/local]
fleet-local mcc-rancher-monitoring 0/1 OutOfSync(1) [Cluster fleet-local/local]
fleet-local mcc-rancher-monitoring-crd 0/1 WaitApplied(1) [Cluster fleet-local/local]
```

4. The `fleet-agent-*` pod has the following error log:

```
time="2023-09-19T12:18:10Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-local-system/fleet-agent-bootstrap: Post \"https://192.168.122.199/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.122.199 because it doesn't contain any IP SANs"
```
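
To pull this log yourself, query the agent by label. This assumes the local cluster's default `cattle-fleet-local-system` namespace and the `app=fleet-agent` label; adjust both if your deployment differs:

```
kubectl -n cattle-fleet-local-system logs -l app=fleet-agent --tail=50
```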

5. Check the `ssl-certificates` setting in Harvester:

From the command line:

```
# kubectl get settings.harvesterhci.io ssl-certificates
NAME VALUE
ssl-certificates {"publicCertificate":"-----BEGIN CERTIFICATE-----\nMIIFNDCCAxygAwIBAgIUS7DoHthR/IR30+H/P0pv6HlfOZUwDQYJKoZIhvcNAQEL\nBQAwFjEUMBIGA1UEAwwLZXhhbXBsZS5j...."}
```

From the Harvester Web UI:

![](/img/v1.2/upgrade/known_issues/4519-harvester-settings-ssl-certificates.png)

6. Check the `server-url` setting. It is set to the VIP:

```
# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME VALUE
server-url https://192.168.122.199
```

7. The root cause:

The user set a self-signed certificate that covers only an FQDN in the `ssl-certificates` setting, but `server-url` points to the VIP. Because the certificate contains no IP SAN for the VIP, the `fleet-agent` pod fails to register.

For example, the certificate was created for `example.com` and `*.example.com`:

```
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout example.key -out example.crt -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:example.com,DNS:*.example.com"
```

This produces `example.crt` and `example.key`; the certificate contains DNS SANs only.
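
To confirm the certificate is the culprit, list its SANs. With a certificate generated as above, only `DNS:` entries appear and there is no `IP Address:` entry (the `-ext` option requires OpenSSL 1.1.1 or later):

```
# Print only the subjectAltName extension of the certificate
openssl x509 -in example.crt -noout -ext subjectAltName
```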

8. Apply the workaround:

Update the `server-url` setting to an FQDN covered by the certificate, in this example `https://harv31.example.com`:

```
# kubectl edit settings.management.cattle.io -n cattle-system server-url
setting.management.cattle.io/server-url edited
...

# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME VALUE
server-url https://harv31.example.com
```
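
If you prefer a non-interactive change over `kubectl edit`, a merge patch along these lines should also work. This is a sketch; it assumes the setting object stores its data in a top-level `value` field, which you can verify with `kubectl get settings.management.cattle.io server-url -o yaml`:

```
kubectl -n cattle-system patch settings.management.cattle.io server-url \
  --type merge -p '{"value":"https://harv31.example.com"}'
```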

After the workaround is applied, Rancher automatically replaces the `fleet-agent` pod, the agent registers successfully, and the upgrade continues.

- Related issue:
- [[BUG] Upgrade to Harvester 1.2.0 fails in fleet-agent due to customer provided SSL certificate without IP SAN](https://github.com/harvester/harvester/issues/4519)
- Workaround:
- https://github.com/harvester/harvester/issues/4519#issuecomment-1727132383

---
116 changes: 112 additions & 4 deletions versioned_docs/version-v1.2/upgrade/v1-1-2-to-v1-2-0.md
@@ -298,7 +298,7 @@ If you notice the upgrade is stuck in the **Upgrading System Service** state for
1. Check if the `prometheus-rancher-monitoring-prometheus-0` pod is stuck with the status `Terminating`.

```
$ kubectl -n cattle-monitoring-system get pods
NAME READY STATUS RESTARTS AGE
prometheus-rancher-monitoring-prometheus-0 0/3 Terminating 0 19d
```
@@ -330,7 +330,7 @@ If you notice the upgrade is stuck in the **Upgrading System Service** state for

---

### 7. An upgrade is stuck in the `Upgrading System Service` state

If an upgrade is stuck in the `Upgrading System Service` state for an extended period, some system services' certificates may have expired. To investigate and resolve this issue, follow these steps:

@@ -399,15 +399,16 @@ If an upgrade is stuck in an `Upgrading System Service` state for an extended period

---

### 8. The `registry.suse.com/harvester-beta/vmdp:latest` image is not available in an air-gapped environment

As of v1.1.0, Harvester no longer packages the `registry.suse.com/harvester-beta/vmdp:latest` image in the ISO file. Windows VMs created before v1.1.0 use this image as a container disk. However, kubelet may remove old images to free up disk space, and once the image is removed, Windows VMs in an air-gapped environment can't pull it again. You can fix this issue by changing the image to `registry.suse.com/suse/vmdp/vmdp:2.5.4.2` and restarting the Windows VMs.
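
A minimal sketch of the fix, assuming a Windows VM named `windows-vm` in the `default` namespace (both names are placeholders) and that the `virtctl` client is available:

```
# Open the VirtualMachine definition and, under spec.template.spec.volumes,
# change the containerDisk image to:
#   image: registry.suse.com/suse/vmdp/vmdp:2.5.4.2
kubectl -n default edit vm windows-vm

# Restart the VM so it boots with the new container disk
virtctl -n default restart windows-vm
```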

- Related issue:
- [[BUG] VMDP Image wrong after upgrade to Harvester 1.2.0](https://github.com/harvester/harvester/issues/4534)

---

### 9. An upgrade is stuck in the Post-draining state

The node might be stuck in the OS upgrade process if you encounter the **Post-draining** state, as shown below.

@@ -484,3 +485,110 @@ After performing the steps above, you should pass post-draining with the next reboot.
- Workaround:
- https://github.com/harvester/harvester/issues/4526#issuecomment-1732853216

---

### 10. An upgrade is stuck in the Upgrading System Service state due to the `customer provided SSL certificate without IP SAN` error in `fleet-agent`

If an upgrade is stuck in the **Upgrading System Service** state for an extended period, follow these steps to investigate the issue:

1. Find the pods related to the upgrade:

```
kubectl get pods -A | grep upgrade
```

Example output:

```
# kubectl get pods -A | grep upgrade
cattle-system system-upgrade-controller-5685d568ff-tkvxb 1/1 Running 0 85m
harvester-system hvst-upgrade-vq4hl-apply-manifests-65vv8 1/1 Running 0 87m // waiting for managedchart to be ready
..
```

2. The pod `hvst-upgrade-vq4hl-apply-manifests-65vv8` repeatedly logs the following:

```
Current version: 102.0.0+up40.1.2, Current state: WaitApplied, Current generation: 23
Sleep for 5 seconds to retry
```
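
You can follow this log yourself. The pod name below is taken from the example output in step 1; substitute the `hvst-upgrade-*-apply-manifests-*` pod name from your own cluster:

```
kubectl -n harvester-system logs -f hvst-upgrade-vq4hl-apply-manifests-65vv8
```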

3. Check the status of all bundles. Note that a couple of bundles are `OutOfSync`:

```
# kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
...
fleet-local mcc-local-managed-system-upgrade-controller 1/1
fleet-local mcc-rancher-logging 0/1 OutOfSync(1) [Cluster fleet-local/local]
fleet-local mcc-rancher-logging-crd 0/1 OutOfSync(1) [Cluster fleet-local/local]
fleet-local mcc-rancher-monitoring 0/1 OutOfSync(1) [Cluster fleet-local/local]
fleet-local mcc-rancher-monitoring-crd 0/1 WaitApplied(1) [Cluster fleet-local/local]
```

4. The `fleet-agent-*` pod has the following error log:

```
time="2023-09-19T12:18:10Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-local-system/fleet-agent-bootstrap: Post \"https://192.168.122.199/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.122.199 because it doesn't contain any IP SANs"
```
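
To pull this log yourself, query the agent by label. This assumes the local cluster's default `cattle-fleet-local-system` namespace and the `app=fleet-agent` label; adjust both if your deployment differs:

```
kubectl -n cattle-fleet-local-system logs -l app=fleet-agent --tail=50
```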

5. Check the `ssl-certificates` setting in Harvester:

From the command line:

```
# kubectl get settings.harvesterhci.io ssl-certificates
NAME VALUE
ssl-certificates {"publicCertificate":"-----BEGIN CERTIFICATE-----\nMIIFNDCCAxygAwIBAgIUS7DoHthR/IR30+H/P0pv6HlfOZUwDQYJKoZIhvcNAQEL\nBQAwFjEUMBIGA1UEAwwLZXhhbXBsZS5j...."}
```

From the Harvester Web UI:

![](/img/v1.2/upgrade/known_issues/4519-harvester-settings-ssl-certificates.png)

6. Check the `server-url` setting. It is set to the VIP:

```
# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME VALUE
server-url https://192.168.122.199
```

7. The root cause:

The user set a self-signed certificate that covers only an FQDN in the `ssl-certificates` setting, but `server-url` points to the VIP. Because the certificate contains no IP SAN for the VIP, the `fleet-agent` pod fails to register.

For example, the certificate was created for `example.com` and `*.example.com`:

```
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout example.key -out example.crt -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:example.com,DNS:*.example.com"
```

This produces `example.crt` and `example.key`; the certificate contains DNS SANs only.
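
To confirm the certificate is the culprit, list its SANs. With a certificate generated as above, only `DNS:` entries appear and there is no `IP Address:` entry (the `-ext` option requires OpenSSL 1.1.1 or later):

```
# Print only the subjectAltName extension of the certificate
openssl x509 -in example.crt -noout -ext subjectAltName
```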

8. Apply the workaround:

Update the `server-url` setting to an FQDN covered by the certificate, in this example `https://harv31.example.com`:

```
# kubectl edit settings.management.cattle.io -n cattle-system server-url
setting.management.cattle.io/server-url edited
...

# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME VALUE
server-url https://harv31.example.com
```
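
If you prefer a non-interactive change over `kubectl edit`, a merge patch along these lines should also work. This is a sketch; it assumes the setting object stores its data in a top-level `value` field, which you can verify with `kubectl get settings.management.cattle.io server-url -o yaml`:

```
kubectl -n cattle-system patch settings.management.cattle.io server-url \
  --type merge -p '{"value":"https://harv31.example.com"}'
```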

After the workaround is applied, Rancher automatically replaces the `fleet-agent` pod, the agent registers successfully, and the upgrade continues.

- Related issue:
- [[BUG] Upgrade to Harvester 1.2.0 fails in fleet-agent due to customer provided SSL certificate without IP SAN](https://github.com/harvester/harvester/issues/4519)
- Workaround:
- https://github.com/harvester/harvester/issues/4519#issuecomment-1727132383

---