
[BUG] Harvester pod crashes after upgrading from v0.3.0 to v1.0.0-rc1 (contain vm backup before upgrade) #1644

Closed
Tracked by #1665
TachunLin opened this issue Dec 8, 2021 · 3 comments
Labels: area/multi-tenancy, area/storage, blocker, kind/bug, severity/1
Milestone: v1.0.0

@TachunLin

Describe the bug

Prepare a 4-node Harvester v0.3.0 cluster with a VM backup, then manually upgrade to v1.0.0-rc1.
After the upgrade, Harvester cannot be accessed and the Harvester pod crashes.

To Reproduce
Steps to reproduce the behavior:

  1. Download the Harvester v0.3.0 ISO and verify its checksum
  2. Download the Harvester v1.0.0 ISO and verify its checksum
  3. Install a 4-node Harvester cluster from the ISO
  4. Create several OS images from URLs
  5. Create an SSH key
  6. Enable the VLAN network on harvester-mgmt
  7. Create virtual network vlan1 with ID 1
  8. Create 2 virtual machines
  • ubuntu-vm: 2 cores, 4GB memory, 30GB disk
  • centos-vm: 2 cores, 4GB memory, 30GB disk


  9. Set up the backup target
  10. Take a backup of the ubuntu VM

Upgrade process
Follow the manual upgrade steps to upgrade from v0.3.0 to v1.0.0-rc1
https://github.com/harvester/docs/pull/67/files

Expected behavior

Harvester can be manually upgraded from v0.3.0 to v1.0.0-rc1 with an existing VM backup.
Harvester pods keep running without crashing.

Support bundle

bundle.zip

Environment:

  • Harvester ISO version before upgrade: v0.3.0
  • Harvester ISO version after upgrade: v1.0.0-rc1
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): 4-node Harvester cluster on local KVM machines

Harvester network information

  • VIP: 192.168.122.194
  • node1: 192.168.122.36
  • node2: 192.168.122.85
  • node3: 192.168.122.186
  • node4: 192.168.122.97


Additional context

The Harvester pod panics with a nil pointer dereference in the VM backup controller:

E1207 11:10:55.797690       8 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 755 [running]:
github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x22a7b80, 0x3e2af10)
	/go/src/github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x22a7b80, 0x3e2af10)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/harvester/harvester/pkg/controller/master/backup.IsBackupTargetSame(...)
	/go/src/github.com/harvester/harvester/pkg/controller/master/backup/util.go:31
github.com/harvester/harvester/pkg/controller/master/backup.(*Handler).uploadVMBackupMetadata(0xc00019ec00, 0xc008580000, 0xc00550d100, 0x0, 0x0)
	/go/src/github.com/harvester/harvester/pkg/controller/master/backup/backup.go:574 +0x6c
github.com/harvester/harvester/pkg/controller/master/backup.(*Handler).OnBackupChange(0xc00019ec00, 0xc0010f1050, 0x24, 0xc0029ea000, 0xc00046c180, 0x60, 0x58)
	/go/src/github.com/harvester/harvester/pkg/controller/master/backup/backup.go:130 +0x405
github.com/harvester/harvester/pkg/generated/controllers/harvesterhci.io/v1beta1.FromVirtualMachineBackupHandlerToHandler.func1(0xc0010f1050, 0x24, 0x29c3d40, 0xc0029ea000, 0x756ea12c64a09e, 0x73307081352ddbde, 0x403bcb, 0xc00578af20)
	/go/src/github.com/harvester/harvester/pkg/generated/controllers/harvesterhci.io/v1beta1/virtualmachinebackup.go:102 +0x6b
github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc00578af00, 0xc0010f1050, 0x24, 0x29c3d40, 0xc0029ea000, 0xc00578af20, 0x756ea188bc575c, 0x40a63f, 0xc00003a000)
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/sharedcontroller.go:29 +0x4e
github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc0007120c0, 0xc0010f1050, 0x24, 0x29c3d40, 0xc0029ea000, 0xc00292b801, 0x0)
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/sharedhandler.go:69 +0x14c
github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0008ed8c0, 0xc0010f1050, 0x24, 0xc00292b948, 0x43b100)
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/controller.go:215 +0xd1
github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0008ed8c0, 0x21813a0, 0xc00578af20, 0x0, 0x0)
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/controller.go:197 +0xe7
github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0008ed8c0, 0x203000)
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/controller.go:174 +0x54
github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/controller.go:163
github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00230e130)
	/go/src/github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00230e130, 0x299e500, 0xc0006b5fb0, 0x1, 0xc0001151a0)
	/go/src/github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00230e130, 0x3b9aca00, 0x0, 0xc001820201, 0xc0001151a0)
	/go/src/github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc00230e130, 0x3b9aca00, 0xc0001151a0)
	/go/src/github.com/harvester/harvester/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller.(*controller).run
	/go/src/github.com/harvester/harvester/vendor/github.com/rancher/lasso/pkg/controller/controller.go:134 +0x33b
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1e7920c]
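
For context, here is a minimal Go sketch of the failure mode suggested by the trace. The types and function bodies below are hypothetical simplifications, not the actual Harvester code (the real IsBackupTargetSame lives in pkg/controller/master/backup/util.go); the point is that a VMBackup created on v0.3.0 has no backup target recorded in its status, so dereferencing that nil pointer during the comparison panics exactly as shown above.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real Harvester types
// (the actual ones live under pkg/apis and pkg/controller/master/backup).
type BackupTarget struct {
	Endpoint   string
	BucketName string
}

type VMBackupStatus struct {
	// VMBackup objects created on v0.3.0 predate this field, so after the
	// CRD change they are decoded with a nil BackupTarget.
	BackupTarget *BackupTarget
}

// isBackupTargetSame mirrors the failing comparison: it dereferences the
// pointer without a nil check and panics for pre-upgrade VMBackup objects.
func isBackupTargetSame(status *VMBackupStatus, target *BackupTarget) bool {
	return status.BackupTarget.Endpoint == target.Endpoint &&
		status.BackupTarget.BucketName == target.BucketName
}

func main() {
	current := &BackupTarget{Endpoint: "s3.example.com", BucketName: "harvester-backups"}
	old := &VMBackupStatus{} // VMBackup created on v0.3.0: BackupTarget is nil

	// Panics with "invalid memory address or nil pointer dereference",
	// matching the stack trace above.
	fmt.Println(isBackupTargetSame(old, current))
}
```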
@TachunLin added the kind/bug, area/storage, severity/1, and area/multi-tenancy labels on Dec 8, 2021
@guangbochen added the blocker label on Dec 8, 2021
@TachunLin added this to the v1.0.0 milestone on Dec 13, 2021
@guangbochen (Contributor)

Should be addressed by #1642.

@FrankYang0529 (Member)

I tried upgrading Harvester from v0.3.0 to master-head and the Harvester pods don't crash. However, the VMBackup CRD isn't updated, so we get error messages like these:

time="2021-12-14T10:00:19Z" level=error msg="error syncing 'ssl-certificates': handler harvester-setting-controller: helmchartconfigs.helm.cattle.io \"rke2-ingress-nginx\" not found, requeuing"
time="2021-12-14T10:02:18Z" level=error msg="error syncing 'default/s3-1': handler harvester-vm-backup-controller: no backup target in vmbackup.status, requeuing"

@TachunLin (Author)

Verified fixed after upgrading from v0.3.0 to master-935a1670-head (12/17, based on v1.0.0-rc1).
Closing this issue.

Result

After a manual upgrade from v0.3.0 to v1.0.0-rc1 (master-935a1670-head):

1. Harvester pods did not crash -> PASS

harvester-node01-upgrade100rc1:/home/rancher # kubectl get pods -n harvester-system
NAME                                                     READY   STATUS      RESTARTS   AGE
harvester-d544ddb6f-52mdk                                1/1     Running     0          58m
harvester-d544ddb6f-mg4dc                                1/1     Running     0          58m
harvester-d544ddb6f-npd7c                                1/1     Running     0          58m
harvester-load-balancer-59bf75f489-57nnp                 1/1     Running     0          58m
harvester-network-controller-68m6r                       1/1     Running     0          57m
harvester-network-controller-cdrjp                       1/1     Running     0          57m
harvester-network-controller-jfg96                       1/1     Running     0          58m
harvester-network-controller-manager-c57f8cbcb-67mcq     1/1     Running     0          58m
harvester-network-controller-manager-c57f8cbcb-9mqbx     1/1     Running     0          58m
harvester-network-controller-v2k2n                       1/1     Running     0          57m
harvester-node-disk-manager-29xrr                        1/1     Running     0          57m
harvester-node-disk-manager-v27hz                        1/1     Running     0          57m
harvester-node-disk-manager-wwpwf                        1/1     Running     0          58m
harvester-node-disk-manager-zwbxv                        1/1     Running     0          57m
harvester-promote-harvester-node02-upgrade100rc1-nphjz   0/1     Completed   0          9h
harvester-promote-harvester-node03-upgrade100rc1-72lj7   0/1     Completed   0          9h
harvester-webhook-67744f845f-pmrlg                       1/1     Running     0          57m
harvester-webhook-67744f845f-r5c44                       1/1     Running     0          58m
harvester-webhook-67744f845f-tqjkl                       1/1     Running     0          57m
kube-vip-2l2qp                                           1/1     Running     1          56m
kube-vip-cloud-provider-0                                1/1     Running     34         10h
kube-vip-cvklf                                           1/1     Running     0          56m
kube-vip-q99lt                                           1/1     Running     0          56m
virt-api-86455cdb7d-2hb4x                                1/1     Running     3          10h
virt-api-86455cdb7d-q8fpc                                1/1     Running     3          10h
virt-controller-5f649999dd-q5bqs                         1/1     Running     18         10h
virt-controller-5f649999dd-sqh9g                         1/1     Running     20         10h
virt-handler-4ncxn                                       1/1     Running     3          10h
virt-handler-cmzg2                                       1/1     Running     3          9h
virt-handler-k8pg9                                       1/1     Running     3          10h
virt-handler-x754t                                       1/1     Running     2          8h
virt-operator-56c5bdc7b8-cgwc8                           1/1     Running     28         10h

2. Check whether longhornBackupName is in each VM Backup. -> PASS

$ kubectl get backup -A
longhorn-system   backup-7933c0d09ec04d1a   snapshot-b1fbfcf4-ad45-442d-9bb6-8119e713d892   1367343104     2021-12-17T07:13:00Z   Completed   2021-12-17T07:17:51.097572079Z

$ kubectl get vmbackup ubuntu-backup -o yaml | less

volumeBackups:
  - creationTime: "2021-12-17T07:12:59Z"
    longhornBackupName: backup-7933c0d09ec04d1a
    name: ubuntu-backup-volume-ubuntu-vm-disk-0-ylodf

3. Check whether there is a .cfg file in the backup target. -> PASS
There is a default-ubuntu-backup.cfg file in the remote S3 backup bucket.

$ cat default-ubuntu-backup.cfg

(screenshot: contents of default-ubuntu-backup.cfg)

4. Check whether the VM can be restored. -> PASS
A new VM can be created by restoring the existing backup.

![image](https://user-images.githubusercontent.com/29251855/146564664-46809072-a320-44a6-8665-29aa1b8d936f.png)


Environment:

  • Harvester ISO version before upgrade: v0.3.0
  • Harvester ISO version after upgrade: v1.0.0-rc1
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): 4-node Harvester cluster on local KVM machines

Harvester node information

  • VIP: 192.168.122.71
  • node1: 192.168.122.253 (6 core, 12GB, 200GB)
  • node2: 192.168.122.10 (6 core, 12GB, 200GB)
  • node3: 192.168.122.36 (4 core, 12GB, 200GB)
  • node4: 192.168.122.97 (4 core, 10GB, 200GB)

Verify Steps

  1. Download the Harvester v0.3.0 ISO and verify its checksum
  2. Download the Harvester v1.0.0 ISO and verify its checksum
  3. Install a 4-node Harvester cluster from the ISO
  4. Create several OS images from URLs
  5. Create an SSH key
  6. Enable the VLAN network on harvester-mgmt
  7. Create virtual network vlan1 with ID 1
  8. Create 2 virtual machines
  • ubuntu-vm: 2 cores, 4GB memory, 30GB disk
  9. Set up the backup target
  10. Take a backup of the ubuntu VM

Upgrade process
Follow the manual upgrade steps to upgrade from v0.3.0 to v1.0.0-rc1
https://github.com/harvester/docs/pull/67/files

Add the following content to /usr/local/harvester-upgrade/upgrade-helpers/manifests/10-harvester.yaml
before upgrading the Harvester controller node:

---
apiVersion: management.cattle.io/v3
kind: ManagedChart
metadata:
  name: harvester-crd
  namespace: fleet-local
spec:
  chart: harvester-crd
  releaseName: harvester-crd
  version: 0.0.0-dev
  defaultNamespace: harvester-system
  repoName: harvester-charts
  # takeOwnership will force apply this chart without checking ownership in labels and annotations.
  # https://github.com/rancher/fleet/blob/ce9c0d6c0a455d61e87c0f19df79d0ee11a89eeb/pkg/helmdeployer/deployer.go#L323
  # https://github.com/rancher/helm/blob/ee91a121e0aa301fcef2bfbc7184f96edd4b50f5/pkg/action/validate.go#L71-L76
  takeOwnership: true
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  values: {}

Additional Context

Due to issue #1645, we currently can't access the Harvester dashboard via the VIP.
And due to #1666, after the upgrade we are still not able to log in with the original admin password.
