
Rancher integration test - Rancher 2.7.6 + Harvester 1.1.2 #936

Closed
albinsun opened this issue Sep 13, 2023 · 4 comments

albinsun commented Sep 13, 2023

What's the test to develop? Please describe

Rancher integration test on a setup of Rancher 2.7.6 with Harvester 1.1.2 to confirm the support status.

Describe the items of the test development (DoD, definition of done) you'd like

TCs will reference v1.2.0 release testing #900.

Test Outline

  1. Import Harvester in Rancher 2.7.6
  2. Create an RKE2 custom cluster, a Harvester node driver cluster, and a cluster using Terraform
  3. Deploy the cloud provider and CSI driver on all the clusters.
  4. Necessary checks that the cloud provider and CSI driver work on each cluster (see the check sketch after this list).
  5. Scale the clusters down and up; basic operations.
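
For step 4, a minimal check sketch run with kubectl against a guest cluster's kubeconfig. The exact chart workload names and labels vary, so the grep patterns below are assumptions, not the exact commands used in this test:

# Cloud provider and CSI driver workloads deployed into kube-system by the Harvester charts
kubectl --kubeconfig guest-cluster.yaml -n kube-system get pods | grep -E 'harvester-(cloud-provider|csi)'

# The CSI driver should install "harvester" as the default storage class
kubectl --kubeconfig guest-cluster.yaml get storageclass

# LoadBalancer services created later should receive an external IP from DHCP or the IP pool
kubectl --kubeconfig guest-cluster.yaml get svc -A | grep LoadBalancer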

Environment

  • Harvester
    • Version: v1.1.2
    • Profile: QEMU/KVM, 3 nodes (8C/16G/250G)
    • ui-source: Auto
  • Rancher
    • Version: v2.7.6
    • Profile: Docker
  • Terraform
    Terraform v1.5.3
    on linux_amd64
    + provider registry.terraform.io/rancher/rancher2 v3.0.1
    

albinsun commented Sep 13, 2023

Harvester Configuration

Based on

Prerequisites

  1. VLAN 1 network on mgmt and 1 network on other NICs
  2. 2 virtual machines with data and md5sum computed: 1 running, 1 stopped
  3. Create a new storage class apart from the default one. Use the new storage class for some basic operations.

Setup

  1. Set up a 3-node cluster ${~~~\color{green}\textsf{V}}$

    image

Network

  1. Create VLAN 1 network on mgmt NIC, "mgmt-vlan1" ${~~~\color{green}\textsf{V}}$

    image

  2. Create an untagged network on the other NIC, "nonmgmt-untagged" (see the sketch after this list) ${~~~\color{green}\textsf{V}}$
    1. Create cluster network nonmgmt
    2. Create network config nc-nonmgmt
      image
    3. Create VM network nonmgmt-untagged
      image
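
For reference, the VM networks created above are stored as Multus NetworkAttachmentDefinitions backed by the bridge CNI. A rough sketch of inspecting "mgmt-vlan1" with kubectl is below; the bridge name follows Harvester's <cluster-network>-br convention and the exact config string is an assumption (the Harvester UI generates it):

kubectl -n default get network-attachment-definitions.k8s.cni.cncf.io mgmt-vlan1 -o yaml
# spec.config is a bridge CNI configuration similar to:
#   {"cniVersion":"0.3.1","type":"bridge","bridge":"mgmt-br","promiscMode":true,"vlan":1,"ipam":{}}
# "nonmgmt-untagged" has the same shape, pointing at the nonmgmt cluster network's bridge with no vlan field.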

Storage

  1. Add a disk to node1 and node2 respectively ${~~~\color{green}\textsf{V}}$
    1. Plug a 64G ext4 non-partitioned disk into node1 and node2
    2. The system detects them and lists them as blockdevices resources
      • node0
        image
      • node1
        image
    3. Attach the blockdevice as storage and add disk tags
      • node0
        image
      • node1
        image
  2. Create a new storage class "eext4" which selects the new disk (see the sketch after this list) ${~~~\color{green}\textsf{V}}$

    image

  3. Create a new 10G volume "eext4-pred" based on the new "eext4" storage class ${~~~\color{green}\textsf{V}}$

    image
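
The "eext4" storage class created above is backed by Longhorn; a sketch of a roughly equivalent manifest applied with kubectl on the Harvester cluster is below. The disk tag "ext4-disk" is a placeholder for whatever tag was added in step 1.3, and the replica count is an assumption:

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: eext4
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  # Longhorn schedules replicas only onto disks carrying this tag
  diskSelector: "ext4-disk"
EOF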

VM Creation

  1. Running VM: Create "vm-running" ${~~~\color{green}\textsf{V}}$
    1. Disk: ubuntu-focal-cloudimg + Existing Volume "eext4-pred"
      image
    2. Network: "mgmt-vlan1"
      image
    3. Mount eext4-pred and edit fstab
      root@vm-running:~# cat /etc/fstab 
      LABEL=cloudimg-rootfs   /        ext4   defaults        0 1
      LABEL=UEFI      /boot/efi       vfat    umask=0077      0 1
      UUID="35222e2d-fced-4d0b-8445-ffeb3a906378"     /data   ext4    defaults        0 1
      
    4. Some test data and its checksum are computed on both the own and the attached volume (see the sketch after this list).
      • own volume
        image
      • attached volume
        image
    5. Restart from UI; checksums should still be valid
      • own volume
        image
      • attached volume
        image
  2. Stopped VM: Create "vm-stopped" ${~~~\color{green}\textsf{V}}$
    1. Disk: ubuntu-focal-cloudimg + on-demand volume "eext4-ond"
      image
    2. Network: "nonmgmt-untagged"
      image
    3. Mount eext4-ond and edit fstab
      ubuntu@vm-stopped:/data$ cat /etc/fstab 
      LABEL=cloudimg-rootfs   /        ext4   defaults        0 1
      LABEL=UEFI      /boot/efi       vfat    umask=0077      0 1
      UUID="691bc5e1-f642-427d-acb0-5da31fb20732"     /data   ext4    defaults        0 1
      
    4. Some test data and its checksum are computed on both the own and the attached volume.
      • own volume
        image
      • attached volume
        image
    5. Restart from UI; checksums should still be valid
      image
      image
      image
    6. Stop VM
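
A sketch of the mount and checksum steps above, assuming the attached "eext4" volume shows up as /dev/vdb inside the guest; the device name, mount point, and file names are assumptions:

# Format and mount the attached volume, then persist it via UUID as in the fstab above
sudo mkfs.ext4 /dev/vdb
sudo mkdir -p /data
sudo mount /dev/vdb /data
echo "UUID=$(sudo blkid -s UUID -o value /dev/vdb)  /data  ext4  defaults  0 1" | sudo tee -a /etc/fstab

# Write test data on both the root (own) volume and the attached volume and record checksums
dd if=/dev/urandom of=$HOME/test.bin bs=1M count=100
sudo dd if=/dev/urandom of=/data/test.bin bs=1M count=100
md5sum $HOME/test.bin | tee $HOME/own.md5
cd /data && sudo sh -c 'md5sum test.bin > attached.md5'

# After restarting the VM from the UI, both checksums should still match
md5sum -c $HOME/own.md5
cd /data && md5sum -c attached.md5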


albinsun commented Sep 13, 2023

Rancher Integration

Outline

Test steps

  1. Import Harvester in Rancher 2.7.6
  2. Create an RKE2 custom cluster, a Harvester node driver cluster, and a cluster using Terraform
  3. Deploy the cloud provider and CSI driver on all the clusters.
  4. Necessary checks that the cloud provider and CSI driver work on each cluster.
  5. Scale the clusters down and up; basic operations.

Import to Rancher

  1. Import Harvester to Rancher 2.7.6 ${~~~\color{green}\textsf{V}}$
    1. Rancher, go to "Virtualization Management" -> "Import Existing" -> "Create"
      image

    2. Register Harvester to Rancher
      image

    3. Imported Harvester shows Active
      image

RKE2 Harvester Node Driver Cluster (Manual)

Setup Cluster

  1. Create cloud credential ${~~~\color{green}\textsf{V}}$

    Go to "Cluster Management" -> "Cloud Credentials" -> "Create" -> "Harvester"
    image

  2. Create cluster (Takes ~20m) ${~~~\color{orange}\textsf{V}}$
    1. Go to "Cluster Management" -> "Clusters" -> "Create"

      • v1.23.17+rke2r1
        image
    2. Cluster should be created

      • Rancher
        image
      • Harvester
        image
    • ⚠️ Can only select `v1.23.17+rke2r1` to conform to Harvester `v1.1.2`
      • v1.26.8+rke2r1
        image
      • v1.25.13+rke2r1
        image
      • v1.24.17+rke2r1
        image
    • 🐞 (minor) UI breaks when clicking the Rancher tab from Harvester.
      1. Go to Virtualization Management
        image
      2. Enter Harvester
        image
      3. Clicking the Rancher tab breaks the UI (workaround: connect via the Home tab)
        image
        image

      **Workaround**
      One workaround is to enter the base Rancher URL again.
      image

  3. Create IP Pool ${~~~\color{green}\textsf{V}}$
    1. Go to "Virtualization Management" -> harvester -> "Settings" -> "vip-pools"
      image

Test harvester-cloud-provider

  1. Both App & Workload should be Active ${~~~\color{green}\textsf{V}}$

    App
    image

    Workload
    image

  2. Deploy Nginx workload ${~~~\color{green}\textsf{V}}$
    1. Create a deployment test-nginx with image nginx:latest and a pod label (manifest sketch after this list)
    2. Check deployment test-nginx is Active
      image
  3. Verify Load Balancer with IPAM "DHCP" ${~~~\color{green}\textsf{V}}$
    1. Go to "Service discovery" -> "Services" -> "Create" -> "Load Balancer"
      Set selectors to match the test-nginx pods
    2. Create lb-dhcp-80; it is Active and routes correctly
      image
      image
    3. Create lb-dhcp-http; it is Active and routes correctly
      image
      image
  4. Verify Load Balancer with IPAM "Pool" ${~~~\color{green}\textsf{V}}$
    1. Create lb-pool-80 & lb-pool-http with IPAM Pool
    2. lb-pool-80 is Active and routes correctly
      image
      image
    3. lb-pool-http is Active and routes correctly
      image
      image
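
A manifest sketch for steps 2-4, applied with kubectl against the guest cluster. The pod label mykey: myval follows the label used in the Terraform run later in this issue, and the cloudprovider.harvesterhci.io/ipam annotation ("dhcp" or "pool") is my reading of how the Harvester cloud provider picks the IPAM mode; the Rancher UI sets this when you choose the mode, so treat the annotation as an assumption:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-nginx
spec:
  replicas: 1
  selector:
    matchLabels: {mykey: myval}
  template:
    metadata:
      labels: {mykey: myval}
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: lb-dhcp-80
  annotations:
    # IPAM mode for the Harvester load balancer (assumption; the UI sets this field)
    cloudprovider.harvesterhci.io/ipam: dhcp
spec:
  type: LoadBalancer
  selector: {mykey: myval}
  ports:
    - port: 80
      targetPort: 80
EOF

# lb-pool-80 / lb-pool-http are the same Service with ipam: pool, which draws the
# external IP from the vip-pools range configured above instead of DHCP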

Test harvester-csi-driver

  1. Both App & Workload should be Active ${~~~\color{green}\textsf{V}}$

    App
    image

    Workload
    image

  2. Check that Harvester is already set as the default storage class ${~~~\color{green}\textsf{V}}$

    image

  3. Deploy nginx:latest with an on-demand PVC (see the PVC sketch after this list) ${~~~\color{green}\textsf{V}}$

    Config

    • Storage (PVC & PV)
      image
      image

    • Mount
      image

    Related resources are created

    • Deployment
      image
    • PVC
      image
    • PV
      image
  4. Verify Load Balancers ${~~~\color{green}\textsf{V}}$

    image
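
A manifest sketch for step 3, assuming the default "harvester" storage class installed by the CSI driver; the names, size, and mount path are placeholders:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: harvester
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-pvc
spec:
  replicas: 1
  selector:
    matchLabels: {app: nginx-pvc}
  template:
    metadata:
      labels: {app: nginx-pvc}
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: nginx-data
EOF

# The PVC should bind to an automatically provisioned PV backed by a Harvester volume
kubectl get pvc,pv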

Scaling

  1. Scale Pool Up (Takes ~15m) ${~~~\color{green}\textsf{V}}$
    • Go to "Cluster Management" -> "Clusters" -> cluster -> "+" sign

    • After
      image

    • Deployment & LB still work
      image
      image

  2. Scale Pool Down (Takes ~20m) ${~~~\color{orange}\textsf{V}}$
    • Go to "Cluster Management" -> "Clusters" -> cluster -> "-" sign

    • After
      image

    • Deployment & LB still work
      image
      image

    • 🐞 (minor) Legacy node record in Cluster -> Nodes page

      Machine is deleted on Harvester and Rancher Cluster Management
      image
      image

      But a legacy record remains on the cluster's Nodes page
      image

      Workaround: delete manually
      image
      image

    • 🐞 (Known) Possible scale down fail - https://github.com/rancher/rancher/issues/42582


albinsun commented Sep 14, 2023

⚠️ Deprecated, see latest test below

RKE2 Harvester Node Driver Cluster (Terraform)

Setup Cluster

  1. Create API Key ${~~~\color{green}\textsf{V}}$

    Go to Account icon (top-right corner) -> "Account and API Keys" -> "Create API Key"

  2. Setup RKE2 cluster via Terraform ${~~~\color{red}\textsf{X}}$

    • Hits 500 Internal Server Error using Kubernetes v1.23.17+rke2r1, regardless of rancher2 provider 3.0.0, 3.0.1, or 3.1.1.
      image
    • Error: Creating cluster V2: Bad response statusCode [500]. Status [500 Internal Server Error].
      Body: [code=InternalError, message=Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.provisioning.cattle.io": 
      failed to call webhook: an error on the server 
      ...
      

Note

  1. We found that rancher2 provider 3.0.1 + Kubernetes v1.26.7+rke2r1 can be set up, but has a problem when creating the LB later (stuck in Pending).
    • Can set up v1.26.7+rke2r1
      image
      image
    • Can create test-nginx workload
      image
    • But creating the LB gets stuck in Pending without an explicit event
      image
  2. However, this should not be a formal case since v1.26.7+rke2r1 is not listed in Rancher v2.7.6.
    image

albinsun commented:

RKE2 Harvester Node Driver Cluster (Terraform)

Environment

  • Harvester v1.1.2 (QEMU/KVM, 3 nodes (8C/16G/250G))
  • Rancher v2.7.6 (Docker)
  • Terraform
    $ ./terraform -version
    Terraform v1.5.3
    on linux_amd64
    + provider registry.terraform.io/rancher/rancher2 v3.1.1
    

terraform file: main.tf

Ref. https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster_v2#creating-rancher-v2-harvester-cluster-v2-with-harvester-cloud-provider
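
Since the file itself is not attached here, below is a minimal sketch of a main.tf following the linked registry example. The resource layout matches the "3 added" in the apply output below (cloud credential, machine config, cluster), but all values (token, namespaces, image, network, sizing) are placeholders, and the exact attribute set should be checked against the rancher2 provider docs rather than read as the file actually used:

cat > main.tf <<'EOF'
# Rancher API endpoint and token (placeholders)
provider "rancher2" {
  api_url   = "https://<rancher-host>"
  token_key = "<api-token>"
  insecure  = true
}

# The imported Harvester cluster, as named in Virtualization Management
data "rancher2_cluster_v2" "myharvester" {
  name = "harvester131"
}

# Cloud credential pointing at the imported Harvester cluster
resource "rancher2_cloud_credential" "harvester" {
  name = "harvester-cred"
  harvester_credential_config {
    cluster_id         = data.rancher2_cluster_v2.myharvester.cluster_v1_id
    cluster_type       = "imported"
    kubeconfig_content = data.rancher2_cluster_v2.myharvester.kube_config
  }
}

# VM template for the node pool (namespace, sizing, image, and network are placeholders)
resource "rancher2_machine_config_v2" "harvester" {
  generate_name = "rke2-harvester-terraform"
  harvester_config {
    vm_namespace = "default"
    cpu_count    = "2"
    memory_size  = "4"
    ssh_user     = "ubuntu"
    disk_info    = jsonencode({ disks = [{ imageName = "default/<ubuntu-image>", size = 40, bootOrder = 1 }] })
    network_info = jsonencode({ interfaces = [{ networkName = "default/mgmt-vlan1" }] })
  }
}

# The RKE2 guest cluster, pinned to v1.23.17+rke2r1 to conform to Harvester v1.1.2 (see Note/Issues below)
resource "rancher2_cluster_v2" "rke2-harvester-terraform" {
  name               = "rke2-harvester-terraform"
  kubernetes_version = "v1.23.17+rke2r1"
  rke_config {
    machine_pools {
      name                         = "pool1"
      cloud_credential_secret_name = rancher2_cloud_credential.harvester.id
      control_plane_role           = true
      etcd_role                    = true
      worker_role                  = true
      quantity                     = 1
      machine_config {
        kind = rancher2_machine_config_v2.harvester.kind
        name = rancher2_machine_config_v2.harvester.name
      }
    }
    machine_selector_config {
      # The registry example also wires cloud-provider-config (a Harvester cloud
      # config / kubeconfig) here; omitted in this sketch
      config = {
        cloud-provider-name = "harvester"
      }
    }
  }
}
EOF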

Provider rancher/rancher2 v3.1.1

Note: tested 2 times.

terraform init ${~~~\color{green}\textsf{V}}$

image

Setup Cluster

  1. Create API Key ${~~~\color{green}\textsf{V}}$

    Go to Account icon (top-right corner) -> "Account and API Keys" -> "Create API Key"

  2. Create cluster via Terraform (Takes ~20m) ${~~~\color{green}\textsf{V}}$
    • Rancher
      image

    • Harvester
      image

    • Terraform

      $ ./terraform init -upgrade
      ...
      $ ./terraform validate
      ...
      $ ./terraform apply -auto-approve
      data.rancher2_cluster_v2.myharvester: Reading...
      data.rancher2_cluster_v2.myharvester: Read complete after 1s [id=fleet-default/harvester131]
      ...
      rancher2_cluster_v2.rke2-harvester-terraform: Still creating... [17m1s elapsed]
      rancher2_cluster_v2.rke2-harvester-terraform: Still creating... [17m11s elapsed]
      rancher2_cluster_v2.rke2-harvester-terraform: Creation complete after 17m18s [id=fleet-default/rke2-harvester-terraform]
      
      Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
      
  3. Create IP Pool ${~~~\color{green}\textsf{V}}$

    Go to "Virtualization Management" -> harvester -> "Advanced" -> "Settings" -> "vip-pools"
    image

Test harvester-cloud-provider

  1. Both App & Workload should be Active ${~~~\color{green}\textsf{V}}$

    App
    image
    Workload
    image

  2. Deploy Nginx workload ${~~~\color{green}\textsf{V}}$

    Create a deployment test-nginx with image nginx:latest and pod label mykey:myval
    image

  3. Verify Load Balancer with IPAM "DHCP" and "Pool" ${~~~\color{green}\textsf{V}}$

    lb-dhcp-80 (IPAM "DHCP") is Active and routes correctly
    image
    The LB with IPAM "Pool" is Active and routes correctly
    image

Test harvester-csi-driver

  1. Both App & Workload should be Active ${~~~\color{green}\textsf{V}}$

    App
    image
    Workload
    image

  2. Check that Harvester is already set as the default storage class ${~~~\color{green}\textsf{V}}$

    image

  3. Deploy nginx:latest with on-demand PVC ${~~~\color{green}\textsf{V}}$

    Config

    • Storage (PVC & PV)
      image
    • Mount
      image

    Related resources are created

    • Deployment
      image
    • PVC
      image
    • PV
      image
  4. Verify Load Balancer with IPAM "DHCP" and "Pool" ${~~~\color{green}\textsf{V}}$

    image

Scaling

  1. Scale Pool Up (Takes ~15m) ${~~~\color{green}\textsf{V}}$
    • Go to "Cluster Management" -> "Clusters" -> cluster -> "+" sign
    • After
      image
      image
    • Deployment & LB still work
      image
  2. Scale Pool Down (Takes ~20m) ${~~~\color{green}\textsf{V}}$
    • Go to "Cluster Management" -> "Clusters" -> cluster -> "-" sign
    • After
      image
      image
    • Deployment & LB still work
      image

Setup in Other Versions

Provider rancher/rancher2 v3.0.1

Note: tested 2 times.

  1. terraform init ✔️

    image

  2. terraform apply ✔️

    image

  3. terraform destroy ✔️

    image

Provider rancher/rancher2 v3.0.0

Note: tested 2 times.

  1. terraform init ✔️

    image

  2. terraform apply ⚠️ (failed 1 time)
    1. Trial 1 ❌
      Stuck in configuring bootstrap node(s) rke2-harvester-terraform-pool1-67c86697b4-bf86h: waiting for probes: kube-controller-manager, kube-scheduler, kubelet
      image

    2. Trial 2 ✔️
      image

  3. terraform destroy ✔️

    image

Note/Issues

  1. ⚠️ Can only select v1.23.17+rke2r1 to conform to Harvester v1.1.2

  2. 🐞 (minor) UI breaks when clicking the Rancher tab from Harvester.
    1. Go to Virtualization Management
      image
    2. Enter Harvester
      image
    3. Clicking the Rancher tab breaks the UI (workaround: connect via the Home tab)
      image
      image

    **Workaround**
    One workaround is to enter the base Rancher URL again.
    image

  3. 🐞 [BUG] Scaling down etcd machine pool can cause multiple machines to be deleted unintentionally #42582
