Services get deleted after importing Harvester to Rancher #34716

Closed
guangbochen opened this issue Sep 14, 2021 · 8 comments
Labels
area/harvester status/awaiting-harvester Indicates an issue is waiting for Harvester team's action

@guangbochen
Contributor

guangbochen commented Sep 14, 2021

To Reproduce
Steps to reproduce the behavior:

  1. Install rancher v2.6.0 using docker run
  2. Install Harvester using the master-head ISO image.
  3. Import the Harvester cluster to Rancher
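For step 1, the Rancher docker run invocation is typically along these lines (a sketch based on the standard Rancher single-node install; the ports and image tag may differ from what was actually used):

# Sketch only: single-node Rancher v2.6.0 via docker run
docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  --privileged \
  rancher/rancher:v2.6.0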

Result:
Notice that the harvester-pre-delete job is triggered. After it completes, the KubeVirt services (except for virt-operator) are deleted.

$ kubectl  get po -n harvester-system
NAME                                                    READY   STATUS    RESTARTS   AGE
harvester-7b68989d68-8hkjv                              0/1     Pending   0          16m
harvester-7b68989d68-n5tz7                              1/1     Running   0          16m
harvester-7b68989d68-xwzvq                              0/1     Pending   0          16m
harvester-network-controller-hc8t2                      1/1     Running   0          16m
harvester-network-controller-manager-56f795b468-gnmkg   1/1     Running   0          16m
harvester-network-controller-manager-56f795b468-hgnv9   1/1     Running   0          16m
harvester-webhook-7fb4f8f7f9-djkgm                      0/1     Pending   0          16m
harvester-webhook-7fb4f8f7f9-hrcf5                      0/1     Pending   0          16m
harvester-webhook-7fb4f8f7f9-r5vth                      1/1     Running   0          16m
virt-operator-669cfcc9d8-4gmkz                          1/1     Running   0          16m
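When this happens, the pre-delete hook and the remaining Helm release can be inspected with commands along these lines (a sketch; it assumes the hook job runs in harvester-system, the release's default namespace):

# Confirm the pre-delete hook job ran and inspect its output
kubectl -n harvester-system get jobs
kubectl -n harvester-system logs job/harvester-pre-delete
# Check whether the Helm release for the managed chart is still present
helm ls -n harvester-system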

Additional context
Current status of the managedchart:

$ kubectl get managedchart -n fleet-local harvester -o yaml
apiVersion: management.cattle.io/v3
kind: ManagedChart
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"management.cattle.io/v3","kind":"ManagedChart","metadata":{"annotations":{},"name":"harvester","namespace":"fleet-local"},"spec":{"chart":"harvester","defaultNamespace":"harvester-system","repoName":"harvester-charts","targets":[{"clusterName":"local","clusterSelector":{"matchExpressions":[{"key":"provisioning.cattle.io/unmanaged-system-agent","operator":"DoesNotExist"}]}}],"values":{"containers":{"apiserver":{"authMode":"rancher","hciMode":true,"image":{"imagePullPolicy":"IfNotPresent","repository":"rancher/harvester","tag":"master-head"}}},"harvester-network-controller":{"enabled":true,"image":{"pullPolicy":"IfNotPresent"}},"longhorn":{"defaultSettings":{"taintToleration":"kubevirt.io/drain:NoSchedule"},"enabled":true},"multus":{"enabled":false},"rancherEmbedded":true,"webhook":{"image":{"imagePullPolicy":"IfNotPresent","repository":"rancher/harvester-webhook","tag":"master-head"}}},"version":"0.0.0-dev"}}
  creationTimestamp: "2021-09-01T07:58:44Z"
  generation: 1
  name: harvester
  namespace: fleet-local
  resourceVersion: "10119"
  uid: 9b399dfe-268e-49e1-bed6-55a08b942a9d
spec:
  chart: harvester
  defaultNamespace: harvester-system
  repoName: harvester-charts
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  values:
    containers:
      apiserver:
        authMode: rancher
        hciMode: true
        image:
          imagePullPolicy: IfNotPresent
          repository: rancher/harvester
          tag: master-head
    harvester-network-controller:
      enabled: true
      image:
        pullPolicy: IfNotPresent
    longhorn:
      defaultSettings:
        taintToleration: kubevirt.io/drain:NoSchedule
      enabled: true
    multus:
      enabled: false
    rancherEmbedded: true
    webhook:
      image:
        imagePullPolicy: IfNotPresent
        repository: rancher/harvester-webhook
        tag: master-head
  version: 0.0.0-dev
status:
  conditions:
  - lastUpdateTime: "2021-09-01T08:10:28Z"
    message: NotReady(1) [Cluster fleet-local/local]; configmap.v1 longhorn-system/longhorn-storageclass
      missing; deployment.apps harvester-system/harvester error] Progress deadline
      exceeded; deployment.apps harvester-system/harvester-webhook [progressing,error]
      Deployment does not have minimum availability., Progress deadline exceeded
    status: "False"
    type: Ready
  - lastUpdateTime: "2021-09-01T08:10:28Z"
    status: "True"
    type: Processed
  - lastUpdateTime: "2021-09-01T08:14:23Z"
    status: "True"
    type: Defined
  display:
    readyClusters: 0/1
    state: NotReady
  maxNew: 50
  maxUnavailable: 1
  maxUnavailablePartitions: 0
  observedGeneration: 1
  partitions:
  - count: 1
    maxUnavailable: 1
    name: All
    summary:
      desiredReady: 1
      nonReadyResources:
      - bundleState: NotReady
        modifiedStatus:
        - apiVersion: v1
          kind: ConfigMap
          missing: true
          name: longhorn-storageclass
          namespace: longhorn-system
        name: fleet-local/local
        nonReadyStatus:
        - apiVersion: apps/v1
          kind: Deployment
          name: harvester
          namespace: harvester-system
          summary:
            error: true
            message:
            - Progress deadline exceeded
            state: failed
          uid: 7a9ac92c-ca84-4e43-9bb6-45abe423f35c
        - apiVersion: apps/v1
          kind: Deployment
          name: harvester-webhook
          namespace: harvester-system
          summary:
            error: true
            message:
            - Deployment does not have minimum availability.
            - Progress deadline exceeded
            state: updating
            transitioning: true
          uid: c89d0909-af6f-4677-864e-93e5e14f0f5f
      notReady: 1
      ready: 0
    unavailable: 1
  summary:
    desiredReady: 1
    nonReadyResources:
    - bundleState: NotReady
      modifiedStatus:
      - apiVersion: v1
        kind: ConfigMap
        missing: true
        name: longhorn-storageclass
        namespace: longhorn-system
      name: fleet-local/local
      nonReadyStatus:
      - apiVersion: apps/v1
        kind: Deployment
        name: harvester
        namespace: harvester-system
        summary:
          error: true
          message:
          - Progress deadline exceeded
          state: failed
        uid: 7a9ac92c-ca84-4e43-9bb6-45abe423f35c
      - apiVersion: apps/v1
        kind: Deployment
        name: harvester-webhook
        namespace: harvester-system
        summary:
          error: true
          message:
          - Deployment does not have minimum availability.
          - Progress deadline exceeded
          state: updating
          transitioning: true
        uid: c89d0909-af6f-4677-864e-93e5e14f0f5f
    notReady: 1
    ready: 0
  unavailable: 1
  unavailablePartitions: 0
@guangbochen
Contributor Author

It seems that this is not consistently reproducible.

Possible related logs when it happens:

# Logs from fleet-agent
time="2021-09-02T10:16:06Z" level=info msg="purge requested for mcc-harvester"
time="2021-09-02T10:16:06Z" level=info msg="Deleting orphan bundle ID mcc-local-managed-system-upgrade-controller, release cattle-system/mcc-local-managed-system-upgrade-controller"
time="2021-09-02T10:16:07Z" level=info msg="uninstall: Deleting mcc-local-managed-system-upgrade-controller"
time="2021-09-02T10:16:07Z" level=info msg="purge requested for mcc-local-managed-system-upgrade-controller"
# Logs from fleet-controller
time="2021-09-02T10:08:14Z" level=info msg="Deployed new agent for cluster fleet-local/local"
time="2021-09-02T10:08:14Z" level=info msg="System namespace (cattle-fleet-system) does not equal default namespace (fleet-system), checking for leftover objects..."
time="2021-09-02T10:08:21Z" level=info msg="Cluster registration fleet-local/request-9wr5q, cluster fleet-local/local granted [false]"
time="2021-09-02T10:08:21Z" level=info msg="Cluster registration fleet-local/request-9wr5q, cluster fleet-local/local granted [true]"
time="2021-09-02T10:08:39Z" level=info msg="rate limited bundle(map[objectset.rio.cattle.io/hash:ea476abea833a7a049113cc1ff17bd777c8e3889]) 1.034526264s"

The fleet bundle is in a NotReady state for the single-node cluster (probably not related):

kubectl get bundle -n fleet-local mcc-harvester
NAME            BUNDLEDEPLOYMENTS-READY   STATUS
mcc-harvester   0/1                       NotReady(1) [Cluster fleet-local/local]; configmap.v1 longhorn-system/longhorn-storageclass missing; deployment.apps harvester-system/harvester error] Progress deadline exceeded; deployment.apps harvester-system/harvester-webhook [progressing,error] Deployment does not have minimum availability., Progress deadline exceeded
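The per-cluster rollout state can also be checked through the BundleDeployment resources, e.g. (a sketch; the bundle deployment namespace is cluster-specific):

kubectl get bundledeployments.fleet.cattle.io -A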

@gitlawr
Contributor

gitlawr commented Sep 15, 2021

Steps to reproduce with no Harvester involved:

  1. Provision a 4C8G ubuntu 20.04 AWS instance
  2. Install rke2
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
  3. Install helm and kubectl
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
snap install kubectl --classic
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
  4. Install rancher
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
kubectl create namespace cattle-system
helm install rancher rancher-latest/rancher \
  --version v2.6.0 \
  --namespace cattle-system \
  --set bootstrapPassword=admin \
  --set "features=multi-cluster-management=false,multi-cluster-management-agent=false" \
  --set hostPort=8888 \
  --set ingress.enabled=false \
  --set noDefaultAdmin=false \
  --set rancherImage=rancher/rancher \
  --set rancherImagePullPolicy=IfNotPresent \
  --set rancherImageTag=v2.6.0 \
  --set replicas=-2 \
  --set tls=external \
  --set useBundledSystemChart=true \
  --set bootstrapPassword=admin
  5. Access Rancher at https://node-ip:8888 and add a chart repo to the local cluster with
     name: test, git URL: https://github.com/gitlawr/test-chart, branch: main (see the sketch below)
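For step 5, the repo can also be added declaratively; a sketch assuming Rancher's catalog.cattle.io/v1 ClusterRepo API (the Apps & Marketplace > Repositories UI achieves the same):

apiVersion: catalog.cattle.io/v1
kind: ClusterRepo
metadata:
  name: test
spec:
  gitRepo: https://github.com/gitlawr/test-chart
  gitBranch: main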

  6. Apply the following ManagedChart:

apiVersion: management.cattle.io/v3
kind: ManagedChart
metadata:
  name: test
  namespace: fleet-local
spec:
  chart: test
  defaultNamespace: test
  repoName: test
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  values:
    foo: bar
  version: 0.1.0
  7. Confirm that the chart is deployed:
$ helm ls -n test
NAME    	NAMESPACE	REVISION	UPDATED                               	STATUS  	CHART     	APP VERSION
mcc-test	test     	2       	2021-09-15 09:40:23.78808729 +0000 UTC	deployed	test-0.1.0	1.16.0
  8. Provision a 2C4G ubuntu 20.04 AWS instance and run the central Rancher v2.6.0 using docker.

  9. Import the rke2 cluster to the central Rancher.

  10. Check that the helm release is gone:

$ helm ls -n test
NAME	NAMESPACE	REVISION	UPDATED	STATUS	CHART	APP VERSION
  11. Check that the pre-delete hook resource in the test chart is deployed (meaning helm delete was triggered):
$ kubectl get svc -n test
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
mcc-test   ClusterIP   10.43.238.109   <none>        80/TCP    2m57s
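For context, the detection in the last step relies on a resource in the test chart that Helm only creates when it runs pre-delete hooks; a hypothetical sketch of such a resource (the actual chart lives at https://github.com/gitlawr/test-chart):

apiVersion: v1
kind: Service
metadata:
  name: mcc-test
  annotations:
    # Helm only creates this resource when the release is deleted,
    # so its presence indicates that helm delete was triggered.
    "helm.sh/hook": pre-delete
spec:
  type: ClusterIP
  ports:
  - port: 80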

@yasker
Member

yasker commented Sep 24, 2021

This still seems broken. I was using master from Sep 23rd, and removing a Harvester cluster in Rancher Virtualization Management resulted in losing the Harvester cluster.

Need Harvester team to confirm. cc @guangbochen @gitlawr

@gitlawr
Contributor

gitlawr commented Sep 26, 2021

When the imported Harvester cluster is removed from upstream Rancher, the cattle-system namespace gets deleted.
It seems to be a regression of #33800.

Another point is that we still use the v2.6.0 embedded Rancher in Harvester head due to #34866; the fix is not available in v2.6.1-rc3. We need to validate with the embedded Rancher bumped to v2.6-head to see whether that matters.

@guangbochen
Contributor Author

guangbochen commented Sep 26, 2021

We need to find the root cause of the #33800 regression.
We previously confirmed that, after the v2.6.0 Rancher release, we are not required to align the embedded Rancher version with the upstream Rancher version, since it will be a common scenario for users to bump the upstream Rancher more regularly than the Harvester HCI stack.

@gitlawr
Contributor

gitlawr commented Sep 26, 2021

I tried a custom-built Harvester with Rancher v2.6-head as the embedded Rancher. When it is removed from the upstream, the cattle-system namespace still gets deleted. Reopening #33800 as a regression.

@yasker
Member

yasker commented Sep 28, 2021

I think we should be able to close this per @gitlawr's comment on #33800 (comment), but I will defer that to @gitlawr.

@yasker added the status/awaiting-harvester label Sep 28, 2021
@gitlawr
Contributor

gitlawr commented Sep 29, 2021

Verified in v2.6-10dfee4974b196c0d3d8a33bc80f5d750c0272f2-head following the steps described in #34716 (comment).
The helm release deployed by the ManagedChart resource is not deleted after the downstream cluster is imported to upstream Rancher.
