
Backport: cluster-monitoring is being constantly re-deployed #20111

Closed
alena1108 opened this issue May 9, 2019 · 10 comments
Assignees
Labels
kind/bug Issues that are defects reported by users or that we know have reached a real release
Milestone

Comments

@alena1108

Backport #19945

@alena1108 alena1108 added the kind/bug Issues that are defects reported by users or that we know have reached a real release label May 9, 2019
@alena1108 alena1108 added this to the v2.2.3 milestone May 9, 2019
@sangeethah sangeethah assigned jiaqiluo and unassigned sangeethah May 9, 2019
@jiaqiluo
Member

The bug fix is validated on Rancher: v2.2.3-rc8

Test 1:

  • add a cluster with 3 etcd nodes, 2 control plane nodes, and any number of worker nodes
  • enable the cluster monitoring
  • enable the project monitoring

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.

Test 2:

  • run Rancher: v2.2.2
  • add two identical clusters, each with 3 etcd nodes, 2 control plane nodes, and any number of worker nodes
  • on cluster1 enable the cluster monitoring
  • upgrade Rancher to v2.2.3-rc8
  • check if the cluster-monitoring app gets re-deployed multiple times
  • on cluster1 enable the project monitoring
  • on cluster2 enable the cluster monitoring and the project monitoring

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.
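As a side note, one way to sanity-check "deployed only once" is to count the release-revision configmaps, since Rancher v2.2 deployed apps through Helm v2, which stores one configmap per release revision (`cluster-monitoring.v1`, `.v2`, ...). This is only a sketch: the namespace in the commented real-world usage is an assumption taken from the cattle-prometheus namespace mentioned later in this thread, and the counting pipeline is demonstrated with simulated configmap names.

```shell
# Sketch: count cluster-monitoring release revisions. A single deployment
# should leave exactly one revision configmap; a growing count means the
# app is being re-deployed.
count_revisions() {
  grep -c 'cluster-monitoring\.v[0-9]'
}

# Hypothetical real usage (assumes kubectl access and that the release
# configmaps live in cattle-prometheus):
#   kubectl -n cattle-prometheus get configmaps -o name | count_revisions

# Simulated here with sample configmap names:
printf 'configmap/cluster-monitoring.v1\nconfigmap/prometheus-rules\n' | count_revisions
```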

@jiaqiluo
Member

The bug is not fixed, see #19945 (comment)

@alena1108
Author

Can be tested with v2.2.2-patch1-rc2

@jiaqiluo
Member

The bug fix is validated on Rancher: v2.2.2-patch1-rc2

Test 1 - single install, fresh install:

  • add a cluster with 3 etcd nodes, 2 control plane nodes, and 2 worker nodes
  • enable the cluster monitoring
  • deploy some workloads and apps
  • run automation tests on the cluster

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.

Test 2 - upgrade path, HA install:

  • run Rancher: v2.2.2 on 3-node HA mode
  • add one custom RKE cluster and one imported RKE cluster, each with 3 etcd nodes, 2 control plane nodes, and 2 worker nodes
  • enable the cluster monitoring
  • run pre-upgrade tests
  • upgrade Rancher to v2.2.3-rc8
  • run post-upgrade tests
  • disable cluster monitoring to confirm that old config maps are deleted
  • enable cluster monitoring again
  • deploy some workloads and apps
  • run automation tests on the cluster

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.

@krisdevopsbot

I upgraded the cluster to 2.2.3 and disabled/enabled monitoring (the cattle-prometheus namespace was deleted). After enabling cluster monitoring again, I'm seeing cluster-monitoring already at cluster-monitoring.v18. Is there anything in the upgrade path I may have missed?
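The `.v18` suffix here looks like Helm v2's release-revision naming, so a freshly re-enabled app already sitting at v18 is exactly the repeated-re-deploy symptom this issue tracks. A small sketch of pulling the latest revision number out of such names (pure text processing, with the names simulated; the commented kubectl line and its namespace are assumptions, not taken from the thread's diagnostics):

```shell
# Sketch: extract the highest revision number from Helm v2 release
# configmap names such as "cluster-monitoring.v18". A healthy fresh
# install would report 1; a climbing number means repeated re-deploys.
latest_revision() {
  sed -n 's/^cluster-monitoring\.v\([0-9][0-9]*\)$/\1/p' | sort -n | tail -n 1
}

# Hypothetical real usage (assumes kubectl access):
#   kubectl -n cattle-prometheus get configmaps \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | latest_revision

# Simulated with sample names:
printf 'cluster-monitoring.v1\ncluster-monitoring.v17\ncluster-monitoring.v18\n' | latest_revision
```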

@jiaqiluo
Member

jiaqiluo commented May 23, 2019

Hi @krisdevopsbot
What was the Rancher version of your setup before upgrading? Do you mean that you disabled monitoring before upgrading and enabled it again after upgrading to v2.2.3?

Did you delete the namespace cattle-prometheus manually? Disabling monitoring does not delete the namespace by itself, and deleting it manually should not cause a problem.

@roycec

roycec commented May 29, 2019

Hello,

I upgraded from 2.2.2 to 2.2.3 yesterday and then saw that the configmaps were still being created. Today I disabled the monitoring feature (at the cluster level), waited until the components were removed, and activated monitoring again, but with the same result: a new configmap is created every few minutes.

@krisdevopsbot

krisdevopsbot commented May 29, 2019

Yes, I disabled the feature and checked to make sure cattle-prometheus was deleted prior to upgrading.

@krisdevopsbot

I built a new 3-node cluster on new VMs with RKE (HA air-gapped install) and am still seeing this on 2.2.3.

@krisdevopsbot

krisdevopsbot commented May 31, 2019

The workaround in #20186 fixed it.
