
Backport: cluster-monitoring is being constantly re-deployed #20111

Closed
alena1108 opened this issue May 9, 2019 · 10 comments
Assignees
Labels
kind/bug Issues that are defects reported by users or that we know have reached a real release
Milestone

Comments

@alena1108

Backport #19945

@alena1108 alena1108 added the kind/bug Issues that are defects reported by users or that we know have reached a real release label May 9, 2019
@alena1108 alena1108 added this to the v2.2.3 milestone May 9, 2019
@sangeethah sangeethah assigned jiaqiluo and unassigned sangeethah May 9, 2019
@jiaqiluo
Member

The bug fix is validated on Rancher: v2.2.3-rc8

Test 1:

  • add a cluster with 3 etcd nodes, 2 control plane nodes, and any number of worker nodes
  • enable the cluster monitoring
  • enable the project monitoring

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.

Test 2:

  • run Rancher: v2.2.2
  • add two identical clusters, each with 3 etcd nodes, 2 control plane nodes, and any number of worker nodes
  • on cluster1 enable the cluster monitoring
  • upgrade Rancher to v2.2.3-rc8
  • check if the cluster-monitoring app gets re-deployed multiple times
  • on cluster1 enable the project monitoring
  • on cluster2 enable the cluster monitoring and the project monitoring

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.
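As a side note, one way to sanity-check "deployed only once" is to count the release-revision configmaps, since Rancher v2.2 deployed apps through Helm v2, which stores one configmap per release revision (`cluster-monitoring.v1`, `.v2`, ...). This is only a sketch: the namespace in the commented real-world usage is an assumption taken from the cattle-prometheus namespace mentioned later in this thread, and the counting pipeline is demonstrated with simulated configmap names.

```shell
# Sketch: count cluster-monitoring release revisions. A single deployment
# should leave exactly one revision configmap; a growing count means the
# app is being re-deployed.
count_revisions() {
  grep -c 'cluster-monitoring\.v[0-9]'
}

# Hypothetical real usage (assumes kubectl access and that the release
# configmaps live in cattle-prometheus):
#   kubectl -n cattle-prometheus get configmaps -o name | count_revisions

# Simulated here with sample configmap names:
printf 'configmap/cluster-monitoring.v1\nconfigmap/prometheus-rules\n' | count_revisions
```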

@jiaqiluo
Member

The bug is not fixed, see #19945 (comment)

@alena1108
Author

Can be tested with v2.2.2-patch1-rc2

@jiaqiluo
Member

The bug fix is validated on Rancher: v2.2.2-patch1-rc2

Test 1 - single install, fresh install:

  • add a cluster with 3 etcd nodes, 2 control plane nodes, and 2 worker nodes
  • enable the cluster monitoring
  • deploy some workloads and apps
  • run automation tests on the cluster

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.

Test 2 - upgrade path, HA install:

  • run Rancher: v2.2.2 on 3-node HA mode
  • add one custom RKE cluster and one imported RKE cluster, each with 3 etcd nodes, 2 control plane nodes, and 2 worker nodes
  • enable the cluster monitoring
  • run pre-upgrade tests
  • upgrade Rancher to v2.2.3-rc8
  • run post-upgrade tests
  • disable cluster monitoring to confirm that old config maps are deleted
  • enable cluster monitoring again
  • deploy some workloads and apps
  • run automation tests on the cluster

Result:

  • the apps cluster-monitoring and project-monitoring are each deployed only once.

@krisdevopsbot

I upgraded the cluster to 2.2.3 and disabled/enabled monitoring (the cattle-prometheus namespace was deleted). After enabling cluster monitoring again, I'm seeing cluster-monitoring already at cluster-monitoring.v18. Is there anything in the upgrade path I may have missed?
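The `.v18` suffix here looks like Helm v2's release-revision naming, so a freshly re-enabled app already sitting at v18 is exactly the repeated-re-deploy symptom this issue tracks. A small sketch of pulling the latest revision number out of such names (pure text processing, with the names simulated; the commented kubectl line and its namespace are assumptions, not taken from the thread's diagnostics):

```shell
# Sketch: extract the highest revision number from Helm v2 release
# configmap names such as "cluster-monitoring.v18". A healthy fresh
# install would report 1; a climbing number means repeated re-deploys.
latest_revision() {
  sed -n 's/^cluster-monitoring\.v\([0-9][0-9]*\)$/\1/p' | sort -n | tail -n 1
}

# Hypothetical real usage (assumes kubectl access):
#   kubectl -n cattle-prometheus get configmaps \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | latest_revision

# Simulated with sample names:
printf 'cluster-monitoring.v1\ncluster-monitoring.v17\ncluster-monitoring.v18\n' | latest_revision
```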

@jiaqiluo
Member

jiaqiluo commented May 23, 2019

Hi @krisdevopsbot
What was the Rancher version of your setup before upgrading? Do you mean that you disabled monitoring before upgrading and enabled it again after upgrading to v2.2.3?

Did you delete the namespace cattle-prometheus manually? Disabling monitoring does not delete the namespace by itself, and deleting it manually should not cause a problem.

@roycec

roycec commented May 29, 2019

Hello,

I upgraded from 2.2.2 to 2.2.3 yesterday and then saw that the configmaps were still being created. Today I disabled the monitoring feature (at the cluster level), waited until the components were removed, and activated monitoring again, but with the same result: a new configmap is created every few minutes.

@krisdevopsbot

krisdevopsbot commented May 29, 2019

Yes, I disabled the feature and checked to make sure cattle-prometheus was deleted prior to upgrading.

@krisdevopsbot

I built a new 3-node cluster on new VMs with RKE (HA air-gapped install) and am still seeing this on 2.2.3.

@krisdevopsbot

krisdevopsbot commented May 31, 2019

The workaround in #20186 fixed it.
