Daily octopus upgrade test is failing due to unexpected octopus version #9097

Closed
travisn opened this issue Nov 3, 2021 · 4 comments · Fixed by #9098

travisn commented Nov 3, 2021

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
The daily octopus upgrade test is failing; see this test run for an example.

The test log shows that the rgw daemon is not updating, then the test times out and fails:

2021-11-03 00:25:29.028371 I | testutil: waiting for 1 deployments (found 0) with label app=rook-ceph-rgw,ceph-version!=15.2.15-0 in namespace upgrade-ns
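
As an aside, here is a minimal sketch of how such a wait can be implemented with client-go. The package name, function name, retry count, and polling interval below are invented for illustration and are not the actual testutil code; only the label selector and namespace come from the log line above.

```go
package testutil // illustrative package name, not Rook's actual testutil

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForLabeledDeployments polls until `want` deployments in `namespace`
// match `selector`, e.g. "app=rook-ceph-rgw,ceph-version!=15.2.15-0" in
// namespace "upgrade-ns", or gives up after `retries` attempts.
func waitForLabeledDeployments(ctx context.Context, cs kubernetes.Interface, namespace, selector string, want, retries int) error {
	for i := 0; i < retries; i++ {
		list, err := cs.AppsV1().Deployments(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
		if err != nil {
			return err
		}
		if len(list.Items) >= want {
			return nil
		}
		fmt.Printf("waiting for %d deployments (found %d) with label %s in namespace %s\n",
			want, len(list.Items), selector, namespace)
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("timed out waiting for %d deployments with label %q in namespace %q", want, selector, namespace)
}
```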

The operator log shows an unexpected version in the octopus image. Since it looks like a downgrade, the rgw daemon refuses to upgrade even though all the other daemons, including mon, mgr, and osd, do upgrade.

2021-11-03 22:19:10.694041 I | ceph-cluster-controller: CR has changed for "upgrade-ns". diff=  v1.ClusterSpec{
  	CephVersion: v1.CephVersionSpec{
- 		Image:            "quay.io/ceph/ceph:v15",
+ 		Image:            "quay.io/ceph/daemon-base:latest-octopus-devel",
  		AllowUnsupported: false,
  	},
  	Storage:     {UseAllNodes: true, Config: {"databaseSizeMB": "1024", "journalSizeMB": "1024"}, Selection: {UseAllDevices: &true}},
  	Annotations: nil,
  	... // 21 identical fields
  }
2021-11-03 22:19:10.694939 I | ceph-cluster-controller: reconciling ceph cluster in namespace "upgrade-ns"
2021-11-03 22:19:10.718721 I | op-mon: parsing mon endpoints: a=10.101.175.6:6789
2021-11-03 22:19:10.770814 I | ceph-cluster-controller: detecting the ceph image version for image quay.io/ceph/daemon-base:latest-octopus-devel...
2021-11-03 22:19:15.011090 I | ceph-cluster-controller: detected ceph image version: "15.2.14-133 octopus"
2021-11-03 22:19:15.032465 I | ceph-cluster-controller: validating ceph version from provided image
2021-11-03 22:19:15.052961 I | op-mon: parsing mon endpoints: a=10.101.175.6:6789
2021-11-03 22:19:15.059380 I | cephclient: writing config file /var/lib/rook/upgrade-ns/upgrade-ns.config
2021-11-03 22:19:15.059739 I | cephclient: generated admin config in /var/lib/rook/upgrade-ns
2021-11-03 22:19:15.816626 E | ceph-cluster-controller: failed to determine if we should upgrade or not. image spec version 15.2.14-133 octopus is lower than the running cluster version 15.2.15-0 octopus, downgrading is not supported
2021-11-03 22:19:15.816714 I | ceph-cluster-controller: cluster "upgrade-ns": version "15.2.14-133 octopus" detected for image "quay.io/ceph/daemon-base:latest-octopus-devel"

While the downgraded version is unexpected, the operator should downgrade the daemons consistently instead of proceeding with some and skipping others.

The operator fails to decide whether this is an upgrade here, and the isUpgrade flag is not set, as seen later in the method. If the image has changed in any way, we should just assume it is an upgrade.
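
As a rough illustration of that suggestion, here is a minimal, self-contained sketch (not the actual Rook operator code; the type and function names are invented for illustration): any difference between the detected image version and the running cluster version is treated as a reason to converge all daemons, instead of refusing because the image version is numerically lower.

```go
package main

import "fmt"

// cephVersion mirrors the version fields seen in the operator log,
// e.g. "15.2.14-133" -> {Major: 15, Minor: 2, Extra: 14, Build: 133}.
type cephVersion struct {
	Major, Minor, Extra, Build int
}

// shouldConverge returns true whenever the image version differs from the
// running cluster version in any way, regardless of which one is "newer",
// so that mon, mgr, osd, and rgw are all moved to the same image.
func shouldConverge(running, image cephVersion) bool {
	return running != image
}

func main() {
	running := cephVersion{15, 2, 15, 0} // 15.2.15-0 octopus (running cluster)
	image := cephVersion{15, 2, 14, 133} // 15.2.14-133 octopus (image spec)
	fmt.Println(shouldConverge(running, image)) // true: converge all daemons
}
```

With the values from the log above, the check returns true even though 15.2.14-133 is lower than 15.2.15-0, so rgw would be redeployed along with mon, mgr, and osd.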

Expected behavior:
The daily tests should pass consistently.


leseb commented Nov 4, 2021

@djgalloway any idea why the latest version tag (in the latest-octopus-devel image) is lower than the most recent stable Octopus?

Essentially "quay.io/ceph/daemon-base:latest-octopus-devel" has 15.2.14-133 where the latest Octopus stable is 15.2.15.

Thanks!

leseb reopened this Nov 4, 2021
djgalloway commented:

It's this: ceph/ceph-container#1959


leseb commented Nov 4, 2021

It's this: ceph/ceph-container#1959

I'm not sure I understand.

djgalloway commented:

Sorry, the ceph-container-build-push-imgs-devel-nightly job has been failing since October 12 because forego is no longer available.

Dimitri's PR will build forego from source so the nightly job can proceed.

leseb closed this as completed Nov 4, 2021