Object buckets gone after upgrade from v1.5.1 to v1.5.2 #6767

Closed
maximilize opened this issue Dec 4, 2020 · 1 comment · Fixed by #6773

@maximilize

Is this a bug report or feature request?

  • Bug Report

I have a rook-ceph cluster running with CephFS, block storage, and the object store enabled. I set it up just recently with version v1.5.1 and mostly used the example manifests.
After applying the v1.5.2 manifests, the S3 buckets are gone inside radosgw, but the CRDs still exist.

It seems that the Rook operator got stuck in a loop of re-creating and deleting the buckets.

Expected behavior:

The upgrade to v1.5.2 applies without any issues, and in particular without deleting any data.

How to reproduce it (minimal and precise):

  1. Set up a rook-ceph v1.5.1 cluster with object storage enabled.
  2. Create one or more buckets. I used the storage class rook-ceph-delete-bucket (see the example claim after this list).
  3. Apply the upgrade to v1.5.2:
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.5.2/cluster/examples/kubernetes/ceph/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.5.2/cluster/examples/kubernetes/ceph/common.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.5.2/cluster/examples/kubernetes/ceph/operator.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.5.2/cluster/examples/kubernetes/ceph/toolbox.yaml
  4. After a while the buckets are gone. Check with radosgw-admin bucket ls (see the snippet after this list). The CRDs still exist.
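
For reference, here is a minimal sketch of how a bucket can be requested against that storage class, based on the Rook example manifests; the claim name, namespace, and bucket name prefix are illustrative, not the exact ones I used:

# Minimal ObjectBucketClaim sketch; names are illustrative.
kubectl apply -f - <<EOF
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-delete-bucket
  namespace: default
spec:
  generateBucketName: ceph-bkt
  storageClassName: rook-ceph-delete-bucket
EOF

To check whether the underlying buckets still exist in RGW, the command from step 4 can be run from the toolbox pod, assuming the default rook-ceph namespace and the toolbox deployment from toolbox.yaml:

# List buckets as seen by radosgw; adjust names if your cluster differs.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin bucket list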

File(s) to submit:
ceph-s3-crash.log (Operator log)
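
The operator log can be captured with something like the following, assuming the default rook-ceph namespace and operator deployment name:

# Capture the operator log; adjust the namespace/deployment if your cluster differs.
kubectl -n rook-ceph logs deploy/rook-ceph-operator > ceph-s3-crash.log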

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 20.04
  • Cloud provider or hardware configuration: Bare metal
  • Kubernetes version (use kubectl version): v1.19.3
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Rancher RKE
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): Healthy
maximilize added the bug label Dec 4, 2020
@travisn travisn added this to Blocking Release in v1.5 via automation Dec 4, 2020
@BlaineEXE
Member

Having taken a little break and come back to this, I think this is the result of a bug that is fixed in v1.5.2. Creating new object bucket claims with Rook v1.5.2 should work properly, but when upgrading, the bug that existed before v1.5.2 is exacerbated/exposed.

I think the fix for the upgrade scenario here is to fix more of the small issues in lib-bucket-provisioner and make it overwrite a claim's Secret and ConfigMap if they already exist, rather than giving up and erroring out.
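
For context, lib-bucket-provisioner creates a Secret (S3 credentials) and a ConfigMap (bucket endpoint and name) named after each claim in the claim's namespace; those are the objects that need to be overwritten on upgrade. They can be inspected with something like the following, reusing the illustrative claim name and namespace from the sketch above:

# Inspect the per-claim Secret and ConfigMap (illustrative names).
kubectl -n default get secret,configmap ceph-delete-bucket -o yaml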

BlaineEXE added the ceph main ceph tag label Dec 4, 2020
BlaineEXE added a commit to BlaineEXE/rook that referenced this issue Dec 4, 2020
This reverts commit 3b4ee6c.

Revert the fix that introduces the following bug on upgrade:
rook#6767

This fix as well as a fix for the upgrade case will follow in days to
come.

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
v1.5 automation moved this from Blocking Release to Done Dec 4, 2020
mergify bot pushed a commit that referenced this issue Dec 4, 2020
This reverts commit 3b4ee6c.

Revert the fix that introduces the following bug on upgrade:
#6767

This fix as well as a fix for the upgrade case will follow in days to
come.

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
(cherry picked from commit 22cd1bf)
BlaineEXE added commits to BlaineEXE/rook that referenced this issue on Dec 7, Dec 8, and Dec 16, 2020, each with the message:
Fixes rook#6650
Does not reintroduce bug rook#6767 from previous fix for rook#6650

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
BlaineEXE added a commit to BlaineEXE/rook that referenced this issue Dec 16, 2020
Update to the latest lib bucket provisioner code.
Fixes rook#6650

Modifies CRD for objectbucketclaims to fix an additional bug where an
ObjectBucket's 'ClaimRef' is lost due to the CRD validation being
specified incorrectly.

Does not reintroduce bug rook#6767 from previous fix for rook#6650

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>