
core: disable mirroring on pool with "image" mode #13905

Merged: 1 commit merged on Mar 19, 2024


sp98 (Contributor) commented Mar 11, 2024

For image mode mirroring, if cephBlockPool.Pool.Spec.Mirroring.Enable is set to false, the operator removes the peer cluster and disables mirroring on the pool, provided the user has already disabled mirroring on all of the pool's images.

If mirroring is not disabled on all the pool images, the reconcile fails and asks the user to manually disable mirroring on those images.

Some initial testing:

When mirroring is enabled:
--------------------

runner@fv-az1017-987:~/work/rook/rook$ oc get cephblockpool replicapool -o yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  creationTimestamp: "2024-03-15T04:35:27Z"
  finalizers:
  - cephblockpool.ceph.rook.io
  generation: 3
  name: replicapool
  namespace: rook-ceph
  resourceVersion: "6053"
  uid: ee1db641-1657-47cd-a031-d08c44f492ca
spec:
  application: ""
  erasureCoded:
    codingChunks: 0
    dataChunks: 0
  failureDomain: osd
  mirroring:
    enabled: true
    mode: image
    peers:
      secretNames:
      - pool-peer-token-replicapool-config
  quotas: {}
  replicated:
    size: 1
  statusCheck:
    mirror: {}
status:
  info:
    rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool
  mirroringInfo:
    lastChanged: "2024-03-15T04:54:23Z"
    lastChecked: "2024-03-15T04:55:23Z"
    mode: image
    peers:
    - client_name: client.rbd-mirror-peer
      direction: rx-tx
      mirror_uuid: 910ebd07-140c-4415-8cde-de715efcc35a
      site_name: b6a29137-eaa6-4a7a-a5a5-b9577d63309c
      uuid: 5221e1e2-38f2-49a6-8d08-0e47a1937655
    site_name: 357462f6-0c53-4b02-8c5a-763b1f500cff
  mirroringStatus:
    lastChecked: "2024-03-15T04:55:23Z"
    summary:
      daemon_health: OK
      health: OK
      image_health: OK
      states:
        replaying: 1
  observedGeneration: 1
  phase: Ready
  snapshotScheduleStatus: {}
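The `status.mirroringInfo.peers` entries above carry the peer `uuid` that a cleanup step needs. As an illustration, the sketch below decodes a JSON document with the same field names and collects those UUIDs. It assumes, without confirmation from this PR, that `rbd mirror pool info --format json` emits these snake_case keys; `poolMirrorInfo`, `mirrorPeer`, and `peerUUIDs` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// poolMirrorInfo models only the fields visible under status.mirroringInfo above.
// Assumption: the rbd JSON output uses the same snake_case keys.
type poolMirrorInfo struct {
	Mode     string       `json:"mode"`
	SiteName string       `json:"site_name"`
	Peers    []mirrorPeer `json:"peers"`
}

type mirrorPeer struct {
	UUID       string `json:"uuid"`
	Direction  string `json:"direction"`
	SiteName   string `json:"site_name"`
	MirrorUUID string `json:"mirror_uuid"`
	ClientName string `json:"client_name"`
}

// peerUUIDs returns the UUIDs a cleanup step would pass to peer removal.
func peerUUIDs(raw []byte) ([]string, error) {
	var info poolMirrorInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return nil, fmt.Errorf("failed to parse mirror pool info: %w", err)
	}
	uuids := make([]string, 0, len(info.Peers))
	for _, p := range info.Peers {
		uuids = append(uuids, p.UUID)
	}
	return uuids, nil
}

func main() {
	// Values copied from the "mirroring is enabled" status above.
	raw := []byte(`{
		"mode": "image",
		"site_name": "357462f6-0c53-4b02-8c5a-763b1f500cff",
		"peers": [{
			"uuid": "5221e1e2-38f2-49a6-8d08-0e47a1937655",
			"direction": "rx-tx",
			"site_name": "b6a29137-eaa6-4a7a-a5a5-b9577d63309c",
			"mirror_uuid": "910ebd07-140c-4415-8cde-de715efcc35a",
			"client_name": "client.rbd-mirror-peer"
		}]
	}`)
	uuids, err := peerUUIDs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(uuids) // [5221e1e2-38f2-49a6-8d08-0e47a1937655]
}
```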
  
  
  
After disabling mirroring on each pool image and then disabling mirroring in the CephBlockPool spec:
---------------------------

runner@fv-az1017-987:~/work/rook/rook$ oc get cephblockpool replicapool -o yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  creationTimestamp: "2024-03-15T04:35:27Z"
  finalizers:
  - cephblockpool.ceph.rook.io
  generation: 4
  name: replicapool
  namespace: rook-ceph
  resourceVersion: "6232"
  uid: ee1db641-1657-47cd-a031-d08c44f492ca
spec:
  application: ""
  erasureCoded:
    codingChunks: 0
    dataChunks: 0
  failureDomain: osd
  quotas: {}
  replicated:
    size: 1
  statusCheck:
    mirror: {}
status:
  mirroringInfo:
    lastChanged: "2024-03-15T04:57:23Z"
  mirroringStatus: {}
  observedGeneration: 4
  phase: Ready
  snapshotScheduleStatus: {}
runner@fv-az1017-987:~/work/rook/rook$ 


-------------

runner@fv-az1017-987:~/work/rook/rook$ oc exec -it rook-ceph-tools-848bb44cb-w44zh sh 
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-4.4$ ceph status  
  cluster:
    id:     357462f6-0c53-4b02-8c5a-763b1f500cff
    health: HEALTH_OK
 
  services:
    mon:           1 daemons, quorum a (age 28m)
    mgr:           a(active, since 22m)
    mds:           1/1 daemons up, 1 hot standby
    osd:           1 osds: 1 up (since 27m), 1 in (since 28m)
    cephfs-mirror: 1 daemon active (1 hosts)
    rbd-mirror:    1 daemon active (1 hosts)
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 144 pgs
    objects: 42 objects, 1.2 MiB
    usage:   50 MiB used, 6.0 GiB / 6 GiB avail
    pgs:     144 active+clean
 
  io:
    client:   853 B/s rd, 1 op/s rd, 0 op/s wr
 
sh-4.4$ rbd mirror pool status replicapool
rbd: mirroring not enabled on the pool
sh-4.4$ 

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

sp98 force-pushed the remove-stale-cluster-peers branch 2 times, most recently from 805e257 to 02a2961 (March 11, 2024 08:09)
sp98 requested a review from travisn March 11, 2024 08:26
sp98 marked this pull request as ready for review March 11, 2024 08:26
sp98 changed the title from "core: remove stale cluster peer" to "[DNM]core: remove stale cluster peer" Mar 11, 2024
sp98 changed the title from "[DNM]core: remove stale cluster peer" to "[Not Ready for Review]core: remove stale cluster peer" Mar 11, 2024
sp98 force-pushed the remove-stale-cluster-peers branch 2 times, most recently from 6719ab0 to 2ddf7ff (March 11, 2024 12:50)
Review thread on pkg/operator/ceph/pool/peers.go (outdated, resolved)
sp98 force-pushed the remove-stale-cluster-peers branch 2 times, most recently from 5fb0d1b to d6eef01 (March 12, 2024 04:24)
sp98 added the debug-ci label (run CI with debugging) Mar 12, 2024
sp98 force-pushed the remove-stale-cluster-peers branch 4 times, most recently from 89ce927 to 7f431f4 (March 13, 2024 03:41)
sp98 changed the title from "[Not Ready for Review]core: remove stale cluster peer" to "core: remove stale cluster peer" Mar 13, 2024
sp98 changed the title from "core: remove stale cluster peer" to "[WIP]core: remove stale cluster peer" Mar 13, 2024
parth-gr (Member) left a comment:

The multi-cluster mirroring CI test is failing.

sp98 (Contributor Author) commented Mar 13, 2024

> The multi-cluster mirroring CI test is failing.

I'm still testing it with the CI, so I added a wait:

```yaml
run: |
  sleep 1h
```

I stop the CI run after some time, and that is what makes it fail. Otherwise, it was working fine.

sp98 force-pushed the remove-stale-cluster-peers branch 3 times, most recently from f46a100 to d3ffd23 (March 14, 2024 08:58)
sp98 requested review from travisn and parth-gr March 14, 2024 10:26
sp98 force-pushed the remove-stale-cluster-peers branch from d3ffd23 to 59cb485 (March 14, 2024 10:32)
Review thread on pkg/daemon/ceph/client/pool.go (outdated, resolved)
return errors.Wrapf(err, "failed to remove cluster peer with UUID %q for the pool %q", peer.UUID, pool.Name)
}

for _, image := range images {
Member commented:

There could be a lot of images. I wonder if we need to clean this up in a job, similar to how it's proposed for force deletion of the subvolume groups.

Contributor Author replied:

Made changes to disable mirroring only on those images where it was enabled.
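Restricting the work to images that actually have mirroring enabled could look like the following sketch. `imageMirrorState` and `imagesToDisable` are hypothetical names, and how the per-image state is obtained from Ceph is deliberately left out.

```go
package main

import "fmt"

// imageMirrorState is a hypothetical per-image record; how it is populated
// from Ceph is out of scope for this sketch.
type imageMirrorState struct {
	Name          string
	MirrorEnabled bool
}

// imagesToDisable keeps only images that still have mirroring enabled, so
// images that were never mirrored are not touched.
func imagesToDisable(images []imageMirrorState) []string {
	var out []string
	for _, img := range images {
		if img.MirrorEnabled {
			out = append(out, img.Name)
		}
	}
	return out
}

func main() {
	imgs := []imageMirrorState{{"vol-a", true}, {"vol-b", false}, {"vol-c", true}}
	fmt.Println(imagesToDisable(imgs)) // [vol-a vol-c]
}
```

With many images in a pool, this filter is also what keeps the per-image loop short when only a few images were ever mirrored.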

Contributor Author replied:

Since we don't have the force-deletion-of-subvolume PR yet, I propose that disabling mirroring for images in a job (if required) be taken up once that PR is ready.

sp98 force-pushed the remove-stale-cluster-peers branch 2 times, most recently from 81f49fd to b3d0a5c (March 18, 2024 03:57)
sp98 force-pushed the remove-stale-cluster-peers branch 2 times, most recently from 1d81deb to d3caa25 (March 18, 2024 12:00)
sp98 force-pushed the remove-stale-cluster-peers branch 2 times, most recently from 2900443 to 800b9c1 (March 18, 2024 13:42)
sp98 requested review from Madhu-1 and parth-gr March 18, 2024 13:51
sp98 force-pushed the remove-stale-cluster-peers branch from 800b9c1 to c835b8a (March 18, 2024 13:54)
sp98 force-pushed the remove-stale-cluster-peers branch from c835b8a to 186d8b6 (March 18, 2024 13:56)
Review thread on pkg/daemon/ceph/client/mirror.go (outdated, resolved)
Review thread on pkg/daemon/ceph/client/mirror.go (outdated, resolved)
Review thread on pkg/daemon/ceph/client/mirror.go (outdated, resolved)
Review thread on pkg/operator/ceph/pool/controller.go (outdated, resolved)
sp98 force-pushed the remove-stale-cluster-peers branch from 186d8b6 to 9c98803 (March 18, 2024 14:16)
sp98 requested a review from Madhu-1 March 18, 2024 14:17
Madhu-1 (Member) left a comment:

LGTM

sp98 (Contributor Author) commented Mar 18, 2024

So we are still deciding whether we should disable mirroring on the images in this PR.

@travisn @Madhu-1 thoughts?

Madhu-1 (Member) commented Mar 18, 2024

> So we are still deciding whether we should disable mirroring on the images in this PR.
> @travisn @Madhu-1 thoughts?

IMO we should not do it internally unless users ask us to through some extra flags/annotations; that could be a separate PR.

Madhu-1 (Member) commented Mar 18, 2024

I forgot one case: we want to allow users to remove unwanted (dead) cluster mirroring peers.

travisn dismissed their stale review March 18, 2024 17:00

Only a few minor questions remain, removing review to unblock after others approve

sp98 (Contributor Author) commented Mar 19, 2024

> I forgot one case: we want to allow users to remove unwanted (dead) cluster mirroring peers.

Currently, if the user disables mirroring, we remove all the peers on the pool (provided that mirroring on the pool images is already disabled).

We don't have automation to remove only specific (unwanted or dead) peers yet. That looks like a valid use case and is not very difficult to implement, so we can discuss it further and maybe take it up in a future PR.
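Removing only specific peers, as suggested for a future PR, mostly comes down to selecting peers before calling removal. A minimal sketch, assuming the user identifies dead peers by their `site_name` (a hypothetical input; no such setting exists in this PR); `peer` and `deadPeerUUIDs` are illustrative names.

```go
package main

import "fmt"

// peer holds the two fields from status.mirroringInfo.peers that matter here.
type peer struct {
	UUID     string
	SiteName string
}

// deadPeerUUIDs returns the UUIDs of peers whose site_name is in the hypothetical
// user-supplied deadSites set, leaving healthy peers in place.
func deadPeerUUIDs(peers []peer, deadSites map[string]bool) []string {
	var uuids []string
	for _, p := range peers {
		if deadSites[p.SiteName] {
			uuids = append(uuids, p.UUID)
		}
	}
	return uuids
}

func main() {
	peers := []peer{{UUID: "uuid-1", SiteName: "site-live"}, {UUID: "uuid-2", SiteName: "site-dead"}}
	fmt.Println(deadPeerUUIDs(peers, map[string]bool{"site-dead": true})) // [uuid-2]
}
```

Unlike the disable path, this selective removal would not need to check image mirroring state first, since the pool itself stays mirrored to the remaining peers.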

sp98 requested a review from Madhu-1 March 19, 2024 06:50
sp98 force-pushed the remove-stale-cluster-peers branch from 9c98803 to b56fda1 (March 19, 2024 06:52)
Madhu-1 (Member) left a comment:

LGTM

For image mode mirroring, if cephBlockPool.Pool.Spec.Mirroring.Enable
is set to false, then remove the peer cluster and disable mirroring on
the pool if the user has disabled mirroring on all the pool images.

If mirroring is not disabled on all the pool images, then the reconcile
fails, asking the user to manually disable mirroring on those images.

Signed-off-by: sp98 <sapillai@redhat.com>
sp98 force-pushed the remove-stale-cluster-peers branch from b56fda1 to a37c476 (March 19, 2024 08:50)
sp98 removed the debug-ci label (run CI with debugging) Mar 19, 2024
return errors.Wrapf(err, "failed to list mirrored images for pool %q", pool.Name)
}

if len(*mirroredPools.Images) > 0 {
Member commented:

If we always need to disable it, does the code after this ever get hit? Or is the flow:
disable mirroring -> disable mirroring manually on the images -> then restart the reconcile?

Contributor Author replied:

For image mode mirroring, we want to disable mirroring on the pool only when mirroring is disabled on all the images in the pool.

len(*mirroredPools.Images) > 0 means there are still images in this pool with mirroring enabled, so we return an error with this message. The reconcile keeps failing until the users manually disable mirroring on each image in this pool.

sp98 merged commit 20d2461 into rook:master Mar 19, 2024
51 checks passed