
mon: add mon storage class support #12384

Merged: 6 commits into rook:master on Aug 17, 2023

Conversation

@ideepika (Contributor) commented Jun 14, 2023

In production it is generally desirable to spread the cluster across multiple availability zones. If an entire storage class (and the zone behind it) becomes unavailable, enough mons should remain up to keep the cluster running. Distributing the mons evenly across zones provides this high availability. This is especially useful for a datacenter that provides its own storage class per zone, in contrast to the storage classes offered by cloud providers.

The failure domain used for this scenario is commonly "zone", but a different failure domain label can be used.

Example config:

    mon:
      count: 3
      allowMultiplePerNode: false
      zones:
      - name: a
        volumeClaimTemplate:
          spec:
            storageClassName: zone-a-storage
            resources:
              requests:
                storage: 10Gi
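
As an illustrative sketch (not taken from this PR), the same pattern extends to one mon per zone with a dedicated storage class in each zone. The zone name c and the storage class names zone-b-storage and zone-c-storage are assumptions for the example:

    mon:
      count: 3
      allowMultiplePerNode: false
      zones:
      - name: a
        volumeClaimTemplate:
          spec:
            storageClassName: zone-a-storage
            resources:
              requests:
                storage: 10Gi
      - name: b
        volumeClaimTemplate:
          spec:
            storageClassName: zone-b-storage   # assumed name
            resources:
              requests:
                storage: 10Gi
      - name: c                                # assumed zone name
        volumeClaimTemplate:
          spec:
            storageClassName: zone-c-storage   # assumed name
            resources:
              requests:
                storage: 10Gi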

See also: #11407

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@parth-gr (Member) left a comment

Please update the description with the changes you are bringing.

@ideepika ideepika force-pushed the feature/ksd-174 branch 5 times, most recently from ea76740 to 593f22f Compare June 20, 2023 15:18
@travisn (Member) left a comment

Overall, we will also need:

  • A documentation update in ceph-cluster-crd.md
  • A comment added to the cluster-on-pvc.yaml example (a rough sketch of such an example follows below)
  • This scenario will be difficult to cover in the CI beyond unit tests, so it will need manual testing to confirm; please comment on the manual testing scenarios you've tried.
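
For reference, here is a minimal sketch of the kind of snippet such an example comment could illustrate: the simplest zone-based placement, with no per-zone storage overrides. This is only an assumption of what the docs change might show; the failureDomainLabel value mirrors the test configuration that appears later in this thread, and zone name c is made up for the example:

    mon:
      count: 3
      allowMultiplePerNode: false
      # Place one mon in each zone, matching nodes on this label
      failureDomainLabel: topology.kubernetes.io/zone
      zones:
      - name: a
      - name: b
      - name: c   # assumed zone name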

pkg/operator/ceph/cluster/mon/mon.go (outdated)
      // Find a zone in the stretch cluster that still needs an assignment
-     for _, zone := range c.spec.Mon.StretchCluster.Zones {
+     for _, zone := range zones {
          count, ok := zoneCount[zone.Name]
          if !ok {
              // The zone isn't currently assigned to any mon, so return it
A Member commented:

Line 626 only applies to a stretch cluster, where two mons can be placed in some zones. For non-stretch, we should keep it to one mon per zone in all cases.

@travisn (Member) commented:

This still looks like an issue (now see line 630 instead of 626). The stretch cluster chooses two mons per zone, but for non-stretch clusters, we only want one mon per zone. I believe we should update this condition

if c.spec.Mon.Count == 5 && count == 1 && !zone.Arbiter {

to this:

if c.spec.IsStretchCluster() && c.spec.Mon.Count == 5 && count == 1 && !zone.Arbiter {

pkg/operator/ceph/cluster/mon/mon.go (outdated)
pkg/operator/ceph/cluster/mon/mon.go
    } else {
        zones = c.spec.Mon.Zones
    }
    // TODO: check if it works for all zones or not
A Member commented:

Seems like it should work for all zones. Is this just a note about testing?

@ideepika (Contributor, author) replied:

yes, just for testing

A Member commented:

Can we remove the comment now, or still need to test it?

pkg/apis/ceph.rook.io/v1/cluster.go
mergify bot commented Jun 22, 2023

This pull request has merge conflicts that must be resolved before it can be merged. @ideepika please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork

@ideepika ideepika force-pushed the feature/ksd-174 branch 2 times, most recently from 74f07b2 to f0e312e Compare July 5, 2023 10:59
@travisn (Member) commented Jul 5, 2023

@ideepika Please share the logs and summarize the issue you are seeing when creating the cluster with these changes.

@ideepika ideepika force-pushed the feature/ksd-174 branch 4 times, most recently from a3e7694 to a1992c6 Compare July 24, 2023 18:31
@ideepika (Contributor, author) commented:

With the following change:

+++ b/deploy/examples/cluster-test.yaml
@@ -32,6 +32,9 @@ spec:
     image: quay.io/ceph/ceph:v18
     allowUnsupported: true
   mon:
+    failureDomainLabel: topology.kubernetes.io/zone
+    zones:
+      - name: b
     count: 1

I was able to verify the zonal deployment based on labels:

Every 2.0s: kubectl get pods -n rook-ceph --show-labels                                       x1carbon: Tue Jul 25 16:18:17 2023

NAME                                     READY   STATUS             RESTARTS   AGE   LABELS
rook-ceph-csi-detect-version-mz8nh       0/1     ImagePullBackOff   0          14m   app=rook-ceph-csi-detect-version,controller-uid=7aceaaf8-6045-47d6-a0bc-c26f1f40894c,job-name=rook-ceph-csi-detect-version,rook-version=v1.11.0-alpha.0.540.ga1992c6d8-dirty
rook-ceph-mon-a-canary-587ccfb77-gd2g5   0/1     Pending            0          79s   app.kubernetes.io/component=cephclusters.ceph.rook.io,app.kubernetes.io/created-by=rook-ceph-operator,app.kubernetes.io/instance=a,app.kubernetes.io/managed-by=rook-ceph-operator,app.kubernetes.io/name=ceph-mon,app.kubernetes.io/part-of=my-cluster,app=rook-ceph-mon,ceph_daemon_id=a,ceph_daemon_type=mon,mon=a,mon_canary=true,mon_cluster=rook-ceph,pod-template-hash=587ccfb77,rook.io/operator-namespace=rook-ceph,rook_cluster=rook-ceph,zone=b

This is while the node label is still zone a, so the mon canary stays Pending:

➜  examples git:(feature/ksd-174) ✗ kubectl get nodes --show-labels
NAME       STATUS   ROLES           AGE   VERSION   LABELS
minikube   Ready    control-plane   95m   v1.24.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=62e108c3dfdec8029a890ad6d8ef96b6461426dc,minikube.k8s.io/name=minikube,minikube.k8s.io/primary=true,minikube.k8s.io/updated_at=2023_07_25T14_43_44_0700,minikube.k8s.io/version=v1.26.1,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=,topology.kubernetes.io/zone=a

After the node's zone label is changed to b, it works as expected:

Every 2.0s: kubectl get pods -n rook-ceph                                                    x1carbon: Tue Jul 25 16:20:25 2023

NAME                                     READY   STATUS        RESTARTS   AGE
rook-ceph-csi-detect-version-rjdhw       0/1     Terminating   0          72s
rook-ceph-mon-a-78fc59df9-wxsdn          0/1     Running       0          5s
rook-ceph-mon-a-canary-587ccfb77-p6slc   1/1     Terminating   0          7s
rook-ceph-operator-558889fc56-tnp4l      1/1     Running       0          15s

@travisn (Member) left a comment

Please rebase to pick up the latest CI fixes.
The changes are looking correct, though since we don't have the stretch cluster tested in the CI, it's a bit risky, so I'd like to do some manual testing another day.


Documentation/CRDs/Cluster/ceph-cluster-crd.md (outdated)
Deepika Upadhyay added 5 commits August 3, 2023 16:41
if the zones field is specified, deploy the mons in the specified zones; this will help with higher mon availability.

See also: rook#11407

Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
* fixes mon zonal deployment to place 1 mon per zone instead of 2 when the zones field is specified.
* adds the zones field to the CRDs documentation.
* adds code to pick up failureDomainLabel, if specified, for the zones field in the mon deployment.

Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
Documentation/CRDs/Cluster/ceph-cluster-crd.md (outdated)
@ideepika (Contributor, author) commented Aug 7, 2023

@travisn PTAL

@travisn (Member) left a comment

My testing of a stretch cluster and the new zone cluster looks good, will approve after the last question about the TODO...

Signed-off-by: Deepika Upadhyay <deepika@koor.tech>
@ideepika (Contributor, author) commented:

@travisn anything that I can help with this PR?

@subhamkrai (Contributor) commented:

@ideepika how about you just rebase with master and push, we have multiple CI fixes so we should have more green CI

@travisn (Member) commented Aug 17, 2023

@Javlopez Ready to approve?

@travisn travisn merged commit afaadd7 into rook:master Aug 17, 2023
45 of 49 checks passed
travisn added a commit to travisn/rook that referenced this pull request Aug 17, 2023
The CRDs were not generated for the latest since rook#12384
was merged without rebasing, and other updates with a newer
dependency required the newer descriptions.

Signed-off-by: travisn <tnielsen@redhat.com>
@travisn (Member) commented Aug 17, 2023

> @ideepika how about you just rebase with master and push, we have multiple CI fixes so we should have more green CI

I missed waiting for this, now opened #12751 for the follow-up PR to fix the CI.

travisn added a commit that referenced this pull request Aug 17, 2023
mon: add mon storage class support (backport #12384)
mergify bot pushed a commit that referenced this pull request Aug 17, 2023
The CRDs were not generated for the latest since #12384
was merged without rebasing, and other updates with a newer
dependency required the newer descriptions.

Signed-off-by: travisn <tnielsen@redhat.com>
(cherry picked from commit 438b617)
malayparida2000 added a commit to malayparida2000/ocs-operator that referenced this pull request Aug 23, 2023
Ref-rook/rook#12384

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
malayparida2000 added a commit to malayparida2000/ocs-operator that referenced this pull request Aug 23, 2023
Ref-rook/rook#12384

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
malayparida2000 added a commit to malayparida2000/ocs-operator that referenced this pull request Aug 23, 2023
Ref-rook/rook#12384

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ocs-operator that referenced this pull request Aug 23, 2023
Ref-rook/rook#12384

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
raaizik pushed a commit to raaizik/ocs-operator that referenced this pull request Sep 21, 2023
Ref-rook/rook#12384

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>