block creation of StorageClass until prerequisites are met #1224
Conversation
Force-pushed from 3222fa3 to 9a791e7.
Unless I'm missing something, this doesn't seem to take into account that either of CephBlockPool or CephFilesystem could be disabled. Would it make more sense to extend StorageClusterConfiguration to include a "required kind" or something so that we can just do the check on a per-SC basis here? https://github.com/openshift/ocs-operator/blob/master/controllers/storagecluster/storageclasses.go#L88-L93
Also, CI failures look valid. Check make ocs-operator-ci output.
Don't we always create both the StorageClasses? I don't think a "required kind" is necessary since we know what is required.
Yes, they are valid, and I give up on fixing it. It requires refactoring the entire unit test framework (which is actually running integration tests internally).
No. If there's no CephBlockPool created it makes no sense to create a StorageClass pointing to something that doesn't exist.
It would be a cleaner implementation. Rather than having an
The e2e failures look like CI infra issues, so we just need to retest. That can wait until LGTM.
Force-pushed from da259e7 to 6fb50fc.
Updated to check if a particular StorageClass is enabled before trying to wait. But it seems like StorageClass creation doesn't care about the setting anyway, and that's a different PR.

Fixed the annoying uppercase/lowercase issue.
/retest
Force-pushed from 6fb50fc to d8b4e05.
> Unless I'm missing something, this doesn't seem to take into account that either of CephBlockPool or CephFilesystem could be disabled. Would it make more sense to extend StorageClusterConfiguration to include a "required kind" or something so that we can just do the check on a per-SC basis here? https://github.com/openshift/ocs-operator/blob/master/controllers/storagecluster/storageclasses.go#L88-L93

Updated to check if a particular StorageClass is enabled before trying to wait. But it seems like StorageClass creation doesn't care about the setting anyway, and that's a different PR.
Oh, it absolutely does! It's embedded into the StorageClassConfiguration: https://github.com/openshift/ocs-operator/blob/master/controllers/storagecluster/storageclasses.go#L81-L83 That said, I understand it may be difficult (and a bad idea) to try to generalize the logic of checking for valid resources. As such, I recommend just moving the existing logic you have to either the `new*StorageClassConfiguration()` functions, `newStorageClassConfigurations()`, or maybe even `createStorageClasses()` itself. Don't block the creation of all StorageClasses if one fails, but do return an err so we block further reconciliation of dependent resources.
> Also, CI failures look valid. Check `make ocs-operator-ci` output.

Fixed the annoying uppercase/lowercase issue.
You know I'm like this. 😛 Apologies nonetheless. 😂
/lgtm
Gonna run this one more time just to be sure.... /test ocs-operator-bundle-e2e-aws

/test ocs-operator-bundle-e2e-aws
```go
if cephFilesystem.Status == nil {
	return fmt.Errorf("cephFilesystem %q is not reporting status", key)
}
r.Log.Info("CephFilesystem %q is in phase %q", key, cephFilesystem.Status.Phase)
```
Specifically:

```
"msg":"non-string key argument passed to logging, ignoring all later arguments",
"invalid key":{"namespace":"openshift-storage","name":"test-storagecluster-cephfilesystem"}
```
You go with my blessing. 🙏 🙏 🙏
/hold

/lgtm cancel
Force-pushed from d8b4e05 to 5e41656.
...test failed on RPC timeout. BECAUSE OF COURSE IT DID. 😠 Anyway, code looks solid, TYVM. 🙇 /lgtm

/test ocs-operator-bundle-e2e-aws

/retest-required
```go
if err != nil || cephFilesystem.Status == nil || cephFilesystem.Status.Phase != cephv1.ConditionType(util.PhaseReady) {
	r.Log.Info("Waiting for CephFilesystem to be Ready. Skip reconciling StorageClass",
		"CephFilesystem", klog.KRef(key.Name, key.Namespace),
		"StorageClass", klog.KRef("", sc.Name),
	)
```
StorageCluster Conditions show the following:

```
Last Heartbeat Time:   2021-08-27T18:33:40Z
Last Transition Time:  2021-08-27T18:12:08Z
Message:               Error while reconciling: some StorageClasses [test-storagecluster-cephfs] were skipped while waiting for pre-requisites to be met
Reason:                ReconcileFailed
Status:                False
Type:                  ReconcileComplete
```

ocs-operator logs show the following:

```
2021-08-27T18:33:40.740590363Z {"level":"info","ts":1630089220.7405756,"logger":"controllers.StorageCluster","msg":"Waiting for CephFilesystem to be Ready. Skip reconciling StorageClass","Request.Namespace":"openshift-storage","Request.Name":"test-storagecluster","CephFilesystem":"test-storagecluster-cephfilesystem/openshift-storage","StorageClass":"test-storagecluster-cephfs"}
2021-08-27T18:33:40.756182902Z {"level":"error","ts":1630089220.7561336,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"test-storagecluster","namespace":"openshift-storage","error":"some StorageClasses [test-storagecluster-cephfs] were skipped while waiting for pre-requisites to be met","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/ocs-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/openshift/ocs-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}
```
Looking at the if conditions you have, I checked the CephFilesystem and indeed it has a nil Status: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ocs-operator/1224/pull-ci-openshift-ocs-operator-master-ocs-operator-bundle-e2e-aws/1431303805407858688/artifacts/ocs-operator-bundle-e2e-aws/ocs-must-gather/artifacts/ocs-must-gather/registry-build02-ci-openshift-org-ci-op-s10lq9fp-stable-sha256-2cabcf61a08025e75eba8b82626357ddb48a44bf609f7d1483242ad5d0b1f5f0/ceph/namespaces/openshift-storage/ceph.rook.io/cephfilesystems/test-storagecluster-cephfilesystem.yaml
The rook-ceph-operator logs don't show anything obvious to me, but maybe I'm missing something: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ocs-operator/1224/pull-ci-openshift-ocs-operator-master-ocs-operator-bundle-e2e-aws/1431303805407858688/artifacts/ocs-operator-bundle-e2e-aws/ocs-must-gather/artifacts/ocs-must-gather/registry-build02-ci-openshift-org-ci-op-s10lq9fp-stable-sha256-2cabcf61a08025e75eba8b82626357ddb48a44bf609f7d1483242ad5d0b1f5f0/namespaces/openshift-storage/pods/rook-ceph-operator-8c8468d9d-mwwvh/rook-ceph-operator/rook-ceph-operator/logs/current.log
So... offhand, I'm stuck. @travisn any ideas?
In a test cluster I don't see any status ever updated on the cephfilesystem either. You're just now relying on it for the first time, or perhaps there was a regression? Still looking in the code...
Ok, there was a bug such that the filesystem status was not being updated if mirroring was not enabled on the filesystem. See rook/rook#8609
/retest-required

Please review the full test history for this PR and help us cut down flakes.
Signed-off-by: Umanga Chapagain <chapagainumanga@gmail.com>
Signed-off-by: Umanga Chapagain <chapagainumanga@gmail.com>
rearranging resourceManager instances to preserve the order in which they need to be processed. This helps provide a visual cue about the sequence of tasks. It also reduces the chance of deadlock due to tasks waiting for each other to complete. Signed-off-by: Umanga Chapagain <chapagainumanga@gmail.com>
CephBlockPool and CephFilesystem updates were incorrectly handled. This commit fixes it by updating only the spec and not resetting the Status. This also helps in unit testing. Signed-off-by: Umanga Chapagain <chapagainumanga@gmail.com>
StorageClass should only be created after the CephBlockPool and CephFilesystem are Ready. So, added a prereq check to skip reconciling if prereq is not met. Also, added a field to flag external storage clusters. This is required to skip prereq checks on external mode clusters. Signed-off-by: Umanga Chapagain <chapagainumanga@gmail.com>
Force-pushed from 5e41656 to a5d5f7c.
Signed-off-by: Umanga Chapagain <chapagainumanga@gmail.com>
ONE. MORE. TIME.
/lgtm
/override ci/prow/ci-index-dev-master-dependencies
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jarrpa.
@jarrpa: Overrode contexts on behalf of jarrpa: ci/prow/ci-index-dev-master-dependencies
Issue: Due to PR red-hat-storage/ocs-operator#1224, creation of the ocs-storagecluster-ceph-rbd storage class is taking more time. Signed-off-by: vavuthu <vavuthu@redhat.com>

CephBlockPool waits for CephCluster to be ready.
Similarly, StorageClass needs to wait for CephBlockPool to be ready.
Same for CephFilesystem.
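That wait chain boils down to a readiness predicate on the prerequisite's reported status. A minimal sketch, with types simplified; `phaseReady` mirrors the operator's `util.PhaseReady` (assumed here to be the string "Ready"), and a nil status models a resource Rook has not reported on yet:

```go
package main

import "fmt"

// phaseReady is an assumed stand-in for util.PhaseReady in ocs-operator.
const phaseReady = "Ready"

// resourceStatus is a simplified stand-in for the Status block of a
// CephBlockPool or CephFilesystem; nil means no status has been reported.
type resourceStatus struct {
	Phase string
}

// isPrereqReady implements the wait described above: a prerequisite is met
// only when the resource exists, reports a status, and that status is Ready.
// This is why a CephFilesystem with a nil Status blocks the CephFS
// StorageClass in the CI failure discussed in this thread.
func isPrereqReady(status *resourceStatus) bool {
	return status != nil && status.Phase == phaseReady
}

func main() {
	fmt.Println(isPrereqReady(nil))                                // no status reported yet
	fmt.Println(isPrereqReady(&resourceStatus{Phase: "Progressing"})) // status present but not Ready
	fmt.Println(isPrereqReady(&resourceStatus{Phase: phaseReady}))    // prerequisite met
}
```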