cephcluster: tune devices according to the deviceType #944

Merged
merged 2 commits into red-hat-storage:master from tuning_device on Dec 8, 2020

Conversation

@crombus crombus commented Dec 7, 2020

Signed-off-by: crombus <pkundra@redhat.com>

@crombus crombus added this to the ocs 4.7 feature freeze milestone Dec 7, 2020
@crombus crombus added and then removed the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) on Dec 7, 2020
@obnoxxx obnoxxx left a comment

Can you please add an explanation to the commit message and the PR description of what problem this PR is trying to solve?

@obnoxxx obnoxxx left a comment

If I understand it correctly, the deviceType can be set in the storageDeviceSet in the storage cluster yaml input. If it is set, it can override the otherwise potentially wrongly detected device type, i.e. it has to be set explicitly by the admin or whoever is crafting the input (GUI, ...). This PR takes this specified device type and propagates the corresponding OSD tuning settings to rook, in addition to (and overriding) the settings of PR #864, which enabled tuning settings based on known storage class types for backend disks.

To be honest, the code is getting more and more confusing. I suggest rebasing these changes on top of the refactoring of the tuning code in #946, which will make the resulting code more obvious and this change even shorter.

I further suggest adding this new code to the throttleStorageDevices() function (called checkTuneStorageDevices after the refactor), instead of directly to the ensureCephCluster() function.
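
For illustration only (this is not the actual ocs-operator code): a minimal Go sketch of the priority rule described above, using hypothetical names (diskSpeed, speedFromDeviceType, resolveTuning). An explicitly configured deviceType wins over whatever was detected from the backing storage class (the #864 behaviour), and an empty deviceType falls back to the detected value.

    package main

    import (
        "fmt"
        "strings"
    )

    // diskSpeed stands in for the tuning class that would be propagated to rook.
    type diskSpeed string

    const (
        diskSpeedUnknown diskSpeed = "unknown"
        diskSpeedSlow    diskSpeed = "slow" // e.g. HDD: throttled OSD tuning
        diskSpeedFast    diskSpeed = "fast" // e.g. SSD/NVMe: fast-device tuning
    )

    // speedFromDeviceType maps an admin-supplied deviceType from the
    // storageDeviceSet to a tuning class; an empty or unrecognized value
    // means "not set".
    func speedFromDeviceType(deviceType string) diskSpeed {
        switch strings.ToLower(deviceType) {
        case "hdd":
            return diskSpeedSlow
        case "ssd", "nvme":
            return diskSpeedFast
        default:
            return diskSpeedUnknown
        }
    }

    // resolveTuning gives the explicit deviceType priority over the value
    // detected from the backing StorageClass.
    func resolveTuning(deviceType string, detected diskSpeed) diskSpeed {
        if s := speedFromDeviceType(deviceType); s != diskSpeedUnknown {
            return s
        }
        return detected
    }

    func main() {
        // deviceType "hdd" set by the admin, although the StorageClass looked fast.
        fmt.Println(resolveTuning("hdd", diskSpeedFast)) // slow
        // No deviceType set: fall back to the detected value.
        fmt.Println(resolveTuning("", diskSpeedFast)) // fast
    }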

@crombus crombus commented Dec 8, 2020

> I suggest rebasing these changes on top of the refactoring of the tuning code in #946 ... I further suggest adding this new code to the throttleStorageDevices() function (called checkTuneStorageDevices after the refactor), instead of directly to the ensureCephCluster() function.

I've made the changes on top of the refactored code. I'd appreciate suggestions to make it a little better.

@obnoxxx obnoxxx left a comment

Thanks for rebasing!

See the inline requests and suggestions to make this even better and more minimal.

Thanks!

(5 resolved inline review threads on pkg/controller/storagecluster/cephcluster.go, now outdated)
@obnoxxx obnoxxx left a comment

Oh, and could you update the commit message a bit to explain that this is not about auto-detected HDD types, but about HDD types that are explicitly configured by the admin (or GUI?) in the deviceset section of the storagecluster yaml.

@crombus crombus force-pushed the tuning_device branch 2 times, most recently from b2b2ad1 to f7b9fa4 on December 8, 2020 at 14:05
@crombus crombus requested a review from obnoxxx December 8, 2020 14:05
(2 resolved inline review threads on pkg/controller/storagecluster/cephcluster.go, now outdated)
@crombus crombus commented Dec 8, 2020

/test ci/prow/ocs-operator-ci

@openshift-ci-robot

@crombus: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test ci-index
  • /test images
  • /test ocs-operator-bundle-e2e-aws
  • /test ocs-operator-ci
  • /test red-hat-storage-ocs-ci-e2e-aws
  • /test verify-latest-csv

Use /test all to run all jobs.

In response to this:

/test ci/prow/ocs-operator-ci

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tune devices according to the deviceType added by the admin in storagecluster.yaml and modify the testcase

Signed-off-by: crombus <pkundra@redhat.com>

check that deviceType is considered as priority

Signed-off-by: crombus <pkundra@redhat.com>
@obnoxxx obnoxxx left a comment

/lgtm

Thanks for the updates!

@openshift-ci-robot openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) on Dec 8, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamniting, obnoxxx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Dec 8, 2020
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@obnoxxx obnoxxx commented Dec 8, 2020

Now the bot has retriggered the CI runs, but before that, ocs-operator-ci was failing in a strange way (already for the second time on this PR):

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ocs-operator/944/pull-ci-openshift-ocs-operator-master-ocs-operator-ci/1336353842756849664

First, it is failing in TestEnsureExternalStorageClusterResources like this:

     external_resources_test.go:76: 
        	Error Trace:	external_resources_test.go:76
        	Error:      	Received unexpected error:
        	            	dial tcp 127.0.0.1:13095: connect: connection refused
        	Test:       	TestEnsureExternalStorageClusterResources
    external_resources_test.go:194: 
        	Error Trace:	external_resources_test.go:194
        	            				external_resources_test.go:78
        	Error:      	Received unexpected error:
        	            	storageclasses.storage.k8s.io "ocsinit-ceph-rbd" not found
        	Test:       	TestEnsureExternalStorageClusterResources
    external_resources_test.go:198: 
        	Error Trace:	external_resources_test.go:198
        	            				external_resources_test.go:78
        	Error:      	Not equal: 
        	            	expected: "device_health_metrics"
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-device_health_metrics
        	            	+
        	Test:       	TestEnsureExternalStorageClusterResources
    external_resources_test.go:194: 
        	Error Trace:	external_resources_test.go:194
        	            				external_resources_test.go:78
        	Error:      	Received unexpected error:
        	            	storageclasses.storage.k8s.io "ocsinit-cephfs" not found
        	Test:       	TestEnsureExternalStorageClusterResources
    external_resources_test.go:198: 
        	Error Trace:	external_resources_test.go:198
        	            				external_resources_test.go:78
        	Error:      	Not equal: 
        	            	expected: "myfs"
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-myfs
        	            	+
        	Test:       	TestEnsureExternalStorageClusterResources
    external_resources_test.go:198: 
        	Error Trace:	external_resources_test.go:198
        	            				external_resources_test.go:78
        	Error:      	Not equal: 
        	            	expected: "myfs-data0"
        	            	actual  : ""
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-myfs-data0
        	            	+
        	Test:       	TestEnsureExternalStorageClusterResources
    external_resources_test.go:194: 
        	Error Trace:	external_resources_test.go:194
        	            				external_resources_test.go:78
        	Error:      	Received unexpected error:
        	            	storageclasses.storage.k8s.io "ocsinit-ceph-rgw" not found
        	Test:       	TestEnsureExternalStorageClusterResources 

but before that, there is a strange PASS:

 === RUN   TestCollectObjectStoreHealth
E1208 16:59:46.159345   11090 ceph-object-store.go:123] CephObjectStore in unexpected phase. Must be "Connected", "Progressing" or "Failure"
--- PASS: TestCollectObjectStoreHealth (0.00s)

Is this expected? 🤔

@obnoxxx obnoxxx commented Dec 8, 2020

Now ocs-operator-ci has passed - strange flake...

@openshift-merge-robot

@crombus: The following test failed, say /retest to rerun all failed tests:

Test name                                Commit    Details    Rerun command
ci/prow/red-hat-storage-ocs-ci-e2e-aws  ada7e14   link       /test red-hat-storage-ocs-ci-e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a602d12 into red-hat-storage:master Dec 8, 2020
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.

6 participants