-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cephcluster: tune disks according to plaftorm #955
cephcluster: tune disks according to plaftorm #955
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed before, looks good in general, but some requests inside.
@@ -55,3 +61,18 @@ func avoidObjectStore(p configv1.PlatformType) bool { | |||
} | |||
return false | |||
} | |||
|
|||
func (p *Platform) TunePlatformForFastDevices(c client.Client) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do
func (r *ReconcileStorageCluster) TunePlatformForFastDevices() (bool, error) {
platform := r.platform
client := r.client
...
}
Makes it easier on the caller, Imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that we keep the old way.
It seems like we made it easier on the caller but we actually made it difficult.
While with the old implementation caller only required a *Platform
and a client.Client
, with the suggested change caller would now need to create a concrete *ReconcileStorageCluster
even though it isn't necessary.
It is easy to ignore this as we seem to have a *ReconcileStorageCluster
always available but we shouldn't rely on that.
Also, since this tuning is a behavior that depends on platform rather than the reconciler, it should really be a method on Platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have an internal method on platform.
But I request the convenience wrapper on the reconciler, since it has all we need, and it avoids code duplication (like with avoidObjectStore
) when there might be multiple callers.
This pattern of GetPlatform and then invoke the other method is cumbersome and error-prone imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the end, I won't insist too much if folks are more inline with the original approach. I think both approaches have pros and cons. ;-)
56c5a9d
to
99c3a78
Compare
@crombus: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes look good now, thanks!
There is one change in the test code that seems to not really belong into this commit. (see inline) - it could be a separate patch to test the effects of PR #944.
And shouldn't we have a test case for this new behaviour?
@@ -304,11 +304,12 @@ func TestThrottleStorageDevices(t *testing.T) { | |||
StorageClassName: &fakestorageClassName, | |||
}, | |||
}, | |||
Portable: true, | |||
Portable: true, | |||
DeviceType: "hdd", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a change to the test case that is not related to this code change. 🤔
It explicitly configures the type "hdd" in the device set and correctly changes the expected result to Slow. But isn't that the test for the previous PR #944 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to solve this error Error Trace: storagecluster_controller_test.go:426 Error: Received unexpected error: could not get infrastructure details to determine cloud platform: infrastructures.config.openshift.io "cluster" not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes for this we will have to modify the creation of cluster with platform adding new test cases for this addidtion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crombus - hmmm - wouldn't it be better / more natural to fix this problem by changing the cluster structure in the test so that the platform detection can not fail? It does not seem to be conceptually correct to fix it by avoiding the call in the first place by adding a configuration we know will be checked first...
Can the platform detection fail in the real world? (not mocked-up test code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it can fail. And cration of cluster with platform is the correct fix
@@ -550,6 +550,14 @@ func (r *ReconcileStorageCluster) checkTuneStorageDevices(ds ocsv1.StorageDevice | |||
return dt.speed, nil | |||
} | |||
|
|||
ret, err := r.TunePlatformForFastDevices() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a more simpler implementation would be to check if TuneFastPlatforms
contains r.platform.GetPlatform()
.
That way I wouldn't have to jump to definition of TunePlatformForFastDevices
to understand what is happening and why we are tuning.
Currently it seems like TunePlatformForFastDevices
is doing tuning intensive stuff while in reality it's just checking if TuneFastPlatforms
contains r.platform.GetPlatform()
.
As a follow-up I'd look into re-factoring platform detection and maybe storing it in ReconcileStorageCluster if it makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@obnoxxx ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me I feel both ways are good and maybe if adding comment solve this problem that is also a way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If your code needs a comment to explain what it does, it isn't really a readable/understandable code".
Read this somewhere and I relate.
We could inline the call, rename the method, use better variables etc to make it better. Adding a comment would be the last thing I'd suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a more simpler implementation would be to check if
TuneFastPlatforms
containsr.platform.GetPlatform()
.
Well, that is exactly what the function does!?
That way I wouldn't have to jump to definition of
TunePlatformForFastDevices
to understand what is happening and why we are tuning.
That's more a naming thing than an implementation thing. I explicitly asked @crombus to write such a function to not have the implementation bleed into the caller. I didn't want to repeat the (imho) bad approach of aviodObjectStore()
where each caller first has to go and call GetPlatform
and then insert it into avoidObjectStore
. This is bad encapsulation/abstraction, and error prone. If there is more than one caller, it's also code duplication.
So it boils down to: we only need to name the function / vars better to make it more obvious what is going on.
Currently it seems like
TunePlatformForFastDevices
is doing tuning intensive stuff while in reality it's just checking ifTuneFastPlatforms
containsr.platform.GetPlatform()
.
Right, there needs to be a better name. I like it when in some languages, like ruby, functions can end with a '?' symbol. That makes it more obvious. But here we don't have that option. We could do something like this:
tuneFastDevices, err := r.DevicesDefaultToFastForThisPlatform()
...
or
tuneDevicesFast, err := DoTuneDevicesFastForPlatform()
...
or similar...
As a follow-up I'd look into re-factoring platform detection and maybe storing it in ReconcileStorageCluster if it makes more sense.
Possible. I have started a discussion with @crombus already about possible refactorings of the platform-detection code triggered by this change. Storing data sounds like an optimization. But let's get the APIs good in the first place. And let's not add new stuff that needs immediate refactoring afterwards. And to reiterate my initial point: I think the current approach is the right encapsulation and just needs better naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crombus - can you rename the function and variable along the above lines to make the code more obvious?
@@ -304,11 +304,12 @@ func TestThrottleStorageDevices(t *testing.T) { | |||
StorageClassName: &fakestorageClassName, | |||
}, | |||
}, | |||
Portable: true, | |||
Portable: true, | |||
DeviceType: "hdd", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see cases of different possible values of DeviceType covered. Also, a case for invalid DeviceType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this is not the topic of this PR. That was introduced in a different PR. So we can (and should) enhance that in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not the topic of this PR, why does it add DeviceType: "hdd"
to test case?
Honestly, I've lost track of "Tune disks" PRs and enhancements 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not the topic of this PR, why does it add
DeviceType: "hdd"
to test case?
This is exactly what I told @crombus: Using this to fix the test failure is wrong. It is just avoiding the code path.
Instead we need to make the fake cluster so that it has platform info, so the new code does not error out.
Honestly, I've lost track of "Tune disks" PRs and enhancements sweat_smile
😉
@@ -550,6 +550,14 @@ func (r *ReconcileStorageCluster) checkTuneStorageDevices(ds ocsv1.StorageDevice | |||
return dt.speed, nil | |||
} | |||
|
|||
ret, err := r.TunePlatformForFastDevices() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crombus - can you rename the function and variable along the above lines to make the code more obvious?
@crombus Hey, can we revisit this? |
99c3a78
to
b2ad7a5
Compare
45dbe6f
to
425844e
Compare
All requests have been addressed except from changing the test-case accordingly . I get this error for changes in tc
|
set tunefastdeviceclass according to plaftorm which are not correctly detected by ceph. modify tc according to the changes by adding a default platform (of None) to the reconciler. Signed-off-by: crombus <pkundra@redhat.com>
This is tested with the example of the Azure platform. Note: This can probably be done more elegantly, as currently, the test adds the configured platform to the reconciler after its creation and not as part of the creator function. But it should be good enough for a start. Signed-off-by: Michael Adam <obnox@redhat.com>
425844e
to
4091a66
Compare
Updated the tc fix (by adding a default platform entry to the reconciler) and added a specific tc for the new code path after discussing with @crombus . let's see what the test runs bring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: obnoxxx The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@crombus: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/cherry-pick release-4.7 |
@crombus: new pull request created: #994 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
set tunefastdeviceclass according to plaftorm
which are not correctly detected by ceph.
modify tc according to the changes.
Signed-off-by: crombus pkundra@redhat.com