-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ceph: Allow using lvm batch for ceph 14.2.15 #6831
Conversation
fc44b3c
to
dcaf3c7
Compare
from test results, seems like cephcluster is created successfully, but creating snapshot failed.
[2020-12-16T00:58:48.334Z] --- FAIL: TestCephHelmSuite (722.71s)
[2020-12-16T00:58:48.334Z] --- PASS: TestCephHelmSuite/TestARookInstallViaHelm (0.10s)
[2020-12-16T00:58:48.334Z] --- FAIL: TestCephHelmSuite/TestBlockStoreOnRookInstalledViaHelm (260.86s)
[2020-12-16T00:58:48.334Z] ceph_base_block_test.go:194:
[2020-12-16T00:58:48.334Z] Error Trace: ceph_base_block_test.go:194
[2020-12-16T00:58:48.334Z] ceph_base_block_test.go:433
[2020-12-16T00:58:48.334Z] ceph_helm_test.go:99
[2020-12-16T00:58:48.334Z] Error: Received unexpected error:
[2020-12-16T00:58:48.334Z] giving up waiting for "rbd-pvc-snapshot" snapshot in namespace "default"
[2020-12-16T00:58:48.334Z] Test: TestCephHelmSuite/TestBlockStoreOnRookInstalledViaHelm
[2020-12-16T00:58:48.334Z] ceph_base_block_test.go:473: |
e78fc2c
to
aa1343c
Compare
The snapshot tests are already failing in master, I opened #6837 for this, so it should be unrelated to your changes. |
pkg/daemon/ceph/osd/volume.go
Outdated
|
||
logger.Debugf("ceph-volume report: %+v", cvOut) | ||
if !cephVersion.IsNautilus() || cephver.IsInferior(cephVersion, cephver.CephVersion{Major: 14, Minor: 2, Extra: 13}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment to this line explaining why it's needed for when we come back to it in 6-12 months?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, we want to run the report for the following?
- Octopus or newer
- Nautilus if less than 14.2.13
Is the report being fixed in Nautilus? Seems like we should expect a fix in 14.2.16 instead of blocking the report in all future Nautilus releases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking for the version and skipping it if a broken version, what if we always run the report, but just log and ignore any errors that it produces? Seems like a good pattern so we can always attempt to create the OSDs even if the report fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible, but maybe a new type should be introduced to reflect the new return structure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What new type are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before 14.2.13, it looks like
{"changed": true, "vg": {"devices": "xxx"}}
For 14.2.15, it looks like
2020-12-14 02:15:28.771476 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 --crush-device-class sjt /dev/vdc --db-devices /dev/vdd --report --format json
2020-12-14 02:15:37.640274 D | cephosd: ceph-volume report: --> passed data devices: 1 physical, 0 LVM
--> relative data size: 1.0
--> passed block_db devices: 1 physical, 0 LVM
[{"block_db": "/dev/vdd", "encryption": "None", "data": "/dev/vdc", "data_size": "100.00 GB", "block_db_size": "100.00 GB"}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Octopus v15.2.8 was just released and the release notes indicated it also received an update to ceph-volume batch mode, so I assume we also need the same change on v15.2.8 and newer.
tests/framework/utils/snapshot.go
Outdated
@@ -39,6 +39,12 @@ const ( | |||
// CheckSnapshotISReadyToUse checks snapshot is ready to use | |||
func (k8sh *K8sHelper) CheckSnapshotISReadyToUse(name, namespace string, retries int) (bool, error) { | |||
for i := 0; i < retries; i++ { | |||
|
|||
output, err := k8sh.executor.ExecuteCommandWithOutput("kubectl", "get", "volumesnapshot", name, "--namespace", namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since #6840 is merged, you can revert this as the snapshot tests are fixed now
pkg/daemon/ceph/osd/volume.go
Outdated
|
||
logger.Debugf("ceph-volume report: %+v", cvOut) | ||
if !cephVersion.IsNautilus() || cephver.IsInferior(cephVersion, cephver.CephVersion{Major: 14, Minor: 2, Extra: 13}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, we want to run the report for the following?
- Octopus or newer
- Nautilus if less than 14.2.13
Is the report being fixed in Nautilus? Seems like we should expect a fix in 14.2.16 instead of blocking the report in all future Nautilus releases
pkg/daemon/ceph/osd/volume.go
Outdated
|
||
logger.Debugf("ceph-volume report: %+v", cvOut) | ||
if !cephVersion.IsNautilus() || cephver.IsInferior(cephVersion, cephver.CephVersion{Major: 14, Minor: 2, Extra: 13}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking for the version and skipping it if a broken version, what if we always run the report, but just log and ignore any errors that it produces? Seems like a good pattern so we can always attempt to create the OSDs even if the report fails.
e0f5ff1
to
1ea2c77
Compare
Looks like ceph-volume will reject creating osd on partition ... |
a76d2b1
to
6969f42
Compare
Right, looks like there is an issue with partitions, I opened #6849. The latest changes are looking good. I am about to merge #6847, which will take care of the issue with tests failing on partitions. After that, you can rebase and won't need my commit you cherry-picked. |
fe74444
to
6506cea
Compare
This pull request has merge conflicts that must be resolved before it can be merged. @jshen28 please rebase it. https://rook.io/docs/rook/master/development-flow.html#updating-your-fork |
return true | ||
} | ||
|
||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Pacific or newer we should return true, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems 16.0.0 still uses legacy format ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s unexpected, the changes should be in master first, oh well we can revisit separately.
14.2.15 lvm batch command prepare report changes output format. This commit skips md check if ceph version is greater than 14.2.13. Signed-off-by: shenjiatong <yshxxsjt715@gmail.com>
Signed-off-by: shenjiatong <yshxxsjt715@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small suggestions...
pkg/daemon/ceph/osd/volume.go
Outdated
logger.Debugf("ceph-volume report: %+v", cvOut) | ||
// ceph version v14.2.13 ~ v14.2.16 changes output of `lvm batch --prepare --report` | ||
// use previous logic if ceph version does not fall into this range | ||
if !isNewStyledLvmBatch(cephVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor suggestion instead of declaring a separate variable.
if !isNewStyledLvmBatch(cephVersion) { | |
if !isNewStyledLvmBatch(a.clusterInfo.CephVersion) { |
pkg/daemon/ceph/osd/volume.go
Outdated
|
||
logger.Debugf("ceph-volume report: %+v", cvOut) | ||
// ceph version v14.2.13 ~ v14.2.16 changes output of `lvm batch --prepare --report` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ceph version v14.2.13 ~ v14.2.16 changes output of `lvm batch --prepare --report` | |
// ceph version v14.2.13 and v15.2.8 changed the output format of `lvm batch --prepare --report` |
pkg/daemon/ceph/osd/volume.go
Outdated
if err = json.Unmarshal([]byte(cvOut), &cvReport); err != nil { | ||
return errors.Wrap(err, "failed to unmarshal ceph-volume report json") | ||
} | ||
cvOut, err := context.Executor.ExecuteCommandWithCombinedOutput(baseCommand, reportArgs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command is the same for both types of reports, the output is just different, right? Can you factor out this command before the version check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly it is not the same... new format will print something into stderr which is problematic if using ExecuteCommandWithCombinedOutput
which seems collecting stdout + stderr....
get_deployment_layout will call debug which prints some extra stuff to stderr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually it might be ok to change both of them to ExecuteCommandWithOutput
....
pkg/daemon/ceph/osd/volume.go
Outdated
|
||
if len(strings.Split(conf["devices"], " ")) != len(cvReports) { | ||
return fmt.Errorf("failed to create enough required devices, required: %s, actual: %v", cvOut, cvReports) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you don't need this else
. For example:
if len(strings.Split(conf["devices"], " ")) != len(cvReports) {
return ...
}
for _, report := range cvReports {
...
}
return errors.Wrap(err, "failed to unmarshal ceph-volume report json") | ||
} | ||
|
||
if len(strings.Split(conf["devices"], " ")) != len(cvReports) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to check for the length of reports? Just wondering what issues it will really find in the device configuration, or if it might just be a ceph-volume bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not, I am just wondering if if devices are '/dev/sda1', '/dev/sdb', will ceph volume ignores /dev/sda1, and uses only /dev/sdb?
aslo, will it be possible that cvReports could be an empty list if ceph-volume reject all inputs?
56b23bf
to
47d354c
Compare
Signed-off-by: shenjiatong <yshxxsjt715@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
ceph: Allow using lvm batch for ceph 14.2.15 (bp #6831)
14.2.15 lvm batch command prepare report changes
output format. This commit skips md check if ceph version
is greater than 14.2.13.
Signed-off-by: shenjiatong yshxxsjt715@gmail.com
Description of your changes:
In 14.2.15,
lvm batch --prepare --report
breaks report format which causescreating osd failed.
Which issue is resolved by this Pull Request:
Resolves #
Checklist:
make codegen
) has been run to update object specifications, if necessary.[test full]