disk zapping script for OSD used by Rook fails at removing the device mappers #9764

Closed
frouzbeh opened this issue Feb 17, 2022 · 4 comments · Fixed by #9765

@frouzbeh

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
The disk zapping script sometimes returns an error when removing the device-mapper devices:

Creating new GPT entries in memory.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.397611 s, 264 MB/s
blkdiscard: /dev/vdb: BLKDISCARD ioctl failed: Operation not supported
device-mapper: remove ioctl on ceph--27387f93--51fa--411b--b01c--b4f91e2962bd-osd--block--67ecef68--8e98--4d9b--96d9--b1d9659e2188  failed: Device or resource busy
Command failed.
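
A note on the long mapping name in the error above: LVM names a mapping `<vg>-<lv>` and doubles every hyphen inside either name. The sketch below splits such a name back into its volume group and logical volume. The function name `split_dm_name` is hypothetical, and names that begin or end with a hyphen are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a device-mapper name of the form <vg>-<lv>
# into the LVM volume group and logical volume names.
split_dm_name() {
  local name="$1"
  local sep=$'\x01'                       # placeholder that cannot occur in LVM names
  local protected="${name//--/$sep}"      # hide the doubled (escaped) hyphens
  local vg="${protected%%-*}"             # text before the lone separator hyphen
  local lv="${protected#*-}"              # text after it
  printf '%s %s\n' "${vg//$sep/-}" "${lv//$sep/-}"
}

split_dm_name "ceph--27387f93--51fa--411b--b01c--b4f91e2962bd-osd--block--67ecef68--8e98--4d9b--96d9--b1d9659e2188"
# prints: ceph-27387f93-51fa-411b-b01c-b4f91e2962bd osd-block-67ecef68-8e98-4d9b-96d9-b1d9659e2188
```

The two parts correspond to the `ceph-<UUID>` volume group and `osd-block-<UUID>` logical volume that ceph-volume creates.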

For now I have added set -e to the script so that it stops when dmsetup remove fails. Without it, the rest of the script goes on to delete the device-mapper entries, and I can no longer unlock the device. A better approach might be to use lsof to find the processes that hold the device open, kill them to free the device, and then retry the removal.

#!/usr/bin/env bash
DISK="/dev/sda"

# Zap the disk to a fresh, usable state (zap-all is important because the MBR has to be clean).
# You will have to run this step for all disks.
sgdisk --zap-all "$DISK"

# Clean HDDs with dd
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# Clean disks such as SSDs with blkdiscard instead of dd
blkdiscard "$DISK"

# These steps only have to be run once on each node.
# If Rook sets up OSDs using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %

# ceph-volume setup can leave ceph-<UUID> directories in /dev and /dev/mapper (unnecessary clutter)
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*

# Inform the OS of partition table changes
partprobe "$DISK"
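
One way the removal step could be made more tolerant of a briefly busy device is to retry it a few times and report failure instead of carrying on. This is only a sketch under stated assumptions: the function names `remove_mapping` and `remove_all_ceph_mappings`, the default of 5 attempts, and the 2-second delay are all hypothetical, and retrying does nothing for a device that is held open permanently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_mapping NAME [ATTEMPTS] [DELAY_SECONDS]
# Retries `dmsetup remove` and returns non-zero if every attempt fails.
remove_mapping() {
  local name="$1" attempts="${2:-5}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    if dmsetup remove "$name"; then
      return 0
    fi
    echo "attempt $i/$attempts: could not remove $name" >&2
    sleep "$delay"
  done
  return 1
}

# Hypothetical helper: try every ceph mapping, remember whether any failed.
remove_all_ceph_mappings() {
  local dev failed=0
  for dev in /dev/mapper/ceph-*; do
    [ -e "$dev" ] || continue             # the glob matched nothing
    remove_mapping "$(basename "$dev")" || failed=1
  done
  return "$failed"
}
```

Calling `remove_all_ceph_mappings || exit 1` in place of the `ls ... | xargs` line would stop the script before the `rm -rf` steps whenever a mapping is still present, which is the same effect the `set -e` workaround aims for.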

Expected behavior:

How to reproduce it (minimal and precise):

It does not happen every time, so it would be good for the script to be prepared for when it does.

Environment:

  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a): Linux dev-ops-server-1 5.4.0-84-generic #94-Ubuntu SMP Thu Aug 26 20:27:37 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration: Bare-metal
  • Storage backend version (e.g. for ceph do ceph -v): ceph:v16.2.5
@frouzbeh frouzbeh added the bug label Feb 17, 2022
@leseb
Member

leseb commented Feb 17, 2022

What's the request exactly?

@travisn
Member

travisn commented Feb 17, 2022

@frouzbeh Is the suggestion to add set -e to the script so that errors can be addressed instead of skipped?

@frouzbeh
Author

@travisn I guess set -e is an option, but a better solution would be to find out what is keeping the device busy and free it, for example with lsof or a similar tool.
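
As a rough illustration of that idea, the sketch below only reports what might be holding a mapping and does not kill anything. The function name `show_dm_holders` is hypothetical. It assumes `dmsetup`, `lsof`, and GNU `readlink` are installed, and `lsof` will not show kernel-side users such as another device stacked on top, which is why the sysfs `holders` directory is listed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show_dm_holders NAME
# Prints the open count, stacked devices, and processes for /dev/mapper/NAME.
show_dm_holders() {
  local name="$1" node kname
  node="$(readlink -f "/dev/mapper/$name")"     # resolves to e.g. /dev/dm-3
  kname="$(basename "$node")"

  # Number of openers the kernel currently counts for this mapping.
  echo "open count: $(dmsetup info -c --noheadings -o open "$name")"

  echo "devices stacked on top (sysfs holders):"
  ls "/sys/block/$kname/holders" 2>/dev/null || echo "  (none found)"

  echo "processes with the device open:"
  lsof "$node" 2>/dev/null || echo "  (none reported by lsof)"
}
```

Running `show_dm_holders <mapping-name>` as root before `dmsetup remove` gives a starting point for deciding what, if anything, is safe to stop.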

@BlaineEXE
Member

I have encountered the same issue where dmsetup remove hangs, but that isn't something Rook can fix. Ultimately, this script is meant to be helpful to as many users as possible, but we can't ensure its correctness across all possible operating systems. I do, however, have something else to update in this doc, so I will try to make that clearer in an upcoming update.

BlaineEXE added a commit to BlaineEXE/rook that referenced this issue Feb 17, 2022
Users have reported some problems understanding the teardown
"Zapping Devices" section. Break the section in to per-disk and
all-disks-on-host parts, and clarify that they are not one-size-fits-all
scripts.

Resolves rook#9542
Resolves rook#9764

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
mergify bot pushed a commit that referenced this issue Feb 21, 2022
Users have reported some problems understanding the teardown
"Zapping Devices" section. Break the section in to per-disk and
all-disks-on-host parts, and clarify that they are not one-size-fits-all
scripts.

Resolves #9542
Resolves #9764

Signed-off-by: Blaine Gardner <blaine.gardner@redhat.com>
(cherry picked from commit 325eda6)
@BlaineEXE BlaineEXE added this to To do in v1.8 via automation Feb 21, 2022
@BlaineEXE BlaineEXE moved this from To do to Done in v1.8 Feb 21, 2022