New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
s390x (IBM Z) disk formatting fixes #2943
Conversation
- Simplify syntax of dasd_channel directive. The previous syntax had some useless fields, like the major:minor number and status. - Fix obtaining the entry from lsdasd output: "grep $blockd" did not work properly, because if $blockd == dasda and there also disks named dasdaa and dasdab (this will be the case with more than 26 DASDs), the command would print all three lines instead of just the correct one and break the resulting format. - Eliminate the "dasdfmt" directive, merge its fields into the "disk" directive. This has the advantage that the exclusion code eliminates the "disk" lines for unused disks automatically. Previously, the "dasdfmt" lines for unused disks stayed in the layout file, which later caused the unused disks to be reformatted unconditionally (including disk with the backup itself on it if one used a local storage for backup). Unfortunately it means that the format of the "disk" directive now depends on the disk label type: dasd disks have extra fields that other disk types don't have. - Add information about number of cylinders to the disk directive for DASDs, it is more stable than the size in bytes (which depends on the format). Currently unused.
- Adapt to the new format of disklayout.conf entries (dasd_channel and disk directives) - Move the DASD formatting code after the mapping code. Otherwise when device names change, the code would format the old names, not the new names. - Generate the DASD formatting code as a separate script and let the user confirm it before it gets executed. Heavily inspired by the disk wiping code (it is similarly destructive), but executed during the "layout/prepare" stage and not during the "layout/recreate" stage. The reason is that unformatted DASDs do not have their size in bytes known (it depends on format), but other scripts in the "layout/prepare" stage need to know the disk sizes (e.g. for resizing partitions). - The disk comparison and mapping script will now be less accurate, because it won't know the current sizes if disks are not formatted. To compensate, introduce a variable called DISK_MAPPING_HINTS that allows the DASD activation code to pass the information on changed disk names to the rest of the layout code (the identification is based on virtual device numbers).
Reformatting is a dangerous operation, equivalent to wiping disks (which we also let the user always confirm).
520_exclude_components.sh excludes them as well, we run before it, so we have to reimplement it here.
When disk mapping hints (DISK_MAPPING_HINTS) indicate we know the correct mapping for a disk, use them in preference to device name and size comparison. They are supposed to be more reliable and also they are set by the s390-specific code for disks (DASDs) that are not yet formatted and thus their size is not yet known, so guessing according to disk size does not work properly.
Allow to choose to skip DASD formatting from the DASD format script confrmation dialog. Can be also set by setting the FORMAT_DASDS variable to a false value.
The default value is "" and it means to format the DASDs by default, in an analogy with DISKS_TO_BE_WIPED.
@mutable-dan I believe you are the original author of the S/390 code - can you please have a look? |
@@ -57,7 +58,8 @@ sync | |||
|
|||
EOF | |||
|
|||
create_partitions "$disk" "$label" | |||
# $junk can contain useful DASD-specific fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a variable is named 'junk'
but (sometimes) it contains useful data,
then the varaiable should not be named 'junk'
(even if in most cases that data is useless).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I am going to rename it to "extrafields" or something similar where applicable.
Log "Excluding $disk from DASD reformatting." | ||
continue | ||
fi | ||
# dasd has more fields - junk is not junk anymore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above:
When a variable is named 'junk'
but (sometimes) it contains useful data,
then the varaiable should not be named 'junk'
(even if in most cases that data is useless).
WOW! @pcahyna
Will it be less accurate only in case of DASD disks My fear is that a possibly disastrous automated mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve it basically blindly
(fortunately I don't have a IBM Z machine in my homeoffice)
and I trust @pcahyna to have things properly tested
Currently (i.e. since ReaR 2.6) there is only
see @pcahyna |
Only in case of DASDs, because the loss of accuracy results from the loss of size information on unformatted DASDs at this point, due to low-level formatting being performed after mapping. This is not an issue for disks that do not need low-level formatting. |
It is limited to virtual machines using z/VM hypervisors, I have no idea whether the code would work in a LPAR or under KVM. Concerning disks, the other choice of disks on IBM Z are SCSI disks accessed using Fibre Channel (FCP). My limited understanding is that the SCSI disks do not need any special handling compared to SCSI disks on other architectures, so they might "just work", but I have not tested them. |
@@ -24,6 +24,7 @@ fi | |||
### Prepare a disk for partitioning/general usage. | |||
create_disk() { | |||
local component disk size label junk | |||
local blocksize layout dasdtype dasdcyls junk2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that I introduced unused variables here.
And yes, with these changes I would not consider ReaR support "preliminary" anymore (on the supported configurations). |
# (pipelines run in subshells) | ||
LogPrint "Enabling DASD $device with virtual device number $bus" | ||
if chccwdev -e $bus ; then | ||
newname=$(lsdasd $bus | awk "/$bus/ { print \$3}" ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: all those s390 specific tools are added in usr/share/rear/prep/Linux-s390/305_include_s390_tools.sh
to PROGS
- shouldn't that be rather REQUIRED_PROGS
to ensure that the recovery code won't fail due to missing tools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should. I noticed that as well and intended to submit a separate PR.
rear_workflow="rear $WORKFLOW" | ||
original_disk_space_usage_file="$VAR_DIR/layout/config/df.txt" | ||
rear_shell_history="$( echo -e "cd $VAR_DIR/layout/\nvi $DASD_FORMAT_CODE\nless $DASD_FORMAT_CODE" )" | ||
unset choices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better define like this
local choices=(
"line 1"
"line 2"
...)
or, once we decide to do #2941 mapfile -t choices <<EOF
makes this much simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a copy of my code
that I used at several places,
cf. what
# find usr/sbin/rear usr/share/rear -type f | xargs grep -A2 'unset choices'
shows like
unset choices
choices[0]="fist choice"
choices[1]="second choice"
...
if CONDITION ; then
choices[1]="another second choice"
fi
...
case "$choice" in
(${choices[0]})
...
(${choices[1]})
...
e.g. see
layout/recreate/default/200_run_layout_code.sh
where choices are rather complicated.
I have no idea why I did not use a local array.
I think I used the explicit choices[0], choices[1]
in the initial assingments to make it more clear
what each choice in the subsequent code is.
|
||
rear_workflow="rear $WORKFLOW" | ||
original_disk_space_usage_file="$VAR_DIR/layout/config/df.txt" | ||
rear_shell_history="$( echo -e "cd $VAR_DIR/layout/\nvi $DASD_FORMAT_CODE\nless $DASD_FORMAT_CODE" )" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vi
is probably not really guaranteed to be the there.
Ah, see #2734 (comment) and #1298 so yes, vi
is a safe choice.
rear_shell_history="$( echo -e "cd $VAR_DIR/layout/\nvi $DASD_FORMAT_CODE\nless $RUNTIME_LOGFILE" )" | ||
wilful_input="" | ||
|
||
unset choices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above. In general we want to prefix all variables with local
unless they have global relevance
|
||
# Local functions must be 'unset' because bash does not support 'local function ...' | ||
# cf. https://unix.stackexchange.com/questions/104755/how-can-i-create-a-local-function-in-my-bashrc | ||
unset -f lsdasd_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not required if you use local
due to the function scope created by Source
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required because of what the comment at this code place tells
# again and again until it succeeds or the user aborts | ||
# or the user confirms to continue with what is currently on the disks | ||
# (the user may have setup manually what he needs via the Relax-and-Recover shell): | ||
while true ; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is a huge state machine, how did you test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is derived from my "huge state machine" in
layout/recreate/default/200_run_layout_code.sh
There I had tested it manually as good as I could
(but I would not be surprised if I missed something).
On the other hand no user reported an issue with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see by my nitpicking I have no basis on which to judge the functionality.
Great job on advancing s390 support here, big thanks!
We have too many open PRs that could be merged IMHO, so I'll just go ahead and merge this now. @pcahyna maybe you can follow this up with some documentation changes to update the support status of s390x? |
@schlomo thanks, sorry for my belated reply. As @jsmeix said - the scripts to confirm and rerun the generated DASD format scrips are adapted from existing scripts, especially in the case of Ideally all the boilerplate (the state machine) should be shared and only the messages, file names etc. would change, but I don't see how to do that easily, especially in shell. |
Yes, I totally understand that. Let's hope that there will be more users testing this and provide us feedback |
- remove unused variables - rename "junk" to more meaningful name when it contains useful data (addresses review comment in PR rear#2943) - fix typo
Pull Request Details:
Type: Bug Fix
Impact: High
Reference to related issue (URL):
How was this pull request tested?
Full backup and recovery on RHEL 8 s390x VM with 4 DASDs: 2 in the root VG, one for storing the backup and one that was not included in the backup (
ONLY_INCLUDE_VG=( rootvg )
was set). Verified that the extra VG and the disk with the backup survived recovery and that DASDs were properly mapped during recovery (in a situation where the correct disk mapping is not an identity map).Brief description of the changes in this pull request:
The s390x architecture needs some unique handling of disks (called DASDs - Direct Access Storage Devices) during layout restoration. The DASDs need to be enabled, otherwise they are not recognized as block devices, and then to get formatted (at low level), unless they are formatted already, before they can be partitioned. The current code to do this suffers from several problems.
Formatting of the disks is controlled by the
dasdfmt
directive in the disklayout.conf file. This directive is emitted for all dasds, regardless of whether they should be included in the layout. While there is component exclusion code that comments out the "disk" lines for unused disks automatically, thedasdfmt
lines for unused disks stay in the layout file, which later causes the unused disks to be reformatted unconditionally. Crucially, this includes the disk with the backup (one must use network backup storage because of this). Moreover, when the user is prompted to confirm the disk layout file and layout recreation script, it is too late, because the formatting code is unconditionally executed very early, before the rest of processing of the layout, and is not part of the disklayout.sh script.As the formatting code runs early, it is also not affected by disk mappings, so when the device names change, formatting will be applied to wrong disks.
Another problem is that obtaining the disk entry from the
lsdasd
command output uses a simplegrep
command. Unfortunately, this breaks for more than 26 DASDs, becausegrep dasda
will also matchdasdaa
,dasdab
etc., so multiple lines will pass through the filter instead of just one and break the format of the disklayout.conf file.This PR does the following changes:
dasd
disks have extra fields that other disk types don't have.FORMAT_DASDS
variable to a false value.