Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s390x (IBM Z) disk formatting fixes #2943

Merged
merged 9 commits into from Feb 28, 2023
Merged

Conversation

pcahyna
Copy link
Member

@pcahyna pcahyna commented Feb 21, 2023

Pull Request Details:
  • Type: Bug Fix

  • Impact: High

  • Reference to related issue (URL):

  • How was this pull request tested?
    Full backup and recovery on RHEL 8 s390x VM with 4 DASDs: 2 in the root VG, one for storing the backup and one that was not included in the backup (ONLY_INCLUDE_VG=( rootvg ) was set). Verified that the extra VG and the disk with the backup survived recovery and that DASDs were properly mapped during recovery (in a situation where the correct disk mapping is not an identity map).

  • Brief description of the changes in this pull request:
    The s390x architecture needs some unique handling of disks (called DASDs - Direct Access Storage Devices) during layout restoration. The DASDs need to be enabled, otherwise they are not recognized as block devices, and then to get formatted (at low level), unless they are formatted already, before they can be partitioned. The current code to do this suffers from several problems.
    Formatting of the disks is controlled by the dasdfmt directive in the disklayout.conf file. This directive is emitted for all dasds, regardless of whether they should be included in the layout. While there is component exclusion code that comments out the "disk" lines for unused disks automatically, the dasdfmt lines for unused disks stay in the layout file, which later causes the unused disks to be reformatted unconditionally. Crucially, this includes the disk with the backup (one must use network backup storage because of this). Moreover, when the user is prompted to confirm the disk layout file and layout recreation script, it is too late, because the formatting code is unconditionally executed very early, before the rest of processing of the layout, and is not part of the disklayout.sh script.
    As the formatting code runs early, it is also not affected by disk mappings, so when the device names change, formatting will be applied to wrong disks.
    Another problem is that obtaining the disk entry from the lsdasd command output uses a simple grep command. Unfortunately, this breaks for more than 26 DASDs, because grep dasda will also match dasdaa, dasdab etc., so multiple lines will pass through the filter instead of just one and break the format of the disklayout.conf file.

This PR does the following changes:

  • Simplify syntax of dasd_channel directive. The previous syntax had some useless fields, like the major:minor number and status.
  • Fix obtaining the entry from lsdasd output
  • Eliminate the "dasdfmt" directive, merge its fields into the "disk" directive. This has the advantage that the exclusion code eliminates the "disk" lines for unused disks automatically and thus prevents them from being reformatted (in addition to preventing them from being repartitioned). Unfortunately it means that the format of the "disk" directive now depends on the disk label type: dasd disks have extra fields that other disk types don't have.
  • Generate the DASD formatting code as a separate script and let the user confirm it before it gets executed. Heavily inspired by the disk wiping code (it is similarly destructive), but executed during the "layout/prepare" stage and not during the "layout/recreate" stage. The reason is that unformatted DASDs do not have their size in bytes known (it depends on format), but other scripts in the "layout/prepare" stage need to know the disk sizes (e.g. for resizing partitions). The script is generated after the mapping code has mapped original to current disk devices, ensuring that we format the correct DASDs. Reformatting can be switched off entirely by setting the FORMAT_DASDS variable to a false value.
  • The heuristics in disk comparison and mapping script will now be less accurate, because it won't know the current sizes if disks are not formatted. Since the DASDs have permanent identifiers (virtual device number) that can be used to reliably identify disks, pass this information to the mapping code in a variable called DISK_MAPPING_HINTS and use them there in preference to the usual device name and size comparison.
  • Since DASD enabling and DASD formatting code are now separate and the former is non-destructive, run it also for "rear mountonly". This fixes "rear mountonly" on s390x.

- Simplify syntax of dasd_channel directive. The previous syntax had
  some useless fields, like the major:minor number and status.
- Fix obtaining the entry from lsdasd output: "grep $blockd" did not
  work properly, because if $blockd == dasda and there also disks named
  dasdaa and dasdab (this will be the case with more than 26 DASDs), the
  command would print all three lines instead of just the correct one
  and break the resulting format.
- Eliminate the "dasdfmt" directive, merge its fields into
  the "disk" directive. This has the advantage that the exclusion code
  eliminates the "disk" lines for unused disks automatically.
  Previously, the "dasdfmt" lines for unused disks stayed in the layout
  file, which later caused the unused disks to be reformatted
  unconditionally (including disk with the backup itself on
  it if one used a local storage for backup).
  Unfortunately it means that the format of the "disk" directive now
  depends on the disk label type: dasd disks have extra fields that
  other disk types don't have.
- Add information about number of cylinders to the disk directive for
  DASDs, it is more stable than the size in bytes (which depends on
  the format). Currently unused.
- Adapt to the new format of disklayout.conf entries (dasd_channel and
  disk directives)
- Move the DASD formatting code after the mapping code. Otherwise when
  device names change, the code would format the old names, not the new
  names.
- Generate the DASD formatting code as a separate script and let the
  user confirm it before it gets executed. Heavily inspired by the disk
  wiping code (it is similarly destructive), but executed during the
  "layout/prepare" stage and not during the "layout/recreate" stage. The
  reason is that unformatted DASDs do not have their size in bytes known
  (it depends on format), but other scripts in the "layout/prepare"
  stage need to know the disk sizes (e.g. for resizing partitions).
- The disk comparison and mapping script will now be less accurate,
  because it won't know the current sizes if disks are not formatted. To
  compensate, introduce a variable called DISK_MAPPING_HINTS that allows
  the DASD activation code to pass the information on changed disk names
  to the rest of the layout code (the identification is based on virtual
  device numbers).
Reformatting is a dangerous operation, equivalent to wiping disks (which
we also let the user always confirm).
520_exclude_components.sh excludes them as well, we run before it, so we
have to reimplement it here.
When disk mapping hints (DISK_MAPPING_HINTS) indicate we know the
correct mapping for a disk, use them in preference to device name and
size comparison. They are supposed to be more reliable and also they
are set by the s390-specific code for disks (DASDs) that are not yet
formatted and thus their size is not yet known, so guessing according to
disk size does not work properly.
Allow to choose to skip DASD formatting from the DASD format script
confrmation dialog. Can be also set by setting the FORMAT_DASDS variable
to a false value.
The default value is "" and it means to format the DASDs by default, in
an analogy with DISKS_TO_BE_WIPED.
@pcahyna
Copy link
Member Author

pcahyna commented Feb 21, 2023

@mutable-dan I believe you are the original author of the S/390 code - can you please have a look?

@pcahyna pcahyna requested a review from a team February 21, 2023 23:14
@@ -57,7 +58,8 @@ sync

EOF

create_partitions "$disk" "$label"
# $junk can contain useful DASD-specific fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a variable is named 'junk'
but (sometimes) it contains useful data,
then the varaiable should not be named 'junk'
(even if in most cases that data is useless).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I am going to rename it to "extrafields" or something similar where applicable.

Log "Excluding $disk from DASD reformatting."
continue
fi
# dasd has more fields - junk is not junk anymore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above:
When a variable is named 'junk'
but (sometimes) it contains useful data,
then the varaiable should not be named 'junk'
(even if in most cases that data is useless).

@jsmeix jsmeix added enhancement Adaptions and new features bug The code does not do what it is meant to do cleanup labels Feb 22, 2023
@jsmeix jsmeix added this to the ReaR v2.8 milestone Feb 22, 2023
@jsmeix
Copy link
Member

jsmeix commented Feb 22, 2023

WOW!
Looks like a severe improvement.

@pcahyna
I have a question because you wrote

The heuristics in disk comparison and mapping script
will now be less accurate

Will it be less accurate only in case of DASD disks
or will it be less accurate in general?

My fear is that a possibly disastrous automated mapping
might happen when it is less accurate in general.

Copy link
Member

@jsmeix jsmeix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve it basically blindly
(fortunately I don't have a IBM Z machine in my homeoffice)
and I trust @pcahyna to have things properly tested

@jsmeix
Copy link
Member

jsmeix commented Feb 22, 2023

Currently (i.e. since ReaR 2.6) there is only

Initial preliminary first basic support for IBM Z
...
so that interested users can try out early
how far things work

see
https://github.com/rear/rear/blob/rear-2.7/doc/rear-release-notes.txt#L1520

@pcahyna
do I understand it correctly that this pull request
enhances ReaR towards official basic support for IBM Z
that is limited to DASD disks and perhaps some other
limitations?

@pcahyna
Copy link
Member Author

pcahyna commented Feb 22, 2023

WOW! Looks like a severe improvement.

@pcahyna I have a question because you wrote

The heuristics in disk comparison and mapping script
will now be less accurate

Will it be less accurate only in case of DASD disks or will it be less accurate in general?

My fear is that a possibly disastrous automated mapping might happen when it is less accurate in general.

Only in case of DASDs, because the loss of accuracy results from the loss of size information on unformatted DASDs at this point, due to low-level formatting being performed after mapping. This is not an issue for disks that do not need low-level formatting.

@pcahyna
Copy link
Member Author

pcahyna commented Feb 22, 2023

Currently (i.e. since ReaR 2.6) there is only

Initial preliminary first basic support for IBM Z
...
so that interested users can try out early
how far things work

see https://github.com/rear/rear/blob/rear-2.7/doc/rear-release-notes.txt#L1520

@pcahyna do I understand it correctly that this pull request enhances ReaR towards official basic support for IBM Z that is limited to DASD disks and perhaps some other limitations?

It is limited to virtual machines using z/VM hypervisors, I have no idea whether the code would work in a LPAR or under KVM.

Concerning disks, the other choice of disks on IBM Z are SCSI disks accessed using Fibre Channel (FCP). My limited understanding is that the SCSI disks do not need any special handling compared to SCSI disks on other architectures, so they might "just work", but I have not tested them.

@@ -24,6 +24,7 @@ fi
### Prepare a disk for partitioning/general usage.
create_disk() {
local component disk size label junk
local blocksize layout dasdtype dasdcyls junk2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that I introduced unused variables here.

@pcahyna
Copy link
Member Author

pcahyna commented Feb 22, 2023

And yes, with these changes I would not consider ReaR support "preliminary" anymore (on the supported configurations).

# (pipelines run in subshells)
LogPrint "Enabling DASD $device with virtual device number $bus"
if chccwdev -e $bus ; then
newname=$(lsdasd $bus | awk "/$bus/ { print \$3}" )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: all those s390 specific tools are added in usr/share/rear/prep/Linux-s390/305_include_s390_tools.sh to PROGS - shouldn't that be rather REQUIRED_PROGS to ensure that the recovery code won't fail due to missing tools?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should. I noticed that as well and intended to submit a separate PR.

rear_workflow="rear $WORKFLOW"
original_disk_space_usage_file="$VAR_DIR/layout/config/df.txt"
rear_shell_history="$( echo -e "cd $VAR_DIR/layout/\nvi $DASD_FORMAT_CODE\nless $DASD_FORMAT_CODE" )"
unset choices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better define like this

local choices=( 
"line 1"
"line 2"
...)

or, once we decide to do #2941 mapfile -t choices <<EOF makes this much simpler

Copy link
Member

@jsmeix jsmeix Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a copy of my code
that I used at several places,
cf. what

# find usr/sbin/rear usr/share/rear -type f | xargs grep -A2 'unset choices'

shows like

unset choices
choices[0]="fist choice"
choices[1]="second choice"
...
if CONDITION ; then
   choices[1]="another second choice"
fi 
...
        case "$choice" in
            (${choices[0]})
               ...
            (${choices[1]})
...

e.g. see
layout/recreate/default/200_run_layout_code.sh
where choices are rather complicated.

I have no idea why I did not use a local array.

I think I used the explicit choices[0], choices[1]
in the initial assingments to make it more clear
what each choice in the subsequent code is.


rear_workflow="rear $WORKFLOW"
original_disk_space_usage_file="$VAR_DIR/layout/config/df.txt"
rear_shell_history="$( echo -e "cd $VAR_DIR/layout/\nvi $DASD_FORMAT_CODE\nless $DASD_FORMAT_CODE" )"
Copy link
Member

@schlomo schlomo Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vi is probably not really guaranteed to be the there.

Ah, see #2734 (comment) and #1298 so yes, vi is a safe choice.

rear_shell_history="$( echo -e "cd $VAR_DIR/layout/\nvi $DASD_FORMAT_CODE\nless $RUNTIME_LOGFILE" )"
wilful_input=""

unset choices
Copy link
Member

@schlomo schlomo Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above. In general we want to prefix all variables with local unless they have global relevance


# Local functions must be 'unset' because bash does not support 'local function ...'
# cf. https://unix.stackexchange.com/questions/104755/how-can-i-create-a-local-function-in-my-bashrc
unset -f lsdasd_output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required if you use local due to the function scope created by Source function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

required because of what the comment at this code place tells

# again and again until it succeeds or the user aborts
# or the user confirms to continue with what is currently on the disks
# (the user may have setup manually what he needs via the Relax-and-Recover shell):
while true ; do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is a huge state machine, how did you test this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is derived from my "huge state machine" in
layout/recreate/default/200_run_layout_code.sh
There I had tested it manually as good as I could
(but I would not be surprised if I missed something).
On the other hand no user reported an issue with it.

Copy link
Member

@schlomo schlomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see by my nitpicking I have no basis on which to judge the functionality.

Great job on advancing s390 support here, big thanks!

@schlomo
Copy link
Member

schlomo commented Feb 28, 2023

We have too many open PRs that could be merged IMHO, so I'll just go ahead and merge this now.

@pcahyna maybe you can follow this up with some documentation changes to update the support status of s390x?

@schlomo schlomo merged commit 015c1ff into rear:master Feb 28, 2023
@pcahyna
Copy link
Member Author

pcahyna commented Mar 14, 2023

@schlomo thanks, sorry for my belated reply. As @jsmeix said - the scripts to confirm and rerun the generated DASD format scrips are adapted from existing scripts, especially in the case of 400_run_dasd_format_code.sh. I have not tested it in entirety - I believe have I tested editing and confirming the script, but not rerunning it in case it has failed. I trust this part because it was adapted from prior code with only minimal changes. That's why I have not followed your style suggestions: they would increase the difference from the original script and thus increase the risk of introducing a bug.

Ideally all the boilerplate (the state machine) should be shared and only the messages, file names etc. would change, but I don't see how to do that easily, especially in shell.

@schlomo
Copy link
Member

schlomo commented Mar 14, 2023

Yes, I totally understand that. Let's hope that there will be more users testing this and provide us feedback

pcahyna added a commit to pcahyna/rear that referenced this pull request Mar 15, 2023
- remove unused variables
- rename "junk" to more meaningful name when it contains useful data
  (addresses review comment in PR rear#2943)
- fix typo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The code does not do what it is meant to do cleanup enhancement Adaptions and new features fixed / solved / done
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants