Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sles4sap: Use rsync to copy media #14846

Closed
wants to merge 1 commit into from

Conversation

ricardobranco777
Copy link
Contributor

@ricardobranco777 ricardobranco777 commented May 6, 2022

This PR:

  • Uses rsync to copy media instead of cp.
  • As a result of using rsync with the --checksum option we can drop the MD5 checksum test later.

Related MR to copy 4G instead of 16G:
https://gitlab.suse.de/qa-css/openqa_ha_sap/-/merge_requests/357

  • Verification run: after I get feedback

@github-actions
Copy link

github-actions bot commented May 6, 2022

Great PR! Please pay attention to the following items before merging:

Files matching lib/**.pm:

  • Consider adding or extending unit tests in t/

This is an automatically generated QA checklist based on modified files

lib/sles4sap.pm Show resolved Hide resolved
@ricardobranco777 ricardobranco777 added the WIP Work in progress label May 6, 2022
@mpagot
Copy link
Contributor

mpagot commented May 6, 2022

LGTM

@lpalovsky
Copy link
Contributor

lpalovsky commented May 6, 2022

Looks good so far 👍

Copy link
Contributor

@mpagot mpagot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

lib/sles4sap.pm Outdated Show resolved Hide resolved
Copy link
Contributor

@alvarocarvajald alvarocarvajald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding my comments/concerns here as well:

  • Great idea substituting NFS with rsync and script_retry. This should help with the constants timeout on copies over NFS.
  • However I do not like that this makes sles4sap/hana_install a wholly different module to sles4sap/wizard_hana_install. It used to be that both modules took the same assets and performed the same installation, leaving the SUT in the same state so it can be checked with sles4sap/hana_test, with the only difference being the Wizard vs. CLI bit. By changing NFS for Rsync, and copying only HDB_SERVER_LINUX as referenced in https://gitlab.suse.de/qa-css/openqa_ha_sap/-/merge_requests/357, the differences between both modules widens. My vote would be for both modules to consume the same settings and assets.
  • HANA pre-validation tests on x86_64 and ppc64le outside of osd also use this module. Not sure if it would be possible to set up a Rsync server in that network for ppc64le, but pretty sure there would be no issues to do so for x86_64. @lpalovsky this is something that needs to be confirmed before merging this as is, otherwise I would say you may be force to keep the rsync command over NFS instead of using the rsync protocol.
  • In the HANA training (HA200/HA201) we were told to use the hdblcm that is located in the HDB_LCM_* sub-directory. There is a HANA_HDBLCM setting defined in the module. If the changes in https://gitlab.suse.de/qa-css/openqa_ha_sap/-/merge_requests/357 are merged and if there are tests using HANA_HDBLCM pointing to the HDB_LCM_* sub-directory, then these will fail.
  • Keeping in mind what I wrote above about having both sles4sap/hana_install and sles4sap/wizard_hana_install to be as close as possible, I would assume that the Wizard tests would also be changed to only use the HDB_SERVER_LINUX sub-directory.

@@ -347,26 +342,10 @@ sub copy_media {
assert_script_run "mkdir $target";
assert_script_run "mount -t $proto -o ro $path $mnt_path";
$media_path = $mnt_path if script_run "[[ -d $media_path ]]"; # Check if specific ARCH subdir exists
assert_script_run "cp -ax $media_path/. $target/", $nettout;
script_retry ("rsync --archive --checksum $rsync_server:$media_path/. $target/", timeout =>$nettout, delay => 60, retry => 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where $rsync_server is initialized. That may cause the module to fail to load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, it was meant to be a placeholder in my comment and got just copied over.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$media_path probably has to be set to a different value than before as well, as the rsync path is relative and not absolute.

@@ -211,7 +211,7 @@ sub run {
# hdblcm is used for installation, verify if it exists
my $hdblcm = '/sapinst/' . get_var('HANA_HDBLCM', "DATA_UNITS/HDB_SERVER_LINUX_" . uc(get_required_var('ARCH')) . "/hdblcm");
die "hdblcm is not in [$hdblcm]. Set HANA_HDBLCM to the appropiate relative path. Example: DATA_UNITS/HDB_SERVER_LINUX_X86_64/hdblcm"
if (script_run "ls $hdblcm");
if (script_run "ls $hdblcm /sapinst/hdblcm");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hard-code the path here. Set HANA_HDBLCM: 'hdbclm' in the test settings instead, or change the default in L212.

@bs-suse
Copy link
Contributor

bs-suse commented May 10, 2022

My vote would be for both modules to consume the same settings and assets.

I did set up the rsync daemon on the same VM as the NFS server is on and have it pointing to the same directory, so it will be the same assets at least.

@@ -347,26 +342,10 @@ sub copy_media {
assert_script_run "mkdir $target";
assert_script_run "mount -t $proto -o ro $path $mnt_path";
$media_path = $mnt_path if script_run "[[ -d $media_path ]]"; # Check if specific ARCH subdir exists
assert_script_run "cp -ax $media_path/. $target/", $nettout;
script_retry ("rsync --archive --checksum $rsync_server:$media_path/. $target/", timeout =>$nettout, delay => 60, retry => 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricardobranco777 @alvarocarvajald
WDF tests use HANA_REL variable which specifies directory in $media_path for HANA release being used.
For example: /data/HANA2/Last_SPS06 or /data/HANA2/Last_SPS05
One option would be just appending variable after $media_path/HANA_REL.
Second option could be to have if statement which will do the 'cp' in case HANA_REL is defined, otherwise rsync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricardobranco777 @alvarocarvajald WDF tests use HANA_REL variable which specifies directory in $media_path for HANA release being used. For example: /data/HANA2/Last_SPS06 or /data/HANA2/Last_SPS05 One option would be just appending variable after $media_path/HANA_REL. Second option could be to have if statement which will do the 'cp' in case HANA_REL is defined, otherwise rsync.

But HANA_REL is already appended to HANA, for example: https://gitlab.suse.de/qa-css/openqa_ha_sap/-/blob/master/openQA/yaml_job_groups/openqa.wdf.sap.corp/sles4sap-15sp3.yml#L30

And HANA in turn points to a NFS share.

So, if it's possible to have an rsync server there, then the HANA setting would need to be changed. Otherwise, mounting over NFS and then copying with rsync as @ricardobranco777 is proposing here, should be safe.

IMO, before switching completely to rsync, a rsync server needs to be configured in WDF and rsync connectivity to it from all openQA workers and ppc64le LPARs need to be confirmed to be working. If rsync is somehow blocked by an ACL, then I would say it is better to keep NFS in the picture, as we know that protocol is allowed. An hybrid approach, i.e., rsync for osd, and NFS for everywhere else could also be possible, but I feel like that makes things unnecessarily complicated.

@dzedro
Copy link
Contributor

dzedro commented May 13, 2022

The main reason why we have failures due to slow copy is worker in Prague which is making the copy timeout.
https://openqa.suse.de/tests/8750059
https://openqa.suse.de/tests/8750060

We don't have to change path in test, there is HANA variable and we can have just copy of all the data we don't use or just delete it.
https://progress.opensuse.org/issues/110671#note-4

@ricardobranco777
Copy link
Contributor Author

Superseded by:
#14964

@ricardobranco777 ricardobranco777 deleted the rsync branch May 25, 2022 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in progress
Projects
None yet
7 participants