Data file write failure for LTO-6 drives #99

Closed
rohr22 opened this issue Jun 3, 2022 · 8 comments

rohr22 commented Jun 3, 2022

I am noticing a lot of errors like this from mhvtl:

Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: ssc_write_6(): ssc_write_6(): 1 block of 524288 bytes (158341) **
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: retrieve_CDB_data(): retrieving 524288 bytes from kernel
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: check_restrictions(): returning: writable
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: writeBlock_lzo(): Compression: Orig 524288, after comp: 488361
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: write_tape_block(): CRC is 0xfd73edbe
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: return_sense(): [Key/ASC/ASCQ] [03 0c 00]
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: ERROR: write_tape_block(): Data file write failure, pos: 163: No message of desired type
Jun  2 18:38:42 lews /usr/bin/vtltape[2249]: write_tape_block(): Truncating data file size: 163

This causes an end-of-media error in our application, which then locks the drive so it cannot be written to afterward.
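For readers decoding the return_sense() line in the log above: [03 0c 00] is standard SCSI sense data, where sense key 03h means MEDIUM ERROR and ASC/ASCQ 0Ch/00h means WRITE ERROR. A minimal C sketch of that decoding follows. The function names are hypothetical (they are not part of mhvtl), and the tables cover only a handful of values.

```c
#include <stdint.h>

/* Hypothetical helper: name a few common SCSI sense keys (low 4 bits). */
const char *sense_key_name(uint8_t key)
{
    switch (key & 0x0f) {
    case 0x00: return "NO SENSE";
    case 0x02: return "NOT READY";
    case 0x03: return "MEDIUM ERROR";
    case 0x04: return "HARDWARE ERROR";
    case 0x05: return "ILLEGAL REQUEST";
    case 0x06: return "UNIT ATTENTION";
    case 0x07: return "DATA PROTECT";
    case 0x08: return "BLANK CHECK";
    case 0x0b: return "ABORTED COMMAND";
    case 0x0d: return "VOLUME OVERFLOW";
    default:   return "(not in this sketch's table)";
    }
}

/* Hypothetical helper: only the ASC/ASCQ pair seen in the log is listed. */
const char *asc_ascq_name(uint8_t asc, uint8_t ascq)
{
    if (asc == 0x0c && ascq == 0x00)
        return "WRITE ERROR";
    return "(not in this sketch's table)";
}
```

Calling sense_key_name(0x03) and asc_ascq_name(0x0c, 0x00) gives the two names for the triple in the log.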

What would cause the "Data file write failure"? I looked at the code in vtlcart.c and found this:

        /* Now write out both the data and the header. */
        if (null_media_type) {
                nwrite = disk_blk_size;
        } else
                nwrite = pwrite(datafile, buffer, disk_blk_size, data_offset);
        if (nwrite != disk_blk_size) {
                sam_medium_error(E_WRITE_ERROR, sam_stat);

                MHVTL_ERR("Data file write failure, pos: %" PRId64 ": %s",
                        data_offset, strerror(errno));

                /* Truncate last partital write */
                MHVTL_DBG(1, "Truncating data file size: %"PRId64, data_offset);
                if (ftruncate(datafile, data_offset) < 0) {
                        MHVTL_ERR("Error truncating data: %s", strerror(errno));
                }

                mkEODHeader(blk_number, data_offset);
                return -1;
        }

The pwrite call is returning an unexpected value. This issue seems to occur on multiple different machines. I am using the mhvtl code from the commit on Mar 13th of this year.
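One detail worth noting: "No message of desired type" is the strerror() text for ENOMSG, which is not an error pwrite() is documented to return. That suggests errno may be stale when it is logged, either because pwrite() returned a short count (which does not set errno) or because an intervening call changed errno before the message was formatted. The following is a minimal sketch, assuming a POSIX system, of a wrapper that captures errno immediately and separates a short write from a failed one. The names checked_pwrite and write_outcome are hypothetical and not part of mhvtl.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum write_outcome { WRITE_OK = 0, WRITE_SHORT = 1, WRITE_FAILED = 2 };

/*
 * Hypothetical helper: attempt to write len bytes at offset.
 * *written receives the raw pwrite() return value.
 * *saved_errno receives errno captured right after pwrite() on failure,
 * and 0 otherwise (a short write does not set errno).
 */
enum write_outcome checked_pwrite(int fd, const void *buf, size_t len,
                                  off_t offset, ssize_t *written,
                                  int *saved_errno)
{
    ssize_t n = pwrite(fd, buf, len, offset);

    /* Capture errno before any logging or other call can overwrite it. */
    *saved_errno = (n < 0) ? errno : 0;
    *written = n;

    if (n < 0)
        return WRITE_FAILED;
    return ((size_t)n == len) ? WRITE_OK : WRITE_SHORT;
}
```

With a wrapper like this, the log line could report the byte count for WRITE_SHORT and strerror(saved_errno) only for WRITE_FAILED, which would make it clearer whether the disk was full, the device returned an I/O error, or the write was merely truncated.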

The errors are occurring on RHEL8 x86_64 machines.

Thank you,
Peter

Owner

markh794 commented Jun 3, 2022

Hello Peter,

Can I enquire about the underlying file system on /opt/mhvtl: is it ext3, ext4, xfs, etc., and is the underlying block device 'local' or nfs/cifs?
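For reference, the file system type behind a directory can also be checked programmatically on Linux with statfs(2). The sketch below is an illustration only: the function names are hypothetical, and the table lists just three magic numbers (the values published in linux/magic.h for ext2/3/4, xfs and nfs), so anything else is reported as "other".

```c
#include <sys/vfs.h>

/* Hypothetical helper: map a statfs f_type value to a short name. */
const char *fs_type_name(unsigned long f_type)
{
    switch (f_type) {
    case 0xEF53UL:     return "ext2/ext3/ext4"; /* shared by all three */
    case 0x58465342UL: return "xfs";
    case 0x6969UL:     return "nfs";
    default:           return "other";
    }
}

/*
 * Hypothetical helper: look up the file system holding path.
 * Returns 0 and stores the name in *name, or -1 with errno set by statfs().
 */
int describe_fs(const char *path, const char **name)
{
    struct statfs sb;

    if (statfs(path, &sb) != 0)
        return -1;
    *name = fs_type_name((unsigned long)sb.f_type);
    return 0;
}
```

Calling describe_fs() on the directory that holds the mhvtl media files would distinguish a local ext4 or xfs volume from an NFS mount under the assumptions above.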

Author

rohr22 commented Jun 3, 2022

Hi, Mark.

It is ext4. We put our mhvtl tape volume files under the old default directory of /opt/vtl. The disk where the files reside is local. We used an opt-vtl.mount file under the /etc/systemd/system directory to mount the disk at the /opt/vtl directory. It has these contents:

[Unit]
Description=MHVTL
[Mount]
What=/dev/disk/by-uuid/c1debf41-51c5-41dd-9b90-54d974e455b6
Where=/opt/vtl
Type=ext4
Options=defaults
[Install]
WantedBy=multi-user.target

Does that answer your questions?

Thank you,
Peter

Author

rohr22 commented Jun 7, 2022

It seems like there might be issues with the latest mhvtl version and LTO-6 drives on RHEL8. With mhvtl 1.6.4, the problems with LTO-6 drives don't occur on RHEL8. When I switched my mhvtl 1.7.0 system to LTO-5 drives, the problems did not occur either. The only difference from my earlier mhvtl 1.7.0 setup was that the etc/generate_device_conf.in file had:

        add_library 10 0 0 0 "IBM" "3584"  "2160"  "XYZZY_A"
        #         index channel target LUN S/No Lib# Slot
        add_ibm_ultrium_5_drive 11 0 1 0 "XYZZY_A1" 10 1
        add_ibm_ultrium_5_drive 12 0 2 0 "XYZZY_A2" 10 2
        add_ibm_ultrium_5_drive 13 0 3 0 "XYZZY_A3" 10 3
        add_ibm_ultrium_5_drive 14 0 4 0 "XYZZY_A4" 10 4
        add_ibm_ultrium_5_drive 15 0 5 0 "XYZZY_A5" 10 5
        add_ibm_ultrium_5_drive 16 0 6 0 "XYZZY_A6" 10 6
        add_ibm_ultrium_5_drive 17 0 7 0 "XYZZY_A7" 10 7
        add_ibm_ultrium_5_drive 18 0 8 0 "XYZZY_A8" 10 8
        add_ibm_ultrium_5_drive 19 0 9 0 "XYZZY_A9" 10 9
        add_ibm_ultrium_5_drive 20 0 10 0 "XYZZY_AA" 10 10

instead of:

        add_library 10 0 0 0 "IBM" "3584"  "2160"  "XYZZY_A"
        #         index channel target LUN S/No Lib# Slot
        add_ibm_ultrium_6_drive 11 0 1 0 "XYZZY_A1" 10 1
        add_ibm_ultrium_6_drive 12 0 2 0 "XYZZY_A2" 10 2
        add_ibm_ultrium_6_drive 13 0 3 0 "XYZZY_A3" 10 3
        add_ibm_ultrium_6_drive 14 0 4 0 "XYZZY_A4" 10 4
        add_ibm_ultrium_6_drive 15 0 5 0 "XYZZY_A5" 10 5
        add_ibm_ultrium_6_drive 16 0 6 0 "XYZZY_A6" 10 6
        add_ibm_ultrium_6_drive 17 0 7 0 "XYZZY_A7" 10 7
        add_ibm_ultrium_6_drive 18 0 8 0 "XYZZY_A8" 10 8
        add_ibm_ultrium_6_drive 19 0 9 0 "XYZZY_A9" 10 9
        add_ibm_ultrium_6_drive 20 0 10 0 "XYZZY_AA" 10 10

Could there be something miscoded with the use of add_ibm_ultrium_6_drive vs add_ibm_ultrium_5_drive in the latest mhvtl version? (I used the zip file from the latest code as of June 5, 2022.)

Thank you,
Peter

Author

rohr22 commented Jun 7, 2022

Note that the system with issues is a ppc64le system with RHEL 8.5. But I also seemed to have similar issues when the etc/generate_device_conf.in file had add_ibm_ultrium_6_drive calls on a RHEL 8.5 x86_64 system.

Owner

markh794 commented Jun 7, 2022

The read/write path is exactly the same for all 'emulations'.
The only difference in emulation is the MODE, LOG and INQUIRY (some emulations also include persistent SCSI reservation & Security Protocol IN/OUT) op codes.
Using 'edit_tape', you can flip the media type between LTO, DLT and AIT and successfully read/write the same media files from the appropriate drive emulation.

Author

rohr22 commented Jun 7, 2022

> The only difference in emulation is the MODE, LOG and INQUIRY (some emulations also include persistent SCSI reservation & Security Protocol IN/OUT) op codes.

Could this account for the issues I am seeing with the drives created with add_ibm_ultrium_6_drive vs add_ibm_ultrium_5_drive?

Owner

markh794 commented Jun 8, 2022

No - LTO-4+ emulations all support the same PR & SPIN/SPOUT code.
I just checked (diff -u) the init_ult3580_td5 and init_ult3580_td6 functions and the mode pages and log pages are the same.
The only difference is the list of media permitted to be mounted:
LTO-3 R/O, LTO-4 & LTO-5 R/W for -TD5,
LTO-4 R/O, LTO-5 & LTO-6 R/W for -TD6
i.e. mhvtl code does not treat the two emulations any differently.
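The media rule listed above can be expressed as a small lookup. This is a sketch for illustration only, not mhvtl's actual data structure: the function and enum names are hypothetical, and treating every media generation not listed above as rejected is an assumption.

```c
enum media_access { MEDIA_REJECT = 0, MEDIA_RO = 1, MEDIA_RW = 2 };

/*
 * Hypothetical helper.
 * drive_gen: 5 for the -TD5 emulation, 6 for -TD6.
 * media_gen: LTO generation of the cartridge being mounted.
 */
enum media_access lto_media_access(int drive_gen, int media_gen)
{
    /* This sketch covers only the two emulations discussed here. */
    if (drive_gen != 5 && drive_gen != 6)
        return MEDIA_REJECT;

    /* Same generation and one generation back: read/write. */
    if (media_gen == drive_gen || media_gen == drive_gen - 1)
        return MEDIA_RW;

    /* Two generations back: read-only. */
    if (media_gen == drive_gen - 2)
        return MEDIA_RO;

    /* Assumption: anything else is not mountable. */
    return MEDIA_REJECT;
}
```

Under this rule an LTO-5 cartridge is read/write in both emulations, so the media list alone would not explain a write failure that appears only on -TD6.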

Is it possible the application expects the -TD5 / -TD6 to behave slightly differently, and my 'emulation' is incomplete?

Author

rohr22 commented Jun 27, 2022

There might be a problem with our hardware that is causing these failures. The mhvtl volumes seem to work fine for a while, but eventually we may run into the issues noted here on some of our machines. However, this can happen for both LTO-5 and LTO-6 drives.

rohr22 closed this as completed Jun 27, 2022