Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

block/009 failed #17

Closed
gurugio opened this issue Apr 23, 2018 · 10 comments
Closed

block/009 failed #17

gurugio opened this issue Apr 23, 2018 · 10 comments

Comments

@gurugio
Copy link

gurugio commented Apr 23, 2018

Hi,

The block/009 test case failed as below.
I tested it on two machines and both resulted the same failure.

  • I tried with xfsprogs 4.9.0 on Ubuntu and xfsprogs 4.16.0-rc1 on Debian but failed
  • I tried "noop" and "deadline" IO scheduler but failed
gohkim@ws00837:~/work/tools/blktests$ sudo ./check block/009
block/009 (check page-cache coherency after BLKDISCARD)      [failed]
    runtime  ...  0,977s
    --- tests/block/009.out	2018-04-23 17:52:00.222822301 +0200
    +++ results/nodev/block/009.out.bad	2018-04-23 17:54:34.571254091 +0200
    @@ -1,8 +1,17 @@
     Running block/009
     0000000 0000 0000 0000 0000 0000 0000 0000 0000
     *
    +0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
    +*
    +0c00000 0000 0000 0000 0000 0000 0000 0000 0000
    +*
    ...
    (Run 'diff -u tests/block/009.out results/nodev/block/009.out.bad' to see the entire diff)
gohkim@ws00837:~/work/tools/blktests$ cat results/nodev/block/009.out.bad 
Running block/009
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
*
0c00000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
*
0c00000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
modprobe: FATAL: Module scsi_debug is in use.
Test complete

If I ran blkdiscard after the test, the sdg device was cleared.

root@ib1:/home/gkim/pb-ltp/blktests# ./check block/009
block/009 (check page-cache coherency after BLKDISCARD)      [failed]
    runtime  2.334s  ...  2.380s
    --- tests/block/009.out	2018-04-23 14:21:33.011890789 +0200
    +++ results/nodev/block/009.out.bad	2018-04-23 18:07:12.451914680 +0200
    @@ -1,8 +1,17 @@
     Running block/009
     0000000 0000 0000 0000 0000 0000 0000 0000 0000
     *
    +0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
    +*
    +0c00000 0000 0000 0000 0000 0000 0000 0000 0000
    +*
    ...
    (Run 'diff -u tests/block/009.out results/nodev/block/009.out.bad' to see the entire diff)
root@ib1:/home/gkim/pb-ltp/blktests# blkdiscard /dev/sdg
root@ib1:/home/gkim/pb-ltp/blktests# dd if=/dev/sdg bs=4M iflag=direct | hexdump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
8+0 records in
8+0 records out
33554432 bytes (34 MB) copied, 0.122978 s, 273 MB/s
2000000

I hope you could guess the reason.
Please inform me if you need more information.

@sitsofe
Copy link

sitsofe commented Apr 24, 2018

@gurugio Might be linked to your message:

modprobe: FATAL: Module scsi_debug is in use.

So maybe you made stacked device or a filesystem atop the device that scsi_debug made thus locking it open...

@gurugio
Copy link
Author

gurugio commented Apr 24, 2018

I checked a reference counter of the scsi_debug module.
The reference counter was 0 just after it was loaded.
So I think there is nothing else to use the scsi_debug module.
The reference counter was 1 when modprobe tried to unload the scsi_debug module.
And it became 0 at the end of the test.

After two xfs_io executed, the reference counter became 2.
Is there a possibility that xfs_io was not finished?

gohkim@ws00837:~/work/tools/blktests$ sudo modprobe -r scsi_debug
gohkim@ws00837:~/work/tools/blktests$ git diff tests/block/009
diff --git a/tests/block/009 b/tests/block/009
index 69d0be7..9b64500 100755
--- a/tests/block/009
+++ b/tests/block/009
@@ -38,19 +38,27 @@ test() {
                return 1
        fi
 
+       lsmod | grep scsi
+       
        local dev="/dev/${SCSI_DEBUG_DEVICES[0]}"
        # Initialize data with known pattern
        xfs_io -c "pwrite -S 0xaa -b 2M 0 32M -w" -d "$dev" >>"$FULL"
 
+       lsmod | grep scsi
        # Fill page cache with dirty data
        xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M" -f "$dev" >>"$FULL"
+       lsmod | grep scsi
 
        blkdiscard "$dev"
+
        # After device was fully discarded, read sould return all zeroes
        dd if="$dev" bs=4k  2>>"$FULL" | hexdump
        dd if="$dev" bs=4M iflag=direct 2>>"$FULL" | hexdump
+       lsmod | grep scsi
 
        _exit_scsi_debug
+       lsmod | grep scsi
 
        echo "Test complete"
+       lsmod | grep scsi
 }
gohkim@ws00837:~/work/tools/blktests$ cat results/nodev/block/009.out.bad 
Running block/009
scsi_debug             77824  0
scsi_debug             77824  1
scsi_debug             77824  2
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
scsi_debug             77824  1
modprobe: FATAL: Module scsi_debug is in use.
scsi_debug             77824  1
Test complete
scsi_debug             77824  0

I added a "-d" option to pwrite for direct IO and had no data conflict.

gohkim@ws00837:~/work/tools/blktests$ git diff
diff --git a/tests/block/009 b/tests/block/009
index 69d0be7..c72627f 100755
--- a/tests/block/009
+++ b/tests/block/009
@@ -40,12 +40,13 @@ test() {
 
        local dev="/dev/${SCSI_DEBUG_DEVICES[0]}"
        # Initialize data with known pattern
-       xfs_io -c "pwrite -S 0xaa -b 2M 0 32M -w" -d "$dev" >>"$FULL"
+       xfs_io -c "pwrite -S 0xaa -b 2M 0 32M -d -w" -d "$dev" >>"$FULL"
 
        # Fill page cache with dirty data
-       xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M" -f "$dev" >>"$FULL"
+       xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M -d" -f "$dev" >>"$FULL"
 
        blkdiscard "$dev"
+
        # After device was fully discarded, read sould return all zeroes
        dd if="$dev" bs=4k  2>>"$FULL" | hexdump
        dd if="$dev" bs=4M iflag=direct 2>>"$FULL" | hexdump
gohkim@ws00837:~/work/tools/blktests$ sudo ./check block/009
block/009 (check page-cache coherency after BLKDISCARD)      [failed]
    runtime  1,080s  ...  1,074s
    --- tests/block/009.out	2018-04-23 17:52:00.222822301 +0200
    +++ results/nodev/block/009.out.bad	2018-04-24 10:56:02.220782318 +0200
    @@ -5,4 +5,5 @@
     0000000 0000 0000 0000 0000 0000 0000 0000 0000
     *
     2000000
    +modprobe: FATAL: Module scsi_debug is in use.
     Test complete

I checked /sys/block/sdc/stat and found that data was written to the disk after blkdiscard.

gohkim@ws00837:~/work/tools/blktests$ git diff
diff --git a/tests/block/009 b/tests/block/009
index 69d0be7..fbf91f4 100755
--- a/tests/block/009
+++ b/tests/block/009
@@ -44,13 +44,17 @@ test() {
 
        # Fill page cache with dirty data
        xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M" -f "$dev" >>"$FULL"
-
+       cat /sys/block/sdc/stat
        blkdiscard "$dev"
+       cat /sys/block/sdc/stat
        # After device was fully discarded, read sould return all zeroes
        dd if="$dev" bs=4k  2>>"$FULL" | hexdump
        dd if="$dev" bs=4M iflag=direct 2>>"$FULL" | hexdump
+       cat /sys/block/sdc/stat
 
        _exit_scsi_debug
 
        echo "Test complete"
+       cat /sys/block/sdc/stat
+
 }
gohkim@ws00837:~/work/tools/blktests$ cat results/nodev/block/009.out.bad 
Running block/009
      33        0      704      132       65        0    65536      264        0      188      392
      34        0      712      136       67        0   131072      280        1      196      408  ---> after blkdiscard
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
*
0c00000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
*
0c00000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
     308        0   111296     1244       87     2540   151552      360        0      640     1596 ---> after dd
modprobe: FATAL: Module scsi_debug is in use.
Test complete
     308        0   111296     1244       87     2540   151552      360        0      640     1596

@osandov
Copy link
Owner

osandov commented Apr 24, 2018

@gurugio what kernel version is this on? It looks like the fix that this test is testing went in for 4.15.

I've seen the modprobe issue but haven't figured out who is still holding the device open. But let's focus on the test failure first.

@gurugio
Copy link
Author

gurugio commented Apr 25, 2018

I am using Ubuntu and the standard kernel as following.

gohkim@ws00837:~/work$ uname -a
Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
gohkim@ws00837:~/work$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10
DISTRIB_CODENAME=artful
DISTRIB_DESCRIPTION="Ubuntu 17.10"

@gurugio
Copy link
Author

gurugio commented Apr 25, 2018

My colleague found a patch related to discard block device in kernel log.
Should I apply that patch?

351499a1 block: Invalidate cache on discard v2

@osandov
Copy link
Owner

osandov commented Apr 25, 2018

Yes, that is the patch I'm talking about. All of the regression tests in blktests mention what patch they depend on, see https://github.com/osandov/blktests/blob/master/tests/block/009.

@osandov
Copy link
Owner

osandov commented Apr 25, 2018

You will also need 0bd1ed4860d0 ("block: pass inclusive 'lend' parameter to truncate_inode_pages_range").

@gurugio
Copy link
Author

gurugio commented Apr 26, 2018

OK, thank you.
Then could you inform me what to do for "modprobe: FATAL: Module scsi_debug is in use." error?

osandov added a commit that referenced this issue May 24, 2018
I'm still seeing "modprobe: FATAL: Module scsi_debug is in use." errors.
At least on v4.17-rc6, this gets rid of them. Fixes GitHub issue #17.

Signed-off-by: Omar Sandoval <osandov@fb.com>
@osandov osandov closed this as completed May 24, 2018
@osandov
Copy link
Owner

osandov commented May 24, 2018

@gurugio, I just pushed a commit which should fix this.

@gurugio
Copy link
Author

gurugio commented May 24, 2018

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants