Skip to content

rsahlberg/flaky-stgt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is probably not the version of TGT you are looking for.

This is a special version of TGTD that is used to emulate a flaky/broken disk
and is not useful to you unless you are a SCSI initiator driver developer or
a filesystem developer and you want to test how your error recovery paths
work when disks go bad.

Do not use this unless you are a SCSI initiator or filesystem developer!

Show all current error injections:
./usr/tgtadm --mode error --op show

Discard all current error injections:
./usr/tgtadm --mode error --op delete

Add a new error injection:
./usr/tgtadm --mode error --op new --tid 1 --lun 3 --error op=<command>,lba=<lba>,len=<len>,pct=<pct>,pause=<seconds>,repeat=<count>,action=<action>,key=<key>,asc=<asc>

<command> is the SCSI command. Supported SCSI commands are
READ6/10/12/16, WRITE6/10/12/16

<lba> is the first LBA affected by this error.

<len> is num ber of blocks, starting at <lba> that is affected by this error.

<pct> is probability in percent that this error will trigger. It is usually 100
but you can set it to something smaller to emulate a transient error.
Perhaps setting pct=10 for any READ/WRITE to make a disk fail 10% of all
operations could be a good torture test for an initiator/filesystem to test
that it can handle and redrive I/O on recoverable errors.

<seconds> is number of seconds the device will hand for errors triggered by this
rule before it will respond to the request. This is probably almost always
going to be 0.

<count> If set to > 0 then this error will only trigger <count> times before
the error rule will become disabled. This is useful for transient errors.

<action> Is the type of action that willb e taken if the rule triggers.
CHECK_CONDITION means that we will fail the I/O with a SCSI Check Condition.
For Check Condition you must also specify key and asc.

<key> is the SCSI sense key to return for check condition.
Example: key==3 means MEDIUM_ERROR and is useful for emulating damaged disks.

<asc> is the SCSI ASC to return for check conditions.
Example: asc==0x0c02 means WRITE ERROR AUTOREALLOCATION FAILED.
See SPC4 for a list of all ASCs.



===
Example: show how to create a multi-device filesystem and then "break"
one of the devices to see how the fulesystem copes.
===
#
# Create three disks and export them through flaky iSCSI
#
truncate -s 1G /data/tmp/disk1.img
truncate -s 1G /data/tmp/disk2.img
truncate -s 1G /data/tmp/disk3.img

killall -9 tgtd
./usr/tgtd -f -d 1 &

sleep 3

./usr/tgtadm --op new --mode target --tid 1 -T iqn.ronnie.test

./usr/tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b /data/tmp/disk1.img --blocksize=4096
./usr/tgtadm --op new --mode logicalunit --tid 1 --lun 2 -b /data/tmp/disk2.img --blocksize=4096
./usr/tgtadm --op new --mode logicalunit --tid 1 --lun 3 -b /data/tmp/disk3.img --blocksize=4096

./usr/tgtadm --op bind --mode target --tid 1 -I ALL


#
# connect to the three disks
#
iscsiadm --mode discoverydb --type sendtargets --portal 127.0.0.1 --discover
iscsiadm --mode node --targetname iqn.ronnie.test --portal 127.0.0.1:3260 --login
#
# check dmesg, you should now have three new 1G disks
#
# Use: iscsiadm --mode node --targetname iqn.ronnie.test \
#      --portal 127.0.0.1:3260 --logout
# to disconnect the disks when you are finished.


# create a btrfs filesystem
mkfs.btrfs -f -d raid1 /dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-1 /dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-2 /dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-3

# mount the filesystem
mount /dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-1 /mnt

# copy some big files to the filesystem

# make it impossible to write to the third disk
# 3 - MEDIUM ERROR
# 0x0c02 - WRITE ERROR AUTOREALLOCATION FAILED
#
./usr/tgtadm --mode error --op new --tid 1 --lun 3 --error op=WRITE10,lba=0,len=99999999,pct=100,pause=0,repeat=0,action=CHECK_CONDITION,key=3,asc=0x0c02

which means:
WRITE10 overlaping the area LBA:0 and the next 99999999 blocks
will in 100% of the time fail with a Check condition and medium error/write error.

Setting delay!=0 means that the scsi disk will "hang" for that many seconds
before responding.

Setting repeat!=0 means that this condition will ONLY apply that many times and then it will "self heal".


 To show all current error injects:
# ./usr/tgtadm --mode error --op show
#
# To delete/clear all current error injects:
# ./usr/tgtadm --mode error --op delete

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published