
Deadlock when issuing concurrent vgcreate/vgremove #23

Closed
gpaul opened this issue Aug 8, 2019 · 6 comments
gpaul commented Aug 8, 2019

Version information

URL: https://www.sourceware.org/pub/lvm2/LVM2.2.02.183.tgz
sha1sum: c73173d73e2ca17da254883968fbd52a6ce5c2a6

Build steps

export PKG_PATH=/opt/lvm/

./configure --with-confdir=$PKG_PATH/etc --with-default-system-dir=$PKG_PATH/etc/lvm --prefix=$PKG_PATH --sbindir=$PKG_PATH/bin --with-usrsbindir=$PKG_PATH/bin --enable-static_link
make
make install
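
For reproducibility, the tarball can be verified against the sha1sum above before building (assuming it keeps the LVM2.2.02.183.tgz name from the URL):

$ echo "c73173d73e2ca17da254883968fbd52a6ce5c2a6  LVM2.2.02.183.tgz" | sha1sum -c -
LVM2.2.02.183.tgz: OK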

What were you trying to do?

I remove a volume group using vgremove while creating a different volume group with different PVs using vgcreate.

What happened?

Both commands hang. It looks like vgcreate tries to acquire a lock that vgremove is holding.

Steps to reproduce

$ mkdir -p /var/lib/gpaul
$ dd if=/dev/zero of=/var/lib/gpaul/disk1 count=1024 bs=1M
$ dd if=/dev/zero of=/var/lib/gpaul/disk2 count=1024 bs=1M
$ dd if=/dev/zero of=/var/lib/gpaul/disk3 count=1024 bs=1M

$ losetup -f /var/lib/gpaul/disk1
$ losetup -f /var/lib/gpaul/disk2
$ losetup -f /var/lib/gpaul/disk3

$ losetup -a
/dev/loop0: [51713]:41951014 (/var/lib/gpaul/disk1)
/dev/loop1: [51713]:41951015 (/var/lib/gpaul/disk2)
/dev/loop2: [51713]:41951016 (/var/lib/gpaul/disk3)

$ pvcreate /dev/loop0
$ pvcreate /dev/loop1
$ pvcreate /dev/loop2

$ vgcreate gpaul-vg-1 /dev/loop0
$ vgremove --config="log {level=7 verbose=1}" gpaul-vg-1 & vgcreate --config="log {level=7 verbose=1}" gpaul-vg-2 /dev/loop1 /dev/loop2
[1] 22111
    Logging initialised at Thu Aug  8 12:24:47 2019
    Logging initialised at Thu Aug  8 12:24:47 2019
    Archiving volume group "gpaul-vg-1" metadata (seqno 1).
^C  Interrupted...
  Giving up waiting for lock.
  Can't get lock for gpaul-vg-1
  Cannot process volume group gpaul-vg-1
  Interrupted...
  Interrupted...
  Device /dev/loop1 excluded by a filter.
  Device /dev/loop2 excluded by a filter.
    Removing physical volume "/dev/loop0" from volume group "gpaul-vg-1"
  Volume group "gpaul-vg-1" successfully removed
    Reloading config files
$     Reloading config files

[1]+  Done                    vgremove --config="log {level=7 verbose=1}" gpaul-vg-1
$ date
Thu Aug  8 12:25:01 UTC 2019
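
For completeness, state can be torn down between attempts along these lines (assuming the loop device assignments above; gpaul-vg-2 may or may not exist depending on when the hang was interrupted):

$ vgremove -f gpaul-vg-1 gpaul-vg-2 2>/dev/null
$ losetup -d /dev/loop0 /dev/loop1 /dev/loop2
$ rm /var/lib/gpaul/disk1 /var/lib/gpaul/disk2 /var/lib/gpaul/disk3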

Note: in the interleaved logging that follows, process 22112 is vgcreate and process 22111 is vgremove.

I'm attaching the interleaved verbose debug logs for both processes as sent to journald.
lvm-deadlock.log

tasleson (Member) commented Aug 8, 2019

gpaul commented Aug 8, 2019

Also of note: the lvm.conf I use is different from the one bundled with the RHEL lvm2 RPMs; it is a fairly standard lvm.conf as generated by the ./configure parameters above. I've attached it anyway.
lvm.conf.txt

gpaul commented Aug 8, 2019

@tasleson indeed, that looks related. It seems there are still some deadlock issues lurking about.

gpaul commented Aug 8, 2019

This looks identical, actually:

...
Aug 08 12:24:47 lvm[22111]: Dropping cache for #orphans.
Aug 08 12:24:47 lvm[22111]: Locking /run/lock/lvm/P_orphans WB
Aug 08 12:24:47 lvm[22111]: _do_flock /run/lock/lvm/P_orphans:aux WB
Aug 08 12:24:47 lvm[22111]: _do_flock /run/lock/lvm/P_orphans WB
...

Here the first process (22111, vgremove) is acquiring /run/lock/lvm/P_orphans.

...
Aug 08 12:24:47 lvm[22112]: Locking /run/lock/lvm/V_gpaul-vg-1 RB
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/V_gpaul-vg-1:aux WB
Aug 08 12:24:47 lvm[22112]: _undo_flock /run/lock/lvm/V_gpaul-vg-1:aux
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/V_gpaul-vg-1 RB
Aug 08 12:24:57 lvm[22112]: Interrupted...
...

...and the second process (22112, vgcreate) tries to acquire the volume group lock and blocks.
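
A filtered view like the one below can be pulled from the attached lvm-deadlock.log with something like:

$ grep -E '_(do|undo)_flock' lvm-deadlock.log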

If we look at _do_flock and _undo_flock calls only:

Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/V_gpaul-vg-2:aux WB
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/V_gpaul-vg-2 WB
Aug 08 12:24:47 lvm[22112]: _undo_flock /run/lock/lvm/V_gpaul-vg-2:aux
Aug 08 12:24:47 lvm[22112]: _undo_flock /run/lock/lvm/V_gpaul-vg-2
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/P_orphans:aux WB
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/P_orphans WB
Aug 08 12:24:47 lvm[22112]: _undo_flock /run/lock/lvm/P_orphans:aux
Aug 08 12:24:47 lvm[22111]: _do_flock /run/lock/lvm/V_gpaul-vg-1:aux WB
Aug 08 12:24:47 lvm[22111]: _do_flock /run/lock/lvm/V_gpaul-vg-1 WB
Aug 08 12:24:47 lvm[22111]: _undo_flock /run/lock/lvm/V_gpaul-vg-1:aux
Aug 08 12:24:47 lvm[22111]: _do_flock /run/lock/lvm/P_orphans:aux WB
Aug 08 12:24:47 lvm[22111]: _do_flock /run/lock/lvm/P_orphans WB
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/V_gpaul-vg-1:aux WB
Aug 08 12:24:47 lvm[22112]: _undo_flock /run/lock/lvm/V_gpaul-vg-1:aux
Aug 08 12:24:47 lvm[22112]: _do_flock /run/lock/lvm/V_gpaul-vg-1 RB
... deadlocked, eventually interrupted with ctrl+c ...
Aug 08 12:24:57 lvm[22112]: _undo_flock /run/lock/lvm/P_orphans
Aug 08 12:24:57 lvm[22111]: _undo_flock /run/lock/lvm/P_orphans:aux
Aug 08 12:24:57 lvm[22111]: _undo_flock /run/lock/lvm/P_orphans
Aug 08 12:24:57 lvm[22111]: _undo_flock /run/lock/lvm/V_gpaul-vg-1

gpaul commented Aug 8, 2019

Yeah:
It looks like 22112 (vgcreate) acquires the P_orphans lock, then the V_gpaul-vg-1 lock.
It looks like 22111 (vgremove) acquires the V_gpaul-vg-1 lock, then the P_orphans lock.
The two processes take the same pair of locks in opposite orders, so each ends up waiting for the lock the other one holds.
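
The same inversion is easy to reproduce in isolation with flock(1) from util-linux; nothing LVM-specific here, /tmp/lockA and /tmp/lockB just stand in for P_orphans and V_gpaul-vg-1, and the sleeps widen the race window:

$ touch /tmp/lockA /tmp/lockB
$ flock /tmp/lockA sh -c 'sleep 2; flock /tmp/lockB true' &   # takes A, then wants B
$ flock /tmp/lockB sh -c 'sleep 2; flock /tmp/lockA true' &   # takes B, then wants A
(both jobs now wait on each other's lock indefinitely, like 22111 and 22112 above)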

teigland (Contributor) commented Aug 8, 2019

This is a design problem in lvm locking, which uses two "global" locks, uses them inconsistently, and in the wrong places. It is fixed in the lvm 2.03 releases by:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=8c87dda195ffadcce1e428d3481e8d01080e2b22
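
Illustratively, the general principle behind such a fix is a single consistent lock order; with the flock(1) sketch above, making both sides take lockA first turns the deadlock into plain waiting (an illustration only, not the actual 2.03 change):

$ flock /tmp/lockA sh -c 'sleep 2; flock /tmp/lockB true' &
$ flock /tmp/lockA sh -c 'sleep 2; flock /tmp/lockB true' &
(the second job simply waits for the first to release lockA; no deadlock)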
