Concurrent restarts cause lxd to freeze #2189

Closed
candlerb opened this Issue Jul 7, 2016 · 17 comments

@candlerb

candlerb commented Jul 7, 2016

Required information

  • Distribution: ubuntu
  • Distribution version: 16.04
  • The output of "lxc info"
apicompat: 0
auth: trusted
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: <<snip>>
  driver: lxc
  driverversion: 2.0.1
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-28-generic
  server: lxd
  serverpid: 2518
  serverversion: 2.0.2
  storage: zfs
  storageversion: "5"
config:
  images.auto_update_cached: "false"
  images.auto_update_interval: "0"
  storage.zfs_pool_name: lxd
public: false

Issue description

Trying to restart containers concurrently causes some of them to get wedged in a stopped-but-not-startable state.

Steps to reproduce

Create a collection of, say, 10 containers (here I'm doing this in parallel and it works fine):

for c in foo{1..10}; do lxc launch images:ubuntu/xenial/amd64 $c & done
wait

Now try to restart them in parallel:

for c in foo{1..10}; do lxc restart $c & done

What I find is that typically one of them won't restart: one of the lxc restart commands freezes.

... seeing most of them terminate
... but even after a while:
root@vtp:~# jobs
[4]+  Running                 lxc restart $c &

lxc list shows that container in the stopped state, but if I try to start it, it hangs:

root@vtp:~# kill %4
root@vtp:~# lxc info foo4
Name: foo4
Architecture: x86_64
Created: 2016/07/07 19:50 UTC
Status: Stopped
Type: persistent
Profiles: default
[4]+  Terminated              lxc restart $c
root@vtp:~# lxc start foo4
<< hangs here >>

If I then try the concurrent restarting again, further lxc commands get wedged:

root@vtp:~# for c in foo{1..10}; do echo $c; lxc restart $c & done
foo1
[1] 15065
foo2
[2] 15066
foo3
[3] 15067
foo4
[4] 15068
foo5
[5] 15069
foo6
[6] 15070
foo7
[7] 15073
foo8
[8] 15075
foo9
[9] 15077
foo10
[10] 15080
root@vtp:~# error: container is not running
Try `lxc info --show-log foo4` for more info

[2]   Done                    lxc restart $c
[3]   Done                    lxc restart $c
[4]   Exit 1                  lxc restart $c
[5]   Done                    lxc restart $c
[6]   Done                    lxc restart $c
[7]   Done                    lxc restart $c
[8]   Done                    lxc restart $c
[9]-  Done                    lxc restart $c
root@vtp:~#
root@vtp:~# jobs
[1]-  Running                 lxc restart $c &
[10]+  Running                 lxc restart $c &
root@vtp:~# lxc list | grep STOPPED
| foo1          | STOPPED |      |      | PERSISTENT | 0         |
| foo10         | STOPPED |      |      | PERSISTENT | 0         |
| foo4          | STOPPED |      |      | PERSISTENT | 0         |

I can do lxc delete on the wedged containers, but even after recreating them there seem to be problems with the lxd daemon:

root@vtp:~# for c in foo{1..10}; do lxc launch images:ubuntu/xenial/amd64 $c & done
... this is OK

... now I try to restart them *sequentially*
root@vtp:~# for c in foo{1..10}; do echo $c; lxc restart $c; done
foo1
foo2
foo3
foo4
<< hangs here >>

The only way I know to fix it is to reboot the host.

Information to attach

  • any relevant kernel output (dmesg)

I don't see anything obvious in here.

[11644.080361] lxdbr0: port 9(vethASWV1M) entered disabled state
[11644.080966] device vethASWV1M left promiscuous mode
[11644.080970] lxdbr0: port 9(vethASWV1M) entered disabled state
[11644.324322] lxdbr0: port 8(vethXCS51T) entered disabled state
[11644.324484] lxdbr0: port 2(veth1FA7N9) entered disabled state
[11644.324601] lxdbr0: port 10(vethCS271E) entered disabled state
[11644.324715] lxdbr0: port 4(veth2O02P1) entered disabled state
[11644.324825] lxdbr0: port 5(vethULAMI9) entered disabled state
[11644.324972] lxdbr0: port 1(veth185S1W) entered disabled state
[11644.325082] lxdbr0: port 7(vethC4U3IT) entered disabled state
[11644.325202] lxdbr0: port 6(veth5ARMOG) entered disabled state
[11644.325307] lxdbr0: port 3(veth6OF281) entered disabled state
[11644.329710] device vethXCS51T left promiscuous mode
[11644.329714] lxdbr0: port 8(vethXCS51T) entered disabled state
[11644.348973] device veth1FA7N9 left promiscuous mode
[11644.348977] lxdbr0: port 2(veth1FA7N9) entered disabled state
[11644.373028] device vethCS271E left promiscuous mode
[11644.373033] lxdbr0: port 10(vethCS271E) entered disabled state
[11644.392950] device veth2O02P1 left promiscuous mode
[11644.392962] lxdbr0: port 4(veth2O02P1) entered disabled state
[11644.413770] device vethULAMI9 left promiscuous mode
[11644.413774] lxdbr0: port 5(vethULAMI9) entered disabled state
[11644.438779] device veth185S1W left promiscuous mode
[11644.438788] lxdbr0: port 1(veth185S1W) entered disabled state
[11644.466886] device vethC4U3IT left promiscuous mode
[11644.466895] lxdbr0: port 7(vethC4U3IT) entered disabled state
[11644.482187] device veth5ARMOG left promiscuous mode
[11644.482196] lxdbr0: port 6(veth5ARMOG) entered disabled state
[11644.508727] device veth6OF281 left promiscuous mode
[11644.508730] lxdbr0: port 3(veth6OF281) entered disabled state
[11644.670517] audit: type=1400 audit(1467920257.257:132): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo6_</var/lib/lxd>" pid=11865 comm="apparmor_parser"
[11644.856593] audit: type=1400 audit(1467920257.445:133): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo2_</var/lib/lxd>" pid=11955 comm="apparmor_parser"
[11644.861701] audit: type=1400 audit(1467920257.449:134): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo7_</var/lib/lxd>" pid=11958 comm="apparmor_parser"
[11644.870041] audit: type=1400 audit(1467920257.457:135): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo10_</var/lib/lxd>" pid=11959 comm="apparmor_parser"
[11644.885573] audit: type=1400 audit(1467920257.473:136): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo3_</var/lib/lxd>" pid=11972 comm="apparmor_parser"
[11644.889624] audit: type=1400 audit(1467920257.477:137): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo4_</var/lib/lxd>" pid=11968 comm="apparmor_parser"
[11644.910261] audit: type=1400 audit(1467920257.497:138): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo5_</var/lib/lxd>" pid=11977 comm="apparmor_parser"
[11644.934184] audit: type=1400 audit(1467920257.521:139): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo8_</var/lib/lxd>" pid=11996 comm="apparmor_parser"
[11644.940937] audit: type=1400 audit(1467920257.529:140): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo1_</var/lib/lxd>" pid=12001 comm="apparmor_parser"
[11644.966730] audit: type=1400 audit(1467920257.553:141): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-foo9_</var/lib/lxd>" pid=12005 comm="apparmor_parser"
  • container log (lxc info NAME --show-log)

foo1.log.txt

  • main daemon log (/var/log/lxd.log)

lxd.log.txt

@stgraber

stgraber commented Jul 8, 2016

Could you try the same with "lxc restart --force" instead, to rule out problems with the container's init system?

@candlerb

candlerb commented Jul 8, 2016

With lxc restart --force it seems happy enough (tried 5 times).

Without --force, one of the ten containers hung on the first attempt (foo5, which showed in the STOPPED state).

@stgraber

stgraber commented Jul 8, 2016

Ok, for good measure, could you try with Ubuntu 14.04 (trusty) containers?

I want to check whether this is likely to be a systemd vs upstart thing.

@candlerb

candlerb commented Jul 8, 2016

Still using Ubuntu 16.04 as the base system:

for c in trusty{1..10}; do lxc launch images:ubuntu/trusty/amd64 $c & done; wait

completes successfully.

for c in trusty{1..10}; do lxc restart $c & done; wait

No, this still hangs (in my test the third container hung).

@stgraber

stgraber commented Jul 8, 2016

Ok. It could be that you're somehow racing the SIGINT handler being set up by the container, depending on how long it takes for the container to become functional and how long you were waiting between the launch loop and the restart loop.

To clarify, "lxc restart" simply sends SIGINT to the container's PID1; it's then entirely up to the container to perform the reboot. There is no timeout and no further LXD involvement after the signal is sent.

In contrast, --force has LXD forcefully kill the container, without any kind of communication with PID1 inside the container.
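[Editor's note] A minimal sketch of avoiding that race by waiting until each container can at least service commands before restarting it. The probe command ("true") and the one-second poll interval are illustrative; a successful exec only shows the container is up, not that its init has installed a SIGINT handler, so this narrows the window rather than closing it.

#!/bin/bash
# Sketch: poll each container until "lxc exec" succeeds, then restart in parallel.
for c in foo{1..10}; do
  until lxc exec "$c" -- true >/dev/null 2>&1; do
    sleep 1
  done
done
for c in foo{1..10}; do lxc restart "$c" & done
wait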

@stgraber stgraber added the Incomplete label Jul 8, 2016

@candlerb

candlerb commented Jul 8, 2016

How long is long enough - 20 seconds?

The following script seems to reproduce it easily.

#!/bin/bash
for c in trusty{1..10}; do lxc launch images:ubuntu/trusty/amd64 $c & done
date
echo "Waiting for launch complete..."
wait
echo "OK"

while true; do
  echo "Sleeping"
  sleep 20
  echo "Restarting"
  for c in trusty{1..10}; do lxc restart $c & done
  date
  echo "Waiting for restart complete..."
  wait
  echo "OK"
done

But if lxc is just sending a signal, presumably this means that somehow the containers interact with each other during a restart.

Is systemd responsible for managing the containers or their restarts?

@stgraber

stgraber commented Jul 8, 2016

LXD sends SIGINT to each of the containers you're restarting. It's then entirely up to PID1 in the container to do its thing.

Typically, when things don't reboot it's because the signal handler hasn't been set up yet by the init system in the container, or because the init system isn't accepting the reboot request yet (because it thinks it's still booting).

There isn't anything LXD can do about this, since there isn't even a way for us to know whether something reacted to the signal; the kernel just tells us that it has queued the signal, that's it.

@candlerb

candlerb commented Jul 8, 2016

There isn't anything LXD can do about this, since there isn't even a way for us to know whether something reacted to the signal; the kernel just tells us that it has queued the signal, that's it.

OK, but:

  1. From the user's point of view, lxc restart just hangs. It must be waiting for something. The container status shown by "lxc list" is that the container is STOPPED. Maybe it is waiting for the container to go back into RUNNING? Does it wait forever?

  2. I don't know what PID 1 does in response to the signal; is it supposed to terminate and something else restart it? Then maybe the problem is with that "something else".

  3. I'm quite happy to accept that the problem is not with lxd per se, but with the underlying container mechanism that it uses. However, I don't know how to analyse or where to report the underlying issue. The net result for me is that lxd is not reliable, in a fairly reproducible way.

    I can try to minimise the problem by not intentionally restarting two containers at the same time; but if there is a buried race condition I'm sure it will bite again sooner or later. (A sketch of that sequential approach follows below.)
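[Editor's note] A sketch of the mitigation from point 3: restart containers one at a time, bound each clean restart, and fall back to a forced restart if the clean one wedges. The 120-second budget and the --force fallback (which the earlier test handled fine) are illustrative choices; "timeout" is the GNU coreutils utility.

#!/bin/bash
# Sketch: sequential restarts, bounded per container, with a forced-restart fallback.
for c in foo{1..10}; do
  if ! timeout 120 lxc restart "$c"; then
    echo "clean restart of $c hung or failed; forcing" >&2
    lxc restart --force "$c"
  fi
done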

@stgraber

stgraber commented Aug 26, 2016

One thing you may be running into is the kernel's inotify handle limit; there are some pointers in doc/production-setup.md on which sysctls should be bumped to avoid that.

You are right that an error should be printed when a container that's asked to restart gets stuck in the stopped state. LXD is waiting for it to go back to the running state, but I'm kinda surprised that we can hit a code path where this happens without also sending an error to the client.
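[Editor's note] For reference, the inotify limits in question are kernel sysctls. A snippet along the lines of doc/production-setup.md might look like this; the file name and the values are illustrative, not quoted from that document.

# /etc/sysctl.d/99-lxd-inotify.conf (illustrative values)
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576

Apply with "sysctl --system" (or reboot) and re-run the parallel restart loop.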

@stgraber

stgraber commented Sep 23, 2016

I fixed a number of races in container stop/restart recently which, combined with what I posted before, could explain what you saw. Tentatively closing this.

@stgraber stgraber closed this Sep 23, 2016

@candlerb

candlerb commented Oct 22, 2016

lxd 2.0.5 has now made its way into the Ubuntu release.

Unfortunately the problem is still there; in fact it's worse. Using the reproducing test case from the start of this ticket:

for c in foo{1..10}; do lxc launch images:ubuntu/xenial/amd64 $c & done
wait

for c in foo{1..10}; do lxc restart $c & done

On the first attempt, four of those containers hung in lxc restart. After several minutes:

$ jobs
[2]   Running                 lxc restart $c &
[3]   Running                 lxc restart $c &
[5]-  Running                 lxc restart $c &
[7]+  Running                 lxc restart $c &
$ lxc list
+-------+---------+---------------------+------+------------+-----------+
| NAME  |  STATE  |        IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+-------+---------+---------------------+------+------------+-----------+
| foo1  | STOPPED |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo10 | RUNNING |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo2  | STOPPED |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo3  | RUNNING |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo4  | STOPPED |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo5  | RUNNING |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo6  | STOPPED |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo7  | RUNNING |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo8  | RUNNING |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+
| foo9  | RUNNING |                     |      | PERSISTENT | 0         |
+-------+---------+---------------------+------+------------+-----------+

@stgraber stgraber reopened this Oct 23, 2016

@brauner

brauner commented Nov 23, 2016

I have a theory about what might be going on: our low-level LXC logging is probably not thread-safe, so when we set lxc.loglevel and lxc.logfile for each container and then do lxc restart, we somehow seem to trigger deadlocks. If I remove the automatic log settings from the LXD codebase I cannot reproduce the issue; with them in place, I can. This also seems to be confirmed at the low-level LXC layer by running /lxc-test-concurrent -j 20 -m create,start,stop,destroy without logging and with logging (/lxc-test-concurrent -j 20 -m create,start,stop,destroy -D). With logging enabled, it deadlocks every so often.

@brauner

brauner commented Nov 23, 2016

So, e.g.

 for c in foo{1..20}; do lxc launch images:ubuntu/yakkety $c; done
 for c in foo{1..20}; do lxc restart $c & done

with logging for each container leaves some containers in the state described here, while the same without the default logging in the codebase does not seem to reproduce this.

@brauner

brauner commented Nov 25, 2016

This should be fixed by: lxc/lxc#1319.

@brauner brauner referenced this issue in lxc/lxc Nov 25, 2016

Merged

log: fix race #1319

@stgraber

stgraber commented Nov 25, 2016

Merged the LXC fix, so closing this LXD issue. The fix will be in LXC 2.0.7, or you can grab a daily build from one of the LXC PPAs (tomorrow's will have the change).
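[Editor's note] For anyone wanting the daily build on the affected Ubuntu 16.04 host, the usual route would be something like the following; the PPA name is an assumption, so verify it against the LXC team's Launchpad page before using it.

# Assumed PPA name; verify before use.
sudo add-apt-repository ppa:ubuntu-lxc/daily
sudo apt update
sudo apt install lxd liblxc1   # pulls the updated liblxc from the PPA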

@stgraber stgraber closed this Nov 25, 2016

@brauner

brauner commented Nov 26, 2016

I've been hammering on parallel restarts for a couple of hours now using a liblxc that includes the fixes I pushed today. The only problems with hanging threads I encountered are explained by init inside the container not responding to the shutdown signal, causing lxc restart to hang.

@candlerb

candlerb commented Dec 2, 2016

Thank you all. With lxd 2.0.8, the restart problem is gone.
