Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fail to stop daemon-server #16

Closed
wangwang3210 opened this issue Apr 8, 2019 · 2 comments
Closed

fail to stop daemon-server #16

wangwang3210 opened this issue Apr 8, 2019 · 2 comments

Comments

@wangwang3210
Copy link

Found a bug in function daemon_start from libdaemon/server/daemon-server.c. It may occur in daemons like lvmetad, lvmlockd, lvmpolld.
In stable-2.02 branch, if we use the daemon lvmetad without -t option. when lvmetad is running a sub process, if the daemon recieve a SIGTERM from systemd. _shutdown_requested will turn to 1, s.threads->next is not NULL, then the while loop could not break, in the next loop, select will be blocked. Sometimes, it will lead to timeout of shutdown machine.
It could be reproduced following the steps:
1.run process 1 like,

while :
do
    lvs
done

2.run process 2 like,

while : 
do 
    service lvm2-lvmetad stop
    if [ $? -ne 0 ];then 
        echo "fail to stop"
        break
    fi
    sleep 2
done

3.soon, process 2 will be blocked, then press "CTRL+C" to stop process 1, we will see process still blocke there. Using gdb attach to lvm2-lvmetad daemon, we will see, lvm2-lvmetad blocket in select function of daemon_start. Although, at this time, _shutdown_requested is 1, s.threads->next is NULL(the _client_thread thread has already quit).

Suggested patch:

diff --git a/libdaemon/server/daemon-server.c b/libdaemon/server/daemon-server.c
index a2216ac..c753473 100644
--- a/libdaemon/server/daemon-server.c
+++ b/libdaemon/server/daemon-server.c
@@ -559,6 +559,8 @@ void daemon_start(daemon_state s)
        thread_state _threads = { .next = NULL };
        unsigned timeout_count = 0;
        fd_set in;
+ struct timeval slect_timeout = { .tv_sec = 1, .tv_usec = 0 };
+        struct timeval *ptimeout;
 
        /*
         * Switch to C locale to avoid reading large locale-archive file used by
@@ -643,9 +645,15 @@ void daemon_start(daemon_state s)
 
        while (!failed) {
                _reset_timeout(s);
+                slect_timeout.tv_sec = 1;
+                slect_timeout.tv_usec = 0;
                FD_ZERO(&in);
                FD_SET(s.socket_fd, &in);
-           if (select(FD_SETSIZE, &in, NULL, NULL, _get_timeout(s)) < 0 && errno != EINTR)
+                ptimeout = _get_timeout(s);
+         if (_shutdown_requested && !ptimeout)
+                 ptimeout = &slect_timeout;
+
+         if (select(FD_SETSIZE, &in, NULL, NULL, ptimeout) < 0 && errno != EINTR)
                        perror("select error");
                if (FD_ISSET(s.socket_fd, &in)) {
                        timeout_count = 0;
@zkabelac
Copy link
Contributor

zkabelac commented Apr 9, 2019

This does not look as a correct solution - but rather a race 'shift' to lower the chances to hit it - but IMHO it should be handled properly with pselect(). - I'll take a look if there is an easy way to provide correct patch.

@zkabelac
Copy link
Contributor

Already fixed upstream with 44cfa55 and few follow up patches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants