jack_client_close is locking when the jack server is periodically restarts #395

Gravechapa · 2018-11-14T18:26:59Z

Hi, i'm working on an application which must automatically reconnect to the jack server. But i'm run into some trouble when started to test it. After several times of restarting the jack server, my application ended up stuck at jack_client_close.
Here's example to reproduce the bug:

#include <jack/jack.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

jack_port_t *input_port;
jack_port_t *output_port;
jack_client_t *client;
pthread_spinlock_t spinlock;
int stop = 0;

int process (jack_nframes_t nframes, void *arg) {
    jack_default_audio_sample_t *in, *out;

    in = (jack_default_audio_sample_t *)jack_port_get_buffer (input_port, nframes);
    out = (jack_default_audio_sample_t *)jack_port_get_buffer (output_port, nframes);
    memcpy (out, in,
        sizeof (jack_default_audio_sample_t) * nframes);

    return 0;   
}

void jack_shutdown(void *arg) {
    pthread_spin_lock(&spinlock);
    stop = 1;
    pthread_spin_unlock(&spinlock);
}

void  silent_jack_error_callback(const char *msg) {}

int main() {
    pthread_spin_init(&spinlock, 0);
    jack_set_error_function(silent_jack_error_callback);
    const char *client_name = "simple";
    const char *server_name = NULL;
    jack_options_t options = JackNullOption;
    jack_status_t status;
    while(true){
    client = jack_client_open (client_name, JackNoStartServer, &status, server_name);
    if (client == NULL) {
		fprintf (stderr, "jack_client_open() failed, "
			 "status = 0x%2.0x\n", status);
		if (status & JackServerFailed) {
			fprintf (stderr, "Unable to connect to JACK server\n");
		}
		continue;
	}
    
    jack_on_shutdown(client, jack_shutdown, 0);
    jack_set_process_callback (client, process, 0);

    input_port = jack_port_register (client, "input",
                                    JACK_DEFAULT_AUDIO_TYPE,
                                    JackPortIsInput, 0);
    output_port = jack_port_register (client, "output",
                                    JACK_DEFAULT_AUDIO_TYPE,
                                    JackPortIsOutput, 0);
    
    if (jack_activate (client)) {
            fprintf (stderr, "cannot activate client");
            exit (1);
    }
    while (1) {
        pthread_spin_lock(&spinlock);
        if(stop) {
            pthread_spin_unlock(&spinlock);
            break;
        }
        pthread_spin_unlock(&spinlock);
    }
    fprintf (stderr, "stopping\n");
    jack_client_close(client);
    stop = 0;
    fprintf (stderr, "alive\n");
    }
    return 0;
}

#!/bin/bash 
while :
do 
jack_control start
sleep 1
jack_control stop
sleep 1
done

Here's stack traces of two non responding threads:

1 pthread_cond_wait * *GLIBC_2.3.2                                  0x7ffff73cdafc 
2 Jack::JackPosixProcessSync::Wait     JackPosixProcessSync.cpp 81  0x7ffff7b0350a 
3 Jack::JackMessageBuffer::Execute     JackMessageBuffer.cpp    104 0x7ffff7afc6a9 
4 Jack::JackPosixThread::ThreadHandler JackPosixThread.cpp      59  0x7ffff7b02812 
5 start_thread                                                      0x7ffff73c7a9d 
6 clone                                                             0x7ffff72f7b23

1 __lll_lock_wait                                   0x7ffff73d0edc 
2 pthread_mutex_lock                                0x7ffff73ca336 
3 Jack::JackPosixMutex::Lock JackPosixMutex.cpp 112 0x7ffff7b03ef2 
4 Jack::JackClient::Close    JackClient.cpp     118 0x7ffff7ae0096 
5 jack_client_close          JackLibAPI.cpp     219 0x7ffff7b0772f 
6 monitor                    jack_input.c       126 0x555555558ede 
7 start_thread                                      0x7ffff73c7a9d 
8 clone                                             0x7ffff72f7b23

Edit:
I tried to add this before jack_client_close:

 jack_deactivate(client);
 jack_port_unregister(client, input_port);
 jack_port_unregister(client, output_port);

Same result.

The text was updated successfully, but these errors were encountered:

7890 · 2019-01-11T20:25:50Z

Hi, I can't reproduce this. I use the test program above plus this non-DBus JACK start/stop script:

#!/bin/bash 
while :
do 
jackd -ddummy & pid=$!
sleep 1
kill -9 $pid
sleep 1
done

It seems to run "forever".

Gravechapa · 2019-01-12T12:43:18Z

I still can reproduce this bug.
If it help - I'm using arch linux with jack2-dbus 1.9.12-6
Maybe removing delays will increase chances of catching the bug:

#!/bin/bash 
STOP=0
trap STOP=1 SIGINT
while [ $STOP -eq 0 ]
do 
jack_control start
jack_control stop
done

Occasionally it requires quite some time before this bug is reproduced.

Gravechapa · 2019-01-12T14:48:13Z

Video demonstration of the bug

7890 · 2019-01-12T21:05:10Z

I can't test it with DBus ATM. Do you get the same result when removing every pthread related line? Eg. remove all spin locks. It seems to work in practically the same way. I let it run for a long time, without signs of a halt.

7890 · 2019-01-12T21:06:15Z

Also: do you get the same result when starting/stopping jackd directly? This could reduce the possible sources of the issue.

Gravechapa · 2019-01-13T13:33:59Z

Tested it without anything related to pthread - same result.
Tried direct use of the DBus functions:

while :
do 
qdbus org.jackaudio.service /org/jackaudio/Controller StartServer
sleep 1
qdbus org.jackaudio.service /org/jackaudio/Controller StopServer
sleep 1
done

The bug still appears.
But with non-DBus jack using your script I can't catch this bug.
It seems like only DBus version is affected.

7890 · 2019-01-13T16:26:57Z

Interesting find

fps · 2019-09-19T20:32:26Z

Here's a possibly related way to create the deadlock (only tested on dbus-enabled jackd):

while true; do parallel jack_wait ::: -w -w -w -w -w -w -w -w -w || break; done

This requires gnu parallel. Run this while the jack server is running. After a couple of seconds one of the jack_wait processes will hang with precisely the same backtraces (IIRC. Definitly the second one of the traces is precisely the same).

fps · 2019-09-19T20:35:19Z

Happens also in non-dbus-enabled jackd.

falkTX · 2019-09-19T20:41:51Z

can you tell me how you debug this? I am not used with the parallel tool

fps · 2020-07-06T17:08:22Z

Well, you can just install gnu parallel (it's packaged on every major distribution) and then just run the above command. On this raspberry pi 4 it just took about 15 seconds until one of them got stuck, and in turn parallel getting stuck, so no more "server is running" output.

Now just head over to another console and attach gdb to the only remaining jack_wait process.

fps · 2020-07-06T20:55:12Z

ok, here's a more detailed bt:

Attaching to process 17634
[New LWP 17637]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libthread_db.so.1".
0x0000007f9e91a6f0 in __lll_lock_wait () from /nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libpthread.so.0
(gdb) thread apply all bt

Thread 2 (Thread 0x7f9e48c1c0 (LWP 17637)):
#0  0x0000007f9e917014 in pthread_cond_wait@@GLIBC_2.17 () from /nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libpthread.so.0
#1  0x0000007f9e9ef0fc in Jack::JackPosixProcessSync::Wait (this=this@entry=0x1e288288) at ../posix/JackPosixProcessSync.cpp:81
#2  0x0000007f9e9e6be0 in Jack::JackMessageBuffer::Execute (this=0x1e280050) at ../common/JackMessageBuffer.cpp:104
#3  0x0000007f9e9ee5fc in Jack::JackPosixThread::ThreadHandler (arg=0x1e288268) at ../posix/JackPosixThread.cpp:63
#4  0x0000007f9e910834 in start_thread () from /nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libpthread.so.0
#5  0x0000007f9e795a3c in thread_start () from /nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libc.so.6

Thread 1 (Thread 0x7f9ea42010 (LWP 17634)):
#0  0x0000007f9e91a6f0 in __lll_lock_wait () from /nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libpthread.so.0
#1  0x0000007f9e912df0 in pthread_mutex_lock () from /nix/store/akyy80zkwyiy0n51kc4vx0qpxma77701-glibc-2.30/lib/libpthread.so.0
#2  0x0000007f9e9ef8ec in Jack::JackPosixMutex::Lock (this=) at ../posix/JackPosixMutex.cpp:112
#3  0x0000007f9e9d1240 in Jack::JackClient::Close (this=0x1e288490) at ../common/JackClient.cpp:118
#4  0x0000007f9e9f1600 in jack_client_close (ext_client=0x1e288490) at ../common/JackLibAPI.cpp:211
#5  0x0000000000400d90 in main (argc=, argv=) at ../example-clients/wait.c:114
(gdb)

fps · 2020-07-07T07:40:17Z

Observation: The deadlock happens more quickly when using a smaller period size: At 128/2 it happens in about 15 seconds. At 1024/2 never at all (in my experiments).It's easier to trigger on my raspberry pi 4 than on my i7 desktop PC.

fps · 2020-07-07T21:06:06Z

Observation: Above report is for version 1.9.14. Current git branch develop behaves differently: This is what happens when i run jack_wait -w often enough (see above gnu parallel usage):

[ogfx@ogfx-dev:~/nixpkgs]$ JACK_NO_AUDIO_RESERVATION=1 jackd -d alsa -d hw:iXR -p 128 -n 2
jackdmp 1.9.14
Copyright 2001-2005 Paul Davis and others.
Copyright 2004-2016 Grame.
Copyright 2016-2019 Filipe Coelho.
jackdmp comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING for details
no message buffer overruns
no message buffer overruns
no message buffer overruns
JACK server starting in realtime mode with priority 10
self-connect-mode is "Don't restrict self connect requests"
creating alsa driver ... hw:iXR|hw:iXR|128|2|48000|0|0|nomon|swmeter|-|32bit
configuring for 48000Hz, period = 128 frames (2.7 ms), buffer = 2 periods
ALSA: final selected sample format for capture: 32bit integer little-endian
ALSA: use 2 periods for capture
ALSA: final selected sample format for playback: 32bit integer little-endian
ALSA: use 2 periods for playback
ALSA: poll time out, polled for 3037 usecs, Retrying with a recovery, retry cnt = 1
JackPosixProcessSync::LockedTimedWait error usec = 10664 err = Connection timed out
JackEngine::ClientCloseAux wait error ref = 3
ALSA: poll time out, polled for 3053 usecs, Retrying with a recovery, retry cnt = 2
ALSA: poll time out, polled for 3047 usecs, Retrying with a recovery, retry cnt = 3
ALSA: poll time out, polled for 3052 usecs, Retrying with a recovery, retry cnt = 4
ALSA: poll time out, polled for 3039 usecs, Retrying with a recovery, retry cnt = 5
ALSA: poll time out, polled for 3112 usecs, Reached max retry cnt = 5, Exiting
JackAudioDriver::ProcessAsync: read error, stopping...
JackPosixProcessSync::LockedTimedWait error usec = 5000000 err = Connection timed out
Driver is not running
Cannot create new client
^CJack main caught signal 2
JackPosixProcessSync::LockedTimedWait error usec = 5000000 err = Connection timed out
Driver is not running
Cannot create new client

[ogfx@ogfx-dev:~/nixpkgs]$ ^C

The server starts up fine and is silent after

ALSA: use 2 periods for playback

Once jack_wait hangs the error messages start coming in.. The console with jack_wait looks like this:

server is available                                                                        
server is available                          
server is available
server is available                          
Cannot open wait client                                                                    
JackShmReadWritePtr1::~JackShmReadWritePtr1 - Init not done for -1, skipping unlock        
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock          
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock                                                                                                     
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock                                                                                             [0/1801]
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock          
jack_client_open() failed, status = 0x21                                                   
Cannot read socket fd = 3 err = Connection reset by peer                                   
CheckRes error                               
Could not read result type = 22                                                            
Client name = wait conflits with another running client                                    
Cannot connect to the server

fps · 2020-07-07T22:18:21Z

Hmm, if I increase the poll_timeout_ms in linux/alsa/alsa_driver.c to 2.5f * (driver->period_usecs / 1000.0f) then the deadlocking behaviour from 1.9.14 reappears.

fps · 2020-07-07T22:26:32Z

How could a client closing ever cause a poll on an alsa pcm filedescriptor to timeout. I also once tried increasing MAX_RETRY_COUNT in alsa_driver.c to 500. It just took longer for the driver to fail... Very weird.

fps · 2020-07-09T04:49:01Z

This PR #611 and using this shell script:

while true; do for n in {1..20}; do echo "-n $(uuidgen) "; done | parallel jack_wait -w; done

fixes the server fail in the alsa driver, but the deadlock on jack_client_close() still appears after a while..

jeberger · 2020-09-17T07:35:47Z

I was able to reproduce the issue with parallel jack_wait on an non-DBUS jackd. Moreover, it appears that the mutex involved is being held by a dead thread:

(gdb) bt
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1  0xf767215c in __lll_lock_wait (futex=0x26f34, private=0) at lowlevellock.c:43
#2  0xf766d522 in __GI___pthread_mutex_lock (mutex=0x26f34) at pthread_mutex_lock.c:115
#3  0xf76a8aa8 in Jack::JackPosixMutex::Lock() () from ./staging/usr/lib/libjack.so.0
#4  0xf7695414 in Jack::JackClient::Close() () from ./staging/usr/lib/libjack.so.0
#5  0xf76aaa54 in jack_client_close () from ./staging/usr/lib/libjack.so.0
#6  0x000109a4 in main ()
(gdb) frame 2
#2  0xf766d522 in __GI___pthread_mutex_lock (mutex=0x26f34) at pthread_mutex_lock.c:115
115     pthread_mutex_lock.c: No such file or directory.
(gdb) p *mutex
$1 = {__data = {__lock = 2, __count = 1, __owner = 15877, __kind = 1, __nusers = 1, {__spins = 0, __list = {__next = 0x0}}}, __size = "\002\
000\000\000\001\000\000\000\005>\000\000\001\000\000\000\001\000\000\000\000\000\000", __align = 2}

Note that the mutex owner is thread 15877, but:

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 15872.15872 "jack_wait" 0xf766d522 in __GI___pthread_mutex_lock (mutex=0x26f34) at pthread_mutex_lock.c:115
  2    Thread 15872.15875 "jack_wait" __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46

jeberger · 2020-09-17T11:58:42Z

The issue disappears if you move the JackGlobals::fSynchroMutex->Lock(); line before fChannel->Stop() in JackClient::Close.

The reason being, fChannel->Stop() kills the notifications thread, which may leave the mutex dangling if it was locked at that time.

7890 added the dbus DBus related label Jan 15, 2019

fps mentioned this issue Jul 9, 2020

[RFC][WIP] posix: Set mutex type to PTHREAD_MUTEX_ERRORCHECK #613

Open

DeviLaxmii mentioned this issue Aug 15, 2022

JackClient: Lock before killing client notification thread #889

Open

riban-bw mentioned this issue Oct 28, 2022

Audio Player crashes on exit zynthian/zynthian-issue-tracking#782

Closed

theGreatWhiteShark mentioned this issue Jan 4, 2024

Fix JACK support hydrogen-music/hydrogen#1926

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

jack_client_close is locking when the jack server is periodically restarts #395

jack_client_close is locking when the jack server is periodically restarts #395

Gravechapa commented Nov 14, 2018 •

edited

7890 commented Jan 11, 2019

Gravechapa commented Jan 12, 2019

Gravechapa commented Jan 12, 2019

7890 commented Jan 12, 2019

7890 commented Jan 12, 2019

Gravechapa commented Jan 13, 2019

7890 commented Jan 13, 2019

fps commented Sep 19, 2019

fps commented Sep 19, 2019

falkTX commented Sep 19, 2019

fps commented Jul 6, 2020

fps commented Jul 6, 2020

fps commented Jul 7, 2020

fps commented Jul 7, 2020 •

edited

fps commented Jul 7, 2020

fps commented Jul 7, 2020

fps commented Jul 9, 2020 •

edited

jeberger commented Sep 17, 2020

jeberger commented Sep 17, 2020 •

edited

jack_client_close is locking when the jack server is periodically restarts #395

jack_client_close is locking when the jack server is periodically restarts #395

Comments

Gravechapa commented Nov 14, 2018 • edited

7890 commented Jan 11, 2019

Gravechapa commented Jan 12, 2019

Gravechapa commented Jan 12, 2019

7890 commented Jan 12, 2019

7890 commented Jan 12, 2019

Gravechapa commented Jan 13, 2019

7890 commented Jan 13, 2019

fps commented Sep 19, 2019

fps commented Sep 19, 2019

falkTX commented Sep 19, 2019

fps commented Jul 6, 2020

fps commented Jul 6, 2020

fps commented Jul 7, 2020

fps commented Jul 7, 2020 • edited

fps commented Jul 7, 2020

fps commented Jul 7, 2020

fps commented Jul 9, 2020 • edited

jeberger commented Sep 17, 2020

jeberger commented Sep 17, 2020 • edited

Gravechapa commented Nov 14, 2018 •

edited

fps commented Jul 7, 2020 •

edited

fps commented Jul 9, 2020 •

edited

jeberger commented Sep 17, 2020 •

edited