FIX(client): Prevent endless loop when disconnecting PipeWire stream #5648

Merged
merged 1 commit into mumble-voip:master from vimpostor:pw_freeze on May 6, 2022

Conversation

vimpostor
Contributor

@vimpostor vimpostor commented May 5, 2022

Since version 0.3.51, PipeWire has support for buffers with a size of 0 being passed in the process callback. More specifically, this feature was added in commit 96286fb8b11658a0fdaa61194504a3f9541b25e6 ("resample: use a -1 buffer size to drain") in PipeWire.

Unfortunately, this causes an endless loop, because we try to destroy the PipeWire stream, but it keeps waiting for the node to be completely drained, which now never happens because it keeps calling processCallback() where we feed buffers with a size of 0 to it.

To fix this problem, we no longer pass 0-size buffers to PipeWire.
Instead, we now push silence if we don't have any data available. This is what PipeWire prefers anyway: the process callback wants the buffer to be filled. If we don't do that, the node will stay in the SPA_STATUS_NEED_DATA state, and even if we ignore the obvious endless-loop error, this can cause other problems.
For example, the above-mentioned commit was fixed in PipeWire with 32e957345d63a6d6e3d223057d764ff661f5d101 ("audioadapter: don't loop forever"), which is better than the endless loop but still preventable.
So, in order to avoid unnecessary xruns, we push silence instead of empty buffers.

Fixes #5647.

@davidebeatrici
Member

Probably a race condition. The proper fix would be to lock the loop every time we perform an action on it; see https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2126.

From https://docs.pipewire.org/page_thread_loop.html#sec_thread_loop_locking:

Since the PipeWire API doesn't allow concurrent accesses to objects, a locking scheme must be used to guarantee safe usage. The threaded loop API provides such a scheme through the functions pw_thread_loop_lock() and pw_thread_loop_unlock().

The lock is recursive, so it's safe to use it multiple times from the same thread. Just make sure you call pw_thread_loop_unlock() the same number of times you called pw_thread_loop_lock().

The lock needs to be held whenever you call any PipeWire function that uses an object associated with this loop. Make sure you do not hold on to the lock more than necessary though, as the threaded loop stops while the lock is held.
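
For illustration, a minimal sketch of that locking scheme, assuming loop and stream are already-created handles and using an arbitrary control as the guarded call (names are illustrative, not Mumble's actual members):

	#include <pipewire/pipewire.h>
	#include <spa/param/props.h>

	// Hold the thread-loop lock around any PipeWire call that touches
	// objects owned by that loop; the loop thread is paused meanwhile.
	void setVolumeSafely(pw_thread_loop *loop, pw_stream *stream, float volume) {
		pw_thread_loop_lock(loop); // recursive, as the docs say
		pw_stream_set_control(stream, SPA_PROP_volume, 1, &volume, 0);
		pw_thread_loop_unlock(loop); // balance every lock call
	}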

@davidebeatrici davidebeatrici added the client, audio and bug (A bug (error) in the software) labels May 5, 2022
@vimpostor
Contributor Author

Probably a race condition. The proper fix would be to lock the loop every time we perform an action on it

Nope, it's not a race condition. Callbacks that come from PipeWire do not need to be guarded. We only need to guard calls that we make ourselves, and only if they happen during the lifetime of the threaded loop. Since there are currently no such calls, there is nothing to guard:

  • We create the stream before we create the threaded loop
  • On destruction we stop the threaded loop before we do anything

I should probably have mentioned it in the PR description, but the commit message contains detailed information on the source of the bug.
The tl;dr is that the source of the problem is not a race condition, but a behavioural change in PipeWire that I bisected. The lifetime guarantees from the two bullets above are sketched below.
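
A minimal sketch of that ordering, assuming core, props and loop are the handles Mumble creates elsewhere (names are illustrative, not the actual Mumble members):

	// Creation: the stream exists before the loop thread runs.
	pw_stream *stream = pw_stream_new(core, "mumble", props);
	pw_thread_loop_start(loop); // only now can callbacks fire concurrently

	// Destruction: stop the loop first, then the stream can be torn down
	// without racing the process callback.
	pw_thread_loop_stop(loop);
	pw_stream_destroy(stream);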

@davidebeatrici
Member

Could you point me to the exact commit in PipeWire's repository, please?

@Krzmbrzl
Member

Krzmbrzl commented May 6, 2022

@davidebeatrici see the commit message. It mentions the commit hash.

@davidebeatrici
Member

davidebeatrici commented May 6, 2022

Oh, sorry. I assumed the commit message was the same as the PR description and missed it.

@vimpostor What about deactivating the stream with pw_stream_set_active()? The callback should not be called anymore after that.
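
For reference, a minimal sketch of that suggestion, assuming loop and stream are the existing handles:

	// Deactivate the stream before teardown so the process callback stops
	// being scheduled; locked as per the docs quoted above.
	pw_thread_loop_lock(loop);
	pw_stream_set_active(stream, false);
	pw_thread_loop_unlock(loop);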

@Krzmbrzl
Member

Krzmbrzl commented May 6, 2022

And just one small detail in terms of nomenclature: to my understanding, what we are seeing here is not actually a deadlock but an endless loop, right? From the description of the bug and its solution, it does not appear to be related to locks at all, but rather to one process waiting for the queue to be drained while the other keeps pushing new (in our case empty) content into it. Thus, the first process will essentially wait indefinitely.

If my understanding is correct, then it might be a bit clearer if we used "endless loop" rather than "deadlock" in the commit title. Otherwise someone might stumble upon this commit in the future and start wondering how a deadlock can be fixed without actually doing anything lock-related :)

@davidebeatrici
Member

I agree.

@vimpostor vimpostor changed the title FIX(client): Prevent deadlock when disconnecting PipeWire stream FIX(client): Prevent endless loop when disconnecting PipeWire stream May 6, 2022
@vimpostor
Contributor Author

And just one small detail in terms of nomenclature: to my understanding, what we are seeing here is not actually a deadlock but an endless loop, right? From the description of the bug and its solution, it does not appear to be related to locks at all, but rather to one process waiting for the queue to be drained while the other keeps pushing new (in our case empty) content into it. Thus, the first process will essentially wait indefinitely.

Everything you said is correct. And I agree, I misused the word "deadlock" here.

What about deactivating the stream with pw_stream_set_active()?

I can try whether that fixes the problem too, but I fear it might suffer from the same issue. If you look at the issue, you can see in my backtrace that pw_impl_node_set_active() is called automatically with active == false when the stream is destroyed.
In any case, the check added in this PR might make sense anyway (not passing a buffer when the PipeWire engine is not isOk()).
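
Sketched out, such a guard could look roughly like this; PipeWireOutput, m_engine and processCallback() follow the diffs in this PR, the rest is an assumption:

	// Bail out early while the engine is tearing down, instead of queueing
	// further buffers (rough shape only, not the final diff).
	void PipeWireOutput::processCallback(void *param) {
		PipeWireOutput *pwo = static_cast< PipeWireOutput * >(param);
		if (!pwo->m_engine->isOk()) {
			return;
		}
		// ... dequeue the buffer, mix, and queue it back as before ...
	}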

@vimpostor
Contributor Author

vimpostor commented May 6, 2022

I think it is safe to say that solving this by checking for isOk() is the best way right now.
But I might also open an issue at PipeWire, asking whether it is really intended that pw_stream_destroy() hangs forever when the node is not drained. Of course, it is also a bit weird behaviour on our side that we keep passing buffers of size 0 forever (in my opinion, that should not happen in any case, hence I think this check would be useful even before PipeWire 0.3.51).

I will open an issue at PipeWire now, but I guess Wim Taymans would probably respond that we should not pass buffers of size 0 forever.

Edit: Issue reported at https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2359

@Krzmbrzl Krzmbrzl left a comment
Member

LGTM, but I'll leave it to Davide to merge as he is the one who is more familiar with our PipeWire implementation.

@vimpostor
Contributor Author

vimpostor commented May 6, 2022

Another way to fix this would be this simple change:

diff --git a/src/mumble/PipeWire.cpp b/src/mumble/PipeWire.cpp
index 6540f42ca..078301403 100644
--- a/src/mumble/PipeWire.cpp
+++ b/src/mumble/PipeWire.cpp
@@ -418,7 +418,7 @@ void PipeWireOutput::processCallback(void *param) {
 	if (pwo->mix(data.data, frames)) {
 		chunk->size = frames * chunk->stride;
 	} else {
-		chunk->size = 0;
+		chunk->size = static_cast<uint32_t>(-1);
 	}
 
 	pwo->m_engine->queueBuffer(buffer);

I think I might actually prefer this, but it relies on implementation details inside PipeWire (it is not publicly documented that PipeWire interprets a buffer size of -1 as 0, with the additional effect of starting the drain).

@Krzmbrzl
Member

Krzmbrzl commented May 6, 2022

but it relies on implementation details inside PipeWire (it is not publicly documented that PipeWire interprets a buffer size of -1 as 0, with the additional effect of starting the drain).

I think we should rather not depend on this kind of thing 🤔

@wtay

wtay commented May 6, 2022

but it relies on implementation details inside PipeWire (it is not publicly documented that PipeWire interprets a buffer size of -1 as 0, with the additional effect of starting the drain).

Please don't do that; the size is not actually valid then and might just be clamped to the buffer's maxsize. It probably won't result in the same kind of drain as the 0-size buffer used to.

What you should ideally do is use the requested field on the buffer and push exactly that many samples into the stream. If you don't have data, push silence.

Otherwise, if you expected a 0-size buffer to start a drain, just replace the 0-size buffer push with pw_stream_flush(stream, true);
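
Put into code, those two suggestions might look roughly like this inside the process callback; pwo, mix() and the buffer setup follow this PR's diffs, and pw_buffer::requested (available since PipeWire 0.3.49) may be 0 when the graph has no preference:

	// Option 1: fill exactly the requested number of frames, pushing
	// silence when the mixer has no data available.
	spa_data &data   = buffer->buffer->datas[0];
	spa_chunk *chunk = data.chunk;

	const uint32_t maxFrames = data.maxsize / chunk->stride;
	const uint32_t requested = static_cast< uint32_t >(buffer->requested);
	const uint32_t frames    = requested ? std::min(maxFrames, requested) : maxFrames;

	chunk->size = frames * chunk->stride;
	if (!pwo->mix(data.data, frames)) {
		memset(data.data, 0, chunk->size); // silence instead of a 0-size buffer
	}

	// Option 2: if a drain is really wanted, request it explicitly instead
	// of signalling it via the buffer size:
	// pw_stream_flush(stream, true); // true == drain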

@vimpostor
Contributor Author

vimpostor commented May 6, 2022

@wtay Thanks, just pushing silence instead of buffers with size 0 did the trick!

diff --git a/src/mumble/PipeWire.cpp b/src/mumble/PipeWire.cpp
index 6540f42ca..5b184057d 100644
--- a/src/mumble/PipeWire.cpp
+++ b/src/mumble/PipeWire.cpp
@@ -415,10 +415,14 @@ void PipeWireOutput::processCallback(void *param) {
 
 	const uint32_t frames = std::min(data.maxsize / chunk->stride, pwo->iFrameSize);
 
-	if (pwo->mix(data.data, frames)) {
-		chunk->size = frames * chunk->stride;
-	} else {
-		chunk->size = 0;
+	chunk->size = frames * chunk->stride;
+	if (!pwo->mix(data.data, frames)) {
+		// When the mixer has no data available to write, we still need to push silence.
+		// This is to avoid an infinite loop when destroying the stream.
+		// In that infinite loop, PipeWire would wait until the stream starts draining.
+		// But this never happens if we don't push new data.
+		// Thus pw_stream_destroy() would block forever.
+		memset(data.data, 0, chunk->size); // chunk->size is already in bytes
 	}
 
 	pwo->m_engine->queueBuffer(buffer);

@Krzmbrzl @davidebeatrici Should I update this PR to use the above diff? I don't think it would be wise to do the silence-pushing inside mix(), because other backends may not like that, so we need to do it in the PipeWire-specific callback.

@Krzmbrzl
Member

Krzmbrzl commented May 6, 2022

I don't think it would be wise to do the silence-pushing inside mix(), because other backends may not like that, so we need to do it in the PipeWire-specific callback.

Agreed.

Should I update this PR to do the above diff?

If that is indeed best practice, then we should follow it. Though I am not familiar enough with PipeWire to form an opinion on that 👀
From my POV, either version (the current PR or the latest suggested diff) seems fine.

Since version 0.3.51, PipeWire has support for buffers with a size of 0
being passed in the process callback. More specifically, this feature
was added in commit 96286fb8b11658a0fdaa61194504a3f9541b25e6 ("resample:
use a -1 buffer size to drain") in PipeWire [1].

Unfortunately, this causes an endless loop, because we try to destroy
the PipeWire stream, but it keeps waiting for the node to be completely
drained, which now never happens because it keeps calling
processCallback() where we feed buffers with a size of 0 to it.

To fix this problem, we no longer pass 0-size buffers to PipeWire.
Instead, we now push silence if we don't have any data available. This
is what PipeWire prefers anyway: the process callback wants the buffer
to be filled. If we don't do that, the node will stay in the
SPA_STATUS_NEED_DATA state, and even if we ignore the obvious endless
loop error, this can cause other problems.
For example, the above-mentioned commit was fixed in PipeWire with
32e957345d63a6d6e3d223057d764ff661f5d101 ("audioadapter: don't loop
forever") [2], which is better than the endless loop but still
preventable.
So, in order to avoid unnecessary xruns, we push silence instead of
empty buffers.

Fixes mumble-voip#5647

[1] https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/96286fb8b11658a0fdaa61194504a3f9541b25e6
[2] https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/32e957345d63a6d6e3d223057d764ff661f5d101
@vimpostor
Contributor Author

vimpostor commented May 6, 2022

Should I update this PR to do the above diff?

If that is indeed best practice, then we should follow it. Though I am not familiar enough with PipeWire to form an opinion on that

I guess there is no surer sign of best practice than having the PipeWire main developer himself propose that this is what we should ideally do. :D See:

What you should ideally do is use the requested field on the buffer and push exactly that many samples into the stream. If you don't have data, push silence.

If @davidebeatrici agrees, I think this PR is now ready to be merged.

@Krzmbrzl
Member

Krzmbrzl commented May 6, 2022

I guess there is no surer sign of best practice than having the PipeWire main developer himself propose that this is what we should ideally do. :D

Indeed 😅
I wasn't aware that wtay was the PipeWire main dev ^^

@davidebeatrici
Member

This is absolutely perfect, thank you very much!

@davidebeatrici davidebeatrici merged commit d60b861 into mumble-voip:master May 6, 2022
@Krzmbrzl
Member

Krzmbrzl commented May 8, 2022

💚 All backports created successfully

Branch: 1.4.x (backport created successfully)

Questions? Please refer to the Backport tool documentation.

Krzmbrzl added a commit that referenced this pull request May 8, 2022
…s loop when disconnecting PipeWire stream"
@vimpostor vimpostor deleted the pw_freeze branch May 13, 2022 15:16
archlinux-github pushed a commit to archlinux/svntogit-community that referenced this pull request May 17, 2022
Occurred e.g. when applying the settings.
mumble-voip/mumble#5648

git-svn-id: file:///srv/repos/svn-community/svn@1207665 9fca08f4-af9d-4005-b8df-a31f2cc04f65