
ao: add a new ao "avfoundation" to support spatial audio in macOS #11955

Merged (1 commit) on Mar 29, 2024

Conversation

ruihe774
Contributor

As in #9252.

A new ao "avfoundation" that utilizes AVSampleBufferAudioRenderer from the AVFoundation framework is added, which enables spatial audio for multichannel and stereo content. (Ref)

Note that although this ao works both in a bundle and on the command line for normal playback, spatial audio can be turned on only when mpv is bundled (mpv.app and iina.app) and fails to work on the command line. (coreaudiod crashes if we try to turn on spatial audio for mpv running on the command line.)

@ruihe774 ruihe774 marked this pull request as ready for review July 17, 2023 21:37
@ruihe774 ruihe774 changed the title Add a new ao "avfoundation" to support spatial audio ao: add a new ao "avfoundation" to support spatial audio in macOS Jul 18, 2023
@m154k1
Contributor

m154k1 commented Jul 18, 2023

Hey, I don't have a device with spatial audio support but the feature looks interesting.
I did a quick test and everything seems to be working (playback, seeking, etc.) except one thing.

The initial value of ao-volume is set to 1%. I think it should start from 100%; increasing it from 1% makes the sound way too loud. This can be tested by adding something like this to input.conf:

9 add ao-volume -10
0 add ao-volume 10

@rcombs has a branch with avfoundation ao, maybe it can help:
https://github.com/rcombs/mpv/tree/avfoundation

@ruihe774 ruihe774 force-pushed the avfoundation branch 2 times, most recently from 8c1414a to dcc4f66 Compare July 18, 2023 11:34
@ruihe774
Contributor Author

ruihe774 commented Jul 18, 2023

Hey, I don't have a device with spatial audio support but the feature looks interesting. I did a quick test and everything seems to be working (playback, seeking, etc.) except one thing.

The initial value of ao-volume is set to 1%. I think it should start from 100%; increasing it from 1% makes the sound way too loud. This can be tested by adding something like this to input.conf:

9 add ao-volume -10
0 add ao-volume 10

@m154k1 Fixed in a8d3c86

@m154k1
Contributor

m154k1 commented Jul 18, 2023

Works now, thanks!

@ruihe774 ruihe774 force-pushed the avfoundation branch 2 times, most recently from 5823b7a to accdd3f Compare July 18, 2023 14:41
audio/out/ao_avfoundation.m (outdated review thread, resolved)
@ruihe774
Contributor Author

@rcombs Would you please review this PR? It is an independent implementation and fixes (I think) all the bugs in previous implementations (such as channel configuration, spinning, and timing). Spatialized stereo also works (which I found buggy in your branch).

@ruihe774
Contributor Author

ruihe774 commented Jul 19, 2023

I've tested Dolby Atmos (sample). Currently I can only render Dolby Atmos as multichannel audio. ("Multichannel" displayed in system UI, instead of "Dolby Atmos".) I've also tried @rcombs's implementation with SPDIF passthru on, but it can only activate "Multichannel" as well.

The sample EAC-3 file has a three-layered data format:

AudioStreamBasicDescription: 12 ch,  48000 Hz, ec+3 (0x00000000) 0 bits/channel, 0 bytes/packet, 1536 frames/packet, 0 bytes/frame
AudioStreamBasicDescription:  8 ch,  48000 Hz, ec+3 (0x00000000) 0 bits/channel, 0 bytes/packet, 1536 frames/packet, 0 bytes/frame
AudioStreamBasicDescription:  6 ch,  48000 Hz, ec-3 (0x00000000) 0 bits/channel, 0 bytes/packet, 1536 frames/packet, 0 bytes/frame

(grabbed from afplay -d.) In my implementation, FFmpeg can only decode the third layer, which is plain 5.1 EAC-3 (no Dolby Atmos). @rcombs's implementation uses AudioFileGetProperty(file, kAudioFilePropertyDataFormat, &io_size, &asbd) to get the AudioStreamBasicDescription, which returns the third layer. I managed to get all three layers using kAudioFilePropertyFormatList and tried to feed the first and second layers (9.1.2 and 5.1.2 Dolby Atmos) to AVSampleBufferAudioRenderer:

diff --git a/audio/out/ao_avfoundation.m b/audio/out/ao_avfoundation.m
index 7841a52f0d..a4b8ff8087 100644
--- a/audio/out/ao_avfoundation.m
+++ b/audio/out/ao_avfoundation.m
@@ -85,6 +85,9 @@ static bool enqueue_buf(struct ao *ao, void *buf, int bufsize,
     [p->renderer enqueueSampleBuffer:sBuf];
     p->enqueued += (double)samples / sample_rate;
 
+    if ([p->renderer status] == AVQueuedSampleBufferRenderingStatusFailed)
+        goto coreaudio_error;
+
     ret = true;
 
 coreaudio_error:
@@ -155,16 +158,32 @@ static bool enqueue_data(struct ao *ao, void *buf, int samples) API_AVAILABLE(ma
             err = AudioFileOpenWithCallbacks(ao, read_cb, NULL, get_size_cb, NULL, file_type, &file);
             CHECK_CA_ERROR("unable to parse packet properties");
 
-            AudioStreamBasicDescription asbd;
-            UInt32 io_size = sizeof(asbd);
-            err = AudioFileGetProperty(file, kAudioFilePropertyDataFormat, &io_size, &asbd);
+            AudioFormatListItem afl[8];
+            UInt32 io_size = sizeof(afl);
+            err = AudioFileGetProperty(file, kAudioFilePropertyFormatList, &io_size, &afl);
             AudioFileClose(file);
             CHECK_CA_ERROR("unable to get ASBD for packet");
 
+            AudioStreamBasicDescription asbd = afl[0].mASBD;
+
             if (p->asbd.mSampleRate != asbd.mSampleRate ||
                 p->asbd.mChannelsPerFrame != asbd.mChannelsPerFrame ||
                 p->asbd.mFormatID != asbd.mFormatID ||
                 p->asbd.mFramesPerPacket != asbd.mFramesPerPacket) {
+                for (int i = 0; i < io_size / sizeof(AudioFormatListItem); ++i) {
+                    AudioStreamBasicDescription asbd = afl[i].mASBD;
+                    MP_VERBOSE(
+                        ao,
+                        "asbd[%d]: sample rate: <%f>, channels: <%u>, format id: <%u>, frames per packet: <%u>, tag: <%u>\n",
+                        i,
+                        asbd.mSampleRate,
+                        asbd.mChannelsPerFrame,
+                        asbd.mFormatID,
+                        asbd.mFramesPerPacket,
+                        afl[i].mChannelLayoutTag
+                    );
+                }
+
                 if (p->desc) {
                     CFRelease(p->desc);
                     p->desc = NULL;
@@ -183,6 +202,7 @@ static bool enqueue_data(struct ao *ao, void *buf, int samples) API_AVAILABLE(ma
 
             if (!enqueue_buf(ao, p->pkt->data, p->pkt->size, asbd.mFramesPerPacket, asbd.mSampleRate, true)) {
                 success = false;
+                MP_FATAL(ao, "%s\n", cfstr_get_cstr((CFStringRef)[[p->renderer error] localizedFailureReason]));
                 goto coreaudio_error;
             }

Info for all three layers is obtained:

asbd[0]: sample rate: <48000.000000>, channels: <12>, format id: <1700997939>, frames per packet: <1536>, tag: <12582924>
asbd[1]: sample rate: <48000.000000>, channels: <8>, format id: <1700997939>, frames per packet: <1536>, tag: <12713992>
asbd[2]: sample rate: <48000.000000>, channels: <6>, format id: <1700998451>, frames per packet: <1536>, tag: <8060934>

and the first layer is fed to the renderer. However, playback fails with the error kAudioFormatUnsupportedDataFormatError (1718449215). I don't know what went wrong.

@tmm1
Contributor

tmm1 commented Jul 23, 2023

However, playback fails with the error kAudioFormatUnsupportedDataFormatError (1718449215). I don't know what went wrong.

you should file radar or DTS

@ruihe774
Contributor Author

However, playback fails with the error kAudioFormatUnsupportedDataFormatError (1718449215). I don't know what went wrong.

you should file radar or DTS

I didn't get your point. 😵‍💫 I don't think Apple supports DTS.

@tmm1
Contributor

tmm1 commented Jul 24, 2023

I didn't get your point. 😵‍💫 I don't think Apple supports DTS.

Sorry I mean:

https://feedbackassistant.apple.com/

https://developer.apple.com/support/technical/

real_sample_count = request_sample_count;
}

CMSampleTimingInfo sample_timing_into[] = {(CMSampleTimingInfo) {
Contributor

sample_timing_info

Contributor

this typo wasn't fixed

Suggested change
CMSampleTimingInfo sample_timing_into[] = {(CMSampleTimingInfo) {
CMSampleTimingInfo sample_timing_info[] = {(CMSampleTimingInfo) {

Contributor Author

lol I didn't notice that.

@orion1vi
Contributor

orion1vi commented Oct 7, 2023

Not sure what's up, but when stop is called during pause I see ao->buffer_state->pending->pts being ~2 seconds ahead, so on unpause a ~2 second skip happens.

@ruihe774
Contributor Author

ruihe774 commented Oct 8, 2023

Not sure what's up, but when stop is called during pause I see ao->buffer_state->pending->pts being ~2 seconds ahead, so on unpause a ~2 second skip happens.

@orion1vi This AO requires 1 second of device buffering, so there is at least 1 second of preroll. mpv's decoder may also read ahead further. After stop, the 1~2 second buffer is cleared. On unpause, mpv does not refill it, so there is silence. I don't think this issue can or should be handled at the AO level.

@orion1vi
Contributor

orion1vi commented Oct 8, 2023

Same issue here #11488, but not in https://github.com/rcombs/mpv/tree/avfoundation; that one is push-based though.
Have you attempted a push-based approach, or are/were there other issues with it?
Edit: Never mind, probably a bad idea #12322.

@ruihe774
Contributor Author

ruihe774 commented Oct 8, 2023

Seems like same/similar issue? #6942 Same issue here #11488, but not in https://github.com/rcombs/mpv/tree/avfoundation, this one is push based though. Have you attempted push based approach or are/were there other issues with it? Edit: Never mind, probably bad idea #12322.

Yes, there are multiple implementations of avfoundation now, and you can choose the one you like. I'm not interested in implementing a push-based ao, because avfoundation must have a reason to use a large device buffer, and a push-based ao does not feed enough data to it. Anyway, I don't think this issue should be solved at the ao level.

@Akemi
Member

Akemi commented Mar 16, 2024

hey @ruihe774, first thanks for your PR and sorry for taking so long. i think we should get this rolling again. first some questions, sorry if they have been answered already; a small summary would be helpful for me

  1. what are the differences from ao_avfoundation: added ao driver with AirPlay support #11488 (feature parity, different problems, etc.)?
  2. what are the differences from rcombs' branch https://github.com/rcombs/mpv/tree/avfoundation (feature parity, different problems, etc.)?
  3. what are the differences from coreaudio and coreaudio_exclusive? is this a possible full replacement for those in the future?
  4. are there any known issues that should be fixed or can't be fixed?
  5. are you interested in maintaining this feature in the future, e.g. tackling reported issues?

@ruihe774
Contributor Author

@Akemi Hi!

what are the differences from #11488, feature parity, different problems, etc?
what are the differences from rcombs' branch https://github.com/rcombs/mpv/tree/avfoundation, feature parity, different problems, etc?

One difference is channel configuration. gazsiazasz's implementation hard-codes mono and stereo and uses kAudioChannelLayoutTag_UseChannelBitmap for the remaining cases, which IMO is wrong: not all two-channel layouts are stereo, and there is no direct mapping from mpv's speaker IDs to CoreAudio's channel bitmap.

rcombs's implementation constructs a custom CoreAudio channel layout using kAudioChannelLayoutTag_UseChannelDescriptions from mpv's channel layout. This is correct in theory. However, spatial audio cannot be activated when using a custom channel layout.

Mine uses a lookup table of predefined CoreAudio channel layouts and falls back to a custom channel layout. With my implementation, spatial audio works well not only with stereo audio but also with, e.g., 5.1, 7.1, and any other standard channel layout.
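
The lookup-with-fallback strategy can be sketched like this (illustrative C only: the tag constants and speaker masks below are made-up placeholders, not the real kAudioChannelLayoutTag_* values or mpv's actual table):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder tags; NOT the real kAudioChannelLayoutTag_* constants. */
#define TAG_STEREO     0x650002u
#define TAG_5POINT1    0x790006u
#define TAG_USE_DESCRS 0u   /* marker: build a custom layout instead */

struct layout_map { uint64_t speaker_mask; uint32_t tag; };

/* Hypothetical speaker masks standing in for mpv channel layouts. */
static const struct layout_map predefined[] = {
    { 0x03, TAG_STEREO  },  /* FL|FR */
    { 0x3f, TAG_5POINT1 },  /* FL|FR|FC|LFE|BL|BR */
};

/* Try the predefined-layout table first; only this path can activate
 * spatial audio. Unknown layouts fall back to a custom layout built
 * from channel descriptions. */
static uint32_t lookup_tag(uint64_t speaker_mask, bool *custom)
{
    for (size_t i = 0; i < sizeof(predefined) / sizeof(predefined[0]); i++) {
        if (predefined[i].speaker_mask == speaker_mask) {
            *custom = false;
            return predefined[i].tag;
        }
    }
    *custom = true;
    return TAG_USE_DESCRS;
}
```

The point of the design is that the custom-layout path still produces correct channel mapping, but only the predefined-tag path lets the system enable spatialization.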

Another difference between rcombs's and mine is that rcombs's is a push-based ao and mine is pull-based. IMO AVSampleBufferAudioRenderer is intentionally a pull-driven API: it does heavy post-effects (like spatial audio) and requires a large preroll buffer (or "device buffer" in mpv's context; the preroll can be even longer than half a second). Using it in a push-driven way may make it starve and result in instability.

what are the differences to coreaudio and coreaudio_exclusive? is this a possible full replacement for those in the future?

CoreAudio is a low-level API, and AVFoundation is a high-level API built upon CoreAudio. I don't think a full replacement is feasible. Many low-level features, e.g., low latency, a configurable audio graph, and device selection, are only available in CoreAudio, while high-level features like AirPlay and spatial audio are only available in AVFoundation.

In fact, I think Apple actually implements these high level features in some private CoreAudio audio units (i.e., nodes in the audio processing graph) and exposes these "Apple-designed" audio processing graphs as a whole to the high level API. It is a pity that Apple does not provide a low level interface to these advanced features to allow flexible audio graph configuration with them.

are there any known issues that should be fixed or can't be fixed?

One issue is the effect that a very large device buffer (1 second) brings about. AVSampleBufferAudioRenderer needs such a large preroll; otherwise it will starve. In mpv's architecture, the device buffer needs to be filled before ao_read_data can return data to the ao. When playback starts or seeks, this causes a short period of silence.

There is no way this issue can be handled in my ao implementation. In fact, if a large device buffer is set in any other ao, such silence will also appear.

Another issue is that it currently does not support proprietary spatial audio formats like Dolby Atmos, which need metadata passthrough. I'm not very interested in implementing this in the near future.

Apart from these, the ao works pretty well. I actually use this ao in my everyday watching and have not spotted problems.

are you interested in maintaining this feature in the future, eg tackling reported issues etc?

Yes for sure.

@Akemi
Member

Akemi commented Mar 16, 2024

okay, that sounds good. it would be nice if you could rebase your changes and also squash most/all of your commits into one "initial avfoundation ao support" or similar. if a commit makes sense to keep separate, you can/should leave it.

i am going to review the code this or next weekend and i think we could get this merged then.

@ruihe774 ruihe774 force-pushed the avfoundation branch 2 times, most recently from 186295a to f3d7b22 Compare March 16, 2024 20:16
@ruihe774
Contributor Author

i only had one small 'issue' when testing, though i guess this is because of the initial buffer that needs to be filled, when pausing the playback and resuming it again it drops some frames for me (around 30 with my test file). this results in the audio/video desync warning. not sure if this can be mitigated.

I've just looked into it. The desync happens because avfoundation itself queues samples. When paused, ao->driver->reset is called, and the queued samples are discarded. When resumed, ao->driver->start is called, and avfoundation will request samples that were flushed. But mpv does not keep them either. Thus the desync happens.

For example, the playing pts is 00:10, and avfoundation has prerolled for 1 second. In this case, next call to ao_read_data will return audio data for pts 00:11. When paused, ao->driver->reset is called and avfoundation will discard queued samples from 00:10 to 00:11. When resumed, avfoundation will request audio data from pts 00:10, but mpv does not have it; mpv only has data from pts 00:11. Thus the desync happens.

This issue is caused by the fact that mpv uses ao->driver->reset to pause a pull-based ao. When it is called, the ao has no idea whether it is a pause or a reset, and any queued samples are flushed. When resumed, no one holds these samples, and desync results.

To solve this issue, we need to introduce an explicit API to pause a pull-based ao. I've implemented it in 9abcdd2 and fixed the desync issue. Since this is an API change, I will put it into a separate PR for discussion after this PR is merged.

@tmm1
Contributor

tmm1 commented Mar 25, 2024

Adding pause support to pull-based ao is the correct solution 👍🏼

@ruihe774
Contributor Author

@t-8ch I'm wondering how I can deal with the situation where zero samples are returned from ao_read_data_nonblocking (e.g., EOF and mutex contention). I've just looked through your implementation in ao_pipewire. A buffer with zero samples is enqueued to pipewire in such a scenario. I'm afraid the processing thread will spin if pipewire begins to starve. My test shows that near EOF, on_process is called hundreds of times with queued 0 of xxx samples. If I use similar handling in ao_avfoundation, given its large device buffer, it will make the processing thread spin for more than 1 second.

Also, I don't think spinning in an RT thread as in ao_pipewire is a good idea. I'm afraid it will block preemption and reduce the responsiveness of other threads.

@tmm1
Contributor

tmm1 commented Mar 27, 2024

I think a sleep in the underrun scenario is the most practical solution. But the duration of the sleep should probably be proportional to the amount of data already enqueued into the ao. And ideally the sleep would be interruptible by the ao's wakeup callback.
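
That suggestion could look something like this (an illustrative sketch only; the function name and bounds are made up, and in practice the wait would be a pthread_cond_timedwait on the ao's wakeup condition so that newly arriving data can interrupt the sleep):

```c
#include <stdint.h>

/* On underrun, back off for a duration proportional to the audio already
 * enqueued in the ao, clamped to sane bounds. Hypothetical helper; the
 * divisor and limits are arbitrary choices for illustration. */
static int64_t underrun_backoff_ns(int64_t enqueued_ns)
{
    const int64_t min_ns = 1000000;    /* 1 ms floor */
    const int64_t max_ns = 100000000;  /* 100 ms ceiling */
    int64_t ns = enqueued_ns / 4;      /* sleep a fraction of the queue */
    if (ns < min_ns) ns = min_ns;
    if (ns > max_ns) ns = max_ns;
    return ns;
}
```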

@ruihe774
Contributor Author

ruihe774 commented Mar 27, 2024

And ideally the sleep would be interruptible by the ao's wakeup callback.

I don't think that is relevant. wakeup_cb is called when the buffer runs out. See

mpv/audio/out/buffer.c

Lines 187 to 192 in f4a7931

if (pos < samples && p->playing && !p->paused) {
    p->playing = false;
    ao->wakeup_cb(ao->wakeup_ctx);
    // For ao_drain().
    mp_cond_broadcast(&p->wakeup);
}
and

mpv/audio/out/buffer.c

Lines 674 to 684 in f4a7931

eof:
    MP_VERBOSE(ao, "audio end or underrun\n");
    // Normal AOs signal EOF on underrun, untimed AOs never signal underruns.
    if (ao->untimed || !state.playing || ao->stream_silence) {
        p->streaming = state.playing && !ao->untimed;
        p->playing = false;
    }
    ao->wakeup_cb(ao->wakeup_ctx);
    // For ao_drain().
    mp_cond_broadcast(&p->wakeup);
    return true;

Instead, we want to be woken up when new data comes into the buffer. The closest one is p->pt_wakeup, but it is not broadcast for pull-based ao. Still, p->pt_wakeup does not guarantee p->lock is unlocked, so there may still be contention.

I think a sleep in the underrun scenario is the most practical solution.

I think ao_pipewire cannot sleep with the current implementation because zero-sample buffers are queued, and sleeping would create a gap. BTW, I'm not sure it is safe to sleep in an RT thread. A solution may be to queue some buffers for some degree of tolerance of underruns, and not to use an RT thread.

@ruihe774
Contributor Author

ruihe774 commented Mar 28, 2024

I'll be honest, ao_read_data_nonblocking is quite a recent addition and may not be fully correct. There were multiple reports of audio issues with it.

I think this is weird. p->end_time_ns is set to "delay + requested samples", not "delay + read samples". I'm confused. 😵‍💫

You still set it to the "expected" time of the last sample; if there is not enough data, you will get fewer samples, but the time is already advanced by the full step.

@kasper93 So how can the playing pts be properly calculated? I found this code in audio.c:

// Return pts value corresponding to currently playing audio.
double playing_audio_pts(struct MPContext *mpctx)
{
    double pts = written_audio_pts(mpctx);
    if (pts == MP_NOPTS_VALUE || !mpctx->ao)
        return pts;
    return pts - ao_get_delay(mpctx->ao);
}

while ao_get_delay is

        int64_t end = p->end_time_ns;
        int64_t now = mp_time_ns();
        driver_delay = MPMAX(0, MP_TIME_NS_TO_S(end - now));

I don't think written_audio_pts will include the duration of padded silence, or the requested-but-not-filled samples. If ao_read_data and ao_read_data_nonblocking are returning partial data, won't the real delay be smaller than ao_get_delay, and playing_audio_pts be smaller than the actual value?

BTW, I'm confused about how the AV diff is calculated. In video.c, I find that audio_speed is multiplied by the delay in update_av_diff:

    double a_pos = written_audio_pts(mpctx);
    if (a_pos != MP_NOPTS_VALUE && mpctx->video_pts != MP_NOPTS_VALUE) {
        a_pos -= mpctx->audio_speed * ao_get_delay(mpctx->ao);
        mpctx->last_av_difference = a_pos - mpctx->video_pts
                                  + opts->audio_delay + offset;
    }

This is inconsistent with the calculation in playing_audio_pts. In fact, the audio data that has already been read by the ao is not subject to changes of audio speed anymore. So I think both methods of calculation are incorrect. For example, if the ao read 0.5 second of audio data while the audio speed was 1x, and then read 0.6 second of audio data while the speed was 2x, the total adjustment should be:

a_pos -= 0.5 * 1 + 0.6 * 2

What's more, adjust_sync has logic that differs from the above two:

    double a_pts = written_audio_pts(mpctx) + opts->audio_delay - mpctx->delay;

I'm confused.

@Dudemanguy would you please look into this?

@Dudemanguy
Member

Playback speed is not relevant on the ao level. Speed considerations come into play on a higher level in the core code. It should not matter for what you are trying to do. Or at least, this is the way mpv is designed.

@ruihe774
Contributor Author

ruihe774 commented Mar 28, 2024

Playback speed is not relevant on the ao level. Speed considerations come into play on a higher level in the core code. It should not matter for what you are trying to do. Or at least, this is the way mpv is designed.

@Dudemanguy I'm not discussing the ao level, and this is not about this PR. It is a side product of my investigation of how mpv handles out_time. Maybe I should open a separate issue or discussion. My question is: how does mpv (the core part) calculate the AV diff? I don't think a_pos -= mpctx->audio_speed * ao_get_delay(mpctx->ao) is the correct way to calculate it.

I think this is the cause of the AV diff when using avfoundation during playback speed changes. avfoundation has a 2 second device buffer, so the value of ao_get_delay can be very large. During a playback speed change, the jumped audio_speed is multiplied by it, and this changes a_pos to a wrong value.

@Dudemanguy
Member

And I don't think a_pos -= mpctx->audio_speed * ao_get_delay(mpctx->ao) is the correct way to calculate that.

That is part of the calculation, yes. The discontinuous jump that occurs when you change speeds could be smoother, but you do have to take the speed into consideration here.

@Akemi
Member

Akemi commented Mar 28, 2024

To solve this issue, we need introduce an explicit API to pause pull-based ao. I've implemented it in 9abcdd2 and fixed the desync issue. Since this is a API change, I will put it into a separated PR for discussion after this PR is merged.

is this something you want to get in before this merge, so you can fix it here, or do you want to fix it in another PR? a PR for it would also be appreciated.

also is there still something missing here?

@ruihe774
Contributor Author

is this something you want to get in before this merge, so you can fix it here, or do you want to fix it in another PR? a PR for it would also be appreciated.

I'd prefer fixing it in another PR.

also is there still something missing here?

I think not.

@ruihe774
Contributor Author

@Akemi Do I need to squash the commits?

@Akemi
Member

Akemi commented Mar 28, 2024

yeah, please squash them.

@ruihe774
Contributor Author

@Akemi Time to merge it?

@Akemi
Member

Akemi commented Mar 29, 2024

thank you for your contribution and sorry again that it took me so long to get this rolling.

[edit]
also looking forward to the other fixes, i'm going to try to review and merge those faster.


9 participants