Conversation
Did you consider using OnAudioFilterRead? It allows you to write the streaming audio data (after decoding it to WAVE_FORMAT_IEEE_FLOAT) directly to an AudioSource.
I saw that technique yesterday in that Gamasutra article and was wondering about the difference with the streaming approach. Ping @stephenatwork, what do you think? The docs explicitly mention the technique for procedural audio.
I was hoping to sign up for this work but @stephenatwork beat me to it :)
My prior experience is one-sided in that I've only used one of the two approaches. That said, if the solution in this PR works and it's a good Unity integration, then fantastic.
Thanks for the tip. That looks like a much better choice than the one I found. I did a few local tests and it seems to have very little delay (which is an issue with the AudioClip version). It shouldn't be much work to port the implementation, except for adding the resampling.
I've updated this PR to use the OnAudioFilterRead API. Also, managing buffers and resampling in C# seems inefficient, so this PR adds a new high-level API which manages buffering and provides a way to consume the audio as a stream, resampling and/or changing the number of channels on the fly.
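A minimal sketch of what such a stream could look like (hypothetical names and behavior, not the PR's actual implementation): incoming frames are averaged down to mono on arrival, and Read() linearly resamples to the caller's rate.

```cpp
#include <cstddef>
#include <vector>

// Illustrative buffering audio read stream. Frames arriving from the decoder
// are appended as mono floats; Read() linearly resamples to the caller's
// rate. All names here are assumptions for the sketch.
class AudioReadStreamSketch {
 public:
  // Append a frame, averaging interleaved channels down to mono so later
  // processing works on a single channel.
  void AddFrame(const float* data, size_t frames, int channels, int rate) {
    rate_ = rate;
    for (size_t i = 0; i < frames; ++i) {
      float sum = 0.f;
      for (int c = 0; c < channels; ++c) sum += data[i * channels + c];
      buffer_.push_back(sum / channels);
    }
  }

  // Fill dst with dstFrames mono samples at dstRate, interpolating linearly
  // between buffered samples. Returns the number of frames actually written.
  size_t Read(float* dst, size_t dstFrames, int dstRate) {
    const double step = static_cast<double>(rate_) / dstRate;
    size_t written = 0;
    for (; written < dstFrames; ++written) {
      const double pos = readPos_ + written * step;
      const size_t i = static_cast<size_t>(pos);
      if (i + 1 >= buffer_.size()) break;  // buffer exhausted
      const double frac = pos - i;
      dst[written] = static_cast<float>(buffer_[i] * (1.0 - frac) +
                                        buffer_[i + 1] * frac);
    }
    readPos_ += written * step;
    return written;
  }

 private:
  std::vector<float> buffer_;
  double readPos_ = 0.0;
  int rate_ = 48000;
};
```

A real implementation would also need to evict consumed samples and convert channel counts upward, but the downmix-then-resample order shown is the "work on a single channel when possible" idea discussed later in this thread.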
Agreed in principle on the design. The C# interface is a nice touch. Needs some clean-up and the implementation finished, obviously. Thanks!
All the scaffolding is in place; we just need to choose and wire up a resampler. Luckily WebRTC seems to include at least two, which I'll need to evaluate.
The current branch supports (only) spatial audio. Objects with a RemoteAudioSource component can have an AudioSource attached. Remote audio tracks are not piped to the output device. When there is no remote audio data, either because of underrun or because we are not yet connected, the RemoteAudioSource outputs a low buzz for debugging.
@@ -450,6 +453,11 @@ MRS_API void MRS_CALL mrsPeerConnectionRegisterRemoteAudioFrameCallback(
    PeerConnectionAudioFrameCallback callback,
    void* user_data) noexcept;

// Experimental. Render or not remote audio tracks on the audio device.
nit: Explain about remote audio callbacks? It looks like a "mute" function with this comment alone.
Fixed
@@ -622,6 +630,21 @@ MRS_API mrsResult MRS_CALL mrsPeerConnectionRemoveLocalVideoTrack(
MRS_API void MRS_CALL mrsPeerConnectionRemoveLocalAudioTrack(
    PeerConnectionHandle peerHandle) noexcept;

MRS_API mrsResult MRS_CALL
mrsAudioReadStreamCreate(PeerConnectionHandle peerHandle,
                         int bufferMs,
Can we document those parameters, especially bufferMs? And for mrsAudioReadStreamRead() below too.
I'm not sure bufferMs should even be in the API (I added it originally because I needed to test different values).
While on the topic of buffering, it's worth mentioning an issue if there's a hiccup (delay) in Unity. There is no logic for 'catching up', so in the worst case, even with a great connection, the buffer can be completely full and all it's doing is adding latency. Half a second by default!
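To make the cost concrete, a tiny helper (illustrative only; the 500 ms figure is the default mentioned above, and mono float samples at 48 kHz are an assumption):

```cpp
#include <cstddef>

// Latency and memory cost of a completely full jitter buffer, assuming mono
// float samples. A full buffer adds bufferMs of latency end to end.
struct BufferCost {
  size_t samples;
  size_t bytes;
};

constexpr BufferCost FullBufferCost(int bufferMs, int sampleRate) {
  const size_t samples = static_cast<size_t>(sampleRate) * bufferMs / 1000;
  return {samples, samples * sizeof(float)};
}
```

At the 500 ms default and 48 kHz this is 24000 samples (~94 KB) sitting between WebRTC and the audio device, all of it pure delay once the buffer is full.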
There is no logic for 'catching up'

This can be solved in AudioReadStream::Read by taking the newest n frames that fill the requested buffer, rather than taking the oldest ones, right? It seems that should be easy to do - or am I missing some fundamental issue?
I think the issue is that if you always take the latest, you run the risk of dropouts, which is what the buffering is trying to avoid! So the idea is that you're willing to trade N ms of delay for a reduced probability of dropouts.
You can do this by skipping some part of the buffer, but it's likely (?) that this creates pops/artifacts. Maybe there's something better? Perhaps temporarily speed up the audio by messing with the sample rates, or something more complex that doesn't change the pitch.
Yes, you're right (also, if frames are pushed more frequently than they are pulled that won't work at all, but that shouldn't be our case).
Are we sure this is a problem though? I would expect OnAudioFilterRead to request more data after a hiccup to get up to speed.
In any case I have looked at how WebRTC handles audio packet buffering; doing it properly (and generically) is unsurprisingly complicated (see modules/audio_coding/neteq/neteq.cc). Though they have an Accelerate class that we might use directly (modules/audio_coding/neteq/accelerate.h). Looks like some work, so I'd be for logging this separately (if it is indeed an issue that can happen).
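The simple drop-oldest alternative to NetEq-style time stretching can be sketched like this (illustrative only; names are assumptions): cap the buffer and discard the oldest samples on overflow, so worst-case latency stays bounded at the cost of an audible discontinuity.

```cpp
#include <cstddef>
#include <deque>

// Illustrative bounded FIFO of audio samples. When the producer outpaces the
// consumer, the oldest samples are discarded so worst-case latency is capped
// at maxSamples, at the cost of a pop at the discontinuity.
class BoundedAudioFifo {
 public:
  explicit BoundedAudioFifo(size_t maxSamples) : max_(maxSamples) {}

  void Push(const float* samples, size_t count) {
    for (size_t i = 0; i < count; ++i) queue_.push_back(samples[i]);
    // "Catch up": keep only the newest max_ samples.
    while (queue_.size() > max_) queue_.pop_front();
  }

  size_t Pop(float* dst, size_t count) {
    size_t n = 0;
    while (n < count && !queue_.empty()) {
      dst[n++] = queue_.front();
      queue_.pop_front();
    }
    return n;
  }

  size_t size() const { return queue_.size(); }

 private:
  std::deque<float> queue_;
  size_t max_;
};
```

This is the crude version of the trade-off discussed above: bounded latency, occasional pops. Time stretching (as in WebRTC's Accelerate) avoids the pop by shortening the audio instead of cutting it.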
I didn't look into the pipeline of buffers, but would expect OnAudioFilterRead to be much more dumb/realtime and always ask for the same number of samples. I think it's only a short hop from there to handing off to the output device.
Absolutely, it's complex and can be deferred. Perhaps there is some simple adaptive scheme we can use to choose the buffer size based on network conditions. Or perhaps WebRTC is already doing all the smart stuff and we can reduce bufferMs to 20 or 30.
I'll mark the API as experimental then and log an issue to investigate this more.
I didn't look into the pipeline of buffers, but would expect OnAudioFilterRead to be much more dumb/realtime and always ask for the same number of samples.

That seems to be what's happening actually 😞
/// Fill data with samples at the given sampleRate and number of channels.
/// If the internal buffer overruns, the oldest data will be dropped.
/// If the internal buffer is exhausted, the data is padded with white noise.
Implementation pads with sine, not white noise.
From #99 (comment) this should be debugging only.
@stephenatwork any reason to pad with noise rather than simply silence here?
It was @djee-ms's suggestion to help debugging, since there are many states which can lead to silence. It was moderately useful for experimentally choosing a buffer size. I suppose it may be more useful to have underrun/overrun status available programmatically.
We can have Read return the actual consumed samples then.
Also I'd say we can make sine padding opt-in (easier than leaving it to higher layers, and not too cumbersome).
I'd rather have it opt-out. Better to have strong indications that something is wrong and let the user ignore them (opt out and get silence instead) than hide a potential issue from the user without them knowing.
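The behavior being discussed could be sketched like this (hypothetical helper, not the PR's code): the read path reports how many real samples it produced, and the underrun tail is padded with either an audible sine (the opt-out debug default) or silence.

```cpp
#include <cmath>
#include <cstddef>

// Pads buffer[consumed..requested) after an underrun. By default the padding
// is a quiet 440 Hz sine so underruns are audible while debugging; callers
// can opt out and get silence. Returns the number of real (non-padded)
// samples, mirroring the idea that Read() should report actual consumption.
// All names and the 440 Hz choice are assumptions for this sketch.
size_t PadUnderrun(float* buffer, size_t consumed, size_t requested,
                   int sampleRate, bool padWithSine = true) {
  for (size_t i = consumed; i < requested; ++i) {
    buffer[i] = padWithSine
                    ? 0.1f * std::sin(2.0f * 3.14159265f * 440.0f * i /
                                      sampleRate)
                    : 0.0f;
  }
  return consumed;
}
```

Returning the consumed count lets higher layers detect underrun/overrun programmatically, as suggested above, independently of which padding mode is active.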
}

/// <summary>
/// High level interface for consuming WebRTC audio streams.
This doesn't say much about why one would use it and what features it enables. Can we add an example feature to show why/how to use an audio read stream?
Average before resampling and duplicate at the end, so we do work on a single channel when possible
Audio won't be played if missing
if (!IsPlaying)
{
    IsPlaying = true;
    //PeerConnection.Peer.RemoteAudioFrameReady += RemoteAudioFrameReady;
    OnAudioConfigurationChanged(deviceWasChanged: false);
    _audioTrackReadBuffer = PeerConnection.Peer.CreateAudioTrackReadBuffer();
}
}
@stephenatwork @djee-ms it feels weird that we can control the play state on the linked AudioSource and on this object independently. Looks like an easy source of mistakes. Also, we need to decide on a sane behavior for when the AudioSource plays but this doesn't (at the moment you get beeping).
I propose to:
- create/destroy the read buffer automatically when a track is added/removed
- return empty frames in OnAudioFilterRead if there is no track
- remove Play/Stop/IsPlaying from this class, or make them a shortcut for _linkedAudioSource.(Play|Stop|IsPlaying)
Thoughts?
Sounds good to me. I suppose there is a small overhead even when paused via the Unity AudioSource, because IIRC the buffering/transcoding still happens internally even if we're not consuming the data via OnAudioFilterRead. Not sure if it's worth doing anything about that.
At the moment we transcode only on read, so we don't have this issue*. For the sake of correctness though, I suppose we can override OnEnable/OnDisable to turn the buffer on/off.
*Maybe we shouldn't wait for a Read() call (I am not sure if the processing adds any meaningful latency), but we can deal with this later.
Yes, we should continue to transcode on read, because this means transcoding inside the audio thread, which is the right place to do so, instead of some other thread like the WebRTC signaling thread or the main Unity app thread, which are busy doing other stuff. And yes, we should return silence when there's no track, to be consistent with the WebRTC behavior of sending silence when there's no track on a transceiver. I also agree on removing the playing state from here and using only the audio source's, to avoid confusion.
Filippo - when I say "on read" I mean OnAudioFilterRead. Do you mean something else? IIRC we're transcoding when the frame arrives from WebRTC (AddFrame), not when requested from OnAudioFilterRead (that part is a memcpy). The transcoding is the extra work I mean if the Unity source is paused but data continues to arrive from WebRTC.
The remote audio interface has been refactored so that it compiles, but still needs to be ported to the multi-track world. Part of RemoteAudioSource has been moved to AudioReceiver, but the class is not functional yet.
…remote-audio-92 CustomAudioMixer moved to peer_connection.h due to include order issues. (Force-pushed 52c6ea0 to b20e16d.)
Will be refactored later.
Taken offline: Let's merge as is and improve from there.
…oft/user/stephenatwork/remote-audio-92)
}

mrsResult MRS_CALL
mrsAudioTrackReadBufferRead(AudioTrackReadBufferHandle readStream,
nit: readStream -> readBuffer or readHandle
The API has been refactored to create a buffer from a track (rather than the PeerConnection) and to address comments on microsoft#99.
Remote audio can now be redirected away from the default audio device.
RemoteAudioSource on Unity now plays audio using the OnAudioFilterRead API.
This PR also adds a new high-level API which manages buffering and provides a way to consume the audio as a stream, resampling and/or changing the number of channels on the fly.
Open issues: