This repository has been archived by the owner on Mar 22, 2022. It is now read-only.

Remote audio feature #99

Merged 39 commits from user/stephenatwork/remote-audio-92 into master on Apr 24, 2020

Conversation

@stephenatwork (Member) commented on Oct 14, 2019

Remote audio can now be redirected away from the default audio device.

RemoteAudioSource on Unity now plays audio using the OnAudioFilterRead API.

This PR also adds a new high-level API which manages buffering and provides a way to consume the audio as a stream, resampling and/or changing the number of channels on the fly.

Open issues:

  • Add proper interface to enable/disable individual tracks
  • Refactor AudioTrackReadBuffer to buffer an individual track, and tidy up interface according to comments.
  • Tidy up Unity integration.

@eanders-ms (Contributor) commented

Did you consider using OnAudioFilterRead? It allows you to write the streaming audio data (after decoding it to WAVE_FORMAT_IEEE_FLOAT) directly to an AudioSource, bypassing the need for an AudioClip.

@djee-ms (Member) commented on Oct 14, 2019

> Did you consider using OnAudioFilterRead? It allows you to write the streaming audio data (after decoding it to WAVE_FORMAT_IEEE_FLOAT) directly to an AudioSource, bypassing the need for an AudioClip.

I saw that technique yesterday in that Gamasutra article and was wondering about the difference with the streaming AudioClip. @eanders-ms do you have any experience with this? How do the two compare? It sounds (pun intended) like OnAudioFilterRead might be better for performance and/or latency, no? I assume it also avoids the issue with Unity object access when changing channel count and sample rate, at the expense of re-sampling, since you seem to be limited to the internal sampling rate of the Unity DSP (but I am pretty sure Unity does that resampling on the AudioSource anyway).

Ping @stephenatwork, what do you think? The docs explicitly mention the technique for procedural audio:

> If this is the first filter in the chain and a clip isn't attached to the audio source, this filter will be played as the audio source. In this way you can use the filter as the audio clip, procedurally generating audio.
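For reference, a minimal sketch of the procedural-audio technique the docs describe, assuming a plain Unity component (the class and field names are illustrative, not from this PR):

using UnityEngine;

// Minimal sketch, not from this PR: with no AudioClip attached, the first
// OnAudioFilterRead in the chain acts as the audio source itself, so writing
// samples here produces procedural audio.
[RequireComponent(typeof(AudioSource))]
public class ProceduralTone : MonoBehaviour
{
    private double _phase;
    private int _sampleRate;

    void Start()
    {
        _sampleRate = AudioSettings.outputSampleRate;
        GetComponent<AudioSource>().Play(); // no clip attached, on purpose
    }

    // Runs on the audio thread; 'data' is interleaved WAVE_FORMAT_IEEE_FLOAT.
    void OnAudioFilterRead(float[] data, int channels)
    {
        const double freq = 440.0; // A4 test tone
        for (int i = 0; i < data.Length; i += channels)
        {
            float sample = (float)System.Math.Sin(_phase) * 0.1f;
            for (int c = 0; c < channels; c++)
                data[i + c] = sample; // duplicate across channels
            _phase += 2.0 * System.Math.PI * freq / _sampleRate;
        }
    }
}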

@eanders-ms (Contributor) commented

I was hoping to sign up for this work but @stephenatwork beat me to it :)

> @eanders-ms do you have any experience with this? How do the two compare?

My prior experience is one-sided in that I've only used the OnAudioFilterRead approach. On a previous project I incorporated a streaming mp3 player into Unity. It would read output from ffmpeg and pipe it to an AudioSource via OnAudioFilterRead. There was some timing code to ensure the video stream stayed in sync with the audio. I think the same approach is applicable here, given the similarities.

That said, if the solution in this PR works and it's a good Unity integration, then fantastic.

@stephenatwork (Member Author) commented

> Did you consider using OnAudioFilterRead? It allows you to write the streaming audio data (after decoding it to WAVE_FORMAT_IEEE_FLOAT) directly to an AudioSource, bypassing the need for an AudioClip.

Thanks for the tip. That looks like a much better choice than the one I found. I did a few local tests and it seems to have very little delay (which is an issue in the AudioClip version). It shouldn't be much work to port the implementation, except to add the resampling.

@stephenatwork (Member Author) commented

I've updated this PR to use the OnAudioFilterRead API.

Also, managing buffers and resampling in C# seems inefficient, so this PR adds a new high-level API which manages buffering and provides a way to consume the audio as a stream, resampling and/or changing the number of channels on the fly.
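As a rough illustration of how such a stream could be consumed from Unity's audio callback (the Read signature and the _audioReadStream field are assumptions, not the exact API added here):

// Assumed shape: Read() fills 'data', resampling and remixing to the
// requested rate/channel count on the fly. '_audioReadStream' and the
// Read signature are illustrative, not the PR's exact API.
private int _outputSampleRate;

void OnAudioConfigurationChanged(bool deviceWasChanged)
{
    // Cache the DSP rate outside the audio thread.
    _outputSampleRate = AudioSettings.outputSampleRate;
}

void OnAudioFilterRead(float[] data, int channels)
{
    _audioReadStream?.Read(_outputSampleRate, channels, data);
}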

@djee-ms (Member) left a comment:

Agreed in principle on the design. The C# interface is a nice touch. Needs some clean-up and finishing the implementation, obviously. Thanks!

libs/Microsoft.MixedReality.WebRTC.Native/src/api.cpp (outdated review threads, resolved)
libs/Microsoft.MixedReality.WebRTC/PeerConnection.cs (outdated review thread, resolved)
@stephenatwork (Member Author) commented

All the scaffolding is in place; we just need to choose and wire up a resampler. Luckily, webrtc seems to include at least two, which I'll need to evaluate:
webrtc\xplatform\opus\silk
webrtc\xplatform\webrtc\common_audio\resampler\include
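For context, any of these options boils down to mapping output sample positions back into the input signal; a naive linear-interpolation sketch (illustration only, not a candidate implementation):

// Naive linear-interpolation resampler (mono), for illustration only; the
// real choice will be one of the WebRTC resamplers listed above.
static float[] ResampleLinear(float[] input, int srcRate, int dstRate)
{
    int outLen = (int)((long)input.Length * dstRate / srcRate);
    var output = new float[outLen];
    double step = (double)srcRate / dstRate; // input samples per output sample
    for (int i = 0; i < outLen; i++)
    {
        double srcPos = i * step;
        int i0 = (int)srcPos;
        int i1 = System.Math.Min(i0 + 1, input.Length - 1);
        double frac = srcPos - i0;
        // Blend the two nearest input samples.
        output[i] = (float)(input[i0] * (1.0 - frac) + input[i1] * frac);
    }
    return output;
}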

@stephenatwork (Member Author) commented

The current branch supports (only) spatial audio. Objects with a RemoteAudioSource component can have an AudioSource attached. Remote audio tracks are not piped to the output device. When there is no remote audio data, either because of an underrun or because we are not yet connected, the RemoteAudioSource outputs a low buzz for debugging.
TODO: remove the dial tone; allow per-track routing of output to speakers or an AudioSource.

@@ -450,6 +453,11 @@ MRS_API void MRS_CALL mrsPeerConnectionRegisterRemoteAudioFrameCallback(
PeerConnectionAudioFrameCallback callback,
void* user_data) noexcept;

// Experimental. Render or not remote audio tracks on the audio device.
Member:

nit: Explain about remote audio callbacks? It looks like a "mute" function with this comment alone.

Member:

Fixed

@@ -622,6 +630,21 @@ MRS_API mrsResult MRS_CALL mrsPeerConnectionRemoveLocalVideoTrack(
MRS_API void MRS_CALL mrsPeerConnectionRemoveLocalAudioTrack(
PeerConnectionHandle peerHandle) noexcept;

MRS_API mrsResult MRS_CALL
mrsAudioReadStreamCreate(PeerConnectionHandle peerHandle,
int bufferMs,
Member:

Can we document those parameters, especially bufferMs? And for mrsAudioReadStreamRead() below too.

Member Author:

I'm not sure bufferMs should even be in the API. (I added it originally because I needed to test different values).

While on the topic of buffering, it's worth mentioning an issue that occurs if there's a hiccup (delay) in Unity. There is no logic for 'catching up', so in the worst case, even with a great connection, the buffer can end up completely full and all it's doing is adding latency. Half a second by default!

Member:

> There is no logic for 'catching up'

This can be solved in AudioReadStream::Read by taking the newest n frames that fill the requested buffer, rather than taking the oldest ones, right? It seems that should be easy to do - or am I missing some fundamental issue?

Member Author:

I think the issue is that if you always take the latest, then you run the risk of dropouts, which is what the buffering is trying to avoid! So the idea is that you're willing to trade N ms of delay for a reduced probability of dropouts.

You can do this by skipping some part of the buffer, but it's likely (?) that this creates pops/artifacts. Maybe there's something better? Perhaps temporarily speed up the audio by messing with the sample rates, or something more complex which doesn't change the pitch.

Member:

Yes, you're right (also, if frames are pushed more frequently than they are pulled, that won't work at all, but that shouldn't be our case).

Are we sure this is a problem though? I would expect OnAudioFilterRead to request more data after a hiccup to get up to speed.

In any case, I have looked at how WebRTC handles audio packet buffering; doing it properly (and generically) is, unsurprisingly, complicated (see modules/audio_coding/net_eq/net_eq.cc). They do have an Accelerate class that we might use directly (modules/audio_coding/net_eq/accelerate.h), though. That looks like some work, so I'd be for logging this separately (if it is indeed an issue that can happen).

Member Author:

I didn't look into the pipeline of buffers, but I would expect OnAudioFilterRead to be much more dumb/realtime and always ask for the same number of samples. I think it's only a short hop from there to handing off to the output device.

Absolutely, it's complex and can be deferred. Perhaps there is some simple adaptive scheme we can use to choose the buffer size based on network conditions. Or perhaps webrtc is already doing all the smart stuff and we can reduce bufferMs to 20 or 30.

Member:

I'll mark the API as experimental then and log an issue to investigate this more.

Member:

> I didn't look into the pipeline of buffers, but I would expect OnAudioFilterRead to be much more dumb/realtime and always ask for the same number of samples.

That seems what's happening actually 😞
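To make the trade-off concrete, here is a rough sketch of the buffering policy under discussion (drop the oldest samples on overrun, pad on underrun). This is illustrative rather than the actual AudioTrackReadBuffer, and a real implementation would also need locking between the WebRTC delivery thread and the Unity audio thread:

// Illustrative only, not the PR's AudioTrackReadBuffer: a queue-based buffer
// that drops the oldest samples on overrun (bounding the added latency) and
// pads on underrun so the audio callback never starves.
class SimpleAudioBuffer
{
    private readonly System.Collections.Generic.Queue<float> _samples
        = new System.Collections.Generic.Queue<float>();
    private readonly int _capacity; // e.g. bufferMs worth of samples

    public SimpleAudioBuffer(int capacity) { _capacity = capacity; }

    public void Write(float[] frame) // called when a frame arrives from webrtc
    {
        foreach (var s in frame)
        {
            if (_samples.Count >= _capacity)
                _samples.Dequeue(); // overrun: drop oldest, keep latency bounded
            _samples.Enqueue(s);
        }
    }

    public void Read(float[] data) // called from OnAudioFilterRead
    {
        for (int i = 0; i < data.Length; i++)
            data[i] = _samples.Count > 0 ? _samples.Dequeue() : 0f; // underrun: silence
    }
}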


/// Fill data with samples at the given sampleRate and number of channels.
/// If the internal buffer overruns, the oldest data will be dropped.
/// If the internal buffer is exhausted, the data is padded with white noise.
Member:

Implementation pads with sine, not white noise.

Member:

From #99 (comment) this should be debugging only.

@stephenatwork any reason to pad with noise rather than simply silence here?

Member Author:

It was @djee-ms's suggestion to help debugging, since there are many states which can lead to silence. It was moderately useful for experimentally choosing a buffer size. I suppose it may be more useful to have the underrun/overrun status available programmatically.

Member:

We can have Read return the actual consumed samples then.

Also I'd say we can make sine padding opt-in (easier than leaving it to higher layers, and not too cumbersome).

Member:

I'd rather have it opt-out. Better to have strong indications that something is wrong and let the user ignore them (opt out and get silence instead) than to hide a potential issue from the user without them knowing.
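A sketch of what the opt-out could look like; the PadWithDebugTone flag and FillUnderrun helper are hypothetical names, not from this PR:

// Hypothetical opt-out knob: underruns are audible by default (a strong
// indication that something is wrong), and users can opt out to get silence.
public bool PadWithDebugTone = true; // hypothetical name

private double _padPhase;
private const double PadFreq = 440.0;

private void FillUnderrun(float[] data, int start, int sampleRate)
{
    for (int i = start; i < data.Length; i++)
    {
        data[i] = PadWithDebugTone
            ? (float)System.Math.Sin(_padPhase) * 0.1f // audible debug tone
            : 0f;                                      // opted out: silence
        _padPhase += 2.0 * System.Math.PI * PadFreq / sampleRate;
    }
}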

libs/Microsoft.MixedReality.WebRTC/IAudioReadStream.cs (outdated review thread, resolved)
libs/Microsoft.MixedReality.WebRTC/PeerConnection.cs (outdated review thread, resolved)
}

/// <summary>
/// High level interface for consuming WebRTC audio streams.
Member:

This doesn't say much about why one would use that and what features it enables. Can we add an example of feature to show why/how to use an audio read stream?

Comment on lines 69 to 75
    if (!IsPlaying)
    {
        IsPlaying = true;
        //PeerConnection.Peer.RemoteAudioFrameReady += RemoteAudioFrameReady;
        OnAudioConfigurationChanged(deviceWasChanged: false);
        _audioTrackReadBuffer = PeerConnection.Peer.CreateAudioTrackReadBuffer();
    }
}
Member:

@stephenatwork @djee-ms it feels weird that we can control play state on the linked AudioSource and on this object independently. Looks like an easy source of mistakes. Also we need to decide a sane behavior for when the AudioSource plays but this doesn't (at the moment you get beeping).

I propose to

  1. create/destroy the read buffer automatically when a track is added/removed
  2. return empty frames in OnAudioFilterRead if there is no track
  3. remove Play/Stop/IsPlaying from this class, or make them a shortcut for _linkedAudioSource.(Play|Stop|IsPlaying)

Thoughts?
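A rough sketch of what proposals 2 and 3 could look like on the component (the _linkedAudioSource, _audioTrackReadBuffer, and _sampleRate members are assumed names, not the PR's exact code):

// Sketch of proposals 2-3 above; member names are assumptions.
public bool IsPlaying => _linkedAudioSource.isPlaying; // 3: shortcut only
public void Play() => _linkedAudioSource.Play();
public void Stop() => _linkedAudioSource.Stop();

void OnAudioFilterRead(float[] data, int channels)
{
    if (_audioTrackReadBuffer == null) // 2: no track attached yet
    {
        System.Array.Clear(data, 0, data.Length); // silence instead of beeping
        return;
    }
    _audioTrackReadBuffer.Read(_sampleRate, channels, data);
}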

Member Author:

Sounds good to me. I suppose that there is a small overhead even when paused via the Unity AudioSource, because IIRC the buffering/transcoding still happens internally even if we're not consuming the data via OnAudioFilterRead. Not sure if it's worth doing anything about that.

Member:

At the moment we transcode only on read so we don't have this issue*. For the sake of correctness though I suppose we can override OnEnable/OnDisable to turn the buffer on/off.

*Maybe we shouldn't wait for a Read() call (I am not sure if the processing adds any meaningful latency) but we can deal with this later.

Member:

Yes, we should continue to transcode on read, because this means transcoding inside the audio thread, which is the right place to do so, instead of some other thread like the WebRTC signaling thread or the main Unity app thread, which are busy doing other stuff. And yes, we should return silence when there's no track, to be consistent with the WebRTC behavior of sending silence when there's no track on a transceiver. I also agree on removing the playing state from here and using only the audio source's, to avoid confusion.

Member Author:

Filippo - when I say "on read" I mean OnAudioFilterRead. Do you mean something else? IIRC we're transcoding when the frame arrives from webrtc (addFrame), not when it is requested from OnAudioFilterRead (that part is a memcpy). The transcoding is the extra work I mean if the Unity source is paused but data continues to arrive from webrtc.

Remote audio interface has been refactored so that it compiles, but still needs to be ported to the multi-track world.

Part of RemoteAudioSource has been moved to AudioReceiver, but the class is not
functional yet.
…remote-audio-92

CustomAudioMixer moved to peer_connection.h due to include order issues.
@fibann force-pushed the user/stephenatwork/remote-audio-92 branch from 52c6ea0 to b20e16d on April 23, 2020 18:18
Will be refactored later.
@fibann requested a review from djee-ms on April 24, 2020 10:53
@djee-ms (Member) left a comment:

Taken offline: Let's merge as is and improve from there.

@fibann merged commit 48b9429 into master on Apr 24, 2020
@fibann deleted the user/stephenatwork/remote-audio-92 branch on April 24, 2020 10:59
mr-webrtc-buildbot added a commit that referenced this pull request Apr 24, 2020
…oft/user/stephenatwork/remote-audio-92)
}

mrsResult MRS_CALL
mrsAudioTrackReadBufferRead(AudioTrackReadBufferHandle readStream,
Member Author:

nit: readStream -> readBuffer or readHandle

fibann added a commit to fibann/MixedReality-WebRTC that referenced this pull request May 19, 2020
The API has been refactored to create a buffer from a track (rather than the PeerConnection) and to address comments on microsoft#99.
Labels: enhancement (New feature or request)
5 participants