Figure out muting in VoIP calls #16862

Closed
SimonBrandner opened this issue Apr 5, 2021 · 13 comments
Labels: A-VoIP, P1, T-Enhancement, T-Task (Tasks for the team like planning)


@SimonBrandner
Contributor

SimonBrandner commented Apr 5, 2021

Muting in VoIP calls

This is my current stream of thoughts regarding muting in VoIP calls. The issue will be closed as soon as a better place for this appears (probably an MSC). Also, this is currently opened in element-web as it focuses on its implementation but it will be necessary to eventually expand the focus.

Problems of the current implementation

Clients may wish to inform the user of the opponent's mute state.

If a user mutes their camera, their opponent just sees a black screen, but the opponent's client may wish to show a placeholder instead (e.g. an avatar).

The current solution is to set MediaStreamTrack.enabled = false for the tracks we want to mute. This works but isn't ideal. First, there is no way for the opponent to know that a track is muted. That in itself isn't a big problem and could be solved by additional signalling. The bigger problem is that even though the data is meaningless, it still uses bandwidth. This matters because a user might turn on their webcam for a moment, causing a call upgrade to voice + video, and then turn their camera back off. We would then keep sending black frames, which isn't completely awful but still uses some bandwidth.
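
For reference, the current approach can be sketched roughly as below. This is an illustration only, not the actual matrix-js-sdk code: `TrackLike` and `setKindMuted` are hypothetical names, and `TrackLike` mirrors just the `kind` and `enabled` properties of a real MediaStreamTrack so the sketch can run outside a browser.

```typescript
// Hypothetical minimal shape of a MediaStreamTrack (only the fields used here).
interface TrackLike {
    kind: string; // "audio" or "video"
    enabled: boolean;
}

// Disable or re-enable every local track of one kind.
// A disabled track is still sent: silence for audio, black frames for video.
// Returns how many tracks actually changed state.
function setKindMuted(tracks: TrackLike[], kind: "audio" | "video", muted: boolean): number {
    let changed = 0;
    for (const track of tracks) {
        if (track.kind === kind && track.enabled === muted) {
            track.enabled = !muted;
            changed++;
        }
    }
    return changed;
}
```

In a browser this would presumably be called with the result of `stream.getTracks()`. Nothing here tells the remote side that the track was disabled.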

Solutions

This lists three solutions to muting. Let's apply them to our use case.

removeTrack(sender)

We can rule this out because it requires re-negotiation, which isn't desired.

Disabling tracks

This is the current solution, which has many problems. Additional signalling using m.call.mute could be added (onmute events aren't fired when a track is disabled), but we still need to fix the bandwidth issue. Would it make sense to downgrade the call after the user has had their camera turned off for some time?

replaceTrack(null)

This solution seems to be the most promising. In theory it should be supported in Firefox, Chromium and Safari, but some testing wouldn't be a bad idea. I am also very unsure how this translates to other platforms. If this really works, we need to find out whether the other side can see the mute state using onmute events on MediaStreamTrack. If not, we would still need mute events (m.call.mute).
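
A minimal sketch of what the replaceTrack(null) approach could look like. `SenderLike` and `SenderMuter` are hypothetical names: `SenderLike` mirrors only the `track` property and the `replaceTrack()` method of RTCRtpSender, so the sketch runs without a browser. Whether this actually saves bandwidth, and what the remote side observes, still needs testing.

```typescript
// Hypothetical minimal shape of an RTCRtpSender (only what the sketch uses).
interface SenderLike<T> {
    readonly track: T | null;
    replaceTrack(track: T | null): Promise<void>;
}

// Remembers the track detached from each sender so it can be restored later.
class SenderMuter<T> {
    private detached = new Map<SenderLike<T>, T>();

    // Detach the current track; replaceTrack() does not require renegotiation.
    // Resolves to false if the sender had no track to detach.
    async mute(sender: SenderLike<T>): Promise<boolean> {
        const track = sender.track;
        if (track === null) return false;
        this.detached.set(sender, track);
        await sender.replaceTrack(null);
        return true;
    }

    // Re-attach the remembered track; resolves to false if none was saved.
    async unmute(sender: SenderLike<T>): Promise<boolean> {
        const track = this.detached.get(sender);
        if (track === undefined) return false;
        this.detached.delete(sender);
        await sender.replaceTrack(track);
        return true;
    }
}
```

The local track itself is left alone, so the local preview could keep running or be stopped separately.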

@SimonBrandner SimonBrandner added T-Enhancement T-Task Tasks for the team like planning and removed T-Enhancement labels Apr 5, 2021
@SimonBrandner SimonBrandner self-assigned this Apr 5, 2021
@dbkr
Member

dbkr commented Apr 6, 2021

Ah, thanks for doing the research - this is super helpful. Kurento are mostly focusing on how to avoid wasting bandwidth when a track is muted rather than signalling the mute state to the other side which is our primary concern. For that, I think the 3 options for this are:

  1. Do it in the SDP: probably by setting the track to recvonly, i.e. Kurento's option 1. The fact that it triggers a negotiation isn't ideal, as it will be slow and involve more back & forth than necessary. It would be nice that it's the same mechanism we use for hold, though. There's not really any specification of mute for SIP/SDP, but this is probably the closest thing to one.
  2. Use a different Matrix event. This would just be one message rather than a message + reply, so a bit faster & more efficient.
  3. Signal it over a WebRTC data channel. This would be near-instant, but we'd need a data channel just for signalling mute status, it's more of a pain for clients to implement, and it would be awful for a signalling-only bridge.
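
To make option 1 concrete, here is a small sketch of how the local direction might be computed, assuming mute is expressed purely by dropping the "send" half of the direction. `Direction` and `withSending` are hypothetical names; the four values are the usual SDP/WebRTC transceiver directions.

```typescript
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive";

// Return the direction that keeps the current "receive" half
// but turns the "send" half on or off.
function withSending(current: Direction, send: boolean): Direction {
    const receiving = current === "sendrecv" || current === "recvonly";
    if (send) {
        return receiving ? "sendrecv" : "sendonly";
    }
    return receiving ? "recvonly" : "inactive";
}
```

Applying the result to a transceiver's direction changes the SDP, so it still costs a negotiation round trip, which is the downside noted in option 1.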

@SimonBrandner
Contributor Author

SimonBrandner commented Apr 6, 2021

This whole thing has started out as a draft MSC for 2. (mute events) but halfway through I've realized that the current solution is wasting data. I had something like this in mind:

{
    "type": "m.call.mute",
    "room_id": "!roomId",
    "content": {
        "call_id": "1414213562373095",
        "party_id": "1732050807568877",
        "mute_info": {
            "streamId": "271828182845",
            "audioMuted": false,
            "videoMuted": true
        },
        "version": "1"
    }
}
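
A rough sketch of how a client might build and consume this content. The field names follow the draft above, which is not a finalised event shape; `MuteInfo`, `CallMuteContent`, `buildMuteContent` and `applyMuteContent` are hypothetical names used only for illustration.

```typescript
// Field names copied from the draft event above (not a finalised spec).
interface MuteInfo {
    streamId: string;
    audioMuted: boolean;
    videoMuted: boolean;
}

interface CallMuteContent {
    call_id: string;
    party_id: string;
    mute_info: MuteInfo;
    version: string;
}

// Sending side: build the content of a draft m.call.mute event.
function buildMuteContent(callId: string, partyId: string, info: MuteInfo): CallMuteContent {
    return { call_id: callId, party_id: partyId, mute_info: { ...info }, version: "1" };
}

// Receiving side: remember the last known mute state per stream.
function applyMuteContent(state: Map<string, MuteInfo>, content: CallMuteContent): void {
    state.set(content.mute_info.streamId, { ...content.mute_info });
}
```

With something like this, the receiving client could show an avatar placeholder whenever `videoMuted` is true for a stream.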

Is there any reason not to do replaceTrack(null)? It at least seems like an interesting idea that I'd like to play with but it might have some problems that I am overlooking.

I'd like to do some testing but a lot of things would be a little easier with some other stuff done, IMO. I think currently these things should make it easier: Resizable CallView → My dangerous branches (if you like that solution) → MSC3077 implementation. But I think some testing should be doable with what I already have - I'll see what I can do (test :D).

@dbkr
Member

dbkr commented Apr 6, 2021

Isn't the point of replaceTrack() that it just changes the source of a track without signalling anything to the other side though?

@SimonBrandner
Contributor Author

Yes, but hypothetically we could detect it on the other side using the onmute events, or send mute events over Matrix. It should waste less bandwidth than the current solution.

@dbkr
Member

dbkr commented Apr 6, 2021

Right yep, but my question is more what signalling we expect to happen that might cause the onmute event to actually fire on the other side, or whether we'd need the extra Matrix event.

@SimonBrandner
Contributor Author

Yep, that is the part I am very unsure about. That is why I'd like to test this.
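
A possible harness for that test: log every mute/unmute transition on the remote track and see whether anything fires after the sender calls replaceTrack(null). `MutableTrackLike` and `watchMuteState` are hypothetical names; the interface mirrors only the `muted` property and the `mute`/`unmute` events of MediaStreamTrack. Whether those events fire in this scenario is exactly the open question.

```typescript
// Hypothetical minimal shape of a remote MediaStreamTrack.
interface MutableTrackLike {
    readonly muted: boolean;
    addEventListener(type: "mute" | "unmute", listener: () => void): void;
    removeEventListener(type: "mute" | "unmute", listener: () => void): void;
}

// Report the initial mute state and every later change.
// Returns a function that stops watching.
function watchMuteState(track: MutableTrackLike, onChange: (muted: boolean) => void): () => void {
    const onMute = () => onChange(true);
    const onUnmute = () => onChange(false);
    track.addEventListener("mute", onMute);
    track.addEventListener("unmute", onUnmute);
    onChange(track.muted); // initial state
    return () => {
        track.removeEventListener("mute", onMute);
        track.removeEventListener("unmute", onUnmute);
    };
}
```

If no transition is ever reported on the remote side, a dedicated Matrix event would still be needed.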

@SimonBrandner
Contributor Author

Note to self: We should explore how others do this (Wire, Signal...)

@SimonBrandner
Contributor Author

Test 1: SDP (branch)

Benefits

  • The other side doesn't need any additional info

Problems

  • It takes a few seconds
  • The other side sees a frozen frame for a while
  • Firefox asks for permissions every time we ask for user media
  • Clicking the button twice in a short period of time makes the call fail

Improvements that could be made

  • We might be able to listen for onmute on the other side to avoid showing frozen frames
  • We could just remove and add tracks and keep the stream locally - Firefox wouldn't need to ask for permissions

I'll try to make these changes later

Thoughts

It feels like this could be acceptable for video but it seems unusable for audio. With audio, I would imagine we want this to be pretty quick.

@SimonBrandner
Contributor Author

SimonBrandner commented Apr 12, 2021

Test 2: Non-dumb SDP (js-sdk branch, react-sdk branch)

Benefits

  • The other side doesn't need any additional info
  • Less prone to Signalling failed: M_LIMIT_EXCEEDED: Too Many Requests

Problems

  • It takes a few seconds
  • Spamming the button will result in Signalling failed: M_LIMIT_EXCEEDED: Too Many Requests
  • Needs separate handling for muting locally

Thoughts

It feels like this could be acceptable for video but it seems unusable for audio. With audio, I would imagine we want this to be pretty quick.

I also don't really know what I was thinking on Friday

@dbkr
Member

dbkr commented Jul 21, 2021

Thanks for doing the tests. Agreed we'd want it to be fairly fast. Also: would using sendonly / recvonly start conflicting with hold status? This makes me more inclined to think that a dedicated event may be the way to go.

@SimonBrandner
Contributor Author

"Also: would using sendonly / recvonly start conflicting with hold status?"

I don't think it necessarily would. With muting it would always be the opposite direction, though I might be overlooking something.

Right now I think it might be worth trying the SDP approach for video, which would solve part of the problem we currently have, and proceeding from there.

@SimonBrandner
Contributor Author

SimonBrandner commented Jul 21, 2021

We're probably going to need a special event anyway (at least for audio). I should have some time tomorrow to draft an MSC.

(I actually might already have something somewhere, I just need to find it 😄 )

@SimonBrandner
Contributor Author

There is probably no reason to keep this open now that the MSC and implementations exist.
