MSC2746: Improved VoIP Signalling #2746

Merged
merged 81 commits into from Apr 28, 2023
Commits
cf50137
Placeholder for reliable VoIP MSC
dbkr Aug 21, 2020
a9b17fc
First version written up
dbkr Aug 21, 2020
37c1f98
Typo
dbkr Aug 24, 2020
5156caf
Typo
dbkr Aug 24, 2020
fe8b1eb
Switch to `m.call.select_answer`
dbkr Aug 24, 2020
6c4a077
Make self-calling possible
dbkr Aug 25, 2020
25ed29a
Nobody spotted the deliberate typo
dbkr Sep 3, 2020
bec62ab
Fixes & clarifications from Brendan
dbkr Sep 4, 2020
019bcdd
answers ID -> party ID
dbkr Sep 4, 2020
66179f1
clarify party_id
dbkr Sep 4, 2020
9e8c829
require that the client tries to decrypt all events before ringing
dbkr Sep 4, 2020
e224af3
not all of these necessary's were necessary
dbkr Sep 4, 2020
2561820
Apply suggestions from code review
dbkr Sep 11, 2020
63cecd1
line break
dbkr Sep 11, 2020
c6f6ca1
workaround markdown being awful
dbkr Sep 11, 2020
e9fe3af
specify grammar for IDs
dbkr Sep 11, 2020
563dba5
document why not mandate the same device IDs
dbkr Sep 11, 2020
070451e
rejection is about what the caller sees, not what's been sent
dbkr Sep 11, 2020
7bca76e
Explain use of the age field
dbkr Sep 11, 2020
e361fa9
Clarify party_id / user_id tuple in negotiate events
dbkr Sep 15, 2020
722ee0d
Require end-of-candidates candidate
dbkr Sep 17, 2020
8e76616
Add alternatives note for trickle ICE discovery mechansim
dbkr Sep 17, 2020
7e742d4
add that chrome spits out `icegatheringstatechange`
dbkr Sep 21, 2020
8da4b7c
clients must accept string version
dbkr Oct 14, 2020
18200d0
Add text on unstable prefixng (and how/why we aren't)
dbkr Oct 20, 2020
9e22601
Specify what happens when someone leaves the room
dbkr Oct 20, 2020
a446629
Rejig m.call.negotiate
dbkr Oct 22, 2020
1326a22
Explain politness & glare in a simpler way (I hope)
dbkr Oct 23, 2020
a669828
Add note on why we don't allow for ICE before an answer.
dbkr Oct 26, 2020
0fa0770
Define WebRTC track & stream configs for calls
dbkr Oct 26, 2020
599ad3c
select_answer was missing a version
dbkr Dec 3, 2020
6478f9d
Fix old type
dbkr Dec 3, 2020
b62f842
Clarfy that whatever codecs webrtc say is what goes
dbkr Feb 15, 2021
9156c80
Typos
dbkr Feb 15, 2021
834bc3b
Typo
dbkr Mar 4, 2021
ec2c7fe
only allow the number zero as numeric version
dbkr Mar 4, 2021
a572eb8
Update 2746-reliable-voip.md
ara4n Apr 6, 2021
6592023
Add user_busy hangup / reject reason
dbkr May 26, 2021
996adab
Add capability for DTMF
dbkr Jun 21, 2021
91428c8
Be clear about versions
SimonBrandner Jul 11, 2022
e42dd41
Clarify clients must respond to `m.call.negotiate`
SimonBrandner Jul 11, 2022
29485c4
Give `m.call.negotiate` a version
SimonBrandner Jul 11, 2022
f69ae72
Remove repeated words
SimonBrandner Jul 11, 2022
51e02b2
Be clearer about types
SimonBrandner Jul 11, 2022
312ffe5
Avoid defining call types
SimonBrandner Jul 11, 2022
4617af5
Specify minimal `lifetime`
SimonBrandner Jul 11, 2022
289fb3f
Use MSC1597 grammar for call / party IDs.
dbkr Nov 8, 2022
1392eae
Add more rationale around voip event version
dbkr Feb 6, 2023
46bfbde
Change advice for calls in public rooms.
dbkr Feb 6, 2023
9dcda02
Typo
dbkr Feb 6, 2023
6dc85a8
Clarify reject/hangup sending
dbkr Feb 6, 2023
f138bfe
Merge branch 'dbkr/msc2746' of github.com:matrix-org/matrix-spec-prop…
dbkr Feb 6, 2023
2fd97c9
Clarify hangup reason backwards compat
dbkr Feb 6, 2023
04eaee2
Clarify party ID
dbkr Feb 6, 2023
c52a845
There is no sender field.
dbkr Feb 6, 2023
6dcf65c
Require ignoring negotiates not matching party ID
dbkr Feb 6, 2023
859cf6d
Word negotiate events better
dbkr Feb 6, 2023
340e769
Don't forget the txn ID is returned by the send call.
dbkr Feb 6, 2023
8be57ed
Enumerate all the current VoIP events in 'version' section.
dbkr Feb 6, 2023
a09be95
Clarify treatment of version numeric 1.
dbkr Feb 6, 2023
5d38f15
Clarify that track/stream layout is new.
dbkr Feb 6, 2023
3162911
Link to m.call.invite
dbkr Feb 6, 2023
097fa58
Suggestions from richvdh
dbkr Feb 6, 2023
4eaa0b4
More suggestions from richvdh
dbkr Feb 6, 2023
db5ca80
More suggestions from richvdh
dbkr Feb 6, 2023
48527fc
Clarify call invite
dbkr Feb 6, 2023
312cdf7
Pluralise
dbkr Feb 6, 2023
8286d10
Reflect that MSC1597 hasn't landed yet.
dbkr Feb 6, 2023
a5e963f
Politeness only applies to renegotiation
dbkr Feb 6, 2023
b09af73
s/Mandate/define/
dbkr Feb 6, 2023
1fc6b37
Remove DTMF capability section to move tov MSC2747.
dbkr Feb 6, 2023
c9f0574
Clarify backwards compat
dbkr Mar 28, 2023
0880475
Grammar
dbkr Mar 28, 2023
e45c1e0
Fix quotes
dbkr Mar 28, 2023
bdf9639
Typo
dbkr Mar 28, 2023
082f216
Remove sentence that I think is now just redundant
dbkr Mar 28, 2023
ce0e338
Clarify mre on type field
dbkr Mar 28, 2023
2919112
Clarify end-of-candidates
dbkr Mar 28, 2023
c949b32
Add comma
dbkr Mar 29, 2023
7d8d527
Update 2746-reliable-voip.md (#3992)
richvdh Apr 5, 2023
3925586
wording changes
anoadragon453 Apr 28, 2023
313 changes: 313 additions & 0 deletions proposals/2746-reliable-voip.md
@@ -0,0 +1,313 @@
# MSC2746: Improved Signalling for 1:1 VoIP

Review comment (Member): as this is an MSC which touches event schemas while Extensible Events is on the battlefield, just a heads up that a v3 of call events is somewhat on the horizon to make them legal in extensible event-supported rooms. I don't think this MSC needs to do anything specific to solve the conflict (unless you want it to, but that means putting an even longer delay on it landing), but implementations of calls should be aware that calls will be changing again (sorry).

Reply (Member Author): Yeah, this is probably fine. I think we would want this version of calls defined in a version of the spec in any case.

Historically, Matrix has had basic support for signalling 1:1 WebRTC calls, but this suffers from a number of shortcomings:

* If several devices try to answer the same call, there is no way for them to determine clearly
that the caller has set up the call with a different device, and no way for the caller to
determine which candidate events map to which answer.
* Hangup reasons are often incorrect.
* There is confusion and no clear guidance on how clients should determine whether an incoming
invite is stale or not.
* There is no support for renegotiation of SDP, for changing ICE candidates / hold/resume
functionality, etc.
* There is no distinction between rejecting a call and ending it, which means that in trying
to reject a call, a client can inadvertently cause a call that has been successfully set up
on a different device to be hung up.

## Proposal
### Change the `version` field in all VoIP events to `1`
This will be used to determine whether devices support this new version of the protocol.
If clients see events with `version` other than `0` or `1`, they should treat these the same as if they had
`version` == `1`. In addition, clients must accept either the number `0` or a string for the value of the `version`
field, in order to allow for namespaced versions in the future.

Review comment (Contributor): So we treat 3 as version 1? Won't that cause issues in the future? Why do we do this?

Reply (Contributor): The intent is that newer versions will have their own MSC. This is describing what clients should do should they encounter an unknown version, 0 and 1 being the only known versions in this context.

Reply (Contributor, deepbluev7, Mar 3, 2021): Well, it currently reads like version 3 should be treated as 1, which sounds wrong. If a client implements it that way now, you can't make a new version ever, but I guess that is not the intention here?

Reply (Contributor): New versions mean implementation changes - in which case you can also change what it considers to be the most recent known version from 1 to something else. It's a bit like with room versions: currently a homeserver will refuse to create/join a room in a version it doesn't know about, but that doesn't mean it will never be able to support this new version. The only difference here is the fallback; in the case of a room version the server refuses to perform the action, in the case of a call the client substitutes it with the most recent known version.

Reply (Contributor): So future versions will be compatible enough, that this doesn't cause issues?

Reply (Member Author): Yep, version 3 should be treated identically to 1 by anything implementing this spec. Something implementing a future spec may treat it differently.

Reply (Contributor): That sounds like it would cause issues, whenever we actually need to make a breaking change. How would that work? Shouldn't clients instead negotiate the lowest common version instead of treating every newer version as an older version?

Reply (Member Author): If we need to make a breaking change, probably just easier to move to a whole new set of event types.
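
The version fallback rule in this section could be sketched as follows. This is only an illustration, not part of the proposal: the function name `effective_voip_version` is hypothetical, and the sketch assumes that only the number `0` selects the legacy protocol. How a missing `version` field should be handled is not covered here.

```python
from typing import Any


def effective_voip_version(raw: Any) -> int:
    """Return the protocol version (0 or 1) to apply to an event's `version` value.

    Hypothetical helper: the name and the int return type are assumptions.
    """
    # Only the number 0 means the legacy protocol. JSON booleans are excluded
    # explicitly because Python treats False as equal to 0.
    if isinstance(raw, int) and not isinstance(raw, bool) and raw == 0:
        return 0
    # Everything else (1, "1", 3, or a namespaced string such as
    # "org.example.v2") is treated the same as version 1.
    return 1
```

A client implementing a later version would change the fallback value, as discussed in the review thread above.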

### Define the configurations of WebRTC streams and tracks in each call type
We define that:
* A voice call has at least one track of kind 'audio' in the first stream
* A video call has at least one track of kind 'video' and at least one track of kind 'audio' in the first stream

Clients implementing this specification use the first stream and will ignore any streamless tracks. Note that
in the JavaScript WebRTC API, this means `addTrack()` must be passed two parameters: a track and a stream,
not just a track, and in a video call the stream must be the same for both the audio and video tracks.

A client may send other streams and tracks but the behaviour of the other party with respect to presenting
such streams and tracks is undefined.

This follows the existing known implementations of v0 VoIP.
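
As a rough illustration of the two definitions above, a receiving client could classify a call from the track kinds in the first stream. The representation used here (each stream as a list of track kind strings) and the function name `call_type` are assumptions made for the sketch, not part of any WebRTC API.

```python
from typing import List, Optional


def call_type(streams: List[List[str]]) -> Optional[str]:
    """Classify a call from its streams, each given as a list of track kinds.

    Only the first stream is inspected; later streams are ignored, since
    their presentation is undefined by this proposal.
    """
    if not streams:
        return None
    first = streams[0]
    if "audio" in first and "video" in first:
        return "video"
    if "audio" in first:
        return "voice"
    # No audio track in the first stream: neither definition applies.
    return None
```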

### Add `invitee` field to `m.call.invite`
This allows for the following use cases:
* Placing a call to a specific user in a room where other users are also present.
* Placing a call to oneself.

The field should be added for all invites where the target is a specific user. Invites without an `invitee`
field are defined to be intended for any member of the room other than the sender of the event. Clients
should consider an incoming call if they see a non-expired invite event where the `invitee` field is either
absent or equal to their user's Matrix ID. However, they should evaluate whether or not to ring based on their
user's trust relationship with the caller, eg. ignoring call invites from users in public rooms that they have
no other connection with. As a starting point, it is suggested that clients ring for any call invite from a user
that they have a direct message room with. It is strongly recommended that when clients do not ring for an
incoming call invite, they still display the invite in the room and annotate that it was ignored.
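
A minimal sketch of the `invitee` rules above, assuming the invite has already been checked for expiry. The function names and the `dm_partners` set (users the client shares a direct message room with) are hypothetical; the trust policy in `should_ring` is only the suggested starting point, and clients are free to use something else.

```python
from typing import Set


def should_consider_invite(content: dict, sender: str, my_user_id: str) -> bool:
    """Decide whether a non-expired invite is addressed to this user."""
    invitee = content.get("invitee")
    if invitee is None:
        # No invitee: intended for any room member other than the sender.
        return sender != my_user_id
    # With an invitee, only that user is called (this also allows self-calls).
    return invitee == my_user_id


def should_ring(content: dict, sender: str, my_user_id: str,
                dm_partners: Set[str]) -> bool:
    """Suggested starting point: ring only for callers we have a DM with."""
    return (should_consider_invite(content, sender, my_user_id)
            and sender in dm_partners)
```

An invite that is considered but not rung for would still be displayed in the room and annotated as ignored.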

### Add `party_id` to all VoIP events
Whenever a client first participates in a new call, it generates a `party_id` for itself to use for the
duration of the call. This needs to be long enough that the chance of multiple devices answering at the
same time and generating the same party ID is vanishingly small: 8 uppercase +
lowercase alphanumeric characters is recommended. Parties in the call are identified by the tuple of
`(user_id, party_id)`.

The client adds a `party_id` field containing this ID alongside the `user_id` field to all VoIP events it sends on the
call. Clients use this to identify remote echo of their own events: since a user may now call themselves,
they can no longer ignore events from their own user. This field also identifies different answers sent
by different clients to an invite, and matches `m.call.candidates` events to their respective answer/invite.

A client implementation may choose to use the device ID used in end-to-end cryptography for this purpose,
or it may choose, for example, to use a different one for each call to avoid leaking information on which
devices were used in a call (in an unencrypted room) or if a single device (ie. access token) were used to
send signalling for more than one call party.

### Introduce `m.call.select_answer`
This event is sent by the caller's client once it has chosen an answer. Its
`selected_party_id` field indicates the answer it has chosen. The event also carries the `call_id`
and the caller's own `party_id`. If the callee's client sees a `select_answer` for an answer
with party ID other than the one it sent, it ends the call and informs the user the call
was answered elsewhere. It does not send any events. Media can start flowing
before this event is seen or even sent. Clients that implement previous
versions of this specification will ignore this event and behave as they did
before.

Example:
```
{
    "type": "m.call.select_answer",
    "content": {
        "version": 1,
        "call_id": "12345",
        "party_id": "67890",
        "selected_party_id": "111213"
    }
}
```
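
The callee-side handling described above might look roughly like this. The function name and the returned action strings are hypothetical, and the sketch assumes the event has already been confirmed to come from the caller's party.

```python
def on_select_answer(content: dict, call_id: str, my_party_id: str) -> str:
    """Return the action a callee's client takes on `m.call.select_answer`."""
    if content.get("call_id") != call_id:
        # The event belongs to a different call.
        return "ignore"
    if content.get("selected_party_id") == my_party_id:
        # Our answer was chosen: the call carries on.
        return "continue"
    # Another party's answer (or reject) was chosen: end the call locally,
    # tell the user it was answered elsewhere, and send no events.
    return "answered_elsewhere"
```

With the example event above, a callee whose party ID is `111213` continues the call, while any other callee device ends it.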

### Introduce `m.call.reject`

* If the `m.call.invite` event has `version` `1`, a client wishing to reject the call
sends an `m.call.reject` event. This rejects the call on all devices, but if the calling
device sees an answer, it disregards the reject event and carries on. The reject has a
`party_id` just like an answer, and the caller sends a `select_answer` for it just like an
answer. If another client that had already sent an answer sees the caller select the
reject response instead of its answer, it ends the call.
* If the `m.call.invite` event has `version` `0`, the callee sends an `m.call.hangup` event as before.

Example:
```
{
    "type": "m.call.reject",
    "content": {
        "version": 1,
        "call_id": "12345",
        "party_id": "67890"
    }
}
```

If the calling user chooses to end the call before setup is complete, the client sends `m.call.hangup`
as previously.
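
A sketch of how a callee could pick the event to send when rejecting, based on the invite's `version`. The function name is hypothetical. The sketch assumes that the fallback hangup is sent as a version `1` event with a `party_id`, following the sections above that apply both to all VoIP events; that reading is an assumption.

```python
def build_rejection(invite_content: dict, my_party_id: str) -> dict:
    """Build the event a callee sends to reject the given invite."""
    content = {
        "version": 1,
        "call_id": invite_content["call_id"],
        "party_id": my_party_id,
    }
    version = invite_content.get("version")
    # Only the number 0 marks a legacy invite (booleans are excluded because
    # Python treats False as equal to 0).
    is_legacy = isinstance(version, int) and not isinstance(version, bool) and version == 0
    if is_legacy:
        # A version 0 caller does not understand m.call.reject.
        return {"type": "m.call.hangup", "content": content}
    return {"type": "m.call.reject", "content": content}
```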

### Clarify what actions a client may take in response to an invite
The client may:
* Attempt to accept the call by sending an answer
* Actively reject the call everywhere: reject the call as per above, which will stop the call from
ringing on all the user's devices and the caller's client will inform them that the user has
rejected their call.
* Ignore the call: send no events, but stop alerting the user about the call. The user's other
devices will continue to ring, and the caller's device will continue to indicate that the call
is ringing, and will time the call out in the normal way if no other device responds.

### Introduce more reason codes to `m.call.hangup`
* `ice_timeout`: The connection failed after some media was exchanged (as opposed to current
`ice_failed` which means no media connection could be established). Note that, in the case of
an ICE renegotiation, a client should be sure to send `ice_timeout` rather than `ice_failed` if
media had previously been received successfully, even if the ICE renegotiation itself failed.
* `user_hangup`: Clients must now send this code when the user chooses to end the call, although
for backwards compatibility, clients should treat an absence of the `reason` field as
`user_hangup`.
* `user_media_failed`: The client was unable to start capturing media and is therefore unable
to continue the call.
* `user_busy`: The user is busy. Note that this exists primarily for bridging to other networks such
as the PSTN. A Matrix client that receives a call whilst already in a call would not generally reject
the new call unless the user had specifically chosen to do so.
* `unknown_error`: Some other failure occurred that meant the client was unable to continue the call
rather than the user choosing to end it.
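
Two of the rules in this list lend themselves to a short sketch: defaulting a missing `reason` to `user_hangup` on receipt, and choosing between `ice_timeout` and `ice_failed` when sending. The function names are hypothetical.

```python
def received_hangup_reason(content: dict) -> str:
    """Reason to display for a received m.call.hangup.

    For backwards compatibility, an absent `reason` means `user_hangup`.
    """
    return content.get("reason", "user_hangup")


def ice_failure_reason(media_previously_received: bool) -> str:
    """Reason to send when the ICE connection fails.

    `ice_timeout` applies once media has been received successfully at any
    point, even if a later ICE renegotiation is what actually failed.
    """
    return "ice_timeout" if media_previously_received else "ice_failed"
```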

### Introduce `m.call.negotiate`
This introduces SDP negotiation semantics for media pause, hold/resume, ICE restarts and voice/video
call up/downgrading. Clients should implement & honour hold functionality as per WebRTC's
recommendation: https://www.w3.org/TR/webrtc/#hold-functionality

If both the invite event and the accepted answer event have `version` equal to `1`, either party may
send `m.call.negotiate` with a `description` field to offer new SDP to the other party. This event has
`call_id` with the ID of the call and `party_id` equal to the client's party ID for that call.
The caller ignores any negotiate events with `party_id` + `user_id` tuple not equal to that of the
answer it accepted. Clients should use the `party_id` field to ignore the remote echo of their
own negotiate events.
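
The filtering of negotiate events described above could be sketched like this. All names are hypothetical; `remote_user_id` and `remote_party_id` stand for the party this client is in the call with (for the caller, the party whose answer it accepted).

```python
def should_process_negotiate(event_user_id: str, content: dict, *,
                             call_id: str,
                             my_user_id: str, my_party_id: str,
                             remote_user_id: str, remote_party_id: str) -> bool:
    """Decide whether an m.call.negotiate event should be acted on."""
    if content.get("call_id") != call_id:
        return False
    party = (event_user_id, content.get("party_id"))
    if party == (my_user_id, my_party_id):
        # Remote echo of our own negotiate event.
        return False
    # Only the party we are actually in the call with may renegotiate.
    return party == (remote_user_id, remote_party_id)
```

Comparing the full `(user_id, party_id)` tuple keeps this correct for self-calls, where both parties share a user ID.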

This has a `lifetime` field as in `m.call.invite`, after which the sender of the negotiate event
should consider the negotiation failed (timed out) and the recipient should ignore it.

Review comment (Contributor): As discussed on Matrix, I think it'd be better if lifetimes (both in m.call.negotiate and m.call.invite events) were replaced by absolute timestamps for when the call should time out. That way we don't rely on an arbitrary field set by the server to determine when the call should expire, and we have a single source of truth for when this expiration should happen (i.e. the content set by the client) rather than hoping the client and the server agree on the time (which they often do, but if they don't then it can get complicated). The concern then becomes that clients can get out-of-sync time-wise but in my experience client terminals (i.e. desktops, mobiles, tablets, etc) are much more likely to be time-synced out of the box than servers, so I'm not sure it should be that big of a concern.

Reply (Member Author): Yeah, we might also have covered this on Matrix at the time, but this is basically designed to avoid ever assuming the client clocks will be synced, but assuming those on the HSes are correct (at least within a second or so). It is somewhat interesting that since clients are usually on consumer devices which are managed by the device / OS vendor who sets up NTP, they're now often more likely to have correct clocks than servers which are subject to the server admin forgetting to start ntpd.

Reply (Contributor): @babolivier, what's your view on this?

Reply (Member Author): Since this copies what has already been done in a different place, I'd suggest sticking with this for now and rethinking it in either a new MSC or if we ultimately phase this out in favour of MSC3401.

The `description` field is the same as the `offer` field in `m.call.invite` and the `answer`
field in `m.call.answer`, and is an `RTCSessionDescriptionInit` object as per
https://www.w3.org/TR/webrtc/#dom-rtcsessiondescriptioninit.

Example:
```
{
    "type": "m.call.negotiate",
    "content": {
        "call_id": "12345",
        "party_id": "67890",
        "lifetime": 10000,
        "description": {
            "sdp": "[some sdp]",
            "type": "offer"
        }
    }
}
```

This MSC also proposes clarifying the `m.call.invite` and `m.call.answer` events to state that
the `offer` and `answer` fields respectively are objects of type `RTCSessionDescriptionInit`
(and hence the `type` field, whilst redundant in these events, is included for ease of working
with the WebRTC API).

### Designate one party as 'polite'
In line with WebRTC perfect negotiation (https://w3c.github.io/webrtc-pc/#perfect-negotiation-example)
we introduce rules to establish which party is polite. The callee is always the polite party. In a
glare situation, the politeness of a party is therefore determined by whether the inbound or outbound
call is used: if a client discards its outbound call in favour of an inbound call, it becomes the polite
party.

### Add explicit recommendations for call event liveness.
`m.call.invite` contains a `lifetime` field that indicates how long the offer is valid for. When
a client receives an invite, it should use the event's `age` field in the sync response plus the
time since it received the event from the homeserver to determine whether the invite is still valid.
The use of the `age` field ensures that incorrect clocks on client devices don't break calls.
If the invite is still valid *and will remain valid for long enough for the user to accept the call*,
the client should signal an incoming call. The amount of time allowed for the user to accept the call may
vary between clients: for example, it may be longer on a locked mobile device than on an unlocked
desktop device.

The client should only signal an incoming call in a given room once it has completed processing the
entire sync response and, for encrypted rooms, attempted to decrypt all encrypted events in the
sync response for that room. This ensures that if the sync response contains subsequent events that
indicate the call has been hung up, rejected, or answered elsewhere, the client does not signal it.

If on startup, after processing locally stored events, the client determines that there is an invite
that is still valid, it should still signal it but only after it has completed a sync from the homeserver.
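
The liveness check above amounts to simple arithmetic on `lifetime`, `age` and the time elapsed locally since the event was received. A sketch follows; the function names and the 5 second `min_answer_window_ms` default are assumptions, since the proposal leaves the answer window to each client.

```python
def invite_remaining_ms(lifetime_ms: int, age_ms: int,
                        ms_since_received: int) -> int:
    """Milliseconds for which the invite is still valid (negative if expired).

    `age_ms` comes from the sync response; `ms_since_received` is measured
    locally, so the wall clock of the client is never compared with the
    clock of the server.
    """
    return lifetime_ms - (age_ms + ms_since_received)


def should_signal_incoming_call(lifetime_ms: int, age_ms: int,
                                ms_since_received: int,
                                min_answer_window_ms: int = 5000) -> bool:
    """Signal only if the user still has enough time to accept the call."""
    remaining = invite_remaining_ms(lifetime_ms, age_ms, ms_since_received)
    return remaining >= min_answer_window_ms
```

For a 60 second lifetime, an invite that is already 58 seconds old leaves 2 seconds, which is below the assumed window, so the client would not ring. This check would only run after the whole sync response has been processed, as described above.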

### Introduce recommendations for batching of ICE candidates
Clients should aim to send a small number of candidate events, with guidelines:
* ICE candidates which can be discovered immediately or almost immediately should be sent in the invite/answer
event itself (eg. host candidates). If server reflexive or relay candidates can be gathered in
a sufficiently short period of time, these should be sent here too. A delay of around 200ms is
suggested as a starting point.
* The client should then allow some time for further candidates to be gathered in order to batch them,
rather than sending each candidate as it arrives. A starting point of 2 seconds after sending the
invite or 500ms after sending the answer is suggested (since a delay is natural
anyway after the invite whilst the client waits for the user to accept it).

### Mandate the end-of-candidates candidate
Mandate that an ICE candidate whose value is the empty string must be sent in an `m.call.candidates`
message to signal that no more ICE candidates will be sent. The WebRTC spec requires browsers to
generate such a candidate; however, note that at the time of writing, not all browsers do (Chrome does
not, but does generate an `icegatheringstatechange` event). The client should send any remaining
candidates once candidate generation finishes, ignoring the timeouts above.

This allows bridges to batch the candidates together when bridging to protocols that don't support
trickle ICE.
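
The batching delays and the end-of-candidates marker could fit together roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the end-of-candidates entry is written as an object whose `candidate` value is the empty string, with other fields such as `sdpMid` left out.

```python
from typing import Dict, List

# Suggested starting points from the batching section above.
INVITE_BATCH_DELAY_MS = 2000
ANSWER_BATCH_DELAY_MS = 500


def batch_delay_ms(is_caller: bool) -> int:
    """How long to collect candidates before sending the first batch."""
    return INVITE_BATCH_DELAY_MS if is_caller else ANSWER_BATCH_DELAY_MS


def final_candidates_event(call_id: str, party_id: str,
                           pending: List[Dict]) -> Dict:
    """Build the last m.call.candidates event once gathering has finished.

    Any still-pending candidates are flushed immediately, followed by the
    empty-string candidate that marks the end of candidates.
    """
    candidates = list(pending) + [{"candidate": ""}]
    return {
        "type": "m.call.candidates",
        "content": {
            "version": 1,
            "call_id": call_id,
            "party_id": party_id,
            "candidates": candidates,
        },
    }
```

A bridge to a protocol without trickle ICE can buffer candidates until it sees the empty-string entry.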

### Add DTMF
Add that Matrix clients can send DTMF as specified by WebRTC. The WebRTC standard as of August
2020 does not support receiving DTMF but a Matrix client can receive and interpret the DTMF sent
in the RTP payload.

We also add a capability to the `capabilities` section of invites and answers (detailed in
[MSC2747](https://github.com/matrix-org/matrix-doc/pull/2747)) called `m.call.dtmf`. Clients
should only display UI for sending DTMF during a call if the other party advertises this
capability (boolean value `true`).

### Specify exact grammar for VoIP IDs
`call_id`s and the newly introduced `party_id` are explicitly defined to be up to 32 characters
from the set of `A-Z` `a-z` `0-9` `.-_`.
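
The grammar above translates directly into a regular expression. In this sketch the function name is hypothetical, and requiring at least one character is an assumption, since the text only gives the upper bound.

```python
import re

# Up to 32 characters from A-Z, a-z, 0-9, '.', '-' and '_'.
# Assumption: the ID must be non-empty.
_VOIP_ID_RE = re.compile(r"[A-Za-z0-9._-]{1,32}")


def is_valid_voip_id(value: object) -> bool:
    """Check a `call_id` or `party_id` against the grammar."""
    return isinstance(value, str) and _VOIP_ID_RE.fullmatch(value) is not None
```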

### Specify behaviour on room leave
If the client sees the party it is in a call with leave the room, the client should treat this
as a hangup event for any calls that are in progress. No specific requirement is given for the
situation where a client has sent an invite and the invitee leaves the room, but the client may
wish to treat it as a rejection if there are no more users in the room who could answer the call
(eg. the user is now alone or the `invitee` field was set on the invite).

The same behaviour applies when a client is looking at historic calls.

### Clarify that supported codecs should follow the WebRTC spec
The Matrix spec does not mandate particular audio or video codecs, but instead defers to the
WebRTC spec. A compliant Matrix VoIP client will behave in the same way as a supported 'browser'
in terms of what codecs it supports and what variants thereof. The latest WebRTC specification
applies, so clients should keep up to date with new versions of the WebRTC specification whether
or not there have been any changes to the Matrix spec.

## Potential issues
* The ability to call yourself makes the protocol a little more complex for clients to implement,
and is somewhat of a special case. However, some of the necessary additions are also required for
other features so this MSC elects to make it possible.
* Clients must make a decision on whether to ring for any given call: defining this in the spec
would be cumbersome and would limit clients' ability to use reputation-based systems for this
decision in the future. However, having a call ring on one client and not the other because one
had categorised it as a junk call and not the other would be confusing for the user.

## Alternatives
* This MSC does not allow for ICE negotiation before the user chooses to answer the call. This can
make call setup faster by allowing connectivity to be established whilst the call is ringing. This
is problematic with Matrix since any device or user could answer the call, so it is not known which
device is going to answer before the user chooses to answer. It would also leak information on which
of a user's devices were online.
* We could define that the ID of a call is implicitly the event ID of the invite event rather than
having a specific `call_id` field. This would mean that a client would be unable to know the ID of
a call before the remote echo of the invite came back, which could complicate implementations.
There is probably no compelling reason to change this.
* `m.call.select_answer` was chosen such that its name reflects the intention of the event. `m.call.ack`
is more succinct and mirrors SIP, but this MSC opts for the more descriptive name.
* This MSC elects to allow invites without an `invitee` field to mean a call for anyone in the room.
This could be useful for hunt group style semantics where an incoming call causes many different
users' phones to ring and any one of them may pick up the call. This does mean clients will need
to not blindly ring for any call invites in any room, since this would make unsolicited calls
easy in public rooms. We could opt to leave this out, or make it more explicit with a specific value
for the `invitee` field.
* `party_id` is one of many potential solutions: callees could add `answer_id`s to their events and
callers could be identified by the lack of an `answer_id`. An explicit field on every event may be
easier to comprehend, less error-prone and clearer in the backwards-compatibility scenario.
* We could make `party_id`s more prescriptive, eg. the caller could always have a `party_id` of the
empty string, the word `caller` or equal to the `call_id`, which may make debugging simpler.
* To allow for bridging into protocols that don't support trickle ICE, this proposal requires that
clients send an empty candidate to signal the end of candidates. This means it will be up to bridges
to buffer the invite and edit the SDP to add the candidates once they arrive, adding complexity to
bridges. The alternative would be a discovery mechanism so clients could know whether a callee supports
trickle ICE before calling, and disable it if not. This would add complexity to every Matrix client as
well as having to assume that all current clients did not, disabling trickle ICE everywhere until clients
support the discovery mechanism. The mechanism would also have to be per-user which would make sense for
bridged users, but not where some of a user's devices support trickle ICE and some do not.

## Security considerations
* IP addresses remain in the room in candidates, as they did in the previous version of the spec.
This is not ideal, but alternatives were either sending candidates over to-device messages
(would slow down call setup because a target device would have to be established before sending
candidates) or redacting them afterwards (the volume of events sent during calls can already
cause rate limiting issues and this would exacerbate this).
* Clients must take care to not ring for any call, as per the 'alternatives' section.

## Unstable prefix
Since VoIP events already have a 'version' field, we would ideally use a string, namespaced version during
development, but this field is defined to be an int in version 0. This MSC proposes changing the version
field to a string so that this namespacing can be used for future changes. Since there is no other easy way
to namespace events whilst in development and ensure interoperability, we have chosen not to use an unstable
prefix for this change, on the understanding that in future we will be able to use the string `version` field
for the unstable prefix.