Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multi-stream VoIP #1735

Merged
merged 13 commits into from
Mar 19, 2024
1 change: 1 addition & 0 deletions changelogs/client_server/newsfragments/1735.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add support for multi-stream VoIP, as per [MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077).
40 changes: 28 additions & 12 deletions content/client-server-api/modules/voip_events.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,18 +171,34 @@ In response to an incoming invite, a client may do one of several things:

##### Streams
richvdh marked this conversation as resolved.
Show resolved Hide resolved
richvdh marked this conversation as resolved.
Show resolved Hide resolved

Clients are expected to send one stream with one track of kind `audio` (creating a
voice call). They can optionally send a second track in the same stream of kind
`video` (creating a video call).

Clients implementing this specification use the first stream and will ignore
any streamless tracks. Note that in the JavaScript WebRTC API, this means
`addTrack()` must be passed two parameters: a track and a stream, not just a
track, and in a video call the stream must be the same for both audio and video
track.

A client may send other streams and tracks but the behaviour of the other party
with respect to presenting such streams and tracks is undefined.
Clients may send more than one stream in a VoIP call. Metadata can be included in
the [`m.call.invite`](/client-server-api/#mcallinvite), [`m.call.answer`](/client-server-api/#mcallanswer)
and [`m.call.negotiate`](/client-server-api/#mcallnegotiate) events to describe
the streams being sent, using the `sdp_stream_metadata` property.

`sdp_stream_metadata` maps from the `id` of a stream in the session description,
to metadata about that stream. Currently only one property is defined for the
metadata. This is `purpose`, which should be a string indicating the purpose of
the stream. The following `purpose`s are defined:

* `m.usermedia` - stream that contains the webcam and/or microphone tracks
* `m.screenshare` - stream with the screen-sharing tracks

If an incoming stream is not described in `sdp_stream_metadata` and
`sdp_stream_metadata` is present, the stream should be ignored. If a stream has
a `purpose` of an unknown type (i.e. not `m.usermedia` or `m.screenshare`), it
should be ignored.

Clients implementing this specification will ignore any streamless tracks. Note
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richvdh said:

I know this is existing text, but it now seems even more confusing than before, so I think we need to clarify.

  1. What is the mechanism by which streamless tracks are ignored? Or is this an instruction to clients to ignore such tracks?
  2. what is that to do with addTrack's parameters?
  3. is it related to sdp_stream_metadata? if not, it should probably not be in the middle of the text about it?

Copy link
Contributor Author

@zecakeh zecakeh Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I believe it's the job of the client implementation to ignore streamless tracks. So the will probably needs to be changed to a MUST, like in other places in the spec.
  2. I believe that the addTrack note is a bit of help for client implementation. For context it refers to RTCPeerConnection: addTrack(). It's a method that takes a track and any number of streams. If there is no stream provided, it creates a streamless track, which is why the method should be called with a track and a stream. In my opinion the note could be removed.
  3. Even if the whole section talks a lot about sdp_stream_metadata, it is about streams, and it just happens that it is the only paragraph left that doesn't mention the field. It could be placed at the top or bottom of the section if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Indeed. Can you change will to MUST (or, maybe more readably, should)?
  2. Right. Yes. I think we should remove it.
  3. Yes, I think it would make more sense at the end of the section, after the note on backwards-compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

that in the JavaScript WebRTC API, this means `addTrack()` must be passed two
parameters: a track and a stream, not just a track, and in a video call the
stream must be the same for both audio and video track.

For backwards compatibility, if `sdp_stream_metadata` is not present in the
initial [`m.call.invite`](/client-server-api/#mcallinvite) or [`m.call.answer`](/client-server-api/#mcallanswer)
event sent by the other party, the client should ignore any new incoming streams
(i.e. it should use the first one) and it shouldn't send more than one stream
(i.e. clients cannot send a video feed and a screenshare at the same time).

##### Invitees
The `invitee` field should be added whenever the call is intended for one
Expand Down
8 changes: 8 additions & 0 deletions data/event-schemas/examples/m.call.answer.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,14 @@
"answer": {
"type" : "answer",
"sdp" : "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]"
},
"sdp_stream_metadata": {
"271828182845": {
"purpose": "m.screenshare"
},
"314159265358": {
"purpose": "m.usermedia"
}
}
}
}
8 changes: 8 additions & 0 deletions data/event-schemas/examples/m.call.invite.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,14 @@
"offer": {
"type" : "offer",
"sdp" : "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]"
},
"sdp_stream_metadata": {
"271828182845": {
"purpose": "m.screenshare"
},
"314159265358": {
"purpose": "m.usermedia"
}
}
}
}
8 changes: 8 additions & 0 deletions data/event-schemas/examples/m.call.negotiate.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,14 @@
"description": {
"type" : "offer",
"sdp" : "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]"
},
"sdp_stream_metadata": {
"271828182845": {
"purpose": "m.screenshare"
},
"314159265358": {
"purpose": "m.usermedia"
}
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
type: object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this belongs in core-event-schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Since it doesn't seem to belong directly in schemas either, because it only contains full event schemas, I took the liberty to create a components directory (named like this for similarity with the property that includes reusable bits in the OpenAPI spec)

x-addedInMatrixVersion: "1.10"
description: |-
Metadata describing the [streams](/client-server-api/#streams) that will be
sent.

This is a map of stream ID to metadata about the stream.
additionalProperties:
type: object
title: StreamMetadata
description: Metadata describing a stream.
properties:
purpose:
type: string
enum:
- m.usermedia
- m.screenshare
description: |-
The purpose of the stream.

The possible values are:

* `m.usermedia` - stream that contains the webcam and/or microphone
tracks
* `m.screenshare` - stream with the screen-sharing tracks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `m.usermedia` - stream that contains the webcam and/or microphone
tracks
* `m.screenshare` - stream with the screen-sharing tracks
* `m.usermedia`: Stream that contains the webcam and/or microphone
tracks.
* `m.screenshare`: Stream with the screen-sharing tracks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied

required:
- purpose
3 changes: 3 additions & 0 deletions data/event-schemas/schema/m.call.answer.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,9 @@
}
},
"required": ["type", "sdp"]
},
"sdp_stream_metadata": {
"$ref": "core-event-schema/sdp_stream_metadata.yaml"
}
},
"required": ["answer"]
Expand Down
5 changes: 4 additions & 1 deletion data/event-schemas/schema/m.call.invite.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,10 @@
"invitee": {
"type": "string",
"description": "The ID of the user being called. If omitted, any user in the room can answer.",
"x-addedInMatrixVersion": "1.7",
"x-addedInMatrixVersion": "1.7"
},
"sdp_stream_metadata": {
"$ref": "core-event-schema/sdp_stream_metadata.yaml"
}
},
"required": ["offer", "lifetime"]
Expand Down
2 changes: 2 additions & 0 deletions data/event-schemas/schema/m.call.negotiate.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,8 @@ properties:
type: integer
description: The time in milliseconds that the negotiation is valid for.
Once the negotiation age exceeds this value, clients should discard it.
sdp_stream_metadata:
$ref: core-event-schema/sdp_stream_metadata.yaml
required:
- description
- lifetime
Expand Down
Loading