MSC2516: Add a new message type for voice messages #2516

ludwigbald · 2020-04-27T11:54:39Z

Superseded by #3245.

Signed-off by Ludwig Bald, hallo at ludwigbald.com

uhoreg · 2020-04-27T23:34:25Z

For reference: a previous attempt at defining voice messages (written before we had MSCs).

uhoreg

Thank you for your contribution. It seems like a fairly straightforward proposal.

As a general nit-picky comment, your lines are wrapped inconsistently -- some are short, and some are long. Please try to wrap them at ~80 characters.

ludwigbald · 2020-04-28T09:14:36Z

I still have not completely figured out if it is best to strictly follow WhatsApp's example here, even from a UX perspective. I am going to collect some more examples on how different apps handle this.

I'm also not sure where the semantics should be documented, i.e. how a voice message is expected to be recorded or rendered or auto-downloaded. That should be somewhat standardised across clients. Is the Client-Server-API spec the right place for that?

ludwigbald · 2020-04-28T12:52:29Z

> For reference: a previous attempt at defining voice messages (written before we had MSCs).

I searched around a bit and included links to related issues in the proposal.

ludwigbald · 2020-04-28T13:40:09Z

As I said, here are a few examples of how voice messages differ from audio files in other apps:

WhatsApp marks voice messages with the profile picture of the sender and a little microphone. They are also automatically downloaded. If you forward a voice message, it becomes an audio clip.
Telegram shows the waveform for voice messages, title and artist for songs, and nothing for generic audio files. If you forward a voice message, it stays a voice message with a reference to the original source.
Facebook Messenger, Snapchat and Instagram Direct don't offer an easy way to send audio files, but voice messages are a thing. When forwarding, they stay voice messages.

uhoreg · 2020-04-28T15:13:01Z

Let's try this again...

For reference: a previous attempt at defining voice messages (written before we had MSCs): #310

uhoreg · 2020-04-28T15:22:45Z

> I'm also not sure where the semantics should be documented, i.e. how a voice message is expected to be recorded or rendered or auto-downloaded. That should be somewhat standardised across clients. Is the Client-Server-API spec the right place for that?

In general, we try not to mandate specifics of how clients must render content. I think it's OK to make suggestions, but since there are a variety of different clients using different interfaces, some UI may not make sense for some clients. (For example, if we said that a voice message must have a play button that the user could click on, this would not make sense for a text-based client that doesn't use the mouse.)

But in general, feel free to just put these UI suggestions in the MSC for now, and we'll worry about the right place for them when it's time to put this in the spec itself.

ludwigbald · 2020-04-28T16:21:26Z

Nico (@deepbluev7:neko.dev) offered:

> Afaik m.audio was always meant for voice messages. If you want to send an audio file, you would send it as m.file

This is not how the spec currently reads (the example for m.audio is a song), but it would only require a minor spec change.

At least Riot Web sends an m.audio event whenever an audio file is sent. You can't easily send an audio file as m.file right now.

ludwigbald · 2020-05-05T15:22:59Z

It would definitely make sense to roll the m.typing question from #310 into this one.

JuniorJPDJ · 2020-08-18T04:43:36Z

IMO a flag is the way to go: it's still an audio file, and introducing a new event type is not necessary, especially since you said they would basically be the same.
I would also keep the base m.audio treatment as "I'm an audio message", not "I'm a voice message". I like Telegram's approach for music files: those are treated differently from other files, which makes them streamable and playable directly in the client.

yajo · 2020-11-25T18:16:39Z

I think a specific voice audio type is important to let bridges do their magic, so voice messages appear as native on both ends of a bridge. It might even require renaming or re-encoding the file in some cases.

yajo · 2020-11-25T18:20:00Z

Notice that both WhatsApp and Telegram distinguish between "read" (I saw I have a voice message) and "heard" (I listened to it). I think it is an important feature to add here too.

richvdh · 2021-03-16T16:50:01Z

proposals/2516-new-type-for-voice-messages.md

@@ -0,0 +1,66 @@
+# Add a seperate message type for voice messages
+
+In the matrix spec right now, there is a message type `m.audio` for audio files.


link to https://matrix.org/docs/spec/client_server/r0.6.1#m-audio would be helpful here.

penn5 · 2021-03-16T19:10:21Z

> This is looking very plausible now, although I'm going to stay out of the m.voice / m.audio debate: there are enough cooks in this kitchen. I agree some thought about a 'listened' state could be useful though (I suspect it could just be a relation message rather than another type of read-receipt: it seems to map to the semantics better?). @penn5 - you downvoted: can you tell us why?

I don't find it useful in Telegram. I guess it would be fine if there's a way to ignore the listen state in the client. I find it really annoying in Telegram that the @ can't be dismissed without listening.

turt2live · 2021-03-16T19:25:06Z

Notifications control is certainly something we'd take a look at in another MSC. This MSC should cause the voice message to just be an unread message rather than a notification (by default) though.

penn5 · 2021-03-16T19:30:34Z

> Notifications control is certainly something we'd take a look at in another MSC. This MSC should cause the voice message to just be an unread message rather than a notification (by default) though.

The point is I frequently don't listen to voice messages, and there would have to be a way to dismiss them without listening. Of course, if it's out of scope, just ignore me. But overall I just don't really care about that stuff.

turt2live · 2021-03-16T19:38:43Z

That would be more of a client UX problem than a spec issue. The spec just defines how to throw the voice message over the network - it's up to the clients to decide how best to render that (with whatever dismiss functionality, etc)

dbkr · 2021-03-19T15:56:34Z

proposals/2516-new-type-for-voice-messages.md

+I propose to introduce a new message type `m.voice` with the same
+contents as `m.audio`.
+Voice messages MUST be OGG files, Opus encoded. Other files can be 
+sent as `m.audio`or `m.file`.


Another thought here: can we add something about audio format, i.e. sample rate / channel count? I think not mandating one is probably fine, but if so we should make it clear that clients should expect anything.

fwiw the more I think about this the more I draw a conclusion that the spec shouldn't care about a mandated set of values, but it should probably recommend some sane defaults (for those who just want to whack in some libraries and call it good). Clients expecting anything is fairly on-par with the latest directions of Matrix, anyhow.

@dbkr #3052 has that feature.
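To make the draft quoted above concrete ("a new message type `m.voice` with the same contents as `m.audio`", Ogg/Opus only), here is a minimal sketch of what such content could look like. The field names (`body`, `url`, `info.duration`, `info.mimetype`, `info.size`) follow the `m.audio` section of the Client-Server spec; the helper `makeVoiceContent` and the placeholder body text "Voice message" are hypothetical and only illustrate the shape.

```typescript
// Sketch of the content proposed in this draft: `m.voice` reuses the
// `m.audio` fields and differs only in `msgtype`.
// `makeVoiceContent` is a hypothetical helper, not part of any SDK.

interface AudioInfo {
  duration?: number; // clip length in milliseconds
  mimetype?: string; // the draft requires Ogg/Opus, hence "audio/ogg"
  size?: number; // file size in bytes
}

interface VoiceMessageContent {
  msgtype: "m.voice";
  body: string; // plain-text fallback for clients that cannot play audio
  url: string; // mxc:// URI returned by the media upload
  info?: AudioInfo;
}

function makeVoiceContent(url: string, durationMs: number, sizeBytes: number): VoiceMessageContent {
  if (!url.startsWith("mxc://")) {
    throw new Error(`expected an mxc:// URI, got ${url}`);
  }
  return {
    msgtype: "m.voice",
    body: "Voice message", // hypothetical placeholder text
    url,
    info: {
      duration: Math.round(durationMs), // duration is an integer number of ms
      mimetype: "audio/ogg",
      size: sizeBytes,
    },
  };
}
```

Because nothing beyond `msgtype` changes, a client that already renders `m.audio` could reuse that code path for `m.voice`; whether sample rate or channel count should be recommended, as discussed above, is left open here.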

kevincox · 2021-03-20T13:50:08Z

proposals/2516-new-type-for-voice-messages.md

+@uhoreg offers:
+> Auto-downloading of files (if clients follow WhatsApp's example) sounds
+like it could be a security issue. (e.g. DoS by using up users' bandwidth,
+could cause malicious content to be automatically downloaded)


I don't think this is a concern.

The client should consider the size before deciding to download or cut off the download past an "acceptable size" (AKA cache the prefix of the file) if bandwidth usage is a concern. This can be applied to all media content.

Most clients already auto-download things like images, so if the payload can be triggered just by downloading, this adds no new attack surface (assuming that the file isn't decoded before play is pressed).
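The size-based approach described above can be sketched as a small policy function. This is a minimal sketch under stated assumptions: the policy shape, its thresholds, and the names `planAutoDownload` and `rangeHeader` are hypothetical; `sizeBytes` is assumed to come from the event's `info.size`; and fetching only a prefix assumes the media server honours HTTP `Range` requests.

```typescript
// Hypothetical auto-download policy; thresholds are up to each client.
interface AutoDownloadPolicy {
  maxFullDownloadBytes: number; // auto-download whole files up to this size
  prefixBytes: number; // for larger files, cache only this many leading bytes (0 = none)
}

type DownloadPlan =
  | { kind: "full" }
  | { kind: "prefix"; bytes: number }
  | { kind: "none" };

// Decide what to fetch before the user presses play, based on the
// advertised file size (e.g. `info.size` from the event content).
function planAutoDownload(sizeBytes: number | undefined, policy: AutoDownloadPolicy): DownloadPlan {
  if (sizeBytes === undefined) {
    // Size not advertised: wait for an explicit user action.
    return { kind: "none" };
  }
  if (sizeBytes <= policy.maxFullDownloadBytes) {
    return { kind: "full" };
  }
  if (policy.prefixBytes > 0) {
    return { kind: "prefix", bytes: Math.min(policy.prefixBytes, sizeBytes) };
  }
  return { kind: "none" };
}

// HTTP Range header value for the first `bytes` bytes (end offset is inclusive).
function rangeHeader(bytes: number): string {
  return `bytes=0-${bytes - 1}`;
}
```

Nothing in this function is specific to voice messages, which matches the point above that the same limit can be applied to images and other media.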

dbkr · 2021-03-23T10:39:51Z

proposals/2516-new-type-for-voice-messages.md

+
+I propose to introduce a new message type `m.voice` with the same
+contents as `m.audio`.
+Voice messages MUST be OGG files, Opus encoded. Other files can be 


I'm starting to wonder about the choice of Ogg here: it looks like WebM might be edging out Ogg as a more widely supported container format, plus all the common browsers can produce it natively (Chrome can't mux into Ogg). It would be good to at least note why we picked Ogg.

I would also prefer something more easily accessible on client platforms, which indeed seems to be WebM, at least for the web browser case. (I assume for native platforms, the various containers are roughly identical in implementation complexity.)

What's the reason for selecting Ogg? Could we use WebM instead?

I'd really like us to stick with Opus for several reasons, some of which were already discussed in #matrix-spec (and I thought were carried over here, but apparently not):

File size is small in nearly every case.

Playback is supported on all modern environments - anything not modern is unlikely to be using voice messages (IoT) or will be unable to do other things anyways.

It's what all the other platforms do, making it easy/trivial for compatibility. If we had to transcode like bridges already do for video then we've somewhat failed to maintain our interoperability flag. (Video is transcoded because remote networks use insane formats, but opus/ogg isn't that insane)

...and a couple more less important reasons that hurt to type on a phone, but that's the gist of it

To be clear: I'm just questioning the container format, not the codec, i.e. Opus in WebM, so the file size would be basically the same. Ogg does seem to be what other messaging platforms use, though, so yeah, bridges would have to remux.

ah, fair enough. We could probably get away with webm though I think we'd also have to have a good reason for going against the grain, imo
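Whichever container is chosen, a client or bridge deciding whether it needs to remux can tell the two apart from the first bytes of the file. The sketch below is an illustration, not part of the proposal: it checks the Ogg capture pattern `OggS` together with the `OpusHead` identification header that RFC 7845 places alone in the first Ogg page, and the EBML magic number that begins both Matroska and WebM files. Distinguishing WebM from generic Matroska would require parsing the EBML DocType, which is not done here. The names `hasMagic` and `sniffContainer` are hypothetical.

```typescript
type Container = "ogg-opus" | "ogg-other" | "matroska-or-webm" | "unknown";

const OGG_CAPTURE = [0x4f, 0x67, 0x67, 0x53]; // "OggS"
const OPUS_HEAD = [0x4f, 0x70, 0x75, 0x73, 0x48, 0x65, 0x61, 0x64]; // "OpusHead"
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3]; // EBML header shared by Matroska and WebM

// True if `magic` appears in `bytes` starting at `offset`.
function hasMagic(bytes: Uint8Array, magic: number[], offset = 0): boolean {
  if (bytes.length < offset + magic.length) return false;
  return magic.every((b, i) => bytes[offset + i] === b);
}

function sniffContainer(bytes: Uint8Array): Container {
  if (hasMagic(bytes, OGG_CAPTURE)) {
    // An Ogg page header is 27 bytes; byte 26 holds the number of entries
    // in the segment table that follows. The first packet starts after it.
    if (bytes.length < 27) return "ogg-other";
    const firstPacket = 27 + bytes[26];
    return hasMagic(bytes, OPUS_HEAD, firstPacket) ? "ogg-opus" : "ogg-other";
  }
  if (hasMagic(bytes, EBML_MAGIC)) return "matroska-or-webm";
  return "unknown";
}
```

Only the container is identified here; the codec inside a WebM file would need further parsing, which is part of why remuxing between Opus-in-Ogg and Opus-in-WebM is cheap but not free for bridges.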

turt2live · 2021-04-15T18:37:48Z

proposals/2516-new-type-for-voice-messages.md

+This could be solved by having clients handle auto-download responsibly,
+e.g. only auto-download voice messages from trusted contacts.
+
+## Unstable prefix


ftr Element Web is going to use org.matrix.experimental.* to see how this MSC can benefit from something like #1767

turt2live · 2021-05-03T16:16:14Z

proposals/2516-new-type-for-voice-messages.md

@@ -0,0 +1,66 @@
+# Add a separate message type for voice messages


Having worked with this for a while now, it seems it would in fact be best to go with an extensible events format on top of m.audio. This might block the MSC behind extensible events, but the intention would be to land with an m.voice event type that has content containing m.message, m.audio, and m.voice. As a fallback, implementations would use a regular m.room.message event for msgtype: "m.audio" and "m.voice": {} in the content (org.matrix.msc2516.voice during unstable implementation).
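The fallback described in this comment can be sketched as follows. This covers only the `m.room.message` fallback (`msgtype: "m.audio"` plus an empty marker object), not the extensible-events `m.voice` event type. The marker keys `m.voice` and `org.matrix.msc2516.voice` come from the comment itself; the helper names `withVoiceMarker` and `isVoiceMessage` are hypothetical.

```typescript
type Content = Record<string, unknown>;

const STABLE_VOICE_KEY = "m.voice";
const UNSTABLE_VOICE_KEY = "org.matrix.msc2516.voice"; // used during unstable implementation

// Turn ordinary m.audio content into a voice message by adding an empty marker object.
function withVoiceMarker(audioContent: Content, unstable: boolean): Content {
  const key = unstable ? UNSTABLE_VOICE_KEY : STABLE_VOICE_KEY;
  return { ...audioContent, msgtype: "m.audio", [key]: {} };
}

// Clients that don't know the marker simply see an audio message.
function isVoiceMessage(content: Content): boolean {
  if (content["msgtype"] !== "m.audio") return false;
  return STABLE_VOICE_KEY in content || UNSTABLE_VOICE_KEY in content;
}
```

Keeping `msgtype` as `m.audio` means older clients still render something playable; only clients that understand the marker need to treat the event differently.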

Does this sound sane? If it's too different from what you're comfortable with, let me know and I can open a new MSC to describe it in detail.

I've since split it out: #3245

jtrees · 2021-08-23T09:19:02Z

What's the status here? Looks like something has already been implemented and released for Element Android. Is there more to be done for the spec?

tulir · 2021-08-23T23:43:44Z

@jtrees The Element devs decided that this proposal had too many issues and decided to do something else (#3245)

turt2live · 2021-08-24T00:16:55Z

*Matrix devs. The MSC is done from the perspective of the foundation.

jtrees · 2021-08-24T10:45:21Z

> @jtrees The Element devs decided that this proposal had too many issues and decided to do something else (#3245)

Ah thanks. Does that make this proposal obsolete then? Can the PR be closed?

ludwigbald · 2021-12-27T18:53:35Z

I'm closing this MSC; it is superseded by #3245. I'm confident that the extensible events approach works at least as well as the one I proposed.

I started this one almost a year ago, and soon felt in over my head. I am not involved with matrix development at all, so I didn't really know how to move this one forward. I'm glad that we're doing voice messages right, and I hope this MSC helped more than it delayed things. It sure was an interesting experience for me to dabble in the MSC process, so thanks for sticking with me! :)

turt2live · 2021-12-27T18:55:50Z

Thank you for the MSC :)

As a first MSC, this one is well on the better side - it definitely helped lead towards an answer of some kind.

ludwigbald added 2 commits April 27, 2020 13:53

Create wip.md

3a7d67f

Rename wip.md to 2516-new-type-for-voice-messages.md

4607d6d

ludwigbald changed the title ~~Introduce a new message type for voice messages~~ MSC2516: Add a new message type for voice messages Apr 27, 2020

uhoreg added the kind:feature (MSC for not-core and not-maintenance stuff) and proposal (A matrix spec change proposal) labels Apr 27, 2020

turt2live self-requested a review April 27, 2020 23:37

turt2live added the proposal-in-review label Apr 27, 2020

uhoreg reviewed Apr 27, 2020

proposals/2516-new-type-for-voice-messages.md (outdated, resolved)

proposals/2516-new-type-for-voice-messages.md (outdated, resolved)

proposals/2516-new-type-for-voice-messages.md (resolved)

Update proposals/2516-new-type-for-voice-messages.md

a85b80d

typo Co-Authored-By: Hubert Chathi <hubert@uhoreg.ca>

ludwigbald added 2 commits April 28, 2020 15:03

Update 2516-new-type-for-voice-messages.md

e93cf14

Included a few comments from the PR

Update 2516-new-type-for-voice-messages.md

87ac8dd

Added links.

Update 2516-new-type-for-voice-messages.md

442b900

Added sentence about what it means to be a voice message. & used shorter line breaks

turt2live mentioned this pull request Apr 28, 2020

Voice messages #1238

Closed

dbkr reviewed Sep 25, 2020

proposals/2516-new-type-for-voice-messages.md (resolved)

dbkr reviewed Sep 25, 2020

proposals/2516-new-type-for-voice-messages.md (resolved)

dbkr reviewed Sep 25, 2020

proposals/2516-new-type-for-voice-messages.md (resolved)

turt2live removed their request for review November 10, 2020 05:28

This was referenced Nov 24, 2020

Send and play voice messages element-hq/element-android#29

Closed

Button to record audio snippets and send them as audio events (voice messages) element-hq/element-web#1358

Closed

richvdh reviewed Mar 16, 2021

proposals/2516-new-type-for-voice-messages.md (outdated, resolved)

richvdh reviewed Mar 16, 2021

proposals/2516-new-type-for-voice-messages.md (outdated, resolved)

richvdh reviewed Mar 16, 2021

turt2live added this to Awaiting SCT input in Spec Core Team Backlog Mar 16, 2021

giomfo mentioned this pull request Mar 16, 2021

Voice Messages - Hold and send mode element-hq/element-android#3009

Closed

5 tasks

dbkr reviewed Mar 19, 2021

kevincox reviewed Mar 20, 2021

dbkr reviewed Mar 23, 2021

turt2live mentioned this pull request Mar 23, 2021

Labs feature: Early implementation of voice messages matrix-org/matrix-react-sdk#5769

Merged

DoM1niC mentioned this pull request Mar 25, 2021

[Feature] Support new Voice Messages from Matrix <-> Element Clients mautrix/whatsapp#286

Closed

turt2live reviewed Apr 15, 2021

ludwigbald and others added 2 commits April 25, 2021 15:37

typo

30100fa

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

typo: Update proposals/2516-new-type-for-voice-messages.md

19d4d5c

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

turt2live reviewed May 3, 2021

turt2live moved this from Awaiting SCT input to Temp column 001 in Spec Core Team Backlog Jun 8, 2021

turt2live added the needs-implementation label (This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.) Jun 8, 2021

turt2live force-pushed the old_master branch from e895827 to dca99ee August 30, 2021 22:34

ludwigbald closed this Dec 27, 2021

turt2live added the obsolete label (A proposal which has been overtaken by other proposals) and removed the proposal-in-review label Dec 27, 2021

Conversation

ludwigbald commented Apr 27, 2020 (edited)

uhoreg commented Apr 27, 2020

uhoreg left a comment

ludwigbald commented Apr 28, 2020

ludwigbald commented Apr 28, 2020 (edited)

ludwigbald commented Apr 28, 2020

uhoreg commented Apr 28, 2020

uhoreg commented Apr 28, 2020

ludwigbald commented Apr 28, 2020

ludwigbald commented May 5, 2020 (edited)

JuniorJPDJ commented Aug 18, 2020

yajo commented Nov 25, 2020

yajo commented Nov 25, 2020

penn5 commented Mar 16, 2021

turt2live commented Mar 16, 2021

penn5 commented Mar 16, 2021

turt2live commented Mar 16, 2021

jryans Mar 23, 2021 (edited)

jtrees commented Aug 23, 2021

tulir commented Aug 23, 2021

turt2live commented Aug 24, 2021

jtrees commented Aug 24, 2021

ludwigbald commented Dec 27, 2021

turt2live commented Dec 27, 2021
