Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2516: Add a new message type for voice messages #2516

Closed
wants to merge 9 commits into from

Conversation

ludwigbald
Copy link

@ludwigbald ludwigbald commented Apr 27, 2020

Rendered

superceded by #3245

Signed-off by Ludwig Bald, hallo at ludwigbald.com

@ludwigbald ludwigbald changed the title Introduce a new message type for voice messages MSC2516: Add a new message type for voice messages Apr 27, 2020
@uhoreg uhoreg added kind:feature MSC for not-core and not-maintenance stuff proposal A matrix spec change proposal labels Apr 27, 2020
@uhoreg
Copy link
Member

uhoreg commented Apr 27, 2020

For reference: a previous attempt at defining voice messages (written before we had MSCs).

Copy link
Member

@uhoreg uhoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. It seems like a fairly straightforward proposal.

As a general nit-picky comment, your lines are wrapped inconsistently -- some are short, and some are long. Please try to wrap them at ~80 characters.

proposals/2516-new-type-for-voice-messages.md Outdated Show resolved Hide resolved
proposals/2516-new-type-for-voice-messages.md Outdated Show resolved Hide resolved
proposals/2516-new-type-for-voice-messages.md Show resolved Hide resolved
typo

Co-Authored-By: Hubert Chathi <hubert@uhoreg.ca>
@ludwigbald
Copy link
Author

I still have not completely figured out if it is best to strictly follow WhatsApp's example here, even from a UX perspective. I am going to collect some more examples on how different apps handle this.

I'm also not sure where the semantics should be documented, i.e. how a voice message is expected to be recorded or rendered or auto-downloaded. That should be somewhat standardised across clients. Is the Client-Server-API spec the right place for that?

@ludwigbald
Copy link
Author

ludwigbald commented Apr 28, 2020

For reference: a previous attempt at defining voice messages (written before we had MSCs).

I searched around a bit and included links to related issues in the proposal.

@ludwigbald
Copy link
Author

As I said, here are a few examples on how voice messages differ from audio files in other Apps:

  • Whatsapp marks voice messages with the profile picture of the sender and a little microphone. They are also automatically downloaded. If you forward a voice message, it becomes an audio clip.
    WhatsApp
  • Telegram shows the waveform for voice messages, title and artist for songs, and nothing for generic audio files. If you forward a voice message, it stays a voice message with a reference to the original source.
    image
  • Facebook Messenger, Snapchat and Instagram Direct don't offer an easy way to send audio files, but voice messages are a thing. When forwarding, they stay voice messages.

@uhoreg
Copy link
Member

uhoreg commented Apr 28, 2020

Let's try this again...

For reference: a previous attempt at defining voice messages (written before we had MSCs): #310

@uhoreg
Copy link
Member

uhoreg commented Apr 28, 2020

I'm also not sure where the semantics should be documented, i.e. how a voice message is expected to be recorded or rendered or auto-downloaded. That should be somewhat standardised across clients. Is the Client-Server-API spec the right place for that?

In general, we try not to mandate specifics of how clients must render content. I think it's OK to make suggestions, but since there are a variety of different clients using different interfaces, some UI may not make sense for some clients. (For example, if we said that a voice message must have a play button that the user could click on, this would not make sense for a text-based client that doesn't use the mouse.)

But in general, feel free to just put these UI suggestions in the MSC for now, and we'll worry about where the right place to put them when it's time to put it in the spec itself.

@ludwigbald
Copy link
Author

Nico (@deepbluev7:neko.dev) offered:

Afaik m.audio was always meant for voice messages. If you want to send an audio file, you would send it as m.file

This is not how the spec currently reads (the example for m.audio is a song), but it would only require a minor spec change.

At least Riot Web sends an m.audio event whenever an audio file is sent. You can't easily send an audio file as m.file right now.

Added sentence about what it means to be a voice message. & used shorter line breaks
@turt2live turt2live mentioned this pull request Apr 28, 2020
@ludwigbald
Copy link
Author

ludwigbald commented May 5, 2020

It would definitely make sense to roll the m.typing question from #310 into this one.

@JuniorJPDJ
Copy link

IMO Flag is the way to go, it's still audio file and introducing new event type is not necessary, especially that you said those would basically be the same.
I would also keep base m.audio treatment as "I'm audio message", not "I'm a voice message" - I like Telegram's approach for music files, those are being treated differently than other files and it makes them streamable and playable directly in client.

@yajo
Copy link

yajo commented Nov 25, 2020

I think a specific voice audio type is important to let bridges do their magic, so voice messages appear as native in both ends of a bridge. It might even require renaming or reencoding the file on some cases.

@yajo
Copy link

yajo commented Nov 25, 2020

Notice that both Whatsapp and Telegram distinguish between " read" (I saw I have a voice message) and "heard" (I listened to it). I think it is an important feature to be added here too.

@@ -0,0 +1,66 @@
# Add a seperate message type for voice messages

In the matrix spec right now, there is a message type `m.audio` for audio files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live turt2live added this to Awaiting SCT input in Spec Core Team Backlog Mar 16, 2021
@penn5
Copy link
Contributor

penn5 commented Mar 16, 2021

This is looking very plausible now, although I'm going to stay out of the m.voice / m.audio debate: there are enough cooks in this kitchen. I agree some thought about a 'listened' state could be useful though (I suspect it could just be a relation message rather than another type of read-receipt: it seems to map to the semantics better?). @penn5 - you downvoted: can you tell us why?

I don't find it useful in Telegram. I guess it would be fine if there's a way to ignore the listen state in the client. I find it really annoying in Telegram that the @ can't be dismissed without listening.

@turt2live
Copy link
Member

Notifications control is certainly something we'd take a look at in another MSC. This MSC should cause the voice message to just be an unread message rather than a notification (by default) though.

@penn5
Copy link
Contributor

penn5 commented Mar 16, 2021

Notifications control is certainly something we'd take a look at in another MSC. This MSC should cause the voice message to just be an unread message rather than a notification (by default) though.

The point is a frequently don't listen to voice messages, and there would have to be a way to dismiss without listening. Of course if it's out of scope just ignore me. But overall I just don't really care about that stuff.

@turt2live
Copy link
Member

That would be more of a client UX problem than a spec issue. The spec just defines how to throw the voice message over the network - it's up to the clients to decide how best to render that (with whatever dismiss functionality, etc)

I propose to introduce a new message type `m.voice` with the same
contents as `m.audio`.
Voice messages MUST be OGG files, Opus encoded. Other files can be
sent as `m.audio`or `m.file`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought here: can we add something about audio format, ie sample rate / channel count here? I think not mandating one is probably fine, but if so we should make it clear that clients should expect anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw the more I think about this the more I draw a conclusion that the spec shouldn't care about a mandated set of values, but it should probably recommend some sane defaults (for those who just want to whack in some libraries and call it good). Clients expecting anything is fairly on-par with the latest directions of Matrix, anyhow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbkr #3052 has that feature.

proposals/2516-new-type-for-voice-messages.md Show resolved Hide resolved
Comment on lines +55 to +58
@uhoreg offers:
> Auto-downloading of files (if clients follow WhatsApp's example) sounds
like it could be a security issue. (e.g. DoS by using up users' bandwidth,
could cause malicious content to be automatically downloaded)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a concern.

  • The client should consider the size before deciding to download or cut off the download past an "acceptable size" (AKA cache the prefix of the file) if bandwidth usage is a concern. This can be applied to all media content.
  • Most client will already auto-download things like images, so if the payload can be triggered just by downloading this adds no new attack surface. (Assuming that the file isn't decoded before play is pressed).


I propose to introduce a new message type `m.voice` with the same
contents as `m.audio`.
Voice messages MUST be OGG files, Opus encoded. Other files can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to wonder about the choice of ogg here: it looks like webm might be edging out ogg as a more widely supported container format, plus all the common browsers can produce it natively (chrome can't mux into ogg). Would be good to at least note why we picked ogg.

Copy link
Contributor

@jryans jryans Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer something more easily accessible on client platforms, which indeed seems to be WebM, at least for the web browser case. (I assume for native platforms, the various containers are roughly identical in implementation complexity.)

What's the reason for selecting Ogg? Could we use WebM instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like us to stick with opus for several reasons, some of which are already discussed in #matrix-spec (and I thought translated here, but apparently not):

  1. File size is small in nearly every case.
  2. Playback is supported on all modern environments - anything not modern is unlikely to be using voice messages (IoT) or will be unable to do other things anyways.
  3. It's what all the other platforms do, making it easy/trivial for compatibility. If we had to transcode like bridges already do for video then we've somewhat failed to maintain our interoperability flag. (Video is transcoded because remote networks use insane formats, but opus/ogg isn't that insane)

...and a couple more less important reasons that hurt to type on a phone, but that's the gist of it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I'm just questioning the container format, not the codec, ie. Opus in WebM, so the file size would be basically the same. Ogg does seem to be what other messaging platforms use though, so yeah, bridges would have to remux.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, fair enough. We could probably get away with webm though I think we'd also have to have a good reason for going against the grain, imo

This could be solved by having clients handle auto-download responsibly,
e.g. only auto-download voice messages from trusted contacts.

## Unstable prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftr Element Web is going to use org.matrix.experimental.* to see how this MSC can benefit from something like #1767

ludwigbald and others added 2 commits April 25, 2021 15:37
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
@@ -0,0 +1,66 @@
# Add a separate message type for voice messages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having worked with this for a while now, it seems it would in fact be best to go with an extensible events format on top of m.audio. This might block the MSC behind extensible events, but the intention would be to land with an m.voice event type that has content containing m.message, m.audio, and m.voice. As a fallback, implementations would use a regular m.room.message event for msgtype: "m.audio" and "m.voice": {} in the content (org.matrix.msc2516.voice during unstable implementation).

Does this sound sane? If it's too different from what you're comfortable with, let me know and I can open a new MSC to describe it in detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've since split it out: #3245

@turt2live turt2live moved this from Awaiting SCT input to Temp column 001 in Spec Core Team Backlog Jun 8, 2021
@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
@jtrees
Copy link

jtrees commented Aug 23, 2021

What's the status here? Looks like something has already been implemented and released for Element Android. Is there more to be done for the spec?

@tulir
Copy link
Member

tulir commented Aug 23, 2021

@jtrees The Element devs decided that this proposal had too many issues and decided to do something else (#3245)

@turt2live
Copy link
Member

*Matrix devs. The MSC is done from the perspective of the foundation.

@jtrees
Copy link

jtrees commented Aug 24, 2021

@jtrees The Element devs decided that this proposal had too many issues and decided to do something else (#3245)

Ah thanks. Does that make this proposal obsolete then? Can the PR be closed?

@ludwigbald
Copy link
Author

I'm closing this MSC, it is superceded by #3245. I'm confident that the extensible events approach works at least as well as the one I proposed.

I started this one almost a year ago, and soon felt in over my head. I am not involved with matrix development at all, so I didn't really know how to move this one forward. I'm glad that we're doing voice messages right, and I hope this MSC helped more than it delayed things. It sure was an interesting experience for me to dabble in the MSC process, so thanks for sticking with me! :)

@ludwigbald ludwigbald closed this Dec 27, 2021
@turt2live
Copy link
Member

Thank you for the MSC :)

As a first MSC, this one is well on the better side - it definitely helped lead towards an answer of some kind.

@turt2live turt2live added obsolete A proposal which has been overtaken by other proposals and removed proposal-in-review labels Dec 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature MSC for not-core and not-maintenance stuff needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. obsolete A proposal which has been overtaken by other proposals proposal A matrix spec change proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet