MSC3552: Extensible Events - Images and Stickers #3552
I want to bring to your attention that, in my opinion, stickers should also allow video/mp4, video/webm, etc. as GIF replacements. My point is that I'm not sure Images and Stickers should be lumped together.
@HarHarLinks please use threads on the diff for feedback to be considered. Accepting videos would be an entirely different MSC, I think. |
Note that there is no `m.sticker` event in the content: this is because the primary type accurately describes the event and no further metadata is needed. In future, if the specification requires sticker-specific metadata to be added (like which pack it came from), this would likely appear under an `m.sticker` object in the event content.
MSC1767 already specifies "m.emote": {} and "m.notice": {} in the event content for extensibility purposes:
In the case an event needs to fallback to `m.emote` or `m.notice`, the appropriate type can be included in the event content
Shouldn't that apply here too? Otherwise there's no way to tell clients that a custom event should fall back to a sticker.
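A minimal sketch of what this comment proposes, assuming a client that does not recognise a custom event type checks the content for empty marker mixins. `"m.emote"` and `"m.notice"` come from MSC1767 as quoted above; the `"m.sticker": {}` marker is hypothetical and is not defined by this MSC.

```python
from typing import Optional

# MSC1767 marker mixins used for fallback, plus "m.sticker", which is only the
# hypothetical marker proposed in this comment (not defined by this MSC).
FALLBACK_MARKERS = ("m.emote", "m.notice", "m.sticker")


def fallback_kind(content: dict) -> Optional[str]:
    """Return the first marker mixin present in an event's content, or None.

    A marker is an empty object: its presence alone tells a client that does
    not know the event type how to render it.
    """
    for marker in FALLBACK_MARKERS:
        if isinstance(content.get(marker), dict):
            return marker
    return None


# A custom event type that asks to be rendered like a sticker if unrecognised.
custom_content = {
    "m.text": [{"body": "a waving cat"}],
    "m.sticker": {},  # hypothetical marker
}
print(fallback_kind(custom_content))  # m.sticker
```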
@maltee1 says:
What about sending multiple images in a single message with a single caption? I imagine people would want to send an "album" rather than cluttering the timeline with individual messages if they have lots of related pictures to share. It's also something that other platforms support (I know Signal does) and would improve bridging if it's available in Matrix.
Events in Matrix are intended to represent a single unit of information - it's already a bit questionable to have a caption on an image in the same event, but there are slightly more pros than cons for doing so (currently). Albums would instead be represented by a series of image events, linked together using relationships of some kind, potentially with an "information event" to store things like the caption.
A relationship-based system would allow for richer support too: adding images/videos after the fact, mixed content, content from other senders, edits, better organization (moving images between albums), etc. Representing the whole thing as one event gets complicated to manage at a technical level.
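A minimal sketch of the relationship-based approach, assuming a hypothetical `org.example.album` relation type (no album relation is defined by this MSC): each image event points at an "information event" that holds the caption via the standard `m.relates_to` structure, and a client groups images by that root event.

```python
from collections import defaultdict

# Hypothetical relation type: no album relation is defined by this MSC.
ALBUM_REL = "org.example.album"


def group_albums(events: list) -> dict:
    """Map each album root event ID to its related image event IDs, in timeline order."""
    albums = defaultdict(list)
    for event in events:
        relation = event.get("content", {}).get("m.relates_to", {})
        if event.get("type") == "m.image" and relation.get("rel_type") == ALBUM_REL:
            albums[relation["event_id"]].append(event["event_id"])
    return dict(albums)


timeline = [
    # Hypothetical "information event" holding the album caption.
    {"event_id": "$info", "type": "org.example.album_info",
     "content": {"m.caption": {"m.text": [{"body": "Holiday photos"}]}}},
    {"event_id": "$a", "type": "m.image",
     "content": {"m.relates_to": {"rel_type": ALBUM_REL, "event_id": "$info"}}},
    {"event_id": "$b", "type": "m.image",
     "content": {"m.relates_to": {"rel_type": ALBUM_REL, "event_id": "$info"}}},
]
print(group_albums(timeline))  # {'$info': ['$a', '$b']}
```

Adding an image later is just one more related event, which is part of the flexibility described here; the bridging question raised in the next comment (how long to wait for related events) is not addressed by this sketch.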
Excuse my ignorance, but wouldn't splitting up an album into several events lead to the same problems that splitting up caption + image into two events would have? For example, a bridge not knowing how long to wait for related events to appear before bridging.
Some of the problems are shared, though from looking at it in the past it always felt like albums should be distinct events.
Regardless, albums would be handled by another MSC (this MSC defines an image format and is tightly scoped to that).
For what it's worth, I see value in adding caption+image in the same event from an accessibility standpoint (see: alt text in HTML/some social media platforms)
@maltee1 says:
Another question: does it make sense to offer an image in several resolutions rather than distinguishing between a thumbnail and the full image? I know of at least one platform (WeChat) that offers sending and receiving images in two resolutions: one that should be sufficient for most phone screens, and "full size", which is just the original image. I don't know exactly how its protocol works, but I suppose it also uses thumbnails, which brings the number of resolutions to three. The current proposal allows for several thumbnails, which could be used to represent that, but defining an image as a thumbnail implies that it shouldn't be shown full-screen, even if its resolution would be sufficient. Instead, we could avoid distinguishing a "thumbnail" and simply offer as many resolutions as desired (with the suggestion that one of them is thumbnail-sized), letting clients pick the one they want to show for a given purpose. Depending on the available image resolution and expected bandwidth, a thumbnail might even be unnecessary while we still want to include a high-res version. The current proposal is not designed for that.
In short, a 1080p thumbnail wouldn't be unreasonable to include here. Some clients already rely on "high quality" thumbnails, which would be represented here.
Clients are already asked to find the thumbnail that works for them.
As an aside, please use comments on the diff - comments not on the diff are likely to get ignored.
Thanks for your patience, I will reply to the diff from now on.
If clients are expected to go through all available image sizes and pick the one that suits a particular purpose, isn't it just complicating implementations to include all sizes but one in a single list and keep the largest size separate? What's the meaning of "thumbnail" in that case?
It's mostly to delineate "original copy" and "machine-resized" images.
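Clients are expected to pick whichever thumbnail suits them. A minimal sketch of one such policy, assuming each `m.thumbnail` entry carries an `m.file` block and an `m.image_details` block (the exact entry shape is an assumption here): use the smallest machine-resized copy that covers the display area, and fall back to the original upload only when no thumbnail is large enough.

```python
def pick_source(content: dict, want_w: int, want_h: int) -> dict:
    """Choose which m.file block to download for a want_w x want_h display area.

    Picks the smallest thumbnail covering the area in both dimensions, and
    falls back to the original upload when no thumbnail is large enough.
    """
    best = None  # (pixel count, m.file block)
    for thumb in content.get("m.thumbnail", []):
        details = thumb.get("m.image_details", {})
        w, h = details.get("width", 0), details.get("height", 0)
        if w >= want_w and h >= want_h and (best is None or w * h < best[0]):
            best = (w * h, thumb["m.file"])
    return best[1] if best else content["m.file"]


content = {
    "m.file": {"url": "mxc://example.org/original"},
    "m.image_details": {"width": 4000, "height": 3000},
    "m.thumbnail": [
        {"m.file": {"url": "mxc://example.org/small"},
         "m.image_details": {"width": 320, "height": 240}},
        {"m.file": {"url": "mxc://example.org/1080p"},
         "m.image_details": {"width": 1920, "height": 1080}},
    ],
}
print(pick_source(content, 300, 200)["url"])    # mxc://example.org/small
print(pick_source(content, 1280, 720)["url"])   # mxc://example.org/1080p
print(pick_source(content, 3000, 2000)["url"])  # mxc://example.org/original
```

With this policy a 1080p "thumbnail" is simply another candidate, which is how high-quality resized copies fit the existing structure.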
Using [MSC1767](https://github.com/matrix-org/matrix-doc/pull/1767)'s system, a new event type is introduced to describe applicable functionality: `m.image`. This event type is simply an image upload, akin to the now-legacy [`m.image` `msgtype` from `m.room.message`](https://spec.matrix.org/v1.1/client-server-api/#mimage).
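As a rough illustration of the shape of such an event, here is a sketch that assembles `m.image` content from the content blocks discussed in this thread. Assumptions: the fields inside `m.file` (`url`, `name`, `mimetype`, `size`) are taken to follow MSC3551 and should be checked against it, and the fallback text format is only one choice, since the proposal leaves it undefined.

```python
from typing import Optional


def image_event_content(mxc: str, name: str, mimetype: str, size: int,
                        width: int, height: int,
                        caption: Optional[str] = None,
                        alt_text: Optional[str] = None) -> dict:
    """Assemble the content of an m.image event from its content blocks."""
    content = {
        # Text fallback for clients that do not understand m.image.
        "m.text": [{"body": f"{name} ({size // 1024} KB)"}],
        # Field names inside m.file are assumed from MSC3551.
        "m.file": {"url": mxc, "name": name, "mimetype": mimetype, "size": size},
        "m.image_details": {"width": width, "height": height},
    }
    if caption is not None:
        content["m.caption"] = {"m.text": [{"body": caption}]}
    if alt_text is not None:
        content["m.alt_text"] = {"m.text": [{"body": alt_text}]}
    return content


event = {
    "type": "m.image",
    "content": image_event_content(
        "mxc://example.org/abcd1234", "matrix.png", "image/png", 12288, 640, 480,
        caption="Look at this cool Matrix logo", alt_text="matrix logo"),
}
print(event["content"]["m.text"][0]["body"])  # matrix.png (12 KB)
```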
In my view, creating a different event type is less "extensible". With this proposal, it is possible to add a text caption to an existing m.image event by replacing (editing) it and adding the caption, but it is not possible to add an image to an existing m.message text event, since you cannot replace the type of an event. This doesn't make any sense. Text messages and image messages should both be candidates for being edited into a text+image message.
I think it would make more sense if the m.message event was more extensible and allowed the inclusion of text, images, or both.
Is there any rationale for making it a different message type?
"m.text": [
  // Format of the fallback is not defined, but should have enough information for a text-only
  // client to do something with the image, just like with plain file uploads.
  {"body": "matrix.png (12 KB) https://example.org/_matrix/media/v3/download/example.org/abcd1234"}
],
Given that media is now authenticated, this link in particular no longer makes sense as a workaround for accessing the image (even assuming it were replaced with the new API): clients would still need to understand it and supply an access token etc. when rendering it as a link for it to be really useful.
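To make the point about authenticated media concrete: since Matrix v1.11, clients download media through `/_matrix/client/v1/media/download/{serverName}/{mediaId}` and must authenticate the request, so a bare link like the legacy `/_matrix/media/v3/download` one in the fallback is not something a text-only user can simply open. A minimal sketch of the URI mapping, assuming the caller passes the user's own homeserver base URL; sending the access token is left to the HTTP layer.

```python
from urllib.parse import quote


def authenticated_download_url(homeserver: str, mxc: str) -> str:
    """Map an mxc:// URI to the authenticated media download endpoint.

    The request still needs an "Authorization: Bearer <access token>" header,
    so the resulting URL is not a link a text-only user could simply open.
    """
    prefix = "mxc://"
    if not mxc.startswith(prefix):
        raise ValueError(f"not an mxc URI: {mxc!r}")
    server_name, _, media_id = mxc[len(prefix):].partition("/")
    if not server_name or not media_id:
        raise ValueError(f"malformed mxc URI: {mxc!r}")
    return (f"{homeserver.rstrip('/')}/_matrix/client/v1/media/download/"
            f"{server_name}/{quote(media_id, safe='')}")


print(authenticated_download_url("https://example.org", "mxc://example.org/abcd1234"))
# https://example.org/_matrix/client/v1/media/download/example.org/abcd1234
```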
Instead, I would expect that even clients that cannot display images (e.g. CLI clients) would implement the image type and generate something useful to "display" the image; in practice, implementations of this in SDKs will be commonplace and trivial (to the required extent) even if they don't.
In practice, the intended use of alt text at least matches the intention of the text fallback much better: next to some alternative way to access the actual image, the image description is the closest thing to getting the actual image content in text form. Whether this is realistically how alt text will be used is another question (the one in this example is not a great one, though we can only imagine the image it is supposedly describing).
For the fallback of the whole image event, it probably makes the most sense nowadays for clients to populate it from a combination of the filename, alt text, and caption.
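A minimal sketch of that suggestion: build the event-level text fallback from the file name, alt text and caption. The output format is an arbitrary choice, since the proposal leaves the fallback format undefined, and the `name` field in `m.file` is assumed from MSC3551.

```python
from typing import Optional


def plain_text(block: Optional[dict]) -> Optional[str]:
    """Return the text/plain body from a block's m.text array, if any."""
    for rep in (block or {}).get("m.text", []):
        # MSC1767: a representation without a mimetype is text/plain.
        if rep.get("mimetype", "text/plain") == "text/plain":
            return rep.get("body")
    return None


def compose_fallback(content: dict) -> str:
    """Build a text fallback from an image event's file name, alt text and caption."""
    parts = []
    name = content.get("m.file", {}).get("name")
    if name:
        parts.append(name)
    alt = plain_text(content.get("m.alt_text"))
    if alt:
        parts.append(f"[image: {alt}]")
    caption = plain_text(content.get("m.caption"))
    if caption:
        parts.append(caption)
    return " ".join(parts) if parts else "[image]"


content = {
    "m.file": {"name": "matrix.png"},
    "m.alt_text": {"m.text": [{"body": "matrix logo"}]},
    "m.caption": {"m.text": [{"body": "Look at this cool Matrix logo"}]},
}
print(compose_fallback(content))
# matrix.png [image: matrix logo] Look at this cool Matrix logo
```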
"m.text": [{"body": "Look at this cool Matrix logo"}]
},
"m.alt_text": { // optional - accessibility consideration for image
"m.text": [{"body": "matrix logo"}]
Suggested change:
- "m.text": [{"body": "matrix logo"}]
+ "m.text": [{"body": "The Matrix logo, consisting of black, bold, sans serif lowercase letters, framed by two thin square brackets, alluding to customary matrix notation in mathematics."}]
blocks; however, as per the extensible events system, receivers which understand image events should not honour them.
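A sketch of that receiver-side rule, with "rendering" reduced to returning a string. The `supported_types` set is a hypothetical stand-in for whatever capability model a client actually uses.

```python
def render(event: dict, supported_types: set) -> str:
    """Describe what a client would show for an event.

    A receiver that understands m.image uses the image blocks and ignores the
    m.text fallback; any other receiver shows the plain-text fallback.
    """
    content = event.get("content", {})
    if event.get("type") == "m.image" and "m.image" in supported_types:
        details = content.get("m.image_details", {})
        return (f"<image {content['m.file']['url']} "
                f"{details.get('width')}x{details.get('height')}>")
    for rep in content.get("m.text", []):
        if rep.get("mimetype", "text/plain") == "text/plain":
            return rep["body"]
    return "[unsupported event]"


event = {
    "type": "m.image",
    "content": {
        "m.text": [{"body": "matrix.png (12 KB)"}],
        "m.file": {"url": "mxc://example.org/abcd1234"},
        "m.image_details": {"width": 640, "height": 480},
    },
}
print(render(event, {"m.image"}))  # <image mxc://example.org/abcd1234 640x480>
print(render(event, set()))        # matrix.png (12 KB)
```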
To represent stickers, we instead use a mixin on `m.image_details`. A new (optional) boolean field
I think the mixin terminology here is incorrect, or at least confusing, given its definition in MSC1767: there, a mixin has a meaning when added in parallel to the content blocks of any event. Here, instead, `m.sticker` is added inside a single specific content block, i.e. it is simply an optional attribute of that content block.
I don't really understand either design decision:
- why the sticker property is in image details
- why the thing marking something as a sticker isn't the event type; this generally seems to be the pattern of extensible events
One thing that would be nice is to be able to send animated stickers as videos, so it might be easier to do with a mixin?
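A sketch of how a client might accept both designs discussed in this thread. The quoted proposal text is cut off before naming the boolean field; this sketch assumes it is `m.sticker`, taking the name from the review comment, and treats a dedicated `m.sticker` event type (as in the earlier quoted text) as the alternative.

```python
def is_sticker(event: dict) -> bool:
    """Return True if an event should be displayed as a sticker.

    Accepts both designs discussed in this thread: a dedicated m.sticker event
    type, and an optional boolean flag inside m.image_details whose name
    ("m.sticker") is assumed from the review comment.
    """
    if event.get("type") == "m.sticker":
        return True
    details = event.get("content", {}).get("m.image_details", {})
    return details.get("m.sticker") is True


flagged = {"type": "m.image",
           "content": {"m.image_details": {"width": 128, "height": 128, "m.sticker": True}}}
plain = {"type": "m.image",
         "content": {"m.image_details": {"width": 640, "height": 480}}}
print(is_sticker(flagged), is_sticker(plain))  # True False
```

With the flag living inside `m.image_details`, animated stickers sent as videos would need the same flag on whatever block describes video, which is where the question of a true mixin comes back in.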
"width": 640,
"height": 480
},
"m.thumbnail": [ // optional
I believe this array should be able to include blurhashes or similar
The MSC2448 you're referencing (i.e. you're assuming it is accepted before extensible events) puts the blurhash in what I guess is the equivalent of the image details block. I suppose this is because it was an afterthought, and because the old event schema lacked multi-thumbnail support.
I agree with this proposal: a blurhash is functionally equivalent to a minimum resolution thumbnail.
I don't spontaneously see a solution I am entirely happy with. The best I can come up with is to define the thumbnail array as containing either m.file+m.image blocks or a new m.blurhash block which simply contains the blurhash string. Given the existing discussion about an alternative algorithm it may be preferable if this block was extensible, i.e. define a "thumbnail-as-hash" block (name?) and include a "type": "blurhash" rather than specific blocks.
However we don't have blurhashes (or variants thereof) yet.
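A sketch of the "thumbnail-as-hash" idea from the comment above, under loudly hypothetical names: the block key `org.example.thumbnail_hash` and its `type`/`hash` fields are illustrative only, and nothing like them is defined by this MSC. The example hash string is the sample commonly shown in BlurHash documentation.

```python
# Hypothetical block name for a hash-based placeholder thumbnail, following the
# "thumbnail-as-hash" idea in the comment; this MSC defines nothing like it.
HASH_BLOCK = "org.example.thumbnail_hash"


def split_thumbnails(thumbnails: list) -> tuple:
    """Separate downloadable thumbnails from hash-only placeholders."""
    files, hashes = [], []
    for entry in thumbnails:
        if HASH_BLOCK in entry:
            hashes.append(entry[HASH_BLOCK])
        elif "m.file" in entry:
            files.append(entry)
        # Entries with neither block are skipped.
    return files, hashes


def pick_placeholder(thumbnails: list, known=("blurhash",)):
    """Return the first hash placeholder whose algorithm this client supports."""
    _, hashes = split_thumbnails(thumbnails)
    for h in hashes:
        if h.get("type") in known:
            return h
    return None


thumbnails = [
    {HASH_BLOCK: {"type": "some-future-hash", "hash": "xyz"}},
    {HASH_BLOCK: {"type": "blurhash", "hash": "LEHV6nWB2yk8pyo0adR*.7kCMdnj"}},
    {"m.file": {"url": "mxc://example.org/thumb"},
     "m.image_details": {"width": 320, "height": 240}},
]
files, hashes = split_thumbnails(thumbnails)
print(len(files), len(hashes))               # 1 2
print(pick_placeholder(thumbnails)["type"])  # blurhash
```

Keeping the algorithm in a `type` field rather than in the block name means a new hash algorithm only needs a new `type` value; clients skip entries whose type they do not know, as with the first entry here.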
With consideration for extensible events, the following content blocks are defined:
* `m.image_details` - Currently records width and height (both required, in pixels), but in
This should include the optional `is_animated` flag from #4230.
In light of this, I'm not so convinced width and height should be required; having only the `is_animated` flag is already useful information.
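A small validator sketch for `m.image_details` that covers both positions: `require_dimensions=True` follows the proposal text (width and height required, in pixels), while `False` models the reviewer's suggestion. The `is_animated` key comes from the reference to #4230; its placement inside `m.image_details` is an assumption.

```python
def validate_image_details(details: dict, require_dimensions: bool = True) -> list:
    """Return a list of problems in an m.image_details block (empty if none)."""
    problems = []
    for key in ("width", "height"):
        value = details.get(key)
        if value is None:
            if require_dimensions:
                problems.append(f"missing {key}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        elif isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer (pixels)")
    animated = details.get("is_animated")
    if animated is not None and not isinstance(animated, bool):
        problems.append("is_animated must be a boolean")
    return problems


print(validate_image_details({"is_animated": True}))
# ['missing width', 'missing height']
print(validate_image_details({"is_animated": True}, require_dimensions=False))
# []
```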
},
// ...
],
"m.caption": { // optional - goes above/below image
I had this idea of having audio captions. Basically, it would be just an audio content block instead of a text block (and maybe still a text block as a fallback).
I guess this may be too much for one MSC, but does this sound desirable?
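A sketch of how audio captions could degrade gracefully, assuming a hypothetical `org.example.audio_caption` block placed inside `m.caption` next to the usual `m.text` fallback (neither the key nor its fields are defined by this MSC): clients that support audio play it, everyone else shows the text.

```python
from typing import Optional, Tuple

# Hypothetical key for an audio block inside m.caption, as floated in the
# comment; no such caption format is defined by this MSC.
AUDIO_BLOCK = "org.example.audio_caption"


def pick_caption(content: dict, supports_audio: bool) -> Optional[Tuple[str, object]]:
    """Return ("audio", block) or ("text", body) for an event's caption, or None."""
    caption = content.get("m.caption")
    if not caption:
        return None
    if supports_audio and AUDIO_BLOCK in caption:
        return ("audio", caption[AUDIO_BLOCK])
    # Text fallback, so clients without audio support still show something.
    for rep in caption.get("m.text", []):
        if rep.get("mimetype", "text/plain") == "text/plain":
            return ("text", rep["body"])
    return None


content = {
    "m.caption": {
        AUDIO_BLOCK: {"url": "mxc://example.org/voice"},
        "m.text": [{"body": "Look at this cool Matrix logo"}],
    }
}
print(pick_caption(content, supports_audio=True)[0])  # audio
print(pick_caption(content, supports_audio=False))    # ('text', 'Look at this cool Matrix logo')
```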
Blocked by #1767
Blocked by #3551
Preview: https://pr3552--matrix-org-previews.netlify.app