Skip to content

MSC3552: Extensible Events - Images and Stickers#3552

Open
turt2live wants to merge 8 commits into
mainfrom
travis/msc/extev/images
Open

MSC3552: Extensible Events - Images and Stickers#3552
turt2live wants to merge 8 commits into
mainfrom
travis/msc/extev/images

Conversation

@turt2live

@turt2live turt2live commented Dec 7, 2021

Copy link
Copy Markdown
Member

@turt2live turt2live changed the title Extensible Events - Images and Stickers MSC2552: Extensible Events - Images and Stickers Dec 7, 2021
@turt2live turt2live added kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal. Process state. proposal-in-review labels Dec 7, 2021
Comment thread proposals/3552-extensible-events-images.md Outdated
@jryans jryans changed the title MSC2552: Extensible Events - Images and Stickers MSC3552: Extensible Events - Images and Stickers Dec 10, 2021
@HarHarLinks

Copy link
Copy Markdown
Contributor

I want to bring to your attention that imo Stickers should also allow video/mp4, video/webm etc as gif replacements. My point is, I'm not sure if Images and Stickers should be lumped together.

@turt2live

Copy link
Copy Markdown
Member Author

@HarHarLinks please use threads on the diff for feedback to be considered.

Accepting videos would be an entirely different MSC, I think.

Comment thread proposals/3552-extensible-events-images.md
Comment thread proposals/3552-extensible-events-images.md
Note that there is no `m.sticker` event in the content: this is because the primary type accurately
describes the event and no further metadata is needed. In future, if the specification requires
sticker-specific metadata to be added (like which pack it came from), this would likely appear under
an `m.sticker` object in the event content.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSC1767 already specifies "m.emote": {} and "m.notice": {} in the event content for extensibility purposes:

In the case an event needs to fallback to m.emote or m.notice, the appropriate type can be included in the event content

Shouldn't that apply here too? Otherwise there's no way to tell clients that a custom event should fall back to a sticker.

@maltee1

This comment was marked as duplicate.

@@ -0,0 +1,144 @@
# MSC3552: Extensible Events - Images and Stickers

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maltee1 says:

What about sending multiple images in a single message with a single caption? I imagine people would want to send an "album" rather than cluttering the timeline with individual messages, if they have lots of related pictures to share. It's also something that other platforms support (I know signal does) and would improve bridging if it's available in matrix.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Events in Matrix are intended to represent a single unit of information - it's already a bit questionable to have a caption on an image in the same event, but there are slightly more pros than cons for doing so (currently). Albums would instead be represented by a series of image events, linked together using relationships of some kind, potentially with an "information event" to store things like the caption.

A relationship-based system would allow for richer support too: adding images/videos after the fact, mixed content, content from other senders, edits, better organization (moving images between albums), etc. Representing the whole thing as one event gets complicated to manage at a technical level.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse my ignorance, but wouldn't splitting up an album in several events lead to the same problems that splitting up caption+imagine into two events would have? For example a bridge not knowing how long to wait for related events to appear before bridging.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the problems are shared, though from looking at it in the past it always felt like albums should be distinct events.

Regardless, albums would be handled by another MSC (this MSC defines an image format and is tightly scoped to that).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I see value in adding caption+image in the same event from an accessibility standpoint (see: alt text in HTML/some social media platforms)

@maltee1

This comment was marked as duplicate.

@@ -0,0 +1,144 @@
# MSC3552: Extensible Events - Images and Stickers

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maltee1 says:

Another question: Does it make sense to offer an image in several resolutions rather than distinguishing between a thumbnail and the full image? I know of at least one platform (WeChat) that offers sending and receiving images in two resolutions, one that should be sufficient for most phone screens and "full size", which is just the original image. I don't know exactly how the protocol works, but I suppose it also uses thumbnails, which brings the number of resolutions to 3. The current proposal allows for several thumbnails, which could be used to represent that, but defining an image as a thumbnail implies that it shouldn't be used to show full-screen, even if the resolution may be sufficient. Instead, we should possibly not distinguish a "thumbnail" but simply offer several resolutions as desired (with the suggestion that one of them is thumbnail-sized) and have clients pick the one they want to show for a given purpose. Depending on available image resolution and expected bandwidth a thumbnail might even be unnecessary but we want to include a high-res version. The current proposal is not designed for that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, a 1080p thumbnail wouldn't be unreasonable to include here. Some clients already rely on "high quality" thumbnails, which would be represented here.

Clients are already asked to find the thumbnail that works for them.


As an aside, please use comments on the diff - comments not on the diff are likely to get ignored.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience, I will reply to the diff from now on.
If clients are expected to go through all available image sizes and pick the one that suits a particular purpose, isn't it just complicating implementations to include all sizes but one in a single list and keep the largest size separate? What's the meaning of "thumbnail" in that case?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly to deliminate "original copy" and "machine-resized" images.

Comment on lines +15 to +17
Using [MSC1767](https://github.com/matrix-org/matrix-doc/pull/1767)'s system, a new event type
is introduced to describe applicable functionality: `m.image`. This event type is simply an image
upload, akin to the now-legacy [`m.image` `msgtype` from `m.room.message`](https://spec.matrix.org/v1.1/client-server-api/#mimage).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, creating a different event type is less "extensible". With this proposal, it is possible to add a text caption to an existing m.image event by replacing (editing) it and adding the caption, but it is not possible to add an image to an existing m.message text event, since you cannot replace the type of an event. This doesn't make any sense. Text messages and image messages should both be candidates for being edited into a text+image message.

I think it would make more sense if the m.message event was more extensible and allowed the inclusion of text, images, or both.

Is there any rationale for making it a different message type?

Comment on lines +25 to +29
"m.text": [
// Format of the fallback is not defined, but should have enough information for a text-only
// client to do something with the image, just like with plain file uploads.
{"body": "matrix.png (12 KB) https://example.org/_matrix/media/v3/download/example.org/abcd1234"}
],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given media is now authenticated, particularly this link no longer makes sense as a workaround to be able to access the image (assuming this were replaced with the new API). clients would still need to understand it to supply an access token etc when rendering it as a link etc to really be useful.
instead, I would expect that even clients that cannot display images (e.g. cli) would implement the image type and generate something useful to "display" the image, and in reality implementations for this in sdks will be commonplace and trivial (to the required extent) even if not.

in reality, at least the intended use of alt text matches the intention of the text fallback much better: next to some alternative way to access the actual image, the image description is the closest thing to getting the actual image content in text form. whether this is realistically how the alt text will be used is another question (the one currently in this case is not a great example, though we also can only imagine the image it is supposedly referencing).

in terms of fallback for the whole image event, it nowadays probably makes the most sense if clients populated it from a combination of filename, alt text, and caption.

"m.text": [{"body": "Look at this cool Matrix logo"}]
},
"m.alt_text": { // optional - accessibility consideration for image
"m.text": [{"body": "matrix logo"}]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"m.text": [{"body": "matrix logo"}]
"m.text": [{"body": "The Matrix logo, consisting of black, bold, sans serif lowercase letters, framed by two thin square brackets, alluding to customary matrix notation in mathematics."}]

blocks, however as per the extensible events system, receivers which understand image events should not
honour them.

To represent stickers, we instead use a mixin on `m.image_details`. A new (optional) boolean field

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mixin terminology here is incorrect or at least confusing given its definition in MSC1767: it has a meaning when added parallel to content blocks in any event,

instead here m.sticker is added inside a single specific content block, i.e. it is simply an optional attribute to that content block.

In don't really understand either design decision:

  • why the sticker property is in image details
  • why the thing marking something as sticker isn't the event type - this seems to generally be the pattern of extensible events

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that would be nice is to be able to send animated stickers as videos, so it might be easier to do with a mixin?

"width": 640,
"height": 480
},
"m.thumbnail": [ // optional

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this array should be able to include blurhashes or similar

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSC2448 you're referencing (i.e. you're assuming it is accepted before extensible events) puts the blurhash in what I guess is equivalent to the image details block. I suppose this is due to being an afterthought and missing multi thumbnail support in the old event schema.

I agree with this proposal: a blurhash is functionally equivalent to a minimum resolution thumbnail.

I don't spontaneously see a solution I am entirely happy with. The best I can come up with is to define the thumbnail array as containing either m.file+m.image blocks or a new m.blurhash block which simply contains the blurhash string. Given the existing discussion about an alternative algorithm it may be preferable if this block was extensible, i.e. define a "thumbnail-as-hash" block (name?) and include a "type": "blurhash" rather than specific blocks.

However we don't have blurhashes (or variants thereof) yet.


With consideration for extensible events, the following content blocks are defined:

* `m.image_details` - Currently records width and height (both required, in pixels), but in

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should include the optional is_animated flag from #4230

in light of this i'm not so convinced width and height should be required; having only the is_animated flag is already useful info.

@turt2live turt2live self-assigned this May 4, 2026
@turt2live turt2live moved this from Backlog to Next in Extensible Events May 4, 2026
},
// ...
],
"m.caption": { // optional - goes above/below image

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this idea of having audio captions. Basically, it would be just an audio content block instead of a text block (and maybe still a text block as a fallback).
I guess this may be too much for one MSC, but does this sound desirable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal. Process state.

Projects

Status: Next
Status: Tracking for review
Status: Scheduled - v1.10

Development

Successfully merging this pull request may close these issues.