MSC2427: Proposal for JSON-based message formatting #2427

Open
wants to merge 7 commits into
base: old_master

Conversation

Member

@tulir tulir commented Jan 24, 2020

tulir and others added 4 commits January 24, 2020 21:34
Co-authored-by: Pascal Abresch <nep@packageloss.eu>
Signed-off-by: Tulir Asokan <tulir@maunium.net>
Signed-off-by: Pascal Abresch <nep@packageloss.eu>
Signed-off-by: Tulir Asokan <tulir@maunium.net>
Signed-off-by: Tulir Asokan <tulir@maunium.net>
Signed-off-by: Tulir Asokan <tulir@maunium.net>
@turt2live turt2live added proposal A matrix spec change proposal proposal-in-review labels Jan 24, 2020
@dali99
dali99 commented Jan 25, 2020

This is very similar to how Minecraft does its chat https://wiki.vg/Chat

Though I'm not sure why this makes tables and lists impossible. Wouldn't something like

{
  "m.formatted.version": "0.1",
  "m.formatted": [
    {"m.text": "Hello, here's a table:\n"},
    {"m.table": [
      [
        [{"m.text": "Row 0 "}, {"m.text": "Column 0"}],
        [{"m.text": "Row 0 Column 1"}]
      ],
      [
        [{"m.text": "Row 1 Column 0"}],
        [{"m.text": "Row 1 Column 1"}]
      ]
    ]},
    {"m.text": "And This is a list:\n"},
    {"m.list": [
      [{"m.text": "List Item 1"}],
      [{"m.text": "List Item 2"}],
      [{"m.width": 128, "m.height": 64, "m.alt_text": "Could even be an image", "m.image": "mxc://example.org/ABCDEF"}]
    ]}
  ]
}

Be fine?

Of course this would require non-string primary types and would complicate the renderer quite a bit. (and I guess it kinda looks like something extensible-eventsy at this point)
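
To illustrate how much this would complicate a renderer, here is a minimal Python sketch that turns such nested chunks into plain text. The `m.table` and `m.list` shapes are taken from the example above and are not part of the proposal; the function name `render_chunks` and the plain-text layout are assumptions made for illustration.

```python
def render_chunks(chunks):
    """Render a list of chunks to plain text, recursing into tables and lists."""
    out = []
    for chunk in chunks:
        if "m.text" in chunk:
            out.append(chunk["m.text"])
        elif "m.image" in chunk:
            # Fall back to the alt text (or the URI) for inline images.
            out.append(chunk.get("m.alt_text", chunk["m.image"]))
        elif "m.list" in chunk:
            # Each list item is itself a list of chunks.
            for item in chunk["m.list"]:
                out.append("• " + render_chunks(item) + "\n")
        elif "m.table" in chunk:
            # A table is a list of rows; each row is a list of cells.
            for row in chunk["m.table"]:
                cells = (render_chunks(cell) for cell in row)
                out.append(" | ".join(cells) + "\n")
        # Chunks without a recognised primary field are skipped.
    return "".join(out)
```

The recursion is the part a flat format avoids: every container type needs its own branch and its own layout rules.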

HTML can. In practice, this means:
* No tables. This is not really relevant for instant messaging: they're hard
to render, only riot web supports them anyway, and even that doesn't
support sending them after the switch to CommonMark.
Member Author

@tulir tulir Jan 25, 2020

@dali99 Travis prefers discussion in threads, so here's a thread.

It's actually true that this doesn't prevent tables completely, it's just not supported by this proposal and would make things more complicated. It could be added in a future major version (added as an example to the versioning section).

Signed-off-by: Tulir Asokan <tulir@maunium.net>
support sending them after the switch to CommonMark.
* No lists. Lists can be done with plain text. It's also better if users
don't get surprised by `6.` turning into `1.` because it's an ordered list,
although that's more of a markdown input problem.
Member

By "can be done with plain text", does this also include

  • bulleted
  • lists?

such that the client will insert into an m.text? I ask as I make use of bulleted lists quite a bit.

Member Author

Reply from @nep:pink.packageloss.eu: Sure, as long as the client just implements it.

There is nothing standing in the way of using markdown input or something else and just translating it to

"m.formatted": [
  { "m.text": "• Firstly\n• Secondly\n• Thirdly"}
]

Contributor

Such formatting would have wrapping problems if list items are very long, so I would still prefer a dedicated list type. A client that doesn't know a piece of plain text is a list may wrap it like this, for example:

• A very long text that gets wrapped at some point, because
it is too long to fit in one line.

This can be harder to read, especially in long lists whose items span 3 lines or more. The following formatting, which is only possible when the client is aware of the list semantics, is more readable imo:

• A very long text that gets wrapped at some point, because
  it is too long to fit in one line.

This is even more important for ordered lists:

1. Item one
20129. A very long text that gets wrapped at some point, because
it is too long to fit in one line.

vs

    1. Item one
20129. A very long text that gets wrapped at some point, because
       it is too long to fit in one line.
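
The hanging indent shown above is easy to produce once the client knows where each item starts. A minimal sketch using Python's standard `textwrap` module; the function names and the default width are assumptions for illustration:

```python
import textwrap


def wrap_list_item(marker, text, width=60):
    """Wrap one list item so continuation lines align under the item text."""
    return textwrap.fill(
        text,
        width=width,
        initial_indent=marker,
        subsequent_indent=" " * len(marker),
    )


def wrap_ordered_list(items, start=1, width=60):
    """Right-align the numbers so every item's text starts in the same column."""
    last = start + len(items) - 1
    pad = len(str(last))
    lines = []
    for number, text in enumerate(items, start):
        marker = f"{number:>{pad}}. "
        lines.append(wrap_list_item(marker, text, width))
    return "\n".join(lines)
```

With a plain `m.text` chunk the client only sees a string with newlines in it, so it has no marker width to indent by.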

Member

I agree, lists should definitely be representable. As currently written, this seems like too drastic of a proposal for something that is a net loss of functionality (in terms of representable semantics) and further increases bandwidth costs.

Copy link

I would also agree that it makes sense to have a dedicated type for lists so that clients can decide how to render them in a way that makes sense. I worry that without a standard way of representing lists, client support could become fragmented.

Copy link

Following down that rabbit hole you are bound to reinvent html in json... ☺️

exactly one primary field.

* `m.text` (string) - A normal text chunk.
* `m.image` (string) - An inline image. The value must be a `mxc://` URI to an
Contributor

Suggestion: have a type field which can be one of (text, image, usermention, channelmention, groupmention, paragraph?, list, code(pre), ...)

And then the block-like elements can have children, so something like

{
  "m.formatted": [
    {"type": "m.text", "m.text": "Hey there "},
    {"type": "m.usermention", "m.user": "@alice:example.org"},
    {"type": "m.text", "m.text": "! Have you seen my Todo list?"},
    {"type": "m.list", "m.list": "ordered", "m.start": 1, "children": [
      {"type": "m.text", "m.text": "Shopping!"},
      {"type": "m.text", "m.text": "relaxing"}
    ]}
  ]
}

Contributor

could also nest lists etc. then
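
A rough Python sketch of how a client might walk such typed nodes, including nested `children`. The field names come from the suggestion above; the rendering rules (plain-text markers, a newline before each list) and the function names are assumptions, not anything the proposal defines.

```python
def render_node(node):
    """Render one typed node to plain text, recursing into children."""
    kind = node.get("type")
    if kind == "m.text":
        return node.get("m.text", "")
    if kind == "m.usermention":
        return node.get("m.user", "")
    if kind == "m.list":
        ordered = node.get("m.list") == "ordered"
        start = node.get("m.start", 1)
        lines = []
        for offset, child in enumerate(node.get("children", [])):
            # Honour m.start so a list beginning at 6 keeps its numbering.
            marker = f"{start + offset}. " if ordered else "• "
            lines.append(marker + render_node(child))
        return "\n" + "\n".join(lines)
    # Unknown types: render any children so their text is not lost.
    return "".join(render_node(child) for child in node.get("children", []))


def render_message(content):
    """Render the whole m.formatted array of an event content dict."""
    return "".join(render_node(node) for node in content.get("m.formatted", []))
```

Because a list's children are ordinary nodes, a child with `"type": "m.list"` nests without any extra code.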

* `m.code` (string) - Monospace text. The value is the language.
* `m.link` (string) - Hyperlink. The value is the URL. As in the current HTML
subset, clients should only allow the schemes `http`, `https`, `ftp`,
`mailto`, `gopher`, `magnet` or platform specific known schemes.
Contributor

add mxc if/when #2398 goes through?
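
For reference, the scheme allowlist quoted above could be checked along these lines. This is a sketch using Python's standard `urllib.parse`; the `extra_schemes` parameter is a hypothetical hook for "platform specific known schemes" or a later `mxc` addition, not something the proposal specifies.

```python
from urllib.parse import urlsplit

# Schemes listed in the quoted m.link description.
ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto", "gopher", "magnet"}


def is_allowed_link(url, extra_schemes=()):
    """Return True if the URL's scheme is on the allowlist."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in ALLOWED_SCHEMES or scheme in extra_schemes
```

A client would call `is_allowed_link(chunk["m.link"])` before making the chunk clickable and render it as plain text otherwise.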

to render, only riot web supports them anyway, and even that doesn't
support sending them after the switch to CommonMark.
* No lists. Lists can be done with plain text. It's also better if users
don't get surprised by `6.` turning into `1.` because it's an ordered list,
Contributor

That is actually a client problem, not a spec problem. riot-web, for example, renders this correctly with 6.

Member Author

Yeah, the next line does say it's a client markdown problem

Contributor

although that's more of a markdown input problem.

That is incorrect, though: the markdown conversion does emit the start attribute on the ol; the clients just ignore it when rendering (apart from riot-web).

although that's more of a markdown input problem.
* More bytes. However, switching Matrix to a more efficient binary format would
make it have less bytes. Especially a partially structured binary format
could be significantly more efficient.
Contributor

It might be worth introducing a filter that strips the format, then, so that clients on limited devices don't get that huge chunk of data.

Instead of having the text within the formatting entities, the formatting
entities could use indexes to refer to a separate plain text string. However,
this would require specifying the exact encoding used for indexing, and was
therefore rejected.
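
The encoding concern is easy to demonstrate: the same position in a string has a different index depending on whether offsets count code points, UTF-8 bytes, or UTF-16 code units. A small Python sketch; the function name is made up for illustration.

```python
def offsets(text, target):
    """Return the start of `target` as (code points, UTF-8 bytes, UTF-16 units)."""
    codepoint_index = text.index(target)
    prefix = text[:codepoint_index]
    utf8_index = len(prefix.encode("utf-8"))
    # Each UTF-16 code unit is two bytes; astral characters take two units.
    utf16_index = len(prefix.encode("utf-16-le")) // 2
    return codepoint_index, utf8_index, utf16_index


# "héllo 😀 world": one accented letter and one emoji before the target.
sample = "h\u00e9llo \U0001F600 world"
print(offsets(sample, "world"))
```

Any index-based format would have to pick one of these and every client would have to convert from whatever its own string type uses.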
Contributor

How should clients implement this? Render their message to this thing, html subset AND plaintext fallback? That is already three things to render messages into, then

@turt2live turt2live self-requested a review February 9, 2020 17:25
Member

@turt2live turt2live left a comment

I haven't read this in detail, but I like the idea of it. Might conflict slightly with the human representation of events MSC though.

@KB1RD
KB1RD commented Mar 9, 2020

This also looks similar to https://github.com/portabletext/portabletext, which might be nice since there are already a few libraries built around it. Its spec would have to be expanded for things like mentions, though.

Member

@KitsuneRal KitsuneRal left a comment

Honestly, my initial reaction to this was "this ain't going anywhere" but soon after I started reading I found quite a lot of sense in the proposal. Looking forward to some solution for nested structures (I'm afraid it's too limiting without those, hence no immediate approval for me). I really like the idea.

### Disadvantages compared to current Matrix HTML
* No truly nested formatting. While a chunk can have multiple styles like
`m.bold` and `m.italic` simultaneously, it can't have child elements like
HTML can. In practice, this means:
Member

As much as I liked the idea I think that lack of nesting is (so far) a deal-breaker. I can imagine cases like highlighting or otherwise marking up a fragment in the code (that's not about syntax highlighting, by the way - that can be done entirely client-side using some sort of m.language hint); collapsing parts of the code into spoilers; quoting a quote (ok, this can be easily covered by some m.level thing) or another already formatted (e.g. with superscripts) text.

With that said, I'd be happy to see some syntax for nesting; I really like where this MSC is leading. Sorunome proposed syntax for lists that I find promising - maybe ride on that idea?

Copy link

To re-iterate what I said in #matrix-spec:matrix.org:
I think that nesting is the way to go since it solves the problem of things like blockquotes that may have formatting. In the blockquote case, there's no way to add styles to text within a blockquote without merging consecutive blockquotes, which is undesirable.

The solution that I believe would be best would be to have a dedicated nesting field. This way, clients that don't support certain nested types (such as lists) can still render the text inside and clients that do will show the list with fancier rendering.

For example, let's say a PR was merged that is like this one, but implements a m.nested field for nesting and does not have a list type. Let's say another PR is merged that implements lists using m.nested like so:

{
  "m.list": { "type": "ordered" },
  "m.nested": [
    { "m.text": "Hello World!" },
    { "m.text": "This is a list!" }
  ]
}

Clients that do not support the list would render the text inside m.nested as if these nodes were in the root list, so users on older clients would still be able to follow the conversation. Newer list-able clients would see the m.list and use the nodes in m.nested as list elements. The same could be true for blockquotes. Of course, whole list functionality may be best left for another PR, but I still think we shouldn't kick the can down the road so that implementing lists makes it impossible for older clients to see parts of newer messages.
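
A minimal Python sketch of that fallback behaviour, assuming the hypothetical `m.nested` and `m.list` fields from the example above. The `supports_lists` flag stands in for "older" versus "newer" clients and is purely illustrative.

```python
def render(nodes, supports_lists=True):
    """Render nodes to plain text, flattening m.nested for unsupported blocks."""
    out = []
    for node in nodes:
        if "m.text" in node:
            out.append(node["m.text"])
        nested = node.get("m.nested", [])
        if "m.list" in node and supports_lists:
            ordered = node["m.list"].get("type") == "ordered"
            for number, item in enumerate(nested, 1):
                marker = f"{number}. " if ordered else "• "
                out.append(marker + render([item], supports_lists) + "\n")
        elif nested:
            # Unknown or unsupported block: treat children as root-level nodes.
            out.append(render(nested, supports_lists))
    return "".join(out)
```

Note that in the flattened case the two items run together with no separator, so a sender would probably still need to put newlines into the item text for the fallback to read well.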

Member

As we discuss something that doesn't exist in any kind of implementation (or did I miss something) I don't think we have "older" vs. "newer" clients... just a nitpick.
I think that's a reasonable direction, would like to see the MSC author's opinion (and if it's positive, the MSC updated respectively).

Copy link

@KitsuneRal the "old" and "new" client thing was in a hypothetical case of more stuff being added to the spec in the future ;) My only point was that there should be text fallbacks for things that a client may not be able to understand.

From my conversations, it would seem that the author(s?) want to keep everything flat. I pointed out that it would be impossible to distinguish between two blockquotes consecutively and one big blockquote. It was suggested that the blockquotes could be given some kind of start option to indicate that a new blockquote could be created.

I am still in favor of nesting and I think that we should distinguish between blocks and marks. Blocks would have a single type (ex, text, quote, list, link) and could have child chunks. Marks would just be properties appended to a block.

##### Secondary fields (attributes)
Secondary fields are attributes that change the way the chunk is rendered. The
number of secondary fields is not restricted, but some fields have other
conditions. Unknown secondary field types should be ignored.
Member

What is a general rule when two incompatible fields clash (e.g., m.room and m.user)?
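
The text quoted in this thread does not say what should happen on such a clash. As one possible policy, a client could detect the conflict and report it rather than guess. The Python sketch below is only an illustration: `PRIMARY_FIELDS` lists just the primary fields visible in the quoted text, and treating `m.user` and `m.room` as mutually exclusive is an assumption taken from the question above.

```python
PRIMARY_FIELDS = {"m.text", "m.image"}      # only those quoted in this thread
EXCLUSIVE_GROUPS = [{"m.user", "m.room"}]   # assumed to be mutually exclusive


def check_chunk(chunk):
    """Return a list of problems found in one chunk (empty if none)."""
    problems = []
    primaries = PRIMARY_FIELDS & chunk.keys()
    if len(primaries) != 1:
        problems.append(
            f"expected exactly one primary field, found {len(primaries)}"
        )
    for group in EXCLUSIVE_GROUPS:
        clash = group & chunk.keys()
        if len(clash) > 1:
            problems.append("conflicting fields: " + ", ".join(sorted(clash)))
    # Unknown secondary fields are ignored, as the proposal text asks.
    return problems
```

What the renderer does with a reported clash (drop both attributes, keep the first, reject the chunk) is exactly the rule that would need to be specified.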

* No truly nested formatting. While a chunk can have multiple styles like
`m.bold` and `m.italic` simultaneously, it can't have child elements like
HTML can. In practice, this means:
* No tables. This is not really relevant for instant messaging: they're hard
Member

I wouldn't be so quick to judge tables irrelevant for instant messaging. I think the only reason they're not used in this context as much is that the current implementations are clunky. Tables are really useful when sharing dense information inline. Personally, I'd rather leave the option to support tables open.

of a message, i.e. rich quotes. A future major version could include support
for proper nesting to make tables and lists possible.

### Disadvantages compared to current Matrix HTML
Member

What about old messages? Given Matrix's emphasis on persistence of messages, doesn't that mean that clients will still need to support both Matrix HTML and this new JSON-based formatting so that they are still able to display old messages?


While a custom formatting system means new renderers will need to be made in
all environments, it is easier to make a custom renderer for text formatted
using a well-defined JSON system, than it is to parse HTML and try to handle
Member

I can see the allure of being able to simply pass the JSON input into a ready-made, ubiquitous JSON library, but wouldn't the JSON still need to be validated to ensure it has the correct structure and valid fields? But at that point, it doesn't seem much easier than parsing HTML (for which libraries are also ubiquitous) and validating that it has the valid structure of Matrix's HTML dialect. This is the approach weechat-matrix takes, for instance.

If lists get thrown out, some of this validation effort becomes moot (but not all), though I don't think this is a fair comparison because it results in a loss of very useful functionality.

Signed-off-by: Pascal Abresch <nep@packageloss.eu>
Signed-off-by: Pascal Abresch <nep@packageloss.eu>
@turt2live turt2live added the kind:feature MSC for not-core and not-maintenance stuff label Apr 20, 2020
@@ -0,0 +1,225 @@

Contributor

@Sorunome Sorunome Jan 2, 2021

While this proposal initially sounds like a good idea, the more soru thinks about it, the less it seems like a good idea after all. This is to outline her thoughts on it.

Original Goals

The original goals were to simplify composing rich messages, rendering rich messages, and bridging rich messages into and out of matrix. To do so, a new json-based format is proposed here, which basically acts like an AST, describing which parts of the message are what (bold, list, table, spoiler, etc.). It also lists custom attributes, such as data-mx-color, data-mx-spoiler or, in the future, data-mx-maths, as an issue.

Problems with the original goals

HTML, and, to an extent, XML, is already widely used. As such, in most languages, there are already parsers to parse the HTML/XML into an AST, making the current HTML representation more compact than the proposed json one here. Since HTML can be parsed into an AST, and is thus somewhat equivalent to the proposed json here, we have to compare them as such.

Rendering messages

For easy rendering, many frameworks (Qt, Flutter, etc.) already have basic HTML parsers, to easily get some rich message support into rendering messages. Of course they don't do the custom attributes, and are thus imperfect, but some are also easily extensible / easy enough to fork. If all fails, you can just parse the HTML into an AST and then write your own renderer.

With the proposed json format here, client developers are basically forced to write their own renderer, as this is something completely new. So, sticking with HTML adds the benefit of HTML being a standard for which tooling already exists. If the existing tooling doesn't satisfy your needs, you can still do your custom rendering just as easily as with this json format.

Sending messages

Here the same basically applies as to rendering messages: There are already markdown->html parsers in many languages, or wysiwyg editors that output html. None of these, however, output a custom json format as proposed here. Again, client developers are forced to write their own implementations.

Bridging out of matrix

While on first glance it might seem that this custom json format at least makes bridging easier, this isn't really the case, once you realize that you can just as easily parse HTML into an AST and then propagate the AST, just like you'd propagate the json structure.

Speaking from personal experience, soru has written two matrix HTML --> remote parsers already, discord and slack. For both it renders the HTML into their respective markdown flavours; with slack it also renders the HTML into slack's own json format (called "blocks"). There was never a point where she thought this was needlessly complicated: she just took an existing library, parsed the HTML into an AST and propagated the AST. The only difference with this json approach here would be to leave out the step of parsing into an AST.
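
To give a feel for the amount of code involved, here is a rough Python sketch of an HTML to markdown conversion using only the standard library. It is not soru's bridge code: it uses `html.parser`'s event callbacks instead of building a full AST, handles only a few inline tags, and the marker table is an assumption about the target markdown flavour.

```python
from html.parser import HTMLParser

# Inline tags mapped to the markdown marker that opens and closes them.
MARKERS = {"b": "**", "strong": "**", "i": "*", "em": "*", "code": "`", "del": "~~"}


class MarkdownWriter(HTMLParser):
    """Collect markdown output while the parser walks the HTML."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(MARKERS.get(tag, ""))  # unknown tags add nothing

    def handle_endtag(self, tag):
        self.parts.append(MARKERS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    writer = MarkdownWriter()
    writer.feed(html)
    writer.close()  # flush any buffered text
    return "".join(writer.parts)
```

A real bridge would also need block elements, escaping, and the `data-mx-*` attributes, but the overall shape stays the same.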

Bridging into matrix

HTML is just as easy to build up from an existing AST (from the external source) as the proposed json structure here would be. The added benefit with using HTML here is that, as HTML is a standard, there might already be libraries to parse the protocol you are trying to bridge into matrix to HTML.

Additional problems

If this MSC is approved, realistically clients and bridges would have to maintain two fallbacks for some years in addition to the "original" json message: The formatted_body for clients that don't do json rendering yet, and the body fallback we are used to.

Additionally, HTML is a very compact format, bytesize-wise. The json blobs described here would take up way more space, bytewise. This worsens the main issue soru already found when developing embedded clients: Bytesize of the initial sync. #2755 would fix this issue, however we don't have to worsen a problem while a fix for it isn't present yet, if there is little to no gain.
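
The size difference can be measured for any given message with a few lines of Python. This is a sketch only: the chunk layout follows the examples earlier in this thread, and `"m.bold": true` as the way to mark bold text is an assumption.

```python
import json


def encoded_sizes(html, chunks):
    """Return (HTML bytes, compact JSON bytes) for the same message."""
    html_size = len(html.encode("utf-8"))
    compact = json.dumps(chunks, separators=(",", ":"), ensure_ascii=False)
    return html_size, len(compact.encode("utf-8"))


html = "<b>Hello</b> world"
chunks = [{"m.text": "Hello", "m.bold": True}, {"m.text": " world"}]
print(encoded_sizes(html, chunks))
```

Even with whitespace stripped from the JSON, every chunk repeats its field names, which is where most of the overhead comes from.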

Conclusions

After further thinking about this proposal, to soru the tradeoffs don't seem worth it. The core issue is that this JSON is a custom format, so no libraries exist to build it, parse it, or render it. Client authors will have to write everything from scratch. Basically there is no need to re-invent the wheel here.

That you can just parse HTML into an AST, which is just as easily propagated as the proposed json here, doesn't help in this regard, as it just shows how flexible HTML is.

While HTML may not be the best decision for formatted messages, it sure seems like a reasonable decision that works well.

Member

@dkasak dkasak Jan 2, 2021

I agree with this. The new JSON format is suboptimal because it is both a backward incompatible change (thus requiring all clients to implement custom parsers) and is also completely new (thus suffering from the N+1 standards problem).

If a new format was to be introduced, it shouldn't be invented since there is already robust and flexible prior art: the Pandoc AST. The Pandoc AST is already known to work well as a bridge between a myriad of formats and supports everything that Matrix needs and more. For instance, there is native support for math blocks, ordered and unordered lists, code blocks, custom metadata (for e.g. spoilers), etc.

To give an example, consider the following Markdown snippet:

*Foo* is bar.

```python
bar
```

***

Some math: $x = 2$

1. *a*
2. **b**
3. ***c***

This is converted by Pandoc into the following AST:

[Para [Emph [Str "Foo"],Space,Str "is",Space,Str "bar."]
,CodeBlock ("",["python"],[]) "bar"
,HorizontalRule
,Para [Str "Some",Space,Str "math:",Space,Math InlineMath "x = 2"]
,OrderedList (1,Decimal,Period)
 [[Plain [Emph [Str "a"]]]
 ,[Plain [Strong [Str "b"]]]
 ,[Plain [Strong [Emph [Str "c"]]]]]]

Additionally, pandoc has a JSON writer which converts this AST into a canonical JSON representation which is similar to what this MSC suggests but is complete with respect to Matrix's feature set:

{
  "pandoc-api-version": [
    1,
    22
  ],
  "meta": {},
  "blocks": [
    {
      "t": "Para",
      "c": [
        {
          "t": "Emph",
          "c": [
            {
              "t": "Str",
              "c": "Foo"
            }
          ]
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "is"
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "bar."
        }
      ]
    },
    {
      "t": "CodeBlock",
      "c": [
        [
          "",
          [
            "python"
          ],
          []
        ],
        "bar"
      ]
    },
    {
      "t": "HorizontalRule"
    },
    {
      "t": "Para",
      "c": [
        {
          "t": "Str",
          "c": "Some"
        },
        {
          "t": "Space"
        },
        {
          "t": "Str",
          "c": "math:"
        },
        {
          "t": "Space"
        },
        {
          "t": "Math",
          "c": [
            {
              "t": "InlineMath"
            },
            "x = 2"
          ]
        }
      ]
    },
    {
      "t": "OrderedList",
      "c": [
        [
          1,
          {
            "t": "Decimal"
          },
          {
            "t": "Period"
          }
        ],
        [
          [
            {
              "t": "Plain",
              "c": [
                {
                  "t": "Emph",
                  "c": [
                    {
                      "t": "Str",
                      "c": "a"
                    }
                  ]
                }
              ]
            }
          ],
          [
            {
              "t": "Plain",
              "c": [
                {
                  "t": "Strong",
                  "c": [
                    {
                      "t": "Str",
                      "c": "b"
                    }
                  ]
                }
              ]
            }
          ],
          [
            {
              "t": "Plain",
              "c": [
                {
                  "t": "Strong",
                  "c": [
                    {
                      "t": "Emph",
                      "c": [
                        {
                          "t": "Str",
                          "c": "c"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        ]
      ]
    }
  ]
}

So if a new format were to be added, I suggest that the Pandoc AST format be used instead. This would have the additional benefit of being immediately pluggable into Pandoc, leveraging its rich ecosystem.
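
As a rough illustration of how little code is needed to consume this JSON, here is a Python sketch that extracts plain text from the `blocks` array shown above. It relies only on the `t`/`c` node shapes visible in that output; the function name and the plain-text layout are assumptions, and node types not shown in the example are handled only by the generic fallback.

```python
def plain_text(node):
    """Extract plain text from a Pandoc JSON node or list of nodes."""
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    if not isinstance(node, dict):
        return ""  # bare strings and numbers here are attributes, not text
    kind, content = node.get("t"), node.get("c")
    if kind == "Str":
        return content
    if kind == "Space":
        return " "
    if kind in ("Para", "Plain"):
        return plain_text(content) + "\n"
    if kind == "CodeBlock":
        return content[1] + "\n"  # content is [attributes, code]
    if kind == "Math":
        return content[1]         # content is [math type, TeX source]
    if kind == "HorizontalRule":
        return "\n"
    if kind == "OrderedList":
        (start, _style, _delim), items = content
        return "".join(
            f"{start + i}. {plain_text(item)}" for i, item in enumerate(items)
        )
    return plain_text(content)    # Emph, Strong, etc.: recurse into children
```

Calling `plain_text(document["blocks"])` on the parsed JSON walks the whole message; a renderer would replace the string concatenation with whatever its UI toolkit needs.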

Contributor

While I mostly agree, I strongly feel that HTML should not be the de-facto format (coming from someone who recently wrote a parser/renderer).
We could use an existing format to solve the re-inventing the wheel problem. Like Pandoc AST as @dkasak suggested.

Contributor

@Sorunome Sorunome Jan 2, 2021

Yes, you two summarized well the bit soru rushed at the end and wasn't able to explain well: If you can easily parse HTML into an AST it is basically equivalent to the json structure, with the added benefit of library support. Picking an already standardized (and customizable!) json structure seems appropriate, and like it could greatly improve this proposal. If clients still have markdown libraries and maybe even rendering libraries available, things would still remain simple.

This is the first time soru hears of pandoc specifically so she'd have to do some more research to be able to adequately comment on that part.

Member

For reference, I'm aware of pandoc-ast for Rust and panflute for Python, both of which are able to deserialize the JSON representation of Pandoc's AST.

@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
@zander
zander commented Jun 26, 2022

The spec details the real problem of an HTML renderer being quite heavy and the HTML spec itself naturally being a huge thing to depend upon in Matrix renderers.

Moving over to JSON, however, is something that looks a bit weird to my eyes. It solves a different problem, it essentially re-invents serialization without talking about the initial issue of standardizing text/image-rendering.

Using XML (let's assume HTML is XML for just a moment) has one big advantage in that any renderer can simply remove tags it doesn't understand and still have good text that it can render. That is to say, XML is much better suited for markup of human-readable text than JSON is. JSON is much better for structures and data.
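
That tag-dropping fallback takes only a few lines. A sketch in Python using the standard `html.parser`; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep character data and ignore every tag, known or not."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


def strip_tags(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.text)
```

For example, `strip_tags('Some <x-fancy level="3">very <b>important</b></x-fancy> text')` keeps the sentence intact even though `x-fancy` is not a tag any renderer knows.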

My suggestion is to look at the ODF spec and copy those parts that fit Matrix's needs. This avoids re-inventing serialization and even gives software the opportunity to reuse existing support for this ISO-specified format. Naturally, a very strict subset needs to be specified to avoid repeating the HTML renderer issue, where people can just add tags that a browser understands and it works for some clients and not for others.

Because at the end of the day the point of this MSC is to avoid HTML and specify exactly what tags are supported. And if this can be done without inventing the serialization itself, that is just a bonus.
