MSC3554: Extensible Events - Translatable Text #3554

70 changes: 70 additions & 0 deletions proposals/3554-extensible-events-translatable-messages.md
@@ -0,0 +1,70 @@
# MSC3554: Extensible Events - Translatable Messages

[MSC1767](https://github.com/matrix-org/matrix-doc/pull/1767) describes Extensible Events in detail,
though deliberately does not include schemas for non-text messaging types. This MSC covers only support
for translations on the `m.message` type.

*Rationale*: Splitting the MSCs down into individual parts makes it easier to implement and review in
stages without blocking other pieces of the overall idea. For example, an issue with the way images
are represented should not block the overall schema from going through.

## Proposal

A new field, `lang`, is added to the `m.message` type definition to denote the language in which the
`body` is written.

An example:

```json5
{
  "type": "m.message",
  "content": {
    "m.message": [
      {
        "body": "Je suis un poisson",
        "lang": "fr"
      },
      {
        "body": "I am a fish",
        "lang": "en"
      }
    ]
  }
}
```

*Note*: `m.message`'s support for `mimetype` has been excluded from the example for brevity. It is still
supported in events.

As already covered by Extensible Events, the first element in the array would be the representation that
unaware clients would use, which in the example above would be French. Clients which are aware of language
support might end up picking the English version instead.
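
The selection behaviour described above can be sketched as follows. This is a minimal illustration rather than
part of the proposal: the function name `pick_representation` and the `preferred_langs` parameter are hypothetical,
tags are compared case-insensitively for exact equality (a real client would likely want proper BCP-47 matching),
the array is assumed to be non-empty, and entries without `lang` are treated as `en`, the default this proposal states.

```python
DEFAULT_LANG = "en"  # the proposal's stated default when `lang` is absent


def pick_representation(representations, preferred_langs):
    """Return the `m.message` array entry that best matches the user's languages.

    `representations` is the (non-empty) list under `m.message`;
    `preferred_langs` is a hypothetical, ordered list of language codes.
    """
    for wanted in preferred_langs:
        for rep in representations:
            if rep.get("lang", DEFAULT_LANG).lower() == wanted.lower():
                return rep
    # No match: behave like a language-unaware client and use the first entry.
    return representations[0]


content = {
    "m.message": [
        {"body": "Je suis un poisson", "lang": "fr"},
        {"body": "I am a fish", "lang": "en"},
    ]
}

chosen = pick_representation(content["m.message"], ["en"])    # the English entry
fallback = pick_representation(content["m.message"], ["de"])  # the first (French) entry
```

Falling back to the first element keeps language-aware clients consistent with unaware ones whenever none of the
user's languages are available.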

By default, messages are assumed to be sent in English (`en`).

Review comment (Contributor):

I don't think it makes a lot of sense to assume a language in this case. There is no foolproof way to guess a language, so many clients will probably default to just sending whatever the user typed without a language. What is the benefit of assuming English if that is probably wrong in a lot of cases? Shouldn't it rather just be left unspecified?

Reply (Member Author):

The vast majority of software in the ecosystem makes assumptions about text being English. This is just to help implementations which might be searching for a language code, not to define the language itself.

Leaving the language unspecified leads to all kinds of issues with software, whereas French text labelled with the default of English is generally fine.

anoadragon453 marked this conversation as resolved.

`lang` must be a valid language code under [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt). This is
in line with the HTML specification, which uses BCP-47 tags for its own `lang` attribute, for example on the
`<html>` node.
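
As a rough illustration of checking `lang` before sending or rendering, the sketch below tests only the common
`language[-script][-region][-variant...]` shape. It is a simplification and an assumption of this example, not a
BCP-47 validator: it ignores extlang, extension, private-use and grandfathered tags, and does not consult the IANA
subtag registry, so it can accept unregistered codes and reject some valid ones. The helper name
`looks_like_lang_tag` is hypothetical.

```python
import re

# Simplified pattern: language[-script][-region][-variant...].
# Assumption: this covers only the most common tag shapes (see the text above).
_SIMPLE_LANG_TAG = re.compile(
    r"[A-Za-z]{2,3}"                                 # primary language, e.g. "fr"
    r"(-[A-Za-z]{4})?"                               # optional script, e.g. "Hant"
    r"(-(?:[A-Za-z]{2}|[0-9]{3}))?"                  # optional region, e.g. "GB" or "419"
    r"(-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"  # optional variants, e.g. "1901"
)


def looks_like_lang_tag(value):
    """Return True if `value` is a string with a common BCP-47 tag shape."""
    return isinstance(value, str) and _SIMPLE_LANG_TAG.fullmatch(value) is not None
```

A client might use such a check to ignore an obviously malformed `lang` (for example `en_US`) and fall back to
its default handling.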

There is no specific guidance for when to use translation support, though cases can include automatic machine
translation, bots with internationalization support, and possibly some bridges.
turt2live marked this conversation as resolved.

## Potential issues

The language code spec might not encompass all of the possible language code combinations, but should cover
most cases given its widespread use in HTML.

This does not apply to `m.text` or `m.html`, necessitating the use of the longer form `m.message` when sending
translated messages.
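
A sender that wants to attach translations therefore has to build the long-form array itself. The following is a
minimal sketch of doing so; the function name `build_translated_message` and its `(lang, body)` input shape are
hypothetical, and the stable identifiers from the example above are used (unstable prefixing is not shown).

```python
def build_translated_message(translations):
    """Build an `m.message` event from an ordered list of (lang, body) pairs.

    The first pair becomes the first array element, which is the
    representation that language-unaware clients would use.
    """
    if not translations:
        raise ValueError("at least one translation is required")
    return {
        "type": "m.message",
        "content": {
            "m.message": [
                {"body": body, "lang": lang} for lang, body in translations
            ]
        },
    }


event = build_translated_message(
    [("fr", "Je suis un poisson"), ("en", "I am a fish")]
)
```

Because order determines what unaware clients display, the caller should put the sender's primary language first.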

## Alternatives

No significant alternatives known.

## Security considerations

No specific considerations are required for this proposal.

## Unstable prefix

This MSC does not introduce anything which should conflict with stable usage. Implementations are encouraged
to review [MSC1767](https://github.com/matrix-org/matrix-doc/pull/1767)'s unstable prefixing approach.