MSC3060: Room labels #3060

Open
wants to merge 4 commits into
base: old_master

@babolivier babolivier commented Mar 12, 2021

@babolivier babolivier changed the title MSCXXXX: Room labels MSC3060: Room labels Mar 12, 2021
@turt2live turt2live added the kind:feature (MSC for not-core and not-maintenance stuff), proposal (A matrix spec change proposal), and proposal-in-review labels Mar 12, 2021

@erkinalp erkinalp left a comment

Make this undefined client behaviour rather than ill-formed, since some languages may require words longer than 35 characters.

Comment on lines +39 to +46
A label is expected to be fairly short, and to designate only one concept,
therefore each label is limited in size to 35 characters and can contain only
one word (a word being defined as a string containing any character except for a
space (` `)).

As a sidenote for context, the length of 35 characters is based on the allowed
length for a topic on a GitHub repository (since the semantics of GitHub topics
are very similar to the ones described here).

Suggested change
- A label is expected to be fairly short, and to designate only one concept,
- therefore each label is limited in size to 35 characters and can contain only
- one word (a word being defined as a string containing any character except for a
- space (` `)).
+ A label is expected to be fairly short, and to designate only one concept.
+ Client behaviour is undefined for labels that are longer than 35 characters
+ and/or contain whitespace in the label.

  As a sidenote for context, the length of 35 characters is based on the allowed
  length for a topic on a GitHub repository (since the semantics of GitHub topics
  are very similar to the ones described here).


@babolivier babolivier Mar 15, 2021

The issue with not setting a hard limit on the length of a label is that it then becomes an abuse vector, as display names can be (we've had issues in the past where you could set your display name to a very long string of characters and flood the room's history, or cause DoS in some clients). I believe servers need to enforce a limit of some kind (otherwise it just opens the door to issues), and that the discussion should be about what this limit should be rather than whether we need a limit at all. 35 is kind of arbitrary and, as mentioned in the MSC, based on the maximum length of GitHub's equivalent feature, but one can argue GitHub was built for the English language, so these kinds of considerations might not have been taken into account.

If the intention is simply to avoid DoS attacks, then set the limit as high as possible without degrading performance. I mean, much higher than anybody would need in a "normal" use case.

Somewhat related: I don't think that "character" is a concept common to all languages, or one that is well defined in the Unicode standard. Please use a string length metric that is well-defined and works around the world.

> Please use a string length metric that is well-defined and works around the world.

Bytes? Codewords? Code units? Code points? Grapheme clusters? Glyph units?


@kevincox kevincox Mar 20, 2021

I would recommend counting UTF-8 bytes, which is easy for computers, and making the limit sufficiently long (64 or 128) that it rarely bothers humans while still being trivial for servers and other users to handle.
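
To make the difference between the candidate metrics concrete, here is a minimal Python sketch. The names `length_metrics`, `within_byte_limit`, and `MAX_LABEL_BYTES` are hypothetical and not part of the MSC; the value 64 is just one of the two figures floated above. Grapheme clusters are left out because counting them needs a third-party library.

```python
# Hypothetical limit: 64 is one of the values suggested in this thread,
# not something the MSC defines.
MAX_LABEL_BYTES = 64


def length_metrics(label: str) -> dict:
    """Return the length of a label under three common metrics."""
    return {
        # Python's len() on str counts Unicode code points.
        "code_points": len(label),
        # Size of the label once encoded as UTF-8.
        "utf8_bytes": len(label.encode("utf-8")),
        # UTF-16 code units: two bytes each, no byte-order mark with -le.
        "utf16_code_units": len(label.encode("utf-16-le")) // 2,
    }


def within_byte_limit(label: str, limit: int = MAX_LABEL_BYTES) -> bool:
    """Check a label against a limit expressed in UTF-8 bytes."""
    return len(label.encode("utf-8")) <= limit


if __name__ == "__main__":
    for sample in ["nsfw", "caf\u00e9", "\U0001F3AE"]:
        print(repr(sample), length_metrics(sample))
```

A byte-based limit is cheap to check on any server, at the cost of allowing fewer visible characters in scripts that need several bytes per code point.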


A label is expected to be fairly short, and to designate only one concept,
therefore each label is limited in size to 35 characters and can contain only
one word (a word being defined as a string containing any character except for a

What is the purpose of the "one word" restriction? How does it apply to languages that don't separate words with spaces? What about other whitespace characters?
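
As an illustration of the last question, the sketch below contrasts a check for the ASCII space only (the rule as written in the proposal) with a check for any character Python's `str.isspace()` treats as whitespace. Both function names are hypothetical, and `str.isspace()` is only one possible definition of "whitespace"; the MSC does not pick one.

```python
def contains_ascii_space(label: str) -> bool:
    """The proposal's wording: only U+0020 SPACE separates words."""
    return " " in label


def contains_any_whitespace(label: str) -> bool:
    """A broader check using Python's notion of whitespace
    (tabs, newlines, no-break space, ideographic space, ...)."""
    return any(ch.isspace() for ch in label)


if __name__ == "__main__":
    samples = ["foo bar", "foo\tbar", "foo\u00a0bar", "foo\u3000bar", "foobar"]
    for sample in samples:
        print(repr(sample), contains_ascii_space(sample), contains_any_whitespace(sample))
```

Neither check helps with languages that do not separate words with whitespace at all, which is the other half of the question.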

desired.

Clients must display all room labels as defined, except for `m.nsfw`, which can
also be displayed as the string `nsfw` or any case variant.

The prefix seems odd here. To me we should do one of two things:

  1. Publish a list of unprefixed well-understood labels including nsfw.
  2. Put user labels in a different namespace. (For example u.technology)

It seems like this is basically the worst of both worlds because we didn't reserve any namespace, but also added a requirement to format standardized labels differently from all the rest.

Is there any reason to have a namespace for user labels at all? To me it feels like having a namespace for hashtags.

I think the logic is that if we add new "well-known" labels, they won't clobber user labels which may have been used with different meanings. They also degrade nicely if you add a rule that says "Unknown m.* labels should be displayed as the equivalent u.* label would".

However, overall I think this isn't necessary: we can just have a list of well-known labels that is updatable over time. These well-known labels can be suggested by clients and translated as appropriate.
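
For reference, the degradation rule quoted in this comment could be sketched roughly as follows. Everything here is hypothetical: `KNOWN_M_LABELS` and `display_label` are illustrative names, `m.nsfw` is the only `m.` label the proposal mentions, and stripping the `u.` prefix for display is an assumption, since the thread does not say how a `u.*` label would be rendered.

```python
# Hypothetical table of well-known labels and their display strings.
# Only m.nsfw comes from the proposal.
KNOWN_M_LABELS = {"m.nsfw": "nsfw"}


def display_label(label: str) -> str:
    """Return the string a client might show for a room label."""
    if label in KNOWN_M_LABELS:
        return KNOWN_M_LABELS[label]
    if label.startswith("m."):
        # Unknown m.* label: treat it like the equivalent u.* label.
        label = "u." + label[2:]
    if label.startswith("u."):
        # Assumption: user-namespaced labels are shown without the prefix.
        return label[2:]
    # Unprefixed labels are displayed as defined.
    return label


if __name__ == "__main__":
    for sample in ["m.nsfw", "m.future", "u.technology", "technology"]:
        print(sample, "->", display_label(sample))
```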

Who would be responsible for curating that set of well-known labels, and what would be the process for someone to get a label promoted to it? If a "namespaced" user label were promoted to well-known (i.e. m.), wouldn't rooms need to switch to using the well-known one to get special client behavior like the translations example?

I think the practical use-case of an nsfw marker on rooms probably goes beyond the more generalized idea of labels, given the App Store policy requirement example. I'm personally unable to think of any other "well-known" labels which would warrant specific client behavior.

@turt2live turt2live added the needs-implementation label (This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.) Jun 8, 2021