MSC3060: Room labels #3060
base: old_master
24c6f67 to c01a28f
Make it undefined client behaviour rather than ill-formed, since some languages may require words longer than 35 characters.
A label is expected to be fairly short, and to designate only one concept,
therefore each label is limited in size to 35 characters and can contain only
one word (a word being defined as a string containing any character except for a
space (` `)).

As a sidenote for context, the length of 35 characters is based on the allowed
length for a topic on a GitHub repository (since the semantics of GitHub topics
are very similar to the ones described here).
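As a minimal sketch of the rule as drafted, a Python check might look like the following. It assumes "character" means Unicode code point (what Python's `len()` counts), an interpretation this thread goes on to question; `is_well_formed_label` and `MAX_LABEL_LENGTH` are hypothetical names, not part of any Matrix SDK.

```python
MAX_LABEL_LENGTH = 35  # limit proposed in the draft MSC text


def is_well_formed_label(label: str) -> bool:
    """Check a label against the drafted rule: at most 35 characters, no space.

    Assumption: "character" is read as Unicode code point, which is what
    Python's len() counts for a str.
    """
    # Only U+0020 SPACE is excluded, matching the draft's definition of a word.
    return len(label) <= MAX_LABEL_LENGTH and " " not in label
```

Because `len()` counts code points, a label made of combining sequences or emoji modifiers reaches the limit with fewer visible characters than 35.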
Suggested change (replacing the passage quoted above):

A label is expected to be fairly short, and to designate only one concept.
Client behaviour is undefined for labels that are longer than 35 characters
and/or contain whitespace in the label.

As a sidenote for context, the length of 35 characters is based on the allowed
length for a topic on a GitHub repository (since the semantics of GitHub topics
are very similar to the ones described here).
The issue with not setting a hard limit on the length of a label is that it then becomes an abuse vector, as display names can be (we've had issues in the past where you could set your display name to a very long string of characters and flood the room's history, or cause DoS in some clients). I believe servers need to enforce a limit of some kind, since otherwise it just opens the door to issues. The discussion should be about what this limit should be, rather than whether we need a limit at all: 35 is somewhat arbitrary and, as mentioned in the MSC, based on the maximum length of GitHub's equivalent feature, but one can argue GitHub was built for the English language, so these kinds of considerations might not have been taken into account.
If the intention is simply to avoid DoS attacks, then set the limit as high as possible without degrading performance, i.e. much higher than anybody would need in a "normal" use case.
Somewhat related: I don't think that "character" is a concept common to all languages, nor one that is defined in the Unicode standard. Please use a string length metric that is well-defined and works around the world.
> Please use a string length metric that is well-defined and works around the world.
Bytes? Codewords? Code units? Code points? Grapheme clusters? Glyph units?
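To make the question concrete, this Python sketch measures the same strings with three of these metrics. `length_metrics` is a hypothetical helper, not a Matrix or SDK API. Grapheme clusters are left out because the Python standard library has no grapheme segmenter; counting them would need a third-party library.

```python
from typing import Dict


def length_metrics(label: str) -> Dict[str, int]:
    """Return several common 'length' measurements of the same string."""
    return {
        # Bytes of the UTF-8 encoding (what JSON sends over the wire).
        "utf8_bytes": len(label.encode("utf-8")),
        # UTF-16 code units (what JavaScript's String.length reports);
        # "utf-16-le" emits no byte-order mark, so bytes / 2 = code units.
        "utf16_code_units": len(label.encode("utf-16-le")) // 2,
        # Unicode code points (what Python's len() reports).
        "code_points": len(label),
    }


for sample in ["nsfw", "日本語", "cafe\u0301", "\U0001F44D\U0001F3FD"]:
    print(repr(sample), length_metrics(sample))
```

For example, `"cafe\u0301"` (é written with a combining accent) displays as four user-perceived characters but is 5 code points and 6 UTF-8 bytes, while the thumbs-up emoji with a skin-tone modifier displays as one but is 2 code points, 4 UTF-16 code units and 8 UTF-8 bytes.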
I would recommend counting UTF-8 bytes, since that is easy for computers, and making the limit sufficiently long (64 or 128 bytes) that it rarely bothers humans while still being trivial for servers and other users to handle.
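A minimal sketch of the byte-based limit suggested here, assuming a 128-byte cap (64 would work the same way). `label_fits`, `truncate_label` and `MAX_LABEL_BYTES` are hypothetical names, not part of the Matrix spec or any server implementation.

```python
MAX_LABEL_BYTES = 128  # assumed cap; one of the two values floated above


def label_fits(label: str, max_bytes: int = MAX_LABEL_BYTES) -> bool:
    """True if the UTF-8 encoding of the label is within the byte budget."""
    return len(label.encode("utf-8")) <= max_bytes


def truncate_label(label: str, max_bytes: int = MAX_LABEL_BYTES) -> str:
    """Cut a label to at most max_bytes of UTF-8 without splitting a code point."""
    # Slicing the bytes can leave an incomplete multi-byte sequence at the end;
    # errors="ignore" drops those trailing bytes when decoding.
    return label.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


print(label_fits("日" * 42))       # 126 bytes
print(truncate_label("日本語", 4))  # only the first 3-byte character fits
```

Byte-level truncation keeps the result valid UTF-8 but can still split a grapheme cluster (for example, separate an emoji from its skin-tone modifier), so a server that truncates rather than rejects could still produce odd-looking labels.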
A label is expected to be fairly short, and to designate only one concept,
therefore each label is limited in size to 35 characters and can contain only
one word (a word being defined as a string containing any character except for a
What is the purpose of the "one word" restriction? How does it apply to languages that don't separate words with spaces? What about other whitespace characters?
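To illustrate the question about other whitespace characters, here is a Python sketch contrasting the drafted rule (only U+0020 SPACE separates words) with a stricter check based on `str.isspace()`. Both function names are hypothetical.

```python
def has_ascii_space(label: str) -> bool:
    # The rule as drafted: only U+0020 SPACE separates words.
    return " " in label


def has_any_whitespace(label: str) -> bool:
    # Stricter alternative: any character str.isspace() accepts, e.g. tab,
    # newline, U+00A0 NO-BREAK SPACE or U+3000 IDEOGRAPHIC SPACE.
    return any(ch.isspace() for ch in label)


for sample in ["open source", "open\tsource", "オープン\u3000ソース", "機械学習"]:
    print(repr(sample), has_ascii_space(sample), has_any_whitespace(sample))
```

Neither check says anything about word boundaries in scripts written without spaces: `"機械学習"` ("machine learning" in Japanese) passes both, even though it contains more than one word.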
desired.

Clients must display all room labels as defined, except for `m.nsfw`, which can
also be displayed as the string `nsfw` or any case variant.
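A minimal sketch of the display rule as drafted, in Python. `render_label_draft` is a hypothetical name, and rendering `m.nsfw` as `"NSFW"` is just one of the case variants the text permits.

```python
def render_label_draft(label: str) -> str:
    """Render a room label as the drafted display rule allows."""
    # m.nsfw may be shown as "nsfw" in any case; "NSFW" is one valid choice.
    if label == "m.nsfw":
        return "NSFW"
    # Every other label, including other m.* strings, is shown exactly as defined.
    return label


print([render_label_draft(label) for label in ["m.nsfw", "technology"]])
```

This asymmetry is what the following comment objects to: one prefixed label gets special treatment while every other label, prefixed or not, is shown verbatim.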
The prefix seems odd here. To me we should do one of two things:
- Publish a list of unprefixed well-understood labels including nsfw.
- Put user labels in a different namespace (for example `u.technology`).
It seems like this is basically the worst of both worlds because we didn't reserve any namespace, but also added a requirement to format standardized labels differently from all the rest.
Is there any reason to have a namespace for user labels at all? It feels to me like having a namespace for hashtags.
I think the logic is that if we add new "well-known" labels, they won't clobber user labels which may have been used with different meanings. They also degrade nicely if you add a rule that says "Unknown `m.*` labels should be displayed as the equivalent `u.*` label would be."
However, overall I think this isn't necessary; we can just have a list of well-known labels that is updatable over time. These well-known labels can be suggested by clients and translated as appropriate.
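The degradation rule described above can be sketched in Python as follows. Everything here is hypothetical: the `u.*` namespace is only a proposal in this thread, the `WELL_KNOWN_LABELS` table stands in for whatever curated (and possibly translated) list clients would ship, and showing `u.*` labels without their prefix is an assumption.

```python
# Hypothetical table of well-known labels; a real client might hold
# per-locale translations here instead of a single string.
WELL_KNOWN_LABELS = {"m.nsfw": "NSFW"}


def render_label_namespaced(label: str) -> str:
    """Render a label under the hypothetical m.* / u.* namespace scheme."""
    if label in WELL_KNOWN_LABELS:
        return WELL_KNOWN_LABELS[label]
    if label.startswith("m."):
        # Unknown well-known label: degrade to how the u.* equivalent renders.
        label = "u." + label[len("m."):]
    if label.startswith("u."):
        # Assumption: user labels are shown without their namespace prefix.
        return label[len("u."):]
    # Anything outside both namespaces is shown as-is.
    return label


print([render_label_namespaced(x) for x in ["m.nsfw", "m.spoilers", "u.technology"]])
```

An older client that has never heard of a newly standardized `m.spoilers` would thus still show "spoilers", the same text a room using `u.spoilers` would get.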
Who would be responsible for curating that set of well-known labels, and what would be the process for someone to get a label promoted to it? If a "namespaced" user label were promoted to well-known (i.e. m.), wouldn't rooms need to switch to using the well-known one to get special client behavior like the translations example?
I think the practical use-case of an nsfw marker on rooms probably goes beyond the more generalized idea of labels, given the App Store policy requirement example. I'm personally unable to think of any other "well-known" labels which would warrant specific client behavior.