Restrict values for our xml:lang attributes #11

Closed
wrygiel opened this issue Oct 6, 2016 · 12 comments

Comments

@wrygiel
Contributor

wrygiel commented Oct 6, 2016

I have just noticed one minor issue which we should address.

The xml:lang attributes which we have used in our StringWithOptionalLang and MultilineStringWithOptionalLang data types have "a flaw" of a sort. (Or perhaps it's a feature?)

I was just implementing an update to ewp-registry-client, and followed the XML namespace specs through the xs:language specs to RFC 3066, where I found out that xml:lang accepts both 2-letter and 3-letter language codes.

This might be a problem, because - if we allow that - then all programmers will need to remember to search for their languages using both versions of the key (e.g. eng and en, instead of only en).

I propose to change our specs and require everyone to use 2-letter codes only (the ISO 639-1 subset of all valid xml:lang attributes). This would be documented in our common-types.xsd file.

@wrygiel
Contributor Author

wrygiel commented Oct 6, 2016

BTW, it's not only about 3-letter codes. xml:lang accepts many other language options too. This is the regexp used there:

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
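To make the breadth of that pattern concrete, here is a minimal sketch in Python that checks a few sample values against it. The function name `is_valid_xs_language` and the sample tags are only illustrative assumptions, not part of any EWP library.

```python
import re

# The pattern quoted in the comment above (lexical form of xs:language).
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_xs_language(value: str) -> bool:
    """Return True if the whole string matches the xs:language pattern."""
    # fullmatch is used so that partial matches (e.g. "en" inside "en_US")
    # are not accepted.
    return XS_LANGUAGE.fullmatch(value) is not None


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    for tag in ["en", "eng", "en-us", "en_US", ""]:
        print(repr(tag), is_valid_xs_language(tag))
```

Note that both `en` and `eng` pass, as does `en-us`, which is exactly why a client cannot assume a plain 2-letter key.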

@wrygiel
Contributor Author

wrygiel commented Oct 7, 2016

Solution 1. As described above. Keep using xml:lang attributes, but remind everyone to restrict their values to ISO 639-1.

Other options:

Solution 2. Deprecate xml:lang attributes in our schemas entirely, and use an equivalent of our own.

Solution 3. Don't change anything. Keep xml:lang and respect all the values it might have. This might make it harder for clients to find proper languages in various cases, but it also allows EWP to support variations of languages (e.g. American English vs British English). For example, just a minute ago I have noticed that @kaiqu has used xml:lang="en-us" in one of his examples - perhaps it would be useful to keep such distinction?

@mikesteez

The W3C refers to http://www.rfc-editor.org/rfc/bcp/bcp47.txt when discussing xml:lang.
In that text, they say "shortest ISO 639 code" and "sometimes followed by extended language subtags". I suggest we go with solution 1, with the addition that clients MAY interpret the extended language subtag if they are able to do so.

@georgschermann

I'd suggest to stick to the ISO 639-1 codes which should be sufficient inside our use cases and the EU. (http://publications.europa.eu/code/en/en-5000800.htm)

@wrygiel
Contributor Author

wrygiel commented Oct 10, 2016

> I'd suggest to stick to the ISO 639-1 codes which should be sufficient inside our use cases and the EU.

Would you prefer solution 1 or 2? Both of them encourage you to use 2-letter codes, but solution 2 is more restrictive (it enforces it on the XML Schema level).

@georgschermann

solution 1 should be enough I think

@wrygiel
Contributor Author

wrygiel commented Oct 18, 2016

During the discussion in Warsaw it turned out that we might need these "extra features" (among other things, to be able to specify that a particular value has been transliterated). I didn't understand why we need it exactly, but perhaps it doesn't matter.

In conclusion, we want to be able to use the full potential of xs:language (if not now, then perhaps in the future). This means that we should not attempt to restrict the server developers, but instruct client developers instead.

@mikesteez

@wrygiel could you elaborate on this? What are the implications for server- and client-developers?

@wrygiel
Contributor Author

wrygiel commented Oct 19, 2016

Server developers will be allowed to use extensions, as xml:lang specification allows it. Client developers will be required to expect that xml:lang may include extensions. They may need to truncate these extensions before comparing or storing the values.
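As a rough illustration of the "truncate before comparing" step, a client could reduce each xml:lang value to its primary subtag. This is only a sketch; the helper name `primary_language` is hypothetical, and it assumes that the part before the first hyphen is the language code the client cares about.

```python
def primary_language(tag: str) -> str:
    """Return the lowercased primary subtag of an xml:lang value.

    Everything after the first hyphen (region, script, other extensions)
    is dropped, so "en-US" and "en-GB" both become "en".
    """
    return tag.split("-", 1)[0].lower()


if __name__ == "__main__":
    for tag in ["en", "en-US", "en-GB", "PL"]:
        print(tag, "->", primary_language(tag))
```

This does not map 3-letter codes to 2-letter ones (`eng` stays `eng`); a client that needs that would have to add its own lookup table.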

@wrygiel
Contributor Author

wrygiel commented Oct 20, 2016

> I didn't understand why we need it exactly

I will try to understand this topic exactly before I update the specification. I suspect that we might have misunderstood each other in this aspect.

Regardless, I think that the clients should expect that xml:lang MAY sometimes contain extensions. These extensions might be useful in some scenarios (e.g. en-US vs en-GB). It's up to the server and the clients to decide if they want to use them, but the network should not deny them to be used. What do you think?
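One way a client might honour extensions when they are present, while still working when they are not, is to prefer an exact match and fall back to the primary language. The sketch below assumes the client has already parsed the elements into `(xml:lang, text)` pairs; the function name `pick_string` and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pick_string(
    values: Sequence[Tuple[Optional[str], str]], wanted: str
) -> Optional[str]:
    """Pick the text whose xml:lang best matches `wanted`.

    An exact (case-insensitive) match wins. Otherwise the first value that
    shares the same primary subtag is returned. Values without xml:lang
    are ignored. Returns None if nothing matches.
    """
    wanted_norm = wanted.lower()
    wanted_primary = wanted_norm.split("-", 1)[0]
    fallback = None
    for lang, text in values:
        if lang is None:
            continue
        lang_norm = lang.lower()
        if lang_norm == wanted_norm:
            return text
        if fallback is None and lang_norm.split("-", 1)[0] == wanted_primary:
            fallback = text
    return fallback


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    names = [("en-GB", "Colour"), ("en-US", "Color"), ("pl", "Kolor")]
    print(pick_string(names, "en-US"))
    print(pick_string(names, "en"))
    print(pick_string(names, "de"))
```

With this approach a server is free to send `en-US` and `en-GB` side by side, and a client that only asks for `en` still gets a usable value.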

@mikesteez

@wrygiel yes I agree with the last comment!

@georgschermann

georgschermann commented Oct 20, 2016

+1 for 2-letter codes with optional extension
