Sense key identifiers should be valid XML IDs #1

Open
jmccrae opened this issue Jan 15, 2020 · 1 comment

Comments

jmccrae commented Jan 15, 2020

I think it would be much better if the SKIs (sense key identifiers) were also valid XML IDs. This means they should match the following pattern:

NameStartChar ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                          [#xD8-#xF6] | [#xF8-#x2FF] |
                          [#x370-#x37D] | [#x37F-#x1FFF] |
                          [#x200C-#x200D] | [#x2070-#x218F] |
                          [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                          [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                          [#x10000-#xEFFFF]

NameChar     ::=      NameStartChar | "-" | "." | [0-9] | #xB7 |
                        [#x0300-#x036F] | [#x203F-#x2040]

Source: http://www.w3.org/TR/REC-xml/#NT-Name

This means characters like '#' can't be used and would have to be escaped or replaced. How about ':' as the separator instead?
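To make the pattern above concrete, here is a minimal sketch of a validity check in Python. It transcribes the two productions quoted above into a regular expression; the function name `is_xml_name` and the sample strings are hypothetical and are not taken from the SKI project.

```python
import re

# Character ranges transcribed from the NameStartChar production quoted above.
NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, #xB7 and two further ranges.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if the whole string matches the XML Name production."""
    return XML_NAME.fullmatch(candidate) is not None


# Hypothetical identifiers, for illustration only.
print(is_xml_name("word#1"))   # False: '#' is not a NameChar
print(is_xml_name("word:1"))   # True: ':' is allowed by the production
print(is_xml_name("1st"))      # False: a digit cannot start a name
```

Note that this checks the Name production only; stricter profiles that forbid colons are not covered by this sketch.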

ekaf (Owner) commented Jan 15, 2020

Thanks for this advice. I have changed the separator to colon (:), and intend to leave the issue open for discussion.

The XML specification that you cite allows colons in IDs but discourages them, since the main use of the colon is to separate namespace prefixes. The SKI project aims to support broad interoperability rather than specific formats like XML, so the separator could be anything, as long as it does not appear inside a word.

There are other reasons why the current format is not suitable for direct use as XML names: some words start with a number, or include problematic characters like slash and apostrophe. I am not sure that these issues should be handled here, because different applications require different conversions, which are straightforward to implement with a few lines of 'sed' code.
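As an illustration of the kind of application-side conversion described above, here is a rough sketch in Python rather than sed. The escaping scheme (a hex code point wrapped in hyphens, plus a leading underscore when the first character cannot start a name) and the function name `to_xml_name` are assumptions made up for this example, not something the SKI project defines.

```python
import re

# Character ranges from the XML Name production cited earlier in this thread.
NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

START_RE = re.compile(f"[{NAME_START}]")
BAD_CHAR_RE = re.compile(f"[^{NAME_CHAR}]")


def to_xml_name(key: str) -> str:
    """Convert a key into a string matching the XML Name production.

    Hypothetical scheme: each disallowed character becomes "-XXXX-" (its
    code point in hex), and "_" is prepended if the result does not begin
    with a NameStartChar. The mapping is not guaranteed to be reversible.
    """
    escaped = BAD_CHAR_RE.sub(lambda m: f"-{ord(m.group()):04x}-", key)
    if not escaped or not START_RE.match(escaped):
        escaped = "_" + escaped
    return escaped


# Hypothetical inputs, for illustration only.
print(to_xml_name("o'clock"))  # o-0027-clock
print(to_xml_name("24/7"))     # _24-002f-7
print(to_xml_name("word:1"))   # word:1 (already valid, left unchanged)
```

An application that needs to recover the original key would have to choose an unambiguous escape marker; this sketch does not attempt that.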
