Remove non-ASCII characters from file #129
A few recent commits introduced non-ASCII characters, including some 'invisible' ones such as the non-breaking space. These characters are expanded with a descriptor (U+xxxx) when the markdown file is converted with mmark and xml2rfc.
Hi, this makes sense to me. I'd be curious to know how you detected these in the first place: did you have some automated method, or did you spot them by eye after a transformation attempt?
| 1 | s(n-1) | N/A |
| 2 | 2 * s(n-1) - s(n-2) | s(n-1) + ∆s(n-1) |
| 3 | 3 * s(n-1) - 3 * s(n-2) + s(n-3) | s(n-1) + ∆s(n-1) + ∆∆s(n-1) |
| 4 | 4 * s(n-1) - 6 * s(n-2) + 4 * s(n-3) - s(n-4) | s(n-1) + ∆s(n-1) + ∆∆s(n-1) + ∆∆∆s(n-1) |
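For readers comparing the two columns of these rows, here is a minimal sketch that checks numerically that the polynomial form and the difference form agree. It assumes that ∆ denotes the backward difference, ∆s(k) = s(k) - s(k-1), which the thread does not state explicitly; the function names and sample values are hypothetical and only serve the illustration.

```python
def diff(s, k, order=1):
    """Backward difference of the given order at index k.

    Assumption: ∆s(k) = s(k) - s(k-1); order 0 is the sample itself.
    """
    if order == 0:
        return s[k]
    return diff(s, k, order - 1) - diff(s, k - 1, order - 1)


# Coefficients of s(n-1), s(n-2), ... taken from the middle column of the rows above.
COEFFS = {1: [1], 2: [2, -1], 3: [3, -3, 1], 4: [4, -6, 4, -1]}


def predict_polynomial(s, n, order):
    """Prediction for s[n] written as a weighted sum of previous samples."""
    return sum(c * s[n - 1 - i] for i, c in enumerate(COEFFS[order]))


def predict_differences(s, n, order):
    """Prediction for s[n] written as s(n-1) plus successive differences."""
    return sum(diff(s, n - 1, j) for j in range(order))


if __name__ == "__main__":
    samples = [3, -1, 4, 1, -5, 9, 2, -6]  # arbitrary illustrative values
    for order in range(1, 5):
        for n in range(4, len(samples)):
            assert predict_polynomial(samples, n, order) == predict_differences(samples, n, order)
    print("both forms agree for orders 1 to 4")
```

Under that assumption the two notations are interchangeable, so replacing ∆ with an ASCII representation changes only the presentation, not the predictors.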
So these triangle symbols were more of a glitch and not intended?
As I work with Linux, I could have used grep, but in this case mmark and xml2rfc also mark non-ASCII characters. When I run the conversion, the 'triangles' are marked with (U+2206) and the non-breaking space with (U+00A0). This is because the use of non-ASCII characters in RFCs should be limited, and where they are used, their Unicode code points are marked. So the triangles were intentional, but I didn't know that such non-ASCII characters should not be used in RFCs. That's why this PR replaces them with another representation for derivatives.
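As an alternative to grep or a full mmark/xml2rfc run, a small script can list non-ASCII characters directly. The following is only a sketch and not the method used in this PR: the function name `find_non_ascii` and the sample string are hypothetical, and it relies solely on Python's standard `unicodedata` module.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, 'U+XXXX', name) for every non-ASCII character.

    Lines are split on "\n" only, so unusual Unicode line separators are
    reported as characters instead of being silently consumed.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                code_point = f"U+{ord(ch):04X}"
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, code_point, name))
    return hits


if __name__ == "__main__":
    # Hypothetical sample containing the two characters discussed in this thread.
    sample = "s(n-1) + \u2206s(n-1)\nfoo\u00a0bar"
    for line_no, col, code_point, name in find_non_ascii(sample):
        print(f"line {line_no}, column {col}: {code_point} {name}")
```

Reporting the column as well as the code point makes 'invisible' characters such as the non-breaking space easy to locate in an editor.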
Please let me know if what follows is obscure or obviously wrong. The RFC series had ASCII text as its canonical form for decades, but more recent revisions to the series use XML as the canonical form and allow non-ASCII characters. I believe the most recent guidance is RFC 7997, which (from a quick look at Section 3) allows non-ASCII characters throughout a document. Section 3 also gives more specific guidance, depending on how the non-ASCII characters are being used. I think what we're talking about in this document is covered under Section 3.5. Section 2 gives basic requirements (for instance, "Searches against RFC indexes and database tables need to return expected results and support appropriate Unicode string matching behaviors;"). Does this help?
RFC 7997 does indeed allow UTF-8 characters, but only in examples. When used in normative sections, the characters must be escaped (see Section 3.1 and Section 3.4). In other words, I used the character ∆ to express a discrete derivative, but following RFC 7997 it should be written in escaped form as U+2206. However, ∆ is only used as a more mathematically correct alternative to Lagrange's notation (with ', '' and '''), and once every ∆ has to be escaped, the formulas become hard to read.
@ktmf01 - thank you for helping me understand.