Remove non-ASCII characters from file #129

Merged
merged 1 commit into from Feb 14, 2022


ktmf01
Collaborator

@ktmf01 ktmf01 commented Feb 8, 2022

A few recent commits contained non-ASCII characters, including a few 'invisible' ones such as the non-breaking space. These characters are expanded with a descriptor (U+xxxx) when the markdown file is converted with mmark and xml2rfc.

@kieranjol

@kieranjol kieranjol commented Feb 8, 2022

Hi, this makes sense to me. I'd be curious to know how you detected these in the first place: did you have some automated method, or was it just looking by eye after a transformation attempt?

1     | s(n-1)     | N/A
2     | 2 * s(n-1) - s(n-2) | s(n-1) + ∆s(n-1)
3     | 3 * s(n-1) - 3 * s(n-2) + s(n-3) | s(n-1) + ∆s(n-1) + ∆∆s(n-1)
4     | 4 * s(n-1) - 6 * s(n-2) + 4 * s(n-3) - s(n-4) | s(n-1) + ∆s(n-1) + ∆∆s(n-1) + ∆∆∆s(n-1)


So these triangle symbols were more of a glitch and not intended?

@ktmf01
Collaborator Author

@ktmf01 ktmf01 commented Feb 8, 2022

As I work with Linux, I could have used grep, but in this case mmark and xml2rfc also mark non-ASCII characters. If I run make on the current repository (without this PR), I get a draft-ietf-cellar-flac-01.txt file containing the following:

   *  ∆∆∆ (U+2206 U+2206 U+2206)s(n-1) is ∆∆ (U+2206 U+2206)s(n-1) - ∆∆
      (U+00A0 U+2206 U+2206)s(n-2) or the closest available third-order
      discrete derivative

As you can see, the 'triangles' are marked with (U+2206) and the non-breaking space with (U+00A0). This is because the use of non-ASCII characters in RFCs should be limited, and where they are used, their Unicode code points should be marked.
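
To make the expansion shown above concrete, here is a minimal Python sketch that imitates its visible effect: every run of non-ASCII characters is followed by a parenthesised list of code points. This is not how xml2rfc implements it; the function name `annotate_non_ascii` and the regular expression are assumptions made purely for illustration.

```python
import re

# One or more consecutive characters outside the 7-bit ASCII range.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def annotate_non_ascii(text):
    """Append a '(U+XXXX ...)' descriptor after every run of non-ASCII characters.

    Hypothetical helper: it only imitates the look of the expanded output,
    it does not call or reproduce xml2rfc itself.
    """
    def describe(match):
        run = match.group(0)
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in run)
        return f"{run} ({code_points})"

    return NON_ASCII_RUN.sub(describe, text)


# The second run starts with a non-breaking space (U+00A0) before the triangles.
print(annotate_non_ascii("\u2206\u2206s(n-1) - \u00a0\u2206\u2206s(n-2)"))
```

Running it on a formula with several triangles shows how quickly the descriptors crowd out the actual expression.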

So, the triangles were intentional, but I didn't know such non-ASCII characters should not be used in RFCs. That's why this PR replaces them with another representation for derivatives.
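
For anyone without mmark and xml2rfc at hand, the kind of check that grep could do can also be sketched in a few lines of Python. The function name `find_non_ascii` and the sample text are assumptions for illustration; the sketch reports the line, column, code point and Unicode name of each character outside the ASCII range, which also exposes 'invisible' ones such as the non-breaking space.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, 'U+XXXX', name) for each non-ASCII character.

    Lines are split on '\n' only, so that non-ASCII line separators
    such as U+2028 are reported rather than silently consumed.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical sample: one triangle and one non-breaking space.
sample = "order 2: s(n-1) + \u2206s(n-1)\nplain ASCII line\nnon-breaking\u00a0space"
for hit in find_non_ascii(sample):
    print(hit)
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the result to the function.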

@SpencerDawkins

@SpencerDawkins SpencerDawkins commented Feb 10, 2022

Please let me know if what follows is obscure or obviously wrong.

The RFC series had ASCII text as its canonical form for decades, but more recent revisions to the RFC series are using XML as the canonical form, and allow the use of non-ASCII characters. I believe that the most recent guidance is RFC 7997, which (taking a quick look at Section 3) allows non-ASCII characters throughout a document.

There is more specific guidance available in Section 3, depending on how the non-ASCII characters are being used. I think what we're talking about in this document is covered under Section 3.5.

Section 2 gives basic requirements (for instance, "Searches against RFC indexes and database tables need to return expected results and support appropriate Unicode string matching behaviors;").

Does this help?

@ktmf01
Collaborator Author

@ktmf01 ktmf01 commented Feb 10, 2022

RFC 7997 indeed allows UTF-8 characters, but only for examples. When used in normative sections, the characters must be escaped:

Section 3.1

  Where the use of non-ASCII characters is purely part of an example
  and not otherwise required for correct protocol operation, escaping
  the non-ASCII character is not required.

Section 3.4

   When the mention of non-ASCII characters is required for correct
   protocol operation and understanding, the characters' Unicode code
   points must be used in the text.

In other words, I used the character ∆ to express a discrete derivative, but following RFC 7997 this should be expressed as U+2206 instead, that is, escaped. The ∆ character is only used as a more mathematically correct alternative to Lagrange's notation (with ', '' and '''), but once every ∆ has to be escaped, the text becomes hard to read.
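
Whichever notation is used, the discrete derivative itself is easy to check numerically. The sketch below assumes the recursive definition quoted from the draft earlier in this thread (the third-order derivative at n-1 is the second-order derivative at n-1 minus the one at n-2) and applies the same pattern to the lower orders. It then confirms that the coefficient form and the derivative form of each predictor in the table under review agree. The function names and sample values are hypothetical.

```python
def backward_difference(samples, order):
    """Return the order-th discrete derivative at the last sample.

    Order 0 is the sample itself; order k is the order k-1 derivative at
    the last sample minus the order k-1 derivative one sample earlier.
    """
    if order == 0:
        return samples[-1]
    return (backward_difference(samples, order - 1)
            - backward_difference(samples[:-1], order - 1))


# Coefficients for s(n-1), s(n-2), ... as listed in the table under review.
COEFFICIENTS = {1: [1], 2: [2, -1], 3: [3, -3, 1], 4: [4, -6, 4, -1]}


def predict_with_coefficients(history, order):
    recent = history[::-1]  # recent[0] is s(n-1), recent[1] is s(n-2), ...
    return sum(c * s for c, s in zip(COEFFICIENTS[order], recent))


def predict_with_differences(history, order):
    # s(n-1) plus its first (order - 1) discrete derivatives.
    return sum(backward_difference(history, k) for k in range(order))


history = [3, -1, 4, 1, -5, 9]  # arbitrary sample values, last one is s(n-1)
for order in range(1, 5):
    assert (predict_with_coefficients(history, order)
            == predict_with_differences(history, order))
```

The check only exercises one arbitrary history, so it illustrates the identity rather than proving it.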

@SpencerDawkins

@SpencerDawkins SpencerDawkins commented Feb 11, 2022

@ktmf01 - thank you for helping me understand.

@ktmf01 ktmf01 merged commit 4b6cdd2 into ietf-wg-cellar:master Feb 14, 2022
@ktmf01 ktmf01 deleted the remove-nonascii branch Feb 16, 2022