Commit c84098f

Gluing grapheme concepts
By de-indexing some of them, linking others, and generally reviewing what #891 requires. This closes #891 as far as the original issue title is concerned; however, I would say another issue should be opened to actually explain how to work with that kind of regex (if it's not done already).
1 parent b0e6792 commit c84098f

2 files changed: +5 -5 lines changed

doc/Language/unicode.pod6

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ X<|UTF8-C8>
 X<UTF-8 Clean-8> is an encoder/decoder that primarily works as the UTF-8 one.
 However, upon encountering a byte sequence that will either not decode as
 valid UTF-8, or that would not round-trip due to normalization, it will use
-NFG synthetics to keep track of the original bytes involved. This means that
+L<NFG synthetics|/language/glossary#NFG> to keep track of the original bytes involved. This means that
 encoding back to UTF-8 Clean-8 will be able to recreate the bytes as they
 originally existed. The synthetics contain 4 codepoints:

doc/Type/Cool.pod6

Lines changed: 4 additions & 4 deletions
@@ -707,10 +707,10 @@ number of characters in the string. Please note that on the JVM, you currently g
 codepoints instead of graphemes.

 say 'møp'.chars; # OUTPUT: «3␤»
-say 'ã̷̠̬̊'.chars; # OUTPUT: «1␤»
-say '👨‍👩‍👧‍👦🏿'.chars; # OUTPUT: «1␤»
+say 'ã̷̠̬̊'.chars; # OUTPUT: «1␤»
+say '👨‍👩‍👧‍👦🏿'.chars; # OUTPUT: «1␤»

-X<|Grapheme> X<|NFG>
+X<|Grapheme>

 Graphemes are user visible characters. That is, this is what the user
 thinks of as a “character”.
@@ -730,7 +730,7 @@ order to see how it is going to behave:

 You can read more about graphemes in the
 L<Unicode Standard|http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>,
-which Perl 6 tightly follows.
+which Perl 6 tightly follows, using a method called L<NFG, normal form graphemes|/language/glossary#NFG> for efficiently representing them.

 =head2 routine codes
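
Note on the grapheme/codepoint distinction touched by this diff: the following is a minimal sketch, not part of the commit, of how `.chars` (graphemes) differs from codepoint counts. The variable names and sample strings are hypothetical, chosen only for illustration, and the behavior assumes a backend with NFG support (the patched text notes that the JVM currently reports codepoints instead).

```raku
# Hypothetical sample strings; none of them appear in the commit itself.
my $plain        = 'møp';          # three graphemes, taken from the documented example
my $uncomposable = "x\x[0301]";    # x + COMBINING ACUTE ACCENT: no precomposed codepoint exists
my $composable   = "e\x[0301]";    # e + COMBINING ACUTE ACCENT: a precomposed é exists

say $plain.chars;            # graphemes in a plain string
say $uncomposable.chars;     # one user-visible character ...
say $uncomposable.codes;     # ... made of two codepoints
say $composable.chars;       # one user-visible character
say $composable.NFD.elems;   # two codepoints once canonically decomposed
```

On an NFG-capable backend, the two combining-mark strings should each report a single grapheme even though they hold more than one codepoint, which is the distinction the linked glossary entry describes.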
