=begin pod :tag<index>
=TITLE Unicode versus ASCII symbols
=SUBTITLE Unicode symbols and their ASCII equivalents
The following Unicode symbols can be used in Perl 6 without needing to
load any additional modules. Some of them have equivalents
which can be typed with ASCII-only characters. These ASCII variants usually
consist of more characters than their Unicode counterparts, so they take
up more space.
Reference is made below to various properties of Unicode codepoints.
The definitive list can be found here:
L<https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt>.
=head1 Alphabetic characters
Any codepoint that has the C<Ll> (Letter, lowercase), C<Lu> (Letter,
uppercase), C<Lt> (Letter, titlecase), C<Lm> (Letter, modifier), or
the C<Lo> (Letter, other) property can be used just like any other
alphabetic character from the ASCII range.
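
As a minimal sketch of this rule, the snippet below uses non-ASCII letters in
identifiers. The names C<$café>, C<$Δ> and C<größe> are hypothetical and chosen
only for illustration.

=begin code
my $café = 'espresso';         # é (U+00E9) has the Ll property
my $Δ    = 3;                  # Δ (U+0394) has the Lu property
sub größe($s) { $s.chars }     # ö and ß have the Ll property as well
say größe($café) + $Δ;         # OUTPUT: «11␤»
=end code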
=head1 Numeric characters
Any codepoint that has the C<Nd> (Number, decimal digit) property can
be used as a digit in any number. For example:
    my $var = １９; # U+FF11 U+FF19
    say $var + 2;  # OUTPUT: «21␤»
=head1 Numeric values
Any codepoint that has the C<No> (Number, other) or C<Nl> (Number, letter)
property can be used standalone as a numeric value, such as ½ and ⅓. (These
aren't decimal digit characters, so they can't be combined.) For example:
    my $var = ⅒ + 2 + Ⅻ; # here ⅒ is No and Rat and Ⅻ is Nl and Int
    say $var;            # OUTPUT: «14.1␤»
=head1 Whitespace characters
Besides spaces and tabs, you can use any other Unicode whitespace
character that has the C<Zs> (Separator, space), C<Zl> (Separator,
line), or C<Zp> (Separator, paragraph) property.
=head1 Other acceptable single codepoints
This list contains the single codepoints (and their ASCII
equivalents) that have a special meaning in Perl 6.
X<|«>X<|»>X<|×>X<|÷>X<|≤>X<|≥>X<|≠>X<|−>X<|∘>X<|≅>X<|π>X<|τ>X<|𝑒>X<|∞>X<|…>X<|‘>X<|’>X<|‚>X<|“>X<|”>X<|„>X<|｢>X<|｣>X<|⁺>X<|⁻>X<|¯>X<|⁰>X<|¹>X<|²>X<|³>X<|⁴>X<|⁵>X<|⁶>X<|⁷>X<|⁸>X<|⁹>X<|∈>X<|∉>X<|∋>X<|∌>X<|⊆>X<|⊈>X<|⊂>X<|⊄>X<|⊇>X<|⊉>X<|⊃>X<|⊅>X<|≼>X<|≽>X<|∪>X<|∩>X<|∖>X<|⊖>X<|⊍>X<|⊎>
=table
    Symbol | Codepoint | ASCII   | Remarks
    =======|===========|=========|=========================
    «      | U+00AB    | <<      | as part of «» or .« or regex left word boundary
    »      | U+00BB    | >>      | as part of «» or .» or regex right word boundary
    ×      | U+00D7    | *       |
    ÷      | U+00F7    | /       |
    ≤      | U+2264    | <=      |
    ≥      | U+2265    | >=      |
    ≠      | U+2260    | !=      |
    −      | U+2212    | -       |
    ∘      | U+2218    | o       |
    ≅      | U+2245    | =~=     |
    π      | U+03C0    | pi      | 3.14159_26535_89793_238e0
    τ      | U+03C4    | tau     | 6.28318_53071_79586_476e0
    𝑒      | U+1D452   | e       | 2.71828_18284_59045_235e0
    ∞      | U+221E    | Inf     |
    …      | U+2026    | ...     |
    ‘      | U+2018    | '       | as part of ‘’ or ’‘
    ’      | U+2019    | '       | as part of ‘’ or ‚’ or ’‘
    ‚      | U+201A    | '       | as part of ‚‘ or ‚’
    “      | U+201C    | "       | as part of “” or ”“
    ”      | U+201D    | "       | as part of “” or ”“ or ””
    „      | U+201E    | "       | as part of „“ or „”
    ｢      | U+FF62    | Q//     | as part of ｢｣ (Note: Q// variant cannot be used bare in regexes)
    ｣      | U+FF63    | Q//     | as part of ｢｣ (Note: Q// variant cannot be used bare in regexes)
    ⁺      | U+207A    | +       | (must use explicit number) as part of exponentiation
    ⁻      | U+207B    | -       | (must use explicit number) as part of exponentiation
    ¯      | U+00AF    | -       | (must use explicit number) as part of exponentiation (macron is an alternative way of writing a minus)
    ⁰      | U+2070    | **0     | can be combined with ⁰..⁹
    ¹      | U+00B9    | **1     | can be combined with ⁰..⁹
    ²      | U+00B2    | **2     | can be combined with ⁰..⁹
    ³      | U+00B3    | **3     | can be combined with ⁰..⁹
    ⁴      | U+2074    | **4     | can be combined with ⁰..⁹
    ⁵      | U+2075    | **5     | can be combined with ⁰..⁹
    ⁶      | U+2076    | **6     | can be combined with ⁰..⁹
    ⁷      | U+2077    | **7     | can be combined with ⁰..⁹
    ⁸      | U+2078    | **8     | can be combined with ⁰..⁹
    ⁹      | U+2079    | **9     | can be combined with ⁰..⁹
    ∅      | U+2205    | set()   | (empty set)
    ∈      | U+2208    | (elem)  |
    ∉      | U+2209    | !(elem) |
    ∋      | U+220B    | (cont)  |
    ∌      | U+220C    | !(cont) |
    ⊆      | U+2286    | (<=)    |
    ⊈      | U+2288    | !(<=)   |
    ⊂      | U+2282    | (<)     |
    ⊄      | U+2284    | !(<)    |
    ⊇      | U+2287    | (>=)    |
    ⊉      | U+2289    | !(>=)   |
    ⊃      | U+2283    | (>)     |
    ⊅      | U+2285    | !(>)    |
    ∪      | U+222A    | (|)     |
    ∩      | U+2229    | (&)     |
    ∖      | U+2216    | (-)     |
    ⊖      | U+2296    | (^)     |
    ⊍      | U+228D    | (.)     |
    ⊎      | U+228E    | (+)     |
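
The following sketch compares a few of the symbols from the table with their
ASCII equivalents. The operands are arbitrary values picked for illustration,
and each line is expected to print C<True>.

=begin code
say 6 × 7 == 6 * 7;                            # multiplication
say 10 ÷ 4 == 10 / 4;                          # division
say (3 ≤ 4) == (3 <= 4);                       # comparison
say 5 − 2 == 5 - 2;                            # U+2212 minus sign
say π == pi;                                   # constant
say 2³ == 2 ** 3;                              # superscript exponent
say 2¹⁰ == 2 ** 10;                            # superscript digits combine
say 2⁻¹ == 2 ** -1;                            # ⁻ needs an explicit number after it
say (3 ∈ (1, 2, 3)) == (3 (elem) (1, 2, 3));   # set membership
say “text” eq "text";                          # curly double quotes
=end code

The parentheses around the comparison and set-membership expressions only keep
each side of C<==> grouped; they are not required by the operators themselves.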
=head2 Atomic operators
The atomic operators have C<U+269B ⚛ ATOM SYMBOL> incorporated into them. Their
ASCII equivalents are ordinary subroutines, not operators:
    my atomicint $x = 42;
    $x⚛++;                # Unicode version
    atomic-fetch-inc($x); # ASCII version
The ASCII alternatives are as follows:
X<|⚛=>X<|⚛>X<|⚛+=>X<|⚛-=>X<|⚛−=>X<|++⚛>X<|⚛++>X<|--⚛>X<|⚛-->
=table
    Symbol | ASCII            | Remarks
    =======|==================|=====================================
    ⚛=     | atomic-assign    |
    ⚛      | atomic-fetch     | this is the prefix:<⚛> operator
    ⚛+=    | atomic-add-fetch |
    ⚛-=    | atomic-sub-fetch |
    ⚛−=    | atomic-sub-fetch | this operator uses U+2212 minus sign
    ++⚛    | atomic-inc-fetch |
    ⚛++    | atomic-fetch-inc |
    --⚛    | atomic-dec-fetch |
    ⚛--    | atomic-fetch-dec |
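
A minimal sketch of the table in use, assuming a Rakudo release that provides
C<atomicint>: four tasks increment one shared counter with the Unicode operator
and a second counter with the corresponding ASCII subroutine. The variable
names and the task and iteration counts are arbitrary.

=begin code
my atomicint $unicode = 0;
my atomicint $ascii   = 0;
await (^4).map: {
    start {
        for ^1000 {
            $unicode⚛++;                 # postfix ⚛++ operator
            atomic-fetch-inc($ascii);    # its ASCII equivalent
        }
    }
}
say ⚛$unicode;              # OUTPUT: «4000␤»
say atomic-fetch($ascii);   # OUTPUT: «4000␤»
=end code

Both counters should end at 4000 because each increment is applied atomically,
regardless of how the tasks are scheduled.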
=head1 Multiple codepoints
This list contains multiple-codepoint operators that require special
composition for their ASCII equivalents. Note that the codepoints
are shown space-separated here but must be entered as adjacent codepoints
when used.
X<|»=»>X<|«=«>X<|«=»>X<|»=«>
=table
    Symbol | Codepoints       | ASCII    | Since | Remarks
    =======|==================|==========|=======|=========================
    »=»    | U+00BB = U+00BB  | >>[=]>>  | v6.c  | uses ASCII '='
    «=«    | U+00AB = U+00AB  | <<[=]<<  | v6.c  | uses ASCII '='
    «=»    | U+00AB = U+00BB  | <<[=]>>  | v6.c  | uses ASCII '='
    »=«    | U+00BB = U+00AB  | >>[=]<<  | v6.c  | uses ASCII '='
=end pod
# vim: expandtab softtabstop=4 shiftwidth=4 ft=perl6