Elizabeth Mattijsen edited this page Jun 29, 2023 · 16 revisions

⚠ This page is still evolving.

This page was moved from this gist. See the original gist for comments and previous edit history.

Perhaps rakudo wiki is not the best place for this page to live. It will be moved elsewhere once a better home is found.

πŸ‘ You can edit this page πŸ‘


Raku has support for various Unicode characters (½ ¹ ∞ × ÷, see this link for a full list), but there are other things we can add. Here is a list of things to think about.

The idea behind this page is to store all ideas. In other words, this page is not a TODO list, but a blackboard for brainstorming.

Before we start

To keep us all on the same page, here are some things to note. Good reasons to add a Unicode operator are:

  1. When an established math/compsci notation exists for a Raku feature (and is represented in Unicode).
    Example: × for multiplication, ∞ for Inf, etc.
  2. When a Raku operator or syntax is basically ASCII art trying to paint a larger “glyph”, and there's a Unicode character for that exact glyph (not just resembling it, but specifically intended for it).
    Example: → for -> (-> is trying to paint a rightwards arrow, which is exactly what the Unicode char U+2192 RIGHTWARDS ARROW is for)
  3. When it is too hard to implement it in a module (e.g. it is so deep in the parser that trying to reimplement it in a module is completely unreasonable)
    Example: You can define your own ≤ ≥ ≠ ops but you cannot get them to chain correctly (at least you couldn't at the time)
  4. When a particular sequence is often auto-corrected by software.
    Example: LibreOffice changes '' to ‘’, ... to …, -> to →, etc.

Things we probably want to add

This section has things that we might want to add (… maybe!).

≤ ≥ ≠ (DONE)

✓ in https://github.com/rakudo/rakudo/pull/1032

These are more or less obvious and a lot of people have wondered why Raku does not support them yet.

See also: https://irclogs.raku.org/perl6/2016-01-09.html#11:44
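
As a small illustration of what the merged operators allow, the following sketch uses the built-in ≤ ≥ ≠ in a chained comparison. The variable names are made up for the example.

```raku
# Chained comparisons with the built-in Unicode comparison operators.
my ($low, $x, $high) = 1, 5, 10;

say $low ≤ $x ≤ $high;   # True  (chains like <=)
say $x ≥ $high;          # False
say $x ≠ $high;          # True
```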

→ { }

Why not allow → in pointy blocks?

for ^5 -> $i { say $i } # noo!
for ^5 → $i { say $i } # yeaah!

↔ { }

Same goes for lambdas with rw signatures:

for @values <-> $even, $odd { $even ÷= 2 } # noo!
for @values ↔ $even, $odd { $even ÷= 2 } # yeaah!

⇒

We can also do the same thing with fat arrow:

my %h = 42 => 62 # noo!
my %h = 42 ⇒ 62 # yeaah!

additional quoting delimiters

¿? , ¡! and — — as Spanish inspired quoting delimiters (to act the same as "")

say ¿foo $*IN bar?             # foo <STDIN> bar
say ¡foo %*ENV<USER> bar!      # foo liz bar
say —foo { Date.today } bar—   # foo 2019-10-06 bar

Available in https://github.com/rakudo/rakudo/pull/3218 .


Questionable things

This section is for things that we probably don't want to add, or at least not in the near future. Still, we will keep all our ideas written down.

#↓ and #←

So that this:

#| This subroutine does the real work
sub do_raw_magic (
    Spell $s,         #= Which spell to invoke
    *%options         #= How to invoke it
) {...}

Could be written as:

#↓ This subroutine does the real work
sub do_raw_magic (
    Spell $s,         #← Which spell to invoke
    *%options         #← How to invoke it
) {...}

While it makes sense in some cases, #↓ and #← can be misleading in others. See speculations for more examples. Basically, #| is used for the “next thing” which can be #↓ or #→, and same for #= where it can mean #↑ or #←.

¬ ∧ ∨ ⊻

¬ for logical not
∧ for and
∨ for or
⊻ for xor

√ ∛ ∜

√, ∛ and ∜. Probably as prefix operators. Some argue that we should only add √ given that square root is much more common. But if we are adding √, then we should add ∛ and ∜ for consistency. The argument against adding any of the roots is variant precedence: sqrt 4+5 is 3, but √4+5 would need to be 7 if you're following the standard mathematical precedence.
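
The precedence difference can be tried out with a user-defined operator. This is only a sketch: the prefix:<√> below is a hypothetical definition for illustration, not something core Raku provides, and it relies on user-defined prefix operators binding tighter than +.

```raku
# Hypothetical user-defined prefix operator (an assumption, not core Raku).
sub prefix:<√>(Numeric $x) { sqrt $x }

say sqrt 4+5;   # 3  (sqrt is a list operator, so it receives 4+5)
say √4+5;       # 7  (the prefix operator binds tighter than +)
```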

⁇ ‼

⁇ ‼ as a non-ASCII version of ?? !!.

Done in https://github.com/rakudo/rakudo/pull/1029, then reverted.

See this ticket for more information: https://rt.perl.org/Ticket/Display.html?id=131002

TL;DR: it fails to satisfy the criteria listed at the top of this page. That is, there is no reason to add it (and we are not adding Unicode ops just for fun).

∕

∕ as an alternative to / and ÷. Why? Because we already support −.

U+2215 DIVISION SLASH [Sm] (∕)

(And for those wondering: U+2212 MINUS SIGN [Sm] (−))

Note that U+2044 FRACTION SLASH [Sm] (⁄) is also listed on this page below.

@Zoffix: supporting `−` has been a nightmare, with conditionals littered all over the codebase. And even after all that work, it's still not fully supported (can't use it in `sprintf` formats for example, as those are handled in NQP and I'm unsure we want to leak all these fancy ops to NQP). So adding an op just-because is a bad idea and we shouldn't add any more slashes.

3⅐

Raku already supports Unicode fractions, like ½ and ⅝. A logical extension would be to also support literals like 1½.

◺

Triangular reduce can have its own unicode character too.

.say for [\+] ^10
.say for [◺+] ^10

Other possible candidates: ◿, ◥, etc. (which one represents it the most?)

⌁

U+2301 ELECTRIC ARROW ⌁ for the ~~ operator.

By the way, we can't use ≈ for anything because it creates confusion about whether it means an approximation or a smartmatch. In that sense ≅ (already supported by Raku) and ⌁ will play well together.

≔ ⩴

≔ as a non-ASCII version of :=. Pretty obvious.

However, if we are going to add that, then we cannot just leave out ::=, which also has a corresponding Unicode character:

⩴ for the ::= operator

The problem is that neither is rendered very nicely by current fonts. ⩴ is also very wide.

‖

‖ for the || operator

Good, but there is no corresponding non-ASCII version of &&. So I guess that there is no reason to add that right now.

∣ ∤

∣ for  %% (U+2223 DIVIDES [Sm] (∣))
∤ for !%% (U+2224 DOES NOT DIVIDE [Sm] (∤))
The problem is obvious. If you can see the difference between | and ∣ right away, leave your name here (and also note what font you are using):
  1. …

⩵ ⩶

While ⩵ can possibly fit into the width of one character, ⩶ definitely won't. Characters of this kind are normally full-width (they take double the width of a narrow character). Does that prevent us from adding them? Probably not, but it is something to think about.

⏨

It's called DECIMAL EXPONENT SYMBOL after all, that's its *job*; might as well permit it as a synonym for e in floating-point scientific notation, so that 6.02⏨23 is the same as 6.02e23. It's at least as prominent as not-a-digit as e is (and rather more so than E), and has quite a natural reading, as the subscript 10 thus places the exponent as its, well, exponent. It's a one-character change to the parser, of course.

Note that this is not the same as discussions of Subscripts in general, below.

Rakudo PR: https://github.com/rakudo/rakudo/pull/1348
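
The PR changes the parser itself. Purely as an illustration, a similar effect can be approximated in user code with a custom infix operator. Everything in this sketch is an assumption made for the example: the operator definition, its parameter names, and the chosen precedence. Unlike a real e literal, it yields an Int or Rat rather than a Num.

```raku
# Hypothetical user-land approximation; the actual proposal is a parser change.
# "is tighter" makes ⏨ bind more tightly than multiplication.
sub infix:<⏨>(Numeric $mantissa, Int $exponent) is tighter(&infix:<*>) {
    $mantissa * 10 ** $exponent
}

say 2⏨3;        # 2000
say 3 × 2⏨3;    # 6000
say 1.5⏨2;      # 150
```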

Subscripts

We already have superscripts 4² # Woohoo!, but what about subscripts (*₂)? The choice of unicode characters is pretty much obvious, but what should be the meaning?

There are *four* options:

  • Subscripts could be allowed in variable names at the end, so that you can write my $x₁ = 5. This has been implemented as [Slang::Subscripts](https://modules.perl6.org/dist/Slang::Subscripts)
  • Subscripts could be allowed in variable names at the same places where we allow digits, so that you can write my $H₂SO₄ = 5. Implemented in https://github.com/rakudo/rakudo/pull/3219
  • Subscripts could act like array subscripts so that you can write @x₁ which will be equivalent to @x[1]
  • Numbers in other bases: HU08₃₆

@AlexDaniel insists on the third option, but most people strongly want the first or second one.

If we go for the third option, then there are some other interesting possibilities:

  • We can use low asterisk to act like a subscript whatever star: @x⁎₋₁
  • Or we can use ₙ as a last index of the array. Like: @xₙ. Perhaps this would make mathematicians happy.

Or we can support both ⁎ and ₙ (because why not).

Another option is to use unicode subscripts as array subscripts if @ sigil is used, but allow subscripts in variable names that have $ sigil. This will probably make all of us happy (but it is going to be so weird… what a horrible idea).

Whatever star

The most problematic case was with the code like (* * *)(4, 2). It got better when school-grade math ops were implemented: (* × *)(4, 2), but still, it would be great to have a unicode equivalent to whatever star.

The problem is, there is no obvious character for that.

There are several classes of proposed characters:

  • Star-like symbols: ★ ☆ ٭ ✪ ✶ ⭐ ✰ 🌟 and so on and so forth, unicode has so many of those it's not even funny…
  • Asterisks: ⊛ ⧆ ⁎ ⁑ * ⁂
  • Chars that look like an empty field to fill: 🞅 ◯ ⭕ 🔾 ◌
  • Other: ⍰ ⍣

The problem with stars is that they perfectly represent the “star” part, but not so much the “whatever” part. Circles are just circles, they just don't have enough meaning in my opinion. ⍰ is an APL char, which we'd much rather leave alone.

There is one more thing: besides Whatever (*) there is also HyperWhatever (**), which perhaps should also get a unicode symbol. This means that we not only have to find one good single character; it would be better if we had a pair of similar characters (e.g. something like ★ and ☆ but better).

⊞ ⊟ ⊠ ⧆ ⧄ ÷⃞

⊞ – [+]
⊟ – [-]
⊠ – [×] (or something else?)
⧆ – [*]
⧄ – [/]

But of course there are many other operators that people use all the time. To solve this we can use U+20DE COMBINING ENCLOSING SQUARE [Me] (◌⃞):

÷⃞ – [÷]

However, this has a limit of one character per operator, which means that in some cases you will be forced to fall back to ASCII […].

∑ ∏

A better idea for [+] and [×] is to add ∑ and ∏.

∑ 1..10 == [+] 1..10 == 55
∏ 1..10 == [×] 1..10 == 3628800
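
To experiment with the idea, ∑ and ∏ can be sketched as user-defined prefix operators. These definitions are hypothetical and not part of core Raku. One difference from the notation above: a prefix operator binds tighter than .., so the range needs parentheses here, whereas [+] 1..10 does not.

```raku
# Hypothetical user-defined prefix operators (assumptions for illustration).
sub prefix:<∑>(\xs) { [+] xs.list }
sub prefix:<∏>(\xs) { [×] xs.list }

say ∑(1..10);   # 55
say ∏(1..10);   # 3628800
```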

⇔

⇔ can be used as a spaceship operator (<=>). But there are other candidates as well:

U+22DA  LESS-THAN EQUAL TO OR GREATER-THAN ⋚
U+22DB  GREATER-THAN EQUAL TO OR LESS-THAN ⋛
U+1F680 ROCKET                             🚀

The last one is a joke, of course.

-->

⇒
⎯→

‥

say 2‥5

⁄

U+2044 FRACTION SLASH [Sm] (⁄)

We may want to support ⁄ in addition to ÷, / and ∕… but this one is a little bit special because it is meant for creating fractions (like ⅔, but with any other numbers). What is supposed to happen if you have a variable in there?

RETURN SYMBOL [So] (⏎)

What about using ⏎ for return?


🆕 ideas!

This section is for ideas that have no ASCII equivalents. That is, adding any of these things would also require adding an ASCII version.

±

± can be used to create ranges. Example:

sub infix:<±> { Range.new: $^a - $^b, $a + $b };
say 5 ± 2      # OUTPUT: «3..7␀»
say 4 ~~ 5 ± 2 # OUTPUT: «True␀»
say 0 ~~ 5 ± 2 # OUTPUT: «False␀»

Alternatively, ± could create junctions. In this case, it'd be both a prefix and an infix operator.

±1 == 1 | -1
5 ± 2 == 3 | 7
$x = (-$b ± √($b² - 4×$a×$c)) / (2×$a);
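
A runnable sketch of the junction reading might look like the following. Both operator definitions are hypothetical, and sqrt stands in for √, which is not defined here. The quadratic x² - 3x + 2 is just a sample whose roots are 1 and 2.

```raku
# Hypothetical user-defined operators sketching the junction reading of ±.
sub prefix:<±>(Numeric $x)            { $x | -$x }
sub infix:<±>(Numeric $a, Numeric $b) { ($a - $b) | ($a + $b) }

say so -1 == ±1;      # True
say so 7 == 5 ± 2;    # True
say so 4 == 5 ± 2;    # False

# Roots of x² - 3x + 2 via the quadratic formula; $x holds a junction of both.
my ($a, $b, $c) = 1, -3, 2;
my $x = (-$b ± sqrt($b² - 4×$a×$c)) / (2×$a);
say so $x == 1;       # True
say so $x == 2;       # True
```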

⌊⌋ ⌈⌉

⌊…⌋ for floor(…)
⌈…⌉ for ceil(…)
U+230A LEFT FLOOR [Ps] (⌊)
U+230B RIGHT FLOOR [Pe] (⌋)
U+2308 LEFT CEILING [Ps] (⌈)
U+2309 RIGHT CEILING [Pe] (⌉)

What would the ASCII variants for this be? |_…_| and |^…^|? These are probably better off without ASCII equivalents…
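
For experimenting without any core change, the brackets can be sketched as user-defined circumfix operators. These definitions are hypothetical. They delegate to the existing floor and ceiling methods.

```raku
# Hypothetical user-defined circumfix operators (assumptions for illustration).
sub circumfix:<⌊ ⌋>(Real $x) { $x.floor }
sub circumfix:<⌈ ⌉>(Real $x) { $x.ceiling }

say ⌊3.7⌋;    # 3
say ⌈3.2⌉;    # 4
say ⌊-0.5⌋;   # -1
```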

🙼

There is an idea that 🙼 (VERY HEAVY SOLIDUS) can produce a FatRat, as in:

sub infix:<🙼> { FatRat.new: $^a, $^b }

⟅⟆

We can use ⟅⟆ for creating bags.

U+27C5 LEFT S-SHAPED BAG DELIMITER [Ps] (⟅)
U+27C6 RIGHT S-SHAPED BAG DELIMITER [Pe] (⟆)
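
As a rough sketch of how this could behave, the delimiters can be defined as a user-level circumfix that wraps the built-in bag routine. The definition and the sample data are assumptions for illustration only.

```raku
# Hypothetical user-defined circumfix that builds a Bag from its contents.
sub circumfix:<⟅ ⟆>(*@items) { bag @items }

my $fruit = ⟅ 'apple', 'apple', 'pear' ⟆;
say $fruit<apple>;   # 2
say $fruit<pear>;    # 1
say $fruit.total;    # 3
```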