=encoding utf8
=head1 TITLE
Synopsis 6: Subroutines
=head1 VERSION
Created: 21 Mar 2003
Last Modified: 16 Oct 2015
Version: 169
This document summarizes Apocalypse 6, which covers subroutines and the
new type system.
=head1 Subroutines and other code objects
C<Routine> is the parent type of all keyword-declared code blocks.
All routines are born with undefined values of C<$_>, C<$!>,
and C<$/>, unless the routine declares them otherwise explicitly.
A compilation unit, such as a module file or an C<EVAL> string, is also
considered a routine; otherwise you would not be
able to reference C<$!> or C<$/> in it.
Non-routine code C<Block>s,
declared with C<< -> >> or with bare curlies, are born only with C<$_>,
which is aliased to its C<< OUTER::<$_> >> unless bound as a parameter.
A block generally uses the C<$!> and C<$/> defined by the innermost
enclosing routine, unless C<$!> or C<$/> is explicitly declared in
the block.
A thunk is a piece of code that may not execute immediately, for instance
because it is part of a conditional operator, or a default initialization of
an attribute. It has no scope of its own, so any new variables defined in
a thunk leak into the scope that encloses it. Note, however, that
all lazy constructs, whether block-based or thunk-based,
such as C<gather>, C<start>, or C<< ==> >>, should declare their own C<$/>
and C<$!> so that the user's values for those variables cannot be
clobbered asynchronously.
B<Subroutines> (keyword: C<sub>) are non-inheritable routines with
parameter lists.
B<Methods> (keyword: C<method>) are inheritable routines which always
have an associated object (known as their invocant) and belong to a
particular kind or class.
B<Submethods> (keyword: C<submethod>) are non-inheritable methods, or
subroutines masquerading as methods. They have an invocant and belong to
a particular kind or class.
B<Regexes> (keyword: C<regex>) are methods (of a grammar) that perform
pattern matching. Their associated block has a special syntax (see
Synopsis 5). (We also use the term "regex" for anonymous patterns
of the traditional form.)
B<Tokens> (keyword: C<token>) are regexes that perform low-level
non-backtracking (by default) pattern matching.
B<Rules> (keyword: C<rule>) are regexes that perform non-backtracking
(by default) pattern matching and additionally make whitespace in the
pattern significant (whitespace dwimmery).
B<Macros> (keyword: C<macro> or C<slang>) are routines or methods that are
installed such that they will be called as part of the compilation process,
and which can therefore take temporary control of the subsequent
compilation to cheat in any of the ways that a compiler might cheat.
=head1 Routine modifiers
B<Multis> (keyword: C<multi>) are routines that can have multiple
variants that share the same name, selected by arity, types, or some
other constraints.
B<Prototypes> (keyword: C<proto>) specify the commonalities (such
as parameter names, fixity, and associativity) shared by all multis
of that name in the scope of the C<proto> declaration. Abstractly,
the C<proto> is a generic wrapper around the dispatch to the C<multi>s.
Each C<proto> is instantiated into an actual dispatcher for each scope
that needs a different candidate list.
B<Only> (keyword: C<only>) routines do not share their short names
with other routines. This is the default modifier for all routines,
unless a C<proto> of the same name was already in scope. (For subs,
the governing C<proto> must have been declared in the same file, so
C<proto> declarations from the setting or other modules don't have
this effect unless explicitly imported.)
A modifier keyword may occur before the routine keyword in a named routine:
only sub foo {...}
proto sub foo {...}
dispatch sub foo {...} # internal
multi sub foo {...}
only method bar {...}
proto method bar {...}
dispatch method bar {...} # internal
multi method bar {...}
If the routine keyword is omitted, it defaults to C<sub>.
Modifier keywords cannot apply to anonymous routines.
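As a minimal sketch of the modifiers above, the following declares one explicit C<proto> and several C<multi> candidates that are selected by type and by arity. The routine name C<describe> and its return strings are hypothetical, chosen only for illustration; the behavior shown is that of current Rakudo and may differ in detail from what this synopsis specifies.

```raku
# Hypothetical routine `describe`; candidates differ by type and arity.
proto sub describe(|) {*}                    # explicit proto with the default body
multi sub describe(Int $n) { "integer $n" }
multi sub describe(Str $s) { "string $s" }
multi sub describe($a, $b) { "pair: $a, $b" }
multi describe(@list)      { "list of " ~ @list.elems }   # keyword omitted: defaults to sub

say describe(42);
say describe("hi");
say describe(1, 2);
say describe([1, 2, 3]);
```

Omitting the C<proto> line should leave the calls unchanged, since a generic C<proto> is then generated automatically.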
A C<proto> is a generic dispatcher, which any given scope with a unique
candidate list will instantiate into a C<dispatch> routine. Hence
a C<proto> is never called directly, much like a C<role> can't be
used as an instantiated object.
When you call any routine (or method, or rule) that may have multiple
candidates, the basic dispatcher is really only calling an "only"
sub or method--but if there are multiple candidates, the "only" that
will be found is really a dispatcher. This instantiated C<dispatch>
is always called first (at least in the abstract--this can often be
optimized away). In essence, a C<dispatch> is dispatched exactly
like an C<only> sub, but the C<dispatch> itself may delegate to any
of the candidates it is "managing".
It is the C<dispatch>'s responsibility to first vet the arguments for all the
candidates; any call that does not successfully bind the C<dispatch>'s signature fails outright.
(Its signature is a copy of one belonging to the C<proto> from which it was instantiated.)
The C<dispatch> does not necessarily send the original capture to its candidates, however.
Named arguments that bind to positionals in the C<dispatch> sig will become positionals
for all subsequent calls to its managed multis.
The dispatch then considers its list of managed candidates from the
viewpoint of the caller or object, sorts them into some order, and
dispatches them according to the rules of multiple dispatch as defined
for each of the various dispatchers. In the case of multi subs, the
candidate list is known at compile time. In the case of multi methods,
it may be necessary to generate (or regenerate) the candidate list at
run time, depending on what is known when about the inheritance tree.
This default dispatch behavior is symbolized within the original
C<proto> by a block consisting of a single C<*> (that is, a
"whatever"). Hence the typical C<proto> will simply have a body
of C<{*}>.
proto method bar {*}
(We don't use C<...> for that because it would fail at run time,
and the proto's instantiated C<dispatch> blocks are not stubs, but
are intended to be executed.)
Other statements may be inserted before and after the C<{*}>
statement to capture control before or after the multi dispatch:
proto foo ($a,$b) { say "Called with $a $b"; {*}; say "Returning"; }
(That C<proto> is only good for C<multi>s with side effects and no return
value, since it returns the result of C<say>, which might not be what
you want. See below for how to fix that.)
The syntactic form C<&foo> (without a modifying signature) can never
refer to a C<multi> candidate or a generic C<proto>. It may only
refer to the single C<only> or C<dispatch> routine that would first
be called by C<foo()>. Individual C<multi>s may be named by appending
a signature to the noun form: C<&foo:($,$,*@)>.
We used the term "managed" loosely above to indicate the set of C<multi>s in
question; the "managed set" is more accurately defined as the intersection
of all the C<multi>s in the C<proto>'s downward scope with all the C<multi>s that
are visible to the caller's upward-looking scope. For ordinary routines
this means looking down lexical scopes and looking up lexical scopes. [This
is more or less how C<multi>s already behave.]
For methods this means looking down or up the inheritance tree; "managed set"
in this case translates to the intersection of all methods in the C<proto>'s
class or its subclasses with all C<multi> methods visible to the object in its
parent classes, that is, the parent classes of the object's actual type on
whose behalf the method was called. [Note, this is a change from prior
multi method semantics, which restricted multimethods to a single class;
the old semantics is equivalent to defining a C<proto> in every class that has
multimethods. The new way gives the user the ability to intermix C<multi>s at
different inheritance levels].
Also, the old semantics of C<proto> providing the most-default C<multi> body
is hereby deprecated. Default C<multi>s should be marked with "C<is default>".
It is still possible to provide default behavior in the C<proto>, however, by
using it as a wrapper:
my proto sub foo (@args) {
    do-something-before(@args);
    {*}  # call into the managed set, then come back
    do-something-after(@args);
}
Note that this returns the value of C<do-something-after()>, not that of the C<multi>.
There are two ways to get around that. Here's one way:
my proto sub foo (@args) {
    ENTER do-something-before(@args);
    {*}
    LEAVE do-something-after(@args);
}
Alternately, you can spell out what C<{*}> is actually sugar for,
which would be some dispatcher macro such as:
my proto sub foo (|cap (@args)) {
    do-something-before(@args);
    my \retcap = MULTI-DISPATCH-CALLWITH(&?ROUTINE, cap);
    do-something-after(@args);
    return retcap;
}
which optimizes (we hope) to an inlined multidispatcher to locate all
the candidates for these arguments (hopefully memoized), create the dynamic
scope of a dispatch, start the dispatch, manage C<callnext> and C<lastcall>
semantics, and return the result of whichever C<multi> succeeded, if any.
Which is why we have C<{*}> instead.
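In current Rakudo, C<{*}> may also be used as a term whose value is the result of the dispatch, which gives a compact way to run code on both sides and still hand back the C<multi>'s value. The sketch below assumes that behavior; C<area> and C<@log> are hypothetical names used only for illustration.

```raku
my @log;                                  # hypothetical trace of the wrapper's activity

proto sub area(|) {
    @log.push: "before";
    my $result = {*};                     # dispatch to the managed multis, then come back
    @log.push: "after";
    $result;                              # return the multi's value, not the logger's
}
multi sub area($side)           { $side * $side }
multi sub area($width, $height) { $width * $height }

say area(3);
say area(2, 5);
say @log.join(" ");
```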
Another common variant would be to propagate control to the
outer/higher routine that would have been found if this one didn't
exist:
my proto method foo { {*}; UNDO nextsame; } # failover to super foo
Note that, in addition to making C<multi>s work similarly to each other,
the new C<proto> semantics greatly simplify top-level dispatchers, which
never have to worry about C<multi>s, because C<multi>s are always in the
second half of the double dispatch (again, just in the abstract, since
the first dispatch can often be optimized away, as if the C<proto> were
inlined). So in the abstract, C<foo()> only ever calls a single
C<only>/C<proto> routine, and we know which one it is at compile time.
This is less of a shift for method dispatch, which already assumed that there
is something like a single proto in each class that redispatches inside
the class. Here the change is that the multi-method dispatcher needs to look
more widely for its candidates than the current class. But note that our
semantics were inconsistent before, insofar as regex methods already had to
look for this larger managed set in order to do transitive LTM correctly.
Now the semantics of normal method C<proto>s and regex C<proto>s are nearly
identical, apart from the fact that regex candidate lists naturally have
fancier tiebreaking rules involving longest token matching.
A C<dispatch> must be generated for every scope that contains one or more C<multi>
declarations. This is done by searching backwards and outwards (or up the
inheritance chain for methods) for a C<proto> to instantiate. If no such
C<proto> is found, a "most generic" C<proto> will be generated, something like:
proto sub foo (*@, *%) {*}
proto method foo (*@, *%) {*}
Obviously, no named-to-positional remapping can be done in this case.
[Conjecture: we could instead autogen a more specific signature for
each such autogenerated C<dispatch> once we know its exact candidate
set, such that consistent use of positional parameter names is rewarded
with positional names in the generated signature, which could remap
named parameters.]
=head2 Named subroutines
The general syntax for named subroutines is any of:
my RETTYPE sub NAME ( PARAMS ) TRAITS {...} # lexical only
sub NAME ( PARAMS ) TRAITS {...} # same as "my"
our RETTYPE sub NAME ( PARAMS ) TRAITS {...} # package-scoped
The return type may also be put inside the parentheses:
sub NAME (PARAMS --> RETTYPE) {...}
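The two placements of the return type can be sketched as follows. The sub names are hypothetical, and the run-time rejection shown at the end reflects current Rakudo behavior rather than anything this section mandates.

```raku
sub double(Int $n --> Int) { $n * 2 }     # return type inside the signature
my Int sub triple(Int $n)  { $n * 3 }     # return type as a prefix on a lexical sub

say double(21);
say triple(14);
say &double.signature.returns.^name;      # the declared return type

# A value that does not conform to the declared type is rejected at run time.
sub passthrough($v --> Int) { $v }
my $accepted = try { passthrough("text"); True } // False;
say $accepted ?? "accepted" !! "rejected by the return type check";
```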
Unlike in Perl 5, named subroutines are considered expressions,
so this is valid Perl 6:
my @subs = (sub foo { ... }, sub bar { ... });
Another difference is that subroutines default to C<my> scope rather
than C<our> scope. However, subroutine dispatch searches lexical
scopes outward, and subroutines are also allowed to be I<postdeclared>
after their use, so you won't notice this much. A subroutine that is
not declared yet may be called using parentheses around the arguments;
in the absence of parentheses, the subroutine call is assumed to take
multiple arguments in the form of a list operator.
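A short sketch of both points: a postdeclared sub called with parentheses, and named subs used as ordinary expressions. All sub names here are hypothetical.

```raku
say greet("world");                       # postdeclared, so the parentheses are required

sub greet($who) { "hello, $who" }

# Named subs are expressions, so they can be stored directly in a list.
my @subs = (sub add-one($x) { $x + 1 }, sub times-two($x) { $x * 2 });
say @subs.map({ $_(10) }).join(", ");
say @subs.map(*.name).join(", ");
```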
=head2 Anonymous subroutines
The general syntax for anonymous subroutines is:
sub ( PARAMS ) TRAITS {...}
But one can also use the C<anon> scope modifier to introduce the return type first:
anon RETTYPE sub ( PARAMS ) TRAITS {...}
When an anonymous subroutine will be assigned to a scalar variable,
the variable can be declared with the signature of the routines that
will be assigned to it:
my $grammar_factory:(Str, int, int --> Grammar);
$grammar_factory = sub (Str $name, int $n, int $x --> Grammar) { ... };
Covariance allows a routine (that has a more derived return type than what is
defined in the scalar's signature) to be assigned to that scalar.
Contravariance allows a routine (with parameter types that are less derived
than those in the scalar's signature) to be assigned to that scalar. The
compiler may choose to enforce (by type-checking) such assignments at
compile-time, if possible. Such type annotations are intended to help the
compiler optimize code to the extent such annotations are included and/or to
the extent they aid in type inference.
The same signature can be used to mark the type of a closure parameter to
another subroutine:
sub (int $n, &g_fact:(Str, int, int --> Grammar) --> Str) { ... }
B<Trait> is the name for a compile-time (C<is>) property.
See L<"Properties and traits">.
=head2 Perl5ish subroutine declarations
You can declare a sub without a parameter list, as in Perl 5:
sub foo {...}
This is equivalent to one of:
sub foo () {...}
sub foo (*@_) {...}
sub foo (*%_) {...}
sub foo (*@_, *%_) {...}
depending on whether either or both of those variables are used in the body of the routine.
Positional arguments implicitly come in via the C<@_> array, but
unlike in Perl 5 they are C<readonly> aliases to actual arguments:
sub say { print qq{"@_[]"\n}; } # args appear in @_
sub cap { $_ = uc $_ for @_ } # Error: elements of @_ are read-only
Also unlike in Perl 5, Perl 6 has true named arguments, which come in
via C<%_> instead of C<@_>.
If you need to modify the elements of C<@_> or C<%_>, declare the
array or hash explicitly with the C<is rw> trait:
sub swap (*@_ is rw, *%_ is rw) { @_[0,1] = @_[1,0]; %_<status> = "Q:S"; }
Note: the C<rw> container trait is automatically distributed to the
individual elements by the slurpy star even though there is no
actual array or hash passed in. More precisely, the slurpy star
means the declared formal parameter is I<not> considered readonly; only
its elements are. See L</Parameters and arguments> below.
Note also that if the sub's block contains placeholder variables
(such as C<$^foo> or C<$:bar>), those are considered to be formal
parameters already, so in that case C<@_> or C<%_> fill the role of
sopping up unmatched arguments. That is, if those containers are
explicitly mentioned within the body, they are added as slurpy
parameters. This allows you to easily customize your error message
on unrecognized parameters. If they are not mentioned in the body,
they are not added to the signature, and normal dispatch rules will
simply fail if the signature cannot be bound.
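The implicit signatures described above can be sketched like this. The sub names are hypothetical; each body mentions only the variables it needs, which is what selects the implicit signature.

```raku
sub total   { [+] @_ }                    # mentions only @_: behaves like (*@_)
sub options { %_.keys.sort.join(",") }    # mentions only %_: behaves like (*%_)
sub counts  {                             # mentions both: behaves like (*@_, *%_)
    my $positional = @_.elems;
    my $named      = %_.elems;
    "$positional positional, $named named";
}

say total(1, 2, 3);
say options(:verbose, :depth(2));
say counts(1, 2, :x);
```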
=head2 Blocks
Raw blocks are also executable code structures in Perl 6.
Every block defines an object of type C<Block> (which C<does Callable>), which may either be
executed immediately or passed on as a C<Block> object. How a block is
parsed is context dependent.
A bare block where an operator is expected terminates the current
expression and will presumably be parsed as a block by the current
statement-level construct, such as an C<if> or C<while>. (If no
statement construct is looking for a block there, it's a syntax error.)
This form of bare block requires leading whitespace because a bare
block where a postfix is expected is treated as a hash subscript.
A bare block where a term is expected merely produces a C<Block> object.
If the term bare block occurs in a list, it is considered the final
element of that list unless followed immediately by a comma or colon
(intervening C<\h*> or "unspace" is allowed).
=head2 "Pointy blocks"
Semantically the arrow operator C<< -> >> is almost a synonym for the
C<sub> keyword as used to declare an anonymous subroutine, insofar as
it allows you to declare a signature for a block of code. However,
the parameter list of a pointy block does not require parentheses,
and a pointy block may not be given traits. In most respects,
though, a pointy block is treated more like a bare block than like
an official subroutine. Syntactically, a pointy block may be used
anywhere a bare block could be used:
my $sq = -> $val { $val**2 };
say $sq(10); # 100
my @list = 1..3;
for @list -> $elem {
    say $elem; # prints "1\n2\n3\n"
}
It also behaves like a block with respect to control exceptions.
If you C<return> from within a pointy block, the block is transparent
to the return; it will return from the innermost enclosing C<sub> or
C<method> (et al.), not from the block itself. It is referenced by C<&?BLOCK>,
not C<&?ROUTINE>.
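The transparency of pointy blocks to C<return> can be sketched as follows; C<first-even> is a hypothetical helper written for this illustration.

```raku
sub first-even(@nums) {
    for @nums -> $n {
        return $n if $n %% 2;             # leaves first-even itself, not just the block
    }
    return Nil;                           # reached only when no element was even
}

say first-even([1, 3, 4, 5]);
say first-even([1, 3, 5]).defined;
```

Had the loop body been an anonymous C<sub> instead, the C<return> would have left only that inner routine.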
A normal pointy block's parameters default to C<readonly>, just like
parameters to a normal sub declaration. However, the double-pointy variant
defaults parameters to C<rw>:
for @list <-> $elem {
    $elem++;
}
This form applies C<rw> to all the arguments:
for @kv <-> $key, $value {
    $key ~= ".jpg";
    $value *= 2 if $key ~~ :e;
}
=head2 Stub declarations
To predeclare a subroutine without actually defining it, use a "stub block":
sub foo {...} # Yes, those three dots are part of the actual syntax
The old Perl 5 form:
sub foo;
is a compile-time error in Perl 6 (because it would imply that the body of the
subroutine extends from that statement to the end of the file, as C<class> and
C<module> declarations do). The only allowed use of the semicolon form is to
declare a C<MAIN> sub--see L</Declaring a MAIN subroutine> below. (And this
form requires the C<unit> declarator in front.)
Redefining a stub subroutine does not produce an error, but redefining
an already-defined subroutine does. If you wish to redefine a defined sub,
you must explicitly use the "C<supersede>" declarator. (The compiler may
refuse to do this if it has already committed to the previous definition.)
The C<...> is the "yadayadayada" operator, which is executable but
returns a failure. You can also use C<???> to fail with a warning
(a lazy one, to be issued only if the value is actually used),
or C<!!!> to always die. These also officially define stub blocks.
Any of these yada operators will be taken as a stub if used as the main
operator of the first statement in the block. (Statement modifiers
are allowed on that statement.) The yada operators differ from their
respective named functions in that they all default to a message
such as: "Unimplemented stub of sub foo was executed".
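A small sketch of stub behavior as implemented in current Rakudo, where the exact message text differs from the one quoted above. All sub names are hypothetical.

```raku
sub area($r) { ... }                      # stub: predeclares the name
sub area($r) { 3 * $r * $r }              # real body; redefining a stub is not an error
say area(2);

sub pending() { ... }                     # never given a real body
my $outcome = pending();                  # executing the stub returns a failure
say $outcome.defined ?? "ran" !! "stub returned a failure";

sub fatal() { !!! }                       # this form always dies when executed
my $survived = try { fatal(); True } // False;
say $survived ?? "survived" !! "stub died";
```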
It has been argued that C<...> as literal syntax is confusing when
you might also want to use it for metasyntax within a document.
Generally this is not an issue in context; it's never an issue in the
program itself, and the few places where it could be an issue in the
documentation, a comment will serve to clarify the intent, as above.
The rest of the time, it doesn't really matter whether the reader
takes C<...> as literal or not, since the purpose of C<...> is to
indicate that something is missing whichever way you take it.
=head2 Globally scoped subroutines
Subroutines and variables can be declared in the global namespace
(or any package in the global namespace), and are thereafter visible
everywhere in the program via the GLOBAL package (or one of its
subpackages). They may be made directly visible by importation,
but may not otherwise be called with a bare identifier, since subroutine
dispatch only looks in lexical scopes.
Global subroutines and variables are normally referred to by prefixing
their identifiers with the C<*> twigil, to allow dynamically scoped overrides.
GLOBAL::<$next_id> = 0;
sub GLOBAL::saith($text) { say "Yea verily, $text" }
module A {
    my $next_id = 2;    # hides any global or package $next_id
    &*saith($next_id);  # print the lexical $next_id;
    &*saith($*next_id); # print the dynamic $next_id;
}
To disallow dynamic overrides, you must access the globals directly:
GLOBAL::saith($GLOBAL::next_id);
The fact that this is verbose is construed to be a feature. Alternately,
you may play aliasing tricks like this:
module B {
    import GLOBAL <&saith $next_id>;
    saith($next_id);    # Unambiguously the global definitions
}
Despite the fact that subroutine dispatch only looks in lexical scopes, you
can always call a package subroutine directly if there's a lexical alias
to it, as the C<our> declarator does:
unit module C;
our sub saith($text) { say "Yea verily, $text" }
saith("I do!") # okay
C::saith("I do!") # also okay
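The difference between the default lexical scope and C<our> scope can be sketched with a block-form module. The module name C<Speaker> and the subs C<whisper> and C<both> are hypothetical.

```raku
module Speaker {
    our sub saith($text) { "Yea verily, $text" }   # package-scoped, plus a lexical alias
    sub whisper($text)   { "psst, $text" }         # lexical only (the default scope)
    our sub both($text)  { saith($text) ~ " / " ~ whisper($text) }
}

say Speaker::saith("I do!");              # reachable through the package
say Speaker::both("ok");                  # whisper is reachable only from inside the module
```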
=head2 Dynamically scoped subroutines
Similarly, you may define dynamically scoped subroutines:
my sub myfunc ($x) is dynamic { ... }
my sub &*myfunc ($x) { ... } # same thing
This may then be invoked via the syntax for dynamic variables:
&*myfunc(42);
=head2 Lvalue subroutines
Lvalue subroutines return a "proxy" object that can be assigned to.
It's known as a proxy because the object usually represents the
purpose or outcome of the subroutine call.
Subroutines are specified as being lvalue using the C<is rw> trait.
An lvalue subroutine may return a variable:
my $lastval;
sub lastval () is rw { return $lastval }
or the result of some nested call to an lvalue subroutine:
sub prevval () is rw { return lastval() }
or a specially tied proxy object, with suitably programmed
C<FETCH> and C<STORE> methods:
sub checklastval ($passwd) is rw {
    return Proxy.new:
        FETCH => method {
            return lastval();
        },
        STORE => method ($val) {
            die unless check($passwd);
            lastval() = $val;
        };
}
Other methods may be defined for specialized purposes such as temporizing
the value of the proxy.
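A runnable sketch of both kinds of lvalue subroutine follows. It relies on the final expression of the body rather than an explicit C<return>, because in current Rakudo a plain C<return> hands back the value without its container. The names C<$celsius> and C<fahrenheit> are hypothetical.

```raku
my $lastval = 1;
sub lastval() is rw { $lastval }          # the final expression is the container itself

lastval() = 42;                           # assigns through to $lastval
say $lastval;

my $celsius = 0;
sub fahrenheit() is rw {
    Proxy.new(
        FETCH => method ()     { $celsius * 9 / 5 + 32 },
        STORE => method ($new) { $celsius = ($new - 32) * 5 / 9 },
    )
}

fahrenheit() = 212;                       # STORE converts and saves the value
say $celsius;
say fahrenheit();                         # FETCH converts it back
```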
=head2 Raw subroutines
If the subroutine doesn't care whether the returned value is a container or not,
it may declare this with C<is raw>, to indicate that the return value should
be returned raw, without attempting any decontainerization. This can be useful for
routines that wish to process mixed containers and non-containers without distinction.
=head2 Operator overloading
Operators are just subroutines with special names and scoping.
An operator name consists of a grammatical category name followed by
a single colon followed by an operator name specified as if it were
one or more strings. So any of these indicates the same binary addition operator:
infix:<+>
infix:«+»
infix:<<+>>
infix:['+']
infix:["+"]
Use the C<&> sigil just as you would on ordinary subs.
Unary operators are defined as C<prefix> or C<postfix>:
sub prefix:<OPNAME> ($operand) {...}
sub postfix:<OPNAME> ($operand) {...}
Binary operators are defined as C<infix>:
sub infix:<OPNAME> ($leftop, $rightop) {...}
Bracketing operators are defined as C<circumfix> where a term is expected
or C<postcircumfix> where a postfix is expected. A two-element slice
containing the leading and trailing delimiters is the name of the
operator.
sub circumfix:<LEFTDELIM RIGHTDELIM> ($contents) {...}
sub circumfix:['LEFTDELIM','RIGHTDELIM'] ($contents) {...}
Contrary to Apocalypse 6, there is no longer any rule about splitting an even
number of characters. You must use a two-element slice. Such names
are canonicalized to a single form within the symbol table, so you
must use the canonical name if you wish to subscript the symbol table
directly (as in C<< PKG::{'infix:<+>'} >>). Otherwise any form will
do. (Symbolic references do not count as direct subscripts since they
go through a parsing process.) The canonical form always uses angle
brackets and a single space between slice elements. The elements
are escaped on brackets, so C<< PKG::circumfix:['<','>'] >> is canonicalized
to C<<< PKG::{'circumfix:<\< \>>'} >>>, and decanonicalizing may always
be done left-to-right.
Operator names can be any sequence of non-whitespace characters
including Unicode characters. For example:
sub infix:<(c)> ($text, $owner) { return $text but Copyright($owner) }
method prefix:<±> (Num $x --> Num) { return +$x | -$x }
multi sub postfix:<!> (Int $n) { $n < 2 ?? 1 !! $n*($n-1)! }
macro circumfix:«<!-- -->» ($text) is parsed / .*? / { "" }
my $document = $text (c) $me;
my $tolerance = ±7!;
<!-- This is now a comment -->
Whitespace may never be part of the name (except as separator
within a C<< <...> >> or C<«...»> slice subscript, as in the example above).
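As a sketch that runs on current Rakudo, here are a postfix and an infix operator declared as ordinary subs, together with a call through the C<&> form. The operator name C<avg> is hypothetical.

```raku
sub postfix:<!>(Int $n)  { [*] 1..$n }         # factorial as a postfix operator
sub infix:<avg>($a, $b)  { ($a + $b) / 2 }     # a word-like infix operator

say 5!;
say 4 avg 10;
say &infix:<avg>(1, 3);                        # the same operator, called as a plain sub
```

Both declarations are lexically scoped, so the new syntax is visible only in the scope that declares or imports it.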
A null operator name does not define a null or whitespace operator, but
a default matching subrule for that syntactic category, which is useful when
there is no fixed string that can be recognized, such as tokens beginning
with digits. Such an operator I<must> supply an C<is parsed> trait.
The Perl grammar uses a default subrule for the C<:1st>, C<:2nd>, C<:3rd>,
etc. regex modifiers, something like this:
sub regex_mod_external:<> ($x) is parsed(token { \d+[st|nd|rd|th] }) {...}
Such default rules are attempted in the order declared. (They always follow
any rules with a known prefix, by the longest-token-first rule.)
Although the name of an operator can be installed into any package or
lexical namespace, the syntactic effects of an operator declaration are
always lexically scoped. Operators other than the standard ones should
not be installed into the C<GLOBAL::> namespace. Always use exportation to make
non-standard syntax available to other scopes.
=head1 Calling conventions
In Perl 6 culture, we distinguish the terms I<parameter> and
I<argument>; a parameter is the formal name that will attach to an
incoming argument during the course of execution, while an argument
is the actual value that will be bound to the formal parameter.
The process of attaching these values (arguments) to their temporary
names (parameters) is known as I<binding>. (Some C.S. literature
uses the terms "formal argument" and "actual argument" for these
two concepts, but here we try to avoid using the term "argument"
for formal parameters.)
Various Perl 6 code objects (either routines or blocks) may be
declared with parameter lists, either explicitly by use of a signature
declaration, or implicitly by use of placeholder variables within the body
of code. (Use of both for the same code block is not allowed.)
=head1 Signatures
A signature consists of a list of zero or more parameter declarations,
separated by commas. (These are described below.) Signatures are
usually found inside parentheses (within routine declarations), or
after an arrow C<< -> >> (within block declarations), but other forms
are possible for specialized cases. A signature may also indicate what
the code returns, either generally or specifically. This is indicated
by placing the return specification after a C<< --> >> token. If the
return specification names a type (that is, an indefinite object),
then a successful call to the code must always return a value of
that type. If the return specification returns a definite object,
then that value is always returned from a successful call. (For this
purpose the C<Nil> value is treated as definite.) An unsuccessful call
may always call C<fail> to return a C<Failure> object regardless of
the return specification.
Ordinarily, if the return is specified as a type (or is unspecified),
the final statement of the block will be evaluated for its return
value, and this will be the return value of the code block as a whole.
(It must conform to the return type specification, if provided.)
An explicit C<return> may be used instead to evaluate the C<return>'s
arguments as the code block's return value, and leave the code block
immediately, short-circuiting the rest of the block's execution.
If the return specification is a definite immutable value (or C<Nil>) rather than
a type, then all top-level statements in the code block are evaluated
only for their side effects; in other words, all of the statements are
evaluated in sink context, including the final statement. An explicit
C<return> statement is allowed, but only in argumentless form, to
indicate that execution is to be short-circuited and the I<declared>
return value is to be returned. No other value may be returned in
its place.
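The two kinds of return specification described above can be sketched as follows. This is a minimal illustration, not normative text: the routine names C<double> and C<record> and the C<@log> array are hypothetical.

```raku
# Hypothetical routines, for illustration only.

# Return specification names a type: the final statement supplies the
# result, which must conform to Int.
sub double(Int $x --> Int) { $x * 2 }

my @log;

# Return specification is a definite value: the body runs only for its
# side effects, and the declared value 42 is what the caller receives.
sub record(Str $msg --> 42) {
    @log.push: $msg;
}

say double(21);        # 42
say record('hello');   # 42
say @log.elems;        # 1
```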
If the return specification is definite but not an immutable value,
then it must be a mutable container (variable) of some sort.
The container variable is declared as any other parameter would be, but
no incoming argument will ever be bound to it. It is permitted
to supply a default value, in which case the return variable will
always be initialized with that default value. Like other variables
declared in a signature, a new variable will B<always> be created; any
existing variable will automatically be shadowed. If you want to have
the return variable reference an existing variable, you must resort to
C<< OUTER::<foo> >> hackery. As with value return, all top-level statements
are evaluated in sink context, and only argumentless C<return> is allowed,
indicating that the current contents of the return variable should be returned.
Note that the default return policy assumes functional semantics, with
the result that a loop as the final statement would be evaluated as
a map, which may surprise some people. An implementation is allowed
to warn when it finds such a loop; this warning may be suppressed by
supplying a return specification, which will also determine whether
the final loop statement is evaluated in sink context.
=head1 Parameters and arguments
By default, all Scalar parameters are readonly. When a value is passed, it
is simply directly bound to the parameter name. When a scalar is passed, the
value held in the scalar is obtained. It is then assigned into another scalar
container that will, from that point on, be readonly (that is, no further
assignments can be made to it). Implementations may, as an optimization, also
simply bind the value obtained from a passed Scalar if they can prove it is
not Iterable (and therefore elimination of the container would not affect
flattening behavior).
Array and hash parameters are simply bound "as is". (Conjectural: future
versions of Perl 6 may do static analysis and forbid assignments to array
and hash parameters that can be caught by it. This will, however, only
happen with the appropriate "use" declaration to opt in to that language
version.)
To allow modification, use the C<is rw> trait. This requires a mutable
object or container as an argument (or some kind of type object that
can be converted to a mutable object, such as might be returned
by an array or hash that knows how to autovivify new elements).
Otherwise the signature fails to bind, and this candidate routine
cannot be considered for servicing this particular call. (Other multi
candidates, if any, may succeed if they don't require C<rw> for this
parameter.) In any case, failure to bind does not by itself cause
an exception to be thrown; that is completely up to the dispatcher.
To pass-by-copy, use the C<is copy> trait. An object container will
be cloned whether or not the original is mutable, while an (immutable)
value will be copied into a suitably mutable container. The parameter
may bind to any argument that meets the other typological constraints
of the parameter.
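As a minimal sketch of the difference between the two traits, assuming hypothetical routines C<bump> and C<bumped>:

```raku
# Hypothetical routines, for illustration only.
sub bump($n is rw)     { $n++ }        # modifies the caller's container
sub bumped($n is copy) { $n++; $n }    # modifies a private copy only

my $count = 1;
bump($count);                 # $count is now 2
my $next = bumped($count);    # $next is 3; $count is still 2

say $count;   # 2
say $next;    # 3
```

Passing a literal such as C<bump(1)> would not bind, since a literal is not a mutable container.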
If you have a readonly scalar parameter C<$ro>, it may never be passed
on to a C<rw> scalar parameter of a subcall, since the rw-ness was already
eliminated. A C<$ro> parameter may also not be rebound; trying to do so
results in a compile time error.
Aliases of C<$ro> are also readonly, whether generated explicitly with C<:=>
or implicitly within a C<Capture> object (which are themselves immutable).
Also, C<$ro> may not be returned from an lvalue subroutine or method.
Parameters may be required or optional. They may be passed by position,
or by name. Individual parameters may confer an item or list context
on their corresponding arguments, but unlike in Perl 5, this is decided
lazily at parameter binding time.
Arguments destined for required positional parameters must come before
those bound to optional positional parameters. Arguments destined
for named parameters may come before and/or after the positional
parameters. (To avoid confusion it is highly recommended that all
positional parameters be kept contiguous in the call syntax, but
this is not enforced, and custom arg list processors are certainly
possible on those arguments that are bound to a final slurpy or
arglist variable.)
A signature containing a name collision is considered a compile time
error. A name collision can occur between positional parameters, between
named parameters, or between a positional parameter and a named one.
The sigil is not considered in such a comparison, except in the case of
two positional parameters -- in other words, a signature in which two
or more parameters are identical except for the sigil is still OK (but
you won't be able to pass values by that name).
:($a, $a) # wrong, two $a
:($a, @a) # OK (but don't do that)
:($a, :a($b)) # wrong, one $a from positional, one $a from named parameter
:($a, :a(@b)) # wrong, same
:(:$a, :@a) # wrong, can only have one named parameter "a"
=head2 Named arguments
Named arguments are recognized syntactically at the "comma" level.
Since parameters are identified using identifiers, the recognized
syntaxes are those where the identifier in question is obvious.
You may use either the adverbial form, C<:name($value)>, or the
autoquoted arrow form, C<< name => $value >>. These must occur at
the top "comma" level, and no other forms are taken as named pairs
by default. Pairs intended as positional arguments rather than named
arguments may be indicated by extra parens or by explicitly quoting
the key to suppress autoquoting:
doit :when<now>,1,2,3; # always a named arg
doit (:when<now>),1,2,3; # always a positional arg
doit when => 'now',1,2,3; # always a named arg
doit (when => 'now'),1,2,3; # always a positional arg
doit 'when' => 'now',1,2,3; # always a positional arg
Only bare keys with valid identifier names are recognized as named arguments:
doit when => 'now'; # always a named arg
doit 'when' => 'now'; # always a positional arg
doit 123 => 'now'; # always a positional arg
doit :123<now>; # always a positional arg
Going the other way, pairs intended as named arguments that don't look
like pairs must be introduced with the C<|> prefix operator:
$pair = :when<now>;
doit $pair,1,2,3; # always a positional arg
doit |$pair,1,2,3; # always a named arg
doit |get_pair(),1,2,3; # always a named arg
doit |('when' => 'now'),1,2,3; # always a named arg
Note the parens are necessary on the last one due to precedence.
Likewise, if you wish to pass a hash and have its entries treated as
named arguments, you must dereference it with a C<|>:
%pairs = (:when<now>, :what<any>);
doit %pairs,1,2,3; # always a positional arg
doit |%pairs,1,2,3; # always named args
doit |%(get_pair()),1,2,3; # always a named arg
doit |%('when' => 'now'),1,2,3; # always a named arg
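The difference between the positional and named forms above can be observed with a small helper that captures its whole argument list. This is only a sketch: this version of C<doit> is a hypothetical stand-in that reports how many arguments arrived positionally and how many by name.

```raku
# Hypothetical helper: |c captures every argument without flattening.
sub doit(|c) { "{c.list.elems} positional, {c.hash.elems} named" }

my %pairs = (:when<now>, :what<any>);

say doit(:when<now>, 1, 2, 3);        # 3 positional, 1 named
say doit((:when<now>), 1, 2, 3);      # 4 positional, 0 named
say doit('when' => 'now', 1, 2, 3);   # 4 positional, 0 named
say doit(%pairs, 1, 2, 3);            # 4 positional, 0 named
say doit(|%pairs, 1, 2, 3);           # 3 positional, 2 named
```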
Variables with a C<:> prefix in rvalue context autogenerate pairs, so you
can also say this:
$when = 'now';
doit $when,1,2,3; # always a positional arg of 'now'
doit :$when,1,2,3; # always a named arg of :when<now>
In other words C<:$when> is shorthand for C<:when($when)>. This works
for any sigil:
:$what :what($what)
:@what :what(@what)
:%what :what(%what)
:&what :what(&what)
Ordinary hash notation will just pass the value of the hash entry as a
positional argument regardless of whether it is a pair or not.
To pass both key and value out of a hash as a positional pair, use C<:p>
instead:
doit %hash<a>:p,1,2,3;
doit %hash{'b'}:p,1,2,3;
The C<:p> stands for "pairs", not "positional"--the C<:p> adverb may be
placed on any C<Associative> access subscript to make it mean "pairs" instead of "values".
If you want the pair (or pairs) to be interpreted as named arguments,
you may do so by prefixing with the C<< prefix:<|> >> operator:
doit |(%hash<a>:p),1,2,3;
doit |(%hash{'b'}:p),1,2,3;
(The parens are required to keep the C<:p> adverb from attaching to the C<< prefix:<|> >> operator.)
C<Pair> constructors are recognized syntactically at the call level and
put into the named slot of the C<Capture> structure. Hence they may be
bound to positionals only by name, not as ordinary positional C<Pair>
objects. Leftover named arguments can be slurped into a slurpy hash.
Because named and positional arguments can be freely mixed, the
programmer always needs to disambiguate pair literals from named
arguments with parentheses or quotes:
# Named argument "a"
push @array, 1, 2, :a<b>;
# Pair object (a=>'b')
push @array, 1, 2, (:a<b>);
push @array, 1, 2, 'a' => 'b';
Perl 6 allows multiple same-named arguments, and records the relative
order of arguments with the same name. When there is more than one
argument with the same name, the C<@> sigil in the parameter list causes the arguments
to be concatenated:
sub fun (Int :@x) { ... }
fun( x => 1, x => 2 ); # @x := (1, 2)
fun( x => (1, 2), x => (3, 4) ); # @x := (1, 2, 3, 4)
Other sigils bind only to the I<last> argument with that name:
sub fun (Int :$x) { ... }
fun( x => 1, x => 2 ); # $x := 2
fun( x => (1, 2), x => (3, 4) ); # $x := (3, 4)
This means a hash holding default values must come I<before> known named
parameters, similar to how hash constructors work:
# Allow "x" and "y" in %defaults to be overridden
f( |%defaults, x => 1, y => 2 );
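A minimal sketch of the override pattern, assuming a hypothetical routine C<f> with two scalar named parameters and a hypothetical C<%defaults> hash:

```raku
# Hypothetical routine and defaults, for illustration only.
sub f(:$x, :$y) { "$x $y" }

my %defaults = x => 0, y => 0;

# The explicit x => 1 comes last, so it wins over the default for x;
# y is not overridden and keeps its default.
say f(|%defaults, x => 1);   # 1 0
```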
=head2 Invocant parameters
A method invocant may be specified as the first parameter in the parameter
list, with a colon (rather than a comma) immediately after it:
method get_name ($self:) {...}
method set_name ($_: $newname) {...}
The corresponding argument (the invocant) is evaluated in item context
and is passed as the left operand of the method call operator:
print $obj.get_name();
$obj.set_name("Sam");
The invocant is actually stored as the first positional argument of a C<Capture>
object. It is special only to the dispatcher, otherwise it's just a normal
positional argument.
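As a minimal sketch of explicit invocants in use, assuming a hypothetical class C<Person> with a writable C<name> attribute:

```raku
# Hypothetical class, for illustration only.
class Person {
    has $.name is rw;

    # The invocant is declared explicitly and named $self.
    method get_name($self:)           { $self.name }
    method set_name($self: $newname)  { $self.name = $newname }
}

my $obj = Person.new(name => 'Pat');
$obj.set_name("Sam");
say $obj.get_name();   # Sam
```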
Single-dispatch semantics may also be requested by using the indirect object syntax, with a colon
after the invocant argument. The colon is just a special form of the comma, and has the
same precedence:
set_name $obj: "Sam";
$obj.set_name("Sam"); # same as the above
An invocant is the topic of the corresponding method if that formal
parameter is declared with the name C<$_>.
If you have a call of the form:
foo(|$capture)
the compiler must defer the decision on whether to treat it as a method
or function dispatch based on whether the supplied C<Capture>'s first
argument is marked as an invocant. For ordinary calls this can
always be determined at compile time, however.
=head2 Parameters with type constraints
Parameters can be constrained to types other than the default simply by
putting the type name in front of the parameter:
sub double(Numeric $x) { 2 * $x }
If no explicit type constraint is given, it defaults to the type of the
surrounding package for method invocants, and to C<Any> everywhere else.
A bare C<:D>, C<:U> or C<:_> instead of a type constraint limits the default
type to definite objects (aka instances), undefined objects (aka type objects),
or any object, respectively. The default still applies, so in
class Con {
method man(:U: :D $x) {...}
}
the signature is equivalent to C<(Con:U: Any:D $x)>.
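A short sketch of type constraints in practice. The routine C<shout> and the variable names are hypothetical; the C<try> is only there to show that an argument of the wrong type does not bind.

```raku
# Hypothetical routines, for illustration only.
sub double(Numeric $x) { 2 * $x }
sub shout(Str:D $s)    { $s.uc }    # :D accepts only defined instances

say double(21);     # 42
say shout('hi');    # HI

# A Str does not satisfy Numeric, so the signature fails to bind;
# try turns the resulting error into an undefined value.
my $text   = 'abc';
my $result = try double($text);
say $result.defined;   # False
```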
=head2 Longname parameters
A routine marked with C<multi> can mark part of its parameters to
be considered in the multi dispatch. These are called I<longnames>;
see S12 for more about the semantics of multiple dispatch.
You can choose part of a C<multi>'s parameters to be its longname,
by putting a double semicolon after the last one:
multi sub handle_event ($window, $event;; $mode) {...}
multi method set_name ($self: $name;; $nick) {...}
A parameter list may have at most one double semicolon; parameters
after it are never considered for multiple dispatch (except of course
that they can still "veto" if their number or types mismatch).
[Conjecture: It might be possible for a routine to advertise multiple
long names, delimited by single semicolons. See S12 for details.]
If the parameter list for a C<multi> contains no semicolons to delimit
the list of important parameters, then all positional parameters are
considered important. If it's a C<multi method> or C<multi submethod>,
an additional implicit unnamed C<self> invocant is added to the
signature list unless the first parameter is explicitly marked with a colon.
=head2 Required parameters
Required parameters are specified at the start of a subroutine's parameter
list:
sub numcmp ($x, $y) { return $x <=> $y }
Required parameters may optionally be declared with a trailing C<!>,
though that's already the default for positional parameters:
sub numcmp ($x!, $y!) { return $x <=> $y }
Not passing all of the required arguments to a normal subroutine
is a fatal error. Passing a named argument that cannot be bound to a normal
subroutine is also a fatal error. (Methods are different.)
The number of required parameters a subroutine has can be determined by
calling its C<.arity> method:
$args_required = &foo.arity;
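For example, reusing the C<numcmp> routine from above (a minimal sketch):

```raku
sub numcmp($x, $y) { return $x <=> $y }

# Both positional parameters are required, so the arity is 2.
say &numcmp.arity;    # 2
say numcmp(1, 2);     # Less
```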
=head2 Optional parameters