@@ -13,8 +13,8 @@ Synopsis 2: Bits and Pieces
13
13
14
14
Created: 10 Aug 2004
15
15
16
- Last Modified: 29 December 2011
17
- Version: 245
16
+ Last Modified: 17 Jan 2012
17
+ Version: 248
18
18
19
19
This document summarizes Apocalypse 2, which covers small-scale
20
20
lexical items and typological issues. (These Synopses also contain
@@ -43,6 +43,14 @@ Recognize that a reduce operator is not really beginning a C<[...]> composer.
43
43
44
44
=back
45
45
46
+ One-pass parsing is fundamental to knowing exactly which language
47
+ you are dealing with at any moment, which in turn is fundamental
48
+ to allowing unambiguous language mutation in any desired direction.
49
+ (Generic languages are allowed, but only if intended; accidentally
50
+ generic languages lead to loss of linguistic identity and integrity.
51
+ This is the hard lesson of Perl 5's source filters and other
52
+ multi-pass parsing mistakes.)
53
+
46
54
=head1 Lexical Conventions
47
55
48
56
=head2 Unicode Semantics
@@ -2005,7 +2013,7 @@ Normal names and variables are declared using a I<scope declarator>:
2005
2013
my # introduces lexically scoped names
2006
2014
our # introduces package-scoped names
2007
2015
has # introduces attribute names
2008
- anon # introduces names that aren't to be stored anywhere
2016
+ anon # introduces names that are private to the construct
2009
2017
state # introduces lexically scoped but persistent names
2010
2018
augment # adds definitions to an existing name
2011
2019
supersede # replaces definitions of an existing name
@@ -2015,11 +2023,19 @@ equivalent to a C<my> declaration inside the block of the function,
2015
2023
except that such parameters default to readonly.
2016
2024
2017
2025
The C<anon> declarator allows a declaration to provide a name that
2018
- can be used in error messages, but that isn't put into any symbol table:
2026
+ can be used in error messages, but that isn't put into any external symbol table:
2019
2027
2020
2028
my $secret = anon sub marine () {...}
2021
2029
$secret(42) # too many arguments to sub marine
2022
2030
2031
+ However, the name is introduced into the scope of the declaration itself, so it
2032
+ may be used to call itself recursively:
2033
+
2034
+ my $secret =
2035
+ anon sub tract($n) { say $n; tract($n-1) if $n > 1 }
2036
+
2037
+ $secret(5); # 5 4 3 2 1
2038
+
2023
2039
=head2 Invariant sigils
2024
2040
2025
2041
Sigils are now invariant. C<$> always means a scalar variable, C<@>
@@ -3243,6 +3259,21 @@ Adverbial syntax will be described more fully later.
3243
3259
3244
3260
=head1 Literals
3245
3261
3262
+ Perl 6 has a rich set of literal forms, many of which can be used
3263
+ for textual input as well. For those forms simple enough to be allowed, the C<val()> function treats
3264
+ such a string value as if it were a literal in the program. In some cases
3265
+ the C<val()> function will be applied on your behalf, and in other cases
3266
+ you must do so explicitly. The rationale for this function is that there
3267
+ are many cases where the programmer or user is forced to use a string type to
3268
+ represent a value that is intended to become a numeric type internally. Committing
3269
+ pre-emptively to either a string type or a numeric type is likely to be
3270
+ wrongish, so Perl 6 instead provides the concept of I<allomorphic> literals.
3271
+ How these work is described below in L<Allomorphic value semantics>.
3272
+
3273
+ When used as literals in a program, most of these forms produce an
3274
+ exact type, and are not subject to C<val()> processing. The exceptions
3275
+ will be noted as we go.
3276
+
3246
3277
=head2 Underscores
3247
3278
3248
3279
A single underscore is allowed only between any two digits in a
@@ -3397,6 +3428,10 @@ Decimal fractions not using "e" notation are also treated as literal C<Rat> valu
3397
3428
1.23456.WHAT # Rat
3398
3429
0.11 == 11/100 # True
3399
3430
3431
+ Literals specified in angle brackets are always subject to C<val()> processing,
3432
+ so C<< <1/2> >> produces a value that is both a C<Rat> and a C<Str>.
3433
+ See L<Allomorphic value semantics> below.
3434
+
3400
3435
=head2 Complex literals
3401
3436
3402
3437
Complex literals are similarly indicated by writing an addition or subtraction of
@@ -3413,6 +3448,10 @@ surrounding precedence.
3413
3448
rational and complex literal forms fall out naturally from the semantic
3414
3449
rules of qw quotes described below.)
3415
3450
3451
+ Literals specified in angle brackets are always subject to C<val()> processing,
3452
+ so C<< <1+2i> >> produces a value that is both a C<Complex> and a C<Str>.
3453
+ See L<Allomorphic value semantics> below.
3454
+
3416
3455
=head2 C<Blob> literals
3417
3456
3418
3457
C<Blob> literals look similar to integer literals with radix markers, but use
@@ -3450,7 +3489,7 @@ the range C<'0'..'7'>. Octal characters must use C<\o> notation.
3450
3489
Note also that backreferences are no longer represented by C<\1>
3451
3490
and the like--see S05.
3452
3491
3453
- =head2 Quoting forms
3492
+ =head2 Angle quotes (quote words)
3454
3493
3455
3494
The C<qw/foo bar/> quote operator now has a bracketed form: C<< <foo bar> >>.
3456
3495
When used as a subscript it performs a slice equivalent to C<{'foo','bar'}>.
@@ -3476,16 +3515,18 @@ is equivalent to:
3476
3515
3477
3516
$a = ('a', 'b');
3478
3517
3479
- which, because the parcel is assigned to a scalar, is mostly-eagerly evaluated as a flat list and
3480
- turned into a C<Seq> object. On the other hand, if you backslash the parcel:
3518
+ which assigns a C<Parcel> to the variable.
3519
+ On the other hand, if you backslash the parcel:
3481
3520
3482
3521
$a = \<a b>;
3483
3522
3484
3523
it is like:
3485
3524
3486
3525
$a = \('a', 'b');
3487
3526
3488
- and ends up as a non-flattening capture object).
3527
+ and ends up storing a C<Capture> object (which weeds out any named
3528
+ arguments into a separate structure, in contrast to a C<Parcel>, which keeps
3529
+ everything in its original list).
3489
3530
3490
3531
Binding is different from assignment. If bound to a signature, the
3491
3532
C<< <a b> >> parcel will be promoted to a C<Capture> object, but if
@@ -3501,9 +3542,14 @@ still act like a single value. These are all the same:
3501
3542
$a = ('a');
3502
3543
$a = 'a';
3503
3544
3504
- =head3 Forcing item context
3545
+ Strings within angle brackets are subject to C<val()> processing, and any
3546
+ component that parses successfully as a numeric literal will become
3547
+ both a string and a number. See L<Allomorphic value semantics> below.
3548
+
3549
+ =head3 Explicit Parcel construction
3505
3550
3506
- That is, a parcel is actually constructed by the comma, not by
3551
+ As the previous section shows, a parcel is not automatically constructed
3552
+ by parens; the parcel is actually constructed by the comma, not by
3507
3553
the parens. To force a single value to become a composite object in
3508
3554
item context, either add a comma inside parens, or use an appropriate
3509
3555
constructor or composer for clarity as well as correctness:
@@ -3513,50 +3559,19 @@ constructor or composer for clarity as well as correctness:
3513
3559
$a = Seq.new('a');
3514
3560
$a = ['a'];
3515
3561
3516
- For any item in the list that appears to be numeric, the literal is
3517
- stored as an object with both a string and a numeric nature, where
3518
- the string nature always returns the original string. It is as if
3519
- the item is converted to an appropriate numeric type that is mixed into
3520
- the original string value. Hence:
3521
-
3522
- < 1 1/2 6.02e23 1+2i >
3523
-
3524
- produces objects like:
3525
-
3526
- '1' but 1 # Int < Str
3527
- '1/2' but 1/2 # Rat < Str
3528
- '6.02e23' but 6.02e23 # Num < Str
3529
- '1+2i' but 1+2i # Complex < Str
3530
-
3531
- The purpose of this would be to facilitate compile-time analysis of
3532
- multi-method dispatch, when the user prefers angle notation as the most
3533
- readable way to represent a list of numbers, which it often is. Due to
3534
- the mixin semantics, the derived numeric type is taken in preference to
3535
- the string type, but the string type is always available for matching
3536
- if no numeric type matches. (The string value need not be retained
3537
- if treating the number as a string would produce an identical value.)
3538
- This behavior is not a special case; it's actually an example of
3539
- a more general process of figuring out type information by parsing
3540
- text that comes from any situation where a user is forced to enter
3541
- text when they really mean other kinds of values.
3542
-
3543
- The form with a single value serves as the literal form of numbers
3544
- such as C<Rat> and C<Complex> that would otherwise have to be constructed.
3545
- It also gives us a reasonable way of visually isolating any known
3546
- literal format as a single syntactic unit:
3547
-
3548
- <-1+2i>.polar
3549
- (-1+2i).polar # same, but only by constant folding
3562
+ =head3 Disallowed forms
3550
3563
3551
3564
The degenerate case C<< <> >> is disallowed as a probable attempt to
3552
- do IO in the style of Perl 5; that is now written C<lines()>. (C<<
3553
- <STDIN> >> is also disallowed.) Empty lists are better written with
3565
+ do IO in the style of Perl 5; that is now written C<lines()>. (C<< <STDIN> >>
3566
+ is also disallowed.) Empty lists are better written with
3554
3567
C<()> or C<Nil> in any case because C<< <> >> will often be misread
3555
3568
as meaning C<('')>. (Likewise the subscript form C<< %foo<> >>
3556
3569
should be written C<%foo{}> to avoid misreading as C<@foo{''}>.)
3557
3570
If you really want the angle form for stylistic reasons, you can
3558
3571
suppress the error by putting a space inside: C<< < > >>.
3559
3572
3573
+ =head3 Relationship between <> and «»
3574
+
3560
3575
Much like the relationship between single quotes and double quotes, single
3561
3576
angles do not interpolate while double angles do. The double angles may
3562
3577
be written either with French quotes, C<«$foo @bar[]»>, or
@@ -3575,28 +3590,6 @@ ones without whitespace in front of them, but note that this comes
3575
3590
more or less for free with a colon pair like C<< :char<#x263a> >>, since
3576
3591
comments only work in double angles, not single.
3577
3592
3578
- =head3 Allomorphic coercion
3579
-
3580
- Generalizing the policy on literal numbers above, any literal number
3581
- that would overflow a C<Rat64> in the numerator is also stored as
3582
- a string. If a coercion to a wider type, such as C<FatRat>, is
3583
- requested, the literal reconverts from the entire original
3584
- string, rather than just the value that would fit into a C<Rat64>.
3585
- (It may then cache that converted value for next time, of course.)
3586
- So if you declare a constant with excess precision, it does not
3587
- automatically become a C<FatRat>, which would force all calculations
3588
- into the pessimal C<FatRat> type.
3589
-
3590
- constant pi is export = 3.14159_26535_89793_23846_26433_83279_50288;
3591
- say pi.perl; # 3141592653589793238/1000000000000000000 (Rat64)
3592
- say pi.Num # 3.14159265358979
3593
- say pi.Str; # 3.14159_26535_89793_23846_26433_83279_50288
3594
- say pi.FatRat; # 3.14159265358979323846264338327950288
3595
-
3596
- In this case it is not necessary to put angles around to get the allomorphism.
3597
- Merely exceeding the precision of C<Rat64> is sufficient to trigger the
3598
- behavior (but only for literals).
3599
-
3600
3593
=head2 Adverbial Pair forms
3601
3594
3602
3595
There is now a generalized adverbial form of Pair notation. The
@@ -3845,12 +3838,13 @@ built-in C<< <...> >> is equivalent to C<q:w:v/.../>.)
3845
3838
3846
3839
=head2 The C<:val> modifier
3847
3840
3848
- The C<:v>/C<:val> modifier runs each word through the C<val()> function,
3849
- which will attempt to recognize literals as defined by the current
3850
- slang. Only pure literals such as numbers and enums are so recognized;
3851
- all other words are left as strings. In any case, use of such an
3852
- intuited value as a string will reproduce the original string including
3853
- any leading or trailing whitespace:
3841
+ The C<:v>/C<:val> modifier runs each word through the C<val()>
3842
+ function, which will attempt to recognize literals as defined by the
3843
+ current slang. (See L<Allomorphic value semantics> below.) Only pure
3844
+ literals such as numbers, versions, and enums are so recognized; all
3845
+ other words are left as strings. In any case, use of such an intuited
3846
+ value as a string will reproduce the original string including any
3847
+ leading or trailing whitespace:
3854
3848
3855
3849
say +val(' +2/4 ') # '0.5'
3856
3850
say ~val(' +2/4 ') # ' +2/4 '
@@ -4542,6 +4536,91 @@ external modules by exact version number. (See S11.) Only range
4542
4536
operations will be compromised by an unknown foreign collation order,
4543
4537
such as a system that sorts "delta" after "gamma".
4544
4538
4539
+ =head2 Allomorphic value semantics
4540
+
4541
+ When C<val()> processing is attempted on any list of strings (typically on
4542
+ the individual words within angle brackets), the function
4543
+ attempts to determine if the intent of the programmer or user
4544
+ might have been to provide a numeric value.
4545
+
4546
+ For any item in the list that appears to be numeric, the literal is
4547
+ stored as an object with both a string and a numeric nature, where
4548
+ the string nature always returns the original string. This is implemented
4549
+ via multiple inheritance, to truly represent the allomorphic nature of
4550
+ a literal value that has not committed to which type the user intends.
4551
+ The numeric type chosen depends on the appearance of the literal.
4552
+ Hence:
4553
+
4554
+ < 1 1/2 6.02e23 1+2i >
4555
+
4556
+ produces objects of classes defined as:
4557
+
4558
+ class IntStr is Int is Str {...}; IntStr('1')
4559
+ class RatStr is Rat is Str {...}; RatStr('1/2')
4560
+ class NumStr is Num is Str {...}; NumStr('6.02e23')
4561
+ class ComplexStr is Complex is Str {...}; ComplexStr('1+2i')
4562
+
4563
+ One purpose of this is to facilitate compile-time analysis of
4564
+ multi-method dispatch, when the user prefers angle notation as the most
4565
+ readable way to represent a list of numbers, which it often is. Due to
4566
+ the MI semantics, the new object is equally a string and a number, and
4567
+ can be bound as-is to either a string or a numeric parameter.
4568
+
4569
+ In case multiple dispatch determines that it could dispatch as either
4570
+ string or number, a tie results, which may result in an ambiguous
4571
+ dispatch error. You'll need to use prefix C<+> or C<~> on the argument
4572
+ to resolve the ambiguity in that case.
4573
+
4574
+ [Conjecture: we may someday find a way to make strings bind a little
4575
+ looser than the numeric types, but for now we conservatively outlaw the
4576
+ dispatch as ambiguous, and watch how this plays out in use.]
4577
+
4578
+ The allomorphic behavior of angle brackets is not a special case;
4579
+ it's actually an example of a more general process of figuring out
4580
+ type information by parsing text that comes from any situation where
4581
+ the user is forced to enter text when they really mean other kinds
4582
+ of values. A function prompting the user for a single value might
4583
+ usefully pass the result through C<val()> to intuit the proper type.
4584
+
4585
+ The angle form with a single value serves as the literal form of
4586
+ numbers such as C<Rat> and C<Complex> that would otherwise have to be
4587
+ constructed via constant folding. It also gives us a reasonable way of
4588
+ visually isolating any known literal format as a single syntactic unit:
4589
+
4590
+ <-1+2i>.polar
4591
+ (-1+2i).polar # same, but only by constant folding
4592
+
4593
+ Any such literal, when written without spaces, produces a pure numeric
4594
+ value without a stringy allomorphism. Put spaces inside the angles to override that:
4595
+
4596
+ <1/2> # a Rat
4597
+ < 1/2 > # a RatStr
4598
+
4599
+ Or use the C<«»> form of quotewords, which is always allomorphic:
4600
+
4601
+ «1/2» # a RatStr
4602
+ « 1/2 » # a RatStr
4603
+
4604
+ =head3 Allomorphic Rats
4605
+
4606
+ Any rational literal
4607
+ that would overflow a C<Rat64> in the numerator is also stored as
4608
+ a string. (That is, angle brackets will be assumed in this case,
4609
+ producing a C<RatStr>.)
4610
+ If a coercion to a wider type, such as C<FatRat>, is
4611
+ requested, the literal reconverts from the entire original
4612
+ string, rather than just the value that would fit into a C<Rat64>.
4613
+ (It may then cache that converted value for next time, of course.)
4614
+ So if you declare a constant with excess precision, it does not
4615
+ automatically become a C<FatRat>, which would force all calculations
4616
+ into the pessimal C<FatRat> type.
4617
+
4618
+ constant pi is export = 3.14159_26535_89793_23846_26433_83279_50288;
4619
+ say pi.perl; # 3141592653589793238/1000000000000000000 or so (Rat64)
4620
+ say pi.Num; # 3.14159265358979
4621
+ say pi.Str; # 3.14159_26535_89793_23846_26433_83279_50288
4622
+ say pi.FatRat; # 3.14159265358979323846264338327950288
4623
+
4545
4624
=head1 Context
4546
4625
4547
4626
=over 4
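The allomorphic value semantics introduced by this change can be modeled roughly outside Perl 6. The following Python sketch is an illustration only, not the specified implementation: the `Allomorph` class and the helpers `_rat` and `_complex` are hypothetical names, and the `val` function here only imitates the behavior the text describes. It keeps the original string alongside the intuited number instead of using multiple inheritance, and it does not model the rule that `<1/2>` without spaces yields a pure `Rat`.

```python
from fractions import Fraction


class Allomorph:
    """Hypothetical stand-in for IntStr, RatStr, NumStr and ComplexStr."""

    def __init__(self, text, number):
        self.text = text      # original string, whitespace included
        self.number = number  # numeric value intuited from the text

    def __str__(self):
        # Stringification always reproduces the original text.
        return self.text


def _rat(word):
    # Only treat the word as a rational if it is written as a fraction.
    if "/" not in word:
        raise ValueError(word)
    return Fraction(word)


def _complex(word):
    # The synopsis writes the imaginary unit as "i"; Python expects "j".
    if not word.endswith("i"):
        raise ValueError(word)
    return complex(word[:-1] + "j")


def val(text):
    """Return an Allomorph if text looks numeric, else the string itself."""
    word = text.strip()
    if not any(ch.isdigit() for ch in word):
        return text  # words such as "foo" stay plain strings
    for parse in (int, _rat, float, _complex):
        try:
            return Allomorph(text, parse(word))
        except (ValueError, ZeroDivisionError):
            continue
    return text


if __name__ == "__main__":
    for item in (val(w) for w in "1 1/2 6.02e23 1+2i foo".split()):
        kind = type(item.number).__name__ if isinstance(item, Allomorph) else "str"
        print(repr(str(item)), kind)
```

Under this model, `float(val(' +2/4 ').number)` is `0.5` while `str(val(' +2/4 '))` is still `' +2/4 '`, mirroring the `+val(...)` and `~val(...)` examples in the C<:val> section.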