Commit ea4ee96

Author: bri
Merge branch 'update'
2 parents ea745ba + 57a871c commit ea4ee96

6 files changed

+202
-110
lines changed

S02-bits.pod

Lines changed: 153 additions & 74 deletions
Original file line number | Diff line number | Diff line change
@@ -13,8 +13,8 @@ Synopsis 2: Bits and Pieces
1313

1414
Created: 10 Aug 2004
1515

16-
Last Modified: 29 December 2011
17-
Version: 245
16+
Last Modified: 17 Jan 2012
17+
Version: 248
1818

1919
This document summarizes Apocalypse 2, which covers small-scale
2020
lexical items and typological issues. (These Synopses also contain
@@ -43,6 +43,14 @@ Recognize that a reduce operator is not really beginning a C<[...]> composer.
4343

4444
=back
4545

46+
One-pass parsing is fundamental to knowing exactly which language
47+
you are dealing with at any moment, which in turn is fundamental
48+
to allowing unambiguous language mutation in any desired direction.
49+
(Generic languages are allowed, but only if intended; accidentally
50+
generic languages lead to loss of linguistic identity and integrity.
51+
This is the hard lesson of Perl 5's source filters and other
52+
multi-pass parsing mistakes.)
53+
4654
=head1 Lexical Conventions
4755

4856
=head2 Unicode Semantics
@@ -2005,7 +2013,7 @@ Normal names and variables are declared using a I<scope declarator>:
20052013
my # introduces lexically scoped names
20062014
our # introduces package-scoped names
20072015
has # introduces attribute names
2008-
anon # introduces names that aren't to be stored anywhere
2016+
anon # introduces names that are private to the construct
20092017
state # introduces lexically scoped but persistent names
20102018
augment # adds definitions to an existing name
20112019
supersede # replaces definitions of an existing name
@@ -2015,11 +2023,19 @@ equivalent to a C<my> declaration inside the block of the function,
20152023
except that such parameters default to readonly.
20162024

20172025
The C<anon> declarator allows a declaration to provide a name that
2018-
can be used in error messages, but that isn't put into any symbol table:
2026+
can be used in error messages, but that isn't put into any external symbol table:
20192027

20202028
my $secret = anon sub marine () {...}
20212029
$secret(42) # too many arguments to sub marine
20222030

2031+
However, the name is introduced into the scope of the declaration itself, so it
2032+
may be used to call itself recursively:
2033+
2034+
my $secret =
2035+
anon sub tract($n) { say $n; say tract($n-1) if $n }
2036+
2037+
$secret(5); # 5 4 3 2 1
2038+
20232039
=head2 Invariant sigils
20242040

20252041
Sigils are now invariant. C<$> always means a scalar variable, C<@>
@@ -3243,6 +3259,21 @@ Adverbial syntax will be described more fully later.
32433259

32443260
=head1 Literals
32453261

3262+
Perl 6 has a rich set of literal forms, many of which can be used
3263+
for textual input as well. For those forms simple enough to be allowed, the C<val()> function treats
3264+
such a string value as if it were a literal in the program. In some cases
3265+
the C<val()> function will be applied on your behalf, and in other cases
3266+
you must do so explicitly. The rationale for this function is that there
3267+
are many cases where the programmer or user is forced to use a string type to
3268+
represent a value that is intended to become a numeric type internally. Committing
3269+
pre-emptively to either a string type or a numeric type is likely to be
3270+
wrongish, so Perl 6 instead provides the concept of I<allomorphic> literals.
3271+
How these work is described below in L<Allomorphic value semantics>.
3272+
3273+
When used as literals in a program, most of these forms produce an
3274+
exact type, and are not subject to C<val()> processing. The exceptions
3275+
will be noted as we go.
3276+
32463277
=head2 Underscores
32473278

32483279
A single underscore is allowed only between any two digits in a
@@ -3397,6 +3428,10 @@ Decimal fractions not using "e" notation are also treated as literal C<Rat> valu
33973428
1.23456.WHAT # Rat
33983429
0.11 == 11/100 # True
33993430

3431+
Literals specified in angle brackets are always subject to C<val()> processing,
3432+
so C<< <1/2> >> produces a value that is both a C<Rat> and a C<Str>.
3433+
See L<Allomorphic value semantics> below.
3434+
34003435
=head2 Complex literals
34013436

34023437
Complex literals are similarly indicated by writing an addition or subtraction of
@@ -3413,6 +3448,10 @@ surrounding precedence.
34133448
rational and complex literal forms fall out naturally from the semantic
34143449
rules of qw quotes described below.)
34153450

3451+
Literals specified in angle brackets are always subject to C<val()> processing,
3452+
so C<< <1+2i> >> produces a value that is both a C<Complex> and a C<Str>.
3453+
See L<Allomorphic value semantics> below.
3454+
34163455
=head2 C<Blob> literals
34173456

34183457
C<Blob> literals look similar to integer literals with radix markers, but use
@@ -3450,7 +3489,7 @@ the range C<'0'..'7'>. Octal characters must use C<\o> notation.
34503489
Note also that backreferences are no longer represented by C<\1>
34513490
and the like--see S05.
34523491

3453-
=head2 Quoting forms
3492+
=head2 Angle quotes (quote words)
34543493

34553494
The C<qw/foo bar/> quote operator now has a bracketed form: C<< <foo bar> >>.
34563495
When used as a subscript it performs a slice equivalent to C<{'foo','bar'}>.
@@ -3476,16 +3515,18 @@ is equivalent to:
34763515

34773516
$a = ('a', 'b');
34783517

3479-
which, because the parcel is assigned to a scalar, is mostly-eagerly evaluated as a flat list and
3480-
turned into a C<Seq> object. On the other hand, if you backslash the parcel:
3518+
which assigns a C<Parcel> to the variable.
3519+
On the other hand, if you backslash the parcel:
34813520

34823521
$a = \<a b>;
34833522

34843523
it is like:
34853524

34863525
$a = \('a', 'b');
34873526

3488-
and ends up as a non-flattening capture object).
3527+
and ends up storing a C<Capture> object (which weeds out any named
3528+
arguments into a separate structure, in contrast to a C<Parcel>, which keeps
3529+
everything in its original list).
34893530

34903531
Binding is different from assignment. If bound to a signature, the
34913532
C<< <a b> >> parcel will be promoted to a C<Capture> object, but if
@@ -3501,9 +3542,14 @@ still act like a single value. These are all the same:
35013542
$a = ('a');
35023543
$a = 'a';
35033544

3504-
=head3 Forcing item context
3545+
Strings within angle brackets are subject to C<val()> processing, and any
3546+
component that parses successfully as a numeric literal will become
3547+
both a string and a number. See L<Allomorphic value semantics> below.
3548+
3549+
=head3 Explicit Parcel construction
35053550

3506-
That is, a parcel is actually constructed by the comma, not by
3551+
As the previous section shows, a parcel is not automatically constructed
3552+
by parens; the parcel is actually constructed by the comma, not by
35073553
the parens. To force a single value to become a composite object in
35083554
item context, either add a comma inside parens, or use an appropriate
35093555
constructor or composer for clarity as well as correctness:
@@ -3513,50 +3559,19 @@ constructor or composer for clarity as well as correctness:
35133559
$a = Seq.new('a');
35143560
$a = ['a'];
35153561

3516-
For any item in the list that appears to be numeric, the literal is
3517-
stored as an object with both a string and a numeric nature, where
3518-
the string nature always returns the original string. It is as if
3519-
the item is converted to an appropriate numeric type that is mixed into
3520-
the original string value. Hence:
3521-
3522-
< 1 1/2 6.02e23 1+2i >
3523-
3524-
produces objects like:
3525-
3526-
'1' but 1 # Int < Str
3527-
'1/2' but 1/2 # Rat < Str
3528-
'6.02e23' but 6.02e23 # Num < Str
3529-
'1+2i' but 1+2i # Complex < Str
3530-
3531-
The purpose of this would be to facilitate compile-time analysis of
3532-
multi-method dispatch, when the user prefers angle notation as the most
3533-
readable way to represent a list of numbers, which it often is. Due to
3534-
the mixin semantics, the derived numeric type is taken in preference to
3535-
the string type, but the string type is always available for matching
3536-
if no numeric type matches. (The string value need not be retained
3537-
if treating the number as a string would produce an identical value.)
3538-
This behavior is not a special case; it's actually an example of
3539-
a more general process of figuring out type information by parsing
3540-
text that comes from any situation where a user is forced to enter
3541-
text when they really mean other kinds of values.
3542-
3543-
The form with a single value serves as the literal form of numbers
3544-
such as C<Rat> and C<Complex> that would otherwise have to be constructed.
3545-
It also gives us a reasonable way of visually isolating any known
3546-
literal format as a single syntactic unit:
3547-
3548-
<-1+2i>.polar
3549-
(-1+2i).polar # same, but only by constant folding
3562+
=head3 Disallowed forms
35503563

35513564
The degenerate case C<< <> >> is disallowed as a probable attempt to
3552-
do IO in the style of Perl 5; that is now written C<lines()>. (C<<
3553-
<STDIN> >> is also disallowed.) Empty lists are better written with
3565+
do IO in the style of Perl 5; that is now written C<lines()>. (C<< <STDIN> >>
3566+
is also disallowed.) Empty lists are better written with
35543567
C<()> or C<Nil> in any case because C<< <> >> will often be misread
35553568
as meaning C<('')>. (Likewise the subscript form C<< %foo<> >>
35563569
should be written C<%foo{}> to avoid misreading as C<@foo{''}>.)
35573570
If you really want the angle form for stylistic reasons, you can
35583571
suppress the error by putting a space inside: C<< < > >>.
35593572

3573+
=head3 Relationship between <> and «»
3574+
35603575
Much like the relationship between single quotes and double quotes, single
35613576
angles do not interpolate while double angles do. The double angles may
35623577
be written either with French quotes, C<«$foo @bar[]»>, or
@@ -3575,28 +3590,6 @@ ones without whitespace in front of them, but note that this comes
35753590
more or less for free with a colon pair like C<< :char<#x263a> >>, since
35763591
comments only work in double angles, not single.
35773592

3578-
=head3 Allomorphic coercion
3579-
3580-
Generalizing the policy on literal numbers above, any literal number
3581-
that would overflow a C<Rat64> in the numerator is also stored as
3582-
a string. If a coercion to a wider type, such as C<FatRat>, is
3583-
requested, the literal reconverts from the entire original
3584-
string, rather than just the value that would fit into a C<Rat64>.
3585-
(It may then cache that converted value for next time, of course.)
3586-
So if you declare a constant with excess precision, it does not
3587-
automatically become a C<FatRat>, which would force all calculations
3588-
into the pessimal C<FatRat> type.
3589-
3590-
constant pi is export = 3.14159_26535_89793_23846_26433_83279_50288;
3591-
say pi.perl; # 3141592653589793238/1000000000000000000 (Rat64)
3592-
say pi.Num # 3.14159265358979
3593-
say pi.Str; # 3.14159_26535_89793_23846_26433_83279_50288
3594-
say pi.FatRat; # 3.14159265358979323846264338327950288
3595-
3596-
In this case it is not necessary to put angles around to get the allomorphism.
3597-
Merely exceeding the precision of C<Rat64> is sufficient to trigger the
3598-
behavior (but only for literals).
3599-
36003593
=head2 Adverbial Pair forms
36013594

36023595
There is now a generalized adverbial form of Pair notation. The
@@ -3845,12 +3838,13 @@ built-in C<< <...> >> is equivalent to C<q:w:v/.../>.)
38453838

38463839
=head2 The C<:val> modifier
38473840

3848-
The C<:v>/C<:val> modifier runs each word through the C<val()> function,
3849-
which will attempt to recognize literals as defined by the current
3850-
slang. Only pure literals such as numbers and enums are so recognized;
3851-
all other words are left as strings. In any case, use of such an
3852-
intuited value as a string will reproduce the original string including
3853-
any leading or trailing whitespace:
3841+
The C<:v>/C<:val> modifier runs each word through the C<val()>
3842+
function, which will attempt to recognize literals as defined by the
3843+
current slang. (See L<Allomorphic value semantics> below.) Only pure
3844+
literals such as numbers, versions, and enums are so recognized; all
3845+
other words are left as strings. In any case, use of such an intuited
3846+
value as a string will reproduce the original string including any
3847+
leading or trailing whitespace:
38543848

38553849
say +val(' +2/4 ') # '0.5'
38563850
say ~val(' +2/4 ') # ' +2/4 '
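
The behavior of C<val()> described in this hunk can be modelled with a small sketch. It is written in Python rather than Perl 6 so that it can be run today, and it is only an illustration under stated assumptions: the C<Allomorph> class, the three regular expressions, and the use of C<Fraction> and C<float> as stand-ins for C<Rat> and C<Num> are all hypothetical, and versions, enums, and complex literals are ignored.

```python
import re
from fractions import Fraction


class Allomorph:
    """Hypothetical model: a number that remembers its original text."""

    def __init__(self, text, number):
        self.text = text        # original string, whitespace included
        self.number = number    # numeric value intuited from the text

    def __str__(self):          # plays the role of prefix ~
        return self.text

    def __pos__(self):          # plays the role of prefix +
        return self.number


# Assumed literal shapes; the real rules belong to the current slang.
_INT = re.compile(r"[+-]?\d+", re.ASCII)
_RAT = re.compile(r"[+-]?\d+/\d*[1-9]\d*|[+-]?\d+\.\d+", re.ASCII)
_NUM = re.compile(r"[+-]?\d+(?:\.\d+)?[eE][+-]?\d+", re.ASCII)


def val(text):
    """Return an Allomorph for numeric-looking text, else the plain string."""
    word = text.strip()
    if _INT.fullmatch(word):
        return Allomorph(text, int(word))
    if _RAT.fullmatch(word):
        return Allomorph(text, Fraction(word))
    if _NUM.fullmatch(word):
        return Allomorph(text, float(word))
    return text                 # not a pure literal: left as a string


v = val(" +2/4 ")
print(float(+v))                # 0.5
print(repr(str(v)))             # ' +2/4 '
print(repr(val("marine")))      # 'marine'
```

The point of the sketch is that numeric use and string use read from different fields, so the original text (including surrounding whitespace) is never lost.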
@@ -4542,6 +4536,91 @@ external modules by exact version number. (See S11.) Only range
45424536
operations will be compromised by an unknown foreign collation order,
45434537
such as a system that sorts "delta" after "gamma".
45444538

4539+
=head2 Allomorphic value semantics
4540+
4541+
When C<val()> processing is attempted on any list of strings (typically on
4542+
the individual words within angle brackets), the function
4543+
attempts to determine if the intent of the programmer or user
4544+
might have been to provide a numeric value.
4545+
4546+
For any item in the list that appears to be numeric, the literal is
4547+
stored as an object with both a string and a numeric nature, where
4548+
the string nature always returns the original string. This is implemented
4549+
via multiple inheritance, to truly represent the allomorphic nature of
4550+
a literal value that has not committed to which type the user intends.
4551+
The numeric type chosen depends on the appearance of the literal.
4552+
Hence:
4553+
4554+
< 1 1/2 6.02e23 1+2i >
4555+
4556+
produces objects of classes defined as:
4557+
4558+
class IntStr is Int is Str {...}; IntStr('1')
4559+
class RatStr is Rat is Str {...}; RatStr('1/2')
4560+
class NumStr is Num is Str {...}; NumStr('6.02e23')
4561+
class ComplexStr is Complex is Str {...}; ComplexStr('1+2i')
4562+
4563+
One purpose of this is to facilitate compile-time analysis of
4564+
multi-method dispatch, when the user prefers angle notation as the most
4565+
readable way to represent a list of numbers, which it often is. Due to
4566+
the MI semantics, the new object is equally a string and a number, and
4567+
can be bound as-is to either a string or a numeric parameter.
4568+
4569+
In case multiple dispatch determines that it could dispatch as either
4570+
string or number, a tie results, which may result in an ambiguous
4571+
dispatch error. You'll need to use prefix C<+> or C<~> on the argument
4572+
to resolve the ambiguity in that case.
4573+
4574+
[Conjecture: we may someday find a way to make strings bind a little
4575+
looser than the numeric types, but for now we conservatively outlaw the
4576+
dispatch as ambiguous, and watch how this plays out in use.]
4577+
4578+
The allomorphic behavior of angle brackets is not a special case;
4579+
it's actually an example of a more general process of figuring out
4580+
type information by parsing text that comes from any situation where
4581+
the user is forced to enter text when they really mean other kinds
4582+
of values. A function prompting the user for a single value might
4583+
usefully pass the result through C<val()> to intuit the proper type.
4584+
4585+
The angle form with a single value serves as the literal form of
4586+
numbers such as C<Rat> and C<Complex> that would otherwise have to be
4587+
constructed via constant folding. It also gives us a reasonable way of
4588+
visually isolating any known literal format as a single syntactic unit:
4589+
4590+
<-1+2i>.polar
4591+
(-1+2i).polar # same, but only by constant folding
4592+
4593+
Any such literal, when written without spaces, produces a pure numeric
4594+
value without a stringy allomorphism. Put spaces to override that:
4595+
4596+
<1/2> # a Rat
4597+
< 1/2 > # a RatStr
4598+
4599+
Or use the C<«»> form of quotewords, which is always allomorphic:
4600+
4601+
«1/2» # a RatStr
4602+
« 1/2 » # a RatStr
4603+
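
The dispatch tie described in this section can be illustrated with a toy model. The sketch below is in Python, not Perl 6, and every name in it (C<RatStr> as a plain class, C<matches>, C<dispatch>, C<AmbiguousDispatch>) is a hypothetical stand-in; it assumes a candidate list keyed by the type names "Str" and "Numeric" and does not attempt to reproduce the real multi-dispatch algorithm.

```python
from fractions import Fraction


class RatStr:
    """Hypothetical model of a value that is equally a Rat and a Str."""

    def __init__(self, text):
        self.text = text                        # string nature
        self.number = Fraction(text.strip())    # numeric nature


class AmbiguousDispatch(Exception):
    """Raised when more than one candidate accepts the argument."""


def matches(arg, kind):
    """Say whether arg may bind to a parameter of the given kind."""
    if isinstance(arg, RatStr):
        return kind in ("Str", "Numeric")       # allomorph binds to both
    if isinstance(arg, str):
        return kind == "Str"
    return kind == "Numeric"                    # int, float, Fraction


def dispatch(candidates, arg):
    """Call the single matching candidate, or report a tie."""
    hits = [fn for kind, fn in candidates if matches(arg, kind)]
    if len(hits) > 1:
        raise AmbiguousDispatch("argument matches both Str and Numeric")
    if not hits:
        raise TypeError("no candidate matches")
    return hits[0](arg)


candidates = [
    ("Str", lambda s: "string candidate"),
    ("Numeric", lambda n: "numeric candidate"),
]
half = RatStr(" 1/2 ")

try:
    dispatch(candidates, half)
except AmbiguousDispatch as err:
    print("tie:", err)

print(dispatch(candidates, half.number))   # like prefix +
print(dispatch(candidates, half.text))     # like prefix ~
```

Passing C<half.number> or C<half.text> plays the role of prefix C<+> or C<~>: the caller commits to one nature, and the tie disappears.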
4604+
=head3 Allomorphic Rats
4605+
4606+
Any rational literal
4607+
that would overflow a C<Rat64> in the numerator is also stored as
4608+
a string. (That is, angle brackets will be assumed in this case,
4609+
producing a C<RatStr>.)
4610+
If a coercion to a wider type, such as C<FatRat>, is
4611+
requested, the literal reconverts from the entire original
4612+
string, rather than just the value that would fit into a C<Rat64>.
4613+
(It may then cache that converted value for next time, of course.)
4614+
So if you declare a constant with excess precision, it does not
4615+
automatically become a C<FatRat>, which would force all calculations
4616+
into the pessimal C<FatRat> type.
4617+
4618+
constant pi is export = 3.14159_26535_89793_23846_26433_83279_50288;
4619+
say pi.perl; # 3141592653589793238/1000000000000000000 or so (Rat64)
4620+
say pi.Num; # 3.14159265358979
4621+
say pi.Str; # 3.14159_26535_89793_23846_26433_83279_50288
4622+
say pi.FatRat; # 3.14159265358979323846264338327950288
4623+
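
The reconversion rule for over-precise rational literals can be sketched as follows. This is a Python model, not Perl 6, and it rests on assumptions that are not stated in the text above: that a C<Rat64> has a signed 64-bit numerator and an unsigned 64-bit denominator, that excess digits are truncated rather than rounded, and that only unsigned decimal literals need handling. The C<RatLiteral> class and its method names are hypothetical.

```python
from fractions import Fraction

INT64_MAX = 2**63 - 1     # assumed limit for a Rat64 numerator
UINT64_MAX = 2**64 - 1    # assumed limit for a Rat64 denominator


class RatLiteral:
    """Hypothetical model of a decimal literal that keeps its source text."""

    def __init__(self, source):
        self.source = source
        whole, _, frac = source.replace("_", "").partition(".")
        # Drop trailing fractional digits until both parts fit in 64 bits.
        while frac and (int(whole + frac) > INT64_MAX
                        or 10 ** len(frac) > UINT64_MAX):
            frac = frac[:-1]
        self.rat64 = Fraction(int(whole + frac), 10 ** len(frac))

    def fat_rat(self):
        """Reconvert from the entire original string, not from rat64."""
        return Fraction(self.source.replace("_", ""))

    def __str__(self):
        return self.source


pi = RatLiteral("3.14159_26535_89793_23846_26433_83279_50288")
print(pi.rat64 == Fraction(3141592653589793238, 10**18))   # True
print(float(pi.rat64))
print(str(pi))
print(pi.fat_rat())
```

Ordinary arithmetic would use the cheap C<rat64> value, while a request for the wider type goes back to the stored string and so recovers every digit the programmer wrote.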
45454624
=head1 Context
45464625

45474626
=over 4
