=encoding utf8

=head1 TITLE

Synopsis 9: Data Structures

=head1 AUTHORS

    Larry Wall <larry@wall.org>

=head1 VERSION

    Created: 13 Sep 2004

    Last Modified: 5 Mar 2012
    Version: 51

=head1 Overview

This synopsis summarizes the non-existent Apocalypse 9, which
discussed in detail the design of Perl 6 data structures. It was
primarily a discussion of how the existing features of Perl 6 combine
to make it easier for the PDL folks to write numeric Perl.

=head1 Lazy lists

All list contexts are lazy by default. They might still flatten
eventually, but only when forced to. You have to use the C<eager>
list operator to get a non-lazy list context, and you have to use the
C<flat> operator to guarantee flattening. However, such context is
generally provided by the eventual destination anyway, so you don't
usually need to be explicit.

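As a rough analogy only, the difference between a lazy list and one
forced by an eager consumer can be modeled with Python generators.
This is a sketch of the idea, not Perl 6 code; the names C<squares> and
C<log> are hypothetical.

```python
from itertools import islice

log = []  # records which elements were actually computed


def squares(limit):
    """Lazily yield squares, noting each computation in `log`."""
    for n in range(limit):
        log.append(n)
        yield n * n


lazy = squares(1_000_000)             # nothing has been computed yet
first_three = list(islice(lazy, 3))   # the consumer forces only three elements
computed_lazily = list(log)

log.clear()
eager = list(squares(5))              # loosely like `eager`: everything is computed now
```

Only the elements the consumer asks for are produced in the lazy case,
which is the behavior the paragraph above describes as the default.
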
=head1 Sized types

Sized low-level types are named most generally by appending the number
of bits to a generic low-level type name:

    int1
    int2
    int4
    int8
    int16
    int32       (aka int on 32-bit machines)
    int64       (aka int on 64-bit machines)
    int128      (aka int on 128-bit machines)

    uint1       (aka bit)
    uint2
    uint4
    uint8       (aka byte)
    uint16
    uint32
    uint64
    uint128

    num16
    num32
    num64       (aka num on most architectures)
    num128

    complex16
    complex32
    complex64   (aka complex on most architectures)
    complex128

    rat8
    rat16
    rat32
    rat64
    rat128

    buf8        aka buf, a "normal" byte buffer
    buf16       a uint16 buffer
    buf32       a uint32 buffer
    buf64       a uint64 buffer

Complex sizes indicate the size of each C<num> component rather than
the total. This would extend to tensor typenames as well if they're
built-in types. Of course, the typical tensor structure is just
reflected in the dimensions of the array--but the principle still holds
that the name is based on the number of bits of the simple base type.

The unsized types C<int> and C<num> are based on the architecture's
normal size for C<int> and C<double> in whatever version of C the
run-time system is compiled in. So C<int>
typically means C<int32> or C<int64>, while C<num> usually means
C<num64>, and C<complex> means two of whatever C<num> turns out to be.
For symmetry around the decimal point, native C<rat>s have a numerator
that is twice the size of their denominator, such that a C<rat32> actually
has an C<int64> for its numerator. Custom rational types may
be created by instantiating the C<Rational> role with two types;
if both types used are native types, the resulting type is considered a native type.

You are, of course, free to use macros or type declarations to
associate additional names, such as "short" or "single". These are
not provided by default. An implementation of Perl is not required
to support 64-bit integer types or 128-bit floating-point types unless
the underlying architecture supports them. 16-bit floating-point is
also considered optional in this sense.

And yes, an C<int1> can store only -1 or 0. I'm sure someone'll think of
a use for it...

Note that these are primarily intended to represent storage types;
the compiler is generally free to keep all intermediate results in
wider types in the absence of declarations or explicit casts to the
contrary. Attempts to store an intermediate result in a location
that cannot hold it will generally produce a warning on overflow.
Underflow may also warn depending on the pragmatic context and use
of explicit rounding operators. The default rounding mode from
C<Num> to C<Int> is to truncate the fractional part without warning.
(Note that warnings are by definition resumable exceptions; however,
an exception handler is free to either transform such a warning into
a fatal exception or ignore it completely.)

An explicit cast to a storage type has the same potential to throw an
exception as the actual attempt to store to such a storage location
would.

With IEEE floating-point types, we have a bias towards the use
of in-band C<+Inf>, C<-Inf>, and C<NaN> values in preference to
throwing an exception, since this is construed as friendlier to vector
processing and pipelining. Object types such as C<Num> and C<Int>
may store additional information about the nature of the failure,
perhaps as an unthrown exception or warning.

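The size arithmetic above can be checked with a small Python sketch.
It assumes two's-complement storage for the signed types (consistent
with C<int1> holding only -1 or 0, though the text does not say so
outright); the function names are hypothetical.

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits):
    """Inclusive (min, max) of an unsigned integer with `bits` bits."""
    return 0, (1 << bits) - 1


def native_rat_widths(bits):
    """(numerator bits, denominator bits) for a native ratN.

    Per the text, the numerator is twice the size of the denominator,
    so rat32 pairs a 64-bit numerator with a 32-bit denominator.
    """
    return 2 * bits, bits
```

For instance, C<signed_range(1)> gives C<(-1, 0)>, matching the remark
about C<int1>, while C<unsigned_range(1)> gives C<(0, 1)> for C<bit>.
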
=head1 Compact structs

A class whose attributes are all low-level value types can behave as
a struct. (Access from outside the class is still only through
accessors, though, except when the address of a serialized version of
the object is used or generated for interfacing to C-like languages.)
Whether such a class is actually stored compactly is up to the
implementation, but it ought to behave that way, at least to the
extent that it's trivially easy (from the user's perspective) to read
and write to the equivalent C structure. That is, when serialized
or deserialized to the C view, it should look like the C struct,
even if that's not how it's actually represented inside the class.
(This is to be construed as a substitute for at least some of the
current uses of C<pack>/C<unpack>.) Of course, a lazy implementation will
probably find it easiest just to keep the object in its serialized form
all the time. In particular, an array of compact structs must be stored
in their serialized form (see next section).

For types that exist in the C programming language, the serialized
mapping in memory should follow the same alignment and padding
rules by default. Integers smaller than a byte are packed into a
power-of-two number of bits, so a byte holds four 2-bit integers.
Datum sizes that are not a power of two bits are not supported
unless declared by the user with sufficient information to determine
how to lay them out in memory, possibly with a pack/unpack format
associated with the class, or with the strange elements of the class,
or with the types under which the strange element is declared.

Note that a compact struct that has no mutators is itself a value type, so except for
performance considerations, it doesn't matter how many representations
of it there are in memory as long as those are consistent. On the other
hand, structs with mutators must behave more like normal mutable objects.

The packing serialization is performed by coercion to an appropriate
buffer type. The unpacking is performed by coercion of such a buffer
type back to the type of the compact struct.

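The round trip between a compact struct and its buffer form can be
sketched in Python with the standard C<struct> module. This only models
the idea: the C<Point> class and its fields are hypothetical, and the
C<"@iid"> format assumes the platform's native C<int> and C<double>
with native alignment and padding.

```python
import struct
from dataclasses import dataclass

POINT_FORMAT = "@iid"  # native layout: two C ints followed by one C double


@dataclass(frozen=True)
class Point:
    """Hypothetical compact struct with no mutators (so a value type)."""
    x: int
    y: int
    weight: float

    def to_buffer(self) -> bytes:
        """Serialize to the C view (models coercion to a buffer type)."""
        return struct.pack(POINT_FORMAT, self.x, self.y, self.weight)

    @classmethod
    def from_buffer(cls, buf: bytes) -> "Point":
        """Deserialize from the C view (models coercion back from a buffer)."""
        return cls(*struct.unpack(POINT_FORMAT, buf))


p = Point(3, -4, 2.5)
round_tripped = Point.from_buffer(p.to_buffer())
```

Because C<Point> is immutable, the original and the round-tripped copy
compare equal, which is the sense in which the number of in-memory
representations does not matter for a value type.
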
=head1 Standard array indexing

Standard array indices are specified using square brackets. Standard
indices always start at zero in each dimension of the array (see
L<"Multidimensional arrays">), and are always contiguous:

    @dwarves[0] = "Happy";          # The 1st dwarf
    @dwarves[6] = "Doc";            # The 7th dwarf

    @seasons[0] = "Spring";         # The 1st season
    @seasons[2] = "Autumn"|"Fall";  # The 3rd season

=head1 Fixed-size arrays

A basic array declaration like:

    my @array;

declares a one-dimensional array of indeterminate length. Such arrays
are autoextending. For many purposes, though, it's useful to define
array types of a particular size and shape that, instead of
autoextending, fail if you try to access outside their
declared dimensionality. Such arrays tend to be faster to allocate and
access as well. (The language must, however, continue to protect you
against overflow--these days, that's not just a reliability issue, but
also a security issue.)

To declare an array of fixed size, specify its maximum number of elements
in square brackets immediately after its name:

    my @dwarves[7];   # Valid indices are 0..6

    my @seasons[4];   # Valid indices are 0..3

No intervening whitespace is permitted between the name and the size
specification, but "unspace" is allowed:

    my @values[10];   # Okay
    my @keys [10];    # Error
    my @keys\ [10];   # Okay

Note that the square brackets are a compile-time declarator, not a run-time
operator, so you can't use the "dotted" form either:

    my @values.[10];  # An indexing, not a fixed-size declaration
    my @keys\ .[10];  # Ditto

Attempting to access an index outside an array's defined range will fail:

    @dwarves[7] = 'Sneaky';   # Fails with "invalid index" exception

However, it is legal for a range or sequence iterator to extend beyond the end
of an array as long as its min value is a valid subscript; when used as an
rvalue, the range is truncated as necessary to map only valid locations.
(When used as an lvalue, any non-existent subscripts generate WHENCE proxies
that can receive new values and autovivify anything that needs it.)

It's also possible to explicitly specify a normal autoextending array:

    my @vices[*];     # Length is: "whatever"
                      # Valid indices are 0..*

For subscripts containing range or sequence iterators extending beyond the end of
autoextending arrays, the range is truncated to the actual current
size of the array rather than the declared size of that dimension.
It is allowed for such a range to start one after the end, so that

    @array[0..*]

merely returns Nil if C<@array> happens to be empty. However,

    @array[1..*]

would fail because the range's min is too big.

Note that these rules mean it doesn't matter whether you say

    @array[*]
    @array[0 .. *]
    @array[0 .. *-1]

because they all end up meaning the same thing.

There is no autotruncation on the left end. It's not that
hard to write C<0>, and standard indexes always start there.

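The truncation rules for rvalue ranges on an autoextending array can be
modeled with a short Python sketch. The helper C<slice_rvalue> is
hypothetical; C<hi=None> stands in for C<*>, and an empty list stands
in for Nil.

```python
def slice_rvalue(array, lo, hi=None):
    """Model `@array[lo..hi]` used as an rvalue on an autoextending array.

    The upper end is truncated to the current size; the minimum may be
    at most one past the end, and there is no truncation on the left.
    """
    size = len(array)
    if lo < 0 or lo > size:
        raise IndexError(f"range minimum {lo} is out of bounds for size {size}")
    last = size - 1 if hi is None else min(hi, size - 1)
    return [array[i] for i in range(lo, last + 1)]


data = [10, 20, 30]
whole = slice_rvalue(data, 0)              # like @array[0..*]
truncated = slice_rvalue(data, 1, 99)      # upper end truncated to the size
one_past_end = slice_rvalue(data, 3)       # min is one after the end: empty
empty_case = slice_rvalue([], 0)           # empty array: empty result
```

Calling C<slice_rvalue([], 1)> raises an error in this model, matching
the C<@array[1..*]> case on an empty array.
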
Subscript size declarations may add a named C<:map> argument
supplying a closure, indicating that all index values are to be mapped
through that closure. For example, a subscript may be declared as cyclical:

    my @seasons[4, :map( * % 4 )];
    my @seasons[4, :map{ $_ % 4 }];  # same thing

In this case, all numeric values are taken modulo 4, and no range truncation can
ever happen. If you say

    @seasons[-4..7] = 'a' .. 'l';

then each element is written three times and the array ends up with
C<['i','j','k','l']>. The mapping function is allowed to return
fractional values; the index will be the C<floor> of that value.
(It is still illegal to use a numeric index less than 0.) One could
map indexes logarithmically, for instance, as long as the numbers
aren't so small they produce negative indices.

Another use might be to map positive numbers to even slots and negative
numbers to odd slots, so you get indices that are symmetric around 0
(though Perl is not going to track the max-used even and odd slots
for you when the data isn't symmetric).

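The cyclical example can be traced with a Python model of an index
mapping closure. The C<MappedArray> class is a hypothetical stand-in
for an array declared with C<:map>; it reads the "less than 0" rule as
applying to the index after mapping, which is an interpretation.

```python
import math


class MappedArray:
    """Fixed-size array whose indices all pass through a mapping closure."""

    def __init__(self, size, index_map):
        self.items = [None] * size
        self._map = index_map

    def _slot(self, index):
        # Fractional results are floored, as the text describes.
        slot = math.floor(self._map(index))
        if not 0 <= slot < len(self.items):
            raise IndexError(f"mapped index {slot} is out of range")
        return slot

    def __getitem__(self, index):
        return self.items[self._slot(index)]

    def __setitem__(self, index, value):
        self.items[self._slot(index)] = value

    def assign_slice(self, indices, values):
        """Assign values to indices pairwise, in order."""
        for index, value in zip(indices, values):
            self[index] = value


seasons = MappedArray(4, lambda i: i % 4)
seasons.assign_slice(range(-4, 8), "abcdefghijkl")   # like @seasons[-4..7] = 'a'..'l'
```

The twelve assignments land on four slots, so each slot is written
three times and the last writes ('i' through 'l') survive.
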
=head1 Typed arrays

The type of value stored in each element of the array (normally C<Any> for unspecified type)
can be explicitly specified too, as an external C<of> type:

    my num @nums;                      # Each element stores a native number
    my @nums of num;                   # Same

    my Book @library[1_000_000];       # Each element stores a Book object
    my @library[1_000_000] of Book;    # Same

Alternatively, the element storage type may be specified as part of the
dimension specifier (much like a subroutine definition):

    my @nums[-->num];

    my @library[1_000_000 --> Book];

=head1 Compact arrays

In declarations of the form:

    my bit @bits;
    my int @ints;
    my num @nums;
    my int4 @nybbles;
    my buf @buffers;
    my complex128 @longdoublecomplex;
    my Array @ragged2d;

the presence of a low-level type tells Perl that it is free to
implement the array with "compact storage", that is, with a chunk
of memory containing contiguous (or as contiguous as practical)
elements of the specified type without any fancy object boxing that
typically applies to undifferentiated scalars. (Perl tries really
hard to make these elements look like objects when you treat them
like objects--this is called autoboxing.)

Unless explicitly declared to be of fixed size, such
arrays are autoextending just like ordinary Perl arrays
(at the price of occasionally copying the block of data to another
memory location, or using a tree structure).

A compact array is for most purposes interchangeable with the
corresponding buffer type. For example, apart from the sigil,
these are equivalent declarations:

    my uint8 @buffer;
    my buf8 $buffer;

(Note: If you actually said both of those, you'd still get two
different names, since the sigil is part of the name.)

So given C<@buffer> you can say

    $piece = substr(@buffer, $beg, $end - $beg);

and given C<$buffer> you can also say

    @pieces = $buffer[$n ..^ $end];

Note that subscripting still pulls the elements out as numbers,
but C<substr()> returns a buffer of the same type.

For types that exist in the C programming language, the mapping in
memory should follow the same alignment rules, at least in the absence
of any declaration to the contrary. For interfacing to C pointer
types, any buffer type may be used for its memory pointer; note,
however, that the buffer knows its length, while in C that length
typically must be passed as a separate argument, so the C interfacing
code needs to support this whenever possible, lest Perl inherit all
the buffer overrun bugs bequeathed to us by C. Random C pointers
should never be converted to buffers unless the length is also known.
(Any call to strlen() should generally be considered a security hole.)
The size of any buffer type in bytes may be found with the C<.bytes>
method, even if the type of the buffer elements is not C<byte>.
(Strings may be asked for their size in bytes only if they support
a buffer type as their minimum abstraction level, hopefully with a
known encoding. Otherwise you must encode them explicitly from the
higher-level abstraction into some buffer type.)

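Python's standard C<array> module offers a loose analogy for compact
storage, sketched below. It is not a model of Perl 6 buffers: the
C<"H"> type code is an unsigned C short (at least 16 bits), and the
byte count computed here only approximates what C<.bytes> is described
as returning.

```python
from array import array

ints = array("H", [1, 2, 3, 65535])    # contiguous unboxed storage
nbytes = len(ints) * ints.itemsize      # size in bytes, whatever the element width
raw = ints.tobytes()                    # the underlying buffer contents

piece = ints[1:3]      # slicing yields another compact array of the same type
element = ints[1]      # subscripting pulls the element out as a plain number

ints.append(7)         # not fixed-size, so the array autoextends
```

The distinction between C<piece> and C<element> mirrors the note above:
a sub-range keeps the buffer type, while a single subscript produces a
number.
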
=head1 Multidimensional arrays

Perl 6 arrays are not restricted to being one-dimensional (that's simply
the default). To declare a multidimensional array, you specify it with a
semicolon-separated list of dimension lengths:

    my int @ints[4;2];          # Valid indices are 0..3 ; 0..1

    my @calendar[12;31;24];     # Valid indices are 0..11 ; 0..30 ; 0..23

Arrays may also be defined with a mixture of fixed and autoextending
dimensions. For example, there are always 12 months in a year and
24 hours in a day, but the number of days in the month can vary:

    my @calendar[12;*;24];      # day-of-month dimension unlimited/ragged

You can pass a slice (of any dimensionality) for the shape as well:

    @shape = 4, 2;
    my int @ints[ ||@shape ];

The C<< prefix:<||> >> operator interpolates a list into a semicolon list at
the semicolon level.

The shape in the declaration merely specifies how the array will
autovivify on first use, but ends up as an attribute of the actual
container object thereby. On the other hand, the shape may be also
supplied entirely by an explicit constructor at run-time:

    my num @nums = Array of num.new(:shape(3;3;3));
    my num @nums .=new():shape(3;3;3);  # same thing

A multidimensional array is indexed by a semicolon list, which is really
a list of lists in disguise. Each sublist is a slice of one
particular dimension. So:

    @array[0..10; 42; @x]

is really short for something like:

    @array.postcircumfix:<[ ]>( (0..10), (42), (@x) );

The method's internal C<**@slices> parameter turns the subscripts into three
independent C<Parcel> lists, which can be read lazily independently of one another.
(Though a subscripter will typically use them left-to-right as it slices each
dimension in turn.)

Note that:

    @array[@x,@y]

is always interpreted as a one-dimensional slice in the outermost
dimension, which is the same as:

    @array[@x,@y;]

or more verbosely:

    @array.postcircumfix:<[ ]>( ((@x,@y)) );

To interpolate an array at the semicolon level rather than the comma level,
use the C<< prefix:<||> >> operator:

    @array[||@x]

which is equivalent to

    @array.postcircumfix:<[ ]>( ((@x[0]), (@x[1]), (@x[2]), etc.) );

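The "list of lists" view of a subscript can be illustrated with a
Python sketch over nested lists. The helper C<multi_slice> is
hypothetical, and returning the selected elements as a flat list in
row-major order is an assumption; the text above does not specify the
shape of the result.

```python
from itertools import product


def multi_slice(array, *dims):
    """Index nested lists with one list of indices per dimension."""
    results = []
    for path in product(*dims):
        item = array
        for i in path:          # walk one dimension at a time, left to right
            item = item[i]
        results.append(item)
    return results


grid = [[10 * r + c for c in range(4)] for r in range(3)]
x, y = [0, 1], [2]

two_dims = multi_slice(grid, x, y)                    # like @array[@x; @y]
one_dim = multi_slice(grid, x + y)                    # like @array[@x,@y]
interpolated = multi_slice(grid, *[[i] for i in x])   # like @array[||@x]
```

Here C<one_dim> selects whole rows because the comma-joined list fills
only the outermost dimension, while C<interpolated> spreads the
elements of C<x> across successive dimensions.
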
=head2 Autoextending multidimensional arrays

Any dimension of the array may be declared as "C<*>", in which case
that dimension will autoextend. Typically this would be used in the
final dimension to make a ragged array functionally equivalent to an
array of arrays:

    my int @ints[42; *];            # Second dimension unlimited/ragged
    push(@ints[41], getsomeints());

but I<any> dimension of an array may be declared as autoextending:

    my @calendar[12;*;24];          # day-of-month dimension unlimited/ragged
    @calendar[1;42;8] = 'meeting'   # See you on January 42nd

It is also possible to specify that an array has an arbitrary number
of dimensions, using a "hyperwhatever" (C<**>) at the end of the
dimensional specification:

    my @grid[**];                       # Any number of dimensions
    my @spacetime[*;*;*;**];            # Three or more dimensions
    my @coordinates[100;100;100;**];    # Three or more dimensions

Note that C<**> is a shorthand for something that means C<||(* xx *)>, so the extra
dimensions are all of arbitrary size. To specify an arbitrary number
of fixed-size dimensions, write:

    my @coordinates[ ||(100 xx *) ];

This syntax is also convenient if you need to define a large number of
consistently sized dimensions:

    my @string_theory[ ||(100 xx 11) ];     # 11-dimensional

=head1 User-defined array indexing

Any array may also be given a second set of user-defined indices, which
need not be zero-based, monotonic, or even integers. Whereas standard array
indices always start at zero, user-defined indices may start at any
finite value of any enumerable type. Standard indices are always
contiguous, but user-defined indices need only be distinct and in an
enumerable sequence.

To define a set of user-defined indices, specify an explicit or
enumerable list of the indices of each dimension (or the name of an
enumerable type) in a set of curly braces immediately after the
array name:

    my @dwarves{ 1..7 };
    my @seasons{ <Spring Summer Autumn Winter> };

    my enum Months
        «:Jan(1) Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec»;

    my @calendar{ Months; 1..31; 9..12,14..17 };    # Business hours only

Array look-ups via user-defined indices are likewise specified in curly
braces instead of square brackets:

    @dwarves{7} = "Doc";        # The 7th dwarf

    say @calendar{Jan;13;10};   # Jan 13th, 10am

User-defined indices merely provide a second, non-standard "view" of the
array; the underlying container remains the same. Each user-defined
index in each dimension is mapped one-to-one back to the standard
(zero-based) indices of that dimension. So, given the preceding definitions:

                          maps to
    @dwarves{1}           ------>     @dwarves[0]
    @dwarves{2}           ------>     @dwarves[1]
         :                                 :
    @dwarves{7}           ------>     @dwarves[6]

and:

                          maps to
    @seasons{'Spring'}    ------>     @seasons[0]
    @seasons{'Summer'}    ------>     @seasons[1]
    @seasons{'Autumn'}    ------>     @seasons[2]
    @seasons{'Winter'}    ------>     @seasons[3]

    @seasons<Spring>      ------>     @seasons[0]
    @seasons<Summer>      ------>     @seasons[1]
    @seasons<Autumn>      ------>     @seasons[2]
    @seasons<Winter>      ------>     @seasons[3]

and:

                          maps to
    @calendar{Jan;1;9}    ------>     @calendar[0;0;0]
    @calendar{Jan;1;10}   ------>     @calendar[0;0;1]
         :                                 :
    @calendar{Jan;1;12}   ------>     @calendar[0;0;3]
    @calendar{Jan;1;14}   ------>     @calendar[0;0;4]
         :                                 :
    @calendar{Feb;1;9}    ------>     @calendar[1;0;0]
         :                                 :
    @calendar{Dec;31;17}  ------>     @calendar[11;30;7]

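The one-to-one mapping in the tables above can be reproduced with a
Python sketch that keeps one lookup table per dimension. The
C<UserIndexed> class and its C<standard> method are hypothetical names
used only for illustration.

```python
class UserIndexed:
    """Maps user-defined indices to zero-based standard indices, per dimension."""

    def __init__(self, *dims):
        # One dict per dimension: user-defined key -> zero-based position.
        self._maps = [{key: pos for pos, key in enumerate(keys)} for keys in dims]

    def standard(self, *keys):
        """Return the standard index tuple for one user-defined index per dimension."""
        if len(keys) != len(self._maps):
            raise IndexError("wrong number of dimensions")
        try:
            return tuple(m[k] for m, k in zip(self._maps, keys))
        except KeyError as err:
            raise IndexError(f"not a valid user-defined index: {err.args[0]!r}") from None


MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
BUSINESS_HOURS = [*range(9, 13), *range(14, 18)]    # 9..12, 14..17

dwarves = UserIndexed(range(1, 8))
calendar = UserIndexed(MONTHS, range(1, 32), BUSINESS_HOURS)
```

With these definitions, C<calendar.standard("Jan", 1, 14)> yields
C<(0, 0, 4)>: hour 14 is the fifth listed hour because 13 is skipped.
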
User-defined indices can be open-ended, but only on the upper end (i.e.
just like standard indices). That is, you can specify:

    my @sins{7..*};     # Indices are: 7, 8, 9, etc.

but not:

    my @virtue{*..6};
    my @celebs{*};

These last two are not allowed because there is no first index, and
hence no way to map the infinity of negative user-defined indices back
to the standard zero-based indexing scheme.

Declaring a set of user-defined indices implicitly declares the array's
standard indices as well (which are still zero-based in each dimension).
Such arrays can be accessed using either notation. The standard indices
provide an easy way of referring to "ordinal" positions, independent of
user-specified indices:

    say "The first sin was @sins[0]";
    # First element, no matter what @sins's user-defined indexes are

Note that if an array is defined with fixed indices (either standard or
user-defined), any attempt to use an index that wasn't specified in the
definition will fail. For example:

    my @values{2,3,5,7,11};     # Also has standard indices: 0..4

    say @values[-1];            # Fails (not a valid standard index)
    say @values{1};             # Fails (not a valid user-defined index)

    say @values{4};             # Fails (not a valid user-defined index)

    say @values[5];             # Fails (not a valid standard index)
    say @values{13};            # Fails (not a valid user-defined index)

Furthermore, if an array wasn't specified with user-defined indices,
I<any> attempt to index it via C<.{}> will fail:

    my @dwarves[7];             # No user-defined indices;

    say @dwarves{1};            # Fails: can't map .{1} to a standard .[] index

When C<:k>, C<:kv>, or C<:p> is applied to an array slice, it returns
the kind of indices that were used to produce the slice:

    @arr[0..2]:p        # 0=>'one', 1=>'two', 2=>'three'
    @arr{1,3,5}:p       # 1=>'one', 3=>'two', 5=>'three'

Adverbs may be applied only to operators, not to terms, so C<:k>,
C<:kv>, and C<:p> may not be applied to a full array. However, you
may apply an adverb to a Zen slice, which can indicate which set of
keys is desired:

    my @arr{1,3,5,7,9} = <one two three four five>;

    say @arr[]:k;       # 0, 1, 2, 3, 4
    say @arr{}:k;       # 1, 3, 5, 7, 9

The C<.keys> method also returns the keys of all existing elements.
For a multidimensional array each key is a list containing one value for
each dimension.

The C<.shape> method also works on such an array; it returns a
slice of the valid keys for each dimension. The component list
representing an infinite dimension is necessarily represented lazily.
(Note that the C<.shape> method returns the possible keys, but the
cartesian product of the key slice dimensions is not guaranteed to
index existing elements in every case. That is, this is not intended
to reflect current combinations of keys in use (use C<:k> for that).)
Note that you have to distinguish these two forms:

    @array[].shape      # the integer indices
    @array{}.shape      # the user-defined indices

605 =head1 Inclusive subscripts
606
607 Within any array look-up (whether via C<.[]> or C<.{}>), the "whatever
608 star" can be used to indicate "all the indices". The meaning of
609 "all" here depends on the definition of the array. If there are no
610 pre-specified indices, the star means "all the indices of currently
611 allocated elements":
612
613 my @data # No pre-specified indices
614 = 21, 43, 9, 11; # Four elements allocated
615 say @data[*]; # So same as: say @data[0..3]
616
617 @data[5] = 101; # Now six elements allocated
618 say @data[*]; # So same as: say @data[0..5]
619
620 If the array is defined with predeclared fixed indices (either standard
621 or user-defined), the star means "all the defined indices":
622
2c33896 [S02,S09,S32] get rid of :by fossils
lwall authored
623 my @results{1,3...99} # Pre-specified indices
68d062f Move synopses to their new home.
pmichaud authored
624 = 42, 86, 99, 1;
625
626 say @results[*]; # Same as: say @results[0..49]
2c33896 [S02,S09,S32] get rid of :by fossils
lwall authored
627 say @results{*}; # Same as: say @results{1,3...99}
68d062f Move synopses to their new home.
pmichaud authored
628
629 You can omit unallocated elements, either by using the C<:v> adverb:
630
631 say @results[*]:v; # Same as: say @results[0..3]
632 say @results{*}:v; # Same as: say @results{1,3,5,7}
633
634 or by using a "zen slice":
635
636 say @results[]; # Same as: say @results[0..3]
637 say @results{}; # Same as: say @results{1,3,5,7}
638
639 A "whatever star" can also be used as the starting-point of a range
640 within a slice, in which case it means "from the first index":
641
642 say @calendar[*..5]; # Same as: say @calendar[0..5]
643 say @calendar{*..Jun}; # Same as: say @calendar{Jan..Jun}
644
645 say @data[*..3]; # Same as: say @data[0..3]
646
647 As the end-point of a range, a lone "whatever" means "to the maximum
648 specified index" (if fixed indices were defined):
649
650 say @calendar[5..*]; # Same as: say @calendar[5..11]
651 say @calendar{Jun..*}; # Same as: say @calendar{Jun..Dec}
652
653 or "to the largest allocated index" (if there are no fixed indices):
654
    say @data[1..*];          # Same as:  say @data[1..5]
656
657 =head1 Negative and differential subscripts
658
659 The "whatever star" behaves differently than described above when
660 it is treated as a number inside a standard index. In this case
661 it evaluates to the length of the array. This provides a clean
662 and consistent way to count back or forwards from the end of an
663 array:
664
665 @array[*-$N] # $N-th element back from end of array
666 @array[*+$N] # $N-th element at or after end of array
667
668 More specifically:
669
    @array[*-2]         # Second-last element of the array
    @array[*-1]         # Last element of the array
    @array[+*]          # First element after the end of the array
    @array[*+0]         # First element after the end of the array
    @array[*+1]         # Second element after the end of the array

    @array[*-3..*-1]    # Slice from third-last element to last element
    @array[*-3..*]      # (Same thing via range truncation)
678
679 (Note that, if a particular array dimension has fixed indices, any
680 attempt to index elements after the last defined index will fail,
681 except in the case of range truncation described earlier.)
682
683 Negative subscripts are never allowed for standard subscripts unless
684 the subscript is declared modular.
685
686 The Perl 6 semantics avoids indexing discontinuities (a source of subtle
687 runtime errors), and provides ordinal access in both directions at both
688 ends of the array.
689
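The differential forms can be tried on a plain array.  This is only a
sketch, assuming an unshaped, autoextending array; C<@array> and its
contents are illustrative:

    my @array = <a b c d e>;

    say @array[*-1];                  # e  (last element)
    say @array[*-2];                  # d  (second-last element)

    my @tail = @array[*-3 .. *-1];    # the elements c, d, e
    say @tail.elems;                  # 3

    @array[*+0] = 'f';                # first slot after the end
    say @array.elems;                 # 6

Because the star stands for the length of the array, C<*+0> names the
slot just past the current end, so assigning to it extends the array by
one element.
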
690 =head1 Mixing subscripts
691
692 Occasionally it's convenient to be able to mix standard and user-defined
693 indices in a single look-up.
694
695 Within a C<.[]> indexing operation you can use C<*{$idx}> to
696 convert a user-defined index C<$idx> to a standard index. That is:
697
698 my @lengths{ Months } = (31,28,31,30,31,30,31,31,30,31,30,31);
699
700 @lengths[ 2 .. *{Oct} ] # Same as: @lengths[ 2 .. 9 ]
701
702 Similarly, within a C<.{}> indexing operation you can use C<*[$idx]>
703 to convert from standard indices to user-defined:
704
705 @lengths{ *[2] .. Oct } # Same as: @lengths{ Mar .. Oct }
706
707 In other words, when treated as an array within an indexing
708 operation, C<*> allows you to convert between standard and
709 user-defined indices, by acting like an array of the indices
710 of the indexed array. This is especially useful for mixing
711 standard and user-defined indices within multidimensional
712 array look-ups:
713
714 # First three business hours of every day in December...
715 @calendar{Dec; *; *[0..2]}
716
717 # Last three business hours of first three days in July...
718 @calendar[*{July}; 0..2; *-3..*-1]
719
720 Extending this feature, you can use C<**> within an indexing operation
721 as if it were a multidimensional array of I<all> the indices of a fixed
722 number of dimensions of the indexed array:
723
724 # Last three business hours of first three days in July...
725 @calendar{ July; **[0..2; *-3..*-1] }
726
727 # Same...
728 @calendar[ **{July; 1..3}; *-3..*-1]
729
730 It is also possible to stack subscript declarations of various
731 types, including a final normal signature to specify named args
732 and return type:
733
734 my @array[10]{'a'..'z'}(:$sparse --> MyType);
735
736 [Note: the final signature syntax is merely reserved for now, and
737 not expected to work until we figure out what it really means, if
738 it means anything.]
739
740 =head1 PDL support
741
742 An array C<@array> can be tied to a PDL at declaration time:
743
744 my num @array[||@mytensorshape] is PDL;
745 my @array is PDL(:shape(2;2;2;2)) of int8;
746
747 PDLs are allowed to assume a type of C<num> by default rather than
748 the usual simple scalar. (And in general, the type info is merely
749 made available to the "tie" implementation to do with what it will.
750 Some data structures may ignore the "of" type and just store everything
751 as general scalars. Too bad...)
752
753 Arrays by default are one dimensional, but may be declared to have any
754 dimensionality supported by the implementation. You may use arrays
755 just like scalars -- the main caveat is that you have to use
756 binding rather than assignment to set one without copying:
757
758 @b := @a[0,2,4 ... *]
759
With PDLs in particular, this might alias each of the individual
elements rather than the array as a whole.  So modifications to C<@b>
are likely to be reflected back into C<@a>.  (But maybe the PDLers will
prefer a different notation for that.)
764
765 The dimensionality of an array may be declared on the variable, but
766 the actual dimensionality of the array depends on how it was created.
767 Reconciling these views is a job for the particular array implementation.
768 It's not necessarily the case that the declared dimensionality must match
769 the actual dimensionality. It's quite possible that the array variable
770 is deliberately declared with a different dimensionality to provide a
771 different "view" on the actual value:
772
773 my int @array[2;2] is Puddle .= new(:shape(4) <== 0,1,2,3);
774
775 Again, reconciling those ideas is up to the implementation, C<Puddle>
776 in this case. The traits system is flexible enough to pass any
777 metadata required, including ideas about sparseness, raggedness,
778 and various forms of non-rectangleness such as triangleness.
779 The implementation should probably carp about any metadata it doesn't
780 recognize though. The implementation is certainly free to reject
781 any object that doesn't conform to the variable's shape requirements.
782
783 =head1 Subscript and slice notation
784
785 A subscript indicates a "slice" of an array. Each dimension of an
786 array is sliced separately, so a multidimensional slice subscript
787 may be supplied as a semicolon-separated list of slice sublists.
788 A three-dimensional slice might look like this:
789
790 @x[0..10; 1,0; 1,*+2...*]
791
792 It is up to the implementation of C<@x> to decide how aggressively
793 or lazily this subscript is evaluated, and whether the slice entails
794 copying. (The PDL folks will generally want it to merely produce a
795 virtual PDL where the new array aliases its values back into the
796 old one.)
797
798 Of course, a single element can be selected merely by providing a single
799 index value to each slice list:
800
801 @x[0;1;42]
802
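To make the single-element form concrete, the following sketch assumes
an implementation that supports fixed-shape arrays with semicolon
subscripts; the shape C<[2;3]> and the fill values are arbitrary choices
made for illustration:

    my @x[2;3];                       # two rows, three columns
    for ^2 X ^3 -> ($i, $j) {
        @x[$i; $j] = $i * 3 + $j;     # fill with 0..5 in row-major order
    }

    say @x[1;2];                      # 5
    say @x.shape.elems;               # 2 (one entry per dimension)
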
803 =head1 Cascaded subscripting of multidimensional arrays
804
805 For all multidimensional array types, it is expected that cascaded subscripts:
806
807 @x[0][1][42]
808 @x[0..10][1,0][1,*+2...*]
809
810 will either fail or produce the same results as the equivalent
811 semicolon subscripts:
812
813 @x[0;1;42]
814 @x[0..10; 1,0; 1,*+2...*]
815
816 Built-in array types are expected to succeed either way, even if
817 the cascaded subscript form must be implemented inefficiently by
818 constructing temporary slice objects for later subscripts to use.
819 (User-defined types may choose not to support the cascaded form, but
820 if so, they should fail rather than providing different semantics.)
821 As a consequence, for built-in types of declared shape, the appropriate
822 number of cascaded subscripts may always be optimized into the
823 semicolon form.
824
825 =head1 The semicolon operator
826
827 At the statement level, a semicolon terminates the current expression.
828 Within any kind of bracketing construct, semicolon notionally separates
829 the sublists of a multidimensional slice, the interpretation of
830 which depends on the context. Such a semicolon list puts each of its
831 sublists into a C<Parcel>, deferring the context of the sublist until
832 it is bound somewhere. The storage of these sublists is hidden in
833 the inner workings of the list. It does not produce a list of lists
834 unless the list as a whole is bound into a slice context.
835
836 Single dimensional arrays expect simple slice subscripts, meaning
837 they will treat a list subscript as a slice in the single dimension of
838 the array. Multi-dimensional arrays, on the other hand, know how to
839 handle a multidimensional slice, with one subslice for each dimension. You need not specify
840 all the dimensions; if you don't, the unspecified dimensions are
841 "wildcarded". Supposing you have:
842
843 my num @nums[3;3;3];
844
845 Then
846
847 @nums[0..2]
848
849 is the same as
850
851 @nums[0..2;]
852
853 which is the same as
854
855 @nums[0,1,2;*;*]
856
857 But you should maybe write the last form anyway just for good
858 documentation, unless you don't actually know how many more dimensions
859 there are. For that case use C<**>:
860
861 @nums[0,1,2;**]
862
863 If you wanted that C<0..2> range to mean
864
865 @nums[0;1;2]
866
867 it is not good enough to use the C<|> prefix operator, because
868 that interpolates at the comma level, so:
869
870 @nums[ |(0,1,2) ]
871
872 just means
873
874 @nums[ 0,1,2 ];
875
876 Instead, to interpolate at the semicolon level, you need to use the C<||> prefix operator:
877
878 @nums[ ||(0..2) ]
879
880 The zero-dimensional slice:
881
882 @x[]
883
884 is assumed to want everything, not nothing. It's particularly handy
885 because Perl 6 (unlike Perl 5) won't interpolate a bare array without brackets:
886
887 @x = (1,2,3);
888 say "@x = @x[]"; # prints @x = 1 2 3
889
890 Lists are lazy in Perl 6, and the slice lists are no exception.
891 In particular, list generators are not flattened until they
892 need to be, if ever. So a PDL implementation is free to steal the
893 values from these generators and "piddle" around with them:
894
    @nums[$min ..^ $max]
    @nums[$min, *+3 ... $max]
    @nums[$min, *+3 ... *]
    @nums[1,*+2...*]    # the odds
    @nums[0,*+2...*]    # the evens
    @nums[1,3...*]      # the odds
    @nums[0,2...*]      # the evens
902
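For an ordinary array, such generator subscripts might be used as in the
sketch below.  It assumes an implementation that cuts a lazy, unbounded
index list off at the end of the array; that truncation, like the names
C<@some> and C<@evens>, is an assumption made for illustration:

    my @nums = 10 .. 19;                      # ten elements, indices 0..9
    my ($min, $max) = 2, 6;

    my @some  = @nums[$min ..^ $max];         # indices 2, 3, 4, 5
    my @evens = @nums[0, 2 ... *];            # even indices, cut off at the array's end

    say @some.elems;                          # 4
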
903 =head1 PDL signatures
904
905 To rewrite a Perl 5 PDL definition like this:
906
    pp_def(
        'inner',
        Pars => 'a(n); b(n); [o]c(); ', # the signature, see above
        Code => 'double tmp = 0;
                 loop(n) %{ tmp += $a() * $b(); %}
                 $c() = tmp;' );
913
914 you might want to write a macro that parses something vaguely
915 resembling this:
916
    role PDL_stuff[::TYPE] {
        PDLsub inner (@a[$n], @b[$n] --> @c[]) {
            my TYPE $tmp = 0;
            for ^$n {
                $tmp += @a[$_] * @b[$_];
            }
            @c[] = $tmp;
        }
    }
926
927 where that turns into something like this:
928
    role PDL_stuff[::TYPE] {
        multi inner (TYPE @a, TYPE @b --> TYPE) {
            my $n = @a.shape[0];    # or maybe $n is just a parameter
            assert($n == @b.shape[0]);  # and this is already checked by PDL
            my TYPE $tmp = 0;
            for ^$n {
                $tmp += @a[$_] * @b[$_];
            }
            return $tmp;
        }
    }
940
941 Then any class that C<does PDL_stuff[num]> has an C<inner()> function that
942 can (hopefully) be compiled down to a form useful to the PDL threading
943 engine. Presumably the macro also stores away the PDL signature
944 somewhere safe, since the translated code hides that information
945 down in procedural code. Possibly some of the C<[n]> information can
946 come back into the signature via C<where> constraints on the types.
947 This would presumably make multimethod dispatch possible on similarly
948 typed arrays with differing constraints.
949
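Stripped of the role, the type parameter and the PDL machinery, the
computation itself is an ordinary inner product.  The following
stand-alone sketch is not the macro's output; it is a plain subroutine
with an explicit size check, written only to show what the translated
body computes:

    sub inner (@a, @b) {
        die "dimension mismatch" unless @a.elems == @b.elems;
        my $tmp = 0;
        for ^@a.elems {
            $tmp += @a[$_] * @b[$_];
        }
        return $tmp;
    }

    say inner([1, 2, 3], [4, 5, 6]);          # 32
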
950 (The special destruction problems of Perl 5's PDL should go away with
951 Perl 6's GC approach, as long as PDL's objects are registered with the
952 run-time system correctly.)
953
954 =head1 Autothreading types
955
956 =head2 Junctions
957
958 A junction is a superposition of data values pretending to be a single
959 data value. Junctions come in four varieties:
960
    list op     infix op
    =======     ========
    any()       |
    all()       &
    one()       ^
    none()      (no "nor" op defined)
967
968 Note that the infix ops are "list-associative", insofar as
969
970 $a | $b | $c
971 $a & $b & $c
972 $a ^ $b ^ $c
973
974 mean
975
976 any($a,$b,$c)
977 all($a,$b,$c)
978 one($a,$b,$c)
979
980 rather than
981
982 any(any($a,$b),$c)
983 all(all($a,$b),$c)
984 one(one($a,$b),$c)
985
986 Some contexts, such as boolean contexts, have special rules for dealing
987 with junctions. In any item context not expecting a junction of
988 values, a junction produces automatic parallelization of the algorithm.
989 In particular, if a junction is used as an argument to any routine
990 (operator, closure, method, etc.), and the scalar parameter you
991 are attempting to bind the argument to is inconsistent with the
992 C<Junction> type, that routine is "autothreaded", meaning the routine
993 will be called automatically as many times as necessary to process
994 the individual scalar elements of the junction in parallel.
995 (C<Each> types are also autothreaded, but are serial and lazy in nature.)
996
997 The results of these separate calls are then recombined into a
998 single junction of the same species as the junctive argument.
999 If two or more arguments are junctive, then the argument that is
1000 chosen to be "autothreaded" is:
1001
1002 =over
1003
1004 =item *
1005
1006 the left-most I<all> or I<none> junction (if any), or else
1007
1008 =item *
1009
1010 the left-most I<one> or I<any> junction
1011
1012 =back
1013
1014 with the tests applied in that order.
1015
1016 Each of the resulting set of calls is then recursively autothreaded
1017 until no more junctive arguments remain. That is:
1018
    substr("camel", 0|1, 2&3)

    -> all( substr("camel", 0|1, 2),      # autothread the conjunctive arg
            substr("camel", 0|1, 3)
          )

    -> all( any( substr("camel", 0, 2),   # autothread the disjunctive arg
                 substr("camel", 1, 2),
               ),
            any( substr("camel", 0, 3),   # autothread the disjunctive arg
                 substr("camel", 1, 3),
               )
          )

    -> all( any( "ca",                    # evaluate
                 "am",
               ),
            any( "cam",
                 "ame",
               )
          )

    -> ("ca"|"am") & ("cam"|"ame")        # recombine results in junctions
1041
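The same behavior can be observed with a user-defined routine.  In this
sketch C<double> is a hypothetical routine whose C<Int> parameter is
inconsistent with the C<Junction> type, so a junctive argument
autothreads; C<so> is used to collapse each result in boolean context:

    sub double (Int $n) { $n * 2 }    # hypothetical; Int excludes Junction

    my $r = double(1 | 2);            # autothreaded: double(1) | double(2)
    say so $r == 4;                   # True
    say so $r == 3;                   # False

    say so (1 & 2 & 3) > 0;           # True:  every value is positive
    say so (1 ^ 2 ^ 3) == 2;          # True:  exactly one value equals 2
    say so none(1, 2, 3) == 4;        # True:  no value equals 4
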
1042 Junctions passed as part of a container do not cause autothreading
1043 unless individually pulled out and used as a scalar. It follows that
1044 junctions passed as members of a "slurpy" array or hash do not cause
1045 autothreading on that parameter. Only individually declared parameters
1046 may autothread. (Note that positional array and hash parameters are
1047 in fact scalar parameters, though, so you could pass a junction of
1048 array or hash objects.)
1049
The exact semantics of autothreading with respect to control structures
are subject to change over time; it is therefore erroneous to pass
junctions to any control construct that is not implemented as a
normal single dispatch or function call.  In particular, threading junctions
through conditionals correctly could involve continuations, which
are almost but not quite mandated in Perl 6.0.0.  Alternately, we
may decide that boolean contexts always collapse the junction by
default, and the exact value that allowed the collapse to "true"
is not available.  A variant of that is to say that if you want
autothreading of a control construct, you must assign or bind to
a non-C<Mu> container before the control construct, and that
assignment or binding to any such container results in autothreading
the rest of the dynamic scope.  (The performance ramifications of this
are not clear without further experimentation, however.)  So for now,
please limit use of junctions to situations where the eventual binding
to a scalar formal parameter is clear.
1066
1067 =head2 Each
1068
1069 [This section is considered conjectural.]
1070
1071 An C<Each> type autothreads like a junction, but does so serially and lazily,
1072 and is used only for its mapping capabilities. The prototypical use
1073 case is where a hyperoperator would parallelize in an unfortunate way:
1074
1075 @array».say # order not guaranteed
1076 @array.each.say # order guaranteed
1077
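Since C<.each> is conjectural, the ordering guarantee it is meant to
give can be compared with an ordinary statement-modifier loop, which is
serial by definition.  This sketch does not use C<.each> at all; the
array C<@seen> is introduced only to record the visiting order:

    my @array = <a b c>;
    my @seen;
    @seen.push($_) for @array;        # serial: visits a, then b, then c
    say @seen.join(',');              # a,b,c
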
1078 =head1 Parallelized parameters and autothreading
1079
1080 Within the scope of a C<use autoindex> pragma (or equivalent, such as
1081 C<use PDL> (maybe)), any closure that uses parameters as subscripts
1082 is also a candidate for autothreading. For each such parameter, the
1083 compiler supplies a default value that is a range of all possible
1084 values that subscript can take on (where "possible" is taken to
1085 mean the declared shape of a shaped array, or the actual shape of an
1086 autoextending array). That is, if you have a closure of the form:
1087
1088 -> $x, $y { @foo[$x;$y] }
1089
1090 then the compiler adds defaults for you, something like:
1091
    -> $x = @foo.shape[0].range,
       $y = @foo.shape[1].range { @foo[$x;$y] }
1094
1095 where each such range is autoiterated for you.
1096
1097 In the abstract (and often in the concrete), this puts an implicit
1098 loop around the block of the closure that visits all the possible
1099 subscript values for that dimension (unless the parameter is actually
1100 supplied to the closure, in which case the supplied value is used as
1101 the slice subscript instead).
1102
1103 This implicit loop is assumed to be parallelizable.
1104
1105 So to write a typical tensor multiplication:
1106
1107 Cijkl = Aij * Bkl
1108
1109 you can simply call a closure with no arguments, allowing the C<autoindex>
1110 pragma to fill in the defaults:
1111
1112 use autoindex;
1113 -> $i, $j, $k, $l { @c[$i; $j; $k; $l] = @a[$i; $j] * @b[$k; $l] }();
1114
1115 or you can use the C<do BLOCK> syntax (see L<S04/"The do-once loop">) to
1116 call that closure, which also implicitly iterates:
1117
1118 use autoindex;
1119 do -> $i, $j, $k, $l {
1120 @c[$i; $j; $k; $l] = @a[$i; $j] * @b[$k; $l]
1121 }
1122
1123 or even use placeholder variables instead of a parameter list:
1124
1125 use autoindex;
1126 do { @c[$^i; $^j; $^k; $^l] = @a[$^i; $^j] * @b[$^k; $^l] };
1127
1128 That's almost pretty.
1129
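For comparison, the loop that the pragma is described as supplying can
be written out by hand.  The sketch below does not use C<autoindex>; it
assumes small nested arrays of known size (2 by 2), iterates over every
index combination explicitly, and uses cascaded subscripts:

    my @a = [1, 2], [3, 4];
    my @b = [5, 6], [7, 8];
    my @c;

    for ^2 X ^2 X ^2 X ^2 -> ($i, $j, $k, $l) {
        @c[$i][$j][$k][$l] = @a[$i][$j] * @b[$k][$l];
    }

    say @c[1][0][0][1];               # 18, that is @a[1][0] * @b[0][1]

The explicit version fixes the iteration order, whereas the implicit
loop is assumed to be parallelizable.
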
1130 It is erroneous for an unbound parameter to match multiple existing array
1131 subscripts differently. (Arrays being created don't count.)
1132
Note that you could pass any of C<$i>, C<$j>, C<$k> or C<$l> explicitly, or prebind
them with a C<.assuming> method, in which case only the unbound parameters
autothread.
1136
1137 If you use an unbound array parameter as a semicolon-list interpolator
1138 (via the C<< prefix:<||> >> operator), it functions as a wildcard list of
1139 subscripts that must match the same everywhere that parameter is used.
1140 For example,
1141
1142 do -> @wild { @b[ ||@wild.reverse ] = @a[ ||@wild ] };
1143
1144 produces an array with the dimensions reversed regardless of the
1145 dimensionality of C<@a>.
1146
1147 The optimizer is, of course, free to optimize away any implicit loops
1148 that it can figure out how to do more efficiently without changing
1149 the semantics.
1150
1151 See RFC 207 for more ideas on how to use autothreading (though the syntax
1152 proposed there is rather different).
1153
1154 =head1 Hashes
1155
1156 Like arrays, you can specify hashes with multiple dimensions and fixed
1157 sets of keys:
1158
1159 my num %hash{<a b c d e f>}; # Only valid keys are 'a'..'f'
1160 my num %hash{'a'..'f'}; # Same thing
1161
    my %rainfall{ Months; 1..31 };          # Keys: Jan..Dec ; 1..31
1163
1164 Unlike arrays, you can also specify a hash dimension via a non-
1165 enumerated type, which then allows all values of that type as keys in
1166 that dimension:
1167
1168 my num %hash{<a b c d e f>; Str}; # 2nd dimension key may be any string
1169 my num %hash{'a'..'f'; Str}; # Same thing
1170
1171 my %rainfall{ Months; Int }; # Keys: Jan..Dec ; any integer
1172
1173 To declare a hash that can take any object as a key rather than
1174 just a string or integer, say something like:
1175
1176 my %hash{Any};
1177 my %hash{*};
1178
1179 A hash of indeterminate dimensionality is:
1180
1181 my %hash{**};
1182
1183 You can limit the keys to objects of particular types:
1184
1185 my Fight %hash{Dog; Squirrel where {!.scared}};
1186
1187 The standard Hash:
1188
1189 my %hash;
1190
1191 is really short for:
1192
1193 my Mu %hash{Str(Any)};
1194
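The practical difference between the default key type and an object key
type shows up when a non-string is used as a key.  A minimal sketch,
assuming an implementation that supports the C<{Any}> declaration; the
names C<%plain> and C<%object> are illustrative:

    my %plain;                        # keys are coerced to Str
    my %object{Any};                  # keys keep their own type

    %plain{42}  = 'answer';
    %object{42} = 'answer';

    say %plain.keys[0].WHAT;          # (Str)
    say %object.keys[0].WHAT;         # (Int)
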
1195 Note that any type used as a key must be intrinsically immutable,
1196 or it has to be able to make a copy that functions as an immutable key,
1197 or it has to have copy-on-write semantics. It is erroneous to change
1198 a key object's value within the hash except by deleting it and reinserting
1199 it.
1200
1201 The order of hash keys is implementation dependent and arbitrary.
1202 Unless C<%hash> is altered in any way, successive calls to C<.keys>,
1203 C<.kv>, C<.pairs>, C<.values>, or C<.iterator> will iterate over the
1204 elements in the same order.
1205
1206 =head1 Autosorted hashes
1207
1208 The default hash iterator is a property called C<.iterator> that can be
1209 user replaced. When the hash itself needs an iterator for C<.pairs>,
1210 C<.keys>, C<.values>, or C<.kv>, it calls C<%hash.iterator()> to
1211 start one. In item context, C<.iterator> returns an iterator object.
1212 In list context, it returns a lazy list fed by the iterator. It must
1213 be possible for a hash to be in more than one iterator at a time,
1214 as long as the iterator state is stored in a lazy list.
1215
The downside to making a hash autosort via the iterator is that you'd
have to store all the keys in sorted order, and re-sort them when the
hash changes.  Alternately, the entire hash could be tied to an ISAM
implementation (not included (XXX or should it be?)).
1220
1221 For multidimensional hashes, the key returned by any hash iterator is
1222 a list of keys, the size of which is the number of declared dimensions
1223 of the hash. [XXX but this seems to imply another lookup to find the
1224 value. Perhaps the associated value can also be bundled in somehow.]
1225
1226 =head1 Autovivification
1227
1228 Autovivification will only happen if the vivifiable path is bound to
1229 a read-write container. Value extraction (that is, binding to a readonly
1230 or copy container) does not autovivify.
1231
1232 Note that assignment is treated the same way as binding to a copy container,
1233 so it does not autovivify its right side either.
1234
1235 Any mention of an expression within a C<Parcel> delays the autovivification
1236 decision to binding time. (Binding to a parcel parameter also defers the
1237 decision.)
1238
1239 This is as opposed to Perl 5, where autovivification could happen
1240 unintentionally, even when the code looks like a non-destructive test:
1241
1242 # This is Perl 5 code
1243 my %hash;
1244 exists $hash{foo}{bar}; # creates $hash{foo} as an empty hash reference
1245
1246 In Perl 6 these read-only operations are indeed non-destructive:
1247
1248 my %hash;
1249 %hash<foo><bar> :exists; # %hash is still empty
1250
1251 But these bindings I<do> autovivify:
1252
1253 my %hash;
1254 my $val := %hash<foo><bar>;
1255
1256 my @array;
1257 my $parcel = \@array[0][0]; # $parcel is a Parcel object - see S02
1258 my :($obj) := $parcel; # @array[0][0] created here
1259
1260 my @array;
1261 foo(@array[0][0]);
1262 sub foo ($obj is rw) {...} # same thing, basically
1263
1264 my %hash;
1265 %hash<foo><bar> = "foo"; # duh
1266
1267 This rule applies to C<Array>, C<Hash>, and any other container type that
1268 chooses to return an autovivifiable type object (see S12) rather than simply
1269 returning C<Failure> when a lookup fails. Note in particular that, since
1270 autovivification is defined in terms of type objects rather than failure,
1271 it still works under "use fatal".
1272
1273 This table solidifies the intuition that an operation pertaining to some data
1274 structure causes the type object to autovivify to such an object:
1275
    operation               autovivifies to
    =========               ===============
    push, unshift, .[]      Array
    .{}                     Hash
1280
1281 In addition to the above data structures autovivifying, C<++> and C<--> will
1282 cause an C<Int> to appear, C<~=> will create a C<Str> etc; but these are
1283 natural consequences of the operators working on C<Failure>, qualitatively
1284 different from autovivifying containers.
1285
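A short sketch of the table and of the operator cases follows.  It
assumes an ordinary, initially empty hash and undeclared-type scalars;
the key names and variables are illustrative only:

    my %h;
    %h<list>.push(1, 2);              # push on a missing element vivifies an Array
    %h<map><k> = 'v';                 # .{} on a missing element vivifies a Hash
    say %h<list> ~~ Array;            # True
    say %h<map>  ~~ Hash;             # True

    my $count;  $count++;             # ++ on an undefined scalar yields an Int
    my $text;   $text ~= 'abc';       # ~= on an undefined scalar yields a Str
    say $count ~~ Int;                # True
    say $text  ~~ Str;                # True
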
1286 The type of the type object returned by a non-successful lookup should
1287 be identical to the type that would be returned for a successful lookup.
1288 The only difference is whether it's officially instantiated (defined) yet.
1289 That is, you cannot distinguish them via C<.WHAT> or C<.HOW>, only via
1290 C<.defined>.
1291
1292 Binding of an autovivifiable type object to a non-writeable container
1293 translates the type object into a similar type object without
1294 its autovivifying closure and puts that new type object into the
1295 container instead (with any pertinent historical diagnostic information
1296 carried over). There is therefore no magical method you can call on
1297 the readonly parameter that can magically autovivify the type object
1298 after the binding. The newly bound variable merely appears to be a
1299 simple uninitialized value. (The original type object retains its
1300 closure in case it is rebound elsewhere to a read-write container.)
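The difference between readonly and read-write binding might be sketched as follows. The subroutine names C<peek> and C<fill> are hypothetical, the syntax is the later Raku spelling, and the commented results describe a current Rakudo, whose internal mechanism need not match the closure-based description given here.

```raku
my %hash;

sub peek($value)       { $value.defined }   # readonly parameter (the default)
sub fill($value is rw) { $value = 42 }      # read-write parameter

peek(%hash<foo>);          # bound readonly: looks like a plain undefined value
say %hash<foo>:exists;     # False

fill(%hash<bar>);          # bound read-write: assignment creates the element
say %hash<bar>:exists;     # True
say %hash<bar>;            # 42
```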
1301
1302 Some implementation notes: Nested autovivifications work by making
1303 nested type objects that depend on each other. In the general case
1304 the containers must produce type objects any time they do not know
1305 how the container will be bound. This includes when interpolated into
1306 any capture that has delayed binding:
1307
1308     \( 1, 2, %hash<foo><bar> )  # must defer
1309     \%hash<foo><bar>            # must defer
1310
1311 In specific situations however, the compiler can know that a value
1312 can only be bound readonly. For instance, C<< infix:<+> >> is
1313 prototyped such that this can never autovivify:
1314
1315     %hash<foo><bar> + 42
1316
1317 In such a case, the container object need not go through the agony
1318 of calculating an autovivifying closure that will never be called.
1319 On the other hand:
1320
1321     %hash<foo><bar> += 42
1322
1323 binds the left side to a mutable container, so it autovivifies.
1324
1325 Assignment doesn't look like binding, but consider that it's really
1326 calling some kind of underlying set method on the container, which
1327 must be mutable in order to change its contents.
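The contrast between the two expressions above can be observed with a short sketch (Raku spelling; the commented results are those of a current Rakudo, and the rvalue addition is expected to emit an uninitialized-value warning on standard error):

```raku
my %hash;

# Rvalue use: the operand is only read, so nothing is created.
my $sum = %hash<foo><bar> + 42;
say %hash<foo>:exists;       # False

# Mutating use: the left side must be writable, so both levels appear.
%hash<foo><bar> += 42;
say %hash<foo>:exists;       # True
say %hash<foo>.WHAT;         # (Hash)
say %hash<foo><bar>;         # 42
```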
1328
1329 =for vim:set expandtab sw=4: