# Copyright (C) 2001-2014, Parrot Foundation.

=head1 NAME

docs/parrotbyte.pod - The Parrot Bytecode (PBC) Format

=head1 DESCRIPTION

This document describes the Parrot bytecode format.

=head1 THE HEADER

The 18-byte header consists of:

    0                                           7
    +----------+----------+----------+----------+
    |  Parrot Magic = fe:50:42:43:0d:0a:1a:0a   |
    +----------+----------+----------+----------+

which reads as C<"<fe>PBC\r\n<1a>\n">. The B<Magic> is stored as an
endian-less sequence of eight bytes, C<unsigned char magic[8]>. The loader
uses the byteorder header field to convert the rest of the file to the host
format. More specifically, ALL words (non-bytes) in the bytecode file are
stored in the native order of the machine that wrote the file, unless
otherwise specified.

    8          9          10         11
    +----------+----------+----------+
    | Wordsize | Byteorder| FloatType|
    +----------+----------+----------+

The B<Wordsize> (or C<opcode_t> size) must be 4 (32-bit) or 8 (64-bit). The
bytecode loader is responsible for transforming the file into the VM's native
wordsize on the fly. For performance, a utility F<pbc_dump> is provided to
convert PBCs on disk if they cannot be recompiled. See F<src/pbc_dump.c> for
more information.

B<Byteorder> currently supports two values: 0 (little endian) and
1 (big endian).

B<FloatType> 0 is an IEEE 754 8-byte double. FloatType 1 is an
i386 little-endian 12-byte long double. FloatType 2 is a native
16-byte long double: either a powerpc double double, an intel 12-byte
padded long double, or a true IEEE 16-byte long double.

    11         12         13         14         15         16
    +----------+----------+----------+----------+----------+
    |  Major   |  Minor   |  Patch   | BC Major | BC Minor |
    +----------+----------+----------+----------+----------+

B<Major>, B<Minor>, B<Patch> for the version of Parrot that wrote
the bytecode file.

B<BC Major> and B<BC Minor> are for the internal bytecode version.

    16         17         18         19         20         21         22
    +----------+----------+----------+----------+----------+----------+
    | UUID type| UUID size|                *UUID data                 |
    +----------+----------+----------+----------+----------+----------+

After the UUID type and size comes the UUID data pointer.

    22*
    +----------+----------+----------+----------+
    |              dir_format (1)               |
    +----------+----------+----------+----------+
    |                padding (0)                |
    +----------+----------+----------+----------+

B<dir_format> has length C<opcode_t> and value 1 for PBC FORMAT 1,
defined in F<packfile.h>.

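As an illustration only, the fixed part of the header described above could
be decoded along the following lines. This is a Python sketch, not Parrot's
loader: the names C<PbcHeader> and C<parse_pbc_header> are hypothetical, and
the field order is taken directly from the diagrams in this section. The UUID
data and C<dir_format> that follow the 18 fixed bytes are not read.

```python
import struct
from typing import NamedTuple

# The eight magic bytes given in the header diagram.
PBC_MAGIC = bytes([0xFE, 0x50, 0x42, 0x43, 0x0D, 0x0A, 0x1A, 0x0A])
FIXED_HEADER_BYTES = 18  # 8 magic bytes + 10 single-byte fields


class PbcHeader(NamedTuple):
    """Hypothetical container for the ten one-byte header fields."""
    wordsize: int
    byteorder: int
    floattype: int
    major: int
    minor: int
    patch: int
    bc_major: int
    bc_minor: int
    uuid_type: int
    uuid_size: int


def parse_pbc_header(data: bytes) -> PbcHeader:
    """Decode bytes 0..17 of a PBC file (hypothetical helper)."""
    if len(data) < FIXED_HEADER_BYTES:
        raise ValueError("truncated PBC header")
    if data[:8] != PBC_MAGIC:
        raise ValueError("not a PBC file: bad magic")
    # Single bytes have no endianness, so no byte-order prefix is needed.
    header = PbcHeader(*struct.unpack_from("10B", data, 8))
    if header.wordsize not in (4, 8):
        raise ValueError("wordsize must be 4 or 8")
    if header.byteorder not in (0, 1):
        raise ValueError("byteorder must be 0 (little) or 1 (big)")
    return header
```

A caller would pass the first 18 bytes of a file and then use the returned
C<wordsize> and C<byteorder> to decode every following word.
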
=head1 PBC FORMAT 1

All segments are aligned at a 16-byte boundary. All segments share a common
header and are kept in directories, which are themselves PBC segments. All
offsets and sizes are in native opcodes of the machine that produced the PBC.

After the PBC header, the first PBC directory follows at offset 24*,
starting with a:

=head2 Format 1 Segment Header

    +----------+----------+----------+----------+
    | total size in opcodes including this size |
    +----------+----------+----------+----------+
    |           internal type (itype)           |
    +----------+----------+----------+----------+
    |             internal id (id)              |
    +----------+----------+----------+----------+
    |         size of opcodes following         |
    +----------+----------+----------+----------+

The B<size> entry may be followed by a stream of B<size> opcodes (starting
16-byte aligned). For a size of zero there is no opcode stream at all.

After this common segment header there can be segment-specific data determined
by the segment type. A segment without additional data, like the bytecode
segment, is a B<default> segment. No additional routines are required to unpack
such a segment.

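The four-word common header can be read as sketched below. This is a minimal
Python illustration under stated assumptions, not the code in F<packfile.c>:
the function name C<read_segment_header> is hypothetical, C<opcode_t> is
assumed to be a signed integer of C<wordsize> bytes, and the caller is
expected to supply the C<wordsize> and C<byteorder> values from the PBC
header together with a byte offset.

```python
import struct
from typing import Dict


def read_segment_header(data: bytes, byte_offset: int,
                        wordsize: int, byteorder: int) -> Dict[str, int]:
    """Decode the four opcode_t words of the common segment header."""
    if wordsize not in (4, 8):
        raise ValueError("wordsize must be 4 or 8")
    endian = "<" if byteorder == 0 else ">"  # 0 = little endian, 1 = big endian
    word = "i" if wordsize == 4 else "q"     # assumption: opcode_t is signed
    total_size, itype, seg_id, op_count = struct.unpack_from(
        endian + "4" + word, data, byte_offset)
    return {
        "total_size": total_size,  # in opcodes, including this word
        "itype": itype,            # internal type
        "id": seg_id,              # internal id
        "op_count": op_count,      # size of the opcode stream that follows
    }
```

Because sizes and offsets are counted in opcodes, a byte offset is obtained
by multiplying an opcode offset by C<wordsize>.
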
=head2 Directory Segment

    +----------+----------+----------+----------+
    |        number of directory entries        |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |               segment type                |
    +----------+----------+----------+----------+
    |             segment name ...              |
    |             ... 0x00 padding              |
    +----------+----------+----------+----------+
    |              segment offset               |
    +----------+----------+----------+----------+
    |             segment op_count              |
    +----------+----------+----------+----------+

The B<op_count> at B<offset> must match the segment's B<op_count> and is used
to verify the PBC's integrity.

Currently these segment types are defined:

=over 4

=item Directory segment

=item Unknown segment (conforms to a default segment)

=item Fixup segment

=item Constant table segment

=item Bytecode segment

=item Debug segment

=back

=head2 Segment Names

This is not determined yet.

=head2 Unknown (default) and bytecode segments

These have only the common segment header and the opcode stream appended. The
opcode stream is an C<mmap()>ed memory region, if your operating system
supports this (and if the PBC was read from a disk file). You must therefore
treat this data as read-only.

=head2 Fixup segment

    +----------+----------+----------+----------+
    |          number of fixup entries          |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |              fixup type (0)               |
    +----------+----------+----------+----------+
    |              label name ...               |
    |             ... 0x00 padding              |
    +----------+----------+----------+----------+
    |               label offset                |
    +----------+----------+----------+----------+

The fixup type for constant or ascii strings has a label symbol that is the
name of the "sub" and an offset into the constant table, referencing a
Sub, Closure or Coroutine PMC.

=head2 Debug Segment

The opcode stream will contain one line number per bytecode instruction. No
information as to what file that line is from will be stored in this stream.

The header will start with a count of the source file to bytecode position
mappings that it contains.

    0 (relative)
    +----------+----------+----------+----------+
    |   number of source => bytecode mappings   |
    +----------+----------+----------+----------+

A source to bytecode position mapping simply states that the bytecode that
starts at the specified offset and runs up to the offset in the next mapping
(or, if there is no next mapping, up to the end of the bytecode) has its
source in location X.

A mapping always starts with the offset in the bytecode, followed by the
type of the mapping.

    0 (relative)
    +----------+----------+----------+----------+
    |              bytecode offset              |
    +----------+----------+----------+----------+

    4
    +----------+----------+----------+----------+
    |               mapping type                |
    +----------+----------+----------+----------+

There are 3 mapping types.

Type B<0> means there is no source available for the bytecode starting at the
given offset. No further data is stored with this type of mapping; the next
mapping continues immediately after it.

Type B<1> means the source is available in a file. An index into the constants
table follows, which will point to a string containing the filename.

Type B<2> means the source is available in a source segment. Another integer
follows, which will specify which source file in the source segment to use.

Note that the offsets into the bytecode must appear in ascending order;
a mapping for offset 100 cannot follow a mapping for offset 200, for
example.

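The mapping rules above can be sketched as a small decoder. The Python below
is an illustration only and works on words that have already been converted
to host integers; the function name C<decode_debug_mappings> and the
C<MAPPING_*> constants are hypothetical names, not identifiers from the
Parrot sources. Whether two mappings may share the same offset is not stated
above, so this sketch only rejects offsets that go backwards.

```python
from typing import List, Optional, Tuple

# Hypothetical names for the three mapping types described above.
MAPPING_NONE = 0        # no source available, no payload word
MAPPING_FILENAME = 1    # payload: constant table index of the filename
MAPPING_SOURCE_SEG = 2  # payload: source file number in the source segment

Mapping = Tuple[int, int, Optional[int]]  # (bytecode offset, type, payload)


def decode_debug_mappings(words: List[int]) -> List[Mapping]:
    """Decode the mapping header: a count followed by that many mappings."""
    count = words[0]
    pos = 1
    last_offset = -1
    mappings: List[Mapping] = []
    for _ in range(count):
        offset, mapping_type = words[pos], words[pos + 1]
        pos += 2
        if offset < last_offset:
            raise ValueError("mapping offsets must not go backwards")
        if mapping_type == MAPPING_NONE:
            payload = None  # the next mapping follows immediately
        elif mapping_type in (MAPPING_FILENAME, MAPPING_SOURCE_SEG):
            payload = words[pos]
            pos += 1
        else:
            raise ValueError("unknown mapping type %d" % mapping_type)
        mappings.append((offset, mapping_type, payload))
        last_offset = offset
    return mappings
```

Each returned tuple covers the bytecode from its offset up to the offset of
the next tuple, or to the end of the bytecode for the last one.
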
=head2 CONSTANT TABLE SEGMENT

    0 (relative)
    +----------+----------+----------+----------+
    |            Constant Count (N)             |
    +----------+----------+----------+----------+

For each constant:

    +----------+----------+----------+----------+
    |             Constant Type (T)             |
    +----------+----------+----------+----------+
    |                                           |
    |        S bytes of constant content        |
    :       appropriate for representing        :
    |             a value of type T             |
    |                                           |
    +----------+----------+----------+----------+


=head2 CONSTANTS

For integer constants:

    << integer constants are represented as manifest constants in
       the bytecode stream currently, limiting them to 32 bit values. >>

For number constants (S is constant, and is equal to C<sizeof(FLOATVAL)>):

    +----------+----------+----------+----------+
    |                                           |
    |             S' bytes of Data              |
    |                                           |
    +----------+----------+----------+----------+

where

    S' = S + ((S % 4) ? (4 - (S % 4)) : 0)

That is, S' is S rounded up to the next multiple of 4. If S' E<gt> S, then
the extra bytes are filled with zeros.


For string constants (S varies, and is the size of the particular string):

    4, 4 + (16 + S'0), 4 + (16 + S'0) + (16 + S'1)
    +----------+----------+----------+----------+
    |                   Flags                   |
    +----------+----------+----------+----------+
    |                 Encoding                  |
    +----------+----------+----------+----------+
    |                   Type                    |
    +----------+----------+----------+----------+
    |                 Size (S)                  |
    +----------+----------+----------+----------+
    |                                           |
    :             S' bytes of Data              :
    |                                           |
    +----------+----------+----------+----------+

where

    S' = S + ((S % 4) ? (4 - (S % 4)) : 0)

If S' E<gt> S, then the extra bytes are filled with zeros.


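The padding rule for S' used in both layouts above can be written out as a
short function. This Python sketch is only a worked form of the formula; the
name C<padded_size> and its C<alignment> parameter are hypothetical.

```python
def padded_size(size: int, alignment: int = 4) -> int:
    """Return S': size rounded up to the next multiple of alignment."""
    remainder = size % alignment
    # Add padding only when size is not already a multiple of alignment.
    return size + (alignment - remainder if remainder else 0)
```

For example, a 5-byte string occupies 8 bytes of data (3 zero bytes of
padding), while an 8-byte C<FLOATVAL> needs no padding at all.
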
=head2 BYTE CODE SEGMENT

The pieces that can be found in the bytecode segment are as follows:

    +----------+----------+----------+----------+
    |              Operation Code               |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |             Register Argument             |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |   Integer Argument (Manifest Constant)    |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |  String Argument (Constant Table Index)   |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |  Number Argument (Constant Table Index)   |
    +----------+----------+----------+----------+

    +----------+----------+----------+----------+
    |    PMC Argument (Constant Table Index)    |
    +----------+----------+----------+----------+

The number of arguments and the type of each argument can usually be determined
by consulting Parrot::Opcode, or programmatically by obtaining the C<op_info_t>
structure for the opcode in question.

There are currently 4 opcodes that can take a variable number of arguments:
C<set_args>, C<get_params>, C<set_returns> and C<get_results>. These ops always
have one required argument, which is a PMC constant. Calling the C<elements>
VTABLE function on this PMC will give the number of extra variable arguments
that follow.


=head2 SOURCE CODE SEGMENT

Currently there are no utilities that use this segment, even though it is
mentioned in some of the early Parrot documents.

=head1 SEE ALSO

F<packfile.c>, F<packfile.h>, F<packdump.c>, F<pf/*.c>, and the
B<pbc_dump> utility F<pbc_dump.c>.

=cut