-
Notifications
You must be signed in to change notification settings - Fork 138
/
infant.dev
328 lines (246 loc) · 12.4 KB
/
infant.dev
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
# Copyright: 2001-2004 The Perl Foundation. All Rights Reserved.
# $Id$
=head1 NAME
docs/dev/infant.dev - Infant Mortality
=head1 Overview
We have a garbage collector that needs to detect all dead objects (the
process is called DOD, for Dead Object Detection). Any Parrot
operation that might allocate memory can potentially trigger a DOD run
(if not enough memory is available, we need to free up some unused
objects to give us enough room to allocate our new object. But in
order to know what can be freed up, we need to know what's alive, so
we do a DOD pass.)
The DOD pass begins with a "root set" of objects that are known to be
in use -- the PMC and String registers, the Parrot stack, the global
symbol table, etc. Each of these objects is scanned for references to
other objects, recursively. All objects found during this search are
regarded as live. Everything else is considered dead and so is
available for reclamation.
The question of what should be in the root set is problematic.
Consider the case where you've created a PMC. If you immediately stuff
it into a register, then you're all set. But what if you're putting it
into an aggregate? If the aggregate isn't large enough, you'll need to
resize it, which might allocate memory, which might trigger a DOD run.
And that DOD run may free your freshly-created PMC.
In this case, the solution is simple: resize the array, if necessary,
before creating the PMC. But many situations are not so simple, and
the set of operations which could conceivably require more memory is
vast. This problem is referred to as the "infant mortality" problem.
Peter Gibbs, Mike Lambert, Dan Sugalski and others have considered
various solutions to this problem. Most of those solutions have been
implemented in some form or other. This document is an attempt to
compare and contrast all such proposals.
=head1 Solution 1: Stack Walking
One possible solution is to add the C stack into the root set. This
way, temporary PMCs that have not yet been anchored to the root set
will be found on the stack and treated as live. Actually, the C stack
is insufficient -- you must also scan through all processor registers,
and for some processors there may be a separate backing store for
those registers (eg the Sparc's register windows).
Uninitialized data on the stack and high alignment requirements for
stackframes can fool the DOD system with left over pointers from
previous calls. This is especially a problem for objects which need
early destruction.
+ No slowdown in the common case of no DOD
+ Convenient: the programmer does not need to code differently to
accommodate the garbage collection system.
- Unportable. Different processors need different code for scanning
their registers, stacks, register windows, etc. Also, on some
architectures you must scan every possible offset into the stack to
find all the pointers, while on others you must NOT scan every
possible offset or you'll get a bus error due to a misaligned
access.
- Slow DOD. A full stack walk takes quite a while, and this time
grows with nested interpreter calls, external code's stack usage, etc.
- Complex. The stack walk is necessarily conservative, in that it
must consider every valid pointer on the stack to potentially be a
traceable object. But some of those pointers may be stale, in which
case the memory they point to may have been partially reused for
some other purpose. Everything must operate within certain
constraints that guarantee that no invalid pointers will be
dereferenced and trigger a segmentation fault or bus error.
- Another side effect of the conservative nature of stack walking is
that the memory for these objects may never be returned to the
system, because it is always possible that there will be a stale
pointer lying around on the stack or in a register, and all such
pointers will be dereferenced.
=head1 Solution 2: Neonate flag
There are several possible variants of the neonate flag, but the basic
idea is to set a flag on newly created objects that prevents them from
being collected. At some point, this flag is cleared -- either as the
newborn object is anchored to the root set, or during the first DOD
pass after it was anchored, or explicitly when the object is no longer
needed.
+ Portable
+ Can return memory to the system when unneeded
+ Exact: the state of every object is always known precisely (no more
"this object MIGHT still be reachable")
- The coder must remember to clear the flag before discarding
unanchored objects
- The flag-clearing takes a small amount of time
- For some variants of this scheme, some time is consumed to clear
the flag for the common case of rooted objects
=head2 Subspecies of neonate flags
The variants of the neonate idea all hinge on exactly how and when the
flag is cleared.
=head2 Variant 1: explicit
The flag is always explicitly cleared by the coder.
+ Very simple
+ Fast DOD
- Slow for unanchored temporaries
- Slow for anchored objects
- Easy to forget to clear the flag for unanchored temporaries
- Easy to forget to clear the flag for anchored objects
- longjmp() can bypass the clearing
=head2 Variant 2: explicit for temporaries, cleared during anchoring
The flag is explicitly set for temporaries. All routines which anchor
an object to the root set also clear the flag.
+ Simple
+ Fast DOD
- Slow for unanchored temporaries
- Slow for anchored objects
- Easy to forget to clear the flag for unanchored temporaries
- Forces all anchoring operations to set the flag (so this disallows
direct assignment into a PMC register, for example)
- longjmp() can bypass the clearing
=head2 Variant 3: clear during DOD
The neonate flag is cleared during DOD when an object is encountered
during the recursive root set traversal. (Leopold Toetsch's trick of
setting the live_FLAG during creation is equivalent to this variation,
I think.)
+ Simple
+ Fast DOD (DOD already manipulates the flags)
- If there are multiple DOD runs before the object is anchored or
dies, it will be prematurely freed
=head2 Variant 4: generation count
This is the same as variant 3, except a "generation count" is
maintained in the interpreter so that the neonate flag is only cleared
during a later generation. The generation is incremented only at major
control points such between opcodes, so that there is no chance of
unanchored temporaries.
+ Fast DOD (DOD already manipulates the flags)
- Generation count must be maintained
- Disallows recursive opcode calls (necessary for eg implementing
vtable methods in pasm)
- Can temporarily use more memory (dead objects accumulate during the
current generation)
In order to allow recursive opcode calls, we could increment the
generation count in more places and make sure nothing is left
unanchored at those points, but that would gradually remove all
advantages of this scheme and make it more difficult to call existing
vtable methods (since you never know when they might start running
pasm code.)
=head2 Variant 5: generation stack
Notice that when using a generational count, you really only need to
test whether the current generation is _different_ from an object's
creation generation (which eliminates wraparound problems, too.) So
rather than testing against a single "current" generation, allow a
stack of multiple "current" generations. An object encountered during
DOD will have its neonate flag cleared only if it doesn't match any of
the "current" generation ids. This check can be optimized using a
conservative bit mask as a preliminary test.
+ Still faster DOD than stackwalking, though slower than the
other neonate variants
- Generation count must be maintained
- Generation stack must be maintained
- Disallows longjmp()'ing out of recursive opcode calls
- Can temporarily use more memory (dead objects accumulate during all
current generations)
=head2 Variant 6: Generation based on stack depth
Another similar idea is to use a generational system, with the
"current generation" as a value on the C stack, passed as an extra
argument after the interpreter. If a function creates temporary
objects it calls other functions with an increased generational count.
During a DOD run, any PMC with a generation less than the
current generation is considered live. Any PMC with a generation
greater than the current generation is considered free. This works
through longjmps and recursive run_cores.
+ Simple
+ No stack-walking
+ Works through longjmps and recursive run_cores
+ No explicit setting and clearing of flags
- Needs to change to change the signature of every Parrot function
- Nested temporaries can survive if there is no DOD-run between two
function calls with increased generation count
=head1 Solution 3: Explicit root set augmentation
=head2 Variant 1: Temporarily anchor objects
Provide a mechanism to temporarily
anchor an otherwise unanchored object to the root set. (eg, have an
array of objects associated with the interpreter that are all
considered to be part of the root set.) This has pretty much the same
advantages and disadvantages of explicit neonate flag setting:
+ Simple
+ Fast DOD
- Slow for unanchored temporaries
- Sometimes slow for anchored objects (depending on whether they need
to be temporarily anchored before the final anchoring)
- Easy to forget to remove temporaries from the root set
- Easy to double-anchor objects and forget to remove the temporary
anchoring
- longjmp() can bypass the unanchoring
Many of the same or similar variations also apply: objects could be
automatically removed from the temporary anchoring at generation
boundaries, etc.
=head2 Variant 2: Anchor early, anchor often
First place a new PMC in the root set (e.g. a register), then initialise
it. If that's too cumbersome, disable DOD; if that's suboptimal, use
active anchoring to some root set linked list for temporary PMCs.
+ Simple
+ Fast DOD (No stack-walking)
- DOD might be turned off for a long time (Maybe a recursive run_core
is called)
- Easy to forget to reenable DOD
- longjmp() can bypass reenabling of DOD (this might be hidden in the
wrapper functions as only one value needs to be restored)
=head2 Variant 3: Use a linked list of frames
The signature of every Parrot function is extended with an extra
parameter which is a parameter to a frame structure. All temporary
PMCs needs to put into such a frame structure. The first parameter of
this frame structure is a link to the previously used frame
structure. If a function that can do a DOD run is called a pointer to
the current frame is applied. The linked list of frames represents
always an exact list of the active temporaries on the C-stack.
+ Fast DOD-runs (only the known PMC-pointers are walked)
+ Exact
+ works through recursive run_cores and longjmp()
- signature of every Parrot function changes
- Creation of temporaries is complicated (Need to create a frame
first)
=head1 REFERENCES
=over 4
=item What is neonate?
http://groups.google.com/groups?th=468fc4aebca262f7
Brent Dax's better description of the problem than I have here
=item Mike Lambert proposing Variant 1
http://groups.google.com/groups?th=b2c1aebf64d6ed9a
This also has some macro-heavy proposals that I ignored.
=item Leopold Toetsch proposing Variant 3
http://groups.google.com/groups?th=dc51f11f441bc7d0
Also includes Steve Fink proposing Variant 1
=item Dan Sugalski proposing Variant 3
http://groups.google.com/groups?th=da3012ceb99bab3c
=item Peter Gibbs implementing Variant 4 and getting shot down
http://groups.google.com/groups?th=d2cd475367fc81aa
=item General discussion kicked off by this document
http://groups.google.com/groups?th=66fe6f12e11a5f8d
This thread also includes Benjamin Goldberg Variant 6
=item Dan thinks the stackwalk is unavoidable
http://groups.google.com/groups?th=f7e270609ef93161
=item Infant mortality pain
http://groups.google.com/groups?th=ad045a1baeba0c9a
This is a good thread for illustrating the pain that infant mortality
causes -- in the context of Parrot, I mean.
=item Numbers!
http://groups.google.com/groups?th=d7cd4ca31dcb4414
Gives some benchmark numbers for different approaches.
=item Generational stuff
http://groups.google.com/groups?th=808f38c656a49806
Early discussion that has some stuff I didn't go over here. Mostly
involves generational schemes.
=item Problems with stack-walking
http://groups.google.com/groups?th=f9fc9c6d28eae2b5
This thread also includes Juergen Boemmels Variant 3 of Solution 3
=back
=head1 CHANGES
2002-Dec-30: Initial Version by Steve Fink
2003-Aug-04: Some extra variants added by Juergen Boemmels