4eb1c19 [svn r55] Notes about memory management strategy.
faassen authored
Memory management
=================

There can be two types of nodes:

* those connected to an existing tree

* those unconnected. These may be the top node of a tree

Nodes consist of a C-level libxml2 node, Node for short, and
optionally a Python-level proxy node, Proxy. Zero, one or more Proxies can
exist for a single Node.
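
The Node/Proxy relationship can be pictured with a minimal pure-Python sketch. The `Node` and `Proxy` classes below are hypothetical stand-ins written for illustration only; the real Node is a C-level libxml2 structure, not a Python object.

```python
class Node:
    """Hypothetical stand-in for a C-level libxml2 node."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)


class Proxy:
    """Hypothetical Python-level handle; several may wrap the same Node."""

    def __init__(self, node):
        self.node = node


root = Node("a")
child = Node("b")
root.append(child)

first = Proxy(child)
second = Proxy(child)  # a second Proxy for the same Node
# root has no Proxy at all, which is equally valid
```

Note that the reference only goes from Proxy to Node: the Node itself does not keep its Proxies alive.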

Proxies are garbage collected automatically by Python. Nodes are not
garbage collected at all. Instead, explicit mechanisms exist to clear
a Node and the tree it may be the top of.

A Node can be safely freed when:

* no Proxy is connected to this Node

* no Proxy can be created for this Node

A Proxy cannot be created for a Node when:

* no Proxy exists for any node that is connected to that Node

This is the case when:

* the Node is in a tree that has no Proxy connected to any of its nodes.

This means that the whole tree in such a condition can be freed.
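
The condition above can be written down as a predicate over a whole tree. This is only a sketch: the `Node` and `Proxy` classes, the weak `proxies` set and the helper names are all assumptions made for illustration, and the notes do not say that Nodes track their Proxies this way.

```python
import weakref


class Node:
    """Hypothetical stand-in for a C-level node."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.children = []
        self.proxies = weakref.WeakSet()  # live Proxies, held weakly

    def append(self, child):
        child.parent = self
        self.children.append(child)


class Proxy:
    def __init__(self, node):
        self.node = node
        node.proxies.add(self)


def top_of(node):
    """Walk up to the top node of the tree that contains node."""
    while node.parent is not None:
        node = node.parent
    return node


def iter_tree(node):
    """Yield node and all of its descendants."""
    yield node
    for child in node.children:
        yield from iter_tree(child)


def tree_can_be_freed(node):
    """True when no node in the whole tree has a live Proxy."""
    return not any(len(n.proxies) for n in iter_tree(top_of(node)))


a, b, c = Node("a"), Node("b"), Node("c")
a.append(b)
a.append(c)

proxy_c = Proxy(c)
before = tree_can_be_freed(b)  # a Proxy exists on a sibling of b
del proxy_c
after = tree_can_be_freed(b)   # no Proxy left anywhere in the tree
```

The check is asked about b but answered for the whole tree, which is why a Proxy on the sibling c is enough to block freeing.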

Detecting whether a Node is in a tree that has no Proxies connected to
it can be done by relying on Python's garbage collection
algorithm. Each Proxy can have a reference to the Proxy that points to
the top of the tree. In case of a document tree, this reference is to
the Document Proxy. When no more references exist in the system to the
top Proxy, this means no more Proxies exist that point to the Node
tree the top Proxy is the top of. If this Node tree is unconnected,
i.e. it is not a subtree, this means that tree can be safely garbage
collected.
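
A rough sketch of this idea, assuming CPython's reference counting: every Proxy keeps a strong reference to the top Proxy, and the tree is freed when the top Proxy itself is collected. The classes, the `free_tree` function and the `freed` list are hypothetical; `weakref.finalize` is used here only as one convenient way to observe the collection.

```python
import weakref

freed = []  # tags of tree tops whose trees were "freed" in this sketch


class Node:
    """Hypothetical stand-in for a C-level node."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)


def free_tree(top_node):
    """Stand-in for the explicit C-level call that frees a whole tree."""
    freed.append(top_node.tag)


class Proxy:
    def __init__(self, node, top=None):
        self.node = node
        self.top = top  # strong reference to the top Proxy, None for the top
        if top is None:
            # Calls free_tree(node) once this top Proxy is garbage collected.
            weakref.finalize(self, free_tree, node)


a, b = Node("a"), Node("b")
a.append(b)

proxy_a = Proxy(a)
proxy_b = Proxy(b, top=proxy_a)

del proxy_a  # proxy_b still keeps the top Proxy alive
freed_while_b_alive = list(freed)
del proxy_b  # the last reference to the top Proxy is gone
freed_after = list(freed)
```

The tree is freed only after the last Proxy that refers to the top Proxy has gone away, without any tree walking.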

A special case exists for document references. Each Proxy will always
have a reference to the Document Proxy, as any Node will have such a
reference to the Document Node. This means that a Document Node can
only be garbage collected when no more Proxies at all exist
which refer to the Document. This is a separate system from the
top-Node references, even though the top Node in many cases will be
the Document. This is because there is no way to get to a node that is
not connected to the Document tree from a Document Proxy.

This approach requires a system that can keep track of the top of the
tree in any case. Usually this is simple: when a Proxy gets connected,
its tree top becomes the tree top of whatever node it is connected
to.
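
The simple case can be sketched as follows. Only Proxies are modelled here and the underlying Node tree is left out; the `Proxy` class, its `top` attribute and `append` method are hypothetical names chosen for illustration.

```python
class Proxy:
    """Hypothetical Proxy that tracks the Proxy at the top of its tree."""

    def __init__(self, tag):
        self.tag = tag
        self.top = None  # None means this Proxy is itself the tree top

    def tree_top(self):
        return self if self.top is None else self.top

    def append(self, child):
        # The child adopts the tree top of the Proxy it is connected to.
        child.top = self.tree_top()


a, b, d = Proxy("a"), Proxy("b"), Proxy("d")
a.append(b)
b.append(d)
```

After both calls, b and d refer to a as their tree top, while a remains its own top.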

Sometimes this is more difficult: a Proxy may exist pointing to a node
in a subtree that just got connected. The top reference of that Proxy
cannot be updated. This is a problem in the following case:

       a
     b   c        h
    d e f g      i j
                 k

Now imagine we have a Proxy to k, K, and a Proxy to i, I. They both
have a pointer to Proxy H, the top of their tree.

Now imagine i gets moved under g through Proxy I. Proxy I will have an
updated pointer to Proxy A. However, Proxy K cannot be updated and still
points to H, from which it is now in fact disconnected.

Proxy H cannot be removed now until Proxy A is removed. In addition,
Proxy A has a refcount that is too low because Proxy K doesn't point
to it but should.
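
The stale pointer can be reproduced in a small sketch. It reads the diagram as two separate trees topped by a and h, with k below i; that reading, like the `Node` and `Proxy` classes and the helper names, is an assumption made for illustration.

```python
class Node:
    """Hypothetical stand-in for a C-level node."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.children = []

    def append(self, child):
        # Moving a node detaches it from its previous parent first.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


class Proxy:
    def __init__(self, node, top=None):
        self.node = node
        self.top = top  # Proxy for the tree top, None if this is the top


def node_top(node):
    """Walk up the Node tree to find the real top."""
    while node.parent is not None:
        node = node.parent
    return node


nodes = {tag: Node(tag) for tag in "abcdefghijk"}
for parent, children in [("a", "bc"), ("b", "de"), ("c", "fg"),
                         ("h", "ij"), ("i", "k")]:
    for tag in children:
        nodes[parent].append(nodes[tag])

proxy_a = Proxy(nodes["a"])
proxy_h = Proxy(nodes["h"])
proxy_i = Proxy(nodes["i"], top=proxy_h)
proxy_k = Proxy(nodes["k"], top=proxy_h)

# Move i under g through proxy_i: only proxy_i can be updated.
nodes["g"].append(proxy_i.node)
proxy_i.top = proxy_a

real_top = node_top(proxy_k.node)  # the Node tree says a
recorded_top = proxy_k.top.node    # proxy_k still says h
```

After the move, the Node tree and the top recorded by K disagree, and nothing in the move operation had a way to reach K to correct it.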

Another strategy involves having a reference count on the underlying
nodes, one per proxy. A node can only be freed if there is no
descendant-or-self that has a refcount higher than 0. When no more
Python references to a node exist, it will check for refcounts first.
The drawback of this is potentially heavy tree-walking each time a proxy
is removed.
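
A minimal sketch of this second strategy follows. The class and function names are hypothetical, `release()` is an explicit stand-in for a Proxy being garbage collected, and the sketch additionally assumes that the check is run from the top of the tree, since a connected Node is not freed on its own. The `visited` list is there only to make the tree-walking cost visible.

```python
class Node:
    """Hypothetical stand-in for a C-level node with a proxy refcount."""

    def __init__(self, tag):
        self.tag = tag
        self.parent = None
        self.children = []
        self.refcount = 0  # number of live Proxies for this node

    def append(self, child):
        child.parent = self
        self.children.append(child)


def top_of(node):
    while node.parent is not None:
        node = node.parent
    return node


def can_free(node, visited):
    """True if no descendant-or-self of node has a refcount above 0."""
    visited.append(node.tag)
    if node.refcount > 0:
        return False
    return all(can_free(child, visited) for child in node.children)


class Proxy:
    def __init__(self, node):
        self.node = node
        node.refcount += 1

    def release(self):
        """Drop this Proxy and report whether the tree may now be freed."""
        self.node.refcount -= 1
        visited = []
        freeable = can_free(top_of(self.node), visited)
        return freeable, visited


a, b, c, d, e = (Node(tag) for tag in "abcde")
a.append(b)
a.append(c)
b.append(d)
b.append(e)

proxy_d = Proxy(d)
proxy_c = Proxy(c)

first = proxy_d.release()   # c still has a Proxy
second = proxy_c.release()  # no Proxy left anywhere
```

In this small tree both releases end up visiting all five nodes, which is the tree-walking cost mentioned above.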