# Garbage Collection with Isolated Strong-Reference Cycles

Garbage Collection (GC) is a form of automatic memory management. Automatic Reference Counting (ARC), in which the compiler inserts the reference-counting operations, can be viewed as a stripped-down form of it. A major practical difference for the programmer is how each handles isolated strong-reference cycles (ISRC's).

## Steps Between Source Code and Execution

Before we explore this, it's worth clarifying some of the steps from source code (the code a programmer writes) to execution.

### Source Code

Source code is code written in a human-readable language, usually a high-level language like Python or C. (Sometimes people refer to assembly code as "source code", and indeed assembly is a human-readable language, but it's quite low-level.)

### Object Code

Source code gets compiled into object code. Object code is either bytecode designed to run on a virtual machine (VM), or machine code designed to be loaded by the operating system and run natively on the physical machine. For example, the Python interpreter contains a VM and the Java runtime is itself a VM, whereas Swift code and C code compile directly to machine code. (An assembly-code compiler is called an "assembler", and assembly-code compilation is called "assembly". "Disassembly" is the reverse translation, from machine code to human-readable assembly code.)
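As a rough illustration of the source-to-bytecode step, the sketch below compiles a one-line Python program and lists its bytecode using the standard `compile()` built-in and the `dis` module. The one-line program and the `<example>` filename are made up for the demo, and the exact instructions printed vary between Python versions.

```python
import dis

source = "total = price * quantity"  # hypothetical one-line program

# Source code -> code object holding the bytecode (nothing is executed here).
code_obj = compile(source, "<example>", "exec")

print(type(code_obj.co_code))  # the raw bytecode is stored as a bytes object
dis.dis(code_obj)              # human-readable listing, analogous to disassembly
```

The listing printed by `dis.dis()` plays the same role for Python bytecode that disassembly plays for machine code.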

## Full-Featured GC

Full-featured GC is not vulnerable to ISRC's. An example is "tracing" GC as implemented by the virtual machines for languages like Python (an interpreted language whose compiled object code is Python bytecode) or Java (a compiled language whose compiled object code is Java bytecode). (CPython actually uses reference counting as its primary mechanism, and supplements it with a cycle-detecting collector.) A simple form of tracing GC periodically runs graph search on the graph of known references, starting with root references (for example, global variables or local variables on the call stack) and traversing the graph from reference to reference.

At the end of the search, any object that was not reached is by definition not in usable scope (since the only way for an object to be in usable scope is for its memory address to be a root reference, or for its memory address to be pointed to by some object which is itself reachable from a root reference), and its memory can be freed.
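The reachability argument above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real collector is implemented: the "heap" is a plain dictionary mapping made-up object names to the names they reference, and `find_unreachable` is a hypothetical helper.

```python
def find_unreachable(heap, roots):
    """Return the set of object names not reachable from any root.

    `heap` maps each object name to the list of names it references.
    """
    reached = set()
    stack = list(roots)
    while stack:  # depth-first graph search starting from the roots
        name = stack.pop()
        if name in reached:
            continue
        reached.add(name)
        stack.extend(heap[name])
    return set(heap) - reached


heap = {
    "config": ["logger"],  # reachable from a root
    "logger": [],
    "alice": ["bob"],      # alice and bob reference only each other
    "bob": ["alice"],
}
garbage = find_unreachable(heap, roots=["config"])
print(sorted(garbage))  # ['alice', 'bob']
```

Note that `alice` and `bob` are reported as garbage even though each still has one incoming reference: the search never reaches them from a root, which is exactly why tracing GC is immune to ISRC's.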

The tradeoff is that the GC process (or thread) stays alive in the background at run-time, consuming memory and compute cycles.

### Generational GC

Both Python and Java improve on this tradeoff a bit by implementing "generational" GC, which avoids the overhead of a full trace on every run by grouping objects into "generations". New objects are born into generation 0, and if they survive a round of GC, they get promoted to generation 1. The GC collects generation 0 most often and moves on to older generations less frequently: for example, only if the generation-0 round does not free up enough space, or (as in CPython) once a per-generation counter exceeds a threshold. CPython's GC, for example, has traditionally kept track of three distinct generations.

The idea is that most objects are "temporary", and go out of scope very quickly. Think of a line like `heads := RNG(seed=42).random() > 0.80`, which models the flip of a biased coin: the `RNG` instance is seeded and used to generate a pseudorandom draw from the standard uniform distribution, then immediately goes out of scope. Most objects can therefore be garbage-collected soon after they are created, so focusing GC on young objects is likely to yield a high "hit rate" of successful garbage collections, freeing up enough memory to proceed without having to sweep too many objects.

On the other hand, an object that survives a round of GC is probably "permanent" (think of a global variable, or an object assigned as a property of another object that is itself referenced by a global variable). Promoting such an object to the next generation keeps track of the fact that attempting to garbage-collect it in the future is probably a waste of time, to be avoided unless absolutely necessary.
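For the curious, CPython exposes some of this bookkeeping through the standard `gc` module. The sketch below only inspects the collector and triggers a youngest-generation collection; the numbers printed depend on the interpreter version and on what the program has allocated so far, so none are shown here.

```python
import gc

print(gc.get_threshold())  # per-generation thresholds that trigger a collection
print(gc.get_count())      # current per-generation counters

found = gc.collect(0)      # collect only the youngest generation
print(found)               # number of unreachable objects found in that pass
```

Calling `gc.collect()` with no argument requests a full collection instead, which is what the toy example later in this post does.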

#### Actual Efficiency Gains

Much as the actual efficiency gains of clever caching schemes depend critically on a program's actual data read/write patterns, the actual efficiency gains of a generational GC scheme depend critically on a program's actual allocation/deallocation patterns.

For example, generational GC on a program that allocates many new objects, allocates even more new objects, allows the original batch to go out of scope, allocates further new objects, allows the second batch to go out of scope, and so on, could end up _slower_ than non-generational GC, since there is operational overhead associated with keeping track of the generations. In this program, that overhead could be wasted, because it is always the youngest objects that remain in scope and the older objects that are allowed to go out of scope, so the GC could end up always needlessly sweeping the young un-collectable objects before moving on to the old collectable objects. In this case, a more efficient GC scheme might have been FIFO, whereby the oldest generations are swept first.

## ARC

On the other hand, ARC as implemented by e.g. the Swift compiler (`swiftc` on macOS) is a relatively space- and time-efficient form of GC, because the analysis happens at compile time: the compiler simply injects explicit memory management instructions (retain and release operations that increment and decrement reference counts) into the compiled machine code, so no separate collector has to run in the background. (Arguably the only thing more time-efficient would be to decline to manage memory altogether; the ensuing memory leaks would of course be inelegant and totally unscalable.) These instructions do essentially the same job as the memory management calls that a C programmer would have written by hand, but they are much more convenient to the programmer because they are automatic.

For example, unlike C programmers, in most cases Swift programmers don't need to remember to manually write a `free()` call in their source code when they're done with some object. The Swift compiler injects code that keeps track of the object's reference count (the number of references to the object's address i.e. its memory location). When that count falls to zero at run-time, the object's memory is no longer reachable (at least, not in any well-defined way: of course, a programmer could insist on invoking undefined behavior by attempting to access the memory at that address anyway), and the injected code performs the equivalent of `free()` at that point, automatically.
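To make the bookkeeping concrete, here is a toy model of retain/release written in Python. It is only a sketch: `ManagedObject`, `retain()`, and `release()` are hypothetical names invented for this illustration, not Swift APIs, and the calls that a compiler would inject automatically are written out by hand.

```python
class ManagedObject:
    """Toy stand-in for a reference-counted heap object (hypothetical)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.refcount = 1  # the creating reference
        log.append(f"alloc {name}")

    def retain(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:  # last reference gone: free the memory
            self.log.append(f"free {self.name}")


def demo():
    log = []
    obj = ManagedObject("buffer", log)  # refcount == 1
    obj.retain()   # a second reference is stored somewhere: refcount == 2
    obj.release()  # the first reference goes away: refcount == 1
    obj.release()  # the last reference goes away: refcount == 0, freed
    return log


print(demo())  # ['alloc buffer', 'free buffer']
```

The point of ARC is that the programmer never writes the `retain()` and `release()` lines: the compiler works out where they belong.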

The tradeoff is that ARC needs the programmer to "nudge" it in some cases.

### ISRC's

For example, Swift programmers assigning a closure that captures `self` as a property of `self` are familiar with the need to specify that the reference is weakly captured (you'll see them write `[weak self]`, which is a capture list). If they didn't do this, then even when the `self` instance permanently left usable scope, the instance (which refers to the closure as a property) and the closure (which has "closed over" or "captured" the instance) would form an ISRC, keeping each other alive for no reason. Since ARC has no run-time cycle detector, this would be a memory leak for as long as the process runs.
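The same closure-captures-`self` situation can be reproduced in Python, which makes for a rough analogy to `[weak self]`. The sketch below assumes CPython (where reference counting frees objects immediately) and uses a hypothetical `Button` class; the cycle collector is switched off during the check so that only reference counting is at work.

```python
import gc
import weakref


class Button:
    """Hypothetical class that stores a callback referring back to itself."""

    def __init__(self, label, weak):
        self.label = label
        if weak:
            self_ref = weakref.ref(self)  # weak: does not keep `self` alive

            def on_click():
                button = self_ref()  # None once the Button has been freed
                return None if button is None else f"clicked {button.label}"
        else:
            def on_click():
                return f"clicked {self.label}"  # strong capture: forms a cycle
        self.on_click = on_click


def freed_by_refcount_alone(weak):
    """Return True if dropping the last name frees the Button immediately."""
    gc.disable()  # keep the cycle collector from running mid-demo
    try:
        button = Button("OK", weak=weak)
        probe = weakref.ref(button)  # observe the object without owning it
        del button
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left by the strong-capture case


print(freed_by_refcount_alone(weak=True))   # True
print(freed_by_refcount_alone(weak=False))  # False
```

With the weak capture, the object and its closure form a chain rather than a cycle, so reference counting alone is enough to free them.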

## Toy Example

Below, I show a toy example of GC at work detecting and handling an ISRC in Python. [1]

[1] Prior to Python 3.4, which implemented [PEP 442](https://www.python.org/dev/peps/pep-0442/), the toy example below would not have worked, because we define a custom `__del__()` destructor for the `Foo` type to help announce what's going on under the hood. Although it could detect them, the CPython GC would not garbage-collect an ISRC of objects with custom `__del__()` destructors, because it could not decide in what order it should destroy the objects. Python programmers had to use `weakref` for at least one reference in a cycle of objects with custom destructors. This is the same as Swift ARC's solution: By replacing one of the _strong_ references with a _weak_ reference, you turn an isolated strong-reference _cycle_ into an isolated strong-reference _chain_, at which juncture it's clear that you should start at the beginning of the chain (the object with no incoming references) and work your way out. PEP 442 got around this issue by simply "copping out" and making the order of destruction undefined, which means that the interpreter makes no promises and can choose whichever arbitrary order is most convenient.

In [1]:
import gc

In [2]:
class Foo:

    def __init__(self, name, other=None):
        self.name = name
        self.other = other

    def __repr__(self):
        return f"{self.name}Foo"

    def __del__(self):
        print(f"{repr(self)} is being freed!")

If we construct an object, assign it to a named variable, then immediately `del` that variable, the object will be destroyed (in CPython this happens right away, through reference counting, without waiting for the cycle collector). Note that `del` does NOT destroy the object referenced by a variable name: It simply takes the variable name out of scope and decrements the referenced object's refcount by one.

In [3]:
eve_var = Foo(name="Eve")

# now EveFoo's refcount is one

del eve_var  # take `eve_var` out of scope and decrement EveFoo's refcount by one

# now EveFoo's refcount is zero, so she can be garbage-collected

EveFoo is being freed!


This works even if we don't `del` the variable, but simply reassign it:

In [4]:
reve_var = Foo(name="ReincarnatedEve")

# now ReincarnatedEveFoo's refcount is one

reve_var = None  # reassign `reve_var` to point to `None` and decrement ReincarnatedEveFoo's refcount by one

# now ReincarnatedEveFoo's refcount is zero, so she can be garbage-collected

ReincarnatedEveFoo is being freed!


This will _not_ work if we store the object somewhere else first, then `del` the variable..

In [5]:
rreve_var = Foo(name="ReincarnatedReincarnatedEve")

# now ReincarnatedReincarnatedEveFoo's refcount is one

anonymous_list = [rreve_var,]

# now ReincarnatedReincarnatedEveFoo's refcount is two

del rreve_var  # take `rreve_var` out of scope and decrement ReincarnatedReincarnatedEveFoo's refcount by one

# now ReincarnatedReincarnatedEveFoo's refcount is still one, so she cannot be garbage-collected

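.. but it will work if we then also take that "somewhere else" out of scope! (In CPython, reference counting frees the object as soon as its refcount hits zero, so the manual `gc.collect()` call below is not strictly needed here. It only becomes essential for cycles, as in the next section, where it saves us from having to wait around for an automatic sweep.)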

In [6]:
del anonymous_list  # decrement ReincarnatedReincarnatedEveFoo's refcount by one

# now ReincarnatedReincarnatedEveFoo's refcount is zero, so she can be garbage-collected

_ = gc.collect()  # manually prompt the GC to run a full sweep, suppressing its output

ReincarnatedReincarnatedEveFoo is being freed!
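The refcount comments in the cells above can be checked with the standard `sys.getrefcount()` function. This is a small sketch using a hypothetical placeholder class `Thing`; because the absolute number reported by `sys.getrefcount()` typically includes a temporary reference held by the call itself, the sketch only looks at differences.

```python
import sys


class Thing:
    """Hypothetical placeholder class, used only to have something to count."""


thing = Thing()
baseline = sys.getrefcount(thing)  # absolute value is implementation-dependent

holder = [thing]                   # storing the object in a list adds a reference
after_store = sys.getrefcount(thing)

del holder                         # dropping the list removes that reference again
after_del = sys.getrefcount(thing)

print(after_store - baseline, after_del - baseline)  # 1 0
```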


### ISRC

Objects that are unreachable will be garbage-collected, even if they are involved in an ISRC:

In [7]:
alice_var = Foo(name="Alice")
bob_var = Foo(name="Bob")

# now AliceFoo's and BobFoo's refcounts are each one

alice_var.other = bob_var
bob_var.other = alice_var

# now AliceFoo's and BobFoo's refcounts are each two

del alice_var, bob_var

# now AliceFoo's and BobFoo's refcounts are each still one, so they cannot be garbage-collected.. or can they??!

_ = gc.collect()  # manually prompt the GC to run a full sweep, suppressing its output

AliceFoo is being freed!
BobFoo is being freed!
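The cells above discard the return value of `gc.collect()`, which is the number of unreachable objects the collector found. The sketch below, assuming CPython and using a hypothetical `Node` class with no `__del__()` method, uses that return value together with a weak reference to confirm that it is the cycle collector, not reference counting, that frees the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical minimal class; deliberately has no __del__ method."""

    def __init__(self):
        self.other = None


gc.disable()             # make sure no automatic collection interferes
gc.collect()             # start from a clean slate

a, b = Node(), Node()
a.other, b.other = b, a  # build the cycle
probe = weakref.ref(a)   # observe `a` without keeping it alive

del a, b
alive_before = probe() is not None  # refcounts are stuck at one: nothing freed

found = gc.collect()     # number of unreachable objects the collector found
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after, found >= 2)  # True False True
```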
