# Memory Management in Python

* [Memory Management: From Hardware to Software](#ware)
* [The Default Python Implementation](#def)
* [Garbage Collection](#trash)
* [CPython's Memory Management Strategy](#cpy)
    * [Pools](#pool)
    * [Blocks](#block)
    * [Arenas](#arena)

## Memory Management: From Hardware to Software <a class="anchor" id="ware"></a>

* Process by which applications read and write data
* __Memory Manager__ - determines where to put an application's data in a process called __allocation__
    * When that data is no longer needed, it can be __freed__
* Python goes through many layers of abstraction before any saved objects actually reach the hardware that stores your data
    * Python handles memory management with the Python application that runs in the background of your programs

## The Default Python Implementation <a class="anchor" id="def"></a>

* The default Python implementation is CPython, which is written in C
    * Python source gets compiled down to bytecode, which CPython then interprets on a virtual machine
    * There are other implementations, such as Jython, IronPython, and PyPy
* C does not natively support object-oriented programming, but on an implementation level, everything in Python _is_ an object
    * All of Python's datatypes use a `struct` from C called `PyObject`
        * `struct` (short for structure) is a custom data type that groups together different data types
    * `PyObject` contains only two things:
        1. `ob_refcnt`: reference count
        2. `ob_type`: pointer to another type
    * Reference count is used for garbage collection
    * Object type is just a pointer to another `struct` that describes a Python object (like `dict` or `int`)
* Each object has its own object-specific memory allocator and deallocator that 'knows' how to store and free the object
    * As a shared resource, memory is protected through the GIL
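
The two `PyObject` fields can be observed from Python itself. The sketch below is only an illustration: `describe` and `samples` are hypothetical names, and the counts printed depend on the CPython version (for example, some objects are treated specially in recent releases and report very large counts).

```python
import sys


def describe(obj):
    """Return (type name, reference count) for any Python object."""
    # type(obj) reflects what ob_type points to in CPython.
    # sys.getrefcount reads ob_refcnt; the value is usually higher than
    # expected because the call itself holds a temporary reference.
    return type(obj).__name__, sys.getrefcount(obj)


# Hypothetical sample values, chosen to cover a few built-in types.
samples = [42, "spam", [1, 2, 3], {"a": 1}, len]

for sample in samples:
    print(describe(sample))
```

Every value, including the built-in function `len`, reports both a type and a reference count, which is what "everything is an object" means at the implementation level.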

## Garbage Collection <a class="anchor" id="trash"></a>

* Reference count gets increased for many reasons, some common ones:
    1. Assign it to another variable
    2. Pass the object as an argument
    3. Include the object in a list
* When an object's reference count drops to 0, the object's deallocation function (the reference-counting part of 'garbage collection') frees the memory so that other objects can use it
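
A minimal sketch of these rules on CPython, using `sys.getrefcount` and a weak reference as a probe. `Payload` is a hypothetical placeholder class. Only differences between counts are compared, since absolute values vary by version; the "pass as an argument" case is left out for the same reason, because the count seen inside a call is version-dependent.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder class used only for this demonstration."""


obj = Payload()
baseline = sys.getrefcount(obj)

alias = obj                            # assign it to another variable
after_alias = sys.getrefcount(obj)     # one more than baseline

container = [obj]                      # include the object in a list
after_list = sys.getrefcount(obj)      # one more again

del alias
container.clear()
after_cleanup = sys.getrefcount(obj)   # back to the baseline

probe = weakref.ref(obj)               # a weak reference does not add to the count
del obj                                # last strong reference goes away
print(after_alias - baseline, after_list - baseline, after_cleanup - baseline)
print(probe() is None)                 # on CPython the object is freed right away
```

The immediate cleanup after `del obj` is a CPython behavior tied to reference counting; other implementations may free the object later.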

## CPython's Memory Management Strategy <a class="anchor" id="cpy"></a>

* The OS abstracts the physical memory and creates a virtual memory layer that applications can access
    * Carves out a chunk of memory for a process, in this case, Python, which now owns the dark gray boxes of memory below
    <br>
<img src="resources/malloc.PNG">
* Python then uses a portion of the memory for internal use and non-object memory and another portion for object storage
* CPython has an object allocator that is responsible for allocating memory within the _object memory area_
    * "Fast, special purpose memory allocator for small blocks, to be used on top of a general purpose malloc"
    * This gets called every time a new object needs space allocated or deleted
    * Most objects don't involve too much data at one time, so the allocator is designed to work best with relatively small amounts of data at a time. 
    * Tries not to allocate memory until it's absolutely required
* CPython organizes the memory space in 3 main "containers"
    1. Arenas - The largest chunks of memory (~256 KB). They are aligned on a memory page boundary (i.e., the edge of a fixed-length contiguous chunk of memory that the OS uses)
    2. Pools - 2nd level of organization. All blocks contained in a pool are of the same size class
        * Size class defines a specific block size, given some amount of requested data
    3. Blocks - Smallest piece. Its size is determined by the size of the data request
    
<img src="resources/cpy_book.PNG">
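
The mapping from a request size to a size class can be sketched as a rounding rule. The constants below are assumptions for illustration (a 16-byte alignment and a 512-byte upper bound for "small" requests); the actual values depend on the CPython version and platform, and `size_class` is a hypothetical helper, not a CPython API.

```python
ALIGNMENT = 16                 # assumed block-size granularity in bytes
SMALL_REQUEST_THRESHOLD = 512  # assumed largest request the small-object allocator serves


def size_class(nbytes):
    """Return (size class index, block size) for a small request.

    Returns None when the request is outside the small-object range and
    would be passed on to the general-purpose allocator instead.
    """
    if not 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        return None
    index = (nbytes - 1) // ALIGNMENT       # round up to the next multiple
    return index, (index + 1) * ALIGNMENT


for request in (1, 16, 17, 100, 512, 513):
    print(request, size_class(request))
```

Under these assumptions, requests of 1 to 16 bytes share one block size, 17 to 32 bytes share the next, and so on, which is why every block in a pool can have the same size.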

### Pools <a class="anchor" id="pool"></a>

* Composed of blocks from a single size class
* Each pool maintains a ___doubly linked list___ with other pools of the same size class, for easy location of available space even across pools
* Pool status is tracked via 2 lists:
    * `usedpools` list tracks all the pools that have some space available for data for each size class
        * When a given block size is requested, the algorithm checks this list for available space
    * `freepools` list tracks all the empty pools
        * The allocator will try to use any available blocks in a used pool before starting a new one
* Pools are always in one of 3 states:
    1. Used - contains available blocks for data to be stored
    2. Full - all of the pool's blocks are allocated
    3. Empty - Can be assigned any size class for blocks when needed
* Pools contain a pointer to their "free blocks"
<img src="resources/blocks.PNG">
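
The interplay between `usedpools`, `freepools`, and the three pool states can be sketched with a toy model. Everything here is hypothetical (`Pool`, `ToyAllocator`, a capacity of 4 blocks per pool, plain Python lists instead of linked lists); it only mirrors the lookup order described above and is not how CPython stores pools.

```python
from collections import defaultdict


class Pool:
    def __init__(self, capacity=4):
        self.capacity = capacity   # hypothetical number of blocks per pool
        self.size_class = None     # assigned when the pool is first used
        self.allocated = 0

    @property
    def state(self):
        if self.allocated == 0:
            return "empty"
        if self.allocated == self.capacity:
            return "full"
        return "used"


class ToyAllocator:
    def __init__(self):
        self.usedpools = defaultdict(list)  # size class -> pools with space left
        self.freepools = []                 # empty pools, usable for any size class

    def allocate(self, size_class):
        if self.usedpools[size_class]:
            # Prefer a partially filled pool of the right size class.
            pool = self.usedpools[size_class][0]
        else:
            # Otherwise take an empty pool (or create one, standing in for
            # carving a new pool out of an arena) and assign its size class.
            pool = self.freepools.pop() if self.freepools else Pool()
            pool.size_class = size_class
            self.usedpools[size_class].append(pool)
        pool.allocated += 1
        if pool.state == "full":
            self.usedpools[size_class].remove(pool)  # full pools are not searched
        return pool


allocator = ToyAllocator()
pools = [allocator.allocate(32) for _ in range(5)]
print([pool.state for pool in pools])
```

The first four requests land in the same pool, which then becomes full and leaves `usedpools`; the fifth request starts a new pool.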

### Blocks <a class="anchor" id="block"></a>

* Blocks can be in 1 of 3 states
    1. Untouched - a portion of memory that has not been allocated
    2. Free - a portion of memory that was allocated but later made "free" by CPython and no longer contains relevant data. It is not yet released to the OS
    3. Allocated - a portion of memory that actually contains relevant data
* `freeblock` pointer points to a singly linked list of free blocks of memory (i.e., available places to put data)
    * If more memory needs to be allocated, the allocator will get some untouched blocks, but it always starts with previously freed blocks
    * As blocks are made free, they get added to the front of the `freeblock` list
    * The free list is not necessarily contiguous in memory, so it could look more like this than the first image above

<img src="resources/blocks2.PNG">    
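
A toy model of the `freeblock` list, assuming blocks are identified by their index within the pool. `ToyPool` and its attributes are hypothetical; in particular, the `next_free` dictionary stands in for the "next" pointer, which a real allocator would keep inside the free block itself.

```python
class ToyPool:
    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.next_untouched = 0  # blocks from this index on were never handed out
        self.freeblock = None    # head of the singly linked list of free blocks
        self.next_free = {}      # free block index -> next free block index

    def allocate(self):
        # Previously freed blocks are reused first.
        if self.freeblock is not None:
            block = self.freeblock
            self.freeblock = self.next_free.pop(block)
            return block
        # Only then is an untouched block taken.
        if self.next_untouched < self.n_blocks:
            block = self.next_untouched
            self.next_untouched += 1
            return block
        raise MemoryError("pool is full")

    def free(self, block):
        # Freed blocks go to the front of the list.
        self.next_free[block] = self.freeblock
        self.freeblock = block


pool = ToyPool(4)
first = [pool.allocate() for _ in range(3)]   # untouched blocks 0, 1, 2
pool.free(0)
pool.free(2)                                  # free list is now 2 -> 0
reused = [pool.allocate() for _ in range(3)]  # 2 and 0 are reused, then untouched 3
print(first, reused)
```

Because freed blocks are pushed to the front, the most recently freed block is reused first, and the list order follows the order of frees rather than the layout in memory.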

### Arenas <a class="anchor" id="arena"></a>

* Instead of explicit states like pools and blocks, arenas are organized into a doubly linked list called `usable_arenas`
    * Sorted ascending by the number of free pools available
* Arenas are the only things that can truly be freed to the OS
<img src="resources/arenas.PNG">
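
The effect of the ascending sort can be sketched as follows. `ToyArena` and `pick_arena` are hypothetical names, and a sorted Python list stands in for the linked list. The idea illustrated is that new pools are taken from the arena with the fewest free pools, which gives nearly empty arenas a chance to drain completely and be released.

```python
from dataclasses import dataclass


@dataclass
class ToyArena:
    name: str
    nfreepools: int  # number of pools in this arena that are currently free


def pick_arena(arenas):
    """Return the usable arena with the fewest free pools, or None."""
    usable = sorted(
        (arena for arena in arenas if arena.nfreepools > 0),
        key=lambda arena: arena.nfreepools,
    )
    return usable[0] if usable else None


arenas = [ToyArena("a", 10), ToyArena("b", 0), ToyArena("c", 3)]
print(pick_arena(arenas).name)
```

Arena `b` has no free pools and is skipped, and `c` is chosen over `a` because it is closer to full.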