# Reducing Memory Footprint for Python Objects

### Zaur Shibzukhov

In [1]:
import sys
from pprint import pprint
from collections import namedtuple
from recordclass import recordclass, mutabletuple

## What is the problem

* Python
* Limited memory
* Large number of running objects

> How to reduce memory footprint of the objects?

## Dictionary

In [2]:
ob = {'x':1, 'y':2, 'z':3}
a = ob['x']

* Universal
* Powerful
* Intuitive

Since 3.6: `Compact Dict` (inspired by `PyPy`)

## Dictionary size

In [3]:
print('sizeof:', sys.getsizeof(ob))

sizeof: 232



### Large memory footprint

* 1 000 000 instances &rarr; 232 MB
* 10 000 000 instances &rarr; 2.32 GB
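
The estimates above are the per-object size multiplied by the number of instances. A minimal sketch of that arithmetic follows; the helper name `estimate_total_bytes` is an assumption made for illustration, and `sys.getsizeof` is shallow, so the keys and values themselves are not counted.

```python
import sys

def estimate_total_bytes(sample, count):
    """Shallow estimate: size of one object times the number of instances."""
    return sys.getsizeof(sample) * count

ob = {'x': 1, 'y': 2, 'z': 3}
for count in (1_000_000, 10_000_000):
    total = estimate_total_bytes(ob, count)
    # Exact figures depend on the Python version and platform.
    print(f"{count:>10,} instances -> {total / 1e6:,.0f} MB")
```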

## Regular Class

In [4]:
class Point:
    #
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
        
ob = Point(1,2,3)
a = ob.x

## Instance structure

`-------------------------` <br/>
`PyGC_Head        16 bytes` <br/>
`-------------------------` <br/>
`PyObject_HEAD    16 bytes` <br/>
`__weakref__      8  bytes` <br/>
`__dict__         8  bytes` <br/>
`-------------------------` <br/>
**`TOTAL:           48 bytes`**


## Regular Class: Memory Footprint

In [5]:
print('sizeof:', sys.getsizeof(ob), sys.getsizeof(ob.__dict__))

sizeof: 48 104


> An instance dict has a smaller memory footprint than a regular dict. <br/>
PEP 412: Key-Sharing Dictionary, Python 3.3+

### Still large memory footprint

* 1 000 000 instances &rarr; 152 MB
* 10 000 000 instances &rarr; 1.52 GB
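
Such estimates can be cross-checked by measuring real allocations. The sketch below uses the standard `tracemalloc` module; the helper `bytes_per_instance` and the sample count are assumptions made for illustration. The result also includes the list slot that holds each instance, and it varies between Python versions.

```python
import tracemalloc

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

def bytes_per_instance(factory, n=100_000):
    """Average traced bytes per object created by factory(1, 2, 3)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objs = [factory(1, 2, 3) for _ in range(n)]  # keep the objects alive
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / n

print('bytes per instance:', bytes_per_instance(Point))
```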

## Regular Class + `__slots__`

In [6]:
class Point:
    __slots__ = 'x', 'y', 'z'
        
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

In [7]:
ob = Point(1,2,3)
print('sizeof:', sys.getsizeof(ob))
print('has __dict__?', hasattr(ob, '__dict__'))
print('has __weakref__?', hasattr(ob, '__weakref__'))

sizeof: 56
has __dict__? False
has __weakref__? False
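
To make the trade-off concrete, here is a small sketch comparing the two class styles side by side. The names `DictPoint` and `SlotPoint` are introduced only for this comparison. It also shows the price of `__slots__`: attributes that are not declared cannot be added later.

```python
import sys

class DictPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlotPoint:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

d = DictPoint(1, 2, 3)
s = SlotPoint(1, 2, 3)

# A regular instance pays for the object itself plus its __dict__.
dict_total = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
slot_total = sys.getsizeof(s)
print('regular:', dict_total, 'slots:', slot_total)

# Without __dict__ there is nowhere to store an undeclared attribute.
try:
    s.w = 4
    rejected = False
except AttributeError:
    rejected = True
print('new attribute rejected?', rejected)
```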


## Instance structure

`-------------------------` <br/>
`PyGC_Head        16 bytes` <br/>
`-------------------------` <br/>
`PyObject_HEAD    16 bytes` <br/>
`x                8  bytes` <br/>
`y                8  bytes` <br/>
`z                8  bytes` <br/>
`-------------------------` <br/>
**`TOTAL:           56 bytes`**


## Behind the scenes

In [8]:
pprint(Point.__dict__)

mappingproxy({'__doc__': None,
              '__init__': <function Point.__init__ at 0x7f4c40293160>,
              '__module__': '__main__',
              '__slots__': ('x', 'y', 'z'),
              'x': <member 'x' of 'Point' objects>,
              'y': <member 'y' of 'Point' objects>,
              'z': <member 'z' of 'Point' objects>})


The `x`, `y`, `z` entries in `Point.__dict__` are member descriptors: attribute access on an instance goes through them to read and write the slots.
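
A short sketch of what these descriptors do, calling the descriptor protocol by hand. This is only an illustration of the mechanism; in normal code you would simply write `ob.x`.

```python
import types

class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

ob = Point(1, 2, 3)
descr = Point.__dict__['x']

print(type(descr).__name__)                          # member_descriptor
print(isinstance(descr, types.MemberDescriptorType))

value = descr.__get__(ob, Point)   # equivalent to ob.x
descr.__set__(ob, 10)              # equivalent to ob.x = 10
print(value, ob.x)
```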

## Tuples

> Tuples are like records, but without field names

In [9]:
ob = (1,2,3)
x = ob[0]

> They are compact

In [10]:
print("sizeof:", sys.getsizeof(ob))

sizeof: 64
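
A tuple stores one pointer per item after a fixed header, so its size grows by a constant step. The sketch below checks this on the running interpreter; absolute sizes depend on the Python version and platform.

```python
import sys

# Sizes of tuples with 0..4 items.
sizes = [sys.getsizeof(tuple(range(n))) for n in range(5)]
steps = [b - a for a, b in zip(sizes, sizes[1:])]

print('sizes:', sizes)
print('bytes added per item:', steps)
print('tuple.__itemsize__:', tuple.__itemsize__)
```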


## Named tuple

> A subclass of tuple with descriptors for access to each item by its name

In [11]:
Point = namedtuple("Point", "x y z")
ob = Point(1,2,3)
x = ob.x
y = ob[1]

> A named tuple is an **immutable** record-like object
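
A brief sketch of what immutability means in practice: fields cannot be reassigned, and "changing" a field means building a new instance with `_replace`. It also checks that the field names add no per-instance cost compared with a plain tuple.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", "x y z")
ob = Point(1, 2, 3)

# Fields are read-only.
try:
    ob.x = 10
    assigned = True
except AttributeError:
    assigned = False
print('assignment allowed?', assigned)

# _replace returns a new instance; the original is unchanged.
moved = ob._replace(x=10)
print(ob, moved)

# The names live on the class, not on the instance.
same_size = sys.getsizeof(ob) == sys.getsizeof((1, 2, 3))
print('same size as tuple?', same_size)
```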

## Instance structure

`-------------------------` <br/>
`PyGC_Head        16 bytes` <br/>
`-------------------------` <br/>
`PyObject_HEAD    16 bytes` <br/>
`ob_size          8  bytes` <br/>
`x                8  bytes` <br/>
`y                8  bytes` <br/>
`z                8  bytes` <br/>
`-------------------------` <br/>
**`TOTAL:           64 bytes`**


## Mutable tuple

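* It is natural to implement a mutable named tuple on top of a ***mutable tuple***.
* Python has no built-in mutable tuple type.
* The ***Recordclass*** library introduces the `mutabletuple` type.
  * `mutabletuple` stores its items the same way `tuple` does (`ob_size` followed by the item pointers). In the run below it is 16 bytes smaller than `tuple`, which matches the size of `PyGC_Head`.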


## Instance structure

`-------------------------` <br/>
`PyObject_HEAD    16 bytes` <br/>
`ob_size          8  bytes` <br/>
`[0]              8  bytes` <br/>
`[1]              8  bytes` <br/>
`[2]              8  bytes` <br/>
`-------------------------` <br/>
**`TOTAL:           48 bytes`**


In [12]:
mt = mutabletuple(1,2,3)
print(mt[0], mt[1:])
print('Sizeof>', 'mutabletuple:', sys.getsizeof(mt), 'tuple:', sys.getsizeof((1,2,3)))

1 mutabletuple(2, 3)
Sizeof> mutabletuple: 48 tuple: 64


In [13]:
mt[0] = 100
mt[-1] = 200
mt

mutabletuple(100, 2, 200)

## Recordclass factory function

* The `recordclass` factory function generates a subclass of `mutabletuple` with descriptors for accessing fields.
* Its API is almost identical to that of named tuple.
* By default, a `recordclass`-based instance takes up less memory than a `namedtuple`-based instance or a class instance with `__slots__`.
  
  * By default, only the **reference counting** mechanism is supported. **Cyclic garbage collection** is not supported by default, but it can be enabled.
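
To see why the distinction matters, the sketch below uses only the standard library (not `recordclass`) to show what reference counting alone cannot do: two objects that refer to each other stay alive until the cyclic collector runs. The `Node` class and `make_cycle` helper are hypothetical names for this illustration. Objects that never form such cycles do not need this machinery, which is what makes dropping `PyGC_Head` safe for them.

```python
import gc
import weakref

class Node:
    """Plain container object, used only to build a reference cycle."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a      # a -> b -> a
    return weakref.ref(a)        # lets us observe whether a is still alive

was_enabled = gc.isenabled()
gc.disable()                     # leave only reference counting active
try:
    probe = make_cycle()
    alive_with_refcount_only = probe() is not None
    gc.collect()                 # run the cyclic collector by hand
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print('alive with reference counting only:', alive_with_refcount_only)
print('alive after gc.collect():', alive_after_collect)
```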

In [14]:
Point = recordclass("Point", "x y z")
ob = Point(1,2,3)
print(ob)
print("size:", sys.getsizeof(ob))

Point(x=1, y=2, z=3)
size: 48


> The memory footprint is reduced by the size of `PyGC_Head`.

## Instance structure

`-------------------------` <br/>
`PyObject_HEAD    16 bytes` <br/>
`ob_size          8  bytes` <br/>
`x                8  bytes` <br/>
`y                8  bytes` <br/>
`z                8  bytes` <br/>
`-------------------------` <br/>
**`TOTAL:           48 bytes`**
