# Recordclass library

**Recordclass** is an [MIT-licensed](http://opensource.org/licenses/MIT) Python library.
It started as a "proof of concept" for a fast "mutable"
alternative to `namedtuple` (see this [question](https://stackoverflow.com/questions/29290359/existence-of-mutable-named-tuple-in-python) on Stack Overflow).
It implements a factory function `recordclass` (a variant of `collections.namedtuple`) that creates record-like classes with the same API as `collections.namedtuple`. It later evolved to provide types that are more memory-efficient, faster and more flexible.

Later the **recordclass** library started to provide record-like classes that do not participate in the *cyclic garbage collection* (CGC) mechanism and rely only on *reference counting* for garbage collection.
Instances of such classes have no `PyGC_Head` prefix in memory, which decreases their size.
This makes sense when objects must be as small as possible, provided that they will never be part of reference cycles in the application.
For example, when an object represents a record whose fields hold simple values by convention (`int`, `float`, `str`, `date`/`time`/`datetime`, `timedelta`, etc.).
Other examples are non-recursive data structures in which all leaf elements represent simple values.
Of course, in Python nothing prevents you from "shooting yourself in the foot" by creating a reference cycle in the script or application code.
But in many cases this can be avoided, provided that the developer understands
what they are doing and uses such classes with care. Another option is to use static analyzers together with type annotations.
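
The risk described above can be reproduced with plain Python objects. The sketch below uses only the standard library (`gc`, `weakref`); the `Node` class and the function name are illustrative and are not part of `recordclass`. With the cyclic collector switched off, two objects that refer to each other stay alive after the last external reference is dropped. A cycle built from objects that support only reference counting would behave the same way.

```python
import gc
import weakref


class Node:
    """Ordinary class; its instances are tracked by the cyclic collector."""

    def __init__(self):
        self.other = None


def cycle_needs_cyclic_gc():
    """Return (alive_after_del, alive_after_collect) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # from here on only reference counting reclaims objects
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # build the reference cycle
        probe = weakref.ref(a)       # lets us observe whether `a` still exists
        del a, b                     # refcounts stay above zero because of the cycle
        alive_after_del = probe() is not None
        gc.collect()                 # an explicit cyclic collection frees the pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(cycle_needs_cyclic_gc())
```

On CPython this is expected to print `(True, False)`: the pair survives reference counting and is freed only by the cyclic collector.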

**First**, the `recordclass` library provides the base class `dataobject`. The type of `dataobject` is the special metaclass `datatype`. It controls the creation of subclasses of `dataobject`, which do not participate in CGC by default. As a result, instances of such classes need less memory. Their memory footprint is similar to that of instances of classes with `__slots__`; the difference is equal to the size of `PyGC_Head`. The metaclass also tunes the `basicsize` of the instances, creates descriptors for the fields, and so on. All subclasses of `dataobject` created with a class statement support an `attrs`/`dataclasses`-like API.
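
The size of the `PyGC_Head` prefix can be observed from Python. `sys.getsizeof` adds the garbage collector overhead to the result of `__sizeof__()` for objects managed by the collector, so the difference between the two values shows the header size. This is a minimal sketch using only the standard library; `SlotPoint` and `gc_overhead` are illustrative names, not part of `recordclass`.

```python
import sys


class SlotPoint:
    """Plain class with __slots__; CPython gives it cyclic-GC support."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def gc_overhead(obj):
    """Bytes that sys.getsizeof adds on top of the object's own __sizeof__()."""
    return sys.getsizeof(obj) - obj.__sizeof__()


print(gc_overhead(SlotPoint(1, 2)))  # size of the GC header on this build
print(gc_overhead(1.5))              # floats are not GC-managed: no header
```

The same helper could be applied to a `dataobject` instance, where the expectation stated above is an overhead of zero.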

**Second**, it provides a factory function `make_dataclass` for creating subclasses of `dataobject` with the specified field names. These subclasses support an `attrs`/`dataclasses`-like API.

**Third**, it provides the class `lightlist`, a list-like *light* container intended to save memory.

The main repository for `recordclass` is on [bitbucket](https://bitbucket.org/intellimath/recordclass).

There is also a simple [example](http://nbviewer.ipython.org/urls/bitbucket.org/intellimath/recordclass/raw/master/examples/what_is_recordclass.ipynb).

## Quick start

### Installation

#### Installation from directory with sources

Install:

    $ python setup.py install

Run tests:

    $ python test_all.py

#### Installation from PyPI

Install:

    $ pip install recordclass

Run tests:

    $ python -c "from recordclass.test import *; test_all()"


### Quick start with recordclass

First import the required names:

    >>> import sys
    >>> from recordclass import recordclass

Example with `recordclass`:

    >>> Point = recordclass('Point', 'x y')
    >>> p = Point(1,2)
    >>> print(p)
    Point(1, 2)
    >>> print(p.x, p.y)
    1 2
    >>> p.x, p.y = 10, 20
    >>> print(p)
    Point(10, 20)
    >>> sys.getsizeof(p) # the output below is for 64-bit CPython 3.9
    40
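
For contrast, the standard `collections.namedtuple` rejects the in-place assignment used above. The following sketch relies only on the standard library; `try_assign` is an illustrative helper, not a `recordclass` function.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")


def try_assign(obj, name, value):
    """Return True if the attribute assignment succeeded, False otherwise."""
    try:
        setattr(obj, name, value)
    except AttributeError:
        return False
    return True


p = Point(1, 2)
print(try_assign(p, "x", 10))   # namedtuple fields are read-only
q = p._replace(x=10)            # the namedtuple way: build a new object
print(p, q, q is p)
```

With `namedtuple`, every update allocates a new tuple, which is the cost that a mutable record type avoids.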

Example with `RecordClass` and type hints:

    >>> from recordclass import RecordClass

    class Point(RecordClass):
        x: int
        y: int

    >>> print(Point.__annotations__)
    {'x': <class 'int'>, 'y': <class 'int'>}
    >>> p = Point(1, 2)
    >>> print(p)
    Point(1, 2)
    >>> print(p.x, p.y)
    1 2
    >>> p.x, p.y = 10, 20
    >>> print(p)
    Point(10, 20)
    

By default, instances of `recordclass`-based classes do not participate in CGC and are therefore smaller than `namedtuple`-based ones. To use them in scenarios with reference cycles, pass the option `gc=True` (the default is `gc=False`):

    >>> Node = recordclass('Node', 'root, children', gc=True)
    
or use the decorator `@enable_gc` for `RecordClass`-based classes:

    @recordclass.enable_gc
    class Node(RecordClass):
        root: 'Node'
        children: list
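
Whether an object is handled by the cyclic collector can be checked with the standard `gc.is_tracked` function. The sketch below uses a hypothetical `TreeNode` class and built-in values rather than `recordclass` itself. It is an assumption that the same check can be used to compare classes created with and without `gc=True`.

```python
import gc


class TreeNode:
    """Hypothetical container type that may end up in a reference cycle."""

    def __init__(self, root=None):
        self.root = root
        self.children = []


node = TreeNode()
node.children.append(TreeNode(root=node))  # the child points back: a cycle

print(gc.is_tracked(node))           # instances of ordinary classes are tracked
print(gc.is_tracked(node.children))  # so are lists
print(gc.is_tracked(3.14))           # atomic values are not
```

A tree whose nodes refer back to their root, as here, is the kind of structure that needs GC support.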


### Quick start with dataobject

First import the required names:

    >>> import sys
    >>> from recordclass import dataobject, asdict

    class Point(dataobject):
        x: int
        y: int

    >>> print(Point.__annotations__)
    {'x': <class 'int'>, 'y': <class 'int'>}

    >>> p = Point(1,2)
    >>> print(p)
    Point(x=1, y=2)

    >>> sys.getsizeof(p) # the output below is for 64-bit Python
    32
    >>> p.__sizeof__() == sys.getsizeof(p) # no additional space for CGC support
    True

    >>> p.x, p.y = 10, 20
    >>> print(p)
    Point(x=10, y=20)
    >>> for x in p: print(x)
    10
    20
    >>> asdict(p)
    {'x': 10, 'y': 20}
    >>> tuple(p)
    (10, 20)

Another way is the factory function `make_dataclass`:

    >>> from recordclass import make_dataclass

    >>> Point = make_dataclass("Point", [("x",int), ("y",int)])

Default values are also supported:

    class CPoint(dataobject):
        x: int
        y: int
        color: str = 'white'

or

    >>> CPoint = make_dataclass("CPoint", [("x",int), ("y",int), ("color",str)], defaults=("white",))

    >>> p = CPoint(1,2)
    >>> print(p)
    CPoint(x=1, y=2, color='white')
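
For readers who know the standard library, the `dataclasses` module offers a factory of the same name with a similar shape. The sketch below is shown only for comparison and does not use `recordclass`. Note that the standard-library factory takes a default as a third element of the field tuple rather than through a `defaults` argument.

```python
from dataclasses import asdict, field, make_dataclass

# Standard-library counterpart, shown only to compare the API shape.
CPoint = make_dataclass(
    "CPoint",
    [("x", int), ("y", int), ("color", str, field(default="white"))],
)

p = CPoint(1, 2)
print(p)
print(asdict(p))
p.x = 10          # dataclass instances are mutable by default
print(p)
```

Instances created this way carry a per-instance `__dict__` unless slots are requested, which is the memory overhead that `dataobject` is designed to avoid.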


## Memory footprint

The following table explains the memory footprints of `recordclass`-based and `dataobject`-based objects:

| namedtuple    |  class with \_\_slots\_\_  |  recordclass   | dataobject |
| ------------- | ----------------- | -------------- | ------------- |
|   $g+b+s+n*p$     |     $g+b+n*p$         |  $b+s+n*p$       |     $b+n*p$     |

where:

 * b = sizeof(`PyObject`)
 * s = sizeof(`Py_ssize_t`)
 * n = number of items
 * p = sizeof(`PyObject*`)
 * g = sizeof(`PyGC_Head`)
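
The formulas can be evaluated on the running interpreter. The sketch below takes the formulas from the table; how each term is measured (`object.__basicsize__` for `b`, `struct.calcsize` for `s` and `p`, and the `sys.getsizeof` overhead of an empty tuple for `g`) is an assumption made for illustration, and the function names are hypothetical.

```python
import struct
import sys


def footprint_terms():
    """Measure the terms of the size formulas on the running interpreter."""
    empty = ()
    return {
        "b": object.__basicsize__,        # sizeof(PyObject)
        "s": struct.calcsize("n"),        # sizeof(Py_ssize_t)
        "p": struct.calcsize("P"),        # sizeof(PyObject*)
        # GC header: what sys.getsizeof adds for a GC-managed object
        "g": sys.getsizeof(empty) - empty.__sizeof__(),
    }


def predicted_sizes(n):
    """Evaluate the four formulas from the table for n fields."""
    t = footprint_terms()
    b, s, p, g = t["b"], t["s"], t["p"], t["g"]
    return {
        "namedtuple": g + b + s + n * p,
        "class with __slots__": g + b + n * p,
        "recordclass": b + s + n * p,
        "dataobject": b + n * p,
    }


for kind, size in predicted_sizes(2).items():
    print(f"{kind}: {size} bytes")
```

On a typical 64-bit CPython build, `b` is 16 and `s` and `p` are 8, so for two fields the last two formulas give 40 and 32 bytes, which agrees with the `sys.getsizeof` results in the quick-start examples.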

This is useful when you are absolutely sure that reference cycles cannot occur.
For example, when all field values are instances of atomic types.
As a result, the size of an instance decreases by 24-32 bytes (for CPython 3.4-3.7) and by 16 bytes for CPython 3.8:

    import sys
    from recordclass import recordclass, make_dataclass

    # Plain class with __slots__, used as the baseline
    class S:
        __slots__ = ('a','b','c')
        def __init__(self, a, b, c):
            self.a = a
            self.b = b
            self.c = c

    R_gc = recordclass('R_gc', 'a b c', gc=True)
    R_nogc = recordclass('R_nogc', 'a b c')
    DO = make_dataclass('R_do', 'a b c')

    s = S(1,2,3)
    r_gc = R_gc(1,2,3)
    r_nogc = R_nogc(1,2,3)
    do = DO(1,2,3)
    for o in (s, r_gc, r_nogc, do):
        print(sys.getsizeof(o), end=' ')
    print()
    56 64 48 32

Here is also a table with some performance measurements:

|         | namedtuple    |  class with \_\_slots\_\_  |  recordclass   | dataobject  |
| ------- | ------------- | ----------------- | -------------- | ------------- |
|   `new`   |    320±6 ns  |     411±8 ns    |   406±8 ns   |    113±1 ns  |
| `getattr` |   35.6±0.7 ns |    20.8±0.4 ns   |   26.8±1.8 ns |   27.7±2.3 ns |
| `setattr` |               |     24.2±0.3 ns  |   30.9±1.1 ns |   31.5±1.8 ns |
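
The source does not say how these figures were obtained. One possible way to take comparable measurements is the standard `timeit` module, sketched below for the two standard-library columns only. The helper names and the repetition counts are assumptions; the `recordclass` and `dataobject` columns could be measured the same way once the library is installed.

```python
import timeit
from collections import namedtuple

PointNT = namedtuple("PointNT", "x y")


class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def ns_per_call(stmt, namespace, number=100_000, repeat=3):
    """Best-of-`repeat` time for one execution of `stmt`, in nanoseconds."""
    timer = timeit.Timer(stmt, globals=namespace)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def benchmark():
    """Time creation and attribute access for both record types."""
    ns = {"PointNT": PointNT, "PointSlots": PointSlots,
          "nt": PointNT(1, 2), "sl": PointSlots(1, 2)}
    return {
        "namedtuple new": ns_per_call("PointNT(1, 2)", ns),
        "slots new": ns_per_call("PointSlots(1, 2)", ns),
        "namedtuple getattr": ns_per_call("nt.x", ns),
        "slots getattr": ns_per_call("sl.x", ns),
        "slots setattr": ns_per_call("sl.x = 1", ns),
    }


for name, value in benchmark().items():
    print(f"{name}: {value:.1f} ns")
```

There is no `setattr` measurement for `namedtuple` because its fields cannot be assigned, which is also why that cell is empty in the table. Absolute numbers depend on the machine and interpreter version, so only the relative ordering is meaningful.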
