<img src="../../images/banners/python-advanced.png" width="600"/>

# <img src="../../images/logos/python.png" width="23"/> Records, Structs, and Data Transfer Objects in Python Overview

## <img src="../../images/logos/toc.png" width="20"/> Table of Contents 
* [Transfer Objects](#transfer_objects)
    * [`dict` Built-in](#dict_built_in)
    * [`tuple` Built-in](#tuple_built_in)
    * [Writing a Custom Class](#writing_a_custom_class)
    * [`collections.namedtuple` Class](#namedtuple)
    * [`types.SimpleNamespace` Class](#simple_namespace)
* [Which type should I use for data objects in Python?](#which_type_to_use)

---

A record is a collection of fields, possibly of different data types, typically in a fixed number and sequence. The fields of a record may also be called members, particularly in object-oriented programming; fields may also be called elements, though this risks confusion with the elements of a collection.

For example, a date could be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. A personnel record might contain a name, a salary, and a rank. A Circle record might contain a center and a radius—in this instance, the center itself might be represented as a point record containing x and y coordinates.

Records differ from arrays in that their number of fields is typically fixed, each field has a name, and each field may have a different type.
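To make the distinction concrete, here is a minimal sketch of the date, personnel, and circle records described above, written as plain dictionaries (one of the options covered below). The field names and values are illustrative assumptions, not part of any particular API.

```python
# A date record: a fixed set of named fields with different types.
date = {'year': 2024, 'month': 'March', 'day': 14}

# A personnel record: name, salary, and rank.
employee = {'name': 'Ada', 'salary': 85000.0, 'rank': 'Senior'}

# A circle record whose center field is itself a point record.
circle = {'center': {'x': 0.0, 'y': 0.0}, 'radius': 2.5}

# An array, by contrast: elements are accessed by position, usually
# share one type, and the number of elements can change freely.
readings = [3.2, 4.1, 5.0]
readings.append(6.3)

print(date['month'])          # fields are looked up by name
print(circle['center']['x'])  # nested record access
print(readings[0])            # array elements are looked up by index
```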

Python provides several data types you can use to implement records, structs, and data transfer objects. In this article you’ll get a quick look at each implementation and its unique characteristics. At the end you’ll find a summary and a decision-making guide that will help you make your own pick.

<a class="anchor" id="transfer_objects"></a>
## Transfer Objects

<a class="anchor" id="dict_built_in"></a>
### `dict` Built-in

Python dictionaries store an arbitrary number of objects, each identified by a unique key. Dictionaries are often also called “maps” or “associative arrays” and allow the efficient lookup, insertion, and deletion of any object associated with a given key.

Using dictionaries as a record data type or data object in Python is possible. Dictionaries are easy to create in Python as they have their own syntactic sugar built into the language in the form of dictionary literals. The dictionary syntax is concise and quite convenient to type.

Data objects created using dictionaries are mutable, and there’s little protection against misspelled field names, as fields can be added and removed freely at any time. Both of these properties can introduce surprising bugs, and there’s always a trade-off to be made between convenience and error resilience.

```python
car1 = {
    'color': 'red',
    'mileage': 3812.4,
    'automatic': True,
}
car2 = {
    'color': 'blue',
    'mileage': 40231.0,
    'automatic': False,
}
```

```python
# Dicts have a nice repr:
car2
```

```
{'color': 'blue', 'mileage': 40231.0, 'automatic': False}
```

```python
# Get mileage:
car2['mileage']
```

```
40231.0
```

```python
# Dicts are mutable:
car2['mileage'] = 12
car2['windshield'] = 'broken'
car2
```

```
{'color': 'blue', 'mileage': 12, 'automatic': False, 'windshield': 'broken'}
```

```python
# No protection against wrong field names,
# or missing/extra fields:
car3 = {
    'colr': 'green',
    'automatic': False,
    'windshield': 'broken',
}
```
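If you stay with dictionaries but want some of that protection back, one option is to route record creation through a small factory function that checks the keys. The sketch below uses a hypothetical `make_car` helper and `CAR_FIELDS` constant; neither is part of the standard library, and the check only runs where the helper is actually used.

```python
CAR_FIELDS = {'color', 'mileage', 'automatic'}  # hypothetical schema


def make_car(**fields):
    """Build a car dict, rejecting missing or unexpected keys (hypothetical helper)."""
    given = set(fields)
    missing = CAR_FIELDS - given
    extra = given - CAR_FIELDS
    if missing or extra:
        raise ValueError(f'missing={sorted(missing)}, extra={sorted(extra)}')
    return dict(fields)


car = make_car(color='green', mileage=3431.5, automatic=True)

try:
    # The misspelled record from above is now rejected.
    make_car(colr='green', automatic=False, windshield='broken')
except ValueError as exc:
    print(exc)
```

The resulting object is still an ordinary, mutable dict, so this guards only the moment of construction.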

<a class="anchor" id="tuple_built_in"></a>
### `tuple` Built-in

Python’s tuples are a simple data structure for grouping arbitrary objects. Tuples are immutable—they cannot be modified once they’ve been created.

Performance-wise, tuples take up slightly less memory than lists in CPython, and they’re faster to construct. As you can see in the bytecode disassembly below, constructing a tuple constant takes a single `LOAD_CONST` opcode, while constructing a list object with the same contents requires several more operations (the exact bytecode varies between CPython versions):

```python
import dis
```

```python
dis.dis(compile("(23, 'a', 'b', 'c')", '', 'eval'))
```

```
  1           0 LOAD_CONST               0 ((23, 'a', 'b', 'c'))
              2 RETURN_VALUE
```

```python
dis.dis(compile("[23, 'a', 'b', 'c']", '', 'eval'))
```

```
  1           0 LOAD_CONST               0 (23)
              2 LOAD_CONST               1 ('a')
              4 LOAD_CONST               2 ('b')
              6 LOAD_CONST               3 ('c')
              8 BUILD_LIST               4
             10 RETURN_VALUE
```


However, you shouldn’t place too much emphasis on these differences. In practice the performance difference will often be negligible, and trying to squeeze extra performance out of a program by switching from lists to tuples is likely the wrong approach.
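If you’d like to check these claims on your own interpreter, a quick measurement sketch using the standard `sys.getsizeof` and `timeit.timeit` functions looks like this. Absolute numbers depend on the CPython version and the machine, so treat them as rough indicators only.

```python
import sys
import timeit

values_tuple = (23, 'a', 'b', 'c')
values_list = [23, 'a', 'b', 'c']

# Shallow size of the container object itself (not of its elements).
print('tuple bytes:', sys.getsizeof(values_tuple))
print('list bytes: ', sys.getsizeof(values_list))

# Time one million evaluations of each literal.
tuple_time = timeit.timeit("(23, 'a', 'b', 'c')", number=1_000_000)
list_time = timeit.timeit("[23, 'a', 'b', 'c']", number=1_000_000)
print(f'tuple: {tuple_time:.3f}s  list: {list_time:.3f}s')
```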

A potential downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individual properties stored in a tuple. This can impact code readability.

Also, a tuple is always an ad-hoc structure. It’s difficult to ensure that two tuples have the same number of fields and the same properties stored on them.

This makes it easy to introduce “slip-of-the-mind” bugs, such as mixing up the field order. Therefore, I recommend keeping the number of fields stored in a tuple as low as possible.
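Here is a small illustration of such a slip: two pieces of code that disagree about the field order of the same tuple. Nothing raises an error; the values are silently swapped. The `describe` function is a hypothetical example.

```python
# The producer stores (color, mileage, automatic) ...
car = ('red', 3812.4, True)


def describe(car):
    # ... but this consumer assumes (mileage, color, automatic).
    mileage, color, automatic = car
    return f'{color} car with {mileage} miles'


print(describe(car))  # prints "3812.4 car with red miles"
```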

```python
# Fields: color, mileage, automatic
car1 = ('red', 3812.4, True)
car2 = ('blue', 40231.0, False)
```

```python
# Tuple instances have a nice repr:
car1
```

```
('red', 3812.4, True)
```

```python
car2
```

```
('blue', 40231.0, False)
```

```python
# Get mileage:
car2[1]
```

```
40231.0
```

```python
# Tuples are immutable:
car2[1] = 12
```

```
TypeError: 'tuple' object does not support item assignment
```

```python
# No protection against missing/extra fields
# or a wrong order:
car3 = (3431.5, 'green', True, 'silver')
```

<a class="anchor" id="writing_a_custom_class"></a>
### Writing a Custom Class

Classes allow you to define reusable “blueprints” for data objects to ensure each object provides the same set of fields.

Using regular Python classes as record data types is feasible, but it also takes manual work to get the convenience features of other implementations. For example, adding new fields to the `__init__` constructor is verbose and takes time.

Also, the default string representation for objects instantiated from custom classes is not very helpful. To fix that you may have to add your own `__repr__` method, which again is usually quite verbose and must be updated every time you add a new field.

Fields stored on classes are mutable, and new fields can be added freely, which may or may not be what you intend. It’s possible to provide more access control and to create read-only fields using the `@property` decorator, but this requires writing more glue code.
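As a rough sketch of that glue code, here is one way to make `mileage` read-only with `@property` while leaving the other fields writable. The underscore-prefixed `_mileage` attribute is a naming convention, not real access control; a determined caller can still change it, and new attributes can still be added to the instance.

```python
class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self._mileage = mileage  # "private" by convention only
        self.automatic = automatic

    @property
    def mileage(self):
        # Getter only: no setter is defined, so assignment raises AttributeError.
        return self._mileage


car = Car('red', 3812.4, True)
print(car.mileage)  # 3812.4

try:
    car.mileage = 12
except AttributeError as exc:
    print('read-only:', exc)
```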

Writing a custom class is a great option whenever you’d like to add business logic and behavior to your record objects using methods. But this means these objects are technically no longer plain data objects.

```python
class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self.mileage = mileage
        self.automatic = automatic
```

```python
car1 = Car('red', 3812.4, True)
car2 = Car('blue', 40231.0, False)
```

```python
# Get the mileage:
car2.mileage
```

```
40231.0
```

```python
# Classes are mutable:
car2.mileage = 12
car2.windshield = 'broken'
```

```python
# String representation is not very useful
# (must add a manually written __repr__ method):
car1
```

```
<__main__.Car at 0x7fbfa814d250>
```
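A hand-written `__repr__` for the `Car` class above might look like the following sketch. Every field has to be listed by hand, so the method must be kept in sync whenever a field is added.

```python
class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self.mileage = mileage
        self.automatic = automatic

    def __repr__(self):
        # !r applies repr() to each value, so strings keep their quotes.
        return (f'{self.__class__.__name__}('
                f'color={self.color!r}, mileage={self.mileage!r}, '
                f'automatic={self.automatic!r})')


car1 = Car('red', 3812.4, True)
print(repr(car1))  # Car(color='red', mileage=3812.4, automatic=True)
```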

<a class="anchor" id="namedtuple"></a>
### `collections.namedtuple` Class

The `namedtuple` class, available in Python 2.6+, provides an extension of the built-in `tuple` data type. Similar to defining a custom class, using `namedtuple` allows you to define reusable “blueprints” for your records that ensure the correct field names are used.

Namedtuples are immutable just like regular tuples. This means you cannot add new fields or modify existing fields after the namedtuple instance has been created.

Besides that, namedtuples are, well…named tuples. Each object stored in them can be accessed through a unique identifier. This frees you from having to remember integer indexes, or resorting to workarounds like defining integer constants as mnemonics for your indexes.
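For comparison, here is the index-constant workaround next to its namedtuple equivalent. The `COLOR`, `MILEAGE`, and `AUTOMATIC` constants are illustrative only.

```python
from collections import namedtuple

# Workaround for plain tuples: integer constants as index mnemonics.
COLOR, MILEAGE, AUTOMATIC = 0, 1, 2
plain_car = ('red', 3812.4, True)
print(plain_car[MILEAGE])

# With a namedtuple, the field names are part of the type itself.
Car = namedtuple('Car', 'color mileage automatic')
named_car = Car('red', 3812.4, True)
print(named_car.mileage)

# Index access still works, because a namedtuple is a tuple.
print(named_car[MILEAGE])
```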

Namedtuple objects are implemented as regular Python classes internally. When it comes to memory usage, they are also “better” than regular classes and just as memory-efficient as regular tuples:

```python
from collections import namedtuple
from sys import getsizeof
```

```python
p1 = namedtuple('Point', 'x y z')(1, 2, 3)
p2 = (1, 2, 3)
```

```python
# On CPython, this reports the same size as the plain tuple below.
getsizeof(p1)
```

```python
getsizeof(p2)
```
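To extend the comparison to a regular class, here is a sketch that prints the namedtuple and tuple sizes side by side with a class instance. One assumption worth stating: `sys.getsizeof` reports only an object’s own shallow size, so for the class instance it does not include the separate per-instance `__dict__` that holds the fields, which is why that size is added explicitly. The `PointClass` name is hypothetical.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', 'x y z')


class PointClass:  # hypothetical regular class for comparison
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


p1 = Point(1, 2, 3)
p2 = (1, 2, 3)
p3 = PointClass(1, 2, 3)

# A namedtuple instance is exactly as large as a plain tuple.
print(sys.getsizeof(p1), sys.getsizeof(p2))

# A regular instance keeps its fields in a separate __dict__.
print(sys.getsizeof(p3) + sys.getsizeof(p3.__dict__))
```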

Namedtuples can be an easy way to clean up your code and to make it more readable by enforcing a better structure for your data.

I find that going from ad-hoc data types like dictionaries with a fixed format to namedtuples helps me express the intent of my code more clearly. Often when I apply this refactoring I magically come up with a better solution for the problem I’m facing.

Using namedtuples over unstructured tuples and dicts can also make my coworkers’ lives easier because namedtuples make the data passed around “self-documenting”, at least to a degree.

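```python
from collections import namedtuple
```

```python
Car = namedtuple('Car', 'color mileage automatic')
```

```python
car1 = Car('red', 3812.4, True)
```

```python
# Instances have a nice repr:
car1
```

```
Car(color='red', mileage=3812.4, automatic=True)
```

```python
# Accessing fields
car1.mileage
```

```
3812.4
```

```python
# Fields are immutable (raises AttributeError):
car1.mileage = 12
```

```python
# New fields can't be added either (raises AttributeError):
car1.windshield = 'broken'
```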
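Because fields can’t be reassigned, the usual way to “change” a namedtuple is to build a modified copy. The `collections.namedtuple` API provides a few helpers for this; the sketch below shows `_replace()`, `_asdict()`, and the `_fields` attribute. The leading underscore only avoids clashes with field names; these are documented, public parts of the API.

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage automatic')
car1 = Car('red', 3812.4, True)

# _replace returns a new instance; car1 itself is unchanged.
car2 = car1._replace(mileage=12)
print(car1.mileage, car2.mileage)  # 3812.4 12

# _fields lists the field names in order.
print(Car._fields)  # ('color', 'mileage', 'automatic')

# _asdict converts to a dictionary (a plain dict on Python 3.8+),
# e.g. as a first step towards JSON serialization.
print(car1._asdict())
```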

<a class="anchor" id="simple_namespace"></a>
### `types.SimpleNamespace` Class

Here’s one more “esoteric” choice for implementing data objects in Python. This class was added in Python 3.3 and it provides attribute access to its namespace. It also includes a meaningful `__repr__` by default.

As its name proclaims, SimpleNamespace is simple—it’s basically a glorified dictionary that allows attribute access and prints nicely. Attributes can be added, modified, and deleted freely.

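```python
from types import SimpleNamespace
car1 = SimpleNamespace(color='red', mileage=3812.4, automatic=True)
```

```python
# The default repr:
car1
```

```
namespace(color='red', mileage=3812.4, automatic=True)
```

```python
# Instances are mutable
car1.mileage = 12
car1.windshield = 'broken'
del car1.automatic
car1
```

```
namespace(color='red', mileage=12, windshield='broken')
```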
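Since `SimpleNamespace` is essentially an attribute-access wrapper around a dictionary, converting between the two is straightforward: `vars()` returns the instance’s underlying attribute dictionary, and keyword-argument unpacking goes the other way. Unlike instances of a plain custom class, namespaces also compare equal when their attributes are equal.

```python
from types import SimpleNamespace

data = {'color': 'red', 'mileage': 3812.4, 'automatic': True}

# dict -> SimpleNamespace via keyword-argument unpacking
car = SimpleNamespace(**data)
print(car.color)  # red

# SimpleNamespace -> dict: vars() returns the live attribute dict,
# so mutating that dict also changes the namespace.
print(vars(car) == data)  # True

# Equality compares attributes
print(car == SimpleNamespace(color='red', mileage=3812.4, automatic=True))  # True
```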

<a class="anchor" id="which_type_to_use"></a>
## Which type should I use for data objects in Python?

As you’ve seen, there are quite a few different options for implementing records or data objects in Python. Generally, your decision will depend on your use case:

> **Note:** There are also `typing.NamedTuple` and `struct.Struct`, which are not covered here.

- **You only have a few (2-3) fields:** Using a plain tuple object may be okay because the field order is easy to remember or field names are superfluous. For example, think of an `(x, y, z)` point in 3D space.

- **You need immutable fields:** In this case plain tuples, `collections.namedtuple`, and `typing.NamedTuple` would all be good options for implementing this type of data object.

- **You need to lock down field names to avoid typos:** `collections.namedtuple` and `typing.NamedTuple` are your friends.

- **You want to keep things simple:** A plain dictionary object might be a good choice due to the convenient syntax that closely resembles JSON.

- **You need full control over your data structure:** It’s time to write a custom class with `@property` setters and getters.

- **You need to add behavior (methods) to the object:** You should write a custom class, either from scratch or by extending `collections.namedtuple` or `typing.NamedTuple`.
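As a sketch of the last two points, here is a `Car` record defined with `typing.NamedTuple` that adds a method while keeping immutable, fixed field names (methods in this class syntax need Python 3.6.1 or later). The `is_high_mileage` method and its default threshold are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class Car(NamedTuple):
    color: str
    mileage: float
    automatic: bool

    def is_high_mileage(self, threshold: float = 100_000.0) -> bool:
        # Hypothetical behavior attached to the record.
        return self.mileage > threshold


car = Car('red', 3812.4, True)
print(car)                    # Car(color='red', mileage=3812.4, automatic=True)
print(car.is_high_mileage())  # False

# Misspelled field names are rejected at construction time.
try:
    Car(colr='green', mileage=3431.5, automatic=False)
except TypeError as exc:
    print('rejected:', exc)
```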