# Principle #1: Separate code from data

“Separate code from data in a way that the code resides in functions whose behavior does not depend on data that is encapsulated in the function’s context.” — Yehonathan Sharvit

In [1]:
from dataclasses import dataclass

In [2]:
@dataclass
class AuthorData:
    """Class for keeping track of an author in the system"""

    first_name: str
    last_name: str
    n_books: int

In [3]:
def calculate_name(first_name: str, last_name: str):
    return f"{first_name} {last_name}"

In [4]:
def is_prolific(n_books: int):
    return n_books > 100

In [5]:
author_data = AuthorData("Isaac", "Asimov", 500)

In [6]:
calculate_name(author_data.first_name, author_data.last_name)

'Isaac Asimov'

> **Benefit #1:** “Code can be reused in different contexts” — Yehonathan Sharvit

As seen from the example above, `calculate_name()` can be used not only for authors but also for users, librarians, or anyone that has a first name and a last name field. The code that deals with full name calculation is separate from the code that deals with the creation of author data.

In [7]:
@dataclass
class UserData:
    """Class for keeping track of a user in the system"""

    first_name: str
    last_name: str
    email: str

In [8]:
user_data = UserData("John", "Doe", "john.doe@gmail.com")

In [9]:
calculate_name(user_data.first_name, user_data.last_name)

'John Doe'

> **Benefit #2:** “Code can be tested in isolation” — Yehonathan Sharvit

Below is an example that doesn’t adhere to Principle #1.

In [10]:
class Address:
    def __init__(
        self, street_num: int, street_name: str, city: str, state: str, zip_code: int
    ):
        self.street_num = street_num
        self.street_name = street_name
        self.city = city
        self.state = state
        self.zip_code = zip_code

In [11]:
class Author:
    def __init__(self, first_name: str, last_name: str, n_books: int, address: Address):
        self.first_name = first_name
        self.last_name = last_name
        self.n_books = n_books
        self.address = address

    @property
    def full_name(self):
        return f"{self.first_name} {self.last_name}"

    @property
    def is_prolific(self):
        return self.n_books > 100

In [12]:
address = Address(651, "Essex Street", "Brooklyn", "NY", 11208)

In [13]:
author = Author("Isaac", "Asimov", 500, address)

In [14]:
assert author.full_name == "Isaac Asimov"

To test the `full_name` property, which lives inside the `Author` class, we need to instantiate an `Author` object. That requires values for all attributes, including those unrelated to the behavior under test (such as `n_books` and `address`, which is itself a custom class). This is an unnecessarily complex and tedious setup just to test a single property.

On the other hand, in the DOP version, we can test `calculate_name()` in isolation by passing it only the data it needs.

In [15]:
assert calculate_name("Isaac", "Asimov") == "Isaac Asimov"
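
The same holds for `is_prolific()`. Below is a minimal sketch of boundary tests; the function is redefined here only so that the block runs on its own, and the threshold of 100 is the one used in the definition above.

```python
def is_prolific(n_books: int) -> bool:
    # Same logic as the function defined earlier in this article
    return n_books > 100


# No Author or Address objects are needed: plain integers are enough
assert is_prolific(500)
assert not is_prolific(100)  # exactly 100 is not "more than 100"
assert not is_prolific(0)
```

Each assertion exercises one input value, so a failure points directly at the rule that broke.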

# Principle #2: Represent data with generic data structures

“In DOP, data is represented with generic data structures, such as maps (or dictionaries) and arrays (or lists).” — Yehonathan Sharvit

In Python, our built-in options for generic data structures are `dict`, `list`, and `tuple`.

In this article, I use Python’s `dataclass`, which can be thought of as a “mutable named tuple with defaults.” Note that this is not what Sharvit means by “generic data structure”: a `dataclass` is a hybrid that is closer to OOP than DOP. However, compared with dictionaries and tuples, it is less susceptible to typos, more descriptive thanks to type hints, and represents nested structures more clearly and concisely. It can also easily be turned into a dictionary or a tuple when needed.

In [16]:
from dataclasses import asdict, dataclass

In [17]:
@dataclass
class AuthorData:
    """Class for keeping track of an author in the system"""

    first_name: str
    last_name: str
    n_books: int

In [18]:
author_data = AuthorData("Isaac", "Asimov", 500)

In [19]:
author_data

AuthorData(first_name='Isaac', last_name='Asimov', n_books=500)

In [20]:
asdict(author_data)

{'first_name': 'Isaac', 'last_name': 'Asimov', 'n_books': 500}
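
The conversion also works in the other directions. The sketch below uses `astuple()` from the standard `dataclasses` module and rebuilds the dataclass from a dictionary by keyword unpacking; the variable names `as_tuple` and `restored` are illustrative only.

```python
from dataclasses import asdict, astuple, dataclass


@dataclass
class AuthorData:
    first_name: str
    last_name: str
    n_books: int


author_data = AuthorData("Isaac", "Asimov", 500)

# Convert to a plain tuple of field values, in declaration order
as_tuple = astuple(author_data)

# Rebuild the dataclass from a dictionary by unpacking it as keyword arguments
restored = AuthorData(**asdict(author_data))
assert restored == author_data
```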

> **Benefit #1:** “The ability to use generic functions that are not limited to our specific use case” — Yehonathan Sharvit

Given generic structures, we can manipulate data using a rich set of built-in Python functions available on `dict`, `list`, `tuple`, etc.

Below are a few examples of generic functions that can be used to manipulate data stored in a `dict`.

In [21]:
author = {"first_name": "Isaac", "last_name": "Asimov", "n_books": 500}

In [22]:
# Access dict values
author.get("first_name")

'Isaac'

In [23]:
# Add new field to dict
author["alive"] = False

In [24]:
author

{'first_name': 'Isaac', 'last_name': 'Asimov', 'n_books': 500, 'alive': False}

In [25]:
# Update existing field
author["n_books"] = 703

In [26]:
author

{'first_name': 'Isaac', 'last_name': 'Asimov', 'n_books': 703, 'alive': False}

This means we don’t have to learn and remember the custom methods of everyone’s classes. Also, these generic functions are part of the Python language itself, so upgrading a third-party library cannot break them. They change only when the language changes them (which almost never happens).
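
As a further sketch, the same built-in dictionary operations apply to any record, whatever it describes. The `book` record and all variable names below are hypothetical and serve only as illustration.

```python
author = {"first_name": "Isaac", "last_name": "Asimov", "n_books": 500}

# Select a subset of fields with a dict comprehension
name_only = {k: v for k, v in author.items() if k in ("first_name", "last_name")}

# Merge into a new dict; keys listed later override earlier ones
updated = {**author, "n_books": 703, "alive": False}

# The same operations work on a completely different kind of record
book = {"title": "Some Title", "pages": 255}  # hypothetical data
book_with_author = {**book, "author": name_only}
```

Unlike the in-place updates shown earlier, the merge with `**` builds a new dictionary and leaves `author` unchanged.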

> **Benefit #2:** “Flexible data model” — Yehonathan Sharvit

“When using generic data structures, data can be created with no predefined shape, and its shape can be modified at will.” — Yehonathan Sharvit

In the example below, not all the dictionaries in the list have the same keys. The extra keys can exist in the second dictionary as long as the required fields are present.

In [27]:
names = []

In [28]:
names.append({"first_name": "Isaac", "last_name": "Asimov"})

In [29]:
names

[{'first_name': 'Isaac', 'last_name': 'Asimov'}]

In [30]:
names.append({"first_name": "Jane", "last_name": "Doe", "suffix": "III", "age": 70})

In [31]:
names

[{'first_name': 'Isaac', 'last_name': 'Asimov'},
 {'first_name': 'Jane', 'last_name': 'Doe', 'suffix': 'III', 'age': 70}]
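
Code that consumes such records can stay generic as well. The following is a minimal sketch, assuming `first_name` and `last_name` are required and `suffix` is optional; `format_name` is a hypothetical helper, not something from Sharvit's book.

```python
def format_name(person: dict) -> str:
    # Required fields are accessed directly, so a missing key fails loudly
    parts = [person["first_name"], person["last_name"]]
    # Optional field: .get() returns None when the key is absent
    suffix = person.get("suffix")
    if suffix:
        parts.append(suffix)
    return " ".join(parts)


names = [
    {"first_name": "Isaac", "last_name": "Asimov"},
    {"first_name": "Jane", "last_name": "Doe", "suffix": "III", "age": 70},
]

formatted = [format_name(n) for n in names]
```

The function ignores keys it does not know about (such as `age`), which is what lets records of different shapes share one list.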

# Principle #3: Data is immutable

“According to DOP, data should never change! Instead of mutating data, a new version of it is created.” — Yehonathan Sharvit

To adhere to this principle, we make our `dataclass` frozen (i.e. immutable).

In [32]:
@dataclass(frozen=True)
class AuthorData:
    """Class for keeping track of an author in the system"""

    first_name: str
    last_name: str
    n_books: int

Python’s built-in immutable types include `int`, `float`, `complex`, `bool`, `str`, `bytes`, `tuple`, `frozenset`, and `range`; `decimal.Decimal` from the standard library is immutable as well. Note that `dict`, `list`, and `set` are mutable.
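
With a frozen dataclass, “creating a new version” of the data can be done with `dataclasses.replace()` from the standard library. The sketch below redefines `AuthorData` so that it runs on its own; the names `author_v1` and `author_v2` are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AuthorData:
    first_name: str
    last_name: str
    n_books: int


author_v1 = AuthorData("Isaac", "Asimov", 500)

# replace() returns a new instance; author_v1 is left untouched
author_v2 = replace(author_v1, n_books=501)

assert author_v1.n_books == 500
assert author_v2.n_books == 501
```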

> **Benefit #1:** “Data access to all with confidence” — Yehonathan Sharvit

“When data is mutable, we must be careful when passing data as an argument to a function since it can be mutated or cloned.” — Yehonathan Sharvit

In the example below, an empty list is used as the default argument of a function. Python evaluates default arguments once, when the function is defined, so the same list object is shared across calls. Since a list is mutable, every call that relies on the default mutates it, and the next call starts from a different default value.

In [33]:
def append_to_list(el, ls=[]):
    ls.append(el)
    return ls

In [34]:
append_to_list(1)

[1]

In [35]:
append_to_list(2)

[1, 2]

In [36]:
append_to_list(3)

[1, 2, 3]

To fix the use case above, we can do:

In [37]:
def append_to_list(el, ls=None):
    if ls is None:
        ls = []
    ls.append(el)
    return ls

In [38]:
append_to_list(1)

[1]

In [39]:
append_to_list(2)

[2]

This version works as expected because the default value `None` is immutable, and a new list is created inside the function on every call that omits `ls`.

“When data is immutable, it can be passed to any function with confidence because data never changes.” — Yehonathan Sharvit
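
A small sketch of that guarantee, using a tuple as the immutable argument. The function `add_bonus_book` and the book titles are hypothetical.

```python
def add_bonus_book(books: tuple) -> tuple:
    # Tuples cannot be changed in place, so the function must build a new one
    return books + ("Bonus Title",)


original = ("Book A", "Book B")
extended = add_bonus_book(original)

# The caller's data is unchanged after the call
assert original == ("Book A", "Book B")
assert extended == ("Book A", "Book B", "Bonus Title")
```

Had `books` been a list, nothing would stop the function from calling `books.append(...)` and silently changing the caller's data.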

> **Benefit #2:** “Predictable code behavior” — Yehonathan Sharvit

Here is an example of an unpredictable piece of code:

In [40]:
from datetime import date

In [41]:
dummy = {"age": 30}

In [42]:
if date.today().day % 2 == 0:
    dummy["age"] = 40

The value of `age` in the `dummy` dictionary is not predictable. It depends on whether you run the code on an even or odd day.

However, with immutable data, it is guaranteed that data never changes.

In [43]:
author_data = AuthorData("Isaac", "Asimov", 500)

In [44]:
if date.today().day % 2 == 1:
    author_data.n_books = 100

FrozenInstanceError: cannot assign to field 'n_books'

In [45]:
author_data.n_books

500

On an odd day, the code above fails with `dataclasses.FrozenInstanceError: cannot assign to field 'n_books'`; on an even day, the assignment is never attempted. Either way, with a frozen data class, `author_data.n_books` is always 500.

> **Benefit #3:** “Fast equality checks” — Yehonathan Sharvit

Python has two similar operators for checking whether two objects are equal: `is` and `==`. `is` checks for identity (of objects) by comparing their ids, which in CPython are memory addresses. `==` checks for equality (of values) by examining the actual content stored.

In [46]:
# String is immutable
x = "abc"

In [47]:
# Note that the identity of `x` and `abc` is the same
print(id(x))

2807756546288


In [48]:
print(id("abc"))

2807756546288


In [49]:
print(x == "abc")

True


In [50]:
print(x is "abc")

True




In [51]:
# List is mutable
y = [1, 2, 3]

In [52]:
# Note that the identity of `y` and `[1, 2, 3]` is different
print(id(y))

2807857420800


In [53]:
print(id([1, 2, 3]))

2807857423872


In [54]:
print(y == [1, 2, 3])

True


In [55]:
print(y is [1, 2, 3])

False


As seen above, `is` and `==` give the same result for `x`, a string (an immutable type), but different results for `y`, a list (a mutable type). Note that two equal strings sharing one identity is a consequence of CPython interning string literals, not a language guarantee, and Python 3.8 and later emit a `SyntaxWarning` for `is` with a literal. The useful property is narrower: when data is immutable, two references to the same object are guaranteed to hold the same content, now and later. Since comparing object addresses is faster than comparing every field, an identity check can serve as a fast path for equality checks on immutable data.
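
One way to use this in practice is an identity check as a fast path, falling back to a value comparison only when the references differ. The sketch below is an assumption about how this could look in Python, not code from Sharvit's book; `has_changed` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorData:
    first_name: str
    last_name: str
    n_books: int


def has_changed(old: AuthorData, new: AuthorData) -> bool:
    # Fast path: the same frozen object cannot have been modified in between
    if old is new:
        return False
    # Slow path: different objects may still hold equal values
    return old != new


v1 = AuthorData("Isaac", "Asimov", 500)
same_ref = v1
equal_copy = AuthorData("Isaac", "Asimov", 500)
v2 = AuthorData("Isaac", "Asimov", 501)
```

With mutable data the fast path would be unsafe for change detection, because the object behind an unchanged reference could have been modified in place.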

# Principle #4: Separate data schema from data representation

“In DOP, the expected shape of data is represented as (meta) data that is kept separately from the main data representation.” — Yehonathan Sharvit

Given below is a basic schema in the style of JSON Schema (essentially a dictionary) that describes the format of data, which is also represented as a dictionary. Unlike real JSON Schema, it uses Python types such as `str` and `int` as the `type` values. The schema defines which fields are required and the data types of the fields, whereas the data is represented by a generic data structure per Principle #2.

In [56]:
schema = {
    "required": ["first_name", "last_name"],
    "properties": {
        "first_name": {"type": str},
        "last_name": {"type": str},
        "books": {"type": int},
    },
}

In [57]:
data = {
    "valid": {"first_name": "Isaac", "last_name": "Asimov", "books": 500},
    "invalid1": {
        "fist_name": "Isaac",
        "last_name": "Asimov",
    },
    "invalid2": {"first_name": "Isaac", "last_name": "Asimov", "books": "five hundred"},
}

Data validation functions (or libraries) can be used to check whether a piece of data conforms to a data schema.

In [58]:
def validate(data):
    assert set(schema["required"]).issubset(
        set(data.keys())
    ), f"Data must have following fields: {schema['required']}"

    for k in data:
        if k in schema["properties"].keys():
            assert (
                type(data[k]) == schema["properties"][k]["type"]
            ), f"Field {k} must be of type {str(schema['properties'][k]['type'])}"

The `validate` function returns silently when the data is valid and raises an `AssertionError` with a human-readable message when the data is invalid.

In [59]:
validate(data["valid"])

In [60]:
validate(data["invalid1"])

AssertionError: Data must have following fields: ['first_name', 'last_name']

In [61]:
validate(data["invalid2"])

AssertionError: Field books must be of type <class 'int'>

The minimal schema defined above can be expanded to include more constraints for each field, as in the following schema.

In [62]:
schema = {
    "required": ["first_name", "last_name"],
    "properties": {
        "first_name": {
            "type": str,
            "max_length": 100,
        },
        "last_name": {
            "type": str,
            "max_length": 100
        },
        "books": {
            "type": int,
            "min": 0,
            "max": 10000,
        },
    }
}
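
A validator for such an expanded schema could look like the sketch below. It is one possible reading of the `max_length`, `min`, and `max` keys, and it collects error messages in a list instead of raising on the first failure; `validate_with_constraints` and `extended_schema` are hypothetical names, and the schema is repeated so that the block runs on its own.

```python
extended_schema = {
    "required": ["first_name", "last_name"],
    "properties": {
        "first_name": {"type": str, "max_length": 100},
        "last_name": {"type": str, "max_length": 100},
        "books": {"type": int, "min": 0, "max": 10000},
    },
}


def validate_with_constraints(data: dict, schema: dict) -> list:
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for field in schema["required"]:
        if field not in data:
            errors.append(f"Missing required field: {field}")

    for field, value in data.items():
        rules = schema["properties"].get(field)
        if rules is None:
            continue  # fields not described by the schema are allowed
        if type(value) is not rules["type"]:
            errors.append(f"Field {field} must be of type {rules['type'].__name__}")
            continue  # skip the remaining checks when the type is wrong
        if "max_length" in rules and len(value) > rules["max_length"]:
            errors.append(f"Field {field} must have at most {rules['max_length']} characters")
        if "min" in rules and value < rules["min"]:
            errors.append(f"Field {field} must be at least {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"Field {field} must be at most {rules['max']}")
    return errors


ok = validate_with_constraints(
    {"first_name": "Isaac", "last_name": "Asimov", "books": 500}, extended_schema
)
too_few = validate_with_constraints(
    {"first_name": "Isaac", "last_name": "Asimov", "books": -1}, extended_schema
)
```

Returning a list makes it possible to report every problem in one pass, which is a design choice; raising an exception, as `validate` does above, is equally valid.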

While not all the advantages and disadvantages of DOP principles mentioned by Sharvit directly apply to Python, the fundamental principles remain robust. This approach promotes code that is easier to reason about, test and maintain. By embracing the principles and techniques of DOP, Python programmers can create more maintainable and scalable code, and unlock the full potential of their data.