# Dataclasses

Dataclasses simplify the creation of classes that are primarily used for data representation.

In [1]:
from dataclasses import dataclass

In [2]:
@dataclass
class Dog:
    name: str
    age: int

In [3]:
dog = Dog('Reksio', 10)

In [4]:
dog

Dog(name='Reksio', age=10)

Very convenient, as we don't have to write `__init__`, `__repr__`, etc.
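For comparison, here is a rough hand-written equivalent of what the decorator gives us for `Dog`. `PlainDog` is a hypothetical name used only for this sketch, and the real generated code is more thorough than this approximation. Note that `__eq__` is part of the "etc." and is generated by default as well.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    name: str
    age: int


class PlainDog:
    """Hand-written approximation of the dataclass above (illustration only)."""

    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __repr__(self):
        return f'PlainDog(name={self.name!r}, age={self.age!r})'

    def __eq__(self, other):
        # Dataclasses compare field by field, and only within the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)


print(Dog('Reksio', 10) == Dog('Reksio', 10))  # True
print(PlainDog('Reksio', 10))                  # PlainDog(name='Reksio', age=10)
```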

However, it's a very basic tool, and I was surprised to find out that it doesn't validate the types of the declared fields.

In [5]:
dog = Dog(name=42, age=10)

In [6]:
dog

Dog(name=42, age=10)

This caused me a couple of problems.

In [7]:
dog.name.capitalize()

AttributeError: 'int' object has no attribute 'capitalize'

I was so sure that validation was part of dataclasses that I can't imagine using them without it.

There are probably several solutions to this problem, but the most common one seems to rely on the `__post_init__` method.
This method is called by the `__init__` that the dataclass generates, once all fields have been assigned, and can be used for validation.

In [8]:
def __post_init__(self):
    # Compare every annotated field against its declared type.
    for name, field_type in self.__annotations__.items():
        if not isinstance(self.__dict__[name], field_type):
            raise TypeError

As I want to type check practically all of my dataclasses, I tend to define a superclass with such a method.

In [9]:
@dataclass
class TypeValidated:
    """Raise TypeError if instantiated with incorrect type."""
    def __post_init__(self):
        for name, field_type in self.__annotations__.items():
            if not isinstance(self.__dict__[name], field_type):
                current_type = type(self.__dict__[name])
                raise TypeError(f'For field `{name}` expected type `{field_type}`, got `{current_type}`')

In [10]:
@dataclass
class Dog(TypeValidated):
    name: str
    age: int

In [11]:
dog = Dog(name=42, age=10)

TypeError: For field `name` expected type `<class 'str'>`, got `<class 'int'>`

In [12]:
dog = Dog(name='Reksio', age=10)

In [13]:
dog

Dog(name='Reksio', age=10)
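One caveat of the `isinstance` check above: it only works for annotations that are plain classes. A parameterized generic such as `list[str]` cannot be passed to `isinstance`, which raises a `TypeError` of its own. The sketch below is one possible workaround, not the approach used in the rest of this post: it falls back to the generic's origin (`list` for `list[str]`) via `typing.get_origin`, and it reads the declared types from `dataclasses.fields` instead of `__annotations__`. The `Kennel` class is hypothetical, and the code assumes Python 3.9 or newer.

```python
from dataclasses import dataclass, fields
from typing import get_origin


@dataclass
class TypeValidated:
    """Raise TypeError if instantiated with incorrect type."""

    def __post_init__(self):
        for field in fields(self):
            # list[str] -> list; plain classes such as str have no origin.
            expected = get_origin(field.type) or field.type
            value = self.__dict__[field.name]
            if not isinstance(value, expected):
                raise TypeError(
                    f'For field `{field.name}` expected type `{field.type}`, '
                    f'got `{type(value)}`'
                )


@dataclass
class Kennel(TypeValidated):  # hypothetical example class
    owner: str
    dogs: list[str]


Kennel(owner='Ola', dogs=['Reksio'])  # passes
Kennel(owner='Ola', dogs=[1, 2])      # also passes: elements are not inspected

try:
    Kennel(owner='Ola', dogs='Reksio')
except TypeError as error:
    print(error)
```

Only the container type is checked here. Validating the elements as well would need considerably more code.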

That's a very basic and general check for fields, but what if you want something more specific for each field? For example, checking that the name is capitalized and the age is not negative.

[Descriptors](https://docs.python.org/3/howto/descriptor.html) are typically used for more specific validation of class fields.

In [14]:
from abc import ABC, abstractmethod

class Validator(ABC):

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        self.validate(value)
        setattr(obj, self.private_name, value)

    @abstractmethod
    def validate(self, value):
        pass

In [15]:
class Number(Validator):

    def __init__(self, minvalue=None, maxvalue=None):
        self.minvalue = minvalue
        self.maxvalue = maxvalue

    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f'Expected {value!r} to be an int or float')
        if self.minvalue is not None and value < self.minvalue:
            raise ValueError(
                f'Expected {value!r} to be at least {self.minvalue!r}'
            )
        if self.maxvalue is not None and value > self.maxvalue:
            raise ValueError(
                f'Expected {value!r} to be no more than {self.maxvalue!r}'
            )

In [16]:
class String(Validator):

    def __init__(self, minsize=None, maxsize=None, predicate=None):
        self.minsize = minsize
        self.maxsize = maxsize
        self.predicate = predicate

    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError(f'Expected {value!r} to be an str')
        if self.minsize is not None and len(value) < self.minsize:
            raise ValueError(
                f'Expected {value!r} to be no smaller than {self.minsize!r}'
            )
        if self.maxsize is not None and len(value) > self.maxsize:
            raise ValueError(
                f'Expected {value!r} to be no bigger than {self.maxsize!r}'
            )
        if self.predicate is not None and not self.predicate(value):
            raise ValueError(
                f'Expected {self.predicate} to be true for {value!r}'
            )

In [17]:
@dataclass
class Dog(TypeValidated):
    name: str = String(predicate=lambda s: s == s.capitalize())
    age: int = Number(minvalue=0)

In [18]:
dog = Dog(name='Reksio', age=10)

KeyError: 'name'

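Unfortunately, dataclasses and descriptors don't compose well out of the box. The descriptor stores the value under `_name`, so there is no `name` key in `self.__dict__` for `__post_init__` to look up.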
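To see where the `KeyError` comes from, it helps to look at what actually ends up in the instance `__dict__`. The sketch below uses a stripped-down, hypothetical `Stored` descriptor (no validation, and no `TypeValidated` base class) that keeps its value under a private name, the same way `Validator` does.

```python
from dataclasses import dataclass


class Stored:
    """Minimal descriptor (hypothetical) that keeps its value under `_<name>`."""

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, value)


@dataclass
class Dog:
    name: str = Stored()
    age: int


dog = Dog(name='Reksio', age=10)
print(dog.name)      # Reksio
print(dog.__dict__)  # {'_name': 'Reksio', 'age': 10}
```

Attribute access through `dog.name` works fine. It is only the direct `self.__dict__[name]` lookup that misses the value.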

Another way to implement field-specific validation is with properties.

In [19]:
@dataclass
class Dog(TypeValidated):
    name: str
    age: int
    
    @property
    def name(self):
        return self._name
    
    @name.setter
    def name(self, value):
        if value != value.capitalize():
            raise ValueError("Name needs to be capitalized!")
        else:
            self._name = value
                             
    @property
    def age(self):
        return self._age
        
    @age.setter
    def age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative!")
        else:
            self._age = value

In [20]:
dog = Dog(name='reksio', age=10)

ValueError: Name needs to be capitalized!

In [21]:
dog = Dog(name='reksio', age=-1)

ValueError: Name needs to be capitalized!

In [22]:
dog = Dog(name='Reksio', age=10)

KeyError: 'name'

This looks like the same problem as with descriptors.
It is a bit more approachable here, as we can clearly see that the validated values are stored in underlying attributes prefixed with an underscore.
The issue comes from `__post_init__`, and we can fix it.

In [23]:
@dataclass
class TypeValidated:
    """Raise TypeError if instantiated with incorrect type."""
    def __post_init__(self):
        for name, field_type in self.__annotations__.items():
            # Plain fields live under `name`, property-backed ones under `_name`.
            try:
                value = self.__dict__[name]
            except KeyError:
                value = self.__dict__["_" + name]
            if not isinstance(value, field_type):
                raise TypeError(f'For field `{name}` expected type `{field_type}`, got `{type(value)}`')

In [24]:
@dataclass
class Dog(TypeValidated):
    name: str
    age: int
    
    @property
    def name(self):
        return self._name
    
    @name.setter
    def name(self, value):
        if value != value.capitalize():
            raise ValueError("Name needs to be capitalized!")
        else:
            self._name = value
                             
    @property
    def age(self):
        return self._age
        
    @age.setter
    def age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative!")
        else:
            self._age = value

In [25]:
dog = Dog(name='Reksio', age=10)

In [26]:
dog

Dog(name='Reksio', age=10)
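A possible alternative to guessing the underscore-prefixed name is to read each value with `getattr`, which goes through properties and descriptors alike. The sketch below is a variation on the fix above, not a replacement for it. It assumes every field is readable through normal attribute access and uses `dataclasses.fields` to get the declared types. To keep it short, only `age` is backed by a property.

```python
from dataclasses import dataclass, fields


@dataclass
class TypeValidated:
    """Raise TypeError if instantiated with incorrect type."""

    def __post_init__(self):
        for field in fields(self):
            # getattr goes through properties and descriptors, so the storage
            # name (`name` vs `_name`) no longer matters.
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f'For field `{field.name}` expected type `{field.type}`, '
                    f'got `{type(value)}`'
                )


@dataclass
class Dog(TypeValidated):
    name: str
    age: int

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative!")
        self._age = value


print(Dog(name='Reksio', age=10))  # Dog(name='Reksio', age=10)
```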