# Dataclasses

What is a DATACLASS?
    - A dataclass is simply a CODE GENERATOR: it lets us define custom classes with a more
    compact syntax and generates the 'BOILERPLATE' code (`__init__`, `__repr__`, `__eq__`...) for us.

    - Examples of code generators: collections.namedtuple; typing.NamedTuple

    - Dataclasses are mainly used for data structures, that is, classes whose main job is to hold data.
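
For comparison, the two other code generators named above can be used as in the minimal sketch below. The class names `PointA` and `PointB` are made up for illustration.

```python
from collections import namedtuple
from typing import NamedTuple

# Functional form: field names are given as strings.
PointA = namedtuple("PointA", ["x", "y"])


# Class form: field names and types are given as annotations.
class PointB(NamedTuple):
    x: int = 0
    y: int = 0


a = PointA(1, 2)
b = PointB(1, 2)

# Both get a generated __repr__ and tuple-based equality for free.
print(a)       # PointA(x=1, y=2)
print(b)       # PointB(x=1, y=2)
print(a == b)  # True, because both are plain tuples underneath
```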

In [30]:
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self.x = x
        self.y = y
        self.radius = radius

In [31]:
c = Circle()
c

<__main__.Circle at 0x1139e0130>

In [32]:
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self.x = x
        self.y = y
        self.radius = radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

In [33]:
c = Circle()
c

Circle(x=0, y=0, radius=1)

## Class-level annotations - dataclass

- The fields are now declared as annotated class-level variables. The decorator reads them and generates an `__init__` that sets the matching instance attributes.

In [35]:
from dataclasses import dataclass

In [35]:
@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

In [36]:
cd = CircleDataClass()
cd

CircleDataClass(x=0, y=0, radius=1)

- The `@dataclass` decorator generates the dunder methods needed to represent our class, like `__init__`, `__repr__` and `__eq__`. Instances still keep their attributes in the regular `__dict__`.

In [37]:
cd.x = 122
cd.__dict__

{'x': 122, 'y': 0, 'radius': 1}

In [38]:
cd2 = CircleDataClass()
cd2.y = -356
cd2.__dict__

{'x': 0, 'y': -356, 'radius': 1}

In [37]:
@dataclass(slots=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

In [41]:
cd2 = CircleDataClass()
cd2.y = -356
cd2.__slots__

('x', 'y', 'radius')

## Equality comparisons

In [39]:
c1 = Circle(1, 1, 1)
c2 = Circle(1, 1, 1)
c1 == c2

False

In [40]:
c1 is c2

False

- For normal classes, `==` falls back to an identity check (the same as `is`), so value comparison requires implementing the `__eq__` method.
- Dataclasses implement the `__eq__` method by default.

In [41]:
cd1 = CircleDataClass()
cd2 = CircleDataClass()
cd1 == cd2

True

In [42]:
cd1 is cd2

False

## Implementing the `__eq__` method.

- implementing the `__eq__` method by hand.

In [43]:
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self.x = x
        self.y = y
        self.radius = radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

In [44]:
c1 = Circle(1, 1, 1)
c2 = Circle(1, 1, 1)
c1 is c2

False

In [45]:
c1 == c2

True
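
Returning `NotImplemented` (rather than `False`, or raising) lets Python try the comparison on the other operand before falling back to its default. A minimal sketch with a trimmed-down, hypothetical `MiniCircle`:

```python
class MiniCircle:
    def __init__(self, radius: int = 1):
        self.radius = radius

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.radius == other.radius
        # Signal "I do not know how to compare with this type".
        return NotImplemented


c = MiniCircle(1)

# Calling the method directly exposes the sentinel value.
direct = c.__eq__(1)

# The == operator tries int.__eq__ next, which also declines,
# and finally falls back to an identity check.
via_operator = c == 1

print(direct)        # NotImplemented
print(via_operator)  # False
```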

## Hash

- By default dataclasses are not hashable: since `__eq__` is generated and the instances are mutable, `__hash__` is set to `None`.

In [46]:
hash(cd1)

TypeError: unhashable type: 'CircleDataClass'
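
The sketch below compares how the `eq` and `frozen` parameters of the decorator affect hashability. The class names are illustrative only.

```python
from dataclasses import dataclass


@dataclass  # eq=True, frozen=False: __hash__ is set to None
class Mutable:
    x: int = 0


@dataclass(eq=False)  # no generated __eq__: the identity-based hash is kept
class IdentityOnly:
    x: int = 0


@dataclass(frozen=True)  # eq=True, frozen=True: __hash__ is generated
class Frozen:
    x: int = 0


print(Mutable.__hash__ is None)          # True
print(hash(Frozen()) == hash(Frozen()))  # True: the hash is based on the fields
print(IdentityOnly() == IdentityOnly())  # False: identity comparison
```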

In [47]:
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self.x = x
        self.y = y
        self.radius = radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y, self.radius))

In [48]:
c1 = Circle()
c2 = Circle()
c1 == c2

True

In [49]:
hash(c1), hash(c2) 

(-1882636517035687140, -1882636517035687140)

In [50]:
set_ = {c1, c2}
set_

{Circle(x=0, y=0, radius=1)}

In [51]:
dict_ = {
    c1: "Circle 1"
}

In [52]:
dict_[c1]

'Circle 1'

### Hashable objects should be immutable

- Python has no way to make attributes truly immutable; the workaround is to store them in private (underscore-prefixed) attributes and expose them through read-only properties.
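
To see why mutability is a problem for hashable objects, here is a minimal sketch in which a hashable but mutable object is changed after being used as a dictionary key. `MutablePoint` is a hypothetical stand-in for the hashable `Circle` class above.

```python
class MutablePoint:
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y))


p = MutablePoint(0, 0)
lookup = {p: "origin"}

found_before = p in lookup

# Mutating the key changes its hash, which no longer matches
# the hash that the dictionary stored for it.
p.x = 10
found_after = p in lookup

print(found_before, found_after)  # True False
```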

In [53]:
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self._x = x
        self._y = y
        self._radius = radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y, self.radius))

    # read only properties.
    # dataclasses use another approach to create read only fields:
    # a frozen dataclass overrides the __setattr__ and __delattr__ methods.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def radius(self):
        return self._radius
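
A quick check of what the read-only properties provide, using a shortened, hypothetical `ReadOnlyCircle` with a single attribute. Note that the underscore is only a convention: the backing attribute can still be reassigned.

```python
class ReadOnlyCircle:
    def __init__(self, radius: int = 1):
        self._radius = radius

    @property
    def radius(self):
        return self._radius


rc = ReadOnlyCircle()

# A property without a setter rejects assignment.
try:
    rc.radius = 5
    assigned = True
except AttributeError:
    assigned = False

# The "private" attribute is still writable; Python does not enforce privacy.
rc._radius = 5

print(assigned, rc.radius)  # False 5
```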

## Immutable dataclass

- Dataclasses do not use read-only properties. They use another approach to create read-only fields.
- A frozen dataclass overrides the `__setattr__` and `__delattr__` methods so that they raise an error.
- This is enabled with `@dataclass(frozen=True)`.
- Every field in the dataclass will be immutable.


In [54]:
@dataclass(frozen=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

- Now all the variables from the dataclass are immutable.
- Now it is possible to HASH instances of the dataclass, because they are immutable.

In [55]:
cd1 = CircleDataClass()
cd2 = CircleDataClass()
cd3 = CircleDataClass()

hash(cd1), hash(cd2), hash(cd3)

(-1882636517035687140, -1882636517035687140, -1882636517035687140)
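
A short sketch of what `frozen=True` means in practice: assignment raises `dataclasses.FrozenInstanceError`, and `dataclasses.replace` builds a modified copy instead. The class name `FrozenCircle` is only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenCircle:
    x: int = 0
    y: int = 0
    radius: int = 1


fc = FrozenCircle()

try:
    fc.radius = 10
    error_name = None
except FrozenInstanceError as exc:
    # FrozenInstanceError is a subclass of AttributeError.
    error_name = type(exc).__name__

# replace() returns a new instance; the original is left untouched.
bigger = replace(fc, radius=10)

print(error_name)  # FrozenInstanceError
print(fc)          # FrozenCircle(x=0, y=0, radius=1)
print(bigger)      # FrozenCircle(x=0, y=0, radius=10)
```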

## Ordering

- Comparison methods: `__lt__`, `__le__`, `__gt__`, `__ge__`, `__eq__`...
- There is no need to implement both greater than and less than. If `a > b` is not implemented, Python tries the reflected call `b < a`.
- The same holds for greater than or equal to and less than or equal to.
- The `total_ordering` decorator from the `functools` module implements the missing comparison dunder methods for you. You need to implement `__eq__` and one ordering dunder method, and the decorator modifies your class, adding the other ones.
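
When `__gt__` is missing, Python answers `a > b` by trying the reflected call `b < a`. The sketch below shows this with a hypothetical `Version` class that defines only `__eq__` and `__lt__`: `>` works through reflection, `<=` does not, and that gap is what `total_ordering` fills.

```python
from functools import total_ordering


class Version:
    def __init__(self, number: int):
        self.number = number

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.number == other.number
        return NotImplemented

    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return self.number < other.number
        return NotImplemented


v1, v2 = Version(1), Version(2)

# v2 > v1 is answered by the reflected call v1.__lt__(v2).
greater = v2 > v1

# <= is not derived automatically from < and ==.
try:
    v1 <= v2
    le_supported = True
except TypeError:
    le_supported = False


# total_ordering fills in __le__, __gt__ and __ge__.
@total_ordering
class OrderedVersion(Version):
    pass


print(greater, le_supported)                   # True False
print(OrderedVersion(1) <= OrderedVersion(2))  # True
```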

### Ordering in dataclasses

- By default dataclasses do not implement any ordering.
- To enable ordering comparisons between instances of the same dataclass, set `@dataclass(order=True)`.


In [56]:
c1 = Circle()
c2 = Circle(1, 1, 2)

c1 < c2

TypeError: '<' not supported between instances of 'Circle' and 'Circle'

### Implementing logical comparison methods

In [65]:
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self._x = x
        self._y = y
        self._radius = radius

    # read only properties.
    # dataclasses use another approach to create read only fields:
    # a frozen dataclass overrides the __setattr__ and __delattr__ methods.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def radius(self):
        return self._radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y, self.radius))

    # There is no need to implement both greater than and less than.
    # For a > b Python falls back to the reflected call b < a.
    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) < (other.x, other.y, other.radius)
        return NotImplemented
    
    # There is no need to implement both greater than or equal to and less than or equal to.
    # For a >= b Python falls back to the reflected call b <= a.
    def __le__(self, other):
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) <= (other.x, other.y, other.radius)
        return NotImplemented

In [66]:
c1 = Circle()
c2 = Circle(1, 1, 2)

c1 < c2

True

In [68]:
c1 <= c2

True

In [74]:
from functools import total_ordering

# The `total_ordering` decorator from the `functools` module implements the missing comparison dunder methods for you.
# You need to implement `__eq__` and one ordering dunder method; the decorator modifies your class, adding the other ones.
@total_ordering
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self._x = x
        self._y = y
        self._radius = radius

    # read only properties.
    # dataclasses use another approach to create read only fields:
    # a frozen dataclass overrides the __setattr__ and __delattr__ methods.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def radius(self):
        return self._radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y, self.radius))

    # There is no need to implement both greater than and less than.
    # For a > b Python falls back to the reflected call b < a.
    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) < (other.x, other.y, other.radius)
        return NotImplemented

In [75]:
c1 = Circle()
c2 = Circle(1, 1, 2)

c1 <= c2

True

### Ordering with dataclasses

In [76]:
cd1 = CircleDataClass()
cd2 = CircleDataClass(1, 2, 3)

cd1 < cd2

TypeError: '<' not supported between instances of 'CircleDataClass' and 'CircleDataClass'

In [77]:
@dataclass(frozen=True, order=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

In [78]:
cd1 = CircleDataClass()
cd2 = CircleDataClass(1, 2, 3)

cd1 < cd2

True
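
With `order=True` the generated methods compare instances as tuples of their fields, in definition order. A small sketch (the class name `OrderedCircle` is made up) showing the effect on `sorted`:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OrderedCircle:
    x: int = 0
    y: int = 0
    radius: int = 1


circles = [OrderedCircle(1, 0, 5), OrderedCircle(0, 9, 9), OrderedCircle(1, 0, 2)]

# x is compared first, then y, then radius.
by_fields = [(c.x, c.y, c.radius) for c in sorted(circles)]

# The result matches sorting the plain tuples.
by_tuples = sorted((c.x, c.y, c.radius) for c in circles)

print(by_fields)  # [(0, 9, 9), (1, 0, 2), (1, 0, 5)]
```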

## Serialization

### dataclasses

- To serialize dataclasses, import the `asdict` and `astuple` functions from the `dataclasses` module.

### Custom class

- To enable serialization in a custom class, you must write the `asdict` and `astuple` methods yourself.

In [79]:
from dataclasses import asdict, astuple

In [80]:
cd1 = CircleDataClass()

display(asdict(cd1))
display(astuple(cd1))

{'x': 0, 'y': 0, 'radius': 1}

(0, 0, 1)

In [83]:
from functools import total_ordering

# The `total_ordering` decorator from the `functools` module implements the missing comparison dunder methods for you.
# You need to implement `__eq__` and one ordering dunder method; the decorator modifies your class, adding the other ones.
@total_ordering
class Circle:
    def __init__(self, x: int = 0, y: int = 0, radius: int = 1):
        self._x = x
        self._y = y
        self._radius = radius

    # read only properties.
    # dataclasses use another approach to create read only fields:
    # a frozen dataclass overrides the __setattr__ and __delattr__ methods.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def radius(self):
        return self._radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y, self.radius))

    # There is no need to implement both greater than and less than.
    # For a > b Python falls back to the reflected call b < a.
    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) < (other.x, other.y, other.radius)
        return NotImplemented

    # add serialization as dict
    def asdict(self):
        return {
            "x": self.x,
            "y": self.y,
            "radius": self.radius,
        }

    # add serialization as tuple
    def astuple(self):
        return self.x, self.y, self.radius

In [84]:
c1 = Circle()
c1.asdict()

{'x': 0, 'y': 0, 'radius': 1}
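
One detail worth knowing about `dataclasses.asdict` and `dataclasses.astuple` is that they recurse into nested dataclass instances. A minimal sketch with two hypothetical classes, `Point` and `Segment`:

```python
from dataclasses import asdict, astuple, dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0


@dataclass
class Segment:
    start: Point
    end: Point


seg = Segment(Point(0, 0), Point(3, 4))

# Nested dataclass instances are converted as well.
print(asdict(seg))   # {'start': {'x': 0, 'y': 0}, 'end': {'x': 3, 'y': 4}}
print(astuple(seg))  # ((0, 0), (3, 4))
```

A hand-written `asdict` method like the one above would need to handle such nesting explicitly.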

## Introspection

- Dataclasses make introspection easy.
- To use it, call the `fields` function from the `dataclasses` module.

In [85]:
from dataclasses import fields

In [89]:
c1 = CircleDataClass()

for field in fields(c1):
    print(field, end="\n-----------------\n")

Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
-----------------
Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
-----------------
Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
-----------------
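
Each `Field` object exposes attributes such as `name`, `type` and `default`, so the introspection result can be turned into plain data. A minimal sketch, with a hypothetical helper named `describe`:

```python
from dataclasses import dataclass, fields


@dataclass
class IntrospectedCircle:
    x: int = 0
    y: int = 0
    radius: int = 1


def describe(cls):
    """Map each field name to its (type, default) pair."""
    return {f.name: (f.type, f.default) for f in fields(cls)}


print(describe(IntrospectedCircle))
# {'x': (<class 'int'>, 0), 'y': (<class 'int'>, 0), 'radius': (<class 'int'>, 1)}
```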


## Custom dataclasses

- A dataclass is just a normal class with a decorator on top of it.
- It is possible to add methods and properties and all the same functionalities as a normal class.
- This also means that it is possible to override the generated dunder methods and modify them.

In [90]:
from math import pi

@dataclass(frozen=True, order=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    @property
    def area(self):
        return pi * self.radius ** 2

    def circumference(self):
        return 2 * pi * self.radius

In [91]:
cd = CircleDataClass()
display(cd.area)
display(cd.circumference())

3.141592653589793

6.283185307179586

## Sorting

- Based on a specific field.
- Based on a property.

In [105]:
from math import pi, dist

@dataclass(frozen=True) #order=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    @property
    def area(self):
        return pi * self.radius ** 2

    def circumference(self):
        return 2 * pi * self.radius

    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return dist((0, 0), (self.x, self.y)) < dist((0, 0), (other.x, other.y))
        return NotImplemented

In [106]:
cd1 = CircleDataClass(1, 1, 2)
cd2 = CircleDataClass(2, 2, 1)

cd1 < cd2

True

In [107]:
cd1 <= cd2

TypeError: '<=' not supported between instances of 'CircleDataClass' and 'CircleDataClass'

In [108]:
from math import pi, dist
from functools import total_ordering

@dataclass(frozen=True) #order=True)
@total_ordering
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    @property
    def area(self):
        return pi * self.radius ** 2

    def circumference(self):
        return 2 * pi * self.radius

    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return dist((0, 0), (self.x, self.y)) < dist((0, 0), (other.x, other.y))
        return NotImplemented

In [110]:
cd1 = CircleDataClass(1, 1, 2)
cd2 = CircleDataClass(2, 2, 1)

display(cd1 <= cd2)
display(cd1 >= cd2)

True

False
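
When the goal is only to sort, and not to give `<` a meaning for the class, passing a `key` to `sorted` avoids defining comparison methods at all. A sketch under that assumption, with a hypothetical `SortableCircle`:

```python
from dataclasses import dataclass
from math import dist, pi
from operator import attrgetter


@dataclass(frozen=True)
class SortableCircle:
    x: int = 0
    y: int = 0
    radius: int = 1

    @property
    def area(self):
        return pi * self.radius ** 2


circles = [SortableCircle(3, 4, 1), SortableCircle(0, 1, 3), SortableCircle(1, 1, 2)]

# Sort on a specific field.
by_radius = sorted(circles, key=attrgetter("radius"))

# Sort on a property; attrgetter works for properties too.
by_area = sorted(circles, key=attrgetter("area"), reverse=True)

# Sort on a computed value: distance of the centre from the origin.
by_distance = sorted(circles, key=lambda c: dist((0, 0), (c.x, c.y)))

print([c.radius for c in by_radius])      # [1, 2, 3]
print([c.radius for c in by_area])        # [3, 2, 1]
print([(c.x, c.y) for c in by_distance])  # [(0, 1), (1, 1), (3, 4)]
```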

## Keyword-only arguments in the `__init__` method

In [111]:
from functools import total_ordering

# The `total_ordering` decorator from the `functools` module implements the missing comparison dunder methods for you.
# You need to implement `__eq__` and one ordering dunder method; the decorator modifies your class, adding the other ones.
@total_ordering
class Circle:
    # the "*" indicates that every argument after it is keyword-only; no more positional args are accepted.
    def __init__(self, x: int = 0, y: int = 0, *, radius: int = 1):
        self._x = x
        self._y = y
        self._radius = radius

    # read only properties.
    # dataclasses use another approach to create read only fields:
    # a frozen dataclass overrides the __setattr__ and __delattr__ methods.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def radius(self):
        return self._radius

    def __repr__(self):
        # use __qualname__ instead of __name__
        ## __qualname__ returns the dotted path for nested classes (e.g. "Outer.Inner");
        ## __name__ returns only the class name.
        return f"{self.__class__.__qualname__}(x={self.x}, y={self.y}, radius={self.radius})"

    def __eq__(self, other):
        # to have the comparison functionality it is necessary to implement the __eq__ method.
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) == (other.x, other.y, other.radius)
        # best practice:
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y, self.radius))

    # There is no need to implement both greater than and less than.
    # For a > b Python falls back to the reflected call b < a.
    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return (self.x, self.y, self.radius) < (other.x, other.y, other.radius)
        return NotImplemented

    # add serialization as dict
    def asdict(self):
        return {
            "x": self.x,
            "y": self.y,
            "radius": self.radius,
        }

    # add serialization as tuple
    def astuple(self):
        return self.x, self.y, self.radius

In [113]:
c1 = Circle(1, 1, 2)
c1

TypeError: Circle.__init__() takes from 1 to 3 positional arguments but 4 were given

In [114]:
c1 = Circle(1, 1, radius=2)
c1

Circle(x=1, y=1, radius=2)

### Keyword-only args for `dataclasses`.

In [117]:
from math import pi, dist
from dataclasses import KW_ONLY

@dataclass(frozen=True) #order=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    # this name can be anything; it is not a real field, it only marks that the fields after it are keyword-only, same as "*".
    _: KW_ONLY # it is just a "separator"
    radius: int = 1

    @property
    def area(self):
        return pi * self.radius ** 2

    def circumference(self):
        return 2 * pi * self.radius

    def __lt__(self, other):
        if self.__class__ == other.__class__:
            return dist((0, 0), (self.x, self.y)) < dist((0, 0), (other.x, other.y))
        return NotImplemented

In [118]:
cd1 = CircleDataClass(1, 1, 2)
cd1

TypeError: CircleDataClass.__init__() takes from 1 to 3 positional arguments but 4 were given

In [119]:
cd1 = CircleDataClass(1, 1, radius=2)
cd1

CircleDataClass(x=1, y=1, radius=2)

### Every parameter as keyword-only.

- Use `@dataclass(kw_only=True)`

In [121]:
from math import pi, dist

@dataclass(frozen=True, order=True, kw_only=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    @property
    def area(self):
        return pi * self.radius ** 2

    def circumference(self):
        return 2 * pi * self.radius

In [122]:
cd1 = CircleDataClass(1, 1, 1)
cd1

TypeError: CircleDataClass.__init__() takes 1 positional argument but 4 were given

In [123]:
cd1 = CircleDataClass(radius=1, y=1, x=1)
cd1

CircleDataClass(x=1, y=1, radius=1)

## Resource utilization and performance - `dataclasses` vs `NamedTuple`.

- Regarding size, instances of `dataclasses` and of `NamedTuple` take roughly the same amount of memory.
- Regarding creation time, a dataclass with slots that is not frozen is created faster than a `NamedTuple`, but this changes if the dataclass is frozen.
- Regarding the speed of reading attributes, `dataclasses` generally perform better than `NamedTuple`.

## Post-init special method

- The `__post_init__` method is used to augment the normal init method that is automatically generated by the `@dataclass` decorator.
- It performs custom initialization for your `@dataclass` and runs after the generated initialization.
- It allows us to extend the `__init__` method without the need to modify the method itself.
- It is an `INSTANCE` method, so it has access to instance data.

In [128]:
@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        print("__post_init__ called")
        print(repr(self))

In [130]:
cd1 = CircleDataClass()

__post_init__ called
CircleDataClass(x=0, y=0, radius=1)
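
A common use of `__post_init__` is validation. The sketch below rejects a non-positive radius; the rule itself and the class name `ValidatedCircle` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ValidatedCircle:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        # Runs after the generated __init__ has assigned all fields.
        if self.radius <= 0:
            raise ValueError(f"radius must be positive, got {self.radius}")


ok = ValidatedCircle(radius=2)

try:
    ValidatedCircle(radius=-1)
    message = None
except ValueError as exc:
    message = str(exc)

print(ok)       # ValidatedCircle(x=0, y=0, radius=2)
print(message)  # radius must be positive, got -1
```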


### Init only variables

- Init-only variables are passed to `__init__` as parameters, like any other field, and are then forwarded to `__post_init__`.
- They are `NOT STORED` in the instance `dict` or `slots`.
- An init-only variable is declared in the `@dataclass` body like a field, but it does not end up as a field of the `final resulting class`. It is only passed to the `__init__` and `__post_init__` methods.
- To declare an `INIT ONLY` variable, use the generic type `dataclasses.InitVar`.

In [134]:
@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: int = 0 # regular field 
    translate_y: int = 0 # regular field

In [133]:
from dataclasses import fields

# displaying regular fields
fields(CircleDataClass)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='translate_x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='translate_y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init

In [135]:
from dataclasses import InitVar

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = 0 # init only field 
    translate_y: InitVar[int] = 0 # init only field

    def __post_init__(self, translate_x, translate_y): # InitVar args are passed in the order in which they are defined.
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x # overwrite the variables 
        self.y += translate_y # overwrite the variables


In [136]:
cd1 = CircleDataClass(0, 0, 1, -1, -2)
cd1

Translating center by: Δx=-1, Δy=-2


CircleDataClass(x=-1, y=-2, radius=1)

In [137]:
fields(CircleDataClass)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

In [139]:
# making post_init variables keyword only

from dataclasses import InitVar, KW_ONLY

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    _: KW_ONLY
    translate_x: InitVar[int] = 0 # init only field 
    translate_y: InitVar[int] = 0 # init only field

    def __post_init__(self, translate_x, translate_y): # InitVar args are passed in the order in which they are defined.
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x # overwrite the variables 
        self.y += translate_y # overwrite the variables

In [140]:
cd1 = CircleDataClass(0, 0, 1, -1, -2)
cd1

TypeError: CircleDataClass.__init__() takes from 1 to 4 positional arguments but 6 were given

In [141]:
cd1 = CircleDataClass(0, 0, 1, translate_x=-1, translate_y=-2)
cd1

Translating center by: Δx=-1, Δy=-2


CircleDataClass(x=-1, y=-2, radius=1)

In [142]:
cd1.__dict__

{'x': -1, 'y': -2, 'radius': 1}

## Field-level customization

- Fields can be customized by giving them an instance of the `dataclasses.Field` class as their default value.
- We should `NEVER` instantiate `dataclasses.Field` directly; always use the `dataclasses.field` function.
- `type(field())` == `dataclasses.Field`.

In [143]:
from dataclasses import InitVar, KW_ONLY, Field, field

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    _: KW_ONLY
    translate_x: InitVar[int] = 0 # init only field 
    translate_y: InitVar[int] = 0 # init only field

    def __post_init__(self, translate_x, translate_y): # InitVar args are passed in the order in which they are defined.
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x # overwrite the variables 
        self.y += translate_y # overwrite the variables

### Customization with `dataclasses.field`

In [144]:
from dataclasses import field

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

In [145]:
dc = CircleDataClass()
dc

CircleDataClass(x=0, y=0, radius=1)

In [146]:
from dataclasses import field

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    # overwrite __repr__ - normal way.
    def __repr__(self):
        return f"{self.__class__.__qualname__}(radius={self.radius})"

In [147]:
dc = CircleDataClass()
dc

CircleDataClass(radius=1)

In [148]:
from dataclasses import field

@dataclass
class CircleDataClass:
    # Excluding fields from __repr__
    # x and y no longer have default values, so they must be passed to __init__
    x: int = field(repr=False)
    y: int = field(repr=False)
    radius: int = 1

In [150]:
dc = CircleDataClass(0, 0, 1)
dc

CircleDataClass(radius=1)

In [151]:
from dataclasses import field

@dataclass
class CircleDataClass:
    # Excluding fields from __repr__
    # with a bare field(), x and y would have no default values;
    # it is possible to set the default values again with 'default'.
    x: int = field(default=0, repr=False)
    y: int = field(default=0, repr=False)
    radius: int = 1

In [153]:
cd = CircleDataClass()
cd.x, cd.y

(0, 0)
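
Besides `default` and `repr`, `field()` accepts `default_factory`, which is the way to give a field a mutable default such as a list. A minimal sketch; the `tags` field and the class names are hypothetical.

```python
from dataclasses import dataclass, field

# A bare mutable default is rejected when the class is created.
try:
    @dataclass
    class BadCircle:
        tags: list = []
    rejected = False
except ValueError:
    rejected = True


@dataclass
class TaggedCircle:
    radius: int = 1
    # The factory is called once per instance, so lists are not shared.
    tags: list = field(default_factory=list)


a = TaggedCircle()
b = TaggedCircle()
a.tags.append("big")

print(rejected)        # True
print(a.tags, b.tags)  # ['big'] []
```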

### Define fields without passing them to `__init__`.

- Good for calculated fields

In [154]:
from dataclasses import field
from math import pi

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        # the dataclass cannot be frozen for it to work
        self._area = pi * self.radius ** 2

    @property
    def area(self):
        return self._area

In [155]:
dc = CircleDataClass()
dc, dc.area

(CircleDataClass(x=0, y=0, radius=1), 3.141592653589793)

In [157]:
# `area` is not listed by fields(), because it is a property and not a field,
# and '_area' appears in the instance __dict__.
display(fields(dc))
display(dc.__dict__)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

{'x': 0, 'y': 0, 'radius': 1, '_area': 3.141592653589793}

In [158]:
# to solve these problems it is necessary to have a field in the class that is
# NOT EXPECTED as an argument in the __init__
# In this case you are still not able to have a frozen class.

from dataclasses import field
from math import pi

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    # a field that is not passed to init.
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        self.area = pi * self.radius ** 2

In [159]:
cd = CircleDataClass()
display(fields(cd))
display(cd.__dict__)
display(cd.area)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='area',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,default_factory=<dataclasses._MISSING_TYPE object at 0x10f4e9ab0>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

{'x': 0, 'y': 0, 'radius': 1, 'area': 3.141592653589793}

3.141592653589793

### Having a `frozen` class with calculated fields.

In [160]:
from dataclasses import field
from math import pi

@dataclass(frozen=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    # a field that is not passed to init.
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        # self.__setattr__("area", pi * self.radius ** 2) # same as the line below.
        # self.area = pi * self.radius ** 2 # raises FrozenInstanceError in a frozen class.

        ## to solve this problem, bypass the frozen __setattr__ by calling the one from the super() class (object).
        super().__setattr__("area", pi * self.radius ** 2)

In [161]:
cd = CircleDataClass()
cd

CircleDataClass(x=0, y=0, radius=1)
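
To confirm that the workaround only bypasses the freeze inside `__post_init__`, the sketch below sets the field with `object.__setattr__` (equivalent to the `super()` call here, assuming the class inherits directly from `object`) and then checks that normal assignment is still blocked. `FrozenArea` is a made-up name.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from math import pi


@dataclass(frozen=True)
class FrozenArea:
    radius: int = 1
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        # Bypass the frozen __setattr__ generated by the decorator.
        object.__setattr__(self, "area", pi * self.radius ** 2)


fa = FrozenArea(2)

try:
    fa.area = 0.0
    still_frozen = False
except FrozenInstanceError:
    still_frozen = True

print(fa)            # FrozenArea(radius=2)
print(fa.area)       # 12.566370614359172
print(still_frozen)  # True
```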

## Customizing comparison fields
- Equality of two instances of a `dataclass` is based on the equality of the tuples containing the field values of each instance.

In [164]:
from dataclasses import field

@dataclass(order=True)
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1

In [166]:
cd = CircleDataClass(1, 1, 1)
cd2 = CircleDataClass(1, 1, 1)

display(cd == cd2)
display(cd < cd2)

True

False

In [167]:
# Base the ordering on a single field (radius) by excluding the others with compare=False.
# Note that this also changes how equality works: x and y are ignored by == as well.

from dataclasses import field

@dataclass(order=True)
class CircleDataClass:
    x: int = field(default=0, compare=False)
    y: int = field(default=0, compare=False)
    radius: int = 1

In [168]:
cd = CircleDataClass(1, 1, 1)
cd2 = CircleDataClass(1, 1, 2)

display(cd == cd2)
display(cd < cd2)

False

True

In [169]:
cd = CircleDataClass(1, 1, 1)
cd2 = CircleDataClass(1, 0, 1)

display(cd == cd2)
display(cd < cd2)

True

False

In [None]:
# Sometimes ordering and equality need different logic. compare=False cannot express that, since it removes the field from both == and the ordering operators.

from dataclasses import field

@dataclass(order=True)
class CircleDataClass:
    x: int = field(default=0, compare=False)
    y: int = field(default=0, compare=False)
    radius: int = 1
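
One possible way to get different logic for equality and ordering is to let `dataclass` generate `__eq__` over all fields and write `__lt__` by hand. This is only a sketch: `SizedCircle` is a hypothetical name, and only `<` is defined, which is enough for `sorted()`.

```python
from dataclasses import dataclass

@dataclass  # eq=True by default: equality uses x, y and radius
class SizedCircle:  # hypothetical class for illustration
    x: int = 0
    y: int = 0
    radius: int = 1

    def __lt__(self, other):
        # Ordering looks at the radius only.
        if other.__class__ is self.__class__:
            return self.radius < other.radius
        return NotImplemented

small = SizedCircle(5, 5, 1)
big = SizedCircle(0, 0, 2)
same_size = SizedCircle(9, 9, 1)

ordered = sorted([big, small])
```

Here `small` and `same_size` are not equal (their centers differ), yet neither is less than the other, because ordering ignores the center.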

## Hashing
- For an instance to be hashable, the data used to compute its hash must be immutable.
- This does not mean that the entire instance has to be immutable.
- Rules:
    - The instance data used to generate the hash must be immutable.
    - The data used to generate the hash should also be part of the equality implementation.
    - Two instances that compare equal must have the same hash.
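
A short sketch of what goes wrong when the last rule is broken. `BadKey` and `GoodKey` are hypothetical classes: the first hashes by identity while comparing by value, the second hashes the same data it compares.

```python
class BadKey:
    """Breaks the rule: equal instances do not share a hash."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        return id(self)  # identity-based, ignores value


class GoodKey(BadKey):
    """Follows the rule: the hash is derived from the data used by __eq__."""

    def __hash__(self):
        return hash(self.value)


bad_a, bad_b = BadKey(1), BadKey(1)
good_a, good_b = GoodKey(1), GoodKey(1)

bad_set = {bad_a, bad_b}     # equal objects, but the set keeps both
good_set = {good_a, good_b}  # equal objects collapse into one element
```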

In [170]:
class Person:
    def __init__(self, name, age, ssn): 
        self.name: str = name # this changes over time
        self.age: int = age # this changes over time
        # ssn is mutable at this point.
        self.ssn: str = ssn # ssn - social security number - unique id.

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.ssn == other.ssn
        return NotImplemented

    def __hash__(self):
        return hash(self.ssn)

In [171]:
class Person:
    def __init__(self, name, age, ssn): 
        self.name: str = name # this changes over time
        self.age: int = age # this changes over time
        self._ssn: str = ssn # ssn - social security number - unique id.

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.ssn == other.ssn
        return NotImplemented

    def __hash__(self):
        return hash(self.ssn)

    @property # read-only property: there is no setter, so ssn cannot be reassigned
    def ssn(self):
        return self._ssn

    def __repr__(self):
        return f"{self.__class__.__qualname__}(name={self.name}, age={self.age}, ssn={self.ssn}, id = {hex(id(self))})"

In [179]:
p1 = Person("A", 30, 123)
p2 = Person("B", 40, 321)
p3 = Person("C", 50, 123)

In [180]:
p1 == p2, p1 == p3

(False, True)

In [181]:
hash(p1), hash(p2), hash(p3)

(123, 321, 123)

In [182]:
# Build a set: p1 and p3 are equal and share a hash, so only one of them is kept.
{p1, p2, p3}

{Person(name=A, age=30, ssn=123, id = 0x114236560),
 Person(name=B, age=40, ssn=321, id = 0x1142366e0)}

In [183]:
dct = {
    p1: "Person 1",
    p2: "Person 2",
}
dct

{Person(name=A, age=30, ssn=123, id = 0x114236560): 'Person 1',
 Person(name=B, age=40, ssn=321, id = 0x1142366e0): 'Person 2'}

In [184]:
p1.name = "X"
p2.age = 100

dct

{Person(name=X, age=30, ssn=123, id = 0x114236560): 'Person 1',
 Person(name=B, age=100, ssn=321, id = 0x1142366e0): 'Person 2'}

In [185]:
p1.ssn = 987

AttributeError: can't set attribute 'ssn'

### Hashing in dataclasses

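- With the default `eq=True`, `dataclass` generates `__hash__` automatically only when the class is immutable (`frozen=True`). A mutable dataclass with `eq=True` has `__hash__` set to `None`, so its instances are unhashable.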
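
The difference can be seen with a minimal sketch. `MutableId` and `FrozenId` are hypothetical classes that differ only in the `frozen` flag.

```python
from dataclasses import dataclass

@dataclass  # eq=True, frozen=False: __hash__ is set to None
class MutableId:  # hypothetical class for illustration
    ssn: str

@dataclass(frozen=True)  # eq=True, frozen=True: __hash__ is generated
class FrozenId:  # hypothetical class for illustration
    ssn: str

try:
    hash(MutableId("123"))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

frozen_hashes_match = hash(FrozenId("123")) == hash(FrozenId("123"))
```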

In [187]:
from dataclasses import dataclass

@dataclass(frozen=True) # the entire dataclass is immutable
class PersonDC:
    name: str
    age: int
    ssn: str

In [190]:
pDC1 = PersonDC("A", 30, 123)
pDC2 = PersonDC("A", 30, 123)
pDC3 = PersonDC("C", 50, 123)

In [191]:
# By default a dataclass uses every field for equality (and for the hash), so pDC1 and pDC2 collapse into one set element.
{pDC1, pDC2, pDC3}

{PersonDC(name='A', age=30, ssn=123), PersonDC(name='C', age=50, ssn=123)}

In [193]:
pDC1 == pDC2, pDC1 == pDC3

(True, False)

In [199]:
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PersonDC:
    name: str = field(compare=False) # excluded from == and the hash, but still immutable
    age: int = field(compare=False) # excluded from == and the hash, but still immutable
    ssn: str

In [195]:
pDC1 = PersonDC("A", 30, 123)
pDC2 = PersonDC("B", 40, 321)
pDC3 = PersonDC("C", 50, 123)

In [196]:
pDC1 == pDC2, pDC1 == pDC3

(False, True)

In [197]:
pDC1 is pDC2, pDC1 is pDC3

(False, False)

In [198]:
hash(pDC1) == hash(pDC2), hash(pDC1) == hash(pDC3)

(False, True)

In [200]:
pDC1.name = "X"

FrozenInstanceError: cannot assign to field 'name'

In [202]:
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True) # forces generation of __hash__ even though the class is mutable
class PersonDC:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str # the class is hashable but still mutable, including the ssn field

In [203]:
p1 = PersonDC("A", 35, 456)
p1.ssn = 987
p1

PersonDC(name='A', age=35, ssn=987)
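
The "unsafe" part can be sketched as follows. `Account` is a hypothetical class: once a hashed field is mutated, the hash changes and a set that already holds the instance can no longer find it.

```python
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class Account:  # hypothetical class for illustration
    owner: str = field(compare=False)
    number: int

account = Account("A", 456)
seen = {account}

hash_before = hash(account)
account.number = 987          # mutating a field that feeds the hash
hash_after = hash(account)

# The set stored the instance under the old hash, so the lookup misses it.
still_found = account in seen
```

This is why `unsafe_hash=True` is usually reserved for classes whose hashed fields are treated as read-only by convention.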

## Keyword-only arguments - second approach

In [None]:
from dataclasses import dataclass, InitVar, field

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = field(default=0, kw_only=True) 
    translate_y: InitVar[int] = field(default=0, kw_only=True)

    def __post_init__(self, translate_x, translate_y): # InitVar values are passed in the order in which they are defined.
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x # shift the center by the given offset
        self.y += translate_y # shift the center by the given offset

## Creating `dataclasses` programmatically

In [2]:
from dataclasses import dataclass, InitVar, field

@dataclass
class CircleDataClass:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = field(default=0, kw_only=True) 
    translate_y: InitVar[int] = field(default=0, kw_only=True)

    def __post_init__(self, translate_x, translate_y): # InitVar values are passed in the order in which they are defined.
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x # shift the center by the given offset
        self.y += translate_y # shift the center by the given offset

In [11]:
from dataclasses import make_dataclass

def post_init(self, translate_x: int, translate_y: int):
    print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
    self.x += translate_x
    self.y += translate_y

CircleDataClass02 = make_dataclass(
    "CircleDataClass02", # class name
    [
        ("x", int, 0), # (field name, field type, default value)
        ("y", int, 0),
        ("radius", int, 0),
        ("translate_x", InitVar[int], field(default=0, kw_only=True)),
        ("translate_y", InitVar[int], field(default=0, kw_only=True)),
    ],
    order=True, # keyword arguments accepted by @dataclass
    # namespace: extra attributes and methods to add to the class body
    namespace = {
        "__post_init__": post_init
    }
)

In [12]:
c = CircleDataClass02()
c

Translating center by: Δx=0, Δy=0


CircleDataClass02(x=0, y=0, radius=0)
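
`make_dataclass` is mostly useful when the field list is only known at runtime. The sketch below builds a class from a dictionary; `column_spec` and `Row` are hypothetical names chosen for the example.

```python
from dataclasses import asdict, fields, make_dataclass

# Hypothetical runtime specification: field name -> field type.
column_spec = {"name": str, "age": int}

# Each (name, type) pair becomes a field without a default value.
Row = make_dataclass("Row", list(column_spec.items()), frozen=True)

row = Row("Ann", 30)
row_as_dict = asdict(row)
field_names = [f.name for f in fields(Row)]
```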

## Adding custom `metadata` to the fields of the dataclass

In [17]:
from dataclasses import dataclass, field, fields

@dataclass(unsafe_hash=True) # generates the __hash__ function
class PersonDC:
    name: str = field(compare=False, metadata={"table": "person", "column": "name"})
    age: int = field(compare=False, metadata={"table": "person", "column": "current_age"})
    ssn: str = field(metadata={"table": "person", "column": "ssn"})

In [18]:
fields(PersonDC)

(Field(name='name',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x110009ab0>,default_factory=<dataclasses._MISSING_TYPE object at 0x110009ab0>,init=True,repr=True,hash=None,compare=False,metadata=mappingproxy({'table': 'person', 'column': 'name'}),kw_only=False,_field_type=_FIELD),
 Field(name='age',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x110009ab0>,default_factory=<dataclasses._MISSING_TYPE object at 0x110009ab0>,init=True,repr=True,hash=None,compare=False,metadata=mappingproxy({'table': 'person', 'column': 'current_age'}),kw_only=False,_field_type=_FIELD),
 Field(name='ssn',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x110009ab0>,default_factory=<dataclasses._MISSING_TYPE object at 0x110009ab0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'table': 'person', 'column': 'ssn'}),kw_only=False,_field_type=_FIELD))

In [19]:
help(PersonDC)

Help on class PersonDC in module __main__:

class PersonDC(builtins.object)
 |  PersonDC(name: str, age: int, ssn: str) -> None
 |  
 |  PersonDC(name: str, age: int, ssn: str)
 |  
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  
 |  __hash__(self)
 |      Return hash(self).
 |  
 |  __init__(self, name: str, age: int, ssn: str) -> None
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __annotations__ = {'age': <class 'int'>, 'name': <class 'str'>, 'ssn':...
 |  
 |  __dataclass_fie

In [20]:
fields(PersonDC)[0].metadata

mappingproxy({'table': 'person', 'column': 'name'})
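
The `dataclasses` module does not use `metadata` itself; it is there for your own code to read. As a sketch, the hypothetical helper `column_map` below collects the `"column"` entry of every field, in the spirit of the `PersonDC` example.

```python
from dataclasses import dataclass, field, fields

@dataclass
class PersonRow:  # hypothetical class modelled on PersonDC
    name: str = field(metadata={"table": "person", "column": "name"})
    age: int = field(metadata={"table": "person", "column": "current_age"})

def column_map(cls):
    """Map each field name to the column name stored in its metadata."""
    return {f.name: f.metadata["column"] for f in fields(cls)}

mapping = column_map(PersonRow)
```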

## Initializing fields with mutable objects
- Factories: callables such as `list` or `tuple` that build a new default value for each instance.

### The wrong way to give a function argument a list default
- A default such as an empty list is created once, when the function is defined, not each time the function is called.
- Every call that omits the argument therefore receives the same list object in memory.

In [21]:
def squares(idx, lst = []):
    lst.append((idx, idx ** 2))
    return lst

In [22]:
numbers = squares(1)
numbers

[(1, 1)]

In [23]:
other_number = squares(2)
other_number

[(1, 1), (2, 4)]

In [24]:
# This way a new list is created each time the function is called without one.
def squares(idx, lst = None):
    if lst is None:
        lst = []
    lst.append((idx, idx ** 2))
    return lst

In [25]:
numbers = squares(1)
numbers

[(1, 1)]

In [26]:
other_number = squares(2)
other_number

[(2, 4)]
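
The shared default can also be observed directly: it is stored on the function object in `__defaults__`. `append_square` is a hypothetical copy of the first, buggy version of `squares`.

```python
def append_square(idx, lst=[]):  # buggy on purpose: mutable default
    lst.append((idx, idx ** 2))
    return lst

first = append_square(1)
second = append_square(2)

# Both calls returned the very same list object, which is the one
# stored on the function when it was defined.
same_object = first is second
stored_default = append_square.__defaults__[0]
```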

### `dataclass`.
- Using a mutable value such as a list as the default for a field raises a `ValueError`.

In [33]:
from dataclasses import dataclass

@dataclass
class Test:
    tests: list = []

    def add(self, idx):
        self.tests.append((idx, idx ** 2))

ValueError: mutable default <class 'list'> for field tests is not allowed: use default_factory

In [31]:
from dataclasses import dataclass, field

@dataclass
class Test:
    # default_factory is called once per instance, so each instance gets its own list
    tests: list = field(default_factory=list)

    def add(self, idx):
        self.tests.append((idx, idx ** 2))

In [32]:
t1 = Test()
t1.add(1)
t1.add(2)
t1.tests

[(1, 1), (2, 4)]

In [34]:
t2 = Test()
t2.add(1)
t2.tests

[(1, 1)]
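
`default_factory` accepts any zero-argument callable, not only `list`. A minimal sketch with a hypothetical `Config` class, including a `lambda` for a default that is not empty:

```python
from dataclasses import dataclass, field

@dataclass
class Config:  # hypothetical class for illustration
    tags: list = field(default_factory=list)
    options: dict = field(default_factory=dict)
    # A lambda builds a fresh, pre-filled list for every instance.
    sizes: list = field(default_factory=lambda: [1, 2, 3])

a = Config()
b = Config()
a.sizes.append(4)
a.options["debug"] = True
```

Changes made through `a` do not leak into `b`, because each instance received its own objects.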