### Dataclasses Explained (Part 2)

In the last video we covered a broad array of topics:
- basic data classes 
- equality
- hashability
- mutability/immutability (aka freezing)
- default ordering
- serialization (to `dict` and `tuple` types)
- fields introspection
- adding our own methods and properties to dataclasses
- one approach to custom ordering
- keyword-only initializer arguments
- performance/resource utilization compared to named tuples

In this video, we're going to dig deeper into customizing the code generated by a dataclass, using a few extra arguments to the `@dataclass` decorator we have not seen yet, as well as using field level directives.

We'll start with the same `Circle` class we were working with in the last video (we'll just keep the dataclass barebones for now).

In [1]:
from dataclasses import dataclass

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

#### The `__post_init__` Special Method

The special method `__post_init__` in dataclasses can be used to augment the normal `__init__` method which is generally implemented for us by dataclasses.

This allows us to modify the behavior of the class `__init__` without accessing the code in that function itself. 

Since the `__post_init__` method is an instance method, it has access to any instance fields (that were set up by the `__init__` method dataclasses create).

In [2]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
        
    def __post_init__(self):
        print('__post_init__ called')
        print(repr(self))

In [3]:
c = CircleD()

__post_init__ called
CircleD(x=0, y=0, radius=1)


So this allows us to essentially extend the `__init__` method without having to modify the method itself.
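
A common use of this hook is validating the values that the generated `__init__` has just stored. The sketch below is only an illustration: the rule that `radius` must not be negative is an assumption made for this example, not a requirement of the class above.

```python
from dataclasses import dataclass


@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        # runs right after the generated __init__ has set x, y and radius
        # (the non-negative radius rule is an assumption for this sketch)
        if self.radius < 0:
            raise ValueError(f"radius must be non-negative, got {self.radius}")


c = CircleD(radius=2)

try:
    CircleD(radius=-1)
except ValueError as ex:
    print(f"ValueError: {ex}")
```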

#### Init-Only Variables

There are additional parameters that may be passed to `__post_init__`, so-called **init-only** variables.

These are variables that are passed to `__init__` (so if we decide to implement a custom `__init__` that variable will show up as a parameter), and, probably more importantly, will be added as a parameter to the `__post_init__` method as well.

They do not get stored in the instance dictionary (or slots) - so `__post_init__` does not have access to those variables in the `self` instance object, hence why they need to be passed as arguments to `__post_init__`.

Let's use that to perform a translation of the circle's center point - we want to allow the user to specify some x and y translations, but only store the final result in the `x` and `y` fields.

To create an init-only variable, we declare it just like we would any other field (and again, order of definition will be reflected in the order in which `__init__` and `__post_init__` params are defined).

However, we need to tell the dataclasses generator that this field is not a real field in the class, and only used as an additional parameter to the `__init__` and `__post_init__` methods. 

We do that by declaring the field with a special type hint - the `InitVar` type, defined in the `dataclasses` module.

The `InitVar` type is a generic type, so we still retain the ability to specify the type that the value should be.

In [4]:
from dataclasses import InitVar

In [5]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = 0
    translate_y: InitVar[int] = 0
        
    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

In [6]:
c = CircleD(0, 0, 1, -1, -2)

c

Translating center by: Δx=-1, Δy=-2


CircleD(x=-1, y=-2, radius=1)

Now, this is a case where I would want to make `translate_x` and `translate_y` keyword-only arguments.

One way of doing that is using the `KW_ONLY` feature we learned in the last video (we'll see another way later).

In [7]:
from dataclasses import KW_ONLY

In [8]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    _: KW_ONLY
    translate_x: InitVar[int] = 0  
    translate_y: InitVar[int] = 0
        
    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

As expected, we get an exception if we pass all the arguments positionally.

In [9]:
try:
    c = CircleD(0, 0, 1, -1, -2)
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() takes from 1 to 4 positional arguments but 6 were given


We have to pass those translation arguments as named arguments:

In [10]:
c = CircleD(0, 0, 1, translate_x=-2, translate_y=-1)
c

Translating center by: Δx=-2, Δy=-1


CircleD(x=-2, y=-1, radius=1)

And as I stated before, those translation fields are not in the instance data:

In [11]:
c.__dict__

{'x': -2, 'y': -1, 'radius': 1}

In [12]:
from dataclasses import fields

In [13]:
fields(CircleD)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

#### Field Level Customizations

So far we have seen how to customize dataclasses either by specifying parameters in the `@dataclass` decorator itself, or by using "special" types when type hinting the fields in the dataclass.

Dataclasses provide an additional mechanism to extend the specifications of individual fields.

We do that by assigning a special default to the fields in our class.

That default needs to be an instance of the `dataclasses.Field` class, and can be instantiated using the `dataclasses.field` function.

According to the documentation, you **must** use the `field()` function to create an instance of the `Field` class - you should never instantiate a `Field` instance directly.

In [14]:
from dataclasses import field, Field

In [15]:
f = field()
type(f)

dataclasses.Field

##### Customizing the Class `repr`

As a first simple example, remember how dataclasses have a default representation that basically includes every field (remember that init-only variables are not fields, so those don't count).

In [16]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1   

In [17]:
c  = CircleD()

In [18]:
repr(c)

'CircleD(x=0, y=0, radius=1)'

What if we only want to use a subset of the fields in the repr?

We can certainly do that by providing our own implementation of the repr:

In [19]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
        
    def __repr__(self):
        return f"{self.__class__.__qualname__}(radius={self.radius})"

In [20]:
c = CircleD()
c

CircleD(radius=1)

However, we can do the same thing, leveraging the more declarative syntax of dataclasses this way:

In [21]:
@dataclass
class CircleD:
    x: int = field(repr=False)
    y: int = field(repr=False)
    radius: int = 1

In [22]:
c = CircleD(0, 0, 1)
c

CircleD(radius=1)

##### Specifying a Field Default

You'll notice that we lost one thing with the dataclass as it stands right now - the defaults for `x` and `y`.

We can no longer do this:

In [23]:
try:
    CircleD()
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() missing 2 required positional arguments: 'x' and 'y'


This was a consequence of using that `Field` instance as the default.

Instead, we can specify the default via the `Field` object itself as follows:

In [24]:
@dataclass
class CircleD:
    x: int = field(default=0, repr=False)
    y: int = field(default=0, repr=False)
    radius: int = 1

And now we have all our defaults again:

In [25]:
CircleD()

CircleD(radius=1)

##### Non-Initialized Fields

Just now we saw how we can create pseudo fields that are not fields (attributes in the class data), but still end up as arguments to `__init__` and `__post_init__`.

But what about the reverse? Cases where we want to define a field in the class, but don't necessarily want to pass it as an argument to `__init__`.

This could very well apply to calculated fields.

Let's look at this example using plain properties first:

In [26]:
from math import pi

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
        
    def __post_init__(self):
        self._area = pi * self.radius ** 2
        
    @property
    def area(self):
        return self._area

In [27]:
c = CircleD()

In [28]:
c.area

3.141592653589793

However, a few things with this approach:
1. the `area` attribute is a regular property, and does not show up in the fields of the dataclass - that might not be what we want
2. we have that extra "backing" variable `_area` that is now in the class state
3. if the dataclass is frozen, trying to store `self._area` in the `__post_init__` method is going to fail!

In [29]:
c.__dict__

{'x': 0, 'y': 0, 'radius': 1, '_area': 3.141592653589793}

In [30]:
fields(c)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

If we want to keep things cleaner and more consistent, what we really need is a field that is defined in the class, but that is not expected as an argument to the `__init__` (and `__post_init__`) methods.

In [31]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    area: float = field(init=False, repr=False)
        
    def __post_init__(self):
        self.area = pi * self.radius ** 2

In [32]:
c = CircleD()

In [33]:
c.__dict__

{'x': 0, 'y': 0, 'radius': 1, 'area': 3.141592653589793}

In [34]:
c.area

3.141592653589793

In [35]:
fields(c)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='area',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

Now we can also choose to freeze the class if we want to.

The caveat here is that when you freeze the class, no assignments to instance variables will work (not even inside the `__post_init__` method) - it's supposed to be frozen after all.

So this will not work:

In [36]:
@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    area: float = field(init=False, repr=False)
        
    def __post_init__(self):
        self.area = pi * self.radius ** 2

In [37]:
from dataclasses import FrozenInstanceError

try:
    c = CircleD()
except FrozenInstanceError as ex:
    print(f"FrozenInstanceError: {ex}")

FrozenInstanceError: cannot assign to field 'area'


We can, however, use the `__setattr__` special method on the parent class to circumvent the dataclass freeze protection. This can lead to issues with hashing, so we'll revisit it later, as it is potentially dangerous - we saw in the last video how mutability and hashability can be problematic.

In [38]:
@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    area: float = field(init=False, repr=False)
        
    def __post_init__(self):
        super().__setattr__("area", pi * self.radius ** 2)

In [39]:
c = CircleD()

In [40]:
c.__dict__

{'x': 0, 'y': 0, 'radius': 1, 'area': 3.141592653589793}

In [41]:
fields(c)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='area',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

In [42]:
c.area

3.141592653589793

We could even leverage that for lazy evaluation of properties if we wanted to:

In [43]:
@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
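    _area: float = field(init=False, repr=False, compare=False)
        
    @property
    def area(self):
        # _area is excluded from repr/eq/hash, since it does not exist until first access
        # hasattr (rather than truthiness) so that an area of 0 is cached too
        if not hasattr(self, '_area'):
            print("Cache miss")
            super().__setattr__('_area', pi * self.radius ** 2)
        else:
            print("Cache hit")
        return self._area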

In [44]:
c = CircleD()

In [45]:
c.__dict__

{'x': 0, 'y': 0, 'radius': 1}

In [46]:
c.area

Cache miss


3.141592653589793

In [47]:
c.area

Cache hit


3.141592653589793

At this point though, I would say that we are starting to "fight" dataclasses to get the precise functionality we want - it might be a better option to switch to regular classes and avoid some of those headaches...

There are, of course, valid reasons to have fields that are not included in the init arguments, but when the dataclass is also frozen it can quickly lead to the headaches we just saw.
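
As a point of comparison, here is a minimal sketch of the same lazily computed area written as a regular class. It assumes `functools.cached_property` (Python 3.8+) is acceptable for the caching, and the plain `Circle` class is a hypothetical stand-in, not code from earlier in this notebook.

```python
from functools import cached_property
from math import pi


class Circle:
    def __init__(self, x=0, y=0, radius=1):
        self.x = x
        self.y = y
        self.radius = radius

    @cached_property
    def area(self):
        # computed on first access, then stored in the instance __dict__
        print("Calculating area")
        return pi * self.radius ** 2


c = Circle()
print(c.__dict__)   # area has not been calculated yet
print(c.area)       # first access calculates and caches the value
print(c.area)       # second access reads the cached value
```

Note that the cached value is not refreshed if `radius` is changed later, so this fits best when the radius is treated as read-only.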

##### Customizing Comparison Fields

We have two types of comparisons to deal with - one is the `__eq__` comparison (`==`), and the others are for ordering, the `__lt__`, `__le__`, etc operators.

We looked at this in the previous video, and saw that equality and default sort order is based on a tuple comprised of all the fields (in order) of the dataclass. 

We'll start with a mutable dataclass for now to avoid the additional complexity of hashing.

In [48]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

Just as a reminder, in this case, equality of two instances of the dataclass is based on the equality of tuples containing the field values (in definition order).

In [49]:
c1 = CircleD(1, 1, 1)
c2 = CircleD(1, 1, 1)
c1 == c2

True

By the way, `InitVar` fields are not part of this tuple.
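
To see that in action, here is a small sketch reusing the translation idea from earlier (with the `print` call left out): two circles that end up with the same stored fields compare equal, no matter which init-only arguments were used to get there.

```python
from dataclasses import dataclass, InitVar


@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = 0
    translate_y: InitVar[int] = 0

    def __post_init__(self, translate_x, translate_y):
        self.x += translate_x
        self.y += translate_y


c1 = CircleD(1, 1, 1)
c2 = CircleD(0, 0, 1, 1, 1)  # same center, reached via a translation
print(c1 == c2)
```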

We saw how to add ordering (`<`, `<=`, etc) to our dataclass:

In [50]:
@dataclass(order=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

And, just like for equality, the ordering is based on a tuple containing the field values (in declaration order).

In [51]:
c1 = CircleD(1, 0, 2)
c2 = CircleD(1, 1, 1)
c1 <= c2

True

But what if we wanted the ordering to be based on the radius only?

We can do it this way:

In [52]:
@dataclass(order=True)
class CircleD:
    x: int = field(default=0, compare=False)
    y: int = field(default=0, compare=False)
    radius: int = 1

In [53]:
c1 = CircleD(1, 0, 2)
c2 = CircleD(1, 1, 1)
c1 <= c2

False

There is one caveat here: this also changed the way equality works!

In [54]:
c1 = CircleD(1, 0, 1)
c2 = CircleD(1, 1, 1)

In [55]:
c1 == c2

True

So, if you are in a situation where you need different mechanisms for equality and ordering, you'll need to abandon dataclasses altogether and write your classes by hand, or you can keep using your dataclass, specifying what fields are to be included (all by default) in the equality comparison, and implement your own ordering functions as we saw in the last video.
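
As a rough sketch of that second option, the class below keeps the generated `__eq__` (all fields) and adds a hand-written `__lt__` based on the radius only. It assumes that sorting is the main need, since `sorted()` and `list.sort()` rely only on `<`; this is not necessarily the same approach as the one from the last video.

```python
from dataclasses import dataclass


@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __lt__(self, other):
        # ordering looks at the radius only; the generated __eq__ still uses all fields
        if self.__class__ is other.__class__:
            return self.radius < other.radius
        return NotImplemented


c1 = CircleD(1, 0, 1)
c2 = CircleD(1, 1, 1)
print(c1 == c2)   # all of x, y and radius take part in equality

circles = [CircleD(0, 0, 3), CircleD(5, 5, 1), CircleD(1, 1, 2)]
print([c.radius for c in sorted(circles)])
```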

I've said this before, but it bears repeating:

**If you find yourself "fighting" dataclasses to make them do something that is beyond their reasonable capabilities, you are barking up the wrong tree. Write custom classes by hand and avoid the unnecessary complications of hacking the dataclasses code generator in weird and wonderful ways. Dataclasses do not replace classes - they are simply a code generator that helps you avoid writing a ton of boilerplate code.**

##### Hashing

Ok, so let's go back to hashing - we covered a lot of ground on that in the previous video. 

To refresh our memory, here's what we said last time:

The instance state used to create a hash for the instance should be immutable.

This does not mean that the entire class should be immutable.

What it does mean is that we should (in most cases, I'm sure there are exceptions) follow these simple rules:

- instance data used to generate a hash for the instance should be immutable
- the same data used to generate the hash should be part of the equality implementation
- two instances that compare equal should have the same hash

Let's see what I mean by this with a simple class:

In [56]:
class Person:
    def __init__(self, name, age, ssn):
        self.name: str = name  # this could change over time
        self.age: int = age  # this changes over time
        self.ssn: str = ssn  # this never changes over time
        
    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.ssn == other.ssn
        return NotImplemented
    
    def __hash__(self):
        return hash(self.ssn)

As we can see in this example, we consider two `Person` instances to be equal if their `ssn` attribute matches - we don't care about `name` and `age` which could change.

So, we don't need to make all the attributes `name`, `age`, and `ssn` immutable - rather we only need `ssn` to be immutable, and then we can safely use that for equality and hashing.

In [57]:
class Person:
    def __init__(self, name, age, ssn):
        self.name: str = name  # this could change over time
        self.age: int = age  # this changes over time
        self._ssn: str = ssn  # this never changes over time
        
    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.ssn == other.ssn
        return NotImplemented
    
    def __hash__(self):
        return hash(self.ssn)
    
    @property
    def ssn(self):
        return self._ssn
    
    def __repr__(self):
        return f"Person(name={self.name}, age={self.age}, ssn={self.ssn}, id={hex(id(self))})"

In [58]:
p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23456')
p3 = Person('C', 50, '12345')

In [59]:
p1 == p2, p1 == p3

(False, True)

In [60]:
hash(p1), hash(p2), hash(p3)

(-8502841099791150938, -1674677448643560702, -8502841099791150938)

We can create a set of these three instances, and we would expect only two of them to end up in the result:

In [61]:
{p1, p2, p3}

{Person(name=A, age=30, ssn=12345, id=0x106883090),
 Person(name=B, age=40, ssn=23456, id=0x1068a36d0)}

And same for a dictionary:

In [62]:
d = {p1: "Person 1", p2: "Person 2"}
d

{Person(name=A, age=30, ssn=12345, id=0x106883090): 'Person 1',
 Person(name=B, age=40, ssn=23456, id=0x1068a36d0): 'Person 2'}

If we try and modify one of the instances in the keys, things will still work just fine since we are not modifying the `ssn` value used for equality and hashing:

In [63]:
p1.name = 'X'
p1.age=100
d

{Person(name=X, age=100, ssn=12345, id=0x106883090): 'Person 1',
 Person(name=B, age=40, ssn=23456, id=0x1068a36d0): 'Person 2'}

In [64]:
d[p1]

'Person 1'

Now, given all this we know that dataclasses will implement hashing for us if we make the dataclass immutable (frozen).

In [65]:
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    ssn: str

In [66]:
p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23456')

Instances are now hashable:

In [67]:
{p1, p2}

{Person(name='A', age=30, ssn='12345'), Person(name='B', age=40, ssn='23456')}

In [68]:
{
    p1: "Person 1",
    p2: "Person 2"
}

{Person(name='A', age=30, ssn='12345'): 'Person 1',
 Person(name='B', age=40, ssn='23456'): 'Person 2'}
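
As a quick reminder of why freezing matters here, the sketch below compares a mutable and a frozen dataclass (the class names `MutablePoint` and `FrozenPoint` are made up for this illustration).

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:
    x: int = 0


@dataclass(frozen=True)
class FrozenPoint:
    x: int = 0


# eq=True (the default) without frozen=True sets __hash__ to None
print(MutablePoint.__hash__ is None)

# frozen=True together with eq=True generates a field-based __hash__
print(hash(FrozenPoint(1)) == hash(FrozenPoint(1)))
```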

And equality is based on all the fields:

In [69]:
p1 = Person('A', 30, '12345')
p2 = Person('A', 30, '12345')
p3 = Person('B', 40, '12345')

In [70]:
p1==p2, p2==p3, p1==p3

(True, False, False)

What we really want is to base equality and hashability on the `ssn` field only, and let `name` and `age` be mutable.

We could start by limiting what fields are used for both equality and hashing.

In [71]:
@dataclass(frozen=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

So we have an immutable and hashable class, and only `ssn` should be used for equality and hashing:

In [72]:
p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '12345')

In [73]:
p1 is p2

False

In [74]:
p1 == p2

True

In [75]:
hash(p1) == hash(p2)

True

As we noted just now, this of course would affect the default ordering if we were to make the dataclass orderable using the `order=True` decorator argument.

But more importantly, our dataclass is frozen, so even though we technically (from an equality and hashability perspective) can mutate the `name` and `age` properties, we cannot do so:

In [76]:
try:
    p1.name = 'X'
except FrozenInstanceError as ex:
    print(f"FrozenInstanceError: {ex}")

FrozenInstanceError: cannot assign to field 'name'


##### Unsafe Hashing

There are some ways around this. The first thing we need to do is unfreeze our dataclass (unless we want to start overriding the safeguards in the dataclass, using the `super().__setattr__()` approach we looked at earlier, but I would not recommend it - at that point we are just fighting dataclasses, and it's time to call it quits).

So, assuming we are OK making the dataclass mutable again, and relying on ourselves, and more importantly, other developers in our code base, not to modify the "immutable" attributes, we could do this:

In [77]:
@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

`unsafe_hash` basically tells the dataclass code generator that even though the class is mutable, it should still try to implement a hash function - and in this case it will default to using the fields that are included in the equality comparisons, so just `ssn`.

In [78]:
p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '12345')

In [79]:
p1 is p2, p1 == p2, hash(p1) == hash(p2)

(False, True, True)

And of course we can now mutate `name` and `age` without causing issues.

In [80]:
p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23345')

d = {
    p1: 'Person A',
    p2: 'Person B'
}

d

{Person(name='A', age=30, ssn='12345'): 'Person A',
 Person(name='B', age=40, ssn='23345'): 'Person B'}

In [81]:
p1.name = 'AAA'
p1.age = 300

d

{Person(name='AAA', age=300, ssn='12345'): 'Person A',
 Person(name='B', age=40, ssn='23345'): 'Person B'}

However, we **can easily** modify `ssn` - and that's not safe!

In [82]:
p2.ssn = '12345'

In [83]:
d

{Person(name='AAA', age=300, ssn='12345'): 'Person A',
 Person(name='B', age=40, ssn='12345'): 'Person B'}

In [84]:
d[p1]

'Person A'

In [85]:
d[p2]

'Person A'

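As you can see, the dictionary got messed up: it still holds two entries, but both keys now compare equal and have the same hash, so looking up `p2` finds the entry stored under `p1`, and the `'Person B'` value can no longer be reached by key.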
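
One more way to see the damage, sketched below with the same `Person` class: the dictionary still stores two entries, but rebuilding it item by item (which forces fresh hash lookups) collapses them into one.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str


p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23345')
d = {p1: 'Person A', p2: 'Person B'}

p2.ssn = '12345'  # p2 now compares equal to p1 and hashes the same

print(len(d))     # both entries are still stored
print(d[p2])      # the lookup lands on the entry stored under p1

# re-inserting every item forces fresh hash lookups
rebuilt = {k: v for k, v in d.items()}
print(len(rebuilt))
```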

##### Keyword-Only Arguments

We already saw how to separate arguments into positional and keyword-only sections in our dataclass fields.

In [86]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    _: KW_ONLY
    translate_x: InitVar[int] = 0  
    translate_y: InitVar[int] = 0
        
    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

In [87]:
CircleD(0, 0, 1, translate_y=-1, translate_x=0)

Translating center by: Δx=0, Δy=-1


CircleD(x=0, y=-1, radius=1)

In [88]:
try:
    CircleD(0, 0, 1, 0, -1)
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() takes from 1 to 4 positional arguments but 6 were given


We have another way of doing this:

In [89]:
@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = field(default=0, kw_only=True)
    translate_y: InitVar[int] = field(default=0, kw_only=True)
        
    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

And it will work the same way:

In [90]:
CircleD(0, 0, 1, translate_y=-1, translate_x=0)

Translating center by: Δx=0, Δy=-1


CircleD(x=0, y=-1, radius=1)

In [91]:
try:
    CircleD(0, 0, 1, 0, -1)
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() takes from 1 to 4 positional arguments but 6 were given


One important thing to be aware of is that, since Python **requires** keyword-only arguments to be specified **after** positional arguments in a function's parameter list, dataclasses will move things around if it needs to satisfy that requirement:

In [92]:
@dataclass
class CircleD:
    x: int = 0
    translate_x: InitVar[int] = field(default=0, kw_only=True)
    y: int = 0
    translate_y: InitVar[int] = field(default=0, kw_only=True)
    radius: int = 1
        
    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

In [93]:
CircleD.__init__

<function __main__.CircleD.__init__(self, x: int = 0, y: int = 0, radius: int = 1, *, translate_x: dataclasses.InitVar[int] = 0, translate_y: dataclasses.InitVar[int] = 0) -> None>

You can see how the keyword-only arguments were moved to the end of the `__init__` parameter list (the same will happen with `__post_init__`), so you would instantiate the class just as in the previous example:

In [94]:
CircleD(0, 0, 1, translate_y=-1, translate_x=0)

Translating center by: Δx=0, Δy=-1


CircleD(x=0, y=-1, radius=1)

##### Creating Dataclasses Programmatically

So far we have created dataclasses using static code, but we can also create dataclasses programmatically.

To draw a parallel, think of named tuples - we can define named tuples using static code:

In [95]:
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int
    ssn: str

or we have a programmatic way of doing it too:

In [96]:
from collections import namedtuple

Person = namedtuple("Person", "name age ssn")

We can do the same thing with dataclasses by using the `make_dataclass` function. I won't get into too much detail here, you can read the docs or do some web searches if you want more info, but here is a quick example.

Let's say we have this dataclass:

In [97]:
@dataclass(order=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = field(default=0, kw_only=True)
    translate_y: InitVar[int] = field(default=0, kw_only=True)
        
    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

In [98]:
from dataclasses import make_dataclass

In [99]:
def post_init(self, translate_x, translate_y):
    print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
    self.x += translate_x
    self.y += translate_y

In [100]:
CircleD2 = make_dataclass(
    'CircleD2',
    [
        ('x', int, 0),
        ('y', int, 0),
        ('radius', int, 1),
        ('translate_x', InitVar[int], field(default=0, kw_only=True)),
        ('translate_y', InitVar[int], field(default=0, kw_only=True))
    ],
    order=True,
    namespace = {
        "__post_init__": post_init
    }
)

In [101]:
c = CircleD2(1, 2, 3, translate_x=1, translate_y=2)

c

Translating center by: Δx=1, Δy=2


CircleD2(x=2, y=4, radius=3)

One thing to note here is how we can inject code into the dataclass - this is done via the `namespace` argument, which allows us to provide a pre-populated namespace dictionary for our class (similar to the `dict` argument of the `type` function when using it to create new classes). I cover this in detail in my deep dive course series (Part 4) - all the Jupyter notebooks for that course are freely available [here](https://github.com/fbaptiste/python-deepdive)
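
To make that parallel concrete, here is a minimal side-by-side sketch. The `describe` function and the `PlainPerson` / `DataPerson` names are hypothetical, made up for this illustration.

```python
from dataclasses import make_dataclass


def describe(self):
    return f"{self.name} ({self.age})"


# a plain class built with type(): the dict becomes the class namespace
PlainPerson = type("PlainPerson", (), {"describe": describe})

# make_dataclass: same idea, plus the generated dataclass methods
DataPerson = make_dataclass(
    "DataPerson",
    [("name", str), ("age", int)],
    namespace={"describe": describe},
)

print(PlainPerson.describe is DataPerson.describe)
print(DataPerson("A", 30).describe())
```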

##### Custom Metadata

The next topic we'll look at is how we can add custom metadata to our fields in a dataclass.

Currently nothing in Python or the standard library (that I'm aware of) actually uses this metadata.

However, you may find a use for it, and certainly 3rd party libraries already do.

Let's say that for our person class we need to add information about each field to define a mapping from the field to a table and column in a database.

In [102]:
@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

We add metadata this way:

In [103]:
@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False, metadata={'table': 'person', 'column': 'name'})
    age: int = field(compare=False, metadata={'table': 'person', 'column': 'current_age'})
    ssn: str = field(metadata={'table': 'person', 'column': 'ssn'})

In [104]:
help(Person)

Help on class Person in module __main__:

class Person(builtins.object)
 |  Person(name: str, age: int, ssn: str) -> None
 |  
 |  Person(name: str, age: int, ssn: str)
 |  
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  
 |  __hash__(self)
 |      Return hash(self).
 |  
 |  __init__(self, name: str, age: int, ssn: str) -> None
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __annotations__ = {'age': <class 'int'>, 'name': <class 'str'>, 'ssn':...
 |  
 |  __dataclass_fields__ = 

As you can see, not even `help()` uses this meta information - however it is present, and we could use it in any way we want:

In [105]:
fields(Person)

(Field(name='name',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=False,metadata=mappingproxy({'table': 'person', 'column': 'name'}),kw_only=False,_field_type=_FIELD),
 Field(name='age',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=False,metadata=mappingproxy({'table': 'person', 'column': 'current_age'}),kw_only=False,_field_type=_FIELD),
 Field(name='ssn',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'table': 'person', 'column': 'ssn'}),kw_only=False,_field_type=_FIELD))

In [106]:
(fields(Person)[0]).metadata

mappingproxy({'table': 'person', 'column': 'name'})
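As a minimal sketch of how this metadata might be consumed, the following builds a field-to-column mapping by iterating over `fields(Person)`. The `column_mapping` helper is a hypothetical name introduced here for illustration, and it assumes every field carries both a `'table'` and a `'column'` key in its metadata.

```python
from dataclasses import dataclass, field, fields


@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False, metadata={'table': 'person', 'column': 'name'})
    age: int = field(compare=False, metadata={'table': 'person', 'column': 'current_age'})
    ssn: str = field(metadata={'table': 'person', 'column': 'ssn'})


def column_mapping(cls):
    # Hypothetical helper: map each field name to a "table.column" string,
    # assuming 'table' and 'column' keys exist in every field's metadata.
    return {
        f.name: f"{f.metadata['table']}.{f.metadata['column']}"
        for f in fields(cls)
    }


mapping = column_mapping(Person)
print(mapping)
# {'name': 'person.name', 'age': 'person.current_age', 'ssn': 'person.ssn'}
```

Dataclasses themselves never look at the metadata, so what the keys mean is entirely up to the code that reads them.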

##### Init Factories

One important thing to talk about is how to initialize fields with mutable objects.

A common mistake many Python beginners make is to use a mutable object, such as a list, as the default value of a parameter, in the following way:

In [107]:
def squares(i, l = []):
    l.append((i, i ** 2))
    return l

Then this might be used like this:

In [108]:
numbers = squares(1)
numbers

[(1, 1)]

The problem is that if we call the function again this way:

In [109]:
others = squares(2)
others

[(1, 1), (2, 4)]

As you can see, we have a problem - we might have expected the second list `others` to only contain `(2, 4)`, but in fact it also included `(1, 1)` from the first function call.

Weird, right?

The issue is that when the function was defined (not called, defined - the `def` statement is executed only one time), the default value for `l` was created, and stored in the function object's state. Every time the function is called, this **same** default is referenced - hence the issue.
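We can observe that shared default directly, since Python stores the default values of positional parameters in the function's `__defaults__` attribute. The sketch below redefines the problematic `squares` function so that it is self-contained:

```python
def squares(i, l=[]):
    l.append((i, i ** 2))
    return l


# The default list is created once and stored on the function object.
size_before = len(squares.__defaults__[0])   # 0 - nothing appended yet

first = squares(1)
second = squares(2)

# Both calls returned the very same list object: the stored default.
same_object = first is second and first is squares.__defaults__[0]

print(size_before, same_object)
print(squares.__defaults__)
# 0 True
# ([(1, 1), (2, 4)],)
```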

The proper way of doing this would be to create the empty default list in the **body** of the function - that way that default is reset to an empty list every time the function is called without supplying that list.

In [110]:
def squares(i, l=None):
    if l is None:
        l = []
    l.append((i, i ** 2))
    return l

Then the function will work as expected:

In [111]:
numbers = squares(1)
others = squares(2)

In [112]:
numbers, others

([(1, 1)], [(2, 4)])

In [113]:
numbers = squares(3, numbers)
numbers

[(1, 1), (3, 9)]

So, we may have similar issues if the initializer for a class needs to default a parameter to a mutable object (such as a list as in the previous example).

In [114]:
class Test:
    def __init__(self, tests=[]):
        self.tests = tests
        
    def add(self, i):
        self.tests.append((i, i ** 2))

In [115]:
t1 = Test()
t1.add(1)

In [116]:
t1.tests

[(1, 1)]

Now let's create a second instance of that class, and call the same `add` method:

In [117]:
t2 = Test()
t2.add(2)

In [118]:
t2.tests

[(1, 1), (2, 4)]

Weird, right? But the issue is essentially the same.

We can't really use a class variable either, since that will also be shared across multiple instances.

In [119]:
class Test:
    tests = []
    
    def add(self, i):
        self.tests.append((i, i ** 2))

In [120]:
t1 = Test()
t1.add(1)
t1.tests

[(1, 1)]

In [121]:
t2 = Test()
t2.add(2)
t2.tests

[(1, 1), (2, 4)]

So, to correct this we can create that list **inside** the body of `__init__`.

In [122]:
class Test:
    def __init__(self, tests=None):
        if tests is None:
            self.tests = []
        else:
            self.tests = tests
        
    def add(self, i):
        self.tests.append((i, i ** 2))

In [123]:
t1 = Test()
t1.add(1)
t1.tests

[(1, 1)]

In [124]:
t2 = Test()
t2.add(2)
t2.tests

[(2, 4)]

Now, when we look at dataclasses, how do we do something similar for fields? Suppose we want to default a field to an empty list - how do we do that?

This way is not going to work:

In [125]:
try:
    @dataclass
    class Test:
        tests: list = []

        def add(self, i):
            self.tests.append((i, i ** 2))
except ValueError as ex:
    print(f"ValueError: {ex}")

ValueError: mutable default <class 'list'> for field tests is not allowed: use default_factory


In fact, dataclasses do not even allow us to use mutable objects as field defaults (since Python 3.11 it approximates the mutability of an object with the unhashability of that object, whereas earlier versions only rejected `list`, `dict` and `set` defaults - not perfect, but close enough for most cases - as developers we just have to be aware of this and be careful).
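Here is a small sketch of that check in action. The `try_default` helper and the `Sample` class are hypothetical names used only for this illustration; the helper attempts to build a dataclass with the given default and returns the error message, or `None` if the default was accepted.

```python
from dataclasses import dataclass


def try_default(default):
    # Hypothetical helper: try to create a dataclass whose single field
    # uses `default` as its default value.
    try:
        @dataclass
        class Sample:
            value: object = default
    except ValueError as ex:
        return str(ex)   # rejected by the dataclass decorator
    return None          # accepted


candidates = ([], {}, set(), (), frozenset())
results = {type(d).__name__: try_default(d) for d in candidates}

for name, error in results.items():
    print(f"{name}: {'rejected' if error else 'allowed'}")
# list: rejected
# dict: rejected
# set: rejected
# tuple: allowed
# frozenset: allowed
```

Note that a default being allowed does not guarantee it is safe to share: an instance of a custom class that is hashable but mutable passes this check, and would still be shared across all instances.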

To default a field to a mutable object, dataclasses let us supply a function that will be called during the initialization phase to generate the default for that field, very similar to doing something like this:

In [126]:
class Test:
    def __init__(self, tests_factory):
        self.tests = tests_factory()
        
    def add(self, i):
        self.tests.append((i, i ** 2))

In [127]:
def factory_func():
    return []

t1 = Test(factory_func)
t1.add(1)
t1.tests

[(1, 1)]

In [128]:
t2 = Test(factory_func)
t2.add(2)
t2.tests

[(2, 4)]

Now, creating that factory function was not needed, since calling `list()` does the same thing, so we can pass `list` itself (the callable, without calling it) directly instead:

In [129]:
t1 = Test(list)
t1.add(1)
t1.tests

[(1, 1)]

In [130]:
t2 = Test(list)
t2.add(2)
t2.tests

[(2, 4)]

And that's how dataclasses implement this as well:

In [131]:
@dataclass
class Test:
    tests: list = field(default_factory=factory_func)

    def add(self, i):
        self.tests.append((i, i ** 2))


In [132]:
t1 = Test()
t1.add(1)
t1.tests

[(1, 1)]

In [133]:
t2 = Test()
t2.add(2)
t2.tests

[(2, 4)]

And just like before, we can just use `list` as our factory function:

In [134]:
@dataclass
class Test:
    tests: list = field(default_factory=list)

    def add(self, i):
        self.tests.append((i, i ** 2))

In [135]:
t1 = Test()
t2 = Test()

t1.tests is t2.tests

False

As we can see, `t1` and `t2` have different default empty lists.
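The factory can be any callable that takes no arguments, so a `lambda` works when we want a default that is not empty. The sketch below is a variation on the `Test` class where the starting value `(0, 0)` is purely an illustrative assumption. It also shows that the factory is only used when no value is supplied for the field.

```python
from dataclasses import dataclass, field


@dataclass
class Test:
    # The lambda runs once per instance that does not supply `tests`,
    # so every such instance gets its own fresh list.
    tests: list = field(default_factory=lambda: [(0, 0)])

    def add(self, i):
        self.tests.append((i, i ** 2))


t1 = Test()
t2 = Test()
t1.add(1)

# An explicitly supplied value bypasses the factory entirely.
t3 = Test(tests=[(5, 25)])

print(t1.tests, t2.tests, t3.tests)
# [(0, 0), (1, 1)] [(0, 0)] [(5, 25)]
```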

#### Conclusion

In conclusion, you can see how dataclasses can save us a lot of typing, but dataclasses have limitations. 

They are not replacements for Python classes - the `@dataclass` decorator is a **code generator** that essentially generates the code required to add functionality to a class, just as if we had typed it out ourselves.

This means that not every possible eventuality can be covered by dataclasses (as we saw with the sort order and equality issues).

When this happens, don't start "fighting" the code generator and writing all kinds of weird workarounds - just switch to writing standard classes. This is why it is important for you to understand how to build classes by hand before you start using dataclasses - when you run into these limitations you will be prepared to write custom classes the long way, with all the flexibility that approach offers.

Also, don't forget about the `attrs` library - it offers a superset of what dataclasses provide, and may very well have the extra functionality you need while, just like dataclasses, sparing you from handwriting lots of boilerplate code.

Personally, I use dataclasses for simple and straightforward classes - you can see the reasons for dataclasses and the rationale for them in the associated [PEP 557](https://peps.python.org/pep-0557/).

