https://itnext.io/effortlessly-create-classes-in-python-with-dataclass-19412eada8be

## Effortlessly Create Classes in Python with @dataclass

In Python, the @dataclass decorator simplifies creating classes by automatically generating common methods and by encouraging practices that lead to safer code. It has been available since Python 3.7. In this article, I will walk through the decorator’s most important options.

Let’s start with a simple example:

In [None]:
from dataclasses import dataclass

@dataclass
class MyDataClass:
    a: str # will become arg in __init__
    b: int = 0 # will become kwarg in __init__
    
dataobj = MyDataClass("hello")
print(dataobj.a, dataobj.b)
# => hello 0

# error, a not specified
MyDataClass()
# => TypeError: __init__() missing 1 required positional argument: 'a'

By declaring two class-level attributes with type hints (the fields) and adding the @dataclass decorator, we obtain a class with an `__init__` method corresponding to the following:

In [None]:
def __init__(self, a: str, b: int = 0):
    self.a = a
    self.b = b
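
One way to confirm this correspondence is to inspect the generated signature at runtime. The following is a minimal sketch using the standard-library `inspect.signature` and `dataclasses.fields`; the variable name `signature` is illustrative and not part of the original example.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class MyDataClass:
    a: str      # required argument
    b: int = 0  # optional argument with a default


# The signature of the class reflects the generated __init__ (without self).
signature = inspect.signature(MyDataClass)
print(list(signature.parameters))
# => ['a', 'b']
print(signature.parameters["b"].default)
# => 0

# fields() lists the dataclass fields in definition order.
print([f.name for f in fields(MyDataClass)])
# => ['a', 'b']
```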

While skipping the `__init__`-method is nice, the dataclass offers a lot more than only that. To understand what, let’s have a look at the options available for the decorator:

In [None]:
@dataclass(
    *, 
    init=True, 
    repr=True, 
    eq=True, 
    order=False, 
    unsafe_hash=False, 
    frozen=False,
    ...
)

I’ve omitted some of the arguments and kept those that I think are most important to discuss. Note that using:

In [None]:
@dataclass
class MyDataClass:
    pass
    
@dataclass()
class MyDataClass:
    pass

will equivalently result in the default arguments being used.

As can be seen, methods are added implicitly to the class depending on the arguments:

 - `init=True` (default) adds `__init__`, as we saw above
 - `repr=True` (default) adds `__repr__`, i.e. what is shown when printing
 - `eq=True` (default) adds `__eq__`, based on the fields
 - `order=True` (not the default) adds `__lt__()`, `__le__()`, `__gt__()`, and `__ge__()`

Additionally, `frozen` and `unsafe_hash` are two important parameters that enforce constraints and make the developer more aware of the consequences of mutating and hashing objects.

Note that you can, of course, also define your own methods as you normally would in a class. Below I define a simple class without the @dataclass decorator for comparison.

In [None]:
class MyNormalClass:
    def __init__(self, a: str, b: int = 0):
        self.a = a
        self.b = b
        
obj = MyNormalClass("hello")

## Printing
With `repr=True`, which is set by default, the output when printing is changed. Let’s compare what is shown when printing the object made with the dataclass and the object made with the normal class (without `__repr__`):

In [None]:
print(obj)
# => <__main__.MyNormalClass object at 0x105aa1190>
print(dataobj)
# => MyDataClass(a='hello', b=0)

As can be seen with the normal class, the printed information tells us very little about the object by default. On the other hand, the dataclass shows the values of the object succinctly, having overridden the default `__repr__`-method.
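
To see that the readable output really comes from the `repr` option, the sketch below turns it off for one class. The class names `Loud` and `Quiet` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Loud:
    a: str
    b: int = 0


@dataclass(repr=False)
class Quiet:
    a: str
    b: int = 0


loud_repr = repr(Loud("hello"))
quiet_repr = repr(Quiet("hello"))

print(loud_repr)
# => Loud(a='hello', b=0)

# With repr=False no __repr__ is generated, so the default
# "<... object at 0x...>" representation is used instead.
print("object at" in quiet_repr)
# => True
```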

## Equality
The default behavior of equality for custom classes can be confusing. If you compare instances of a custom class that does not define an `__eq__`-method, two objects will only be equal if they are the exact same object:

In [None]:
# exact same object
print(obj == obj)
# => True

# a new object with identical values
obj2 = MyNormalClass("hello")
print(obj == obj2)
# => False

Thus, the values of the attributes don’t matter. But by using the dataclass with `eq=True` (set by default), the more expected form of equality is added, where the attributes are compared (i.e. `__eq__` is overridden):

In [None]:
# same arguments as before
dataobj2 = MyDataClass("hello")
print(dataobj == dataobj2)
# => True
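
One detail worth knowing is that the generated `__eq__` only compares instances of the same class. The sketch below uses two hypothetical classes, `Point` and `OtherPoint`, with identical fields to illustrate this.

```python
from dataclasses import dataclass


@dataclass
class Point:
    a: str
    b: int = 0


@dataclass
class OtherPoint:
    a: str
    b: int = 0


# Same class, same field values: equal.
same_class_equal = Point("hello") == Point("hello")

# Different classes, same field values: not equal.
cross_class_equal = Point("hello") == OtherPoint("hello")

print(same_class_equal, cross_class_equal)
# => True False
```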

## Order
By setting the `order` parameter to true (false by default) while `eq` is also true (`order=True` together with `eq=False` raises a `ValueError`), the following methods are automatically added to the class:

 - `__lt__()`
 - `__le__()`
 - `__gt__()`
 - `__ge__()`

They compare objects the same way tuples are compared, i.e. field by field in the order in which the fields are defined:

In [None]:
from dataclasses import dataclass

@dataclass(order=True)
class MyDataClass:
    a: int
    b: int

dataobj1 = MyDataClass(a=1, b=1)
dataobj2 = MyDataClass(a=2, b=1)
dataobj3 = MyDataClass(a=1, b=2)
dataobj4 = MyDataClass(a=0, b=1)

assert dataobj2 > dataobj1 and dataobj1 < dataobj2
assert dataobj2 > dataobj3 and dataobj3 < dataobj2
assert dataobj1 > dataobj4 and dataobj4 < dataobj1
assert dataobj3 > dataobj1 and dataobj1 < dataobj3

array = [dataobj1, dataobj2, dataobj3, dataobj4]
print(sorted(array))
# => 
# [
#   MyDataClass(a=0, b=1),
#   MyDataClass(a=1, b=1),
#   MyDataClass(a=1, b=2),
#   MyDataClass(a=2, b=1)
# ]
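
The two failure modes of ordering can be sketched as follows: the decorator rejects `order=True` combined with `eq=False`, and the generated comparison methods refuse to order instances of different classes. The class names `Invalid`, `Version`, and `Price` are hypothetical.

```python
from dataclasses import dataclass

# order=True requires eq=True; the decorator itself raises ValueError.
try:
    @dataclass(order=True, eq=False)
    class Invalid:
        a: int
except ValueError as exc:
    decoration_error = exc
else:
    decoration_error = None

print(type(decoration_error).__name__)
# => ValueError


@dataclass(order=True)
class Version:
    major: int
    minor: int


@dataclass(order=True)
class Price:
    major: int
    minor: int


# Ordering is only defined between instances of the same class.
try:
    Version(1, 0) < Price(1, 0)
except TypeError as exc:
    comparison_error = exc
else:
    comparison_error = None

print(type(comparison_error).__name__)
# => TypeError
```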

## Immutability and hashing
Finally, let’s talk about the `frozen` and `unsafe_hash` arguments. These arguments are very useful and facilitate the correct management of hashing, equality and mutation. If you are not familiar with hashing in Python, I recommend reading the article [Understanding Hashing and Equality in Python with `__hash__` and `__eq__`](https://medium.com/gitconnected/understanding-hashing-and-equality-in-python-with-hash-and-eq-12f6da79e8ad).

Setting `frozen=True` states that the class should be *read-only or immutable*, i.e. after the initialization of an object, no fields can be changed:



In [None]:
@dataclass(frozen=True)
class MyDataClass:
    a: int
    b: int

dataobj = MyDataClass(a=1, b=1)
dataobj.a = 10
# => FrozenInstanceError: cannot assign to field 'a'
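
If you need a changed version of a frozen object, one option (not covered in the example above) is the standard-library helper `dataclasses.replace`, which builds a new instance instead of mutating the existing one. A minimal sketch, with the variable names `assignment_blocked` and `updated` chosen for illustration:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class MyDataClass:
    a: int
    b: int


dataobj = MyDataClass(a=1, b=1)

# Direct assignment is rejected on a frozen instance.
try:
    dataobj.a = 10
except FrozenInstanceError:
    assignment_blocked = True
else:
    assignment_blocked = False

# replace() creates a new instance; the original is left untouched.
updated = replace(dataobj, a=10)

print(assignment_blocked, dataobj, updated)
# => True MyDataClass(a=1, b=1) MyDataClass(a=10, b=1)
```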

The two parameters `frozen` and `eq` determine which hashing method the class ends up with:

- `frozen=True`, `eq=True` => a `__hash__`-method is automatically generated using the fields
- `frozen=True`, `eq=False` => default `__hash__`-method, i.e. not using the fields
- `frozen=False`, `eq=True` => unhashable, a `TypeError` is raised if an object is hashed
- `frozen=False`, `eq=False` => default `__hash__`-method, i.e. not using the fields

In [None]:
@dataclass(frozen=True, eq=True)
class MyDataClass:
    a: int
    b: int
        
# hashing and equality based on fields
dataobj1 = MyDataClass(a=1, b=1)
dataobj2 = MyDataClass(a=1, b=1)
assert dataobj1 == dataobj2 and hash(dataobj1) == hash(dataobj2)

for frozen in [False, True]:
    @dataclass(frozen=frozen, eq=False)
    class MyDataClass:
        a: int
        b: int

    # default equality, default hashing
    dataobj1 = MyDataClass(a=1, b=1)
    dataobj2 = MyDataClass(a=1, b=1)
    assert dataobj1 != dataobj2 and hash(dataobj1) != hash(dataobj2)

@dataclass(frozen=False, eq=True)
class MyDataClass:
    a: int
    b: int

# error is thrown
dataobj = MyDataClass(1, 1)
hash(dataobj)
# => TypeError: unhashable type: 'MyDataClass'

The reason for this behavior is that `__hash__` is directly connected to `__eq__`. If two objects are equal, their hashes should also be equal. Thus, when the dataclass generates `__eq__` from the fields, it can also generate a matching `__hash__`-method from the same fields.

But why does the class have to be frozen? If you add an object to a set or dictionary, the generated hash is used to place it inside the hash map. Later, when you want to retrieve the object, you use the same hash to find it. Now, if the object has been changed, the hash should also change (since it is based on the fields). Consequently, you will not be able to find the object again. Thus, for the hash to remain constant the fields it’s based on must remain constant. For this reason, if `eq=True` and `frozen=False`, there is no hash method available.

What about `unsafe_hash`? As described, the predefined behavior of the implicit definition of `__hash__` is based on principles promoting best practices. However, if you still wish to have a `__hash__`-method defined anyway, perhaps due to special circumstances, you can set `unsafe_hash=True` and it will do so. But be aware that the *unsafe* keyword is there for a reason:

In [None]:
@dataclass(unsafe_hash=True, eq=True, frozen=False)
class MyDataClass:
    a: int
    b: int

dataobj1 = MyDataClass(1, 1)
dataobj2 = MyDataClass(1, 1)
assert dataobj1 == dataobj2 and hash(dataobj1) == hash(dataobj2)
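
To make the risk concrete, here is a minimal sketch of what can go wrong: an object is placed in a set, one of its fields is mutated, and the set can no longer find it. The variable names `seen`, `found_before`, and `found_after` are illustrative.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class MyDataClass:
    a: int
    b: int


dataobj = MyDataClass(1, 1)
seen = {dataobj}
found_before = dataobj in seen

# Mutating a field changes the hash, but the set still stores
# the object under the hash it had when it was added.
dataobj.a = 10
found_after = dataobj in seen

print(found_before, found_after)
# => True False

# The object is still inside the set, it just cannot be looked up.
print(len(seen))
# => 1
```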

## Conclusion
The @dataclass in Python is a useful decorator that enables us to save time, follow best practices, and simplify the process of creating classes. I believe it’s suitable for a wide variety of situations.

Thanks for reading!