# CENG 111 Computer Programming I [Fall 2023 - 2024]
## Week 14 - Other types of classes

## Objectives of the fourteenth week's lecture.

- *Introduction to the Python Dataclass*
- *Default values*
- *Convert to a tuple or dictionary*
- *Create immutable objects*
- *Customize attribute behaviours*
- *Sort objects*
- *Less code to define a class*
- *Support for default values*
- *Custom representations of the objects*
- *Easy conversion to a tuple or a dictionary*
- *Frozen instances / immutable objects*
- *No need to write comparison methods*
- *Custom attribute behaviour with the field function*
- *The \_\_post\_init\_\_ hook*
- *Compare objects and sort them*

## Introduction to the Python Dataclass


Python introduced the dataclass in version 3.7 ([PEP 557](https://peps.python.org/pep-0557/)). The dataclass allows you to define classes with less code and more functionality out of the box.

The following defines a regular **Person** class with two instance attributes **name** and **age**:


In [1]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


This Person class has the \_\_init\_\_ method that initializes the **name** and **age** attributes.

If you want to have a string representation of the **Person** object, you need to implement the \_\_str\_\_ or \_\_repr\_\_ method. Also, if you want to compare two instances of the Person class by an attribute, you need to implement the \_\_eq\_\_ method.

However, if you use the dataclass, you’ll have all of these features (and even more) without implementing these dunder methods.

To make the **Person** class a data class, you follow these steps:

First, import the **dataclass** decorator from the **dataclasses** module:


In [2]:
from dataclasses import dataclass

Second, decorate the **Person** class with the **dataclass** decorator and declare the attributes:


In [3]:
@dataclass
class Person:
    name: str
    age: int


In this example, the **Person** class has two attributes **name** with the type **str** and **age** with the type **int**. By doing this, the **@dataclass** decorator implicitly creates the \_\_init\_\_ method like this:


In [4]:
def __init__(self, name: str, age: int):
    self.name = name
    self.age = age


Note that the order in which the attributes are declared in the class determines the order of the parameters in the \_\_init\_\_ method.
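One way to check this, as a minimal sketch, is to ask the standard **inspect** module for the signature that the decorator generated. The class below repeats the **Person** definition so that the cell runs on its own.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# Parameter names of the generated __init__, in declaration order
# (self is omitted because the signature is taken from the class).
params = list(inspect.signature(Person).parameters)
print(params)
```

Swapping the two attribute lines in the class swaps the two names in the list.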

You can now create a **Person** object:


In [5]:
p1 = Person('John', 25)

When you print a **Person** object, you’ll get a readable format:


In [6]:
print(p1)


Person(name='John', age=25)


Also, if you compare two **Person** objects that have the same attribute values, the comparison returns **True**. For example:


In [7]:
p1 = Person('John', 25)
p2 = Person('John', 25)
print(p1 == p2)


True


The following sections discuss other features that a dataclass provides.


## Default values


When using a regular class, you can define default values for attributes. For example, the following **Person** class has the **iq** parameter with the default value of **100**.


In [8]:
class Person:
    def __init__(self, name, age, iq=100):
        self.name = name
        self.age = age
        self.iq = iq


To define a default value for an attribute in the dataclass, you assign it to the attribute like this:


In [9]:
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    iq: int = 100


print(Person('John Doe', 25))


Person(name='John Doe', age=25, iq=100)


As with function parameters, attributes with default values must appear after those without default values. Therefore, the following code will not work:


In [10]:
from dataclasses import dataclass


@dataclass
class Person:
    iq: int = 100
    name: str
    age: int


TypeError: non-default argument 'name' follows default argument
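The error is raised when the class is defined, not when an object is created. The following sketch wraps the same definition in a **try** block to show this; the exact wording of the message may differ between Python versions.

```python
from dataclasses import dataclass

try:
    @dataclass
    class Person:
        iq: int = 100
        name: str  # no default, but it follows a field with a default
        age: int
except TypeError as error:
    # Raised while the decorator builds __init__, before any object exists.
    message = str(error)
    print(f'TypeError: {message}')
```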

## Convert to a tuple or dictionary


The dataclasses module has the **astuple()** and **asdict()** functions that convert an instance of the dataclass to a tuple and a dictionary. For example:


In [11]:
from dataclasses import dataclass, astuple, asdict


@dataclass
class Person:
    name: str
    age: int
    iq: int = 100


p = Person('John Doe', 25)


In [12]:
print(astuple(p))
print(asdict(p))

('John Doe', 25, 100)
{'name': 'John Doe', 'age': 25, 'iq': 100}
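Because the dictionary keys match the field names and the tuple follows the field order, both results can be unpacked to rebuild an equal object. The following is a small sketch of that round trip for a flat dataclass like this one:

```python
from dataclasses import dataclass, asdict, astuple


@dataclass
class Person:
    name: str
    age: int
    iq: int = 100


p = Person('John Doe', 25)

# asdict() keys match the field names, so the dict unpacks as keyword arguments.
copy_from_dict = Person(**asdict(p))

# astuple() values follow the field order, so the tuple unpacks positionally.
copy_from_tuple = Person(*astuple(p))

print(copy_from_dict == p, copy_from_tuple == p)
```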


## Create immutable objects


To create read-only objects from a dataclass, set the **frozen** argument of the **@dataclass** decorator to **True**. For example:


In [13]:
from dataclasses import dataclass, astuple, asdict


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    iq: int = 100


If you attempt to change the attributes of the object after it is created, you’ll get an error. For example:


In [14]:
p = Person('Jane Doe', 25)
p.iq = 120


FrozenInstanceError: cannot assign to field 'iq'
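When a frozen object needs a different value, one option is to build a new object instead of changing the old one. The sketch below uses the **replace()** function from the **dataclasses** module for that; the variable name **updated** is only illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    iq: int = 100


p = Person('Jane Doe', 25)

# replace() creates a new object with the given fields changed;
# the original frozen object is left untouched.
updated = replace(p, iq=120)

print(p)
print(updated)
```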

## Customize attribute behaviours


If you don’t want an attribute to be initialized through the \_\_init\_\_ method, you can use the **field()** function from the **dataclasses** module.

The following example defines the **can_vote** attribute, which is excluded from the parameters of the \_\_init\_\_ method:


In [15]:
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)


The **field()** function has multiple interesting parameters such as **repr**, **hash**, **compare**, and **metadata**.
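As a brief sketch of two of these parameters, the example below adds a hypothetical **notes** field (not part of the original class) that is hidden from the printed representation with **repr=False** and ignored by **==** with **compare=False**:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int
    # Hypothetical field for illustration: hidden from repr and ignored by ==.
    notes: str = field(default='', repr=False, compare=False)


p1 = Person('John', 25, notes='first record')
p2 = Person('John', 25, notes='second record')

print(p1)        # notes is not shown
print(p1 == p2)  # notes is not compared
```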

If you want to initialize an attribute that depends on the value of another attribute, you can use the \_\_post_init\_\_ method. As its name implies, Python calls the \_\_post_init\_\_ method after the \_\_init\_\_ method.

The following uses the \_\_post_init\_\_ method to initialize the **can_vote** attribute based on the **age** attribute:


In [16]:
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)

    def __post_init__(self):
        print('called __post_init__ method')
        self.can_vote = 18 <= self.age <= 70


p = Person('Jane Doe', 25)
print(p)


called __post_init__ method
Person(name='Jane Doe', age=25, iq=100, can_vote=True)


## Sort objects


By default, a dataclass implements the \_\_eq\_\_ method.

To allow other types of comparisons such as \_\_lt\_\_, \_\_le\_\_, \_\_gt\_\_, and \_\_ge\_\_, you can set the **order** argument of the **@dataclass** decorator to **True**:


In [17]:
@dataclass(order=True)


With **order=True**, the dataclass compares two objects field by field, in the order the fields are declared, until it finds a pair of values that are not equal.
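The effect is the same as comparing tuples of the field values. A minimal sketch, using a two-field version of **Person**:

```python
from dataclasses import dataclass, astuple


@dataclass(order=True)
class Person:
    name: str
    age: int


alice = Person('Alice', 30)
bob = Person('Bob', 25)

# name is compared first; 'Alice' < 'Bob', so age is never consulted.
print(alice < bob)
# Same result as comparing the field tuples directly.
print(astuple(alice) < astuple(bob))
```

Here **alice** counts as smaller than **bob** even though she is older, because **name** is declared before **age**.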

In practice, you often want to compare objects by a particular attribute, not by all of them. To do that, define a field called **sort_index** as the first field and set its value to the attribute that you want to sort by.

For example, suppose you have a list of **Person** objects and want to sort them by **age**:


In [18]:
members = [
    Person('John', 25),
    Person('Bob', 35),
    Person('Alice', 30)
]


called __post_init__ method
called __post_init__ method
called __post_init__ method


To do that, you need to:

- First, pass the **order=True** parameter to the **@dataclass** decorator.
- Second, define the **sort_index** attribute as the first field and set its **init** parameter to **False**.
- Third, set **sort_index** to the **age** attribute in the \_\_post_init\_\_ method so that **Person** objects are ordered by **age**.

The following shows the code for sorting **Person** objects by **age**:


In [19]:
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)

    name: str
    age: int
    iq: int = 100
    can_vote: bool = field(init=False)

    def __post_init__(self):
        self.can_vote = 18 <= self.age <= 70
        self.sort_index = self.age


members = [
    Person(name='John', age=25),
    Person(name='Bob', age=35),
    Person(name='Alice', age=30)
]

sorted_members = sorted(members)
for member in sorted_members:
    print(f'Name: {member.name}, Age: {member.age}')


Name: John, Age: 25
Name: Alice, Age: 30
Name: Bob, Age: 35


## Less code to define a class


When we define a class to store some attributes, it usually goes something like this.


In [19]:
class Person():
    def __init__(self, first_name, last_name, age, job):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.job = job


This is the standard Python syntax.

When you use dataclasses, you first have to import **dataclass** and then use it as a decorator before the class you define.

And here’s what the previous code looks like using dataclasses.


In [20]:
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
    job: str


A few things to notice about this syntax:

- There is less boilerplate code: we define each attribute once and we don’t repeat ourselves
- We use type annotation for each attribute. Although this doesn’t enforce type validation, it helps your text editor provide better linting if you use a type checker like mypy.
- Your code would still work if you don’t respect the types but your code editor will signal the inconsistencies.
- dataclasses don’t just allow you to write more compact code. The **dataclass** decorator is actually a code generator that automatically adds other methods under the hood. If we use the **inspect** module to check which methods have been added to the **Person** class, we can see the \_\_init\_\_, \_\_eq\_\_ and \_\_repr\_\_ methods. These methods are responsible for setting the attribute values, testing for equality and representing objects in a readable string format.

If we allowed the **Person** class to support ordering, we’d have these methods as well:

- \_\_ge\_\_ : greater than or equal
- \_\_gt\_\_ : greater than
- \_\_le\_\_ : less than or equal
- \_\_lt\_\_ : less than
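The check with the **inspect** module could look like the sketch below. The helper **function_names** and the class **OrderedPerson** are names made up for this illustration, and the exact set of generated methods can vary slightly between Python versions.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass(order=True)
class OrderedPerson:
    first_name: str
    last_name: str


def function_names(cls):
    """Return the names of the plain Python functions found on cls."""
    return {name for name, _ in inspect.getmembers(cls, inspect.isfunction)}


plain = function_names(Person)
ordered = function_names(OrderedPerson)

print(sorted(plain))
# Methods that appear only when order=True is set.
print(sorted(ordered - plain))
```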


## Support for default values


You can add default values to each attribute while preserving the annotation.


In [21]:
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"


Keep in mind that fields without default values cannot appear after fields with default values. For example, the following code won’t work:


In [22]:
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"
    hobbies: str


TypeError: non-default argument 'hobbies' follows default argument

## Custom representations of the objects


Thanks to the \_\_repr\_\_ method already added by dataclasses, instances have a nice, human-readable representation when they are printed to the screen.

This makes debugging easier.


In [24]:
@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"


guido = Person()
print(guido)


Person(first_name='Guido', last_name='van Rossum', age=66, job='Benevolent Dictator for Life (BDFL)')


This representation can be overridden to implement any custom message you want.


In [24]:
@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"

    def __repr__(self):
        return f"{self.first_name} {self.last_name} ({self.age})"


guido = Person()
print(guido)


Guido van Rossum (66)


## Easy conversion to a tuple or a dictionary


Instances can easily be serialized into dicts or tuples. This is very useful when your code interacts with other programs that expect these formats.


In [26]:
from dataclasses import astuple, asdict

guido = Person()
print(guido)
print(asdict(guido))
print(astuple(guido))


Person(first_name='Guido', last_name='van Rossum', age=66, job='Benevolent Dictator for Life (BDFL)')
{'first_name': 'Guido', 'last_name': 'van Rossum', 'age': 66, 'job': 'Benevolent Dictator for Life (BDFL)'}
('Guido', 'van Rossum', 66, 'Benevolent Dictator for Life (BDFL)')


## Frozen instances / immutable objects


Using dataclasses, you can create objects that are read-only. All you have to do is set the **frozen** argument to **True** inside the **@dataclass** decorator.


In [27]:
@dataclass(frozen=True)
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"


When you do this, you prevent anyone from modifying the values of the attributes once the object is instantiated.

If you try to set a frozen object’s attribute to a new value, a **FrozenInstanceError** is raised.
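The following sketch triggers the error on a shortened version of the frozen **Person** class and catches it, so that the cell keeps running:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66


guido = Person()

try:
    guido.age = 67
except FrozenInstanceError as error:
    print(f"FrozenInstanceError: {error}")

# The failed assignment leaves the object unchanged.
print(guido.age)
```

**FrozenInstanceError** is a subclass of **AttributeError**, so code that already catches **AttributeError** will catch it too.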


## No need to write comparison methods


When you define a class using the standard Python syntax and test for the equality between two instances that have the same attribute values, here’s what you’d get:


In [28]:
class Person():
    def __init__(self, first_name, last_name, age, job):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.job = job


first_person = Person("Guido", "van Rossum", 66, "Benevolent Dictator for Life (BDFL)")
second_person = Person("Guido", "van Rossum", 66, "Benevolent Dictator for Life (BDFL)")

print(first_person == second_person)


False


These two objects are not equal, which is normal because the **Person** class doesn’t actually implement a method for testing equality. To add equality, you’d have to implement the \_\_eq\_\_ method yourself. It may look like this:

In [29]:
class Person():
    def __init__(self, first_name, last_name, age, job):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.job = job

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.first_name,
                self.last_name,
                self.age,
                self.job) == (other.first_name,
                              other.last_name,
                              other.age,
                              other.job)


This method first checks that the two objects are instances of the same class and then tests the equality between tuples of attributes.

Now if you decide to add new attributes to your class, you’d have to update the \_\_eq\_\_ method again. The same goes for \_\_ge\_\_, \_\_gt\_\_, \_\_le\_\_ and \_\_lt\_\_ if they’re used.

This seems like unnecessary code typing, right? Fortunately, dataclasses removes this struggle.

In [30]:
@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"

first_person = Person()
second_person = Person()

print(first_person == second_person)

True


## Custom attribute behaviour with the field function


In some situations, you may need to create an attribute that is only defined internally, not when the class is instantiated. This may be the case when the attribute has a value that depends on previously-set attributes.

Here’s where you’d use the **field** function from dataclasses.

By using this function and setting its **init** and **repr** arguments to **False** to create a new field called **full\_name**, we can still instantiate the **Person** class without setting the **full\_name** attribute.

In [32]:
from dataclasses import dataclass, field

@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"
    full_name: str = field(init=False, repr=False)

This attribute doesn’t exist yet in the instance. If we try to access it, an **AttributeError** is raised.
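A short sketch of that behaviour, using a reduced version of the class:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    full_name: str = field(init=False, repr=False)


guido = Person()
print(guido)  # works because full_name is excluded from the repr

try:
    guido.full_name
except AttributeError as error:
    print(f"AttributeError: {error}")
```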

How can we set the value of full_name and still keep it out of the constructor of the class? To do this, we’ll have to use the \_\_post\_init\_\_ method.

## The \_\_post\_init\_\_ hook


Dataclasses support a special method called \_\_post\_init\_\_.

As the name suggests, this method is called right after the generated \_\_init\_\_ method has set the attributes.

Going back to the previous example, we can see how this method can be used to initialize an internal attribute that depends on previously set attributes.

In [33]:
@dataclass
class Person:
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"
    full_name: str = field(init=False, repr=True)
    
    def __post_init__(self):
        self.full_name = self.first_name + " " + self.last_name

guido = Person()
print(guido)

Person(first_name='Guido', last_name='van Rossum', age=66, job='Benevolent Dictator for Life (BDFL)', full_name='Guido van Rossum')


In [34]:
guido.full_name

'Guido van Rossum'

Note that the **repr** argument inside the **field** function has been set to **True** so that the attribute is visible when the object is printed. We couldn’t set this argument to **True** in the previous example because the attribute **full\_name** had not been created yet, so printing the object would have raised an **AttributeError**.

## Compare objects and sort them


One useful feature to have when you deal with objects that contain data is the ability to compare them and sort them in any order you intend.

By default, dataclasses implements \_\_eq\_\_. 

To allow the other types of comparison (\_\_lt\_\_ (less than), \_\_le\_\_ (less or equal), \_\_gt\_\_ (greater than) and \_\_ge\_\_ (greater or equal)), we have to set the **order** argument to **True** in the **@dataclass** decorator.

In [33]:
@dataclass(order=True)

These comparison methods take every defined field and compare them in the order they are defined until there’s a value that’s not equal.

Let’s get back to the **Person** class. Say we want to compare the instances of this class based on the age attribute (which makes sense, right?).

To do this, we’ll have to add a field, which we’ll call **sort\_index**, and set its value to the **age** attribute’s value. Because fields are compared in the order they are defined, **sort\_index** must be declared first.

And the way we’d do this is by using the \_\_post\_init\_\_ method we saw in the previous example.

In [35]:
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    first_name: str = "Guido"
    last_name: str = "van Rossum"
    age: int = 66
    job: str = "Benevolent Dictator for Life (BDFL)"

    def __post_init__(self):
        self.sort_index = self.age


p1 = Person(age=30)
p2 = Person(age=20)

print(p1 > p2)


True


Now instances of the **Person** class can be sorted with respect to the **age** attribute.
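As a sketch of sorting a whole list this way, the example below uses a shortened **Person** class with people whose names differ. It assumes **sort\_index** is the first field; if it were declared after the name fields, the names would be compared first and would decide the order.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    # Declared first so that it is the first value compared.
    sort_index: int = field(init=False, repr=False)
    first_name: str = "Guido"
    age: int = 66

    def __post_init__(self):
        self.sort_index = self.age


people = [
    Person(first_name="Alice", age=40),
    Person(first_name="Zed", age=20),
    Person(first_name="Bob", age=30),
]

youngest_first = [p.first_name for p in sorted(people)]
oldest_first = [p.first_name for p in sorted(people, reverse=True)]
print(youngest_first)
print(oldest_first)
```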