# Dataclasses

## Learning Goals

* What are dataclasses?
* How can dataclasses make class creation easier?
* What functions are provided automatically?

## Introduction

So far, we have learned how to use classes to make storing and manipulating data cleaner and safer. Since version 3.7, Python provides a built-in decorator called **@dataclass** (from the `dataclasses` module) that makes defining simple data-storage classes easier and cleaner by automatically adding methods to your class.

By default, the `@dataclass` decorator generates the following methods for you:
* `__init__()` --> the constructor
* `__repr__()` --> an automatic object representation with all attributes
* `__eq__()`   --> an object comparison

First, we import the `dataclass` decorator from the `dataclasses` module.

```python
from dataclasses import dataclass
```
After that, we only need to add `@dataclass` above our class definition.


Please recall the previous definition of our `Student` class, shown below. Its `__init__()` method requires a lot of typing: each variable name appears at least three times. Furthermore, when we print the object, we only see the class name and the memory address, not the attributes (for that, we would need to write a `__str__()` or `__repr__()` method).

```python
# Code snippet from the previous notebook
class Student:
    occupation = 'Student'

    # Init method
    def __init__(self, full_name, student_id):
        self.full_name = full_name
        self.student_id = student_id

# Create two student objects with the same attributes
student1 = Student('John Doe', 'abc123')
student2 = Student('John Doe', 'abc123')

print(student1)
# Are the two objects the same?
print(student2 == student1)
```

Now we will create a similar class called `StudentEasier` using the `@dataclass` decorator:

```python
from dataclasses import dataclass  # import the dataclass decorator

@dataclass  # Define the class with the dataclass decorator
class StudentEasier:
    full_name: str
    student_id: str
    occupation: str = 'Student'  # default values work the same way

# Create two student objects with the same attributes
student1 = StudentEasier('John Doe', 'abc123')
student2 = StudentEasier('John Doe', 'abc123')

print(student1)
# Are the two objects the same here?
print(student2 == student1)
```

The class definition is now much shorter, and we get an object representation that shows all attributes without writing any extra code. The comparison also behaves differently. With the standard Python class, comparing two objects with exactly the same attributes yields `False`: without a custom `__eq__()` method, Python compares object identity, and the two objects live at different memory locations.

>When using **dataclass** we assume that _objects are equal if their data is_. Therefore, the comparison `student2 == student1` yields `True` in the second example.
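
To make the comparison rule concrete, here is a minimal sketch that reuses the `StudentEasier` class from above. The third object and its ID `'xyz789'` are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class StudentEasier:
    full_name: str
    student_id: str
    occupation: str = 'Student'

a = StudentEasier('John Doe', 'abc123')
b = StudentEasier('John Doe', 'abc123')
c = StudentEasier('John Doe', 'xyz789')  # illustrative ID, differs from a and b

print(a == b)   # True: all fields match
print(a == c)   # False: student_id differs
print(a is b)   # False: still two separate objects in memory
print(repr(a))  # StudentEasier(full_name='John Doe', student_id='abc123', occupation='Student')
```

Note that `==` compares the field values, while `is` still checks whether both names refer to the very same object.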

Also note that when defining a dataclass, we specify the expected type of each variable by adding `: type` after the variable name. These annotations are called **type hints**, and they indicate what kind of input is expected when creating an object. In dataclasses, **every field must have a type hint**: an attribute without one is not treated as a field and is left out of the generated methods. If the type does not matter, you can use `: Any` (imported from the `typing` module) to indicate that the variable can hold any type of value.
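
The following sketch illustrates the role of type hints in a dataclass. The `Measurement` class and its attributes are hypothetical and only serve as an example. It uses `fields()` from the `dataclasses` module to list what the decorator recognized as fields.

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class Measurement:   # hypothetical class, for illustration only
    label: str
    value: Any       # any type of value is acceptable here
    unit = 'cm'      # no type hint: a plain class attribute, not a field

m = Measurement('width', 12.5)

field_names = [f.name for f in fields(Measurement)]
print(field_names)  # ['label', 'value']
print(m)            # Measurement(label='width', value=12.5)
print(m.unit)       # cm
```

Because `unit` has no type hint, it is not a parameter of the generated `__init__()` and does not appear in the object representation.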

Type hints serve as **documentation** that helps users understand what type is expected for each variable, which improves code readability and usability. However, it's important to note that Python **does not enforce type hints at runtime**. Even if a variable is annotated with a specific type, Python will still allow values of other types to be assigned. For example, a variable annotated as `str` could still receive an integer or a list without causing an immediate error.
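
As a small demonstration, the sketch below passes an integer where a `str` is annotated. It uses a shortened version of `StudentEasier`, and the ID value `12345` is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class StudentEasier:
    full_name: str
    student_id: str

# student_id is annotated as str, but Python accepts an int without complaint
student = StudentEasier('John Doe', 12345)

print(student)                   # StudentEasier(full_name='John Doe', student_id=12345)
print(type(student.student_id))  # <class 'int'>
```

No error is raised at runtime. A separate static type checker, such as mypy, would be needed to flag this kind of mismatch before the code runs.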

We can, of course, still add custom methods as before. Below, we add the `set_study_program()` method from the previous notebook to our new `StudentEasier` class:

```python
from dataclasses import dataclass

@dataclass
class StudentEasier:
    full_name: str
    student_id: str
    occupation: str = 'Student'
    course: str = 'Programming with Python'
    uni: str = 'East West University'
    study_program: str = 'Undeclared'

    # The set_study_program method from before
    def set_study_program(self, study_program: str):
        """Set the student's study program with validation."""
        VALID_PROGRAMS = ["Computer Science", "Mechanical Engineering", "Biology",
                          "Psychology", "Business Administration", "Philosophy", "Art and Design"]

        if study_program not in VALID_PROGRAMS:
            raise ValueError(f"Invalid study program. Must be one of: {', '.join(VALID_PROGRAMS)}")

        self.study_program = study_program


student3 = StudentEasier('Emily Smith', 'cba321')
print(student3)
print(student3.course)
print(student3.study_program)
student3.set_study_program('Mechanical Engineering')
print(student3.study_program)
```

## Summary
In this notebook, we explored how to use the **dataclasses** module to efficiently create classes for storing data. By applying the **`@dataclass`** decorator, we no longer need to define the `__init__()` method manually, and we get a default object representation automatically. Additionally, dataclasses treat two objects as equal if all their attributes have the same values, which makes object comparison easy.