# Data Classes

Python 3.7 introduced the **data class** concept, available through the standard `dataclasses` module.

A data class is a class that mainly (but not only) holds data.

## A first example

A data class is created using the new **`@dataclass`** decorator, as follows:


In [1]:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int


What makes the class `Point` a data class is the **`@dataclass`** decorator just above the class definition. 

Beneath the **`class Point:`** line, you simply list the fields you want in your data class. 

A **data class** is a regular Python class. The only thing that sets it apart is that the special methods `__init__()`, `__repr__()`, and `__eq__()` are implemented for you.

The `: type` notation used for the fields relies on **variable annotations**, a feature introduced in Python 3.6.

It is important to note that type annotations are not enforced at runtime. The interpreter does not check them; they mainly improve readability for other programmers and yourself, and they support tools such as static type checkers. The `@dataclass` decorator does read them, but only to discover which attributes are fields.
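As a small illustration of the point above, the following sketch passes values of the "wrong" type to the `Point` class from the first example. Nothing is checked when the instance is created, yet the annotations remain available on the class.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

# The annotations are not checked: a str and a float are accepted for int fields.
p = Point("two", 5.5)
print(p)  # Point(x='two', y=5.5)

# The annotations are still stored on the class, where tools can read them.
print(Point.__annotations__)
```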

The **`typing`** module provides additional types to use in annotations, such as `List` and `Any`.

Even though types are not enforced at runtime, adding some kind of type hint is **mandatory** when defining the fields of your data class. Without a type hint, the attribute will not be a field of the data class.

However, if you do not want to add explicit types to your data class, use **`typing.Any`**.
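The sketch below shows both rules at once. The class `Measurement` and its attribute names are hypothetical and only serve the illustration: an attribute annotated with `Any` is a field, while an attribute without any annotation stays a plain class attribute.

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class Measurement:
    value: Any          # a field without a specific type
    unit: str = "m"     # a field with a default value
    scale = 1000        # no annotation: plain class attribute, not a field

print([f.name for f in fields(Measurement)])  # ['value', 'unit']

m = Measurement(3.2)
print(m)        # Measurement(value=3.2, unit='m')
print(m.scale)  # 1000
```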

## Data classes are classes

A data class comes with basic functionality already implemented. 

For instance, you can instantiate, print, and compare data class instances straight out of the box:

In [2]:
p1 = Point(2, 5)  # __init__ used here
p1.x += 1
print(p1)         # __repr__ used here
p2 = Point(3, 5)
p1 == p2          # __eq__ used here

Point(x=3, y=5)


True

In the above example, the code that is auto-generated thanks to the `@dataclass` decorator is roughly equivalent to the following:

In [24]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return (f'{self.__class__.__name__}'
                f'(x={self.x!r}, y={self.y!r})')

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


In [12]:
p1 = Point(2, 5)
p1.x += 1
print(p1)
p2 = Point(3, 5)
p1 == p2

Point(x=3, y=5)


True

## Alternatives to Data Classes

For simple data structures, you can use a **tuple**, a **named tuple** or a **dict** as an alternative to a data class.

**Note**: tuples and named tuples are ***immutable***.

In [20]:
p1_tuple = (2, 2)
p1_dict = {'x': 2, 'y': 2}

from collections import namedtuple
Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)
pt2 = Point(2.5, 1.5)
print(pt2.x)
# pt2.x += 1  # forbidden: named tuples are immutable (raises AttributeError)
print(pt1)
print(pt2)

from math import sqrt
line_length = sqrt((pt1.x-pt2.x)**2 + (pt1.y-pt2.y)**2)

2.5
Point(x=1.0, y=5.0)
Point(x=2.5, y=1.5)


## Another way to create a data class

You can also create a data class with the **`make_dataclass`** function, much like named tuples are created.

The following is (almost) equivalent to the definition of the data class `Point` given above:

In [22]:
from dataclasses import make_dataclass

Point = make_dataclass('Point', ['x', 'y'])
p1 = Point(2, 5)
p1.x += 1
print(p1)
p2 = Point(3, 5)
p1 == p2

Point(x=3, y=5)


True
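One reason the definition above is only "almost" equivalent is that the fields are given as bare names, so they carry no specific type. As a sketch, `make_dataclass` also accepts `(name, type)` pairs. The class names `PointAny` and `PointInt` are hypothetical and chosen for this illustration.

```python
from dataclasses import make_dataclass, fields

# Bare names: the fields get a generic "any type" annotation.
PointAny = make_dataclass('PointAny', ['x', 'y'])

# (name, type) pairs: the fields are annotated as in the class-based definition.
PointInt = make_dataclass('PointInt', [('x', int), ('y', int)])

print([f.type for f in fields(PointInt)])  # [<class 'int'>, <class 'int'>]
print(PointInt(2, 5))                      # PointInt(x=2, y=5)
print(PointAny(2, 5) == PointAny(2, 5))    # True
```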

## Default Values

It is possible to add default values to the fields of your data class.

This works exactly as if you had specified the default values in the definition of the `__init__()` method of a regular class.

In [1]:
from dataclasses import dataclass

@dataclass
class Point:
    x: int = 0
    y: int = 0

p1 = Point()
p1

Point(x=0, y=0)
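Since defaults behave like default arguments of `__init__()`, the usual rules for function parameters apply. The sketch below shows keyword and positional use, and shows that a field without a default may not follow a field with one. The class `BadPoint` is hypothetical and exists only to trigger the error.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int = 0
    y: int = 0

print(Point(y=4))  # Point(x=0, y=4)
print(Point(1))    # Point(x=1, y=0)

# A field without a default cannot follow a field with a default.
rejected = False
try:
    @dataclass
    class BadPoint:
        x: int = 0
        y: int
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```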

## Adding Methods

Because a data class is a regular class, you can freely add your own methods to it.

In [2]:
from dataclasses import dataclass

@dataclass
class Point:
    x: int = 0
    y: int = 0

    def reset(self):
        self.x = self.y = 0

p1 = Point(2, 3)
p1.reset()
p1

Point(x=0, y=0)
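Methods can also return values computed from the fields. As a further sketch, the hypothetical method `distance_to()` below wraps the length computation that was written by hand in the named tuple example.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Point:
    x: float = 0.0
    y: float = 0.0

    def distance_to(self, other: "Point") -> float:
        # Euclidean distance between self and other
        return hypot(self.x - other.x, self.y - other.y)

print(Point(0, 0).distance_to(Point(3, 4)))  # 5.0
```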

## Advanced Default Values

The **`field()`** specifier is used to customize each field of a data class individually. 

These are the parameters `field()` supports:

- **`default`**: Default value of the field

    `x: int = field(default=0)` is equivalent to `x: int = 0`

- **`default_factory`**: Function, called without arguments, that returns the initial value of the field
- **`init`**: Use field in `__init__()` method? (Default is `True`.)
- **`repr`**: Use field in repr of the object? (Default is `True`.)
- **`compare`**: Include the field in comparisons? (Default is `True`.)
- **`hash`**: Include the field when calculating `hash()`? (Default is to use the same as for `compare`.)
- **`metadata`**: A mapping with information about the field

The `metadata` parameter is not used by the data classes themselves but is available to attach information to fields.  
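To make a few of these parameters concrete, here is a minimal sketch. The classes `Polyline` and `Broken` and their fields are hypothetical. It uses `default_factory` to give each instance its own list, excludes a field from `__eq__()` and `__repr__()`, and shows that a mutable default such as `[]` is rejected.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Polyline:
    name: str
    points: List[tuple] = field(default_factory=list)          # fresh list per instance
    note: str = field(default="", compare=False, repr=False)   # ignored by == and repr

a = Polyline("a")
b = Polyline("a", note="draft")
print(a == b)    # True: note is excluded from comparisons

a.points.append((0, 0))
print(b.points)  # []: the list is not shared between instances
print(a)         # Polyline(name='a', points=[(0, 0)])

# A mutable default is refused; default_factory is the intended alternative.
rejected = False
try:
    @dataclass
    class Broken:
        points: list = []
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```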

In [3]:
from dataclasses import dataclass, field
from typing import List

def fourPoint():
    # Build a fresh list of four points on every call
    return [Point(0, 0) for _ in range(4)]

@dataclass
class Square:
    # metadata must be a mapping (or None), so the description goes in a dict
    corners: List[Point] = field(
        default_factory=fourPoint,
        metadata={"description": "the four corners of the square"})

p1 = Square()
print(p1)
p2 = Square()
print(p2)

Square(corners=[Point(x=0, y=0), Point(x=0, y=0), Point(x=0, y=0), Point(x=0, y=0)])
Square(corners=[Point(x=0, y=0), Point(x=0, y=0), Point(x=0, y=0), Point(x=0, y=0)])


The metadata (and other information about a field) can be retrieved using the **`fields()`** function:

In [17]:
from dataclasses import fields
fields(Square)

(Field(name='corners',type=typing.List[__main__.Point],default=<dataclasses._MISSING_TYPE object at 0x00000130048C3EB0>,default_factory=<function fourPoint at 0x000001300499F1F0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'description': 'the four corners of the square'}),_field_type=_FIELD),)
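Each element returned by `fields()` is a `Field` object whose attributes can be read individually. The sketch below assumes a hypothetical class `Position` whose metadata stores a unit under the key `"unit"`. The key name is an arbitrary choice, since data classes do not interpret the metadata.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Position:
    lon: float = field(default=0.0, metadata={"unit": "degrees"})
    lat: float = field(default=0.0, metadata={"unit": "degrees"})

# Read the name, default value and metadata of every field
for f in fields(Position):
    print(f.name, f.default, f.metadata["unit"])
# lon 0.0 degrees
# lat 0.0 degrees
```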

## @dataclass parameters

You can give parameters to the **`@dataclass`** decorator. 

The following parameters are supported:

- **`init`**: Add `__init__()` method? (Default is `True`.)
- **`repr`**: Add `__repr__()` method? (Default is `True`.)
- **`eq`**: Add `__eq__()` method? (Default is `True`.)
- **`order`**: Add ordering methods? (Default is `False`.)
- **`unsafe_hash`**: Force the addition of a `__hash__()` method? (Default is `False`.)
- **`frozen`**: If `True`, assigning to fields raises an exception. (Default is `False`.) This is a way to make a data class immutable.
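As a quick sketch of the `frozen` parameter, the hypothetical class `FrozenPoint` below rejects assignment to its fields. A side effect worth knowing is that a frozen data class with the default `eq=True` is hashable, so its instances can be stored in sets or used as dict keys.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPoint:
    x: int = 0
    y: int = 0

p = FrozenPoint(2, 3)
blocked = False
try:
    p.x += 1  # assignment to a field of a frozen instance
except FrozenInstanceError as exc:
    blocked = True
    print("cannot assign:", exc)

# Equal frozen instances hash alike, so the set keeps a single element.
print(len({FrozenPoint(1, 2), FrozenPoint(1, 2)}))  # 1
```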

In [18]:
from dataclasses import dataclass, field

@dataclass(order=True)
class Point:
    x: int = 0
    y: int = 0

    def reset(self):
        self.x = self.y = 0

p1 = Point(2, 6)
p2 = Point(5, 0)
p1 < p2

True

How are the two points compared though? 

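Data classes compare objects as if they were tuples of their fields, in the order the fields are declared. Here, `(2, 6) < (5, 0)` is `True` because `2 < 5`; the `y` values are only consulted when the `x` values are equal.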
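The tuple analogy can be checked directly with `astuple()`, which converts an instance to a tuple of its field values. This is only an illustration of the ordering rule, not a description of how the comparison methods are generated internally.

```python
from dataclasses import dataclass, astuple

@dataclass(order=True)
class Point:
    x: int = 0
    y: int = 0

p1, p2 = Point(2, 6), Point(5, 0)
print(astuple(p1), astuple(p2))            # (2, 6) (5, 0)
print(p1 < p2, astuple(p1) < astuple(p2))  # True True

# With equal x values, y decides.
print(Point(2, 6) < Point(2, 1))           # False
```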

Suppose this ordering does not suit us and we would rather rank points by the sum of their coordinates. We then need to define a sort index computed as `x + y`.

For `Point` to use this sort index for comparisons, we need to add a field `sort_index` to the class. However, this field should be calculated from the other fields `x` and `y` automatically. 

This is exactly what the special method **`__post_init__()`** is for. It allows for special processing after the regular `__init__()` method is called:

In [21]:
from dataclasses import dataclass, field

@dataclass(order=True)
class Point:
    sort_index: int = field(init=False, repr=False)
    x: int = 0
    y: int = 0

    def reset(self):
        self.x = self.y = 0
        self.sort_index = 0

    def __post_init__(self):
        # Runs right after the generated __init__()
        self.sort_index = self.x + self.y

p1 = Point(2, 6)
p2 = Point(6, 2)
p1 < p2

True

Note that `sort_index` is added as the first field of the class. That way, the comparison is first done using `sort_index` and only if there are ties are the other fields used. 

Using `field()`, you must also specify that `sort_index` should not be included as a parameter in the `__init__()` method (because it is calculated from the `x` and `y` fields). 

To avoid confusing the user about this implementation detail, it is probably also a good idea to remove `sort_index` from the repr of the class.
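With the ordering methods in place, instances work with `sorted()` and similar functions. The sketch below reuses the class from this section (without `reset()`) on a small, made-up list of points. Note that `sort_index` is computed once in `__post_init__()`, so it is not refreshed automatically if `x` or `y` is modified later.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Point:
    sort_index: int = field(init=False, repr=False)
    x: int = 0
    y: int = 0

    def __post_init__(self):
        self.sort_index = self.x + self.y

points = [Point(5, 5), Point(6, 2), Point(0, 1), Point(2, 6)]

# Sorted by x + y first; ties (here 8 and 8) are broken by x, then y.
ordered = sorted(points)
print(ordered)
# [Point(x=0, y=1), Point(x=2, y=6), Point(x=6, y=2), Point(x=5, y=5)]
```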

## Inheritance

A data class can inherit from another data class.

In [27]:
from dataclasses import dataclass, field

@dataclass
class Point:
    x: int = 0
    y: int = 0

    def reset(self):
        self.x = self.y = 0

@dataclass
class PointColor(Point):
    color: str = "red"


p1 = PointColor(2, 3, "green")
print(p1)
p1.reset()
p1

PointColor(x=2, y=3, color='green')


PointColor(x=0, y=0, color='green')
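A few consequences of inheritance can be sketched as follows. The fields of the base class come first, the generated `__eq__()` treats instances of different classes as unequal, and, because the inherited fields here have defaults, a subclass field without a default is rejected. The class `PointLabel` is hypothetical and only serves to trigger that error.

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int = 0
    y: int = 0

@dataclass
class PointColor(Point):
    color: str = "red"

print([f.name for f in fields(PointColor)])  # ['x', 'y', 'color']
print(isinstance(PointColor(), Point))       # True
print(PointColor(1, 2) == Point(1, 2))       # False: the classes differ

# A subclass field without a default cannot follow inherited defaults.
rejected = False
try:
    @dataclass
    class PointLabel(Point):
        label: str
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```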