## Understanding Data Classes 

Please refer to "src/objectOriented/variableScope/variableScope.ipynb" first if you are unsure about the difference between global (class) variables and local (instance) variables. Quick recap of global (class) and local (instance) variables:

> i. Class variables need to be initialized (assigned a value) in order to be accessed.     
> ii. Local variables with the same name take precedence over global variables.

### 1. The unique case of uninitialized global variables

In [2]:
class Book:
    name: str
    pages: int
    edition: int = 2010

#### 1.1 Directly accessing global variables

In [3]:
Book.name, Book.pages, Book.edition

AttributeError: type object 'Book' has no attribute 'name'

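**Observation:** the error above shows that a global (class) variable can only be accessed through the class if it has been initialized. A bare annotation such as `name: str` only records the type; it does not create the attribute.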
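A quick way to see this: annotations are collected in the class's `__annotations__` dictionary, while only assigned names become real class attributes. The sketch below redefines the same `Book` class and checks both with the built-in `hasattr()`.

```python
class Book:
    name: str            # annotation only: no attribute is created
    pages: int           # annotation only
    edition: int = 2010  # annotation plus a value: a real class attribute

# All three annotations are recorded...
print(Book.__annotations__)  # {'name': <class 'str'>, 'pages': <class 'int'>, 'edition': <class 'int'>}

# ...but only the assigned one exists as a class attribute.
print(hasattr(Book, "name"), hasattr(Book, "edition"))  # False True
```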

#### 1.2 Indirectly accessing global variables

Now let's try to access the global variables *indirectly*, through a class instance. We have two cases here:    
> i. Without initialization     
> ii. With initialization

Case I: Without initialization

In [4]:
bk = Book()
bk.name, bk.pages, bk.edition 

AttributeError: 'Book' object has no attribute 'name'

Case II: With initialization

In [5]:
bk = Book("animal farm", 200)
bk.name, bk.pages, bk.edition 

TypeError: Book() takes no arguments

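**Observation:** the errors above show that, even *indirectly* through an instance, we can't access a global variable that has not been initialized. Moreover, a plain class without an `__init__()` accepts no constructor arguments, so we cannot initialize `name` and `pages` when creating the instance either.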
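Two details are worth checking. First, attribute lookup on an instance falls back to the class, so `bk.edition` on its own does work (in Case I the error comes from `bk.name`, which is evaluated before `edition` is reached). Second, with a plain class the only way to "initialize" `name` and `pages` is to assign them by hand after construction, as in this sketch:

```python
class Book:
    name: str
    pages: int
    edition: int = 2010

bk = Book()
print(bk.edition)  # 2010, found on the class, not on the instance

# Assign the annotated attributes manually; they become instance attributes.
bk.name = "animal farm"
bk.pages = 200
print(bk.name, bk.pages, bk.edition)  # animal farm 200 2010

# Only the manually assigned attributes live in the instance's __dict__.
print(vars(bk))  # {'name': 'animal farm', 'pages': 200}
```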

### 2. Dataclass decorator

In [7]:
from dataclasses import dataclass

@dataclass
class Book:
    name: str
    pages: int
    edition: int = 2010

#### 2.1 Directly accessing global variables

In [8]:
Book.name, Book.pages, Book.edition

AttributeError: type object 'Book' has no attribute 'name'

#### 2.2 Indirectly accessing global variables

Now let's try to access the global variables *indirectly*, through a class instance. We have two cases here:    
> i. Without initialization     
> ii. With initialization

Case I: Without initialization

In [9]:
bk = Book()
bk.name, bk.pages, bk.edition 

TypeError: Book.__init__() missing 2 required positional arguments: 'name' and 'pages'

Case II: With initialization

In [10]:
bk = Book("animal farm", 200)
bk.name, bk.pages, bk.edition 

('animal farm', 200, 2010)

Note: in the cell above, the `__init__()` generated by the decorator stored the arguments as instance (local) variables of `bk`, and `edition` received its default value 2010.

Note the difference between the results of sections 1.2 and 2.2.      

**So what does decorating a class with `dataclass` do to produce these results?
Below is the equivalent (as far as `__init__()` is concerned) of decorating a class with `dataclass`.**

In [11]:
from dataclasses import dataclass

@dataclass
class Book:
    name: str
    pages: int
    edition: int = 2010

In [12]:
class Book:
    def __init__(self, name: str, pages: int, edition: int = 2010) -> None:
        self.name = name
        self.pages = pages
        self.edition = edition

**As far as construction is concerned, the two cells above are equivalent. Now we know what the `dataclass` decorator does for us:    
it automatically generates the relevant `__init__()` method from the annotated fields, so we don't have to write it explicitly. (By default it also generates `__repr__()` and `__eq__()`, which the hand-written class lacks.)**
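The hand-written class is only equivalent as far as `__init__()` goes. The sketch below compares the two side by side; `DataBook` and `PlainBook` are hypothetical names used only so both versions can coexist, and `inspect.signature()` is used to list the parameters of the generated `__init__()`.

```python
import inspect
from dataclasses import dataclass

@dataclass
class DataBook:
    name: str
    pages: int
    edition: int = 2010

class PlainBook:
    def __init__(self, name: str, pages: int, edition: int = 2010) -> None:
        self.name = name
        self.pages = pages
        self.edition = edition

# The generated __init__ takes the fields in declaration order.
print(list(inspect.signature(DataBook).parameters))  # ['name', 'pages', 'edition']

# Generated __repr__ vs. the default object repr.
print(DataBook("animal farm", 200))   # DataBook(name='animal farm', pages=200, edition=2010)
print(PlainBook("animal farm", 200))  # something like <__main__.PlainBook object at 0x...>

# Generated __eq__ compares field values; the plain class compares identity.
print(DataBook("animal farm", 200) == DataBook("animal farm", 200))    # True
print(PlainBook("animal farm", 200) == PlainBook("animal farm", 200))  # False
```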

#### 2.3 Parameterized dataclass decorator

In [13]:
from dataclasses import dataclass

@dataclass()
class Book:
    name: str
    pages: int
    edition: int = 2010

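The default parameters of the decorator are listed below (the list matches Python 3.11, which added `weakref_slot`):  
-   `init=True`: an `__init__()` method will be generated.
-   `repr=True`: a `__repr__()` method will be generated.
-   `eq=True`: an `__eq__()` method will be generated.
-   `order=False`: if true, `__lt__()`, `__le__()`, `__gt__()`, and `__ge__()` methods will be generated.
-   `unsafe_hash=False`: if true, a `__hash__()` method is forced to be generated; otherwise `__hash__()` is set according to `eq` and `frozen`.
-   `frozen=False`: if true, assigning to fields raises `dataclasses.FrozenInstanceError`.
-   `match_args=True`: a `__match_args__` tuple will be generated (used by `match` statements).
-   `kw_only=False`: if true, all fields will be keyword-only.
-   `slots=False`: if true, a `__slots__` attribute will be generated (**exceptionally: a new class will be returned by the decorator**).
-   `weakref_slot=False`: if true, a `__weakref__` slot will be added (requires `slots=True`).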
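To see two of the non-default options in action, here is a sketch with `order=True` (instances compare like tuples of their fields, in declaration order) and `frozen=True` (fields cannot be reassigned). The `Edition` class is a hypothetical example.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(order=True, frozen=True)
class Edition:
    year: int
    title: str

editions = [Edition(2010, "animal farm"), Edition(1945, "animal farm")]

# order=True: instances compare like the tuples (year, title).
print(sorted(editions)[0])  # Edition(year=1945, title='animal farm')

# frozen=True: assigning to a field raises FrozenInstanceError.
try:
    editions[0].year = 2020
except FrozenInstanceError as err:
    print("frozen:", err)

# With eq=True and frozen=True, a __hash__() is generated too.
print(hash(Edition(1945, "animal farm")) == hash(Edition(1945, "animal farm")))  # True
```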

In [14]:
from dataclasses import dataclass

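# Every default spelled out (match_args, kw_only, slots need Python 3.10+; weakref_slot needs 3.11+)
@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False,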
           match_args=True, kw_only=False, slots=False, weakref_slot=False)
class Book:
    name: str
    pages: int
    edition: int = 2010

We now know that the `dataclass` decorator creates dunder methods for us, but what happens if we define one ourselves?

In [15]:
from dataclasses import dataclass

@dataclass
class Book:
    name: str
    pages: int
    edition: int = 2010
    
    def __init__(self, name, pages = 10):
        self.name = name
        self.pages = pages

In [16]:
bk = Book("animal farm")

In [17]:
bk.name, bk.pages, bk.edition

('animal farm', 10, 2010)

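Note: if the class defines its own `__init__()`, the decorator keeps it and does not generate one. Here `pages` defaults to 10 (the default of our `__init__()`), and since our `__init__()` never sets `edition`, `bk.edition` is not an instance variable at all: the lookup falls back to the class variable `Book.edition`, as the next cell shows.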

In [18]:
Book.edition

2010
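One way to confirm that `edition` is not an instance attribute here is to look at the instance's `__dict__` through `vars()`. This sketch repeats the cells above so it runs on its own.

```python
from dataclasses import dataclass

@dataclass
class Book:
    name: str
    pages: int
    edition: int = 2010

    # A hand-written __init__: the decorator keeps it instead of generating one.
    def __init__(self, name, pages=10):
        self.name = name
        self.pages = pages

bk = Book("animal farm")

# Only the attributes our __init__ assigned live on the instance.
print(vars(bk))  # {'name': 'animal farm', 'pages': 10}

# edition is looked up on the class instead.
print("edition" in vars(bk), bk.edition)  # False 2010

# The generated __repr__ still lists every field, reading edition via the class.
print(bk)  # Book(name='animal farm', pages=10, edition=2010)
```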

### 3. Dataclass features

#### 3.1 field 

We know that global (class) variables require initialization (a default value) in order to become class attributes. Nevertheless,   
a dataclass field can get its default value from a `default_factory`: a zero-argument callable (here `list`) that is called to create a fresh default for every new instance. Please see the example below:

In [19]:
import dataclasses
from dataclasses import dataclass

@dataclass
class Library:  
    yearsOld: int
    collection: list[int] = dataclasses.field(default_factory=list)

In [20]:
lib = Library(10)
lib

Library(yearsOld=10, collection=[])

Note: now we can create an instance of the dataclass without passing the `collection` argument. `field()` accepts other options as well, e.g. `kw_only=True` (Python 3.10+) to make a single field keyword-only.

However, **field** matters most when the default is a mutable data structure: a plain default value would be a single object shared by all instances, which is why the decorator raises a `ValueError` for a `list`, `dict`, or `set` default. To further understand the significance of `dataclasses.field` for default values, please see the "Mutable default values" section of:   
https://docs.python.org/3/library/dataclasses.html
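The sketch below shows both halves of that point: a plain list default is rejected as soon as the decorator runs, while `default_factory` gives every instance its own list. `Shelf` is a hypothetical class name.

```python
from dataclasses import dataclass, field

# A plain list default is rejected when the decorator processes the class.
rejected = False
try:
    @dataclass
    class Shelf:
        books: list = []
except ValueError as err:
    rejected = True
    print("rejected:", err)

@dataclass
class Library:
    yearsOld: int
    collection: list[int] = field(default_factory=list)

lib1 = Library(10)
lib2 = Library(20)
lib1.collection.append(1)

# Each instance got a fresh list from default_factory, so lib2 is unaffected.
print(lib1.collection, lib2.collection)  # [1] []
```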

#### 3.2 asdict

**Note**: this feature is one of the major reasons to use dataclasses. `asdict()` returns all fields and their values as a dictionary, which is harder to do with a standard class once objects are nested (`vars()` returns only the instance's own `__dict__` and does not recurse).  
The returned dictionary can be understood by analogy with JAX pytrees.

In [21]:
from dataclasses import asdict, dataclass

@dataclass
class Book:
    name: str
    pages: int

In [22]:
bk = Book("animal farm", 200)
bk

Book(name='animal farm', pages=200)

In [23]:
asdict(bk)

{'name': 'animal farm', 'pages': 200}

In [25]:
assert asdict(bk) == {'name': 'animal farm', 'pages': 200}
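For comparison with a standard class: on a flat object, `vars()` gives the same mapping, but it is the instance's live `__dict__`, whereas `asdict()` builds a new dictionary. A minimal sketch, with `PlainBook` as a hypothetical standard-class counterpart of `Book`:

```python
from dataclasses import dataclass, asdict

class PlainBook:
    def __init__(self, name, pages):
        self.name = name
        self.pages = pages

@dataclass
class Book:
    name: str
    pages: int

plain = PlainBook("animal farm", 200)
data = Book("animal farm", 200)

# Both give the same flat mapping for a simple object...
print(vars(plain) == asdict(data))  # True

# ...but vars() is the instance's own __dict__, while asdict() returns a new dict.
print(vars(plain) is plain.__dict__)  # True
d = asdict(data)
d["pages"] = 0
print(data.pages)  # 200, the instance is unchanged
```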

*In addition, `asdict()` is not limited to the current instance: it recursively converts nested dataclass instances (also inside lists, tuples, and dicts) into attribute-value pairs, much like a JAX pytree!*

In [38]:
@dataclass
class Collection:
    books: list[Book] 
    idNr: int = 10

In [39]:
bk1 = Book("animal farm", 200)
bk2 = Book("crime and punishment", 1000)

In [40]:
myCollection = Collection([bk1, bk2])
myCollection

Collection(books=[Book(name='animal farm', pages=200), Book(name='crime and punishment', pages=1000)], idNr=10)

In [41]:
asdict(myCollection)

{'books': [{'name': 'animal farm', 'pages': 200},
  {'name': 'crime and punishment', 'pages': 1000}],
 'idNr': 10}

*Notice how `asdict()` converts the attribute-value pairs of all nested dataclass instances into a nested dictionary of key-value pairs.*

#### 3.3 astuple

This function behaves like `asdict()`, except that it returns only the values (as a tuple), not the field names. We reuse the objects from section 3.2 for the demonstration below:

In [42]:
from dataclasses import astuple

In [43]:
bk

Book(name='animal farm', pages=200)

In [44]:
astuple(bk)

('animal farm', 200)

In [45]:
myCollection

Collection(books=[Book(name='animal farm', pages=200), Book(name='crime and punishment', pages=1000)], idNr=10)

In [46]:
astuple(myCollection)

([('animal farm', 200), ('crime and punishment', 1000)], 10)
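Both `asdict()` and `astuple()` follow the field order reported by `dataclasses.fields()`, which returns one `Field` object per field. The sketch below iterates it to pair each name with its value; the `Book` class mirrors the one from section 3.2.

```python
from dataclasses import dataclass, fields, astuple

@dataclass
class Book:
    name: str
    pages: int

bk = Book("animal farm", 200)

# fields() yields Field objects in declaration order.
names = [f.name for f in fields(bk)]
print(names)  # ['name', 'pages']

# Zipping the names with astuple() rebuilds the pairs asdict() would give
# for this flat case (asdict() additionally recurses into nested values).
print(dict(zip(names, astuple(bk))))  # {'name': 'animal farm', 'pages': 200}
```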

**Other features of dataclasses worth reading about:**
-   `dataclasses.replace()`
-   `dataclasses.is_dataclass()`
-   **`dataclasses.KW_ONLY`**
-   `typing.ClassVar`
-   `dataclasses.InitVar`

#### 3.4 `__post_init__`

This dunder method, if defined, is called by the `__init__()` method that the dataclass decorator generates. Its use cases are:    
> i. Derive new variables/attributes from the basic ones.   
> ii. Call the `__init__()` of a parent class (if the parent class is not a dataclass).

##### 3.4.1 Derive fields

In [60]:
from dataclasses import dataclass, field

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b

color = C(2, 3)
color

In [65]:
from dataclasses import dataclass, field

@dataclass
class C:
    a: float
    b: float
    c: int = 10 

    def __post_init__(self):
        self.c = self.a + self.b

color = C(2, 3)
color

C(a=2, b=3, c=5)

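**CAUTION**: we need to declare the attribute as a field even if it is derived/overwritten later, ideally with `field(init=False)` as in the first cell; with a plain default such as `c: int = 10`, `c` also becomes an optional `__init__()` argument whose value is then overwritten.     
Below is what happens if we don't declare it before `__post_init__()`: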

In [64]:
from dataclasses import dataclass, field

@dataclass
class C:
    a: float
    b: float

    def __post_init__(self):
        self.c = self.a + self.b

color = C(2, 3)
color

C(a=2, b=3)

**Notice**: `c` is still created as an ordinary instance attribute (`color.c` is `5`), but because it was not declared as a field, the generated `__repr__()` leaves it out, and so do `__eq__()` and `asdict()`.
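A quick check of that claim, repeating the last cell: the attribute exists, but the dataclass machinery does not know about it. For contrast, `CField` (a hypothetical name) declares `c` with `field(init=False)` as in the first cell of this section, so it is a proper field that is still not an `__init__()` argument.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class C:
    a: float
    b: float

    def __post_init__(self):
        self.c = self.a + self.b  # plain attribute, not a field

color = C(2, 3)
print(color.c)        # 5
print(asdict(color))  # {'a': 2, 'b': 3}

@dataclass
class CField:
    a: float
    b: float
    c: float = field(init=False)  # a field, but excluded from __init__

    def __post_init__(self):
        self.c = self.a + self.b

print(CField(2, 3))          # CField(a=2, b=3, c=5)
print(asdict(CField(2, 3)))  # {'a': 2, 'b': 3, 'c': 5}
```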

##### 3.4.2 Call the parent `__init__()`

In [68]:
from dataclasses import dataclass

class Rectangle:
    def __init__(self, height, width):
        self.height = height
        self.width = width

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)

The `__init__()` method generated by `@dataclass` does not call base class `__init__()` methods. If the base class has an `__init__()` method that has to be called, it is common to call this method in a `__post_init__()` method. Note, however, that in general the dataclass-generated `__init__()` methods don't need to be called, since the derived dataclass will take care of initializing all fields of any base class that is a dataclass itself.
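A short usage check of the `Rectangle`/`Square` cell above: the generated `__init__()` stores `side`, then `__post_init__()` forwards it to `Rectangle.__init__()`. The sketch repeats both classes so it runs on its own.

```python
from dataclasses import dataclass

class Rectangle:
    def __init__(self, height, width):
        self.height = height
        self.width = width

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        # Rectangle is not a dataclass, so its __init__ must be called explicitly.
        super().__init__(self.side, self.side)

sq = Square(3)
print(sq)                         # Square(side=3)
print(sq.height, sq.width)        # 3 3
print(isinstance(sq, Rectangle))  # True
```

Only `side` shows up in the repr, since `height` and `width` come from a non-dataclass base and are not fields.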

In [69]:
# A mutable class variable is one list object shared by every instance (compare section 3.1)
class C:
    x = []
    def add(self, element):
        self.x.append(element)

o1 = C()
o2 = C()
o1.add(1)
o2.add(2)
assert o1.x == [1, 2]
assert o1.x is o2.x