# Object-Oriented Programming and object creation

:hourglass: 3h

**Outline**:
1. OOP
2. Data classes
3. Best practices
4. Object creation
5. Closing words

## 1. OOP



### The OOP paradigm

:hourglass: 20 min

> A (programming) **paradigm** is a way to think about, approach and solve a problem. It defines the (conceptual) primitives in which to think in order to create the solution.

There are several broad families of paradigms:
- Imperative: dictates how the *state* evolves
    * Procedural: the primitives are procedures (~functions);
    * OOP: the primitives are objects exchanging messages;
- Declarative: expresses the relationship between primitives

The example below illustrates the difference between the procedural and OOP paradigms.

In [2]:
# Procedural

def create_car(position=0, speed=0):
    return {"position": position, "speed": speed}

def accelerate_car(car):
    car["speed"] += 1

def decelerate_car(car):
    car["speed"] -= 1

def move_car(car):
    car["position"] += car["speed"]

car = create_car(speed=1)
accelerate_car(car)
move_car(car)
print("Car position:", car["position"])

Car position: 2


In [5]:
# OOP

class Car:
    def __init__(self, position=0, speed=0) -> None:
        self._position = position
        self._speed = speed

    def accelerate(self):
        self._speed += 1

    def decelerate(self):
        self._speed -= 1

    def move(self):
        self._position += self._speed

    def print_position(self):
        print("Car position:", self._position)

car = Car(speed=1)
car.accelerate()
car.move()
car.print_position()


Car position: 2


At first glance, the two approaches look very similar. However, the differences are fundamental!

The `Car` class **encapsulates** both the data (i.e. `position` and `speed`) and the behavior (`accelerate`, `decelerate`, `move` and `print_position`). This has a few advantages:
- *maintainability*: the behavior of the car sits with its data, so you only need to edit the code in one place;
- *abstraction*: the user does not need to know (and should not tamper with) the details: they only send messages via the methods;
- *conceptualization*: the notion of objects makes it easy to think in business terms;
- *inheritance*: the way the code is written makes it easy to implement inheritance in a structured way.


Let's see an example of inheritance:

In [10]:
# Inheritance

class LimitedCar(Car):
    def __init__(self, max_speed, position=0, speed=0) -> None:
        super().__init__(position, speed)
        self._max_speed = max_speed

    def accelerate(self):
        if self._speed < self._max_speed:
            return super().accelerate()
        else:
            print("Max speed reached")
    

car = LimitedCar(max_speed=4, speed=3)
car.accelerate()
car.accelerate()

Max speed reached


### Cognitive map of OOP

:hourglass: 30 min

What is the difference between the following pairs of concepts:
- class and object
- object and instance
- class attribute and attribute
- attribute and property
- interface/protocol and abstract class
- method and function
- class method and method
- public method/attribute and private method/attribute
- private method/attribute and protected method/attribute
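For reference once the pairs have been discussed, here is a minimal sketch putting several of them side by side. `Shape`, `HasArea`, `Square` and `total_area` are hypothetical names chosen for illustration. Note that Python only enforces "privacy" through name mangling (`__secret` becomes `_Square__secret`), while the single leading underscore ("protected") is a pure convention.

```python
from abc import ABC, abstractmethod
from typing import Protocol


class Shape(ABC):
    """Abstract class: cannot be instantiated; subclasses must implement `area`."""

    @abstractmethod
    def area(self) -> float: ...


class HasArea(Protocol):
    """Protocol: any object with an `area()` method matches, no inheritance needed."""

    def area(self) -> float: ...


class Square(Shape):
    n_sides = 4  # class attribute: shared by all instances

    def __init__(self, side: float) -> None:
        self._side = side  # "protected" instance attribute (convention only)
        self.__secret = "hidden"  # "private": name-mangled to `_Square__secret`

    @property
    def side(self) -> float:
        """Property: read like an attribute, computed by a method (read-only here)."""
        return self._side

    def area(self) -> float:  # method: a function bound to an instance
        return self._side ** 2

    @classmethod
    def unit(cls) -> "Square":  # class method: receives the class, not an instance
        return cls(1.0)


def total_area(shapes: list[HasArea]) -> float:  # plain function, outside any class
    return sum(s.area() for s in shapes)


sq = Square(3.0)  # `sq` is an object: an instance of the class `Square`
print(Square.n_sides, sq.n_sides)  # 4 4
print(sq.side, sq.area())  # 3.0 9.0
print(total_area([sq, Square.unit()]))  # 10.0
# Shape() would raise TypeError: abstract classes cannot be instantiated
```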

### The SOLID principles  :skull: 

People generally think of the OOP paradigm along the lines of the SOLID acronym:
- **S**ingle-responsibility principle
- **O**pen-closed principle
- **L**iskov substitution principle
- **I**nterface segregation principle
- **D**ependency inversion principle

The general idea is that a class should have a single, clear purpose, which its subclasses share, and that its implementation details should not matter to its users.

See https://en.wikipedia.org/wiki/SOLID for more.
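As an illustration, here is a minimal sketch of a few of these principles in Python, using hypothetical `Notifier`, `EmailNotifier`, `SmsNotifier` and `OrderService` classes. `OrderService` has a single responsibility, depends on an abstraction (a `Protocol`) rather than on a concrete class (dependency inversion), accepts new notifiers without being modified (open-closed), and works with any notifier that honors the abstraction (Liskov substitution).

```python
from typing import Protocol


class Notifier(Protocol):
    """The abstraction that high-level code depends on (dependency inversion)."""

    def send(self, message: str) -> None: ...


class EmailNotifier:
    def __init__(self) -> None:
        self.sent: list[str] = []  # stands in for actually sending e-mails

    def send(self, message: str) -> None:
        self.sent.append(f"email: {message}")


class SmsNotifier:
    def __init__(self) -> None:
        self.sent: list[str] = []  # stands in for actually sending text messages

    def send(self, message: str) -> None:
        self.sent.append(f"sms: {message}")


class OrderService:
    """Single responsibility: handle orders; notifying is delegated."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier  # injected, not created here

    def place_order(self, item: str) -> None:
        # (order handling logic would go here)
        self._notifier.send(f"order placed: {item}")


# Any Notifier can be substituted for another without touching OrderService
email, sms = EmailNotifier(), SmsNotifier()
OrderService(email).place_order("book")
OrderService(sms).place_order("pen")
print(email.sent)  # ['email: order placed: book']
print(sms.sent)  # ['sms: order placed: pen']
```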

## 2. Data classes

:hourglass: 10 min break + 20 min

### Usage 
When writing code, some classes will naturally tend to have lots of methods and do heavy computations. Sometimes, however, you just need a convenient way to store data, possibly with a couple of methods. For that, data classes are a great fit, as they remove most of the boilerplate:

In [19]:
from dataclasses import dataclass  # import the dataclass decorator

import datetime as dt
from typing import Optional

@dataclass  # annotate the class as being a dataclass
class Person:
    first_name: str
    last_name: str
    birth_date: dt.date
    likes_python: bool = True

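    def get_age(self, at_date: Optional[dt.date] = None) -> int:
        if at_date is None:
            at_date = dt.date.today()

        age = at_date.year - self.birth_date.year
        if (at_date.month, at_date.day) < (self.birth_date.month, self.birth_date.day):
            age -= 1  # the birthday has not come yet in `at_date`'s year
        return age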


p = Person("Guido", "van Rossum", dt.date(1956, 1, 31))
print(f"Age: {p.get_age()} ({p})")

Age: 68 (Person(first_name='Guido', last_name='van Rossum', birth_date=datetime.date(1956, 1, 31), likes_python=True))


The `dataclass` decorator generates the `__init__` method (as well as `__repr__`, `__eq__` and a few other things). You only need to declare and annotate the attributes the *instance* will have within the *class* body. Note how the `likes_python` default value was passed to the instance.

There are a few ways to customize the dataclass:
- you can have complex (i.e. more complex than a default value) initialization; see `dataclasses.field` and the `__post_init__` method;
- you can customize whether the instances are mutable, comparable, representable and hashable; see the full documentation at https://docs.python.org/3.9/library/dataclasses.html
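A short sketch of both customization routes, with hypothetical `Version` and `Team` classes. It relies on `field(default_factory=...)`, `field(init=False)`, `__post_init__`, and the `frozen` and `order` arguments of the decorator, all of which are part of the standard `dataclasses` module.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True, order=True)  # immutable (hence hashable) + <, <=, >, >= operators
class Version:
    major: int
    minor: int = 0


@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)  # a fresh list for each instance
    slug: str = field(init=False)  # computed, so not an __init__ parameter

    def __post_init__(self) -> None:
        # Runs right after the generated __init__
        self.slug = self.name.lower().replace(" ", "-")


print(Version(1, 2) < Version(2, 0))  # True
print(Team("Data Science"))  # Team(name='Data Science', members=[], slug='data-science')

try:
    Version(1).major = 3  # frozen: assignment is refused
except FrozenInstanceError:
    print("Version is frozen")
```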


### Dataclass and NamedTuple

Named tuples are an alternative to dataclasses and can be used in essentially the same manner:

In [20]:
from typing import NamedTuple

import datetime as dt
from typing import Optional

class Person(NamedTuple):  # inherit from NamedTuple
    first_name: str
    last_name: str
    birth_date: dt.date
    likes_python: bool = True

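    def get_age(self, at_date: Optional[dt.date] = None) -> int:  # Defining behavior on NamedTuple is discouraged
        if at_date is None:
            at_date = dt.date.today()

        age = at_date.year - self.birth_date.year
        if (at_date.month, at_date.day) < (self.birth_date.month, self.birth_date.day):
            age -= 1  # the birthday has not come yet in `at_date`'s year
        return age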


p = Person("Guido", "van Rossum", dt.date(1956, 1, 31))
print(f"Age: {p.get_age()} ({p})")

Age: 68 (Person(first_name='Guido', last_name='van Rossum', birth_date=datetime.date(1956, 1, 31), likes_python=True))


The main differences between the two can be summarized as follows:

| Property     | Dataclass                                | NamedTuple |
|--------------|------------------------------------------|------------|
| Mutable      | Yes (but can be restricted)              | No         |
| Customizable | Yes (repr, hash, mutability, comparison) | No         |
| Unpackable   | No                                       | Yes        |
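The rows of the table can be checked with a small sketch (`PointNT` and `PointDC` are hypothetical names). `_replace` and `dataclasses.astuple` are the standard helpers on the NamedTuple and dataclass sides respectively.

```python
from dataclasses import astuple, dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int


@dataclass
class PointDC:
    x: int
    y: int


nt = PointNT(1, 2)
dc = PointDC(1, 2)

# Mutable: a dataclass can be modified in place...
dc.x = 10
# ...a NamedTuple cannot, so build a modified copy instead
try:
    nt.x = 10
except AttributeError:
    print("NamedTuple is immutable")
nt_moved = nt._replace(x=10)

# Unpackable: a NamedTuple is a tuple...
x, y = nt
# ...a dataclass is not iterable (`x, y = dc` raises TypeError) but can be converted
x_dc, y_dc = astuple(dc)

# Gotcha: being a tuple, a NamedTuple compares equal to a plain tuple
print(nt == (1, 2))  # True
```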

As a rule of thumb:
- if you would have used a tuple but naming the fields makes it easier to manipulate, go for a NamedTuple; for instance, when returning several values from a function, or when creating a DataFrame;
- if you have many fields and some logic, go for a Dataclass;
- if you deal with inheritance, go for a Dataclass;
- exercise judgment for the gray areas in between.
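A sketch of the first and third rules, with hypothetical `Summary`, `summarize`, `Vehicle` and `Truck` names. As a side note, pandas can build a DataFrame directly from a list of named tuples and uses the field names as column names.

```python
import statistics
from dataclasses import dataclass
from typing import NamedTuple


class Summary(NamedTuple):
    """Several values returned by a function: a NamedTuple fits well."""

    mean: float
    stdev: float


def summarize(values: list[float]) -> Summary:
    return Summary(statistics.mean(values), statistics.stdev(values))


@dataclass
class Vehicle:
    """Some logic and inheritance: a dataclass fits better."""

    wheels: int

    def describe(self) -> str:
        return f"{type(self).__name__} with {self.wheels} wheels"


@dataclass
class Truck(Vehicle):
    payload_tons: float = 0.0  # new field appended after the inherited ones


mean, stdev = summarize([1.0, 2.0, 3.0])  # unpack like a tuple...
print(summarize([1.0, 2.0, 3.0]).mean)  # 2.0  (...or access by name)
print(Truck(6, payload_tons=10.0).describe())  # Truck with 6 wheels
```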

> There is an alternative syntax which does not need to inherit from `NamedTuple`: https://docs.python.org/3.9/library/collections.html#collections.namedtuple (it was the original syntax, although I personally feel it is a bit unwieldy).
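For comparison, here is the same kind of `Person` written with `collections.namedtuple` (the `defaults` argument exists since Python 3.7). Note that this syntax carries no type annotations.

```python
from collections import namedtuple

# Type name, then field names; `defaults` applies to the rightmost fields
Person = namedtuple("Person", ["first_name", "last_name", "likes_python"], defaults=[True])

p = Person("Guido", "van Rossum")
print(p)  # Person(first_name='Guido', last_name='van Rossum', likes_python=True)
```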

## 3. Best practices

:hourglass: 20 min

When writing classes, there are a few principles that are worth following:
- [ ] stick to Python conventions (e.g. naming case, protected/private attributes, action/actor names);
- [ ] give clear and descriptive names (*);
- [ ] make attributes and methods protected by default;
- [ ] provide an evaluable `repr` if possible (i.e. `eval(repr(obj))` rebuilds an equivalent object);
- [ ] inheritance is a great power, blabla responsibility :spider: (use it wisely);
- [ ] consider returning `self` to chain calls;
- [ ] type (production) code: well-typed code and explicit variable names will drastically cut down the what-the-f*ck factor;
- [ ] **never** use a mutable object as a default argument value.


> (*) Concise is best, but long is better than fuzzy (tip: remember the single-responsibility principle). A good name saves you from writing three lines of documentation.
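The last item deserves an illustration, since the bug is silent: default values are evaluated once, when the function is defined, so a mutable default is shared by every call that relies on it. A minimal sketch with hypothetical `add_item_*` functions; the usual fix is a `None` default.

```python
from typing import Optional


def add_item_bad(item: str, items: list[str] = []) -> list[str]:
    # The default list is created once and reused across calls
    items.append(item)
    return items


def add_item_good(item: str, items: Optional[list[str]] = None) -> list[str]:
    # A fresh list is created on each call that does not pass one
    if items is None:
        items = []
    items.append(item)
    return items


print(add_item_bad("a"))  # ['a']
print(add_item_bad("b"))  # ['a', 'b']  <- state leaked from the first call
print(add_item_good("a"))  # ['a']
print(add_item_good("b"))  # ['b']
```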

Here is an example of typing and giving a good repr:

In [14]:
from __future__ import annotations

from typing import TypeVar

TCar = TypeVar("TCar", bound="Car")


class Car:
    def __init__(self, position: int = 0) -> None:
        self._position = position
        self._speed: int = 0

    def set_speed(self: TCar, speed: int) -> TCar:
        self._speed = speed
        return self

    def accelerate(self) -> None:
        self._speed += 1
    
    def decelerate(self) -> None:
        self._speed -= 1
    
    def move(self) -> None:
        self._position += self._speed

    
    def __repr__(self) -> str:
        r = f"{self.__class__.__qualname__}(position={self._position!r})"
        if self._speed != 0:
            r = f"{r}.set_speed({self._speed!r})"
        return r

    def __str__(self) -> str:
        return f"{self!r} @ __str__"

    
class LimitedCar(Car):
    def __init__(self, max_speed: int, position: int = 0) -> None:
        super().__init__(position)
        self._max_speed = max_speed

    def accelerate(self) -> None:
        if self._speed < self._max_speed:
            return super().accelerate()
        
    def __repr__(self) -> str:
        r = (
            f"{self.__class__.__qualname__}"
            f"("
            f"max_speed={self._max_speed!r}"
            f", "
            f"position={self._position!r}"
            f")"
        )
        if self._speed != 0:
            r = f"{r}.set_speed({self._speed!r})"
        return r
    

car = LimitedCar(10).set_speed(8)
car.accelerate()
print(repr(car))
print(car)

LimitedCar(max_speed=10, position=0).set_speed(9)
LimitedCar(max_speed=10, position=0).set_speed(9) @ __str__


> Since Python 3.11, the `typing` module provides a `Self` type to explicitly state that the instance is returned (especially useful with subclasses, as it avoids creating the bound `TypeVar`).

## 4. Object creation

Design patterns are reusable recipes to efficiently/elegantly solve recurring problems. A major challenge in OOP is creating the right object, as is evident from the number of *creational design patterns*: abstract factory, builder, factory method, prototype, singleton, etc.

There are a couple of reasons for this:
- the exact object needed is not known in advance (e.g. it depends on user input in a web interface);
- some parts of the object specification depend on context (e.g. how to handle NA and which quality checks to perform is clearer when you know you are handling time series);
- some steps conceptually belong to the creation process but cannot (or should not) be handled in the constructor.

In any case, this section is about dealing with object creation.

### (class) factory method

:hourglass: 20 min

A factory method is just a method that returns a new instance. Unless there is a specific reason to have it outside of the class, it usually comes in the form of a `classmethod`.

> `classmethod`s have two main use cases: factories, and holding code that needs to be encapsulated with the class but does not depend on the instance. Arguably, the latter is also the realm of `staticmethod` (if there is no dependency on the class attributes). `staticmethod` tends to be disregarded; agree with your team on how you want to approach those elements.
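A minimal sketch contrasting these cases with a hypothetical `Temperature` class: a `classmethod` factory, a `classmethod` that only needs class attributes, and a `staticmethod` that needs neither the class nor an instance.

```python
class Temperature:
    unit = "°C"  # class attribute

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit: float) -> "Temperature":
        # Factory: uses `cls`, hence builds the subclass when called on one
        return cls(cls.fahrenheit_to_celsius(fahrenheit))

    @classmethod
    def describe_unit(cls) -> str:
        # Needs the class (its attributes) but no instance
        return f"{cls.__name__} values are stored in {cls.unit}"

    @staticmethod
    def fahrenheit_to_celsius(fahrenheit: float) -> float:
        # Needs neither the class nor an instance: a plain function kept with the class
        return (fahrenheit - 32) * 5 / 9


print(Temperature.from_fahrenheit(212).celsius)  # 100.0
print(Temperature.describe_unit())  # Temperature values are stored in °C
```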

Here is an example of a factory method:

In [23]:
from __future__ import annotations

from typing import Optional, TypeVar

TCar = TypeVar("TCar", bound="Car")


class Car:
    @classmethod
    def create_moving_car(
        cls,
        position: int = 0,
        speed: int = 0,
        max_speed: Optional[int] = None,
    ) -> Car:
        # The caller only deals with `Car`, whichever subclass is actually returned
        if max_speed is None:
            obj = cls(position)
        else:
            obj = _LimitedCar(max_speed, position)
        return obj.set_speed(speed)
        
    def __init__(self, position: int = 0) -> None:
        self._position = position
        self._speed: int = 0

    def set_speed(self: TCar, speed: int) -> TCar:
        self._speed = speed
        return self

    def accelerate(self) -> None:
        self._speed += 1
    
    def decelerate(self) -> None:
        self._speed -= 1
    
    def move(self) -> None:
        self._position += self._speed

    def __repr__(self) -> str:
        r = f"{self.__class__.__qualname__}(position={self._position!r})"
        if self._speed != 0:
            r = f"{r}.set_speed({self._speed!r})"
        return r
    
    
class _LimitedCar(Car):
    def __init__(self, max_speed: int, position: int = 0) -> None:
        super().__init__(position)
        self._max_speed = max_speed

    def accelerate(self) -> None:
        if self._speed < self._max_speed:
            return super().accelerate()
        
    def __repr__(self) -> str:
        r = (
            f"{self.__class__.__qualname__}"
            f"("
            f"max_speed={self._max_speed!r}"
            f", "
            f"position={self._position!r}"
            f")"
        )
        if self._speed != 0:
            r = f"{r}.set_speed({self._speed!r})"
        return r
    

print(Car.create_moving_car(2, 5))
print(Car.create_moving_car(2, 5, max_speed=10))

Car(position=2).set_speed(5)
_LimitedCar(max_speed=10, position=2).set_speed(5)


**DOs**:
- use a clear name (usually an action verb) to indicate it is a factory (e.g. `create`, `cons`) and be as specific as you can in the name;
    * if you are creating an instance from another (type of) object, e.g. a string, you can name the factory `from_string`;
- use factories when:
    * the logic is small and not too flexible;
    * you want an evaluable `repr` but the way the user will create the object is not compatible with it;
    * you want to offer a small set of inflexible alternatives to create the object;
    * you want to expose only one class but allow for subclasses.

**DON'Ts**:
- create a factory if a classical constructor would do;
- create a factory for flexibility and complex logic (prefer the builder).

> :skull: it is possible to make the type returned by the factory depend on the actual class on which the factory is called, although mixing inheritance and factories is tricky.
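To illustrate the :skull: note, here is a sketch of a factory relying on `cls` (the `from_string`, `ElectricCar` and `LimitedCar` names are hypothetical). It works as long as subclasses keep the constructor signature, and silently misbehaves when they do not.

```python
from __future__ import annotations

from typing import TypeVar

TCar = TypeVar("TCar", bound="Car")


class Car:
    def __init__(self, position: int = 0) -> None:
        self._position = position

    @classmethod
    def from_string(cls: type[TCar], text: str) -> TCar:
        # `cls` is the class the factory is called on, not necessarily `Car`
        return cls(int(text))

    def __repr__(self) -> str:
        return f"{self.__class__.__qualname__}(position={self._position!r})"


class ElectricCar(Car):
    pass  # same constructor signature: the inherited factory just works


class LimitedCar(Car):
    def __init__(self, max_speed: int, position: int = 0) -> None:
        super().__init__(position)
        self._max_speed = max_speed


print(ElectricCar.from_string("3"))  # ElectricCar(position=3)
# Different signature: 3 silently ends up as `max_speed`, `position` stays 0
print(LimitedCar.from_string("3"))  # LimitedCar(position=0)
```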

### Builder

:hourglass: 30 min

The builder pattern is a very flexible approach to creating complex objects. It consists of having an object (the builder) which stores all the information needed to instantiate the target class. The information is filled in one piece at a time, and when everything is set, the target class can be instantiated.

The builder pattern can be used in three ways:
1. Stand-alone builder: a builder class is used as a way to create an object lazily.
2. A single builder is managed by several directors (each responsible for a subpart of the object).
3. A single director is responsible for several builders (related to different objects) to create a coherent whole.
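The ETL example below only covers the first usage. As a rough sketch of the second one, here is a hypothetical `ReportBuilder` fed by two hypothetical director functions, each responsible for its own part of the same `Report`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Report:
    title: str
    sections: list[str]
    footer: str


class ReportBuilder:
    """Accumulates the pieces of a Report, then builds it."""

    def __init__(self) -> None:
        self._title = "Untitled"
        self._sections: list[str] = []
        self._footer = ""

    def set_title(self, title: str) -> ReportBuilder:
        self._title = title
        return self

    def add_section(self, text: str) -> ReportBuilder:
        self._sections.append(text)
        return self

    def set_footer(self, footer: str) -> ReportBuilder:
        self._footer = footer
        return self

    def build(self) -> Report:
        return Report(self._title, list(self._sections), self._footer)


# Director 1: responsible for the summary part
def add_summary(builder: ReportBuilder, figures: dict[str, float]) -> None:
    builder.set_title("Monthly report")
    builder.add_section(f"{len(figures)} figures collected")


# Director 2: responsible for the detailed part
def add_details(builder: ReportBuilder, figures: dict[str, float]) -> None:
    for name, value in sorted(figures.items()):
        builder.add_section(f"{name}: {value}")
    builder.set_footer("Generated automatically")


builder = ReportBuilder()
figures = {"sales": 120.0, "costs": 80.0}
add_summary(builder, figures)
add_details(builder, figures)
report = builder.build()
print(report.sections)  # ['2 figures collected', 'costs: 80.0', 'sales: 120.0']
```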

Here is what a builder can look like in the context of an ETL pipeline:

In [3]:
from __future__ import annotations

import logging
from typing import Optional

import pandas as pd

class Extractor:
    def extract(self) -> pd.DataFrame:
        raise NotImplementedError()
    
class Transformer:
    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError()

class Loader:
    def write(self, data: pd.DataFrame) -> None:
        raise NotImplementedError()


class ETL:
    def __init__(
        self,
        extractor: Extractor,
        transformer: Transformer,
        loader: Loader,
    ) -> None:
        self._extractor = extractor
        self._transformer = transformer
        self._loader = loader

    def run(self) -> None:
        logging.info(f"Reading data with {self._extractor!r}")
        data = self._extractor.extract()
        logging.info(f"Transforming data with {self._transformer!r}")
        data = self._transformer.transform(data)
        logging.info(f"Writing data with {self._loader!r}")
        self._loader.write(data)


class FromDB(Extractor):
    def __init__(self, database: str, table: str) -> None:
        super().__init__()
        self._database = database
        self._table = table

    def __repr__(self) -> str:
        return (
            f"{self.__class__.__qualname__}"
            f"(database={self._database!r}, table={self._table!r})"
        )

    def extract(self) -> pd.DataFrame:
        logging.info(f"Reading from table '{self._table}' in database '{self._database}'")
        # Mock data
        return pd.DataFrame(
            data={
                "first_name": ["Bruce", "Clark", "Peter"],
                "last_name": ["Wayne", "Kent", "Parker"],
                "super_hero": ["Batman", "Superman", "Spiderman"],
            }
        )


class AddRevelation(Transformer):
    def __init__(self, revelation_col_name: str = "revelation") -> None:
        super().__init__()
        self._revelation_col_name = revelation_col_name

    def __repr__(self) -> str:
        return self.__class__.__qualname__

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        # Adds the new column in place and returns the same DataFrame
        data[self._revelation_col_name] = (
            data["first_name"]
            + " "
            + data["last_name"]
            + " is "
            + data["super_hero"]
        )
        return data


class LoadInCSV(Loader):
    def __init__(self, path: str) -> None:
        super().__init__()
        self._path = path

    def __repr__(self) -> str:
        return f"{self.__class__.__qualname__}({self._path!r})"

    def write(self, data: pd.DataFrame) -> None:
        # data.to_csv(self._path, sep=";")
        print(data)


class DB2CSVETLBuilder:
    def __init__(self) -> None:
        self._database: Optional[str] = None
        self._table: Optional[str] = None
        self._transformer: Optional[Transformer] = None
        self._path: Optional[str] = None

    def set_database(self, database: str) -> DB2CSVETLBuilder:
        self._database = database
        return self

    def set_table(self, table: str) -> DB2CSVETLBuilder:
        self._table = table
        return self

    def set_transformer(self, transformer: Transformer) -> DB2CSVETLBuilder:
        self._transformer = transformer
        return self

    def set_path(self, path: str) -> DB2CSVETLBuilder:
        self._path = path
        return self

    def build(self) -> ETL:
        # Fail early, with a clear message, if a piece of information is missing
        if (
            self._database is None
            or self._table is None
            or self._transformer is None
            or self._path is None
        ):
            raise ValueError("database, table, transformer and path must all be set")
        return ETL(
            extractor=FromDB(self._database, self._table),
            transformer=self._transformer,
            loader=LoadInCSV(self._path),
        )

In [8]:
logging.basicConfig(level=logging.INFO, force=True)

etl = (
    DB2CSVETLBuilder()
    .set_database("my_db")
    .set_table("my_table")
    .set_transformer(AddRevelation())
    .set_path("my_result.csv")
    .build()
)

etl.run()

INFO:root:Reading data with FromDB(database='my_db', table='my_table')
INFO:root:Reading from table 'my_table' in database 'my_db'
INFO:root:Transforming data with AddRevelation
INFO:root:Writing data with LoadInCSV('my_result.csv')


  first_name last_name super_hero                 revelation
0      Bruce     Wayne     Batman      Bruce Wayne is Batman
1      Clark      Kent   Superman     Clark Kent is Superman
2      Peter    Parker  Spiderman  Peter Parker is Spiderman


:exclamation: The builder pattern provides a flexible way to create complex objects but at the cost of a lot of code. Here are a few points to keep in mind when looking at this pattern:
- there are tools to generate builder code based on the target class;
- the pattern suggests complex objects (which the example is not); make sure the complexity does not come from a violation of the single-responsibility principle.


### Discussion

:hourglass: 15 min


There are other creational patterns: abstract factory, prototype and singleton. 

> Remember that there are no points to be won for using patterns: only use them when they are appropriate.

:microphone: Do you feel the patterns outlined would have been useful for what you have already developed?

## 5. Closing words

:hourglass: 10 min

This module was about OOP and object creation. We discussed the core concepts behind OOP (e.g. clear concepts, encapsulation), some best practices, and dataclasses. We also illustrated a few creational design patterns and the basics of typing in Python.

**Dunder methods**:
- `__init__`
- `__repr__`
- `__str__`