# Object-Oriented Programming

Let's pretend that there isn't any package for linear algebra such as [numpy.linalg](https://numpy.org/doc/stable/reference/routines.linalg.html), and we'd like to implement vector-like behavior for collections of numbers. Intuitively, a `list` seems to be useful for representing vectors:

In [None]:
v1 = [1, 2]
v2 = [-1, 2]

When multiplying a vector with a scalar (i.e., changing its magnitude), we expect each component of the vector to be multiplied by that scalar. What happens when we use the multiplication operator on a list?

In [None]:
v1 * 2

This is not useful: the multiplication operator `*` for a list creates `n` copies of the list `l` when used as `l * n`. A similar problem arises when using the addition operator `+`, which concatenates the two lists instead of doing pairwise addition of their elements.

In [None]:
v1 + v2
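
To make the difference concrete, the sketch below contrasts what the list operators do with the vector arithmetic we are after. The hand-written comprehensions are only an illustration of the desired result, not the approach this module ends up with.

```python
# Built-in list operators repeat and concatenate; they do no arithmetic.
v1 = [1, 2]
v2 = [-1, 2]

repeated = v1 * 2   # repetition: [1, 2, 1, 2]
joined = v1 + v2    # concatenation: [1, 2, -1, 2]

# The vector behavior we actually want, written out by hand:
scaled = [component * 2 for component in v1]   # [2, 4]
summed = [a + b for a, b in zip(v1, v2)]       # [0, 4]
```

Writing these comprehensions every time is tedious and error-prone, which is why we would like to attach this behavior to the data itself.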

What we want is to be able to define custom multiplication or addition behavior that is intimately related to a vector and its components (its data). In Object-Oriented Programming, a _class_ is a construct for exactly this purpose.

## Classes = Data + Behavior

Classes are defined using the `class` keyword. When "calling" a class (using parentheses, just like calling a function), a new _object_ is created with the type of that class.

In [None]:
class Foo:
    pass

In [None]:
f = Foo()
type(f)

As seen before, every object has _attributes_:

In [None]:
dir(f)

We can assign (new) attributes to objects using the `.` operator:

In [None]:
f.x = 42
f.x

It is also possible to assign _functions_ to a class. Note the subtle change in type when referring to the function itself (or as an attribute of the class) vs. when referring to it as an attribute of the _object_ `f`:

In [None]:
def bar(foo_obj):
    return foo_obj.x * 2

Foo.bar = bar
type(bar), type(Foo.bar), type(f.bar)

The "magic" that happens here is that when a function is an attribute of a class, it becomes a _(bound) method_ of objects of that class. When a method is bound to an object, Python passes the object itself as the first argument when calling the method:

In [None]:
bar(f)

In [None]:
f.bar()  # No arguments are passed to bar! 
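
The following self-contained sketch repeats the setup above and shows that calling the bound method is equivalent to calling the function through the class and passing the object explicitly. The attributes `__self__` and `__func__` are standard attributes of bound methods.

```python
class Foo:
    pass


def bar(foo_obj):
    return foo_obj.x * 2


Foo.bar = bar
f = Foo()
f.x = 42

via_object = f.bar()     # Python supplies f as the first argument
via_class = Foo.bar(f)   # plain function call: we pass f ourselves

# A bound method remembers both the object and the underlying function.
bound_to = f.bar.__self__
underlying = f.bar.__func__
```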

It is not common to manually add functions as attributes to a class as in the example above. Instead, we usually define functions in the class body. It is a convention to use `self` as the name of the first argument that captures the object to which the method is bound.

In [None]:
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def distance_to_origin(self):
        return (self.x ** 2 + self.y ** 2) ** .5

That strange-looking `__init__()` function is one of the most commonly used ["magic methods"](https://rszalski.github.io/magicmethods/) (also called "dunder methods" for their "__d__ouble __under__score" naming convention). Whenever we construct a new `Point2D` object, Python will look for the object's `__init__()` method and pass through the arguments provided to the constructor:

In [None]:
p1 = Point2D(3, 4)

In [None]:
p1.x

In [None]:
p1.distance_to_origin()

Now that we have several `Point2D` objects, what happens when we check whether they are equal?

In [None]:
p2 = Point2D(3, 4)

In [None]:
p2 is p1

In [None]:
p2 == p1

__Question__: Does the result of value equality surprise you? Why would the two identical points be considered unequal? How should Python know what makes two objects of the same type equal by value?

Their (human-readable) representation is also not entirely clear:

In [None]:
p1

Just as with `__init__()`, there are magic methods to define equality and representation of an object:

In [None]:
class Point2D(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def distance_to_origin(self):
        return (self.x ** 2 + self.y ** 2) ** .5
    
    def __eq__(self, other):
        if isinstance(other, Point2D):
            return self.x == other.x and self.y == other.y
        else:
            return False
        
    def __repr__(self):
        return f'{self.__class__.__name__}({self.x}, {self.y})'

In [None]:
p1 = Point2D(3, 4)
p2 = Point2D(3, 4)

The `__repr__()` method is automatically called when this notebook is trying to represent any of our `Point2D` objects:

In [None]:
p1

Whenever Python encounters an expression `obj1 == obj2`, it will look for an `__eq__()` method on `obj1` and return whatever the result is when passing it `obj2`. There are similar magic methods for comparison that are triggered when encountering operators such as `<`, `>=`, etc. These are called _rich comparison methods_. By default, `__eq__()` only compares object identity, and the ordering comparisons (`<`, `>=`, etc.) are _not_ implemented at all!

In [None]:
p1 == p2
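
As a minimal sketch of another rich comparison method, the class below adds `__lt__()` so that points can be compared with `<`. Ordering points by their distance to the origin is purely an assumption made for illustration; it is not part of the `Point2D` class used in the rest of this module.

```python
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to_origin(self):
        return (self.x ** 2 + self.y ** 2) ** .5

    def __lt__(self, other):
        # Assumption for illustration: order points by distance to the origin.
        if not isinstance(other, Point2D):
            return NotImplemented
        return self.distance_to_origin() < other.distance_to_origin()


near = Point2D(1, 1)
far = Point2D(3, 4)

is_closer = near < far       # calls near.__lt__(far)
closest = min([far, near])   # min() relies on the < operator as well
```

Returning `NotImplemented` for unknown types tells Python that this class cannot handle the comparison, so it can try the other operand or raise a `TypeError`.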

To prevent lots of repetitive code for initialization, comparison, and representation, modern Python (since version 3.7) has [_data classes_](https://docs.python.org/3/library/dataclasses.html). These provide sensible default implementations for object initialization, representation, and equality checks.

To make a data class, we just need to decorate a class with the `@dataclass` decorator (decorators will be explained later).

In [None]:
from dataclasses import dataclass

In [None]:
@dataclass
class Point2D:
    x: float
    y: float
    
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** .5

In [None]:
p1 = Point2D(3, 4)
p2 = Point2D(3, 4)
p1.x

In [None]:
p1

In [None]:
p1 == p2

Data classes require that their member attributes (in our case `x` and `y`) have their type specified using _type annotations_.
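
The sketch below illustrates why the annotations matter: only annotated attributes become fields of the data class. The un-annotated `label` attribute is a hypothetical addition used to show the contrast, and `dataclasses.fields()` is the standard-library helper for listing fields.

```python
from dataclasses import dataclass, fields


@dataclass
class Point2D:
    x: float
    y: float
    label = 'point'   # hypothetical: no annotation, so a plain class attribute


# Only the annotated attributes are picked up as fields.
field_names = [field.name for field in fields(Point2D)]

p = Point2D(3, 4)
representation = repr(p)   # the generated __repr__ lists the fields only
```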

## Type Hints

A full discussion of [type hints](https://docs.python.org/3/library/typing.html) is beyond the scope of this tutorial. Since Python 3.5 (and 3.6 for variable annotations), it is possible to indicate the type of variables, function arguments and return values, and class members.

In [None]:
def add_one(x: int) -> int:
    return x + 1

In [None]:
add_one(42)

In [None]:
add_one('foo')

Providing type hints is optional in most cases, but required in case of data classes!
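
It may help to see that Python itself does not enforce type hints at runtime. In the sketch below, a `float` passes through `add_one()` without complaint, and the error for a string comes from the addition inside the function body rather than from the hint.

```python
def add_one(x: int) -> int:
    return x + 1


result = add_one(1.5)             # a float is accepted despite the int hint
hints = add_one.__annotations__   # the hints are stored as plain metadata

try:
    add_one('foo')
except TypeError as error:
    # The failure comes from evaluating 'foo' + 1, not from the type hint.
    message = str(error)
```

Checking hints is left to external tools such as static type checkers and editors.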

## Composition

Objects can be composed of (multiple) other objects, just as the built-in container types discussed before.

In [None]:
import math

A circle has a point in space which is its center:

In [None]:
@dataclass
class Circle:
    center: Point2D
    radius: float
    
    def circumference(self):
        return 2 * math.pi * self.radius

When we add a method to our `Point2D` class to compute the distance to another point, suddenly we can easily find out if a given point is within a circle:

In [None]:
@dataclass
class Point2D:
    x: float
    y: float
    
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** .5
    
    def distance_from(self, other: 'Point2D') -> float:
        return ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** .5

@dataclass
class Circle:
    center: Point2D
    radius: float
    
    def circumference(self):
        return 2 * math.pi * self.radius
    
    def __contains__(self, point: Point2D):
        return self.center.distance_from(point) <= self.radius

Using magic methods such as `__contains__()` allows us to write surprisingly elegant code:

In [None]:
c = Circle(center=Point2D(3, 4), radius=1)
assert Point2D(3.5, 4) in c
assert Point2D(5, 4) not in c
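
To see the mechanism in isolation, here is a minimal sketch with a hypothetical `Interval` class (not part of this module's shapes). The `in` operator is simply translated into a call to `__contains__()`.

```python
from dataclasses import dataclass


@dataclass
class Interval:   # hypothetical class, used only for illustration
    low: float
    high: float

    def __contains__(self, value: float) -> bool:
        return self.low <= value <= self.high


unit = Interval(0, 1)
inside = 0.5 in unit                  # operator syntax
explicit = unit.__contains__(0.5)     # what Python calls under the hood
outside = 2 not in unit
```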

The `assert` keyword makes Python evaluate the expression following it, raise an `AssertionError` when it evaluates to `False`, or pass silently when evaluating to `True`:

In [None]:
assert 1 < 2

In [None]:
assert 1 > 2
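
An `assert` statement can also carry a message that ends up in the `AssertionError`. The function `checked_ratio()` below is a hypothetical example; note that assertions are skipped entirely when Python runs with the `-O` flag, so they are best reserved for sanity checks.

```python
def checked_ratio(a: float, b: float) -> float:
    # The expression after the comma becomes the error message.
    assert b != 0, 'b must be non-zero'
    return a / b


ratio = checked_ratio(1, 4)

try:
    checked_ratio(1, 0)
except AssertionError as error:
    message = str(error)
```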

## Composition of a Pandas DataFrame

In [None]:
import pandas as pd

In [None]:
transaction_df = pd.DataFrame({
    'amount': [42., 100., 999.],
    'from': ['bob', 'alice', 'bob'],
    'to': ['alice', 'bob', 'alice']
})
transaction_df

In [None]:
type(transaction_df)

In [None]:
transaction_df.columns

In [None]:
transaction_df.amount

In [None]:
transaction_df['amount']

In [None]:
type(transaction_df['amount'])

In [None]:
transaction_df['amount'][0], type(transaction_df['amount'][0])

In [None]:
dir(transaction_df)

## Index-based (or Label-based) Selection and Assignment

In [None]:
transaction_df = pd.DataFrame({
    'amount': [42., 100., 999.],
    'from': ['bob', 'alice', 'bob'],
    'to': ['alice', 'bob', 'alice']
})
transaction_df

In [None]:
transaction_df.index

In [None]:
transaction_df.loc[1]

In [None]:
transaction_df.loc[[0, 2]]

In [None]:
transaction_df = pd.DataFrame({
    'amount': [42., 100., 999.],
    'from': ['bob', 'alice', 'bob'],
    'to': ['alice', 'bob', 'alice']
}, index=[2, 4, 6])
transaction_df

In [None]:
transaction_df.loc[1]

In [None]:
transaction_messages = pd.Series(['foo', 'bar', 'baz'])
transaction_messages

In [None]:
transaction_df.assign(message=transaction_messages)

In [None]:
transaction_messages = pd.Series(['foo', 'bar', 'baz'], index=[2, 4, 6])
transaction_df.assign(message=transaction_messages)

In [None]:
transaction_df.reset_index()

In [None]:
transaction_df = pd.DataFrame({
    'amount': [42., 100., 999.],
    'from': ['bob', 'alice', 'bob'],
    'to': ['alice', 'bob', 'alice'],
    'tx_id': [101, 201, 301]
})
transaction_df

In [None]:
transaction_df.set_index('tx_id')

In [None]:
import numpy as np

In [None]:
df_size = 100_000

foo_df = pd.DataFrame({
    'a': np.arange(df_size),
    'b': np.random.permutation(df_size)
})
foo_df

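Why indexing? The cells below time row lookups using a boolean mask, and then the same lookups using an index on column `b`: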

In [None]:
%%timeit
for n in np.random.choice(df_size, size=10):
    foo_df.loc[lambda df: df['b'] == n]

In [None]:
%timeit foo_df.loc[lambda df: df['b'] == 42]

In [None]:
idx_foo_df = foo_df.set_index('b')

In [None]:
%timeit big_foo_df = foo_df.set_index('b')

In [None]:
%%timeit
for n in np.random.choice(df_size, size=10):
    idx_foo_df.loc[n]

In [None]:
%timeit idx_foo_df.loc[42]

## Inheritance

In [None]:
@dataclass
class Square:
    center: Point2D
    side_length: float
    
    def circumference(self):
        return 4 * self.side_length
    
    def __contains__(self, point: Point2D):
        return (
            point.x <= self.center.x + self.side_length / 2 and
            point.x >= self.center.x - self.side_length / 2 and
            point.y <= self.center.y + self.side_length / 2 and
            point.y >= self.center.y - self.side_length / 2
        )

In [None]:
from abc import ABC, abstractmethod

In [None]:
@dataclass
class Shape2D(ABC):
    center: Point2D
    
    @abstractmethod
    def circumference(self):
        pass
    
    @abstractmethod
    def __contains__(self, point: Point2D):
        pass
    
    def distance_to_origin(self):
        return self.center.distance_to_origin()

In [None]:
s = Shape2D()

In [None]:
@dataclass
class Circle(Shape2D):
    radius: float
    
    def circumference(self):
        return 2 * math.pi * self.radius

c = Circle(Point2D(3, 4), 1)

In [None]:
@dataclass
class Circle(Shape2D):
    radius: float
    
    def circumference(self):
        return 2 * math.pi * self.radius
    
    def __contains__(self, point: Point2D):
        return self.center.distance_from(point) <= self.radius

@dataclass
class Square(Shape2D):
    side_length: float
    
    def circumference(self):
        return 4 * self.side_length
    
    def __contains__(self, point: Point2D):
        return (
            point.x <= self.center.x + self.side_length / 2 and
            point.x >= self.center.x - self.side_length / 2 and
            point.y <= self.center.y + self.side_length / 2 and
            point.y >= self.center.y - self.side_length / 2
        )

Using `abc` is not _required_ for inheritance; it is just a convenient and clear way to state which methods every subclass must implement.
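
For comparison, the sketch below uses plain inheritance without `abc`. The `Blob` subclass is hypothetical and deliberately forgets to override `circumference()`: without `abc`, the mistake only surfaces when the method is called, not when the object is created.

```python
class Shape2D:
    def circumference(self):
        # Plain base class: subclasses are expected to override this.
        raise NotImplementedError


class Square(Shape2D):
    def __init__(self, side_length):
        self.side_length = side_length

    def circumference(self):
        return 4 * self.side_length


class Blob(Shape2D):   # hypothetical subclass that forgets to override
    pass


square_size = Square(2).circumference()

blob = Blob()          # instantiation succeeds without abc
try:
    blob.circumference()
    failed_late = False
except NotImplementedError:
    failed_late = True
```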

In [None]:
from typing import List

def total_size(shapes: List[Shape2D]) -> float:
    return sum(s.circumference() for s in shapes)

In [None]:
total_size([
    Circle(Point2D(3, 4), 1),
    Square(Point2D(0, 0), 2)
])

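A well-known example of this pattern is scikit-learn, whose estimators share a common interface (such as `fit()`) so that they can be used interchangeably.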

## Iterating

In [None]:
@dataclass
class IntRange:
    upper_bound: int
    
    def __iter__(self):
        self.i = 0
        return self
    
    def __next__(self):
        if self.i >= self.upper_bound:
            raise StopIteration
            
        current_value = self.i
        self.i += 1
        return current_value

In [None]:
r = IntRange(3)
r_iter = r.__iter__()

In [None]:
r_iter.__next__()

In [None]:
l_iter = iter([1, 2, 3])

In [None]:
r_iter = iter(IntRange(3))
while True:
    try:
        print(next(r_iter))
    except StopIteration:
        print('Finished iterating!')
        break

In [None]:
for i in IntRange(3):
    print(i * 2)

In [None]:
[i ** 2 for i in IntRange(3)]

Our `IntRange` mimics the behavior of the built-in `range()`!
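
As a short sketch of the same protocol on the built-in type: `iter()` and `next()` work on a `range` just as they do on `IntRange`, and a `range` can be iterated more than once because each call to `iter()` hands out a fresh iterator.

```python
numbers = range(3)

first_pass = list(numbers)
second_pass = list(numbers)   # a range can be iterated again

it = iter(numbers)            # each iter() call returns a fresh iterator
values = [next(it), next(it), next(it)]

try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
```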

In [None]:
from typing import List

In [None]:
Square(Point2D(0, 0), 2).__class__.__name__

In [None]:
@dataclass
class ShapeGrouper:
    shapes: List[Shape2D]
    
    def __iter__(self):
        self.shape_type_iter = iter(set([shape.__class__.__name__ for shape in self.shapes]))
        return self
    
    def __next__(self):
        shape_type = next(self.shape_type_iter)
        return shape_type, [shape for shape in self.shapes if shape.__class__.__name__ == shape_type]

__Bonus Exercise__: What happens if our list of shapes is long, having many unique shape types? Can we make our implementation more efficient by using a dictionary for quickly looking up all shapes of a given type? 

_Hint_: [`collections.defaultdict()`](https://docs.python.org/3/library/collections.html#collections.defaultdict) from the standard library may be particularly useful in this case.
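
To get a feel for `defaultdict` without giving away the exercise, here is a minimal sketch that groups an unrelated, made-up list of words by their first letter. Missing keys are created on first access with an empty list.

```python
from collections import defaultdict

# Made-up data, unrelated to the shapes in the exercise.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = defaultdict(list)
for word in words:
    # No need to check whether the key exists before appending.
    by_letter[word[0]].append(word)

letters = sorted(by_letter)
```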

In [None]:
# Your solution:

In [None]:
# %load solutions/shape_grouper_efficient.py

In [None]:
grouper = ShapeGrouper([
    Square(Point2D(1, 1), 1),
    Circle(Point2D(3, 4), 1),
    Square(Point2D(0, 0), 2)
])

In [None]:
for shape_type, shape_list in grouper:
    print(f'{shape_type}: {shape_list}')

In [None]:
@dataclass
class ShapeGrouper:
    shapes: List[Shape2D]
    
    def __iter__(self):
        self.shape_type_iter = iter(set([shape.__class__.__name__ for shape in self.shapes]))
        return self
    
    def __next__(self):
        shape_type = next(self.shape_type_iter)
        return shape_type, [shape for shape in self.shapes if shape.__class__.__name__ == shape_type]
    
    def total_size(self):
        return [
            (shape_type, sum([shape.circumference() for shape in shapes]))
            for shape_type, shapes in self
        ]

In [None]:
grouper = ShapeGrouper([
    Square(Point2D(1, 1), 1),
    Circle(Point2D(3, 4), 1),
    Square(Point2D(0, 0), 2)
])

In [None]:
grouper.total_size()

In [None]:
transaction_df

In [None]:
type(transaction_df.groupby('to'))

In [None]:
for receiver, receiver_transactions in transaction_df.groupby('to'):
    print(f'{receiver} got a total amount of {receiver_transactions["amount"].sum()}')

In [None]:
transaction_df.groupby('to').sum()

## Operator (or Method) Chaining

__Exercise__: Create a class `Vector` that implements the addition and multiplication behavior as given at the top of this module. Use the assertions below to verify the correctness of your solution.

_Hints_: 

1. There are [some magic methods](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types) to provide an implementation of numeric operators on custom classes.
2. For the addition of vectors, [`zip()`](https://docs.python.org/3/library/functions.html#zip) can be a useful built-in function.
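
To illustrate both hints on something other than vectors, the sketch below defines a hypothetical `Money` class with `__add__()` and `__mul__()`, and shows how `zip()` pairs up the elements of two lists. None of these names are part of the exercise solution.

```python
from dataclasses import dataclass


@dataclass
class Money:   # hypothetical class, used only for illustration
    cents: int

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def __mul__(self, factor):
        return Money(self.cents * factor)


# Multiplication binds tighter than addition, just as with numbers.
total = Money(150) + Money(250) * 2

# zip() walks through both lists in parallel.
pairs = list(zip([1, 2], [3, 4]))
```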

In [None]:
# Your solution:

In [None]:
# %load solutions/vector_basic.py

In [None]:
v1 = Vector([1., 2.])
v2 = Vector([2., 4.])
v3 = Vector([3.5, 4.5])
v4 = Vector([1, 2, 3])

assert v1 + v1 == v2
assert v1 * 2 == v2
assert v1 * 2 + v3 == Vector([5.5, 8.5])
assert v1 + v3 * 2 == Vector([8, 11])
assert v4 + v4 == Vector([2, 4, 6])

_Reflection_: Why is it convenient that our addition and multiplication methods return (new) `Vector` objects?

__Bonus Exercise__: Add methods to the `Vector` class such that it (1) supports lookup by dimension, just as we index a list using `[]`, and (2) returns its number of dimensions when passed to the built-in function `len()`.

_Hint_: There are [some magic methods](https://docs.python.org/3/reference/datamodel.html#emulating-container-types) for implementing container-like behavior for custom classes.
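
As an illustration of these container methods on an unrelated, hypothetical `Playlist` class, the sketch below implements `__getitem__()` for the `[]` syntax and `__len__()` for `len()`.

```python
class Playlist:   # hypothetical class, used only for illustration
    def __init__(self, songs):
        self.songs = list(songs)

    def __getitem__(self, index):
        # Called for playlist[index]
        return self.songs[index]

    def __len__(self):
        # Called for len(playlist)
        return len(self.songs)


playlist = Playlist(['intro', 'verse', 'outro'])
first = playlist[0]
count = len(playlist)
```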

In [None]:
# Your solution:

In [None]:
# %load solutions/vector_as_container.py

In [None]:
v5 = Vector([42, 99])
assert v5[0] == 42
assert len(v5) == 2
assert (v5 + Vector([1, 1]) * 2)[1] == 101

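_Reflection_: The same idea is behind `.loc[]` and method chaining in pandas. Because selecting with `.loc[]` returns a new `DataFrame`, further method calls can be chained onto the result: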

In [None]:
transaction_df.loc[transaction_df['to'] == 'alice']

In [None]:
type(transaction_df.loc[transaction_df['to'] == 'alice'])

In [None]:
transaction_df['to'] == 'alice'

In [None]:
transaction_df.loc[[0, 2]]

In [None]:
(
    transaction_df
    .loc[lambda df: df['to'] == 'alice']
    .assign(amount=lambda df: df['amount'] * 2)
)