# M1: Program Structure

## About this Course

* How this course is different from 121
* Goals of the course
    * Real-world experience (program structure, data structures, finding & evaluating tools)
    * Understand ins & outs of data (collection, cleaning, visualization)

### Syllabus Overview

## Lecture 1: Program Structure

In this module, we will explore the basics of software organization & structure in a Python program.
There is no single right way to structure a program, but we'll discuss tools & conventions common in Python.


### Interfaces & Abstract Classes

When we talk about program design we start using the term "interface" a lot.
You've likely heard the term API, or application programming interface, even if you're not quite sure what it means.

When we talk about an interface, we're talking about the "exposed" means of interacting with a system.

When you get into a car, there's an interface for starting the engine, controlling the direction, speed, etc.
There are, of course, other ways to start and stop a car -- but we generally prefer to use the pedals.

The same is true of our code. When we write functions and classes meant to be used by other developers, or other parts of our own code, we are creating interfaces.  We can think of interfaces as a contract between the developer and the user of the interface.

#### Example: Student Interface

Two developers, José and Sarah, are working together on a project.

José is responsible for defining the classes that represent the different types of people. For example:


In [2]:
from datetime import date

In [3]:
class Student:
    def __init__(self, first_name, last_name, birth_date):
        self.first_name = first_name
        self.last_name = last_name
        self.birth_date = birth_date

    def age(self):
        today = date.today()
        # calculate age
        age = today.year - self.birth_date.year - ((today.month, today.day) < (self.birth_date.month, self.birth_date.day))
        return str(age)
    
students = [Student("Ada", "Lovelace", date(2000, 12, 10)), 
            Student("Charles", "Babbage", date(1991, 12, 26))]



Sarah's job is to define a function that displays the full names and ages of people. She starts with a function like:



In [4]:
def display_people(people):
    for person in people:
        print(f"{person.first_name} {person.last_name} is {person.age()} years old.")

display_people(students)

Ada Lovelace is 22 years old.
Charles Babbage is 31 years old.



José then reads [Falsehoods Programmers Believe About Names](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/) and decides that he wants to change the implementation to store the name as a tuple of first and last names.


In [5]:
class Student:
    def __init__(self, first_name, last_name, birth_date):
        # for now we'll leave the constructor the same and just combine the two
        self.names = (first_name, last_name)
        self.birth_date = birth_date

    def age(self):
        today = date.today()
        # calculate age
        age = today.year - self.birth_date.year - ((today.month, today.day) < (self.birth_date.month, self.birth_date.day))
        return str(age)

In [7]:
students = [Student("Ada", "Lovelace", date(2000, 12, 10)), 
            Student("Charles", "Babbage", date(1991, 12, 26))]
display_people(students)

AttributeError: 'Student' object has no attribute 'first_name'

Sarah's code breaks because she's accessing the `first_name` and `last_name` attributes directly. She could fix it by accessing the `names` attribute and then indexing into it, but that's not very readable, and it's not very future-proof.

Instead, she asks José to define an interface for accessing the name. He does so by defining a `full_name` method:


In [8]:

class Student:
    def __init__(self, first_name, last_name, birth_date):
        # for now we'll leave the constructor the same and just combine the two
        self.names = (first_name, last_name)
        self.birth_date = birth_date

    def full_name(self):
        return f"{self.names[0]} {self.names[1]}"
    
    def age(self):
        today = date.today()
        # calculate age
        age = today.year - self.birth_date.year - ((today.month, today.day) < (self.birth_date.month, self.birth_date.day))
        return str(age)


Sarah then updates her function to use it:


In [9]:
def display_people(people):
    for person in people:
        print(f"{person.full_name()} is {person.age()} years old.")

In [11]:
students = [Student("Ada", "Lovelace", date(2000, 12, 10)), 
            Student("Charles", "Babbage", date(1991, 12, 26))]
display_people(students)

Ada Lovelace is 22 years old.
Charles Babbage is 31 years old.




Now, if José decides to change the implementation of the name, Sarah's code won't break since they've agreed on an interface.

A new team member, Pat, is tasked with writing an `Employee` class.


In [12]:

class Employee:
    def __init__(self, first_name, last_name, age, employee_id):
        self.names = (first_name, last_name)
        self.age = age
        self.employee_id = employee_id

    def name(self):
        return f"{self.names[0]} {self.names[1]}"


In [15]:
# Sarah is asked to ensure display_people will work for Employee as well
employees = [Employee("Fred", "Flintstone", 44, 1), 
             Employee("George", "Jetson", 40, 7777)]
display_people(employees)

AttributeError: 'Employee' object has no attribute 'full_name'


Sarah's `display_people` function does not work with `Employee` objects because it expects a `full_name` method, but `Employee` has a `name` method instead. Additionally, `age` is a plain attribute on `Employee` but a method on `Student`, so the call `person.age()` would fail as well.

A naive solution to this might be to add more code to `display_people` to check what type it gets. Why is this not a good idea? 
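
As a rough sketch of what that naive approach might look like (the function `describe_naive` and the stripped-down classes below are hypothetical stand-ins, not the project code), notice that every class needs its own branch:

```python
class Student:
    """Stand-in: exposes full_name() and age() as methods."""
    def __init__(self, first_name, last_name, age):
        self.names = (first_name, last_name)
        self._age = age

    def full_name(self):
        return f"{self.names[0]} {self.names[1]}"

    def age(self):
        return str(self._age)


class Employee:
    """Stand-in: exposes name() as a method and age as an attribute."""
    def __init__(self, first_name, last_name, age):
        self.names = (first_name, last_name)
        self.age = age

    def name(self):
        return f"{self.names[0]} {self.names[1]}"


def describe_naive(person):
    # One branch per concrete class: this function must know each type's details.
    if isinstance(person, Student):
        return f"{person.full_name()} is {person.age()} years old."
    elif isinstance(person, Employee):
        return f"{person.name()} is {person.age} years old."
    else:
        # Any class added later lands here until someone edits this function.
        raise TypeError(f"don't know how to display {type(person).__name__}")


print(describe_naive(Student("Ada", "Lovelace", 22)))
print(describe_naive(Employee("Fred", "Flintstone", 44)))
```

Each new kind of person means another `elif`, and any rename inside `Student` or `Employee` silently breaks the matching branch.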


This problem stems from the fact that the code the three are writing is already **tightly coupled**. This means that the code depends on the implementation details of other parts of the code. In this case, `display_people` depends on each class happening to provide a `full_name` method and an `age` method, yet nothing requires a new class to do so.


To loosely couple the code, we need to define an interface that the `display_people` function can depend on, rather than the implementation details of the `Student` and `Employee` classes.

#### Abstract Classes

One solution to this problem is to define the interface as an abstract class that declares the methods every implementing class must provide.

In Python, we can use the `abc` module to define abstract classes.
A class that inherits from `ABC` and has at least one method decorated with `@abstractmethod` is abstract: it cannot be instantiated, and every concrete subclass must override all of the abstract methods.

For example:

In [17]:
from abc import ABC, abstractmethod

class Person(ABC):
    @abstractmethod
    def full_name(self):
        pass

    @abstractmethod
    def age(self):
        pass

`Person` is an abstract base class (ABC), and any class that inherits from it must implement the `full_name` and `age` methods.
Trying to instantiate `Person` directly, or any subclass that leaves one of these methods unimplemented, raises a `TypeError`.
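
A minimal sketch of that failure, assuming a hypothetical `Incomplete` subclass that implements only one of the two abstract methods (the exact error message varies between Python versions, so none is shown):

```python
from abc import ABC, abstractmethod


class Person(ABC):
    @abstractmethod
    def full_name(self):
        pass

    @abstractmethod
    def age(self):
        pass


class Incomplete(Person):
    # Hypothetical subclass: full_name is implemented, age is not.
    def full_name(self):
        return "No Age"


for cls in (Person, Incomplete):
    try:
        cls()
    except TypeError as err:
        # Both classes still have unimplemented abstract methods.
        print(f"{cls.__name__}: {err}")
```

Note that the error is raised when the object is created, not when the class is defined.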

To make a class that implements the `Person` interface, we can do:

In [18]:
class Student(Person):
    def __init__(self, first_name, last_name, birth_date):
        self.names = (first_name, last_name)
        self.birth_date = birth_date

    def full_name(self):
        return f"{self.names[0]} {self.names[1]}"

    def age(self):
        today = date.today()
        # calculate age
        age = today.year - self.birth_date.year - ((today.month, today.day) < (self.birth_date.month, self.birth_date.day))
        return str(age) 


class Employee(Person):
    def __init__(self, first_name, last_name, age, employee_id):
        self._names = (first_name, last_name)
        self._age = age
        self.employee_id = employee_id

    def full_name(self):
        return f"{self._names[0]} {self._names[1]}"

    def age(self):
        return str(self._age)

Sarah's implementation of `display_people` will work with any `Person` subclass, since they are guaranteed to have the required methods.

In [19]:
students = [Student("Ada", "Lovelace", date(2000, 12, 10)), 
            Student("Charles", "Babbage", date(1991, 12, 26))]
employees = [Employee("Fred", "Flintstone", 44, 1), 
             Employee("George", "Jetson", 40, 77777)]
people = students + employees
display_people(people)

Ada Lovelace is 22 years old.
Charles Babbage is 31 years old.
Fred Flintstone is 44 years old.
George Jetson is 40 years old.


#### Benefits of Interfaces

* Ease of maintenance & refactoring.
* Quickly add new classes that implement the interface without needing to reconsider design.
* Ease of testing.
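
To illustrate the testing point, here is one possible sketch. `FakePerson` and `test_display_people` are hypothetical names, and `Person` and `display_people` are repeated only so the snippet runs on its own:

```python
import io
from abc import ABC, abstractmethod
from contextlib import redirect_stdout


class Person(ABC):
    @abstractmethod
    def full_name(self):
        pass

    @abstractmethod
    def age(self):
        pass


def display_people(people):
    for person in people:
        print(f"{person.full_name()} is {person.age()} years old.")


class FakePerson(Person):
    # Hypothetical test double: fixed values, so the test never depends on today's date.
    def full_name(self):
        return "Test Person"

    def age(self):
        return "30"


def test_display_people():
    # Capture what display_people prints and compare it to the expected line.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        display_people([FakePerson()])
    assert buffer.getvalue() == "Test Person is 30 years old.\n"


test_display_people()
```

Because `display_people` relies only on the interface, the test does not need a real `Student` or `Employee`.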

**Note:** An abstract class can also provide default implementations, which subclasses inherit and may override. For example:

In [22]:
from abc import ABC, abstractmethod


class Person(ABC):
    def __init__(self, first_name, last_name, birth_date):
        self.names = (first_name, last_name)
        self.birth_date = birth_date

    # these methods will be inherited by subclasses, but can be overridden
    def full_name(self):
        return f"{self.names[0]} {self.names[1]}"

    def age(self):
        today = date.today()
        # calculate age
        age = today.year - self.birth_date.year - ((today.month, today.day) < (self.birth_date.month, self.birth_date.day))
        return str(age) 

    @abstractmethod
    def include_in_payroll(self):
        pass


class Student(Person):
    def __init__(self, first_name, last_name, birth_date):
        # if you need to call a parent class's implementation, you can use super()
        super().__init__(first_name, last_name, birth_date)
        
    def include_in_payroll(self):
        return False


class Employee(Person):
    def __init__(self, first_name, last_name, birth_date, employee_id):
        super().__init__(first_name, last_name, birth_date)
        self.employee_id = employee_id
    
    def include_in_payroll(self):
        return True
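
The classes above inherit the defaults without overriding them. As a hedged sketch of the overriding case, the condensed example below adds a hypothetical `Professor` subclass (not part of the original design) that replaces `full_name` while reusing the parent's version through `super()`:

```python
from abc import ABC, abstractmethod


class Person(ABC):
    def __init__(self, first_name, last_name):
        self.names = (first_name, last_name)

    # Default implementation, inherited unless a subclass overrides it.
    def full_name(self):
        return f"{self.names[0]} {self.names[1]}"

    @abstractmethod
    def include_in_payroll(self):
        pass


class Student(Person):
    # Inherits __init__ and full_name unchanged.
    def include_in_payroll(self):
        return False


class Professor(Person):
    # Hypothetical subclass: overrides the default and builds on it via super().
    def full_name(self):
        return f"Prof. {super().full_name()}"

    def include_in_payroll(self):
        return True


for person in (Student("Ada", "Lovelace"), Professor("Charles", "Babbage")):
    print(person.full_name(), person.include_in_payroll())
# Ada Lovelace False
# Prof. Charles Babbage True
```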

### Python Data Model

The types you already know in Python implement special interfaces.

```python
# addition
>>> 1 + 2
3
>>> "hello" + "world"
'helloworld'
>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
```

```python
# len()
>>> len("hello")
5
>>> len([1, 2, 3])
3
>>> len({"a": 1, "b": 2})
2
```

The `+` operator and the `len()` function are backed by **dunder methods** (short for "double underscore"), in this case `__add__` and `__len__`. These methods are defined in the Python data model, and the interpreter calls them to implement certain operations.

This allows us to define our own types that can be used in the same way as built-in types.  This is commonly known as **operator overloading**.

All classes implicitly inherit from `object`, which is the base class for all types in Python.  `object` defines a number of dunder methods, which are used by the interpreter to implement certain operations.
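
One way to see the connection is to call the dunder methods directly on built-in values. This is for illustration only (normal code should use the operator or function), and the `Empty` class is a hypothetical placeholder:

```python
# Each pair below evaluates to the same result.
print(1 + 2, (1).__add__(2))                        # 3 3
print("hello" + "world", "hello".__add__("world"))  # helloworld helloworld
print(len([1, 2, 3]), [1, 2, 3].__len__())          # 3 3


# object supplies defaults such as __repr__ and __eq__ to every class.
class Empty:
    pass


print("__repr__" in dir(Empty), "__eq__" in dir(Empty))  # True True
```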

We've already seen:

* `__init__`
* `__str__`
* `__repr__`
* `__eq__`

There are many more, which you can find in the [Python documentation](https://docs.python.org/3/reference/datamodel.html).

By defining these methods, we can make our classes behave like built-in types.  Here's an example:

#### StaticArray

To demonstrate operator overloading, we'll implement a sequence type seen in other languages known as a *static array*:

- A static array is a sequence type (i.e., an object that can hold a collection of items) with a fixed capacity for the number of items it can hold.

- Resizing the array is not allowed after initialization.

- We will define a class `StaticArray` that will allow us to use built-in Python operators.

We'll be able to use it like this:

```python
>>> from static_array import StaticArray
>>> sa = StaticArray([1, 2, 3])
>>> print(sa * 2)
# should produce the following output:
# [1, 2, 3, 1, 2, 3]
>>> print(sa[1])
2
```



In [48]:
from collections.abc import Iterable

class StaticArray:
    def __init__(self, init_val, capacity = 5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

In [49]:
sa = StaticArray([1, 2, 3])
# printing doesn't provide useful information, what is happening here?
print(sa)

<__main__.StaticArray object at 0x112e8e8f0>


In [50]:
# we can fix that by defining a __repr__ method

class StaticArray:
    def __init__(self, init_val, capacity = 5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

    def __repr__(self):
        return f"StaticArray({self.items})"

In [51]:
sa = StaticArray([1, 2, 3])
print(sa)


StaticArray([1, 2, 3])


#### str() vs repr()

These are two functions that convert an object to a string.  The difference is that `str()` is intended to be readable, while `repr()` is intended to be unambiguous.

In practice, it is common to just define `__repr__` since `__str__` will default to `__repr__` if it is not defined.


In [5]:
print(str(sa))
print(repr(sa))

StaticArray([1, 2, 3])
StaticArray([1, 2, 3])
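
The two outputs above are identical because `StaticArray` defines only `__repr__` at this point. For a type where the two differ, the standard library's `date` is a convenient example. The `OnlyRepr` class is a hypothetical illustration of the fallback:

```python
from datetime import date

d = date(2000, 12, 10)
print(str(d))    # 2000-12-10                    (readable)
print(repr(d))   # datetime.date(2000, 12, 10)   (unambiguous)


class OnlyRepr:
    # Hypothetical class that defines __repr__ but not __str__.
    def __repr__(self):
        return "OnlyRepr()"


print(str(OnlyRepr()))  # OnlyRepr()  (str() falls back to __repr__)
```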


#### Emulating Collections & Sequences

**Collections**

* Have a length: `len(obj)`
* Can be iterated over: `for item in obj: ...`
* Can query membership: `item in obj`

**Sequences**

* Everything a collection can do
* Can be indexed: `obj[0]`

| You Write...   | Python calls...          |
| ---            | ---                      |
| ``len(obj)``   | ``obj.__len__()``        |
| ``x in obj``   | ``obj.__contains__(x)``  |
| ``obj[i]``     | ``obj.__getitem__(i)``   |
| ``obj[i] = x`` | ``obj.__setitem__(i,x)`` |
| ``del obj[i]`` | ``obj.__delitem__(i)``   |


In [52]:
class StaticArray:
    def __init__(self, init_val, capacity = 5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

    def __repr__(self):
        return f"StaticArray({self.items})"

    def __str__(self):
        return f"StaticArray({self.items})"

    def __len__(self):
        return self.capacity

    def __contains__(self, item):
        return item in self.items

    def __getitem__(self, index):
        if index >= self.capacity or index < -self.capacity:
            raise IndexError("Index out of range")
        return self.items[index]

    def __setitem__(self, index, val):
        if index >= self.capacity or index < -self.capacity:
            raise IndexError("Index out of range")
        self.items[index] = val

    def __delitem__(self, index):
        raise NotImplementedError("StaticArray does not support deletion")
    

In [53]:
sa = StaticArray([1, "hi", 3.14, True])
len(sa)

4

In [54]:
42 in sa

False

In [55]:
"hi" in sa

True

In [56]:
sa[1]

'hi'

In [57]:
sa[43]

IndexError: Index out of range
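
The remaining rows of the table can be exercised the same way. The sketch below repeats a condensed copy of the class so that it runs on its own. It also shows that iteration works even though no `__iter__` is defined, because Python falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised:

```python
from collections.abc import Iterable


class StaticArray:
    # Condensed copy of the class above, kept only so this snippet is self-contained.
    def __init__(self, init_val, capacity=5):
        if isinstance(init_val, Iterable):
            self.items = list(init_val)
            self.capacity = len(self.items)
        else:
            self.items = [init_val] * capacity
            self.capacity = capacity

    def __len__(self):
        return self.capacity

    def __getitem__(self, index):
        if index >= self.capacity or index < -self.capacity:
            raise IndexError("Index out of range")
        return self.items[index]

    def __setitem__(self, index, val):
        if index >= self.capacity or index < -self.capacity:
            raise IndexError("Index out of range")
        self.items[index] = val

    def __delitem__(self, index):
        raise NotImplementedError("StaticArray does not support deletion")


sa = StaticArray([1, "hi", 3.14, True])

sa[1] = "bye"                  # calls sa.__setitem__(1, "bye")
print(sa[1])                   # bye

# Iteration uses the __getitem__ fallback described above.
print([item for item in sa])   # [1, 'bye', 3.14, True]

try:
    del sa[0]                  # calls sa.__delitem__(0)
except NotImplementedError as err:
    print(err)                 # StaticArray does not support deletion
```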

We'll stop here for now, but there are many other operators you can override:

#### Emulating numeric operators 


| You Write...   | Python calls...          |
| ---            | ---                      |
| ``x + y``   | ``x.__add__(y)``        |
| ``x - y``   | ``x.__sub__(y)``  |
| ``x * y``     | ``x.__mul__(y)``   |
| ``x / y`` | ``x.__truediv__(y)`` |
| ``x // y`` | ``x.__floordiv__(y)``   |
| ``x ** y`` | ``x.__pow__(y)``   |
| ``x @ y`` | ``x.__matmul__(y)``   |
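
As a sketch of how two of these might look on `StaticArray`, here is a simplified, hypothetical variant of the class. Treating `+` as concatenation and `*` as repetition into a new array is an assumption, chosen to mirror how lists behave:

```python
class StaticArray:
    # Simplified, hypothetical variant focused on the arithmetic operators only.
    def __init__(self, items):
        self.items = list(items)
        self.capacity = len(self.items)

    def __repr__(self):
        return f"StaticArray({self.items})"

    def __add__(self, other):
        # sa + other: concatenate into a new StaticArray.
        if not isinstance(other, StaticArray):
            return NotImplemented  # lets Python try other options, then raise TypeError
        return StaticArray(self.items + other.items)

    def __mul__(self, times):
        # sa * n: repeat the items n times in a new StaticArray.
        if not isinstance(times, int):
            return NotImplemented
        return StaticArray(self.items * times)


sa = StaticArray([1, 2, 3])
print(sa * 2)                    # StaticArray([1, 2, 3, 1, 2, 3])
print(sa + StaticArray([4, 5]))  # StaticArray([1, 2, 3, 4, 5])
print(sa)                        # StaticArray([1, 2, 3])  (the operands are unchanged)
```

Returning a new object, rather than modifying `self`, keeps the original array's capacity fixed.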


#### Reverse/Reflected/Right operators 

Python typically falls back to these methods on the right-hand operand when the left-hand operand's method (for example `x.__add__(y)`) is missing or returns `NotImplemented`.


| You Write...   | Python calls...          |
| ---            | ---                      |
| ``x + y``   | ``y.__radd__(x)``        |
| ``x - y``   | ``y.__rsub__(x)``  |
| ``x * y``     | ``y.__rmul__(x)``   |
| ``x / y`` | ``y.__rtruediv__(x)`` |
| ``x // y`` | ``y.__rfloordiv__(x)``   |
| ``x ** y`` | ``y.__rpow__(x)``   |
| ``x @ y`` | ``y.__rmatmul__(x)``   |
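
A small sketch of the reflected case, again using a simplified, hypothetical `StaticArray`. Without `__rmul__`, `2 * sa` would fail because `int` does not know how to multiply by a `StaticArray`:

```python
class StaticArray:
    # Simplified, hypothetical variant used only to demonstrate __rmul__.
    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        return f"StaticArray({self.items})"

    def __mul__(self, times):
        if not isinstance(times, int):
            return NotImplemented
        return StaticArray(self.items * times)

    def __rmul__(self, times):
        # Reached for `2 * sa` after int's own __mul__ returns NotImplemented.
        return self.__mul__(times)


sa = StaticArray([1, 2])
print(sa * 2)   # StaticArray([1, 2, 1, 2])  via sa.__mul__(2)
print(2 * sa)   # StaticArray([1, 2, 1, 2])  via sa.__rmul__(2)

try:
    1.5 * sa    # both float.__mul__ and sa.__rmul__ return NotImplemented
except TypeError:
    print("unsupported operand types")
```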



#### Rich Comparison

Python also allows you to overload the comparison operators:
   
  
| You Write...   | Python calls...          |
| ---            | ---                      |
| ``x == y``   | ``x.__eq__(y)``        |
| ``x != y``   | ``x.__ne__(y)``  |
| ``x < y``     | ``x.__lt__(y)``   |
| ``x > y`` | ``x.__gt__(y)`` |
| ``x <= y`` | ``x.__le__(y)``   |
| ``x >= y`` | ``x.__ge__(y)``   |
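
A minimal sketch using a hypothetical `Version` class. It defines only `__eq__` and `__lt__`, then relies on `functools.total_ordering`, a standard-library helper not covered above, to supply the remaining ordering methods:

```python
from functools import total_ordering


@total_ordering
class Version:
    # Hypothetical example class: compares (major, minor) pairs.
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


print(Version(1, 2) == Version(1, 2))   # True
print(Version(1, 2) < Version(1, 10))   # True
print(Version(2, 0) >= Version(1, 10))  # True  (__ge__ supplied by total_ordering)
print(Version(1, 2) != Version(1, 3))   # True  (!= is derived from __eq__ by default)
```

Writing all six methods by hand works too. The decorator just reduces repetition.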

