# Class 10 - 13.5.18

# Software Design Principles

The previous class discussed one of the most important aspects of writing software: testing. But testing is a "mechanism" used while writing code, not a high-level principle. This class deals with a few important principles that you should keep in the back of your mind whenever you write a program.

Most of the ideas presented below come from the lectures and textbooks of Robert Martin, AKA Uncle Bob, one of the most influential figures in object-oriented design.

## Object Orthogonality

Objects often interact with one another. Take a `ProcessData` class that processes instances of a `Data` class, where each `Data` instance contains a couple of `Series` and some metadata. `ProcessData` has to communicate with the data inside each `Data` instance, and may modify it further.

A preliminary design might look like the following:

In [1]:
import numpy as np
import pandas as pd


class Data:
    """ Simple container for two Series and their metadata """
    def __init__(self, arr1: np.ndarray, arr2: np.ndarray, date: float):
        self.ser1 = pd.Series(arr1, dtype=np.uint8)
        self.ser2 = pd.Series(arr2, dtype=np.int16)
        self.metadata = dict(shape1=self.ser1.shape,
                             shape2=self.ser2.shape,
                             total=self.ser1.shape[0] + self.ser2.shape[0],
                             date=date)


class ProcessData:
    """ Pipeline to process twin Data instances """
    def __init__(self, data1: Data, data2: Data):
        self.data1 = data1
        self.data2 = data2
        self.result = []
        # Reaches directly into the attributes of both Data instances
        self.metadata = dict(dtypes1=(data1.ser1.dtype, data1.ser2.dtype),
                             dtypes2=(data2.ser1.dtype, data2.ser2.dtype),
                             metadata=data1.metadata)

    def process(self):
        # Breaks as soon as Data renames or restructures ser1/ser2
        self.result.extend([self.data1.ser1.sum(), self.data2.ser1.sum()])
        self.result.append(self.data1.ser1.mean() + self.data2.ser2.mean())
        return self.result

Here we have a `Data` class which serves as a container for two logically connected `Series`. It also simplifies access to some of their metadata.

We also have a `ProcessData` class that uses the `Data` instances to calculate some statistical properties and keep them for later use.

While this design works (which is important), it's flawed: the `ProcessData` object depends heavily on the implementation details of the `Data` class, such as the exact names of its attributes. When higher-level objects depend on specific attributes of some lower-level module, we need to perform Dependency Inversion: both sides should depend on a stable interface rather than on each other's internals. This decoupling process can also be called "object orthogonality".

We'll make a couple of major changes to our design which will solve, step by step, the design issues we encountered.

First we'll create a new `DataContainer` class that holds `Data` instances, and redefine the `Data` class more appropriately:

In [2]:
class Data:
    """ Simple container for two Series and their metadata """
    def __init__(self, arr1: np.ndarray, arr2: np.ndarray, date: float):
        self._ser1 = pd.Series(arr1, dtype=np.uint8)
        self._ser2 = pd.Series(arr2, dtype=np.int16)
        self._metadata = dict(shape1=self._ser1.shape,
                              shape2=self._ser2.shape,
                              total=self._ser1.shape[0] + self._ser2.shape[0],
                              date=date)

    @property
    def data(self):
        """ Returns the actual data variables as an iterable """
        return [self._ser1, self._ser2]

    @property
    def metadata(self):
        return self._metadata

    def sum(self):
        return [x.sum() for x in self.data]


class DataContainer:
    """ Holds, in order, instances of Data """
    def __init__(self, datas):
        self._data = []
        self._metadata = {}
        for idx, data in enumerate(datas):
            # Refuse anything that isn't a Data instance instead of silently skipping it
            if not isinstance(data, Data):
                raise TypeError(f"Data {data!r} isn't a 'Data' instance.")
            self._data.append(data)
            self._metadata[idx] = data.metadata

    @property
    def data(self):
        return self._data

    @property
    def metadata(self):
        return self._metadata

    def sum(self):
        return [data.sum() for data in self._data]

First, note the "new technical term": here we introduce the `@property` decorator. We'll discuss Python's decorators in the next class, but for now we only care about their practical aspect: if we define a method as a property, its name can be used like a regular attribute, except that it's read-only (it can't be assigned to):

In [3]:
class Trial:
    def __init__(self):
        self.two_as_attr = 2
    
    def two_as_method(self):
        return 2
    
    @property
    def two_as_prop(self):
        return 2

tr = Trial()

# Changing attributes is possible:
print(f"The original attribute: {tr.two_as_attr}")
tr.two_as_attr = 3
print(f"Attributes can be changed: {tr.two_as_attr}")
print("------")

# Using the regular method requires brackets
print(f"Using the method: {tr.two_as_method()}")
print("And of course, it can't be changed (immutable).")
print("------")

# Using a property "feels" like using an attribute:
print(f"As a property: {tr.two_as_prop}")  # no brackets
try:
    tr.two_as_prop = 3  # AttributeError
except AttributeError as e:
    print(f"AttributeError: {e} - properties can't be changed.")

The original attribute: 2
Attributes can be changed: 3
------
Using the method: 2
And of course, it can't be changed (immutable).
------
As a property: 2
AttributeError: can't set attribute - properties can't be changed.


Properties are useful for one more reason (setters), which we'll examine in the next class.

But besides this new, exciting feature of Python, what else has changed with the implementation?

#### `Data`:
1. We redefined `Data`. The new object is meant to give the outside world only a "view" of the data it holds. The properties ensure that once the object is created, its attributes can't be reassigned from the outside. The single underscore before the attribute names is a convention signaling that they're private; Python doesn't actually prevent direct access to them.

2. Furthermore, if we examine the `sum()` method, we see that it's now bound to the `Data` object itself. Spelled out, this makes sense: _summing the data is an operation intrinsic to the data itself._ If we ever decide to change how our data is stored, the `sum()` method should change accordingly, but no other object will be affected.


#### `DataContainer`:
1. The new `DataContainer` class _doesn't really know_ what it's holding. All it cares about is that the items are `Data` instances. It doesn't peek into the internals of the different `Data` instances.

2. It doesn't expose its internal `_data` attribute directly. Instead, a read-only `data` property returns the list. If we decide to change the internal implementation of `DataContainer`, users of this class won't care, as long as the `data` property keeps returning the same kind of output. Even if the container is empty, the property always returns something (an empty list).
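One caveat worth knowing: properties stop outside code from *reassigning* `data` or `metadata`, but the `Series` and `dict` objects they return are the originals, so a caller can still mutate them in place. A minimal sketch of one way to close that gap, assuming the cost of copying is acceptable for your data sizes, is to hand out copies instead (`ReadOnlyData` is a hypothetical name, not part of the design above):

```python
import numpy as np
import pandas as pd


class ReadOnlyData:
    """ Hypothetical variant of Data whose properties return copies """
    def __init__(self, arr1: np.ndarray, arr2: np.ndarray, date: float):
        self._ser1 = pd.Series(arr1, dtype=np.uint8)
        self._ser2 = pd.Series(arr2, dtype=np.int16)
        self._metadata = dict(total=len(self._ser1) + len(self._ser2), date=date)

    @property
    def data(self) -> tuple:
        # Copies: changes made by the caller never reach the internal Series
        return self._ser1.copy(), self._ser2.copy()

    @property
    def metadata(self) -> dict:
        return dict(self._metadata)  # shallow copy of the metadata dict


d = ReadOnlyData(np.array([1, 2, 3]), np.array([-1, 5]), date=0.0)
ser1, _ = d.data
ser1[0] = 100              # modifies the copy only
print(d.data[0].tolist())  # [1, 2, 3]
```

The trade-off is memory and time spent copying; for large arrays you may prefer to document that returned objects must not be modified.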

Let's see the redefined implementation of the `ProcessData` class:

In [4]:
class ProcessData:
    """ Pipeline to process twin Data instances """
    def __init__(self, datacont: DataContainer):
        self.datacont = datacont
        self.result = {}
        self.metadata = datacont.metadata

    def process(self):
        """ Mock processing pipeline """
        self.result['sum'] = self.datacont.sum()
        # Non-standard processing, going only through the public `data` properties
        means = [[ser.mean() for ser in data.data] for data in self.datacont.data]
        self.result['mean'] = means
        return self.result

The code snippet above is much cleaner than the one we had before. It uses the "API" of `DataContainer` in two ways: either by calling its fully featured `sum()` method, or by accessing the data through the `data` property and running non-standard processing on it (a mean calculation, in our case).

The downside is the added class: more code to write, more tests, more imports at the top. But the added value is tremendous. Think how easy it is to add new functionality to the pipeline. Everything is flexible: we could, for example, add a new `median()` method to the `DataContainer` class. We can even change the internal structure of the `Data` class and still use the downstream class as-is.
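To make the extensibility claim concrete, here is a condensed, self-contained version of the refactored classes (validation and most metadata trimmed for brevity) with a hypothetical `median()` method added to both `Data` and `DataContainer`. `ProcessData` is left exactly as it was and keeps working:

```python
import numpy as np
import pandas as pd


class Data:
    """ Condensed version of the refactored Data class """
    def __init__(self, arr1: np.ndarray, arr2: np.ndarray, date: float):
        self._ser1 = pd.Series(arr1, dtype=np.uint8)
        self._ser2 = pd.Series(arr2, dtype=np.int16)
        self._metadata = dict(date=date)

    @property
    def data(self):
        return [self._ser1, self._ser2]

    @property
    def metadata(self):
        return self._metadata

    def sum(self):
        return [x.sum() for x in self.data]

    def median(self):
        """ New, hypothetical method: only Data knows how its values are stored """
        return [x.median() for x in self.data]


class DataContainer:
    """ Condensed version: holds, in order, instances of Data """
    def __init__(self, datas):
        self._data = list(datas)
        self._metadata = {idx: d.metadata for idx, d in enumerate(self._data)}

    @property
    def data(self):
        return self._data

    @property
    def metadata(self):
        return self._metadata

    def sum(self):
        return [d.sum() for d in self._data]

    def median(self):
        """ New method: simply delegates to each Data instance """
        return [d.median() for d in self._data]


class ProcessData:
    """ Unchanged consumer: relies only on DataContainer's public API """
    def __init__(self, datacont: DataContainer):
        self.datacont = datacont
        self.result = {}
        self.metadata = datacont.metadata

    def process(self):
        self.result['sum'] = self.datacont.sum()
        means = [[ser.mean() for ser in data.data] for data in self.datacont.data]
        self.result['mean'] = means
        return self.result


container = DataContainer([Data(np.array([1, 2, 3]), np.array([4, 6]), date=0.0)])
print(container.median())                # the new feature
print(ProcessData(container).process())  # existing pipeline, untouched
```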

## Liskov Substitution Principle

The LSP can be presented in several ways, and we'll choose the more straightforward approach of just showing an example of when the principle is violated.

Say I wish to model a rectangle, just as we did in the first class:

In [14]:
class Rectangle:
    """ A very simple implementation just to prove a point """
    def __init__(self, point, x, y):
        self.corner = point[0], point[1]
        self.x = x
        self.y = y
    
    def move(self, point):
        """ Move the object to the point """
        self.corner = point
        
    def set_width(self, dx):
        """ Change width to dx """
        self.x = dx
    
    def set_height(self, dy):
        """ Change height to dy """
        self.y = dy        

As the docstring says, above is a super-basic implementation of such a Rectangle. Take note of the two mutating functions that present a way to change the shape of the rectangle _independently._ This seems very logical when only dealing with a rectangle - each side truly is independent of the other.

However, if we wish to reuse this class when modeling a Square via inheritance, we'll be facing quite a pickle:

In [8]:
class Square(Rectangle):
    """ Simple square, inheriting from Rectangle """
    def __init__(self, point, x):
        super().__init__(point, x, x)

In [13]:
sq = Square((0, 0), 10)
print(f"Square size: {sq.x, sq.y}")
print(f"Square corner: {sq.corner}")

Square size: (10, 10)
Square corner: (0, 0)


Initially this seems OK. We only require a single `x` input for a square, and we just pass it twice to the `Rectangle` constructor to create a rectangle with equal sides.

Even the `move()` method of the rectangle is helpful - we can move our square around without the need to redefine it.

But the `set_X()` methods are an issue. We can't allow users of our `Square` to modify its height and width independently. If someone changed only the square's height, leaving its width as it is, our `Square` would no longer be a true square.

In [16]:
sq.set_height(20)
print(f"New dimensions: {sq.x, sq.y} - not a square.")

New dimensions: (10, 20) - not a square.


Logically, and mathematically, a square _should_ inherit from a rectangle. The simple mental model of the problem at hand is very clear with this inheritance relationship in mind. However, our implementation hits a setback which we might not have been able to predict in advance.

LSP claims that we should be able to replace instances of `Rectangle` with instances of `Square` without changing the correctness of the application. In this case we see that this substitution isn't possible, and so the principle breaks.

### What do we do?

#### 1. Limit the use of inheritance
Only when we're completely positive that the use of inheritance will contribute to our application - by improving readability or reducing code repetitions - only then should we use it. It's an important tool to have as an object-oriented programmer, but one which should be used carefully.

#### 2. Define a higher-level abstraction
We could define a more abstract base class for both the rectangle and the square, such as a `TwoDShape` (Python identifiers can't start with a digit, so `2DShape` isn't a valid class name). This class can have a `corner` attribute and a few very basic methods like `move()`. This changes the definition of `Rectangle` to

```python
class Rectangle(TwoDShape):
     # ...
``` 
and `Square` to 
```python
class Square(TwoDShape):
     # ...
```
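A minimal sketch of this option, assuming we also want every shape to report its area (the abstract `area()` method is an illustrative addition, not part of the original design). `Square` now has a single `set_side()` method, so there is no way to stretch it into a non-square:

```python
from abc import ABC, abstractmethod


class TwoDShape(ABC):
    """ Hypothetical abstract base: only what every 2D shape shares """
    def __init__(self, point):
        self.corner = point[0], point[1]

    def move(self, point):
        """ Move the object to the point """
        self.corner = point[0], point[1]

    @abstractmethod
    def area(self) -> float:
        """ Each concrete shape must say how to compute its area """


class Rectangle(TwoDShape):
    def __init__(self, point, x, y):
        super().__init__(point)
        self.x = x
        self.y = y

    def set_width(self, dx):
        self.x = dx

    def set_height(self, dy):
        self.y = dy

    def area(self):
        return self.x * self.y


class Square(TwoDShape):
    def __init__(self, point, side):
        super().__init__(point)
        self.side = side

    def set_side(self, side):
        """ A square has a single degree of freedom """
        self.side = side

    def area(self):
        return self.side ** 2


sq = Square((0, 0), 3)
sq.move((1, 2))  # shared behavior comes from TwoDShape
print(sq.corner, sq.area())  # (1, 2) 9
```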

#### 3. Override methods of the base class
We may simply override the implementation of one (or both) of the `set_X()` methods. The new implementation may raise a warning when trying to use it, pointing the user to the appropriate method, or it may raise a simple exception.
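A sketch of this option, assuming we choose to keep both sides in sync rather than raise (`SyncedSquare` and `resize()` are hypothetical names). It also shows the catch: code written against `Rectangle`'s contract now gets a surprising answer, so overriding hides the symptom rather than restoring substitutability. Raising an exception instead would break `resize()` just as surely.

```python
class Rectangle:
    """ Same minimal Rectangle as above (move() omitted) """
    def __init__(self, point, x, y):
        self.corner = point[0], point[1]
        self.x = x
        self.y = y

    def set_width(self, dx):
        self.x = dx

    def set_height(self, dy):
        self.y = dy


class SyncedSquare(Rectangle):
    """ Hypothetical Square whose setters keep both sides equal """
    def __init__(self, point, x):
        super().__init__(point, x, x)

    def set_width(self, dx):
        self.x = self.y = dx

    def set_height(self, dy):
        self.x = self.y = dy


def resize(rect: Rectangle) -> int:
    """ Client code relying on width and height being independent """
    rect.set_width(4)
    rect.set_height(5)
    return rect.x * rect.y  # a Rectangle user expects 20


print(resize(Rectangle((0, 0), 1, 1)))  # 20
print(resize(SyncedSquare((0, 0), 1)))  # 25
```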

#### 4. Addition of a precondition
We can add a flag (a boolean attribute) called `stretchable` to the `Rectangle` class. Each `set_X()` method then checks this flag to see whether the operation is allowed before changing the width or height.
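A minimal sketch of the precondition approach. The `_check_stretchable()` helper and the choice of `ValueError` are assumptions made for illustration:

```python
class Rectangle:
    """ Hypothetical Rectangle with a 'stretchable' precondition flag """
    def __init__(self, point, x, y, stretchable=True):
        self.corner = point[0], point[1]
        self.x = x
        self.y = y
        self.stretchable = stretchable

    def _check_stretchable(self):
        # Precondition shared by every method that changes a single side
        if not self.stretchable:
            raise ValueError(f"{type(self).__name__} can't be stretched")

    def set_width(self, dx):
        self._check_stretchable()
        self.x = dx

    def set_height(self, dy):
        self._check_stretchable()
        self.y = dy


class Square(Rectangle):
    def __init__(self, point, x):
        super().__init__(point, x, x, stretchable=False)


sq = Square((0, 0), 10)
try:
    sq.set_height(20)
except ValueError as err:
    print(err)  # Square can't be stretched
```

Callers of `Rectangle` now have to be aware of the flag, which is the price of this approach.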

## Typestates

Typestates are a way to enforce the state of our data/application with strict types: each state gets its own type, so only the operations that make sense in that state are available.


Let's assume I have 24 human volunteers in a combined fMRI + questionnaire study. I keep them all in a single DataFrame for brevity and ease of use, but in effect they're at different stages of my experiment. A few were just recruited last week, and I haven't even set a date for our first meeting yet. A few others were already scanned in the magnet once, but still have to go through my second questionnaire session.

My application monitors these volunteers, alerts me of upcoming meeting dates, and (of course) analyzes the results of the questionnaires and scans.

The __correctness__ of this application can be enforced in many ways - tests, mock data, daily use - but here I choose to show another mechanism - typestates. The fact that the current status of each volunteer isn't specified with a simple string in a table, but is actually a different class altogether, is another way to make sure that I always receive the expected output from each method call.

In [26]:
import copy
import datetime
import pandas as pd


# Helper types
class Name:
    """ First and last name """
    # Implementation omitted


class Age:
    """ Special age type """
    # Implementation omitted


class FmriResult:
    """ Results from an fMRI scan """
    # Implementation omitted


# Volunteer types
class Volunteer:
    """ Base class for all volunteers in my project """
    def __init__(self, name: Name, age: Age, call_date: datetime.date, vol_id: int):
        self.name = name
        self.age = age
        self.call_date = call_date
        self.id = vol_id
        self.metadata: dict = {}  # stage-specific data, filled in by subclasses

    def __str__(self):
        return f"{self.name}, age {self.age}, first called at {self.call_date}."

    def advance(self, result: FmriResult, next_date: datetime.date) -> 'Volunteer':
        """ Move to the next stage of the experiment - each stage implements this """
        raise NotImplementedError

    def update_df(self, records: pd.DataFrame) -> pd.DataFrame:
        """ Add the instance to the dataframe containing the rest of the data """
        record = pd.DataFrame([dict(name=self.name, age=self.age,
                                    call_date=self.call_date, id=self.id,
                                    metadata=self.metadata, stage=type(self),
                                    instance=copy.copy(self))])
        return pd.concat([records, record], ignore_index=True)

    def remove_from_df(self, records: pd.DataFrame) -> pd.DataFrame:
        """ Remove the instance from the volunteer records """
        return records[records['id'] != self.id]


class PreScanOne(Volunteer):
    """ Volunteer before the first session """
    loc = 0  # ordinal place in hierarchy

    def __init__(self, name: Name, age: Age, call_date: datetime.date, vol_id: int,
                 scan_one_date: datetime.date):
        super().__init__(name, age, call_date, vol_id)
        self.metadata = dict(scan_one_date=scan_one_date)

    def advance(self, result: FmriResult, next_date: datetime.date) -> 'PostScanOne':
        """ Advance a PreScanOne to a PostScanOne """
        return PostScanOne(self, result, next_date)


class PostScanOne(Volunteer):
    """ Volunteer after the first session """
    loc = 1

    def __init__(self, pre_volunteer: PreScanOne, scan_one_data: FmriResult,
                 scan_two_date: datetime.date):
        super().__init__(pre_volunteer.name, pre_volunteer.age,
                         pre_volunteer.call_date, pre_volunteer.id)
        self.metadata = dict(pre_volunteer.metadata)  # copy, don't share the dict
        self.metadata['scan_one_data'] = scan_one_data
        self.metadata['scan_two_date'] = scan_two_date

    def advance(self, result: FmriResult, next_date: datetime.date) -> 'PreScanTwo':
        """ Advance a PostScanOne to a PreScanTwo """
        # PreScanTwo (and the later stages) follow the same pattern; omitted here
        return PreScanTwo(self, result, next_date)


# Examples of generic functions that use this interface
def advance_volunteer(old_vol: Volunteer, results: FmriResult,
                      next_date: datetime.date, records: pd.DataFrame):
    """
    Move volunteer to next step in the experiment, returning the new
    instance and records.
    """
    records = old_vol.remove_from_df(records)
    new_vol = old_vol.advance(results, next_date)
    records = new_vol.update_df(records)
    return new_vol, records


def process_data(records: pd.DataFrame) -> list:
    """ Run the same processing function over all fMRI data """
    results = []
    for vol in records['instance']:
        # Only stages that hold fMRI data define a process_data() method
        if hasattr(vol, 'process_data'):
            results.append(vol.process_data())
    return results

This is long, but interesting, so let's try to break it down.

At the beginning we have a few helper classes which I declared but didn't implement. These shouldn't look strange to you. We discussed in class how an `Age` type is an important example of defining our own types in a program, since an age is neither an integer nor a floating point number.

The second part is the most interesting. We have a base class called `Volunteer` which contains the basic information common to all experiment volunteers. But it's actually more than that: it also defines the _interface_ shared by the classes, requiring them to have specific attributes and methods that comply with this protocol, which links their behavior together.

The other two classes inherit from `Volunteer` and represent the first two steps in the "volunteer path", as their `loc` class variable signifies. From phase one (`PreScanOne`), a volunteer can only advance forward to step 2 (or drop out of the experiment), and likewise from step 2 to 3. You'll always find the same `.advance()` method that takes you to the next step, even though its implementation differs slightly in each class. To handle the variability in the data each stage holds, we have the `metadata` attribute, which can hold different parameters and data points.

The last part shows how to use such an interface. We have a function that advances an instance "one step" to the next phase, and a function that runs some processing on the data held inside the instances. We can add as many functions (and classes) as we wish; the design is completely extensible since the API is well defined.
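The core of the typestate idea can be seen in a much smaller, self-contained sketch. The `Recruited` and `Scanned` classes below are hypothetical and unrelated to the classes above: the point is that a volunteer who hasn't been scanned simply has no `process_data()` method, and the only way to get a `Scanned` object is through the `scan()` transition.

```python
import datetime


class Recruited:
    """ Hypothetical first state: no scan data exists yet """
    def __init__(self, vol_id: int, scan_date: datetime.date):
        self.vol_id = vol_id
        self.scan_date = scan_date

    def scan(self, result: str) -> "Scanned":
        # The only way to obtain a Scanned volunteer is to scan a Recruited one
        return Scanned(self.vol_id, result)


class Scanned:
    """ Hypothetical second state: scan results are guaranteed to exist """
    def __init__(self, vol_id: int, scan_result: str):
        self.vol_id = vol_id
        self.scan_result = scan_result

    def process_data(self) -> str:
        return f"processed {self.scan_result}"


r = Recruited(7, datetime.date(2018, 5, 20))
# r.process_data()  # AttributeError at runtime; mypy would flag it before running
s = r.scan("run-1")
print(s.process_data())  # processed run-1
```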

## Helper Concepts and Libraries

In practice, good and clear software design can be aided using unique Python features and packages. We'll review a few of the more prominent ones:

### Type Annotations and MyPy

Python 3 lets you annotate function signatures, and since version 3.6 variables can be annotated too, allowing this syntax:

In [2]:
from typing import Tuple, Dict

def doer_of_stuffs(a: float, b: int, c: str = 'ccc') -> Tuple[str, Dict[int, float]]:
    """
    Does stuff to a, b, and c.
    Returns: A tuple of a string and a dictionary mapping ints to floats
    """
    a_helper: float = a + 2
    b_helper: float = b / 3
    int_a = int(a_helper)
    c2: str = c + c
    return c2, {b: a_helper, int_a: b_helper}

While a bit more verbose, these _type annotations_ make things clearer when dealing with large codebases. Knowing the declared types of your variables as they bounce around between modules and functions can help tremendously when debugging your code. Note that the interpreter doesn't enforce annotations at runtime: they're hints for readers and for tools.
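A quick way to convince yourself that annotations are only hints: the hypothetical `half()` below is annotated as taking and returning an `int`, yet Python happily runs it with a float. The annotations are simply stored on the function object.

```python
def half(x: int) -> int:
    """ Annotated as int -> int, but nothing checks this at runtime """
    return x // 2


print(half(7))    # 3
print(half(7.0))  # 3.0 - a float slips through without any error
print(half.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
```

Catching the second call is exactly the job of an IDE or a static checker.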

Moreover, modern IDEs like PyCharm and VSCode can alert you to possible type errors before you run the code. For example:

In [3]:
import numpy as np


def main():
    a = 3  # integer
    a /= 2  # now it's a float
    arr = np.array([1, 2, 3])
    
    # ... lots of code here
    
    b = arr[a]  # IndexError at runtime - NumPy arrays can't be indexed with a float

PyCharm and VSCode can highlight this `arr[a]` expression as a likely error before you ever run the code.

A more thorough approach is `mypy`. It began as Jukka Lehtosalo's research project and was developed further at Dropbox, a company very reliant on its Python-based product. As the Dropbox codebase grew, its engineers wanted to keep using Python for its many strengths while avoiding the problems that come with a dynamically typed language. In essence, `mypy` is a command-line tool that runs static type checks over your entire code base, verifying the type-correctness of your application. Many teams require a clean `mypy` run before changes are committed to the code base.

`mypy` supports both comment-based type annotations, for older versions of Python (Dropbox, as of early 2018, is still using Python 2.7), and the new style of type annotations shown above. A companion tool from Dropbox, `PyAnnotate`, can generate draft type annotations by recording the types it observes while your application runs.
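For reference, a small sketch of the comment-based style (the function and variable names are hypothetical). The signature comment goes on the line right after the `def`, before the docstring, and the code stays valid in Python 2 and 3 because the types live only in comments:

```python
from typing import List


def repeat_greeting(name, times):
    # type: (str, int) -> str
    """ Python 2-compatible signature: the types live in a comment """
    return ", ".join(["Hello " + name] * times)


counts = []  # type: List[int]

print(repeat_greeting("Bob", 2))  # Hello Bob, Hello Bob
```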

An example can be found in the `mypy_demo` folder.

### Enumerations

Python added enumeration support in Python 3.4, and enumerations are starting to pop up more and more in new code bases. An enumeration is a list of discrete possible values. Assume I have a simple addition function:

In [6]:
def my_add(a, b, add=True):
    """ Simple addition/subtraction """
    return a + b if add else a - b

The list of possible values for `a` and `b` is endless, so these cannot be enumerated. The `add` keyword is called a "flag", since it has two possible values - `True` and `False`. It's an enumeration of two possible values.

When we have more than two options, or when our two options aren't simply booleans, we can use an enumeration. Here's a simple example:

In [7]:
import pandas as pd


rng = pd.date_range('1/1/2018',periods=100, freq='D')  # 'D' is days
rng

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12',
               '2018-01-13', '2018-01-14', '2018-01-15', '2018-01-16',
               '2018-01-17', '2018-01-18', '2018-01-19', '2018-01-20',
               '2018-01-21', '2018-01-22', '2018-01-23', '2018-01-24',
               '2018-01-25', '2018-01-26', '2018-01-27', '2018-01-28',
               '2018-01-29', '2018-01-30', '2018-01-31', '2018-02-01',
               '2018-02-02', '2018-02-03', '2018-02-04', '2018-02-05',
               '2018-02-06', '2018-02-07', '2018-02-08', '2018-02-09',
               '2018-02-10', '2018-02-11', '2018-02-12', '2018-02-13',
               '2018-02-14', '2018-02-15', '2018-02-16', '2018-02-17',
               '2018-02-18', '2018-02-19', '2018-02-20', '2018-02-21',
               '2018-02-22', '2018-02-23', '2018-02-24', '2018-02-25',
      

In [9]:
rng = pd.date_range('1/1/2018',periods=100, freq='M')  # it can also be 'M'
rng

DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
               '2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31',
               '2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
               '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31',
               '2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31',
               '2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
               '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31',
               '2021-09-30', '2021-10-31', '2021-11-30', '2021-12-31',
               '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
      

What are the possible values for the `freq` keyword? Day is `D`, month is `M`, year will probably be `Y`. Are there any more keywords? Will `d` also work, or do I have to use a capital `D`? Checking the [official](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html) documentation doesn't help much, either.

This is where enumerations come into play. This could've been simpler if we could only choose a value from a list of possible values:

In [10]:
from enum import Enum


class DateRangeFreq(Enum):
    D = 'days'
    M = 'months'
    Y = 'years'

rng = pd.date_range('1/1/2018',periods=100, freq=pd.DateRangeFreq.D)  # doesn't actually work...

AttributeError: module 'pandas' has no attribute 'DateRangeFreq'

If we were unsure of the available parameters, we could import the `DateRangeFreq` object and inspect its possible values. As you can see, each member has a value associated with it. This value can be an integer, a string or even a Python object.
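Since the `pandas` version above is only imaginary, here is a small working sketch of the same idea built on the standard library. `Freq` and `date_range()` are hypothetical names (this is not the pandas function), and each member's value is a `timedelta` object, showing that enum values need not be plain strings:

```python
import datetime
from enum import Enum


class Freq(Enum):
    """ Hypothetical enum: each member's value is a timedelta object """
    DAY = datetime.timedelta(days=1)
    WEEK = datetime.timedelta(weeks=1)


def date_range(start: datetime.date, periods: int, freq: Freq) -> list:
    """ Hypothetical helper: `periods` dates spaced by `freq` """
    if not isinstance(freq, Freq):
        raise TypeError(f"freq must be a Freq member, got {freq!r}")
    return [start + i * freq.value for i in range(periods)]


# Inspecting the possible values is trivial:
print([member.name for member in Freq])  # ['DAY', 'WEEK']
print(date_range(datetime.date(2018, 1, 1), 3, Freq.WEEK))
# [datetime.date(2018, 1, 1), datetime.date(2018, 1, 8), datetime.date(2018, 1, 15)]
```

A typo such as `Freq.WEAK` fails immediately with an `AttributeError`, and passing a raw string like `'W'` is rejected by the explicit check.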

Enumerations are still hard to find in the Python ecosystem. They're a recent addition, and Pythonistas are used to typing strings in their function parameters, not enumerations. But in many other languages with native enum support these data structures are very common for this use case, as well as others. If you're writing a piece of code intended for a Python 3.4+ audience, I suggest you use enumerations liberally.

### `attrs` - Classes without boilerplate

Python classes are extremely useful, but they're also pretty verbose. They require you to write a lot of code for very basic operations.

For example, in the `__init__()` method you have to go through each variable in the function signature and assign it to an attribute of `self`:

In [11]:
class Example:
    def __init__(self, param1, param2, param3, param4):
        self.param1 = param1
        self.param2 = param2
        self.param3 = param3
        self.param4 = param4
    
    def my_method(self):
        """ Do stuff """
        pass

So many lines of repetitive code doing basically nothing. I didn't assert the types of the variables, I didn't do some basic pre-processing - this is called "boilerplate" code. Python requires me to write these tedious lines every time I create a class, and when classes get bigger and bigger, these assignments can be a hassle to write.

`attrs` to the rescue:

In [12]:
import attr
from attr.validators import instance_of


@attr.s
class ExampleTwo:
    param1 = attr.ib(validator=instance_of(int))
    param2 = attr.ib(validator=instance_of(float))
    param3 = attr.ib(default='no')
    param4 = attr.ib(default=attr.Factory(list))
    
    def my_method(self):
        """ Do stuff """
        pass

That's it. No `__init__` is required, each `paramX` variable is already assigned to `self.paramX`. It also allows the addition of validators, default values, converter functions (not shown), and it even implements the comparison methods (`__eq__`, `__gt__`, etc.) for you. It has a ton of other useful features which I won't go into right now, but you can be sure that it's a package worth using.
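A short usage sketch of the `ExampleTwo` class above (repeated so the block runs on its own), showing the generated `__repr__`, the comparison methods, the per-instance default list and a validator rejecting bad input:

```python
import attr
from attr.validators import instance_of


@attr.s
class ExampleTwo:
    param1 = attr.ib(validator=instance_of(int))
    param2 = attr.ib(validator=instance_of(float))
    param3 = attr.ib(default='no')
    param4 = attr.ib(default=attr.Factory(list))


ex = ExampleTwo(1, 2.0)
print(ex)                        # ExampleTwo(param1=1, param2=2.0, param3='no', param4=[])
print(ex == ExampleTwo(1, 2.0))  # True - generated __eq__ compares the fields
print(ex < ExampleTwo(2, 2.0))   # True - fields are compared in order, like tuples

try:
    ExampleTwo(1, 2)  # 2 is an int, not a float
except TypeError:
    print("Validator rejected the input")
```

Note that `attr.Factory(list)` gives every instance its own fresh list, avoiding the classic mutable-default-argument bug.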

I can testify that 95% of the classes I write today are `attrs` classes, and many fellow Pythonistas would say the same. I encourage you to read the [official documentation](http://www.attrs.org/en/stable/?badge=stable) and start using it ASAP.

### Dimensionality analysis and units

When working with numbers that have units, it's usually a good idea to keep the physical units attached to the values for as long as possible.

When you're measuring the local field potential using some electrode array, it's good practice to verify that throughout your entire processing pipeline, the voltage values aren't divided by a number with units of time, because units of _[Volts] / [seconds]_ usually have no physical meaning. Units can also help you assert that your dF/F calculation is indeed dimensionless, and doesn't carry some other arbitrary units.

There are many options in the Python world for dimensionality analysis. If you're using Python to write symbolic math and solve equations, I suggest you use SymPy's `physics.units` module. Otherwise, use `pint`.

In [13]:
import pint


ureg = pint.UnitRegistry()
3 * ureg.meter + 4 * ureg.cm

In [15]:
measures = ureg.Quantity(np.random.random(100), 'volts')
print(measures)

[0.67840601 0.41680269 0.03569134 0.15617332 0.08101041 0.61858173 0.07591067 0.9391682  ...

In [17]:
print(measures * 2)

[1.35681202 0.83360537 0.07138268 0.31234664 0.16202082 1.23716346 0.15182134 1.8783364  ...

In [21]:
amps = measures / (2 * ureg.ohm)  # I = V/R
amps.dimensionality

<UnitsContainer({'[current]': 1.0})>

In [22]:
amps.to('seconds')  # DimensionalityError

DimensionalityError: Cannot convert from 'volt / ohm' ([current]) to 'second' ([time])

For some projects this can be overkill, but for others it can prevent many "silent" bugs.

## Design vs. Productivity

Before we start exercising, one important note to remember: there's a thin line between under- and over-engineering. Very small scripting projects require almost no engineering at all. After a few more months of experience with Python, the structure of a small scripting job will probably be obvious to you right from the get-go. You'll know which data structures you'll have, whether or not you'll need a class or two, and how the user interface might look.

On the other hand, large applications spanning at least a few thousand lines of code will always need _some_ form of pre-planning. It would be senseless not to sketch a diagram of the main modules in your code and their interfaces. One can consider this common knowledge, or simple programmer's instinct. Just as architects sit down and plan the structure they're about to build months in advance, programmers should spell out the architecture of their own programs. This in no way guarantees you'll get the architecture right the first time, but the design can serve as good building blocks when you start the refactoring process.

Problems mostly occur with medium-sized scripts, up to a couple thousand lines. These scripts usually start out small (a few functions that deal with file I/O and display of data) but can grow quite quickly once you start adding functionality. When the script was short you probably didn't even write tests, since you were sure it was an insignificant piece of code, and now it starts biting back at you.

It's hard to write rules for these occasions. When someone asks me to add functionality to some short script I wrote, I sometimes tell them it will take more time than the feature alone would need, since I want to devote time to refactoring the code so that the new functionality fits in naturally.

It's also good practice to use classes to bind data and methods together, even when you think they might be overkill. It's much easier to extend the functionality of classes than that of an assortment of functions.