(code-advanced)=
# Advanced Coding

```{note}
If you're just starting to code, you can safely skip this chapter.
```

## Introduction

This chapter covers some more advanced programming concepts. You don't strictly need it to follow the rest of the book, but it's here in case you want a deeper understanding or find that you eventually need to draw on more sophisticated programming tools and concepts.

This chapter has benefitted from the online book [*Research Software Engineering with Python*](https://merely-useful.github.io/py-rse/), the [official Python documentation](https://www.python.org/), the excellent [30 days of Python](https://github.com/Asabeneh/30-Days-Of-Python), and the [Hitchhiker's Guide to Python](https://docs.python-guide.org/).

## Sets

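A set in coding is an unordered, unindexed collection of distinct elements (in analogy to the mathematical definition of a set). To define an empty set, use `set()`. Note that empty curly braces, `{}`, create an empty *dictionary* rather than a set:

In [None]:
st = set()
# {} is an empty dictionary, not an empty set
type(st), type({})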

These aren't very interesting though! Here's a set with some values in:


In [None]:
people_set = {"Robinson", "Fawcett", "Ostrom"}

What can we do with it? We can check its length using `len(people_set)` and we can ask whether a particular entry is contained within it:

In [None]:
"Ostrom" in people_set

We can add multiple items or another set in place using `.update` (`.union` instead returns a new, combined set), or add a single item using:
 

In [None]:
people_set.add("Martineau")
people_set

We can remove a specific entry with `.remove(entry_name)` or remove and return an arbitrary entry with `.pop()` (sets are unordered, so there is no 'last' entry). You can easily convert between lists and sets:

In [None]:
list(people_set)
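
To make the methods mentioned above concrete, here is a minimal sketch of `.update`, `.union`, `.remove`, and `.pop`. The set `authors` and the names in it are illustrative only.

```python
# Illustrative set; the names are just examples
authors = {"Robinson", "Fawcett"}

# .update changes the set in place
authors.update({"Ostrom", "Martineau"})

# .union returns a new set and leaves the original untouched
bigger = authors.union({"Goldin"})

# .remove deletes a named entry (and raises KeyError if it is missing)
authors.remove("Fawcett")

# .pop removes and returns a single entry; sets are unordered,
# so which entry comes out is arbitrary
popped = authors.pop()

print(len(authors), len(bigger))
```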

The real benefit of sets, though, is that they support set operations. The most important are `intersection`,

In [None]:
st1 = {"item1", "item2", "item3", "item4"}
st2 = {"item3", "item2"}
st1.intersection(st2)

`difference`,

In [None]:
st1 = {"item1", "item2", "item3", "item4"}
st2 = {"item2", "item3"}
st1.difference(st2)

and symmetric difference,

In [None]:
st1 = {"item1", "item2", "item3", "item4"}
st2 = {"item2", "item3"}
st2.symmetric_difference(st1)
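
These operations also have operator shorthands: `&` for intersection, `-` for difference, `^` for symmetric difference, and `|` for union. A short sketch, using two small illustrative sets of numbers:

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

both = evens & small  # intersection
only_evens = evens - small  # difference
either_not_both = evens ^ small  # symmetric difference
everything = evens | small  # union

print(both, only_evens, either_not_both, everything)
```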

## Truthy and falsy values

Python objects can be used in expressions that expect a boolean value, such as when a list, `listy`, is used with `if listy`. Built-in Python objects that are empty usually evaluate as `False`, and are said to be 'falsy'. In contrast, when these built-in objects are not empty, they evaluate as `True` and are said to be 'truthy'.

(If you are building your own classes, you can define this behaviour for them through the `__bool__` dunder method.)
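
As a sketch of how that works, the hypothetical `Basket` class below counts as falsy when it holds no items. Both the class and its attribute names are invented for illustration.

```python
class Basket:
    """A hypothetical container used only to illustrate __bool__."""

    def __init__(self, items=None):
        self.items = list(items) if items is not None else []

    def __bool__(self):
        # The basket is truthy only when it holds at least one item
        return len(self.items) > 0


empty_basket = Basket()
full_basket = Basket(["apple", "pear"])

print(bool(empty_basket), bool(full_basket))
```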

Let's see some examples:

In [None]:
def bool_check_var(input_variable):
    if not (input_variable):
        print("Falsy")
    else:
        print("Truthy")


listy = []
other_listy = [1, 2, 3]


bool_check_var(listy)

In [None]:
bool_check_var(other_listy)

The function we defined doesn't just operate on lists; it'll work for many other truthy and falsy objects:

In [None]:
bool_check_var(0)

In [None]:
bool_check_var([0, 0, 0])

Note that zero was falsy (it is the 'nothing' of the numeric types), but a list of three zeros is not an empty list, so it evaluates as truthy.

In [None]:
bool_check_var({})

In [None]:
bool_check_var(None)

Knowing what is truthy or falsy is useful in practice; imagine you'd like to default to a specific behaviour if a list called `list_vals` doesn't have any values in. You now know you can check for this simply with `if not list_vals`.
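
A minimal sketch of that pattern is below; the function name `mean_or_default` and its default value are hypothetical choices for illustration.

```python
def mean_or_default(list_vals, default=0.0):
    """Return the mean of list_vals, or a default if it is empty."""
    if not list_vals:
        # An empty list is falsy, so we fall back to the default
        return default
    return sum(list_vals) / len(list_vals)


print(mean_or_default([]))
print(mean_or_default([2, 4, 6]))
```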

## Lambda functions

Lambda functions are a very old idea in programming, and are part of the functional programming paradigm. Coding languages tend to be more imperative (a family that includes object-oriented languages) or more functional, with the imperative approach tracing back to Alan Turing's "Turing machines" and the functional approach to Alonzo Church's "lambda calculus". These two approaches are mathematically equivalent and, on a more practical note, high-level programming languages often mix both. As examples, Haskell is strongly a functional language, statistics language R leans toward being more functional, Python is slightly more object-oriented, and powerhouse languages like Fortran and C are imperative. However, despite being less functional than some languages, Python does have lambda functions, for example:

In [None]:
plus_one = lambda x: x + 1
plus_one(3)

For a one-liner function that has a name it's actually better practice here to use `def plus_one(x): return x + 1`, so you shouldn't see this form of lambda function too much in the wild. However, you are likely to see lambda functions being used with dataframes and other objects. For example, if you had a dataframe with a column of strings called 'strings' that you want to change to "Title Case" and in which you want to replace one phrase with another, you could use lambda functions to do that (there are better ways of doing this but this is useful as a simple example):

In [None]:
import pandas as pd

df = pd.DataFrame(
    data=[["hello my blah is Ada"], ["hElLo mY blah IS Adam"]],
    columns=["strings"],
    dtype="string",
)
df["strings"].apply(lambda x: x.title().replace("Blah", "Name"))

More complex lambda functions can be constructed, eg `lambda x, y, z: x + y + z`. One of the best use cases of lambdas is when you *don't* want to go to the trouble of declaring a function. For example, let's say you want to compose a series of functions and you want to specify those functions in a list, one after the other. Using functions alone, you'd have to define a new function for each operation. With lambdas, it would look like this (again, there are easier ways to do this operation, but we'll use simple functions to demonstrate the principle):

In [None]:
number = 1
for func in [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]:
    number = func(number)
    print(number)

Note that people often use `x` by convention, but there's nothing to stop you writing `lambda horses: horses**2` (apart from the looks your co-authors will give you).
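
Another place lambdas appear often is as a short 'key' function passed to built-ins such as `sorted`. The sketch below, using a small illustrative list of (name, year) tuples, sorts by year and also shows a lambda with more than one argument.

```python
# Illustrative data: (name, year) tuples
people = [("Ostrom", 1933), ("Martineau", 1802), ("Robinson", 1903)]

# The lambda tells sorted to order the tuples by their second element
by_year = sorted(people, key=lambda person: person[1])
print(by_year)

# A lambda with several arguments
add_three = lambda x, y, z: x + y + z
print(add_three(1, 2, 3))
```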

```{admonition} Exercise
Write a lambda function that takes the square root of an input number.
```

If you want to learn more about lambda functions, check out these [short video tutorials](https://calmcode.io/lambda/introduction.html).

## Splat and splatty-splat

You read those right, yes. These are also known as "unpacking operators" for iterables that are fed into functions as arguments (in the form of a tuple) and keyword arguments (in the form of a dictionary) respectively. Splat is `*` and splatty-splat is `**`. Because they unpack, they allow us to efficiently send packages of arguments or keyword arguments into functions without laboriously writing out every single argument.

Because positional arguments are collected as a tuple, `*` is most often used with a tuple, though it will unpack any iterable (a list, for example). Because keyword arguments are collected as a dictionary of key, value pairs, `**` must be used with a dictionary (or another mapping) whose keys are strings.

Let's take a look at splat, which unpacks tuples into function arguments. If we have a function that takes two arguments we can send variables to it in different ways:

In [None]:
def add(a, b):
    return a + b


print(add(5, 10))

func_args = (6, 11)

print(add(*func_args))

The splat operator, `*`, unpacks the variable `func_args` into two different function arguments.

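Perhaps surprisingly, we can use the splat operator *in the definition of a function*. For example, `sum_elements` below collects however many positional arguments it is given into a tuple called `elements`:

In [None]:
def sum_elements(*elements):
    # elements is a tuple holding all of the positional arguments
    return sum(elements)


nums = (1, 2, 3)

print(sum_elements(*nums))

more_nums = (1, 2, 3, 4, 5)

print(sum_elements(*more_nums))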

```{admonition} Exercise
Write a function multiply that multiplies two input numbers, `a` and `b`, together and returns the answer. Send the argument `(10, 12)` to it using the splat operator.
```

Splatty-splat, `**`, unpacks dictionaries into keyword arguments (aka kwargs):

In [None]:
def function_with_kwargs(a, x=0, y=0, z=0):
    return a + x + y + z


print(function_with_kwargs(5))

kwargs = {"x": 3, "y": 4, "z": 5}

print(function_with_kwargs(5, **kwargs))
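
Just as `*` can appear in a function definition, so can `**`, in which case any keyword arguments passed in are gathered into a dictionary. A small sketch, with a hypothetical function `describe_kwargs`:

```python
def describe_kwargs(**kwargs):
    """Collect any keyword arguments into a dictionary and return it."""
    for key, value in kwargs.items():
        print(f"{key} = {value}")
    return kwargs


collected = describe_kwargs(x=3, y=4)
```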

```{admonition} Exercise
Using a dictionary and splatty-splat with the `function_with_kwargs` function, find the sum of 9, 6, 13, and 2.
```

## Higher order functions

Functions are like any other variable in Python, which means you can do some interesting things with them and, well, it can get a bit *meta*. For example, a function can take one or more functions as parameters, a function can be returned as a result of another function, functions can be defined within functions, a function can be assigned to a variable, and you can iterate over functions (for example, if they are in a list).

Here's an example that shows how to use a higher order function: it accepts a function, `f`, as an argument and then, using the splat operator `*`, it accepts all arguments of that function.

In [None]:
def join_a_string(str_list):
    return " ".join(str_list)


def higher_order_function(f, *args):
    """Lowers case of result"""
    out_string = f(*args)
    return out_string.lower()


result = higher_order_function(join_a_string, ["Hello", "World!"])
print(result)

In the next example, we show how to return a function from a function (assigning a function, `result`, to a variable in the process):

In [None]:
def square(x):
    return x ** 2


def cube(x):
    return x ** 3


def higher_order_function(kind):  # a higher order function returning a function
    # "kind" is used rather than "type" to avoid shadowing the built-in type
    if kind == "square":
        return square
    elif kind == "cube":
        return cube


result = higher_order_function("square")
print(f"Using higher_order_function('square'), result(3) yields {result(3)}")
result = higher_order_function("cube")
print(f"Using higher_order_function('cube'), result(3) yields {result(3)}")

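Functions can also be defined within functions; these are known as nested (or inner) functions. When an inner function uses variables from the function that encloses it, it is called a *closure*. Here's a simple (if contrived) example of a nested function: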

In [None]:
from datetime import datetime


def print_time_now():
    def get_curr_time():
        return datetime.now().strftime("%H:%M")

    now = get_curr_time()
    print(now)


print_time_now()
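
An inner function can also remember variables from the function that encloses it, even after the outer function has finished running. The sketch below uses a hypothetical `make_multiplier` function to show this.

```python
def make_multiplier(factor):
    def multiply_by(x):
        # factor is remembered from the enclosing call to make_multiplier
        return x * factor

    return multiply_by


double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5), triple(5))
```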

Finally, let's see how to iterate over functions:

In [None]:
def square_root(x):
    return x ** (0.5)


functions_list = [square_root, square, cube]

for func in functions_list:
    print(f"{func.__name__} applied to 4 is {func(4)}")

## Iterators

An iterator is an object that contains a countable number of values that a single command, `next`, iterates through. Before that's possible though, we need to take a countable group of some kind and use the built-in `iter` function on it to turn it into an iterator. Let's see an example with some text:

In [None]:
text_lst = ["Mumbai", "Delhi", "Bangalore"]

myiterator = iter(text_lst)

Okay, nothing has happened yet, but that's because we didn't call it yet. To get the next iteration, whatever it is, use `next`:

In [None]:
next(myiterator)

In [None]:
next(myiterator)

In [None]:
next(myiterator)

Alright, we've been through all of the values so... what's going to happen `next`!?

```python
next(myiterator)
```

```python
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-27-29fb3b4dbbec> in <module>
----> 1 next(myiterator)

StopIteration: 
```

Iterating beyond the end raises a `StopIteration` error because we reached the end. To keep going indefinitely, use `cycle` from the built-in `itertools` module in place of `iter`. Note that you can build your own iterators (here we used a built-in object type, the `list`, to create an iterator of type `list_iterator`).
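
Here is a sketch of both ideas. `cycle` and `islice` come from the standard library's `itertools` module; the `Countdown` class is a hypothetical example of a hand-built iterator, which needs an `__iter__` and a `__next__` method.

```python
from itertools import cycle, islice

# cycle loops over the values forever, so we take just the first five
cities = cycle(["Mumbai", "Delhi", "Bangalore"])
first_five = list(islice(cities, 5))
print(first_five)


class Countdown:
    """A hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))
```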

## Generators

Generator functions return 'lazy' iterators. They are lazy because they do not store their contents in memory. This has *big* advantages for some operations in specific situations: datasets larger than can fit into your computer's memory, or a complex function that needs to maintain an internal state every time it’s called.

To give an idea of how and when they work, imagine that (exogenously) integers are really costly, taking as much as 10 MB of space to store (the real figure is more like 28 bytes for a small integer in standard 64-bit Python). We will write a function that represents the first $n$ non-negative integers, where $n$ is large. The most naive possible way of doing this would be to build the full list in memory like so:

In [None]:
def first_n_naive(n):
    """Build and return a list"""
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums


sum_of_first_n = sum(first_n_naive(1000000))
sum_of_first_n

Note that `nums` stores *every* number before returning all of them. In our imagined case, this is completely infeasible because we don't have enough computer space to keep all $n$ 10 MB integers in memory.

Now we'll rewrite the list-based function as a generator-based function:

In [None]:
def first_n_generator(n):
    """A generator that yields items instead of returning a list"""
    num = 0
    while num < n:
        yield num
        num += 1


sum_of_first_n = sum(first_n_generator(1000000))
sum_of_first_n

Now, instead of creating an enormous list that has to be stored in memory, we `yield` up each number as it is 'generated'. The cleverness here is that the 'state' of the function is remembered from one call to the next. This means that when `next` is called on a generator object (either explicitly or implicitly, as in this example), the function resumes where it left off: the previously yielded variable `num` is incremented, and then yielded again.
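
To see this state being remembered, we can call `next` by hand on a small generator. The sketch below defines its own short generator, `count_up_to`, so that it runs on its own.

```python
def count_up_to(n):
    """Yield the integers 0, 1, ..., n - 1 one at a time."""
    num = 0
    while num < n:
        yield num
        num += 1


counter = count_up_to(3)
first = next(counter)  # runs the function up to the first yield
second = next(counter)  # resumes after the yield, increments num, yields again
remaining = list(counter)  # consumes whatever is left
print(first, second, remaining)
```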

That was a fairly contrived example but there are plenty of practical ones. Working with pipelines that process very large datasets is a classic use case. For example, imagine you have a csv file that's far too big to fit in memory, i.e. open all at once, but you'd like to check the contents of each row and perhaps process them. The code below would `yield` each row in turn.

```python
def csv_reader(file_name):
    # The with block closes the file once every row has been read
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row
```

An even more concise way of defining this is via a *generator expression*, which syntactically looks a lot like a *list comprehension* but is a generator rather than a list. The example we just saw would be written as:

```python
csv_gen = (row for row in open(file_name))
```

It's easier to see the difference in the example below, which clearly shows the analogy between *list comprehensions* and *generator expressions*.

In [None]:
sq_nums_lc = [num ** 2 for num in range(2, 6)]
sq_nums_lc

In [None]:
sq_nums_gc = (num ** 2 for num in range(2, 6))
sq_nums_gc

The latter is a generator object, and we can only access its values one at a time, either by calling `next` on it or by looping over it.

In [None]:
next(sq_nums_gc)

Note that for small numbers of entries, lists may actually be faster and more efficient than generators, but for large numbers of entries, generators will almost always win out.
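
One way to see the memory difference is with `sys.getsizeof`, which reports the size of the container object itself (not of the objects it refers to). Exact numbers vary by Python version and platform, so the sketch below prints them rather than assuming particular values.

```python
import sys

n = 1_000_000
squares_list = [num ** 2 for num in range(n)]
squares_gen = (num ** 2 for num in range(n))

# The list's size grows with n; the generator's stays small
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Both give the same total when consumed
print(sum(squares_list) == sum(squares_gen))
```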

## Decorators

Decorators 'decorate' functions: they adorn them, modifying their behaviour when they are called. Let's say we want to run some numerical functions but we'd like to add ten on to whatever results we get. We could do it like this:

In [None]:
def multiply(num_one, num_two):
    return num_one * num_two


def add_ten(in_num):
    return in_num + 10


answer = add_ten(multiply(3, 4))
answer

This is fine for a one-off but a bit tedious if we're going to be using `add_ten` a lot, and on many functions. Decorators allow for a more general solution that can be applied, in this case, to any function that has two arguments and returns a numeric value.

In [None]:
def add_ten(func):
    def inner(a, b):
        return func(a, b) + 10

    return inner


@add_ten
def multiply(num_one, num_two):
    return num_one * num_two


multiply(3, 4)

We can use the same decorator for a different function (albeit one of the same form) now.

In [None]:
@add_ten
def divide(num_one, num_two):
    return num_one / num_two


divide(10, 5)

But the magic of decorators is such that we can define them for much more general cases, regardless of the number of arguments or even keyword arguments:

In [None]:
def add_ten(func):
    def inner(*args, **kwargs):
        print("Function has been decorated!")
        print("Adding ten...")
        return func(*args, **kwargs) + 10

    return inner


@add_ten
def combine_three_nums(a, b, c):
    return a * b - c


@add_ten
def combine_four_nums(a, b, c, d=0):
    return a * b - c - d


combine_three_nums(1, 2, 2)

Let's now see it applied to a function with a different number of (keyword) arguments:

In [None]:
combine_four_nums(3, 4, 2, d=2)

Decorators can be chained too (and order matters):

In [None]:
def dividing_line(func):
    def inner(*args, **kwargs):
        print("".join(["-"] * 30))
        out = func(*args, **kwargs)
        return out

    return inner


@dividing_line
@add_ten
def multiply(num_one, num_two):
    return num_one * num_two


multiply(3, 5)
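
To see why order matters, here is a sketch with two self-contained decorators (the names `add_ten_to` and `double_result` are chosen for this example). The decorator closest to the function is applied first.

```python
def add_ten_to(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs) + 10

    return inner


def double_result(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs) * 2

    return inner


@double_result
@add_ten_to
def add_then_double(a, b):
    return a * b


@add_ten_to
@double_result
def double_then_add(a, b):
    return a * b


print(add_then_double(3, 5))  # (15 + 10) * 2
print(double_then_add(3, 5))  # (15 * 2) + 10
```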

## Errors and exceptions

When a programme goes wrong, it throws up an error and halts. You won't be coding for long before you hit one of these errors, which have special names depending on what triggered them.

Let's see a real-life error in action:

```python
denom = 0

print(1/denom)
```

```python
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-39-e45c0e0a3e37> in <module>
      1 denom = 0
      2 
----> 3 print(1/denom)

ZeroDivisionError: division by zero
```

Oh no! We got a `ZeroDivisionError` and our programme crashed. Note that the error includes a 'Traceback' to show which line went wrong, which is helpful for debugging.

In practice, there are often times when we know that an error *could* arise, and we would like to specify what should happen when it does (rather than having the programme crash). 

We can use *exceptions* to do this. These come in a `try` ... `except` pattern, which looks like an `if` ... `else` pattern but applies to errors. If no errors occur inside the `try` block, the `except` block isn’t run but *if* something goes wrong inside the `try` then the `except` block is executed. Let's see an example:

In [None]:
for denom in [-5, 0, 5]:
    try:
        result = 1 / denom
        print(f"1/{denom} == {result}")
    except ZeroDivisionError:
        print(f"Cannot divide by {denom}")

Now we can see two differences. First: the code executed just fine *without* halting. Second: when we hit the error, the `except` block was executed and told us what was going on.

In this case, we wrote an informative message about the error but it's convenient to use Python's built-in messages where we can. In the example below, not only do we send our own message about the error but we also include Python's own description of what caused it:

In [None]:
for denom in [-5, 0, 5]:
    try:
        result = 1 / denom
        print(f"1/{denom} == {result}")
    except Exception as error:
        print(f"{denom} has no reciprocal; error is: {error}")

Sadly, division by zero is just one of the many errors you might encounter. What if a function is likely to end up running into several different errors? We can have multiple `except` clauses to catch these:

In [None]:
numbers = [-5, 0, 5]
for i in [0, 1, 2, 3]:
    try:
        denom = numbers[i]
        result = 1 / denom
        print(f"1/{denom} == {result}")
    except IndexError as error:
        print(f"index {i} out of range; error is {error}")
    except ZeroDivisionError as error:
        print(f"{denom} has no reciprocal; error is: {error}")

A full list of built-in errors may be [found here](https://docs.python.org/3/library/exceptions.html#exception-hierarchy) and they are nested in classes (eg `ZeroDivisionError` is a special case of an `ArithmeticError`).
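
Because of this nesting, an `except` clause for a parent class also catches its more specific children. A short sketch:

```python
# ZeroDivisionError is a subclass of ArithmeticError
print(issubclass(ZeroDivisionError, ArithmeticError))

try:
    1 / 0
except ArithmeticError as error:
    # Catching the parent class also catches ZeroDivisionError
    caught = type(error).__name__
    print(f"Caught a {caught}: {error}")
```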

Where do these errors come from anyway? What tells the programming language to throw a tantrum when it encounters certain combinations of values and operations?

The answer is that the person or people who wrote the code that's 'under the hood' can specify when such errors should be raised. Remember, the philosophy of Python is that things should fail loudly (so that they do not cause issues downstream). Here's an example of some code that raises its own errors using the `raise` keyword:

In [None]:
for number in [1, 0, -1]:
    try:
        if number < 0:
            raise ValueError(f"no negatives: {number}")
        print(number)
    except ValueError as error:
        print(f"exception: {error}")

A `ValueError` is a built-in type of error and there are plenty of others to choose from for your case. Some big or specialised libraries define their own types of error too.
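
Defining your own error type only requires subclassing an existing one. In the sketch below, `NegativeNumberError` and `checked_sqrt` are hypothetical names invented for illustration.

```python
class NegativeNumberError(ValueError):
    """A hypothetical custom error for negative inputs."""


def checked_sqrt(number):
    if number < 0:
        raise NegativeNumberError(f"no negatives: {number}")
    return number ** 0.5


try:
    checked_sqrt(-4)
except NegativeNumberError as error:
    message = str(error)
    print(f"exception: {message}")
```

Because `NegativeNumberError` inherits from `ValueError`, existing code that catches `ValueError` would catch it too.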

One very clever feature of Python's exception handling is "throw low, catch high", which means that even if an error gets thrown way deep down in the middle of a code block, the catching exception can be used some way away. Here's an example: the error arises *within* the `sum_reciprocals` function, but is caught elsewhere.

In [None]:
def sum_reciprocals(values):
    result = 0
    for v in values:
        result += 1 / v
    return result


numbers = [-1, 0, 1]
try:
    one_over = sum_reciprocals(numbers)
except ArithmeticError as error:
    print(f"Error trying to sum reciprocals: {error}")

Here's an example of combining a `try`, several `except` statements, an `else` that gets executed only if the `try` block finishes without raising an error, and a `finally` that always gets executed.

In [None]:
import datetime

b_year = 1852
name_input = "Ada"


def process_input(name_input, b_year):
    try:
        name = str(name_input)
        year_born = int(b_year)
        age = datetime.datetime.now().year - year_born
        print(f"You are {name}. And your age is {age}.")
    except TypeError:
        print("A type error occurred")
    except ValueError:
        print("A value error occurred")
    except ZeroDivisionError:
        print("A zero division error occurred")
    else:
        print("I run only when the try block raises no error")
    finally:
        print("I always run.")


process_input(name_input, b_year)

## Classes and objects

Python is an object-oriented programming language. Everything is an object (and every object has a type). A class is an object constructor, a blueprint for creating objects. An object is a 'live' instance of a class. Objects are to classes what a yellow VW Beetle is to cars. The class defines the attributes that an object has and the methods that it can perform.

Classes and instances of them are useful in certain situations, the most common being when you need something that has 'state', i.e. it can remember things that have happened to it, carry information with it, and change form.

While you're quite unlikely to need to build classes in economics (unless you're doing something really fancy), some of the biggest Python packages are based around classes so it's useful to understand a bit about how they work, and especially how they have state.

The syntax to create a class is

```python
class ClassName:
  ...code...
```

But it's easiest to show with an example:

In [None]:
# Define a class called Person


class Person:
    def __init__(self, name):
        self.name = name


# Create an instance of the class
p = Person("Adam")

When we check `type`, that's when it gets *really* interesting:

In [None]:
type(p)

Woah! We created a whole new data type named after the class. The class has a constructor method, `__init__`, that, in this case, takes an input variable `name` and assigns it to an *internal* object variable `name`. The `self` variable that you can also see refers to the particular object (instance) being created or acted upon, so `self.name = name` stores the name on that object. We can access any internal variables like this:

In [None]:
p.name

Okay but what's the point of all this? Well we can now create as many objects as we like of class 'Person' and they will have the same structure, but not the same state, as other objects of class 'Person'.



In [None]:
m = Person("Ada")
m.name

This is a very boring class! Let's add a method, which will allow us to change the state of objects. Here, we add a method `increment_age` which is also indented under the `class Person` header. Note that it takes `self` as an input, just like the constructor, but it only acts on objects of type `Person` that have *already* been created.

In [None]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def increment_age(self):
        self.age = self.age + 1


# Create an instance of the class
p = Person("Adam", 231)

print(p.age)
# Call the method increment_age
p.increment_age()
print(p.age)

This very simple method changes the internal state. Just like class constructors and regular functions, class methods can take arguments. Here's an example:


In [None]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def increment_age(self):
        self.age = self.age + 1

    def change_age(self, new_age):
        self.age = new_age


# Create an instance of the class
p = Person("Adam", 231)

print(p.age)
# Call the method change_age
p.change_age(67)
print(p.age)

It can be tedious to have to initialise a class with a whole load of parameters every time. Just like with functions, we can define *default parameters* for classes:

In [None]:
class Person:
    def __init__(self, name="default_name", age=20):
        self.name = name
        self.age = age


p = Person()
p.name

That covers a lot of the basics of classes but if you're using classes in anger then you might also want to look up [inheritance and composition](https://realpython.com/inheritance-composition-python/).

### Dataclasses

The basic classes we created above come with a lot of 'boilerplate': code we need but which is not very surprising. Dataclasses were introduced in Python 3.7 as a way to remove this boilerplate when the classes being created are quite simple. Think of dataclasses as classes with sensible defaults, intended for light object-oriented programming.

A simple example, with a `Circle` class, demonstrates why they are effective. First, the full class way of doing things:



In [None]:
import numpy as np


class Circle1:
    def __init__(self, colour: str, radius: float) -> None:
        self.colour = colour
        self.radius = radius

    def area(self) -> float:
        return np.pi * self.radius ** 2


circle1 = Circle1("red", 2)
circle1

We don't get a very informative message when we call `circle1`, as you can see. At least we can compute its area:

In [None]:
circle1.area()

Now we'll create the same object with dataclasses:

In [None]:
from dataclasses import dataclass


@dataclass
class Circle2:
    colour: str
    radius: float

    def area(self) -> float:
        return np.pi * self.radius ** 2


circle2 = Circle2("blue", 2)
circle2

Right away we get a much more informative message when we call the object, *and* the class definition is a whole lot simpler. Everything else is just the same (just try calling `circle2.area()`).
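
As a sketch of what the decorator generates for us, the example below defines a fresh dataclass (called `Circle3` here to avoid clashing with the classes above, and using the standard library's `math.pi` so that it runs on its own) and shows the automatic `__repr__` and `__eq__` methods.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle3:
    colour: str
    radius: float

    def area(self) -> float:
        return math.pi * self.radius ** 2


circle_a = Circle3("blue", 2)
circle_b = Circle3("blue", 2)

# __repr__ is generated automatically
print(circle_a)
# __eq__ is too: objects with equal fields compare as equal
print(circle_a == circle_b)
```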

## Tests

Tests check that your code is behaving as you expect, even as parts of it change or are updated. They most commonly compare an expected output with what actually comes out of your code for a given input.

Writing tests is often an afterthought for research code, if it's a thought at all. That's understandable. But it can really boost reliability and robustness. This excerpt from [Research Software Engineering with Python](https://merely-useful.github.io/py-rse/testing.html) explains why testing even research code is extremely useful:

> Why is testing research software important? A successful early career researcher in protein crystallography, Geoffrey Chang, had to retract five published papers—three from the journal Science—because his code had inadvertently flipped two columns of data (Miller 2006). More recently, a simple calculation mistake in a paper by Reinhart and Rogoff contributed to making the financial crash of 2008 even worse for millions of people (Borwein and Bailey 2013). Testing helps to catch errors like these.

The Reinhart and Rogoff paper used Excel for its analysis; while Excel has some pros and does some good in the world, my strong recommendation is that it should *not* be used for any important analysis.

Back to testing: if you're writing a *package* of code for others to use, one that will change and grow over time, tests are essential. There are some general guidelines for good testing, such as:

- A testing 'unit' should focus on a single bit of functionality;
- Each test unit must be fully independent (in practice, this means that `setUp()` and `tearDown()` methods need to be defined to prep data or objects to the point where they can be used by the testing unit);
- Tests should run fast if possible;
- If you know how, it's a good idea to use something like a 'Git hook' that runs tests before code is saved in a shared repository;
- If you find a bug in your code, it's good practice to write a new test that targets it; and
- Although long, extremely descriptive names are not very helpful in regular code, they *should* be used for functions that test code because this is the name you will see when the test fails (and you want to know what it refers to).

The most common way to run tests is to add assertions to code. An assertion is a statement that something must be true at a certain point in a program. When an assertion evaluates to true, nothing happens; if it's false, the programme stops with an `AssertionError` (and, optionally, a user-defined error message). Here's an example of an assertion:

In [None]:
positive_num = 5

assert positive_num > 0

If we had run 

```python
positive_num = 5

assert positive_num < 0
```

it would have resulted in

```python
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-55-db0e55fa5cb7> in <module>
      1 positive_num = 5
      2 
----> 3 assert positive_num < 0

AssertionError:
``` 
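
An assertion can also carry a message, written after a comma, which is shown when the check fails. In this sketch the failure is caught with `try` ... `except` purely so that the message can be printed without halting the code.

```python
positive_num = -5

try:
    assert positive_num > 0, f"expected a positive number, got {positive_num}"
except AssertionError as error:
    message = str(error)
    print(f"Assertion failed: {message}")
```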

Putting assertions into code here and there is one way to check that everything is working as you anticipate, but you won't find out about it until you actually run the code! A testing framework allows you to separate checking the code behaves as you'd expect from running the code, so that you can ensure everything works before putting that big run on.

To run lots of tests, we can use a test framework (also called a test runner). A commonly used test framework is [**pytest**](https://docs.pytest.org/en/latest/). Essentially it will run anything that you've flagged as a test in your code and report back to you on whether it passed or not. Here's how it works: tests are put in files whose names begin with `test_`, each test is a function whose name also begins with `test_`, and these functions use `assert` statements to check results.

Normally, your functions would be in a different script but here's an example of a code function and a testing function (that tests it) that you might put in a file called `test_sample.py`.

In [None]:
# content of test_sample.py
def inc(x):
    return x + 1


def test_answer():
    assert inc(3) == 5

To run this, we would enter

```bash
pytest
```

on the command line. This would yield the following report (the test fails because `inc(3)` returns 4 rather than 5):

```bash
=========================== test session starts ============================
platform linux -- Python 3.x.y, pytest-6.x.y, py-1.x.y, pluggy-0.x.y
cachedir: $PYTHON_PREFIX/.pytest_cache
rootdir: $REGENDOC_TMPDIR
collected 1 item

test_sample.py F                                                     [100%]

================================= FAILURES =================================
_______________________________ test_answer ________________________________

    def test_answer():
>       assert inc(3) == 5
E       assert 4 == 5
E        +  where 4 = inc(3)

test_sample.py:6: AssertionError
========================= short test summary info ==========================
FAILED test_sample.py::test_answer - assert 4 == 5
============================ 1 failed in 0.12s =============================
```

As you add more tests, they will be automatically picked up and run by `pytest`.

When writing tests, think about whether you have tested realistic combinations of input parameters, tested all discrete outputs at least once, tested the boundaries of continuous outputs, and ensured that informative errors are raised when things go wrong.
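
As a sketch of what such tests might look like, the example below tests a hypothetical `reciprocal` function, including negative input and a check (using `pytest.raises`) that an informative error is raised. In practice the test functions would live in a file whose name starts with `test_`.

```python
import pytest


def reciprocal(x):
    """Hypothetical function under test."""
    if x == 0:
        raise ValueError("cannot take the reciprocal of zero")
    return 1 / x


def test_reciprocal_of_positive_number():
    assert reciprocal(4) == 0.25


def test_reciprocal_of_negative_number():
    assert reciprocal(-2) == -0.5


def test_reciprocal_of_zero_raises_informative_error():
    # match checks the error message against a regular expression
    with pytest.raises(ValueError, match="reciprocal of zero"):
        reciprocal(0)
```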

If you're using Visual Studio Code, its Python extension comes with built-in support for discovering and running **pytest** tests.

### Testing continuous variables

Imagine we are not testing for an integer, such as 4, but instead for a real number, such as 4.32838. Computers are nothing if not punishingly literal, however, so if evaluation of the code produces 4.32837 instead of 4.32838 the `assert` statement will fail and the test won't pass even if the two numbers are close enough for our purposes. It's even worse than that because computers *cannot* accurately represent real numbers to arbitrary levels of precision. 
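
A quick way to see the representation problem for yourself is the classic sum of 0.1 and 0.2. This sketch uses only the standard library; the tolerance passed to `math.isclose` is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of binary floating point rounding
print(total == 0.3)  # False

# Comparing within a tolerance succeeds
print(math.isclose(total, 0.3, rel_tol=1e-9))  # True
```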

To avoid tests that fail even though nothing's really wrong, most testing packages come with tools for real numbers. Here's an example where we have a parameter alpha that we'd like to test against an expected value of 1 but with a tolerance of 0.01.

In [None]:
import pytest


def complicated_func():
    """A really complicated func"""
    return 0.998


alpha = complicated_func()
expected_alpha = pytest.approx(1.0, abs=0.01)
assert alpha == expected_alpha
print(expected_alpha)

### Test coverage

Production-ready code should be heavily tested. Coverage tools measure how much of your code is actually exercised by your tests and show you where the untested code is. One package to do this is called [**coverage**](https://coverage.readthedocs.io/en/coverage-5.5/). If you already have code tests written, you can run:

```bash
coverage run -m pytest
```

on the command line instead of just running `pytest`. To get a report on the coverage, use

```bash
$ coverage report -m
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
my_program.py                20      4    80%   33-35, 39
my_other_module.py           56      6    89%   17-23
-------------------------------------------------------
TOTAL                        76     10    87%
```

If you would prefer an HTML report over one in the terminal, use `coverage html` to create a report at `htmlcov/index.html`.

## Type annotations and type checkers

Type annotations (also called type hints) were introduced in Python 3.5, with the syntax for annotating variables following in 3.6 (these notes are written in 3.8). If you've used statically typed languages, typing will be familiar to you. Python uses 'duck typing' ("if it walks and quacks like a duck, it is a duck"), which means that if a variable walks like an integer, and talks like an integer, then it gets treated as if it is an integer. Ditto for other variable types. Duck typing is useful if you just want to code quickly and aren't writing production code.

But... there are times when you *do* know what variable types you're going to be dealing with ahead of time and you want to prevent the propagation of the wrong kinds of variable types. In these situations, you can clearly say what variable types are supposed to be. And, when used with some other packages, typing can make code easier to understand, debug, and maintain.

Note that it doesn't have to be all or nothing on type checking: you can add it in gradually, or only where you think it's most important.

Now it's important to be really clear on one point, namely that *Python does **not** enforce type annotations*. But we can use *static type checking* to ensure all types are as they should be in *advance* of running. Before we do that, let's see how we add type annotations. 

This is the simplest example of a type annotation:

In [None]:
answer: int = 42

This explicitly says that `answer` is an integer. Type annotations can be used in functions too:

In [None]:
def increment(number: int) -> int:
    return number + 1
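
To see that annotations really are just hints at run time, here is a small sketch that passes a float to a function annotated as taking an integer. Nothing fails when the code runs; it would be the job of a static type checker to flag the call.

```python
def increment(number: int) -> int:
    return number + 1


# Python happily accepts a float even though the annotation says int
result = increment(1.5)
print(result)  # 2.5

# The annotations are simply stored as metadata on the function
print(increment.__annotations__)
```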

A static type checker uses these type annotations to verify the type correctness of a program without executing it. [**mypy**](http://mypy-lang.org/) is the most widely used static type checker. After installing **mypy**, to run type checking on a file `code_script.py` use

```bash
mypy code_script.py
```

on the command line.

What do you see when you run it? Let's say the content of your script is:

```python
# Contents of code_script.py
def greeting(name: str) -> str:
    return 'Hello ' + name


greeting(3)
```

Running **mypy** on this script would report an error (alongside the file name and line number) like this:

```bash
Argument 1 to "greeting" has incompatible type "int"; expected "str"
```

Here are more of the type annotations that you might need or come across, courtesy of the [**mypy**](http://mypy-lang.org/) documentation:

```python
from typing import List, Set, Dict, Tuple, Optional

# For simple built-in types, just use the name of the type
x: int = 1
x: float = 1.0
x: bool = True
x: str = "test"
x: bytes = b"test"

# For collections, the type of the collection item is in brackets
# (Python 3.9+ only)
x: list[int] = [1]
x: set[int] = {6, 7}

# In Python 3.8 and earlier, the name of the collection type is
# capitalized, and the type is imported from 'typing'
x: List[int] = [1]
x: Set[int] = {6, 7}

# Same as above, but with type comment syntax (Python 3.5 and earlier)
x = [1]  # type: List[int]

# For mappings, we need the types of both keys and values
x: dict[str, float] = {'field': 2.0}  # Python 3.9+
x: Dict[str, float] = {'field': 2.0}

# For tuples of fixed size, we specify the types of all the elements
x: tuple[int, str, float] = (3, "yes", 7.5)  # Python 3.9+
x: Tuple[int, str, float] = (3, "yes", 7.5)

# For tuples of variable size, we use one type and ellipsis
x: tuple[int, ...] = (1, 2, 3)  # Python 3.9+
x: Tuple[int, ...] = (1, 2, 3)

# Use Optional[] for values that could be None
x: Optional[str] = some_function()
# Mypy understands a value can't be None in an if-statement
if x is not None:
    print(x.upper())
# If a value can never be None due to some invariants, use an assert
assert x is not None
print(x.upper())
```
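
The `Optional` pattern above relies on an undefined `some_function`, so here is a self-contained sketch of the same idea. The lookup table `CAPITALS` and the function `find_capital` are hypothetical names invented for this illustration.

```python
from typing import Dict, Optional

# Hypothetical lookup table and function, for illustration only
CAPITALS: Dict[str, str] = {"France": "Paris", "Japan": "Tokyo"}


def find_capital(country: str) -> Optional[str]:
    """Return the capital of country, or None if it isn't in the table."""
    return CAPITALS.get(country)


capital = find_capital("France")
# Narrow Optional[str] to str before calling a string method
if capital is not None:
    print(capital.upper())

missing = find_capital("Atlantis")
print(missing)
```

Without the `is not None` check, a type checker would be expected to complain about calling `.upper()` on a value that might be `None`.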

## Timing and profiling

Timing and profiling your code are useful for understanding where bottlenecks might be. One of the principles of programming is not to optimise too soon, so, if you are reaching for timing and profiling, it should be because you've already hit a bottleneck.

```{note}
This section just deals with diagnosing slow code: how to speed code up will be covered elsewhere.
```

There are different levels of sophistication for timing and profiling. The simplest way to time code is to use the built-in `timeit` module.

In [None]:
import time
import timeit


def f(nsec=1.0):
    """Function sleeps for nsec seconds."""
    time.sleep(nsec)


start = timeit.default_timer()
f()
elapsed = timeit.default_timer() - start
elapsed

If you're working in Jupyter notebooks (which is what this book is written in), you can use the `timeit` magic, a convenient shortcut for timing a line or chunk of code (more on magics, and timeit, [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time)):

In [None]:
%timeit f(0.01)

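`%timeit` will adjust the number of repeats according to how slow the code is to run. Several other magics are available, though the last three in this list only work once the relevant extension (from the **line_profiler** or **memory_profiler** packages) has been installed and loaded: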

- `%time`: Time the execution of a single statement
- `%timeit`: Time repeated execution of a single statement for more accuracy
- `%prun`: Run code with the profiler
- `%lprun`: Run code with the line-by-line profiler
- `%memit`: Measure the memory use of a single statement
- `%mprun`: Run code with the line-by-line memory profiler
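
Outside a notebook, where magics aren't available, the built-in `timeit` module offers similar functionality. The following is a minimal sketch; the function `total` and the repeat counts are arbitrary choices for illustration.

```python
import timeit


def total(n=1_000):
    """A cheap function to time."""
    return sum(range(n))


# Run the function 1,000 times per measurement, and take 5 measurements
times = timeit.repeat(total, number=1_000, repeat=5)

# The minimum is usually the least noisy estimate; convert to time per call
best_per_call = min(times) / 1_000
print(f"{best_per_call:.2e} seconds per call")
```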

Another way to time functions is via decorators:

In [None]:
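import functools
import time
import timeit


def process_time(f):
    """Decorator that prints how long each call to f takes."""

    @functools.wraps(f)  # keep the name and docstring of the wrapped function
    def func(*args, **kwargs):
        start = timeit.default_timer()
        result = f(*args, **kwargs)
        print(timeit.default_timer() - start)
        return result  # pass the wrapped function's result back to the caller

    return func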


@process_time
def f1(nsec=1.0):
    """Function sleeps for nsec seconds."""
    time.sleep(nsec)


f1()

Timing is very simple and just tells you the total time of what's in between the start and end points. Often, you would want a bit more detail about what's being slow, or you may wish to profile an entire script. For that, there's `cProfile`, which is a module in the standard library. You can use it on the command line or within scripts, like this:

In [None]:
import time
import cProfile


def foo():
    time.sleep(1.0)
    time.sleep(2.5)


cProfile.run("foo()")

What do we get here? `ncalls` is the number of times each function was called. `tottime` is the time spent in the function excluding any sub-functions, while `cumtime` is the cumulative time spent in the function including its sub-functions. The `percall` columns are `tottime` and `cumtime` (respectively) divided by `ncalls`.
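
For longer programs the raw output can run to many lines, so it can help to sort and truncate it. Here is a sketch that does so with the standard library's `pstats` module; the functions `square` and `work` are made up for illustration.

```python
import cProfile
import io
import pstats


def square(i):
    return i * i


def work():
    return sum(square(i) for i in range(10_000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and keep only the five most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```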

There are various profilers out there that extend the basics provided by **cProfile**. Let's look at a few others to get a sense of what extra they offer.

[**pyinstrument**](https://github.com/joerick/pyinstrument) attempts to improve on the standard profilers by recording the 'entire stack' of methods so that identifying expensive sub-calls to methods is easier. There are two ways to run it: on the command line, using `pyinstrument script.py` instead of `python script.py`, or by wedging the code you're interested in *within* a script between a `.start()` and `.stop()` method like this:

In [None]:
from pyinstrument import Profiler


profiler = Profiler()
profiler.start()


def fibonacci(n):
    if n < 1:
        raise ValueError("n must be a positive integer")
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


fibonacci(20)

profiler.stop()

# Print the results of the profiling to screen
print(profiler.output_text(unicode=True, color=True))

Because our function, `fibonacci`, is recursive, the function call breakdown that **pyinstrument** produces is likewise nested so that you can see how the calls progress.

The output analysis can be produced as interactive HTML too: `profiler.output_html()` returns the report as an HTML string that you can write to a file.

Time to execute code is not the only thing we care about. In fact, for economics applications, it's much more likely to be a memory bottleneck than a speed one. Operations that involve lots of data, large matrices, or both, might be memory hogs. If you hit the memory (RAM) limit on your machine, it will slow right down and the process that's running might just crash completely. So wouldn't it be great if we had a way to profile both code and memory? Well, we do.

[**scalene**](https://github.com/plasma-umass/scalene) is a command-line utility for profiling code execution time *and* memory. It profiles whole scripts (run `scalene script.py` on the command line) and produces HTML reports. Here's an example of some output:

![Screenshot from scalene running on a problem](https://raw.githubusercontent.com/plasma-umass/scalene/master/docs/images/sample-profile-pystone.png)

What's particularly useful is having the part of the code where the bottleneck is displayed on the right. Memory usage over time tells you where the memory-hogging lines are, while the two CPU % columns tell you how much of the total time (shown at the top) was spent in Python code and in non-Python ('native') code, which Python often calls under the hood.
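
If you only want a quick look at memory use from within a script, the standard library's `tracemalloc` module is a much simpler (and far less detailed) alternative to **scalene**. This is a minimal sketch; the list being allocated is an arbitrary example.

```python
import tracemalloc

tracemalloc.start()

# Allocate a reasonably large list so that there is something to measure
data = [float(i) for i in range(100_000)]

# Both values are in bytes, measured since tracemalloc.start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```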

## I am the Walrus

The Walrus operator, `:=`, was introduced in Python 3.8 and, well, it's fairly complicated but it *does* have its uses. The main use case for the Walrus operator is when you want to both *evaluate an expression* and *assign a variable* in one fell swoop.

Take this (trivial) example, which involves evaluating an expression, `len(a)`, comparing the result with 3, and then assigning the value of that *same* expression to a variable `n`:

In [None]:
a = [1, 2, 3, 4]
if len(a) > 3:
    n = len(a)
    print(f"List is too long ({n} elements, expected <= 3)")

The Walrus operator allows us to skip the clumsy use of `len(a)` twice and do both steps in one go. As noted, that's trivial here, but if evaluation were very computationally expensive, then this might save us some trouble. Here's the version with the Walrus operator:

In [None]:
a = [1, 2, 3, 4]
if (n := len(a)) > 3:
    print(f"List is too long ({n} elements, expected <= 3)")
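
Another place where the Walrus operator tends to earn its keep is inside a comprehension, where the value being tested is also the value you want to keep. Here is a sketch using made-up log lines and the built-in `re` module (Python 3.8 or later is assumed):

```python
import re

# Made-up log lines for illustration
lines = ["error 404", "ok", "error 500", "warning"]

# Without the Walrus operator we would need to call re.search twice per line
codes = [int(m.group(1)) for line in lines if (m := re.search(r"error (\d+)", line))]
print(codes)  # [404, 500]
```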

## Multiple dispatch

One can use object-oriented methods and inheritance to get different code objects to behave in different ways depending on the type of input. For example, a different behaviour might occur if you send a string into a function versus an integer. An alternative to the object-oriented approach is to use *multiple dispatch*. [**fastcore**](https://fastcore.fast.ai/) is a library that provides "goodies to make your coding faster, easier, and more maintainable" and has many neat features but amongst the 'goodies' is multiple dispatch, with the `typedispatch` decorator.

In [None]:
# fastcore is designed to be imported as *
from fastcore.dispatch import *


@typedispatch
def func_example(x: int, y: float):
    return x + y


@typedispatch
def func_example(x: int, y: int):
    return x * y


# Int and float
print(func_example(5, 5.0))

# Int and int
print(func_example(5, 5))

What we can see here is that we have the same function, `func_example`, used twice with *very* similar inputs. But the inputs are *not* the same; in the first instance it's an integer and a float while in the second it's two integers. The different inputs get routed into the different versions of the `@typedispatch` function. This decorator-based approach is not the only way to use **fastcore** to do typed dispatch but it's one of the most convenient.
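
If you would rather not add a dependency, the standard library offers a more limited relative: `functools.singledispatch`. Note that this is a different technique from the one above, as it dispatches on the type of the *first* argument only. The function `describe` in this sketch is a hypothetical example.

```python
from functools import singledispatch


@singledispatch
def describe(x):
    # Fallback for types without a registered implementation
    return f"something else: {x!r}"


@describe.register
def _(x: int):
    return f"an integer: {x}"


@describe.register
def _(x: str):
    return f"a string: {x}"


print(describe(5))  # an integer: 5
print(describe("five"))  # a string: five
print(describe(5.0))  # something else: 5.0
```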

## Map, filter, and reduce

Map, filter, and reduce are higher-order functions: `map` and `filter` are built in, while `reduce` lives in the standard library's `functools` module. Lambda functions, featured in the basics of coding chapter, can be passed into each of these as an argument, and some of the best use cases of lambda functions are in conjunction with map, filter, and reduce.



### Map

`map` takes a function and an iterable as arguments, i.e. the syntax is `map(function, iterable)`. An iterable is an object composed of elements that can be iterated over. `map` applies the function to each entry in the iterable and returns a lazy iterator, which is why the results below are wrapped in `list`. Here's an example where a list of strings is cast to integers via `map`:

In [None]:
numbers_str = ["1", "2", "3", "4", "5"]
mapped_result = map(int, numbers_str)
list(mapped_result)

Here's an example with a lambda function. The benefit of using a lambda in this map operation is that otherwise we would have to write a whole function that simply returned the input with `.title()` at the end:

In [None]:
names = ["robinson", "fawcett", "ostrom"]
names_titled = map(lambda name: name.title(), names)
list(names_titled)

### Filter

`filter` calls a specified function, which should return a boolean, on each item of the specified iterable, and keeps only the items for which the result is true. It uses the `filter(function, iterable)` syntax. In the example below, we take all the numbers from zero to five and filter them according to whether they are divisible by 2:

In [None]:
numbers = list(range(6))
fil_result = filter(lambda x: x % 2 == 0, numbers)
list(fil_result)
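
For comparison, both the `map` and `filter` examples can also be written as list comprehensions, which many Python programmers find easier to read. This sketch reuses the data from the examples above:

```python
names = ["robinson", "fawcett", "ostrom"]
numbers = list(range(6))

# Equivalent of map(lambda name: name.title(), names)
names_titled = [name.title() for name in names]

# Equivalent of filter(lambda x: x % 2 == 0, numbers)
evens = [x for x in numbers if x % 2 == 0]

print(names_titled)  # ['Robinson', 'Fawcett', 'Ostrom']
print(evens)  # [0, 2, 4]
```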

### Reduce

`reduce` is defined in the built-in `functools` module. Like `map` and `filter`, `reduce` takes two parameters, a function and an iterable. However, it returns a single value rather than another iterable. The way `reduce` works is to apply operations successively so that the example below effectively first sums 2 and 3 to make 5, then 5 and 5 to make 10, then 10 and 15 to make 25, and, finally, 25 and 20 to make the final result of 45.

In [None]:
from functools import reduce

numbers = [2, 3, 5, 15, 20]

reduce(lambda x, y: x + y, numbers)
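
To make the successive steps visible, this sketch uses `itertools.accumulate`, which yields each intermediate value that `reduce` passes through. It also shows the optional third argument to `reduce`, a starting value (the choice of 100 is arbitrary):

```python
from functools import reduce
from itertools import accumulate

numbers = [2, 3, 5, 15, 20]

# accumulate yields every intermediate result that reduce passes through
running_totals = list(accumulate(numbers, lambda x, y: x + y))
print(running_totals)  # [2, 5, 10, 25, 45]

# An optional third argument gives reduce a starting value
total_from_100 = reduce(lambda x, y: x + y, numbers, 100)
print(total_from_100)  # 145
```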