# Chapter 3: Functions

## TL;DR

A *function* is a **named sequence** of statements that perform a computation.

Functions provide two main benefits:
- make programs easier to comprehend (and debug) for humans
- eliminate redundancies by allowing re-use of code

## Function Definition

Custom (i.e., not built-in) functions can be defined with the **`def` statement**.

A function must be given a **name**, which follows the same naming rules as variables. The name should be unique; otherwise, the new definition overwrites any previously defined object with that name.

Functions can define an arbitrary number of **parameters** (in parentheses) that are referred to within the indented **code block** (by convention indented with four spaces, not tabs). This code block is also called the function's **body**, while the first line, which ends with a colon, is called the **header**.

A function may have a **return value**. Functions that have one are called **fruitful**; otherwise, they are called **void**. Functions of the latter kind are still useful because of their side effects (e.g., printing output). Strictly speaking, they also return a value, namely the implicit `None`, which is different from the `False` value seen before.
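A minimal sketch of a void function; `greet()` is a hypothetical example, not part of this chapter. It has no `return` statement, so calling it evaluates to the implicit `None`:

```python
def greet(name):
    """Print a greeting for name; note the missing return statement."""
    print("Hello, " + name + "!")


result = greet("Alice")  # prints: Hello, Alice!
print(result)            # None
print(result is None)    # True
print(result == False)   # False: None and False are different values
```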

To follow good coding practices, a function should define a **docstring** that describes what the function does in a short subject line, which parameters it expects, and what it returns. A docstring is simply a multi-line string enclosed in triple double quotes (strings are covered in depth in a later notebook). Guidelines on how to format docstrings can be found in [PEP 257](https://www.python.org/dev/peps/pep-0257/) and section 3.8 of [Google's Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md).

The example "program" seen before can be rewritten as a function like so.

In [1]:
def average_evens(numbers):
    """Calculate the arithmetic mean of all even numbers in a list.

    Args:
        numbers (list): a list of numbers, may be integers or floats

    Returns:
        float: the arithmetic mean
    """
    evens = [n for n in numbers if n % 2 == 0]
    average = sum(evens) / len(evens)
    return average

Once a function is defined, it can be referred to by its name just like any other object. Its "value" might seem "awkward" at first.

In [2]:
average_evens

<function __main__.average_evens(numbers)>

Like any other object, a function has its own identity (memory location) and type.

In [3]:
id(average_evens)

140067103815608

In [4]:
type(average_evens)

function

The built-in [help()](https://docs.python.org/3/library/functions.html#help) function shows a function's docstring.

In [5]:
help(average_evens)

Help on function average_evens in module __main__:

average_evens(numbers)
    Calculate the arithmetic mean of all even numbers in a list.
    
    Args:
        numbers (list): a list of numbers, may be integers or floats
    
    Returns:
        float: the arithmetic mean



In a Jupyter notebook, we can just as well add a question mark to a function's name to achieve the same.

In [6]:
average_evens?

Two question marks show a function's source code.

In [7]:
average_evens??

## Function Calls

Once defined, we can **call** (= "execute") a function using the **call operator** `(...)`. The function's formal parameters are filled in by passing variables or expressions as **arguments** within the parentheses.

In [8]:
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [9]:
average_evens(nums)

6.0

The return value is usually assigned to a new variable.

In [10]:
avg = average_evens(nums)

In [11]:
avg

6.0

## Scoping Rules

### Local Scope disappears

Note that parameters defined by a function and variables created inside it are **local** to that function. That means they are newly defined each time a function is called and "destroyed" right after the function returns. We say they go out of scope once the function terminates.

In [12]:
numbers

NameError: name 'numbers' is not defined

In [13]:
evens

NameError: name 'evens' is not defined
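Because local variables are created anew on every call, a function cannot "remember" a local value from one call to the next. A minimal sketch with a hypothetical `count_calls()` function:

```python
def count_calls():
    """Try to count calls with a local variable (this does not work)."""
    counter = 0  # a brand-new local variable is created on every call
    counter += 1
    return counter  # counter goes out of scope right after this line


print(count_calls())  # 1
print(count_calls())  # 1 again: the previous call's counter is gone
```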

### Global Scope is everywhere

Conversely, while a function is being executed, it can "see" the variables of the **enclosing scope**. This is a common source of semantic errors. Consider the following stylized example `average()`. The error is hard to spot by eye: internally, the function never uses its `numbers` argument but accidentally refers to the `nums` variable in the **global scope**. As a result, it returns the same value no matter which list is passed in.

In [14]:
nums

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [15]:
def average(numbers):
    """Calculate the arithmetic mean of all numbers in a list.

    Args:
        numbers (list): a list of numbers, may be integers or floats

    Returns:
        float: the arithmetic mean
    """
    sum_of_list = sum(nums)
    n_list_items = len(nums)
    return sum_of_list / n_list_items

In [16]:
average([1, 2, 3])

5.5

In [17]:
average([325742, 34052305, 3252])

5.5
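The fix is to refer to the parameter instead of the global variable. A sketch of the corrected `average()`; the expected results follow from the arithmetic (e.g., $(1 + 2 + 3) / 3 = 2$):

```python
def average(numbers):
    """Calculate the arithmetic mean of all numbers in a list.

    Args:
        numbers (list): a list of numbers, may be integers or floats

    Returns:
        float: the arithmetic mean
    """
    # Use the parameter, not a global variable that happens to exist.
    sum_of_list = sum(numbers)
    n_list_items = len(numbers)
    return sum_of_list / n_list_items


print(average([1, 2, 3]))                 # 2.0
print(average([325742, 34052305, 3252]))  # 11460433.0
```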

### Shadowing

Code gets even more confusing when variables with the same name from different scopes collide. In particular, what should we expect to happen if a function internally sets or changes such a variable? The answer is that the assignment statement within `average_odds()` creates a new name `nums` that is local to the function's scope only and **shadows** (= "hides") the global `nums` while the function runs.

In [18]:
def average_odds(numbers):
    """Calculate the arithmetic mean of all odd numbers in a list.

    Args:
        numbers (list): a list of numbers, must be integers

    Returns:
        float: the arithmetic mean
    """
    # First, cast all numbers as integers.
    nums = [int(n) for n in numbers]
    odds = [n for n in nums if n % 2 != 0]
    average = sum(odds) / len(odds)
    return average

In [19]:
nums

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [20]:
average_odds([1, 20, 3, 40, 5])

3.0

In [21]:
average_odds(nums)

5.0

The `nums` variable in the global scope is unaffected.

In [22]:
nums

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

## Built-in Functions

Python comes with plenty of useful functions built in, some of which we have already seen before. The [Python Documentation](https://docs.python.org/3/library/functions.html) has the full list.

[len()](https://docs.python.org/3/library/functions.html#len) counts the number of elements in the list ...

In [23]:
len(nums)

10

... while [sum()](https://docs.python.org/3/library/functions.html#sum) adds up all the elements.

In [24]:
sum(nums)

55

We can **cast** (= "convert") certain values to a different type. For example, to convert a float or a text string into an integer, we can use the [int()](https://docs.python.org/3/library/functions.html#int) built-in.

In [25]:
int(avg)  # avg is a float

6

In [26]:
int("6")

6

Observe that casting to an integer simply cuts off the decimal part, which is different from rounding (with the [round()](https://docs.python.org/3/library/functions.html#round) built-in).

In [27]:
int(7.99)

7

In [28]:
round(7.99)

8
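A few more calls illustrate the difference: `int()` drops the fractional part (truncation toward zero, also for negative numbers), whereas `round()` rounds to the nearest integer and, in Python 3, resolves exact ties like `2.5` toward the nearest even number ("banker's rounding"):

```python
print(int(7.99))     # 7
print(int(-7.99))    # -7: int() truncates toward zero
print(round(-7.99))  # -8: round() goes to the nearest integer
print(round(2.5))    # 2: a tie is rounded to the nearest even integer
print(round(3.5))    # 4
```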

Not all conversions are valid, and invalid ones cause runtime errors, like the `ValueError` below.

In [29]:
int("six")

ValueError: invalid literal for int() with base 10: 'six'

We can also cast in the other "direction" with the [float()](https://docs.python.org/3/library/functions.html#float) built-in.

In [30]:
float(42)

42.0
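`float()` also accepts text. Since `int()` only parses integer literals, a string with a decimal point has to go through `float()` first. A short sketch:

```python
print(float("4.2"))       # 4.2: strings can be cast to floats as well
print(float("-7"))        # -7.0
# int("4.2") would raise a ValueError because "4.2" is not an integer literal,
# but converting to a float first and then truncating works:
print(int(float("4.2")))  # 4
```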

## Positional vs. Keyword Arguments

In this chapter, we have passed only one argument in each of the above function calls. In a previous chapter, however, we have already seen the built-in function [divmod()](https://docs.python.org/3/library/functions.html#divmod) in use, which takes two arguments. Obviously, the order in which the numbers are passed in matters. Whenever we call a function and list its arguments separated by commas, we say that we pass the arguments "by position" and refer to them as **positional arguments**.

In [31]:
divmod(42, 10)

(4, 2)

In [32]:
divmod(10, 42)

(0, 10)

For many functions, there is a "natural" order to the arguments. But what if this is not the case? For example, let's create a close relative of the above `average_evens()` function that also scales the result by a factor. Which is more natural: passing in `numbers` first or `scalar` first? There is no obvious answer, so we go with the former.

In [33]:
def scaled_average_evens(numbers, scalar):
    """Calculate a scaled arithmetic mean of all even numbers in a list.

    Args:
        numbers (list): a list of numbers, may be integers or floats
        scalar (float): the scalar that is multiplied with the mean
            of the even numbers

    Returns:
        float: the scaled arithmetic mean
    """
    evens = [n for n in numbers if n % 2 == 0]
    average = sum(evens) / len(evens)
    return scalar * average

We can pass in the arguments by position as before.

In [34]:
scaled_average_evens(nums, 2)

12.0

However, now the function call is a bit harder to comprehend because we need to always remember what the $2$ means. Luckily, we can also reference the formal parameter names as **keyword arguments**. We can even combine positional and keyword arguments. Each of the function calls below does the exact same thing.

In [35]:
scaled_average_evens(nums, scalar=2)

12.0

In [36]:
scaled_average_evens(numbers=nums, scalar=2)

12.0

In [37]:
scaled_average_evens(scalar=2, numbers=nums)

12.0

Unfortunately, there are ways to get this wrong and cause a `SyntaxError`: if positional and keyword arguments are mixed, the keyword arguments must come last.

In [38]:
scaled_average_evens(numbers=nums, 2)

SyntaxError: positional argument follows keyword argument (<ipython-input-38-a6b957a168d8>, line 1)
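Another mistake is passing the same parameter both by position and by keyword, which Python rejects with a `TypeError` at runtime. A minimal sketch with a hypothetical `scale()` helper:

```python
def scale(value, factor):
    """Multiply value by factor (a hypothetical helper for illustration)."""
    return value * factor


print(scale(3, factor=2))  # 6

# Here, 3 is bound to value by position and value=2 tries to bind it again.
try:
    scale(3, value=2)
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```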

### Default Argument Values

Defining both `average_evens()` and `scaled_average_evens()` is also somewhat repetitive, which makes a code base harder to maintain in the long run. As not scaling the arithmetic mean is just a special case of scaling it by $1$, we could redefine the two functions as below. In this variant, the function representing the special case **forwards** the call to the more general function, using a `scalar` argument of $1$.

In [39]:
def scaled_average_evens(numbers, scalar):
    """Calculate a scaled arithmetic mean of all even numbers in a list.

    ...

    """
    evens = [n for n in numbers if n % 2 == 0]
    average = sum(evens) / len(evens)
    return scalar * average

In [40]:
def average_evens(numbers):
    """Calculate the arithmetic mean of all even numbers in a list.

    ...

    """
    return scaled_average_evens(numbers, scalar=1)

In [41]:
average_evens(nums)

6.0

If we assume that scaling the mean will happen rarely in reality, we could also handle the two cases in one function definition by providing a **default value** for the `scalar` argument like below.

In [42]:
def average_evens(numbers, scalar=1):
    """Calculate the arithmetic mean of all even numbers in a list.

    Args:
        numbers (list): list of numbers, may be integers or floats.
        scalar (float, optional): scalar that is multiplied with the mean
            of the even numbers before the latter is returned

    Returns:
        float: the (scaled) arithmetic mean
    """
    evens = [n for n in numbers if n % 2 == 0]
    average = sum(evens) / len(evens)
    return scalar * average

Now we can call the function either with or without the `scalar` argument, which can be passed in either by position or by keyword. However, which of the two versions where `scalar` is $2$ is easier to comprehend in a large program?

In [43]:
average_evens(nums)

6.0

In [44]:
average_evens(nums, 2)

12.0

In [45]:
average_evens(nums, scalar=2)

12.0

### Keyword-only Arguments

Since we assumed that scaling will happen rarely, we would prefer that our new version of `average_evens()` be called with a keyword argument whenever `scalar` is not $1$. Luckily, Python 3 offers a keyword-only syntax: all we need to do is place the parameters for which we require explicit keyword use after an asterisk `*`, as shown below.

In [46]:
def average_evens(numbers, *, scalar=1):
    """Calculate the arithmetic mean of all even numbers in a list.

    Args:
        numbers (list): list of numbers, may be integers or floats.
        scalar (float, optional): scalar that is multiplied with the mean
            of the even numbers before the latter is returned

    Returns:
        float: the (scaled) arithmetic mean
    """
    evens = [n for n in numbers if n % 2 == 0]
    average = sum(evens) / len(evens)
    return scalar * average

If we now call the function with a value other than the default for the (by assumption) seldom used argument, we have to use keyword notation. Otherwise, we obtain a `TypeError`.

In [47]:
average_evens(nums)

6.0

In [48]:
average_evens(nums, scalar=3)

18.0

In [49]:
average_evens(nums, 3)

TypeError: average_evens() takes 1 positional argument but 2 were given
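A keyword-only parameter does not need a default value. In the minimal sketch below, the hypothetical `scaled_sum()` makes `scalar` both keyword-only and required, so every call has to spell it out:

```python
def scaled_sum(numbers, *, scalar):
    """Sum up numbers and multiply the result by scalar.

    scalar is keyword-only and, lacking a default value, also required.
    """
    return scalar * sum(numbers)


print(scaled_sum([1, 2, 3], scalar=2))  # 12
# scaled_sum([1, 2, 3], 2) would raise a TypeError (too many positional arguments).
# scaled_sum([1, 2, 3]) would raise a TypeError (missing keyword-only argument).
```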

## Anonymous Functions

Just like we can create a float object "on the fly" without assigning it to a variable (like `float(42)` above), we can "define" a function in memory and execute it right away, i.e., use the function only once and never again. Python provides a [lambda expression](https://docs.python.org/3/reference/expressions.html#lambda) syntax for this: it starts with the keyword `lambda`, followed by (optional) parameters, a mandatory colon, and a single expression whose value is returned.

For example, let's create an anonymous function that adds 3 to the only argument passed in and returns the result.

In [50]:
lambda x: x + 3

<function __main__.<lambda>(x)>

The previous code cell is rather pointless as we end up with a function object in memory that we cannot refer to at all.

We could assign the lambda expression to a variable. However, that would defeat the purpose of having anonymous functions in the first place.

In [51]:
add_three = lambda x: x + 3  # this is pointless, we could use def instead

Now we can call the function as if we defined it with the `def` syntax from above.

In [52]:
add_three(10)

13

Instead, we could call the lambda expression right away, which looks really "weird" for now as we need two pairs of parentheses.

In [53]:
(lambda x: x + 3)(39)  # this looks weird but will become very useful

42

The main idea of having functions without a name is to use them in a context where we know ahead of time that we will use a function only once and do not want to create variables that we will never use again. A very popular context where we can apply lambda expressions is the "map-filter-reduce" paradigm introduced in the chapter on lists.
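As a preview, here is a sketch of two common one-off uses: the `key` parameter of the built-in `sorted()` and the built-in `map()`. The word list is made up for illustration, and `key=len` would work just as well; the lambda only makes the idea explicit:

```python
words = ["banana", "kiwi", "apple", "fig"]

# The lambda is created, used as the sort key, and then discarded.
by_length = sorted(words, key=lambda word: len(word))
print(by_length)  # ['fig', 'kiwi', 'apple', 'banana']

# A preview of "map": apply an anonymous function to every element.
plus_three = list(map(lambda x: x + 3, [1, 2, 3]))
print(plus_three)  # [4, 5, 6]
```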

## Extending Core Python

### Standard Library

In addition to its core language, Python comes with the so-called [standard library](https://docs.python.org/3/library/index.html), a collection of useful functionality. The website [PYMOTW](https://pymotw.com/3/index.html) features one well-written blog post per week on various parts of the standard library and serves as a how-to guide for solving common problems.

#### Example: The [Math Module](https://docs.python.org/3/library/math.html)

The `math` module, for example, is a popular part of the standard library. A **module** is, in its simplest form, a plain text file with the ".py" file extension; our own modules usually reside in the same directory as the current program. The standard library's modules, however, are available at all times.

To make non-core features usable, they must be imported first using an `import` statement.

In [54]:
import math

Now a module object `math` can be referenced just like a variable.

In [55]:
math

<module 'math' (built-in)>

In [56]:
type(math)

module

Let's see what we can do with it. The [dir()](https://docs.python.org/3/library/functions.html#dir) built-in function works on any object and lists its attributes.

In [57]:
dir(math)

['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc']

Common mathematical constants and functions are now available via the so-called **dot operator** `.` on the `math` object.

In [58]:
math.pi

3.141592653589793

In [59]:
math.e

2.718281828459045

In [60]:
math.sqrt

<function math.sqrt>

In [61]:
help(math.sqrt)

Help on built-in function sqrt in module math:

sqrt(...)
    sqrt(x)
    
    Return the square root of x.



In [62]:
math.sqrt(2)

1.4142135623730951

Observe that the arguments passed to functions do not need to be simple values. Instead, we can pass in an expression that evaluates to a value of the type the function expects for a parameter. This means that we can **compose** new expressions out of variables, functions, and other expressions.

In [63]:
math.sqrt(2 ** 2)

2.0

In [64]:
math.sqrt(average_evens([99, 100, 101]))

10.0

If we only need one particular function from a module, we can also use the alternative `from ... import ...` syntax. The plain `import ...` syntax, however, keeps the module's names in their own **namespace**, which is nothing but a "prefix" (like `math.`) that prevents collisions between names from different sources.

In [65]:
from math import sqrt

In [66]:
sqrt(16)

4.0
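Such a collision can happen with `pow`: both the built-ins and the `math` module provide a function of that name, and they behave differently. A short sketch of how `from math import pow` silently replaces the built-in:

```python
import math

print(pow(2, 3))       # 8: the built-in pow() keeps integers as integers
print(math.pow(2, 3))  # 8.0: math.pow() always returns a float

# Importing the name directly rebinds pow in the current namespace,
# silently shadowing the built-in function of the same name.
from math import pow

print(pow(2, 3))       # 8.0
```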

#### Example: The [Random Module](https://docs.python.org/3/library/random.html)

Often, we need random numbers.

In [67]:
import random

In [68]:
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_BuiltinMethodType',
 '_MethodType',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_inst',
 '_itertools',
 '_log',
 '_pi',
 '_random',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [69]:
random.random

<function Random.random>

In [70]:
help(random.random)

Help on built-in function random:

random(...) method of random.Random instance
    random() -> x in the interval [0, 1).



In [71]:
random.random()

0.7857416395365128

In [72]:
random.choice

<bound method Random.choice of <random.Random object at 0x2ab94b8>>

In [73]:
help(random.choice)

Help on method choice in module random:

choice(seq) method of random.Random instance
    Choose a random element from a non-empty sequence.



In [74]:
random.choice(nums)

1

In order to reproduce the same random numbers each time we run a notebook, we can set the [random seed](https://en.wikipedia.org/wiki/Random_seed). It is good practice to do this at the beginning of a program or notebook. Then, every time we restart the program, we get exactly the same random numbers again. This becomes very important, for example, when we employ machine learning algorithms that rely on randomization, like the famous [Random Forest](https://en.wikipedia.org/wiki/Random_forest), and want to compare results across runs without differences that are merely caused by chance.

The random module provides the [seed()](https://docs.python.org/3/library/random.html#random.seed) function to do that.

In [75]:
random.seed(42)

In [76]:
random.random()

0.6394267984578837

In [77]:
random.seed(42)

In [78]:
random.random()

0.6394267984578837
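Besides seeding the module-level generator, the `random` module offers the `random.Random` class, whose instances each keep their own state. A sketch, assuming we want reproducibility without touching the global state that other code might also use:

```python
import random

# Two independent generators, both seeded with 42 ...
rng_a = random.Random(42)
rng_b = random.Random(42)

# ... produce the same sequence of numbers.
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
print(draws_a == draws_b)  # True
print(draws_a[0])          # 0.6394267984578837, as after random.seed(42)
```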

### Third-party Packages

As the Python community is based around open source, many developers publish their code on the Python Package Index [PyPI](https://pypi.org), from where anyone can download and install it for free using command-line tools like `pipenv` (the modern way, endorsed by the Python Packaging Authority at the time of writing) or `pip` (the traditional way, still very widespread). This way, we can customize our Python installation even further. Note that managing many such packages is quite a deep topic on its own (search for "dependency hell" on the internet).

The difference between the standard library and third-party packages is that the former follows a much more formalized review process and is basically guaranteed to be maintained by volunteer programmers around the world for the long term. That said, this is also often true for popular third-party packages.

#### Example: [numpy](http://www.numpy.org/)

numpy is the de facto standard in the Python world for handling "array-like" data, e.g., matrices and vectors.

As numpy is not in the standard library it must be manually installed (to execute commands in the "shell", we can prepend an exclamation mark in a Jupyter notebook cell).

In [79]:
!pip install numpy



numpy is conventionally imported with the shorter "idiomatic" name `np`.

In [80]:
import numpy as np

Let's convert the above list `nums` into a vector, i.e., a numpy array object.

In [81]:
v = np.array(nums)

In [82]:
v

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

numpy gives Python's arithmetic operators new behavior for arrays. For example, we can now multiply the vector by a scalar, which multiplies each of its elements. numpy's functions are implemented in highly optimized C code and are therefore much faster than equivalent pure Python code, especially for large amounts of data.

In [83]:
2 * v

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

The same expression does not perform scalar multiplication on a plain list like `nums`: multiplying a list by an integer instead repeats (concatenates) the list.

In [84]:
2 * nums  # surprise, surprise

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
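To get element-wise multiplication with a plain list, we can fall back on a list comprehension, as used in `average_evens()`. This sketch yields the same numbers as `2 * v` but runs in pure Python, which is typically slower than numpy for large data:

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Multiply each element by 2, one at a time, to mimic 2 * v for a numpy array.
doubled = [2 * n for n in nums]
print(doubled)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```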

numpy is part of Python's so-called "scientific stack", a tightly coupled set of third-party libraries for storing "big" data efficiently ([numpy](http://www.numpy.org/)), "wrangling" ([pandas](https://pandas.pydata.org/)) and visualizing them ([matplotlib](https://matplotlib.org/) and [seaborn](https://seaborn.pydata.org/)), fitting classical statistical models ([statsmodels](http://www.statsmodels.org/)), training machine learning models ([sklearn](http://scikit-learn.org/)), and much more.

These libraries are covered in later chapters in much more depth.