# Functions

**Expected time for self-study**: 2 hours

A *function* is a **named sequence** of statements that perform a computation.

Functions provide benefits:
- make programs easier to comprehend (and debug) for humans
- eliminate redundancies by allowing re-use of code

**Note**: Python also allows creation of so-called anonymous functions. An example of this is introduced in the notebook on lists.

## Function Definition

Custom (i.e., not built-in) functions can be defined with the **`def` statement**.

A function must be given a unique **name** (otherwise it may overwrite a previously defined one) where the same naming rules apply as for variables.

Functions can define an arbitrary number of **parameters** (in parentheses) that are referred to within the indented **code block** (by convention indented with four spaces, not tabs). The latter is often also called a function's **body**, while the first line is the **header** and ends with a colon.

A function may have a **return value**. Functions that have one are called **fruitful**; otherwise they are called **void**. Functions of the latter kind are still useful because of their side effects. Strictly speaking they also have an implicit return value `None`.
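A minimal sketch of the difference between the two kinds of functions. The names `double` and `show_double` are hypothetical and not part of the original notebook.

```python
def double(number):
    """Return twice the given number (a fruitful function)."""
    return 2 * number


def show_double(number):
    """Print twice the given number (a void function with a side effect)."""
    print(2 * number)


fruitful_result = double(21)   # the return value can be stored and reused
void_result = show_double(21)  # prints as a side effect, but returns None

print(fruitful_result)
print(void_result is None)
```

The second call still does something useful (it prints), yet the only value it hands back is the implicit `None`.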

To maintain good coding practices, a function should define a **docstring** that describes what it does in a short subject line, what parameters it expects, and what it returns. A docstring is a simple multi-line string defined with triple-double quotes (note that strings are covered in depth in a later notebook). Guidelines on how to format a docstring can be found in [PEP 257](https://www.python.org/dev/peps/pep-0257/) and section 3.8 in [Google's Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md).

The previous example can be re-written like so.

In [1]:
def average_evens(numbers):
    """Calculate the average of all even numbers in a list.

    Args:
        numbers (list): A list of numbers, may be integers or floats.

    Returns:
        float: The arithmetic mean of the even numbers.
    """
    evens = []
    # Obtain the even numbers.
    for number in numbers:
        if number % 2 == 0:
            evens.append(number)  # .append(...) adds a number to the end of the list in memory
    average = sum(evens) / len(evens)
    return average

Once a function is defined, it can be referred to just like a normal variable. Its "value" might seem "awkward" at first.

In [2]:
average_evens

<function __main__.average_evens(numbers)>

As everything is an object in Python, a function has its own memory location and type.

In [3]:
id(average_evens)

140569320394400

In [4]:
type(average_evens)

function

The built-in [help()](https://docs.python.org/3/library/functions.html#help) function shows a function's docstring.

In [5]:
help(average_evens)

Help on function average_evens in module __main__:

average_evens(numbers)
    Calculate the average of all even numbers in a list.
    
    Args:
        numbers (list): A list of numbers, may be integers or floats.
    
    Returns:
        float: The arithmetic mean of the even numbers.



In a Jupyter notebook, we can just as well add a question mark to a function's name to achieve the same.

In [6]:
average_evens?

Two question marks show a function's code.

In [7]:
average_evens??

## Function Calls

Once defined, we can **call** (= "execute") a function using the **call operator** `(...)`. The formal parameters are filled in by passing variables or expressions as **arguments** to the function within the parentheses.

In [8]:
one_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [9]:
average_evens(one_to_ten)

6.0

The return value is usually assigned to a new variable.

In [10]:
avg = average_evens(one_to_ten)

In [11]:
avg

6.0

Note that parameters defined by a function and variables created inside it are **local** to that function. That means they are newly defined each time a function is called and destroyed right after the function returns.

In [12]:
numbers

NameError: name 'numbers' is not defined

In [13]:
evens

NameError: name 'evens' is not defined
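The same behavior can be reproduced in a self-contained sketch. The function `add_one` and its local variable `shifted` are hypothetical names chosen for illustration only.

```python
def add_one(numbers):
    """Return a new list with every element increased by one."""
    shifted = []  # local variable: exists only while add_one() runs
    for number in numbers:
        shifted.append(number + 1)
    return shifted


incremented = add_one([1, 2, 3])

# After the call, the local variable is no longer accessible by its name.
try:
    shifted
except NameError as error:
    message = str(error)

print(incremented)
print(message)
```

Only the returned list survives the call because it was assigned to the new variable `incremented`.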

[PythonTutor](http://pythontutor.com/visualize.html#code=def%20average_evens%28numbers%29%3A%0A%20%20%20%20evens%20%3D%20%5B%5D%0A%20%20%20%20for%20number%20in%20numbers%3A%0A%20%20%20%20%20%20%20%20if%20number%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20evens.append%28number%29%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Aone_to_ten%20%3D%20%5B1,%202,%203,%204%5D%0A%0Aaverage_evens%28one_to_ten%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) can again help us visualize the **flow of execution** and the state in memory.

## Built-in Functions

Python comes with plenty of useful functions built in, some of which we have already seen before. The [Python Documentation](https://docs.python.org/3/library/functions.html) has the full list.

[len()](https://docs.python.org/3/library/functions.html#len) counts the number of elements in the list ...

In [14]:
len(one_to_ten)

10

... while [sum()](https://docs.python.org/3/library/functions.html#sum) adds up all the elements.

In [15]:
sum(one_to_ten)

55

We can **cast** (= "convert") certain values to a different type. For example, to convert a float or a string into an integer, we can use the [int()](https://docs.python.org/3/library/functions.html#int) built-in.

In [16]:
int(avg)

6

In [17]:
int("6")

6

Observe that casting as an integer is different from rounding (with the [round()](https://docs.python.org/3/library/functions.html#round) built-in).

In [18]:
int(7.99)

7

In [19]:
round(7.99)

8
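The difference becomes even more visible with negative numbers and ties. The following sketch uses only the two built-ins; the variable names are chosen for illustration.

```python
# int() cuts off the fractional part, i.e., it truncates toward zero.
truncated_positive = int(7.99)
truncated_negative = int(-7.99)

# round() goes to the nearest integer instead.
rounded_positive = round(7.99)
rounded_negative = round(-7.99)

# Ties are rounded to the nearest even integer in Python 3.
tie_down = round(2.5)
tie_up = round(3.5)

print(truncated_positive, truncated_negative)
print(rounded_positive, rounded_negative)
print(tie_down, tie_up)
```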

Not all conversions are valid and runtime errors can occur.

In [20]:
int("six")

ValueError: invalid literal for int() with base 10: 'six'
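One possible way to deal with such errors is to catch them. This is only a sketch: the helper `to_int_or_none` is a hypothetical name, and returning `None` for invalid input is an assumption made for the example, not a general recommendation.

```python
def to_int_or_none(text):
    """Convert text to an integer; return None if that is not possible."""
    try:
        return int(text)
    except ValueError:
        # int() raises a ValueError for text that is not a valid integer.
        return None


print(to_int_or_none("6"))
print(to_int_or_none("six"))
```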

We can also cast in the other "direction" with the [float()](https://docs.python.org/3/library/functions.html#float) built-in.

In [21]:
float(42)

42.0

## Extending Core Python

### Standard Library

In addition to its core language, Python comes with the so-called [standard library](https://docs.python.org/3/library/index.html), a collection of useful functionalities. The website [PYMOTW](https://pymotw.com/3/index.html) features one well-written blog post per week on various parts of the standard library and serves as a how-to guide for solving common problems.

#### Example: The [Math Module](https://docs.python.org/3/library/math.html)

The `math` module, for example, is a popular part of the standard library. In the simplest case, a **module** is nothing but a plain text file with the file extension ".py" that usually is in the same directory as the current program. The standard library's modules, however, are available at all times, regardless of the directory we work in.
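To illustrate that a module can be just a text file, the following sketch writes a tiny ".py" file and imports it. The file name `my_sample_module.py` and the function `triple` are hypothetical, and the use of a temporary directory added to `sys.path` is an assumption made so that the example does not clutter the current directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module into a temporary directory (hypothetical file name).
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_sample_module.py").write_text(
    "def triple(number):\n"
    "    return 3 * number\n"
)

# Make the directory visible to Python's import system.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import my_sample_module  # the module's name is the file name without ".py"

print(my_sample_module.triple(14))
```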

To make non-core features usable, they must be imported first using an **`import` statement**.

In [22]:
import math

Now a module object `math` can be referenced just like a variable.

In [23]:
math

<module 'math' (built-in)>

In [24]:
type(math)

module

Let's see some things we can do with it. The [dir()](https://docs.python.org/3/library/functions.html#dir) built-in function lists the names (i.e., attributes) that an object such as a module provides.

In [25]:
dir(math)

['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc']
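Names that start with an underscore are mostly meant for internal use. As a small sketch, they can be filtered out to obtain only the "public" names; the variable `public_names` is a hypothetical name, and the exact list depends on the Python version.

```python
import math

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))
print(public_names[:5])
```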

Common mathematical constants and functions are now available via the so-called **dot operator** `.` on the `math` object.

In [26]:
math.pi

3.141592653589793

In [27]:
math.e

2.718281828459045

In [28]:
math.sqrt

<function math.sqrt>

In [29]:
help(math.sqrt)

Help on built-in function sqrt in module math:

sqrt(...)
    sqrt(x)
    
    Return the square root of x.



In [30]:
math.sqrt(2)

1.4142135623730951

Observe that the arguments passed to functions do not need to be simple variables. Instead, we can pass in an expression that evaluates to a value of the type the function expects for a parameter. This means that we can **compose** new expressions out of variables, functions, and other expressions.

In [31]:
math.sqrt(2 ** 2)

2.0

In [32]:
math.sqrt(average_evens([99, 100, 101]))

10.0

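If we only need one particular function from a module, we can also use the alternative `from ... import ...` syntax. Note, however, that only the plain `import ...` syntax creates what programmers call a **namespace**, which is nothing but a "prefix" (e.g., `math.`) that avoids collisions of function names. A name imported with `from ... import ...` has no such prefix and may therefore overwrite, or be overwritten by, another name.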

In [33]:
from math import sqrt

In [34]:
sqrt(16)

4.0
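A short sketch of such a name collision. The self-defined `sqrt` function is hypothetical and deliberately useless; it only shows that an unprefixed name can be overwritten while the prefixed one stays intact.

```python
import math
from math import sqrt

imported_result = sqrt(16)  # at this point, sqrt refers to math's function


def sqrt(number):
    """A hypothetical function of our own that happens to reuse the name."""
    return "not a square root"


shadowed_result = sqrt(16)       # now the name refers to our own function
prefixed_result = math.sqrt(16)  # the prefixed version is unaffected

print(imported_result)
print(shadowed_result)
print(prefixed_result)
```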

#### Example: The [Random Module](https://docs.python.org/3/library/random.html)

Oftentimes, we need random numbers, for example, to simulate or sample something.

In [35]:
import random

In [36]:
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_BuiltinMethodType',
 '_MethodType',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_inst',
 '_itertools',
 '_log',
 '_pi',
 '_random',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [37]:
random.random

<function Random.random>

In [38]:
help(random.random)

Help on built-in function random:

random(...) method of random.Random instance
    random() -> x in the interval [0, 1).



In [39]:
random.random()

0.19958658383874017

In [40]:
help(random.choice)

Help on method choice in module random:

choice(seq) method of random.Random instance
    Choose a random element from a non-empty sequence.



In [41]:
random.choice(one_to_ten)

1

In order to reproduce the random numbers in a notebook, we can set the [random seed](https://en.wikipedia.org/wiki/Random_seed). It is good practice to do this at the beginning of a program or notebook. Then every time we re-start the program, we will get exactly the same random numbers again.

The random module provides the [seed()](https://docs.python.org/3/library/random.html#random.seed) function to do that.

In [42]:
random.seed(42)
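The effect can be checked with a small sketch: setting the same seed twice yields the same sequence of numbers. The seed value and the variable names are arbitrary choices for illustration.

```python
import random

random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)  # resetting the seed restarts the same sequence
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)
```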

### Third-party Packages

As the Python community is based around open source, many developers publish their code on the Python Package Index [PyPI](https://pypi.org) from where anyone can download and install it for free using command-line tools like `pipenv` (modern way, recently endorsed by the Python Packaging Authority) or `pip` (traditional way, still very widespread). This way, we can always customize our Python installation even more. Note that managing many such packages is actually quite a deep topic on its own (search for "Dependency Hell" on the internet).

The difference between the standard library and third-party packages is that the former follows a much more formalized review process and is basically guaranteed to be supported by volunteer programmers around the world. This is, however, also often true for third-party packages.

#### Example: [numpy](http://www.numpy.org/)

numpy is the de-facto standard in the Python world for handling "array-like" data, e.g., matrices and vectors.

As numpy is not in the standard library it must be manually installed (to execute commands in the "shell", we can prepend an exclamation mark in a Jupyter notebook cell).

In [43]:
! pip install numpy



numpy is conventionally imported with the shorter "idiomatic" name `np`.

In [44]:
import numpy as np

Let's convert the above list `one_to_ten` into a vector type object.

In [45]:
v = np.array(one_to_ten)

In [46]:
v

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

numpy adds new behavior to Python's arithmetic operators. For example, we can now scalar-multiply the vector `v`, which is not possible with the original list. numpy's functions are implemented in highly optimized C code and are therefore much faster, especially when it comes to big data.

In [47]:
2 * v

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])
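For contrast, here is a sketch of what the same operator does with a plain list, which needs no numpy at all. The variable names `repeated` and `doubled` are chosen for illustration.

```python
one_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# For a plain list, the * operator means repetition, not multiplication.
repeated = 2 * one_to_ten

# An element-wise multiplication must be spelled out, e.g., with a loop.
doubled = [2 * number for number in one_to_ten]

print(len(repeated))
print(doubled)
```

The list `doubled` holds the same numbers as the numpy result above, but they had to be computed one element at a time in Python code.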

numpy is part of Python's so-called "scientific stack", a tightly coupled set of third-party libraries for storing big data efficiently ([numpy](http://www.numpy.org/)), "wrangling" ([pandas](https://pandas.pydata.org/)) and visualizing them ([matplotlib](https://matplotlib.org/) and [seaborn](https://seaborn.pydata.org/)), fitting classical statistical models ([statsmodels](http://www.statsmodels.org/)), training machine learning models ([sklearn](http://scikit-learn.org/)), and much more.

These libraries are not covered in this tutorial; they are mentioned here only as an example of how core Python is usually extended in data science applications.