# How to create your own function

Functions provide a way of packaging code into reusable and easy-to-use components and are a key part of many programming languages including Python.

In a similar fashion to other constructs in Python (like `for` loops and `if` statements), functions have a rigid structure. They are composed of some necessary scaffolding and some user-defined input.

Let us consider the following function called `add_lists`, which would allow us to add the elements of two lists together:

In [1]:
def add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z

### Defining a function

To create this function, first we must use the `def` keyword to start a function definition:

<pre>
 ↓
<b style="color:darkred">def</b> add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z
</pre>

Then we specify the name that we want to give the function. As with variables, choose a descriptive name that says what the function does. This is the name we will use when *calling* the function:

<pre>
        ↓
def <b style="color:darkred">add_lists</b>(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z
</pre>

The function name must then be followed by a pair of round brackets. This is similar to the syntax used when *calling* a function and giving it arguments, but here we're just defining it:

<pre>
             ↓    ↓
def add_lists<b style="color:darkred">(</b>x, y<b style="color:darkred">)</b>:
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z
</pre>

Between those brackets go the names of the parameters we want the function to accept. We can define zero or more parameters. Here we are defining two:

<pre>
              ↓  ↓
def add_lists(<b style="color:darkred">x, y</b>):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z
</pre>

Finally, the line is completed with a colon:

<pre>
                   ↓
def add_lists(x, y)<b style="color:darkred">:</b>
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z
</pre>

Since we've used a colon, we must indent the body of the function as we did with loops and conditional statements:

<pre>
def add_lists(x, y):
    <b style="color:darkred">z = []
    length_x = len(x)
    for i in range(length_x):</b>  ← body of function<b style="color:darkred">
        z.append(x[i] + y[i])
    return z</b>
</pre>

Most functions also need to return data to the code that called them. You choose what data is returned by using the `return` keyword followed by the data you want to return:

<pre>
def add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    <b style="color:darkred">return</b> z
      ↑
</pre>
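
As a small sketch of what happens at the two extremes (zero parameters and no `return` statement), consider the function below. The name `print_greeting` is made up for this illustration and is not part of the `add_lists` example.

```python
# Hypothetical function: it accepts no parameters and has no return statement.
def print_greeting():
    print("Hello")

# The round brackets are still required when calling it.
result = print_greeting()

# A function that never reaches a `return` statement gives back None.
print(result)
```

The brackets stay even when there is nothing between them, and the caller always receives *something* back, even if that something is `None`.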

### Using the function

This function can then be *called* (i.e. used) in the same way as any other function: with round brackets `(` `)` and passing the input arguments we want to use.

In [2]:
list1 = [1, 2, 3]
list2 = [3, 4, 5]

In [3]:
added_list = add_lists(list1, list2)
print(added_list)

[4, 6, 8]


---

## Examining our function

The `add_lists()` function takes two arguments as input.

In [4]:
add_lists([1,2], [3,4])

[4, 6]

How does this function react to different inputs? For example, what happens if we pass one list that is shorter than the other?

In [5]:
a = [1,2]
c = [3]

In [6]:
add_lists(a, c)

IndexError: list index out of range

In [None]:
add_lists(c, a)

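Because of the way we have written this function, the length of the *first* list determines how many indices we loop over. If we pass lists of mismatched length, the behaviour therefore depends on whether the shorter list is the first or second argument: with the longer list first we get an `IndexError`, while with the shorter list first the extra elements of the second list are silently ignored.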
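
To see the two outcomes side by side, here is a small sketch that repeats both calls. It wraps the failing call in a `try`/`except` block, a construct not otherwise used in this section, purely so that the code keeps running after the error. The names `result_ac` and `result_ca` are chosen for this illustration.

```python
def add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z

a = [1, 2]
c = [3]

# Longer list first: the loop reaches index 1, which c does not have.
try:
    result_ac = add_lists(a, c)
except IndexError as error:
    result_ac = None
    print("add_lists(a, c) failed:", error)

# Shorter list first: the loop stops after one element, so no error occurs.
result_ca = add_lists(c, a)
print("add_lists(c, a) gave:", result_ca)
```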

We could update our function to check both lengths and pick the shortest:

In [None]:
def add_lists(x, y):
    
    ### ADDED IN A CHECK FOR SHORTEST LENGTH
    length_x = len(x)
    length_y = len(y)
    
    if length_x <= length_y:
        length = length_x
    else:
        length = length_y
    ###
    
    z = []
    for i in range(length):
        z.append(x[i] + y[i])
    
    return z

In [None]:
add_lists(a, c)

In [None]:
add_lists(c, a)

Because we only have to write this functionality once, it's a lot easier to consider different corner cases and make sure the code does what we expect in different circumstances.
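
The same "shortest length" behaviour can also be written more compactly. The sketch below is an alternative, not the version used in the rest of this section: it relies on the built-in `zip` function, which pairs up elements and stops as soon as the shorter input runs out. The name `add_lists_zip` is made up for this illustration.

```python
def add_lists_zip(x, y):
    z = []
    # zip yields pairs (x[0], y[0]), (x[1], y[1]), ... and stops
    # at the end of whichever list is shorter.
    for x_value, y_value in zip(x, y):
        z.append(x_value + y_value)
    return z

a = [1, 2]
c = [3]

print(add_lists_zip(a, c))
print(add_lists_zip(c, a))
```

Both calls give the same result regardless of argument order, because the truncation is handled by `zip` rather than by an explicit length check.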

### Required arguments

How do we define what a function needs? In this case, we have defined a function which requires *two arguments* by specifying `x` and `y`. If we don't pass both arguments, this raises a `TypeError`.

In [None]:
add_lists(a)
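
If it helps to read the error without stopping the notebook, one option is to catch it. This is only a sketch: the `try`/`except` block and the `message` variable are additions for illustration.

```python
def add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z

a = [1, 2]

try:
    add_lists(a)  # only one of the two required arguments is supplied
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```

The message names the parameter that was left out, which is usually enough to spot the mistake.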

In this case we specified our inputs based on their *position*. When specifying arguments in this way, the order matters. The first argument we pass is assigned to the `x` variable used within the function and the second is assigned to the `y` variable. We saw with the original version of `add_lists` why this order mattered when one list was shorter than the other.

To be explicit, we can also pass arguments by assigning them to the parameter names we have defined, in this case `x` and `y`:

In [None]:
add_lists(x=a, y=c)
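
One consequence of naming the arguments is that the order in which they are written no longer matters. The sketch below uses the "shortest length" version of `add_lists`; the variable names `by_position`, `by_keyword` and `swapped_order` are made up for this illustration.

```python
def add_lists(x, y):
    length_x = len(x)
    length_y = len(y)

    if length_x <= length_y:
        length = length_x
    else:
        length = length_y

    z = []
    for i in range(length):
        z.append(x[i] + y[i])

    return z

a = [1, 2]
c = [3]

by_position = add_lists(a, c)        # x is a, y is c (decided by position)
by_keyword = add_lists(x=a, y=c)     # x is a, y is c (decided by name)
swapped_order = add_lists(y=c, x=a)  # still x is a, y is c

print(by_position, by_keyword, swapped_order)
```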

### Optional arguments

We have also seen when using built-in functions that there are often other arguments that *can* be supplied but don't *need* to be supplied.

For example, when calling the pandas [`read_csv` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), only the filename *needs* to be supplied to avoid an error, but the function can also be (and often should be) given many additional arguments.

These are *optional* arguments and can be specified by providing a *default value* when creating the function. For example, for our `add_lists` function we could add an option to check the lists are the same length first by adding a parameter called `check_same_length` with a default value of `True`:

In [None]:
def add_lists(x, y, check_same_length=True):
    
    length_x = len(x)
    length_y = len(y)
    
    ### ADDED AN IF BLOCK FOR check_same_length
    if check_same_length:
        if length_x != length_y:
            return None
    ###
    
    if length_x <= length_y:
        length = length_x
    else:
        length = length_y
    
    z = []
    for i in range(length):
        z.append(x[i] + y[i])
    
    return z

There are a few new concepts introduced here.

We have added a second `return` statement within a branch. A function stops and exits as soon as a `return` statement is reached, so you can include multiple `return` statements in different branches to exit early when a certain condition is met.

In this case, when `check_same_length` is set to `True` and the lengths are not the same, the function enters the branch, reaches the `return` statement and immediately stops executing, returning `None` instead of a list. `None` is Python's way of representing "no value" and is often used as a return value in functions. We can look at the behaviour for our two mismatched lists again:

In [None]:
output = add_lists(a, c, check_same_length=True)
print(output)

In [None]:
output = add_lists(a, c, check_same_length=False)
print(output)
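
Code that calls a function like this usually needs to check for the `None` case before using the result. A minimal sketch of that pattern follows; the variable names `checked` and `unchecked` are chosen for this illustration.

```python
def add_lists(x, y, check_same_length=True):
    length_x = len(x)
    length_y = len(y)

    if check_same_length:
        if length_x != length_y:
            return None

    if length_x <= length_y:
        length = length_x
    else:
        length = length_y

    z = []
    for i in range(length):
        z.append(x[i] + y[i])

    return z

a = [1, 2]
c = [3]

checked = add_lists(a, c)  # check_same_length falls back to its default, True
unchecked = add_lists(a, c, check_same_length=False)

# `is None` is the usual way to test for a None value
if checked is None:
    print("The lists were different lengths, so nothing was added")

print(unchecked)
```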

#### Raising errors

Alternatively, we can *raise* our own errors using the `raise` statement and supplying a [suitable error type](https://docs.python.org/3/library/exceptions.html#concrete-exceptions). For example, here we could raise a `ValueError`:

In [None]:
def add_lists(x, y, check_same_length=True):
    '''
    Adds two lists together, element wise.

    Args:
        x (list) : First list containing values which can be added
        y (list) : Second list containing values which can be added
        check_same_length (bool) :
            When check_same_length is set to True this will raise a ValueError if the two lists are not the same length
            Otherwise the output list will be truncated to the shortest list length.
            Default = True
    
    Returns:
        list: each value in x and y added together
    '''
    length_x = len(x)
    length_y = len(y)
    
    if check_same_length:
        if length_x != length_y:
            raise ValueError("Length of x and y must match")
    
    if length_x <= length_y:
        length = length_x
    else:
        length = length_y
         
    z = []
    for i in range(length):
        z.append(x[i] + y[i])
    
    return z

Due to the increasing complexity of this function, we have also added a [documentation string](https://www.python.org/dev/peps/pep-0257/) (using three opening and closing quotes) to include details of the function, the inputs and the outputs.
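
Once a function has a documentation string, it can be read back from Python itself through the function's `__doc__` attribute. The sketch below uses a shortened docstring and a condensed body (built on `zip`) to keep the example small; it is not a replacement for the full definition above.

```python
def add_lists(x, y, check_same_length=True):
    '''
    Adds two lists together, element wise.
    '''
    # Condensed body for this sketch only
    if check_same_length and len(x) != len(y):
        raise ValueError("Length of x and y must match")

    z = []
    for x_value, y_value in zip(x, y):
        z.append(x_value + y_value)
    return z

# The docstring is stored on the function object
print(add_lists.__doc__)
```

The built-in `help` function displays the same text, so calling `help(add_lists)` in a notebook shows the documentation alongside the function signature.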

In [None]:
output = add_lists(a, c, check_same_length=True)

You can see this now raises a `ValueError` with the error message we supplied: "Length of x and y must match".

Our instinct when writing code can often be to see errors like this as undesirable. However, creating an error output in this way clearly indicates *what* the problem is and *where* the problem first occurred.

If we took another route and returned a shorter list or `None` instead, an error could still be raised later if the code relies on an output in a particular form (e.g. it expects a list of a certain length). Raising the error with a clear message at the point where the input isn't what we expect can actually make the code easier to debug.
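
As a sketch of that "error later" situation, the hypothetical variant below (named `add_lists_returning_none` for this illustration) signals a mismatch by returning `None`. The failure only shows up when the result is used, and the message says nothing about list lengths.

```python
# Hypothetical variant: returns None on a length mismatch instead of raising
def add_lists_returning_none(x, y):
    if len(x) != len(y):
        return None
    z = []
    for i in range(len(x)):
        z.append(x[i] + y[i])
    return z

output = add_lists_returning_none([1, 2], [3])

# Later code assumes it received a list of numbers
try:
    total = sum(output)
except TypeError as error:
    later_message = str(error)
    print("TypeError:", later_message)
```

Compare this with "Length of x and y must match", which points directly at the cause.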

---

## Scope

One thing to bear in mind when writing functions is *scope*: variables created within a function are only available within the function itself. For example, in our original `add_lists()` function we define a variable called `z`. When we call that function, the value contained within `z` is returned and we can assign this to a new variable.

In [None]:
def add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z

In [None]:
a = [1, 2]
b = [3, 4]

c = add_lists(a, b)
print(c)

However, if we try to access `z` directly, we get a `NameError` because `z` is not defined outside the function.

In [None]:
print(z)

Similarly, other variables defined within the function don't exist either, such as `length_x`:

In [None]:
print(length_x)

Any values you want to be able to use after the function has been run must be *returned* (using a `return` statement).

One benefit of this is that you can reuse simple variable names when writing functions without fear of overwriting other variables you have defined elsewhere in your code.

In [None]:
z = "Retain this string"

In [None]:
c = add_lists(a, b)

In [None]:
print(z)

Even though the `z` variable was used within the function, the `z` variable we defined within our main code was not reassigned.
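
The scope behaviour described in this section can be collected into one short sketch. The `try`/`except` block and the `name_error_message` variable are additions for illustration, used so the example runs to completion.

```python
def add_lists(x, y):
    z = []
    length_x = len(x)
    for i in range(length_x):
        z.append(x[i] + y[i])
    return z

z = "Retain this string"

c = add_lists([1, 2], [3, 4])

print(c)  # the returned value, available because of `return z`
print(z)  # the outer z, untouched by the z inside the function

# length_x only ever existed inside the function
try:
    print(length_x)
except NameError as error:
    name_error_message = str(error)
    print("NameError:", name_error_message)
```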

---