## User-defined functions

Author: <i>Mark Waters</i>

### Learning Objectives

* Build a function
* Call a function

### Table of Contents
* What is a function?
* Why use functions?
* How to create functions
* Calling functions
* Scope
* Multiple parameters
* Docstrings
* Keyword arguments
* Default arguments
* Flexible functions (\*args, \*\*kwargs)
* Returning multiple values
* Conclusion
* Key Takeaways
* Exercises

In previous Trains, you will have come across numerous **built-in** Python functions (such as print(), sum(), float(), list(), round(), range(), etc.). In your work as a data scientist, however, the function you need for your task at hand often won't be available within the Python library of pre-defined functions. In such cases you will need to create your own **custom** function which can cater for the unique demands of your work domain or project. We refer to these functions as **user-defined** functions - they are defined by the Python user. In this Train, you will learn how to *build* your own function and then *call* that function to get a result.

### What is a function?

A Python function is just a **reusable** chunk (or block) of code for performing a **task** or solving a specific problem. It consists of a **set of statements** or sequence of commands that you can use over and over again by calling the function. In most cases a function will receive one or more **inputs** (otherwise known as **arguments**) and return an **output** (value or result) to the caller. The output is a result of **transformations** or computations performed on the input(s). A function transforms something into something else - it produces some **change** to the input. Each function is designed to perform a specific task.

For example, let's suppose that a small retail clothing chain has 3 branches - one in Joburg, a second in Cape Town and a third in Durban. The somewhat fretful-looking book-keeper, <i>Di Rhea</i>, has been asked by the big cheese, <i>Dinah Might</i>, to calculate the total sales for the week. Python provides a built-in function called 'sum' which will do the job for her. The sum function takes a list of values and adds them together to get a total before you can say 'Jack Robinson', greatly calming the nerves of our stress-puppy. 🤣

In [3]:
total_sales = [50000,30000,20000]
sum(total_sales)

100000

In this case the **input** was the list of 3 regional sales values and the **output** was the total sales nationwide. There are many built-in functions and these can help you in every part of your data project – from data importing to data cleaning, visualisation, analysis and machine learning, i.e. the whole caboodle. Some further examples of built-in functions include: 

* **len()** - calculates the number of items in a list or the number of characters in a string
* **min()** - returns the smallest item in a list
* **sorted()** - sorts a list (of strings or numbers) in ascending order (by default)
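As a quick illustration of these three built-ins, the sketch below applies them to the regional sales figures used earlier. The variable names (`regional_sales` and friends) are introduced here purely for illustration.

```python
# Regional sales figures from the example above (variable names are illustrative)
regional_sales = [50000, 30000, 20000]

number_of_branches = len(regional_sales)  # number of items in the list
lowest_sales = min(regional_sales)        # smallest item in the list
ordered_sales = sorted(regional_sales)    # a new list in ascending order

print(number_of_branches)  # 3
print(lowest_sales)        # 20000
print(ordered_sales)       # [20000, 30000, 50000]
print(len("Durban"))       # 6 (number of characters in the string)
```

Note that sorted() hands back a new list; the original list keeps its order.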

### Why use functions?

Let's say that you wish to convert temperatures provided in degrees Celsius to Fahrenheit and you need to perform this computation repeatedly. The formula is: Fahrenheit = Celsius * 1.8 + 32. Now, you don't know what Celsius temperature(s) you will be given in future. You just know that you could be given any of a large range of Celsius temperatures and you don't want to have to **repeatedly** write the formula each time you're given a different value. That's where functions come in and make your life a whole lot easier. In essence, they avoid the need for repeating the same code in different parts of your program. Some computer tasks could consist of 15 lines of code so you can imagine how much **time** you can save by storing this code inside a function (assuming you're planning on using the code more than once). In the code block below, the formula is **wrapped** within a function you've created with the name 'fahrenheit_conversion'. Yes, we **wrap** our code in a function. The act of defining a function **stores** the code for future use.  

In [4]:
def fahrenheit_conversion(celsius): #This is the function header
    product = celsius * 1.8
    result = product + 32
    return result

Now, imagine you needed to use the above code ten times in your program but you hadn't stored the code within a function. Not only would you be giving yourself a lot of unnecessary work to do (by **repeating** the same code ten times over) but you'd also expose your program to a greater risk of **error**. For instance, at a later stage you may discover that the code is incorrect and needs to be changed. This would now require changes in ten places, some of which could be easily overlooked. With a function, on the other hand, you need only make the change in **one place**.  

In data science, there are many situations where you need to perform a repeated computational process on data. For example, you will often want to perform the same manipulation on every value in a column of a **table**. In the absence of a function to speed up the process, you could be sitting behind your computer until the cows come home. By that time you will be hypnotised, lobotomised, and anaesthetized and would probably rather be talking to the office plants!🤣
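To make this concrete, here is a minimal sketch that applies the conversion function to every value in a small 'column' of temperatures. The column is represented as a plain Python list, and the name `celsius_readings` and its values are made up for illustration.

```python
def fahrenheit_conversion(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 1.8 + 32

# Hypothetical column of Celsius readings
celsius_readings = [0, 25, 40, 100]

# Call the same function once for each value in the column
fahrenheit_readings = []
for reading in celsius_readings:
    fahrenheit_readings.append(fahrenheit_conversion(reading))

print(fahrenheit_readings)  # [32.0, 77.0, 104.0, 212.0]
```

The formula is written once, inside the function, no matter how many values the column holds.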

### How to create functions

To create a function, we use the keyword **'def'** which stands for 'define'. Using the keyword 'def' is a way of telling the computer that we wish to create a function. The font colour of the word is **green**, signifying that it is a Python keyword. After the keyword we provide the **name** of our function - in this case, 'fahrenheit_conversion'. The name should reflect what the function **does**. Make sure your function name is lower-case and, if it consists of two or more words, connect those words with an underscore; this is known as **snake_case**. The function name is followed by a set of **parentheses** which can contain zero, one or more **parameters**. In our example, the function accepts a single parameter to which we've given the name 'celsius'. The function header ends with a **colon** (':'). What follows after the colon is the function **body** consisting of one or more statements - a complex function could easily run into 20 lines of code. This is the code that needs to be executed when the function runs (i.e. when it is called). Notice that the function body is **indented**. If it is not indented, an error will be triggered. When you press 'Enter' after the colon, the next line will automatically be indented 4 spaces. All lines of code within the body must be indented consistently (four spaces is the convention), otherwise you will get an error.

Within the body of the above function, our first statement takes the value provided to it and multiplies it by 1.8. The product of these two values is stored within a **variable** we have called 'product'. In the subsequent line of code, we add 32 to the value stored in 'product'. The result of this calculation is then stored within a variable that we have named 'result' which is, in turn, returned to the user calling the function. The **return** statement marks the end of the function; after the computer executes this line it exits the function. In other words, it does not continue to the next line. Note that the return statement must be used to provide an **output** to the user. Without this statement, the function will only perform **tasks** (e.g. printing to the screen) or calculations but the user will not receive an output (value). In such a case, Python implicitly returns the value 'None' when it reaches the end of the function body.
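The difference between printing a value and returning it can be seen in the small sketch below. The two function names are hypothetical and exist only for this comparison.

```python
def convert_and_print(celsius):
    # Prints the result but has no return statement
    print(celsius * 1.8 + 32)

def convert_and_return(celsius):
    # Hands the result back to the caller
    return celsius * 1.8 + 32

printed = convert_and_print(25)    # prints 77.0 to the screen
returned = convert_and_return(25)  # prints nothing

print(printed)   # None
print(returned)  # 77.0
```

Only the second function gives the caller a value that can be stored and used later.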

### Calling functions

Up to this point, we have simply **defined** our function. We haven't applied or **used** it - it hasn't **executed** the code. So far, we have only **stored** the code - no output comes out when defining a function. To apply the function, you need to **call** it. Calling it is a way of asking the function to perform the job it is designed to do - in our case, that is to convert a given temperature in Celsius to Fahrenheit.

Let's now call (or **invoke**) the function. To call a function you need to know the function's name and what arguments (inputs) need to be passed. We know from the function definition that it is expecting the user to supply one argument (a value for the parameter 'celsius'). An **argument** is an actual value for a given parameter.

Not all functions require one argument to be supplied. Some functions may not require any arguments at all. Some may require two or more arguments while others may give the user the option of passing in a variable number of arguments. We will supply our function with a value of 25 (degrees Celsius) to see what value (in Fahrenheit) it spits out. To call a function, you simply type the function's name followed by parentheses and supply your input (25) inside those parentheses, i.e. you **pass** your input to the function. The value provided inside the parentheses is your **argument** and it is stored within the **parameter** 'celsius'. In other words, the parameter is the **name** of the argument (as seen in the function definition) whereas the argument is the **value** supplied by the user for that parameter when calling the function.

In [5]:
# Calling the function
fahrenheit_conversion(25) 

77.0

![Explore%20Assignment%20diag5.jpg.png](attachment:Explore%20Assignment%20diag5.jpg.png)

<i>Figure 1: Function inputs and outputs</i>

As you can see in the code block above, the function has calculated the result as 77.0 degrees Fahrenheit. That is the output we get when we invoke the function with our argument. In effect, we have passed an input (25 degrees Celsius) into our function, which **assigned** the final result to a **variable** named 'result' and returned it to the caller. Because our function returns a value, we can store this value in a variable for future access and use. In the code block below, we have stored the return value of our function within a variable named 'balmy_temp'.

In [6]:
# Storing the returned value in a variable (balmy_temp)
balmy_temp = fahrenheit_conversion(25)
balmy_temp

77.0

Now, remember that the reason you created this function in the first place is because you expected you would need to use it time and time again. So, the next time you want to do a Celsius-to-Fahrenheit temperature conversion, you don't have to try to remember the code that performs this task and rewrite it. All you need to do now is simply call the function, supplying it with your new value. Calling the function means that you want to use it. Let's say that you now need to calculate the Fahrenheit temperature for 40 degrees Celsius. That's pretty darn hot! Incidentally, do you know the one about the two eggs in a pot of boiling water? The one egg says "It's getting <i>hot</i> in here!" to which the second egg replies "Wait 'til you get out! They'll bash your head in!" 🤣

In [7]:
# Call the function passing a celsius temperature value of 40
fahrenheit_conversion(40)

104.0

How simple is that! You could teach your dog to do that over a long weekend! 🤣 Each call results in the body of fahrenheit_conversion being executed, using the value of the argument passed to it. 

### Scope

Be aware that names that are defined inside a function, like 'product' and 'result', only have a fleeting existence. They are defined only while the function is being called, and they are only accessible inside the body of the function. They are known only to the function in which they are declared. We can’t refer to these variables outside the body of our function since they only exist within the function. In technical <i>parlance</i> these variables are said to have **local scope** - in contrast to global scope (where a variable is defined outside a function and is accessible anywhere within your file). Therefore, trying to access the value of either of these variables outside the body of the function will produce a NameError. The name 'product' is only recognised inside the body of the function - when the function runs. It is not recognised outside the body of the function. By implication, the variable can only be used inside the function. This also means that you're at liberty to use the same names for variables inside other functions you define. No error will be raised. 
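The sketch below shows local scope in action. It redefines the conversion function so that it runs on its own, and it wraps the failing line in try/except (an addition made here only so the example finishes without stopping on the error).

```python
def fahrenheit_conversion(celsius):
    product = celsius * 1.8  # 'product' exists only inside the function
    result = product + 32    # so does 'result'
    return result

print(fahrenheit_conversion(25))  # 77.0

name_error_raised = False
try:
    print(product)  # 'product' is not known outside the function
except NameError as error:
    name_error_raised = True
    print("NameError:", error)  # NameError: name 'product' is not defined
```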

Using functions will eliminate a lot of unnecessary repetition, making your code a lot more streamlined and efficient. It will also make it easier to understand. When defining your own functions, make sure that you give your function a **meaningful name**. If you had given the name 'calculation' to the fahrenheit_conversion function, it would not be immediately apparent to you (or someone else reading your code) what the specific purpose of the function was. You don't want to bamboozle others reading your code. Speaking of which, can you figure out what the following means?

<i>I know you believe you understand what you think I said, but I'm not sure you realise that what you heard is not what I meant!</i> 🤣

### Multiple parameters

Our function example above made use of a single parameter. Let's now look at a function that takes **two parameters**. Suppose our intrepid entrepreneur, <i>Izzy Foreel</i>, wishes to calculate revenue based on price and quantity, then print the result to the screen and also return the result to the user. Izzy owns a store which sells one product - widgets - all at the same price. On any day of the week, he wishes to calculate revenue. Knowing that he'll need to perform this task over and over again, he decides to create a function. Now, each time he wants his function to perform its task, all he needs to do is call it and supply it with the two required arguments. 

In [8]:
def calc_revenue(price, quantity):
    """ Calculates revenue by multipyling product price and quantity sold """
    revenue = price * quantity
    print("Today's revenue was R{} based on a price of {} and quantity of {}".format(revenue, price, quantity))
    return revenue

### Docstrings

Directly beneath the function header in the code block above you will see some text within """triple quotes""". This is known as a **docstring** and it is some documentation that describes what the function does. Docstrings are not mandatory but are considered good practice, particularly with more complex functions, those spanning many lines, and for those occasions when you're importing a function into your script. Using **triple quotes** enables your docstring to span several lines. It's good practice to describe the function briefly in the first line and use subsequent lines to provide more detail (e.g. what the function does specifically and the data types accepted for each of the arguments) and give examples of function calls with their corresponding outputs. You can also include information about the computations performed within the function and the values returned. At a later stage, when other users want to find out what the function does, they can simply type **'help'** followed by the function name within parentheses. This will display the docstring as output. 
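A multi-line docstring laid out along these lines might look like the sketch below. The function `calc_discounted_price` and its parameters are hypothetical, and the layout is just one reasonable way of arranging the information.

```python
def calc_discounted_price(price, discount):
    """Calculate the price after a discount has been applied.

    price: the original price (int or float)
    discount: the discount as a fraction, e.g. 0.25 for 25% (float)

    Returns the discounted price.

    Example: calc_discounted_price(200, 0.25) returns 150.0
    """
    return price * (1 - discount)

# The docstring is stored on the function itself; help() displays this text
print(calc_discounted_price.__doc__)
```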

In [9]:
# To view the docstring for the function
help(calc_revenue)

Help on function calc_revenue in module __main__:

calc_revenue(price, quantity)
    Calculates revenue by multiplying product price and quantity sold



Having defined our new function we can now call it (and use it). Keep in mind that the function must be called with the correct **number** of arguments. If your function has two parameters, it must be called with two arguments. In our example, also make sure that the figure you provide for price is your first argument in line with what the function definition expects - **order** of arguments matters! Don't put the cart before the horse! So, in the code below, we pass 100 as the value for the first parameter ('price') and 50 as the value for the second parameter ('quantity'). 

In [10]:
# Calling the function
calc_revenue(100, 50)

Today's revenue was R5000 based on a price of 100 and quantity of 50


5000
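What happens if an argument is left out? The sketch below calls a simplified copy of calc_revenue (the print statement is omitted for brevity) with only one argument. The try/except wrapper is added here only so that the error message can be displayed without halting the notebook.

```python
def calc_revenue(price, quantity):
    """ Calculates revenue by multiplying product price and quantity sold """
    return price * quantity

error_message = ""
try:
    calc_revenue(100)  # the value for quantity is missing
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)
    # TypeError: calc_revenue() missing 1 required positional argument: 'quantity'
```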

### Keyword arguments

When calling a function that takes multiple arguments, it may not be immediately apparent what each of those arguments **stands for** or the **order** in which they need to be passed. To make your intentions clear (and your code more readable), you can prefix your arguments with the names of their respective parameters. These arguments are now referred to as **'keyword arguments'**. The code below shows what this would look like when calling the calc_revenue function. 

In [11]:
calc_revenue(price=100, quantity=50)

Today's revenue was R5000 based on a price of 100 and quantity of 50


5000

By using keyword arguments the caller is able to pass the arguments in a non-positional manner, i.e. the **order** in which they are passed no longer matters. This is because the caller has **identified** the arguments by the parameter name. The Python interpreter uses the **keywords** (price, quantity) provided to **match** the values with parameters. Remember that a keyword argument consists of the parameter name, the '=' sign and the value.  
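To see that the order really doesn't matter, the sketch below calls a simplified copy of calc_revenue (print statement omitted) with the keyword arguments swapped around. The variable names holding the results are illustrative.

```python
def calc_revenue(price, quantity):
    """ Calculates revenue by multiplying product price and quantity sold """
    return price * quantity

in_order = calc_revenue(price=100, quantity=50)
swapped = calc_revenue(quantity=50, price=100)  # same call, different order

print(in_order)  # 5000
print(swapped)   # 5000

# A positional argument may be combined with a keyword argument,
# provided the positional one comes first
mixed = calc_revenue(100, quantity=50)
print(mixed)     # 5000
```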

### Default arguments

Suppose you want to make one of your parameters **optional** so that users do not need to specify that argument when calling the function. Possibly, for instance, the price of the product may be set in stone at R100 for the next 6 or 12 months. In this circumstance we could give the price parameter a **default value**, setting it equal to 100. Now we can call our function, passing a value for quantity but **not specifying** a value for the price parameter. By the same token, it also allows the user to use a **different** value than the default value should they wish to do so when calling the function. To convert price to a default argument, our function definition would need to be changed as follows:  

In [12]:
def calc_revenue(quantity, price=100):
    """ Calculates revenue by multipyling product price and quantity sold """
    revenue = price * quantity
    print("Today's revenue was R{} based on a price of {} and quantity of {}".format(revenue, price, quantity))
    return revenue

Notice that we have moved the price parameter to the **last** position in the parentheses. When using default arguments, any default parameters must be listed **after** the required parameters. Now that we have set the default price to 100, the user need only enter a value for quantity when calling the function. The default value will be automatically used by the function when the user does not supply the second argument. If the user wishes to change the price, then that price value can be entered after the quantity value when calling the function. This process is illustrated below.

In [13]:
#Calling the function with quantity value only
print(calc_revenue(30)) #Default value of 100 will be used for price
#Calling the function with values for both quantity and price
print(calc_revenue(30,110)) #Price value of 110 will be used

Today's revenue was R3000 based on a price of 100 and quantity of 30
3000
Today's revenue was R3300 based on a price of 110 and quantity of 30
3300


In the first example above, since the value for the price argument was not supplied when the function was called, the default argument of 100 (R100) was used. 
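A default value can also be overridden by name, which makes it clearer at the point of the call which value is being changed. The sketch below uses a simplified copy of the function (print statement omitted); the result variable names are illustrative.

```python
def calc_revenue(quantity, price=100):
    """ Calculates revenue by multiplying product price and quantity sold """
    return price * quantity

default_price_revenue = calc_revenue(30)            # price falls back to 100
custom_price_revenue = calc_revenue(30, price=110)  # default overridden by keyword

print(default_price_revenue)  # 3000
print(custom_price_revenue)   # 3300
```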

### Flexible functions

There will be occasions when you want to create functions that take a **variable** (or unlimited) number of arguments. Let's say you want to multiply all of the arguments together irrespective of whether two, three, four, or however many arguments are supplied by the user. For this use case, you can create the function below.

In [14]:
def multiply(*args):
    total = 1
    for number in args:
        total = total * number
    return total

In the above definition, the term **\*args** is used in line with established practice. However, you're free to change this to another term which makes more sense in your use case, for example, **\*numbers**. But you must remember to precede the descriptor with an **asterisk** to indicate that a **variable** number of arguments can be supplied. As shown below, you can see how the function can be called with different numbers of arguments.

In [15]:
print(multiply(2,3))
print(multiply(2,3,4))
print(multiply(2,3,4,5))

6
24
120


A variation of \*args is **\*\*kwargs** which allows for a variable number of keyword arguments (key:value pairs) to be supplied when calling the function. Keyword arguments are in the form 'name=value'. In the example below, we define a function which allows a user to save profiles of company personnel. 

In [16]:
def save_profile(**kwargs):
    return kwargs

Let's say we now need to add a new user with values for userid, name, location and age. Instead of passing positional arguments, we now supply **keyword arguments** to the function. We call the function supplying the **key:value** pairs shown below.

In [17]:
save_profile(userid=250, name="Tai Mai Shu", location="Potato Creek", age=27)

{'userid': 250, 'name': 'Tai Mai Shu', 'location': 'Potato Creek', 'age': 27}

Notice how the profile is saved in a **dictionary** (indicated by curly braces) with key-value pairs, e.g. 'userid':250. Because we have made our function **flexible** (through the use of asterisks), enabling it to cater for a flexible number of keyword arguments, we can call the function using as many keyword arguments as we need. In the example below, we make use of 5 arguments as opposed to the 4 arguments supplied in the previous example. 

In [18]:
save_profile(userid=251, name="Barb E. Cue", location="Frog Pond", age=22, position="Desk Jockey")

{'userid': 251,
 'name': 'Barb E. Cue',
 'location': 'Frog Pond',
 'age': 22,
 'position': 'Desk Jockey'}

Summarizing the above, we can say that both **\*args** and **\*\*kwargs** allow for variable-length arguments. Using **\*args**, the arguments are **non-named** whereas with **\*\*kwargs** they are **named**. The \*args syntax passes all positional arguments to the function as a **tuple** called 'args'. On the other hand, the \*\*kwargs syntax passes all keyword arguments to the function as a **dictionary** called 'kwargs'.  
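You can check these two container types for yourself with a small sketch like the one below. The function `describe_arguments` is hypothetical; it accepts both kinds of flexible arguments and reports the type Python used to collect each.

```python
def describe_arguments(*args, **kwargs):
    # type(...).__name__ gives the name of the type as a string
    return "args is a {} and kwargs is a {}".format(
        type(args).__name__, type(kwargs).__name__
    )

description = describe_arguments(2, 3, 4, userid=250, age=27)
print(description)  # args is a tuple and kwargs is a dict
```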

### Returning multiple values

Until now we have been creating functions which return a **single** output. However, it is possible to define functions which return **multiple** values. To do so, we return the values together as a **tuple**, listing them separated by commas (conventionally wrapped within **parentheses**). A tuple is an iterable sequence like a list which can contain multiple values. However, unlike a list, a tuple is immutable which means it can't be changed. What follows is an example of such a use case.

In [19]:
import datetime

def get_todays_date_and_time():
    current_datetime = datetime.datetime.now()
    todays_date = current_datetime.date()
    time_now = current_datetime.time()
    return (todays_date, time_now) # A tuple containing two values is returned

date, time = get_todays_date_and_time() # Use sequence unpacking to unpack the values from the tuple into date and time variables
print(date)
print(time)

2021-07-19
15:07:48.660391


To ascertain the date and time values above, we **unpack** the two values in the tuple into two variables (date, time). That's how quick and easy it is! Yes, it can be done in two shakes of a lamb's tail! 🤣
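Because the date and time change every time the cell is run, here is a second sketch with a predictable output. The function `min_and_max` is hypothetical. It also shows that Python builds the tuple from the comma-separated values even when the parentheses are left out.

```python
def min_and_max(numbers):
    smallest = min(numbers)
    largest = max(numbers)
    return smallest, largest  # the comma-separated values are returned as one tuple

result = min_and_max([50000, 30000, 20000])
print(result)  # (20000, 50000)

# Unpack the tuple into two separate variables
lowest, highest = min_and_max([50000, 30000, 20000])
print(lowest)   # 20000
print(highest)  # 50000
```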

### Conclusion

This brings to an end our Train on user-defined functions. At this point you have all the necessary knowledge to create your own simple functions. Please be aware that we have really only touched upon the **basics** of functions. We have made use of very simple examples to illustrate the concepts and we have not discussed more sophisticated variations such as **nested** functions and **recursive** functions. 

In a later tutorial you will be introduced to **lambda functions** which can be used for simple functions containing a single line of code. For now, make sure you embed these concepts in your mind by creating **variations** of the examples we've used in this Train and also completing the **exercises** below. Remember, you will learn through **repetitive practice** - learning by doing. But, that doesn't mean you should practise writing code mindlessly. On the contrary, you need to actively employ your mind to make **sense** of **what** you're doing, **how** you're doing it and **why** you're doing it that way. One very useful practice that you should adopt is to review the code in a function written by an advanced Python user and then try to **reproduce** that code without looking at the original code. When doing so, aim for a complete **understanding** of each line in the code (and why it has been employed) and avoid doing things by **rote**. Understanding trumps memory!

![Cycle%20diagram7.jpg.png](attachment:Cycle%20diagram7.jpg.png)

<i>Figure 2: How to become a proficient function writer</i>

### Key takeaways

* Define your own function when you need functionality **specific** to your needs (and you need to do a task repeatedly)
* **Defining** a function simply **stores** your code - it doesn't run the code
* **Calling** (or invoking) a function runs the code
* When we call a function we **pass arguments** (values for the parameters) to the function
* An **argument** is the value that is sent to the function when it is called
* A **parameter** is the variable listed inside the parentheses in the function definition
* When calling your function, make sure you supply an argument for every parameter in the function definition that doesn't have a default value (the correct **number of arguments**)
* The **order** in which you pass positional arguments must correspond to the order of the parameters in the function definition
* **Default arguments** - When the caller doesn't specify a value for a default argument, the default value (in the function definition) is used. The caller is also free to pass a different value to the function.
* The use of **keyword arguments** in a function call allows the caller to identify the arguments by the parameter name. Arguments can be placed in a different order to the parameters in the function definition
* Using the **'return'** keyword returns a result from the function 
* **\*args** and **\*\*kwargs** allow you to create functions with a flexible number of arguments

### Exercises

1) Write a user-defined function that calculates the square of a number.

In [20]:
# Write your code below

    


2) Call the function you created above passing the value 7 to it 

In [21]:
# Call your function below


3) Build a function that accepts any number of numbers, adds them together and then prints the total to the screen

In [22]:
# Write your function below






In [23]:
# Now call your function to test it


4) Create a function that calculates the average of any number of figures passed to it. Hint: The len() function will help you

In [24]:
# Write your code below







In [25]:
# Call your function with a few numbers to test it

