# NB: Introduction to Functions

Programming for Data Science

<!--
**Objectives**
- Explain the benefits of functions
- Illustrate how to use built-in functions
- Illustrate how to create and use your own (user-defined) functions
- Demonstrate the scope and lifetime of a variable
- Illustrate global and local nature of variables through functions
- Demonstrate function parameter use
- Provide recommendations on how to create and document functions
- Show how to print and write docstrings

**Concepts**
- functions
- built-in functions
- user-defined functions
- variable scope
- global versus local variables
- default arguments
- *args
- function call
- docstring
-->

## What is a Function?

A function is a piece of code, separate from the larger program, that **performs a specific task**.

This piece of code is given a **name** and can be **called** from the main program.

Functions are the **verbs** of a programming language. They signify **action**, and take subjects and objects (as it were).

## How do they Work?

Functions take **input** data and produce **output** data. 

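- Function inputs are called **parameters** (the names in the function definition) and **arguments** (the values passed in a call); the two terms are often used interchangeably.
- Outputs are called **return** values.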

Functions are always written with **parentheses** at the end of their names, e.g.

`len(some_list)`

`time()`

Internally, they contain a block of code to do their work.  

Often they produce a **transformation** ... e.g. from simple to complex.

When you use a function, we say you **call** a function. Programmers speak of "function calls" and "callbacks."
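As a minimal sketch of this vocabulary, the call below passes one argument to the built-in `len()` and stores its return value. The variable names `some_list` and `n_items` are illustrative only.

```python
# Call the built-in len() with one argument and capture its return value.
some_list = [10, 20, 30]
n_items = len(some_list)   # input: a list; output: an int

print(n_items)
```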

## Why Use Them?

Reduce **complex** tasks into **simpler** tasks.

Eliminate **duplicate code**: there is no need to rewrite it; reuse the function as needed.

Make code **reusable**. Once a function is written, you can reuse it in other programs.

**Distribute** tasks to multiple programmers. For example, each function can be written by a different person.

Hide implementation details, i.e. **abstraction**. 

Increase code **readability**.

Improve **debugging** by improving traceability. Things are easier to follow; you can jump from function to function.

## Built-in Functions

Python provides many **built-in** functions. See [Python built-in functions](https://docs.python.org/3/library/functions.html).

We've looked at many of these already.

These are functions that are available to use any time you are running Python.

To take one simple example, this is a built-in function: `bool()`. 

It takes an argument $x$ and returns a boolean value, i.e. `True` or `False`.

In [1]:
bool(0), bool(500)

(False, True)
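A few other built-ins follow the same pattern of taking an argument and returning a value. This is a small sketch; the list `scores` and the result names are made up for illustration.

```python
scores = [88, 92, 79]

total = sum(scores)         # add up the items
highest = max(scores)       # largest item
ordered = sorted(scores)    # returns a new sorted list; scores is unchanged

print(total, highest, ordered)
```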

## Imported Functions

Python is meant to be a highly **modular** language.

It is not designed to have a lot of special purpose functions built into it.

This keeps Python light and highly **customizable**.

Many functions can be **imported** into a program to add to the functions that you can call in a script.

In [3]:
import math

math.log(256, 2)

8.0
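Individual functions can also be imported by name, so they can be called without the module prefix. The sketch below uses two functions from the standard `math` module; the variable names are illustrative.

```python
from math import sqrt, floor

root = sqrt(16)      # square root, returned as a float
whole = floor(2.7)   # largest integer less than or equal to 2.7

print(root, whole)
```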

## User-Defined Functions

Python makes it easy for you to write your own functions. These are called **user-defined** functions.

Let's write a function to compare each value in a list against a threshold.

In [10]:
def vals_greater_than_or_equal_to_threshold(vals, thresh):
    '''
    This is the "docstring" of a function. It is optional but expected. It describes its 
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    
    PURPOSE: Given a list of values, compare each value against a threshold
    
    INPUTS
    vals    list of ints or floats
    thresh  int or float
    
    OUTPUT
    bools  list of booleans
    '''
    
    filtered_vals = [val >= thresh for val in vals]
    
    return filtered_vals

**Let's break down the components**

The function definition starts with `def`, followed by the function name, zero or more parameters in parentheses, and then a colon.

Next comes a **docstring** to provide information to users about how and why to use the function.

The function **body** follows.

Last comes a `return` statement.

The **function call** allows the function to be used. \
It consists of the function name and the required arguments:

`vals_greater_than_or_equal_to_threshold(arg1, arg2)` where `arg1`, `arg2` are arbitrary names.
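The same components can be seen in a much smaller function. This is only an illustrative sketch; `double` is a hypothetical function that does not appear elsewhere in these notes.

```python
def double(x):                      # def, name, parameter list, colon
    """Return twice the input."""   # docstring
    result = 2 * x                  # body
    return result                   # return statement

doubled = double(21)                # function call with one argument
print(doubled)
```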

## About the docstring

A **docstring** occurs as the first statement in a module, function, class, or method definition.

Internally, it is saved in the `__doc__` attribute of the function object.

It needs to be indented, i.e. part of the code block associated with the function. 

It can be a single line or a multi-line string.
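As a small sketch of how the stored docstring can be retrieved, the example below defines a hypothetical function `area` and reads its docstring two ways: directly from `__doc__`, and through the standard library's `inspect.getdoc()`, which removes the surrounding blank lines and common indentation.

```python
import inspect

def area(width, height):
    """
    Return the area of a rectangle.

    Both inputs are ints or floats.
    """
    return width * height

raw = area.__doc__            # the string as stored, including the leading newline
clean = inspect.getdoc(area)  # cleaned up for display

print(clean)
```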

## Users can print the docstring

In [11]:
print(vals_greater_than_or_equal_to_threshold.__doc__)


    This is the "docstring" of a function. It is optional but expected. It describes its 
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    
    PURPOSE: Given a list of values, compare each value against a threshold
    
    INPUTS
    vals    list of ints or floats
    thresh  int or float
    
    OUTPUT
    bools  list of booleans
    


Print the docstring using `help()`:

In [12]:
help(vals_greater_than_or_equal_to_threshold)

Help on function vals_greater_than_or_equal_to_threshold in module __main__:

vals_greater_than_or_equal_to_threshold(vals, thresh)
    This is the "docstring" of a function. It is optional but expected. It describes its 
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    
    PURPOSE: Given a list of values, compare each value against a threshold
    
    INPUTS
    vals    list of ints or floats
    thresh  int or float
    
    OUTPUT
    bools  list of booleans



Or, use the `?` prefix in a Jupyter notebook:

In [13]:
?vals_greater_than_or_equal_to_threshold

Signature: vals_greater_than_or_equal_to_threshold(vals, thresh)
Docstring:
This is the "docstring" of a function. It is optional but expected. It describes its 
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.

PURPOSE: Given a list of values, compare each value against a threshold

INPUTS
vals    list of ints or floats
thresh  int or float

OUTPUT
bools  list of booleans
File:      /tmp/ipykernel_133200/392258855.py
Type:      function

Or use `?` as a suffix:

In [14]:
vals_greater_than_or_equal_to_threshold?

Signature: vals_greater_than_or_equal_to_threshold(vals, thresh)
Docstring:
This is the "docstring" of a function. It is optional but expected. It describes its 
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.

PURPOSE: Given a list of values, compare each value against a threshold

INPUTS
vals    list of ints or floats
thresh  int or float

OUTPUT
bools  list of booleans
File:      /tmp/ipykernel_133200/392258855.py
Type:      function

## Calling Our Function

Let's use, or "call," our function.

The function body uses a list comprehension to compare each value against the threshold:

`[val >= thresh for val in vals]` 

Validate that it works for integers:

In [15]:
x = [3, 4]
thr = 4

vals_greater_than_or_equal_to_threshold(x, thr)

[False, True]

Validate that it works for floats:

In [7]:
x = [3.0, 4.2]
thr = 4.2

vals_greater_than_or_equal_to_threshold(x, thr)

[False, True]

This gives correct results and does exactly what we want.  
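The returned booleans can then be used to select the values that pass the test. The sketch below repeats the function with a shortened docstring so that it runs on its own; the names `flags` and `kept` are illustrative.

```python
def vals_greater_than_or_equal_to_threshold(vals, thresh):
    """Compare each value in vals against thresh; return a list of booleans."""
    return [val >= thresh for val in vals]

x = [3, 4, 7]
flags = vals_greater_than_or_equal_to_threshold(x, 4)

# Keep only the values whose flag is True.
kept = [val for val, flag in zip(x, flags) if flag]

print(flags, kept)
```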

## Passing Parameters

All functions may take $0$ or more **arguments**, also called **parameters**.

Functions need to be called with the correct number of arguments.

The function below requires two arguments, but the function call supplies only one.

In [20]:
def func_with_args(x, y):
    return x + y

In [21]:
func_with_args(10)

TypeError: func_with_args() missing 1 required positional argument: 'y'
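To see the error without halting a script, the call can be wrapped in `try`/`except`. This is just one way to demonstrate the behavior, not a recommendation to catch such errors routinely.

```python
def func_with_args(x, y):
    return x + y

try:
    func_with_args(10)          # one argument too few
except TypeError as err:
    message = str(err)
    print("Call failed:", message)

total = func_with_args(10, 5)   # correct number of arguments
print(total)
```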

## Parameter Order

When calling a function, **parameter order matters**.

In [23]:
def fcn_swapped_args(x, y):
    out = 5 * x + y
    return out

In [24]:
x = 1
y = 2

In [25]:
fcn_swapped_args(x, y)

7

In [26]:
fcn_swapped_args(y, x)

11

## Named Parameters

Generally it's best to keep parameters in order.  

**However,** you can swap the order by putting the **parameter names** in the function call.

In [None]:
fcn_swapped_args(y=y, x=x)
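The sketch below compares positional and named calls side by side. It redefines `fcn_swapped_args` so that it runs on its own; the result names are illustrative.

```python
def fcn_swapped_args(x, y):
    return 5 * x + y

by_position = fcn_swapped_args(1, 2)       # x=1, y=2
by_keyword = fcn_swapped_args(y=2, x=1)    # same binding, different order
swapped = fcn_swapped_args(2, 1)           # x=2, y=1

print(by_position, by_keyword, swapped)
```

With names attached, the order in the call no longer affects which value each parameter receives.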

## Weirdness Alert

Note that the same name can be used for the parameter names and the variables passed to them. 

The names themselves have nothing to do with each other! 

In other words, just because a function names an argument `foo`, \
the variables passed to it don't have to be named `foo` or anything like it. \
They can even be named the same thing; it does not matter.
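A short sketch of this point, using a hypothetical function `scale` whose first parameter is named `foo`:

```python
def scale(foo, factor):
    return foo * factor

foo = 3
bar = 3

same_name = scale(foo, 2)    # the variable happens to share the parameter's name
other_name = scale(bar, 2)   # a differently named variable works identically

print(same_name, other_name)
```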

## Default Arguments

Use default arguments to give parameters a value when the caller does not supply one.

This allows users to call functions with fewer (or no) arguments.

Defaults are often set for the most common cases.

In [27]:
def show_results(precision, printing=True):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

In [28]:
pr = 0.912
res = show_results(pr)

precision = 0.91


The function call didn't specify `printing`, so it defaulted to `True`.

In [29]:
foo = show_results(pr, False)
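A default can also be overridden by name, which tends to read more clearly than a bare `False`. The sketch repeats `show_results` so that it runs on its own; `quiet` and `loud` are illustrative names.

```python
def show_results(precision, printing=True):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

quiet = show_results(0.912, printing=False)   # nothing is printed
loud = show_results(0.912)                    # prints: precision = 0.91
```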

**NOTE:** Default arguments must follow non-default arguments in a function definition. 

This causes trouble:

In [30]:
def show_results(precision, printing=True, uhoh):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

SyntaxError: non-default argument follows default argument (830346004.py, line 1)

## Returning Values

Functions are not required to have a return statement, but it is a good idea to include one.

If there is no return statement, a function returns `None`.  

Functions can return no value (`None`), one value, or many.  

Many values are returned as a tuple.

Any Python object can be returned.  

This returns `None`.

In [32]:
def fcn_nothing_to_return(x, y):
    out = 'nothing to see here!'
    print(out)

In [33]:
fcn_nothing_to_return(x, y)

nothing to see here!


In [36]:
r = fcn_nothing_to_return(1, 1)

nothing to see here!


In [39]:
r

In [40]:
print(r)

None


This returns three values.

In [42]:
def negate_coords(x, y, z):
    return -x, -y, -z 

In [47]:
a, b, c = negate_coords(10, 20, 30)

In [49]:
a, b, c

(-10, -20, -30)

In [44]:
foo = negate_coords(10, 20, 30)

In [46]:
foo

(-10, -20, -30)

If you don't need an output, use the dummy variable `_`.

In [52]:
d, e, _ = negate_coords(10,20,30)

In [53]:
d, e

(-10, -20)

**Note:** It's generally a good idea to include return statements, even if not returning a value.  

This shows that you did not forget to consider the return value.

You can use `return` or `return None`.

**Functions can contain multiple return statements**.

These may be used under different logical conditions.

In [54]:
def absolute_value(num):
    if num >= 0:
        return num
    return -num

In [57]:
absolute_value(-4), absolute_value(4)

(4, 4)
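Multiple return statements are also a common way to handle a special case early and return `None` explicitly. The function `safe_ratio` below is a hypothetical example, not part of the notes above.

```python
def safe_ratio(numerator, denominator):
    if denominator == 0:
        return None          # early exit for the special case
    return numerator / denominator

undefined = safe_ratio(1, 0)
defined = safe_ratio(6, 3)

print(undefined, defined)
```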

## Unpacking List-likes with `*args`

Prefixing a parameter with `*` in a function definition collects any number of positional arguments into a tuple, so the arguments do not have to be specified individually.

In [91]:
def show_arg_expansion1(models):
    print(models)

A function defined with a single plain parameter cannot accept several separate values ...

In [92]:
show_arg_expansion1("logreg", "naive_bayes", "gbm")

TypeError: show_arg_expansion1() takes 1 positional argument but 3 were given

In [93]:
def show_arg_expansion2(*models):
    print(models)

In [94]:
show_arg_expansion2("logreg", "naive_bayes", "gbm")

('logreg', 'naive_bayes', 'gbm')


This also allows for an unspecified number of arguments, much like `print()`.

You can also pass a list to the function. 

If you want the elements unpacked, put `*` before the list.

In [97]:
models = ["logreg", "naive_bayes", "gbm"]
show_arg_expansion2(*models)

('logreg', 'naive_bayes', 'gbm')


This approach allows your function to accept an arbitrary number of arguments.

Note that you can also prefix a string with an asterisk `*`, which unpacks it into individual characters:

In [98]:
show_arg_expansion2(*'abcdefg')

('a', 'b', 'c', 'd', 'e', 'f', 'g')


Or a string operation that returns a list:

In [99]:
show_arg_expansion2(*'a b c d e f g'.split())

('a', 'b', 'c', 'd', 'e', 'f', 'g')


You can use the `*` prefix to pass list-like objects to a function with a defined number of arguments.

In [73]:
def arg_expansion_example(x, y):
    return x**y

In [74]:
my_args = [2, 8]
arg_expansion_example(*my_args)

256

But, the passed object must be the right length.

In [75]:
my_args2 = [2, 8, 5]
arg_expansion_example(*my_args2)

TypeError: arg_expansion_example() takes 2 positional arguments but 3 were given
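A `*` parameter can also follow ordinary parameters, so that the first arguments are bound by position and the rest are collected into a tuple. This is a sketch with a hypothetical function `describe_models`.

```python
def describe_models(task, *models):
    # task takes the first argument; models collects everything after it.
    return f"{task}: {len(models)} model(s) -> {', '.join(models)}"

one = describe_models("classification", "logreg")
many = describe_models("classification", "logreg", "naive_bayes", "gbm")
empty = describe_models("classification")

print(one)
print(many)
print(empty)
```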

## Function Design

A function is **not just a bag of code**!

Design a function to **do one thing**.

Make them as **simple** as possible. This makes them: 

- more comprehensible
- easier to maintain
- reusable

This helps avoid situations where a team has 20 variations of similar functions.

Give your function a good name. 

- It should reflect the action it performs. 
- Be consistent in your naming conventions.
- A name like `compute_variances_sort_save_print` suggests the function is overworked!

If the function `compute_variances` also produces plots and updates variables, it will cause confusion.  

Always give your function a docstring.

- This is particularly important because Python does not require you to declare data types.
- As a side note, you can include this information by using **type annotation**.
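As a rough sketch of how type annotations and a docstring can work together, the function below is a hypothetical, shorter-named variant of the threshold function from earlier. The built-in generic syntax `list[float]` assumes Python 3.9 or later.

```python
def vals_at_or_above(vals: list[float], thresh: float) -> list[bool]:
    """Return one boolean per value: True where the value is >= thresh."""
    return [val >= thresh for val in vals]

flags = vals_at_or_above([1.0, 5.0], 3.0)

print(flags)
print(vals_at_or_above.__annotations__)
```

Python does not enforce annotations at run time; they document intent and can be checked by external tools.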

You may be interested to learn some of the formatting languages that have been developed to write docstrings. See [Lutz 2019](https://learning.oreilly.com/library/view/learning-python-5th/9781449355722/ch15.html) and this web page about [Documenting Python Code](https://realpython.com/documenting-python-code/) for more info.