<a href="https://colab.research.google.com/github/edoardochiarotti/class_datascience/blob/main/2024/00_Python-Basics/00_Python-Basics_4_Function.ipynb" target="_blank" rel="noopener"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Basics: function

<img src='https://www.agent-x.com.au/wp-content/uploads/2011/06/Perfect-Programmer-dfe194b-e8d3b11-b960bd5.jpg' width="400">

Source: [Agent-X Comics - Perfect Programming](https://www.agent-x.com.au/comic/perfect-programming/)

## Contents

In our previous notebook, we learned how to iterate over objects using `for` and `while` loops, which saves us from repeating similar lines of code. Similarly, functions allow us to "store" a set of instructions that we can reuse, without repeating lines of code.

- [Function](#Function)  
  - [Defining a function](#define-function)
  - [Built-in functions](#built-in-function)
  - [Keyword argument](#kwarg)
  - [Splat](#Splat)
  - [Improving your functions: *args and **kwargs](#arg-kwarg)
- [Recursive function](#recursive-function)

## Function <a name="Function"></a>

A function is a key element in writing programs. You can think of a function in a computing language in much the same way you think of a mathematical function. The function takes in arguments, performs some operation based on the values of the arguments, and then returns a result.

### Defining a function <a name="define-function"></a>

As always, let's take an example. We will use the [Cobb-Douglas production function](https://en.wikipedia.org/wiki/Cobb%E2%80%93Douglas_production_function), which models the relationship between production output and production inputs (factors). It was put forward by mathematician Charles Cobb and economist Paul Douglas between 1927 and 1947 (Cobb & Douglas, 1928). They studied the relative importance of the two input factors, labor and capital, in manufacturing output in the USA over the period 1899 to 1922. Their model was the following:

$Y = A L^{\alpha} K^{1-\alpha} $

Where $Y$ is the output, $A$ is the efficiency parameter, $L$ labor, $K$ capital, and $\alpha \in (0,1)$ is the output-elasticity of labor.

Cobb and Douglas estimated the parameters using standard econometric tools. They found $\alpha = 0.75$, implying that labor accounted for three quarters of the value of US manufacturing output (capital accounting for the remaining quarter) over the period studied. For the efficiency parameter, they got $A=1.01$, which, since it is greater than 1, reflects the positive effects of unobservable forces on production through the combination of labor and capital. 

Reference: Cobb, C. W., & Douglas, P. H. (1928). A theory of production.

The mathematical form that we used, with $\alpha + (1-\alpha)=1$, implies constant returns to scale: when all inputs change by the same proportion, output changes by that same proportion. In addition, the first-order partial derivatives of production output $Y$ with respect to labor $L$ and capital $K$ are positive, while the second-order partial derivatives are both negative (because $\alpha \in (0,1)$). This means that adding more labor or more capital, while holding the other input fixed, increases output, though at a diminishing rate. The cross-partial derivative is also positive: more of one input makes the other input more productive. Finally, because $\alpha$ is a constant, it implies that the shares of labor and capital are constant. 
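For reference, here is a short derivation of these properties from $Y = A L^{\alpha} K^{1-\alpha}$, assuming $A > 0$ and $L, K > 0$. The scaling factor $\lambda > 0$ in the last identity is an arbitrary positive number introduced only for this check:

$$
\begin{aligned}
\frac{\partial Y}{\partial L} &= \alpha A L^{\alpha-1} K^{1-\alpha} > 0,
&\qquad \frac{\partial^2 Y}{\partial L^2} &= \alpha(\alpha-1) A L^{\alpha-2} K^{1-\alpha} < 0,\\
\frac{\partial Y}{\partial K} &= (1-\alpha) A L^{\alpha} K^{-\alpha} > 0,
&\qquad \frac{\partial^2 Y}{\partial K^2} &= -\alpha(1-\alpha) A L^{\alpha} K^{-\alpha-1} < 0,\\
\frac{\partial^2 Y}{\partial L\,\partial K} &= \alpha(1-\alpha) A L^{\alpha-1} K^{-\alpha} > 0,
&\qquad A(\lambda L)^{\alpha}(\lambda K)^{1-\alpha} &= \lambda^{\alpha + (1-\alpha)} A L^{\alpha} K^{1-\alpha} = \lambda Y.
\end{aligned}
$$

The signs follow directly from $\alpha \in (0,1)$: both $\alpha$ and $1-\alpha$ are positive, while $\alpha - 1$ is negative.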

The design of the Cobb-Douglas production function was influenced by statistical evidence: it appeared that labor and capital shares of total output were constant over time in developed countries. However, Cobb and Douglas themselves acknowledged that their production function does not rest on solid theoretical foundations, nor should it be understood as a law of production! Still, nowadays, the Cobb-Douglas production function is often used in economic models. The main reason: its simple mathematical properties are attractive, in the sense that it is relatively easy to derive analytical solutions using such a function. Obviously, this is very questionable. Many other models were developed to remedy the shortcomings of Cobb-Douglas, which will be discussed in your economics courses...

Here is the lesson though: **always question the assumptions behind models**, both theoretical and empirical. Do the assumptions have a large influence on the results? Can you back up the assumptions with data or theory? Also remember, even unrealistic assumptions (e.g., *homo oeconomicus*) can lead to interesting results. A model is like a map. For example, think about a metro map: such a map represents the metro lines and their connections, but is obviously not a faithful representation of reality! Still, it is very convenient to find your way from one place to another. What matters here is the goal. You would not use your metro map to walk from one station to another. Similarly, with economic/statistical models, ask yourself the question: **Can we extrapolate?**

Ok, enough discussion of the history of economics and economic models, and back to Python. Let's implement our Cobb-Douglas production function!

In [1]:
def cobb_douglas(l, k):
    """This function computes the Cobb Douglas production function from labor and capital."""
    A = 1.01          # efficiency parameter
    alpha = 0.75      # output elasticity of labor
    return A * l**alpha * k**(1 - alpha)

Pay attention to the syntax. A function is **defined** using the `def` keyword. 

Following the `def` keyword is a **function signature** which indicates the function's name and its arguments. Here the signature is `cobb_douglas(l, k)`. Just like in mathematics, the arguments are separated by commas and enclosed in parentheses. 

The indentation following the `def` line specifies what is part of the function. As soon as the indentation goes to the left again, aligned with `def`, the contents of the function are complete.

Immediately following the function definition is the **doc string** (short for documentation string), a brief description of the function. A string placed as the first statement of the function body is always treated as the doc string. Usually, it is in triple quotes, as doc strings often span multiple lines.

Doc strings are more than just comments for your code: the doc string is what the built-in Python function `help()` displays when someone wants to learn more about your function. For example:

In [2]:
help(cobb_douglas)

Help on function cobb_douglas in module __main__:

cobb_douglas(l, k)
    This function computes the Cobb Douglas production function from labor and capital.
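`help()` builds its output from the doc string, which Python stores in the function's `__doc__` attribute. A minimal sketch (redefining the same `cobb_douglas` function so that the cell runs on its own) showing that you can also read this attribute directly:

```python
def cobb_douglas(l, k):
    """This function computes the Cobb Douglas production function from labor and capital."""
    A = 1.01          # efficiency parameter
    alpha = 0.75      # output elasticity of labor
    return A * l**alpha * k**(1 - alpha)

# The doc string displayed by help() is stored in the __doc__ attribute.
print(cobb_douglas.__doc__)
```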



You can also print out the documentation by using `?` in a Jupyter notebook or JupyterLab console.

In [3]:
cobb_douglas?

Signature: cobb_douglas(l, k)
Docstring: This function computes the Cobb Douglas production function from labor and capital.
File:      c:\users\boris\appdata\local\temp\ipykernel_16424\1360091775.py
Type:      function

You are free to type whatever you like in doc strings, or even omit them, but you should always have a doc string with some information about what your function is doing. Again: all functions should have doc strings.

After the doc string and the two parameter assignments, the last line of the function contains the `return` keyword. Whatever comes after `return` is, you guessed it, returned by the function. Any code placed after the `return` statement is *not* executed because the function has already returned!
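A minimal sketch of this behavior, using a hypothetical function `early_return`: the `print()` call placed after `return` never runs.

```python
def early_return(x):
    """Returns twice `x`; the print() call below is never reached."""
    return 2 * x
    print("This line is never executed")  # unreachable: the function has already returned

print(early_return(21))  # → 42
```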

Now that we have defined our function, we can **call** it:

In [4]:
cobb_douglas(74, 39)

63.6812112530437

In [5]:
cobb_douglas(0.74, 0.39)

0.636812112530437

See: when we divided both labor and capital by 100, production was also divided by 100. This is the constant returns to scale property in action!
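The same check can be sketched for any scaling factor. The helper `has_constant_returns` below is hypothetical (it is not part of any library), and `math.isclose` is used instead of `==` because floating-point arithmetic introduces tiny rounding errors:

```python
import math

def cobb_douglas(l, k):
    """This function computes the Cobb Douglas production function from labor and capital."""
    A = 1.01          # efficiency parameter
    alpha = 0.75      # output elasticity of labor
    return A * l**alpha * k**(1 - alpha)

def has_constant_returns(l, k, scale):
    """Checks numerically that scaling both inputs by `scale` scales output by `scale`."""
    return math.isclose(cobb_douglas(scale * l, scale * k), scale * cobb_douglas(l, k))

for scale in (0.01, 2, 10):
    print(scale, has_constant_returns(74, 39, scale))  # prints True for each scale
```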

A function does not need arguments. As a silly example, let's consider a function that just returns 42 every time. Since its result does not depend on any input, we can define it without arguments.

In [6]:
def answer_to_everything():
    """This function answers the ultimate question of life, the universe, and everything."""
    return 42

We still need the opening and closing parentheses after the function name in the definition. Similarly, even though the function has no arguments, we still have to call it with parentheses.

In [7]:
answer_to_everything()

42
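The parentheses are what actually runs the function. A short sketch: without them, the name refers to the function object itself, which you can store in a variable like any other value.

```python
def answer_to_everything():
    """This function answers the ultimate question of life, the universe, and everything."""
    return 42

without_call = answer_to_everything   # no parentheses: the function object itself
with_call = answer_to_everything()    # parentheses: the function is executed

print(callable(without_call))  # → True
print(with_call)               # → 42
```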

Just like they do not necessarily need arguments, functions also do not need to return anything. If a function does not have a `return` statement (or it is never encountered in the execution of the function), the function runs to completion and returns `None` by default. `None` is a special Python keyword which basically means "nothing." For example, a function could simply print something to the screen.

In [8]:
def circular_economy():
    """Prints the definition of circular economy"""
    print("""A circular economy is a regenerative economic system that
    uses renewable energy and resources, 
    reuses materials and products as long as possible, 
    and recycles resources rather than disposing them as waste. """)

We call this function like any other, but we can show that the result it returns is `None`.

In [9]:
circ_eco = circular_economy()

A circular economy is a regenerative economic system that
    uses renewable energy and resources, 
    reuses materials and products as long as possible, 
    and recycles resources rather than disposing them as waste. 


In [10]:
print(circ_eco)

None


See, when we print the variable `circ_eco`, the answer is `None`. So apparently, circular economy models are not yet standard...
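The other case mentioned above, a function whose `return` statement is never reached, can be sketched with a hypothetical `sign_label` function: for `x == 0`, neither `return` is executed, so the call evaluates to `None`.

```python
def sign_label(x):
    """Returns 'positive' or 'negative'; there is no return statement for x == 0."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    # For x == 0, no return statement is reached, so Python returns None.

print(sign_label(3))   # → positive
print(sign_label(0))   # → None
```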

Finally, here is a very important tip:

<span style="color: dodgerblue; font-weight: bold;"> Always test your function! </span>
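One lightweight way to test a function, sketched here with plain `assert` statements (dedicated tools such as `pytest` also exist), is to check cases whose answer you already know. The specific checks below are our own choices for illustration:

```python
import math

def cobb_douglas(l, k):
    """This function computes the Cobb Douglas production function from labor and capital."""
    A = 1.01          # efficiency parameter
    alpha = 0.75      # output elasticity of labor
    return A * l**alpha * k**(1 - alpha)

# With one unit of each input, output should equal the efficiency parameter A.
assert math.isclose(cobb_douglas(1, 1), 1.01)
# No labor means no output.
assert cobb_douglas(0, 39) == 0
# More labor with the same capital should increase output.
assert cobb_douglas(80, 39) > cobb_douglas(74, 39)
print("All tests passed")
```

If any check fails, `assert` raises an `AssertionError` and the message is never printed.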

### Built-in functions <a name="built-in-function"></a>

The Python programming language has several built-in functions. We have already encountered a few of them such as `print()`, `len()`, `range()`, and `enumerate()`, in addition to type conversions such as `str()`, `list()` and `tuple()`. The complete set of **built-in functions** can be found [here](https://docs.python.org/3/library/functions.html). A word of warning about these functions and about naming your own:

<span style="color: dodgerblue; font-weight: bold;"> Never define a function or variable with the same name as a built-in function. </span>
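To see why, here is a short sketch that shadows the built-in `sum()` with an integer. The call then fails with a `TypeError`, and deleting our variable with `del` makes the built-in visible again:

```python
shadowing_failed = False

sum = 10  # bad idea: the name `sum` now refers to an integer, hiding the built-in

try:
    total = sum([1, 2, 3])  # an integer is not callable
except TypeError as err:
    shadowing_failed = True
    print("TypeError:", err)

del sum  # remove our variable; the built-in sum() is visible again
print(sum([1, 2, 3]))  # → 6
```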

Additionally, Python has **keywords** (such as `def`, `for`, `in`, `if`, `True`, `None`, etc.), many of which we have already encountered. A complete list of them is [here](https://docs.python.org/3/reference/lexical_analysis.html#keywords). The interpreter raises a `SyntaxError` if you try to define a function or variable with the same name as a keyword.
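The standard-library `keyword` module lets you check whether a name is reserved. A sketch (the exact list depends on your Python version); `compile()` is used here only to trigger the `SyntaxError` without stopping the notebook:

```python
import keyword

print(keyword.iskeyword("def"))    # → True
print(keyword.iskeyword("print"))  # → False: print is a built-in function, not a keyword
print(len(keyword.kwlist))         # number of reserved keywords in this Python version

try:
    compile("def = 3", "<example>", "exec")  # try to use a keyword as a variable name
except SyntaxError:
    print("SyntaxError: 'def' cannot be used as a variable name")
```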

### Keyword argument <a name="kwarg"></a>

When we defined our `cobb_douglas()` function, we set the efficiency parameter and the output elasticity of labor inside the function. With that structure, we cannot modify these parameters when calling the function. Is there a better way? Of course there is: we can use a **named keyword argument**, also known as a **named kwarg**. Here is how it looks:

In [11]:
def cobb_douglas_kwarg(l, k, A=1.01, alpha=0.75):
    """This function computes the Cobb Douglas production function from labor and capital."""
    return A * l**alpha * k**(1 - alpha)

The syntax for a named kwarg is

    kwarg_name = default_value
    
in the `def` clause of the function definition. In this case, we added the efficiency parameter and the output elasticity of labor. Conveniently, if you call the function without specifying the kwargs, they take their default values. Hence, we can omit kwargs when calling our function:

In [12]:
cobb_douglas_kwarg(74, 39)

63.6812112530437

But now we can also modify the values of the parameters!

In [13]:
cobb_douglas_kwarg(74, 39, A=1.1, alpha=0.70)

67.16983339223331
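Keyword arguments can also be passed by position, or by name in any order. A small sketch (repeating the definition above so that the cell is self-contained) showing that these calls give the same result, and that you can change one parameter while keeping the default of the other:

```python
def cobb_douglas_kwarg(l, k, A=1.01, alpha=0.75):
    """This function computes the Cobb Douglas production function from labor and capital."""
    return A * l**alpha * k**(1 - alpha)

by_position = cobb_douglas_kwarg(74, 39, 1.1, 0.70)            # matched by position
by_name = cobb_douglas_kwarg(alpha=0.70, k=39, A=1.1, l=74)    # matched by name, any order
only_alpha = cobb_douglas_kwarg(74, 39, alpha=0.70)            # A keeps its default 1.01

print(by_position == by_name)  # → True
print(only_alpha)
```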

### Splat <a name="Splat"></a>

Python offers another convenient way to call functions. Suppose we want to know whether or not a triangle is a right triangle. We will write a function taking three arguments, `a`, `b`, and `c` (the sides of a triangle), which checks if $a^2 + b^2 = c^2$.

In [14]:
def is_right_triangle(a, b, c):
    """
    Checks if a triangle with side lengths
    `a`, `b`, and `c` is right.
    """
    
    # Use sorted(), which gives a sorted list
    a, b, c = sorted([a, b, c])
    
    # Check to see if it is almost a right triangle (1e-12 is our precision)
    if abs(a**2 + b**2 - c**2) < 1e-12:
        return True
    else:
        return False

Let's test our function:

In [15]:
is_right_triangle(5,3,4)

True

In [16]:
is_right_triangle(1,2,3)

False

Ok, it seems to work. Now, let's say we have a tuple with the triangle sides `(a, b, c)`. We could pass them in separately by indexing each element of the tuple. However, there is a more concise way: the **unpacking operator** `*` placed before a tuple, which is referred to as a "splat."

In [17]:
side_triangle = (5,3,4)

is_right_triangle(*side_triangle)

True
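The `*` splat works with any iterable, such as a list, and the `**` operator unpacks a dictionary into keyword arguments, matching keys to parameter names. A sketch, with `is_right_triangle` redefined compactly (same logic, returning the comparison directly):

```python
def is_right_triangle(a, b, c):
    """Checks if a triangle with side lengths `a`, `b`, and `c` is right."""
    a, b, c = sorted([a, b, c])
    return abs(a**2 + b**2 - c**2) < 1e-12

sides_list = [3, 4, 5]
sides_dict = {"a": 3, "b": 4, "c": 5}

print(is_right_triangle(*sides_list))   # → True
print(is_right_triangle(**sides_dict))  # → True, same as is_right_triangle(a=3, b=4, c=5)
```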

### Improving your functions: *args and **kwargs <a name="arg-kwarg"></a>

We have discussed the unpacking operator `*` above. When manipulating dictionaries, we have also encountered the unpacking operator `**`. These operators are also very useful when you want to define a function with a variable number of arguments. 

Let's take an example. Suppose we want to sum some integers. The first way would be to use a list (or a tuple) as the argument: 

In [18]:
def my_sum(list_int):
    """This function sums integers. The argument should be a list/tuple of integers."""
    result = 0
    for i in list_int:
        result += i
    return result

print(my_sum([1, 2, 3]))

6


Alternatively, we might want to directly use the integers as arguments. But then the number of arguments could vary... No problem, we can use `*args` to define a function with an arbitrary number of arguments:

In [19]:
def my_sum(*args):
    """This function sums integers. The arguments should be integers."""
    result = 0
    # Iterating over the Python args tuple
    for i in args:
        result += i
    return result

print(my_sum(1, 2, 3))
print(my_sum(1, 2, 3, 4))

6
10


Note that `args` is just a name: you do not need to call it `args` (`*numbers` would work just as well); all that matters is the **unpacking operator** `*`. 
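A sketch illustrating both points: the name `numbers` replaces `args`, and the extra positional arguments arrive inside the function as a tuple. The helper `args_type` is hypothetical, added only to show that type. The last call combines `*args` with the splat seen earlier:

```python
def my_sum(*numbers):
    """This function sums integers. The arguments should be integers."""
    result = 0
    for i in numbers:  # `numbers` is a tuple holding all positional arguments
        result += i
    return result

def args_type(*numbers):
    """Returns the type of the object collecting the positional arguments."""
    return type(numbers)

print(args_type(1, 2, 3))     # → <class 'tuple'>
print(my_sum(1, 2, 3))        # → 6
print(my_sum())               # → 0
print(my_sum(*[1, 2, 3, 4]))  # → 10, the list is unpacked into four arguments
```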

Ok, now that we understand `*args`, what about `**kwargs`? Does `kwargs` ring a bell? Yes, above we have used **named keyword argument**, i.e., named kwarg, to define the default values of our parameters: 

In [20]:
def cobb_douglas_kwarg(l, k, A=1.01, alpha=0.75):
    """This function computes the Cobb Douglas production function from labor and capital."""
    return A * l**alpha * k**(1 - alpha)

cobb_douglas_kwarg(42, 57, alpha=0.71)

46.348115238492475

We can omit named keyword arguments or modify their values. 

So, what about `**kwargs`? Well, it allows us to define functions with **arbitrary keyword arguments**. Let's see an example. Suppose we want to define a function that lists the names of the individuals for which we have data:  

In [21]:
def sample_names(**kwargs):
    """Lists the names (the keyword names) of the individuals in our sample."""
    result = 'Our sample includes: '
    # Iterating over the keys of the Python kwargs dictionary
    for name in kwargs:
        result += name + ' '
    return result

print(sample_names(Florence=0.3, Jordane=0.25, Julia = 0.5, Ale = 0.4))

Our sample includes: Florence Jordane Julia Ale 


Note that you can define functions that combine standard arguments, `*args` arguments, and `**kwargs` arguments. However, the order of the parameters matters:
1. Standard arguments
2. `*args` arguments
3. `**kwargs` arguments
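A sketch combining the three kinds of parameters in that order, with a hypothetical function `describe_call`. Writing them in another order, such as `def f(**kwargs, *args)`, is a `SyntaxError`.

```python
def describe_call(first, *args, **kwargs):
    """Shows how Python distributes the arguments of a call among the parameters."""
    print("first  :", first)   # the standard (positional) argument
    print("args   :", args)    # a tuple with the remaining positional arguments
    print("kwargs :", kwargs)  # a dict with the remaining keyword arguments
    return first, args, kwargs

result = describe_call(1, 2, 3, alpha=0.75, A=1.01)
# first  : 1
# args   : (2, 3)
# kwargs : {'alpha': 0.75, 'A': 1.01}
```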

## Recursive function <a name="recursive-function"></a>

A function that calls itself is said to be **recursive**, and the technique of employing a recursive function is called **recursion**.

Let's take an example. Imagine you wish to compute [Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number). The Fibonacci numbers form the Fibonacci sequence, in which each number is the sum of the two preceding ones. The sequence commonly starts from 0 and 1: 

$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...$ 

The Fibonacci numbers can also be expressed using a closed-form solution that involves the [Golden ratio](https://en.wikipedia.org/wiki/Golden_ratio).
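This closed form is known as Binet's formula: $F(n) = \frac{\varphi^n - \psi^n}{\sqrt{5}}$, with $\varphi = \frac{1+\sqrt{5}}{2}$ the golden ratio and $\psi = \frac{1-\sqrt{5}}{2}$. A minimal sketch in Python, assuming standard floating-point arithmetic, so the result is rounded to the nearest integer and stays exact only for moderate values of $n$:

```python
import math

def fibonacci_binet(n):
    """Computes the n-th Fibonacci number with Binet's closed-form formula.

    Uses floating-point numbers, so rounding errors make it inexact for large n.
    """
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2   # golden ratio
    psi = (1 - sqrt5) / 2   # conjugate of the golden ratio
    return round((phi**n - psi**n) / sqrt5)

print([fibonacci_binet(n) for n in range(13)])
# → [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```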

The Fibonacci numbers have some amazing applications. For instance, they often appear in Nature, such as branching in trees, arrangement of leaves on a stem, the fruitlets of a pineapple, the flowering of artichoke, an uncurling fern and the arrangement of a pine cone, etc.

<img src='https://upload.wikimedia.org/wikipedia/commons/5/5a/FibonacciChamomile.PNG?20140804210532' width="350">

Image: [FibonacciChamomile](https://commons.wikimedia.org/w/index.php?curid=15047443), Wikimedia Commons

In the above image, the yellow chamomile head is arranged in 21 (blue) and 13 (aqua) spirals, both Fibonacci numbers. What is this miracle?? Well, there is a reason: this pattern allows the plant to maximize the number of cells. Explore further on [Math is Fun](https://www.mathsisfun.com/numbers/nature-golden-ratio-fibonacci.html). Such arrangements involving consecutive Fibonacci numbers appear in a wide variety of plants. See for instance [14 examples of Fibonacci numbers and the Golden ratio in Nature](https://www.mathnasium.com/blog/14-interesting-examples-of-the-golden-ratio-in-nature#:~:text=The%20Fibonacci%20sequence%20can%20also,each%20of%20the%20new%20stems.)

Anyway, let's try to define a function to compute Fibonacci numbers. We have:

$F(0)=0$

$F(1)=1$

$F(n)=F(n-1)+F(n-2)$ for $n \geq 2$



In [22]:
def fibonacci(n):
    """This function computes the n-th Fibonacci number"""
    if n in (0, 1):              # We know the solution F(0)=0 and F(1)=1
        return n
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)  # Else, we need a recursion

Let's try our function:

In [23]:
for i in range(10):
    print(fibonacci(i))

0
1
1
2
3
5
8
13
21
34


We defined `fibonacci()` as a recursive function. In other words, the function calls itself as many times as needed until it computes the desired Fibonacci number. This structure is not very efficient. For example, to compute $F(5)$, we need $F(4)$ and $F(3)$, but to know $F(4)$ we need to compute $F(3)$ and $F(2)$, and so on. Since intermediate Fibonacci numbers are not stored in memory, the function calculates many identical subproblems over and over again.
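To measure how many identical subproblems are recomputed, here is a sketch of the same recursion with a call counter added. The function `fibonacci_counted` and its `counter` argument (a one-element list used as a mutable counter) are hypothetical additions for illustration:

```python
def fibonacci_counted(n, counter):
    """Computes the n-th Fibonacci number recursively and counts the function calls."""
    counter[0] += 1               # one more call to the function
    if n in (0, 1):               # We know the solution F(0)=0 and F(1)=1
        return n
    return fibonacci_counted(n - 1, counter) + fibonacci_counted(n - 2, counter)

for n in (5, 10, 20):
    calls = [0]
    value = fibonacci_counted(n, calls)
    print(f"F({n}) = {value} needed {calls[0]} calls")
# F(5) = 5 needed 15 calls
# F(10) = 55 needed 177 calls
# F(20) = 6765 needed 21891 calls
```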

The computation therefore gets more and more expensive as $n$ gets bigger: the required time grows exponentially with $n$. How can we avoid wasting resources? Here is a very nice [Python Guide to the Fibonacci Sequence](https://realpython.com/fibonacci-sequence-python/).
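One common remedy is **memoization**: store each result the first time it is computed and reuse it afterwards. A minimal sketch using `functools.lru_cache` from the standard library (this is one option among several, not necessarily the approach of the guide linked above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # keep every result: each F(n) is computed only once
def fibonacci_memo(n):
    """This function computes the n-th Fibonacci number, caching intermediate results."""
    if n in (0, 1):              # We know the solution F(0)=0 and F(1)=1
        return n
    return fibonacci_memo(n - 1) + fibonacci_memo(n - 2)

print(fibonacci_memo(100))  # → 354224848179261915075
```

With the cache, each $F(k)$ is computed only once, so the number of actual computations grows linearly with $n$ instead of exponentially.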