# The Art of Writing Functions


```python
def do_stuff():
    print("Totally doing stuff")
```

## What *is* a function and why the heck do we need them?

As you gain experience with code, your scripts (or notebook cells) will invariably grow larger. Programs will balloon from a few dozen lines of code to hundreds or thousands. As code expands in size, so too does the complexity and mental burden of making sense of all those fancy lines of Python syntax.

Functions are a first line of defense against that complexity. They're an invaluable tool that helps us write **readable**, **reliable** and **reusable** code.

Functions help decompose a large coding problem into smaller, more manageable chunks. They help you reason more clearly about each step in a program. Even if you rarely create your own functions, it's important to understand the basics of this Python feature since you'll frequently be using functions built into Python and from third-party libraries in your own code.

## Magic little bags

But what exactly *are* functions? 

It's a bit silly, but it can be helpful to think of them as magic little bags (yes, ✨MAGIC✨). 

You can put stuff into them and, quite often, you'll get some useful thing back out of them.

You've already encountered several functions built into Python itself. The [print](https://docs.python.org/3/library/functions.html#print) function, for example, allows you to print one or more values and can be incredibly helpful with debugging code.

In [None]:
print("thing 1")

In [None]:
print("thing 1", "thing 2")

You've also seen the handy [len](https://docs.python.org/3/library/functions.html#len) function, which counts the number of items in sequences such as strings or lists.

In [None]:
len("How many characters is this?")

In [None]:
first_list = [1,2]
second_list = [1,2,3]
len(first_list) < len(second_list)

## Bring your own functions to the party

But you're not limited to functions that ship with Python. You can also define your own functions using the [def](https://docs.python.org/3/reference/lexical_analysis.html#keywords) keyword:

In [None]:
def hello():
    print("Howdy!")

A few important things to note about the (admittedly not-so-useful) function above:

1. The `def` keyword is used to say "Hey Python, I'm defining a function"
1. After `def` comes the **name** of the function
1. The function name is followed by a pair of parentheses (more on that later) and a colon (`:`), which signals to Python that the body of the function (i.e. the code that does stuff) will follow, typically starting on the next line
1. You can include any other Python code in the **body** of the function, which should be _indented 4 spaces_. In this case the body is simply `print("Howdy!")`, but typically you can perform other useful actions in the body, including calling other functions!

Similar to Python's built-in functions, we "invoke" (aka call or execute) a function by referencing its name (`hello`) followed by parentheses `()`.

In [None]:
hello()
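To illustrate the point that a function body can call other functions, here is a minimal sketch. The name `hello_twice` is made up for this example and is not part of Python.

```python
def hello():
    print("Howdy!")

# hello_twice is a hypothetical name used only for illustration
def hello_twice():
    # The body of one function can call another function
    hello()
    hello()

# Nothing is printed until we actually call the function
hello_twice()
```

Defining a function only stores it under its name. The code in the body runs each time the function is called.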

## Function inputs

Functions become much more interesting when you begin sending strings, integers and other data types into them. To allow one or more items to be passed into a function, we must specify parameter names in the function definition.

These items then become available for use *inside the body of the function* using the parameter names. You can think of these parameters as a way of pre-defining variables for use in a function.

Let's say we wanted to create a function that doubles a number.

In [None]:
def double(number):
    print(number * 2)

Above, we specified the parameter `number` inside the parentheses on the `def` line. The call to `print` can then make use of `number` inside the body of the function. Go ahead and try calling `double` by passing it some integer as an argument.

> NOTE: We use the term parameter when defining a function, and we say "passing an argument" when we call a function with some input value. See [Automate the Boring Stuff](https://automatetheboringstuff.com/2e/chapter3/#calibre_link-134) for an excellent overview of these terms.

In [None]:
double(2)
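Here is a small sketch of the parameter/argument distinction, repeating the `double` function from above. The variable `my_number` is an assumption added for illustration.

```python
def double(number):   # `number` is the parameter
    print(number * 2)

double(2)             # 2 is the argument

# An argument can also be a variable defined elsewhere
my_number = 21        # hypothetical variable for this example
double(my_number)
```

Inside the function, the value is always available as `number`, no matter what the argument was called on the outside.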

## Positional arguments

One parameter is good, but wouldn't it be handy if we could pass multiple arguments to a function? Well, you _can_ do that by simply specifying more than one parameter, each separated by a comma (`,`). 

Let's define a new function to demonstrate.

In [None]:
def repeat_shouty_greeting(greeting, times): # Note the comma separating the parameters
    print(greeting.upper() * times)

Now let's use our handy new function.

In [None]:
repeat_shouty_greeting('howdy', 4)

It's important to note that the *order of the arguments matters* -- which is why we call these positional parameters or arguments.

Try calling the function again -- but this time switch the order of the arguments and watch what happens.

In [None]:
# ruhroh
repeat_shouty_greeting(4, 'howdy')
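If you'd like to inspect the failure without halting the notebook, one option is to wrap the call in `try`/`except`, a Python feature not otherwise covered in this lesson. The sketch below repeats the function definition so it runs on its own. The variable name `error_message` is an assumption for illustration.

```python
def repeat_shouty_greeting(greeting, times):
    print(greeting.upper() * times)

try:
    # The integer 4 lands in `greeting`, and integers have no .upper() method
    repeat_shouty_greeting(4, 'howdy')
except AttributeError as error:
    error_message = str(error)
    print(f"Something went wrong: {error_message}")
```

Python has no way to know we meant the arguments the other way around. It simply assigns them to parameters from left to right.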

## Keyword arguments

As functions grow in size, the list of inputs we pass into them can become unwieldy and hard to manage. Further, we may want the flexibility of providing a default value for certain parameters. By using so-called _keyword arguments_, we can name each argument when calling a function, which lets us pass them in any order we want. And by giving parameters default values in the function definition, we no longer have to pass in all the arguments.

Let's demonstrate by reworking our greeting function to use keyword arguments.

Notice that the syntax is very similar to defining a variable (`name=value`), with each parameter separated by a comma.

In [None]:
def repeat_shouty_greeting(greeting='howdy', times=4):
    print(greeting.upper() * times)

Now we can call the function without any arguments, which causes the default values to be used.

In [None]:
repeat_shouty_greeting()

Or we can customize the greeting...

In [None]:
repeat_shouty_greeting(greeting='Hello') # We get the default repetition of 4 times

Or specify the number of times to repeat the greeting...

In [None]:
repeat_shouty_greeting(times=2) # We get the default greeting

Or we can customize both!

In [None]:
repeat_shouty_greeting(times=10, greeting='hola') # And the order of arguments doesn't matter
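One detail worth knowing: parameters that have default values can still be filled by position. The sketch below reuses the same function to show both styles side by side.

```python
def repeat_shouty_greeting(greeting='howdy', times=4):
    print(greeting.upper() * times)

# Both arguments passed by position, overriding both defaults
repeat_shouty_greeting('hi', 2)

# First argument by position, second by keyword
repeat_shouty_greeting('hi', times=3)
```

Naming the arguments is optional here, but it often makes a call easier to read at a glance.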

## Positional and Keyword args

It's quite common to see functions that take both positional and keyword arguments. The former are typically used to specify "required" parameters -- i.e. items that must be passed into the function, without which the function could not do its work -- while the latter are generally used for optional arguments, often with sensible defaults.

Let's create one final greeting function to illustrate. We'll make this version much more sensible -- i.e. non-shouty and non-repeating -- but will give our users the ability to jazz things up with optional keyword arguments.

> Note that in this version, we require the user to supply a greeting rather than offering a default.

In [None]:
def greet(greeting, repeat=1, shouty=False):
    greeting_multiplied = greeting * repeat
    if shouty:
        final_greeting = greeting_multiplied.upper()
    else:
        final_greeting = greeting_multiplied
    print(final_greeting)

Let's take `greet` out for a spin.

In [None]:
greet('howdy')

Now let's have some fun with it. Try tinkering with the positional and keyword arguments below to get a feel for things.

In [None]:
greet('howdy', repeat=2)

In [None]:
greet('hola', repeat=4, shouty=True)

> One final but important note: Positional arguments must always come *before* keyword arguments!
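Breaking that rule is a syntax error, so Python rejects the code before running any of it. To show this without crashing a cell, the sketch below hands each call to the built-in `compile` function as a string, which parses the code without running it. The variable names are assumptions for illustration.

```python
good_call = "greet('howdy', repeat=2)"
bad_call = "greet(repeat=2, 'howdy')"   # positional argument after a keyword argument

# The valid call parses without complaint
compile(good_call, "<example>", "eval")

caught_syntax_error = False
try:
    compile(bad_call, "<example>", "eval")
except SyntaxError as error:
    caught_syntax_error = True
    print(f"Python refused to parse it: {error.msg}")
```

In everyday code you would simply see the `SyntaxError` as soon as you ran the cell.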

## Return values 

Printing values is fun, but it's not especially useful in the broader context of a script or notebook. The true power of functions comes into play when you explicitly return values from them. These so-called [return values](https://realpython.com/python-return-statement/) are typically the final product of some series of coding steps.

Often, these steps are bits of code you would otherwise have to repeat in multiple places. By bundling them up into a single function, you can write fewer lines of code and isolate changes to one location. That way, if something goes wrong with the code, you only have to fix it once.

That's what we mean when we say functions allow you to write more _readable, reliable, and reusable_ code.

In terms of syntax, to send data or, in Python lingo, objects back out of a function, you simply use the `return` statement. For example, `return some_variable`.
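As a minimal sketch of the syntax, here is a variation on the earlier `double` function that returns its result instead of printing it. The variable names `result` and `total` are assumptions for illustration.

```python
def double(number):
    # Hand the value back to the caller instead of printing it
    return number * 2

# The returned value can be stored in a variable...
result = double(4)

# ...or used directly in a larger expression
total = double(4) + double(1)

print(result, total)  # prints: 8 10
```

A function that only prints gives the rest of your code nothing to work with, while a returned value can be stored, compared or passed along to another function.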

Let's try an example. Imagine you're trying to match a list of cities to their population counts, but the list of cities is suffering from common data entry mistakes such as inconsistent casing and extraneous whitespace. 

In [None]:
cities = [
    'San francisco', # casing issue
    'new York',  # casing issue
    '  houston ' # Note the leading/trailing whitespace
]

city_pops = {
    'Houston': 2288000,
    'New York': 8468000,
    'San Francisco': 815000
}

You could write a simple function to:

- clean the data
- perform the lookup
- and return the population

In [None]:
# Note we're passing in the city name and the dictionary of pops
def pop_for_city(city, pop_dict):
    # Standardize the city name
    city_clean = city.strip().title()
    # lookup the city population
    pop = pop_dict[city_clean]
    # Return the population number for the city
    return pop

Now we can use that function to get the population for each of our cities.

In [None]:
for city in cities:
    pop = pop_for_city(city, city_pops)
    print(f"The population of {city} is {pop}")

That worked, but wouldn't it be nice if we could also get the cleaned up city name from our function? Well, you _can_ by simply returning more than one item in the form of a list.

In [None]:
def pop_for_city(city, pop_dict):
    city_clean = city.strip().title()
    pop = pop_dict[city_clean]
    # Return both the clean name and the population
    return [city_clean, pop]

Since our return value is now a list, we can access each item in the list using the index position of each value. 

In [None]:
for city in cities:
    results = pop_for_city(city, city_pops)
    print(f"The population of {results[0]} is {results[1]}")
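A common alternative, sketched below, is to return the values separated by a comma, which produces a tuple, and then "unpack" them into named variables at the call site. This is a variation on the function above rather than a replacement for it. The data is repeated so the example runs on its own.

```python
cities = ['San francisco', 'new York', '  houston ']

city_pops = {
    'Houston': 2288000,
    'New York': 8468000,
    'San Francisco': 815000
}

def pop_for_city(city, pop_dict):
    city_clean = city.strip().title()
    pop = pop_dict[city_clean]
    # A comma-separated return value is packed into a tuple
    return city_clean, pop

for city in cities:
    # Unpack the two returned items into separate variables
    name, pop = pop_for_city(city, city_pops)
    print(f"The population of {name} is {pop}")
```

Named variables such as `name` and `pop` tend to be easier to read than index positions like `results[0]`.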

Ultimately, a function can return any type of valid Python object -- strings, integers, floats, lists, dicts, classes and even other functions.

And one last tidbit -- functions *always* return some value, even if you don't explicitly use `return`.

Can you figure out what that "implicit" or default return value is?

> HINT: It's not the string `blerf`

In [None]:
def not_so_useful():
    print('blerf')
    
my_var = not_so_useful()

## Naming things is hard

Naming functions can be hard. Try to be descriptive but concise. The intent of the function should be clear.

And please, PLEASE use snake_case in Python :)

Here are some functions that illustrate good habits:

```python
def download_json():
    # code that downloads JSON
    # and saves it to local file
    pass

def convert_json(json_file):
    # code that reads JSON file
    # and converts to a CSV
    pass
```

Functions should generally have clear inputs and outputs.

```python
def convert_json(json_path):
    # code to read local JSON file
    # and convert to list
    rows = [
        ['ca', 'san fran'],
        ['ca', 'los angeles']
    ]
    return rows
```

Having a hard time naming a function? It may be doing too much! Try breaking large functions into smaller functions. 

```python
### BAD ###

def scrape_and_process_and_load():
    # bunch of code to scrape a website,
    # process the data, and load it
    # into a database.
    pass


### BETTER ###

def scrape():
    pass

def process_data():
    pass

def load_database():
    pass
```

This strategy of decomposing large functions mirrors the process of chopping up a large programming problem into smaller sub-problems. By creating smaller functions devoted to particular tasks, we can more easily reason through our problem in code.
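To make that concrete, here is a sketch that splits the earlier city lookup into three small functions, each with a clear input and output. The function names `clean_city`, `lookup_pop` and `format_report` are hypothetical, chosen only for this illustration.

```python
# Hypothetical helper names, each devoted to a single task
def clean_city(city):
    return city.strip().title()

def lookup_pop(city, pop_dict):
    return pop_dict[city]

def format_report(city, pop):
    return f"The population of {city} is {pop}"

city_pops = {'Houston': 2288000}

# The output of one function becomes the input of the next
name = clean_city('  houston ')
pop = lookup_pop(name, city_pops)
report = format_report(name, pop)
print(report)
```

Each function is short enough to name easily and to test on its own.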

## The main function

It's quite likely you'll end up with code that has a number of functions performing very specific tasks. You may even have functions calling other functions.

Although we encourage using functions, orchestrating them can become a challenge. Even reading code with lots of functions can become difficult.

When writing a script that contains lots of functions, it's wise to create a "top-level" [entry point](https://en.wikipedia.org/wiki/Entry_point) function that serves as a sort of maestro. Its job is to kick off the chain of action and orchestrate lower-level functions devoted to specific tasks.

Traditionally, this high-level orchestrator is called `main`. It's defined in the same way as other Python functions and is really no different, except in its role.

```python
# Below, the main function calls a series of functions
# defined elsewhere
def main():
    download_json()
    transform_json()
    generate_csv()
    
# We call "main" at the end of our script or notebook cell,
# which kicks off all the other functions
main()
```
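The version above leaves out how data moves between the functions. Below is a runnable sketch in which `main` passes each function's return value to the next. The function bodies are stand-ins invented for illustration: the "download" returns hard-coded records, and `generate_csv` returns text rather than writing a file.

```python
def download_json():
    # Stand-in for a real download: returns hard-coded records
    return [
        {'state': 'ca', 'city': 'san fran'},
        {'state': 'ca', 'city': 'los angeles'}
    ]

def transform_json(records):
    # Convert each record (a dict) into a row (a list)
    rows = []
    for record in records:
        rows.append([record['state'], record['city']])
    return rows

def generate_csv(rows):
    # Join each row with commas, then join the rows with newlines
    lines = []
    for row in rows:
        lines.append(','.join(row))
    return '\n'.join(lines)

def main():
    records = download_json()
    rows = transform_json(records)
    csv_text = generate_csv(rows)
    print(csv_text)

main()
```

Reading `main` from top to bottom gives a quick summary of what the whole script does, without digging into the details of each step.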

## Exercise

The code below extracts and counts the words from the source code for a [very basic web page](http://example.com).

Spend a few minutes reviewing the page's underlying HTML and the code below to get a sense of what it does.

Then try rewriting the code to use multiple, clearly defined functions. 

It can be helpful to print out the code (or use a whiteboard) to group related code into functions and give them clear names.

Along the way, think about what inputs and outputs, if any, each function should have. For example, does one function return a value that must be passed to another function?

Once you've defined the functions and their interplay, create a [`main`](#the-main-function) function to orchestrate the other functions. Remember, the job of `main` is to invoke these functions in the expected manner, handling inputs and outputs as needed.

Lastly, don't forget to call `main()` to kick things off.

In [None]:
import bs4

with open('files/data/example.html', 'r') as infile:
    html = infile.read()
soup = bs4.BeautifulSoup(html, "html.parser")
words = []
words.extend(soup.find('h1').text.split())
for paragraph in soup.find_all('p'):
    text = paragraph.text
    para_words = text.replace('.',' ').replace('\n', ' ').split()
    words.extend(para_words)

print("There are {} words on example.com".format(len(words)))

## Further reading

* [W3Schools Python Functions](https://www.w3schools.com/python/python_functions.asp)
* [Chapter 3 - Functions](https://automatetheboringstuff.com/2e/chapter3/) of *Automate the Boring Stuff*.