Functions are reusable blocks of code. You've been using built-in functions, such as `print()`, throughout the course, and I wrote examples of functions in other Notebooks.

Like iteration, functions are integral to your scripts and programs.

Functions vastly increase the readability and maintainability of code. A well-tested function reduces the amount of code that you have to write. Functions also provide a single point to test and repair: fixing or improving a function carries that change to every place it is called.

Another benefit is that functions separate logic into discrete blocks that reduce a programmer's cognitive load. Reading well-defined functions is easier than parsing hundreds of lines all at once.

**D.R.Y.**, or Don't Repeat Yourself, is a primary impetus for writing functions. If you find yourself copying and pasting blocks of code or writing similar lines of code, then you can likely refactor those into a function.
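
As a small illustration of D.R.Y., here is a sketch with made-up temperature data. The names `to_celsius`, `boston_f`, and `denver_f` are hypothetical and not from the course materials. The repeated conversion is written once as a function and then called wherever it is needed.

```python
# Hypothetical readings in Fahrenheit, invented for this example.
boston_f = [32.0, 50.0, 68.0]
denver_f = [14.0, 41.0, 86.0]

# Repetitive version: the same formula is pasted twice.
boston_c = [(f - 32) * 5 / 9 for f in boston_f]
denver_c = [(f - 32) * 5 / 9 for f in denver_f]


def to_celsius(temps_f):
    """Convert an iterable of Fahrenheit temperatures to Celsius."""
    return [(f - 32) * 5 / 9 for f in temps_f]


# D.R.Y. version: one definition, many calls, same results.
assert to_celsius(boston_f) == boston_c
assert to_celsius(denver_f) == denver_c
print(to_celsius(boston_f))  # [0.0, 10.0, 20.0]
```

If the formula ever needs fixing, only `to_celsius()` has to change.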

# Basic functions
Let's write a function to calculate the mean of a `list`!

As a side note, I don't recommend rolling your own mean function over using the built-in [statistics](https://docs.python.org/3/library/statistics.html) module or [NumPy](https://numpy.org/).

In [1]:
def mean(a):
    """Calculate the arithmetic mean of an iterable.

    Parameters
    ----------
    a: Iterable
        An Iterable of numbers.

    Returns
    -------
    Mean of a as an integer or float.
    """
    sum_a = 0

    for x in a:
        sum_a += x
    
    return sum_a/len(a)

N.b., again: I also don't recommend writing a mean function using a `for` loop. But you shouldn't forget `for` loops, so I'll ensure you can't escape them!

This is more succinct (and still worse than using NumPy or the statistics module):

In [None]:
def better_mean(a):
    """Like mean(a) but BETTER!"""
    return sum(a)/len(a)
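
For comparison, here is a minimal sketch of the standard-library route recommended above. `statistics.mean()` and `statistics.fmean()` are real functions in the `statistics` module; the `scores` list is made up for the example.

```python
import statistics

scores = [2, 4, 4, 5]  # Invented data for illustration.

# mean() works on any sequence or iterable of numbers.
average = statistics.mean(scores)
print(average)  # 3.75

# fmean() always converts the data to float before averaging.
float_average = statistics.fmean(scores)
print(float_average)  # 3.75
```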

Functions may take zero or more parameters. `mean()` takes one parameter, `a`, which is the array for which to calculate the mean. Something that is passed into a function is known as an argument.

Parameters are scoped within a function. The parameters declared in a function signature are valid only within that function and can be used in the code block that defines it.

Thus:

In [2]:
import random

random_numbers = [random.gauss(5., 10.) for _ in range(1000)]
x_bar = mean(random_numbers)
print(x_bar)

5.30213784849192


`random_numbers` is the argument passed into `mean()`. That argument is bound to the parameter `a`. Argument and parameter are often interchangeable colloquially, so you don't need to memorize the exact distinction.
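
To make the argument/parameter distinction and the scoping rule concrete, here is a minimal sketch. `double()` and `value` are hypothetical names. The parameter `n` exists only while `double()` runs, so referring to it outside the function raises a `NameError`.

```python
def double(n):
    """Return twice n. The parameter n is local to this function."""
    return n * 2


value = 21              # value is the argument...
result = double(value)  # ...and it is bound to the parameter n inside double().
print(result)           # 42

try:
    print(n)  # n is not defined outside double().
except NameError as err:
    print(f"NameError: {err}")
```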

Python functions that don't take parameters or have defaults for every parameter (more on that later) are called as you would expect: `f()`.

In [3]:
from datetime import datetime, timedelta

now = datetime.now().strftime("%B %d, %Y")

print(f"It is {now} (at the time of running my code at least).")

It is October 11, 2021 (at the time of running my code at least).


`datetime.now()` is called here without any arguments (its only parameter, `tz`, is optional). Declaring a function without parameters also looks as you would expect.

In [4]:
def yesterday():
    return datetime.now() - timedelta(days=1)

yesterday_f = yesterday().strftime("%A, %B %d")

print(f"And yesterday was {yesterday_f}.")

And yesterday was Sunday, October 10.


**Questions:**
1. What happens if we remove `name` from the function declaration below?

In [None]:
def say_hello(name):
    print(f"Hi {name}!!")

# Why doesn't this work?
def say_hello():
    print(f"Hi {name}!!")

# Default arguments

Default arguments allow programmers to provide reasonable defaults for their functions. A function may have many parameters to customize execution. [matplotlib](https://matplotlib.org/) is a plotting and graphics library. Its classes and functions tend to have _tons_ of parameters to customize calls. Take a look at [some of the functions here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) for examples.

We'll look at the [scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter) function as a demonstration.

A scatter plot in _matplotlib_ requires two arrays of the `x` and `y` pairs to plot. Beyond that, the size, point color, edge color, color map, alpha, and other parameters may be set.

So, think about it: would you really want to pass an argument for each parameter every time you need a simple scatter plot? **NO. AHH!** 😱

In order to prevent programmers from going totally nuts, the _matplotlib_ sages wisely provided default arguments for every parameter except `x` and `y`.

Default arguments make parameters optional, while parameters without a default are mandatory. Thus, in the `scatter()` function linked above, you _always_ have to pass `x` and `y` but can pass the other arguments as necessary.
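
Here is a minimal sketch of the same idea without _matplotlib_. `describe_point()` is a hypothetical function with two required parameters and two defaults, loosely mirroring how `scatter()` requires only `x` and `y`.

```python
def describe_point(x, y, color="blue", size=20):
    """Describe a point; x and y are required, color and size are optional."""
    return f"({x}, {y}) drawn in {color} at size {size}"


# Only the required arguments: the defaults fill in the rest.
print(describe_point(1, 2))
# Override one default by keyword and leave the other alone.
print(describe_point(1, 2, size=50))

# Omitting an argument for a parameter without a default raises a TypeError.
try:
    describe_point(1)
except TypeError as err:
    print(f"TypeError: {err}")
```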

Now let's take a look at a few examples.

In [5]:
from faker import Faker

def generate_names(amount=2):
    """Generate random names using Faker.

    Parameters
    ----------
    amount: int
        Number of fake names to generate.
    
    Returns
    -------
    A list[str] of random names.
    """
    fake = Faker()
    return [fake.name() for _ in range(amount)]

def random_person(people, n=1):
    """Return n random people.

    Parameters
    ----------
    people: list[str]
        A list or iterable of people.
    n: int
        Amount of people to return.
    
    Returns
    -------
    A list[str] of people.
    """
    return [random.choice(people) for _ in range(n)]

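def random_groups(people, groups=2):
    """Return people shuffled and split into groups of a given size.

    Parameters
    ----------
    people: list[str]
        A list of people.
    groups: int
        Number of people per group. The last group may be smaller.

    Returns
    -------
    List of list of groups of people. list[list[str]]
    """
    length = len(people)
    # random.sample() returns a new, shuffled list, so rebinding `people`
    # shadows the parameter without mutating the caller's list.
    # random.shuffle() would shuffle the caller's list in place.
    people = random.sample(people, length)
    # Slice the shuffled list into consecutive chunks of `groups` people.
    return [people[i:i + groups] for i in range(0, length, groups)]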

The functions above all have reasonable defaults. `generate_names()` returns realistic, localized names using [Faker](https://faker.readthedocs.io/en/master/); the function defaults to two names. We can call it by passing in a number or without arguments to use the default.

In [6]:
# Generates two names
names = generate_names()
print(f"Two names via default argument: {names}")

# Generates five names by specifically using the keyword argument.
names = generate_names(amount=5)
print(f"Five names via specifically passing 5 to 'amount': {names}")

# Generates 11 names by passing 11 by position.
names = generate_names(11)

Two names via default argument: ['Michael Costa', 'Travis Gonzalez MD']
Five names via specifically passing 5 to 'amount': ['Hannah Faulkner', 'David Lewis', 'Joseph Obrien', 'Kyle Rice', 'Emily Lambert']


The other functions I wrote have similar default arguments. `random_person()` returns random people from a `list` of people (you've seen something like this in class). The default is to return a single person. `random_groups()` shuffles people into groups of `groups` people each, with a default group size of two.

In [8]:
two_people = random_person(names, 2)
print(f"Two randomly selected people: {two_people}")

# Split everyone into groups of three; the last group may be smaller.
three_groups = random_groups(names, 3)
print(f"Three randomly shuffled groups: {three_groups}")

Two randomly selected people: ['Daniel Davis', 'Sheena Glover']
Three randomly shuffled groups: [['Sheena Glover', 'Jessica Downs', 'Tracey Brooks'], ['Stephen Jones', 'Debbie Smith', 'Cynthia Johnson'], ['Dr. Wendy Dawson', 'Daniel Davis', 'Crystal Jackson'], ['Toni Oneal', 'Stephanie Green']]


The `people` parameter for `random_person()` and `random_groups()` has no default, so passing an argument for it is mandatory.

**Questions**
1. What's the difference between `random_groups(names, groups=4)` and `random_groups(names, 4)`?
2. Which of the following are correct?

* `random_person(n=2, people=names)`

* `random_person(names, 3)`

* `random_person(people=names, n=4)`

* `random_person(4, names)`

# Variadic arguments (`*args` and `**kwargs`)

Variadic arguments are arbitrarily sized. You can pass in as many arguments to `*args` or `**kwargs` as you wish. Like default arguments, variadic arguments are designed to ease calling functions by providing flexibility.

[seaborn](https://seaborn.pydata.org/) is a plotting library built on _matplotlib_. The plotting functions generally take `**kwargs` that are passed down to the _matplotlib_ functions.

We can take a look at the [kdeplot()](https://seaborn.pydata.org/generated/seaborn.kdeplot.html#seaborn.kdeplot) function for a great example. The documentation mentions that the `**kwargs` are all passed down to specific _matplotlib_ functions depending on other parameters.

We'll take a look at the basic, canonical examples before looking at better uses next week.

In [9]:
def concatenate(*args, sep=' '):
    """Combine strings separated by sep."""
    # You should use string's "join" method instead of this.
    temp = ""
    last_i = len(args) - 1
    for i, arg in enumerate(args):
        temp += str(arg)
        if i != last_i:
            temp += sep
    return temp

concatenate("I", "like", "cats", "meow", ["look", "it's", "a", "list"])

'I like cats meow [\'look\', "it\'s", \'a\', \'list\']'

`concatenate()` is something of the canonical example for `*args`. `print()` works somewhat similarly. Rather than taking a `list` of elements, `print()` takes `*args` that are printed by taking the string representation of each object.
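
`**kwargs` is the keyword counterpart to `*args`: extra keyword arguments are collected into a `dict`. The sketch below is illustrative only. `concatenate_join()` is a `join`-based variant of `concatenate()`, while `show_options()` and `shout()` are hypothetical helpers that show collecting and forwarding keyword arguments, roughly in the spirit of how _seaborn_ passes `**kwargs` along.

```python
def concatenate_join(*args, sep=" "):
    """Combine the string form of each argument, separated by sep."""
    return sep.join(str(arg) for arg in args)


def show_options(**kwargs):
    """Return the keyword arguments exactly as they were collected."""
    return kwargs


def shout(*args, **kwargs):
    """Forward all arguments to concatenate_join() and upper-case the result."""
    return concatenate_join(*args, **kwargs).upper()


print(show_options(alpha=0.5, color="red"))  # {'alpha': 0.5, 'color': 'red'}
print(shout("i", "like", "cats"))            # I LIKE CATS
print(shout("i", "like", "cats", sep="-"))   # I-LIKE-CATS
```

`shout()` never names `sep` itself; it simply hands whatever keywords it receives to `concatenate_join()`.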

# Composition

Functions are often composed of other functions. In other words, functions build on each other. They're not monoliths that encompass a varied range of actions. Individual functions should be discrete in the sense that they do one action well. Other functions can then use those functions in service of their own goals.

Since we're statisticians and data scientists, I'll demonstrate composition via simple equations as functions.

I'll use loops instead of comprehensions because loops are easier to follow and comprehensions aren't mandatory for this class.

In [10]:
import math

def variance(a):
    """Calculate the variance of an array."""
    mean_a = mean(a)
    diff_squares = 0
    
    for xi in a:
        diff_squares += (xi - mean_a)**2
    
    return diff_squares/len(a)

def stddev(a):
    """Calculate the standard deviation of an array."""
    return math.sqrt(variance(a))

stddev(random_numbers)

9.702127126336734
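
As a sanity check on the composed functions, the sketch below compares them with the standard library. Because `variance()` divides by `len(a)`, it computes the population variance, so the matching library functions are `statistics.pvariance()` and `statistics.pstdev()`. The functions are repeated in condensed form so the block runs on its own, and `data` is invented for the example.

```python
import math
import statistics


def mean(a):
    sum_a = 0
    for x in a:
        sum_a += x
    return sum_a / len(a)


def variance(a):
    """Population variance: average squared distance from the mean."""
    mean_a = mean(a)
    diff_squares = 0
    for xi in a:
        diff_squares += (xi - mean_a) ** 2
    return diff_squares / len(a)


def stddev(a):
    """Standard deviation, composed from variance()."""
    return math.sqrt(variance(a))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # Invented data; the mean is 5.

print(variance(data), stddev(data))  # 4.0 2.0
assert math.isclose(variance(data), statistics.pvariance(data))
assert math.isclose(stddev(data), statistics.pstdev(data))
```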

# Basic error handling

Python is dynamically typed, which means that types (such as integers or strings) are determined at run time. This means that you can call `mean()` with nonsense values such as strings. However, Python is also strongly typed, which in turn means that operations on incompatible types raise an exception rather than being silently coerced.

In [7]:
# Dynamic typing
calculate_this = "i like vegan pie lolol."

# But strongly checked at runtime
mean(calculate_this)

TypeError: unsupported operand type(s) for +=: 'int' and 'str'

Python isn't pell-mell about typing! Notice the error as well. The error explains that the function is trying to do some operation that the value doesn't support. The error may seem unclear at first, but you can get a sense of what's wrong by the final line which states `unsupported operand type(s) for +=: 'int' and 'str'`.

We can add explicit error checks to functions to provide more information to the caller. 

In [18]:
from array import array

def mean(a):
    """Calculate the arithmetic mean of an iterable.

    Parameters
    ----------
    a: Iterable
        An Iterable of numbers.

    Returns
    -------
    Mean of a as an integer or float.
    """
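    # Check for a list or array before indexing into it.
    if not isinstance(a, (list, array)):
        raise TypeError("You need to pass in an iterable of numbers.")

    # Empty arrays cause division by zero.
    # This check must come before a[0], which would raise an IndexError.
    if not len(a):
        raise ValueError("You can't calculate the mean of an empty array.")

    # Checking only the first element is a cheap sanity check, not a guarantee.
    if not isinstance(a[0], (int, float)):
        raise TypeError("You need to pass in an iterable of numbers.")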

    sum_a = 0

    for x in a:
        sum_a += x
    
    return sum_a/len(a)

_ = mean(["Dark Souls", "is a", "great series."])

TypeError: You need to pass in an iterable of numbers.
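
Callers can then handle those exceptions with `try`/`except`. The sketch below is one way to do that. `safe_mean()` is a hypothetical wrapper, and `checked_mean()` is a condensed stand-in for the checked `mean()` above (accepting only a `list`, for brevity) so the block runs on its own.

```python
def checked_mean(a):
    """Condensed stand-in for the checked mean (lists only)."""
    if not isinstance(a, list):
        raise TypeError("You need to pass in an iterable of numbers.")
    if not len(a):
        raise ValueError("You can't calculate the mean of an empty array.")
    if not isinstance(a[0], (int, float)):
        raise TypeError("You need to pass in an iterable of numbers.")
    return sum(a) / len(a)


def safe_mean(a):
    """Return the mean of a, or None if it can't be calculated."""
    try:
        return checked_mean(a)
    except (TypeError, ValueError) as err:
        # Report the problem instead of crashing the whole script.
        print(f"Couldn't calculate the mean: {err}")
        return None


print(safe_mean([1, 2, 3]))          # 2.0
print(safe_mean([]))                 # reports the ValueError, then prints None
print(safe_mean(["Dark", "Souls"]))  # reports the TypeError, then prints None
```

Returning `None` is only one possible design; re-raising or returning a default value may suit other callers better.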

# Longer PokéAPI example

Let's revisit the [Pokémon API](https://pokeapi.co/) example from last week.

I designed this more as a script, but you shouldn't take it as the absolute best way to tackle this problem. The site lists several API implementations in different languages, including Python, that you should use instead.

In [None]:
import io
import matplotlib.pyplot as plt
import json
import time

from requests import Session, HTTPError
from IPython.display import display
from PIL import Image

POKEAPI = "https://pokeapi.co/api/v2/pokemon/{}"
CACHE_PATH = "pokeapi_cache.json"
THROTTLE = 30

session = None
poke_cache = None
last_time = None

def load_cache(path):
    global poke_cache
    try:
        with open(path, 'r') as cache:
            poke_cache = json.load(cache)
    except FileNotFoundError:
        # Create an empty cache if the file doesn't exist.
        # Also...just bubble up the rest of the errors.
        poke_cache = {}

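def check_cache(pokenum):
    # JSON object keys are always strings, so look up by str.
    return poke_cache.get(str(pokenum))

def update_cache(new_pokemon):
    # Store string keys so they match a cache loaded from JSON.
    poke_cache.update({str(num): data for num, data in new_pokemon.items()})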

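def check_time():
    global last_time

    # Wait until THROTTLE seconds have elapsed since the last request.
    sleep_time = THROTTLE - (time.monotonic() - last_time)
    if sleep_time > 0:
        print(f"Pausing for {sleep_time} seconds.")
        time.sleep(sleep_time)

    # Record this request so the next call is throttled relative to it.
    last_time = time.monotonic()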

def create_url(pokenum):
    return POKEAPI.format(pokenum)

def init_scraper(path=CACHE_PATH):
    global poke_cache
    global session
    global last_time

    # Compare against None: an empty cache ({}) is falsy but already loaded.
    if poke_cache is None:
        load_cache(path)
    if session is None:
        session = Session()
    if last_time is None:
        last_time = time.monotonic()

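def get_pokemon(pokemon_nums, **kwargs):
    pokedata = {}

    for pokenum in pokemon_nums:
        # Check if the Pokémon data exists in the cache
        # instead of scraping again.
        if data := check_cache(pokenum):
            pokedata[pokenum] = data
            continue

        # Throttle to avoid spamming the API
        check_time()
        try:
            url = create_url(pokenum)
            # Extra keyword arguments are forwarded to requests.
            resp = session.get(url, **kwargs)
            # raise_for_status() returns None, so it can't be chained.
            resp.raise_for_status()
            pokedata[pokenum] = resp.json()
        except HTTPError as e:
            # Report the failed request and move on to the next Pokémon.
            print(f"Couldn't fetch Pokémon {pokenum}: {e}")

    update_cache(pokedata)
    return pokedata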

init_scraper()
get_pokemon([54])
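
One thing the script above does not show is writing the cache back to disk. Here is a minimal sketch of how that could look, assuming the cache is a plain `dict`. `save_cache()` is a hypothetical counterpart to `load_cache()` and not part of the original script; this `load_cache()` variant returns the cache instead of setting a global, and the file name and data are made up. Note that JSON object keys are always strings, so integer keys come back as `str` after a round trip.

```python
import json
import os
import tempfile


def save_cache(cache, path):
    """Write the cache dict to path as JSON (hypothetical helper)."""
    with open(path, "w") as cache_file:
        json.dump(cache, cache_file)


def load_cache(path):
    """Return the cached dict, or an empty dict if the file doesn't exist."""
    try:
        with open(path, "r") as cache_file:
            return json.load(cache_file)
    except FileNotFoundError:
        return {}


# Demonstrate a round trip in a temporary directory with made-up data.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_cache.json")
    empty = load_cache(path)
    print(empty)     # {} because nothing has been saved yet

    save_cache({54: {"name": "psyduck"}}, path)
    reloaded = load_cache(path)
    print(reloaded)  # {'54': {'name': 'psyduck'}}
```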
