
# Function Annotations & Decorators

> Justin Post

There are a few more extremely useful techniques we can apply when creating or running our functions. We'll cover two of those here:

- Function Annotations: Help users by describing (in the function's signature and its `help()` output) the types of inputs and outputs we should use/expect
- Function Decorators: Add extra behavior to a function without modifying the function's source code

Note: These types of webpages are built from Jupyter notebooks (`.ipynb` files). You can access your own versions of them by [clicking here](https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-with-Python/blob/main/01_Programming_in_python/22-Function_Annotations_Decorators.ipynb). **It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you'd like!**

## Annotations

We can describe two major <a href = "https://peps.python.org/pep-3107/" target = "_blank">types of annotations</a>:

- parameter (input) annotations
- return value annotations

Let's start by discussing how we might use parameter annotations to improve usability of our functions.

### Parameter Annotation

Consider the basic function below, which we created a while back, that finds the mean or trimmed mean of a list of values (of course, we'd prefer to use `numpy` arrays now that we know them, but let's just consider this function for now).


```python
from math import floor

def find_mean(y, method = None, p = 0):
    """
    Quick function to find the mean or trimmed mean
    Assumes we have a list with only numeric type data
    If method is set to "Trim", removes floor(p*len(y)) values from each
    end of the sorted data before finding the mean
    p should be a number between 0 and 0.5
    """
    if method == "Trim":
        sort_y = sorted(y)
        to_remove = floor(p*len(sort_y))
        y = sort_y[to_remove:(len(sort_y)-to_remove)]
    return sum(y)/len(y)
```

Providing annotations for our parameters (`y`, `method`, and `p`) can help the user understand what our function expects those inputs to be. For instance, we can state that `y` should be a list of numeric values, `method` should be a string, and `p` should be a numeric value. This complements the docstring.

Parameter annotations take the form of optional expressions that follow the parameter name:

`def foo(a: expression, b: expression = 5):`

So the annotation goes after a colon following the parameter name and before any default value. Let's put some annotations in our function definition.
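As a quick sketch of this syntax (the function `foo` is just a hypothetical placeholder), note that an annotation can be any expression, even a string. `python` simply stores it in the function's `__annotations__` dictionary:

```python
def foo(a: "any expression works here", b: int = 5):
    # The annotation for b (int) comes before its default value (5)
    return a, b

print(foo(1))               # (1, 5)
print(foo.__annotations__)
# {'a': 'any expression works here', 'b': <class 'int'>}
```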

- `list[float]` implies that the first argument should be a list containing floats (integers work too)
- `None | str` implies the second argument should be the special value `None` or a string
- `float` specifies that `p` should be a float

**Note: Of these, the only one I am able to get Colab to check is the third one. The others do work with `mypy`, though. If you are interested in that <a href = "https://mypy-lang.org/" target = "_blank">check here</a> or stop by office hours and we can chat!**

```python
from math import floor

def find_mean(y: list[float], method: None | str = None, p: float = 0):
    """
    Quick function to find the mean or trimmed mean
    Assumes we have a list with only numeric type data
    If method is set to "Trim", removes floor(p*len(y)) values from each
    end of the sorted data before finding the mean
    p should be a number between 0 and 0.5
    """
    if method == "Trim":
        sort_y = sorted(y)
        to_remove = floor(p*len(sort_y))
        y = sort_y[to_remove:(len(sort_y)-to_remove)]
    return sum(y)/len(y)
```

Annotations don't change how the code executes: `python` stores them but doesn't check them when the function runs.

```python
find_mean([1, 3, 10, 21, 500], method = None)
```

```
107.0
```

```python
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = 0.2)
```

```
11.333333333333334
```
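To see that annotations really aren't enforced, here is a minimal sketch with a hypothetical function `double()` annotated to take and return an `int`. Passing a string still runs, because `*` is defined for strings too:

```python
def double(x: int) -> int:
    """Hypothetical example: annotated as int in, int out."""
    return x * 2

print(double(4))     # 8
print(double("ab"))  # abab (no error: str * 2 repeats the string)
```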

What annotations *do* give us is information about the expected types that we (and type-checking tools) can use. `python` stores them in an additional `__annotations__` attribute on our function, which is a mutable dictionary!

```python
find_mean.__annotations__
```

```
{'y': list[float], 'method': None | str, 'p': float}
```
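Because `__annotations__` is an ordinary dictionary, we can look up or even change its entries. A small sketch, using a hypothetical function `area()`:

```python
def area(width: float, height: float):
    """Hypothetical example function."""
    return width * height

print(area.__annotations__["width"])   # <class 'float'>

# Changing the dictionary changes the recorded annotation
# (but not how area() runs)
area.__annotations__["width"] = int
print(area.__annotations__)
# {'width': <class 'int'>, 'height': <class 'float'>}
```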

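In Colab, we can turn on editor diagnostics that make use of these annotations by going to
- `Tools` -> `Settings` -> `Editor`
- Scrolling down to `Code diagnostics` and selecting `Syntax and Type Checking`

With this setting on, the editor can flag *some* calls whose arguments don't match the annotations. Again, I can only get the third one to work in Colab... Keep in mind that this checking happens in the editor: `python` itself still doesn't enforce the annotations, so any errors printed when we run the code below come from the body of the function.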

```python
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = 0.2) # works fine
```

```
11.333333333333334
```

```python
find_mean([1, 3, 10, 21, 500], method = 'Trim', p = '20%') # p should be a float!
```

```
TypeError: must be real number, not str
```
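It helps to see what happens with `p = '20%'` inside the body of `find_mean()`: multiplying a string by an integer repeats the string, and `floor()` then rejects the result. A minimal sketch, using the list length of 5 from the call above:

```python
from math import floor

p = "20%"
n = 5                 # len([1, 3, 10, 21, 500])
print(p * n)          # 20%20%20%20%20%

try:
    floor(p * n)      # this is the step that raises the TypeError
except TypeError as err:
    print("TypeError:", err)
```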

```python
find_mean(y = "cat") # y should be a list! This TypeError comes from sum(), not from the annotation
```

```
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

### Return Value Annotation

We can also note the type of the value to be returned. The syntax for this is

`def foo() -> expression:`

For our `find_mean()` function, we can specify that a float should be returned.

```python
from math import floor

def find_mean(y: list[float], method: None | str = None, p: float = 0) -> float:
    """
    Quick function to find the mean or trimmed mean
    Assumes we have a list with only numeric type data
    If method is set to "Trim", removes floor(p*len(y)) values from each
    end of the sorted data before finding the mean
    p should be a number between 0 and 0.5
    """
    if method == "Trim":
        sort_y = sorted(y)
        to_remove = floor(p*len(sort_y))
        y = sort_y[to_remove:(len(sort_y)-to_remove)]
    return sum(y)/len(y)
```

Again, this doesn't change how the code executes but is useful to have when understanding what a function should do.
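The return annotation is stored in `__annotations__` too, under the key `"return"`. A quick sketch with a hypothetical function `ratio()`:

```python
def ratio(a: float, b: float) -> float:
    """Hypothetical example with a return annotation."""
    return a / b

print(ratio.__annotations__)
# {'a': <class 'float'>, 'b': <class 'float'>, 'return': <class 'float'>}
print(ratio.__annotations__["return"])  # <class 'float'>
```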

## Decorators
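A decorator is a function that takes a function and returns a new function (usually one that wraps the original). Writing `@decorator` on the line above a `def` is shorthand for `func = decorator(func)`. As a minimal sketch, here is a hypothetical decorator `announce()` that prints a message before each call, without touching the source code of the function it decorates:

```python
import functools

def announce(func):
    """Hypothetical decorator: print a message each time func is called."""
    @functools.wraps(func)  # copy func's name and docstring onto wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@announce  # same as: add = announce(add)
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

print(add(2, 3))
# Calling add
# 5
print(add.__name__)  # add (thanks to functools.wraps)
```

Without `functools.wraps`, `add.__name__` would report `wrapper`, which makes debugging and `help()` output more confusing.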

## Recap

Function annotations allow us to describe inputs and outputs more clearly. `python` itself doesn't enforce the types we specify, but tools such as `mypy` (or Colab's type checking) can use them to catch mistakes, and we can write our own checks using the `__annotations__` dictionary.

Function decorators allow us to modify the behavior of a function while not changing the source code.
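Putting the two ideas together, one way to actually enforce some annotations is a decorator that reads `__annotations__` before calling the function. This is a minimal sketch, not a full type checker: the decorator `check_types()` and the function `scale()` are hypothetical, and the decorator only checks annotations that are plain classes such as `float` or `str`, skipping things like `list[float]` or `None | str`.

```python
import functools
import inspect

def check_types(func):
    """Hypothetical decorator: check arguments against plain-class
    annotations (e.g. float, str) before calling func."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)  # match arguments to parameter names
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Skip missing annotations and anything that isn't a plain class
            # (e.g. list[float] or None | str)
            if type(expected) is not type:
                continue
            # Like type checkers, accept an int where a float is expected
            allowed = (int, float) if expected is float else expected
            if not isinstance(value, allowed):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"not {type(value).__name__}"
                )
        return func(*args, **kwargs)
    return wrapper

@check_types
def scale(x: float, factor: float = 2.0) -> float:
    return x * factor

print(scale(1.5))  # 3.0
print(scale(3))    # 6.0

try:
    scale("1.5")
except TypeError as err:
    print(err)     # x should be float, not str
```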

We'll see both of these when we get into `Spark`, and they'll help us understand the inner workings of `Spark` a little better. Here is an example where knowing these two syntaxes is useful!

```python
from typing import Union

from pyspark.sql.datasource import (
    DataSource,
    DataSourceReader,
    DataSourceStreamReader,
    DataSourceStreamWriter,
    DataSourceWriter,
)
from pyspark.sql.types import StructType

# FakeDataSourceReader, FakeDataSourceWriter, FakeStreamReader, and
# FakeStreamWriter are defined elsewhere in the full example

class FakeDataSource(DataSource):
    """
    A fake data source for PySpark to generate synthetic data using the `faker` library.
    Options:
    - numRows: specify number of rows to generate. Default value is 3.
    """

    @classmethod
    def name(cls) -> str:
        return "fake"

    def schema(self) -> Union[StructType, str]:
        return "name string, date string, zipcode string, state string"

    def reader(self, schema: StructType) -> DataSourceReader:
        return FakeDataSourceReader(schema, self.options)

    def writer(self, schema: StructType, overwrite: bool) -> DataSourceWriter:
        return FakeDataSourceWriter(self.options)

    def streamReader(self, schema: StructType) -> DataSourceStreamReader:
        return FakeStreamReader(schema, self.options)

    def streamWriter(self, schema: StructType, overwrite: bool) -> DataSourceStreamWriter:
        return FakeStreamWriter(self.options)
```

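We can see the use of annotations to help us understand what the inputs and outputs for these functions should be. A decorator is used as well: `@classmethod` turns `name()` into a class method, so it receives the class itself (`cls`) as its first argument rather than an instance (`self`).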
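As a minimal sketch of what `@classmethod` does (the classes `Source` and `FakeSource` are hypothetical stand-ins, not part of `PySpark`):

```python
class Source:
    """Hypothetical stand-in for a class like FakeDataSource."""

    @classmethod
    def name(cls) -> str:
        # cls is the class the method was called on, not an instance
        return cls.__name__.lower()

class FakeSource(Source):
    pass

print(Source.name())        # source
print(FakeSource.name())    # fakesource
print(FakeSource().name())  # fakesource (calling on an instance passes its class)
```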

If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!

If you are on Google Colab, head back to our course website for [our next lesson](https://jbpost2.github.io/ST-554-Big-Data-with-Python/01_Programming_in_python/23-Basics_Using_Git_Github_Landing.html)!