# Decorators

## What do decorators look like?

Decorators are a feature of Python that allows you to modify the behaviour of functions and classes. Decorators in Python are typically written like this:

```python
@decorator
def func():
    ...
```

Writing decorators using an "@" symbol is what is called "syntactic sugar", which means that it is simply an alternative way of writing something. The `@decorator` syntax is easier to read, but the equivalent long-hand form makes it clearer what the decorator is doing:

```python
def func():
    ...

func = decorator(func)
```

As the second code snippet indicates, decorators are functions. They take as input a callable object (i.e., function or class), modify it in some way, and then return the modified form. The first form conceals that behaviour a bit in favour of it being more obvious that a function is being modified. Once you know how decorators work, the first form is much easier to follow. Furthermore, the second form requires you to write the function name three times, which is far too many.

## How do decorators work?

Decorators use a feature of Python that we haven't talked much about in this course: functions are objects that can be stored in variables like any other value. Functions can be aliased by using an assignment operation and they can also be returned from other functions. Consider the following examples of those two processes in action using the built-in `print()` function.

In [1]:
x = print

x("hello")

hello


In [2]:
def give_print():
    return print

y = give_print()

y("wassup")

wassup


In both cases, the `print()` function was passed around in some way without being executed. Instead, new variables were created that point to the definition of the `print()` function, wherever that code is stored on your computer. We can see that by printing `x` or `y`.

In [3]:
print(x)
print(y)
# Or printing the help message
help(x)

<built-in function print>
<built-in function print>
Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.
    
    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.



In addition to writing a function that returns the function `print()`, we could also return a function that does something we define.

In [4]:
def give_double_print():
    def print_twice(something):
        print(something)
        print(something)
    return print_twice

x = give_double_print()

x(123)

123
123


Let's take a second to talk briefly about what we are doing here. We're defining a function `give_double_print()` and within that function we are defining another function `print_twice()`. The outer function `give_double_print()` returns the inner function, which is then stored in a variable `x`. We can see that `x` has stored the inner function by printing it.

In [5]:
print(x)

<function give_double_print.<locals>.print_twice at 0x7f6e3a547600>


The inner function `print_twice()` is only defined within the namespace of the outer function (you can kind of see that by noting the "."s in the printed repr of the function). If we try to call `print_twice()` from outside of `give_double_print()` we get an error.

In [6]:
print_twice("hello")

NameError: name 'print_twice' is not defined

In fact, the `print_twice()` function is only defined when we run the `give_double_print()` function. We can see that more clearly by using a different example. Let's consider a case when the outer function takes some input and the inner function takes a different input. As the inner function is defined within the namespace of the outer function, it has access to other data within that scope. This is just like how you have seen in class that you can define variables outside of functions and then refer to them within functions.

In [7]:
def outer_func(outer_input):
    def inner_func(inner_input):
        print(f"I was made with {outer_input}")
        print(f"I was called with {inner_input}")
    return inner_func

Now let's see what happens when we call `outer_func()` and then call the returned function with different inputs.

In [8]:
x = outer_func("original_input")

x("something new")

I was made with original_input
I was called with something new


It printed what you probably expected by looking at the function definitions. Let's see what happens when we call it again.

In [9]:
x("yet another thing")

I was made with original_input
I was called with yet another thing


Whenever we call `x`, which is the returned inner function, it remembers the input that was provided when it was created, but uses the new input we provide whenever we call it. This behaviour of functions being associated with data present within their enclosing scope upon their definition is called a "closure".
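
As one more sketch of a closure, the hypothetical `make_multiplier()` below (not one of the examples above) returns an inner function that remembers `factor`. The `__closure__` attribute is where Python keeps the remembered values, so we can peek at it to confirm that each returned function carries its own copy.

```python
def make_multiplier(factor):
    # `factor` lives in the enclosing scope of `multiply`
    def multiply(value):
        return value * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(10))
print(triple(10))

# Each returned function remembers the value it was made with
print(double.__closure__[0].cell_contents)
print(triple.__closure__[0].cell_contents)
```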

The reason we care about closures is that decorators are a special kind of closure. Specifically, decorators are functions that look pretty much just like the ones we've been using above, except that instead of taking something like a `str` as an input, they take a function as an input and then do something to the function. Let's take a look at a simple decorator that takes a function as input and returns another function that modifies the behaviour of the first function by simply doubling any inputs provided. 

In [10]:
def double_the_input(func):
    def wrapper_double_input(x):
        result = func(x*2)
        return result
    return wrapper_double_input

@double_the_input
def say_number(num):
    print(num)

say_number(5)

10


Let's walk through that. In this example, our outer function, `double_the_input()` takes a function as input. Within `double_the_input()` we defined a second function called `wrapper_double_input()`. That inner function takes some input, `x`, as well. Inside the inner function, all that happens is the function that was provided to `double_the_input()` is run with double whatever input was provided to `wrapper_double_input()`. Importantly, just like how the closure in the example above remembered what `str` was provided to the outer function when it was first called, `wrapper_double_input()` will remember the function that was provided to its outer function.

Let's try using this decorator a couple of ways to get a better idea of what it's doing. First, let's confirm that our `say_number()` function isn't doing anything odd by itself.

In [11]:
def say_number(num):
    print(num)

say_number(5)

5


So without the decorator, it is behaving as you would expect. At the start of this document I said that the @ syntax conceals what is happening a bit. Let's use the decorator with the long-hand syntax to see what it's actually doing. We can use the un-decorated `say_number()` that we just defined in the above cell. This time, though, let's store the decorated version in another variable.

In [12]:
decorated_say_number = double_the_input(say_number)

decorated_say_number(10)
say_number(10)

20
10


As you can see, the decorated version is modified, but the un-decorated version is not. Looking at the use of the decorator without the @ syntax, it is a bit clearer that what we are actually doing is running the decorator function and storing the return value. The @ syntax does that behind the scenes and overwrites the original function with the return value. We can inspect our decorated and un-decorated function now to see that they are indeed different.

In [13]:
print(say_number)

<function say_number at 0x7f6e3a5477e0>


In [14]:
print(decorated_say_number)

<function double_the_input.<locals>.wrapper_double_input at 0x7f6e3a546160>


As you can see, `decorated_say_number()` is really the inner function of our decorator in disguise. When we call our decorated function we aren't calling the original function directly; we are instead calling a function that then calls our function. (By the way, the thing that calls a thing is often called a wrapper. This is used both for functions and for scripts that run other scripts or software.)

That's really all there is to it. While the function within a function concept is pretty confusing, that's all decorators are. They are functions that return a wrapper function which modifies the way some other function runs. 

I just want to cover two more things before we move on to looking at some decorators you will see commonly used and will likely find useful yourselves. The first thing is that, by default, inspecting a decorated function doesn't show you the identity of the original function. We saw the wrapper's details above because, unless you add code to change it, the wrapper will not take on the identity of the function being wrapped and will instead print its own details as the repr. If instead we wanted our decorator to preserve the information of the original function, we could use a decorator that does that.

In [15]:
import functools

def double_the_input(func):
    @functools.wraps(func)
    def wrapper_double_input(x):
        result = func(x*2)
        return result
    return wrapper_double_input

@double_the_input
def say_number(num):
    print(num)

print(say_number)

<function say_number at 0x7f6e38483880>


To see that `@functools.wraps` decorator at work, try commenting out that line in the above cell and rerunning it. You'll see the printed repr of the `say_number()` function change.
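
The repr is not the only thing that changes. As a rough sketch, the cell below compares two hypothetical decorators, `plain_decorator()` and `wrapping_decorator()` (names invented for this comparison), that differ only in whether they use `@functools.wraps`. It prints the `__name__` and `__doc__` of each decorated function, then calls the original function through the `__wrapped__` attribute that `functools.wraps` adds.

```python
import functools

def plain_decorator(func):
    def wrapper(x):
        return func(x * 2)
    return wrapper

def wrapping_decorator(func):
    @functools.wraps(func)
    def wrapper(x):
        return func(x * 2)
    return wrapper

@plain_decorator
def first_example(num):
    """Return the number it was given."""
    return num

@wrapping_decorator
def second_example(num):
    """Return the number it was given."""
    return num

# Without functools.wraps the name and docstring are the wrapper's
print(first_example.__name__, first_example.__doc__)
# With functools.wraps they are copied over from the original function
print(second_example.__name__, second_example.__doc__)
# functools.wraps also keeps a reference to the original function
print(second_example.__wrapped__(5))
```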

Note that the way to read what decorators are decorating is to simply read from top to bottom. Decorators decorate the thing on the line below them. In the above cell, `@functools.wraps` is decorating `wrapper_double_input()`. You may also see examples of more than one decorator, one above the other. You can think of that as the first decorator decorating the output of the second decorator. Two decorators, "first" and "second" decorating a function could be written equivalently as either of the following.

```python
def func():
    ...

func = first(second(func))

# or

@first
@second
def func():
    ...
```
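
To check that ordering for yourself, here is a small sketch in which `first()` and `second()` are toy decorators (written just for this illustration) that label whatever the wrapped function returns. Both ways of writing the decoration should give the same nesting.

```python
def first(func):
    def wrapper():
        return f"first({func()})"
    return wrapper

def second(func):
    def wrapper():
        return f"second({func()})"
    return wrapper

@first
@second
def func():
    return "func"

def other_func():
    return "func"

# The long-hand equivalent of stacking @first on top of @second
other_func = first(second(other_func))

print(func())
print(other_func())
```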

The second thing I want to cover is a quick point about how decorators transmit arguments to the function they are wrapping. In the example decorator above, the input to `wrapper_double_input()` is a single argument. However, many decorators are designed to work for a wide range of functions. In most cases, the authors of decorator functions won't know beforehand what inputs to expect. Instead, they would probably be best off blindly passing any and all inputs straight to the wrapped function. Indeed, that's what is typically done. Here I just want to quickly describe the syntax that can be used to do that: `*args` and `**kwargs`.

`*args` and `**kwargs` refer to positional arguments and keyword arguments respectively. Positional arguments are any inputs given to a function without a name attached, while keyword arguments are those that are given as named arguments. Consider the following example function:

```python
def func(x, y):
    ...
```

calling that function with args would look like this

```python
func(1, 2)
```

while calling that function with kwargs would look like 

```python
func(x=1, y=2)
```

So what happens if we instead define the function to accept `*args` and `**kwargs`? Let's run the code and see.

In [16]:
def func(*args, **kwargs):
    print(f"args were: {args}")
    print(f"kwargs were: {kwargs}")
    
func(1, 2)

args were: (1, 2)
kwargs were: {}


In [17]:
func(x=1, y=2)

args were: ()
kwargs were: {'x': 1, 'y': 2}


In [18]:
func(1, y=2)

args were: (1,)
kwargs were: {'y': 2}


As you can see, when we provide positional arguments, they are stored in `args` and when we provide keyword arguments they are stored in `kwargs`. Just to be completely clear, let's see if it was the words "args" and "kwargs" that were achieving this outcome:

In [19]:
def func(*apple, **banana):
    print(f"apple was: {apple}")
    print(f"banana was: {banana}")
    
func(1, y=2)

apple was: (1,)
banana was: {'y': 2}


No! The names don't matter; it is the `*` and `**` that do the work. In a function definition, `*` collects zero or more extra positional arguments into a tuple. When you provide positional arguments, they are matched in order to the parameters defined in your function, and the `*` syntax essentially means "put the rest in here as a tuple". You can use similar syntax when unpacking sequences in other situations, although in an assignment the leftover elements are stored in a `list` rather than a tuple. For example

In [20]:
a, b, c = 1, 2, 3
print(a)
print(b)
print(c)

1
2
3


In [21]:
a, *b = 1, 2, 3
print(a)
print(b)

1
[2, 3]


Note that `*` actually just takes whatever isn't explicitly being stored in another variable, so you can take the first and last element and put the rest in another variable as follows

In [22]:
a, *b, c = [i for i in range(10)]
print(a)
print(b)
print(c)

0
[1, 2, 3, 4, 5, 6, 7, 8]
9


This is very handy for unpacking columns of files or any other case where you just want one field but don't want to bother writing out variables to take the rest. It can also be clearer than indexing an executed function like `x = some_func()[0]` as that requires you to read to the end of the function call, which can be unclear if the function takes a lot of arguments. This syntax can be clearer as all the details of the assignment are on the left of the `=` sign.
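
As a sketch of the file-column case, the line below is a made-up tab-separated record (the column layout is an assumption for illustration, not a real file format).

```python
# A made-up tab-separated line; the column layout is an assumption
line = "contig_1\t1000\t2000\tgeneA\t42\t+\n"

# Keep the first three columns and collect whatever is left
seqid, start, stop, *rest = line.strip().split("\t")
print(seqid, start, stop)
print(rest)

# Keep only the first and last columns and ignore the middle
first_col, *_, last_col = line.strip().split("\t")
print(first_col, last_col)
```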

`**` is a bit different to `*`. While `*` can be used with functions and in other cases of unpacking a container object like a `list` or `tuple`, `**` is mostly used for handling keyword arguments when calling or defining functions (it can also unpack one `dict` into another, as in `{**a, **b}`). In both function cases `**` works to convert the keyword arguments either to or from a `dict`. As you saw in the example above, when we gave `func()` keyword arguments, they were stored in a dict in which the name of the parameter is the key, and the provided input is the value. In addition, `**` can be used to provide a dict of the same format as input to a function. For example,

In [23]:
our_kwargs = {"x": 1, "y":2}

func(**our_kwargs)

apple was: ()
banana was: {'x': 1, 'y': 2}
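
Putting `*args` and `**kwargs` together with the decorator pattern from earlier, a wrapper that blindly passes any and all inputs to the wrapped function might look like the sketch below. The names `announce_call()`, `add()`, and `greet()` are invented for this example.

```python
import functools

def announce_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Collect every input without needing to know what func expects
        print(f"Calling {func.__name__} with args={args} kwargs={kwargs}")
        # Unpack the same inputs straight into the wrapped function
        return func(*args, **kwargs)
    return wrapper

@announce_call
def add(x, y=0):
    return x + y

@announce_call
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}"

print(add(1, y=2))
print(greet("Ada"))
```

Because the wrapper never names the parameters of the function it wraps, the same decorator works for both functions even though their signatures differ.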


## Useful decorators

Now that we have (hopefully) cleared up basically what decorators do and how they work, let's take a look at a few that you might find useful. We'll discuss four decorators here, all of which decorate classes or their methods. The decorators we will discuss are `@dataclass`, `@staticmethod`, `@classmethod`, and `@property`.

### `@dataclass`

The `@dataclass` decorator provides some time-saving functionality when writing classes. In the most basic usage, `@dataclass` handles writing simple `__init__()`, `__repr__()`, and `__eq__()` methods. It's especially useful when writing classes that are simply going to store provided inputs in fields without any complex `__init__()` processing. Additionally, when paired with the `@classmethod` decorator that we will discuss below, you can also use the automatically generated `__init__()` of a dataclass to store data that requires processing.

Defining a dataclass looks a lot like defining any other class except that you start your class with (after the docstring) an ordered list of the attributes you want to be populated during instance creation. Any optional attributes can be assigned a default value.

In [24]:
from dataclasses import dataclass

@dataclass
class MyDataClass:
    """docstring goes here"""
    a: str
    b: int
    c: int = 0

x = MyDataClass("Hi", 1)
print(x)

MyDataClass(a='Hi', b=1, c=0)


As you can see, we have a simple class that initialized correctly and also has a nice `__repr__()` that prints informatively and would be sufficient to recreate our instance if we copied and pasted that print-out into a script. Let's compare that to a class without the decorator that works the same way.

In [25]:
class MyRegularClass:
    """docstring"""
    def __init__(self, a:str, b:int, c: int = 0):
        self.a = a
        self.b = b
        self.c = c
    def __repr__(self):
        return f"{self.__class__.__name__}(a='{self.a}', b={self.b}, c={self.c})"
    
    def __eq__(self, other):
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)

y = MyRegularClass("Hi", 1)
print(y)
print(x==y)

MyRegularClass(a='Hi', b=1, c=0)
True


As you can see, `@dataclass` automates the writing of some fairly laborious "boilerplate code". If we were to make a class with many attributes or found ourselves changing the names of our attributes, modifying the `__repr__()` and `__eq__()` methods manually would be quite unpleasant. Furthermore, `@dataclass` can also write all the other comparator methods to support things like `>` and `<`. If you simply use `@dataclass(order=True)` then it will handle writing those methods to compare instances as ordered `tuple`s of their attributes (i.e., it compares the first attribute of each instance, then if those are the same checks the second, and so on until it finds attributes that differ).
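
Here is a minimal sketch of `order=True` in action. The `Version` class is a hypothetical example made up for this purpose.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Version:
    """Hypothetical example class for illustrating order=True."""
    major: int
    minor: int
    patch: int = 0

# Compared like the tuples (1, 2, 0) and (1, 10, 0)
print(Version(1, 2) < Version(1, 10))
# The first attribute differs, so the later ones are never checked
print(Version(2, 0) > Version(1, 99, 5))
# Ordering methods also make instances sortable
print(sorted([Version(1, 2, 3), Version(0, 9), Version(1, 2)]))
```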

### `@staticmethod`

Sometimes when writing methods in a class, you might find that the method is likely to not only be useful for working with data stored in instances of the class, but also for working with data outside the class instances. An example of this could be if you are writing a class to store DNA sequences. It would be sensible to write a method to reverse complement a sequence. However, you could foreseeably want to reverse complement sequences not stored in the class as well. A staticmethod would be suitable in that case. The sequence class example would look something like this

In [26]:
class SequenceRecord:
    def __init__(self, header: str, sequence: str):
        self.header = header
        self.sequence = sequence
    
    @staticmethod
    def reverse_complement(seq):
        rev_bases = {"A": "T", "T": "A", "C": "G", "G": "C"}
        return "".join([rev_bases[b] for b in seq[::-1]])

If you compare the definition line of the `reverse_complement()` method with that of `__init__()`, you'll notice that it does not include a `self` parameter. Staticmethods don't take `self` as an input and so are not automatically given access to the attributes of an instance or of the class. That means that they do not require an instance of the class to exist to be called.

We can now use the `reverse_complement()` method anywhere we have access to the `SequenceRecord` class. In fact, we can use the method either by calling it in an instance of the class, or directly in the class itself.

In [27]:
x = SequenceRecord("seq1", "GTCGTGACGTCAAAC")

print(x.reverse_complement("GGCCAATT"))

print(SequenceRecord.reverse_complement("ATCG"))

AATTGGCC
CGAT


Writing a method as a staticmethod doesn't stop us using the method within the class. The only difference when calling that method from another method of the same class is that we need to explicitly pass an argument containing the data we want to analyze. With instance methods the method always has access to the attributes of the instance through the `self` variable so you generally don't need to pass things between instance methods. A method of the `SequenceRecord` class would call the `reverse_complement()` method as `self.reverse_complement(<sequence>)`.
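
As a sketch of that last point, the version of `SequenceRecord` below adds an instance method, `reverse_complemented()`, which is a hypothetical addition for illustration. It passes the stored sequence explicitly to the staticmethod and returns a new record.

```python
class SequenceRecord:
    def __init__(self, header: str, sequence: str):
        self.header = header
        self.sequence = sequence

    @staticmethod
    def reverse_complement(seq):
        rev_bases = {"A": "T", "T": "A", "C": "G", "G": "C"}
        return "".join([rev_bases[b] for b in seq[::-1]])

    # Hypothetical instance method, added for illustration
    def reverse_complemented(self):
        # The stored sequence is passed explicitly to the staticmethod
        new_seq = self.reverse_complement(self.sequence)
        return SequenceRecord(self.header, new_seq)

record = SequenceRecord("seq1", "GTCGTGACGTCAAAC")
print(record.reverse_complemented().sequence)
```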

### `@classmethod`

If, during last week's session on classes, or during this week's session, you have considered how you might apply classes to your own code, you might have wondered about how you could write `__init__()` methods that could handle different input formats. For example, perhaps you want to write a class that can handle genome features like those we used in several of the assignments in this class. To write such a class, you could reasonably expect to need to write code to handle the features being encoded in [BED](https://genome.ucsc.edu/FAQ/FAQformat.html#format1), [GFF](https://genome.ucsc.edu/FAQ/FAQformat.html#format3), [GTF](https://genome.ucsc.edu/FAQ/FAQformat.html#format4), [GBK](https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html), or another format. One possibility is that you could write your `__init__()` function to take an argument that states the format to use. However, your `__init__()` would quickly become complex. An alternative approach is to use the `@classmethod` decorator to define a method for each different way you could want to make an instance of your class (e.g., one method for each distinct format in which genome features can be encoded).

Staticmethods, which were described above, change the nature of methods by removing `self` from the parameters of the method. Classmethods also make a large change to the way the method behaves. Specifically, classmethods don't take `self` as their first input, but instead take the class itself (conventionally named `cls`). When used as alternative constructors, as in this section, classmethods then return an instance of the class.

To make use of a classmethod, you simply write a generic `__init__()` that will take all the inputs you will want to store as attributes in your class. All of the processing is going to be performed by the classmethod so the `__init__()` can simply take some arguments and then store those directly in attributes without further processing. Once you have written the `__init__()` you then write a method that takes some sort of input, extracts all the desired information, and then creates an instance of the class just like you would outside the class. An example of how that would look in a simple example using genomic features in BED format is as follows

In [28]:
class GenomeFeature:
    def __init__(self, seqid: str, start: int, stop: int):
        self.seqid = seqid
        self.start = start
        self.stop = stop
    
    @classmethod
    def from_bed(cls, bedline: str):
        columns = bedline.strip().split("\t")
        seqid = columns[0]
        start = int(columns[1])
        stop = int(columns[2])
        
        instance = cls(seqid, start, stop)
        return instance
        
bed_line = "some_contig\t1000\t2000\n"

feat = GenomeFeature.from_bed(bed_line)

print(feat)
print(vars(feat))

<__main__.GenomeFeature object at 0x7f6e13e10cd0>
{'seqid': 'some_contig', 'start': 1000, 'stop': 2000}


Python handles giving the classmethod the class for the instantiation, just like it handles giving `self` to regular instance methods. All you need to do for a classmethod is process the data into the format that your `__init__()` is expecting and then create the instance and return it. Let's add a second classmethod to handle GFF, just to show that you can have more than one classmethod for the same class.

In [29]:
class GenomeFeature:
    def __init__(self, seqid: str, start: int, stop: int):
        self.seqid = seqid
        self.start = start
        self.stop = stop
    
    @classmethod
    def from_bed(cls, bedline: str):
        columns = bedline.strip().split("\t")
        seqid = columns[0]
        start = int(columns[1])
        stop = int(columns[2])
        
        instance = cls(seqid, start, stop)
        return instance
    
    @classmethod
    def from_gff(cls, gffline: str):
        columns = gffline.strip().split("\t")
        seqid = columns[0]
        start = int(columns[3]) - 1 # 1-based numbering in GFF files
        stop = int(columns[4])
        
        instance = cls(seqid, start, stop)
        return instance

bed_line = "some_contig\t1000\t2000\n"

bed_feat = GenomeFeature.from_bed(bed_line)
    
gff_line = "contig_1\tGene\tpromoter\t1000\t1010\t42\t+\t.\tX"

gff_feat = GenomeFeature.from_gff(gff_line)

print(bed_feat)
print(vars(bed_feat))
print(gff_feat)
print(vars(gff_feat))

<__main__.GenomeFeature object at 0x7f6e13e12250>
{'seqid': 'some_contig', 'start': 1000, 'stop': 2000}
<__main__.GenomeFeature object at 0x7f6e13e13e10>
{'seqid': 'contig_1', 'start': 999, 'stop': 1010}
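
As mentioned in the `@dataclass` section, the two decorators can be paired. The cell below is a sketch of how that might look for the same class: `@dataclass` writes the simple `__init__()` (and a more helpful `__repr__()`), while the classmethod does the processing.

```python
from dataclasses import dataclass

@dataclass
class GenomeFeature:
    seqid: str
    start: int
    stop: int

    @classmethod
    def from_bed(cls, bedline: str):
        columns = bedline.strip().split("\t")
        # cls(...) calls the __init__() that @dataclass generated
        return cls(columns[0], int(columns[1]), int(columns[2]))

feat = GenomeFeature.from_bed("some_contig\t1000\t2000\n")
print(feat)
```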


### `@property`

Finally, consider a situation in which you want to control how a user can interact with an attribute in some way. An example of such a case could be one that sticks with the above example of genome features: you might want to have a public API (i.e., what a user sees) that uses a 1-base counting system like GFF format, but behind the scenes you might prefer to use 0-base counting as that is what Python uses. One solution to this situation could be to have a method to set the attribute (e.g., called `set_location`), which allows the user to provide a 1-base number that your class then converts to 0-base and stores. You could then have another method that performs the inverse operation (e.g., called `get_location`). This would work, but your user would have to remember two different methods to use to interact with the same attribute.

Instead, in such a case, you would probably be better off using a `@property`. What a `@property` does is essentially what I just described, except it does it behind the scenes and exposes a single attribute name to the user. Then, depending on how the attribute is used (e.g., being assigned to vs being read, such as when it is passed as an argument in a function call), Python calls the appropriate method. The data would still be stored in an attribute of the class, and the `@property` method would simply access the attribute to set or retrieve the information. Typically such an attribute is treated as private (for use within the class only), which is indicated with a leading "\_" character. We could write a simple class that offers this 0-base to 1-base conversion using a `@property` as follows

In [30]:
class CoordinateClass:
    def __init__(self, position: int = 0):
        self._pos = position
    
    # This method is how the attribute should be retrieved
    @property
    def pos(self):
        return self._pos + 1
    
    # The setter method is how the attribute should be changed or created
    @pos.setter 
    def pos(self, val: int):
        self._pos = val - 1

x = CoordinateClass()

x.pos = 100
print(f"The public value of pos through the @property is {x.pos}")
print(f"The private value of _pos hidden in the class is {x._pos}")

The public value of pos through the @property is 100
The private value of _pos hidden in the class is 99


As you can see in the above code, the `@property` decorator is a bit different from other decorators that we've looked at because you need to use it in a different way to decorate multiple methods, and the decorator changes name after the first method. There's not much to understand here; it's just how it's written, and so you just need to remember this syntax if you want to use properties. A bit of jargon that might help is that the first method you define, which you decorate with `@property`, is called the "getter" method. That's the one that retrieves the value. Once you have defined the "getter" method, that property is then referred to using the name of the getter method in future decorators. Additionally, you typically define a "setter" method to create or change the value. The "setter" method is decorated with a decorator that is the name of the getter function plus `.setter`. That identifies the decorated method as defining how the property should be set. Finally, you may also define a "deleter" method if you like. That method controls how the value should be deleted in case there is anything specific about how that should happen.
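
A "deleter" follows the same naming pattern as the "setter". In the sketch below, the choice to reset the stored position to 0 on deletion is an assumption made purely for illustration; your own class might do something quite different.

```python
class CoordinateClass:
    def __init__(self, position: int = 0):
        self._pos = position

    @property
    def pos(self):
        return self._pos + 1

    @pos.setter
    def pos(self, val: int):
        self._pos = val - 1

    # Assumed behaviour for illustration: deleting resets to position 0
    @pos.deleter
    def pos(self):
        self._pos = 0

x = CoordinateClass()
x.pos = 100
del x.pos  # calls the deleter method
print(x.pos)
```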

In addition to examples where you might want to modify values before storing or returning them, there are a couple of other cases in which you might want to use properties:

1. If it is computationally demanding to calculate something about your data, you could have a getter method that checks if the data have already been calculated and then calculate and store the data if not. This means you only expend that compute time if something asks for the data.
2. If you want to create a read-only attribute then you can simply only define a getter method and not define a setter method. Then if anyone tries to use the property to change the stored value, they will get an error. Note that this isn't foolproof. The "private" attributes that start with "\_" are not actually private. Anyone with access to your class can change them. Python doesn't prevent anyone from accessing anything in the code. Nothing is truly private. Instead, Python operates under a system where anyone changing private variables is free to do so, but is responsible for the consequences. Read-only attributes are therefore only encouragement to not change the value. They will stop people who are not familiar with Python, and will make other people pause to think about what they are doing and whether they should actually change the value.
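
Both of those cases can be sketched in one small class. `SequenceStats` and its attributes are hypothetical names invented for this example: `gc_content` is only calculated the first time something asks for it, and `sequence` has a getter but no setter.

```python
class SequenceStats:
    """Hypothetical class for illustrating lazy and read-only properties."""
    def __init__(self, sequence: str):
        self._sequence = sequence
        self._gc_content = None  # not calculated yet

    # Read-only: there is a getter but no setter
    @property
    def sequence(self):
        return self._sequence

    # Lazy: calculated on first access, then reused
    @property
    def gc_content(self):
        if self._gc_content is None:
            gc = sum(base in "GC" for base in self._sequence)
            self._gc_content = gc / len(self._sequence)
        return self._gc_content

stats = SequenceStats("GGCCAATT")
print(stats.gc_content)

try:
    stats.sequence = "AAAA"
except AttributeError as err:
    print(f"AttributeError: {err}")
```

For the lazy case, the standard library also offers `functools.cached_property`, which handles the "calculate once and store" step for you.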