# Classes

Classes in Python are definitions of types of objects. All objects in Python have a type or class (e.g., `str`, `int`, `list`), which determines how they behave, what they can store, and what you can do with them. We have spent some time discussing when you might need to use one class over another, and have sometimes had to convert our data between classes in order to perform a desired operation. In addition to using the built-in classes like `str`s, we can also define our own classes in order to create objects that behave exactly as we might like.

As you have probably noticed over the last few weeks of this course, you have been quite capable of writing sophisticated programs that can perform complex and useful analyses without the need to define custom classes. Indeed, with just built-in classes and the ability to define your own functions, you can do just about anything you could want to in Python. However, classes can make it easier to perform such complicated analyses. This Jupyter notebook is going to present some of the features of classes and what they can do. We will specifically focus on how the discussed features are useful. Hopefully, by the end of this document, anyone who has never seen a class before will have a good appreciation of why they might want to go to the effort of learning to use them.

We will begin by discussing two benefits that classes offer which make them worth adopting: 
1. Attributes - they allow you to pass around data in a clearer, more readable, and therefore less error-prone way.
2. Methods - they allow you to write functions that are specifically tailored to the structure of your data in a way that makes the purpose of each function very clear.

In the interest of space, I won't add docstrings to all of the classes here, but just as with functions, docstrings should be used for classes whenever their function is not extremely obvious. Check out the [Google](https://google.github.io/styleguide/pyguide.html#384-classes) and [NumPy](https://numpydoc.readthedocs.io/en/latest/format.html#documenting-classes) style guides for inspiration as to how you may wish to document your classes.

## Class attributes

The simplest use case of a class is to store data in named fields. This sort of usage is very similar to how you might use a `dict`. Let's start by looking at a very simple class to introduce how classes can be made. Then we'll write a class that stores BLAST outputs and explore the benefits of classes by expanding the functionality of the class throughout this document.

In [1]:
class SimpleClass:
    pass

x = SimpleClass()

print(type(x))

<class '__main__.SimpleClass'>


As the printed type tells you, `x` is now an instance of our `SimpleClass` class. [As specified in PEP8](https://peps.python.org/pep-0008/#class-names), class names are typically written with the first letter of each word capitalized and without underscores.

Our `SimpleClass` doesn't really do anything yet. Within the definition block, all that is written is `pass`. However, we can now interact with the instance `x` to see how attributes work.

In [2]:
x.new_att = 5

print(x.new_att)

5


Attributes are sort of like variables defined within the namespace of your class instance. You refer to them the same way you refer to other things within a certain namespace: using `.`. Once defined, an attribute can be accessed as above, by simply referring to it by name. Compare that to how we could achieve the same thing with a `dict`:

In [3]:
d = {}
d["new_att"] = 5

print(d["new_att"])

5


For this simple case, the class and the `dict` approaches to storing data in a way that can be retrieved by name are very similar. Perhaps the `dict` is even superior in this example, as we could programmatically store any data with any key (`dict` keys can be of many different types), while class attributes are typically set and retrieved by name (although it is possible to retrieve class attributes using `str`s). However, while `dict`s are very flexible ways to store data, as soon as we want to store data of a consistent form and process it in a consistent way, classes start to show their use. Let's take a look at how we might write a class which will be used to store a consistent type of data.
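As a brief aside on the parenthetical above, the built-in functions `getattr()` and `setattr()` are one way to work with attributes whose names are held in `str`s. The following is only a minimal sketch that reuses the `SimpleClass` example; the attribute names `other_att` and `not_defined` are made up for illustration.

```python
class SimpleClass:
    pass

x = SimpleClass()
x.new_att = 5

# Look up an attribute whose name is stored in a string
att_name = "new_att"
value = getattr(x, att_name)
print(value)  # 5

# Set an attribute by name, and supply a default for one that doesn't exist
setattr(x, "other_att", "hello")
missing = getattr(x, "not_defined", None)
print(x.other_att, missing)  # hello None
```

This gives classes some of the programmatic flexibility of a `dict`, although if you find yourself doing this a lot, a `dict` may simply be the better tool.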

For this example, instead of creating an empty class that has absolutely nothing defined, we will write a function that controls how an instance of the class will be created. This function is called `__init__()` (or "dunder init"). Before we make a start on our BLAST output-handling class, let's write another simple class to explore how `__init__()` (and class instance methods in general) differ from the functions you have been writing.

In [4]:
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age

The above `__init__()` function takes four inputs. The name, height, and age inputs probably make sense to you. However, the `self` input is something we haven't seen before.

## The `self` parameter

`self` is a name that is used within a class definition to refer to an individual instance. Specifically, `self` will refer to whichever instance is being considered at that moment. (Although it is often described as a keyword, `self` is not a reserved word in Python; it is a naming convention, as discussed further below.) The easiest way to explain this is with an example. Let's create a couple of instances of our `Person` class and use those examples to illustrate what `self` is doing. We'll create two `Person` instances and then just view all of their attributes with `vars()`. `vars()` is a bit like the function `dir()` that we used when covering packages and modules. It returns a `dict` of the attributes stored in the namespace of the class instance you call it with. [You can see the docs here](https://docs.python.org/3/library/functions.html#vars).

In [5]:
sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)

print(f"Sally's vars are {vars(sally)}")
print(f"Paul's vars are {vars(paul)}")

Sally's vars are {'name': 'Sally', 'height': 160, 'age': 28}
Paul's vars are {'name': 'Paul', 'height': 180, 'age': 83}


So what exactly happened when we ran the above code? Let's go through it step by step to explore how our class definition determined how we then interact with it.

### Class definition line

Firstly, you may have noticed that in our class definition we started with the line

```python
class Person:
```

followed by no parentheses. This is in contrast to how that would have looked if `Person` were a function. That might have looked more like this

```python
def Person(param1, param2):
```

However, when we created instances of our `Person` class, we *did* provide inputs just like we would have done for a function. What is the meaning of this difference in definition and usage syntax? There are two explanations for these syntactic differences.

First, why did we not include parentheses containing parameters in our class definition? The reason for this is that parentheses in a class definition line do something else. We'll discuss this in another Jupyter notebook when we talk about subclasses and inheritance. For now, a short answer is that when you define a class, you can copy an existing class to be the starting point for your new class. The copied class is what you put in the parentheses of a class definition line. If you are making a class completely from scratch then you don't need parentheses at all.
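As a small preview of what those parentheses are for, here is a minimal sketch. `Student` is a hypothetical class invented for this illustration; it uses `Person` as its starting point and adds nothing of its own.

```python
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age

# The parentheses name the existing class to start from (hypothetical subclass)
class Student(Person):
    pass

sam = Student("Sam", 170, 20)

# Student reuses Person's __init__(), so the same attributes are set
print(isinstance(sam, Person))
print(vars(sam))
```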

Second, why *did* we use parentheses and provide arguments when creating an instance of our class? As our class definition didn't include the specification of expected parameters, how were the provided arguments used? To understand what happened there, we need to briefly look behind the scenes at how Python handles the creation of a class instance. Specifically, what code it executes when creating the instance.

[As described in the docs](https://docs.python.org/3/reference/datamodel.html#object.__new__), when you create a new instance of a class (e.g., `gerty = Person("Gertrude", 160, 43)`), Python runs a dunder method called `__new__()`, which is responsible for creating a new instance of the class. If `__new__()` returns an instance of the class, Python then calls that instance's `__init__()` method with the newly created class instance as the first argument, followed by any arguments that were given (`"Gertrude"`, `160`, and `43` in the `gerty` example).

So when we provide arguments when we initialize a class, those arguments aren't used directly by the class, but are instead silently passed along to the methods `__new__()` and then `__init__()`. It is the `__init__()` method that ends up using the provided arguments as well as the newly created class instance.
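One way to see this sequence for yourself is to define both methods and record when each one runs. This is only an illustrative sketch: you would rarely need to write your own `__new__()`, and the `calls` list is just a helper added here to log the order of events.

```python
calls = []  # helper list used only to record the order of method calls

class Person:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # Ask the default machinery to create the empty instance
        return super().__new__(cls)

    def __init__(self, name, height, age):
        calls.append("__init__")
        self.name = name
        self.height = height
        self.age = age

gerty = Person("Gertrude", 160, 43)

print(calls)  # ['__new__', '__init__']
print(vars(gerty))
```

Both methods receive the arguments given in `Person("Gertrude", 160, 43)`, but in this sketch only `__init__()` does anything with them.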

### The `__init__()` method

Within our `Person` class definition is an `__init__()` method definition. As described above, `__init__()` is used by Python to set up your class instances once they have been created. [The docs](https://docs.python.org/3/reference/datamodel.html#object.__init__) have a nice way of explaining how the two methods work together to create a class instance: Python uses "`__new__()` to create it, and `__init__()` to customize it".

`__init__()` is what's called an "instance method", which means that it acts to change or otherwise interact with an instance of the class for which it was defined. We will look at using other instance methods below. `__init__()` is simply an instance method that is used directly by Python during the creation of your instance. As usual, you can recognize when something is going to be used directly by Python when it is surrounded by double underscores (double underscore is often abbreviated to "dunder").

Apart from having underscores around it and being referred to as a special type of function (an instance method), `__init__()` is very similar to the functions you have been writing. It takes some inputs and does stuff with them. In the case of `__init__()` and all other instance methods, the first parameter is always the instance of the class. In our `Person` class, `__init__()` also takes some other inputs, but that is not required.

What `__init__()` does with the provided inputs is typically use them in some way to set up the instance of the class. `__init__()` can perform any operations you would normally be able to use in a function. You can use imported modules, built-ins, and locally defined functions and classes to modify the provided inputs, or you can simply use the inputs as they were provided and assign them to attributes of your class instance.

Within the `__init__()` function, assignment to an attribute of the class instance is done using the `self` parameter. `self` is used not because it is a magic word, but simply because it is the parameter that points to the class instance in your memory. This is exactly the same as how you can use different names to refer to a variable within a function compared to outside the function. When you pass arguments to a function, those arguments are referred to within the function by whatever names were specified in the function definition line. Indeed, instance methods always take an implicit first argument of the instance that they are working with. By convention this first argument is referred to as `self` to make clear that the class instance is using its own attributes. However, as with other parameter names used within a function, any valid variable name can be used for the parameter. `self` is simply a convention that makes it clear to a reader of your code what is going on.
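To make the point about naming concrete, here is a minimal sketch in which the first parameter of `__init__()` is deliberately given an unconventional name. The name `this_person` and the `describe()` method are invented for this illustration; in real code you should stick with `self`.

```python
class Person:
    # Unconventional first parameter name, used only to show that it works
    def __init__(this_person, name, height, age):
        this_person.name = name
        this_person.height = height
        this_person.age = age

    # Hypothetical method written with the conventional name
    def describe(self):
        return f"{self.name} is {self.height}cm tall"

sally = Person("Sally", 160, 28)

# Python passes the instance as the first argument for us...
via_instance = sally.describe()
# ...which is equivalent to passing it explicitly through the class
via_class = Person.describe(sally)

print(via_instance)
print(via_instance == via_class)
```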

`__init__()` doesn't return anything. Instead, it has made modifications to a mutable instance of the class and so those changes are reflected outside the function without the need to return anything. As `__init__()` accepted an argument with the specific instance it was supposed to modify, it only made changes to that one instance. We can see that reflected in the fact that `sally` and `paul` in the above example have different attributes to one another.
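The behaviour described above can be checked directly. In the sketch below, `__init__()` is called by hand purely so we can look at what it returns; this is not something you would normally do in real code.

```python
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age

sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)

# Call __init__() explicitly on sally (unusual; for demonstration only)
returned = Person.__init__(sally, "Sally", 161, 29)

print(returned)      # None: nothing is returned
print(vars(sally))   # sally was modified in place
print(vars(paul))    # paul is untouched
```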

Finally, while we are only going to discuss a few, you can see all of the methods defined for a class using `dir()` just like we did to examine module contents previously. In the printed list of methods you can see the `__new__()` and `__init__()` methods described here as well as many more.

In [6]:
print(dir(Person))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']


## Defining instance methods

Before we get into instance methods, let's create a class that does something useful so we can start to see why classes are worth bothering to write. You have been working a lot with BLAST output in this course. I expect at this point you are a bit sick of writing code to extract specific indices from lines that you have split on tabs. Also, at this point you might be able to name the index of sstart and send without having to think. However, I would be surprised if you didn't make a single mistake when writing code to use those columns at some point in this course. Let's see how we could have been doing all that using a simple class instead.

In [7]:
class BLASTResult:
    def __init__(self, line):
        """ Read line of BLAST output. Assume -outfmt '6 std qlen'"""
        
        # Store the basic field data
        cols = line.split()
        self.qseqid = cols[0]
        self.sseqid = cols[1]
        self.pident = float(cols[2])
        self.length = int(cols[3])
        self.mismatch = int(cols[4])
        self.gapopen = int(cols[5])
        self.qstart = int(cols[6])
        self.qend = int(cols[7])
        self.sstart = int(cols[8])
        self.send = int(cols[9])
        self.evalue = float(cols[10])
        self.bitscore = float(cols[11])
        self.qlen = int(cols[12])
        
        # Store some convenience data
        self.prcnt_length = 100*self.length/self.qlen
        if self.sstart < self.send:
            self.reverse = False
        else:
            self.reverse = True
        
        if self.qstart == 1 and self.qend == self.qlen:
            self.full_length = True
        else:
            self.full_length = False
        
        

Now we have a basic class which stores the output of a BLAST run using `-outfmt '6 std qlen'` and also performs some simple calculations to assess if the result is full length or in the reverse direction in the subject sequence. We can now use this class to read in some BLAST results. This provides us with a nice and tidy interface to query our BLAST results for information that we want. Even with the simple class defined above, this is already a much nicer way to interact with our BLAST results.

In [8]:
blast_lines = [
    '515F\tNZ_CP028827.1\t89.474\t19\t2\t0\t1\t19\t46920\t46938\t0.005\t32.4\t19',
    '806R\tNZ_CP028827.1\t85.000\t20\t3\t0\t1\t20\t47211\t47192\t0.45\t26.1\t20',
    '515F\tNZ_CP028827.1\t89.474\t19\t2\t0\t1\t19\t144156\t144174\t0.005\t32.4\t19',
    '806R\tNZ_CP028827.1\t85.000\t20\t3\t0\t1\t20\t144447\t144428\t0.45\t26.1\t20'
]

blast_results = []
for line in blast_lines:
    blast_results.append(BLASTResult(line))
    
for result in blast_results:
    print(result.sstart, result.send, result.reverse)

46920 46938 False
47211 47192 True
144156 144174 False
144447 144428 True
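For comparison, here is a rough sketch of the same orientation check written with raw column indices and with named attributes. `MiniResult` is a hypothetical, cut-down stand-in for `BLASTResult` that stores only two fields, so that the example stays short.

```python
line = '806R\tNZ_CP028827.1\t85.000\t20\t3\t0\t1\t20\t47211\t47192\t0.45\t26.1\t20'

# Index-based: the reader has to remember that 8 is sstart and 9 is send
cols = line.split()
is_reverse_by_index = not (int(cols[8]) < int(cols[9]))

# Attribute-based: hypothetical cut-down version of BLASTResult
class MiniResult:
    def __init__(self, line):
        cols = line.split()
        self.sstart = int(cols[8])
        self.send = int(cols[9])
        self.reverse = not (self.sstart < self.send)

result = MiniResult(line)

print(is_reverse_by_index, result.reverse)
```

The indices still appear once, inside `__init__()`, but every later use of the data can rely on names instead.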


You could take this further if you wanted. For example, the `__init__()` function could take an argument specifying what the outfmt used for the BLAST was. You could then populate all the relevant attributes with their data and leave any missing fields as `None`. We won't do that here. Instead, let's start exploring how we might use instance methods to increase the utility of our `BLASTResult` class.

You might recognize the example BLAST lines above. Those are results from the isPCR exercise. We needed to perform a couple of operations on those BLAST results. First, we needed to check if the hits were good enough to keep, in order to model whether primer annealing would result in amplification. Second, we needed to check if each pair of primers was close enough together and in the correct orientation to produce an amplicon. We can perform both of those checks with instance methods. Let's redefine our class below with some extra methods.

In [9]:
class BLASTResult:
    def __init__(self, line):
        """ Read line of BLAST output. Assume -outfmt '6 std qlen'"""
        
        # Store the basic field data
        cols = line.split()
        self.qseqid = cols[0]
        self.sseqid = cols[1]
        self.pident = float(cols[2])
        self.length = int(cols[3])
        self.mismatch = int(cols[4])
        self.gapopen = int(cols[5])
        self.qstart = int(cols[6])
        self.qend = int(cols[7])
        self.sstart = int(cols[8])
        self.send = int(cols[9])
        self.evalue = float(cols[10])
        self.bitscore = float(cols[11])
        self.qlen = int(cols[12])
        
        # Store some convenience data
        self.prcnt_length = 100*self.length/self.qlen
        if self.sstart < self.send:
            self.reverse = False
        else:
            self.reverse = True
        
        if self.qstart == 1 and self.qend == self.qlen:
            self.full_length = True
        else:
            self.full_length = False
    
    def good_hit(self, percent_id: float, percent_len: float) -> bool:
        if self.pident >= percent_id and self.prcnt_length >= percent_len:
            return True
        else:
            return False
    
    
    def check_pair(self, other_hit: "BLASTResult", max_amp_size: int) -> bool:
        """Check if this result would make an amplicon with another result, assuming both anneal"""
        
        # call primer with lower index "p_forward" and other "p_reverse"
        if self.sstart < other_hit.sstart:
            p_forward = self
            p_reverse = other_hit
        else:
            p_forward = other_hit
            p_reverse = self
        
        # check orientation (perhaps could have added a "forward" attribute...)
        if p_forward.reverse or not p_reverse.reverse:
            return False
        
        # check distance between hits
        if p_reverse.send - p_forward.send > max_amp_size:
            return False
        
        # otherwise all good
        return True
    

Now let's perform the pairwise comparisons we used in the isPCR module to identify which pairs would result in an amplicon.

In [10]:
# Recreate the instances with our new class definition
# only required because this is a jupyter notebook so it keeps old variables with the old class definitions

blast_lines = [
    '515F\tNZ_CP028827.1\t89.474\t19\t2\t0\t1\t19\t46920\t46938\t0.005\t32.4\t19',
    '806R\tNZ_CP028827.1\t85.000\t20\t3\t0\t1\t20\t47211\t47192\t0.45\t26.1\t20',
    '515F\tNZ_CP028827.1\t89.474\t19\t2\t0\t1\t19\t144156\t144174\t0.005\t32.4\t19',
    '806R\tNZ_CP028827.1\t85.000\t20\t3\t0\t1\t20\t144447\t144428\t0.45\t26.1\t20'
]

blast_results = []
for line in blast_lines:
    blast_results.append(BLASTResult(line))

# keep good hits
good_hits = [p for p in blast_results if p.good_hit(percent_id=30, percent_len=90)]

# find pairs that would amplify
for n, p1 in enumerate(good_hits):
    for p2 in good_hits[n+1:]:
        amp = p1.check_pair(p2, 2000)
        if amp:
            print(p1.sstart, p2.sstart, amp)
        

46920 47211 True
144156 144447 True


As you can see, our `BLASTResult` class is now able to store the fields of BLAST output and provides methods to perform useful analyses of the BLAST output. Wherever we have an instance of the `BLASTResult` class, we now have access to a function to tell us if the hit meets our criteria for goodness and a function to tell us if the hit would make an amplicon when paired with a comparator hit.

As you learned when you wrote this code yourself for the assignment, you don't *need* classes to do this. However, there are a few benefits to using a class like this to perform the above analysis.

1. Data stored in class attributes provide a readable and tidy way to read data associated with your class instance.
2. Instance methods go wherever your class goes. If you import the class into a script or pass a class instance as an argument to a function, then it always brings all of its attributes and methods with it. If we had separate functions for dealing with our class, we might have a harder time making sure they are available where we need them.
3. By defining the functionality used here within class instance methods, we are organizing functions that will really only work for this specific format of data into a place where those functions will only ever run on those data. In general, if you find yourself writing a function that will only ever take a very specific type of data as input, especially if the function will return a modified form of the same data, you should consider making that a method in a class. It isn't very useful to define those functions outside of a class as you won't then use them for anything else. Putting them in a class helps make it clear what they are useful for.

To finish up this introduction to Python classes, let's take a look at some other dunder methods that you might want to make use of to control how your classes behave.

## Class dunder methods

We've already seen how to use the `__init__()` method to control how our class instances are made. Next, let's explore how the following dunder methods can be used: printing methods `__repr__()` and `__str__()`, and comparator methods including `__gt__()`, `__eq__()`, etc. We'll return to using a simple example class here, as it takes a lot of space to redefine our `BLASTResult` class just to tweak individual methods.

### Printing your classes

If we start with our Person class from the beginning of this document, let's see how it looks when we print an instance.

In [11]:
print(sally)

<__main__.Person object at 0x7f8b47303b90>


That printout is not useful for much. We can see that `sally` is an instance of the `Person` class that was defined within the namespace of our `__main__` script. We can also see the memory address at which `sally` is stored. However, it might be the case that you would prefer some of Sally's attributes to be printed instead. We can control that by taking advantage of where `print()` gets the information that it prints from: the `__str__()` method. The `__str__()` method simply defines what should be returned if someone tries to convert your class to a `str`.

In [12]:
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age
    
    def __str__(self):
        return f"My name is {self.name}, I am {self.height}cm tall, and I am {self.age} years old"

sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)

print(sally)
print(paul)

My name is Sally, I am 160cm tall, and I am 28 years old
My name is Paul, I am 180cm tall, and I am 83 years old


Another method we can use to control how a class prints is the `__repr__()` method. `__repr__()` and `__str__()` both control how to represent an object, but they are intended to be used in different ways. `__str__()` is intended to produce a human-readable representation of an object, while `__repr__()` is supposed to be precise, and ideally, produce something you could copy and paste into a Python script in order to recreate the object. We can write a `__repr__()` method that would allow us to recreate our class instances, so let's do that. We can specifically print the result of our `__repr__()` method using the built-in function `repr()`.

In [13]:
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age
    
    def __str__(self):
        return f"My name is {self.name}, I am {self.height}cm tall, and I am {self.age} years old"
    
    def __repr__(self):
        return f"Person('{self.name}', {self.height}, {self.age})"

    
sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)

print(repr(sally))
print(repr(paul))

Person('Sally', 160, 28)
Person('Paul', 180, 83)


If you compare the output of our `__repr__()` to the code we ran to create `sally` and `paul`, you will see that they are equivalent (only the style of quotation marks differs). That's a good ideal to aim for with a `__repr__()` method. That way you can use it if you ever want to know everything there is to know about an object, i.e., how could you recreate that object in its entirety to then test or analyze it.
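To check that a `__repr__()` really can recreate an object, one option is to pass its output to the built-in `eval()`. The sketch below also swaps the hand-written quotes for the `!r` conversion in the f-string, which is a variation on the method above rather than the version used in this notebook; it inserts `repr()` of each attribute so that names containing quote characters are still handled. `eval()` is used here only for demonstration and should not be run on text you don't trust.

```python
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age

    def __repr__(self):
        # !r inserts repr(value), which quotes and escapes strings for us
        return f"Person({self.name!r}, {self.height!r}, {self.age!r})"

sally = Person("Sally", 160, 28)
text = repr(sally)
print(text)

# Round trip: rebuild an equivalent object from the repr text
sally_copy = eval(text)
print(vars(sally_copy) == vars(sally))

# A name containing a quote character still round-trips
tricky = Person("O'Neil", 175, 40)
tricky_copy = eval(repr(tricky))
print(tricky_copy.name)
```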

Something to note here is that `print()` converts objects to text using `str()`, which calls `__str__()`. If your class does not define `__str__()`, the default implementation that every class inherits falls back to `__repr__()`. It is therefore optional to define `__str__()` methods for your classes. You can do that if you want to have a pretty output. If not, you can stick with `__repr__()`, which is the more useful way to represent objects.

In [14]:
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age
    
    def __repr__(self):
        return f"Person('{self.name}', {self.height}, {self.age})"

    
sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)

print(sally)
print(paul)

Person('Sally', 160, 28)
Person('Paul', 180, 83)
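When a class defines both methods, which one is used depends on the context. The following sketch, which reuses the `Person` class with both `__str__()` and `__repr__()` defined, collects a few common cases; the variable names are chosen just for this example.

```python
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age

    def __str__(self):
        return f"My name is {self.name}"

    def __repr__(self):
        return f"Person('{self.name}', {self.height}, {self.age})"

sally = Person("Sally", 160, 28)

as_str = str(sally)        # uses __str__()
as_fstring = f"{sally}"    # f-strings use __str__() by default
as_repr = f"{sally!r}"     # !r asks for __repr__() instead
in_list = str([sally])     # containers show their contents with __repr__()

print(as_str)
print(as_fstring)
print(as_repr)
print(in_list)
```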


### Comparing your classes

When you define a class, unless you explicitly define how comparisons between instances should work, Python will not allow you to use comparators like `>` and `<`. Instead you will get an error like the following

In [15]:
print(sally > paul)

TypeError: '>' not supported between instances of 'Person' and 'Person'

In addition, unless you define how it should behave, `==` probably won't do what you expect. It might look fine if you check `sally` and `paul`

In [16]:
print(sally == paul)
print(sally != paul)

False
True


However, what about if we create a clone of sally? Surely Sally's clone should resolve as equal to Sally?

In [17]:
sally_clone = Person("Sally", 160, 28)

# to show they are, in fact, the same
print(sally)
print(sally_clone)

Person('Sally', 160, 28)
Person('Sally', 160, 28)


In [18]:
# But Python doesn't think they are the same
print(sally == sally_clone)

False


The reason for these behaviours is that comparisons like `>` are not implemented by default, while the default behaviour of `==` is to check if both objects are actually the same object. That's actually the same as the `is` comparator, which is not what we want much of the time. If we want these comparators to behave according to our preference, then we need to define exactly what that preference is. The reason for the defaults is that it is not trivial to predict what we would actually want those behaviours to be. What actually makes one person greater than another? And for that matter, if we really cloned Sally, would her clone be equal to her or would they immediately become different people as soon as they began having independent experiences? Let's set aside those philosophical quandaries for now and instead just make our classes do *something*.

Let's say that people who are more capable of reaching objects on top shelves are the best kinds of people. We can use that rule to assess whether `Person`s are greater, less than, or equal.

In [19]:
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age
    
    def __repr__(self):
        return f"Person('{self.name}', {self.height}, {self.age})"
    
    def __eq__(self, other): # called by ==
        if self.height == other.height:
            return True
        else:
            return False
    
    def __gt__(self, other): # called by >
        if self.height > other.height:
            return True
        else:
            return False
    
    def __ge__(self, other): # called by >=
        if self.height > other.height or self.height == other.height:
            return True
        else:
            return False

sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)
sally_clone = Person("Sally", 160, 28)

In [20]:
print(sally == paul)
print(sally != paul)
print(sally == sally_clone)
print(sally != sally_clone)

False
True
True
False


In [21]:
print(sally > paul)
print(sally < paul)
print(sally > sally_clone)

False
True
False


In [22]:
print(sally >= paul)
print(sally <= paul)
print(sally >= sally_clone)

False
True
True


Note that we can do comparisons in either direction (i.e., `>` or `<`) and can check `!=` even though we did not define `__ne__()` (`!=`), `__lt__()` (`<`), or `__le__()` (`<=`). That's because, when the left-hand object does not provide the appropriate method, Python tries the mirrored method on the right-hand object (e.g., `sally < paul` is evaluated using `paul.__gt__(sally)`), and because it simply inverts the result of `__eq__()` for `!=`. However, it's good practice to define them all if you need them.
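If writing out every comparison method feels repetitive, the standard library's `functools.total_ordering` decorator offers one possible shortcut: you define `__eq__()` plus one ordering method and it fills in the rest. The sketch below is an alternative to the class above, not the approach used in this notebook. It also returns `NotImplemented` when the other object is not a `Person`, which is an extra safeguard added here as an assumption about how you might want such comparisons to behave.

```python
from functools import total_ordering

@total_ordering
class Person:
    def __init__(self, name, height, age):
        self.name = name
        self.height = height
        self.age = age

    def __eq__(self, other):  # called by ==
        if not isinstance(other, Person):
            return NotImplemented  # let Python decide what to do instead
        return self.height == other.height

    def __lt__(self, other):  # called by <
        if not isinstance(other, Person):
            return NotImplemented
        return self.height < other.height

sally = Person("Sally", 160, 28)
paul = Person("Paul", 180, 83)

# <=, > and >= are generated from __eq__() and __lt__()
print(sally < paul, sally <= paul, sally > paul, sally >= paul)

# Comparing with a non-Person falls back to the default behaviour
print(sally == 160)
```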