# Week 9 - Quality Code

This week we are not really going to be looking at any new functionality of Python, but rather a number of ways to improve our code quality and make it more maintainable for the future. We have already talked about mental models for thinking about large systems (such as OOP and classes, organising files into libraries etc), but there are some good practices to pick up that make our code more sustainable.

## Style Guides, Formatters, Linting, Type Hints, and Documentation

Over the years there's been great conversation about how to write Python that is readable by others - a lot of this boils down to very vague advice like "your functions should only do one thing" and "functions should have as few arguments as possible". For cases where a definitive choice can be made as to what to do (for example, whether the code `2+2` or `2 + 2` is more readable), there is the [PEP 8 style guide](https://peps.python.org/pep-0008/). There's all sorts of guidance in here - from the maximum line length you should use to the ordering of your imports. **Your code will still work even if you don't stick to these guides**, and some companies/groups will slightly adjust these style guides based on their opinion.
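As a small illustration of the kind of choices PEP 8 makes, the sketch below writes the same function twice: once ignoring the guide, and once following its naming and spacing recommendations. The function names and the VAT rate are made up for the example; both versions behave identically, only the readability changes.

```python
# Works, but ignores PEP 8: CamelCase function name, no spaces around
# operators or after commas, several statements crammed onto one line
def AddVAT(price,rate=0.2):total=price+price*rate;return total


# Same behaviour, following PEP 8: snake_case name, spaces after commas
# and around operators, one statement per line
def add_vat(price, rate=0.2):
    total = price + price * rate
    return total


# Both versions give the same answer
print(AddVAT(100) == add_vat(100))  # True
```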

Luckily, for `.py` files, there are plenty of tools to automatically format your code for you - the big players here are `autopep8`, `black`, and `yapf` - all use slightly different interpretations of PEP 8 and allow you to tweak them for your needs. Tools like `black` work on `.ipynb` files as well - but with Colab we need to do a bit more work to make it happen, so I'll show you on my local machine!

Sticking to the topic of external Python tools that can help code quality - *linters* are tools that can spot errors before you run the code by continuously examining your syntax. Some can also give general code quality advice, such as flagging unused imports or code that should be refactored. The main player here by a mile is `pylint`, although some use `flake8` or `pyflakes`. You have likely seen linting in action already when using Colab - if you have a red line under your code, that's the linter at work. You can also use it as a command line tool, as we did with our formatters, to get a more detailed breakdown of your code.

Returning to Python's built-in code quality efforts, there's been a big push recently in taking dynamically typed languages (like Python and JavaScript) and adding the functionality to annotate types for variables (as with Python), or statically enforce typing (as TypeScript does for JavaScript). While type hints in Python won't cause an error if the wrong type is given to a variable, other tools like linters will pick them up; this is *excellent* for seeing errors before they happen. If a function returns either a `str` or `None`, and the function that result is passed to can only handle `str`s, we have a problem and instantly know we need to handle 2 cases - otherwise we might have a difficult, obscure bug to fix!

Type hinting takes multiple forms, so let's look at a big block of code with type hints included:

In [3]:
# Typing library comes with additional good stuff
import typing as tp

# Annotate a variable with its type (implying it shouldn't change)
my_num: int = 15

# Annotate a list with the variable types it can contain
my_list: tp.List[tp.Union[str, int]] = ["hello", 1, 2, 3]

# Annotate a function with typed arguments and a return type
def my_function(number: int) -> bool:
    print("The number is:", number)
    return True

It's always a good idea to put these type hints in - at worst they are a bit annoying but at best they can spot errors before they happen. Static analysis tools like `mypy` can analyse your code and make sure all the typing makes sense - this can be particularly annoying but it keeps your code safer and more predictable!
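The `str`-or-`None` situation mentioned earlier can be sketched as follows. The lookup table and the function names are invented for illustration. The point is that the `Optional[str]` return type makes the `None` case visible, and a checker such as `mypy` would be expected to complain if the result were passed straight into `shout` without the check.

```python
from typing import Optional

# Hypothetical lookup table, purely for illustration
FRUIT_COLOURS = {"apple": "red", "banana": "yellow"}


def find_colour(fruit: str) -> Optional[str]:
    """Return the colour of a fruit, or None if we don't know it."""
    return FRUIT_COLOURS.get(fruit)


def shout(text: str) -> str:
    """Only works on strings - calling .upper() on None would fail."""
    return text.upper() + "!"


def describe(fruit: str) -> str:
    colour = find_colour(fruit)
    # The Optional[str] return type reminds us to handle the None case
    if colour is None:
        return "UNKNOWN!"
    return shout(colour)


print(describe("apple"))   # RED!
print(describe("durian"))  # UNKNOWN!
```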

Finally on code quality, perhaps the most contentious area of coding is documenting your code so it's more interpretable by others. Python notebooks lend themselves to this really easily - in notes like these I type paragraphs of text explaining what the code is doing at each step. However, you might not want to format your code in this way, and in the case of normal `.py` files you might not be able to; so it's standard to document what your code does in the code itself.

Documentation standards and expectations are extremely variable - some programmers will insist that good code should speak for itself and documentation is only needed in cases where bad code is present. Others will insist that there is no such thing as overdocumenting your code. I would push that every function, class and file should have a docstring (as below) after the definition explaining the purpose of the function/class/file - often this exercise alone can make you think about *why* you are making something a function and if it needs to be broken up. I'd also add spot comments (with `#`s) on particularly obscure lines of code where a clearer way is not possible:

In [None]:
"""
    This is a multiline docstring - I'd put one of these at the top of each file if using a .py filetype.
"""

def my_cool_new_add_function(a: int, b: int) -> int:
    """Adds two integers together.

    Arguments:

        a (int): First integer to be added

        b (int): Second integer to be added

    Returns:

        int: The sum of a and b

    """
    return a + b

Obviously the above is a silly example, but in cases where you are working with complex functions and data, it can be helpful to give more context about what you are working with beyond a function name!
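One practical benefit of docstrings over `#` comments is that Python keeps them attached to the object, so tools (and the built-in `help()` function) can read them back. A minimal sketch, reusing a shortened version of the function above:

```python
import inspect


def my_cool_new_add_function(a: int, b: int) -> int:
    """Adds two integers together."""
    return a + b


# The docstring is stored on the function object itself...
print(my_cool_new_add_function.__doc__)

# ...and inspect.getdoc() returns it with the indentation cleaned up
print(inspect.getdoc(my_cool_new_add_function))
```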

There are tools for enforcing documentation in your code - to keep you or your colleagues honest. `pydocstyle` is the top one for this, allowing you to use some popular documentation styles and remove some rules if you feel they are not necessary.

## Errors and Warnings

Still on the subject of helping yourself out, but more to do with the actual code we can write: the idea of writing your own errors.

Sometimes it's painfully obvious that something shouldn't or can't be done - for example, let's go back to our favorite example of a fruit basket program:

In [None]:
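from collections import Counter
from typing import Optional

class FruitBasket:

    def __init__(self, contents: Optional[Counter] = None) -> None:
        # Create a fresh Counter per basket; a default of Counter([]) would be shared by every instance
        if contents is None:
            contents = Counter()
        if contents and contents.most_common()[-1][1] < 0:
            print("Cannot have negative fruits in basket")

        self.contents = contents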
    
    def add_to_basket(self, fruit: str, quantity: int) -> bool:
        
        if quantity < 0:
            print("Can't add a negative number of fruit to a basket!")
        else:
            self.contents[fruit] += quantity
        return True
    
    def remove_from_basket(self, fruit: str, quantity: int) -> bool:

        if self.contents[fruit] < quantity:
            print("You do not have enough fruit to remove that amount")
        else:
            self.contents[fruit] -= quantity
        return True


There is one major problem with this code that we see 3 times. Whenever we try to do something we shouldn't be able to do, we only get a printed message - the program carries on as if nothing happened (the constructor still stores the invalid contents, and the other two methods still return `True`). Ideally if someone uses this code and tries to perform an illegal operation, the program should stop and an error should be raised.

We can do this in the following way, by changing the `print` to `raise` with an appropriate error:

In [None]:
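from collections import Counter
from typing import Optional

class FruitBasket:

    def __init__(self, contents: Optional[Counter] = None) -> None:
        # Create a fresh Counter per basket; a default of Counter([]) would be shared by every instance
        if contents is None:
            contents = Counter()
        if contents and contents.most_common()[-1][1] < 0:
            raise ValueError("Cannot have negative fruits in basket")

        self.contents = contents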
    
    def add_to_basket(self, fruit: str, quantity: int) -> bool:
        
        if quantity < 0:
            raise ValueError("Can't add a negative number of fruit to a basket!")
        else:
            self.contents[fruit] += quantity
        return True
    
    def remove_from_basket(self, fruit: str, quantity: int) -> bool:

        if self.contents[fruit] < quantity:
            raise ValueError("You do not have enough fruit to remove that amount")
        else:
            self.contents[fruit] -= quantity
        return True

This code is much better, but we can make it awesome by coding a custom error class that lets people better categorise the problem they have:

In [6]:
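from collections import Counter
from typing import Optional

# Custom errors should inherit from Exception rather than BaseException
class NegativeFruitError(Exception):
    pass

class FruitBasket:

    def __init__(self, contents: Optional[Counter] = None) -> None:
        # Create a fresh Counter per basket; a default of Counter([]) would be shared by every instance
        if contents is None:
            contents = Counter()
        if contents and contents.most_common()[-1][1] < 0:
            raise NegativeFruitError("Cannot have negative fruits in basket")

        self.contents = contents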
    
    def add_to_basket(self, fruit: str, quantity: int) -> bool:
        
        if quantity < 0:
            raise NegativeFruitError("Can't add a negative number of fruit to a basket!")
        else:
            self.contents[fruit] += quantity
        return True
    
    def remove_from_basket(self, fruit: str, quantity: int) -> bool:

        if self.contents[fruit] < quantity:
            raise NegativeFruitError("You do not have enough fruit to remove that amount")
        else:
            self.contents[fruit] -= quantity
        return True

Fantastic!
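One design choice worth noting with custom errors is which built-in class they inherit from. The Python documentation recommends deriving user-defined exceptions from `Exception`; `BaseException` is also the parent of things like `KeyboardInterrupt`, so a general `except Exception` handler deliberately ignores it. A minimal sketch of the difference, using two made-up error classes:

```python
class FriendlyFruitError(Exception):
    """Derives from Exception, as recommended for user-defined errors."""


class StubbornFruitError(BaseException):
    """Derives from BaseException, so `except Exception` will not catch it."""


def caught_by_except_exception(error: BaseException) -> bool:
    """Report whether a plain `except Exception` handler catches `error`."""
    try:
        try:
            raise error
        except Exception:
            return True
    except BaseException:
        # Only reached when the inner handler let the error through
        return False


print(caught_by_except_exception(FriendlyFruitError("oops")))  # True
print(caught_by_except_exception(StubbornFruitError("oops")))  # False
```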

We now have ways to stop the program running if something goes wrong - but what happens if we want to manage the situation where there is an error and do something appropriate if an error is raised? For this, we use the pattern `try, except, finally`.

This pattern has 3 parts (`finally` is optional). First, we put the code we want to run in the `try` block - if this code runs successfully, we move straight on to `finally`, or continue with the rest of the code if it's not there. If an error is raised, we instantly switch to the `except` clause that matches the relevant error type. After this, we go to `finally`.

In this example, we are using the `FruitBasket` to manage inventory in a shop. If a customer puts an order in for a number of fruit that we can't fulfil, we order some more, print the customer a message and move on. Finally, whatever happens, we give a nice goodbye message to the customer:

In [None]:
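from collections import Counter
from typing import Optional

# Custom errors should inherit from Exception rather than BaseException
class NegativeFruitError(Exception):
    pass

class FruitBasket:

    def __init__(self, contents: Optional[Counter] = None) -> None:
        # Create a fresh Counter per basket; a default of Counter([]) would be shared by every instance
        if contents is None:
            contents = Counter()
        if contents and contents.most_common()[-1][1] < 0:
            raise NegativeFruitError("Cannot have negative fruits in basket")

        self.contents = contents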
    
    def add_to_basket(self, fruit: str, quantity: int) -> bool:
        
        if quantity < 0:
            raise NegativeFruitError("Can't add a negative number of fruit to a basket!")
        else:
            self.contents[fruit] += quantity
        return True
    
    def remove_from_basket(self, fruit: str, quantity: int) -> bool:

        if self.contents[fruit] < quantity:
            raise NegativeFruitError("You do not have enough fruit to remove that amount")
        else:
            self.contents[fruit] -= quantity
        return True

    def take_order(self, fruit: str, quantity: int) -> bool:

        try:
            self.remove_from_basket(fruit, quantity)
        except NegativeFruitError:
            print("Sorry, but we are out of stock for that fruit - ordering more now")
            # self.order_more_fruit(fruit)
        finally:
            print("Thank you for shopping with us")
        
        return True

Note that the order code is separate from the stock management - if we wanted to remove something from our inventory we might not want to put it through the ordering process, to avoid unnecessary prints.

Finally, `print`ing still isn't a fantastic way of letting a user know that something has gone wrong when the problem isn't urgent. For this, the `warnings` library gives users and us better control over which warnings we see, and better management of them. From a more technical perspective, these messages come out through a different output stream (stderr), meaning that they are not saved to a log file if we run `python3 myfile.py > logs.txt`, nor passed along if we pipe the output using the UNIX `|` operator.

Using warnings is simple:
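To trace the order in which the `try`, `except` and `finally` blocks run, here is a minimal standalone sketch. `safe_divide` is a made-up helper that simply records which blocks were visited, once for a successful call and once for a failing one.

```python
def safe_divide(a: float, b: float) -> list:
    """Divide a by b, recording which blocks ran along the way."""
    steps = []
    try:
        steps.append("try")
        result = a / b
        # Only reached if the division did not raise
        steps.append(f"result={result}")
    except ZeroDivisionError:
        steps.append("except")
    finally:
        # Runs whether or not an error was raised
        steps.append("finally")
    return steps


print(safe_divide(6, 3))  # ['try', 'result=2.0', 'finally']
print(safe_divide(6, 0))  # ['try', 'except', 'finally']
```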

In [16]:
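from collections import Counter
from typing import Optional
import warnings

# Custom errors should inherit from Exception rather than BaseException
class NegativeFruitError(Exception):
    pass

class FruitBasket:

    def __init__(self, contents: Optional[Counter] = None) -> None:
        # Create a fresh Counter per basket; a default of Counter([]) would be shared by every instance
        if contents is None:
            contents = Counter()
        if contents and contents.most_common()[-1][1] < 0:
            raise NegativeFruitError("Cannot have negative fruits in basket")

        self.contents = contents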
    
    def add_to_basket(self, fruit: str, quantity: int) -> bool:
        
        if quantity < 0:
            raise NegativeFruitError("Can't add a negative number of fruit to a basket!")
        else:
            self.contents[fruit] += quantity
        return True
    
    def remove_from_basket(self, fruit: str, quantity: int) -> bool:

        if self.contents[fruit] < quantity:
            raise NegativeFruitError("You do not have enough fruit to remove that amount")
        else:
            self.contents[fruit] -= quantity
        return True

    def take_order(self, fruit: str, quantity: int) -> bool:

        try:
            self.remove_from_basket(fruit, quantity)
        except NegativeFruitError:
            warnings.warn("Sorry, but we are out of stock for that fruit - ordering more now")
            # self.order_more_fruit(fruit)
        finally:
            print("Thank you for shopping with us")
        
        return True

In summary - this is a much better way of processing edge cases with Python - we pass errors around and can process them using `try except`.
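The "better control" that the `warnings` library offers can be sketched as below. `check_stock` is a hypothetical helper invented for the example; the sketch first records a warning so it can be inspected, then switches the filter so the same warning is raised as an error (which can be handy while testing).

```python
import warnings


def check_stock(quantity: int) -> int:
    """Hypothetical helper: warn (but carry on) when stock is running low."""
    if quantity < 5:
        warnings.warn("Stock is running low", UserWarning)
    return quantity


# Record warnings instead of printing them, so we can inspect them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_stock(2)

print(len(caught), caught[0].category.__name__, caught[0].message)

# Turn warnings into errors inside this block only
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        check_stock(2)
    except UserWarning as err:
        print("Raised as an error:", err)
```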

## Testing

When writing our code, sometimes it's difficult to know where to start. Part of this is the design process, but very often we make a mental list of features a program needs to have (for example, a fruit basket having a way of adding to the basket and removing from the basket) that we then implement one by one.

At the same time, as we write our code, very often we write little snippets to check that the output of what we've just written is what we expect - we might check difficult known edge cases and invalid entries to check that the errors we get out are what we think they are.

The problem with the above method is that there's a good chance that the code we write in one area of our program impacts the functionality of another. We want to be able to check for knock-on effects of our code on other areas of the program. We *could* have a separate file that runs through a "typical usage" of our program, but we really want to test every facet of our codebase. This is not easy, and therefore it's good to think formally about tests and maybe use a testing framework to give some structure to our code.

We can use these tests to drive the development of our code by first writing the tests for a feature, and then implementing it. This is called *test driven development* or TDD. TDD is a great way to approach coding as you write code to satisfy specific conditions rather than an arbitrary "idea".
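A tiny sketch of the test-first rhythm, using a hypothetical `total_fruit` function that does not appear elsewhere in these notes. The test is written first and describes the behaviour we want; the implementation is then written to satisfy it.

```python
from collections import Counter


# Step 1: write the test first. It describes what we want before any code exists.
def test_total_fruit():
    assert total_fruit(Counter({"apple": 2, "orange": 1})) == 3
    assert total_fruit(Counter()) == 0


# Step 2: write just enough code to make the test pass.
def total_fruit(contents: Counter) -> int:
    """Return the total number of pieces of fruit in a Counter."""
    return sum(contents.values())


# Step 3: run the test (a test runner such as pytest would normally do this for us).
test_total_fruit()
print("test_total_fruit passed")
```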

I suggest the use of `pytest` alongside `ipytest` for running these tests within a notebook environment, with all the tests in a separate cell at the bottom of your notebook. In a development environment we would have the tests in a separate folder and would run them through the command line.

Testing code could really be its own topic - there are many different types of tests you might use to test a product, but I would say the three most common are:

* *Unit testing*: Unit testing is the lowest-level form of testing and directly tests the individual components of your program. If we've been good and split our program up into multiple functions, we should be able to easily test most (if not all) of the functionality of our code base by simply comparing expected outputs to what comes out. These are particularly helpful for TDD as we can write loads of unit tests and run them quickly without impacting any other services.

* *Integration testing* tests how your program interacts with other programs - say making an API call to a server or requesting data from a database. These are usually a bit more time consuming to run as you need to make sure both your program and the service you're testing the integration for are running at all times.

* *End to End testing* tests the whole system as a client would interact with it. For example, on a website, different browser emulators might be used to test if all the elements of a website load and behave correctly. This is by far the hardest testing to perform, and is often the most expensive as it requires the entire application to be up and running on a full user's system.

We'll only be looking at *unit testing* here, as we're not really building any additional services that interact with one another.

With that said, let's write some tests:


In [1]:
!pip install ipytest

Collecting ipytest
  Downloading ipytest-0.12.0-py3-none-any.whl (15 kB)
Installing collected packages: ipytest
Successfully installed ipytest-0.12.0


In [13]:
import pytest
import ipytest

ipytest.autoconfig()
ipytest.clean_tests()


def test_example():
    assert [1, 2, 3] == [1, 2, 3]

ipytest.run('-qq')

.                                                                                            [100%]


<ExitCode.OK: 0>

Here this basic test runs fine - but what if it fails? Instead of just giving us an error, `pytest` gives us a detailed breakdown of what went wrong:

In [14]:
import pytest
import ipytest

ipytest.autoconfig()
ipytest.clean_tests()


def test_example():
    assert [1, 2, 3] == "hello"

ipytest.run('-qq')

F                                                                                            [100%]
___________________________________________ test_example ___________________________________________

    def test_example():
>       assert [1, 2, 3] == "hello"
E       AssertionError: assert [1, 2, 3] == 'hello'

/var/folders/kp/9glw0dvs0rl2j7dfkdmhd2fr0000gn/T/ipykernel_1555/386584637.py:9: AssertionError
FAILED tmpb8k5q163.py::test_example - AssertionError: assert [1, 2, 3] == 'hello'


<ExitCode.TESTS_FAILED: 1>

Let's write some tests for our `FruitBasket` code:

In [22]:
import pytest
import ipytest

ipytest.autoconfig()
ipytest.clean_tests()


def test_fruit_basket_inits():
    fb = FruitBasket()
    assert fb.contents == Counter([])

def test_fruit_basket_default_override():
    fb = FruitBasket(Counter(["apple", "apple", "orange"]))
    assert fb.contents == Counter({"apple": 2, "orange": 1})

def test_negative_fruit_init_error():
    with pytest.raises(NegativeFruitError) as exc:
        fb = FruitBasket(Counter({"apple": -1}))
    assert str(exc.value) == "Cannot have negative fruits in basket"

# More tests for add, remove, order go here!

ipytest.run('-qq')

...                                                                                          [100%]


<ExitCode.OK: 0>
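The placeholder comment about further tests could be filled in along the following lines. This is only a sketch: it uses a cut-down copy of `FruitBasket` so that it runs on its own, and it checks for the error with a plain `try`/`except` rather than `pytest.raises` so that it has no dependencies. Under `pytest` the `pytest.raises` form shown above would be the more usual choice.

```python
from collections import Counter


class NegativeFruitError(Exception):
    pass


class FruitBasket:
    """Cut-down copy of the class above so this sketch runs on its own."""

    def __init__(self) -> None:
        self.contents = Counter()

    def add_to_basket(self, fruit: str, quantity: int) -> bool:
        if quantity < 0:
            raise NegativeFruitError("Can't add a negative number of fruit to a basket!")
        self.contents[fruit] += quantity
        return True

    def remove_from_basket(self, fruit: str, quantity: int) -> bool:
        if self.contents[fruit] < quantity:
            raise NegativeFruitError("You do not have enough fruit to remove that amount")
        self.contents[fruit] -= quantity
        return True


def test_add_to_basket():
    fb = FruitBasket()
    assert fb.add_to_basket("apple", 3)
    assert fb.contents["apple"] == 3


def test_remove_from_basket():
    fb = FruitBasket()
    fb.add_to_basket("apple", 3)
    fb.remove_from_basket("apple", 2)
    assert fb.contents["apple"] == 1


def test_remove_too_much_raises():
    fb = FruitBasket()
    # Plain-Python equivalent of `with pytest.raises(NegativeFruitError):`
    try:
        fb.remove_from_basket("apple", 1)
    except NegativeFruitError:
        pass
    else:
        raise AssertionError("Expected NegativeFruitError")
    # A failed removal must leave the count unchanged
    assert fb.contents["apple"] == 0


# Run each test by hand; pytest would discover the test_ functions itself
for test in (test_add_to_basket, test_remove_from_basket, test_remove_too_much_raises):
    test()
print("all tests passed")
```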

Sometimes when testing, we want to have some safe dummy data available that we know is sound for testing other parts of the functionality. For example, in the above code, we might end up writing out the same valid initial fruit basket counter over and over again. For this code, it's not so bad, but you may have a large JSON file you want to put into the tests. You could define a variable at the top of the tests, but since the functionality of the code you have might mutate the value of that variable throughout testing, this might not be the best idea. We want a way to reset the value of the variable before each test - luckily `pytest` lets us do this using *fixtures*:

In [24]:
import pytest
import ipytest

ipytest.autoconfig()
ipytest.clean_tests()

@pytest.fixture
def normal_fruit_basket():
    return Counter(["apple", "apple", "orange"])

def test_fruit_basket_inits():
    fb = FruitBasket()
    assert fb.contents == Counter([])

def test_fruit_basket_default_override(normal_fruit_basket):
    fb = FruitBasket(Counter(normal_fruit_basket))
    assert fb.contents == Counter({"apple": 2, "orange": 1})

def test_negative_fruit_init_error():
    with pytest.raises(NegativeFruitError) as exc:
        fb = FruitBasket(Counter({"apple": -1}))
    assert str(exc.value) == "Cannot have negative fruits in basket"

# More tests for add, remove, order go here!

ipytest.run('-qq')

...                                                                                          [100%]


<ExitCode.OK: 0>
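To see why a shared variable is risky and what the fixture buys us, here is a plain-Python sketch with no `pytest` involved. `fresh_basket` plays the role of the fixture (a function-scoped fixture is re-run for every test that asks for it), and `eat_an_apple` is a made-up piece of code under test that mutates its input.

```python
from collections import Counter

# Shared module-level test data: convenient, but any test can change it
SHARED_BASKET = Counter(["apple", "apple", "orange"])


def fresh_basket() -> Counter:
    """Plays the role of the fixture: builds brand-new data on every call."""
    return Counter(["apple", "apple", "orange"])


def eat_an_apple(basket: Counter) -> None:
    """Hypothetical code under test that mutates its input."""
    basket["apple"] -= 1


# "Test" 1 mutates the shared data...
eat_an_apple(SHARED_BASKET)
# ...so "test" 2 no longer starts from the data it expects
print(SHARED_BASKET["apple"])   # 1, not 2

# With fresh data per test, earlier tests cannot interfere
first = fresh_basket()
eat_an_apple(first)
second = fresh_basket()
print(second["apple"])          # 2
```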

## `git` and `pre-commit` 

When managing a large project, you want a good version management system that allows you to revert changes you make to previous versions. A classic way of doing this would be to save multiple versions of a file (`v1`, `v2` etc) - but this can obviously get a bit unwieldy.

Another thing we would like is for multiple people to work on a project at the same time, allowing for all new features to the project to be added asynchronously without disrupting the rest of the code. This is simply not possible with the `v1`, `v2` system described above.

`git` is a long-established system (first released in 2005) that tracks the changes we make to files, and stores them in a way that allows us to apply or revert these changes as needed. We can apply these changes in multiple steps, and "branch off" a copy of the codebase to work on ourselves, make changes and test that our code works, then reintegrate it with the new functionality into the main codebase *without disrupting any changes made in the meantime*.

This collaborative nature of `git` makes it invaluable for working in teams, so it's good for you to start using it to manage your projects as soon as possible. You don't have to know all the facets of how `git` works (it can get quite complicated!), but the general feel of what it is doing will work just fine. In general, if we are working on a new feature, we should take the following steps:

* Create a new "branch" for the feature with an appropriate name (`git branch my_branch`) and switch to it (`git checkout my_branch`).
* Make your changes to the code files
* Add your changes ready to be saved to the `git` tree using `git add filename`
* Once all your files are added and you're happy to go, you can use `git commit` to commit the changes to the `git` tree and supply some commentary on what you've done.

That's it!

`GitHub` is a service that stores `git` repositories online - `git` has some tools for working with remote repositories (`git clone` downloads a repository, `git pull` downloads any changes, `git push` uploads your changes) - however this is a bit beyond what we're going to look at.

Finally, everything we've done today links with an amazing tool called `pre-commit`. This tool will run tools before it allows you to make a `git commit` - so it will check your tests all pass, auto-format your code, check for typing and documentation etc and only if the code is of good quality are you allowed to save the changes. We'll see this in the final week when we work on our project!

# Exercises

This week I've given you 3 projects to do. They are all fairly complex and involved, so take your time - I'd like you to create the program to the spec using errors and warnings where relevant, and practice writing some tests to make sure everything works: 


## School Management System

You should build a management system for a school - this system must be able to:

* Hold a number of students from the ages 11-18
* Each student should have an assigned class group (you can make these up!)
* You should have a method for adding a student to the school - if the age is outside of the range you should raise an error.
* You should have a method for removing students - if an invalid student is given, you should raise an error.
* You should have a way to print out all the students of one class
* You should have a way to save all the students along with their information in `json` form to a file.

## HTML Link Parser

Using last week's work, write a program that gets the HTML code from a URL and extracts all of the hyperlinks from it (hyperlinks are in the form `<a href="link-to-new-site">`). If you can't access the website, raise an error. Then parse all the hyperlinks to get the root domain - count how many links on the website link internally (to the same root domain) vs externally (to a different root domain).

## Maze Solving Algorithm

There are a number of methods for solving a maze using programming - if I give you a maze as a nested list representing the walls with `W`, open spaces with `O`, start with `S` and end with `E` - for example:
```
[
    ["O", "O", "O", "W", "E"],
    ["O", "W", "O", "W", "O"],
    ["S", "W", "O", "O", "O"]
]
```
Write a program to navigate through the maze and give a list of directions to follow. To do this:

* For now, assume there's just one possible route with no forks in the maze. i.e. you only need to keep trying different directions other than the one you came in to get to the end.
* Build a program to parse a maze in the above format to a format that makes sense to you
* Build a way to navigate around the maze - if you try to hit a wall or outside of a maze, return an error.
* Keep trying different ways, only proceeding if an error isn't raised - save this history of moves to a list and return the list.

