# Code Style and Documentation

In the previous lecture we talked about how to write functioning and reliable software, and how we test code to verify that it is working as intended. Today, we turn to how to make code structured and readable. This means we will cover code style and documentation.

### What is code style and why is it important?

> Write programs for people, not computers

Just like when writing prose, there are many choices one can make when writing code. The choices about how the code is written and structured are called the code style. It encompasses everything from the use of whitespace and indentation to the naming of variables. Following a good code style gives code that is more readable and easier to understand. As a result, code with good style is easier to use, troubleshoot, maintain and extend. Code with bad style, on the other hand, can be frustrating to work with and leads to more errors over time.

Like with any writing, there are no hard rules about what constitutes good style, and there are different choices one can make—the important thing is to be consistent in your choices. A piece of code should have the same code style and formatting throughout.

### Style Guides

Most software in the real world is developed collaboratively, so it is important that the programmers agree on what code style to follow. Otherwise they will end up with code that is messy and inconsistent. Because of this, most software groups agree on a *style guide*, which tells all the programmers in the project which style to follow. When everyone follows the same guidelines, it is easier to collaborate.

Put simply, a style guide is a list of *dos* and *don'ts* that you will be expected to follow. Because style guidelines are about *writing* code, they are often specific to a given programming language. Python is a popular programming language, and so many different style guides exist, for example [Google's Python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md).

### The PEP8 Style Guide

In IN1910, we want you to use a good and consistent code style in the code you write. We therefore want you to attempt to follow a specific style guide. We opt to go with the PEP8 style guide.

PEP stands for *Python Enhancement Proposal*. These documents are proposals for new features or design changes to Python. The PEPs are discussed and reviewed by the community, and over time the Python developers incorporate the accepted ones into the language. One of these, PEP8, is a style guide for how the Python code in Python's standard library should be written. Put simply: PEP8 is how Python's developers write Python code.

* [Link to the PEP8 style guide](https://www.python.org/dev/peps/pep-0008/)

The PEP8 style guide is not that long and consists mostly of simple examples and explanations for why the rules are chosen as they are. If you want to become a more proficient Python programmer, reading through it is advised.

**In IN1910, the most important thing we expect from you when it comes to style is that you are consistent and use a style that is easily readable and understandable. We recommend that you follow the PEP8 style guide.**

We will now go through *some* of the rules in PEP8. We don't cover everything, so reading through the actual style guide is still a good idea. Most of the examples we show are taken directly from the PEP8 document.

### Some points from PEP8

#### General formatting

* Indent by 4 spaces, do not use tabs
* Avoid lines longer than 79 characters (PEP8 lets teams agree on a limit of up to 99)
* When wrapping long lines, use parentheses to keep things together
* Separate top-level functions and classes by two blank lines
* Separate methods inside a class by one blank line
* Use blank lines in functions, sparingly, to indicate logical sections.
* Use UTF-8 encoding
* Put imports on separate lines at the top of the file
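
As a rough illustration of the layout rules above, here is a small made-up module. The names `circle_area` and `Rectangle` are hypothetical and only serve to show import placement, indentation, blank lines and line wrapping.

```python
import math
import statistics


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius**2


def mean_area(radii):
    """Return the mean area of circles with the given radii."""
    return statistics.mean(circle_area(radius) for radius in radii)


class Rectangle:
    """A rectangle, used here only to show the blank-line conventions."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        """Return the area of the rectangle."""
        return self.width * self.height

    def describe(self):
        """Return a short description of the rectangle."""
        # Parentheses let the long string continue on the next line.
        return (
            f"Rectangle of width {self.width} and height {self.height} "
            f"with area {self.area()}"
        )
```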

#### Naming conventions

* Module (file) names should be short and written in all lowercase. Underscores are allowed and often used to separate different words.

* Function names should be all lowercase with parts separated by underscore (`count_words()`)

* Variable names: same as function names

* Class names: should normally use the CapWords convention. (`FunctionIntegrator`).

* Constants: all uppercase, parts separated by underscore (`MAX_WEIGHT`)

* Never use `l`, `O`, or `I` (lowercase l, uppercase o, uppercase i) as single-letter names. These are easy to confuse with other characters, or downright indistinguishable from them in some fonts.

Note that the built-in data types break the class name convention (`tuple`, `int`, `list`, etc.), but those are special cases.
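
The conventions can be seen together in a short sketch. Everything here is invented for illustration: imagine it as the contents of a module file called `cargo_tools.py`, with a hypothetical class `CargoCrate` and function `count_overweight`.

```python
# Hypothetical contents of a module file named cargo_tools.py.

MAX_WEIGHT = 100.0  # Constant: all uppercase with underscores.


class CargoCrate:
    """A crate with a weight in kilograms (class name in CapWords)."""

    def __init__(self, weight):
        self.weight = weight


def count_overweight(crates):
    """Return the number of crates heavier than MAX_WEIGHT."""
    # Function and variable names are lowercase with underscores.
    overweight_count = 0
    for crate in crates:
        if crate.weight > MAX_WEIGHT:
            overweight_count += 1
    return overweight_count
```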

#### White Space

Be careful and consistent with whitespace:
* Yes: `spam(1)`
* No:  `spam (1)`


* Yes: `spam(ham[1], {eggs: 2})`
* No:  `spam( ham[ 1 ], { eggs: 2 } )`


* Yes: `if x == 4: print(x, y); x, y = y, x`
* No:  `if x == 4 : print(x , y) ; x , y = y , x`


* Yes: `dct['key'] = lst[index]`
* No:  `dct ['key'] = lst [index]`

**More bad examples**
```Python
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)

def complex(real, imag = 0.0):
    return magic(r = real, i = imag)
```
**More good examples**

```Python
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)

def complex(real, imag=0.0):
    return magic(r=real, i=imag)
```

#### Line Length and breaking lines

PEP8 states that no line should go beyond 79 characters. An 80-character width limit is a very common rule in code style guides, across many languages. Keeping lines from becoming too long helps readability, as long lines can be tricky to read. Another important reason is that we have no control over what tools others use to read the code. They might be reading the code in an editor with an 80-character width, or 130. If the line overflows, it might break at a weird place and lead to messy code.

If a line becomes longer than 79 characters, we need to break it over several lines. The best way to do this is with parentheses, as expressions inside parentheses can be split over multiple lines in Python without any other formatting needed. As it's mostly long mathematical expressions or function definitions/calls that get very long, parentheses are often already present.

When splitting a mathematical expression over multiple lines, the natural thing to do is to split the lines at the operators, so that different lines contain different terms. In this case, put the operators on the same line as their operands for readability:

**No**—operators sit far away from their operands
```Python
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)
```

**Yes**—easy to match operators with operands
```Python
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)
```
Note that we also align the continuation lines under each other, which really helps with keeping things readable. The same applies to the arguments of a long function call:

**Yes**
```Python
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
```

**No**
```Python
foo = long_function_name(var_one, var_two,
           var_three, var_four)
```




Alternatively, we can move all the arguments to the next line and align them wherever we find appropriate. Just make sure they are indented further than the following code line, so that the two are distinguishable:

**Yes**
```Python
def long_function_name(
        var_one, var_two,
        var_three, var_four):
    print(var_one)
```
**No**
```Python
def long_function_name(
    var_one, var_two,
    var_three, var_four):
    print(var_one)
```

#### Comments

* Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!
* Comments should be complete sentences.
* Use inline comments sparingly.
* Inline comments are unnecessary and in fact distracting if they state the obvious. 
    
Don't do this:      
```
x = x + 1  # Increment x
```
But sometimes, this is useful:
```
x = x + 1  # Compensate for border
```

#### Some if-test conventions

When checking if a variable is or is not `None`, do so explicitly with `is`
```
Yes: if foo is None:
     if foo is not None:

No:  if foo == None:
     if foo != None:
     if not foo is None:
```

Note also that checking whether something is `None` is stricter than checking its truth value. The following prints whenever `foo` is anything other than `None`, including values such as `False`, 0 or `[]`:
```
if foo is not None:
    print('***')
``` 
The following test, however, does not print if `foo` is `None`, `False`, 0, `[]`, `''`, etc.:
```
if foo:
    print('***')
```
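
The two tests can be compared side by side with a minimal sketch. The helper `describe` is made up for this example. Only `None` fails the `is not None` test, whereas every empty or zero value fails the plain truth test.

```python
def describe(value):
    """Return whether value passes the None test and the truth test."""
    passes_none_test = value is not None
    passes_truth_test = bool(value)
    return passes_none_test, passes_truth_test


for candidate in [None, False, 0, [], "", "text", 42]:
    not_none, truthy = describe(candidate)
    print(f"{candidate!r:>8}  is not None: {not_none!s:<5}  truthy: {truthy}")
```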

Also, use `if a is not b` rather than `if not a is b`. The two are equivalent, but the first variant is much more readable.

If you are checking a boolean variable, don't compare it to `True` or `False`
* Yes: `if greeting:`
* No: `if greeting == True:`
* Worse: `if greeting is True:`
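
One reason for this advice is that the three forms do not agree for values that are truthy but are not the object `True`. A small sketch, using a made-up helper `compare_styles` that is deliberately written in the discouraged style:

```python
def compare_styles(greeting):
    """Return the result of the three ways of testing greeting."""
    return bool(greeting), greeting == True, greeting is True


print(compare_styles(True))     # (True, True, True)
print(compare_styles("hello"))  # (True, False, False)
print(compare_styles(1))        # (True, True, False)
```

Only the plain `if greeting:` form treats all three values the same way.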


### Avoid Foolish Consistency

One of the first points of PEP8 states:
> A Foolish Consistency is the Hobgoblin of Little Minds

The goal of a style guide is to improve readability and consistency, but there will always be times where one should break with the rules. This is especially the case when strict adherence to a rule would *limit* readability.

In the end, good code is much more than good style. There is a great talk by Raymond Hettinger from PyCon 2015 that goes into this:
* *[Beyond PEP 8 -- Best practices for beautiful intelligible code](https://youtu.be/wf-BqAjZb8M)*

This talk is well worth a watch to understand how to think about writing good and elegant Python code.

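There are also many useful linters and code formatters that you can either integrate into your editor or run directly from the command line. Linters such as flake8 analyze your code and report lines that violate the style guide, while formatters such as black and autopep8 rewrite the code to follow it: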

* [black](https://github.com/python/black)
* [flake8](http://flake8.pycqa.org/en/latest/)
* [autopep8](https://github.com/hhatto/autopep8)
* [pylama](https://github.com/klen/pylama)
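
To give a feel for the kind of check these tools automate, here is a toy version of a single PEP8 rule, the line length limit. This is only a sketch: the function `find_long_lines` is made up for this example and is not how any of the tools above are implemented. Real linters check far more rules than this.

```python
MAX_LINE_LENGTH = 79


def find_long_lines(source, max_length=MAX_LINE_LENGTH):
    """Return (line_number, length) for each line longer than max_length."""
    violations = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            violations.append((line_number, len(line)))
    return violations


# A two-line example where the second line is 125 characters long.
example = "x = 1\n" + "y = " + "1 + " * 30 + "1\n"
print(find_long_lines(example))  # [(2, 125)]
```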

## Documentation

Software documentation is a guide that describes how a piece of software is structured, how it works, and how to use and maintain it. Good documentation helps people understand a piece of software, and well-documented code can therefore easily be picked up by others who want to use or extend it. Poorly documented code, however, is often usable only by its author, and it is hard for others to help develop, maintain or use it.

Poorly documented code is sadly quite common in the scientific field, and it is a particular problem because many of the people who write and develop software are only in temporary positions. Code developed by master's students, PhD students or postdocs will often be used or further developed long after that scientist leaves the group, and so should be well documented. Imagine starting in a lab and being given software consisting of many thousands of lines of code. The person who wrote it is gone, there is no documentation, and you have to figure out how to implement some new feature your advisor wants. This leads to frustration and inefficient software development.

### Document Design and Purpose, not mechanics

In a perfect world, we would have time to develop extensive and perfect documentation for all software projects. In the real world, however, there is often not much time to devote to documentation. It is therefore important to spend your documentation time effectively: focus on the parts somebody *needs* to understand, namely the design and purpose of the code, as well as its inputs, outputs and adjustable parameters.

Also remember that people can always read the source code itself, so the documentation doesn't need to cover what is happening in the code down to the smallest detail. What they need is the big-picture use and helpful pointers on where to look.

### Embed documentation within the software itself

When we think of documentation, it is common to think of a separate user manual, text document or website explaining the code. However, developing the documentation separately from the code itself is not recommended. For one thing, when someone updates the code, it is important that the documentation is updated as well, and this is more likely to happen if the two live in the same place. This way, we are also guaranteed that anybody who gets the code also gets the documentation. Because of these points, we will always aim to embed the documentation of the code directly in the code itself.

## Documentation Strings

In Python, the most important step towards good documentation is to write good *docstrings*. Docstrings are string literals you put as the first statement in a module, class, function or method definition. For example:

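```Python
def factor(n):
    """Return the prime factorization of an integer."""
    if n == 1:
        return [1]

    factors = []
    while n > 1:
        # The smallest divisor of n larger than 1 is always prime.
        for i in range(2, n+1):
            if n % i == 0:
                factors.append(i)
                n //= i  # Integer division avoids float rounding.
                break
    return factors
```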

We place docstrings so that they are always right where they are relevant. Thus someone reading our code will get a good idea of what the different parts of our code are doing and why they are doing it.

Another important property of docstrings is that they are stored automatically in the `__doc__` attribute:

```Python
print(factor.__doc__)
```

```
Return the prime factorization of an integer.
```


Similarly, we can place a docstring at the top of a module (file). When we import that module in Python, the string is stored in the module's `__doc__` attribute:

```Python
import vector
print(vector.__doc__)
```

```
This module contains a class for representing 3-dimensional vectors,
it is covered by L4, but used for examples of nose testing in L5.
See the test_vector.py module for example of unit tests.
```


The fact that docstrings can be accessed directly as attributes means it is very easy for editors and other tools to use *code introspection* to give us more information. For example, in an IPython shell, we can use `help()` or write `?` immediately after a function or method to read its docstring (`??` also shows the source code). Similarly, in Jupyter you can press `Shift+Tab` to read the docstring of the object your cursor is on.

```Python
u = vector.Vector3D(0, 4, 4)
u.unit
```

```
<bound method Vector3D.unit of Vector3D(0, 4, 4)>
```

There are also tools that go through entire code bases and compile documentation automatically using these techniques. We'll get back to this later.
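
The same kind of introspection is available from plain Python through the standard library's `inspect` module. The sketch below uses a made-up function `unit_step`. `inspect.getdoc` returns the docstring with its indentation cleaned up, and `inspect.signature` returns the call signature.

```python
import inspect


def unit_step(x):
    """Return the unit step of x.

    The result is 1 for non-negative x and 0 otherwise.
    """
    return 1 if x >= 0 else 0


# The docstring, with the indentation of the source code removed.
print(inspect.getdoc(unit_step))

# The call signature, as a tool such as an editor might display it.
print(inspect.signature(unit_step))  # (x)
```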

### Writing good docstrings

There is a segment in [PEP8](https://www.python.org/dev/peps/pep-0008/) on docstrings, but there is an even more important PEP here:
* [PEP257—Docstring Conventions](https://www.python.org/dev/peps/pep-0257/)

This PEP is very short, and it will take you a few minutes to read. In your project work, we expect you to try to follow PEP8 and PEP257.

We now go through and show some of the important points of good docstring conventions.

#### Docstring Conventions

Always use triple double quotes (`"""This is an example."""`) to define your docstrings, even if they fit on a single line. This gives them a consistent style throughout your code and makes it easy to extend them to multiple lines later.

Docstrings should always be full sentences, starting with a capital letter and ending with a period. For simple functions or obvious cases, docstrings can be kept short and put on a single line.
```Python
def is_prime(n):
    """Check if an integer is prime."""
```
We put everything on a single line, even the quotes. Don't leave blank lines above or below the docstring; they aren't needed. Also note that the docstring should prescribe the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

For functions or methods where we want to write a longer docstring with more information, we split it over multiple lines. When we do this, we always put a short summary sentence at the top, then a blank line, and then more information. Here is an example taken (and simplified) from `numpy.gradient`:
```Python
def gradient(f):
    """Return the gradient of an N-dimensional array.

    The gradient is computed using second order accurate central differences
    in the interior points and either first or second order accurate one-sides
    (forward or backwards) differences at the boundaries.
    The returned gradient hence has the same shape as the input array.
    """
```
Note that if the docstring spans multiple lines, we put the ending quotes on a separate line.

Useful information for a docstring to contain:
* a summary of the behavior of the function or method
* what arguments it takes
* what output(s) it returns
* any side effects
* what exceptions might be raised
* any restrictions on when it can be called
* which keyword arguments are optional, and what they do

There exists a variety of docstring [styles](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/).
Personally I like the [Numpy style](https://numpydoc.readthedocs.io/en/latest/format.html), but there are also many other good ones. 
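
As a sketch of what a fuller docstring might look like, here is a made-up function `rescale` documented roughly in the NumPy style. The function itself is only an illustration. The section names (`Parameters`, `Returns`, `Raises`) are the ones the NumPy style guide uses.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Rescale a sequence of numbers linearly to a given interval.

    Parameters
    ----------
    values : sequence of float
        The numbers to rescale. Must contain at least two distinct values.
    lower : float, optional
        Lower end of the target interval. Default is 0.0.
    upper : float, optional
        Upper end of the target interval. Default is 1.0.

    Returns
    -------
    list of float
        The rescaled numbers, in the same order as the input.

    Raises
    ------
    ValueError
        If all the input values are equal, so that no scaling is defined.
    """
    smallest = min(values)
    largest = max(values)
    if smallest == largest:
        raise ValueError("values must contain at least two distinct numbers")
    scale = (upper - lower) / (largest - smallest)
    return [lower + (value - smallest) * scale for value in values]
```

Note how the docstring covers the points in the list above: a summary, the arguments (including the optional ones), the return value and the exception that might be raised.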

## Building automatic stand-alone documentation through code introspection

The nice thing about writing good docstrings is that there are plenty of tools that will parse complete code bases and compile the docstrings into nice-looking, indexable and searchable PDFs, websites and so on.

One of the most popular tools for doing this in Python is [Sphinx](http://www.sphinx-doc.org/en/master/), which was originally made for Python's own documentation and is also used by the packages in the SciPy stack. For C++ there is a comparable tool called Doxygen.

To get an understanding of how powerful such automatic documentation is, take a look at the numpy reference pages, for example the reference for [numpy.gradient](https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html). This is an example of good reference documentation. It shows what the function does, lists what inputs and optional inputs it takes, lists references and shows examples. All of this documentation is automatically generated by Sphinx based on the function's docstring.
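
Sphinx is configured through a Python file called `conf.py`. The fragment below is only a sketch of the part that is relevant for docstrings. The project name and author are placeholders, and a real configuration (for example one generated by `sphinx-quickstart`) contains more settings.

```python
# conf.py: a minimal Sphinx configuration sketch.
# The project name and author below are placeholders.

project = "in1910-project"
author = "Your Name"

extensions = [
    "sphinx.ext.autodoc",   # Pull documentation out of docstrings.
    "sphinx.ext.napoleon",  # Parse NumPy- and Google-style docstrings.
]
```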

As with other tools, we can also integrate Sphinx with our Git repository. One example of this is the website [readthedocs.org](https://readthedocs.org/). You can create an account on this site and set it up so that it automatically pulls from your Git repository every time you make changes. Readthedocs then rebuilds the autogenerated documentation with Sphinx and hosts it. This way, anyone who wants to use the code you share through Git can find up-to-date documentation in a searchable, indexable form. And you don't have to do anything other than write good docstrings!

In IN1910 we expect you to write good docstrings for your project. But we don't expect you to spend time on setting up Sphinx or readthedocs for your projects—you are, of course, free to do so, if you want to learn these tools.
