# Documenting your code and hosting the documentation online

Writing good documentation for your code is essential to allowing others to use it, and it is crucial for lowering your own burden of supporting users of your code. Excellent documentation that is easily accessible and always up-to-date with the latest version of your code allows people to use your code without having to constantly contact you with questions (which many people will not do anyway; they will simply not use your code if they cannot easily figure it out). Documentation is also important for your own use of the code: a few months after you've written a piece of code, it will no longer be immediately clear how to use it (for me this can be as quick as a few days!), so by writing good documentation, you will also save *yourself* the time of having to reconstruct how your own code works.

In this chapter, I first discuss the basics of how to write good documentation and then I discuss various software tools that make writing good, up-to-date documentation easy and that allow you to share the documentation online. 

## Basics of good documentation

Before starting a discussion of what makes for good code documentation, it is worth re-stressing the importance of *making your code easy and intuitive to use*, with many of the basic features taking at most a few lines to run. When that is the case, users will have to consult the documentation much less often than when your code is difficult to use or when even using a basic feature of your code requires them to write dozens of lines of code (e.g., setting up many related objects or many configuration options in a complicated way). It will also make your documentation much easier to write, because you will be able to illustrate your code's use with short, copyable code snippets, which makes the documentation much more pleasurable to read.

What's most important about documentation is that it is **as complete as possible** and **as up-to-date as possible**. Both of these are difficult to achieve, which is why automated tools such as those discussed below are useful: they can help significantly with achieving both goals. Your documentation should be as complete as possible, because otherwise users will run into undocumented features and need to contact you or give up. The only features that can reasonably be excluded from this strict completeness requirement are internal features that users shouldn't use; even then it is good practice to document them (albeit perhaps with less polished formatting) for your own and other developers' use. Documentation should be up-to-date to avoid misuse of your code after major changes and to avoid the frustration that users feel when they find that the documentation no longer matches the code and they cannot figure out how to use it. In addition to using automation tools, the best way to achieve complete and up-to-date documentation is to start writing documentation as soon as you implement new features, or even *before* you implement them. That is, ideally you would write a first draft of the documentation of a function or class before implementing a first version of it, which has the added benefit of requiring you to think through carefully what you want the function or class to do, what inputs it takes, and what outputs it returns (the same holds for tests: ideally one would write them before writing the code). This is a hard ideal to achieve in practice, but it is good to write at least some documentation in parallel with the first implementation of the code. That way, your documentation will be complete. Keeping it up-to-date then requires you to update the documentation immediately whenever you change the function.

Good documentation should cover at least the following sub-components:

* *A guide to the installation of your code*, discussing any pre-requisites. It should be possible to install your code with [standard installation commands](01-Introduction.ipynb#The-dos-and-don’ts-of-software-package-development), but even so it is good to list the commands (this is especially necessary if you offer both ``pip`` and ``conda`` installation options, so that users know about both).

* *A quick-start guide and a set of brief tutorials*: This helps users to get started using your code quickly by copying and pasting example code and it's a good way to show off what your code can do without requiring people to run it.

* *A full API* (Application programming interface): a complete listing of all of your code's functions, classes, and their methods. This is a reference guide that users can consult to learn about exactly how each feature works and what its options are.

Your code's **installation guide** should cover the typical way in which your package is installed. This can be as easy as explicitly stating that your code should be installed with ``pip`` as
```
pip install exampy
```
(for our example package from [Chapter 2](02-Package-Structure.ipynb); note that this package doesn't exist on PyPI and is just used as an example here). This may seem obvious to you, but it is useful to give the command explicitly, because people love to simply copy-and-paste code (and we will show below how to add automatic copy-to-clipboard buttons like the one above). If your code has dependencies that wouldn't be easily and unobtrusively installed by ``pip`` (which will attempt to install all requirements listed in the ``install_requires`` part of your ``setup.py`` file, as we discussed in [Chapter 2](02-Package-Structure.ipynb#The-setup.py-file)), then it is useful to list how to install these as well, again giving explicit commands as much as possible, e.g.,
```
conda install numpy scipy
```
or
```
pip install numpy scipy
```
if your dependencies are ``numpy`` and ``scipy``. Especially if your code requires harder-to-install dependencies or non-Python libraries (like the [GSL](https://www.gnu.org/software/gsl/), which provides many scientific functions in C and is often used in C backends of Python codes), it is helpful to give commands for how to install these on different operating systems (the GSL is now luckily available on ``conda-forge``, so the easiest way to install it is ``conda install -c conda-forge gsl``). The installation guide is also a good place for a frequently-asked-questions (FAQ) section with common installation problems. Again, if your code is pure Python with few dependencies, just stating that your code can be installed with ``pip install`` is likely all you need to say here.

The **quick-start guide** is a way to show off what your code can do and a place to give your users some code snippets that they can start adapting for their own use. When potential users first look at your code, they will be deciding whether going to the trouble of installing it and learning to use it is worth it for them. Therefore, a page in your documentation that demonstrates features of your code while also serving as a way to get started is a good way to get people to start using your code. The key to a good quick-start guide is to keep it brief and simple, but also to get to interesting uses of your code that show off what it can do; achieving these two somewhat competing goals is again easier if your code is *easy and intuitive* to use, because you can do impressive things with your code with very few keystrokes and, thus, you can write a good quick-start tutorial that also shows off what amazing things your code can do. It is difficult to keep these quick-start guides updated, so it is worth spending a bit of time thinking carefully about what you want the guide to cover and covering only very stable features of your code.

You can complement the quick-start guide with **a more extensive set of tutorials** that go into more detail. In practice, most outside users of your code (i.e., not yourself or your collaborators) will likely only use features that are clearly documented and for which a usage example exists, because most users will not attain a full understanding of all that your code can do (e.g., when combining different aspects of it that aren't obvious) to allow them to go far beyond the tutorials that you provide. So a set of tutorials is where you can go over all of the most common use cases of your code and all the things that you think people can use your code for. It is important to keep them clear and succinct (with pointers for more advanced use), but it is difficult to write too many tutorials (just like it is difficult to write *too much documentation*), so don't hold back (keeping your own time in mind of course).

Finally, **a complete API** should contain documentation for every function and class and every class-method in your code, arranged by sub-module. The objective of this is to fully document your code, so users can get information on the inputs and outputs of all of your code's functionality. The API should be arranged in a logical manner, grouping functions and classes with similar functionality. This is a part of your documentation where you should do a minimal amount of manual work in the documentation itself; instead, you should use automated tools to grab the documentation directly from your code, from your functions' and classes' *docstrings*, which I discuss next.

## Python docstrings

Python has a built-in mechanism to attach documentation to modules, functions, and classes and their methods: [docstrings](https://www.python.org/dev/peps/pep-0257/). Docstrings are a place to put documentation for users of your code, that is, the type of documentation that we are interested in here. Docstrings are *not* for developers: don't use them to comment on specifics of the implementation or on how the code works, unless this is necessary for users of your code; for developer notes, use regular comments in the code (in Python: lines that start with ``#``). 

Docstrings are simply regular strings that by virtue of their placement in the code get attached to a module, function, class, or class-method as its documentation. They do not need to be explicitly assigned as documentation; rather, the Python interpreter does this assignment automatically when it encounters a string in the correct place. This location is as follows:

* For *functions*: immediately following the statement that defines the function ``def func(a,b,c=0):``, that is, between the ``def`` statement and the function body.

* For *classes*: immediately following the statement that defines the class ``class a_class(object):``, that is, between the ``class`` statement and the class body.

* For *class methods*: immediately following their definition using ``def``, in the same way as for functions.

* For *modules* and *submodules*: at the very top of the file defining the module.

I will give examples of these using the ``exampy`` example package that we set up in [Chapter 2](02-Package-Structure.ipynb#Package-layout). When the Python interpreter encounters a string in the place specified above, it binds this string to the ``__doc__`` attribute of the module/function/class/class method, where it is available to any user. Variables cannot have docstrings in Python itself (that is, the Python interpreter does not bind these to the variable's ``__doc__`` attribute), but many documentation tools will pick up docstrings immediately following a variable's assignment in the source code:
```
frac_out= 0.25
"""Fraction of the data that is considered an outlier"""
```
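As a minimal sketch of these placement rules, the snippet below defines a function, a class, and a method with docstrings and then inspects their ``__doc__`` attributes. The names ``func``, ``AClass``, and ``method`` are made up for illustration and are not part of ``exampy``. The last line checks the statement about variables: the string following the assignment is not attached to ``frac_out``, so asking for ``frac_out.__doc__`` simply gives the docstring of the built-in ``float`` type.

```python
def func(a, b, c=0):
    """Add three numbers (hypothetical example function)"""
    return a + b + c


class AClass:
    """A hypothetical example class"""

    def method(self):
        """A hypothetical example method"""
        return None


frac_out = 0.25
"""Fraction of the data that is considered an outlier"""

# Docstrings of the function, class, and method are bound automatically
print(func.__doc__)
print(AClass.__doc__)
print(AClass.method.__doc__)
# The string after the assignment is NOT bound to the variable: what we get
# here is the docstring of the float type itself
print(frac_out.__doc__ == float.__doc__)
```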

While docstrings can be any string, the convention is to use triple-quoted strings of the type ``"""A triple-quoted string"""``, because most docstrings contain multiple lines, which is only allowed for triple-quoted strings. Thus, even if you have a docstring that is just a single line (which should rarely be the case), use a triple-quoted string. A good docstring should contain at least: (a) a brief description of what the module/function/class/class-method does, (b) an explanation of any input arguments and keywords, and (c) a discussion of any return value(s) or, for functions and methods, the lack thereof (it's useful to know that a function does *not* return anything). You can include extra information such as possible failure modes or references as well. While there are many standard formats for docstrings, some of which I will discuss below, you do not have to follow a standard format, but it is important to use a consistent style throughout your package such that users can easily parse the documentation once they are used to your format.

As an example, we can write a docstring for the top-level module of the ``exampy`` package. To do this, we edit the ``exampy/__init__.py`` file such that it now looks like
```
"""exampy: an example Python package"""
from .utils import *
```
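and the ``"""exampy: an example Python package"""`` string then becomes the module's docstring. To verify this, open an IPython session or a Jupyter notebook (the ``?`` syntax used below is an IPython feature; in the plain Python interpreter you can use ``help(exampy)`` instead) and do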

In [1]:
import exampy
?exampy

which shows a message that says something like
```
Type:        module
String form: <module 'exampy' from '/PATH/TO/exampy/exampy/__init__.py'>
File:        /PATH/TO/exampy/exampy/__init__.py
Docstring:   exampy: an example Python package
```
and in which you see the docstring that we just defined. You can also verify that it was indeed attached as the module's ``__doc__`` attribute:

In [2]:
print(exampy.__doc__)

exampy: an example Python package


One-line docstrings are appropriate only for modules, submodules, and classes, because these do not have direct inputs and outputs, so all of the documentation can often fit on a single line (however, you should feel free to use a multi-line docstring if there is more to say). A class' docstring simply describes the purpose of the class, *not* how to initialize the class or details on its methods (although it could contain a list of attributes or methods, this isn't generally considered to be necessary); a class' initialization should be documented in the docstring of the class' ``__init__`` function, just like any regular method, as I discuss below.

Functions and methods typically have inputs and outputs in addition to the brief description, and each of these inputs and outputs should be listed on its own line; to keep a uniform style for your documentation, you should therefore also use multi-line docstrings for functions that have no inputs or outputs, stating explicitly that there are no arguments or keywords and no outputs. Methods are functions that are defined as part of a class and they are essentially the same as regular functions, except that their first argument is ``self``, which represents the class instance. ``self`` is not typically listed as a documented argument of a method, because it is always the first argument of a method and it always has the same meaning. Therefore, methods and functions follow the same documentation rules. I will discuss documentation for functions below, but keep in mind that the same considerations apply to methods in exactly the same way.
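To make this division of labor concrete, here is a small sketch using a made-up ``Interval`` class (it is not part of ``exampy``): the class docstring only states the purpose of the class, the ``__init__`` docstring documents how to initialize it, and the ``width`` method states explicitly that it takes no arguments. ``self`` is not listed anywhere. The ``Parameters`` and ``Returns`` layout is the ``numpy`` style that is described in detail below.

```python
class Interval:
    """A closed interval [lo, hi] on the real line"""

    def __init__(self, lo, hi):
        """Initialize an Interval

Parameters
----------
lo: float
    Lower edge of the interval
hi: float
    Upper edge of the interval

Returns
-------
None
"""
        self.lo = lo
        self.hi = hi

    def width(self):
        """The width of the interval

Parameters
----------
None

Returns
-------
float
    hi - lo
"""
        return self.hi - self.lo


unit = Interval(0., 1.)
print(Interval.__doc__)        # purpose of the class only
print(Interval.width.__doc__)  # method docstring; self is not documented
print(unit.width())
```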

While there are many standard docstring formats, for scientific code packages it is simplest to follow ``numpy``'s [docstring convention](https://numpydoc.readthedocs.io/en/latest/format.html) (also used, e.g., by ``scipy``, packages in the ``scikit`` series, and ``astropy``). The most basic version of a docstring for a function should contain a description, a list of inputs, and a list of outputs. In the ``numpy`` docstring format it looks as follows, using as an example the ``exampy.square`` function that we defined in ``exampy/utils.py`` in [Chapter 2](02-Package-Structure.ipynb#Package-layout):
```
def square(x):
    """The square of a number                                                   
                                                                                
Parameters                                                                      
----------                                                                      
x: float                                                                        
    Number to square                                                             
                                                                                
Returns                                                                         
-------                                                                         
float                                                                           
    Square of x                                                                  
"""
    return x**2.
```
The brief description is followed by a *Parameters* section that lists each argument and keyword with the format
```
parameter: type
    Parameter description
```
Similarly, the return value is described as
```
type
    Description of return value
```
If your function returns multiple values, ``Returns`` becomes a list as well; in that case, you may want to name your return values for extra clarity and follow the same format for each as that for each input parameter.
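As a sketch of what this looks like for multiple return values, consider a hypothetical ``square_and_cube`` function (not part of ``exampy``; the return-value names ``sq`` and ``cb`` are only labels used in the documentation):

```python
def square_and_cube(x):
    """The square and the cube of a number

Parameters
----------
x: float
    Number to square and cube

Returns
-------
sq: float
    Square of x
cb: float
    Cube of x
"""
    return x**2., x**3.


# The two documented return values come back as a tuple
sq, cb = square_and_cube(2.)
print(sq, cb)
```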

If we then run

In [3]:
?exampy.square

a message shows up that looks as follows
```
Signature: exampy.square(x)
Docstring:
The square of a number

Parameters
----------
x: float
    Number to square

Returns
-------
float
    Square of x
File:      /PATH/TO/exampy/exampy/utils.py
Type:      function
```
We can again check that the docstring was indeed assigned to the function's ``__doc__`` attribute with

In [4]:
print(exampy.square.__doc__)

The square of a number

Parameters
----------
x: float
    Number to square

Returns
-------
float
    Square of x



For most functions, you will want to include a longer description than the one-line description that we could use for the square function above. In that case, you would still start the docstring with a one-line summary, but also provide an extended description after a blank line. For example, for a verbose ``exampy.square`` docstring
```
def square(x):
    """The square of a number                                                   
                                                                                
Calculates and returns the square of any floating-point number;
note that, as currently written, the function also works for
arrays of floats, ints, arrays of ints, and more generally,
any number or array of numbers.

Parameters                                                                      
----------                                                                      
x: float                                                                        
    Number to square                                                             
                                                                                
Returns                                                                         
-------                                                                         
float                                                                           
    Square of x                                                                  
"""
    return x**2.
```

If a function has optional keyword arguments, the documentation should make it clear that these are optional, either by adding ``, optional`` after the parameter's type or by stating this in the description of the parameter (the first method is the clearest). You can also specify what the default value of the keyword is, but this is not really necessary, because most documentation tools will display the function's signature, which normally shows the default value. For example, we can add a general ``pow(x,p=2.): return x**p`` function to ``exampy/utils.py``, which computes the p-th power of x and defaults to the square of x; with documentation, that function looks like
```
def pow(x,p=2.):
    """A number x raised to the p-th power                                      
                                                                                
Parameters                                                                      
----------                                                                      
x: float                                                                        
    Number to raise to the power p                                               
p: float, optional                                                                        
    Power to raise x to                                                          
                                                                                
Returns                                                                         
-------                                                                         
float                                                                           
    x^p                                                                          
"""
    return x**p
```
If we then request the documentation for the ``pow`` function

In [5]:
?exampy.pow

we get a message that says something like
```
Signature: exampy.pow(x, p=2.0)
Docstring:
A number x raised to the p-th power

Parameters
----------
x: float
    Number to raise to the power p
p: float, optional
    Power to raise x to

Returns
-------
float
    x^p
File:      ~/Repos/code-packaging-minicourse/exampy/exampy/utils.py
Type:      function
```
and we see that the function signature includes the default value of ``p`` even though the docstring didn't specify it. If the default value of a keyword is the result of calling a function, such that it isn't immediately clear what the default value is from the function signature or how it is calculated, you probably want to state it in the docstring.
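A minimal sketch of this last situation, using a hypothetical ``scaled`` function that is not part of ``exampy``: the default of ``scale`` is computed by calling ``math.sqrt`` when the function is defined, so the signature only shows the resulting number and the docstring is the natural place to say where it comes from.

```python
import inspect
import math


def scaled(x, scale=math.sqrt(2.)):
    """A number x multiplied by a scale factor

Parameters
----------
x: float
    Number to scale
scale: float, optional
    Scale factor; the default is sqrt(2)

Returns
-------
float
    x * scale
"""
    return x * scale


# The signature only displays the evaluated default, not how it was computed
print(inspect.signature(scaled))
```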

Additional commonly-used sections of a function's docstring (each following the
```
SECTION
-------
```
format) are:

* ``Raises``: a list of exceptions that the function may raise and when it raises them.

* ``See Also``: a list of related functions; automated documentation tools will be able to link these automatically if you list them in the same way that you would import and use them (e.g., in ``pow`` above you can refer to ``square``, but to refer to ``cube`` which is in ``exampy.submodule1``, you need to say explicitly ``exampy.submodule1.cube``).

* ``Notes``: extended notes on the function. Use this to list calculation or implementation details that you think the user should be aware of.

* ``References``: a list of bibliographic references, using the format
```
.. [1] J. Bovy, "galpy: A Python Library for Galactic Dynamics," 
   Astrophys. J. Supp., vol. 216, pp. 29, 2015.
```

* ``History``: not a standard part of the ``numpy``-style docstrings, but a section where you could give details on the history of the function, keeping track of major changes and when they occurred (I have often wished that ``numpy`` would have this!). For example,
```
History
-------
2020-03-01: First implementation - Bovy (UofT)
2020-04-06: Added new keyword Y to allow for Z - Bovy (UofT)
```
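As an illustration of a ``Raises`` section that actually lists an exception, here is a sketch using a hypothetical ``inverse`` function (not part of ``exampy``); the exception is listed in the same type-then-description layout as parameters and return values:

```python
def inverse(x):
    """The multiplicative inverse of a number

Parameters
----------
x: float
    Number to invert

Returns
-------
float
    1/x

Raises
------
ZeroDivisionError
    If x is zero
"""
    return 1. / x


print(inverse(4.))
try:
    inverse(0.)
except ZeroDivisionError as err:
    # This is the documented failure mode
    print("Raised as documented:", err)
```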

As a full example, we implement a docstring for the ``exampy.submodule1.cube`` function located in ``exampy/submodule1/subutils.py`` (see [Chapter 2](02-Package-Structure.ipynb#Package-layout)) and display it here:

In [6]:
import exampy.submodule1
print(exampy.submodule1.cube.__doc__)

The cube of a number

Calculates and returns the cube of any floating-point number;
note that, as currently written, the function also works for
arrays of floats, ints, arrays of ints, and more generally,
any number or array of numbers.

Parameters
----------
x: float
    Number to cube

Returns
-------
float
    Cube of x

Raises
------
No exceptions are raised.

See Also
--------
exampy.square: Square of a number
exampy.pow: a number raised to an arbitrary power

Notes
-----
Implements the standard cube function

.. math:: f(x) = x^3

References
----------
.. [1] A. Mathematician, "x to the p-th power: squares, cubes, and their 
   general form," J. Basic Math., vol. 2, pp. 2-3, 1864.

History
-------
2020-03-01: First implementation - Bovy (UofT)



That docstrings are simply a submodule/function/class/method's ``__doc__`` attribute means that they can be generated, parsed, and modified programmatically. That is, you can also specify a docstring by explicitly setting the ``__doc__`` attribute, you can automatically extract information from the docstring by parsing it as you can any string in Python, or you can modify the docstring (e.g., adding additional information to it). This is, for example, useful when you are defining functions programmatically, e.g., automatically defining a set of functions with similar functionality; then you can add documentation to these automatically generated functions by explicitly setting their ``__doc__`` attribute. For example, we set the docstring for the ``exampy.submodule1`` submodule by editing ``exampy/submodule1/__init__.py`` to be
```
__doc__= """exampy.submodule1: submodule with extra utilities"""
from .subutils import cube
```
When we then do

In [7]:
?exampy.submodule1

we get a message that says something like
```
Type:        module
String form: <module 'exampy.submodule1' from '/PATH/TO/exampy/exampy/submodule1/__init__.py'>
File:        /PATH/TO/exampy/exampy/submodule1/__init__.py
Docstring:   exampy.submodule1: submodule with extra utilities
```
that is, we see that the docstring was correctly attached.
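The same mechanism works for functions that are generated automatically. The following is a sketch under the assumption of a hypothetical factory ``make_power_function`` (not part of ``exampy``) that builds a family of similar functions and fills in each one's ``__doc__`` from a template; the last lines show that the docstring can then be parsed like any other string.

```python
def make_power_function(p):
    """Build a function that raises its input to the fixed power p"""
    def power(x):
        return x**p
    # Give each generated function its own name and docstring
    power.__name__ = "power_{}".format(p)
    power.__doc__ = """A number x raised to the power {p}

Parameters
----------
x: float
    Number to raise to the power {p}

Returns
-------
float
    x^{p}
""".format(p=p)
    return power


fourth = make_power_function(4)
fifth = make_power_function(5)
print(fourth.__doc__)

# Docstrings are regular strings, so we can parse them, e.g., to extract
# the one-line summary
summary = fifth.__doc__.splitlines()[0]
print(summary)
```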

At the risk of sounding like a broken record, I will once more repeat that you should write these docstrings *as soon as you implement a function and ideally before you implement it* and you should implement a full docstring of your preferred format from the get-go. If you do this enough times, it will become second nature and you will not even be able to imagine writing code and documentation in any other way!

## Using ``sphinx`` to write and generate documentation for your package

## Including ``jupyter`` notebooks as part of your documentation

``exclude_patterns = ['.ipynb_checkpoints/*']``

## Automatically building and hosting your documentation on ``readthedocs.io``