# INF200 Lecture No. J04

### Hans Ekkehard Plesser
### 5 June 2020

## Today's topics

- Statistical tests
- Random selection
- Creating documentation with Sphinx

## Statistical tests and random selection

- Test methods that depend on random numbers
- Exact results depend on the precise sequence of random numbers generated, i.e., on the random number generator used and on the random seed

### Basic approaches

#### Fixed seed
By seeding the random number generator with a fixed value, we can ensure that we always get the same sequence of random numbers; particularly important while debugging.

- Requires that we know which random number generator is used by the methods tested
- Adding or changing tests or code can change the order in which random numbers are consumed, and thus the results of the tests
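A minimal sketch of the fixed-seed approach, using Python's built-in `random` module. The helper `draw_sequence()` and the seed value 12345 are arbitrary choices for illustration. The last part shows how a single extra draw shifts every number that follows.

```python
import random

def draw_sequence(seed, n=5):
    """Hypothetical helper: return n uniform numbers drawn after seeding."""
    random.seed(seed)  # reset the generator to a known state
    return [random.random() for _ in range(n)]

# Re-seeding with the same value reproduces exactly the same sequence
first = draw_sequence(12345)
second = draw_sequence(12345)
assert first == second

# One extra draw (e.g., from a newly added test) shifts all later numbers
random.seed(12345)
random.random()
shifted = [random.random() for _ in range(5)]
assert shifted[:4] == first[1:]  # same numbers, but consumed one step later
```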

#### Mocking
Mock the random number function to return a fixed value.

- Allows us to check that the code using the random numbers works as expected
- Does not test whether the result has the expected distribution
- Requires that we know exactly how the code draws random numbers (white box testing)
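A minimal sketch of mocking with `unittest.mock.patch` from the standard library. The function `dies()` is a hypothetical function under test, not part of any project code. The test fixes the "random" number at 0.2 and checks the decision logic on both sides of it.

```python
import random
from unittest import mock

def dies(p_death):
    """Hypothetical function under test: True if the animal dies."""
    return random.random() < p_death

def test_dies_with_mocked_random():
    # Replace random.random() so that it always returns 0.2
    with mock.patch('random.random', return_value=0.2):
        assert dies(0.5)       # 0.2 < 0.5, so the animal dies
        assert not dies(0.1)   # 0.2 >= 0.1, so the animal survives

test_dies_with_mocked_random()
```

`patch` must target the name that the code under test actually looks up. If a module did `from random import random`, you would have to patch `random` inside that module instead, which is precisely the white-box knowledge this approach requires.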

#### Statistical tests

- The principal approach is based on statistical hypothesis testing
    - Formulate a hypothesis (expectation), e.g., "value $x$ is a sample of random variable $X$ which has a normal (Gaussian) distribution with given mean $\mu$ and variance $\sigma^2$"
    - Find the $p$-value of $x$, i.e., the probability of observing a value at least as far from the mean as $x$ if $x$ indeed follows the assumed distribution
    - Compare the $p$-value to a predefined acceptance limit $\alpha$: if $p>\alpha$ the test is passed
- Interpretation: Let, e.g., $\alpha=0.01=1\%$. If we observe a value $x$ with a $p$-value less than $\alpha=1\%$, this means that the value $x$ belongs to the outer tail of the assumed distribution, among those values that make up the 1% least likely values in the distribution. We thus assume that $x$ did not come from the expected distribution and declare the test failed.
- Note: By construction, this test will fail in 1% of all cases even if $x$ follows the assumed distribution. Thus, failures need to be inspected carefully.
- See, e.g., Knuth, The Art of Computer Programming, vol 2.
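The procedure above can be sketched for the Gaussian hypothesis as follows. This is a minimal sketch that uses the two-sided $p$-value $p = \operatorname{erfc}(|z|/\sqrt{2})$ with $z = (x-\mu)/\sigma$, which is the probability that a standard normal variable lies at least $|z|$ from zero. The function names are illustrative.

```python
import math

def gaussian_p_value(x, mu, sigma):
    """Two-sided p-value of x under the hypothesis X ~ N(mu, sigma**2)."""
    z = (x - mu) / sigma                      # distance from mean in units of sigma
    return math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|), Z standard normal

def passes(x, mu, sigma, alpha=0.01):
    """The test is passed if the p-value exceeds the acceptance limit alpha."""
    return gaussian_p_value(x, mu, sigma) > alpha

print(passes(2.0, 0.0, 1.0))  # p is about 0.046 > 0.01, so the test passes
print(passes(3.0, 0.0, 1.0))  # p is about 0.0027 < 0.01, so the test fails
```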

#### Types of statistical tests

- [$Z$-test](https://en.wikipedia.org/wiki/Z-test)
    - Strictly speaking, tests whether the mean of $n$ values drawn independently from the same Gaussian distribution with known variance $\sigma^2$ is compatible with a given mean $\mu$
    - Due to the [central limit theorem](https://en.wikipedia.org/wiki/Central_limit_theorem), it can also be applied in many other cases as an approximation provided we are considering averages of many trials
    - If the variance of the Gaussian distribution is not known a priori, one should use [Student's $t$-test](https://en.wikipedia.org/wiki/Student%27s_t-test) instead

- [Binomial test](https://en.wikipedia.org/wiki/Binomial_test)
    - An explicit test for binomially distributed quantities, e.g., the number of successes in $n$ Bernoulli experiments (coin flips)
    - See also [GraphPad](http://www.graphpad.com/guides/prism/8/statistics/index.htm?stat_binomial.htm) for an explanation of the test. The [binomial test in SciPy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html) uses the same approach as GraphPad

- `scipy.stats` provides [a number of statistical test functions](https://docs.scipy.org/doc/scipy/reference/stats.html)
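The two tests above can be sketched with the standard library only, so the mechanics are visible. This is a sketch under stated assumptions, not SciPy's implementation. For the binomial test it follows the approach described for GraphPad and SciPy: sum the probabilities of all outcomes that are no more likely than the observed one. Because it computes binomial coefficients as floats, it is only suitable for moderate $n$ (up to about 1000). Recent SciPy versions provide `scipy.stats.binomtest` for real use. The function names are illustrative.

```python
import math
import random

def z_test_p_value(samples, mu, sigma):
    """Two-sided Z-test: is the sample mean compatible with mean mu?

    Assumes the standard deviation sigma of a single sample is known.
    """
    n = len(samples)
    mean = sum(samples) / n
    z = (mean - mu) / (sigma / math.sqrt(n))   # standard error is sigma / sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def binomial_test_p_value(k, n, p):
    """Two-sided binomial test for k successes in n trials, success prob. p."""
    pmf = [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    p_obs = pmf[k]
    # Small relative tolerance guards against floating-point ties
    return min(1.0, sum(q for q in pmf if q <= p_obs * (1 + 1e-7)))

# Usage: 1000 flips of a fair coin, tested against p = 0.5
random.seed(2020)
flips = [1 if random.random() < 0.5 else 0 for _ in range(1000)]
print(binomial_test_p_value(sum(flips), len(flips), 0.5))
print(z_test_p_value(flips, 0.5, 0.5))  # a single flip has sigma = 0.5
```

In a test, the printed $p$-values would be compared to the acceptance limit $\alpha$. The Z-test applies here only as a central-limit approximation, while the binomial test is exact for this case.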




## Random selection

### Case 1: Dead or alive

- An animal has a probability $p$ to die
- How do we decide whether the animal dies in a given year?
    - Draw a uniformly distributed random number $r$ from $[0, 1)$; the animal dies if $r < p$, which happens with probability exactly $p$
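A minimal sketch of this decision in Python; the name `dies()` is illustrative. Repeating the experiment many times gives a fraction of deaths that should be close to $p$.

```python
import random

def dies(p_death):
    """Return True if an animal with death probability p_death dies this year."""
    # random.random() is uniform on [0, 1), so P(r < p_death) == p_death
    return random.random() < p_death

random.seed(42)  # fixed seed for reproducibility
n_trials = 100_000
deaths = sum(dies(0.3) for _ in range(n_trials))
print(deaths / n_trials)  # should be close to 0.3
```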
    
### Case 2: Choosing between multiple alternatives

- Literature: Knuth, The Art of Computer Programming, vol 2, ch 3.3-3.4 
- In a simulation, we want to choose between four alternatives with probabilities $p_0, p_1, p_2, p_3$
- Note $\sum_{n=0}^3 p_n = 1$ by definition
- The cumulative probabilities $P_n = \sum_{k=0}^n p_k$ divide the unit interval into sections corresponding to events 0, 1, 2, 3
- Specifically, we draw a uniformly distributed random number $r \in [0, 1)$ and select 

\begin{equation}
\begin{cases}
\text{event}\: 0 \quad\text{if}\; r < P_0 \\
\text{event}\: n \quad\text{if}\; P_{n-1} \leq r < P_{n}\;\; \text{for}\; n>0
\end{cases}
\end{equation}

- The following code will select from `len(p)` alternatives with probabilities `p[0]`, `p[1]`, ...

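```python
import random

def random_select(p):
    """Return index n with probability p[n]; assumes sum(p) == 1."""
    r = random.random()  # uniform on [0, 1)
    n = 0
    # Subtract p[0], p[1], ... until r falls into the section of event n;
    # the bound on n guards against rounding errors if sum(p) is slightly < 1
    while n < len(p) - 1 and r >= p[n]:
        r -= p[n]
        n += 1
    return n
```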
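The same selection rule can also be expressed directly in terms of the cumulative probabilities $P_n$. This is a sketch, and `random_select_bisect()` is an illustrative name. `bisect.bisect_right` returns the first index $n$ with $P_n > r$, which is exactly the $n$ satisfying $P_{n-1} \leq r < P_n$ in the case distinction above.

```python
import bisect
import itertools
import random

def random_select_bisect(p):
    """Select index n with probability p[n] using cumulative probabilities."""
    cumulative = list(itertools.accumulate(p))   # P_0, P_1, ..., P_{N-1}
    r = random.random()                          # uniform on [0, 1)
    n = bisect.bisect_right(cumulative, r)       # first n with P_n > r
    return min(n, len(p) - 1)                    # guard against rounding

random.seed(0)
p = [0.1, 0.2, 0.3, 0.4]
counts = [0] * len(p)
for _ in range(100_000):
    counts[random_select_bisect(p)] += 1
print([c / 100_000 for c in counts])  # should be close to p
```

If many draws use the same probabilities, computing `cumulative` once and reusing it makes each draw $O(\log N)$ instead of $O(N)$. The standard library also offers `random.choices(range(len(p)), weights=p)`, which returns a list of one selected index by default.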

### Simpler approach for our simulations

- Animals move in all four directions with *same* probability
- Can use `random.choice()` to pick one element from a list with equal probability
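A minimal sketch of choosing a direction with `random.choice()`. The direction labels and the `(row, column)` offset convention are assumptions for illustration only.

```python
import random

# Hypothetical direction labels; any four distinct objects would do
DIRECTIONS = ['north', 'east', 'south', 'west']

def choose_direction():
    """Pick one of the four directions, each with probability 1/4."""
    return random.choice(DIRECTIONS)

# Alternatively, pick a step directly as a (row, column) offset on a grid
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1)]
step = random.choice(OFFSETS)
print(choose_direction(), step)
```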

------------------

## Creating documentation with Sphinx

### What is Sphinx?

- [Sphinx](http://www.sphinx-doc.org/en/stable/) is a tool for generating documentation for your code
- Can compile documentation to many different formats: HTML, LaTeX, PDF, ePub, etc.
- Can extract docstrings from your code and include them in the documentation

### Getting started: `sphinx-quickstart`

1. Open `Terminal` under OSX or `Anaconda Prompt` under Windows. 
1. Navigate to your `BioSim_Gxx_Name1_Name2` folder (use `cd` to change directories)
1. Run the following command

        sphinx-quickstart --ext-autodoc --ext-coverage --ext-mathjax --ext-viewcode docs

1. Accept the default answers by pressing ENTER, but enter sensible values for
    - Project name
    - Author name(s)
    - Project version
1. Don't worry if you make a mistake, you can fix it in the `docs/conf.py` file
1. Open the file `conf.py` in the `docs` directory and change the following lines (approx. line 15)

        #import os
        #import sys
        #sys.path.insert(0, os.path.abspath('.'))
        
    to
    
        import os
        import sys
        sys.path.insert(0, os.path.abspath('..'))
        autoclass_content = 'both'
        
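    The `sys.path.insert(...)` line ensures that Sphinx finds all code in the project directory (the parent of `docs`). The `autoclass_content = 'both'` line makes Sphinx combine the class docstring with the `__init__` docstring, so that constructors are documented as well.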
1. Finally, add the following line at the end of `conf.py`

        latex_elements = {'papersize': 'a4paper'}

### Write documentation

1. Edit the `docs/index.rst` file and add further documentation files (`*.rst`) as needed
1. For a worked example, see `Project/SampleProjects/biolab_project`
1. For more information on reStructuredText, see
    - [ReStructuredText primer](http://docutils.sourceforge.net/docs/user/rst/quickstart.html)
    - [ReStructuredText overview](http://docutils.sourceforge.net/docs/user/rst/quickref.html)

### Generate documents

1. Open a Terminal (e.g. inside PyCharm) and navigate to the `docs`
folder inside your project.

1. Run 

        make html
        
    This will create basic documentation, which you can view by opening `docs/_build/html/index.html` 
    in a web browser.
    
1. If the command above does not work in the terminal you opened in PyCharm, try opening a normal Terminal, navigate to the `docs` directory and try again.
       
1. To create documentation in other formats, run, e.g.

        make epub
        make latexpdf

    The resulting documentation will be in the `_build/epub` and `_build/latex`
    directories, respectively. Creating these formats may require
    additional software on your computer, especially a working TeX
    system, e.g.

    - Windows: [MikTeX](http://miktex.org)
    - OSX: [MacTex](https://tug.org/mactex)

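    Under Windows, you may have to run

        make latex
        cd _build/latex
        pdflatex biolab

    where `biolab` is the base name of the `.tex` file that Sphinx creates in `_build/latex`. The name is derived from your project name, so replace it accordingly.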

### Keep Sphinx-generated documentation out of Git repo!

The documentation that is generated in the `docs/_build` directory should **not** be committed to your git repository!

`docs/_build` should automatically be ignored by git if you have put the right `.gitignore` file in place (copied from the course repo, `project_description/sample.gitignore`).

If the `docs/_build` directory is not ignored by git, proceed as follows:
1. If you have not yet put `.gitignore` in place, do it now and see if `docs/_build` is ignored afterwards.
1. If the `docs/_build` directory is still not ignored, there are a few possibilities:
    1. The `docs` directory has a different name, e.g. `Docs` or `doc`. Rename it to `docs`.
    1. The `docs` directory is not at the top level within the `BioSim_Gxx_Name1_Name2` folder. Move it to the top level.
    1. The `.gitignore` file is not at the top level within the `BioSim_Gxx_Name1_Name2` folder. Move it there.
    1. If none of this helps, contact Hans Ekkehard!
1. Commit your changes if you changed `.gitignore` or moved a directory.

### Formatting options for docstrings

Instead of the reStructuredText field-list format for docstrings that Sphinx understands by default, e.g.,

```
def repeat(text, copies):
    """
    Repeat given text a given number of times.
    
    :param text: a string
    :param copies: an integer
    :return: string, text concatenated copies times
    """
    return text * copies
```

one can also use NumPy-style docstrings which look like this

```
"""
Repeat given text a given number of times.

Parameters
----------
text : str
    Text to be repeated
copies : int
    Number of repetitions

Returns
-------
str
    Text concatenated copies times.
"""
```

For more on the NumPyDoc format, see
- http://numpydoc.readthedocs.io/en/latest/format.html
- http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html
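A complete function with a NumPy-style docstring might look as follows. This is a sketch: the `Examples` section is an addition to the docstring shown above. Because it uses the `>>>` prompt format, the standard-library `doctest` module can run it as a test.

```python
import doctest

def repeat(text, copies):
    """
    Repeat given text a given number of times.

    Parameters
    ----------
    text : str
        Text to be repeated
    copies : int
        Number of repetitions

    Returns
    -------
    str
        Text concatenated copies times.

    Examples
    --------
    >>> repeat('ab', 3)
    'ababab'
    """
    return text * copies

if __name__ == '__main__':
    doctest.testmod()  # runs the example in the docstring as a test
```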

To work with NumPyDoc docstrings, you need to do the following:
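1. Install the Sphinx NumPyDoc extension with one of
    - `conda install numpydoc`
    - `pip install numpydoc`
1. In `docs/conf.py`, around line 40, add `'sphinx.ext.napoleon'` to the list of `extensions`. (Napoleon ships with Sphinx 1.3 and later and understands NumPy-style docstrings; if you prefer the `numpydoc` extension installed above, add `'numpydoc'` instead.)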
1. In PyCharm, open Preferences, go to `Tools > Python Integrated Tools` and set `Docstring format` to NumPy
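A sketch of what the `extensions` list in `docs/conf.py` might look like afterwards. It assumes `sphinx-quickstart` was run with the `--ext-...` options shown earlier, so your actual file may differ. `napoleon_numpy_docstring` is a Napoleon setting whose default is already `True`; it is shown here only for clarity.

```python
# docs/conf.py (excerpt); actual contents depend on your Sphinx version
extensions = [
    'sphinx.ext.autodoc',    # include docstrings (--ext-autodoc)
    'sphinx.ext.coverage',   # report undocumented objects (--ext-coverage)
    'sphinx.ext.mathjax',    # render math in HTML output (--ext-mathjax)
    'sphinx.ext.viewcode',   # link to highlighted source (--ext-viewcode)
    'sphinx.ext.napoleon',   # parse NumPy-style (and Google-style) docstrings
]

napoleon_numpy_docstring = True  # default value, shown for clarity
```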

### Further documentation on Sphinx

- [Sphinx homepage](http://sphinx-doc.org)
- ["Guided tour" to documenting with Sphinx](http://pythonhosted.org/an_example_pypi_project/sphinx.html)
- [Sphinx tutorial from the Matplotlib folks](http://matplotlib.org/sampledoc/)
- [Alternative themes for Sphinx-generated HTML](http://www.sphinx-doc.org/en/stable/theming.html)