# Software Engineering

What do programmers do apart from writing code?

- Testing (unit testing, integration testing, acceptance testing)
- Debugging
- Planning (User Stories, Requirements)
- Prototyping (dummy code, pseudocode, executable code)
- Software Design (flowcharts)
- Infrastructure (technology stack, data load)
- System engineering (business processes, use cases)
- Project management (processes, Agile/Scrum, Kanban)
- Documenting (comments, documentation)

|            Program             |      Size     |
|--------------------------------|:-------------:|
|            Jupyter             |    < 10^2     |
| script w/functions or modules  |    < 10^3     |
|       program w/Classes        |    < 10^4     |
| multiple libraries or packages |    < 10^5     |
|1. Don't use python 2. Get help |    ~ 10^6     |


### Project Structure
Where do we put the files?
- **Git** - solves the problem of finding where you left off in your own programming
- **Cookiecutter** - a tool that gives us a project structure where everything is prepared: template files, a test setup, pip-installable libraries, etc.
- **pip** - make our program pip-installable! The tool for that is called 'setuptools' (it replaces the older 'distutils'). Cookiecutter makes this much easier
- **Sphinx** - used to build documentation

### Testing
How can we make sure the project is actually working?
- **PEP8** - about coding style! Keeping your construction site clean, mowing the lawn! We use Pylint to check our code for whitespace etc.
- **pytest** - you need to specify what you want to test, e.g. write a test that automatically checks whether a function works. A really great way to show that you're thinking about your code
- **Travis** - performs **'Continuous Integration'**, which adds an extra dimension: every time someone changes the code, it is tested before it is incorporated into the programming pipeline!

| problem | bad solution | good for small projects | good for big projects |
|---------|--------------|-------------------------|-------------|
| access old versions of the code | copy folders | version control / git | git + Jira, Confluence |
| two projects with different requirements | buy two computers | virtual environments | docker |
| the program is easy to install | install manually, long recipe | installation script, distutils | make, Maven, Cookiecutter |
| the code is easy to read | comment everything or nothing | PEP8, pylint | pylint+isort, Continuous Integration (TravisCI) |
| existing features do not break | run everything manually | asserts, pytest, coverage | Pytest + TravisCI |
| the program works correctly | try one example | code reviews | code reviews |
| the code runs quickly | trial and error | buy a GPU, %timeit | cProfile |
| users know how to use the program | no documentation | README file | Sphinx, MkDocs |
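
The "code runs quickly" row can be made concrete with the standard library. The following is a minimal sketch, assuming a made-up workload `slow_sum`; it uses the `timeit` module (the plain-Python counterpart of the `%timeit` magic) and `cProfile`:

```python
import cProfile
import io
import pstats
import timeit


def slow_sum(n):
    """Hypothetical example workload: sum of squares with a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# timeit: run the call 100 times and report the total time in seconds
seconds = timeit.timeit(lambda: slow_sum(10_000), number=100)
print(f"100 runs took {seconds:.4f} s")

# cProfile: record how much time is spent in each function
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# collect the five most expensive entries, sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

`timeit` answers "how long does this take?", while `cProfile` answers "where is the time spent?", which is why the latter scales better to big projects.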

## Building Python Packages

### pip installation
- Save all libraries used to a file:

`pip freeze > requirements.txt`

- Install them:

`pip install -r requirements.txt`

### Cookiecutter
Cookiecutter creates a project skeleton with a pre-configured infrastructure that makes building and maintaining your software a lot easier.

- First install Cookiecutter:

`pip install cookiecutter`

- Then you can create a new project by cloning a project template:

`cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git`

And follow the dialog.

### Adding your source code
- Edit your source code in the mypackage/ subdirectory. The setup.py script will find it there.

**Note: The setup.cfg file contains most settings of your project (names, URLs, etc.).**

### Using setup.py
- **setuptools** is a Python library that builds and installs Python packages. **Cookiecutter** prepares everything you need to use setuptools.

In your project directory, you have the following commands:

- Install the project in editable mode:

`pip install -e .`

OR

`python setup.py develop`

- Run the tests:

`python setup.py test`

- Install the program locally:

`pip install .`

OR

`python setup.py install`

- Create a release file with everything to run the program:

`python setup.py build`

- Create a release file with everything to develop:

`python setup.py sdist`

### README-File

The README-file (.md or .txt) should contain at least 4 pieces of information:

1. What does the program do? (short description in plain language)
2. How to use it? (step-by-step instructions)
3. Who owns it? (license, references etc.)
4. Who to ask? (contact information, links to base repos etc.)

### Building a command-line interface
1. Write a function
2. Reference the function in setup.py (uncomment the `[console_scripts]` section from the pyscaffold template and edit the line below)
3. Re-install the program with `pip install .` and the program should become available as a command-line tool.
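
The three steps can be sketched as follows. This is only an illustration: the module path `mypackage/cli.py`, the function name `main` and the command name `addnumbers` are assumptions, not part of the template.

```python
import argparse


def main(argv=None):
    """Entry point: add the given numbers and print the result."""
    parser = argparse.ArgumentParser(description="Add numbers.")
    parser.add_argument("numbers", type=float, nargs="+", help="numbers to add")
    args = parser.parse_args(argv)  # argv=None means: read sys.argv
    print(sum(args.numbers))
    return 0  # exit code 0 signals success


# Assumed entry in setup.cfg that points setuptools at the function:
#
# [options.entry_points]
# console_scripts =
#     addnumbers = mypackage.cli:main
```

After re-installing, `addnumbers 1 2.5` on the command line would call `main()`. Accepting an optional `argv` keeps the function easy to call from tests.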

You can also find modules to build GUI and web interfaces!

### Releasing your program on pip
Follow the instructions in the tutorial "Packaging Python Projects" (part of the Python Packaging User Guide).

## PEP8

- PEP8 is a standard for formatting Python code. Adhering to PEP8 makes your code easier for others to read and helps to discover some bugs early.

### pylint
The pylint tool is a **linter**: it checks your code against PEP8, outputs suggestions and gives a score with a maximum of 10.

`pylint my_program.py`

### Configuration
In the file **.pylintrc**, you can configure/disable some of the rules. E.g. to switch off checks on variable names, add:

    [pylint]
    disable=C0103
    
**Note: PEP8 is a guideline, not a lawbook. You are encouraged to ignore rules that impede your work. Some teams use Git Hooks to run their linters automatically, whenever somebody pushes to the repo.**
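
A Git hook is just an executable file in `.git/hooks/`, so it can be written in Python. The sketch below assumes a client-side `pre-push` hook and a package called `mypackage`; both the check list and the helper name `run_checks` are illustrative choices.

```python
#!/usr/bin/env python3
"""Sketch of a Git hook, e.g. saved as .git/hooks/pre-push and made executable."""
import subprocess
import sys

# Assumption: the commands to run; adjust to your project
CHECKS = [
    [sys.executable, "-m", "pylint", "mypackage"],
]


def run_checks(checks):
    """Run each command; return 0 if all succeed, else the first failing exit code."""
    for command in checks:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return completed.returncode
    return 0


# In the real hook file, finish with:
#     sys.exit(run_checks(CHECKS))
# Git aborts the push when the hook exits with a non-zero code.
```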

### Code style in Jupyter

- Style checks in Jupyter are enabled by installing:

`pip install pycodestyle_magic`

- At the start of the notebook, add:

`%load_ext pycodestyle_magic`

- And in each cell you would like to check:

`%%pycodestyle`

## Automated Testing

| concept | description |
|---------|-------------|
| unit tests | tests for small units of code (functions, classes) | 
| integration tests | tests testing multiple components together | 
| regression tests | repeating tests after changes to the code |
| border cases | extreme situations that are tested |
| `with pytest.raises()` | used to test whether an exception occurs |
| fixtures | functions that prepare test data | 
| test parametrization | automatic generation of tests from data | 
| conftest.py | file where you can write pytest fixtures | 
| test coverage | percentage of lines executed by tests | 

- Complete Python Testing Tutorial: 

https://krother.gitbooks.io/python-testing-tutorial/content/

### Examples
### Testing for exceptions

    import pytest

    def test_error():
        # indexing past the end of the list must raise an IndexError
        data = [5, 4, 3]
        with pytest.raises(IndexError):
            print(data[7])

    test_error()

### Writing a fixture


    @pytest.fixture
    def primes():
        return [3, 5, 7, 11, 13]

Put the fixture in `conftest.py`; pytest will find it there automatically.

Use it in a test function anywhere:

    def test_odd_primes(primes):
        for p in primes:
            assert p % 2 != 0

### Writing a parameterized test
This generates many tests with different parameters:

    @pytest.mark.parametrize('prime', [3, 5, 7, 11, 13])
    def test_odd_primes(prime):
        assert prime % 2 != 0

### Calculating Test coverage

`pip install pytest-cov`

`pytest --cov=mypackage`

`coverage html`
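
To illustrate what "percentage of lines executed by tests" means, here is a toy sketch built on `sys.settrace`. It is not how you would use pytest-cov in practice; the function `classify` and the helper `executed_lines` are made up for this example.

```python
import sys


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


def executed_lines(func, *args):
    """Return line offsets (relative to the def line) executed by func(*args)."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # only record 'line' events that belong to the function under test
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return seen


body_lines = {1, 2, 3}                        # the three statements inside classify
one_test = executed_lines(classify, 5)        # only the non-negative branch
two_tests = one_test | executed_lines(classify, -1)

print(f"one test:  {len(one_test) / len(body_lines):.0%}")
print(f"two tests: {len(two_tests) / len(body_lines):.0%}")
```

With a single test the `return "negative"` line never runs, so coverage stays below 100%. A second test for the other branch closes the gap.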


### Data Validation
- It is essential to validate input or output data of your program to make sure whatever you do makes sense.

#### Example
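We load the “boston” data set and examine the age column in more detail. Note that `load_boston` was removed in scikit-learn 1.2, so this example needs an older version:

    from sklearn.datasets import load_boston

    X, y = load_boston(return_X_y=True)

    # column 6 holds the AGE feature
    age = X[:, 6]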

#### Range checks with assert
Our first attempt is to check whether ages are in a valid range:


    def validate_between(x, low, high):
        assert low <= x <= high

The `partial` function allows us to set the boundaries and obtain a new function (can be used via `apply` in pandas DataFrames):


    from functools import partial

    validate_age = partial(validate_between, low=0.0, high=120.0)

Now we can check values with it:

    validate_age(5)
    validate_age(55)
    validate_age(125)  # raises an AssertionError

#### Range checks with Exceptions
The `assert` statement does not tell us what is really going on. A better alternative is to write your own Exception class:

    class ValidationError(Exception):
        pass

    def validate_between(x, low, high):
        if not (low <= x <= high):
            raise ValidationError(f"Value {x} not between {low} and {high}!")

And we can use `partial()` as above:

    validate_age = partial(validate_between, low=0.0, high=120.0)

Now we can validate the entire column:

    b = [validate_age(a) for a in age]
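
Validating a whole column this way stops at the first bad value. If you would rather see every problem at once, one option is to catch the exception per value. The sketch below assumes a hypothetical helper `find_invalid` and made-up sample values instead of the real data set:

```python
from functools import partial


class ValidationError(Exception):
    pass


def validate_between(x, low, high):
    if not (low <= x <= high):
        raise ValidationError(f"Value {x} not between {low} and {high}!")


def find_invalid(values, validator):
    """Hypothetical helper: return (index, message) for every failing value."""
    problems = []
    for i, value in enumerate(values):
        try:
            validator(value)
        except ValidationError as err:
            problems.append((i, str(err)))
    return problems


validate_age = partial(validate_between, low=0.0, high=120.0)
sample = [5, 55, 125, -3]  # made-up values for illustration
for index, message in find_invalid(sample, validate_age):
    print(index, message)
```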

#### Validating types
In a similar manner, you can validate the type of variables:

    def validate_type(x, typelist):
        if type(x) not in typelist:
            raise ValidationError(f"Wrong type {type(x)}, not in {typelist}")

And we can match builtin, NumPy or any other type:

    import numpy as np

    validate_number = partial(validate_type,
        typelist=[float, int, np.float32, np.float64, np.int32])

    validate_number(8.8)
    validate_number("8")  # raises a ValidationError

The second call fails with:

    ValidationError: Wrong type <class 'str'>, not in [<class 'float'>, <class 'int'>, <class 'numpy.float32'>, <class 'numpy.float64'>, <class 'numpy.int32'>]

#### Validating categories
Matching categories to a list of values is also possible (although `Series.unique()` in pandas probably does a better job):

    def validate_category(x, values):
        # compare the value itself (not its type) against the allowed values
        if x not in values:
            raise ValidationError(f"Wrong value for category {x}, not in {values}")

## Documentation with Sphinx
This is a step-by-step guide to using the Python documentation tool Sphinx.

**What is Sphinx?**

- It is a Python package / tool that allows you to create beautiful and intelligent documentation.

**Features:**

- The documentation is written in a markup language called reStructuredText, or RST for short.
- It renders your RST code into various formats, including:
    - HTML (including Windows HTML Help)
    - LaTeX (for printable PDF versions)
    - ePub
    - Texinfo
    - manual pages
    - plain text
    
- It allows for cross-referencing, adding hyperlinks, creating document trees, tables of content, etc.

In short, it lets you write nice, interactive, fancy documentation / web pages without needing to use HTML, CSS, or JavaScript. Consider this tool if you’re interested in going beyond README.md files and creating well-structured documentation for your Python software development project.
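
RST also shows up inside Python docstrings. The sketch below assumes you enable Sphinx's `sphinx.ext.autodoc` extension, which can pull docstrings into the generated pages; the function itself is only an illustration.

```python
def validate_between(x, low, high):
    """Check that a value lies within a closed interval.

    :param x: the value to check
    :param low: smallest allowed value
    :param high: largest allowed value
    :returns: ``True`` if ``low <= x <= high``
    :raises ValueError: if ``x`` is outside the interval
    """
    if not (low <= x <= high):
        raise ValueError(f"Value {x} not between {low} and {high}!")
    return True
```

The `:param:`, `:returns:` and `:raises:` fields are RST field lists that Sphinx renders as a structured parameter description.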

## Travis CI
**Travis is a Continuous Integration tool that lets you automate checks on your code (pylint, pytest, etc.) for projects hosted on GitHub.**

### Steps

1. Get an account on https://travis-ci.org

2. Tell Travis about your GitHub project

3. Authorize GitHub to talk to Travis

4. Create a `.travis.yml` file

5. Change something in your code, commit and push

6. Add a build status button (badge) to your README.md file.


### Example Travis configuration

            language: python
            before_script:
                - sleep 3

            python:
                - "3.6"

            install:
                - pip install -r requirements.txt
                - pip install pylint
                - pip install --editable .
                
            script:
                - pylint myprogram.py