# `py.test`

> The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries.

Features:

- All test functions are prefixed with `test_`.
- All files containing tests are prefixed with `test_`.
- **Testing made for humans.**

## Toy example: increment

- Create a Python module called `datafuncs.py`. We will be adding functions to this module as we go along.
- Inside `datafuncs.py`, write a function named `increment(x)`, which increments `x` by 1 and returns the result.

```python
# in datafuncs.py
def increment(x):
    return x + 1
```

## Exercise

- Create a new Python module called `test_datafuncs.py`. We will be adding tests to this module as we go along.
- Inside `test_datafuncs.py`, write the following test for the `increment(x)` function.

```python
# in test_datafuncs.py
import datafuncs as dfn
def test_increment():
    assert dfn.increment(2) != 3
```

Now, in your terminal, execute the following command:

```bash
$ py.test
```


```bash
============================= test session starts ==============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ericmjl/github/tutorials/data-testing-tutorial, inifile:
collected 1 items

test_datafuncs.py F
```

```bash
=================================== FAILURES ===================================
________________________________ test_increment ________________________________

    def test_increment():
>       assert dfn.increment(2) != 3
E       assert 3 != 3
E        +  where 3 = <function increment at 0x10eaf7378>(2)
E        +    where <function increment at 0x10eaf7378> = dfn.increment

test_datafuncs.py:3: AssertionError
=========================== 1 failed in 0.06 seconds ===========================
```

Congratulations! You wrote your first failed test! With py.test, you have a command that automatically finds tests, executes them, and reports where they fail.

Questions so far?

Now, go fix the test such that it works correctly.

```python
def test_increment():
    assert dfn.increment(2) == 3
```

And then re-run that test.

```bash
$ py.test
```

```bash
============================= test session starts ==============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ericmjl/github/tutorials/data-testing-tutorial, inifile:
collected 1 items

test_datafuncs.py .

=========================== 1 passed in 0.02 seconds ===========================
```

## Anatomy of a Test

Let's now review the anatomy of a test.

```python
from module import function
def test_function():  # `test_` is key here.
    assert function(input_val) == correct_val    # assertion statement
    assert function(input_val) != incorrect_val  # counter-example
```

And now, the testing loop:

1. Write a test for a function.
1. Write the function.
1. Execute `pytest`.
1. Go back to step 1.

There's nothing complex behind the idea of testing; about 80% of your cases will boil down to this loop.
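As a minimal sketch of one pass through this loop, here is a test written before the function it checks. The `decrement` function is hypothetical and only serves as an illustration; in practice the test would live in `test_datafuncs.py`, the function in `datafuncs.py`, and `pytest` would call the test for you.

```python
# Step 1: write the test first. `decrement` is a hypothetical function
# used only for this illustration.
def test_decrement():
    assert decrement(3) == 2   # expected value
    assert decrement(3) != 3   # counter-example


# Step 2: write the function so that the test passes.
def decrement(x):
    return x - 1


# Step 3: pytest would normally collect and call test_decrement();
# calling it by hand here only shows that the assertions hold.
test_decrement()
```

If the function is missing or wrong, the call in step 3 raises an error, which is the signal to go back and fix the code.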

## Exercise: Min-Max Scaler

In `datafuncs.py`, implement a function called `min_max_scaler(x)` for your data. It should take in a `numpy` array and scale all of the values to be between 0 and 1 inclusive. The min value should be 0, and the max value should be 1.

```python
# in datafuncs.py
def min_max_scaler(x):
    """
    Returns a numpy array with all of the original values scaled between 0 and 1.

    Assumes the data are a numpy array.
    """
    return (x - x.min()) / (x.max() - x.min())
```

```python
import numpy as np

arr = np.arange(1, 10)
min_max_scaler(arr)
```

```
array([ 0.   ,  0.125,  0.25 ,  0.375,  0.5  ,  0.625,  0.75 ,  0.875,  1.   ])
```

Now, write the following tests for the `min_max_scaler(x)` function:

- Given a particular test input, the output should be equal to some other array. Use the `np.allclose(arr1, arr2)` function.
- The values should fall between 0 and 1.
- The test should fail if I pass in an integer.
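The first test asks for `np.allclose` rather than `==`. The short sketch below shows why: comparing two arrays with `==` gives an array of booleans, which `assert` cannot reduce to a single true/false value. The variable names are illustrative only.

```python
import numpy as np

expected = np.array([0.0, 0.5, 1.0])
result = (np.array([1, 2, 3]) - 1) / 2  # same values, computed

# Element-wise comparison gives an array of booleans, not one bool.
elementwise = result == expected

# Using that array directly in `assert` is ambiguous, so NumPy raises.
try:
    assert result == expected
    ambiguous = False
except ValueError:
    ambiguous = True

# np.allclose collapses the comparison to a single value, within a tolerance.
close = np.allclose(result, expected)
```

The tolerance also protects the test against small floating-point rounding differences.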

```python
# in test_datafuncs.py
import numpy as np

def test_min_max_scaler():
    arr = np.array([1, 2, 3])  # set up the test with necessary variables.
    tfm = dfn.min_max_scaler(arr)  # collect the result into a variable
    assert np.allclose(tfm, np.array([0, 0.5, 1]))  # assertion statements
    assert tfm.min() == 0
    assert tfm.max() == 1

    ## how do we test that a function should fail...?!?!
```

## Testing that a function should fail

Use the `pytest.raises(ErrorType)` context manager. The test passes only if the code inside the `with` block raises that error.

```python
import pytest
import numpy as np

def test_min_max_scaler():
    arr = np.array([1, 2, 3])  # set up the test with necessary variables.
    tfm = dfn.min_max_scaler(arr)  # collect the result into a variable
    # Correctness tests
    assert np.allclose(tfm, np.array([0, 0.5, 1]))  # assertion statements
    assert tfm.min() == 0  
    assert tfm.max() == 1

    # min_max_scaler(x) should fail if an integer is passed in.
    with pytest.raises(AttributeError):  
        dfn.min_max_scaler(2)
```
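The expected error is `AttributeError` because a plain integer has no `.min()` method, and calling `.min()` is the first thing `min_max_scaler` does with its argument. A rough plain-Python sketch of that check, using a hypothetical helper `call_min`:

```python
def call_min(x):
    # Mirrors the first thing min_max_scaler does with its argument.
    return x.min()


try:
    call_min(2)          # an int has no .min() method
    error_name = None    # reached only if no error was raised
except AttributeError as err:
    error_name = type(err).__name__
```

`pytest.raises` performs a similar check for you and fails the test when the block raises nothing.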

## Step back

- With tests, you're basically encoding your expectations of a function in code.
- The tests you've written so far might not necessarily cover all cases, but they can cover the 80% of failures that might happen.
- There are more powerful ways to write tests, which we will cover later.

## Exercise: Testing functions on textual data

Imagine we have textual data, and we want to clean it up. There are two functions we may want to write to standardize the data:

- `bag_of_words(text)`, which takes in the text and tokenizes the text into its set of constituent words.
- `strip_punctuation(text)`, which strips punctuation from the text.

Implement the two functions in `datafuncs.py`; you may wish to write additional helper functions to manage the business logic. There's leeway in this exercise; feel free to get creative!

```python
# in datafuncs.py
import string

def strip_punctuation(text):
    exclude = string.punctuation
    return ''.join(s for s in text if s not in exclude)

def bag_of_words(text):
    text = strip_punctuation(text)
    words = set(text.split(' '))
    return words
```

```python
t = "hello world! This is my pleasure, and the 2nd time I've been to PyCon!"
bag_of_words(t)
```

```
{'2nd',
 'Ive',
 'PyCon',
 'This',
 'and',
 'been',
 'hello',
 'is',
 'my',
 'pleasure',
 'the',
 'time',
 'to',
 'world'}
```

Now, implement the tests. This time round, for the sake of time, only implement the assertion test, without doing a counter-example. 

```python
# in test_datafuncs.py
import string

def test_strip_punctuation():
    text = 'random. stuff; typed, in-to th`is text^line'
    t = dfn.strip_punctuation(text)

    # No punctuation character should survive.
    assert set(t).isdisjoint(string.punctuation)

def test_bag_of_words():
    text = 'random stuff typed into this text line line'
    text_bagged = dfn.bag_of_words(text)
    assert len(text_bagged) == 7  # 'line' appears twice but is counted once
    assert ' ' not in text_bagged
```

Finally, run pytest!

```bash
$ py.test
============================= test session starts ==============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ericmjl/github/tutorials/data-testing-tutorial, inifile:
collected 3 items

test_datafuncs.py ...

=========================== 3 passed in 0.03 seconds ===========================
```
