To play slideshow, run this command in the terminal:
```
$ jupyter nbconvert intro-to-testing-no-demo.ipynb --to slides --post serve --SlidesExporter.reveal_theme=night --SlidesExporter.reveal_transition=none
```

# The Role of Testing in Data Science

Jes Ford, PhD

Data Scientist

<img src="img/pytest.png" style="width: 300px;">

# About Me

- Originally from Alaska, have followed the snow all around the western US/Canada
- PhD in Astrophysics from UBC, Vancouver
- Postdoc in Data Science at UW, Seattle
- Moved to UT to snowboard and be a Data Scientist at Backcountry.com $\rightarrow$ now at Recursion
- I love teaching and learning about things including Python
- I organize this PyLadies chapter

<!--- <table><tr>
<td> <img src="https://jesford.github.io/photos/cfht_and_me.jpg" alt="Drawing" style="width: 200px;"/> </td>
<td> <img src="https://jesford.github.io/photos/Jess_Ford_SSS_cbox_BSlide.jpg" alt="Drawing" style="width: 142px;"/> </td>
<td> <img src="https://jesford.github.io/photos/toki_lions.jpg" alt="Drawing" style="width: 315px;"/> </td>
</tr></table> --->

## The plan

In [None]:
def presentation():
    motivate_testing()
    introduce_testing_with_pytest()
    data_science_workflows()
    data_science_example_tests()
    additional_tools_for_testing()
    wrap_up()

## Why talk about testing?

I care about code quality...

meaning clean code\* that gives *correct results*

\* topic for another day: PEP8, code is read much more often than it is written!

## Why test?

- Tests can give you evidence that your code is working as expected
- Tests give you confidence to make changes without fear of breaking something
- Tests make other people trust your code more
- ... however bad tests can give you false confidence

## Why *not* test?

Writing tests takes time!

### The Struggle

As a data scientist I am constantly struggling with these competing goals:

- getting results as quickly as possible
- being as confident as possible that I've got the right answer
  
$\rightarrow$ How do we balance these interests in the optimal way?

This is what my talk is all about. Not just how to write tests, but how to decide when it's worth the time and effort.

## In this talk...


- I will *not* insist that you always write tests

- I will describe different scenarios I find myself in as a data scientist and how I try to be confident that my results are correct

- I will show you how to get started testing 

## Disclaimer

- I am not a testing expert or a software engineer

- These *opinions* are based on my own experience as a data scientist

- "data science" covers a huge range of job duties and formal testing is less important in some of them (one-off analyses vs committing to production code base)

## How do you know if your code is correct??
- manual sanity checks
- defensive programming
- tests

## How do you know if your code is correct??
- manual sanity checks
- defensive programming: **assertions within the code**
- tests

In [1]:
# assertion example
def hello_to_all(list_of_names):
    assert len(list_of_names) > 0, 'There is no one here'
    print('Hello {}!'.format(', '.join(list_of_names)))

In [2]:
hello_to_all(['Parker', 'Missy', 'Ben'])

Hello Parker, Missy, Ben!


In [3]:
hello_to_all([])

AssertionError: There is no one here

Assertions are a careful data scientist's best friend. They are a middle ground: you check for expected behavior with minimal effort! Check that you have no duplicated data or missing values, and that dataframe shapes and column data types are what you expect. If you take nothing else away from this talk, start adding assertions within your code.
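
A minimal sketch of what such checks might look like around a dataframe transformation. The `clean_orders` function and its column names are hypothetical, chosen only for illustration:

```python
import pandas as pd


def clean_orders(df):
    """Hypothetical cleaning step guarded by assertions."""
    # Check assumptions about the input before doing any work
    assert not df.isnull().any().any(), 'Missing values found'
    assert not df.duplicated().any(), 'Duplicated rows found'
    assert df['order'].dtype.kind == 'i', 'order should be an integer column'

    out = df.sort_values('order').reset_index(drop=True)

    # Check that the transformation did not add or drop rows or columns
    assert out.shape == df.shape, 'Shape changed unexpectedly'
    return out


orders = pd.DataFrame({'customer': [1, 4, 3],
                       'order': [3232, 1010, 2050]})
cleaned = clean_orders(orders)
```

If any assumption is violated, the function stops immediately with a message that says which one.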

## Simple test example

In [4]:
def backwards_allcaps(text):
    return text[::-1].upper()

In [5]:
backwards_allcaps('PyData')

'ATADYP'

In [6]:
def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'

## Types of tests

- unit tests
- integration tests
- system tests

## pytest is great

- less boilerplate $\rightarrow$ easier/faster test writing
- automatically handles finding, collecting, running, evaluating your tests
- when tests fail you can get a lot of useful info
- lots of powerful built in features
- just works (with benefits) on existing tests written for unittest or nose

`pip install pytest`

In contrast, unittest requires tests to be wrapped inside classes that subclass `unittest.TestCase`. With pytest you just write functions with plain `assert` statements, which are easier to read and write.
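
To make the comparison concrete, here is a small sketch of the same check written in both styles, using the `backwards_allcaps` function from the earlier example. The class name is an assumption made for illustration:

```python
import unittest


def backwards_allcaps(text):
    return text[::-1].upper()


# unittest style: a TestCase subclass and special assert methods
class TestBackwardsAllcaps(unittest.TestCase):
    def test_backwards_allcaps(self):
        self.assertEqual(backwards_allcaps('python'), 'NOHTYP')


# pytest style: a plain function and a plain assert statement
def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
```

pytest collects and runs both of these, which is why existing unittest suites usually keep working.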


## Where do tests go?

This is a typical directory layout, but it's not required. You can tell pytest where to look for tests, so really you *can* put them pretty much anywhere.

```
myproject/
    myproject/
        myproject.py
        utils.py
        __init__.py
    tests/
        test_myproject.py
    setup.py
    README.md
    LICENSE.txt
```

By default, pytest searches the current directory and everything below it for files named `test_*.py` or `*_test.py`. It then runs the functions whose names start with `test` (like `def test_the_things()`) and the `test`-prefixed methods of classes whose names start with `Test` (like `class TestStuff`).

## When to write tests

- when you write new code
- when you make a change to your code
- when you find a bug

## Test Driven Development

TDD: write the test *before* you write the code it is testing.

- reasons to use TDD:
  - makes you define your code requirements up front
  - helps frame how you'll write the code
  - is more fun
- reasons to not use TDD:
  - not always possible to define requirements up front (data exploration)
  - time constraints (need quick bugfix)

# TDD Demo

### *New feature:* whitespace should be removed from input text

In [7]:
def backwards_allcaps(text):
    return text[::-1].upper()


def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'

### TDD:
1. add a test
2. run the test (it should fail)
3. add the feature
4. run the test

### *New feature:* whitespace should be removed from input text

In [9]:
def backwards_allcaps(text):
    return text[::-1].upper()


def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'


def test_letters_only():
    assert backwards_allcaps('Salt Lake City') == 'YTICEKALTLAS'  # step 1

### *New feature:* whitespace should be removed from input text

In [10]:
def backwards_allcaps(text):
    return text[::-1].replace(' ', '').upper()            # step 2


def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'


def test_letters_only():
    assert backwards_allcaps('Salt Lake City') == 'YTICEKALTLAS'  # step 1

### *Bugfix:* passing an empty string should raise an error

In [11]:
def backwards_allcaps(text):
    return text[::-1].replace(' ', '').upper()


def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'


def test_letters_only():
    assert backwards_allcaps('Salt Lake City') == 'YTICEKALTLAS'

### *Bugfix:* passing an empty string should raise an error

In [12]:
import pytest


def backwards_allcaps(text):
    return text[::-1].replace(' ', '').upper()


def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'


def test_letters_only():
    assert backwards_allcaps('Salt Lake City') == 'YTICEKALTLAS'


def test_bad_string():                                       # step 1
    with pytest.raises(AttributeError):
        backwards_allcaps('')

### *Bugfix:* passing an empty string should raise an error

In [13]:
import pytest


def backwards_allcaps(text):
    if len(text) == 0:
        raise AttributeError('String must contain letters')  # step 2
    return text[::-1].replace(' ', '').upper()


def test_backwards_allcaps():
    assert backwards_allcaps('python') == 'NOHTYP'
    assert backwards_allcaps('meetup') == 'PUTEEM'


def test_letters_only():
    assert backwards_allcaps('Salt Lake City') == 'YTICEKALTLAS'


def test_bad_string():                                       # step 1
    with pytest.raises(AttributeError):
        backwards_allcaps('')

Most of the tests we have written follow the same pattern: assert that our function, run on some input, gives the expected output. Rather than repeating this pattern, we would like a way to avoid duplicating code.

Additionally, each assert statement can be thought of as a unique test, but with the current setup pytest treats each test *function* as a unique test. This means that if the first assert statement in a test function fails, the assertions after it *are not run*.

As an example, run the tests again with the typo below introduced. Try using the verbose flag `pytest -v demo_tdd_intro.py`.

# Fixtures Demo

### Parametrize the data test cases with a fixture
- abstracts the data test cases away from the actual test itself
- allows all assertions to be checked and treated as separate tests

In [14]:
import pytest


def backwards_allcaps(text):
    text = text.replace(' ', '')
    if len(text) == 0:
        raise AttributeError('String must contain letters')
    return text[::-1].upper()


@pytest.fixture(params=[
    {'input': 'python', 'output': 'NOHTYP'},
    {'input': 'meetup', 'output': 'PUTEEM'},
    {'input': 'salt lake', 'output': 'EKALTLAS'}])
def test_data(request):
    return request.param


def test_backwards_allcaps(test_data):
    assert backwards_allcaps(test_data['input']) == test_data['output']


def test_bad_string():
    with pytest.raises(AttributeError):
        backwards_allcaps('')

### That's great, but these examples were dumb and don't apply to data science anyway

# Data Science Domain Problems

- dataframes are the input and output of your functions
- acceptable tolerances on results
- working with databases
- ML models with non-deterministic outcomes
- testing for properties of things rather than exact values
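
As a rough sketch of how tolerances, reproducible randomness, and property checks can look in a test, assuming NumPy is available. The `normalize` function is hypothetical and stands in for whatever numerical code is being tested:

```python
import numpy as np


def normalize(values):
    """Hypothetical function under test: scale values to sum to 1."""
    values = np.asarray(values, dtype=float)
    return values / values.sum()


def test_normalize_within_tolerance():
    result = normalize([0.1, 0.2, 0.3])
    expected = np.array([1 / 6, 1 / 3, 1 / 2])
    # Exact equality (==) is fragile for floats; compare within a tolerance
    np.testing.assert_allclose(result, expected, rtol=1e-9)


def test_normalize_properties():
    # Seed the random generator so the "random" input is reproducible
    rng = np.random.default_rng(seed=42)
    result = normalize(rng.uniform(1, 10, size=100))
    # Assert properties that must hold for any valid input
    assert result.shape == (100,)
    assert np.all(result > 0)
    assert np.isclose(result.sum(), 1.0)
```

The second test never states the exact output values. It only states what must be true of them, which is often the only practical option for non-deterministic code.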

# Data Science Workflows

1. "One-off analysis"
2. Exploratory
3. Well defined problem

<ol start="4">
  <li>Legacy code</li>
</ol>

You inherit a large amount of legacy code written by a predecessor that will need to be maintained and potentially updated over time.

## Data Science Workflows
1. **"One-off analysis"** $\leftarrow$
2. Exploratory
3. Well defined problem
4. Legacy code

### #1 - is it really one-off? I usually do not write tests, but instead focus on clear documentation in case the analysis gets revisited. 

When it *does* get revisited, I'll consider breaking the code out of a notebook and into a module (possibly refactoring) and adding some tests.

## Data Science Workflows
1. "One-off analysis"
2. **Exploratory** $\leftarrow$
3. Well defined problem
4. Legacy code

### #2 - I don't write tests during the exploratory phase. However, if things go well there is almost always code created along the way which is useful in a later stage of the project.

Judgment call needed as my legacy code base grows...

## Data Science Workflows
1. "One-off analysis"
2. Exploratory
3. **Well defined problem**  $\leftarrow$
4. Legacy code

### #3 is simple - if I'm writing code for a fairly well defined problem, which I know will be re-used, I try very hard to write tests as I develop the code.

## Data Science Workflows
1. "One-off analysis"
2. **Exploratory**  $\leftarrow$
3. Well defined problem
4. **Legacy code**  $\leftarrow$

### #4 - the legacy code scenario can be caused by #2 (sometimes #1) or by inheriting code from someone else. Once I realize I will need to reuse code, I try to start adding tests *when I modify it.*

Sometimes I'll have enough time, or deem it crucial enough, to work on getting more test coverage for its own sake. But generally, if I'm confident something is working now, I'll only bother to add tests when I modify it (adding features or fixing bugs).

# Data Science Domain Problems

Examples of tests for common data science problems

## Working with Pandas DataFrames

Checking for duplicates and missing values.

In [15]:
import pandas as pd
import numpy as np


df = pd.DataFrame({'channel': ['email', 'paid_search', 'display', 'email'],
                   'customer': [1, 4, 4, 3],
                   'order': [1010, 2050, 2050, 3232]})
df

|   | channel | customer | order |
|---|---|---|---|
| 0 | email | 1 | 1010 |
| 1 | paid_search | 4 | 2050 |
| 2 | display | 4 | 2050 |
| 3 | email | 3 | 3232 |


In [16]:
assert df.notnull().all().all()
assert not df.isnull().any().any()  # use `not`: `~` on a plain Python bool is always truthy
assert df.isnull().sum().sum() == 0

## Working with Pandas DataFrames

Checking for duplicates and missing values.

In [17]:
df

|   | channel | customer | order |
|---|---|---|---|
| 0 | email | 1 | 1010 |
| 1 | paid_search | 4 | 2050 |
| 2 | display | 4 | 2050 |
| 3 | email | 3 | 3232 |


In [18]:
assert not df.duplicated().any()

In [19]:
if df.duplicated(subset=['order']).any():
    raise ValueError('Duplicate records found for order')

ValueError: Duplicate records found for order

## Working with Pandas DataFrames

Built in utilities that help you test.

In [20]:
# pandas.util.testing is deprecated; the public location is pandas.testing
from pandas.testing import assert_frame_equal
from pandas.testing import assert_index_equal
from pandas.testing import assert_series_equal

In [None]:
assert_frame_equal(df, df2,
                   check_like=True,    # ignore the order of columns/rows
                   check_dtype=False,  # don't require identical data types
                   rtol=1e-4)          # relative tolerance (replaces check_less_precise in pandas >= 1.1)

## Working with Pandas DataFrames

Built in utilities that help you test.

In [21]:
df2 = df.copy()
df2 == df

|   | channel | customer | order |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | True | True |
| 2 | True | True | True |
| 3 | True | True | True |


In [22]:
assert df2 == df

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

## Working with Pandas DataFrames

Built in utilities that help you test.

In [23]:
df2 = df.copy()
df2 == df

|   | channel | customer | order |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | True | True |
| 2 | True | True | True |
| 3 | True | True | True |


In [24]:
assert_frame_equal(df, df2)

## Working with Pandas DataFrames

Built in utilities that help you test.

In [25]:
df2.order = df2.order.astype(float)
df2

|   | channel | customer | order |
|---|---|---|---|
| 0 | email | 1 | 1010.0 |
| 1 | paid_search | 4 | 2050.0 |
| 2 | display | 4 | 2050.0 |
| 3 | email | 3 | 3232.0 |


In [26]:
assert_frame_equal(df, df2)

AssertionError: Attributes are different

Attribute "dtype" are different
[left]:  int64
[right]: float64

## Working with Pandas DataFrames

Built in utilities that help you test.

In [27]:
df2.order = df2.order.astype(float)
df2

|   | channel | customer | order |
|---|---|---|---|
| 0 | email | 1 | 1010.0 |
| 1 | paid_search | 4 | 2050.0 |
| 2 | display | 4 | 2050.0 |
| 3 | email | 3 | 3232.0 |


In [28]:
assert_frame_equal(df, df2, check_dtype=False)

## Working with Pandas DataFrames

Built in utilities that help you test.

In [29]:
df2.loc[0, 'channel'] = np.nan
df1 = df2.copy()
df1

|   | channel | customer | order |
|---|---|---|---|
| 0 | NaN | 1 | 1010.0 |
| 1 | paid_search | 4 | 2050.0 |
| 2 | display | 4 | 2050.0 |
| 3 | email | 3 | 3232.0 |


In [30]:
df1 == df2

|   | channel | customer | order |
|---|---|---|---|
| 0 | False | True | True |
| 1 | True | True | True |
| 2 | True | True | True |
| 3 | True | True | True |


## Working with Pandas DataFrames

Built in utilities that help you test.

In [31]:
df2.loc[0, 'channel'] = np.nan
df1 = df2.copy()
df1

|   | channel | customer | order |
|---|---|---|---|
| 0 | NaN | 1 | 1010.0 |
| 1 | paid_search | 4 | 2050.0 |
| 2 | display | 4 | 2050.0 |
| 3 | email | 3 | 3232.0 |


In [32]:
assert_frame_equal(df1, df2)  # handles NaN or None comparisons "as expected"

## Generating DataFrames for testing

In [None]:
from hypothesis.extra.pandas import data_frames

In [None]:
data_frames?

In [None]:
# insert examples here
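
A minimal sketch of how `data_frames` might be used, assuming Hypothesis and its pandas extra are installed. The function `drop_duplicate_orders` and the column names are hypothetical. Hypothesis generates many small dataframes (including empty ones) and the test asserts properties that should hold for all of them:

```python
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames


def drop_duplicate_orders(df):
    """Hypothetical function under test: keep the first row per order."""
    return df.drop_duplicates(subset=['order']).reset_index(drop=True)


@given(data_frames(columns=[
    column('customer', dtype=int,
           elements=st.integers(min_value=1, max_value=100)),
    # A narrow range of order ids makes duplicates likely
    column('order', dtype=int,
           elements=st.integers(min_value=1000, max_value=1010)),
]))
def test_drop_duplicate_orders(df):
    result = drop_duplicate_orders(df)
    assert result['order'].is_unique
    assert set(result['order']) == set(df['order'])
    assert len(result) <= len(df)
```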

# Working with Databases

## Testing a function that queries the DB

In [None]:
# my_data_loader.py

import pandas as pd
from query_database import query_database  # hypothetical helper: runs SQL, returns a DataFrame


def load_data(condition=''):
    sql_query = f'select id, type, val from some_table {condition}'
    df_raw = query_database(sql_query)
    df = pd.get_dummies(df_raw, columns=['type'])
    df.index = df.pop('id')
    return df

In [None]:
# test_data_loader.py

import pytest
import my_data_loader
from pandas.testing import assert_frame_equal


@pytest.fixture(params=[{'condition': 'where val > 100', 'output': out1}])
def sample_data(request):
    return request.param


def test_load_data(sample_data):
    # problem: we might not want to query the DB as part of our tests
    output = my_data_loader.load_data(sample_data['condition'])
    assert_frame_equal(output, sample_data['output'])

## mocker

pytest-mock is a plugin that lets you patch or swap out one piece of code for another

## Testing a function that queries the DB

In [None]:
# my_data_loader.py

import pandas as pd
from query_database import query_database  # hypothetical helper: runs SQL, returns a DataFrame


def load_data(condition=''):
    sql_query = f'select id, type, val from some_table {condition}'
    df_raw = query_database(sql_query)
    df = pd.get_dummies(df_raw, columns=['type'])
    df.index = df.pop('id')
    return df

In [None]:
# test_data_loader.py

import pytest
import my_data_loader
from pandas.testing import assert_frame_equal


@pytest.fixture(params=[{'input': in1, 'output': out1}])
def sample_data(request):
    return request.param


def test_load_data(sample_data, mocker):
    mocker.patch('my_data_loader.query_database',
                 side_effect=lambda x: sample_data['input'])
    output = my_data_loader.load_data('')
    assert_frame_equal(output, sample_data['output'])
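
The same idea as a single-file sketch that can be run as-is. Several things here are assumptions made for illustration: the stand-in `query_database` function, the fake rows, and the `dtype=int` argument (added so the dummy columns look the same across pandas versions). It also swaps in the standard library's `unittest.mock` instead of pytest-mock. In a real project you would patch by dotted path, as in `mocker.patch('my_data_loader.query_database', ...)`.

```python
from unittest import mock

import pandas as pd
from pandas.testing import assert_frame_equal


def query_database(sql_query):
    """Stand-in for a real database call (hypothetical)."""
    raise RuntimeError('Tests should never reach the real database')


def load_data(condition=''):
    sql_query = f'select id, type, val from some_table {condition}'
    df_raw = query_database(sql_query)
    df = pd.get_dummies(df_raw, columns=['type'], dtype=int)
    df.index = df.pop('id')
    return df


def test_load_data():
    fake_rows = pd.DataFrame({'id': [1, 2],
                              'type': ['a', 'b'],
                              'val': [150, 200]})
    expected = pd.DataFrame({'val': [150, 200],
                             'type_a': [1, 0],
                             'type_b': [0, 1]},
                            index=pd.Index([1, 2], name='id'))
    # Swap the module-level name that load_data looks up for a fake
    with mock.patch.dict(load_data.__globals__,
                         {'query_database': lambda sql: fake_rows}):
        output = load_data('where val > 100')
    assert_frame_equal(output, expected, check_dtype=False)
```

The test now exercises only the transformation logic in `load_data`, and the original `query_database` is restored when the `with` block exits.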

## A few additional pytest features...

## tmpdir

- create temporary directories and files that your tests need
- automatically removed afterwards
- see also tmpdir_factory

In [None]:
# example from pytest documentation

def test_create_file(tmpdir):
    p = tmpdir.mkdir("sub").join("hello.txt")
    p.write("content")
    assert p.read() == "content"
    assert len(tmpdir.listdir()) == 1
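
Newer pytest versions also provide a `tmp_path` fixture, which hands the test a standard `pathlib.Path` instead. A sketch of the same test written that way (an adaptation for illustration, not taken from the pytest documentation):

```python
def test_create_file(tmp_path):
    # tmp_path is a pathlib.Path pointing at a fresh temporary directory
    sub = tmp_path / "sub"
    sub.mkdir()
    p = sub / "hello.txt"
    p.write_text("content")
    assert p.read_text() == "content"
    assert len(list(tmp_path.iterdir())) == 1
```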

## mark

you can organize your tests using marks - there are a few built in ones, and you can create your own

In [None]:
import pytest
import sys


@pytest.mark.skip
def test_always_skip():
    pass


@pytest.mark.skipif(sys.platform == 'darwin',
                    reason='Feature not supported on OS X')
def test_not_on_mac():
    pass


@pytest.mark.slow  # custom mark
def test_that_takes_a_long_time():
    pass

```sh
$ pytest -m slow
```
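
Custom marks such as `slow` generally need to be registered, otherwise recent pytest versions warn about an unknown mark. One way to do this, sketched here under the assumption that the project has a `conftest.py`, is the `pytest_configure` hook:

```python
# conftest.py

def pytest_configure(config):
    # Register the custom "slow" mark so pytest does not warn about it
    config.addinivalue_line('markers',
                            'slow: tests that take a long time to run')
```

Marked tests can also be excluded instead of selected, with `pytest -m "not slow"`.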

## More on fixtures

- we already used fixtures to parametrize our tests
- many other uses
- keeps execution of a test separate from anything else that's needed, so the test function itself is as simple and clear as possible
- can optionally have a scope, so the fixture is only run once per session, module, etc.

In [None]:
# example from pytest documentation

import smtplib
import pytest


@pytest.fixture(scope="module")
def smtp_connection(request):
    smtp_connection = smtplib.SMTP("smtp.gmail.com", 587, timeout=5)

    def fin():
        # teardown smtp_connection
        smtp_connection.close()

    request.addfinalizer(fin)
    return smtp_connection  # provide the fixture value
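
pytest also lets a fixture `yield` its value, with the teardown code placed after the `yield`. A minimal sketch of that style, using an in-memory SQLite database as a stand-in resource (this example and its names are assumptions for illustration, not from the pytest documentation):

```python
import sqlite3

import pytest


@pytest.fixture(scope="module")
def db_connection():
    connection = sqlite3.connect(":memory:")  # setup
    yield connection                          # provide the fixture value
    connection.close()                        # teardown, runs after the tests


def test_select_one(db_connection):
    assert db_connection.execute("select 1").fetchone() == (1,)
```

With `scope="module"`, the connection is created once and shared by every test in the module that requests `db_connection`.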

## parametrize

In [None]:
# example from pytest documentation

import pytest


@pytest.mark.parametrize("x, y", [(0, 1), (2, 3)])
def test_foo(x, y):
    assert x + 1 == y


@pytest.mark.parametrize("x", [0, 1])
@pytest.mark.parametrize("y", [2, 3])
def test_more_foo(x, y):
    pass  # all combinations of x, y

## Wrap up

- data scientists should not *always* write tests
- any reused or shared piece of code should probably be tested
- strive for a balance between speed and confidence in your results

### Some aspects of data science code are really hard to test!

- ML results? probabilistic outcomes?
- Think about testing properties of your data
  - distributions, missing data, expected features and datatypes

## Cool related projects to be aware of

- [pytest-xdist](https://docs.pytest.org/en/3.0.1/xdist.html) plugin for pytest so you can run tests faster in parallel
- [pytest-cov](https://pytest-cov.readthedocs.io/en/latest/index.html) plugin for measuring test coverage
- [engarde](https://engarde.readthedocs.io/en/latest/index.html) for defensive data analysis with pandas
- [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) for property-based testing and testing your code with many inputs and edge cases

## Resources & Credits

- **General testing resources**
  - Andreas Pelme's [Introduction to pytest](https://www.youtube.com/watch?v=LdVJj65ikRY) from EuroPython 2014
  - Mark Vousden's [Python testing](https://www.youtube.com/channel/UCKaKhMyhboLoMwmeF9yxg9w) 3-part series of youtube videos
  - Justin Crown's ["WHAT IS THIS MESS?" - Writing tests for pre-existing code bases](https://www.youtube.com/watch?v=LDdUuoI_lIg) from PyCon 2018
  - Ned Batchelder's [Getting Started Testing](https://www.youtube.com/watch?v=FxSsnHeWQBY) from PyCon 2014 (focuses on unittest)

- **Data Science specific resources**
  - Trey Causey's [Testing for Data Scientists](https://www.youtube.com/watch?v=GEqM9uJi64Q) from PyData Seattle 2015
  - Eric Ma's [Best Testing Practices for Data Science Tutorial](https://www.youtube.com/watch?v=yACtdj1_IxE) from PyCon 2017, with GitHub notebooks [here](https://github.com/ericmjl/data-testing-tutorial)

## Questions?