
> Testing

*This notebook covers some key concepts of testing in Python from a data scientist's perspective.*

!["One does not simply test in production"](https://www.flagship.io/wp-content/uploads/meme-one-does-not-simply-test-in-production-768x453.jpg)

*It was inspired by some great talks available online, which are listed under Resources at the end.*




# Libraries for testing in Python.

In [34]:
!pip -q install engarde
!pip -q install pytest
!pip -q install hypothesis

In [61]:
!pip -q install bulwark



# Introduction
![Tests everywhere](https://mailtrap.io/wp-content/uploads/2020/06/testing_meme5.jpeg)



## Why test?
* Tests are the best way we know to check that code works.
* Testing helps you find bugs and check your assumptions.
* Tests give other people confidence in your code, provided they are **automated, fast, reliable, informative and focused**.
* Testing pushes you toward simpler code, because code that is easy to test tends to be well designed.
* Debugging is hard; testing is easy.



## When and what to test?
* When you change code, add a test.
* Test the outcome, not the implementation.
* When you find a bug, add a test.
* Use tests to help identify complexity: code that is hard to test is often too complex.
* Don't test code that is already tested, such as code from well-tested libraries.


## Types of tests
* **Unit tests**: Test one unit of code, such as a function that has no dependencies on other code you have written.
* **Regression tests**: Check that a bug you fixed stays fixed.
* **Integration tests**: Check that components work well together.
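
As a rough illustration of the first two categories, the sketch below uses a hypothetical `safe_mean` helper (it is not part of this notebook's code, and the bug history in the comments is imagined). One test checks the normal behaviour of a single function; the other pins down a previously fixed bug so it cannot silently return.

```python
def safe_mean(values):
    """Return the arithmetic mean, or None for an empty sequence.

    Hypothetical helper used only for this illustration.
    """
    if not values:  # guard added after the (imagined) bug report
        return None
    return sum(values) / len(values)


def test_safe_mean_unit():
    # Unit test: one function, no dependencies on other project code.
    assert safe_mean([1, 2, 3, 4, 5]) == 3


def test_safe_mean_regression_empty_input():
    # Regression test: an empty list used to raise ZeroDivisionError.
    assert safe_mean([]) is None


test_safe_mean_unit()
test_safe_mean_regression_empty_input()
```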






## Test isolation
* Keep tests independent of each other.
* Every test gets a new test object.
* Tests can't affect each other, and a failure in one test doesn't stop the next tests from running.

## Test-driven development?
* Write a failing test first, then fix the code until the test passes.




## What does testing mean for a data scientist?

![Test won't fail if you don't write tests](https://i1.wp.com/mlinproduction.com/wp-content/uploads/2020/05/tests-cant-fail.png?w=700&ssl=1)


* Testing for data science can be a little different because deterministic answers often do not exist for your problem. You get probabilistic answers, and a test may pass only because you wrote the code so that the test passes.

* Better approaches are to test properties rather than specific values, to make assumptions about data shape and type explicit, and to test probabilistically.
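
The idea of testing properties instead of specific values can be sketched as follows. The `standardize` function is a hypothetical preprocessing step, not something defined in this notebook, and only the standard library is used so the example stays self-contained. The test never hard-codes an expected output; it checks the length, the element type, and two properties that should hold for any input.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Hypothetical preprocessing step used only for this illustration.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def test_standardize_properties():
    rng = random.Random(0)  # fixed seed keeps the test reproducible
    data = [rng.gauss(50, 10) for _ in range(1000)]
    out = standardize(data)

    # Shape and type assumptions.
    assert len(out) == len(data)
    assert all(isinstance(v, float) for v in out)
    # Properties that hold for any input, not one hard-coded answer.
    assert math.isclose(statistics.fmean(out), 0.0, abs_tol=1e-9)
    assert math.isclose(statistics.pstdev(out), 1.0, rel_tol=1e-9)


test_standardize_properties()
```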
 


#  Frameworks for testing


If we want more robust tests, we can use a testing framework.

## [Unittest](https://docs.python.org/3/library/unittest.html)

* The unittest unit testing framework was originally inspired by JUnit and has a similar flavor to major unit testing frameworks in other languages.
* Instead of using a bare **assert**, we can use **assert helpers**. These methods print the expected and actual values in the message when a test fails.

| Lots of assert helpers | |
| --- | --- |
| assertEqual(first, second) | assertNotEqual(first, second) |
| assertTrue(expr) | assertFalse(expr) |
| assertIn(first, second) | assertNotIn(first, second) |
| assertIs(first, second) | assertIsNot(first, second) |
| assertAlmostEqual(first, second) | assertGreater(first, second) |
| assertLess(first, second) | assertRaises(exc_class, func, ...) |
| assertCountEqual(first, second) | etc. |
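
A short sketch of a few of these helpers in use. The `AssertHelpersDemo` class is hypothetical; the helper methods themselves are part of `unittest.TestCase` in Python 3, where `assertCountEqual` is the order-insensitive comparison.

```python
import unittest


class AssertHelpersDemo(unittest.TestCase):
    def test_almost_equal(self):
        # Compares to 7 decimal places by default, which suits float arithmetic.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_membership(self):
        self.assertIn("IBM", ["IBM", "HPQ"])

    def test_raises(self):
        # Passes only if the block raises the named exception.
        with self.assertRaises(ZeroDivisionError):
            1 / 0

    def test_count_equal(self):
        # Same elements, any order.
        self.assertCountEqual([3, 1, 2], [1, 2, 3])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertHelpersDemo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running a suite through `TextTestRunner` like this is one way to execute `unittest` tests from inside a notebook cell instead of from the command line.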


In [64]:
# portfolio.py
class Portofolio:
  """A simple stock portfolio."""

  def __init__(self):
    self.stocks = []

  def buy(self, name, shares, price):
    """Buy `shares` shares of `name` at `price` each."""
    self.stocks.append([name, shares, price])

  def cost(self):
    """Return the total cost of this portfolio."""
    amt = 0.0
    for name, shares, price in self.stocks:
      amt += shares * price
    return amt

In [66]:
import unittest

# test_portfolio.py
class PortfolioTest(unittest.TestCase):
  def test_empty(self):
    p = Portofolio()
    # Plain-assert equivalent: assert p.cost() == 0.0
    self.assertEqual(p.cost(), 0.0)

  def test_buy_one_stock(self):
    p = Portofolio()
    p.buy("IBM", 100, 176.48)
    # assertAlmostEqual avoids spurious failures from float rounding.
    self.assertAlmostEqual(p.cost(), 17648.0)

  def test_buy_two_stocks(self):
    p = Portofolio()
    p.buy("IBM", 100, 176.48)
    p.buy("HPQ", 100, 36.15)
    self.assertAlmostEqual(p.cost(), 21263.0)

# Execute from a shell:
# $ python -m unittest test_portfolio

You can also implement your own base class with domain-specific assert helpers.

In [None]:
# test_portfolio2.py
class PortfolioTestCase(unittest.TestCase):
  """Base class providing a portfolio-specific assert helper."""

  def assertCostEqual(self, p, cost):
    # Compare with a float tolerance rather than exact equality.
    self.assertAlmostEqual(p.cost(), cost)

class PortfolioTest(PortfolioTestCase):
  def test_empty(self):
    p = Portofolio()
    self.assertCostEqual(p, 0.0)

  def test_buy_one_stock(self):
    p = Portofolio()
    p.buy("IBM", 100, 176.48)
    self.assertCostEqual(p, 17648.0)

  def test_buy_two_stocks(self):
    p = Portofolio()
    p.buy("IBM", 100, 176.48)
    p.buy("HPQ", 100, 36.15)
    self.assertCostEqual(p, 21263.0)


## [pytest](https://docs.pytest.org/en/7.2.x/)

* Less boilerplate
* Fewer classes
* Gets you testing quickly
* Easy-to-interpret errors


In [2]:
import pytest

# unit code
def mean(values):
  """Calculate the mean."""
  return sum(values) / len(values)

# unit test implemented with pytest
def test_mean():
  assert mean([1, 2, 3, 4, 5]) == 3
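
A slightly fuller sketch of what pytest offers beyond a bare `assert`, assuming pytest is installed. It redefines `mean` so the block is self-contained; the test names and input cases are illustrative choices. `pytest.mark.parametrize` runs one test function over several cases, `pytest.approx` compares floats with a tolerance, and `pytest.raises` checks that an exception is raised.

```python
import pytest


def mean(values):
    """Calculate the mean (same definition as in the cell above)."""
    return sum(values) / len(values)


@pytest.mark.parametrize(
    "values, expected",
    [
        ([1, 2, 3, 4, 5], 3),
        ([10], 10),
        ([0.1, 0.2, 0.3], 0.2),
    ],
)
def test_mean_known_values(values, expected):
    # pytest.approx tolerates tiny floating-point differences.
    assert mean(values) == pytest.approx(expected)


def test_mean_empty_raises():
    # An empty list has no mean; this implementation divides by zero.
    with pytest.raises(ZeroDivisionError):
        mean([])
```

Saved in a file whose name starts with `test_`, these tests are collected and run by invoking `pytest` from a shell.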

## [Engarde](https://github.com/engarde-dev/engarde)
* For "defensive" data analysis when data are messy. 
* Great for ETL on changing data.

In [4]:
import pandas as pd
import numpy as np
from engarde.decorators import none_missing, unique_index, is_shape

# Check: the returned DataFrame must have shape (3, 2) and no missing values
@is_shape((3, 2))
@none_missing()
def test_nan_and_shape(df):
  return df

In [5]:
# Example OK: the check should pass because the shape is (3, 2) and there are no NaN values
d = {'name': ['Mary', 'Paul', 'James'], 'age': [18, 19, 20]}
df_OK = pd.DataFrame(data=d)

test_nan_and_shape(df_OK)

Unnamed: 0,name,age
0,Mary,18
1,Paul,19
2,James,20


In [None]:
# Example KO: the check should fail with an AssertionError because there is a NaN value
d = {'name': ['Mary', 'Paul', 'James'], 'age': [18, 19, np.nan]}
df_KO = pd.DataFrame(data=d)

test_nan_and_shape(df_KO)

## [Bulwark](https://github.com/zaxr/bulwark)

Bulwark is a package for convenient property-based testing of pandas dataframes.  Bulwark's goal is to let you check that your data meets your assumptions of what it should look like at any (and every) step in your code, without making you work too hard.


In [62]:
import bulwark.checks as ck
import bulwark.decorators as dc
import numpy as np
import pandas as pd

def len_longer_than(df, l):
  if len(df) <= l:
    raise AssertionError("df is not as long as expected.")
  return df

@dc.CustomCheck(len_longer_than, 10, enabled=False)
def append_a_df(df, df2):
  # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
  return pd.concat([df, df2], ignore_index=True)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, np.nan, 3, 4], "b": [4, 5, 6, 7]})

append_a_df(df, df2)  # doesn't fail because the check is disabled

Unnamed: 0,a,b
0,1.0,4
1,2.0,5
2,3.0,6
3,1.0,4
4,,5
5,3.0,6
6,4.0,7


## [Hypothesis](https://hypothesis.readthedocs.io/en/latest/)
* Property-based testing inspired by Haskell's QuickCheck.
* Test data is generated randomly according to a specification.
* Ideal for code that will accept input from the wild.
* Works with existing testing frameworks such as pytest.
* Works with Faker.

In [48]:
from hypothesis import given
import hypothesis.strategies as st

@given(st.integers(), st.integers())
def test_ints_are_commutative(x, y):
    assert x + y == y + x

test_ints_are_commutative()

In [60]:
from hypothesis import given
import hypothesis.strategies as st

@given(st.lists(st.integers()))
def test_mean(values):
  print(values)
  assert mean(values) == sum(values)/len(values)

# test_mean fails because mean() does not handle an empty list: len(values) is 0, so it raises ZeroDivisionError.
test_mean()
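
One possible way to get a passing and more meaningful property test, sketched under two assumptions that are not in the original cell: empty lists are excluded with `min_size=1`, and the integers are bounded so that every value is exactly representable as a float. The property checked is that the mean always lies between the smallest and largest value.

```python
from hypothesis import given
import hypothesis.strategies as st


def mean(values):
    """Calculate the mean (same definition as earlier in the notebook)."""
    return sum(values) / len(values)


# Assumed bounds: they keep values exactly representable as floats,
# so the comparison in the test is reliable.
bounded_ints = st.integers(min_value=-10**6, max_value=10**6)


# min_size=1 excludes the empty list, which has no mean.
@given(st.lists(bounded_ints, min_size=1))
def test_mean_is_between_min_and_max(values):
    assert min(values) <= mean(values) <= max(values)


test_mean_is_between_min_and_max()
```

Unlike comparing `mean(values)` with the same formula it implements, this property could actually catch a wrong implementation.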

## [Feature Forge](https://github.com/machinalis/featureforge)

This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and particularly helpful if you use scikit-learn.

* Defining and documenting features
* Testing your features against specified cases and against randomly generated cases (stress-testing). This helps make your application more robust against invalid or misformatted input data. It also helps you check that low-relevance results in feature analysis occur because the feature is actually bad, not because of a subtle bug in your feature code.
* Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that allows you some tolerance both for invalid data and buggy features.
* Experimentation: running, registering, classifying and reproducing experiments for determining best settings for your problems.


# Mocks
We can use mocks with [pytest](https://docs.pytest.org/en/7.2.x/), unittest, unittest2, nose or fixture.
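
A minimal sketch of mocking with `unittest.mock` from the standard library. The `fetch_row_count` function and the `client.query` interface are hypothetical, invented for this illustration. The mock replaces a dependency that would be slow or unavailable in tests, such as a database connection.

```python
from unittest import mock


def fetch_row_count(client, table):
    """Return the number of rows in `table` using a database-like client.

    Hypothetical function; `client.query` is an assumed interface.
    """
    rows = client.query(f"SELECT * FROM {table}")
    return len(rows)


def test_fetch_row_count_uses_client():
    # The mock stands in for a real database connection.
    fake_client = mock.Mock()
    fake_client.query.return_value = [("Mary", 18), ("Paul", 19)]

    assert fetch_row_count(fake_client, "people") == 2
    # Verify how the dependency was called, without touching a database.
    fake_client.query.assert_called_once_with("SELECT * FROM people")


test_fetch_row_count_uses_client()
```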


# Probabilistic testing
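
One way to test code with random output is to check a summary statistic against a tolerance derived from its sampling error, instead of checking exact values. The sketch below is only an illustration of that idea: `sample_clicks` is a hypothetical stochastic function, and the sample size, seed and five-standard-error tolerance are arbitrary choices.

```python
import math
import random


def sample_clicks(n, click_rate, rng):
    """Simulate n impressions and return the number of clicks.

    Hypothetical stochastic function used only for this illustration.
    """
    return sum(rng.random() < click_rate for _ in range(n))


def test_click_rate_is_plausible():
    rng = random.Random(42)  # fixed seed: the test is repeatable
    n, click_rate = 10_000, 0.3
    observed = sample_clicks(n, click_rate, rng) / n

    # Standard error of a binomial proportion.
    std_err = math.sqrt(click_rate * (1 - click_rate) / n)
    # A wide tolerance (five standard errors) means a correct sampler
    # would fail only very rarely, even without the fixed seed.
    assert abs(observed - click_rate) < 5 * std_err


test_click_rate_is_plausible()
```

Fixing the seed makes the test deterministic, while the tolerance documents how much random variation is considered acceptable.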


# Resources
* [PyCon Ned Batchelder: Getting started Testing](https://www.youtube.com/watch?v=FxSsnHeWQBY)
* [PyData Hanna Torrence: Unit Testing for Data Scientists](https://www.youtube.com/watch?v=Da-FL_1i6ps)
* [PyData Trey Causey: Testing for Data Scientists](https://www.youtube.com/watch?v=GEqM9uJi64Q)
* [Towards Data Science: PyTest with mocking and fixtures](https://towardsdatascience.com/pytest-with-marking-mocking-and-fixtures-in-10-minutes-678d7ccd2f70)

* [PyData GitHub](https://github.com/PyData)
* [PyCon GitHub](https://github.com/PyCon)
