



!["One does not simply test in production"](https://www.flagship.io/wp-content/uploads/meme-one-does-not-simply-test-in-production-768x453.jpg)


*This notebook covers some key concepts about testing with Python from a data scientist's perspective.*

*There are also some cool talks available on the Internet.* See the resources section at the bottom.




# Libraries for testing in Python.

These are all of the testing libraries we are going to use in this notebook.

In [47]:
!pip -q install engarde
!pip -q install pytest
!pip -q install hypothesis
!pip -q install bulwark
!pip -q install pytest-mock


These are other libraries we are going to use but they are not for testing.

In [22]:
!pip -q install pyspark



In [38]:
import pandas as pd
import numpy as np
import pyspark
import time
import requests
import sqlite3

# <font color=orange>Introduction</font>

Learning to write good tests is an investment: the benefit accrues over time.


## Why test?
* It is the best way we know to find out whether the code works.
* Testing helps you find bugs earlier.
* Well-tested code helps you iterate faster.
* Writing tests helps you design better code.
* Tests check your assumptions.
* Tests give other people confidence in your code, provided they are **automated, fast, reliable, informative and focused**.
* Testing leads to simpler code, because code that is easy to test tends to be well structured.
* Debugging is hard, testing is easy.



## When and what to test?
* When you change code, add a test. Don't try to write all the tests at once.
* Test the outcome, not the implementation.
* When you find a bug, add a test.
* Tests help identify complexity. Write tests as early as they can be valuable.
* Don't test code that is already tested, such as code from well-tested libraries.


## Types of tests
* **Unit tests**: Test one unit of code, such as a function that has no dependencies on other code you have written.
* **Regression tests**: Tests that confirm a bug you fixed stays fixed.
* **Integration tests**: Tests that validate that components work well together. For example, integration testing with databases is one of the most vital, yet commonly overlooked, parts of the software development process.
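
As a rough illustration of the first two categories, the sketch below uses a hypothetical `safe_mean` helper that is not part of this notebook. It assumes that an earlier version raised `ZeroDivisionError` on an empty list, and that the regression test was added when that bug was fixed.

```python
def safe_mean(values):
    """Mean of a list; returns None for an empty list (hypothetical helper)."""
    if not values:
        return None
    return sum(values) / len(values)


# Unit test: checks the ordinary behaviour of one small function.
def test_safe_mean_unit():
    assert safe_mean([1, 2, 3]) == 2


# Regression test: pins down the (assumed) empty-list bug so it cannot return.
def test_safe_mean_regression_empty_list():
    assert safe_mean([]) is None


test_safe_mean_unit()
test_safe_mean_regression_empty_list()
```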






## Test isolation
* Keep the tests independent of each other.
* Every test gets a new test object.
* Tests can't affect each other, and a failure doesn't stop the next tests from running.

## Test-driven development?
* Write a failing test first, then fix the code until the test passes.
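
A minimal sketch of that cycle, using a hypothetical `median` function chosen only for illustration. In practice the steps happen one after another; here they are shown in a single cell.

```python
# Step 1: write the test first. At this point `median` does not exist yet,
# so calling the test would fail with a NameError.
def test_median_odd_and_even():
    assert median([3, 1, 2]) == 2
    assert median([4, 1, 3, 2]) == 2.5


# Step 2: write just enough code to make the test pass.
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Step 3: run the test again; it now passes.
test_median_odd_and_even()
```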




## What does Testing mean for data scientist?




* Testing for data science can be a little different because, much of the time, a deterministic answer may not exist for your problem. You get probabilistic answers, and it is easy to end up with tests that pass only because the code was written to make them pass.

* Better ways to test include testing properties rather than specific values, making assumptions about the shape and type of the data, and testing probabilistically.
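
One way to read these ideas is sketched below, under the assumption of a hypothetical `bootstrap_sample` helper. The first test checks properties (length and membership), and the second compares an estimate against the known mean within a tolerance, not for exact equality.

```python
import random


def bootstrap_sample(values, rng):
    """Resample `values` with replacement (hypothetical helper)."""
    return [rng.choice(values) for _ in values]


def test_bootstrap_sample_properties():
    rng = random.Random(0)  # fixed seed keeps the test reproducible
    data = list(range(100))
    sample = bootstrap_sample(data, rng)
    # Property: the sample has the same length as the input.
    assert len(sample) == len(data)
    # Property: every sampled value comes from the input.
    assert set(sample) <= set(data)


def test_bootstrap_mean_is_close():
    rng = random.Random(0)
    data = list(range(100))  # true mean is 49.5
    means = [sum(bootstrap_sample(data, rng)) / len(data) for _ in range(200)]
    average = sum(means) / len(means)
    # Probabilistic check: compare within a tolerance, not for equality.
    assert abs(average - 49.5) < 2.0


test_bootstrap_sample_properties()
test_bootstrap_mean_is_close()
```

The tolerance is deliberately wide compared with the expected spread of the averaged estimate, so the test does not fail by chance.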
 


# <font color=orange>Frameworks for testing</font>

If we want to be more robust we can use some frameworks for testing.

## [Unittest](https://docs.python.org/3/library/unittest.html)

* The `unittest` framework was originally inspired by JUnit and has a similar flavor to the major unit testing frameworks in other languages.
* Instead of a bare **assert** we can use **assert helpers**. These methods print the expected and actual values in the message when a test fails.

| Lots of assert helpers | |
| --- | --- |
| assertEqual(first, second) | assertNotEqual(first, second) |
| assertTrue(expr) | assertFalse(expr) |
| assertIn(first, second) | assertNotIn(first, second) |
| assertIs(first, second) | assertIsNot(first, second) |
| assertAlmostEqual(first, second) | assertGreater(first, second) |
| assertLess(first, second) | assertRaises(exc_class, func, ...) |
| assertCountEqual(first, second) | etc. |


In [6]:
import unittest


In [None]:
# portfolio.py
class Portofolio(object):
  """A simple stock portfolio."""

  def __init__(self):
    self.stocks = []

  def buy(self, name, shares, price):
    """Buy `shares` shares of `name` at `price` each."""
    self.stocks.append([name, shares, price])

  def cost(self):
    """Return the total cost of this portfolio."""
    amt = 0.0
    for name, shares, price in self.stocks:
      amt += shares * price
    return amt

In [None]:

# test_portfolio.py
class PortfolioTest(unittest.TestCase):
  def test_empty(self):
    p = Portofolio()
    # Plain-assert equivalent: assert p.cost() == 0.0
    # assertEqual takes the two values to compare as separate arguments.
    self.assertEqual(p.cost(), 0.0)

  def test_buy_one_stock(self):
    p = Portofolio()
    p.buy("IBM", 100, 176.48)
    self.assertEqual(p.cost(), 17648.0)

  def test_buy_two_stocks(self):
    p = Portofolio()
    p.buy("IBM", 100, 176.48)
    p.buy("HPQ", 100, 36.15)
    self.assertEqual(p.cost(), 21263.0)

# Execute from a shell:
# $ python -m unittest test_portfolio

You can implement your own base class.

In [None]:
# test_portfolio2.py
class PortfolioTestCase(unittest.TestCase):
  def assertCostEqual(self,p,cost):
    # assertEqual takes the two values to compare as separate arguments.
    self.assertEqual(p.cost(), cost)

class PortfolioTest(PortfolioTestCase):
  def test_empty(self):
    p=Portofolio()
    self.assertCostEqual(p,0.0)

  def test_buy_one_stock(self):
    p=Portofolio()
    p.buy("IBM", 100,176.48)
    self.assertCostEqual(p, 17648.0)

  def test_buy_two_stocks(self):
    p=Portofolio()
    p.buy("IBM", 100,176.48)
    p.buy("HPQ", 100,36.15)
    self.assertCostEqual(p,21263.0)


## [Py.test](https://docs.pytest.org/en/7.2.x/)

In a pytest project you typically find a **conftest.py** file with fixtures and a **pytest.ini** file for configuration. Testing starts from the given files or directories, or from the current directory. Pytest walks the filesystem and discovers `test_*.py` (or `*_test.py`) files, `test_` functions and `Test` classes.

* Less boilerplate
* Highly configurable 
* Fewer classes
* Gets you testing quickly
* Easy to interpret errors

| Useful options for pytest |
| --- |
| `-s` show all printed output (disables output capturing) |
| `-v` print the names of individual tests as they run |
| `-x` stop at the first failure |
| `-k` only run tests matching the following keyword expression |
| `--pdb` start the Python debugger on errors |
| `--fixtures` list the available fixtures |

### How to write a test




In [7]:
import pytest

In [8]:
# unit code
def mean(values):
  """ Calculate the mean"""
  return sum(values)/ len(values)

# unit test implemented with pytest

def test_mean():
  assert mean([1, 2, 3, 4, 5]) == 3

In [9]:
# unit code
def add_col(df, new_col_name, default_value):
  """ Add a new column with a default value """
  df[new_col_name] = default_value
  return df

# unit test
def test_add_col_passes():
  # setup
  df = pd.DataFrame({
      'col_a': ['a','a','a'],
      'col_b': ['b','b','b'],
      'col_c': ['c','c','c'],
  })
  # call function
  actual = add_col(df, 'col_d', 'd')

  # set expectations
  expected = pd.DataFrame({
      'col_a': ['a','a','a'],
      'col_b': ['b','b','b'],
      'col_c': ['c','c','c'],
      'col_d': ['d','d','d'],
  })
  # compare: raises AssertionError if the two frames differ
  pd.testing.assert_frame_equal(actual, expected)


In [27]:
# unit code
def divide(x,y):
  return x/y

# unit test for checking an exception is raised
def test_raises():
  with pytest.raises(ZeroDivisionError):
    divide(3,0)

### How to run tests
`pytest name_file.py`
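
Inside a notebook there is no shell prompt, so one option is to call `pytest.main` with the same arguments. The sketch below writes a throwaway test file to a temporary directory; the file name `test_mean_demo.py` is an arbitrary choice, and `-p no:cacheprovider` is only there to avoid leaving a cache directory behind.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

test_source = textwrap.dedent('''
    def mean(values):
        return sum(values) / len(values)

    def test_mean():
        assert mean([1, 2, 3, 4, 5]) == 3
''')

with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp) / "test_mean_demo.py"
    test_file.write_text(test_source)
    # Equivalent to running `pytest -v -x test_mean_demo.py` in a shell.
    exit_code = pytest.main(["-v", "-x", "-p", "no:cacheprovider", str(test_file)])

print(exit_code == pytest.ExitCode.OK)
```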

### [Fixtures](https://docs.pytest.org/en/6.2.x/fixture.html)
* Special functions that pytest keeps track of to safely share resources and/or resource definitions.
* A modular approach to setup and teardown methods.
* We create a **conftest.py** file with fixtures.
* We can create **parametrized fixtures** and **compose fixtures**.
* Fixtures are not imported; pytest discovers them automatically.
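
The autodiscovery and the setup/teardown behaviour can be sketched end to end as follows. The fixture name `numbers` and the file names are assumptions made for this example. The test file never imports `conftest.py`, yet pytest still injects the fixture by matching the parameter name.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

conftest_source = textwrap.dedent('''
    import pytest

    @pytest.fixture()
    def numbers():
        data = [1, 2, 3]   # setup
        yield data         # value handed to the test
        data.clear()       # teardown, runs after the test finishes
''')

test_source = textwrap.dedent('''
    # No import of conftest.py: pytest finds the `numbers` fixture by name.
    def test_sum(numbers):
        assert sum(numbers) == 6
''')

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "conftest.py").write_text(conftest_source)
    test_file = Path(tmp, "test_numbers_demo.py")
    test_file.write_text(test_source)
    exit_code = pytest.main(["-q", "-p", "no:cacheprovider", str(test_file)])

print(exit_code == pytest.ExitCode.OK)
```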

In [28]:
# conftest.py

@pytest.fixture() # the decorator tells pytest this is a fixture
def df():
  return pd.DataFrame({
      'col_a': ['a','a','a'],
      'col_b': ['b','b','b'],
      'col_c': ['c','c','c'],
  })

@pytest.fixture()
def df_with_column_d():
  return pd.DataFrame({
      'col_a': ['a','a','a'],
      'col_b': ['b','b','b'],
      'col_c': ['c','c','c'],
      'col_d': ['d','d','d'],
  })

@pytest.fixture()
def somevalue():
  return 42

Let's refactor the previous test with fixtures

In [None]:
# test with fixtures
def test_add_col_passes_with_fixtures(df,df_with_column_d):
  actual = add_col(df,'col_d','d')
  expected = df_with_column_d
  pd.testing.assert_frame_equal(actual,expected)


Another example to see how we can use fixtures in tests

In [30]:
@pytest.fixture(scope='function') # Fixture function
def fix(): 
  time.sleep(1)
  return 1

# Test
def test_func(fix):  # the parameter is the name of the fixture
  assert fix == 1

#### *Compose Fixtures*
Let's create a fixture that depends on the first

In [24]:
from pyspark.sql import SparkSession

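@pytest.fixture(scope='session') # Session scope: created once per test run
def spark():
  spark = SparkSession.builder \
    .appName("Word Count") \
    .master("local[2]") \
    .getOrCreate()
  yield spark   # hand the session to the tests
  spark.stop()  # teardown after the last test has finished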

# Define a new fixture that depends on the first
@pytest.fixture()  
def spark_df(spark):
  return spark.createDataFrame(
      [
          ('a','b','c','d'),
          ('a','b','c','d'),
      ],
      ['col_a','col_b','col_c','col_d']
  )

#### *Parametrized Fixtures*

In [36]:
@pytest.fixture(params=[10,20])
def answer1(request):
  return 5 * request.param

@pytest.fixture(params=[2,4])
def answer2(answer1,request):
  return answer1 * request.param

# pytest runs 4 tests, one per combination of (answer1 param, answer2 param):
# (10, 2), (10, 4), (20, 2), (20, 4)
def test_answer(answer2):
  print(answer2)

A fixture to provide a text file to tests

In [37]:
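@pytest.fixture
def input_file(tmp_path):
  # tmp_path is a built-in fixture: a temporary directory unique to each test
  path = tmp_path / "input.txt"
  path.write_text("Hello world")
  return path

def test_things(input_file):
  # Fails on purpose so that pytest shows the path of the file in the report
  assert False, str(input_file)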

#### [Fixture for databases](https://medium.com/@geoffreykoh/fun-with-fixtures-for-database-applications-8253eaf1a6d)

If you want to prepare the test environment before any test is run, a fixture can take `autouse=True`. The test database is then created ahead of the first test.

In [40]:
@pytest.fixture(scope='session', autouse=True)
def setup_database():
    """ Fixture to set up the in-memory database with test data """
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE stocks
        (date text, trans text, symbol text, qty real, price real)''')
    sample_data = [
        ('2020-01-01', 'BUY', 'IBM', 1000, 45.0),
        ('2020-01-01', 'SELL', 'GOOG', 40, 123.0),
    ]
    cursor.executemany('INSERT INTO stocks VALUES(?, ?, ?, ?, ?)', sample_data)
    conn.commit()
    yield conn    # tests run while the fixture is suspended here
    conn.close()  # teardown once the session ends

# Test to make sure that there are 2 items in the database
def test_connection(setup_database):
    conn = setup_database  # the fixture yields the connection
    assert len(list(conn.execute('SELECT * FROM stocks'))) == 2


### Mocks
Replacing an object with a **mock** allows you to avoid external dependencies. We can mock:
* Data reads or writes
* API calls
* External functions you don't want to test
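
As a small sketch of mocking an API call, the example below patches `urllib.request.urlopen` from the standard library so that no network access happens. The `fetch_price` function and the URL are hypothetical and exist only for this illustration. The same idea applies to other HTTP clients.

```python
import json
from unittest import mock
from urllib import request


def fetch_price(symbol):
    """Fetch a price from a hypothetical JSON API (the URL is made up)."""
    with request.urlopen(f"https://example.invalid/price/{symbol}") as resp:
        return json.loads(resp.read())["price"]


def test_fetch_price_uses_api_response():
    fake_response = mock.MagicMock()
    fake_response.read.return_value = b'{"price": 176.48}'
    # urlopen is used as a context manager, so configure __enter__ as well.
    fake_response.__enter__.return_value = fake_response
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        assert fetch_price("IBM") == 176.48
    # The mock records how it was called, so the request itself can be checked.
    fake_urlopen.assert_called_once_with("https://example.invalid/price/IBM")


test_fetch_price_uses_api_response()
```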



In [44]:
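from unittest import mock
from sqlalchemy import create_engine

# unit code: reads from a database and builds the features
def generate_features():
    engine = create_engine('fake_connection_string')
    df = pd.read_sql('SELECT col1, col2 FROM data_table;', con=engine)
    # ... processing on df ...
    features = df  # placeholder for the processing step elided above
    return features

# Patch the database calls so the test needs no real connection.
# Decorators apply bottom-up: the lowest patch is the first mock argument.
@mock.patch(__name__ + '.create_engine')
@mock.patch('pandas.read_sql')
def test_generate_features(read_sql_mock, engine_mock, df):
    read_sql_mock.return_value = df  # `df` is the fixture defined in conftest.py
    actual_features = generate_features()
    read_sql_mock.assert_called_once()
    pd.testing.assert_frame_equal(actual_features, df)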


## [Engarde](https://github.com/engarde-dev/engarde)
* For "defensive" data analysis when data are messy. 
* Great for ETL on changing data.

In [None]:
from engarde.decorators import none_missing,unique_index, is_shape

# Test
@is_shape((3, 2))
@none_missing()
def test_nan_and_shape(df):
  return df

In [None]:
# Example OK: The test should pass because there isn't any nan value
d = {'name': ['Mary', 'Paul','James'], 'age': [18, 19,20]}
df_OK = pd.DataFrame(data=d)

test_nan_and_shape(df_OK)

Unnamed: 0,name,age
0,Mary,18
1,Paul,19
2,James,20


In [None]:
# Example KO: The test should fail because there is a nan value
d = {'name': ['Mary', 'Paul','James'], 'age': [18, 19,np.nan]}
df_KO = pd.DataFrame(data=d)

test_nan_and_shape(df_KO)

## [Bulwark](https://github.com/zaxr/bulwark)

Bulwark is a package for convenient property-based testing of pandas dataframes.  Bulwark's goal is to let you check that your data meets your assumptions of what it should look like at any (and every) step in your code, without making you work too hard.


In [None]:
import bulwark.checks as ck
import bulwark.decorators as dc

def len_longer_than(df, l):
  if len(df) <= l:
    raise AssertionError("df is not as long as expected.")
  return df

@dc.CustomCheck(len_longer_than, 10, enabled=False)
def append_a_df(df, df2):
  # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
  return pd.concat([df, df2], ignore_index=True)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, np.nan, 3, 4], "b": [4, 5, 6, 7]})

append_a_df(df, df2)  # doesn't fail because the check is disabled

Unnamed: 0,a,b
0,1.0,4
1,2.0,5
2,3.0,6
3,1.0,4
4,,5
5,3.0,6
6,4.0,7


## [Hypothesis](https://hypothesis.readthedocs.io/en/latest/)
* Property-based testing inspired by Haskell's QuickCheck.
* We generate data randomly according to some specs.
* Ideal for code that will be accepting input from the wild.
* Works with existing testing frameworks like pytest.
* Works with Faker.

In [None]:
from hypothesis import given
import hypothesis.strategies as st

@given(st.integers(), st.integers())
def test_ints_are_commutative(x, y):
    assert x + y == y + x

test_ints_are_commutative()

In [None]:
from hypothesis import given
import hypothesis.strategies as st

@given(st.lists(st.integers()))
def test_mean(values):
  print(values)
  assert mean(values) == sum(values)/len(values)

# test_mean fails because mean() does not handle an empty list (ZeroDivisionError).
test_mean()
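
One possible follow-up, sketched under the assumption that empty input is out of scope for `mean`: restrict the strategy with `min_size=1` and assert a property of the result, not a recomputation of the same formula. The integer bounds are an arbitrary choice to keep the values small.

```python
from hypothesis import given
import hypothesis.strategies as st


def mean(values):
    """Calculate the mean (same definition as earlier in the notebook)."""
    return sum(values) / len(values)


# min_size=1 rules out the empty list, which `mean` does not handle.
@given(st.lists(st.integers(min_value=-10**6, max_value=10**6), min_size=1))
def test_mean_is_within_bounds(values):
    # Property: the mean always lies between the smallest and largest value.
    assert min(values) <= mean(values) <= max(values)


test_mean_is_within_bounds()
```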

## [Feature Forge](https://github.com/machinalis/featureforge)

This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and particularly helpful if you use scikit-learn.

* Defining and documenting features
* Testing your features against specified cases and against randomly generated cases (stress-testing). This helps you make your application more robust against invalid/misformatted input data. It also helps you check that low-relevance results in feature analysis are actually because the feature is bad, and not because there's a slight bug in your feature code.
* Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that allows you some tolerance both for invalid data and buggy features.
* Experimentation: running, registering, classifying and reproducing experiments for determining best settings for your problems.

# <font color=orange>Cool talks </font>	😄
Most examples are explained in these awesome talks.
* [PyCon Ned Batchelder: Getting Started Testing](https://www.youtube.com/watch?v=FxSsnHeWQBY)
* [PyData Hanna Torrence: Unit Testing for Data Scientists](https://www.youtube.com/watch?v=Da-FL_1i6ps)
* [PyData Trey Causey: Testing for Data Scientists](https://www.youtube.com/watch?v=GEqM9uJi64Q)
* [PyConDE Florian Bruhin: Pytest - simple, rapid and fun testing with Python](https://www.youtube.com/watch?v=CMuSn9cofbI)



# More Resources
* [This pytest plugin provides a mocker fixture](https://pytest-mock.readthedocs.io/en/latest/)
* [PyTest Talks and Tutorials](https://docs.pytest.org/en/6.2.x/talks.html?highlight=mock)
* [PyData Github](https://github.com/PyData)
* [PyCon Github](https://github.com/PyCon)
* [Towards Data Science: PyTest with mocking and fixtures](https://towardsdatascience.com/pytest-with-marking-mocking-and-fixtures-in-10-minutes-678d7ccd2f70)


* https://www.inspiredpython.com/course/testing-with-hypothesis/testing-your-python-code-with-hypothesis
* https://medium.com/@rinu.gour123/unit-testing-with-python-unittest-ad045671010
* https://www.softwaretestinghelp.com/python-testing-frameworks/
* https://medium.com/@arnabroyy/best-python-testing-frameworks-bb7ab1b3d366
* https://mlinproduction.com/testing-machine-learning-models-deployment-series-07/

* [How and why to test Data Pipelines](https://www.youtube.com/watch?v=JYAcKSkCl8w)