# Unit Testing for Data Science in Python

Every data science project needs unit testing. It saves development and maintenance time, improves documentation, increases end-user trust and reduces downtime of production systems. As a result, unit testing has become a must-have skill in the industry, used by almost every company. This course teaches unit testing in Python using pytest, the most popular Python testing framework. By the end of this course, you will have written a complete test suite for a data science project. In the process, you will learn to write unit tests for data preprocessors, models and visualizations, interpret test results and fix buggy code. You will also learn advanced concepts like Test Driven Development (TDD), test organization, fixtures and mocking so that you can test your own data science projects properly.

## 1. Unit testing basics

In this chapter, you will be introduced to the pytest package and use it to write simple unit tests. You'll run the tests, interpret the test result reports and fix bugs. All examples in this chapter come from the data preprocessing module of a linear regression project, so you learn unit testing in the context of data science.

    1.1 Why unit test?
    1.2 How frequently is a function tested?
    1.3 Manual testing
    1.4 Write a simple unit test using pytest
    1.5 Your first unit test using pytest
    1.6 Running unit tests
    1.7 Understanding test result report
    1.8 What causes a unit test to fail?
    1.9 Spotting and fixing bugs
    1.10 More benefits and test types
    1.11 Benefits of unit testing
    1.12 Unit tests as documentation
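
As a minimal sketch of what a first pytest test module might look like: the helper `row_to_list` below is an illustrative stand-in for a preprocessing function (its exact behavior is an assumption, not taken from the project). It splits a tab-separated row into two fields and returns `None` for malformed rows. pytest collects every function whose name starts with `test_` and reports a failure when an `assert` does not hold.

```python
def row_to_list(row):
    """Split a tab-separated row into two fields.

    Illustrative stand-in for a preprocessing helper (assumed behavior):
    returns None when the row does not contain exactly two non-empty fields.
    """
    fields = row.rstrip("\n").split("\t")
    if len(fields) != 2 or "" in fields:
        return None
    return fields


def test_for_clean_row():
    # A well-formed row yields both fields as strings.
    assert row_to_list("2,081\t314,942\n") == ["2,081", "314,942"]


def test_for_missing_area():
    # The first field is empty, so the row is rejected.
    assert row_to_list("\t293,410\n") is None


def test_for_missing_tab():
    # Without a tab separator there is only one field.
    assert row_to_list("1,463238,765\n") is None
```

Saved in a file whose name starts with `test_`, this module can be run with `pytest` followed by the file name. The test names double as documentation of how the function is expected to behave.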

## 2. Intermediate unit testing

In this chapter, you will write more advanced unit tests, from testing complicated data types like NumPy arrays to testing exception handling. Once you have mastered the science of testing, we will also focus on the art. For example, we will learn how to find the balance between writing too many tests and too few. In the last lesson, you will be introduced to a radically different programming methodology called Test Driven Development (TDD) and put it into practice. This might actually change the way you code forever!

    2.1 Mastering assert statements
    2.2 Write an informative test failure message
    2.3 Testing float return values
    2.4 Testing with multiple assert statements
    2.5 Testing for exceptions instead of return values
    2.6 Practice the context manager
    2.7 Unit test a ValueError
    2.8 The well tested function
    2.9 Testing well: Boundary values
    2.10 Testing well: Values triggering special logic
    2.11 Testing well: Normal arguments
    2.12 Test Driven Development (TDD)
    2.13 TDD: Tests for normal arguments
    2.14 TDD: Requirement collection
    2.15 TDD: Implement the function
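
The following sketch illustrates three of the ideas listed above: an informative failure message, a float comparison with `pytest.approx`, and a test for an exception using the `pytest.raises` context manager. The function `mean_squared_error` is a hypothetical helper written only for this example. It assumes pytest is installed.

```python
import pytest


def mean_squared_error(actual, predicted):
    """Hypothetical helper: mean of squared differences between two sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def test_float_return_value():
    actual = mean_squared_error([1.0, 2.0], [1.1, 1.8])
    expected = 0.025
    # The message is shown only when the assertion fails.
    message = f"mean_squared_error returned {actual} instead of {expected}"
    # Floats are compared with a tolerance because of rounding error.
    assert actual == pytest.approx(expected), message


def test_value_error_on_length_mismatch():
    # The test passes only if the block raises ValueError.
    with pytest.raises(ValueError) as exception_info:
        mean_squared_error([1.0, 2.0], [1.0])
    # Check that the error message mentions the cause.
    assert exception_info.match("equal length")
```

In a TDD workflow, tests like these would be written first, would fail, and would then guide the implementation of the function.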

## 3. Test Organization and Execution

In any data science project, you quickly reach a point where unit tests become hard to organize and manage without a clear structure. In this chapter, we will learn how to structure your test suite well, how to execute any subset of tests and how to mark problematic tests so that your test suite always stays green. The last lesson will even enable you to add the trust-inspiring build status and code coverage badges to your own project. Complete this chapter and become a unit testing wizard!

    3.1 How to organize a growing set of tests?
    3.2 Place test modules at the correct location
    3.3 Create a test class
    3.4 Mastering test execution
    3.5 One command to run them all
    3.6 Running test classes
    3.7 Expected failures and conditional skipping
    3.8 Mark a test class as expected to fail
    3.9 Mark a test as conditionally skipped
    3.10 Reasoning in the test result report
    3.11 Continuous integration and code coverage
    3.12 Build failing
    3.13 What does code coverage mean?
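
A rough sketch of how a test class and the `xfail` and `skipif` markers fit together. The function `convert_to_int` and the file path in the comments are assumptions made for illustration, and the example assumes pytest is installed.

```python
import sys

import pytest


def convert_to_int(text):
    """Hypothetical helper: turn a string like '2,081' into the integer 2081."""
    return int(text.replace(",", ""))


class TestConvertToInt:
    """Groups all tests for one function, one class per function."""

    def test_with_one_comma(self):
        assert convert_to_int("2,081") == 2081

    def test_with_two_commas(self):
        assert convert_to_int("1,234,567") == 1234567

    @pytest.mark.xfail(reason="Scientific notation is not supported yet")
    def test_with_scientific_notation(self):
        # Currently raises ValueError, so pytest reports it as xfailed
        # instead of failing the whole suite.
        assert convert_to_int("1e3") == 1000

    @pytest.mark.skipif(sys.version_info < (3, 9),
                        reason="str.removesuffix requires Python 3.9 or later")
    def test_with_trailing_newline_removed(self):
        assert convert_to_int("2,081\n".removesuffix("\n")) == 2081


# Example invocations (the file path is hypothetical):
#   pytest                                          run every test found
#   pytest -x                                       stop after the first failure
#   pytest tests/test_convert.py::TestConvertToInt  run a single test class
#   pytest -k "TestConvertToInt"                    select tests by name pattern
#   pytest -rsx                                     show reasons for skips and xfails
```

Marking a whole class works the same way: the decorator is placed above the `class` statement instead of above a method.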

## 4. Testing Models, Plots and Much More

In this chapter, you will pick up advanced unit testing skills like setup, teardown and mocking. You will also learn how to write sanity tests for your data science models and how to test matplotlib plots. By the end of this chapter, you will be ready to test real-world data science projects!

    4.1 Beyond assertion: setup and teardown
    4.2 Use a fixture for a clean data file
    4.3 Write a fixture for an empty data file
    4.4 Fixture chaining using tmpdir
    4.5 Mocking
    4.6 Program a bug-free dependency
    4.7 Mock a dependency
    4.8 Testing models
    4.9 Testing on linear data
    4.10 Testing on circular data
    4.11 Testing plots
    4.12 Generate the baseline image
    4.13 Run the tests for the plotting function
    4.14 Fix the plotting function
    4.15 Congratulations
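
The sketch below shows one possible shape for a fixture chained to pytest's built-in `tmpdir` fixture, followed by a test that mocks a dependency. The functions `row_to_list` and `get_valid_rows` are hypothetical. Two substitutions are made to keep the example in a single file: the mock comes from the standard library's `unittest.mock` rather than the pytest-mock plugin, and the dependency is replaced in this module's globals with `patch.dict`, whereas a real project would normally patch a dotted module path.

```python
from unittest.mock import Mock, call, patch

import pytest


def row_to_list(row):
    """Hypothetical dependency: parse a tab-separated row, None if malformed."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) != 2 or "" in fields:
        return None
    return fields


def get_valid_rows(rows):
    """Hypothetical function under test: keep rows that row_to_list can parse."""
    return [row for row in rows if row_to_list(row) is not None]


@pytest.fixture
def raw_data_file(tmpdir):
    # Setup: tmpdir is itself a fixture, so this is fixture chaining.
    path = str(tmpdir.join("raw.txt"))
    with open(path, "w") as f:
        f.write("1,801\t201,411\n1,767565,112\n")
    yield path
    # Teardown code would go here, after the yield. Nothing is needed
    # in this sketch because pytest manages the temporary directory.


def test_on_raw_data_file(raw_data_file):
    # pytest injects the fixture's yielded value as the argument.
    with open(raw_data_file) as f:
        assert get_valid_rows(f) == ["1,801\t201,411\n"]


def test_with_mocked_row_to_list():
    # The mock returns these values in order, regardless of its input,
    # so a bug in row_to_list cannot make this test fail.
    fake = Mock(side_effect=[["1,801", "201,411"], None])
    with patch.dict(globals(), {"row_to_list": fake}):
        assert get_valid_rows(["first\n", "second\n"]) == ["first\n"]
    # Verify that the dependency was called once per row, in order.
    assert fake.call_args_list == [call("first\n"), call("second\n")]
```

The mocked test isolates `get_valid_rows` from its dependency, which is the point of mocking: a failure then points at the function under test rather than at the code it calls.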

# Additional material

- DataCamp course: https://learn.datacamp.com/courses/unit-testing-for-data-science-in-python
- Project: https://github.com/gutfeeling/univariate-linear-regression

Setup steps (from https://travis-ci.com/github/gutfeeling/univariate-linear-regression; review the project's .travis.yml file):
>git clone https://github.com/gutfeeling/univariate-linear-regression.git<br>
>
>cd univariate-linear-regression
>
> Modify <code>setup.py</code>:<br>
> change <code>install_requires</code> to the following list (without version pins):<br>
> <code>["jupyter", "matplotlib", "numpy", "pytest", "pytest-mpl", "pytest-mock", "scipy"]</code><br>
>
>pip install -e .<br>
>
>pip install codecov pytest-cov
>
>pytest --cov=src tests
>
>codecov

- Local installation directory: <code>C:\Anaconda3\envs\datascience\Lib\site-packages</code>

- To create an egg: https://python101.pythonlibrary.org/chapter38_eggs.html 

**Creating an egg**
>You can think of an egg as just an alternative to a source distribution or Windows executable, but it should be noted that for pure Python eggs, the egg file is completely cross-platform. We will take a look at how to create our own egg using the package we created in a previous modules and packages chapter. To get started creating an egg, you will need to create a new folder and put the mymath folder inside it. Then create a setup.py file in the parent directory to mymath with the following contents:

<pre><code>
from setuptools import setup, find_packages

setup(
    name="mymath",
    version="0.1",
    packages=find_packages(),
)
</code></pre>

Python has its own package for creating distributions, called distutils. However, instead of using distutils’ setup function, we’re using the one from setuptools. We’re also using setuptools’ find_packages function, which automatically looks for any packages in the current directory and adds them to the egg. To create the egg, run the following from the command line:

<pre><code>
python.exe setup.py bdist_egg
</code></pre>