In [1]:
# Import libraries
import expectexception  #%%expect_exception TypeError
import pytest
import os

# 3. Test Organization and Execution

In any data science project, you quickly reach a point where the growing set of unit tests becomes hard to organize and manage. In this chapter, we will learn how to structure your test suite well, how to effortlessly execute any subset of tests and how to mark problematic tests so that your test suite always stays green. The last lesson will even enable you to add the trust-inspiring build status and code coverage badges to your own project. Complete this chapter and become a unit testing wizard!

# <font color=darkred>3.1 How to organize a growing set of tests?</font>

1. How to organize a growing set of tests?
>Congratulations on finishing Chapter 2!

2. What you've done so far
>You wrote about 16 unit tests for the functions row_to_list(), convert_to_int(), get_data_as_numpy_array() and split_into_training_and_testing_sets() in Chapters 1 and 2. Well done!

3. What you've done so far
>As the number of

4. What you've done so far
>unit tests

5. What you've done so far
>keeps growing,

6. Need a strategy to organize tests
>we would need a strategy to keep all these tests organized. Otherwise, we risk our unit tests looking like

7. Need a strategy to organize tests
>this clothes cabinet. But don't worry, this lesson is going to provide us with a strategy.

8. Project structure
>Assume that the four functions that you tested are present in the following project structure. There's a top level directory called src, which holds all application code.

9. Project structure
>Inside, there's the data package. This package deals with functions that preprocess data.

10. Project structure
>It has a Python module called preprocessing_helpers.py, containing the functions row_to_list() and convert_to_int().

11. Project structure
>Then there's the features package, which deals with extracting features from the preprocessed data.

12. Project structure
>It has a module called as_numpy.py, containing the function get_data_as_numpy_array().

13. Project structure
>Finally, there's the models package, which deals with training and testing the linear regression model.

14. Project structure
>It has a module called train.py. So far, it only contains one function split_into_training_and_testing_sets().

15. The tests folder
>The developers of pytest recommend that we create a directory called tests at the same level as src. This directory is also called the test suite.

16. The tests folder mirrors the application folder
>Inside this folder, we simply mirror the inner structure of src and create empty packages called data, features and models respectively.

17. Python module and test module correspondence
>The general rule is that for each Python module my_module.py, there should be a corresponding test module called test_my_module.py. For example, for the module preprocessing_helpers.py, we create a test module called test_preprocessing_helpers.py. Since preprocessing_helpers.py belongs to the data package, we put the corresponding test module in the mirrored package inside the tests directory. The mirroring in the directory structure and test module names ensures that if we know where to find application code, we can follow the same route inside the tests directory to access the corresponding tests. The test module test_preprocessing_helpers.py should contain tests for row_to_list() and convert_to_int().
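
The mirroring rule can be sketched as a small path-mapping helper. This is only an illustration: the function name `mirrored_test_path` and the default directory names `src` and `tests` are assumptions made for this sketch, not something pytest provides.

```python
from pathlib import PurePosixPath


def mirrored_test_path(module_path, src_root="src", tests_root="tests"):
    """Map an application module to its mirrored test module (hypothetical helper)."""
    module = PurePosixPath(module_path)
    # Path of the module relative to the application root,
    # e.g. data/preprocessing_helpers.py
    relative = module.relative_to(src_root)
    # Same package path under the tests directory, with "test_" prepended
    # to the module's file name
    return PurePosixPath(tests_root) / relative.parent / ("test_" + relative.name)


print(mirrored_test_path("src/data/preprocessing_helpers.py"))
print(mirrored_test_path("src/models/train.py"))
```

Knowing where a module lives is then enough to find its tests: the package path stays the same and only the top-level directory and the file name prefix change.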

18. Structuring tests inside test modules
>We could just put the tests sequentially like this, but this is an organizational nightmare, because there's no way to tell where the tests for one function end and the tests for another function begin.

19. Test class
>pytest solves this problem using a construct called the test class.

20. Test class is a container for a single unit's tests
>A test class is just a simple container for tests of a specific function.

21. Test class: theoretical structure
>To declare a test class, we start with the class keyword

22. Test class: theoretical structure
>and follow it up with the name of the class. The name of the class should be in CamelCase, and should always start with “Test”. The best way to name a test class is to follow the “Test” with the name of the function, for example, TestRowToList.

23. Test class: theoretical structure
>In this course's convention, the class name is followed by object in parentheses. Strictly speaking, object is not an argument but the base class that the test class inherits from, and in Python 3 it can be omitted. To learn more, check out the DataCamp course on object-oriented Python, but we don't really need to for testing purposes, as we will never use object anywhere else. Now put all tests for the function under the test class as follows.

24. Test class: theoretical structure
>Note that, this time, all tests should receive a single argument called self. This also comes from object-oriented Python.

25. Clean separation
>For the other function convert_to_int(), we create another test class TestConvertToInt, and put the tests for convert_to_int() inside that class. Then the tests for the two functions are nicely separated.
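
The structure described in these slides might look roughly like the sketch below. The two functions at the top are stand-ins written only so that the example runs on its own; their behaviour is assumed from the tests in this notebook and is not the course's actual implementation. In the real project they would be imported from `data.preprocessing_helpers`.

```python
import re


# Stand-ins for the functions under test (assumed behaviour, for illustration only)
def row_to_list(row):
    entries = row.rstrip("\n").split("\t")
    # Accept only rows with exactly two non-empty, tab-separated entries
    if len(entries) == 2 and all(entries):
        return entries
    return None


def convert_to_int(integer_string_with_commas):
    # Accept plain digits or digits with correctly placed thousands separators
    if re.fullmatch(r"\d+|\d{1,3}(,\d{3})+", integer_string_with_commas) is None:
        return None
    return int(integer_string_with_commas.replace(",", ""))


class TestRowToList(object):      # container for the tests of row_to_list()
    def test_on_normal_argument(self):
        assert row_to_list("123\t4,567\n") == ["123", "4,567"]

    def test_on_one_tab_with_missing_value(self):
        assert row_to_list("\t4,567\n") is None


class TestConvertToInt(object):   # container for the tests of convert_to_int()
    def test_with_one_comma(self):
        assert convert_to_int("2,081") == 2081

    def test_on_string_with_missing_comma(self):
        assert convert_to_int("178100,301") is None
```

Each class name starts with "Test" followed by the function name in CamelCase, and every test method takes `self` as its only argument.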

26. Final test directory structure
>This procedure is then repeated for test_as_numpy.py, which would hold the test class TestGetDataAsNumpyArray and test_train.py, which would hold the test class TestSplitIntoTrainingAndTestingSets.

27. Test directory is well organized!
>Now our tests or, should we say clothes, are well organized.

28. IPython console's working directory is tests
>Before we move on to the exercises, note that the IPython console's working directory will be the tests directory from now on.

30. Let's practice structuring tests!
>Let's try all this out in the exercises.

In [2]:
%cd test_practices
!pytest
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 31 items

test_TestSplitIntoTrainingAndTestingSets.py .                            [  3%]
test_convert_to_int.py .                                                 [  6%]
test_convert_to_int_TDD.py ......                                        [ 25%]
test_convert_to_int_fail.py F                                            [ 29%]
test_for_missing_area_fail.py F                                          [ 32%]
test_get_data_as_numpy_array_fail.py .                                   [ 35%]
test_mystery_function.py .                                               [ 38%]
test_row_to_list.py .......                

# <font color=darkred>3.2 Place test modules at the correct location</font>

**Instructions**

A data science project without visualization is like pizza without cheese, right? But this has been fixed by creating a package called visualization under the top level application directory src.


<code>
src/                                    # All application code lives here
|-- visualization/                      # Package for visualization
    |-- __init__.py
    |-- plots.py                        # Module for plotting
</code>

In the package, there is a Python module plots.py, which contains functions related to plotting. These functions should be tested in a test module test_plots.py.

According to pytest guidelines, where should you place this test module within the project structure?

**Possible Answers**
- src/visualization/test_plots.py.
- src/visualization/tests/test_plots.py.
- tests/test_plots.py.
- <font color=red>tests/visualization/test_plots.py.</font>

**Results**

<font color=darkgreen>Wow, you have become good at organizing tests! Placing it in this location gives us two advantages: easier navigation within the tests folder and the possibility of having identically named test modules distinguished by the parent mirror package.</font>

# <font color=darkred>3.3 Create a test class</font>

Test classes are containers inside test modules. They help separate tests for different functions within the test module, and serve as a structuring tool in the pytest framework.

Test classes are written in CamelCase e.g. TestMyFunction as opposed to tests, which are written using underscores e.g. test_something().

You met the function split_into_training_and_testing_sets() in Chapter 2, and wrote some tests for it. One of these tests was called test_on_one_row() and it checked if the function raises a ValueError when passed a NumPy array with only one row.

In this exercise you are going to create a test class for this function. This test class will hold the test test_on_one_row().

**Instructions**
- Declare the test class for the function split_into_training_and_testing_sets(), making sure to give it a name that follows the standard naming convention.
- Fill in the mandatory argument in the test test_on_one_row().

**Results**

<font color=darkgreen>Wow, well done! Using test classes, you can now cleanly separate tests for different functions in your test modules. If you don't know object-oriented Python, the arguments object and self might make little sense to you. That is all right, since you don't need to use them extensively for the material of this course. Just make sure that you put the arguments in the right place, and everything will work like magic! If you are curious to learn more about them, check out the DataCamp course on <a href=https://www.datacamp.com/courses/object-oriented-programming-in-python>object-oriented Python</a>.</font>

In [3]:
%cd test_practices
with open("test_TestSplitIntoTrainingAndTestingSets.py", "w") as text_file:
    text_file.write("""
import pytest
import numpy as np

from models.train import split_into_training_and_testing_sets

# Declare the test class
class TestSplitIntoTrainingAndTestingSets(object):
    # Fill in with the correct mandatory argument
    def test_on_one_row(self):
        test_argument = np.array([[1382.0, 390167.0]])
        with pytest.raises(ValueError) as exc_info:
            split_into_training_and_testing_sets(test_argument)
        expected_error_msg = "Argument data_array must have at least 2 rows, it actually has just 1"
        assert exc_info.match(expected_error_msg)
    """)
    
!pytest test_TestSplitIntoTrainingAndTestingSets.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_TestSplitIntoTrainingAndTestingSets.py .                            [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


## APPLYING BEST PRACTICE --> USE THE TEST DIRECTORY

In [4]:
%cd univariate-linear-regression/src/test
with open("models/test_train.py", "w") as text_file:
    text_file.write("""
import pytest
import numpy as np

from models.train import split_into_training_and_testing_sets

# Declare the test class
class TestSplitIntoTrainingAndTestingSets(object):
    def test_on_six_rows(self):
        example_argument = np.array([[2081.0, 314942.0], [1059.0, 186606.0],
                                     [1148.0, 206186.0], [1506.0, 248419.0],
                                     [1210.0, 214114.0], [1697.0, 277794.0]]
                                    )
        # Fill in with training array's expected number of rows
        expected_training_array_num_rows = int(example_argument.shape[0]*0.75)
    
        # Fill in with testing array's expected number of rows
        expected_testing_array_num_rows = example_argument.shape[0] - expected_training_array_num_rows
    
        # Call the function to test
        actual = split_into_training_and_testing_sets(example_argument)
    
        # Write the assert statement checking training array's number of rows
        assert actual[0].shape[0] == expected_training_array_num_rows, \
            "The actual number of rows in the training array is not {}".format(expected_training_array_num_rows)
    
        # Write the assert statement checking testing array's number of rows
        assert actual[1].shape[0] == expected_testing_array_num_rows, \
            "The actual number of rows in the testing array is not {}".format(expected_testing_array_num_rows)

    
    def test_on_one_row(self):
        test_argument = np.array([[1382.0, 390167.0]])
        with pytest.raises(ValueError) as exc_info:
            split_into_training_and_testing_sets(test_argument)
        expected_error_msg = "Argument data_array must have at least 2 rows, it actually has just 1"
        assert exc_info.match(expected_error_msg)
    
    
    def test_valueerror_on_one_dimensional_argument(self):
        example_argument = np.array([2081, 314942, 1059, 186606, 1148, 206186])
    
        with pytest.raises(ValueError) as exception_info:
            # store the exception
            split_into_training_and_testing_sets(example_argument)
    
        # Check if ValueError contains correct message
        assert exception_info.match("Argument data_array must be two dimensional. Got 1 dimensional array instead!")
    """)
    
!pytest models/test_train.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 3 items

models\test_train.py ...                                                 [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>3.4 Mastering test execution</font>

1. Mastering test execution
>In the last lesson, we learned how to organize tests.

2. Test organization
>The centerpiece was the tests folder, which holds all tests for the project.

3. Test organization
>The folder contains mirror packages, each of which contains a test module.

4. Test organization
>The test modules contain many test classes.

5. Test organization
>A test class is just a container for unit tests for a particular function.

6. Running all tests
>pytest provides an easy way to run all tests contained in the tests folder.

7. Running all tests
>We simply change to the tests directory and run the command pytest. This command automatically discovers tests by recursing into the subtree of the working directory. It identifies all files with names starting with “test_” as test modules. Within test modules, it identifies classes with names starting with “Test” as test classes. Within each test class, it identifies all functions with names starting with “test_” as unit tests. It collects these unit tests and runs them all.
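
The naming rules above can be modelled with a few string checks. This is a deliberately simplified sketch of the rules as stated in this lesson, not pytest's actual collection code; real discovery has configurable patterns and further conditions. All function names below are made up for the illustration.

```python
def is_test_module(file_name):
    # Files whose names start with "test_" are treated as test modules
    return file_name.startswith("test_") and file_name.endswith(".py")


def is_test_class(class_name):
    # Classes whose names start with "Test" are treated as test classes
    return class_name.startswith("Test")


def is_unit_test(function_name):
    # Functions whose names start with "test_" are treated as unit tests
    return function_name.startswith("test_")


def collect(project):
    """Return (module, class, test) triples that satisfy all three naming rules."""
    collected = []
    for file_name, classes in project.items():
        if not is_test_module(file_name):
            continue
        for class_name, function_names in classes.items():
            if not is_test_class(class_name):
                continue
            for function_name in function_names:
                if is_unit_test(function_name):
                    collected.append((file_name, class_name, function_name))
    return collected


# A toy description of a project: {file name: {class name: [function names]}}
project = {
    "test_train.py": {
        "TestSplitIntoTrainingAndTestingSets": ["test_on_one_row", "helper"],
        "Helpers": ["test_never_collected"],
    },
    "train.py": {"TestNotCollected": ["test_never_collected"]},
}
print(collect(project))
```

Only `test_on_one_row` survives all three filters here, which is why consistent naming matters: a test with a misspelled prefix is silently skipped rather than reported as an error.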

8. Running all tests
>Here is the result of the command. You wrote 16 tests so far, so the command ran all these 16 tests. 15 passed and 1 failed.

9. Typical scenario: CI server
>A typical scenario for running this command is on a CI server after a commit is pushed to the code base.

10. Binary question: do all unit tests pass?
>In this case, we are only interested in the binary question: do all unit tests pass after including the commit?

11. The -x flag: stop after first failure
>In this case, adding the -x flag to the pytest command can save time and resources. This flag makes pytest stop after the first failing test, because a failing test already answers the binary question. In the report, we see that only 9 tests ran this time since execution stopped after the failing test test_on_one_tab_with_missing_value().

12. Running tests in a test module
>Very often, we would only want to run a subset of tests. For example, we might want to just run tests contained in a particular test module, say, test_preprocessing_helpers.py. You already know how to do that since you did this several times in the exercises.

13. Running tests in a test module
>Just type pytest followed by the path to the test module. This only runs the 13 tests contained in test_preprocessing_helpers.py, as we can see in the test result report.

14. Running only a particular test class
>At other times, we want to be more specific. For example, when we are working on a particular function, say, row_to_list(), we only care about the test class TestRowToList.

15. Node ID
>During automatic test discovery, pytest assigns a node ID to every test class and unit test that it encounters. The node ID of a test class is the path to the test module followed by the name of the test class, separated by two colons. The node ID of a unit test follows the same format, with the unit test name added to the end using another double colon separator.
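
The node ID format is easy to reproduce by hand. The helper below is a hypothetical convenience function, not part of pytest; it simply joins the parts with the double colon separator described above.

```python
def node_id(module_path, class_name=None, test_name=None):
    """Build a node ID string from its parts (hypothetical helper)."""
    parts = [module_path]
    if class_name is not None:
        parts.append(class_name)
    if test_name is not None:
        parts.append(test_name)
    # Parts are separated by two colons
    return "::".join(parts)


# Node ID of a test class
print(node_id("data/test_preprocessing_helpers.py", "TestRowToList"))
# Node ID of a single unit test inside that class
print(node_id("data/test_preprocessing_helpers.py", "TestRowToList",
              "test_on_one_tab_with_missing_value"))
```

The resulting strings are what gets passed to the pytest command, with the module path written relative to the current working directory.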

16. Running tests using node ID
>When we run the command pytest followed by the node ID of the test class TestRowToList, for example, it only runs the 7 tests contained in TestRowToList.

17. Running tests using node ID
>When we run the command with the node ID of the unit test test_on_one_tab_with_missing_value(), it only runs a single test.

18. Running tests using keyword expressions
>A faster and more flexible way to do this is by using keyword expressions.

19. The -k option
>To run tests using keyword expressions, use the -k option. This option takes a quoted string containing a pattern as the value.

20. The -k option
>For example, we can specify a test class such as TestSplitIntoTrainingAndTestingSets as the pattern, and this will run only the 2 tests within that test class. We can also enter only part of the test class name, as long as that is unique. This saves a lot of typing and has the same outcome.

21. Supports Python logical operators
>We can even use Python logical operators in the pattern to do more complex subsetting. For example, the following command will execute all tests in TestSplitIntoTrainingAndTestingSets except the unit test test_on_one_row(), which only leaves one test to run.
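
When these commands are launched from a Python script rather than typed into a console, it can help to assemble the arguments as a list. The sketch below assumes a hypothetical helper `build_pytest_command`; only the flags -x and -k and the positional target (a path or node ID) come from this lesson.

```python
import shlex


def build_pytest_command(target=None, keyword=None, stop_on_first_failure=False):
    """Assemble a pytest command line as a list of arguments (hypothetical helper)."""
    command = ["pytest"]
    if stop_on_first_failure:
        command.append("-x")              # stop after the first failing test
    if keyword is not None:
        command.extend(["-k", keyword])   # select tests by keyword expression
    if target is not None:
        command.append(target)            # path to a test module, or a node ID
    return command


command = build_pytest_command(keyword="TestSplit and not test_on_one_row")
# shlex.join adds the quoting that a shell would need for the keyword expression
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` would run the selection without any shell quoting, because the whole keyword expression travels as a single argument.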

22. Let's run some tests!
>Let's run some tests using these command line tricks in the exercises.

## CREATING ONE MORE TEST FILE

In [5]:
%cd univariate-linear-regression/src/test
with open("data/test_preprocessing_helpers.py", "w") as text_file:
    text_file.write("""
import pytest
from data.preprocessing_helpers import convert_to_int
from data.preprocessing_helpers import row_to_list

class TestConvertToInt(object):
    def test_with_no_comma(self):
        actual = convert_to_int("756")
        assert actual == 756, "Expected: 756, Actual: {0}".format(actual)
    
    def test_with_one_comma(self):
        actual = convert_to_int("2,081")
        assert actual == 2081, "Expected: 2081, Actual: {0}".format(actual)
    
    def test_with_two_commas(self):
        actual = convert_to_int("1,034,891")
        assert actual == 1034891, "Expected: 1034891, Actual: {0}".format(actual)
    
    def test_on_string_with_missing_comma(self):
        actual = convert_to_int("178100,301")
        assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
    def test_on_string_with_incorrectly_placed_comma(self):
        actual = convert_to_int("12,72,891")
        assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
    def test_on_float_valued_string(self):
        actual = convert_to_int("23,816.92")
        assert actual is None, "Expected: None, Actual: {0}".format(actual)


class TestRowToList(object):
    def test_on_normal_argument_1(self):
        actual = row_to_list("123\\t4,567\\n")
        expected = ["123", "4,567"]
        assert actual == expected, "Expected: {0}, Actual: {1}".format(expected, actual)
    
    def test_on_normal_argument_2(self):
        actual = row_to_list("1,059\\t186,606\\n")
        expected = ["1,059", "186,606"]
        assert actual == expected, "Expected: {0}, Actual: {1}".format(expected, actual)

    def test_on_no_tab_with_missing_value(self):      # (0, 1) case
        actual = row_to_list('\\n')
        assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
    def test_on_two_tabs_with_missing_value(self):    # (2, 1) case
        actual = row_to_list("123\\t\\t89\\n")
        assert actual is None, "Expected: None, Actual: {0}".format(actual)

    def test_on_no_tab_no_missing_value(self):        # (0, 0) boundary value
        actual = row_to_list('123\\n')
        assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    
    def test_on_two_tabs_no_missing_value(self):      # (2, 0) boundary value
        actual = row_to_list('123\\t4,567\\t89\\n')
        assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    
    def test_on_one_tab_with_missing_value(self):     # (1, 1) boundary value
        actual = row_to_list('\\t4,567\\n')
        assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    """)
    
!pytest data/test_preprocessing_helpers.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 13 items

data\test_preprocessing_helpers.py .............                         [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


## RUNNING EVERYTHING IN THE TEST DIRECTORY

In [6]:
# Running all tests in the test directory inside our project
%cd univariate-linear-regression/src/test
!pytest
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

## MAKING PYTEST STOP AT THE FIRST FAILURE (THIS USES ANOTHER TEST DIRECTORY)

In [7]:
# The -x flag: stop after first failure
%cd test_practices
!pytest -x
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 31 items

test_TestSplitIntoTrainingAndTestingSets.py .                            [  3%]
test_convert_to_int.py .                                                 [  6%]
test_convert_to_int_TDD.py ......                                        [ 25%]
test_convert_to_int_fail.py F

________________________ test_on_string_with_one_comma ________________________

    def test_on_string_with_one_comma():
        test_argument = "2,081"
        expected = 2081
        actual = convert_to_int(test_argument)
        # Format the string with the actual return value
        message = "convert_to_int('2,081') should 

## RUNNING TESTS USING NODE ID

In [8]:
# Run the test class TestRowToList
!pytest univariate-linear-regression/src/test/data/test_preprocessing_helpers.py::TestRowToList

platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 7 items

univariate-linear-regression\src\test\data\test_preprocessing_helpers.py . [ 14%]
......                                                                   [100%]



In [9]:
%cd univariate-linear-regression/src/test

# Run the unit test test_on_one_tab_with_missing_value()
!pytest data/test_preprocessing_helpers.py::TestRowToList::test_on_one_tab_with_missing_value

%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

data\test_preprocessing_helpers.py .                                     [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


## RUNNING TESTS BASED ON PATTERN SEARCH

In [10]:
%cd univariate-linear-regression/src/test

# Run the tests in TestSplitIntoTrainingAndTestingSets using a keyword expression
!pytest -k "TestSplitIntoTrainingAndTestingSets"

%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items / 21 deselected / 3 selected

models\test_train.py ...                                                 [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


In [11]:
%cd univariate-linear-regression/src/test

# Supports Python logical operators
!pytest -k "TestSplit and not test_on_one_row"

%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items / 22 deselected / 2 selected

models\test_train.py ..                                                  [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>3.5 One command to run them all</font>

**Instructions**

One of your colleagues pushed some changes to the functions row_to_list(), convert_to_int(), get_data_as_numpy_array() and split_into_training_and_testing_sets(). That means that you have to run all the tests again to figure out if something got broken as a result.

The current working directory in the IPython console is the tests directory, which contains all the tests in the same layout as described in the video. You can, at any time, run the tests in the IPython console using the appropriate command.

**Results**

- In the IPython console, what is the correct command for running all tests contained in the tests folder?
    - <font color=red>!pytest</font>
    - !pytest -x
    - !pytest tests
    - pytest
    
    
- When you run all tests with the command !pytest, how many of them pass and how many fail?
    - Passing: 10, Failing: 6
    - <font color=red>Passing: 15, Failing: 1</font> (in the iterative exercise on DataCamp)
    - Passing: 16, Failing: 0
    
    
- Assuming that you simply want to answer the binary question "Are all tests passing" without wasting time and resources, what is the correct command to run all tests till the first failure is encountered?
    - !pytest -k
    - !pytest
    - <font color=red>!pytest -x</font>
    
    
- When you ran the tests using the !pytest -x command, how many tests ran in total before test execution stopped because of the first failing test?
    - 16
    - <font color=red>15</font> (in the iterative exercise on DataCamp)
    - 7

<font color=darkgreen>Well done! In real life, the !pytest or !pytest -x command is often used in CI servers. It can also be useful if there is a major update to the code base, which changes many application modules at the same time. Running all tests is the only way to check if anything was broken due to the update.</font>

In [12]:
%cd univariate-linear-regression/src/test
with open("features/test_as_numpy.py", "w") as text_file:
    text_file.write("""
import pytest

from features.as_numpy import get_data_as_numpy_array
import numpy as np

class TestGetDataAsNumpyArray(object):
    def test_on_clean_file(self):
        expected = np.array([[2081.0, 314942.0],
                             [1059.0, 186606.0],
                             [1148.0, 206186.0]
                            ])
        actual = get_data_as_numpy_array("example_clean_data2.txt", num_columns=2)
        message = "Expected return value: {0}, Actual return value: {1}".format(expected, actual)
        assert (actual == expected).all(), message
    """)

!pytest

#os.remove('features/test_as_numpy.py')
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

# <font color=darkred>3.6 Running test classes</font>

**Instructions**

When you ran the !pytest command in the last exercise, the test test_on_six_rows() failed. This is a test for the function split_into_training_and_testing_sets(). This means that this function is broken.

Short recap in case you forgot: this function takes a NumPy array containing housing area and prices as argument. The function randomly splits the argument array into training and testing arrays in the ratio 3:1, and returns the resulting arrays in a tuple.

A quick look revealed that during the code update, someone inadvertently changed the split from 3:1 to 9:1. This has to be changed back, and the unit tests for the function, which now live in the test class TestSplitIntoTrainingAndTestingSets, need to be run again. Are you up to the challenge?
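
The real implementation in models/train.py is not shown in these notes, so the following is only a sketch of what a function with the described behaviour might look like. The error messages are taken from the tests earlier in this notebook; the parameter name `training_fraction` and the use of `np.random.permutation` are assumptions.

```python
import numpy as np


def split_into_training_and_testing_sets(data_array, training_fraction=0.75):
    """Randomly split the rows of a 2D array into training and testing arrays."""
    dim = data_array.ndim
    if dim != 2:
        raise ValueError(
            "Argument data_array must be two dimensional. "
            "Got {0} dimensional array instead!".format(dim)
        )
    num_rows = data_array.shape[0]
    if num_rows < 2:
        raise ValueError(
            "Argument data_array must have at least 2 rows, "
            "it actually has just {0}".format(num_rows)
        )
    # 0.75 gives the 3:1 split; the broken 9:1 version would correspond to 0.9
    num_training = int(training_fraction * num_rows)
    permuted_indices = np.random.permutation(num_rows)
    training = data_array[permuted_indices[:num_training], :]
    testing = data_array[permuted_indices[num_training:], :]
    return training, testing


example = np.array([[2081.0, 314942.0], [1059.0, 186606.0],
                    [1148.0, 206186.0], [1506.0, 248419.0],
                    [1210.0, 214114.0], [1697.0, 277794.0]])
training, testing = split_into_training_and_testing_sets(example)
print(training.shape[0], testing.shape[0])
```

For six rows, `int(0.75 * 6)` is 4, so the training array gets four rows and the testing array gets the remaining two, which is what test_on_six_rows() checks.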

**Results**
- Fill in with a float between 0 and 1 so that num_training is approximately 3/4 of the number of rows in data_array.


- Now let's see if that modification fixed the broken function. The current working directory in the IPython console is the tests folder that contains all tests. The test class TestSplitIntoTrainingAndTestingSets resides in the test module tests/models/test_train.py. What is the correct command to run all the tests in this test class using node IDs?
    - !pytest models::test_train.py::TestSplitIntoTrainingAndTestingSets
    - !pytest -k "TestSplitIntoTrainingAndTestingSets"
    - !pytest models/test_train.py/TestSplitIntoTrainingAndTestingSets
    - <font color=red>!pytest models/test_train.py::TestSplitIntoTrainingAndTestingSets</font>


- What is the correct command to run only the previously failing test test_on_six_rows() using node IDs?
    - <font color=red>!pytest models/test_train.py::TestSplitIntoTrainingAndTestingSets::test_on_six_rows</font>
    - !pytest models/test_train.py::test_on_six_rows
    - !pytest test_on_six_rows


- What is the correct command to run the tests in TestSplitIntoTrainingAndTestingSets using keyword expressions?
    - !pytest models/test_train.py::TestSplitIntoTrainingAndTestingSets
    - !pytest -x "TestSplitIntoTrainingAndTestingSets"
    - <font color=red>!pytest -k "SplitInto"</font>
    - !pytest -k "Test"


<font color=darkgreen>That's correct! The -k flag is really useful because it lets you select tests and test classes by typing only a unique part of their names. This saves a lot of typing, and you must admit that TestSplitIntoTrainingAndTestingSets is a horrendously long name! In your projects, you will often run tests using node IDs and the -k flag, because you are usually interested not in running all tests but only the subset relevant to the functions you are currently working on.</font>

In [13]:
%cd univariate-linear-regression/src/test
!pytest models/test_train.py::TestSplitIntoTrainingAndTestingSets
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 3 items

models\test_train.py ...                                                 [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


In [14]:
%cd univariate-linear-regression/src/test
!pytest models/test_train.py::TestSplitIntoTrainingAndTestingSets::test_on_six_rows
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

models\test_train.py .                                                   [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


In [15]:
%cd univariate-linear-regression/src/test
!pytest -k "SplitInto"
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items / 21 deselected / 3 selected

models\test_train.py ...                                                 [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>3.7 Expected failures and conditional skipping</font>

1. Expected failures and conditional skipping
>In the last lesson, we learned the magic command pytest that runs all tests.

2. Test suite is green when all tests pass
>If all tests pass, then our test suite is green. We can relax on the beach and drink a cocktail.

3. Test suite is red when any test fails
>If any test fails, then our test suite is red. This means we had better get to work and fix it; otherwise, our users will be very angry. This is good in theory, but sometimes the red light can be a false alarm that ruins our beach vacation! An example will make things clear.

4. Implementing a function using TDD
>Let's say we are implementing this new function train_model(), which returns the best fit line on the training data. Since we are gonna use TDD, the first step is to write tests, so we create a test class TestTrainModel and add a test to it.

5. The test fails, of course!
>If we run pytest, this test will fail because the function train_model() is not yet implemented. This is just a consequence of using TDD; it does not indicate a problem with the code base.

6. False alarm
>But the CI server does not know this and will set off a false alarm when that test fails. It would be nice to have a way to tell pytest that we expect this test to fail.

7. xfail: marking tests as "expected to fail"
>We do that by using the xfail decorator. The decorator goes on top of a test, and it starts with the character @.

8. xfail: marking tests as "expected to fail"
>This is followed by the name of the decorator pytest.mark.xfail. After adding the decorator, if we run pytest again, we see that one test is xfailed. But there are no reported errors,

9. Test suite stays green
>which means that the test suite remains green.

10. Expected failures, but conditionally
>At other times, we might know that the test fails only under certain conditions, and we don't want to be warned about them. Common situations are when some function won't work under a particular Python version or a particular platform. As an example, we have deliberately added the unicode() function in the failure message for the test test_with_no_comma() that we wrote earlier. This only works on Python 2.7 or lower.

11. Test suite goes red on Python 3
>If we run pytest using Python 3, the test suite will go red.

12. skipif: skip tests conditionally
>To tell pytest to skip running this test on Python versions higher than 2.7, we need the skipif decorator. The syntax is similar to xfail. The name of the decorator is pytest.mark.skipif.

13. skipif: skip tests conditionally
>It takes a single boolean expression as an argument. If the boolean expression is True, then the test will be skipped.

14. skipif when Python version is higher than 2.7
>To construct the boolean expression, import the built in module sys and use the attribute sys.version_info. This attribute can be compared against a tuple containing the major and minor Python version, in this case, 2 and 7.

15. The reason argument
>We must also add the required reason argument, which states why the test is skipped.

16. 1 skipped, 1 xfailed
>Running pytest again confirms that one test was xfailed and another one was skipped.

17. Test suite stays green
>The test suite remains green. Perfect again!

18. Showing reason in the test result report
>We can make the reason for skipping show in the report. For that, we can use the -r option.

19. The -r option
>The -r option can be followed by any number of characters.

20. Showing reason for skipping
>If we add the character s, it will show us tests that were skipped in the short test summary section near the end.

21. Optional reason argument to xfail
>The xfail decorator also takes an optional reason argument.

22. Optional reason argument to xfail
>For the test that we marked with xfail, we will add the reason “Using TDD, train_model() is not implemented”.

23. Showing reason for xfail
>If we add the character x to the -r option, it will only show us tests that are xfailed along with the reason in the test summary info.

24. Showing reason for both skipped and xfail
>We can show reasons for both by using the combination sx.

25. Skipping/xfailing entire test classes
>If we are skipping and xfailing multiple tests, note that these decorators can be applied to entire test classes as well.

26. Let's practice xfailing and skipping!
>Let's practice xfailing and skipping in the exercises!

In [16]:
%cd univariate-linear-regression/src/test
with open("models/test_train_not_implemented_avoid_fail.py", "w") as text_file:
    text_file.write("""
import pytest
import numpy as np

from models import train

class TestTrainModelNotImplemented(object):
    def test_on_linear_data(self):
        example_argument = np.array([[2081.0, 314942.0], [1059.0, 186606.0], [1697.0, 277794.0]])
        expected_value   = True
        actual_value     = train.train_model_not_implemented(example_argument)
        message          = 'This function is not implemented yet'
        assert expected_value == actual_value, message
    """)
    
!pytest models/test_train_not_implemented_avoid_fail.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

models\test_train_not_implemented_avoid_fail.py F                        [100%]

______________ TestTrainModelNotImplemented.test_on_linear_data _______________

self = <test_train_not_implemented_avoid_fail.TestTrainModelNotImplemented object at 0x00000231CA497790>

    def test_on_linear_data(self):
        example_argument = np.array([[2081.0, 314942.0], [1059.0, 186606.0], [1697.0, 277794.0]])
        expected_value   = True
>       actual_value     = train.train_model_not_implemented(example_argument)
E       AttributeError: module 'models.train' has no att

## <font color=blue>xfail: marking tests as "expected to fail"</font>

In [17]:
%cd univariate-linear-regression/src/test
with open("models/test_train_not_implemented_avoid_fail.py", "w") as text_file:
    text_file.write("""
import pytest
import numpy as np

from models import train

class TestTrainModelNotImplemented(object):
    @pytest.mark.xfail(reason="Using TDD, train.train_model_not_implemented() is not implemented.")
    def test_on_linear_data(self):
        example_argument = np.array([[2081.0, 314942.0], [1059.0, 186606.0], [1697.0, 277794.0]])
        expected_value   = True
        actual_value     = train.train_model_not_implemented(example_argument)
        message          = 'This function is not implemented yet'
        assert expected_value == actual_value, message
    """)
    
!pytest models/test_train_not_implemented_avoid_fail.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

models\test_train_not_implemented_avoid_fail.py x                        [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


## <font color=blue>Expected failures, but conditionally</font>

In [18]:
%cd univariate-linear-regression/src/test
with open("data/test_preprocessing_helpers_conditional_avoid_fail.py", "w") as text_file:
    text_file.write("""
import pytest
from data.preprocessing_helpers import convert_to_int

class TestConvertToInt(object):
    def test_with_no_comma(self):
        \"\"\"Only runs on Python 2.7 or lower\"\"\"
        test_argument = "756"
        expected = 756
        actual = convert_to_int(test_argument)
        message = unicode("Expected: 2081, Actual: {0}".format(actual)) # Requires Python 2.7 or lower
        assert actual == expected, message
    """)
    
!pytest data/test_preprocessing_helpers_conditional_avoid_fail.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

data\test_preprocessing_helpers_conditional_avoid_fail.py F              [100%]

_____________________ TestConvertToInt.test_with_no_comma _____________________

self = <test_preprocessing_helpers_conditional_avoid_fail.TestConvertToInt object at 0x000002AD7E66F610>

    def test_with_no_comma(self):
        """Only runs on Python 2.7 or lower"""
        test_argument = "756"
        expected = 756
        actual = convert_to_int(test_argument)
>       message = unicode("Expected: 2081, Actual: {0}".format(actual)) # Requires Python 2.7 or lower
E       NameErro

In [19]:
%cd univariate-linear-regression/src/test
with open("data/test_preprocessing_helpers_conditional_avoid_fail.py", "w") as text_file:
    text_file.write("""
import pytest
import sys

from data.preprocessing_helpers import convert_to_int

class TestConvertToInt(object):
    @pytest.mark.skipif(sys.version_info > (2, 7), reason="Requires Python 2.7 or lower.")
    def test_with_no_comma(self):
        \"\"\"Only runs on Python 2.7 or lower\"\"\"
        test_argument = "756"
        expected = 756
        actual = convert_to_int(test_argument)
        message = unicode("Expected: 756, Actual: {0}".format(actual)) # Requires Python 2.7 or lower
        assert actual == expected, message
    """)
    
!pytest data/test_preprocessing_helpers_conditional_avoid_fail.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

data\test_preprocessing_helpers_conditional_avoid_fail.py s              [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python
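
The skipif condition above relies on tuple comparison: sys.version_info can be compared element by element against a plain tuple. One subtlety is that a longer tuple with an equal prefix counts as greater, so sys.version_info > (2, 7) is also True on Python 2.7 patch releases such as 2.7.18. The sketch below illustrates this with plain tuples. Comparing against (3, 0) is an alternative suggested here, not something taken from the course.

```python
import sys

# Tuples compare element by element; when one tuple is a prefix of the
# other, the longer tuple is the greater one.
python_2_7_18 = (2, 7, 18)

# True: the extra patch number makes (2, 7, 18) "greater" than (2, 7).
print(python_2_7_18 > (2, 7))

# False: major version 2 is below major version 3.
print(python_2_7_18 >= (3, 0))

# The same kind of check against the interpreter running this code.
running_python_3 = sys.version_info >= (3, 0)
print(running_python_3)
```

If the intent is to skip a test on Python 3 only, a condition such as sys.version_info >= (3, 0) expresses that more precisely.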


## <font color=blue>RUNNING ALL</font>

In [20]:
%cd univariate-linear-regression/src/test
!pytest 
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

## <font color=red>SHOWING REASON IN THE TEST RESULT REPORT</font>

In [21]:
# Showing reason for skipping
%cd univariate-linear-regression/src/test
!pytest -rs
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

In [22]:
# Showing reason for xfail
%cd univariate-linear-regression/src/test
!pytest -rx
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

In [23]:
# Showing reason for both skipped and xfail
%cd univariate-linear-regression/src/test
!pytest -rsx
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

## <font color=blue>SKIPPING/XFAILING ENTIRE TEST CLASSES</font>

In [24]:
%cd univariate-linear-regression/src/test
with open("features/test_as_numpy_avoid_fail.py", "w") as text_file:
    text_file.write("""
import pytest
import sys
import numpy as np

from features import as_numpy

@pytest.mark.xfail(reason="Using TDD, as_numpy.get_pandas_data is not implemented.")
class TestGetPandasData(object):
    def test_on_clean_file(self):
        expected = np.array([[2081.0, 314942.0],
                             [1059.0, 186606.0],
                             [1148.0, 206186.0]
                            ])
        actual = as_numpy.get_pandas_data("example_clean_data2.txt", num_columns=2)
        message = "Expected return value: {0}, Actual return value: {1}".format(expected, actual)
        assert (actual == expected).all(), message
        
@pytest.mark.skipif(sys.version_info > (2, 7), reason="requires Python 2.7 or lower.")
class TestGetDataAsNumpyArray(object):
    def test_on_clean_file(self):
        expected = np.array([[2081.0, 314942.0],
                             [1059.0, 186606.0],
                             [1148.0, 206186.0]
                            ])
        actual = as_numpy.get_data_as_numpy_array("example_clean_data2.txt", num_columns=2)
        # Requires Python 2.7 or lower
        message = unicode("Expected return value: {0}, Actual return value: {1}".format(expected, actual))
        assert (actual == expected).all(), message
    """)
    
!pytest -rxs
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

# <font color=darkred>3.8 Mark a test class as expected to fail</font>

**Instructions**

A new function model_test() is being developed and it returns the accuracy of a given linear regression model on a testing dataset. Test Driven Development (TDD) is being used to implement it. The procedure is: write tests first and then implement the function.

A test class TestModelTest has been created within the test module models/test_train.py. In the test class, there are two unit tests called test_on_linear_data() and test_on_one_dimensional_array(). But the function model_test() has not been implemented yet.

Throughout this exercise, pytest and numpy as np will be imported for you.

**Results**

- Run the tests in the test class TestModelTest in the IPython console. What is the outcome?
    - The tests fail with IndexError because some arguments to format the variable message are missing.
    - The tests pass.
    - <font color=red>The tests fail with NameError since the function model_test() has not yet been defined.</font>
    - The tests fail with AssertionError.

- Mark the whole test class TestModelTest as "expected to fail".
- Add the following reason for the expected failure: "Using TDD, model_test() has not yet been implemented".

<font color=darkgreen>Awesome! The reason you provided for the expected failure is useful for your colleagues, who might be wondering why you marked this test as expected to fail.</font>

In [25]:
%cd univariate-linear-regression/src/test
with open("models/test_train_avoid_fail.py", "w") as text_file:
    text_file.write("""
import pytest
import numpy as np

@pytest.mark.xfail(reason="Using TDD, model_test() has not yet been implemented")
class TestModelTest(object):
    def test_on_linear_data(self):
        test_input = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
        expected = 1.0
        actual = model_test(test_input, 2.0, 1.0)
        message = "model_test({0}) should return {1}, but it actually returned {2}".format(test_input, expected, actual)
        assert actual == pytest.approx(expected), message
        
    def test_on_one_dimensional_array(self):
        test_input = np.array([1.0, 2.0, 3.0, 4.0])
        with pytest.raises(ValueError) as exc_info:
            model_test(test_input, 1.0, 1.0)
    """)
    
!pytest -rxs models/test_train_avoid_fail.py
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 2 items

models\test_train_avoid_fail.py xx                                       [100%]

XFAIL models\test_train_avoid_fail.py::TestModelTest::test_on_linear_data
  Using TDD, model_test() has not yet been implemented
XFAIL models\test_train_avoid_fail.py::TestModelTest::test_on_one_dimensional_array
  Using TDD, model_test() has not yet been implemented
C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>3.9 Mark a test as conditionally skipped</font>

**Instructions**

In Python 2, there was a built-in function called xrange(). In Python 3, xrange() was removed. Therefore, if any test uses xrange(), it's going to fail with a NameError in Python 3.

Remember the function get_data_as_numpy_array()? You saw it in Chapter 2. It converted data in a preprocessed data file into a NumPy array.

range() has been deliberately replaced with the obsolete xrange() in the function. Evil laughter! But no worries, it will be changed back after you're done with this exercise.

You wrote a test called test_on_clean_file() for this function. This test currently resides in a test class TestGetDataAsNumpyArray inside the test module features/test_as_numpy.py.

pytest, numpy as np and get_data_as_numpy_array() have been imported for you.

**Results**
- Run the tests in the test class TestGetDataAsNumpyArray in the IPython console. What is the outcome?
    - <font color=red>The test test_on_clean_file() fails with a NameError because Python 3 does not recognize the xrange() function.</font>
    - The test test_on_clean_file() passes.
    - The test test_on_clean_file() fails with an AssertionError.

- Import the sys module.
- Mark the test test_on_clean_file() as skipped if the Python version is greater than 2.7.
- Add the following reason for skipping the test: "Works only on Python 2.7 or lower".

<font color=darkgreen>Great job! You can use any boolean expression as the first argument of pytest.mark.skipif. One other common situation is to skip tests that won't run on particular platforms like Windows, Linux or Mac using the sys.platform attribute.</font>
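
As a rough illustration of the platform case mentioned above, the sketch below builds boolean expressions from sys.platform that could be passed as the first argument of pytest.mark.skipif. The variable names are hypothetical, and the decorator usage is shown only as a comment so that the snippet runs without pytest.

```python
import sys

# Boolean expressions of the kind that can serve as the first argument
# of pytest.mark.skipif. The variable names are illustrative only.
on_windows = sys.platform.startswith("win")
on_linux = sys.platform.startswith("linux")
on_macos = sys.platform == "darwin"

print(sys.platform, on_windows, on_linux, on_macos)

# Usage inside a test module (assumes pytest is imported there):
#
#     @pytest.mark.skipif(on_windows, reason="Does not run on Windows.")
#     def test_something():
#         ...
```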

In [26]:
%cd univariate-linear-regression/src/test
with open("features/test_as_numpy_avoid_fail2.py", "w") as text_file:
    text_file.write("""
import numpy as np
import pytest
import sys

# Defining a dummy function here instead of importing it
def get_data_as_numpy_array(clean_data_file_path, num_columns): 
    result = np.empty((0, num_columns)) 
    with open(clean_data_file_path, "r") as f: 
        rows = f.readlines() 
        for row_num in xrange(len(rows)): 
            try: 
                row = np.array([rows[row_num].rstrip("\\ ").split("\\t")], dtype=float) 
            except ValueError: 
                raise ValueError("Line {0} of {1} is badly formatted".format(row_num + 1, clean_data_file_path)) 
            else: 
                if row.shape != (1, num_columns): 
                    raise ValueError("Line {0} of {1} does not have {2} columns".format(
                        row_num + 1, clean_data_file_path, num_columns
                    )) 
            result = np.append(result, row, axis=0) 
    return result 

class TestGetDataAsNumpyArray(object):
    @pytest.mark.skipif(sys.version_info > (2, 7), reason="Works only on Python 2.7 or lower.")
    def test_on_clean_file(self):
        expected = np.array([[2081.0, 314942.0],
                             [1059.0, 186606.0],
                             [1148.0, 206186.0]
                             ]
                            )
        actual = get_data_as_numpy_array("example_clean_data.txt", num_columns=2)
        message = "Expected return value: {0}, Actual return value: {1}".format(expected, actual)
        assert actual == pytest.approx(expected), message
    """)
    
!pytest -rxs features/test_as_numpy_avoid_fail2.py::TestGetDataAsNumpyArray
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

features\test_as_numpy_avoid_fail2.py s                                  [100%]

features\test_as_numpy_avoid_fail2.py:13
    row = np.array([rows[row_num].rstrip("\ ").split("\t")], dtype=float)

SKIPPED [1] features\test_as_numpy_avoid_fail2.py:25: Works only on Python 2.7 or lower.
C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>3.10 Reasoning in the test result report</font>

**Instructions**

In the last exercises, you marked the test class TestModelTest in the test module models/test_train.py as expected to fail. You also marked the test test_on_clean_file() in the test class TestGetDataAsNumpyArray belonging to the test module features/test_as_numpy.py as skipped if the Python version is greater than 2.7.

In both cases, you provided a reason argument detailing why the tests are expected to fail or are skipped. In this exercise, your job is to make this reason show up in the test result report when you run all tests in the IPython console.

Feel free to run the !pytest command with different options and flags in the IPython console while doing the exercise.

**Results**
- What is the command that would only show the reason for expected failures in the test result report?
    - !pytest -r
    - <font color=red>!pytest -rx</font>
    - !pytest -x
    - !pytest -rs

- What is the command that would only show the reason for skipped tests in the test result report?
    - !pytest -r
    - !pytest -rx
    - !pytest -x
    - <font color=red>!pytest -rs</font>

- What is the command that would show the reason for both skipped tests and tests that are expected to fail in the test result report?
    - <font color=red>!pytest -rsx</font>
    - !pytest -sx
    - !pytest -s -x

<font color=darkgreen>Seems like you have become a pro at the pytest command line tool. Congratulations!</font>
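
The effect of the -r option can also be checked on a tiny throwaway test module. The sketch below is a self-contained illustration under stated assumptions: the module name test_demo.py and the two test functions are hypothetical, pytest is assumed to be installed for the interpreter running the snippet, and the skip condition uses sys.version_info >= (3, 0) so that the test is skipped on Python 3.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical test module: one expected failure and one conditional skip.
test_source = textwrap.dedent('''
    import sys
    import pytest

    @pytest.mark.xfail(reason="Using TDD, model_test() has not yet been implemented")
    def test_not_implemented():
        model_test()  # undefined name, so this raises NameError

    @pytest.mark.skipif(sys.version_info >= (3, 0), reason="Works only on Python 2.7 or lower")
    def test_python_2_only():
        assert True
''')

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "test_demo.py"), "w") as text_file:
        text_file.write(test_source)
    # Run pytest in a child process so it does not interfere with this session.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-rsx", "test_demo.py"],
        cwd=tmp_dir, capture_output=True, text=True,
    )

print(result.stdout)
print("exit code:", result.returncode)
```

The short test summary in the printed report should list one XFAIL entry and one SKIPPED entry together with their reasons, and the exit code of 0 reflects that the suite stays green.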

In [27]:
%cd univariate-linear-regression/src/test
!pytest -rsx
%cd ../../..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\test
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 24 items

data\test_preprocessing_helpers.py .............                         [ 54%]
data\test_preprocessing_helpers_conditional_avoid_fail.py s              [ 58%]
features\test_as_numpy.py .                                              [ 62%]
features\test_as_numpy_avoid_fail.py xs                                  [ 70%]
features\test_as_numpy_avoid_fail2.py s                                  [ 75%]
models\test_train.py ...                                                 [ 87%]
models\test_train_avoid_fail.py xx                                       [ 95%]
models

# <font color=darkred>3.11 Continuous integration and code coverage</font>

1. Continuous integration and code coverage
>In Chapter 1,

2. Code coverage and build status badges
>We saw how NumPy increases user trust

3. Code coverage and build status badges
>by adding code coverage

4. Code coverage and build status badges
>and build status badges. In this lesson, we will learn to implement these badges for our own GitHub projects.

5. The build status badge
>Let's start with the build status badge.

6. The build status badge
>This badge uses a Continuous Integration server, which runs all tests automatically whenever we push a commit to GitHub. It shows whether tests are currently passing or failing.

7. Build passing = Stable project
>To an end user, passing indicates a stable code base

8. Build failing = Unstable project
>while failing indicates instability.

9. CI server
>We will use Travis CI as our CI server.

10. Step 1: Create a configuration file
>To integrate with Travis CI, we have to create a settings file called .travis.yml at the root of our repository.

11. Step 1: Create a configuration file
>The file is arranged into sections. First, there's the language setting, which we set to python. The python setting determines which Python version will be used to run the tests; we choose Python 3.6. The install setting is a list of commands that install our project and its dependencies on the CI server. If we organized our tests in the recommended way, we can do a local install with pip install -e . (note the trailing dot). The script setting lists the commands that run the tests once everything is installed. We use pytest tests to run the test suite.
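
The configuration described here can be sketched as follows. This is a minimal version that assumes the tests live in a directory called tests, as in the pytest tests command. The helper write_travis_config() is a hypothetical convenience for creating the file from a notebook, and the demo writes to a throwaway directory rather than a real repository.

```python
import tempfile
import textwrap
from pathlib import Path

# The four settings described in the text: language, Python version,
# install commands and the command that runs the tests.
TRAVIS_CONFIG = textwrap.dedent("""\
    language: python
    python:
      - "3.6"
    install:
      - pip install -e .
    script:
      - pytest tests
""")


def write_travis_config(repo_root):
    """Hypothetical helper: write .travis.yml at the root of the repository."""
    config_path = Path(repo_root) / ".travis.yml"
    config_path.write_text(TRAVIS_CONFIG)
    return config_path


# Demo in a throwaway directory; for a real project, pass the repository root.
with tempfile.TemporaryDirectory() as repo_root:
    written_text = write_travis_config(repo_root).read_text()

print(written_text)
```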

12. Step 2: Push the file to GitHub
>We push this settings file to GitHub.

13. Step 3: Install the Travis CI app
>Now we go to the GitHub profile page and click on Marketplace.

14. Step 3: Install the Travis CI app
>We search for Travis CI, click on it

15. Step 3: Install the Travis CI app
>and install the app. It's free for public repositories.

16. Step 3: Install the Travis CI app
>We allow the app access to the necessary repositories or organizations. Here, we are only allowing it access to the public repository univariate-linear-regression, which holds the example code for this course.

17. Step 3: Install the Travis CI app
>We will be redirected to Travis CI, where we should log in using our GitHub account. This will bring us to the Travis CI dashboard.

18. Every commit leads to a build
>That's all the setup we need! From now on, whenever we push a commit to the GitHub repo, we should see a build appearing in the Travis CI dashboard.

19. Step 4: Showing the build status badge
>When the build finishes, the badge appears here. We click on the badge,

20. Step 4: Showing the build status badge
>choose Markdown from the dropdown

21. Step 4: Showing the build status badge
>and paste the markdown code in the README file on GitHub. This adds the badge to the GitHub repo.

22. Code coverage
>We will add the code coverage badge next. The code coverage badge indicates the percentage of our application code that gets run when we run the test suite. High percentages indicate a well-tested code base.
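
As a rough illustration of how such a percentage comes about, here is a simplified line-based model. It is only a sketch: the function name and the line numbers are hypothetical, and real coverage tools do more bookkeeping than this.

```python
def coverage_percentage(executed_lines, executable_lines):
    """Percentage of executable lines that ran at least once during the tests."""
    executable = set(executable_lines)
    if not executable:
        raise ValueError("No executable lines to measure.")
    # Only count executed lines that belong to the application code.
    covered = set(executed_lines) & executable
    return 100 * len(covered) / len(executable)


# Hypothetical module with 20 executable lines (numbered 1 to 20),
# of which lines 1 to 17 ran while the test suite was executing.
executable = range(1, 21)
executed = range(1, 18)
percentage = coverage_percentage(executed, executable)
print(f"{percentage:.0f}% coverage")
```

In this model, the three lines that never ran are exactly the code that no test exercises.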

23. Codecov
>This badge comes from a service called Codecov that integrates seamlessly with GitHub and Travis CI.

24. Step 1: Modify the Travis CI configuration file
>First, we will modify the .travis.yml to enable code coverage reports.

25. Step 1: Modify the Travis CI configuration file
>In the install setting, we add a command that pip installs pytest-cov and codecov, as they are necessary to generate and upload coverage reports.

26. Step 1: Modify the Travis CI configuration file
>The usual pytest command to run the tests should be modified by adding a command line flag --cov which points to the application directory src. This new command will not only run the tests, but also produce a coverage report.

27. Step 1: Modify the Travis CI configuration file
>Finally, add a setting called after_success and add the command codecov. This makes Travis CI push the code coverage results to Codecov after every build.
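
Put together, the three changes can be shown as a before/after comparison. The sketch below assumes the starting configuration from the build status badge steps (Python 3.6, pip install -e . and pytest tests) and uses difflib from the standard library to print what was added or changed. The variable names and diff labels are made up for illustration.

```python
import difflib
import textwrap

# Configuration that only runs the tests.
BASIC_CONFIG = textwrap.dedent("""\
    language: python
    python:
      - "3.6"
    install:
      - pip install -e .
    script:
      - pytest tests
""")

# Same configuration with the coverage-related changes applied.
COVERAGE_CONFIG = textwrap.dedent("""\
    language: python
    python:
      - "3.6"
    install:
      - pip install -e .
      - pip install pytest-cov codecov
    script:
      - pytest --cov=src tests
    after_success:
      - codecov
""")

# Lines starting with "+" were added, lines starting with "-" were removed.
diff_lines = list(
    difflib.unified_diff(
        BASIC_CONFIG.splitlines(),
        COVERAGE_CONFIG.splitlines(),
        fromfile=".travis.yml (tests only)",
        tofile=".travis.yml (with coverage)",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```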

28. Step 2: Install Codecov
>To enable Codecov for our repository, we install the Codecov app in the GitHub marketplace in the same way we installed Travis CI.

29. Commits lead to coverage report at codecov.io
>From now on, when we push a new commit, the code coverage report should show up in Codecov, accessible at codecov.io, after Travis CI completes the build.

30. Step 3: Showing the badge in GitHub
>In Codecov, go to the Badge section in Settings, copy the Markdown code and paste it into the README file on GitHub.

31. Step 3: Showing the badge in GitHub
>And that adds the code coverage badge.

32. Let's practice CI and code coverage!
>Let's practice some of these concepts in the exercises. It is also recommended that you go through these steps for a GitHub repository that you own.

# <font color=darkred>3.12 Build failing</font>

**Instructions**
In the GitHub repository of a Python package, you see the following badge:

<img src='images/build_status_failing_no_whitespace.png' width=25%/>

What can you, as a user, conclude from this badge?


**Possible Answers**
- There are no unit tests in the package.
- The package has a code coverage less than 75%.
- There's no .travis.yml configuration file at the root of the repository.
- <font color=red>The package has bugs, which are causing either the installation to error out or some of the unit tests in the test suite to fail.</font>

**Results**

<font color=darkgreen>That's correct! Since a build failing badge is indicative of bugs, the maintainer of any package should strive to keep this badge green ("passing").</font>

# <font color=darkred>3.13 What does code coverage mean?</font>

**Instructions**
In the GitHub repository of a Python package, you see the following badge:

<img src='images/code_coverage_badge.png' width=25%/>

What does it mean?

**Possible Answers**
- <font color=red>The test suite tests about 85% of the application code.</font>
- Unit tests make up 85% of the code base of the package.
- Historically, the test suite has failed about 85% of the times it was run.
- The insurance pays 85% of any financial damages caused by malfunctioning of the package.


**Results**

<font color=darkgreen>You got that right! This brings us to the end of Chapter 3. Congratulations on coming this far! In the next chapter, you are going to dive into advanced topics in unit testing and look at some data science specific unit testing tricks. See you there :-)</font>

# Additional material

- DataCamp course: https://learn.datacamp.com/courses/unit-testing-for-data-science-in-python
- Project: https://github.com/gutfeeling/univariate-linear-regression