In [1]:
# Import libraries
import expectexception  #%%expect_exception TypeError
import pytest
import numpy as np
import sys

# Importing the function to test
sys.path.append("univariate-linear-regression/src/models")
from train import split_into_training_and_testing_sets

# 2. Intermediate unit testing

In this chapter, you will write more advanced unit tests, from testing complicated data types like NumPy arrays to testing exception handling. Once you have mastered the science of testing, we will also focus on the art. For example, we will learn how to find the balance between writing too many tests and too few. In the last lesson, you will be introduced to a programming methodology called Test Driven Development (TDD) and put it into practice. This might change the way you code for good!

# <font color=darkred>2.1 Mastering assert statements</font>

1. Mastering assert statements
>Congratulations on completing the first chapter! You can already write basic unit tests, and in this lesson, we will learn more about the assert statement.

2. Theoretical structure of an assertion
>So far, we have used only a boolean expression as an argument of the assert statement.

3. The optional message argument
>But the assert statement can take an optional second argument, called the message. The message is only printed when the assert statement raises an AssertionError and it should contain information about why the AssertionError was raised. If the assert statement passes, nothing is printed.

4. Adding a message to a unit test
>Let's look at the unit test called test_for_missing_area() for the row_to_list() function that we encountered in Chapter 1.

5. Adding a message to a unit test
>We could enhance this test by adding a message. We first capture the return value of row_to_list() in a variable called actual. We define a variable called expected holding the expected return value. Then we define a message that contains the argument, the actual and expected return values - basically everything that we need to debug in case the test fails. We follow it up with the appropriate assert statement including the message.

6. Test result report with message
>Here, on the top, we run the unit test without the message and we get pytest's automatic output on failed tests. On the bottom, we run the modified one with the message. Now the human-readable message appears next to the AssertionError. Depending on the pytest version, the automatic comparison may still be printed below the message, as in the pytest 6.2.4 report further down in this section.

7. Recommendations
>It is recommended to include a message with assertions because it is much easier to read and understand than the automatic output. In the message, print values of any variable of choice that may be relevant to debugging.

8. Beware of float return values!
>Next, we will learn about an especially tricky situation when the function returns a float. In Python, comparisons between floats don't always work as expected, as we can see in this surprising example.

9. Beware of float return values!
>Because of the way Python represents floats, the digits on the right might be different from what we expect, causing comparisons to fail. Why this happens is out of the scope of this course.

10. Don't do this
>The bottom line is: we should not use the usual way to compare floats in the assert statement.

11. Do this
>Instead, we should use the pytest.approx() function to wrap the expected value, as we see in this example. This ensures that the digits far on the right are ignored and we can compare floats safely. We get an empty output since the assert statement passes.

12. NumPy arrays containing floats
>pytest.approx() also works for NumPy arrays containing floats. For example, notice how we are passing a NumPy array to the pytest.approx() function in the assert statement shown here.

13. Multiple assertions in one unit test
>So far, we have only seen one assert statement per unit test, but unit tests can have more than one assert statement. Remember the convert_to_int() function, which converts an integer valued string with commas to an integer?

14. Multiple assertions in one unit test
>Let's look at a unit test which you wrote for convert_to_int() in one of the exercises. We will modify this unit test on the right. We first want to test if the function returns an integer at all. For this, we use the isinstance() function, which takes the return value as a first argument, and the expected type of the return value as a second argument, which is int in this case. We follow this up with another assert statement which checks if the return value matches the expected value. The modified test will pass if both assert statements pass. It will fail if any of them raises an AssertionError.

15. Let's practice writing assert statements!
>Wow, we've just learned quite a bit about assert statements. Now, let's practice all these bells and whistles.

**THE OPTIONAL MESSAGE ARGUMENT**

In [2]:
%%expect_exception AssertionError

assert 1 == 2, "One is not equal to two!"

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-674636f6e006> in <module>
----> 1 assert 1 == 2, "One is not equal to two!"

AssertionError: One is not equal to two!


In [3]:
# The optional message argument
# No message is displayed
assert 1 == 1, "This will not be printed since assertion passes"

**ADDING A MESSAGE TO A UNIT TEST**

In [4]:
%cd test_practices
with open("test_for_missing_area_fail.py", "w") as text_file:
    text_file.write("""
import pytest
from _pytest.assertion import truncate
truncate.DEFAULT_MAX_LINES = 9999
truncate.DEFAULT_MAX_CHARS = 9999 

# The dummy function to evaluate
def row_to_list(row):
    # emulating a wrong function
    return [row.strip()]

# The unit test
def test_for_missing_area():
    val = '\\t293,410\\n'
    actual = row_to_list(val)
    expected = None
    message = "row_to_list({0}) returned {1} instead of {2}".format(repr(val), actual, expected)
    assert actual is expected, message
    """)


!pytest test_for_missing_area_fail.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_for_missing_area_fail.py F                                          [100%]

____________________________ test_for_missing_area ____________________________

    def test_for_missing_area():
        val = '\t293,410\n'
        actual = row_to_list(val)
        expected = None
        message = "row_to_list({0}) returned {1} instead of {2}".format(repr(val), actual, expected)
>       assert actual is expected, message
E       AssertionError: row_to_list('\t293,410\n') returned ['293,410'] instead of None
E       assert ['293,410'] is None

test_for_missing_area_fail.py:18: AssertionError
FAILED 

**BEWARE OF FLOAT RETURN VALUES**

In [5]:
%%expect_exception AssertionError

# Beware of float return values!
print('Beware of float return values!')
print('0.1 + 0.1 + 0.1 == 0.3 -->', 0.1 + 0.1 + 0.1 == 0.3)
print('0.1 + 0.1 + 0.1        -->', 0.1 + 0.1 + 0.1)
print('pytest.approx(0.3)     -->', pytest.approx(0.3))

# Don't do this
print("\nDon't do this")
assert 0.1 + 0.1 + 0.1 == 0.3, "Usual way to compare does not always work with floats!"

Beware of float return values!
0.1 + 0.1 + 0.1 == 0.3 --> False
0.1 + 0.1 + 0.1        --> 0.30000000000000004
pytest.approx(0.3)     --> 0.3 ± 3.0e-07

Don't do this
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-b4a9539e38f1> in <module>
      7 # Don't do this
      8 print("\nDon't do this")
----> 9 assert 0.1 + 0.1 + 0.1 == 0.3, "Usual way to compare does not always work with floats!"

AssertionError: Usual way to compare does not always work with floats!


In [6]:
# Do this: Use pytest.approx() to wrap expected return value.
# No message is displayed
assert 0.1 + 0.1 + 0.1 == pytest.approx(0.3)

In [7]:
# NumPy arrays containing floats
# No message is displayed
assert np.array([0.1 + 0.1, 0.1 + 0.1 + 0.1]) == pytest.approx(np.array([0.2, 0.3]))
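
The `0.3 ± 3.0e-07` printed earlier in this section reflects the default tolerance of pytest.approx(): a relative tolerance of 1e-6, with a small absolute tolerance of 1e-12 for expected values near zero. The sketch below is an illustration rather than part of the course material; it shows how the `rel` and `abs` keyword arguments can loosen that tolerance when a test calls for it.

```python
import pytest

total = 0.1 + 0.1 + 0.1  # stored as 0.30000000000000004

# Default tolerance (relative 1e-6): tiny rounding errors are ignored
assert total == pytest.approx(0.3)

# A genuine difference is still detected with the default tolerance
assert 0.31 != pytest.approx(0.3)

# Loosen the check with an explicit absolute tolerance
assert 0.31 == pytest.approx(0.3, abs=0.02)

# Or with a relative tolerance (here 5% of the expected value)
assert 0.31 == pytest.approx(0.3, rel=0.05)
```

Keeping the default is usually the safer choice; a wider tolerance is only worth setting when the function under test is known to be less precise.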

**MULTIPLE ASSERTIONS IN ONE UNIT TEST**

In [8]:
%cd test_practices
with open("test_convert_to_int.py", "w") as text_file:
    text_file.write("""
# Import the pytest package
import pytest

# Import the function convert_to_int()
import sys
sys.path.append("../univariate-linear-regression/src/data")
from preprocessing_helpers import convert_to_int

# Complete the unit test name by adding a prefix
def test_on_string_with_one_comma():
    return_value = convert_to_int("2,081")
    assert isinstance(return_value, int)
    assert return_value == 2081
    """)
    
!pytest test_convert_to_int.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_convert_to_int.py .                                                 [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.2 Write an informative test failure message</font> 

The test result reports become a lot easier to read when you make good use of the optional message argument of the assert statement.

In a previous exercise, you wrote a test for the convert_to_int() function. The function takes an integer-valued string with commas as thousands separators (e.g. "2,081") as its argument and should return the integer 2081.

In this exercise, you will rewrite the test called test_on_string_with_one_comma() so that it prints an informative message if the test fails.

**Instructions**
- Format the message string so that it shows the actual return value.
- Write the assert statement that checks if actual is equal to expected and prints the string stored in message if they are not equal.
- The test that you wrote was written to a test module called test_convert_to_int.py. Run the test in the IPython console and read the test result report. Which of the following is true?

    - The test passes.
    - The test fails because convert_to_int("2,081") returns the string "2081" and not the integer 2081.
    - <font color=red>The test fails because convert_to_int("2,081") returns None and not the integer 2081.</font>
    - The test fails because of a SyntaxError in the test code.

**Results**

<font color=darkgreen>That's right! It is a lot easier to understand the custom message that you wrote than the automatic messages that pytest prints. Therefore, it is recommended that you add custom failure messages to all assert statements that you write in the future.</font>

In [9]:
%cd test_practices
with open("test_convert_to_int_fail.py", "w") as text_file:
    text_file.write("""
# Import the pytest package
import pytest

# Import the dummy function
def convert_to_int(string_with_comma):
    # emulating a wrong function
    return None

# Complete the unit test name by adding a prefix
def test_on_string_with_one_comma():
    test_argument = "2,081"
    expected = 2081
    actual = convert_to_int(test_argument)
    # Format the string with the actual return value
    message = "convert_to_int('2,081') should return the int 2081, but it actually returned {0}".format(actual)
    # Write the assert statement which prints message on failure
    assert actual == expected, message
    """)
    
!pytest test_convert_to_int_fail.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_convert_to_int_fail.py F                                            [100%]

________________________ test_on_string_with_one_comma ________________________

    def test_on_string_with_one_comma():
        test_argument = "2,081"
        expected = 2081
        actual = convert_to_int(test_argument)
        # Format the string with the actual return value
        message = "convert_to_int('2,081') should return the int 2081, but it actually returned {0}".format(actual)
        # Write the assert statement which prints message on failure
>       assert actual == expected, message
E       Assert

# <font color=darkred>2.3 Testing float return values</font> 

The get_data_as_numpy_array() function (which was called mystery_function() in one of the previous exercises) takes two arguments: the path to a clean data file and the number of data columns in the file. An example file has been printed out in the IPython console. It contains three rows.

The function converts the data into a 3x2 NumPy array with dtype=float64. The expected return value has been stored in a variable called expected. Print it out to see it.

The housing areas are in the first column and the housing prices are in the second column. This array will be the features that will be fed to the linear regression model for learning.

The return value contains floats. Therefore you have to be especially careful when writing unit tests for this function.

**Instructions**
- Complete the assert statement to check if get_data_as_numpy_array() returns expected, when called on example_clean_data_file.txt with num_columns set to 2.

**Results**

<font color=darkgreen>Well done! The pytest.approx() function not only works for NumPy arrays containing floats, but also for lists and dictionaries containing floats.</font>
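
As a small illustration of that last remark, the following sketch applies pytest.approx() to a list and to a dictionary. The keys and values are made up for the example and do not come from the project.

```python
import pytest

# Lists of floats are compared element by element
assert [0.1 + 0.2, 0.1 + 0.1 + 0.1] == pytest.approx([0.3, 0.3])

# Dictionaries are compared value by value; the keys must match exactly
actual = {"area": 0.1 + 0.2, "price": 0.1 + 0.1 + 0.1}
assert actual == pytest.approx({"area": 0.3, "price": 0.3})
```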

In [10]:
%cd test_practices
with open("datasets/example_clean_data2.txt", "w") as text_file:
    text_file.write("""2081\t314942\n1059\t186606\n1148\t206186""")

with open("test_get_data_as_numpy_array_fail.py", "w") as text_file:
    text_file.write("""
# Import libraries
import pytest

# Import the function to test
import numpy as np
import sys
sys.path.append("../univariate-linear-regression/src/features")
from as_numpy import get_data_as_numpy_array

# Complete the unit test name by adding a prefix
def test_on_clean_file():
  expected = np.array([[2081.0, 314942.0],
                       [1059.0, 186606.0],
                       [1148.0, 206186.0]
                       ]
                      )
  actual = get_data_as_numpy_array("datasets/example_clean_data2.txt", num_columns=2)
  message = "Expected return value: {0}, Actual return value: {1}".format(expected, actual)
  # Complete the assert statement
  assert actual == pytest.approx(expected), message
    """)
    
!pytest test_get_data_as_numpy_array_fail.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_get_data_as_numpy_array_fail.py .                                   [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.4 Testing with multiple assert statements</font> 

You're now going to test the function split_into_training_and_testing_sets() from the models module.

It takes an n x 2 NumPy array containing housing area and prices as argument. To see an example argument, print the variable example_argument in the IPython console.

The function returns a 2-tuple of NumPy arrays (training_set, testing_set). The training set contains int(0.75 * n) (approx. 75%) randomly selected rows of the argument array. The testing set contains the remaining rows.

Print the variable expected_return_value in the IPython console. example_argument had 6 rows. Therefore the training array has int(0.75 * 6) = 4 of its rows and the testing array has the remaining 2 rows.

numpy as np, pytest and split_into_training_and_testing_sets have been imported for you.
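
To make the row arithmetic concrete, the following sketch tabulates the expected split sizes for a few values of n. The helper name expected_split_sizes() is hypothetical and only mirrors the formula stated above; it is not part of the project.

```python
def expected_split_sizes(n):
    """Return (training_rows, testing_rows) for an argument with n rows."""
    training_rows = int(0.75 * n)      # integer part of 75% of the rows
    testing_rows = n - training_rows   # the remaining rows
    return training_rows, testing_rows

for n in (2, 3, 6, 8):
    print(n, expected_split_sizes(n))
# 2 (1, 1)
# 3 (2, 1)
# 6 (4, 2)
# 8 (6, 2)
```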

**Instructions**
- Calculate the expected number of rows of the training array using the formula int(0.75*n), where n is the number of rows in example_argument, and assign the variable expected_training_array_num_rows to this number.
- Calculate the expected number of rows of the testing array using the formula n - int(0.75*n), where n is the number of rows in example_argument, and assign the variable expected_testing_array_num_rows to this number.
- Write an assert statement that checks if the training array has expected_training_array_num_rows rows.
- Write an assert statement that checks if the testing array has expected_testing_array_num_rows rows.

**Results**

<font color=darkgreen>Well done! You seem to have mastered the art of writing assert statements. This test will pass only if both assertions pass. It will fail if any one of them raises an AssertionError.</font>
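
One consequence worth keeping in mind: assert statements run in order, and the first one that fails ends the test, so later assertions are not evaluated. A minimal sketch, using a hypothetical check function and a list that records which assertions were reached:

```python
reached = []

def check_both(value):
    # Hypothetical stand-in for a unit test with two assertions
    reached.append("first")
    assert isinstance(value, int), "value is not an int"
    reached.append("second")
    assert value == 2081, "value is not 2081"

try:
    check_both("2081")   # a string, so the first assertion fails
except AssertionError as error:
    print(error)         # value is not an int

print(reached)           # ['first']
```

Ordering the assertions from the most basic check (such as the type) to the most specific one therefore tends to give the most useful failure report.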

In [11]:
%cd test_practices
with open("test_split_into_training_and_testing_sets.py", "w") as text_file:
    text_file.write("""
# Import libraries
import pytest

# Import the function to test
import numpy as np
import sys
sys.path.append("../univariate-linear-regression/src/models")
from train import split_into_training_and_testing_sets

def test_on_six_rows():
    example_argument = np.array([[2081.0, 314942.0], [1059.0, 186606.0],
                                 [1148.0, 206186.0], [1506.0, 248419.0],
                                 [1210.0, 214114.0], [1697.0, 277794.0]]
                                )
    # Fill in with training array's expected number of rows
    expected_training_array_num_rows = int(example_argument.shape[0]*0.75)
    
    # Fill in with testing array's expected number of rows
    expected_testing_array_num_rows = example_argument.shape[0] - expected_training_array_num_rows
    
    # Call the function to test
    actual = split_into_training_and_testing_sets(example_argument)
    
    # Write the assert statement checking training array's number of rows
    assert actual[0].shape[0] == expected_training_array_num_rows, \
           "The actual number of rows in the training array is not {}".format(expected_training_array_num_rows)
    
    # Write the assert statement checking testing array's number of rows
    assert actual[1].shape[0] == expected_testing_array_num_rows, \
           "The actual number of rows in the testing array is not {}".format(expected_testing_array_num_rows)
    """)
    
!pytest test_split_into_training_and_testing_sets.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_split_into_training_and_testing_sets.py .                           [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.5 Testing for exceptions instead of return values</font> 

1. Testing for exceptions instead of return values
>So far, we have used the assert statement to check if a function returns the expected value. However, some functions may not return anything, but rather raise an exception, when called on certain arguments.

2. Example
>Consider the split_into_training_and_testing_sets() function that you tested in the last exercise. This function returns a two-tuple containing the training and the testing array. It puts 75% of the rows of the argument NumPy array into the training array, and the rest of the rows into the testing array.

3. Example
>This function expects the argument array to have rows and columns, that is, the argument array must be two dimensional. Otherwise, splitting by rows is undefined.

4. Example
>So if we pass a one dimensional array to this function, it should not return anything, but rather raise a ValueError, which is a specific type of exception.

5. Unit testing exceptions
>We will now learn how to test whether this function raises ValueError when it gets a one dimensional array as argument. Let's call the unit test test_valueerror_on_one_dimensional_argument(). The example argument is a one dimensional array, as shown here. Then we use a with statement that we will explain in detail now.

6. Theoretical structure of a with statement
>First, let's understand the with statement. Any code that is inside the with statement is known as the context.

7. Theoretical structure of a with statement
>The with statement takes a single argument, which is known as a context manager.

8. Theoretical structure of a with statement
>The context manager runs some code on entering and on exiting the context, just like a security guard who checks or does something when we enter or leave a building.

9. Theoretical structure of a with statement
>In this case, we are using a context manager called pytest.raises(). It takes a single argument, which is the type of exception that we are checking for, in this case, a ValueError. This context manager does not run any code on entering the context, but it does something on exit. If the code in the context raises a ValueError, the context manager silences the error. And if the code in the context does not raise a ValueError, the context manager raises an exception itself.

10. Theoretical structure of a with statement
>Here is an example where a ValueError is raised in the context and silenced by the context manager. The second code block shows an example where no ValueError is raised and the context manager raises an exception called Failed.

11. Unit testing exceptions
>Getting back to the unit test, we call the function on the one dimensional array inside the context. If the function raises a ValueError as expected, it will be silenced and the test will pass. If the function is buggy and no ValueError is raised, the context manager raises a Failed exception, causing the test to fail.

12. Testing the error message
>We can unit test details of the raised exception as well. For example, we might want to check if the raised ValueError contains the correct error message which starts with "Argument data_array must be two dimensional".

13. Testing the error message
>In order to do that, we extend the with statement with the as clause. If the ValueError is raised within the context, then exception_info will contain information about the silenced ValueError. After the context ends, we can check whether exception_info has the correct message. To do this, we use a simple assert statement with the match() method of exception_info. The match() method takes a string as argument, treats it as a regular expression, and checks whether it is found in the error message.

14. Let's practice unit testing exceptions.
>We've learned a lot. Now, let's practice the pytest.raises() context manager!
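
To illustrate the mechanics described above, here is a simplified stand-in for pytest.raises() written as a plain class. The name ExpectRaises is hypothetical, and the real pytest implementation differs in detail (for example, it raises its own Failed exception rather than an AssertionError). The sketch only shows how the `__enter__()` and `__exit__()` methods let a context manager act on entry and on exit.

```python
class ExpectRaises:
    """Simplified, hypothetical imitation of pytest.raises()."""

    def __init__(self, expected_exception):
        self.expected_exception = expected_exception
        self.value = None  # will hold the silenced exception

    def __enter__(self):
        # Nothing to do on entry; return self so the as clause can bind it
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            # The context finished without raising anything
            raise AssertionError(
                "DID NOT RAISE {0}".format(self.expected_exception.__name__)
            )
        if issubclass(exc_type, self.expected_exception):
            self.value = exc_value
            return True   # returning True silences the exception
        return False      # any other exception propagates unchanged


with ExpectRaises(ValueError) as info:
    raise ValueError("Silence me!")

print(str(info.value))  # Silence me!
```

Returning True from `__exit__()` is what tells Python to suppress the exception; returning False lets it continue to propagate.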

In [12]:
# Remembering the function
example_argument = np.array([[2081, 314942],
                             [1059, 186606],
                             [1148, 206186],
                            ])
split_into_training_and_testing_sets(example_argument)

(array([[  2081, 314942],
        [  1148, 206186]]),
 array([[  1059, 186606]]))

In [13]:
%%expect_exception ValueError
example_argument = np.array([2081, 314942, 1059, 186606, 1148, 206186])

# one dimensional
split_into_training_and_testing_sets(example_argument)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-2c4062f614dc> in <module>
      2 
      3 # one dimensional
----> 4 split_into_training_and_testing_sets(example_argument)

~\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\univariate-linear-regression\src\models\train.py in split_into_training_and_testing_sets(data_array)
      6     dim = data_array.ndim
      7     if dim != 2:
----> 8         raise

In [14]:
%cd test_practices
with open("test_valueerror_on_one_dimensional_argument.py", "w") as text_file:
    text_file.write("""
# Import libraries
import pytest

# Import the function to test
import numpy as np
import sys
sys.path.append("../univariate-linear-regression/src/models")
from train import split_into_training_and_testing_sets

def test_valueerror_on_one_dimensional_argument():
    example_argument = np.array([2081, 314942, 1059, 186606, 1148, 206186])
    
    with pytest.raises(ValueError) as exception_info:
        # store the exception
        split_into_training_and_testing_sets(example_argument)
    
    # Check if ValueError contains correct message
    assert exception_info.match("Argument data_array must be two dimensional. Got 1 dimensional array instead!")
    """)
    
!pytest test_valueerror_on_one_dimensional_argument.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_valueerror_on_one_dimensional_argument.py .                         [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.6 Practice the context manager</font> 

In pytest, you can test whether a function raises an exception by using a context manager. Let's practice your understanding of this important context manager, the with statement and the as clause.

At any step, feel free to run the code by pressing the "Run Code" button and check if the output matches your expectations.

**Instructions**
- Complete the with statement by filling in with a context manager that will silence the ValueError raised in the context.
- Complete the with statement with a context manager that raises Failed if no OSError is raised in the context.
- Extend the with statement so that any raised ValueError is stored in the variable exc_info.
- Write an assert statement to check if the raised ValueError contains the message "Silence me!".

**Results**

<font color=darkgreen>Well done! In the next exercise, you will apply your knowledge of pytest.raises() to write a unit test for a function in the example linear regression project.</font>

In [15]:
# Fill in with a context manager that will silence the ValueError
with pytest.raises(ValueError):
    raise ValueError

In [16]:
try:
    # Fill in with a context manager that raises Failed if no OSError is raised
    with pytest.raises(OSError):
        raise ValueError
except ValueError:
    # The ValueError is not an OSError, so pytest.raises() does not silence it
    print("pytest raised an exception because no OSError was raised in the context.")

pytest raised an exception because no OSError was raised in the context.


In [17]:
# Store the raised ValueError in the variable exc_info
with pytest.raises(ValueError) as exc_info:
    raise ValueError("Silence me!")

In [18]:
with pytest.raises(ValueError) as exc_info:
    raise ValueError("Silence me!")
# Check if the raised ValueError contains the correct message
assert exc_info.match('Silence me!')
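
One detail to be aware of: match() interprets its argument as a regular expression and searches for it in the error message, so characters such as `.` or `(` have special meaning. The sketch below uses a made-up error message to show how re.escape() can help when the expected text should be matched literally.

```python
import re
import pytest

with pytest.raises(ValueError) as exc_info:
    raise ValueError("Expected shape (n, 2), got (6,)")

# Unescaped parentheses would be read as regex groups, so escape the text
assert exc_info.match(re.escape("shape (n, 2)"))

# The same check can be written with the match keyword of pytest.raises()
with pytest.raises(ValueError, match=re.escape("got (6,)")):
    raise ValueError("Expected shape (n, 2), got (6,)")
```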

# <font color=darkred>2.7 Unit test a ValueError</font> 

Sometimes, you want a function to raise an exception when called on bad arguments. This prevents the function from returning nonsense results or hard-to-interpret exceptions. This is an important behavior which should be unit tested.

Remember the function split_into_training_and_testing_sets()? It takes a NumPy array containing housing area and prices as argument. The function randomly splits the array row wise into training and testing arrays in the ratio 3:1, and returns the resulting arrays in a tuple.

If the argument array has only 1 row, the testing array will be empty. To avoid this situation, you want the function to not return anything, but raise a ValueError with the message "Argument data_array must have at least 2 rows, it actually has just 1".
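
For orientation, a function with the behavior described above could look roughly like the following. This is a hypothetical sketch, not the project's actual train.py: the function name split_into_sets_sketch() is invented, the two checks and their error messages are taken from this chapter, and the use of np.random.permutation() for the random split is an assumption.

```python
import numpy as np

def split_into_sets_sketch(data_array):
    """Hypothetical sketch: random 75/25 row-wise split of a 2D array."""
    if data_array.ndim != 2:
        raise ValueError(
            "Argument data_array must be two dimensional. "
            "Got {0} dimensional array instead!".format(data_array.ndim)
        )
    num_rows = data_array.shape[0]
    if num_rows < 2:
        raise ValueError(
            "Argument data_array must have at least 2 rows, "
            "it actually has just {0}".format(num_rows)
        )
    num_training = int(0.75 * num_rows)
    # Assumption: shuffle the row indices, then cut at the 75% mark
    permuted = np.random.permutation(num_rows)
    return data_array[permuted[:num_training]], data_array[permuted[num_training:]]

training, testing = split_into_sets_sketch(np.arange(12).reshape(6, 2))
print(training.shape, testing.shape)  # (4, 2) (2, 2)
```

Raising early with a descriptive message means a caller who passes a single-row array sees the cause immediately instead of receiving an empty testing array.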

**Instructions**
- Fill in with the correct context manager that checks if split_into_training_and_testing_sets() raises a ValueError when called on test_argument, which is a NumPy array with a single row.
- Complete the with statement so that information about any raised ValueError will be stored in the variable exc_info.
- Write an assert statement to check if the raised ValueError contains the correct message stored in the variable expected_error_msg.
- The test test_on_one_row() was written to the test module test_split_into_training_and_testing_sets.py. Run the test in the IPython console and read the test result report. Does the test pass or fail?

    - <font color=red>The test passes.</font>
    - The test fails because no ValueError is raised when split_into_training_and_testing_sets() is called on a NumPy array with 1 row.
    - The test fails because the ValueError does not contain the correct error message.

**Results**

<font color=darkgreen>That's correct! Congratulations on writing your first unit test that checks for exceptions. In the next lesson, you will find out that it is good practice to include a few tests of this type for every function that you test.</font>

In [19]:
%cd test_practices
with open("test_split_into_training_and_testing_sets.py", "w") as text_file:
    text_file.write("""
import pytest

# Import the function to test
import numpy as np
import sys
sys.path.append("../univariate-linear-regression/src/models")
from train import split_into_training_and_testing_sets

def test_on_one_row():
    test_argument = np.array([[1382.0, 390167.0]])
    # Store information about raised ValueError in exc_info
    with pytest.raises(ValueError) as exc_info:
        split_into_training_and_testing_sets(test_argument)
    expected_error_msg = "Argument data_array must have at least 2 rows, it actually has just 1"
    # Check if the raised ValueError contains the correct message
    assert exc_info.match(expected_error_msg)
    """)
    
!pytest test_split_into_training_and_testing_sets.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 1 item

test_split_into_training_and_testing_sets.py .                           [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.8 The well tested function</font>

1. The well tested function
>In this lesson, we will explore the question: how many tests should one write for a function?

2. Example
>Consider the split_into_training_and_testing_sets() function that we saw in earlier lessons. This function takes a two dimensional NumPy array as an argument and randomly puts about 75% of its rows into a training array, and the remaining rows into a testing array. It returns the training and the testing arrays as a 2-tuple.

3. Test for length, not value
>Because this function has randomness, we test for the lengths of the training and testing arrays rather than their actual values. The length of the training array is given by the integer part of 0.75 times the number of rows in the argument. The testing array gets the rest.

4. Test arguments and expected return values
>For example, if the argument has 8 rows, then the training array should have 6 rows and the testing array should have 2 rows.
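
The length rule described above can be sketched as plain arithmetic. This is only an illustration of the stated rule (integer part of 0.75 times the number of rows); `expected_split_lengths` is a hypothetical helper, not a function from the course project.

```python
def expected_split_lengths(num_rows):
    """Expected (training, testing) row counts for a 75%/25% split.

    Hypothetical helper that only mirrors the rule stated in the lesson.
    """
    # The training array gets the integer part of 0.75 * num_rows
    num_training = int(0.75 * num_rows)
    # The testing array gets the rest
    num_testing = num_rows - num_training
    return num_training, num_testing


print(expected_split_lengths(8))  # (6, 2)
```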

5. Test arguments and expected return values
>In general, the more arguments we check,

6. Test arguments and expected return values
>the more confident we can be that the function is working correctly.

7. How many arguments to test?
>But since we cannot write tests for hundreds of arguments because of time limitations, how many tests can be considered enough?

8. Test argument types
>The best practice is to pick a few from each of the following categories of arguments, which are called bad arguments, special arguments and normal arguments.

9. Test argument types
>If we have tested

10. Test argument types
>for all

11. Test argument types
>of these argument types,

12. The well tested function
>then our function can be declared well tested.

13. Type I: Bad arguments
>Let's define these argument types for the split_into_training_and_testing_sets() function. Bad arguments are arguments for which the function raises an exception instead of returning a value.

14. Type I: Bad arguments (one dimensional array)
>For split_into_training_and_testing_sets(), a one dimensional array, like the one shown, is a bad argument. It doesn't have rows and columns, so splitting row wise doesn't make any sense. The expected outcome is a ValueError.

15. Type I: Bad arguments (array with only one row)
>An array with a single row is also a bad argument, because this row can be put in the training array, but then the testing array will be empty. Or vice versa. Empty training or testing arrays are useless, so the function raises a ValueError.

16. Type II: Special arguments
>Next come special arguments. These are of two types: boundary values and argument values for which the function uses special logic to produce the return value.

17. Boundary values
>So what are boundary values? If we look at the number of rows of the argument, we see that the function raises a ValueError for one row, but returns training and testing arrays for arguments having more than one row.

18. Boundary values
>The value two marks the boundary for this behavior change, and therefore, is a boundary value.

19. Test arguments table
>So we add the boundary value to the test arguments table.

20. Arguments triggering special logic
>The other type of special arguments are those that trigger special logic in the function. Look at the last row where the argument has 4 rows. The standard logic of a 75% and 25% split would produce a training array with 3 rows and a testing array with 1 row.

21. Arguments triggering special logic
>But we might want the function to return a training array with 2 rows and testing array with 2 rows instead. Then 4 rows would be a special case, because the function isn't using the usual 75% and 25% logic.

22. Normal arguments
>Finally, anything that is not a bad or special argument is a normal argument. First notice that since the function uses special logic at 4, the behavior at 4 is different from the behavior on either side of 4. This turns 3 and 5, which flank 4, into boundary values. The remaining arguments, with more than 5 rows, are then normal arguments. It is recommended to test for two or three normal arguments.

23. Final test arguments table
>The final rows of the arguments table correspond to two normal arguments, having 6 and 8 rows respectively. If we test the split_into_training_and_testing_sets() function with all these arguments,

24. Final test arguments table
>then we can declare it well tested.
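
The categories from this lesson can be sketched as a small classifier over the number of rows. It assumes the hypothetical requirement discussed above (special logic at 4 rows) and ignores the one dimensional bad argument, which has no row count. `classify_row_count` is an illustrative helper, not part of the course project.

```python
def classify_row_count(num_rows, special_row_counts=(4,)):
    """Categorize an argument by its number of rows (hypothetical helper)."""

    def behaves_differently(count):
        # Fewer than 2 rows raises a ValueError; special counts use special logic
        return count < 2 or count in special_row_counts

    if num_rows < 2:
        return "bad"
    if num_rows in special_row_counts:
        return "special logic"
    # A boundary value sits right next to a row count with different behavior
    if behaves_differently(num_rows - 1) or behaves_differently(num_rows + 1):
        return "boundary"
    return "normal"


for num_rows in (1, 2, 3, 4, 5, 6, 8):
    print(num_rows, classify_row_count(num_rows))
```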

25. Caveat
>When applying this logic to other functions, note that not all functions have bad or special arguments. In that case, we should just ignore those types of arguments.

26. Let's apply this to other functions!
>Enough theory! Let's now apply this to other functions in the exercises.

# <font color=darkred>2.9 Testing well: Boundary values</font> 

Remember row_to_list()? It takes a row containing housing area and prices e.g. "2,041\t123,781\n" and returns the data as a list e.g. ["2,041", "123,781"].

A row can be mapped to a 2-tuple (m, n), where m is the number of tab separators, and n is 1 if the row has any missing values and 0 otherwise.

For example,
- "123\t456\n" → (1, 0).
- "\t456\n" → (1, 1).
- "\t456\t\n" → (2, 1).

The function only returns a list for arguments mapping to (1, 0). All other tuples correspond to invalid rows, with either more than one tab or missing values. The function returns None in all these cases. See the plot.

This mapping shows that the function has normal behavior at (1, 0), and special behavior everywhere else.
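
The mapping from a row to the tuple (m, n) can be sketched in a few lines. The sketch assumes that a "missing value" means an empty field between separators (or at either end of the row); `row_to_tuple` is a hypothetical helper introduced only for illustration.

```python
def row_to_tuple(row):
    """Map a row to (m, n): m = number of tabs, n = 1 if any value is missing."""
    # Drop the trailing newline, then split into fields at the tab separators
    entries = row.rstrip("\n").split("\t")
    num_tabs = row.count("\t")
    # Assumption: an empty field counts as a missing value
    has_missing_value = int(any(entry == "" for entry in entries))
    return num_tabs, has_missing_value


print(row_to_tuple("123\t456\n"))  # (1, 0)
print(row_to_tuple("\t456\n"))     # (1, 1)
print(row_to_tuple("\t456\t\n"))   # (2, 1)
```

Categorizing a candidate test argument then reduces to comparing its tuple against the plot.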

<img src='images/Testing well - Boundary values.png' width=50%/>

**Instructions**
- Which are the boundary values for this function, according to the plot?

    - (1, 0).  <font color=gray>_It is the normal argument for this function, not a boundary value._</font>
    - (0, 0) and (2, 0).  <font color=gray>_Yes, these two are boundary values because they mark the points separating the normal behavior and special behavior in the x direction. But what about the y direction?_</font>
    - <font color=red>(0, 0), (2, 0) and (1, 1).</font>
    - (3, 0) and (3, 1).

- Assign actual to the return value of row_to_list() on the argument "123\n", which is an instance of the boundary value (0, 0).
- Complete the assert statement to check if row_to_list() indeed returns None for the instance "123\t4,567\t89\n" of the boundary value (2, 0).
- In the test test_on_one_tab_with_missing_value(), format the failure message with the actual return value.

**Results**

<font color=darkgreen>Fantastic! You just wrote tests for all the boundary values of row_to_list(). In the next exercise, you will write tests for some special values of row_to_list().</font>

In [20]:
%cd test_practices
with open("test_row_to_list_boundary_values.py", "w") as text_file:
    text_file.write("""
import pytest

import sys
sys.path.append("../univariate-linear-regression/src/data")
from preprocessing_helpers import row_to_list

def test_on_no_tab_no_missing_value():    # (0, 0) boundary value
    # Assign actual to the return value for the argument "123\\n"
    actual = row_to_list('123\\n')
    assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    
def test_on_two_tabs_no_missing_value():    # (2, 0) boundary value
    actual = row_to_list('123\\t4,567\\t89\\n')
    # Complete the assert statement
    assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    
def test_on_one_tab_with_missing_value():    # (1, 1) boundary value
    actual = row_to_list('\\t4,567\\n')
    # Format the failure message
    assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    """)
    
!pytest test_row_to_list_boundary_values.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 3 items

test_row_to_list_boundary_values.py ...                                  [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.10 Testing well: Values triggering special logic</font> 

Look at the plot. The boundary values of row_to_list() are now marked in orange. The normal argument is marked in green and the values triggering special behavior are marked in blue.

In the last exercise, you wrote tests for boundary values. In this exercise, you are going to write tests for values triggering special behavior, in particular, (0, 1) and (2, 1). These are values triggering special logic since the function returns None instead of a list.

<img src='images/Testing well - Values triggering special logic.png' width=50%/>

**Instructions**
- Assign the variable actual to the actual return value for "\n".
- Complete the assert statement for test_on_no_tab_with_missing_value(), making sure to format the failure message appropriately.
- Assign the variable actual to the actual return value for "123\t\t89\n".
- Complete the assert statement for test_on_two_tabs_with_missing_value(), making sure to format the failure message appropriately.

**Results**

<font color=darkgreen>Kudos! You have now written tests for both boundary values and values triggering special logic for row_to_list(). In the next exercise, you will test normal arguments and then declare this function well tested!</font>

In [21]:
%cd test_practices
with open("test_row_to_list_special_values.py", "w") as text_file:
    text_file.write("""
import pytest

import sys
sys.path.append("../univariate-linear-regression/src/data")
from preprocessing_helpers import row_to_list

def test_on_no_tab_with_missing_value():    # (0, 1) case
    # Assign to the actual return value for the argument '\\n'
    actual = row_to_list('\\n')
    # Write the assert statement with a failure message
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_two_tabs_with_missing_value():    # (2, 1) case
    # Assign to the actual return value for the argument "123\\t\\t89\\n"
    actual = row_to_list("123\\t\\t89\\n")
    # Write the assert statement with a failure message
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    """)
    
!pytest test_row_to_list_special_values.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 2 items

test_row_to_list_special_values.py ..                                    [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.11 Testing well: Normal arguments</font> 

This time, you will test row_to_list() with normal arguments i.e. arguments mapping to the tuple (1, 0). The plot is provided to you for reference.

Remembering that the best practice is to test for two to three normal arguments, you will write two tests in this exercise.
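
For readers following along without the project's preprocessing_helpers module, the behavior that these tests rely on can be approximated with a stand-in. This is a sketch that is merely consistent with the description in these exercises (a list for rows mapping to (1, 0), None otherwise); it is not claimed to be the project's actual implementation, and the name `row_to_list_sketch` is hypothetical.

```python
def row_to_list_sketch(row):
    """Return the two fields of a valid row as a list, otherwise None."""
    # Remove the trailing newline and split at the tab separators
    entries = row.rstrip("\n").split("\t")
    # Valid rows have exactly one tab (two fields) and no empty field
    if len(entries) == 2 and "" not in entries:
        return entries
    return None


print(row_to_list_sketch("123\t4,567\n"))  # ['123', '4,567']
print(row_to_list_sketch("\t4,567\n"))     # None
```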

<img src='images/Testing well - Values triggering special logic.png' width=50%/>

**Instructions**
- How many normal arguments is it recommended to test?
    - One.
    - <font color=red>At least two or three.</font>
    - None.
    
    
- Assign the variable expected to the expected return value for the normal argument "123\t4,567\n".
- Complete the correct assert statement for test_on_normal_argument_2(), making sure to format the failure message appropriately.


- The tests for boundary values, values triggering special behavior and normal arguments have been written to a test module test_row_to_list.py. Run the tests in the IPython shell. Which bugs does the function have?
    - <font color=red>The function does not have any bugs.</font>
    - The function returns ["", "4,567"] for the boundary value "\t4,567\n" instead of None.
    - The function raises a SyntaxError for the special value

**Results**

<font color=darkgreen>Well done! You tested the function row_to_list() on boundary values, values triggering special behavior and normal arguments. All the tests are passing. So you can be quite confident that the function is correctly coded! Note that this function does not have bad arguments, so you did not write any tests for that. Also note how mapping the arguments to tuples enabled us to categorize the arguments easily. Use this trick for other functions whenever applicable ;-)</font>

In [22]:
%cd test_practices
with open("test_row_to_list_normal_values.py", "w") as text_file:
    text_file.write("""
import pytest

import sys
sys.path.append("../univariate-linear-regression/src/data")
from preprocessing_helpers import row_to_list

def test_on_normal_argument_1():
    actual = row_to_list("123\\t4,567\\n")
    expected = ["123", "4,567"]
    assert actual == expected, "Expected: {0}, Actual: {1}".format(expected, actual)
    
def test_on_normal_argument_2():
    actual = row_to_list("1,059\\t186,606\\n")
    expected = ["1,059", "186,606"]
    assert actual == expected, "Expected: {0}, Actual: {1}".format(expected, actual)
    """)
    
!pytest test_row_to_list_normal_values.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 2 items

test_row_to_list_normal_values.py ..                                     [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


In [23]:
%cd test_practices
with open("test_row_to_list.py", "w") as text_file:
    text_file.write("""
import pytest

import sys
sys.path.append("../univariate-linear-regression/src/data")
from preprocessing_helpers import row_to_list

def test_on_normal_argument_1():
    actual = row_to_list("123\\t4,567\\n")
    expected = ["123", "4,567"]
    assert actual == expected, "Expected: {0}, Actual: {1}".format(expected, actual)
    
def test_on_normal_argument_2():
    actual = row_to_list("1,059\\t186,606\\n")
    expected = ["1,059", "186,606"]
    assert actual == expected, "Expected: {0}, Actual: {1}".format(expected, actual)

def test_on_no_tab_with_missing_value():      # (0, 1) case
    actual = row_to_list('\\n')
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_two_tabs_with_missing_value():    # (2, 1) case
    actual = row_to_list("123\\t\\t89\\n")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)

def test_on_no_tab_no_missing_value():        # (0, 0) boundary value
    actual = row_to_list('123\\n')
    assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    
def test_on_two_tabs_no_missing_value():      # (2, 0) boundary value
    actual = row_to_list('123\\t4,567\\t89\\n')
    assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    
def test_on_one_tab_with_missing_value():     # (1, 1) boundary value
    actual = row_to_list('\\t4,567\\n')
    assert actual is None, 'Expected: None, Actual: {0}'.format(actual)
    """)
    
!pytest test_row_to_list.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 7 items

test_row_to_list.py .......                                              [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# <font color=darkred>2.12 Test Driven Development (TDD)</font>

1. Test Driven Development (TDD)
>By now, we understand why it is so important to write unit tests.

2. Writing unit tests is often skipped
>But in the real world, it is all too common to skip writing them.

3. Usual priorities in the industry
>Bosses want to prioritize feature implementation, because they want fast results. Unit tests only get second priority.

4. Unit tests never get written
>So we tell ourselves that we will write the not-so-urgent unit tests tomorrow. Eventually, the unit tests never get written. Of course, we pay for this mistake in the long term.

5. Test Driven Development (TDD)
>In this lesson, we will learn a new coding method called Test Driven Development, or TDD for short, which tries to ensure that unit tests do get written. We got introduced to the life cycle of a function in Chapter 1. A function is first implemented and then it is tested, according to this life cycle.

6. Test Driven Development (TDD)
>Test Driven Development alters the usual life cycle by adding a single step before implementation. This step involves writing unit tests for the function.

7. Write unit tests before implementation!
>Yes, that's right. We write tests even before the function is implemented in code! Making unit tests a precondition for implementation ensures that writing them cannot be postponed or deprioritized. It also means that we, along with our bosses, should factor in the time for writing unit tests as a part of implementation time. Furthermore, when we write unit tests first, we have to think of possible arguments and return values, including normal, special and bad arguments. This type of thinking before implementation actually helps in finalizing the requirements for a function. When the requirements for a function are clear and precise, the implementation becomes much easier.

8. In the coding exercises...
>That's all the theory we need. In the coding exercises following this video lesson, you will apply this coding method to the function convert_to_int(). We have seen this function before. It converts an integer valued string with comma as thousands separator to an integer.

9. Step 1: Write unit tests and fix requirements
>In the exercises, you will start with a blank slate, which means that the function is not yet implemented. Then you will go through a three step process. First, you will write the tests for this function in the test module test_convert_to_int.py. As you write the unit tests, you will think more about the requirements of this function.

10. Step 2: Run tests and watch it fail
>Then you will execute the test module. Of course, the tests will not pass because the function does not even exist yet.

11. Step 3: Implement function and run tests again
>Finally you will implement the function and run the tests again. If you implemented the function correctly, then the tests should pass this time. Otherwise, you would have to fix bugs and repeat this step.

12. Let's apply TDD!
>Got it? Then let's give it a try in the exercises.

# <font color=darkred>2.13 TDD: Tests for normal arguments</font> 

In this and the following exercises, you will implement the function convert_to_int() using Test Driven Development (TDD). In TDD, you write the tests first and implement the function later.

Normal arguments for convert_to_int() are integer strings with commas as thousands separators. Since the best practice is to test a function for two to three normal arguments, here are three examples with no comma, one comma and two commas respectively.

|Argument value|Expected return value|
|-|-|
|"756"|756|
|"2,081"|2081|
|"1,034,891"|1034891|

Since the convert_to_int() function does not exist yet, you won't be able to import it. But you will use it in the tests anyway. That's how TDD works.

pytest has already been imported for you.

**Instructions**
- Complete the assert statement for test_with_no_comma() by inserting the correct boolean expression.
- Complete the assert statement for test_with_one_comma() by inserting the correct boolean expression.
- Complete the assert statement for test_with_two_commas() by inserting the correct boolean expression.

**Results**

<font color=darkgreen>Awesome! You wrote three tests for normal arguments of convert_to_int(). But wait...what happens if the arguments are not normal? The boss didn't say anything about that! Let's find out in the next exercise.</font>

In [24]:
def test_with_no_comma():
    actual = convert_to_int("756")
    # Complete the assert statement
    assert actual==756, "Expected: 756, Actual: {0}".format(actual)
    
def test_with_one_comma():
    actual = convert_to_int("2,081")
    # Complete the assert statement
    assert actual==2081, "Expected: 2081, Actual: {0}".format(actual)
    
def test_with_two_commas():
    actual = convert_to_int("1,034,891")
    # Complete the assert statement
    assert actual==1034891, "Expected: 1034891, Actual: {0}".format(actual)

# <font color=darkred>2.14 TDD: Requirement collection</font>

What should convert_to_int() do if the arguments are not normal? In particular, there are three special argument types:

1. Arguments that are missing a comma e.g. "178100,301".
2. Arguments that have the comma in the wrong place e.g. "12,72,891".
3. Float valued strings e.g. "23,816.92".

Also, should convert_to_int() raise an exception for specific argument values?

When your boss asked you to implement the function, she didn't say anything about these cases! But since you want to write tests for special and bad arguments as a part of TDD, you go and ask your boss.

She says that convert_to_int() should return None for every special argument and there are no bad arguments for this function.

pytest has been imported for you.

**Instructions**
- Give a name to the test by using the standard name prefix that pytest expects followed by on_string_with_missing_comma.
- Assign actual to the actual return value for the argument "12,72,891".
- Complete the assert statement.
- The tests for normal and special arguments have been written to a test module test_convert_to_int.py. Run it in the IPython console and read the test result report. What happens?

    - All tests are passing.
    - The test test_on_string_with_two_commas() is failing because convert_to_int("1,034,891") returns None instead of the correct integer 1034891.
    - <font color=red>All tests are failing with a NameError since convert_to_int() has not been implemented yet.</font>

**Results**

<font color=darkgreen>Yes! In TDD, the first run of the tests always fails with a NameError or ImportError because the function does not exist yet. In the next exercise, you will implement the function and fix this. But before you move on, notice how thinking about special and bad arguments crystallized the requirements for the function. This will help us immensely in implementing the function in the coming exercise.</font>

In [25]:
# Give a name to the test for an argument with missing comma
def test_on_string_with_missing_comma():
    actual = convert_to_int("178100,301")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_string_with_incorrectly_placed_comma():
    # Assign to the actual return value for the argument "12,72,891"
    actual = convert_to_int("12,72,891")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_float_valued_string():
    actual = convert_to_int("23,816.92")
    # Complete the assert statement
    assert actual is None, "Expected: None, Actual: {0}".format(actual)

In [26]:
%cd test_practices
with open("test_convert_to_int_TDD.py", "w") as text_file:
    text_file.write("""
import pytest

def test_with_no_comma():
    actual = convert_to_int("756")
    assert actual == 756, "Expected: 756, Actual: {0}".format(actual)
    
def test_with_one_comma():
    actual = convert_to_int("2,081")
    assert actual == 2081, "Expected: 2081, Actual: {0}".format(actual)
    
def test_with_two_commas():
    actual = convert_to_int("1,034,891")
    assert actual == 1034891, "Expected: 1034891, Actual: {0}".format(actual)
    
def test_on_string_with_missing_comma():
    actual = convert_to_int("178100,301")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_string_with_incorrectly_placed_comma():
    actual = convert_to_int("12,72,891")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_float_valued_string():
    actual = convert_to_int("23,816.92")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    """)
    
!pytest test_convert_to_int_TDD.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 6 items

test_convert_to_int_TDD.py FFFFFF                                        [100%]

_____________________________ test_with_no_comma ______________________________

    def test_with_no_comma():
>       actual = convert_to_int("756")
E       NameError: name 'convert_to_int' is not defined

test_convert_to_int_TDD.py:5: NameError
_____________________________ test_with_one_comma _____________________________

    def test_with_one_comma():
>       actual = convert_to_int("2,081")
E       NameError: name 'convert_to_int' is not defined

test_convert_to_int_TDD.py:9: NameError
__________________________

# <font color=darkred>2.15 TDD: Implement the function</font> 

convert_to_int() returns None for the following:

1. Arguments with missing thousands comma e.g. "178100,301". If you split the string at the comma using "178100,301".split(","), then the resulting list ["178100", "301"] will have at least one entry with length greater than 3 e.g. "178100".

2. Arguments with incorrectly placed comma e.g. "12,72,891". If you split this at the comma, then the resulting list is ["12", "72", "891"]. Note that the first entry is allowed to have any length between 1 and 3. But if any other entry has a length other than 3, like "72", then there's an incorrectly placed comma.

3. Float valued strings e.g. "23,816.92". If you remove the commas and call int() on this string i.e. int("23816.92"), you will get a ValueError.
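
The three observations above can be checked directly in plain Python before writing the function. The variable names below are illustrative only.

```python
# Rule 1: a missing comma leaves a part longer than 3 characters
missing_comma_parts = "178100,301".split(",")
has_overlong_part = any(len(part) > 3 for part in missing_comma_parts)
print(missing_comma_parts, has_overlong_part)

# Rule 2: a misplaced comma leaves a later part whose length is not 3
misplaced_comma_parts = "12,72,891".split(",")
has_short_later_part = any(len(part) != 3 for part in misplaced_comma_parts[1:])
print(misplaced_comma_parts, has_short_later_part)

# Rule 3: int() refuses a float valued string once the commas are removed
try:
    int("23,816.92".replace(",", ""))
    float_string_rejected = False
except ValueError:
    float_string_rejected = True
print(float_string_rejected)
```

Note that "23,816.92" also trips the first check, because the part "816.92" has six characters. The ValueError path is what rejects short float valued strings such as "1.5", which pass both length checks.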

**Instructions**
1. Complete the if statement that checks if the i-th element of comma_separated_parts has length greater than 3.
2. Complete the if statement that checks if any entry other than the 0-th entry of comma_separated_parts has a length not equal to 3.
3. Fill in the except clause with a ValueError, which is raised when trying to convert float valued strings e.g. 23816.92 to an integer.
4. Now that you have implemented the convert_to_int() function, let's run the tests in the test module test_convert_to_int.py again. Run it in the IPython console and read the test result report. Did you implement the function correctly, or are there any bugs?

    - <font color=red>All tests are passing and the implementation does not have a bug.</font>
    - The test test_on_string_with_incorrectly_placed_comma() is failing because the convert_to_int("12,72,891") is returning 1272891 instead of None.
    - All tests are failing with a NameError since convert_to_int() has not been implemented yet.

**Results**

<font color=darkgreen>Yes! All tests are passing and you nailed the implementation! Congratulations are also due on finishing Chapter 2. You've learned a lot, and in the next chapter, you will learn several best practices that will take your testing to the next level.</font>

In [27]:
def convert_to_int(integer_string_with_commas):
    comma_separated_parts = integer_string_with_commas.split(",")
    for i in range(len(comma_separated_parts)):
        # Write an if statement for checking missing commas
        if len(comma_separated_parts[i]) > 3:
            return None
        # Write the if statement for incorrectly placed commas
        if i != 0 and len(comma_separated_parts[i]) != 3:
            return None
    integer_string_without_commas = "".join(comma_separated_parts)
    try:
        return int(integer_string_without_commas)
    # Fill in with a ValueError
    except ValueError:
        return None

In [28]:
%cd test_practices
with open("pythoncode/converttoint.py", "w") as text_file:
    text_file.write("""
def convert_to_int(integer_string_with_commas):
    comma_separated_parts = integer_string_with_commas.split(",")
    for i in range(len(comma_separated_parts)):
        # Write an if statement for checking missing commas
        if len(comma_separated_parts[i]) > 3:
            return None
        # Write the if statement for incorrectly placed commas
        if i != 0 and len(comma_separated_parts[i]) != 3:
            return None
    integer_string_without_commas = "".join(comma_separated_parts)
    try:
        return int(integer_string_without_commas)
    # Fill in with a ValueError
    except ValueError:
        return None
    """)

with open("test_convert_to_int_TDD.py", "w") as text_file:
    text_file.write("""
import pytest

#import the dummy function
from pythoncode.converttoint import convert_to_int

def test_with_no_comma():
    actual = convert_to_int("756")
    assert actual == 756, "Expected: 756, Actual: {0}".format(actual)
    
def test_with_one_comma():
    actual = convert_to_int("2,081")
    assert actual == 2081, "Expected: 2081, Actual: {0}".format(actual)
    
def test_with_two_commas():
    actual = convert_to_int("1,034,891")
    assert actual == 1034891, "Expected: 1034891, Actual: {0}".format(actual)
    
def test_on_string_with_missing_comma():
    actual = convert_to_int("178100,301")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_string_with_incorrectly_placed_comma():
    actual = convert_to_int("12,72,891")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    
def test_on_float_valued_string():
    actual = convert_to_int("23,816.92")
    assert actual is None, "Expected: None, Actual: {0}".format(actual)
    """)
    
!pytest test_convert_to_int_TDD.py
%cd ..

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
platform win32 -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
Matplotlib: 3.3.4
Freetype: 2.10.4
rootdir: C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python\test_practices
plugins: cov-2.12.1, mock-3.6.1, mpl-0.13
collected 6 items

test_convert_to_int_TDD.py ......                                        [100%]

C:\Users\jaces\Documents\Data Science\Python\Cursos\___Unit Testing for Data Science in Python


# Additional material

- DataCamp course: https://learn.datacamp.com/courses/unit-testing-for-data-science-in-python
- Project: https://github.com/gutfeeling/univariate-linear-regression