# Introduction to Testing

Testing is easy to understand, but there is also an art to it; writing good tests often requires you to figure out *which inputs are most likely to break your program*.

Tests can also serve different purposes:

* Testing for correctness
* Testing for bugs
* Testing for "Let's check I didn't fuck something up"  (a.k.a 'regression testing')
* ...etc...

All of the above tests have their uses, but as a general rule of thumb a good test suite will include a range of inputs and multiple tests for each. 

I would add a small caveat: if the documentation for a function says something like "does not work for strings", then although it is possible to write test code for strings, what would be the point? The documentation makes it clear that these tests will fail. Instead of writing test code for situations the code was **not designed to solve**, focus on 'realistic' test cases.

Alright, let's write a super simple function that divides A by B:

In [2]:
def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

Okay so, this is where we need to put our ‘thinking hat’ on for a moment. The documentation for this function specifically states that A and B are supposed to be numbers, so instead of wasting time breaking the code with obviously bad inputs, let's try to break it with valid inputs. In other words:

> what are the possible integers/floats we can pass in where this function may break?

When dealing with numbers there are, I think, three basic tests that are almost always worth running:

1. Negative Numbers
1. Zero
1. Positive Numbers

And in addition to those tests we should also run tests for:

1. Small inputs (10/5)
1. Very large inputs ( 999342493249234234234234 / 234234244353452424 )

You may remember, for example, that in lecture 21, as we tried to optimise our is_prime function, we introduced some defects when working with small numbers.

Anyway, the point is that these five basic cases will cover a lot of the situations you may run into with numbers. Obviously you should run several tests for each of these basic test cases. **And in addition to the basic tests you should run more function-specific tests too**; for example, if I have a function that returns the factors of n then it would be wise to run a bunch of tests with prime numbers to check what happens there. You should also test [*highly composite numbers*](https://en.wikipedia.org/wiki/Highly_composite_number) (e.g. 720, 1260). For our division function, a good additional test would be when the numerator is larger than the denominator and vice versa (e.g. try both 10/2 and 2/10). Zero is also a special case for division, but we have already listed it in the basic tests.

Okay, so let's write our first tests:

In [3]:
# Function here...
print(divide(10, 2) == 5.0)
[divide(10.0, 2.0) == 5.0, divide(10, 2) == 5.0, divide(0, 1) == 0.0]

True


[True, True, True]
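The basic categories listed earlier (negative, zero, positive, small, large) can also be written down as a table of cases, which makes it easy to add more later. This is only a sketch: the `cases` list and the `run_cases` helper are names I made up for illustration, and I picked values whose answers are exact floats so that `==` is safe to use.

```python
def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

# (label, a, b, expected); every expected value is exactly representable as a float
cases = [
    ("negative numerator", -10, 2, -5.0),
    ("negative denominator", 10, -2, -5.0),
    ("both negative", -10, -2, 5.0),
    ("zero numerator", 0, 7, 0.0),
    ("small positive", 10, 5, 2.0),
    ("numerator smaller than denominator", 2, 8, 0.25),
    ("floats", 7.5, 2.5, 3.0),
    ("very large", 10**30, 10**15, 1e15),
]

def run_cases(func, cases):
    """Return the labels of all cases where func(a, b) != expected."""
    failures = []
    for label, a, b, expected in cases:
        if func(a, b) != expected:
            failures.append(label)
    return failures

print(run_cases(divide, cases))  # an empty list means every case passed
```

Returning the labels of the failing cases, rather than a single True or False, tells you straight away which category of input caused trouble.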

Now, we know that X/0 raises a ZeroDivisionError; the question is, when we do our tests, what do we want the result to be? Do we want Python to raise the error? Or would we prefer Python to do something else, such as return a number or perhaps a string?

Remember that errors are not bad. If Python throws an error when it gets zero as input that’s totally fine, and in this case I think I’m happy with the error. This means I have to write a test case that **expects** an error to be raised. We can do that like so…

In [26]:
try:
    divide(1, 0)
    print(False) # note: if the line above raises a ZeroDivisionError, this line is not executed.
except ZeroDivisionError:
    print(True) # Test passes: dividing by zero raises an error.

True
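If you find yourself writing this try/except pattern a lot, it can be wrapped in a small helper. The sketch below is one way of doing it; `raises` is a hypothetical name of my own, not a built-in.

```python
def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

def raises(expected_error, func, *args):
    """Return True if func(*args) raises expected_error, otherwise False."""
    try:
        func(*args)
    except expected_error:
        return True
    return False

print(raises(ZeroDivisionError, divide, 1, 0))      # True
print(raises(ZeroDivisionError, divide, 1.5, 0.0))  # True, floats raise the same error
print(raises(ZeroDivisionError, divide, 1, 2))      # False, no error was raised
```

Note that any *other* kind of error is not caught by the helper, so an unexpected exception will still stop the program and show up in the output, which is usually what you want from a test.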


Okay, next up we need to test for large numbers. With small numbers we can easily work out the correct answer by hand, but for large numbers that’s not so easy.

Your first instinct here might be to say "use a calculator", and while that would work, that solution only helps in this very specific case. What we actually want is a more general approach that can handle all sorts of problems.

It turns out that sometimes building code that can generate test cases is a lot easier than building the solver. In this particular example we can do just that...

Let's take a step back and ask ourselves what division actually is. The answer is basically the opposite of multiplication. And so, we can actually write test cases for our function by "reverse engineering" the problem. We know from math that the following is always true (for any non-zero y):
 
    (y * y) / y = y
    (x * y) / y = x
 
And so, as long as multiplication works correctly, we can be confident that our function is getting the right answer to complex division problems *even though* we do not know what the right answer is ourselves. In code:

In [29]:
x = 30202020202020202022424354265674567456
y = 95334534534543543543545435543543545345

divide(y * y, y) == float(y)
divide(x * y, y) == float(x)

True
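One thing to watch out for: a notebook cell only displays the value of its last expression, so in the cell above the first comparison is computed but never shown. A small sketch that checks both identities with `assert` instead (the negative values are my own addition, they are not part of the original example):

```python
def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

x = 30202020202020202022424354265674567456
y = 95334534534543543543545435543543545345

# each pair is (expected answer, divisor)
for a, b in [(x, y), (y, y), (-x, y), (x, -y)]:
    # (a * b) / b should give us back a, as a float
    assert divide(a * b, b) == float(a), (a, b)

print("all identities hold")
```

If any of the checks were wrong, the `assert` would stop the cell with an AssertionError showing the offending pair.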

## Example 2: Testing n/5 function...

To make sure you understand the concept I’m trying to teach, let’s try another example. Suppose we have a function that checks if a number is divisible by 5. How can we write tests that show our function works **WITHOUT** using division or modular arithmetic in our test cases?

In [1]:
def div5(x):
    return x % 5 == 0

# Test 1: cases that should return True (every multiple of 5 from -1000 to 995)
for i in range(-1000, 1000, 5):
    # remember the third argument to the range function is the step size.
    if not div5(i):
        print("TEST FAILED")

print("...TESTING COMPLETE...")

...TESTING COMPLETE...


So the above test checks whether our function correctly identifies numbers divisible by 5. However this is only half the job done. Let's build something to test for numbers that are not divisible by 5.

In [4]:
import random

def build_number(size):
    """
    Returns a number built from `size` random digits that is NOT divisible by 5.
    An extra digit may be added at either end, so the result has between
    size and size + 2 digits.
    """
    
    legal_digits = "0123456789"
    digits_2 = "12346789" # <= missing 0 and 5
    
    number = ""
    for _ in range(size):
        number += random.choice(legal_digits) # <= randomly choose a digit
        
    if number[-1] in ["5", "0"]:
        # if the last digit is 0 or 5, then the number is divisible by 5. 
        # Thus we randomly add another digit. 
        number += random.choice(digits_2)
    
    if number[0] == "0":
        # If we have a leading zero, we put a non-zero digit in front of it
        # (int() would otherwise silently drop the zero).
        number = random.choice(digits_2) + number
        
    return int(number)

print(build_number(3))
print(build_number(23))
print(build_number(8))

593
805685903559833387139607
468131808


So, the build_number function above creates random numbers, but crucially for our purposes the last digit of these numbers CANNOT be either a 5 or a 0. Thus, we have built a function that randomly generates numbers guaranteed not to be divisible by 5. And more importantly, nowhere does this code use division or modulo arithmetic itself.

The final part of the puzzle then is to use this function to test div5...

In [6]:
number_of_tests = 1000

for test_case in range(number_of_tests):
    number = build_number(random.choice(range(2,30)))
    if div5(number):
        print("Test Failed")
        
print("...TESTING COMPLETE...")

...TESTING COMPLETE...
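Building numbers digit by digit is not the only option. A different sketch, still with no division or modulo in the test itself, uses multiplication and addition: `5 * k` is always divisible by 5, and adding 1, 2, 3 or 4 to it gives a number that never is. The `check_div5` helper is a name I invented for this example.

```python
import random

def div5(x):
    return x % 5 == 0

def check_div5(func, number_of_tests=1000):
    """Return a list of the inputs for which func gave the wrong answer."""
    wrong = []
    for _ in range(number_of_tests):
        k = random.randint(-10**20, 10**20)
        multiple = 5 * k                             # divisible by 5 by construction
        non_multiple = 5 * k + random.randint(1, 4)  # 1 to 4 past a multiple of 5
        if not func(multiple):
            wrong.append(multiple)
        if func(non_multiple):
            wrong.append(non_multiple)
    return wrong

print(check_div5(div5))  # an empty list means no wrong answers were found
```

This version covers both halves of the job (multiples and non-multiples) in one loop, and it also exercises negative and very large inputs.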


## Homework assignment

'get_primes' is a function that takes a list, L, of numbers and returns a list of all the primes found within L. Your task is to finish the ‘test_it’ function below...

In [None]:
import math

def get_primes(l):
    # This function is broken (but please don't fix it!).
    return l

def get_primes_correct(l):
    # correct implementation of get_primes
    primes = []
    for num in l:
        if num < 2:
            continue  # 0, 1 and negative numbers are not prime
        if num == 2:
            primes.append(num)
        elif num % 2 == 0:
            continue
        else:
            for n in range(3, math.ceil(math.sqrt(num))+1, 2):
                if num % n == 0:
                    break  # found a divisor, so num is not prime
            else:
                # the loop finished without finding a divisor
                primes.append(num)
    return primes

### YOUR CODE HERE #### 
def test_it():
    """returns a very large list of numbers (1000-to-10,000) and the expected solution when we call get_primes on that list"""
    # Your code here (note the return statement has already been done for you!)
    # l = the big list, solution = a list containing all the primes in l. 
    return (l, solution)
    
l, solution = test_it()
solution.sort()
print(sorted(get_primes(l)) == solution)         # This test should fail.
print(sorted(get_primes_correct(l)) == solution) # This test should pass.

## Testing in the Thousands

Generally speaking, the more tests you do the better. A mere ten tests do not inspire confidence that our code works. What we really want is to run hundreds of tests, but we certainly don't want to write those test cases out by hand!

We can use for-loops to call a function hundreds of times with different inputs, but there's a catch: what can we actually test for?

If we know what the answer should be in each case then we can test for correctness. If we do not know what the answer should be then we can't test for correctness, but we can still check that the code doesn't raise unexpected errors or return something of the wrong type.

I'm going to show you both approaches. But before I do that, let's create a function that doesn't work (some of the time).

In [9]:
def bad_divide(a, b):
    # our testing function that tests our tests! 
    # this function should work for all values EXCEPT when a > 500,000
    if a > 500000:
        return a
    else:
        return a / b

...and now let's write code to test for correctness...

In [14]:
import random

def random_ints():
    a = random.choice(range(10, 750))
    b = random.choice(range(1, a))   # b is ALWAYS smaller than a
    return a, b 

def run_test(n):
    for test in range(n):
        a, b = random_ints()
        if bad_divide(a*b, b) != float(a):
            return "TEST NO {}: A*B was: {} B was: {} Result was: {} Expected result was: {}".format(test, a*b, b, 
                                                                                       bad_divide(a*b, b), float(a))
    return "ALL {} TESTS PASSED".format(n)

Alright, so we have our test function, and using the math trick discussed above we can test for correctness. Let's give it a spin:

In [11]:
run_test(5)

'ALL 5 TESTS PASSED'

So we ran five random tests and passed them all. Great! Well, actually not so great: we happen to know that bad_divide is an incorrect function, so it's bad news that our tests failed to pick up on that. Let’s give our test function a bit more time...

In [15]:
run_test(100)

'ALL 100 TESTS PASSED'

In [16]:
run_test(300)

'ALL 300 TESTS PASSED'

In [19]:
run_test(700)

'TEST NO 535: A*B was: 516912 B was: 712 Result was: 516912 Expected result was: 726.0'

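Notice here that it took quite a large number of tests before they could identify the software defect: the function only misbehaves when a*b is greater than 500,000, and our random inputs rarely produce a product that large. Notice also that we did something clever here: when a test fails we report the inputs, the result and the expected result, and thus we have a starting point to troubleshoot the problem.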
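Because the inputs are random, running the same test again will probably not hit the same failing case. One option (my suggestion, not something the code above does) is to give the random number generator a fixed seed, so that a run can be replayed exactly. A minimal sketch, where `run_seeded_test` is a hypothetical variant of `run_test`:

```python
import random

def bad_divide(a, b):
    # copy of the deliberately broken function from above
    if a > 500000:
        return a
    else:
        return a / b

def run_seeded_test(n, seed):
    """Like run_test, but every run with the same seed uses the same inputs."""
    rng = random.Random(seed)      # private generator with a fixed seed
    for test in range(n):
        a = rng.choice(range(10, 750))
        b = rng.choice(range(1, a))
        result = bad_divide(a * b, b)
        if result != float(a):
            # return the details so the failing case can be investigated
            return (test, a * b, b, result, float(a))
    return None                    # no failure found in n tests

first = run_seeded_test(5000, seed=1)
second = run_seeded_test(5000, seed=1)
print(first == second)  # True: the same seed replays the same sequence of tests
```

If a seeded run does find a failure, you can make a note of the seed and re-run the exact same test after you think you have fixed the bug.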

However, as I mentioned before, we can only test for correctness by the bucket-load if we have some way of knowing what the correct output should be.

If we can't do that, then the only thing we can do with large random tests is check that the code isn't throwing up any surprises.

For example, we could use a line like:
    
    isinstance(function(some_input), {type})

Code such as this can't prove a function correct, but it will potentially catch a bug or two. Suppose we have a function that is supposed to return lists; if there is some input out there that makes the function return strings instead, we would want to know about it. As you may recall from the *"importance of errors"* lecture, such testing might have been able to detect the error we had.

Another thing you can do is make educated guesses as to what the correct answer should look like. For example, in the case of the divide function above, we could run hundreds of tests (with a non-negative A and B of at least 1) where we check:
    
    divide(a, b) <= a
    
That makes sense, right? If our function is dividing A by B then, even if we don't know exactly what the answer is, we can nonetheless put an upper bound on its size and test for that.
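Here is a rough sketch of what such a bounds test might look like for division. It assumes a is non-negative and b is at least 1, in which case the quotient has to sit between 0 and a. The names `check_bounds` and `wrong_divide` are made up for the example.

```python
import random

def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

def check_bounds(func, number_of_tests=1000):
    """Return the first (a, b, result) that breaks 0 <= func(a, b) <= a, or None."""
    for _ in range(number_of_tests):
        a = random.randint(0, 10**6)
        b = random.randint(1, 10**6)   # b >= 1, so the quotient cannot exceed a
        result = func(a, b)
        if not (0 <= result <= a):
            return (a, b, result)
    return None

print(check_bounds(divide))  # None means no violation was found

# A hypothetical buggy version that multiplies instead of dividing
def wrong_divide(a, b):
    return a * b

print(check_bounds(wrong_divide))  # almost certainly reports a violation
```

A test like this says nothing about whether the answer is exactly right. Interestingly, it would not catch bad_divide from earlier either, since returning a unchanged still respects the upper bound; bounds tests are a safety net, not a proof.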

Let's now see some of these other tests in action...

In [133]:
def bad_list(n):
    """Returns a list 0 to n. """
    if n < 100:
        return [7] + list(range(n))
    elif n > 150:
        return list(range(n))

Okay, just as before, we can clearly see where and how this function will misbehave, but let's run a few tests and see if our tests can actually pinpoint the problem.

In [129]:
import random

def test_list_func(n):
    for test in range(n):
        randomint = random.choice(range(0,200))
        if not isinstance(bad_list(randomint), list):  
            print("Test isinstance #{}: Failed, input: {}, output: {}".format(test, randomint, bad_list(randomint)))
            break
            
        else:
            # Smart assumption test, max of list shouldn't be greater than n
            max_test = max(bad_list(randomint)) > randomint
            if max_test:
                print("Test Max #{}: Failed. input: {}".format(test, randomint))
                break
    return "TESTING COMPLETE"

So this code is completing two tests at the same time. Firstly, we are checking if the result is always a list. The second test is an intuitive one: if our function is supposed to return 0-to-N then clearly something has gone wrong if the max of the list is greater than n.

To reiterate, these tests are not designed to prove correctness, but they are capable of finding a few bugs we would probably like to address. Okay, let's run it.

In [124]:
test_list_func(10)

Test isinstance #4: Failed, input: 145, output: None


'TESTING COMPLETE'

Okay, so test 4 fails. It turns out that if we input 145 the function does not return a list; it returns None instead. If you re-read the function this makes a lot of sense: we are asking if N < 100 or N > 150. But if N falls in the range 100-150 then both the if and elif conditions are False and there isn't an else clause, so the function reaches its end without a return statement, Python returns None, and our test function flags up the error. Rather humorously, this bug wasn't intentional! It turns out my ‘broken function’ is broken in more ways than I imagined.

Let's run the test again and see if we can catch the other error...

In [130]:
test_list_func(100)

Test Max #20: Failed. input: 5


'TESTING COMPLETE'

So on the 20th test we get a "max test fail" on the input 5. And since we made the function terrible on purpose we actually know what happened. The number 7 gets added to the output and since 7 > 5 the max test flags up an error.
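The max test is just one educated guess; there are others we could make about a list like this. The sketch below adds two more, under my assumption that the intended output is `list(range(n))`: the list should have exactly n items and it should be in ascending order. `check_list` is a hypothetical helper, not part of the lecture code.

```python
def bad_list(n):
    """Copy of the deliberately broken function from above."""
    if n < 100:
        return [7] + list(range(n))
    elif n > 150:
        return list(range(n))

def check_list(func, n):
    """Return a list of problems found with func(n); empty means none found."""
    result = func(n)
    if not isinstance(result, list):
        return ["not a list: {!r}".format(result)]
    problems = []
    if len(result) != n:
        problems.append("length is {}, expected {}".format(len(result), n))
    if result != sorted(result):
        problems.append("not in ascending order")
    return problems

for n in [5, 120, 160]:
    print(n, check_list(bad_list, n))
```

Collecting every problem for an input, instead of stopping at the first one, gives a fuller picture of how the function is misbehaving.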

## Doctesting

The last thing to cover today is something called doctesting, which is a really quick way of writing simple tests for your program.
    
    The Syntax:
    """ 
    >>> {function name} ( {function argument, if any} )
    {expected result}
    """

And then once you have done that, you'll need to copy & paste the code below to run the test:

In [21]:
def run_doctests():
    import doctest
    doctest.testmod()

By default, if all your tests pass nothing will be printed, but should a doctest fail Python will give you all the juicy detail. Let's try it now:

In [22]:
def add(a, b):
    """
    Returns a + b
    
    >>> add(10, 10)
    20
    """
    return a + b

run_doctests()

We ran the doctests, but since the test passed nothing happened. Alright, let's show you what happens on failure:

In [136]:
def run_all_the_tests():   
    """
    >>> bad_list(4)
    [0, 1, 2, 3]
    
    >>> 1 + 1
    2
    
    >>> print(True)
    True
    
    >>> 20 + 2
    23  
    """   

    print("testing complete")
    
run_doctests()

**********************************************************************
File "__main__", line 3, in __main__.run_all_the_tests
Failed example:
    bad_list(4)
Expected:
    [0, 1, 2, 3]
Got:
    [7, 0, 1, 2, 3]
**********************************************************************
File "__main__", line 12, in __main__.run_all_the_tests
Failed example:
    20 + 2
Expected:
    23  
Got:
    22
**********************************************************************
1 items had failures:
   2 of   4 in __main__.run_all_the_tests
***Test Failed*** 2 failures.


As you can see, Python ran four tests and two of them failed. It turns out 20 + 2 does not equal 23, and bad_list (surprise surprise) is up to no good.
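Doctests can also say "I expect an error here", which ties in nicely with the divide-by-zero test from earlier: you write the first and last lines of the traceback and put `...` in the middle. The sketch below runs the doctests of a single function using pieces of the doctest module directly; `run_doctests_for` is a hypothetical helper of mine, and in a notebook the `run_doctests()` function shown above should pick up the same docstring.

```python
import doctest

def divide(a, b):
    """
    a, b are ints or floats. Returns a/b

    >>> divide(10, 2)
    5.0

    >>> divide(1, 0)
    Traceback (most recent call last):
        ...
    ZeroDivisionError: division by zero
    """
    return a / b

def run_doctests_for(func):
    """Hypothetical helper: run only the doctests found in func's docstring."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)   # a result with .failed and .attempted counts

print(run_doctests_for(divide))
```

As with `doctest.testmod()`, nothing is reported for passing examples; the returned counts simply tell you how many examples were attempted and how many failed.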

Overall, I'd recommend beginners use doctesting. It's fairly easy to use and it allows you to quickly type out basic tests for your functions.

As a matter of fact, doctests are a great thing to write **BEFORE** you write the rest of your code. Spending a minute typing out a few basic test cases can help you write your code better in the first place. Why is that? Well, if you are thinking about good tests (i.e. stuff likely to break your code) at the outset, chances are you will code with those problems in mind.

Alright guys, that’s everything I think I need to say about testing. If you only learn one thing from today’s lecture please let it be this: the secret to writing good test cases is thinking hard about all the wonderful ways the code could break.

P.S. DON'T FORGET YOUR HOMEWORK!