# <center> STATS 607 - LECTURE 13
## <center> 10/22/2018

## Testing

You cannot escape it! If there is one thing you will always have to do while developing a computer program, it is testing the program. There are many ways to do it, and strategies have been developed for this very purpose. Here, we'll talk a little about some of these strategies. Some of you have already used them in your assignments, but we'll start with assertions:

<ul>
    <li>An assertion is simply a statement that something holds true at a particular point in a program. Assertions can be used to ensure that inputs are valid, outputs are consistent, and so on.</li><br/>
    <li>The approach based on asserting input/output values is sometimes called *programming by contract*. Assertions help in two ways:</li>
    <br/>
    <ul>
        <li>First, they ensure that if something does go wrong, the program will halt immediately, which simplifies debugging.</li><br/>
        <li>Second, assertions are executable documentation, i.e., they explain the program as well as checking its behavior. This makes them more useful in many cases than comments since the reader can be sure that they are accurate and up to date.</li><br/>
    </ul>
</ul>
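As a minimal sketch of programming by contract, the hypothetical function `clamp` below (it is not part of the lecture code) asserts what it expects from its inputs and what it promises about its output:

```python
def clamp(value, low, high):
    """Restrict value to the interval [low, high] (hypothetical example)."""
    # Precondition: the caller must pass a valid interval.
    assert low <= high, "low must not exceed high: %r > %r" % (low, high)

    result = min(max(value, low), high)

    # Postcondition: the result always lies inside the interval.
    assert low <= result <= high, "result out of range: %r" % result
    return result


print(clamp(15, 0, 10))
print(clamp(-3, 0, 10))
```

Keep in mind that Python skips `assert` statements when it is run with the `-O` flag, so assertions are best used to catch programming errors rather than as the only check on user input.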

### Example:

In [1]:
class MyDB:
    def __init__(self):
        """"""
        self._id2name_map = {}
        self._name2id_map = {}
 
    def add(self, id, name):
        """"""
        self._name2id_map[name] = id
        self._id2name_map[id] = name
 
    def get_id_by_name(self, name):
        """"""
        return self._name2id_map[name]
            
    def print_id2name_map(self):
        """"""
        for i,j in self._id2name_map.items():
            print(i,j)
            
    def print_name2id_map(self):
        """"""
        for i,j in self._name2id_map.items():
            print(i,j)

Let's create an instance of MyDB and then try to add a couple of (id, name) pairs:

In [2]:
inst = MyDB() # Creates an instance of MyDB.

In [3]:
inst.add(1,1000) 
inst.add(1,'Name2')

Let's see what we have there:

In [4]:
inst.print_id2name_map()

1 Name2


In [5]:
inst.print_name2id_map()

1000 1
Name2 1


In [6]:
inst.get_id_by_name(1000)

1

There are a couple of problems with the above. Can you identify them? Let's rewrite MyDB in a way that 'partially solves' those problems:

In [7]:
class MyDB:
    def __init__(self):
        """"""
        self._id2name_map = {}
        self._name2id_map = {}
 
    def add(self, id, name):
        """"""
        assert isinstance(id,int), "id is not an integer: %r" % id
        assert isinstance(name,str), "name is not a string: %r" % name
        self._name2id_map[name] = id
        self._id2name_map[id] = name
        
    def get_id_by_name(self, name):
        """"""
        id = self._name2id_map[name]
        assert self._id2name_map[id] == name, 'Problem!'
        return id
    
    def print_id2name_map(self):
        """"""
        for i,j in self._id2name_map.items():
            print(i,j)
            
    def print_name2id_map(self):
        """"""
        for i,j in self._name2id_map.items():
            print(i,j)

In [8]:
inst = MyDB()
inst.add(1,1000) # This should now give an assertion error.
inst.add(1,'Name2')

AssertionError: name is not a string: 1000

In [9]:
inst = MyDB()
inst.add(1,'Name1')
inst.add(1,'Name2')

In [10]:
inst.get_id_by_name('Name1') # Problem: id 1 now maps to 'Name2', not 'Name1'.

AssertionError: Problem!

In [11]:
inst = MyDB()
inst.add(1,'Name1')
inst.add(2,'Name2')

In [12]:
inst.get_id_by_name('Name1')

1

In [13]:
inst.get_id_by_name('Name2')

2
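
The rewritten class only notices a reused id late, when `get_id_by_name` is called. One possible way to catch it earlier is sketched below, under the assumption (not stated in the lecture) that each id and each name may be registered only once. The class name `StrictDB` is hypothetical.

```python
class StrictDB:
    """Hypothetical variant of MyDB that rejects duplicate ids and names."""

    def __init__(self):
        self._id2name_map = {}
        self._name2id_map = {}

    def add(self, id, name):
        assert isinstance(id, int), "id is not an integer: %r" % id
        assert isinstance(name, str), "name is not a string: %r" % name
        # Assumption: every id and every name may be registered only once.
        assert id not in self._id2name_map, "id already in use: %r" % id
        assert name not in self._name2id_map, "name already in use: %r" % name
        self._name2id_map[name] = id
        self._id2name_map[id] = name

    def get_id_by_name(self, name):
        return self._name2id_map[name]


db = StrictDB()
db.add(1, 'Name1')
try:
    db.add(1, 'Name2')  # Same id again: rejected before the two maps diverge.
except AssertionError as err:
    print("AssertionError:", err)
```

With this check the program halts at the call that introduces the inconsistency, which is usually the easiest place to debug it.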

Fixing bugs that have been identified is often easier if you use a symbolic debugger to track them down. The module pdb defines an interactive source code debugger for Python programs. It supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame. Let's see an example of this:

In [14]:
import pdb

def add_to_life_universe_everything(x):
    """
    """
    answer = 42
    pdb.set_trace()
    answer += x
    
    return answer

In [15]:
add_to_life_universe_everything(12)

> <ipython-input-14-01ffaed4a817>(8)add_to_life_universe_everything()
-> answer += x
(Pdb) answer
42
(Pdb) next
> <ipython-input-14-01ffaed4a817>(10)add_to_life_universe_everything()
-> return answer
(Pdb) answer
54
(Pdb) next
--Return--
> <ipython-input-14-01ffaed4a817>(10)add_to_life_universe_everything()->54
-> return answer
(Pdb) continue


54
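
Assuming Python 3.7 or later, the built-in `breakpoint()` is a shorter way to get the same prompt, since by default it calls `pdb.set_trace()`. The sketch below only defines the function; calling it would open the interactive debugger.

```python
def add_to_life_universe_everything(x):
    """Add x to 42, pausing in the debugger just before the addition."""
    answer = 42
    breakpoint()  # Python 3.7+: by default equivalent to pdb.set_trace()
    answer += x
    return answer
```

Calling `add_to_life_universe_everything(12)` stops at the `(Pdb)` prompt as before. Setting the environment variable `PYTHONBREAKPOINT=0` turns every `breakpoint()` call into a no-op, which is convenient when the same code has to run unattended.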

### Run automated tests

  <ul>
      <li>Ensure that a single unit of code returns the correct results (unit tests) and that pieces of code work correctly when combined (integration tests).</li><br/>
      <li>Creating and managing tests is easier if programmers use an off-the-shelf unit testing library to initialize inputs, run tests, and report their results in a uniform way.</li><br/>
      <li>One way of generating tests is to check to see whether the code matches the researcher's expectations of its behavior.</li><br/>
      <li>Another approach for generating tests is to turn bugs into test cases by writing tests that trigger a bug that has been found in the code.</li><br/>
      <li>Finally, there is a whole approach called **Test-Driven Development**, in which tests are written before the code they exercise.</li>
  </ul>

In [16]:
from collections import Counter

Counter(str.split('foo bar foo  ')) # Generates a word frequency dictionary.

Counter({'bar': 1, 'foo': 2})

In [17]:
from unittest import TestCase

In [18]:
class TestUnit(TestCase): # TestUnit inherits from TestCase
    def test_wordcount(self):
        self.assertDictEqual({'foo' : 1, 'bar' : 1}, Counter(str.split('foo bar foo  '))) # These two need to be equal.

Why do we have an error with the code below?

In [19]:
test = TestUnit() # Creates an instance of TestUnit.
test.test_wordcount() # One of the tests we want to get done.

AssertionError: {'foo': 1, 'bar': 1} != Counter({'foo': 2, 'bar': 1})
- {'bar': 1, 'foo': 1}
+ Counter({'foo': 2, 'bar': 1})

Let's try this again:

In [20]:
class TestUnit(TestCase):
    def test_wordcount(self):
        self.assertDictEqual({'foo' : 2, 'bar' : 1}, Counter(str.split('foo bar foo  ')))

In [21]:
test = TestUnit()
test.test_wordcount()
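
The cells above call the test method by hand. A test runner can instead collect and run every `test_*` method and report the results in a uniform way. The sketch below also turns a bug into a test case. Both the helper `word_count` and the bug it guards against are hypothetical, chosen only for illustration.

```python
import unittest
from collections import Counter


def word_count(text):
    """Hypothetical helper: count whitespace-separated words in text."""
    return Counter(text.split())


class TestWordCount(unittest.TestCase):
    def test_counts_repeated_words(self):
        self.assertEqual(word_count('foo bar foo'), {'foo': 2, 'bar': 1})

    def test_trailing_whitespace_regression(self):
        # Regression test for a hypothetical bug: text.split(' ') would have
        # counted the empty strings produced by the trailing spaces.
        self.assertEqual(word_count('foo bar foo  '), {'foo': 2, 'bar': 1})


# Load every test_* method of the class and run them with a text runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWordCount)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())
```

Once a regression test like the second one is in the suite, the same bug cannot come back unnoticed.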

## Profiling

Profilers run code and give you a detailed breakdown of execution times, allowing you to identify bottlenecks in your programs.

In Donald Knuth's paper "Structured Programming with ``go to`` Statements", he wrote: 
> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We *should* forget about small efficiencies, say about 97% of the time: **premature optimization is the root of all evil**. Yet *we should not pass up our opportunities in that critical 3%*.

(Ref: http://wiki.c2.com/?PrematureOptimization)

Here are some things to have in mind:

<ul>
    <li>Before optimizing your code, make sure it works correctly: it is better to have a correct program that runs slowly than a (subtly) incorrect program that runs fast!</li><br/>
    <li>Determine if it is actually worth speeding that piece of code up.</li><br/>
    <li>If it is, use a profiler to identify bottlenecks.</li><br/>
    <li>You can be more productive when you write code in the highest-level language possible.</li>
</ul>

cProfile and profile provide deterministic profiling of Python programs. A profile is a set of statistics that describes how often and for how long various parts of the program executed. These statistics can be formatted into reports via the pstats module.

In [22]:
import cProfile

In [23]:
def called_by_profile_command(i,y):
    """"""
    lst = []
    for j in range(y):
        if j==i:
            lst.append(j)
    return lst
            
def profile_command(x,y):
    """"""
    lst = [called_by_profile_command(i,y) for i in range(x)] # Notice that this function calls the function above.
    output = [i[0] for i in lst if len(i)==1]
    return output

What is the code above doing?

In [24]:
cProfile.run("result=profile_command(5,10)")
print(result)

         21 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.000    0.000    0.000    0.000 <ipython-input-23-33eb9862b3b9>:1(called_by_profile_command)
        1    0.000    0.000    0.000    0.000 <ipython-input-23-33eb9862b3b9>:11(<listcomp>)
        1    0.000    0.000    0.000    0.000 <ipython-input-23-33eb9862b3b9>:12(<listcomp>)
        1    0.000    0.000    0.000    0.000 <ipython-input-23-33eb9862b3b9>:9(profile_command)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        5    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


[0, 1, 2, 3, 4]


Let's now call the function with larger arguments, so that it takes noticeably longer to run:

In [25]:
cProfile.run("result=profile_command(10000, 5000)")

         25006 function calls in 2.447 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    2.440    0.000    2.440    0.000 <ipython-input-23-33eb9862b3b9>:1(called_by_profile_command)
        1    0.005    0.005    2.445    2.445 <ipython-input-23-33eb9862b3b9>:11(<listcomp>)
        1    0.001    0.001    0.002    0.002 <ipython-input-23-33eb9862b3b9>:12(<listcomp>)
        1    0.000    0.000    2.447    2.447 <ipython-input-23-33eb9862b3b9>:9(profile_command)
        1    0.000    0.000    2.447    2.447 <string>:1(<module>)
        1    0.000    0.000    2.447    2.447 {built-in method builtins.exec}
    10000    0.001    0.000    0.001    0.000 {built-in method builtins.len}
     5000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
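
The reports above are ordered by standard name, which makes the expensive entries harder to spot in longer listings. As a sketch of how the `pstats` module can reformat the same statistics, the block below profiles a smaller call (the arguments 200 and 1000 are arbitrary), sorts the entries by cumulative time and prints only the top five.

```python
import cProfile
import io
import pstats


def called_by_profile_command(i, y):
    """Return [i] if i lies in range(y), otherwise an empty list."""
    lst = []
    for j in range(y):
        if j == i:
            lst.append(j)
    return lst


def profile_command(x, y):
    """Collect every i in range(x) that also lies in range(y)."""
    lst = [called_by_profile_command(i, y) for i in range(x)]
    return [i[0] for i in lst if len(i) == 1]


# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = profile_command(200, 1000)
profiler.disable()

# Write the report to a string buffer, most expensive entries first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)  # Show only the top 5 entries.
report = buffer.getvalue()
print(report)
```

Sorting by `'cumulative'` puts the functions that account for the most total time, including the calls they make, at the top of the report.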




Let's now try a different example. It takes almost no time to run, but we can see all the calls made:

In [26]:
import re
cProfile.run('re.compile("foo|bar")')

         199 function calls (194 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        4    0.000    0.000    0.000    0.000 enum.py:265(__call__)
        4    0.000    0.000    0.000    0.000 enum.py:515(__new__)
        2    0.000    0.000    0.000    0.000 enum.py:801(__and__)
        1    0.000    0.000    0.000    0.000 re.py:231(compile)
        1    0.000    0.000    0.000    0.000 re.py:286(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:223(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:250(_optimize_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:414(_get_literal_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:441(_get_charset_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:482(_compile_info)
        2  

You can also use `timeit.Timer` to time small pieces of code; we have seen examples of this in class.
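
Assuming the 'Timer' mentioned above is `timeit.Timer` from the standard library, a minimal sketch of timing two equivalent snippets could look like this. The statements being timed are arbitrary examples.

```python
import timeit

setup = "data = list(range(1000))"

# Time two equivalent ways of squaring every element of a list.
loop_timer = timeit.Timer(
    stmt="out = []\nfor v in data:\n    out.append(v * v)",
    setup=setup,
)
comp_timer = timeit.Timer(stmt="out = [v * v for v in data]", setup=setup)

# Each call runs the statement `number` times and returns the total seconds.
loop_seconds = loop_timer.timeit(number=1000)
comp_seconds = comp_timer.timeit(number=1000)
print("loop:          %.4f s" % loop_seconds)
print("comprehension: %.4f s" % comp_seconds)
```

The absolute numbers depend on the machine, so only the comparison between the two is meaningful.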

## Use modularization (DRY)

<ul> 
    <li>Anything that is repeated in two or more places is more difficult to maintain.</li><br/>
    <li>Every piece of data must have a single authoritative representation in the system</li><br/>
    <ul>
        <li>Physical constants ought to be defined exactly once.</li>
    </ul><br/>
    <li>Modularize code rather than copying and pasting.</li><br/>
</ul>
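A small sketch of these points: the constant below is defined exactly once, and both functions refer to that single name instead of repeating the number. The function names are hypothetical.

```python
# Defined exactly once; every function below refers to this single name.
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def light_travel_time_s(distance_m):
    """Seconds light needs to cover distance_m metres in vacuum."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


def rest_energy_j(mass_kg):
    """Rest energy in joules, E = m * c**2."""
    return mass_kg * SPEED_OF_LIGHT_M_PER_S ** 2


print(light_travel_time_s(299_792_458))  # Distance covered in one light-second.
print(rest_energy_j(1))
```

If the value ever had to change, for example to switch units, there would be one place to edit rather than several copies to hunt down.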

## Don't reinvent the wheel

<p/>
"**Reinventing the wheel**" is usually counter-productive and inefficient.<br/><br/>
"*Stand on the shoulders of giants*"! Consider using:
<ul>
    <li> The standard library of your language. Chances are that the language authors have already thought of it! </li><br/>
    <li> Third-party libraries, especially those well known in your field (such as ``NumPy``). Chances are that you are not the first person facing this problem! </li><br/>
    <li> Specialized distributions (``Anaconda``,  ``Intel distribution for Python``).</li><br/>
</ul>
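As an illustration, the hand-written median below is shown only for comparison; the standard library modules `statistics` and `collections` already cover this and many similar tasks. The data values are arbitrary.

```python
import statistics
from collections import Counter


def handwritten_median(values):
    """Median computed by hand, shown only for comparison."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


data = [3, 1, 4, 1, 5, 9, 2, 6]

# The standard library already provides the same results in one call each.
print(statistics.median(data))
print(Counter(data).most_common(1))
```

The library versions are shorter, already tested, and familiar to other readers of your code.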

## Use an Integrated Development Environment (IDE)

### Core Features

<ul>
  <li>Code completion</li>
  <li>Resource management</li>
  <li>Debugging tools</li>
  <li>Compile and build</li>
</ul>

### Advantages

<ul>
  <li>Less time and effort</li>
  <li>Enforce project or company standards</li>
  <li>Project management</li>
</ul>

### Disadvantages

<ul>
  <li>Learning Curve</li>
  <li>A sophisticated IDE may not be a good tool for beginning programmers</li>
  <li>Will not fix bad code, practices, or design</li>
  <li>Often heavy on resources</li>
  <li>Enforced workflow may not be your preferred one</li>
</ul>

### IDE Examples

<ul>
  <li>Spyder for Python applications</li>
  <li>RStudio for R applications</li>
  <li>Eclipse for Java applications</li>
  <li>Xcode for C++ applications</li>
  <li>Many more...</li>
</ul>

## Use a Version Control System (VCS)

<ul>
    <li>When working with code and data, you need to keep track of changes and collaborate on a program or dataset.</li><br/>
    <li>A common approach is to email code to colleagues or to copy successive versions of it to a shared folder such as Dropbox (http://www.dropbox.com). Don't do this!</li><br/>
    <li>Use a VCS instead. A VCS stores snapshots of a project's files in a repository (or a set of repositories).</li><br/>
    <li>Crucially, if several people have edited files simultaneously, the VCS highlights the differences and requires them to resolve any conflicts before accepting the changes.</li><br/>
    <li>The VCS also stores the entire history of those files.</li><br/>
    <li>Many good VCSs are open source and freely available:</li><br/>
    <ul>
        <li>Git (http://git-scm.com)</li>
        <li>Subversion (http://subversion.apache.org)</li>
        <li>Mercurial (http://mercurial.selenic.com)</li><br/>
    </ul>
    <li>There are also free hosting services available:</li><br/>
    <ul>
        <li>GitHub (https://github.com)</li>
        <li>GitLab (https://gitlab.com)</li>
        <li>BitBucket (https://bitbucket.org)</li>
    </ul>
</ul>

## The KISS principle

**KISS** stands for **K**eep **I**t **S**imple and **S**tupid (*figuratively speaking*!)
<ul>
    <li> Other conditions being equal, prefer simplicity to complexity. </li><br/>
    <li> Do not try to be too clever. Think of other people reading your code — or of yourself 2 weeks down the road!</li><br/>
    <li> If you think you need performance, first think again.</li><br/>
    <li> If you do need clever tricks to achieve performance, comment and document extensively.</li><br/>
</ul>
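
To make the trade-off concrete, here is a sketch with two hypothetical functions that answer the same question. The first relies on a bit trick and needs its comment to be understood. The second is longer but can be read without one.

```python
def is_power_of_two_clever(n):
    """Bit trick: a positive power of two has exactly one bit set,
    so clearing its lowest set bit with n & (n - 1) leaves zero."""
    return n > 0 and (n & (n - 1)) == 0


def is_power_of_two_simple(n):
    """Divide by two while possible; only a power of two ends at 1."""
    if n < 1:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1


print(is_power_of_two_simple(8), is_power_of_two_clever(8))
print(is_power_of_two_simple(12), is_power_of_two_clever(12))
```

Unless profiling shows that this check is a bottleneck, the simple version is the safer default.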