## Consider module-scoped code to configure deployment environments
One way to do this is to override parts of your program at startup time, using ordinary statements at module scope, so that it provides different functionality depending on your deployment environment. However, once your deployment environments get complicated, you should consider moving the settings out of Python constants (like TESTING) and into dedicated configuration files.
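
A minimal sketch of the idea, assuming a hypothetical `APP_ENV` environment variable and two stand-in classes, `TestingDatabase` and `RealDatabase` (none of these names come from the original):

```python
import os

# Hypothetical switch: read the deployment environment once, at import time.
# APP_ENV is an illustrative variable name, not a standard one.
TESTING = os.environ.get("APP_ENV", "testing") == "testing"


class TestingDatabase:
    """Stand-in for an in-memory database used during tests."""
    name = "in-memory"


class RealDatabase:
    """Stand-in for a production database connection."""
    name = "production"


# Module-scoped code: this ordinary if statement runs when the module is
# imported and decides which implementation the rest of the program sees.
if TESTING:
    Database = TestingDatabase
else:
    Database = RealDatabase

print(Database.name)
```

The rest of the program only refers to `Database`, so the choice of environment is made in exactly one place.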

## Use repr strings for debugging output
When you're debugging with print, you should repr the value before printing to ensure that any difference in types is clear.

In [1]:
print(repr(5))

5


In [2]:
print(repr('5'))

'5'


For classes, we can define our own `__repr__` special method:

In [3]:
class BetterClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return("BetterClass(%d, %d)" % (self.x, self.y))
    
obj = BetterClass(1, 2)
print(obj)

BetterClass(1, 2)


Or where you don't have control over the class definition:

In [4]:
obj.__dict__

{'x': 1, 'y': 2}
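
The same distinction is available inside format strings through the `%r` conversion, and `__dict__` can be wrapped in a small helper for classes you cannot modify. A sketch, where `OpaqueClass` and `debug_repr` are hypothetical names introduced only for illustration:

```python
class OpaqueClass(object):
    """Stand-in for a third-party class that defines no __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


def debug_repr(obj):
    """Build a readable debugging string from an instance's __dict__."""
    # Sort the attributes so the output is stable across runs.
    attrs = ", ".join("%s=%r" % (k, v) for k, v in sorted(vars(obj).items()))
    return "%s(%s)" % (type(obj).__name__, attrs)


# %s hides the type difference, %r makes it visible.
plain = "%s vs %s" % (5, "5")
explicit = "%r vs %r" % (5, "5")
print(plain)     # 5 vs 5
print(explicit)  # 5 vs '5'

print(debug_repr(OpaqueClass(1, "2")))  # OpaqueClass(x=1, y='2')
```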

## Test everything with unittest
You should always test your code, regardless of which language it's written in. The unittest module is popular, but Python users appear to be using pytest more these days.

In [5]:
class SomeClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def sumXY(self):
        return(self.x + self.y)
    
    def multiplyXY(self):
        return(self.x * self.y)
    

someObj = SomeClass(2, 3)
someObj.sumXY(), someObj.multiplyXY()

(5, 6)

In [12]:
import unittest

class TestSomeClass(unittest.TestCase):
        
    # setUp and tearDown methods run before and after each test respectively
    def setUp(self):
        self.someObj = SomeClass(2, 3)
        print("setUp finished!")
    
    def tearDown(self):
        del self.someObj
        print("tearDown finished!")
        print("")
        
    # the setUpClass and tearDownClass class methods run once per class: before the first test and after the last test respectively
    @classmethod
    def setUpClass(cls):
        print("setUpClass finished!")
        print("")
    
    @classmethod
    def tearDownClass(cls):
        print("tearDownClass finished!")
        print("")
    
    def test_init(self):
        self.assertEqual(self.someObj.x, 2)
        self.assertEqual(self.someObj.y, 3)
        print("test_init finished!")
    
    def test_sumXY(self):
        self.assertEqual(self.someObj.sumXY(), 5)
        print("test_sumXY finished!")
    
    def test_multiplyXY(self):
        self.assertEqual(self.someObj.multiplyXY(), 6)
        print("test_multiplyXY finished!")
    

def main():
    try:
        unittest.main() # this usually works, but it won't in the Jupyter notebook
    except:
        unittest.main(argv=["first-arg-is-ignored"], exit=False) # run this instead

if __name__ == "__main__":
    main()

...

setUpClass finished!

setUp finished!
test_init finished!
tearDown finished!

setUp finished!
test_multiplyXY finished!
tearDown finished!

setUp finished!
test_sumXY finished!
tearDown finished!

tearDownClass finished!




----------------------------------------------------------------------
Ran 3 tests in 0.009s

OK


A few things to note:
* Test cases are defined by subclassing unittest.TestCase.
* Test method names must start with 'test', otherwise the test runner will not pick them up.
* The setUp and tearDown methods run before and after each test method. They help to stop the actions of one test from polluting another.
* The setUpClass and tearDownClass class methods can also be very useful. Creating persistent objects that are time-consuming to build is a good candidate for setUpClass, whereas tearDownClass could clean up and delete any files that were created as part of the testing process, for example.
* pytest appears to be more popular than unittest at the time of writing.
* unittest is often used in conjunction with nose and coverage.
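* By default, unittest runs the test methods of a class in alphabetical order of their names (as the output above shows), not in the order in which they are defined. Tests should not rely on this ordering.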
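
As a sketch of the setUpClass and tearDownClass point, the following hypothetical test case (the class name, file name and file contents are invented for illustration) creates a temporary directory once for the whole class and removes it afterwards. It assumes Python 3 and runs the tests through a `TextTestRunner`, which also works inside a notebook:

```python
import io
import os
import shutil
import tempfile
import unittest


class TestWithSharedFixture(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Runs once, before the first test in this class.
        cls.workdir = tempfile.mkdtemp()
        cls.path = os.path.join(cls.workdir, "data.txt")
        with open(cls.path, "w") as handle:
            handle.write("2,3")

    @classmethod
    def tearDownClass(cls):
        # Runs once, after the last test in this class: clean up the files.
        shutil.rmtree(cls.workdir)

    def test_file_exists(self):
        self.assertTrue(os.path.exists(self.path))

    def test_file_contents(self):
        with open(self.path) as handle:
            values = [int(v) for v in handle.read().split(",")]
        self.assertEqual(sum(values), 5)


# Load and run the tests explicitly instead of calling unittest.main().
suite = unittest.TestLoader().loadTestsFromTestCase(TestWithSharedFixture)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())
```

Both tests share the file written in setUpClass, so the setup cost is paid once rather than once per test.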

In [13]:
unittest.TestCase.__dict__

dict_proxy({'__call__': <function unittest.case.__call__>,
            '__dict__': <attribute '__dict__' of 'TestCase' objects>,
            '__doc__': "A class whose instances are single test cases.\n\n    By default, the test code itself should be placed in a method named\n    'runTest'.\n\n    If the fixture may be used for many test cases, create as\n    many test methods as are needed. When instantiating such a TestCase\n    subclass, specify in the constructor arguments the name of the test method\n    that the instance is to execute.\n\n    Test authors should subclass TestCase for their own tests. Construction\n    and deconstruction of the test's environment ('fixture') can be\n    implemented by overriding the 'setUp' and 'tearDown' methods respectively.\n\n    If it is necessary to override the __init__ method, the base class\n    __init__ method must always be called. It is important that subclasses\n    should not change the signature of their __init__ method, since instan

## Consider interactive debugging with pdb
To initiate the debugger, all you have to do is import the pdb built-in module and run its set_trace function. As the debugger runs we can print out the value of variables, hit 'n' to go to the next line and 'c' to continue running the program.

In [15]:
def add_to_life_universe_everything(x):
    answer = 42
    import pdb; pdb.set_trace()
    answer += x
    
    return answer

add_to_life_universe_everything(12)

> <ipython-input-15-350461409232>(4)add_to_life_universe_everything()
-> answer += x
(Pdb) answer
42
(Pdb) x
12
(Pdb) n
> <ipython-input-15-350461409232>(6)add_to_life_universe_everything()
-> return answer
(Pdb) answer
54
(Pdb) c


54

## Profile before optimising
The dynamic nature of Python causes surprising behaviors in its runtime performance. Operations you might assume are slow can turn out to be very fast, and vice versa. The best approach is to ignore your intuition and directly measure the performance of your program before you try to optimise it.

In [22]:
def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result


In [23]:
def insert_value(array, value):
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)
    

In [24]:
from random import randint

max_size = 10**4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)

Python provides two built-in profilers: profile, which is pure Python, and cProfile, which is a C-extension module. cProfile is the better option, because the pure Python alternative imposes a high overhead that will skew the results.

In [27]:
# https://docs.python.org/2/library/profile.html
import cProfile

profiler = cProfile.Profile()
profiler.runcall(test)

In [35]:
import pstats

stats = pstats.Stats(profiler)
#stats.sort_stats("cumulative") # https://docs.python.org/2/library/profile.html#pstats.Stats.sort_stats
stats.sort_stats("tottime")
stats.print_stats()

         20003 function calls in 0.997 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.980    0.000    0.995    0.000 <ipython-input-23-6c0b42aafbe9>:1(insert_value)
     9986    0.014    0.000    0.014    0.000 {method 'insert' of 'list' objects}
        1    0.003    0.003    0.997    0.997 <ipython-input-22-e5e99c0030b8>:1(insertion_sort)
        1    0.000    0.000    0.997    0.997 <ipython-input-24-1054ed032f49>:5(<lambda>)
       14    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




<pstats.Stats instance at 0x00000000058A8EC8>

So most of the time (0.980 s of 0.997 s) is spent inside the insert_value function itself, whose linear scan for the insertion point is inefficient. We can replace the scan with a binary search from the bisect built-in module:

In [37]:
from bisect import bisect_left

def insert_value(array, value):
    i = bisect_left(array, value)
    array.insert(i, value)


In [38]:
profiler = cProfile.Profile()
profiler.runcall(test)

In [39]:
stats = pstats.Stats(profiler)
stats.sort_stats("tottime")
stats.print_stats()

         30003 function calls in 0.024 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.012    0.000    0.012    0.000 {method 'insert' of 'list' objects}
    10000    0.005    0.000    0.005    0.000 {_bisect.bisect_left}
    10000    0.004    0.000    0.022    0.000 <ipython-input-37-9e3ac1f6182f>:3(insert_value)
        1    0.003    0.003    0.024    0.024 <ipython-input-22-e5e99c0030b8>:1(insertion_sort)
        1    0.000    0.000    0.024    0.024 <ipython-input-24-1054ed032f49>:5(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




<pstats.Stats instance at 0x00000000058A81C8>

In [40]:
stats.print_callers()

   Ordered by: internal time

Function                                           was called by...
                                                       ncalls  tottime  cumtime
{method 'insert' of 'list' objects}                <-   10000    0.012    0.012  <ipython-input-37-9e3ac1f6182f>:3(insert_value)
{_bisect.bisect_left}                              <-   10000    0.005    0.005  <ipython-input-37-9e3ac1f6182f>:3(insert_value)
<ipython-input-37-9e3ac1f6182f>:3(insert_value)    <-   10000    0.004    0.022  <ipython-input-22-e5e99c0030b8>:1(insertion_sort)
<ipython-input-22-e5e99c0030b8>:1(insertion_sort)  <-       1    0.003    0.024  <ipython-input-24-1054ed032f49>:5(<lambda>)
<ipython-input-24-1054ed032f49>:5(<lambda>)        <-
{method 'disable' of '_lsprof.Profiler' objects}   <-




<pstats.Stats instance at 0x00000000058A81C8>
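
The whole comparison can also be collected into one self-contained script. This is a sketch rather than the notebook's exact code: it assumes Python 3, the names `insert_value_linear`, `insert_value_bisect` and `profile_sort` are introduced here for illustration, and `pstats.Stats` is pointed at a `StringIO` stream so that the report can be captured as a string.

```python
import cProfile
import io
import pstats
from bisect import bisect_left
from random import randint


def insert_value_linear(array, value):
    # Linear scan for the insertion point: O(n) comparisons per insert.
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)


def insert_value_bisect(array, value):
    # Binary search for the insertion point: O(log n) comparisons per insert.
    array.insert(bisect_left(array, value), value)


def insertion_sort(data, insert_value):
    result = []
    for value in data:
        insert_value(result, value)
    return result


def profile_sort(data, insert_value):
    """Profile one sort and return (sorted list, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(insertion_sort, data, insert_value)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats()
    return result, stream.getvalue()


data = [randint(0, 2000) for _ in range(2000)]
for insert_value in (insert_value_linear, insert_value_bisect):
    result, report = profile_sort(data, insert_value)
    assert result == sorted(data)
    print(report)
```

Passing the insert function as a parameter keeps both versions available side by side, instead of redefining `insert_value` between runs.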

## Use tracemalloc to understand memory usage and leaks
Here we have a program that wastes memory by keeping references.

In [41]:
import gc

found_objects = gc.get_objects()
print("%d objects before " % len(found_objects))

40387 objects before 


In [46]:
# waste memory here
from random import randint

class SomeObj(object):
    def __init__(self, x):
        self.x = x

someDict = {}
for i in range(1000000):
    someDict[i] = SomeObj(randint(0, 1000))


In [47]:
len(someDict.keys())

1000000

In [48]:
found_objects = gc.get_objects()
print("%d objects after " % len(found_objects))

1040432 objects after 


The problem with gc.get_objects is that it doesn't tell you anything about how the objects were allocated. Python 3.4 introduced the tracemalloc built-in module to solve this problem. See the docs for more details: http://pytracemalloc.readthedocs.io/
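
A minimal sketch of how tracemalloc might be applied to the example above, assuming Python 3.4 or later. The `SomeObj` class mirrors the one defined earlier, with a smaller loop so that it runs quickly:

```python
import tracemalloc
from random import randint


class SomeObj(object):
    def __init__(self, x):
        self.x = x


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Waste memory by keeping a reference to every object.
some_dict = {}
for i in range(100000):
    some_dict[i] = SomeObj(randint(0, 1000))

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the allocation differences by source line, largest change first.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

Each printed entry names the file and line responsible for the allocations, which is the information gc.get_objects cannot provide.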