In [1]:
import pandas as pd
import matplotlib.pyplot as plt

## Sheldon Cooper's Relationship Agreement

In the *Big Bang Theory* TV series, the show's resident genius - Sheldon Cooper - enjoys creating legally-binding agreements that specify the particulars of any relationship he is presently in.  His budding romance with Amy is regulated by a relationship agreement, as is his relationship with his roommate, Leonard Hofstadter. 


*insert photo here*

## Python Functions

The fundamental concept behind functions is the notion of a contract.  Just as Sheldon's relationship and roommate agreements ensure that the parties involved will act in regular and predictable ways, so too the interface to a function is like a contract.  If a call to a function passes the correct arguments in the correct order, a well-written function responds in a predictable way, providing the promised output.

Counterintuitively, the first step in creating a function is to define what it will do.  I say "counterintuitively" because most programmers simply dive right in and start writing code.  This strategy works well for small and rather simple programs.  However, you'll quickly run into trouble when the complexity begins to increase.  For that reason, seasoned software developers usually begin by writing a specification (spec) - a document that spells out the terms of the contract.  A well-written specification typically includes interface documentation as well as pseudocode, a clear statement in plain English of the steps to be taken to accomplish the function's task.

Let's illustrate this process by writing a spec for a rather simple function called *multiply.* Here's the initial specification.  

The multiply() function takes two arguments (arg_1, arg_2), multiplies them, and returns the result.

Because this function is so simple, we won't bother with the pseudocode at this point.  For more complex functions, however,  pseudocode is strongly recommended as it helps you understand the flow and logic of your program before you start coding. With all that said, here's a first look at the code. 


In [2]:

def multiply(arg_1, arg_2):
    '''This function takes two arguments (arg_1, arg_2), multiplies them, and returns the result.
    '''
    return arg_1 * arg_2

# end multiply()


This is a straightforward function, if there ever was one.  Even so, a couple of points need to be highlighted.  Consider, for example, the documentation (doc) string immediately following the function header.  As you can see, this string is enclosed in triple quotes (''').  The text of a doc string is displayed whenever you run help() on the function.  Here's what you get when you run the command:

In [4]:
help(multiply)

Help on function multiply in module __main__:

multiply(arg_1, arg_2)
    This function takes two arguments (arg_1, arg_2), multiplies them, and returns the result.



In his wonderful book, *Python without Fear,* Brian Overland makes the following points about doc strings (p. 322).  You'll want to keep these in mind as you create doc strings for your functions.

1. The doc string must be the first statement after the beginning (header) of the function definition.
2. Normal indentation rules apply.  The doc string must be indented under the heading of the definition, just as any
   statement.
3. The indentation requirement applies only to the first physical line. However, the cleanest style is to continue the
   indentation of the first line.
4. You can use any kind of quotation marks.  However, the literal quote marks (''') enable you to write doc strings
   that span any number of physical lines.
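
The rules above can be seen in a small sketch. The two function names here are hypothetical and exist only for this illustration; the second function shows what happens when the string is not the first statement.

```python
def multiply_documented(arg_1, arg_2):
    '''Multiply two arguments and return the result.

    Triple quotes let the doc string span several physical lines,
    and every line continues the indentation of the first one.
    '''
    return arg_1 * arg_2


def multiply_undocumented(arg_1, arg_2):
    result = arg_1 * arg_2
    '''This string is not the first statement, so it is not a doc string.'''
    return result


# Python stores the doc string on the function object as __doc__.
print(multiply_documented.__doc__)
print(multiply_undocumented.__doc__)   # prints None
```

Running help() on the second function would show only its signature, because Python found no doc string to display.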
   
The last thing we need to point out is the comment at the end of the function definition.  The pound sign (#) indicates that anything following it on that line is a comment.  In this case, we mark the end of the function with a comment.  Python itself relies on indentation to delimit a function, so the marker is optional, but in a module containing many functions it can help you see at a glance where one function ends and the next begins.

## The Broken Contract

Once a function has been developed, it's always best to test it rigorously before moving it to production.  Many programmers dislike testing because it's more fun to write code than to debug it.  Nevertheless, testing is an incredibly important part of the development process, especially if your aim is to write "bullet-proof" code.  So let's test our new multiply() function by calling it in a variety of ways.

In [4]:
# Call multiply with two integers.
multiply(2, 2)

4

In [5]:
# Call multiply with two floats.
multiply(2.2, 4.8)

10.56

In [3]:
# Call multiply with a string and an integer.
multiply('Sheldon Roommate Agreement ', 2)

'Sheldon Roommate Agreement Sheldon Roommate Agreement '

The first two calls to multiply() return exactly what we expect.  If you pass numbers - either integers or floats - the function multiplies them and returns the correct result.  However, the third call is somewhat surprising.  If we pass a string and a number, the function repeats the string the requested number of times.  Many programming languages would not do this.  Instead, they would complain that the multiplication operator only works with numeric datatypes.  But not Python!  In Python, the meaning of the * operator depends on the datatypes of its operands.  Behind the scenes, the interpreter asks the operands how to carry out the multiplication, so numbers are multiplied while a string (or any other sequence) is repeated.
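
To make the "behind the scenes" part concrete, here is a minimal sketch of how the * operator is dispatched to the operands' own methods. The Agreement class is hypothetical and exists only to show that a user-defined type can give * its own meaning.

```python
# Each datatype decides what * means for it.
print(3 * 4)        # numeric multiplication: 12
print('ab' * 3)     # string repetition: 'ababab'
print(3 * 'ab')     # also 'ababab'
print([0, 1] * 2)   # list repetition: [0, 1, 0, 1]

# For 3 * 'ab', the int declines the job, so Python lets the string handle it.
print((3).__mul__('ab'))   # NotImplemented


class Agreement:
    '''Hypothetical class whose instances can be "multiplied" by an integer.'''

    def __init__(self, clauses):
        self.clauses = clauses

    def __mul__(self, copies):
        # Reuse list repetition to duplicate the clauses.
        return Agreement(self.clauses * copies)


doubled = Agreement(['No whistling']) * 2
print(doubled.clauses)   # ['No whistling', 'No whistling']
```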

But what happens if we try to multiply our string two and a half times?


In [4]:
multiply(2.5, 'Sheldon Roommate Agreement ')

TypeError: can't multiply sequence by non-int of type 'float'

Apparently, there are limits to Python's intelligence!  When our function is called this way, the interpreter generates an ugly *TypeError* and highlights the offending line with an arrow.  The error message essentially says that a string can be multiplied only by whole numbers (i.e., integers), not by a float.

In the absence of any error-handling code, this is what happens.  The interpreter blows up and brings the currently executing code to a screeching halt.  The problem is that our function currently lacks any way of gently handling problems like this.  In other words, we need to practice what's called "defensive programming."  When creating a new function, defensive programming practices are critically important, especially in large systems where a function call might be buried three or four levels deep.

Because Python is a *dynamically typed* language, multiply's two arguments (arg_1 and arg_2) can be any datatype you wish.  We can pass integers, floats, strings, even lists and dictionaries.  Python only blows up when it encounters an operation inside the function that doesn't support the datatypes.  For those who've programmed in other languages such as C, the answer appears to be simple.  Just check the datatypes upfront, at the start of the function.  But this is not the Python way.  Instead, Python encourages what's known as *duck typing.*  The idea is simple.  If it looks like a duck, waddles like a duck, and quacks like a duck, then why validate that it's a duck?  Or stated in programming terms, don't bother to check datatypes.  Let Python do that for you.  Unfortunately, this still leaves us with the problem of handling errors gracefully.  
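
As a minimal sketch of duck typing, consider the following. The Duck and Robot classes and the make_it_quack() function are hypothetical names invented for this illustration.

```python
class Duck:
    def quack(self):
        return 'Quack!'


class Robot:
    def quack(self):
        return 'Beep-quack!'


def make_it_quack(thing):
    # No datatype check: anything with a quack() method is accepted.
    return thing.quack()


print(make_it_quack(Duck()))    # Quack!
print(make_it_quack(Robot()))   # Beep-quack!

# make_it_quack(42) would fail only at the point of use, with
# AttributeError: 'int' object has no attribute 'quack'
```

The function never asks what datatype it received. A problem surfaces only when the requested operation is not supported, which is exactly the situation our multiply() function ran into.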

Python's answer is to enclose the risky code in a *try* block.  What makes *try* blocks so appealing is that any error raised inside the block that matches an *except* clause is handled there, giving you a chance to avoid messy code meltdowns.

Here's what our function looks like after being rewritten this way:


In [13]:
def multiply(arg_1, arg_2):
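    '''This function takes two arguments (arg_1, arg_2), multiplies them, and returns the result.
    If the datatypes cannot be multiplied, it prints an error message and returns -99.
    '''
    try:
        return arg_1 * arg_2

    except TypeError:
        # Report where the error occurred, then return the error value.
        print('Datatype error in multiply().')
        return -99
# end multiply()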

With the function rewritten, let's see what happens when we call the function, as we did earlier with a string and a float.

In [14]:
multiply(2.5, 'Sheldon Roommate Agreement ')

Datatype error in multiply().


-99

This looks a lot better.  And the best part is Python did not melt down.  The function displays our custom error message and returns -99.  When handling errors, it's best practice to always indicate where the error occurred.  If multiply() were part of a library, you'd want to specify both the library and the function in the error message.  Also, it's important to clearly state what went wrong so corrective action can be taken.
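
One possible refinement, offered only as a sketch: -99 is also a legitimate product (for example, -9 times 11), so a caller cannot always tell an error value from a real answer. An alternative is to raise a new error whose message names the library and the function and states what went wrong. The function name multiply_or_raise and the library name agreements are hypothetical.

```python
def multiply_or_raise(arg_1, arg_2):
    '''Multiply two arguments; raise a descriptive TypeError if that fails.'''
    try:
        return arg_1 * arg_2
    except TypeError as err:
        # Name the (hypothetical) library and function, and say what went wrong.
        raise TypeError(
            'agreements.multiply_or_raise(): cannot multiply %s by %s'
            % (type(arg_1).__name__, type(arg_2).__name__)
        ) from err


print(multiply_or_raise(-9, 11))   # -99, a perfectly valid result

try:
    multiply_or_raise(2.5, 'Sheldon Roommate Agreement ')
except TypeError as err:
    print(err)
# prints: agreements.multiply_or_raise(): cannot multiply float by str
```

The trade-off is that every caller must now be prepared to catch the error, whereas the version above keeps running and leaves it to the caller to check for -99.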

## Continuous Integration

We're going to write a second function and call it from inside our function (yes, a function within a function!).  It will check the datatypes for us, so we don't have to do so by hand.  This way, wrong datatypes are caught before they cause complications further along.  Wrong datatypes may be easy to spot now, as creators of the function, but these checks come in handy when other people call our functions, and vice versa.

We will be using a handy tool called assertions.  An assert statement checks a condition, such as a datatype test, and raises an AssertionError (stopping the function) if the condition is false.  Note that assert statements are skipped entirely when Python is run with the -O option, so they are best suited to development-time checks.


In [103]:
# Assertions 

assert isinstance(2, int)

assert isinstance('hello world', str)

hello = 'hello world'

assert isinstance(hello, int), '%r is not a number!' % hello

AssertionError: 'hello world' is not a number!

In [104]:
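def list_test(lis):
    '''Assert that lis is a list whose items are all strings.'''
    # Wrap lis in a one-item tuple so the % formatting also works when lis is a tuple.
    assert isinstance(lis, list), '%r should be a list' % (lis,)
    for item in lis:
        assert isinstance(item, str), '%r should contain strings' % (lis,)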

Now that we have a test function, let's call it from our function.

In practice, we should think about our test function *before* we write our function.  This is part of defensive programming: we want to catch bad input before the main body of our function runs.

In [105]:
def plot_life_expect(countries, line_colors):
    '''plot_life_expect: This function creates a life expectancy line plot for a vector of countries.
       First, it creates a figure to plot our graph on, including axis labels and a title. 
       Then, it loops through the countries, extracting the value from our df and plotting them with 
       the respective color. The ith country corresponds to the ith color in the list. 
       
       Parameters:
       countries: vector of strings indicating the countries to obtain from gapminder.csv
       line_colors: vector of line colors indicating the colors of our lines
    '''
    # Validate both arguments before doing any plotting.
    list_test(countries)
    list_test(line_colors)

    ax = plt.axes()
    ax.set(xlabel='Year',
           ylabel='Years',
           xlim=(1950, 2010),
           title='Life Expectancy')

    # df is assumed to be the gapminder DataFrame, loaded earlier and indexed by country.
    for country, color in zip(countries, line_colors):
        df_gap = df.loc[country]
        plt.plot(df_gap['year'], df_gap['lifeExp'], color=color, label=country)

    ax.legend()
    plt.show()

In [93]:
# Test our test functions! 

no_strings = ['China', 2, 'England']
no_list = {'color': 'blue'}

plot_life_expect(countries, no_list)

AssertionError: {'color': 'blue'} should be a list

In [95]:
plot_life_expect(no_strings, line_colors)

AssertionError: ['China', 2, 'England'] should contain strings
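
Because assert statements are skipped when Python runs with the -O option, checks that must always run can be written with explicit raise statements instead. The following is a minimal sketch under that assumption; the helper name check_string_list is hypothetical and is not part of the notebook above.

```python
def check_string_list(name, value):
    '''Raise TypeError unless value is a list containing only strings.'''
    if not isinstance(value, list):
        raise TypeError('%s should be a list, got %r' % (name, value))
    for item in value:
        if not isinstance(item, str):
            raise TypeError('%s should contain only strings, got %r' % (name, item))


# A valid argument passes silently.
check_string_list('countries', ['China', 'England'])

# An invalid argument is reported by name, along with the offending item.
try:
    check_string_list('countries', ['China', 2, 'England'])
except TypeError as err:
    print(err)
# prints: countries should contain only strings, got 2
```

Passing the argument's name lets the message say which parameter was wrong, which is harder to tell from the assertion messages above.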