## Function basics

We have already seen a number of functions built into Python that let us do useful things with strings, collections and numbers. For example, `len()` is passed some kind of sequence object and returns the length of that sequence.

This is the general form of a function; it takes some input _arguments_ and returns some output based on the supplied arguments.

The arguments to a function, if any, are supplied in parentheses, and the function _call_ evaluates to the value that the function returns.


In [None]:
x = abs(-3.0)
print x

l = len("ACGGTGTCAA")
print l

As well as using Python's built-in functions, you can write your own. Functions are a nice way to encapsulate some code that you want to reuse elsewhere in your program, rather than repeating the same bit of code multiple times. They also provide a way to name some coherent block of code and allow you to structure a complex program.

## Function definition syntax

Functions are defined in Python using the `def` keyword followed by the name of the function. If your function takes some arguments (input data) then you can name these in parentheses after the function name. If your function does not take any arguments you still need some empty parentheses. Here we define a simple function named `sayHello` that prints a line of text to the screen:

In [None]:
def sayHello():
    print 'Hello world'

Note that the code block for the function (just a single print line in this case) is indented relative to the `def`. The above definition just declares the function in an abstract way and nothing will be printed when the definition is made. To actually use a function you need to invoke it (call it) by using its name and a pair of round parentheses:

In [None]:
sayHello() # Call the function to print 'Hello world'

If required, a function may be written so it accepts input. Here we specify a variable called `name` in the brackets of the function definition and this variable is then used by the function. Although the input variable is referred to inside the function the variable does not represent any particular value. It only takes a value if the function is actually used in context.

In [None]:
def sayHello(name):
    print 'Hello ' + name

When we call (invoke) this function we specify a specific value for the input. Here we pass in the value `User`, so the name variable takes that value and uses it to print a message, as defined in the function. 

In [None]:
sayHello('User')  # Prints 'Hello User'

When we call the function again with a different input value we naturally get a different message. Here we also illustrate that the input value can be passed in as a variable (`text` in this case).

In [None]:
text = 'Mary'
sayHello(text)     # Prints 'Hello Mary'

A function may also generate output that is passed back or returned to the program at the point at which the function was called. For example here we define a function to do a simple calculation of the square of input (`x`) to create an output (`y`):

In [None]:
def square(x):

  y = x*x
  
  return y

Once the `return` statement is reached the operation of the function will end, and anything on the return line will be passed back as output. Here we call the function on an input number and catch the output value as result. Notice how the names of the variables used inside the function definition are separate from any variable names we may choose to use when calling the function.
  

In [None]:
number = 7
result = square(number)
print result           # Prints: 49

The function `square` can be used from now on anywhere in your program, as many times as required, on any (numeric) input values we like.

In [None]:
print square(1.2e-3)   # Prints: 1.44e-6

A function can accept multiple input values, otherwise known as arguments. These are separated by commas inside the brackets of the function definition. Here we define a function that takes two arguments and performs a calculation on both, before sending back the result.


In [None]:
def calcFunc(x, y):

  z = x*x + y*y
  
  return z

result = calcFunc(1.414, 2.0)

print(result)  #  5.999396
 

Note that this function does not check that x and y are valid forms of input. For the function to work properly we assume they are numbers. Depending on how this function is going to be used, appropriate checks could be added.
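One way such checks might look is sketched below. The name `calcFuncChecked` and the decision to accept only `int` and `float` values are assumptions made for illustration, not part of the original function. `print(...)` is called with a single argument so that the sketch behaves the same under Python 2 and Python 3.

```python
def calcFuncChecked(x, y):
    # Hypothetical variant of calcFunc that rejects non-numeric input
    for value in (x, y):
        if not isinstance(value, (int, float)):
            raise TypeError("calcFuncChecked expects numbers, got %r" % (value,))

    return x*x + y*y

print(calcFuncChecked(3, 4))   # Prints: 25
```

Calling `calcFuncChecked("3", 4)` would now stop with a `TypeError` that names the offending value, instead of failing somewhere inside the calculation.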

Functions can be arbitrarily long and can perform very complex operations. However, to make a function reusable, it is often better to assign it a single responsibility and a descriptive name.

In [None]:
def calcDistance(vec1, vec2):
    
    assert len( vec1 ) == len( vec2 ) # check dimensions
    from math import sqrt # import square-root function
    
    d2 = 0
    
    for i in range( len( vec1 ) ):
        delta = vec1[i] - vec2[i]
        d2 += delta * delta
        
    dist = sqrt( d2 )
    return dist

Let's experiment a little with our function.

In [None]:
w1 = ( 23.1, 17.8, -5.6 )
w2 = ( 8.4, 15.9, 7.7 )
calcDistance( w1, w2 )

Note that the function is general and handles any two vectors (irrespective of their representation) as long as their dimensions are compatible:

In [None]:
calcDistance( ( 1, 2 ), ( 3, 4 ) ) # dimension: 2

In [None]:
calcDistance( [ 1, 2 ], [ 3, 4 ] ) # vectors represented as lists

In [None]:
calcDistance( ( 1, 2 ), [ 3, 4 ] ) # mixed representation
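The same calculation can be written more compactly by pairing up coordinates with the built-in `zip`. This is only an alternative sketch; the name `calcDistanceZip` is made up for illustration.

```python
from math import sqrt

def calcDistanceZip(vec1, vec2):
    # Hypothetical compact variant of calcDistance
    assert len(vec1) == len(vec2)  # check dimensions

    # zip pairs up corresponding coordinates: (a1, b1), (a2, b2), ...
    d2 = sum((a - b) ** 2 for a, b in zip(vec1, vec2))
    return sqrt(d2)

print(calcDistanceZip((0, 0), (3, 4)))   # Prints: 5.0
```

Because `zip` works on any pair of sequences, this version also accepts tuples, lists or a mixture of the two.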

__[3.1] Exercises__

1. Write a function that takes 2 numerical arguments and returns their mean. Test your function on some examples.
2. Write another function that takes a list of numbers and returns the mean of all the numbers in the list.
3. Write a function that takes a single DNA sequence as an argument and estimates the molecular weight of this sequence. Test your function using some example sequences. The following table gives the weight of each (single-stranded) nucleotide in g/mol:

<table>
    <tr><th>DNA Residue</th><th>Weight</th></tr>
    <tr><td>A</td><td>331</td></tr>
    <tr><td>C</td><td>307</td></tr>
    <tr><td>G</td><td>347</td></tr>
    <tr><td>T</td><td>306</td></tr>
</table>

4. (Extra, if you have time) If the sequence passed in above contains `N` bases, use the mean weight of the other bases as the weight.

## Return value

There can be more than one `return` statement in a function, although typically there is only one, at the bottom. Consider the following function, which returns some text saying whether a number is positive or negative. It has three return statements: the first two pass back text strings, but the last, which is reached if the input value is zero, has no explicit return value and thus passes back the Python `None` object.
The `return` keyword immediately exits the function and hands program flow back to the call site, so any code in the function after the `return` that was reached is never run.

In [None]:
def getSign(value):
    
    if value > 0:
        return "Positive"
    
    elif value < 0:
        return "Negative"
    
    return # implicit 'None'

    print "Hello world" # execution does not reach this line
    
print "getSign( 33.6 ):", getSign( 33.6 )
print "getSign( -7 ):", getSign( -7 )
print "getSign( 0 ):", getSign( 0 )

All of the examples of functions so far have returned only single values; however, it is possible to pass back more than one value via the `return` statement. In the following example we define a function that takes two arguments and passes back three values. The return values are really passed back inside a single tuple, which can be caught as a single collection of values.

In [None]:
def myFunction(value1, value2):
    
    total = value1 + value2
    difference = value1 - value2
    product = value1 * value2
    
    return total, difference, product

values = myFunction( 3, 7 )  # Grab output as a whole tuple
print "Results as a tuple:", values

x, y, z = myFunction( 3, 7 ) # Unpack tuple to grab individual values
print "x:", x
print "y:", y
print "z:", z

__[3.2] Exercises__

1. Write a function that counts the number of each base found in a DNA sequence. Return the result as a tuple of 4 numbers representing the counts of each base `A`, `C`, `G` and `T`.

__Advanced exercise__

2. Write a function to return the reverse-complement of a nucleotide sequence.

## Function arguments

The arguments we have passed to functions so far have all been _mandatory_: if we do not supply them, or if we supply the wrong number of arguments, Python will raise an exception. Here `square` requires exactly one argument:

In [None]:
def square(number):

  y = number*number
  
  return y

In [None]:
square(2)
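The cell above supplies the one required argument, so it succeeds. To see the failure case without stopping the notebook, the call can be wrapped in `try`/`except`. This is a small sketch that redefines `square` so that it is self-contained; the exact wording of the error message differs between Python versions.

```python
def square(number):
    return number * number

try:
    square()          # no argument supplied
except TypeError as err:
    message = str(err)
    print("TypeError: " + message)
```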

Mandatory arguments are assumed to come in the same order as the arguments in the function definition, but you can also opt to specify the arguments using the argument names as _keywords_, supplying the values corresponding to each keyword with a `=` sign.

In [None]:
square(number=3)

In [None]:
def repeat(seq, n):
    result = ''
    for i in range(0,n):
        result += seq
    return result

print repeat("CTA", 3)
print repeat(n=4, seq="GTT")

Unnamed (positional) arguments must come before named arguments, even if they look to be in the right order.

In [None]:
print repeat(seq="CTA", 3)   # SyntaxError: positional argument after keyword argument

Sometimes it is useful to give some arguments a default value that the caller can override, but which will be used if the caller does not supply a value for this argument. We can do this by assigning some value to the named argument with the `=` operator in the function definition.

In [None]:
def runSimulation(nsteps=1000):
    print "Running simulation for", nsteps, "steps"

runSimulation(500)
runSimulation()

**CAVEAT**: default values are evaluated once, when the function is defined, and keep their state between calls. This can be a problem for *mutable* objects:

In [None]:
def myFunction(parameters=[]):
    parameters.append( 100 )
    print parameters
    
myFunction()
myFunction()
myFunction()
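The reason for this behaviour is that the default list is created once, when the `def` statement runs, and is stored on the function object. The sketch below (the function name `appendHundred` is made up) returns the list so that we can check that every call hands back the very same object.

```python
def appendHundred(parameters=[]):
    parameters.append(100)
    return parameters

first = appendHundred()
second = appendHundred()

print(first is second)                         # Prints: True (same list object)
print(len(second))                             # Prints: 2
print(appendHundred.__defaults__[0] is first)  # Prints: True
```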

One can either create a "new" default every time a function is run:

In [None]:
def myFunction(parameters=None):
    
    if parameters is None:
        parameters = []
        
    parameters.append( 100 )
    print parameters
    
myFunction()
myFunction()

... or avoid modifying *mutable* default arguments:

In [None]:
def myFunction(parameters=[]):
    print parameters + [ 100 ]
    
myFunction()
myFunction()

Arrange function arguments so that *mandatory* arguments come first:

In [None]:
def runSimulation(initialTemperature, nsteps=1000):
    print "Running simulation starting at %s K and doing %s steps" % ( initialTemperature, nsteps )
    
runSimulation(300, 500)
runSimulation(300)

In [None]:
def badFunction(nsteps=1000, initialTemperature):  # SyntaxError: mandatory argument after a default one
    pass


As before, no positional argument can appear after a keyword argument, and all required arguments must still be provided.

In [None]:
runSimulation( nsteps=100, 300 )  # SyntaxError: positional argument after keyword argument

In [None]:
runSimulation( nsteps=100 )  # TypeError: initialTemperature is required

Keyword names must match those declared in the function definition:

In [None]:
runSimulation( numSteps=100 )  # TypeError: no argument is named 'numSteps'

__[3.3] Exercises__

1. Extend your solution to the previous exercise estimating the weight of a DNA sequence so that it can also calculate the weight of an RNA sequence. Use an optional argument to specify the molecule type, but default to DNA. The weights of RNA residues are:

<table>
    <tr><th>RNA Residue</th><th>Weight</th></tr>
    <tr><td>A</td><td>347</td></tr>
    <tr><td>C</td><td>323</td></tr>
    <tr><td>G</td><td>363</td></tr>
    <tr><td>U</td><td>324</td></tr>
</table>


## Variable scope

Every variable in Python has a _scope_ in which it is defined. Variables defined at the outermost level are known as _globals_ (although typically only for the current module). In contrast, variables defined within a function are local, and cannot be accessed from the outside.

In [None]:
def mathFunction(x, y):
    result = ( x + y ) * ( x - y )
    return result

In [None]:
answer = mathFunction( 4, 7 )
print answer

In [None]:
answer = mathFunction( 4, 7 )
print result   # NameError: 'result' only exists inside mathFunction

Generally, variables defined in an outer scope are also visible inside functions, but you should be careful manipulating them as this can lead to confusing code. Assigning to a name inside a function makes that name local to the function, so trying to update a global variable as in the example below raises an `UnboundLocalError`. It is a good idea to avoid using global variables and instead pass any necessary values as parameters to your functions.

In [None]:
counter = 1
def increment(): 
    print counter   # UnboundLocalError: 'counter' is local because it is assigned below
    counter += 1

increment()
print counter

If you really want to do this, there is a way round it using the `global` statement. But it is normally better to avoid global variables and pass values through arguments instead:

In [None]:
def increment(counter): 
    return counter + 1

counter = 0
counter = increment( counter ) 
print counter
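For completeness, here is a minimal sketch of the `global` statement mentioned above. The names `callCount` and `recordCall` are made up for illustration; the argument-passing version remains the recommended style.

```python
callCount = 0

def recordCall():
    global callCount   # assignments below now refer to the module-level variable
    callCount += 1

recordCall()
recordCall()
print(callCount)       # Prints: 2
```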

## Bio exercises

### 1) Translate RNA sequence to protein sequence

Write a function that translates an RNA sequence into a sequence of amino acids. The function should take 2 arguments, an RNA sequence and a dictionary that defines the standard genetic code.

For mapping codons to amino acids you can use the dictionary `standardGeneticCode` defined below. Notice that it only maps strings in upper case, so make sure that `codon` is in upper case before you look it up. You can convert a codon to upper case with the `upper()` method: `codon = codon.upper()`

`standardGeneticCode = { 
          'UUU':'Phe', 'UUC':'Phe', 'UCU':'Ser', 'UCC':'Ser',
          'UAU':'Tyr', 'UAC':'Tyr', 'UGU':'Cys', 'UGC':'Cys',
          'UUA':'Leu', 'UCA':'Ser', 'UAA':None,  'UGA':None,
          'UUG':'Leu', 'UCG':'Ser', 'UAG':None,  'UGG':'Trp',
          'CUU':'Leu', 'CUC':'Leu', 'CCU':'Pro', 'CCC':'Pro',
          'CAU':'His', 'CAC':'His', 'CGU':'Arg', 'CGC':'Arg',
          'CUA':'Leu', 'CUG':'Leu', 'CCA':'Pro', 'CCG':'Pro',
          'CAA':'Gln', 'CAG':'Gln', 'CGA':'Arg', 'CGG':'Arg',
          'AUU':'Ile', 'AUC':'Ile', 'ACU':'Thr', 'ACC':'Thr',
          'AAU':'Asn', 'AAC':'Asn', 'AGU':'Ser', 'AGC':'Ser',
          'AUA':'Ile', 'ACA':'Thr', 'AAA':'Lys', 'AGA':'Arg',
          'AUG':'Met', 'ACG':'Thr', 'AAG':'Lys', 'AGG':'Arg',
          'GUU':'Val', 'GUC':'Val', 'GCU':'Ala', 'GCC':'Ala',
          'GAU':'Asp', 'GAC':'Asp', 'GGU':'Gly', 'GGC':'Gly',
          'GUA':'Val', 'GUG':'Val', 'GCA':'Ala', 'GCG':'Ala', 
          'GAA':'Glu', 'GAG':'Glu', 'GGA':'Gly', 'GGG':'Gly'}`

### 2) Sliding window analysis of GC content along chain

Write a function that calculates overlapping sliding windows along a DNA sequence, from start to end, and returns the GC content in each window. The function should take two arguments, the DNA sequence and the size of the sliding window.

You can use the `calculate_windows` function defined below:

In [None]:
def calculate_windows(seq, winSize):
    """This function takes a given sequence (seq) and a sliding window size (winSize)
    and returns all sub-sequences according to the size of the sliding window.
    Notice that the sub-sequences are overlapping and their size is fixed according to winSize.
    """ 
    if winSize <= 0:
        raise Exception("Window size must be a positive integer")
    if winSize > len(seq): 
        raise Exception("Window size is larger than sequence length")
    result = []
    nrWindows = len(seq)-winSize+1
    for i in range(nrWindows):
        subSeq = seq[i:i+winSize]
        result.append(subSeq)
    return result

__For plotting:__

You can plot the GC content along the DNA sequence using the `matplotlib` plotting library:

In [None]:
gcResults = [0.5, 0.5, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.5, 0.6, 0.5, 0.6, 0.6, 0.6, 0.6, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.8, 0.7, 0.6, 0.7, 0.7, 0.7, 0.7]
from matplotlib import pyplot
pyplot.plot( gcResults )
pyplot.axis([0, 50, .45, .85])
pyplot.ylabel('%GC')
pyplot.title('GC plot')
pyplot.text(12, .7, "this is some text!")
pyplot.show()

### 3) Sliding window analysis of hydrophobicity

Similarly to the exercise above, write a function that calculates overlapping sliding windows along a protein sequence, from start to end, and returns the sum of the amino acid hydrophobicity values in each window. You can use the same `calculate_windows` function defined in the previous exercise.

The dictionary `gesScale` contains the hydrophobicity scale for each amino acid:

`gesScale = {'F': -3.7, 'M': -3.4, 'I': -3.1, 'L': -2.8, 'V': -2.6,
            'C': -2.0, 'W': -1.9, 'A': -1.6, 'T': -1.2, 'G': -1.0,
            'S': -0.6, 'P': 0.2, 'Y': 0.7, 'H': 3.0, 'Q': 4.1,
            'N': 4.8, 'E': 8.2, 'K': 8.8, 'D': 9.2, 'R': 12.3}`

__For plotting:__

You can plot the hydrophobicity scores using the `matplotlib` plotting library:

In [None]:
scores = [11.499999999999998, 23.4, 7.400000000000001, 12.700000000000001, 16.700000000000003, 14.000000000000004, 12.400000000000002, 6.600000000000002, 18.2, 11.3, 7.400000000000001, 9.0, 9.0, 4.300000000000001, 15.899999999999999, 31.6, 31.599999999999998, 35.5, 39.6, 39.599999999999994, 42.3, 45.8, 53.39999999999999, 42.400000000000006, 45.400000000000006, 46.00000000000001, 44.10000000000001, 46.300000000000004, 39.4, 35.4, 27.199999999999996]
from matplotlib import pyplot
pyplot.plot( scores )
pyplot.grid(True)
pyplot.show()

## Advanced topics

### Anonymous functions

In some circumstances, it is convenient to be able to define a short function without a `def` statement or a name of its own, though usually it will be assigned to a variable or passed directly to another function. These are introduced with the `lambda` keyword (a term borrowed from the Lisp programming language).

The syntax is `lambda` followed by a list of arguments separated by commas. The body of the function is defined after a colon `:` and can only contain a single expression which, when evaluated, will be the return value from the function. Anonymous functions cannot contain statements such as loops or `if` blocks, although expressions like the one-line conditional (`a if condition else b`) and list comprehensions are allowed.

In [None]:
double = lambda x: x * 2
double( 3 )
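Because the body is a single expression, a choice between two values has to be written as a conditional expression (`a if condition else b`) rather than as an `if` statement. A small sketch, using the made-up name `absDiff`:

```python
# Absolute difference of two numbers, written as one expression
absDiff = lambda a, b: a - b if a > b else b - a

print(absDiff(3, 10))   # Prints: 7
print(absDiff(10, 3))   # Prints: 7
```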

Anonymous functions are never necessary, since you can always define an equivalent normal named function, but they can be useful when a function has limited use or is very short. A prime example is sorting:

In [None]:
values = [ "africa", "Australia", "EUROPE"  ]
values.sort()
print values

Assume we want to sort these case insensitively:

In [None]:
values.sort( key = lambda x: x.upper() )
print values

You can use this technique to sort a list by arbitrary properties of its elements. For example, we might want to sort some DNA fragments according to their length:

In [None]:
frags = ["ACTGTGT", "TTCTG", "TTA", "GGTGTACAT", "AAAATCTGAAA"]
frags.sort()
print "sorted alphabetically:",frags

frags.sort(key = lambda s: len(s))
print "sorted by length:",frags
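A key function can also return a tuple, in which case elements are compared by the first item and ties are broken by the second. The sketch below uses the built-in `sorted`, which returns a new list and leaves the original untouched; the fragment list is made up for illustration.

```python
fragments = ["TTA", "ACG", "TTCTG", "AC", "GGA"]

# Sort by length first, then alphabetically among fragments of equal length
byLength = sorted(fragments, key=lambda s: (len(s), s))

print(byLength)   # Prints: ['AC', 'ACG', 'GGA', 'TTA', 'TTCTG']
```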

### Functions as values

Functions in Python are values just like strings and numbers. If you want to refer to a function without calling it, just use the name of the function without parentheses.

In this way, functions can be assigned to variables, passed as arguments to other functions, returned from functions, and so on.

In [None]:
myabs = abs # 'myabs' points to 'abs'
print "Accessed through myabs:", myabs( -3.0 )

abs = square # reassign 'abs' to point to 'square'
print "After reassignment:", abs( -3.0 )

abs = myabs # reinstate original 'abs'
print "After reinstatement:", abs( -3.0 )
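Passing a function as an argument can be sketched as follows. The helper `applyToAll` is made up for illustration; it calls whatever function it is given on each element of a list.

```python
def applyToAll(func, values):
    # Call func on every element and collect the results
    results = []
    for value in values:
        results.append(func(value))
    return results

print(applyToAll(abs, [-1, 2, -3]))    # Prints: [1, 2, 3]
print(applyToAll(len, ["AC", "GTT"]))  # Prints: [2, 3]
```

Note that `abs` and `len` are written without parentheses: we hand over the functions themselves rather than the results of calling them.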

### Nested functions

You can define functions in any scope in Python, and so you can define a function within another function.

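As an example, in this case we return the function we have defined internally. The inner function remembers the value of `base` from the enclosing call, which is known as a *closure*. The effect is a form of *partial application* (closely related to *currying*): it converts a given function to one with a reduced argument list.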

In [None]:
def exponential(base):
    def fixed_base(exponent):
        return pow( base, exponent )
    
    return fixed_base

base2 = exponential(2)
base2(3)
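The standard library offers a ready-made tool for the same idea, `functools.partial`, which fixes some leading arguments of an existing function. A brief sketch using the built-in `pow` (the name `powerOfTwo` is made up for illustration):

```python
from functools import partial

powerOfTwo = partial(pow, 2)   # fixes the first argument of pow to 2

print(powerOfTwo(3))    # Prints: 8  (same as pow(2, 3))
print(powerOfTwo(10))   # Prints: 1024
```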