# An introduction to solving biological problems with Python

## Session 2.1: Functions

- [Function definition syntax](#Function-definition-syntax)
- [Return value](#Return-value)
- [Function arguments](#Function-arguments)
- [Variable scope](#Variable-scope)

## Function basics

We have already seen a number of functions built into Python that let us do useful things with strings, collections, numbers and so on. For example, `len()` is passed some kind of sequence object and returns the length of that sequence.

This is the general form of a function; it takes some input _arguments_ and returns some output based on the supplied arguments.

The arguments to a function, if any, are supplied in parentheses and the result of the function _call_ is the result of evaluating the function.


In [None]:
x = abs(-3.0)
print(x)

l = len("ACGGTGTCAA")
print(l)

As well as using Python's built-in functions, you can write your own. Functions are a nice way to encapsulate some code that you want to reuse elsewhere in your program, rather than repeating the same bit of code multiple times. They also provide a way to name a coherent block of code and allow you to structure a complex program.

## Function definition syntax

Functions are defined in Python using the `def` keyword followed by the name of the function. If your function takes some arguments (input data) then you can name these in parentheses after the function name. If your function does not take any arguments you still need some empty parentheses. Here we define a simple function named `sayHello` that prints a line of text to the screen:

In [None]:
def sayHello():
    print('Hello world')

Note that the code block for the function (just a single print line in this case) is indented relative to the `def`. The above definition just declares the function in an abstract way and nothing will be printed when the definition is made. To actually use a function you need to invoke it (call it) by using its name followed by a pair of round parentheses:

In [None]:
sayHello() # Call the function to print 'Hello world'

If required, a function may be written so it accepts input. Here we specify a variable called `name` in the brackets of the function definition and this variable is then used by the function. Although the input variable is referred to inside the function the variable does not represent any particular value. It only takes a value if the function is actually used in context.

In [None]:
def sayHello(name):
    print('Hello ' + name)

When we call (invoke) this function we specify a specific value for the input. Here we pass in the value `User`, so the name variable takes that value and uses it to print a message, as defined in the function. 

In [None]:
sayHello('User')  # Prints 'Hello User'

When we call the function again with a different input value we naturally get a different message. Here we also illustrate that the input value can be passed in as a variable (`text` in this case).

In [None]:
text = 'Mary'
sayHello(text)     # Prints 'Hello Mary'

A function may also generate output that is passed back, or returned, to the program at the point at which the function was called. For example, here we define a function that calculates the square of its input (`x`) to create an output (`y`):

In [None]:
def square(x):
    y = x*x
    return y

Once the `return` statement is reached the operation of the function will end, and anything on the return line will be passed back as output. Here we call the function on an input number and catch the output value as `result`. Notice how the names of the variables used inside the function definition are separate from any variable names we may choose to use when calling the function.
  

In [None]:
number = 7
result = square(number)
print(result)           # Prints: 49

The function `square` can be used from now on anywhere in your program, as many times as required, on any (numeric) input values we like.

In [None]:
print(square(1.2e-3))   # Prints approximately 1.44e-06

A function can accept multiple input values, otherwise known as arguments. These are separated by commas inside the brackets of the function definition. Here we define a function that takes two arguments and performs a calculation on both, before sending back the result.


In [None]:
def calcFunc(x, y):
    z = x*x + y*y
    return z

result = calcFunc(1.414, 2.0)

print(result)  # Prints approximately 5.999396
 

Note that this function does not check that x and y are valid forms of input. For the function to work properly we assume they are numbers. Depending on how this function is going to be used, appropriate checks could be added.
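One possible form such checks could take is sketched below. The name `calcFuncChecked` and the choice to raise a `TypeError` are illustrative assumptions rather than part of the course material, and the example uses `try`/`except`, which may not have been covered at this point.

```python
from numbers import Real

def calcFuncChecked(x, y):
    # Hypothetical variant of calcFunc that validates its input first
    for value in (x, y):
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError("Expected a number, got " + repr(value))

    return x*x + y*y

print(calcFuncChecked(3, 4))  # Prints: 25

try:
    calcFuncChecked("3", 4)   # a string is rejected before any arithmetic
except TypeError as err:
    print("Rejected:", err)
```

Whether such a check is worth the extra code depends on how much you trust the callers of the function.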

Functions can be arbitrarily long and can perform very complex operations. However, to make a function reusable, it is often better to give it a single responsibility and a descriptive name.

In [1]:
def calcDistance(vec1, vec2):
    
    #assert len( vec1 ) == len( vec2 ) # check dimensions
    from math import sqrt # import square-root function
    
    d2 = 0
    
    for i in range( len( vec1 ) ):
        delta = vec1[i] - vec2[i]
        d2 += delta * delta
        
    dist = sqrt( d2 )
    return dist

Let's experiment a little with our function.

In [4]:
w1 = ( 23.1, 17.8, -5.6 )
w2 = ( 8.4, 15.9, 7.7 )
calcDistance( w1, w2 )

19.914567532336726

Note that the function is general and handles any two vectors (irrespective of their representation) as long as their dimensions are compatible:

In [None]:
calcDistance( ( 1, 2 ), ( 3, 4 ) ) # dimension: 2

In [None]:
calcDistance( [ 1, 2 ], [ 3, 4 ] ) # vectors represented as lists

In [None]:
calcDistance( ( 1, 2 ), [ 3, 4 ] ) # mixed representation
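If the dimensions are not compatible, the function above either fails with an `IndexError` or silently ignores the extra coordinates, depending on which vector is longer. A minimal sketch of a stricter variant follows. The name `calcDistanceChecked` and the use of `ValueError` are assumptions made for illustration.

```python
from math import sqrt

def calcDistanceChecked(vec1, vec2):
    # Hypothetical variant of calcDistance with the dimension check enabled
    if len(vec1) != len(vec2):
        raise ValueError("Vectors must have the same number of dimensions")

    d2 = 0
    # zip pairs up the corresponding coordinates of the two vectors
    for a, b in zip(vec1, vec2):
        delta = a - b
        d2 += delta * delta

    return sqrt(d2)

print(calcDistanceChecked((0, 0), [3, 4]))  # Prints: 5.0
```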

__[3.1] Exercises__

1. Write a function that takes 2 numerical arguments and returns their mean. Test your function on some examples.
2. Write another function that takes a list of numbers and returns the mean of all the numbers in the list.
3. Write a function that takes a single DNA sequence as an argument and estimates the molecular weight of this sequence. Test your function using some example sequences. The following table gives the weight of each (single-stranded) nucleotide in g/mol:

<table>
    <tr><th>DNA Residue</th><th>Weight</th></tr>
    <tr><td>A</td><td>331</td></tr>
    <tr><td>C</td><td>307</td></tr>
    <tr><td>G</td><td>347</td></tr>
    <tr><td>T</td><td>306</td></tr>
</table>


4. (Extra, if you have time) If the sequence passed in above contains `N` bases, use the mean weight of the other bases as the weight.

## Return value

There can be more than one `return` statement in a function, although typically there is only one, at the bottom. Consider the following function, which returns some text saying whether a number is positive or negative. It has three return statements: the first two pass back text strings, but the last, which is reached if the input value is zero, has no explicit return value and thus passes back the Python `None` object. The `return` keyword immediately exits the function and hands program flow back to the call site, so any function code after this final `return` is never run.

In [None]:
def getSign(value):
    
    if value > 0:
        return "Positive"
    
    elif value < 0:
        return "Negative"
    
    return # implicit 'None'

    print("Hello world") # execution does not reach this line
    
print("getSign( 33.6 ):", getSign( 33.6 ))
print("getSign( -7 ):", getSign( -7 ))
print("getSign( 0 ):", getSign( 0 ))

All of the examples of functions so far have returned only single values. However, it is possible to pass back more than one value via the `return` statement. In the following example we define a function that takes two arguments and passes back three values. The return values are really passed back inside a single tuple, which can be caught as a single collection of values or unpacked into separate variables.

In [None]:
def myFunction(value1, value2):
    
    total = value1 + value2
    difference = value1 - value2
    product = value1 * value2
    
    return total, difference, product

values = myFunction( 3, 7 )  # Grab output as a whole tuple
print("Results as a tuple:", values)

x, y, z = myFunction( 3, 7 ) # Unpack tuple to grab individual values
print("x:", x)
print("y:", y)
print("z:", z)

__[3.2] Exercises__

1. Write a function that counts the number of each base found in a DNA sequence. Return the result as a tuple of 4 numbers representing the counts of each base `A`, `C`, `G` and `T`.

__Advanced exercise__

2. Write a function to return the reverse-complement of a nucleotide sequence.

## Function arguments

The arguments we have passed to functions so far have all been _mandatory_. If we do not supply them, or if we supply the wrong number of arguments, Python will throw an exception:

In [None]:
def square(number):
    y = number*number
    return y

In [None]:
square(2)
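The call above supplies exactly one argument, so it succeeds. A minimal sketch of the failing cases follows. It assumes `try`/`except` is acceptable here even if it has not been introduced yet, and it prints the error message rather than quoting it because the exact wording varies between Python versions.

```python
def square(number):
    return number * number

try:
    square()          # no argument supplied
except TypeError as err:
    print("TypeError:", err)

try:
    square(2, 3)      # too many arguments supplied
except TypeError as err:
    print("TypeError:", err)
```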

Mandatory arguments are assumed to come in the same order as the arguments in the function definition, but you can also opt to specify the arguments using the argument names as _keywords_, supplying the value corresponding to each keyword with an `=` sign.

In [None]:
square(number=3)

In [None]:
def repeat(seq, n):
    result = ''
    for i in range(0,n):
        result += seq
    return result

print(repeat("CTA", 3))
print(repeat(n=4, seq="GTT"))

Unnamed (positional) arguments must come before named arguments, even if they look to be in the right order.

In [None]:
print(repeat(seq="CTA", 3))  # SyntaxError: positional argument follows keyword argument

Sometimes it is useful to give some arguments a default value that the caller can override, but which will be used if the caller does not supply a value for this argument. We can do this by assigning some value to the named argument with the `=` operator in the function definition.

In [None]:
def runSimulation(nsteps=1000):
    print("Running simulation for", nsteps, "steps")

runSimulation(500)
runSimulation()

**CAVEAT**: default arguments are defined once and keep their state between calls. This can be a problem for *mutable* objects:

In [6]:
def myFunction(parameters=[]):
    parameters.append( 100 )
    print(parameters)
    
myFunction()
myFunction()
myFunction()

[100]
[100, 100]
[100, 100, 100]
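The shared default can be inspected directly through the function's `__defaults__` attribute, which holds the default values created when the `def` statement was run. The function name `appendHundred` below is a hypothetical stand-in for the example above.

```python
def appendHundred(parameters=[]):
    # The same list object is reused on every call without an argument
    parameters.append(100)
    return parameters

print(appendHundred.__defaults__)  # Prints: ([],)
appendHundred()
appendHundred()
print(appendHundred.__defaults__)  # Prints: ([100, 100],)
```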


One can either create a "new" default every time a function is run:

In [8]:
def myFunction(parameters=None):
    
    if parameters is None:
        parameters = []
        
    parameters.append( 100 )
    print(parameters)
    
myFunction()
myFunction()

[100]
[100]


... or avoid modifying *mutable* default arguments:

In [10]:
def myFunction(parameters=[]):
    print(parameters + [ 100 ])
    
myFunction()
myFunction()

[100]
[100]


Arrange function arguments so that *mandatory* arguments come first:

In [None]:
def runSimulation(initialTemperature, nsteps=1000):
    print("Running simulation starting at %s K and doing %s steps" % ( initialTemperature, nsteps ))
    
runSimulation(300, 500)
runSimulation(300)

In [None]:
def badFunction(nsteps=1000, initialTemperature):  # SyntaxError: mandatory argument after default argument
    pass


As before, no positional argument can appear after a keyword argument, and all required arguments must still be provided.

In [None]:
runSimulation( nsteps=100, 300 )  # SyntaxError: positional argument after keyword argument

In [None]:
runSimulation( nsteps=100 )  # TypeError: the required argument initialTemperature is missing

Keyword names must match those declared in the function definition:

In [None]:
runSimulation( numSteps=100 )  # TypeError: numSteps is not a declared argument name

__[3.3] Exercises__

1. Extend your solution to the previous exercise estimating the weight of a DNA sequence so that it can also calculate the weight of an RNA sequence. Use an optional argument to specify the molecule type, but default to DNA. The weights of RNA residues are:

<table>
    <tr><th>RNA Residue</th><th>Weight</th></tr>
    <tr><td>A</td><td>347</td></tr>
    <tr><td>C</td><td>323</td></tr>
    <tr><td>G</td><td>363</td></tr>
    <tr><td>U</td><td>324</td></tr>
</table>


## Variable scope

Every variable in Python has a _scope_ in which it is defined. Variables defined at the outermost level are known as _globals_ (although typically only for the current module). In contrast, variables defined within a function are local, and cannot be accessed from the outside.

In [None]:
def mathFunction(x, y):
    result = ( x + y ) * ( x - y )
    return result

In [None]:
answer = mathFunction( 4, 7 )
print(answer)

In [None]:
answer = mathFunction( 4, 7 )
print(result)  # NameError: result exists only inside mathFunction

Generally, variables defined in an outer scope are also visible inside functions, but you should be careful manipulating them as this can lead to confusing code. Assigning to a name anywhere inside a function makes that name local to the function, so Python raises an `UnboundLocalError` if you try to read and then update a global variable in this way, as the example below shows. Instead it is a good idea to avoid using global variables and, for example, to pass any necessary variables as parameters to your functions.

In [None]:
counter = 1
def increment(): 
    print(counter)
    counter += 1

increment()
print(counter)

If you really want to do this, there is a way round it using the `global` statement. However, it is normally better to avoid global variables and pass values through arguments instead:

In [None]:
def increment(counter): 
    return counter + 1

counter = 0
counter = increment( counter ) 
print(counter)
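For completeness, the `global` statement mentioned above could be used as in the following sketch. The function name `incrementGlobal` is hypothetical, and the argument-passing version remains the recommended approach.

```python
counter = 1

def incrementGlobal():
    global counter   # refer to the module-level variable instead of creating a local one
    counter += 1

incrementGlobal()
print(counter)  # Prints: 2
```

The function now changes state outside itself, which is exactly what makes code relying on `global` harder to follow and test.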