<small><small><i>
Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.
</i></small></small>

# Functions

In Python, a function is a named sequence of statements that belong together. Functions allow code to be re-used so that complex programs can be built up out of simpler parts. Python has inbuilt functions, like `print()`, `max()` etc. We can also create our own functions by using the `def` keyword. 

This is the basic syntax of a function:

```python
def funcname(arg1, arg2,... argN):
    ''' Document String'''
    statements
    return <value>
```

Read the above syntax as follows: a function named "funcname" is defined, which accepts the arguments "arg1, arg2, ... argN". The function is documented by its docstring, '''Document String'''. After executing the statements, the function returns a "value".

Return values are optional (by default every function returns **None** if no return statement is executed).
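
As a minimal sketch of the syntax above (the function name `describe_sequence` and its arguments are made up for illustration), the following function has a docstring, prints a message and contains no return statement, so calling it yields `None`:

```python
def describe_sequence(name, sequence):
    """Print the name and length of a sequence."""
    print(name, "has", len(sequence), "bases")
    # no return statement, so the function returns None


result = describe_sequence("seq1", "ACGT")
print(result)
```

The value stored in `result` is `None` because the function only prints and never returns anything.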

We can choose any valid identifier as a function name, except Python's reserved keywords. We can list the keywords using:

In [1]:
import keyword

print(keyword.kwlist)

['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']


Using a keyword as a function name raises a `SyntaxError`.

In [7]:
def False():
    pass

SyntaxError: invalid syntax (<ipython-input-7-a2e04ce4dbd1>, line 1)

Any name that is not a keyword works as expected. Note that variables created inside a function, such as `total` below, exist only while the function runs and cannot be accessed from outside it.

In [1]:
def sumnos(x,y,z):
    total  = x + y + z
    
    return total

In [3]:
totals = sumnos(1,20,30)

In [4]:
print(totals)

51


In [5]:
total

NameError: name 'total' is not defined

However, Python does not prevent you from overwriting its inbuilt functions, so you have to be careful.

In [3]:
def print(name):
    """Take name as input and introduces 'Name' """
    return "My Name is ", name
    

In [4]:
print('My Name is Caleb')

('My Name is ', 'My Name is Caleb')

To restore inbuilt functions, use:

In [5]:
del print

In [6]:
print('My Name is Caleb')

My Name is Caleb


In [7]:
print("Hello Jack.")
print("Jack, how are you?")

Hello Jack.
Jack, how are you?


Instead of writing the above two statements every single time, we can define a function that does the same job with a single call.

Defining a function firstfunc():

In [5]:
def firstfunc():
    print("Hello Jack.")
    return "Jack, how are you?"
 # execute the function

In [7]:
greetings = firstfunc()

Hello Jack.


In [9]:
print(greetings)

Jack, how are you?


**firstfunc()** greets the same person every time. We can make **firstfunc()** accept an argument that holds a name, so that the greeting is printed for that name. To do so, add an argument to the function definition as shown.

In [10]:
def firstfunc(username):
    print("Hello %s." % username)
    print(username + ',' ,"how are you?")

In [11]:
name1 = 'Caleb' #input('Please enter your name : ')

We pass this variable to the function **firstfunc()**, where it becomes the argument username, because that is the name defined in the function. That is, name1 is passed as username.

In [12]:
firstfunc(name1)

Hello Caleb.
Caleb, how are you?


## Return Statement

When a function produces a value that has to be stored in a variable, or sent back to the calling code for further operations, a return statement is used.

In [18]:
def times(x,y):
    z = x*y
    return z

c = times(7,59)
print(c)

413


The **times( )** function defined above accepts two arguments and returns the variable z, which contains the product of the two arguments.

The z value is stored in variable c and can be used for further operations.

Instead of declaring another variable, the expression itself can be used in the return statement as shown.

In [19]:
def times(x,y):
    """This multiplies the two input arguments"""
    return x*y

In [15]:
c = times(4,5)
print(c)

20


The new definition of **times( )** is documented with a docstring, as shown above. This documentation is displayed whenever **times** is passed to the **help( )** function.

In [20]:
help(times)

Help on function times in module __main__:

times(x, y)
    This multiplies the two input arguments



Multiple values can also be returned as a tuple. However, this tends not to be very readable when returning many values, and can easily introduce errors when the order of the return values is interpreted incorrectly.

In [21]:
eglist = [10,50,30,12,6,8,100]

In [22]:
def egfunc(eglist):
    highest = max(eglist)
    lowest = min(eglist)
    first = eglist[0]
    last = eglist[-1]
    return highest,lowest,first,last

If the function is called without assigning its result to variables, the values are returned together as a tuple. If a variable is given for each value, the values are assigned to those variables in the order in which they appear in the return statement.

In [23]:
egfunc(eglist)

(100, 6, 10, 100)

In [26]:
a,b,c,d = egfunc(eglist)
print(' a =',a,' b =',b,' c =',c,' d =',d)

 a = 100  b = 6  c = 10  d = 100
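
One possible way to reduce the risk of mixing up the order of the returned values is to return a named tuple, so that each value can be accessed by name. This is only a sketch: `ListSummary` and `summarise` are hypothetical names that are not part of the original example.

```python
from collections import namedtuple

# ListSummary is a hypothetical name chosen for this sketch
ListSummary = namedtuple("ListSummary", ["highest", "lowest", "first", "last"])


def summarise(values):
    """Return the highest, lowest, first and last items as a named tuple."""
    return ListSummary(max(values), min(values), values[0], values[-1])


summary = summarise([10, 50, 30, 12, 6, 8, 100])
print(summary.highest, summary.lowest)

# a named tuple can still be unpacked like an ordinary tuple
a, b, c, d = summary
```

Accessing `summary.highest` makes the intent explicit, while code that relies on positional unpacking keeps working.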


## Default arguments

When an argument of a function takes the same value in the majority of cases, that value can be specified as a default. This is also called an implicit argument.

In [34]:
def implicitadd(x,y=3,z=0):
    print("%d + %d + %d = %d"%(x,y,z,x+y+z))
    return x+y+z

**implicitadd( )** is a function that accepts up to three arguments, but most of the time the first argument just needs to be increased by 3. Hence the second argument is given the default value 3 and the third argument defaults to zero. Here the last two arguments are default arguments.

Now if the second argument is not given when calling the **implicitadd( )** function, it is taken to be 3.

In [37]:
implicitadd(3,z=4)

3 + 3 + 4 = 10


10

However, we can call the same function with two or three arguments. A useful feature is the ability to explicitly name the argument values being passed into the function. This gives great flexibility in how to call a function with optional arguments. All of the following are valid:

In [23]:
implicitadd(4,4)
implicitadd(4,5,6)
implicitadd(4,z=7)
implicitadd(2,y=1,z=9)
implicitadd(x=1)

4 + 4 + 0 = 8
4 + 5 + 6 = 15
4 + 3 + 7 = 14
2 + 1 + 9 = 12
1 + 3 + 0 = 4


4
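
Arguments without a default value remain required. The sketch below uses a silent variant of the function above (`quietadd` is a hypothetical name introduced here) to show that omitting `x` raises a `TypeError`:

```python
def quietadd(x, y=3, z=0):
    """Same defaults as implicitadd, but without printing."""
    return x + y + z


print(quietadd(4))          # uses the defaults y=3 and z=0
print(quietadd(4, z=7))     # overrides only z

try:
    quietadd()              # x has no default, so it must be supplied
except TypeError as error:
    print("TypeError:", error)
```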

## Any number of arguments

If the number of arguments that a function should accept is not known in advance, an asterisk is placed before the name of the last argument, which then holds the remaining arguments as a tuple. The following function requires at least one argument but can have many more.

In [24]:
def add_n(first,*args):
    "return the sum of one or more numbers"
    reslist = [first] + [value for value in args]
    print(reslist)
    return sum(reslist)

The above function defines a list of all of the arguments, prints the list and returns the sum of all of the arguments.

In [26]:
add_n(6.5)

[6.5]


6.5
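
The same pattern works with several arguments, and an existing list can be spread over separate arguments by prefixing it with `*` in the call. A small sketch, using a hypothetical `add_all` function that mirrors `add_n` without the printing:

```python
def add_all(first, *args):
    """Return the sum of one or more numbers."""
    # args is a tuple holding every positional argument after the first
    return first + sum(args)


print(add_all(1, 2, 3))

# a list can be spread over the positional arguments with *
numbers = [6.5, 1.5, 2]
print(add_all(*numbers))
```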

An arbitrary number of named arguments can also be accepted using `**`. When the function is called, all of the additional named arguments are provided in a dictionary.

In [27]:
def namedArgs(**names):
    'print the named arguments'
    # names is a dictionary of keyword : value
    print("  ".join(name+"="+str(value) 
                    for name,value in names.items()))

namedArgs(x=3*4,animal='mouse',z=(1+2j))

x=12  animal=mouse  z=(1+2j)
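
The reverse also works: a dictionary can be unpacked into named arguments by prefixing it with `**` in the call. The sketch below assumes a hypothetical `format_named` function that returns the string instead of printing it, and made-up keys in `settings`:

```python
def format_named(**names):
    """Return the named arguments as a 'key=value' string."""
    return "  ".join(name + "=" + str(value) for name, value in names.items())


settings = {"organism": "mouse", "min_length": 50}

# ** unpacks the dictionary into keyword arguments
print(format_named(**settings))
```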


##  Global and Local Variables

A variable that is assigned inside a function is a local variable, while a variable assigned outside any function is a global variable.

In [28]:
eg1 = [1,2,3,4,5]


In the function below, x is assigned inside egfunc1 and again inside the nested function thirdfunc. Each assignment creates a variable that is local to the function in which it occurs, so the two x variables are independent of each other.

In [29]:
def egfunc1():
    x=1
    def thirdfunc():
        x=2
        print("Inside thirdfunc x =", x) 
    thirdfunc()
    print("Outside x =", x)

Let's have a look at how the variables are assigned. 

In [31]:
%%html
<iframe width="800" height="500" frameborder="0" src="http://pythontutor.com/iframe-embed.html#code=def%20egfunc1%28%29%3A%0A%20%20%20%20x%3D1%0A%20%20%20%20def%20thirdfunc%28%29%3A%0A%20%20%20%20%20%20%20%20x%3D2%0A%20%20%20%20%20%20%20%20print%28%22Inside%20thirdfunc%20x%20%3D%22,%20x%29%20%0A%20%20%20%20thirdfunc%28%29%0A%20%20%20%20print%28%22Outside%20x%20%3D%22,%20x%29%0Aegfunc1%28%29&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=0&heapPrimitives=nevernest&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false"> </iframe>

In [31]:
egfunc1()

Inside thirdfunc x = 2
Outside x = 1


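If a variable is declared with the **global** keyword inside a function, as in the example below, assignments to it affect the variable of that name at module level, which can then be accessed from anywhere. Note that the x local to egfunc1 is left unchanged in this example. Global variables should be used sparingly as they make functions harder to re-use.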

In [32]:
eg3 = [1,2,3,4,5]

In [33]:
def egfunc1():
    x = 1.0 # local variable for egfunc1
    def thirdfunc():
        global x # globally defined variable 
        x = 2.0
        print("Inside thirdfunc x =", x) 
    thirdfunc()
    print("Outside x =", x)

In [34]:
egfunc1()
print("Globally defined x =",x)

Inside thirdfunc x = 2.0
Outside x = 1.0
Globally defined x = 2.0
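
To illustrate why globals make functions harder to re-use, the sketch below contrasts a function that changes a module-level variable with one that receives its input as an argument and returns the result. All names here are hypothetical and chosen only for this illustration.

```python
counter = 0  # module-level (global) variable


def count_with_global():
    """Increase the global counter; the function depends on outside state."""
    global counter
    counter += 1


def count_with_argument(current):
    """Return the next count; everything the function needs is passed in."""
    return current + 1


count_with_global()
count_with_global()
print(counter)

total = 0
total = count_with_argument(total)
total = count_with_argument(total)
print(total)
```

The second version can be used with any starting value and any variable name, whereas the first only ever works on `counter`.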


## Lambda Functions

These are small functions that are defined without a name and consist of a single expression whose result is returned. Lambda functions come in very handy when operating on lists. They are defined by the keyword **lambda** followed by the arguments, a colon and the expression.

In [35]:
z = lambda x: x * x

In [36]:
z(8)

64
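
As a sketch of how lambda functions are typically combined with lists, the example below passes them to the inbuilt `sorted()`, `filter()` and `map()` functions. The list of sequences is made up for illustration.

```python
sequences = ["ATGCGC", "AT", "GGCCGGCC", "ATAT"]

# sort by length, using a lambda as the key function
by_length = sorted(sequences, key=lambda seq: len(seq))
print(by_length)

# keep only the sequences longer than two bases
long_sequences = list(filter(lambda seq: len(seq) > 2, sequences))
print(long_sequences)

# compute the length of every sequence
lengths = list(map(lambda seq: len(seq), sequences))
print(lengths)
```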

### Composing functions

Lambda functions can also be used to compose functions.

In [37]:
def double(x):
    return 2*x
def square(x):
    return x*x
def f_of_g(f,g):
    "Compose two functions of a single variable"
    return lambda x: f(g(x))
doublesquare= f_of_g(double,square)
print("doublesquare is a",type(doublesquare))
doublesquare(3)

doublesquare is a <class 'function'>


18

### Exercise
Let's return to our earlier exercise: calculating %GC content. In this exercise:
- Write a function `percentageGC` that calculates the GC content of a DNA sequence
- The function should return the %GC content
- The function should return a message if the provided sequence is not DNA (this should be checked by a separate function, called by your function)


In [61]:
mydna = "CAGTGATGATGACGAT"
yourdna = "ACGATCGAGACGTAGTA"
testdna = "ATFRACGATTGHAHYAK"

There is an invalid base 'F' at position 2
There is an invalid base 'R' at position 3
There is an invalid base 'H' at position 11
There is an invalid base 'H' at position 11
There is an invalid base 'Y' at position 14
There is an invalid base 'K' at position 16


In [1]:
# Write a function percentageGC that calculates the GC content of a DNA sequence
def dnaSeqeunceValidator(dna):
    """A function that validates a DNA sequence
    """
    clean_dna_sequence = ''
    if dna:
        # enumerate gives the true position of each base, even for repeated characters
        for position, base in enumerate(dna):
            if base.upper() not in ('A','T','C','G', 'N'):
                print("Invalid character '{}' at index position '{}'".format(base, position))
                return(False, 0)
            else:
                clean_dna_sequence += base.upper()
    else:
        print('Please provide a valid sequence')
    return(True, clean_dna_sequence)
    
    
def percentageGC(sequence):
    """
    Takes a sequence of DNA and calculates the percentage of GC content
    """
    # Validates the DNA sequence provided
    flag, dna_seq = dnaSeqeunceValidator(sequence)
    if flag and dna_seq:
        GC_percentage = (dna_seq.upper().count('G') + dna_seq.upper().count('C'))/len(dna_seq) * 100
        return('GC = {:.2f}%'.format(GC_percentage))
    else:
        print('The sequence has an Invalid character(s) which is not part of a DNA sequence')
        
percentageGC('ttgtft')


Invalid character 'f' at index position '4'
The sequence has an Invalid character(s) which is not part of a DNA sequence


In [2]:
def dnaSeqeunceValidator(dna):
    """A function that validates a DNA sequence
    """
    clean_dna_sequence = ''
    # enumerate gives the true position of each base, even for repeated characters
    for position, base in enumerate(dna):
        if base.upper() not in ('A','T','C','G', 'N'):
            print("Invalid character '{}' at index position '{}'".format(base, position))
            # always return two values so the result can be unpacked
            return(False, '')
        else:
            clean_dna_sequence += base.upper()
    return(True, clean_dna_sequence)

print(dnaSeqeunceValidator(''))

x,y = dnaSeqeunceValidator('')

if x and y: 
    print('True')
else:
    print('False')

(True, '')
False


In [204]:
def newfun():
    """
    
    I need a function that can do a b c d
    
    """
    pass

In [34]:
# greeting function
def greetings(firstname, lastname):
    print('Hello,', firstname, lastname)
    
greetings('Samuel', 'Oduor')

Hello, Samuel Oduor


In [32]:
def hello(firstname, lastname):
    for name1, name2 in zip(firstname, lastname):
        print('Hello,', name1, name2)

name_list1 = ['Samuel', 'Violet', 'bonface', 'Henry', 'Kauthar', 'Brenda']
name_list2 = ['Oduor', 'Chege', 'Odhimabo', 'Ndugwa', 'Omar', 'Kamau']

#hello(name_list1, name_list2)

def hello2(firstname, lastname):
    for i,fname in enumerate(firstname):
        print('Hello,', fname, lastname[i])

hello2(name_list1, name_list2)

Hello, Samuel Oduor
Hello, Violet Chege
Hello, bonface Odhimabo
Hello, Henry Ndugwa
Hello, Kauthar Omar
Hello, Brenda Kamau


In [43]:
# Write a function percentageGC that calculates the GC content of a DNA sequence
def dnaSeqeunceValidator(dna):
    """A function that validates a DNA sequence
    """
    clean_dna_sequence = ''
    if dna:
        # enumerate gives the true position of each base, even for repeated characters
        for position, base in enumerate(dna):
            if base.upper() not in ('A','T','C','G', 'N'):
                print("Invalid character '{}' at index position '{}'".format(base, position))
                return(False, 0)
            else:
                clean_dna_sequence += base.upper()
    else:
        print('That is an empty DNA sequence')
    return(True, clean_dna_sequence)
    
def percentageGC(sequence):
    """
    Takes a sequence of DNA and calculates the percentage of GC content
    """
    # Validates the DNA sequence provided
    flag, dna_seq = dnaSeqeunceValidator(sequence)
    if flag and dna_seq:
        GC_percentage = (dna_seq.upper().count('G') + dna_seq.upper().count('C'))/len(dna_seq) * 100
        return('GC = {:.2f}%'.format(GC_percentage))
    else:
        print('The sequence has an Invalid character(s) which is not part of a DNA sequence')
        
percentageGC(' ')


Invalid character ' ' at index position '0'
The sequence has an Invalid character(s) which is not part of a DNA sequence
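
The solutions above report only the first invalid character and return a formatted string. As an alternative sketch of the same exercise, the version below collects every invalid position and returns the %GC as a number, falling back to a message for invalid or empty input. The names `VALID_BASES`, `find_invalid_bases` and `gc_percentage` are hypothetical, and counting `N` in the sequence length is an assumption carried over from the validator above.

```python
VALID_BASES = set("ATCGN")  # same alphabet as the validator above


def find_invalid_bases(dna):
    """Return a list of (position, base) pairs for characters that are not DNA."""
    return [(position, base) for position, base in enumerate(dna)
            if base.upper() not in VALID_BASES]


def gc_percentage(dna):
    """Return the %GC of dna as a float, or a message if dna is not valid."""
    if not dna:
        return "Please provide a non-empty sequence"
    invalid = find_invalid_bases(dna)
    if invalid:
        return "Not a DNA sequence, invalid characters at: {}".format(invalid)
    upper = dna.upper()
    return (upper.count("G") + upper.count("C")) / len(upper) * 100


print(gc_percentage("CAGTGATGATGACGAT"))
print(gc_percentage("ttgtft"))
print(gc_percentage(""))
```

Returning a number leaves the formatting to the caller, for example with `'GC = {:.2f}%'.format(value)`.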
