<a name="top"></a>

# Introduction to Python Programming for Bioinformatics. Lesson 6

<details>
<summary>
About this notebook
</summary>

This notebook was originally written by [Marc Cohen](https://github.com/mco-gh), an engineer at Google. The original source can be found on [Marc's short link service](https://mco.fyi/) and starts with [Python lesson 0](https://mco.fyi/py0). I encourage you to work through that notebook if you find some details missing here.

Rob Edwards edited the notebook, adapted it for bioinformatics, using some simple geneticy examples, condensed it into a single notebook, and rearranged some of the lessons, so if some of it does not make sense, it is Rob's fault!

It is intended as a hands-on companion to an in-person course, and if you would like Rob to teach this course (or one of the other courses) don't hesitate to get in touch with him.

</details>
<details>
<summary>
Using this notebook
</summary>

You can download the original version of this notebook from [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_06.ipynb) and from [Rob's Google Drive]()

**You should make your own copy of this notebook by selecting File->Save a copy in Drive from the menu bar above, and then you can edit the code and run it as your own**

There are several lessons, and you can do them in any order. I've tried to organise them in the order I think most appropriate, but you may disagree!
</details>


# Lesson Links
<a name="lessons"></a>

* [Lesson 6 - Functions](#Lesson-6---Functions)
  * [Defining Functions](#Defining-Functions)
  * [Docstrings](#Docstrings)
  * [Return Values](#Return-Values)

Previous Lesson: [Local](Python_Lesson_05.ipynb) | [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_05.ipynb) | [Google Colab](https://colab.research.google.com/drive/1VmGd4AAb1fBKOjmemYIKnPgu58xGE5so)

Next Lesson: [Local](Python_Lesson_07.ipynb) | [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_07.ipynb) | [Google Colab](https://colab.research.google.com/drive/1Uq9ysM5TxMsiS9vElA53Ihl63ONOYDTA)


# Lesson 6 - Functions

## Functions are flexible software building blocks

* So far, we’ve been writing small programs.
* Things get much more complicated when we write large programs, especially with multiple authors.
* Ideally, we'd like to build software like snapping lego pieces together.
* What would that buy us?
  * abstraction
  * reuse
  * modularity
  * maintainability


### Abstraction - You don't need to do everything

* When building a house, you don't do everything yourself.
  * You hire an architect, a carpenter, an electrician, a roofer, a plumber, a mason.
  * You might hire a contractor to hire and manage all those people.
* In our programs we delegate tasks to certain functions, like `print()`, so that we don't have to worry about all the details.
  * It's a bit like hiring an electrician so that we don't have to worry about the details of electrical wiring in our house (or getting blown up!)


### Reuse - Don't Reinvent the Wheel

* it's OK to reuse other people's work
  * it's not stealing
  * it makes you more efficient and more productive
* Very few people build a house from scratch
  * so don't try to build programs from scratch
* Most of the software we produce is open source software, which means
  * you can look at the source
  * you can change the source
  * you can do more complex things with it


#### Reuse Example
You can count the number of bases in a DNA sequence the hard way:

In [None]:
sequence = "ATGCATAGCTAGCATCAGACTGATGCATCGACTGATCGACTGT"
bases = 0
valid_bases = ["A", "T", "G", "C"]
for i in sequence:
    if i in valid_bases:
        bases += 1

print(f"There are {bases} bases in {sequence}")

Or the easy way, by calling a built-in function...

In [None]:
sequence = "ATGCATAGCTAGCATCAGACTGATGCATCGACTGATCGACTGT"
print(f"There are {len(sequence)} bases in {sequence}")

Which would you rather use? Which is more reliable?
* The first approach is great for learning.
* The second approach is great for getting real work done.


### Modularity - Divide and Conquer
* So far, our programs have been monoliths - one continuous sequence of Python statements.
* Real programs are often much bigger than the ones we've written.
  * Google's software repository has billions of lines of source code ([Why Google Stores Billions of Lines of Code in a Single Repository](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext))
  * No one person can write a program that big.
  * Large programs are built by teams.
  * In order to build large, complex programs, we need the ability to divide program logic into manageable pieces.
* We call this modularity - dividing software into pieces or modules.


### Maintainability - Keeping your code DRY (Don't Repeat Yourself)
* Imagine that you need to do roughly the same thing in ten different places so you copy the code to those ten locations.
  * What happens when you find a bug or want to improve that piece of code?
  * You need to make the change ten times.
  * Will you remember to do that?
  * If you do remember, will you catch all ten locations?
  * Copying code is a **bad thing** - it leads to bugs.
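
As a rough sketch of the DRY idea, here is one way to keep a calculation in a single place and call it wherever it is needed. The function name `gc_content` and the two example sequences are made up for illustration, and the `def` syntax is explained in the Defining Functions section.

```python
# Hypothetical example: one copy of the logic, used in several places.
def gc_content(dna):
    """Return the fraction of bases in dna that are G or C."""
    if len(dna) == 0:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc = dna.count("G") + dna.count("C")
    return gc / len(dna)

gene_a = "ATGCGC"
gene_b = "ATATAT"

# If we find a bug in gc_content(), we fix it once and both calls benefit.
print(gc_content(gene_a))
print(gc_content(gene_b))
```

If the calculation ever needs to change (for example, to handle lowercase bases), there is only one place to edit.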


### Functions solve all of these Problems

* Functions give us the ability to:
  * Hide low level details (abstraction)
  * Share and reuse pieces of functionality (reuse)
  * Split programs into manageable pieces (modularity)
  * Write one copy of an algorithm and use it anywhere (maintainability)
* We've already used several functions
  * `print()`, `input()`, `int()`, `len()`, `range()`, etc.
* Now let's see how to define our own functions.


## Defining Functions

* example:
```
def function_name(arg1, arg2):
    '''This is a docstring.'''  # optional but a good idea
    statement1
    statement2
    ...
```
* Not surprisingly, we define the scope of the function body using indentation (just like how we define blocks for if statements, for loops, etc.).
* This is a bit like an assignment statement in that it assigns a block of code (the function body) to the function name.
  * Function names have the same rules as variable names.
  * This only defines a function - it doesn't execute it.
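
A small sketch of those last points, using a made-up function called `say_base`: running the `def` statement creates the function and attaches it to the name, but nothing in the body runs until we call it.

```python
# Hypothetical example: defining a function does not run it.
def say_base():
    """Print a single DNA base."""
    print("A")

# Nothing has been printed yet. The name say_base is now a variable
# that refers to a function object.
print(type(say_base).__name__)

# The body only runs when we call the function with parentheses.
say_base()
```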


In [None]:
def next_base():
    '''
    This function generates a DNA sequence base.
    It's how Illumina sequencing works.
    '''
    bases = ["A", "G", "T", "C"]
    print(bases[0])

next_base()
next_base()
next_base()
print()

### Docstrings
* string defined immediately after the def line
* usually triple quoted since it may be multi-line
* not required but a good way to document your functions
* IDEs use the docstring to make your life easier
* `help(function)` displays the docstring automatically
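
As a minimal sketch, Python stores the docstring on the function itself in the `__doc__` attribute, which is where tools such as `help()` find it. The function `base_count` below is a made-up example.

```python
# Hypothetical example: the docstring is stored on the function object.
def base_count(dna):
    """Return the number of characters in a DNA sequence."""
    return len(dna)

# The first string in the function body becomes the __doc__ attribute.
print(base_count.__doc__)
```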

### Example Function

In [None]:
# Reverse complement a DNA sequence
# Here's an example function definition...
def reverse_complement(dna):
    """
    Reverse complement a DNA sequence
    :param dna: The DNA sequence
    :type dna: str
    :return: The reverse complement of the DNA sequence
    :rtype: str
    """
    complements = str.maketrans('acgtrymkbdhvACGTRYMKBDHV', 'tgcayrkmvhdbTGCAYRKMVHDB')
    rcseq = dna.translate(complements)[::-1]
    return rcseq

In [None]:
# Get help about this function...
help(reverse_complement)

In [None]:
# And here's how we would call this function...
sequence = "ATCGATCGCATAGCTACGACTAGCTACGACTGACT"
reverse_complement(sequence)

### Passing Values to a Function

* The variables we define in a function to take on the values passed by the caller are called parameters.
* In this code, `a`, `b` and `c` are parameters:
```
def add_three(a, b, c):
    return a + b + c
```
* The values supplied by the caller when calling a function are called arguments.
* In this code, `1`, `2`, and `3` are arguments:
```
add_three(1, 2, 3)
```

So in our example above, the function `reverse_complement()` has one parameter (`dna`), and the code that calls it passes one argument (`sequence`).

Don't get hung up on this; most people use argument and parameter interchangeably.


### Passing Arguments
* Functions can define any number of parameters, including zero.
* Multiple parameters are separated by commas, like this...
```
def product(a, b, c):
    return a * b * c
```
* If you pass the wrong number of arguments, you'll hear about it:
```
product(1, 2)
...
TypeError: product() missing 1 required positional argument: 'c'
...
```
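
If you would like to see that error without stopping the notebook, one possible sketch is to wrap the bad call in `try`/`except`. This assumes Python 3, and the variable name `message` is just for illustration.

```python
def product(a, b, c):
    return a * b * c

# Calling with too few arguments raises a TypeError before the body runs.
try:
    product(1, 2)
except TypeError as error:
    message = str(error)
    print(f"Call failed: {message}")

# With the right number of arguments the call works as expected.
print(product(1, 2, 3))
```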


## Return Values

Instead of printing the result, we can also have the function return a result to the caller so that the caller can print it or use it in a calculation.

For example, our `reverse_complement()` function returns the reverse complement of the sequence. We can store that in a new variable and do things with it.

```
sequence = "ATGCATCGCATCGATCAGCTACGACTCGACTCGAT"
rc_sequence = reverse_complement(sequence)
# do something with rc_sequence
```


* Functions return a value to the caller via the `return` statement.
* The `return` statement causes two things to happen...
  * the function ends and control is returned to the caller
  * the returned value is passed back to the caller
* You can have as many return statements as you like (including zero, in which case the function returns `None`).
* If the caller wants to do something with a returned value, it needs to save it or use it in an expression...
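
The points above can be sketched with two made-up functions, `classify_codon` (several `return` statements) and `report` (no `return` at all). The names and the example codons are assumptions chosen for illustration.

```python
# Hypothetical example: a function with several return statements.
def classify_codon(codon):
    """Return 'start', 'stop' or 'other' for a three-base codon."""
    if codon == "ATG":
        return "start"  # the function ends here for a start codon
    if codon in ("TAA", "TAG", "TGA"):
        return "stop"
    return "other"

# Hypothetical example: a function with no return statement.
def report(codon):
    """Print a message about a codon."""
    print(f"Codon {codon} is a {classify_codon(codon)} codon")

# Save the returned value in a variable...
label = classify_codon("ATG")

# ...or use it directly in an expression.
is_stop = classify_codon("TAA") == "stop"

# A function without a return statement gives back None.
nothing = report("GGC")

print(label, is_stop, nothing)
```

If you call `classify_codon("ATG")` without saving or using the result, the value is simply thrown away.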


[Return to the lesson listing](#lessons)

[Return to the top of the notebook](#top)