<a name="top"></a>
# Introduction to Python Programming for Bioinformatics

## About this notebook

This notebook was originally written by [Marc Cohen](https://github.com/mco-gh), an engineer at Google. The original source can be found on [Marc's short link service](https://mco.fyi/) and starts with [Python lesson 0](https://mco.fyi/py0). I encourage you to work through that notebook if you find some details missing here.

Rob Edwards edited the notebook, adapted it for bioinformatics, using some simple geneticy examples, condensed it into a single notebook, and rearranged some of the lessons, so if some of it does not make sense, it is Rob's fault!

It is intended as a hands-on companion to an in-person course, and if you would like Rob to teach this course (or one of the other courses) don't hesitate to get in touch with him.

## Using this notebook

You can download the original version of this notebook from [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_01.ipynb) or from [Rob's Google Drive](https://drive.google.com/file/d/11rcjrwznjZS-qoFoEcCQYF85PCRtN9dr/view?usp=sharing).

**You should make your own copy of this notebook by selecting File->Save a copy in Drive from the menu bar above, and then you can edit the code and run it as your own**

There are several lessons, and you can do them in any order. I've tried to organise them in the order I think most appropriate, but you may disagree!


<a name="lessons"></a>

# Lesson Links

* [Lesson 1 - Variables and Types](#Lesson-1---Variables-and-Types)
  * [Variables](#Variables)
  * [Naming variables](#Naming-variables)
  * [Types of data](#Types-of-data)
  * [Numeric Types](#Numeric-Types)
  * [String Types](#String-Types)
  * [Using Variables in Python](#Using-Variables-in-Python)
  * [Built-in Python functions](#Built-in-Python-functions)

---

Previous Lesson: Local | GitHub | Google Colab

Next Lesson: [Local](Python_Lesson_02.ipynb) | [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_02.ipynb) | [Google Colab](https://colab.research.google.com/drive/1Sm7N8Agf0aFj6qbd6GwenGBaqchZYTug)


# Lesson 1 - Variables and Types

Things you'll learn in this lesson:
- The basic data types you can work with in Python
- How to create and assign values to variables in Python
- How to call a function

## Variables

Any Python interpreter can be used as a calculator. To run this cell, either click on the triangle, or put your cursor in the cell and press Shift and Enter at the same time.


In [None]:
print(3 + 5 * 4)


This is great but not very interesting.
To do anything useful with data, we need to assign its value to a _variable_.
In Python, we can assign a value to a variable using the equals sign `=`.

If a variable doesn't already exist, when you assign to it, Python creates it on the fly. If you assign to a variable that already exists, Python replaces its current value with a new value.

Examples

    instructor = "Rob"         # string value
    instructor = "Stevie" # same name, diff string value
    instructor = 42             # same name, integer value
    todays_high_temp = 18.2     # diff name, floating point value

We can track the length of a bacterial genome by assigning its length in basepairs to a variable. For example, if the length is 4,500,000 bp, we
could assign that to a variable called `genome_size`:

In [None]:
genome_size = 4500000
print(genome_size)

From now on, whenever we use `genome_size`, Python will substitute the value we assigned to it. In simpler terms, **a variable is a reference to a value**.

In Python, variable names:

 - can include letters, digits, and underscores
 - cannot start with a digit
 - are case-sensitive

This means that, for example:
 - `genome0` is a valid variable name, whereas `0genome` is not
 - `genome` and `Genome` are different variables, and this will sometimes trip you up. It is usual practice to use lower case letters for variable names and, if you want to use two words, like `genome size`, to join them with an underscore (i.e. `genome_size`). Sometimes people capitalise each word instead, like `GenomeSize`. That still works, but by convention that style is kept for class names rather than variables.
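One way to try out the naming rules above without triggering an error is the string method `str.isidentifier()`, which reports whether a piece of text would be accepted as a name. This is only a small sketch, and the names tested are simply the examples from the list:

```python
# str.isidentifier() returns True if the text follows Python's naming rules
print("genome0".isidentifier())      # True: letters followed by a digit
print("0genome".isidentifier())      # False: starts with a digit
print("genome_size".isidentifier())  # True: underscores are allowed
print("genome size".isidentifier())  # False: spaces are not allowed

# Names are case-sensitive, so these are two separate variables
genome = "lower case"
Genome = "capitalised"
print(genome, Genome)
```

Note that `isidentifier()` only checks the spelling rules. It does not tell you whether a word is reserved by Python.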

## Naming variables

It is a good idea (and good practice!) to name variables something meaningful. Remember that when you come back to your code in six months or a year, it is going to look like gibberish, so try to reduce the amount of gibberish as much as possible!

For example, while you are writing the code it might be obvious to use `r` to mean `RNA sequence coverage averaged by gene length`. However, in six months you'll be wondering if this was really averaged, or just the raw counts. Using a variable like `rnaseq_averaged` is more meaningful. You should avoid short names (one or two letters), but you might also want to avoid `rna_sequence_coverage_averaged_by_gene_length`: even though it is valid, it will be a pain to type every time (and make your code look ugly!)


### Reserved Words

The following words have special meaning in Python. We call them keywords or reserved words and you may not use these names for your program variables.

> ```and, as, assert, async, await, break, class, continue, def, del, elif, else, except, False, finally, for, from, global, if, import, in, is, lambda, nonlocal, None, not, or, pass, raise, return, True, try, while, with, yield```
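If you are unsure whether a word is reserved, the standard library's `keyword` module can tell you. This is a small sketch; the words tested are arbitrary examples, and the exact list printed depends on the Python version you are running:

```python
import keyword

# iskeyword() returns True for reserved words
print(keyword.iskeyword("lambda"))       # True
print(keyword.iskeyword("genome_size"))  # False

# kwlist holds every reserved word for the Python version you are running
print(keyword.kwlist)
```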

<details>
<summary>
Python also has a lot of built-in functions. We'll use some of these as we go through, but here is a table of all of them. Although you don't need to worry about them right now, you should avoid using the names of these functions as the name of a variable. Click the triangle at the start of this line to see the complete table.
</summary>


Function | Description
--- | ---
abs() | Returns the absolute value of a number
all() | Returns True if all items in an iterable object are true
any() | Returns True if any item in an iterable object is true
ascii() | Returns a readable version of an object, replacing non-ASCII characters with escape sequences
bin() | Returns the binary version of a number
bool() | Returns the boolean value of the specified object
bytearray() | Returns an array of bytes
bytes() | Returns a bytes object
callable() | Returns True if the specified object is callable, otherwise False
chr() | Returns a character from the specified Unicode code.
classmethod() | Converts a method into a class method
compile() | Returns the specified source as an object, ready to be executed
complex() | Returns a complex number
delattr() | Deletes the specified attribute (property or method) from the specified object
dict() | Returns a dictionary
dir() | Returns a list of the specified object's properties and methods
divmod() | Returns the quotient and the remainder when argument1 is divided by argument2
enumerate() | Takes a collection (e.g. a tuple) and returns it as an enumerate object
eval() | Evaluates and executes an expression
exec() | Executes the specified code (or object)
filter() | Use a filter function to exclude items in an iterable object
float() | Returns a floating point number
format() | Formats a specified value
frozenset() | Returns a frozenset object
getattr() | Returns the value of the specified attribute (property or method)
globals() | Returns the current global symbol table as a dictionary
hasattr() | Returns True if the specified object has the specified attribute (property/method)
hash() | Returns the hash value of a specified object
help() | Executes the built-in help system
hex() | Converts a number into a hexadecimal value
id() | Returns the id of an object
input() | Reads a line of input from the user
int() | Returns an integer number
isinstance() | Returns True if a specified object is an instance of a specified class
issubclass() | Returns True if a specified class is a subclass of another specified class
iter() | Returns an iterator object
len() | Returns the length of an object
list() | Returns a list
locals() | Returns an updated dictionary of the current local symbol table
map() | Returns an iterator that applies the specified function to each item of an iterable
max() | Returns the largest item in an iterable
memoryview() | Returns a memory view object
min() | Returns the smallest item in an iterable
next() | Returns the next item from an iterator
object() | Returns a new object
oct() | Converts a number into an octal
open() | Opens a file and returns a file object
ord() | Returns the integer Unicode code point of the specified character
pow() | Returns the value of x to the power of y
print() | Prints to the standard output device
property() | Gets, sets, deletes a property
range() | Returns a sequence of numbers, starting from 0 and increments by 1 (by default)
repr() | Returns a readable version of an object
reversed() | Returns a reversed iterator
round() | Rounds a number
set() | Returns a new set object
setattr() | Sets an attribute (property/method) of an object
slice() | Returns a slice object
sorted() | Returns a sorted list
staticmethod() | Converts a method into a static method
str() | Returns a string object
sum() | Sums the items of an iterable
super() | Returns an object that represents the parent class
tuple() | Returns a tuple
type() | Returns the type of an object
vars() | Returns the `__dict__` attribute of an object
zip() | Returns an iterator that combines items from two or more iterables

</details>
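To check whether a name you are about to use is already taken by a built-in function, you can ask the standard `builtins` module. This is a minimal sketch; `len` and `genome_size` are just example names:

```python
import builtins

# hasattr() checks whether the builtins module defines a given name
len_is_builtin = hasattr(builtins, "len")
genome_size_is_builtin = hasattr(builtins, "genome_size")

print("len:", len_is_builtin)                  # True, so avoid it as a variable name
print("genome_size:", genome_size_is_builtin)  # False, so it is safe to use
```

If you do assign a value to a built-in name such as `len`, Python will not complain, but later calls to `len()` in the same notebook will fail because the name now refers to your value.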


## Types of data
Python knows about several types of data. Three common ones are:

* integer numbers
* floating point numbers
* character strings

In the example above, variable `genome_size` was assigned an integer value of `4500000`. If we want to store a fraction, like the %GC of the genome,
we can use a floating point value by executing:

In [None]:
percent_gc = 0.45
print(percent_gc)

To create a string, we add single or double quotes around some text.
To identify and track a bacterium throughout our study, we can assign it a unique identifier by storing it in a string:

In [None]:
bacteria_id = "001"
print(bacteria_id)

## Numeric Types

Python supports two main types of numbers:
* int, arbitrary size signed integers, like these:
  * `2011`
  * `-999999999999`
* float, double-precision (64-bit) floating point numbers, which are accurate to about 15 to 17 significant digits, like these:
  * `3.1415926539`
  * `3.8 * 10**6`

For the most part, you don't need to worry about which type of number to use - Python will take care of that for you. The decimal point tells Python which to use.

Mixing floats and ints results in a float so, for example, `2011 * 3.14` results in a floating point number.
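The difference between the two numeric types shows up most clearly at the extremes. The following sketch uses arbitrary example values to show that integers stay exact however large they get, that floats carry limited precision, and that mixing the two gives a float:

```python
# Integers are exact, however large they get
big_int = 2 ** 100
print(big_int)        # 1267650600228229401496703205376
print(type(big_int))  # <class 'int'>

# Floats have limited precision, so tiny rounding errors can appear
float_sum = 0.1 + 0.2
print(float_sum)      # 0.30000000000000004

# Mixing an int and a float gives a float
mixed = 2011 * 3.14
print(type(mixed))    # <class 'float'>
```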

Try entering these expressions in the following cell:

```
print(5 - 6)  
print(8 * 9)
print(6 / 2)
print(5.0 / 2)
print(5 % 2)  
print(2 * 10 + 3)  
print(2 * (10 + 3))  
print(2 ** 4)
```
Were there any outputs you didn't expect?

In [None]:
# talk about the print statements with your neighbour, and then copy and paste them here!
# press shift-enter to execute the code after you have pasted it.

## String Types

Strings behave much like a [list](#Lesson-3-Lists) of characters, and we can do a few interesting things with them. Note that we will talk more about Lists later, so some of this will become clearer when we cover that material.

A string is a collection of individual characters, and you can access each of them separately.

In [None]:
sequence = "ACGT"
print(sequence[0])
print(sequence[1])
print(sequence[2])
print(sequence[3])

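We can also access a `slice` of a string by giving a start and a stop position separated by a colon. The slice begins at the start position and ends just before the stop position, so `sequence[0:2]` gives the characters at positions 0 and 1: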

In [None]:
print(sequence[0:2])
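Slicing is handy for pulling codons out of a sequence. The sketch below uses a made-up nine-base sequence called `dna` (not a variable from earlier in the notebook) to show a few common patterns:

```python
dna = "ATGGCCTAA"  # hypothetical 9-base sequence

print(len(dna))    # 9: the number of characters
print(dna[0:3])    # ATG: positions 0, 1 and 2 (position 3 is not included)
print(dna[3:6])    # GCC
print(dna[6:])     # TAA: leaving out the stop runs to the end
print(dna[-1])     # A: negative positions count back from the end
```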

## Using Variables in Python

Once we have data stored with variable names, we can make use of those variables in our calculations. We call these combinations of variables and values  **expressions**. When evaluating an expression, Python internally replaces the variable names with the values to which they refer.

We may want to store our genome size in kilobase pairs (kb) as well as in base pairs.



In [None]:
genome_kb = genome_size / 1000
print(genome_kb)
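Expressions can combine several variables at once. As a rough sketch, the cell below estimates how many bases in a genome are G or C; the `example_` names are hypothetical and are used so that the variables defined earlier are left untouched:

```python
example_genome_size = 4500000  # base pairs
example_percent_gc = 0.45      # fraction of bases that are G or C

# Python substitutes the stored values before doing the arithmetic
gc_bases = example_genome_size * example_percent_gc
at_bases = example_genome_size - gc_bases

# round() turns the float results into whole numbers of bases
print("G+C bases:", round(gc_bases))
print("A+T bases:", round(at_bases))
```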

We might also decide to add a bacterial genus and species to our bacterial id:

In [None]:
bacteria_id = "E coli: " + bacteria_id
print(bacteria_id)
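The `+` operator only joins strings to other strings. If you want to attach a number to some text, one option is to convert it with `str()` first. A minimal sketch, with made-up variable names:

```python
sample_name = "sample_"  # hypothetical prefix
sample_number = 7

# sample_name + sample_number would raise a TypeError (str + int),
# so convert the number to a string first
sample_label = sample_name + str(sample_number)
print(sample_label)  # sample_7
```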

## Built-in Python functions

To carry out common tasks with data and variables in Python,
the language provides us with several built-in functions.
To display information to the screen, we use the `print` function:

In [None]:
print(genome_size)
print(bacteria_id)

When we want to make use of a function, what computer scientists refer to as **calling the function**, we follow its name by parentheses. The parentheses are important: if you leave them off, the function doesn't actually run!

Sometimes you will include values or variables inside the parentheses for the function to use. In the case of `print`, we use the parentheses to tell the function which value we want to display. We will learn more about how functions work and how to create our own in later lessons.

We can display multiple things at once using only one `print` function call:

In [None]:
print(bacteria_id, " genome size in kb: ", genome_kb)
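By default `print` puts a single space between the things you give it. The sketch below, which uses made-up values, shows two optional ways to control the spacing: the `sep` argument of `print`, and an f-string that builds the whole message in one go:

```python
organism = "E. coli"  # hypothetical values for illustration
size_kb = 4500.0

# Default: print puts one space between the items
print(organism, "genome size in kb:", size_kb)

# sep changes what goes between the items (here, a tab)
print(organism, size_kb, sep="\t")

# An f-string builds the whole message as a single string
message = f"{organism} genome size in kb: {size_kb}"
print(message)  # E. coli genome size in kb: 4500.0
```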

We can also call a function inside of another function call. For example, Python has a built-in function called `type` that tells you a value's data type:

In [None]:
print(type(60.3))
print(type(bacteria_id))
print(type(genome_size))
print(type(genome_kb))

We can also do arithmetic with variables right inside the `print` function:

In [None]:
print("genome size in Mbp:", genome_size / 1000000)

Note that the above function call did not change the value of `genome_size`:

In [None]:
print(genome_size)

To change the value of the `genome_size` variable, we have to
**assign** a new value to `genome_size` using the equals `=` sign:

In [None]:
genome_size = 3100000000
print("genome size is now:", genome_size)

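What values do the variables `rrna` and `protein` have after each of the following cells?

Guess before executing the cells below. Remember that a variable must be assigned before it can be used: the first cell prints `protein` before any value has been assigned to it, so Python stops with a `NameError`.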

In [None]:
rrna = 400
print("There are ", rrna, " rRNAs encoded in the human genome")
print("There are ", protein, " proteins encoded in the human genome")

In [None]:
protein = 19126
print("There are ", rrna, " rRNAs encoded in the human genome")
print("There are ", protein, " proteins encoded in the human genome")

In [None]:
rrna = rrna * 2.0
print("There are ", rrna, " rRNAs encoded in the human genome")
print("There are ", protein, " proteins encoded in the human genome")

In [None]:
protein = protein - 126.0
print("There are ", rrna, " rRNAs encoded in the human genome")
print("There are ", protein, " proteins encoded in the human genome")

## Multiple definitions at once!

Python allows you to assign multiple values to multiple variables in one line by separating the variables and values with commas. What does the following program print out?

In [None]:
a, b = "E. coli", "Salmonella"
print(a, b)

In [None]:
first, second = "crAssphage", "phiX174"
third, fourth = second, first
print(third, fourth)
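The same trick can swap two variables in a single line. In this sketch the names `host` and `phage` are made up for illustration:

```python
host, phage = "phiX174", "E. coli"  # deliberately the wrong way round

# The right-hand side is evaluated first, so no temporary variable is needed
host, phage = phage, host
print(host, phage)  # E. coli phiX174
```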

[Return to the lesson listing](#lessons)

[Return to the top of the notebook](#top)