# Introductory Python: ad-hoc usage

This is an introductory workshop for Python programming. The aim is to familiarize users with
the basics of Python for general bioinformatics usage (format wrangling). The workshop assumes users already know basic
programming concepts such as loops, conditionals and functions. Be aware that this is an ad-hoc workshop in the sense
that it doesn't go into the details of Python as an object-oriented programming language. There are plenty
of resources available online for in-depth learning of classes, methods and other related topics, and I will
not attempt to cover them here.

As an extra, there will be an introduction to
**conda** and **Jupyter**. Both tools are widely used for
reproducible and manageable bioinformatics projects.

## 0. General syntax and methods

### 0.0 Objects


Let's begin with the most common types of objects:

In [21]:
myInteger = 1
myString = "bananaz"
myFloat = 2.71828
myList = [1,2,"cat",4,5,6,7,8]
myTuple = (1, "pineapple")
myDictionary = { "apple" : 2,
                 "pear" : 3,
                 "grapes" : 1}

Let's see the content of the variables.

In [8]:
myInteger

1

In [9]:
myString

'bananaz'

In [10]:
myFloat

2.71828

In [11]:
myList

[1, 2, 'cat', 4, 5, 6, 7, 8]

In [12]:
myTuple

(1, 'pineapple')

In [13]:
myDictionary

{'apple': 2, 'pear': 3, 'grapes': 1}

a. Try to define your own set of similar objects (follow the syntax!)


b. Write, for example, `type(myInteger)`. Do the same for the other objects.

The objects' names are pretty self explanatory, but let's review them:


* **myInteger** is an integer object.
* **myString** is a string object.
* **myFloat** is a double-precision floating point number (best approximation for a non integer real).
* **myList** is a list.  Think of it as a vector.  It can contain any kind of object inside it (even other lists).
* **myTuple** is a tuple.  These are similar to lists, the key point is that tuples cannot be modified.
* **myDictionary** is a dictionary. Dictionaries are the Python implementation of a hash table. We'll go into them soon.


Integers and floats behave as you would expect them to. We will go into detail on how the other types work.
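As a small illustration of the difference between lists and tuples mentioned above, the sketch below changes an item of a list and then attempts the same on a tuple. The names `fruits_list` and `fruits_tuple` are made up for this example, and the `try`/`except` construct (not covered in this workshop) is only used so that the error doesn't stop the code.

```python
fruits_list = ["apple", "pear"]
fruits_tuple = ("apple", "pear")

fruits_list[0] = "grapes"          # lists can be modified in place

try:
    fruits_tuple[0] = "grapes"     # tuples cannot: this raises a TypeError
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

print(fruits_list)
print(fruits_tuple)
print(tuple_was_modified)
```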

#### 0.0.0 Strings


Strings are sequences of characters. They are defined by enclosing a sequence of characters in
quotes.

Strings can be sliced and indexed to return substrings. Keep in mind that Python
uses 0-based indexing, which means that the first item of a string is at position 0. Let's see how
this works. Run:


In [14]:
myString

'bananaz'

In [15]:
myString[0] # get the first letter

'b'

In [17]:
myString[3] # get the fourth letter

'a'

In [18]:
# A useful trick to access the last letter
myString[-1]

'z'

In [20]:
# What about slicing a string?
myString

'bananaz'

In [21]:
myString[0:3]

'ban'

* Try to reason how python handles the slicing with respect to the coordinates that you give to the indexing operator.
* Slice your string into different substrings. Did you get the output that you expected?
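Once you have tried the exercises, the following sketch may help to check your reasoning. It summarizes the most common slicing forms on a string equal to `myString`; the variable names are made up for this example.

```python
word = "bananaz"

first_three = word[0:3]    # positions 0, 1 and 2 (position 3 is excluded)
same_thing = word[:3]      # an omitted start defaults to 0
the_rest = word[3:]        # an omitted end runs to the end of the string
last_two = word[-2:]       # negative positions count from the end
every_other = word[::2]    # a third number sets the step

print(first_three, same_thing, the_rest, last_two, every_other)
```

A handy consequence of excluding the end position is that `word[:3] + word[3:]` always gives back the whole string.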

#### 0.0.1 Lists


Lists can contain several types of objects, including lists themselves. They behave similarly to strings with respect to indexing and slicing:

In [24]:
newList = ["a","b", "c", myList , "monkey"]

Let's see its contents.

In [25]:
newList

['a', 'b', 'c', [1, 2, 'cat', 4, 5, 6, 7, 8], 'monkey']

In [30]:
newList[0]

'a'

In [32]:
newList[3:5]

[[1, 2, 'cat', 4, 5, 6, 7, 8], 'monkey']

* Can you guess a way to access some element of the nested list?

#### 0.0.2 Dictionaries

Dictionaries are a useful way to handle data in Python. They are the Python implementation of a hash table, which maps a **unique** key to data of any kind. We call each pair of related information a **key-value** pair, where the unique key is associated with a value of any kind.

Let's see how it works by examples. One way to initialize python dictionaries is:

In [33]:
myDictionary = { "apple" : 2,
                 "pear" : 3,
                 "grapes" : 1}

In [35]:
myDictionary

{'apple': 2, 'pear': 3, 'grapes': 1}

* This assigns the integer value 2 to the key "apple", and so on. Notice that the keys are unique. What happens if we initialize the dictionary with non-unique keys?

We can easily access each value by providing the corresponding key:

In [39]:
myDictionary["grapes"]

1

In [40]:
myDictionary["pear"]

3

The previous kind of initialization is useful when we have a low number of key-value pairs, but what if we wanted, for example, to map thousands of SNPs to their respective chromosomal positions? Another way to initialize a dictionary and update it would be:

In [43]:
genome_dict = {} # Create an empty dictionary

In [44]:
genome_dict["BovineHD4100000577"] =  98367573
genome_dict["BovineHD4100000819"] = 144587013

In [45]:
genome_dict

{'BovineHD4100000577': 98367573, 'BovineHD4100000819': 144587013}

* This behaviour works for any kind of initialized dictionary. Add more fruits to myDictionary.
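With thousands of SNPs, the assignments would normally happen inside a loop (loops are covered in the next section). Here is a minimal sketch, assuming the data is already available as a list of (SNP id, position) pairs; the names `snp_records` and `snp_positions` are hypothetical.

```python
# Hypothetical (SNP id, position) pairs; in practice these would come from a file.
snp_records = [
    ("BovineHD4100000577", 98367573),
    ("BovineHD4100000819", 144587013),
]

snp_positions = {}                          # start from an empty dictionary
for snp_id, position in snp_records:
    snp_positions[snp_id] = position        # add one key-value pair per record

print(len(snp_positions))
print(snp_positions["BovineHD4100000819"])
```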

### 0.2 Loops and control flow

Let's see how a simple Python for loop works. Let's loop through the letters of `myString` and the items of `myList` and print them.

In [49]:
for i in myString:
    print(i)

b
a
n
a
n
a
z


In [50]:
for i in myList:
    print(i)

1
2
cat
4
5
6
7
8


Python for loops are implicit in their index handling of strings and lists. One could read these loops as "for every item in my object..."

What if we just wanted to print the integers from 0 to 10? We would use the `range()` function. Note that the end value is not included, so we have to write `range(0,11)`.

In [66]:
for i in range(0,11):
    print(i)

0
1
2
3
4
5
6
7
8
9
10


If we want a behaviour similar to R, looping over positions to print the items in `myList` (avoid if possible):

In [64]:
for i in range( 0,len(myList) ):    # len() returns the length of an object
    print(myList[i] )

1
2
cat
4
5
6
7
8
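When both the position and the item are needed, the built-in `enumerate()` function is usually preferred over `range(len(...))`. A minimal sketch, with `seen` being a made-up helper list that just records what the loop visits:

```python
myList = [1, 2, "cat", 4, 5, 6, 7, 8]

seen = []
for index, item in enumerate(myList):   # yields (position, item) pairs
    print(index, item)
    seen.append((index, item))
```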


Let's see the behaviour of for loops on dictionaries:

In [87]:
for i in myDictionary:
    print(i)

apple
pear
grapes


In [70]:
for i in myDictionary:
    print( myDictionary[i] )

2
3
1
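To get keys and values at the same time, dictionaries offer the `items()` method. The sketch below redefines `myDictionary` as it was initialized earlier; `pairs` is a made-up list used only to record the result.

```python
myDictionary = {"apple": 2, "pear": 3, "grapes": 1}

pairs = []
for fruit, amount in myDictionary.items():   # each item is a (key, value) pair
    print(fruit, amount)
    pairs.append((fruit, amount))
```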


Let's see the syntax of conditional statements. We'll print all **even** numbers from 1 to 10.

In [75]:
for i in range(1,11):
    if i % 2 == 0 :               # % is the modulo operator
        print(i)

2
4
6
8
10


Let's slightly modify our code to report on odd numbers:

In [76]:
for i in range(1,11):
    if i % 2 == 0 :               
        print(i)
    else:
        print("ODD!")

ODD!
2
ODD!
4
ODD!
6
ODD!
8
ODD!
10


Let's further modify our code to introduce elif statements. We'll print numbers from 1 to 20, but if the number is a multiple of 3 or 7, we'll print `PUM!`

In [79]:
for i in range(1,21):
    if i % 3 == 0:
        print("PUM!")
    elif i % 7 == 0:
        print("PUM!")
    else:
        print(i)

1
2
PUM!
4
5
PUM!
PUM!
8
PUM!
10
11
PUM!
13
PUM!
PUM!
16
17
PUM!
19
20


We could reduce the above code using logical operators, as the action followed for multiples of 3 and 7 is the same:

In [81]:
for i in range(1,21):
    if i % 3 == 0 or i % 7 == 0:
        print("PUM!")
    else:
        print(i)

1
2
PUM!
4
5
PUM!
PUM!
8
PUM!
10
11
PUM!
13
PUM!
PUM!
16
17
PUM!
19
20


* Using the `myDictionary` dictionary, print the name of keys whose values are greater than 2.
* Using `myList`, print all entries which are of type `str` (string).


**Important note**: Python is *very* strict with indentation. Try writing a poorly indented for loop to see what happens.


### 0.3 Methods

We have already seen some Python functions such as `print()`, `len()` and `range()`. Methods are similar to functions, but differ in a few ways (not all listed):

1. They belong to an object and are called on it using dot notation, as in `object.method()`.
2. They may not return any value; some modify the object in place instead.
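The two points above can be sketched with objects we already know. The variable names are made up for this example.

```python
word = "bananaz"
numbers = [3, 1, 2]

length = len(word)         # function: the object goes inside the parentheses
shouted = word.upper()     # method: called on the object with dot notation

result = numbers.sort()    # list.sort() sorts the list in place and returns None

print(length, shouted)
print(numbers, result)
```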


We'll learn a few of the most used methods.

#### 0.3.0 String methods

String methods are for manipulating strings. They all return new values and never modify the original string.

##### The `format()` method is for creating strings using predefined variables:

In [9]:
today = "Monday"
"Today is {0}".format(  today  )

'Today is Monday'

In [5]:
tomorrow = "Tuesday"
"Today is {0}, tomorrow is {1}".format( today, tomorrow )

'Today is Monday, tomorrow is Tuesday'

An alternative way to do this would be using the string `+` operator, which concatenates strings:

In [8]:
"Today is " + today + ", tomorrow is " + tomorrow

'Today is Monday, tomorrow is Tuesday'

The preferred usage depends on the situation, but using the `format()` method improves code readability.
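A third option, assuming Python 3.6 or newer, is the formatted string literal (f-string), where the variable names go directly inside the braces. This is not part of the original examples; it is shown only as a sketch.

```python
today = "Monday"
tomorrow = "Tuesday"

# The f prefix tells Python to replace {name} with the value of the variable.
message = f"Today is {today}, tomorrow is {tomorrow}"
print(message)
```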

* Try using both methods to create a string from `myList`. (Hint: use the `str()` function to convert an integer into a string)

##### The `replace()` method returns a string where the specified value has been replaced with another specified value.

`string.replace("old", "new", count)`

In [13]:
s = "I'd like to pet my dog right now"
s

"I'd like to pet my dog right now"

In [11]:
s.replace( "dog", "cat" )

"I'd like to pet my cat right now"

* Try using the "count" option with this new string:

In [17]:
s = "I'd like to pet my dog right now. My dog is amazing"

* You can use the `replace()` method to delete (instead of replace) parts of a string. Figure out how to use it that way.

##### The `strip()` method removes any leading and trailing whitespace from a string.

This is particularly useful when dealing with files, where each line usually ends with a newline character. Let's see a simple example:

In [34]:
s = "fire coming out of a monkey's head\n\n\n\n\n"
r = "water it!"
s

"fire coming out of a monkey's head\n\n\n\n\n"

In [33]:
print(s)
print(r)

fire coming out of a monkey's head





water it!


In [35]:
print( s.strip() )
print(r)

fire coming out of a monkey's head
water it!
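`strip()` has two relatives, `lstrip()` and `rstrip()`, which only act on one end of the string. All three accept an optional argument with the characters to remove. A small sketch on a made-up tab-delimited line:

```python
line = "  chr11\t1000 \n"

both = line.strip()               # whitespace removed from both ends
left = line.lstrip()              # only from the beginning
right = line.rstrip()             # only from the end
newline_only = line.rstrip("\n")  # only trailing newline characters

# repr() shows the invisible characters explicitly
print(repr(both))
print(repr(left))
print(repr(right))
print(repr(newline_only))
```

Note that whitespace in the middle of the string (here, the tab) is never touched.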


##### The `split()` method splits the string at a specified character, and returns a list.


This method is really useful for handling character-delimited tables. The syntax would be:

`string.split(delimiter, maxsplit)` where `maxsplit` is the maximum number of splits (the default is -1, i.e. all occurrences)



In [39]:
grocery = "banana, apple, cheese, milk, fishing rod"

grocery.split(",")

['banana', ' apple', ' cheese', ' milk', ' fishing rod']
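Notice the leading spaces in the items above. One possible way to clean them up, sketched here, is to combine `split()` with `strip()`. The example also shows the effect of limiting the number of splits. The names `items` and `first_and_rest` are made up.

```python
grocery = "banana, apple, cheese, milk, fishing rod"

items = []
for item in grocery.split(","):
    items.append(item.strip())            # remove the space left after each comma

first_and_rest = grocery.split(", ", 1)   # split only once, at the first ", "

print(items)
print(first_and_rest)
```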

* What is the default value for delimiter?
* Say you read a plink map file whose lines look like this:

In [40]:
bed_line = "10 ARS-BFGL-BAC-10960 0 20776707" # chromosome, snp id, centimorgan, position

* Read the line into a dictionary.

##### The startswith() method returns a boolean regarding the first character(s) of the input string
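A minimal sketch of `startswith()` on two made-up lines, a comment line and a tab-delimited entry:

```python
header = "# this is a comment"
entry = "chr11\ttoy\tgene"

header_is_comment = header.startswith("#")
entry_is_comment = entry.startswith("#")
entry_is_chr = entry.startswith("chr")
# A tuple of prefixes is also accepted: True if any of them matches.
entry_is_11_or_x = entry.startswith(("chr11", "chrX"))

print(header_is_comment, entry_is_comment, entry_is_chr, entry_is_11_or_x)
```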

* There are plenty more string methods. Search the web for other string methods and put one of them to use.

#### 0.3.1 List methods

append, extend, insert, pop, remove, reverse, sort
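The list methods named above can be sketched in a single run. All of them modify the list in place; the list `animals` is made up for this example, and the comments show the state of the list after each step.

```python
animals = ["cat", "dog"]

animals.append("monkey")          # ['cat', 'dog', 'monkey']
animals.extend(["frog", "cow"])   # ['cat', 'dog', 'monkey', 'frog', 'cow']
animals.insert(0, "ant")          # ['ant', 'cat', 'dog', 'monkey', 'frog', 'cow']
last = animals.pop()              # removes and returns the last item, 'cow'
animals.remove("dog")             # ['ant', 'cat', 'monkey', 'frog']
animals.reverse()                 # ['frog', 'monkey', 'cat', 'ant']
animals.sort()                    # ['ant', 'cat', 'frog', 'monkey']

print(last)
print(animals)
```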

#### 0.3.2 Dictionary methods


get, items, keys, pop, values
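Likewise, here is a short sketch of the dictionary methods named above, using the fruit dictionary as it was initialized earlier. The variable names on the left are made up.

```python
myDictionary = {"apple": 2, "pear": 3, "grapes": 1}

pears = myDictionary.get("pear")        # like myDictionary["pear"]
kiwis = myDictionary.get("kiwi", 0)     # missing key: returns the default instead of an error

keys = list(myDictionary.keys())        # all keys
values = list(myDictionary.values())    # all values
items = list(myDictionary.items())      # all (key, value) pairs

removed = myDictionary.pop("grapes")    # removes the key and returns its value

print(pears, kiwis, removed)
print(myDictionary)
```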

### 0.3.3 Important extras


For the sake of time, we've left out some very important concepts in python such as **classes**, **anonymous functions** and **exceptions**. These concepts become very important once the complexity of the data we're handling grows, but for the moment we can do without them.

## 1. File I/O




readlines, next, for line in file

### 1.0 Reading files


The preferred syntax for reading files in python would be:

In [50]:
with open( "file.txt", 'r' ) as file:
    #some set of actions

SyntaxError: unexpected EOF while parsing (<ipython-input-50-bcdee7bf3103>, line 2)

The previous code is just a template, so it will return an error if you attempt to run it. 

Let's dissect the code:


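* The `with` statement opens the file and makes sure it is closed automatically when the indented block ends, even if an error occurs inside it.
* The `open()` function takes the filename and the mode as input. In this case, `'r'` means **r**ead.
* The **`as`** `file` part assigns the opened file to a variable, in this case called `file`.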
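To see what `with` saves us from, the sketch below reads the same line twice: first closing the file by hand, then using `with`. It writes a small throwaway file in a temporary directory so that it can run anywhere; the path and the name `handle` are made up for this example.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as handle:
    handle.write("product\tquantity\n")

# Manual version: we are responsible for closing the file ourselves.
handle = open(path, "r")
try:
    first_line = handle.readline()
finally:
    handle.close()

# Equivalent version using with: the file is closed when the block ends.
with open(path, "r") as handle:
    same_line = handle.readline()

print(handle.closed)
print(first_line == same_line)
```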


If we want to print each line of the file:

In [51]:
with open( "file.txt", 'r' ) as file:
    for i in file:
        print(i)

product	quantity	est_price

apple	3	2

banana	1	0.5

boat	1	250000000



* The `next()` function lets you skip one line of the file at a time. The syntax would be `next(file)`. Skip the header implementing `next()`.

### 1.1 Writing into files

The syntax for writing into files is similar to reading files. We use the `write()` method to write into files.


Syntax:

`file.write(string)`

We want to write a famous haiku into a file called `haiku.txt`:

*old pond*


*frog leaps in*


*water's sound*


In [55]:
haiku = "old pond\nfrog leaps in\nwater's sound"

with open("haiku.txt", 'w' ) as file:
    
    file.write(haiku)

Success! But what if the string was given to us as a list?

In [58]:
haiku = "old pond\nfrog leaps in\nwater's sound".split("\n")
haiku

['old pond', 'frog leaps in', "water's sound"]

In [59]:
with open("haiku.txt", 'w' ) as file:
    
    file.write(haiku)

TypeError: write() argument must be str, not list

The `write()` method only takes strings as input, so we have to modify our code.

In [69]:
with open("haiku.txt", 'w' ) as file:
    
    for i in haiku:
        
        file.write(i + "\n")
        

* Check the resulting file. Each item should be on its own line. This is because we add a newline character (`"\n"`) after each item: the `write()` method keeps writing on the same line unless you make it write a newline character. What would the file look like if we wrote `file.write(i)` instead?
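Another way to turn a list into a single string before writing, sketched here under the assumption that every item is already a string, is the `join()` string method. The example writes to a temporary directory instead of the working directory; `path` and `content` are made-up names.

```python
import os
import tempfile

haiku = ["old pond", "frog leaps in", "water's sound"]
path = os.path.join(tempfile.mkdtemp(), "haiku.txt")

with open(path, "w") as file:
    # "\n".join(...) puts a newline between the items; we add a final one at the end.
    file.write("\n".join(haiku) + "\n")

with open(path, "r") as file:
    content = file.read()       # read the whole file back as one string

print(content)
```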

The `'w'` option tells the `open()` function to **overwrite** an existing file, so be careful! If we want to keep writing into the same file, we use the `'a'` option (as in **a**ppend). Let's append the original Japanese version to the `haiku.txt` file.

In [70]:
haiku_original = "furu ike ya \nkawazu tobikomu \nmizu no oto".split("\n")

In [71]:
with open("haiku.txt", 'a' ) as file:
    
    file.write("-----------\n")
    file.write("The original japanese: \n")
    for i in haiku_original:
        file.write( i + "\n" )
        

### 1.2 Combining reading and writing


Let's use a more realistic example to review what we know so far. We have a genomic gtf file, and we would like to generate a gtf file whose entries only belong to chromosome 11.

We need to write a script:

**input:** genomic gtf file


**output:** chromosome 11 gtf file



In [72]:
inFile = "bos_taurus.gtf"
outFile = "bos_taurus_ch11.gtf"

In [None]:
with open( inFile, 'r' ) as genome:                          # open genome gtf (read mode)
    with open( outFile, 'w' ) as chromosome_11:              # open chr11 gtf (write mode)
        for line in genome:                                  # for each line in genome file
            if line.startswith("#"):                         # if line begins with comment character #
                pass                                         # do nothing (skip line)
            else:                                            # if it is an entry line
                g_list = line.split()                        # read contents into a list
                if g_list[0] == "chr11":                     # if the first element (chromosome) is chr11
                    chromosome_11.write( line )              # write it to the new file

There are many ways to read and write files in Python. The choice really depends on your file formats, and on whether you want extra steps between reading and writing. The approach shown here reads one line at a time, so it should work in a memory-efficient manner in most cases.
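As a sketch of how the same logic could be reused for any chromosome, the code below wraps it in a function and tries it on a tiny made-up file. The function name `filter_chromosome`, the file names and the toy lines are all hypothetical; the lines are heavily shortened and are not valid GTF entries.

```python
import os
import tempfile

def filter_chromosome(in_path, out_path, chromosome):
    """Copy to out_path the entries of in_path whose first column equals chromosome."""
    kept = 0
    with open(in_path, "r") as source, open(out_path, "w") as target:
        for line in source:
            if line.startswith("#"):        # skip comment lines
                continue
            fields = line.split("\t")       # columns are tab-delimited
            if fields[0] == chromosome:
                target.write(line)
                kept += 1
    return kept

# Made-up, heavily shortened lines, for illustration only.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "toy.gtf")
out_path = os.path.join(workdir, "toy_chr11.gtf")
with open(in_path, "w") as handle:
    handle.write("#!toy header\n")
    handle.write("chr11\ttoy\tgene\t100\t200\n")
    handle.write("chr2\ttoy\tgene\t300\t400\n")
    handle.write("chr11\ttoy\texon\t100\t150\n")

n_kept = filter_chromosome(in_path, out_path, "chr11")
print(n_kept)
```

Both files are opened in a single `with` statement here, which is equivalent to nesting two `with` blocks and saves one level of indentation.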

## 2. Useful packages 




os

sys

Biopython

numpy, math



## 3. Example scripts


blastout_to_gff.py

fasta1linea.py

modify_map.py

## 4. Extras: python related stuff



conda, bioconda


jupyter notebooks