# Introductory python: ad-hoc usage

This is an introductory workshop for python programming. The aim is to familiarize users with
the basics of python for general bioinformatics usage (format wrangling). The workshop assumes users already know basic
programming concepts such as loops, conditionals and functions. Be aware that this is an ad-hoc workshop in the sense
that it doesn't go into the details of python as an object-oriented programming language. There are plenty
of resources available online for in-depth learning of classes, methods and other related topics, and I will
not attempt to cover them here.



## 0 General syntax, types and methods

In [228]:
message = "Hello world!"

In [229]:
print( message )

Hello world!


### 0.0 Objects


Let's begin with the most common types of objects

#### 0.0.1 Integers and doubles

In [230]:
myInteger = 1

In [232]:
myInteger

1

In [233]:
type( myInteger )

int

In [234]:
myFloat = 2.71828

In [235]:
myFloat

2.71828

In [236]:
type(myFloat)

float

* `myInteger` is an integer object.

* `myFloat` is a double-precision floating point number (the closest representable approximation of a non-integer real number).

Integers and doubles can be operated on by **arithmetic operators**.

The implementation of arithmetic operators in python is as follows:

* `+` for addition
* `-` for subtraction
* `*` for multiplication
* `/` for division
* `**` for exponentiation
* `%` for modulo

In [237]:
1 + 1

2

In [238]:
2 - 3

-1

In [239]:
2 * 4

8

In [240]:
1 / 2

0.5

In [241]:
2 ** 10

1024

In [242]:
10 % 7

3

In [243]:
80 % 2

0
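In Python 3 the `/` operator always returns a float, even when both operands are integers. The sketch below shows two related tools that are not covered above, the floor-division operator `//` and the built-in `divmod()` function. The variable names are only illustrative.

```python
# True division always yields a float in Python 3
true_div = 7 / 2                      # 3.5

# Floor division discards the fractional part
floor_div = 7 // 2                    # 3

# divmod() returns the floor quotient and the remainder in one call
quotient, remainder = divmod(7, 2)    # (3, 1)

print(true_div, floor_div, quotient, remainder)
```

Floor division combined with `%` is handy when converting a 0-based position into a line and column, for example.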

If we want to perform more complex operations that require functions, we resort to the `math` module.

In [245]:
import math

In [246]:
math.log(1)

0.0

We will look into modules later.

#### 0.0.0 Strings


Strings are sequences of characters. They are defined by enclosing a sequence of characters in
quotes.

Strings can be sliced and indexed to return substrings.  Keep in mind that python
uses 0-based indexing, which means that the first item of a string will be the zeroth item.  

In [247]:
myString = "bananaz"

In [248]:
myString

'bananaz'

In [249]:
type(myString)

str

**Indexing and slicing**

In [250]:
myString[0] # get the zeroth letter

'b'

In [251]:
myString[3] # get the third letter

'a'

In [252]:
# A useful trick to access the last letter
myString[-1]

'z'

In [253]:
# What about slicing a string?
myString

'bananaz'

In [254]:
myString[0:3]

'ban'

Strings are **immutable**: they cannot be modified. See what happens if we try to assign a value to one of its indices:

In [255]:
myString[-1] = "s"

TypeError: 'str' object does not support item assignment
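Since a string cannot be changed in place, a common workaround is to build a new string from slices of the old one. A small sketch of that idea follows; the name `fixed` is ours, and the slice `[:-1]` means "everything except the last character".

```python
myString = "bananaz"

# Take everything except the last character, then append the new letter
fixed = myString[:-1] + "s"

print(fixed)       # bananas
print(myString)    # the original string is unchanged: bananaz
```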

* Try to reason how python handles the slicing with respect to the coordinates that you give to the indexing operator.
* Slice your string into different substrings. Did you get the output that you expected?

**String operators**


Strings can be concatenated or repeated with the `+` and `*` operators respectively.

In [256]:
s = "A " + "weird " + "monkey " 

In [257]:
s

'A weird monkey '

In [258]:
s*5

'A weird monkey A weird monkey A weird monkey A weird monkey A weird monkey '

#### 0.0.2 Lists


Lists can contain several types of objects, including lists themselves.  They behave similarly to strings with respect to indexing and slicing:

In [259]:
myList = [1,2,"cat",4,5,6,7,8]

In [260]:
myList

[1, 2, 'cat', 4, 5, 6, 7, 8]

In [261]:
type(myList)

list

In [262]:
newList = ["a","b", "c", myList , "monkey"]

Let's see its contents.

In [263]:
newList

['a', 'b', 'c', [1, 2, 'cat', 4, 5, 6, 7, 8], 'monkey']

**Indexing and slicing** work in the same way as for strings:

In [265]:
newList[0]

'a'

In [266]:
newList[1:3]

['b', 'c']

In this case, `newList` contains a **nested** list in the fourth entry. The syntax to access items in nested lists would be:

`list_var[top_level_index][nested_index]`

* Access the nested values of newList


Unlike strings or tuples, lists can be modified. One way of doing that is assigning new values to particular entries:

In [268]:
newList

['a', 'b', 'c', [1, 2, 'cat', 4, 5, 6, 7, 8], 'monkey']

In [269]:
newList[3] = "d"

In [270]:
newList

['a', 'b', 'c', 'd', 'monkey']

#### 0.0.3 Dictionaries

Dictionaries are a useful way to handle data in python. They are the python implementation of a hash table, a structure that maps a **unique** key to data of any kind. We call these pairs of related information **key-value** pairs, where the unique key is associated with any kind of value.

Let's see how it works by examples. One way to initialize python dictionaries is:

In [274]:
myDictionary = { "apple" : 42,
                 "pear" : 3,
                 "grapes" : 1 }

In [275]:
myDictionary

{'apple': 42, 'pear': 3, 'grapes': 1}

In [273]:
type(myDictionary)

dict

* This assigns the integer value 42 to the key "apple", and so on. Notice that the keys are unique. What happens if we initialize the dictionary with non-unique keys?

We can easily access each value by providing the corresponding key:

In [277]:
myDictionary["grapes"]

1

In [278]:
myDictionary["pear"]

3

The previous kind of initialization is useful when we have a low number of key-value pairs, but what if we wanted, for example, to map thousands of SNPs to their respective chromosomal positions? Another way to initialize a dictionary and update it would be:

In [279]:
genome_dict = dict() # Create an empty dictionary

In [280]:
genome_dict = {} # Works the same way

In [281]:
genome_dict["BovineHD4100000577"] =  98367573
genome_dict["BovineHD4100000819"] = 144587013

In [282]:
genome_dict

{'BovineHD4100000577': 98367573, 'BovineHD4100000819': 144587013}

* This behaviour works for any kind of initialized dictionary. Add more fruits to myDictionary.
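When the key-value pairs come from existing sequences, the same assignment syntax can be used inside a loop. The sketch below is only an illustration: the lists `snp_ids` and `positions` are made-up stand-ins for data that would normally be read from a file, and `snp_to_position` is a name chosen here.

```python
# Hypothetical input: SNP identifiers and their positions, in matching order
snp_ids = ["BovineHD4100000577", "BovineHD4100000819"]
positions = [98367573, 144587013]

snp_to_position = {}

# zip() pairs the i-th identifier with the i-th position
for snp, pos in zip(snp_ids, positions):
    snp_to_position[snp] = pos

print(snp_to_position["BovineHD4100000819"])   # 144587013
```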

#### 0.0.4 Tuples


Tuples are **immutable** sequences of values. They can be indexed and sliced like strings and lists, but cannot be modified.

In [283]:
myTuple = (1, "pineapple")

In [284]:
myTuple

(1, 'pineapple')

In [285]:
type(myTuple)

tuple

In [286]:
t = 1,2,3,4,"five"
t

(1, 2, 3, 4, 'five')

In [287]:
t = (1,2,3,4,"five")
t

(1, 2, 3, 4, 'five')

In [288]:
t[-1]

'five'

In [289]:
t[0]

1

In [290]:
t[1:2]

(2,)

In [291]:
t[0] = "one"

TypeError: 'tuple' object does not support item assignment

#### 0.0.5 Booleans


Booleans are the logical values `True` and `False`.


In [292]:
a = True
b = False

In [293]:
type(a)

bool

In [294]:
type(b)

bool

Booleans are the return value of many operators, functions and methods.

Let's begin with the most basic examples, numerical comparisons.

In [295]:
a = 6
b = 2
c = 6

In [296]:
a == c

True

In [297]:
a == b

False

In [298]:
a != b

True

In [299]:
a > b

True

In [300]:
a > c

False

In [301]:
a >= c

True

**Logical operators:**

In [302]:
a = 6
b = 2
c = 6

In [303]:
a == c and a > b 

True

In [304]:
a == b and a > b

False

In [307]:
a == b or a > b

True

In [308]:
not a == c 

False

In [309]:
not a == c or a == c 

True

In [310]:
not ( a == c or a == c )

False

`in` is a useful operator to check for membership:

In [311]:
myList

[1, 2, 'cat', 4, 5, 6, 7, 8]

In [312]:
"cat" in myList

True

In [313]:
2 in myList

True

In [314]:
myString

'bananaz'

In [315]:
"b" in myString

True

In [316]:
"j" in myString

False

#### 0.0.6 Functions

Functions are a piece of code written to perform a particular action. 

Let's see the basic syntax writing a function that returns the zeroth letter of an input string:

In [317]:
def zeroth( s ):
    
    return s[0]
    

In [318]:
type(zeroth)

function

In [319]:
myString

'bananaz'

In [320]:
zeroth_letter = zeroth( myString )

In [321]:
zeroth_letter

'b'

The basic syntax should follow this overall structure:

In [None]:
def function(  argument(s)  ):

        #perform an action

        return value # this is optional

The `return` statement is optional, because sometimes we want a function to only perform an action instead of returning a value. Let's see some examples.

In [322]:
def print_zeroth( s ):
    
    print( s[0] )

In [323]:
zeroth_letter = print_zeroth( myString )

b


In [324]:
zeroth_letter

In [325]:
print(zeroth_letter)

None


In [326]:
type(zeroth_letter)

NoneType

In this case, the `print_zeroth()` function only printed the first character of the input string without returning a value, which is the reason why `zeroth_letter` contains a None value.

Let's see another, more realistic example. We will create a function that takes a list and a value as input, and modifies the list by reassigning the value to its third entry (index 2).

In [327]:
def modify_list( l , v ):

    l[2] = v


In [328]:
myList

[1, 2, 'cat', 4, 5, 6, 7, 8]

In [329]:
modify_list( myList,  3 )

In [330]:
myList

[1, 2, 3, 4, 5, 6, 7, 8]

As we can see, `modify_list()` modified the input list without returning any values.

##### Some example built-in functions


Some example functions that we will use in the following sections are:

* `len()` which returns the length of an input string, list, tuple or dictionary.
* `range(n,m)` creates a sequence of numbers from `n` to `m-1`.

In [None]:
len(myString)

In [None]:
len(myList)

### 0.1 Loops and control flow

#### 0.1.0 Loops

Let's see how a simple python for loop works. Let's loop through the items of `myString` and `myList` and print them.

In [331]:
for i in myString:
    
    print(i)

b
a
n
a
n
a
z


In [332]:
for i in myList:
    
    print(i)

1
2
3
4
5
6
7
8


Python for loops handle the indexing of strings and lists implicitly, so no explicit counter is needed. One could read these loops as "for every item in my object..."

What if we wanted to just print the integers from 0 to 10? We would need to resort to the `range()` function.

In [333]:
for i in range(0,11):
    
    print(i)

0
1
2
3
4
5
6
7
8
9
10


If we want behaviour similar to an index-based loop in R for printing the items in `myList` (avoid if possible):

In [334]:
for i in range( 0,len(myList) ):    # len() returns the length of an object
    
    print( myList[i] )

1
2
3
4
5
6
7
8
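When both the index and the item are needed, the built-in `enumerate()` function is usually preferred over `range(len(...))`. A minimal sketch, using a short example list of our own:

```python
animals = ["cat", "dog", "monkey"]

# enumerate() yields (index, item) pairs, starting at 0 by default
for index, animal in enumerate(animals):
    print(index, animal)

# The pairs can also be collected into a list
pairs = list(enumerate(animals))
```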


Let's see the behaviour of for loops on dictionaries:

In [336]:
for i in myDictionary:
    
    print(i)

apple
pear
grapes


In [337]:
for i in myDictionary:
    
    print( myDictionary[i] )

42
3
1


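In older python versions the order in which dictionary items were printed could seem random, because dictionaries use a hash table to store values. Since python 3.7, dictionaries preserve insertion order, so the items are printed in the order in which they were added.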
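To loop over keys and values at the same time, dictionaries provide the `items()` method. A short sketch, with a dictionary defined here so that the example is self-contained (the names `fruit_counts` and `lines` are ours):

```python
fruit_counts = {"apple": 42, "pear": 3, "grapes": 1}

lines = []

# items() yields (key, value) tuples
for fruit, count in fruit_counts.items():
    lines.append("{0}: {1}".format(fruit, count))
    print(fruit, count)
```

This avoids the extra lookup `fruit_counts[fruit]` inside the loop body.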

#### 0.1.1 Control flow

Let's see the syntax of conditional statements. We'll print all **even** numbers from 1 to 10.

In [338]:
for i in range(1,11):
    
    if i % 2 == 0 :               # % is the modulo operator
     
        print(i)

2
4
6
8
10


Let's slightly modify our code to report on odd numbers:

In [339]:
for i in range(1,11):
    
    if i % 2 == 0 :               
    
        print(i)
    
    else:
    
        print("ODD!")

ODD!
2
ODD!
4
ODD!
6
ODD!
8
ODD!
10


Let's further modify our code to introduce elif statements. We'll print numbers from 1 to 20, but if the number is a multiple of 3 or 7, we'll print `PUM!`

In [340]:
for i in range(1,21):
    
    if i % 3 == 0:
    
        print("PUM!")
        
    elif i % 7 == 0:
        
        print("PUM!")
        
    else:
        
        print(i)

1
2
PUM!
4
5
PUM!
PUM!
8
PUM!
10
11
PUM!
13
PUM!
PUM!
16
17
PUM!
19
20


We could reduce the above code using logical operators, as the action followed for multiples of 3 and 7 is the same:

In [342]:
for i in range(1,21):
    
    if i % 3 == 0 or i % 7 == 0:
    
        print("PUM!")
    
    else:
    
        print(i)

1
2
PUM!
4
5
PUM!
PUM!
8
PUM!
10
11
PUM!
13
PUM!
PUM!
16
17
PUM!
19
20


* Using the `myDictionary` dictionary, print the name of keys whose values are greater than 2.
* Using myList, print all entries which are of type str (string)


**Important note**: Python is *very* strict with indentation. Try writing a poorly indented for loop to see what happens.
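One way to see the error without breaking your own session is to hand a badly indented loop to the built-in `compile()` function as text. This is only a sketch: it relies on `try`/`except`, which is covered in the Exceptions section, and the exact wording of the error message depends on the python version.

```python
# A loop whose body is not indented, stored as text so that this example itself runs
bad_code = "for i in range(3):\nprint(i)\n"

try:
    compile(bad_code, "<example>", "exec")   # parses the code without running it
    caught = False
except IndentationError as error:
    caught = True
    print("IndentationError:", error.msg)
```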


### 0.2 Methods

We have already seen some python functions such as `print()` , `len()` and `range()`. Methods are similar to functions but have some extra properties (not all listed):

1. They depend on their association with objects
2. They may not return any value


We'll learn a few of the most used methods.


The general syntax for methods is:

`object.method( arguments )`

Different methods take different numbers of arguments, and some take none at all.

#### 0.2.0 String methods

String methods are for manipulating strings. As we would expect, they can only generate new values without modifying their input (strings are **immutable**!).

##### The `strip()` method removes leading and trailing whitespace from a string.

This is particularly useful when dealing with files. Let's see a simple example

In [343]:
s = "fire coming out of a monkey's head\n\n\n\n\n"

In [344]:
r = "water it!"

In [345]:
s

"fire coming out of a monkey's head\n\n\n\n\n"

In [346]:
print(s)
print(r)

fire coming out of a monkey's head





water it!


In [347]:
print( s.strip() )
print(r)

fire coming out of a monkey's head
water it!
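`strip()` has two one-sided relatives, `lstrip()` and `rstrip()`, and all three accept an optional argument listing the characters to remove. A short sketch with a made-up line (`repr()` is used so that the whitespace stays visible):

```python
padded = "  chr11\t20776707 \n"

both = padded.strip()            # whitespace removed from both ends
left = padded.lstrip()           # only from the left
right = padded.rstrip()          # only from the right
newline_only = padded.rstrip("\n")   # only trailing newlines

print(repr(both))
print(repr(left))
print(repr(right))
print(repr(newline_only))
```

Using `rstrip("\n")` is useful when leading or trailing spaces are meaningful and only the line ending should go.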


##### The `startswith()` method returns a boolean indicating whether the string begins with a specified substring

Syntax:

`string.startswith(substring)`


In [348]:
fruit = "banana"

In [349]:
fruit.startswith("app")

False

In [350]:
fruit.startswith("b")

True

##### The `split()` method splits the string at a specified character, and returns a list.


This method is really useful for handling character-delimited tables. The syntax would be:

`string.split(delimiter, max)` where `max` is the maximum number of splits (the default is -1, i.e. all occurrences)



In [353]:
grocery = "banana, apple, cheese, milk, fishing rod"

grocery.split(",")

['banana', ' apple', ' cheese', ' milk', ' fishing rod']
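Notice that the items above keep the space that followed each comma. Two possible ways around that are sketched below, together with the effect of the second argument. The variable names are ours.

```python
grocery = "banana, apple, cheese, milk, fishing rod"

# Option 1: split on the full delimiter (comma followed by a space)
items = grocery.split(", ")

# Option 2: split on the comma, then strip each piece
cleaned = []
for piece in grocery.split(","):
    cleaned.append(piece.strip())

# A second argument limits the number of splits
first_two = grocery.split(", ", 2)

print(items)
print(first_two)
```

Option 2 is more forgiving when the amount of whitespace after the delimiter varies.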

* What is the default value for delimiter?
* Say you read a plink map file whose lines look like the code below. Read the line into a dictionary.

In [None]:
map_line = "10 ARS-BFGL-BAC-10960 0 20776707" # chromosome, snp id, centimorgan, position

##### The `format()` method is for creating strings using predefined variables:

In [355]:
today = "Monday"
tomorrow = "Tuesday"

In [356]:
"Today is {0}".format(  today  )

'Today is Monday'

In [359]:
"Today is {0}, tomorrow is {1}".format( today, tomorrow )

'Today is Monday, tomorrow is Tuesday'

An alternative way to do this would be using the string `+` operator, which concatenates strings:

In [360]:
"Today is " + today + ", tomorrow is " + tomorrow

'Today is Monday, tomorrow is Tuesday'

The preferred usage depends on the situation, but using the `format()` method improves code readability.
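Since python 3.6 there is a third option, formatted string literals (f-strings), which place the variables directly inside the string. A minimal sketch using the same two variables:

```python
today = "Monday"
tomorrow = "Tuesday"

# The f prefix lets expressions inside {} be evaluated in place
message = f"Today is {today}, tomorrow is {tomorrow}"

print(message)   # Today is Monday, tomorrow is Tuesday
```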

* Try using both methods to create a string from `myList`. (Hint: use the `str()` function to convert an integer into a string)

##### The `replace()` method returns a string where the specified value has been replaced with another specified value.

`string.replace("old", "new", count)`

In [361]:
s = "I'd like to pet my dog right now"
s

"I'd like to pet my dog right now"

In [362]:
s.replace( "dog", "cat" )

"I'd like to pet my cat right now"

* Try using the "count" option with this new string:

In [None]:
s = "I'd like to pet my dog right now. My dog is amazing"

* You can use the `replace()` method to delete (instead of replace) parts of a string. Figure out how to use it that way.

* There are plenty more string methods. Search the web for other string methods and put one of them to use.

#### 0.2.1 List methods

List methods can modify existing lists, or return values. Let's see the most commonly used methods.

##### The `append()` method adds an element to the end of a list

In [363]:
myList

[1, 2, 3, 4, 5, 6, 7, 8]

In [364]:
myList.append("dog")

In [365]:
myList

[1, 2, 3, 4, 5, 6, 7, 8, 'dog']

In [366]:
myList.append(10)

In [367]:
myList

[1, 2, 3, 4, 5, 6, 7, 8, 'dog', 10]

In [368]:
myList.append(    ["apple", "banana"]   )

In [369]:
myList

[1, 2, 3, 4, 5, 6, 7, 8, 'dog', 10, ['apple', 'banana']]

##### The `extend()` method adds the elements of a list to another list

In [370]:
myIntegerList = [ 12 , 13 , 14 , 15 ]

In [371]:
myList.extend( myIntegerList )

In [372]:
myList

[1, 2, 3, 4, 5, 6, 7, 8, 'dog', 10, ['apple', 'banana'], 12, 13, 14, 15]

##### The `insert()` method inserts elements into a list in a specified position

In [373]:
abc = ["a", "c", "d", "e", "f", "g"]

In [374]:
abc.insert(1,"b")

In [375]:
abc

['a', 'b', 'c', 'd', 'e', 'f', 'g']

##### The `pop()`  method removes an element from a list at a specified position

In [None]:
abc.pop(-1)

In [None]:
abc

In [None]:
abc.pop(2)

In [None]:
abc

* As you may have realized, `pop()` modifies the list given as input and returns the removed value. Write a one-liner using `insert()` and `pop()` to complete the `vowels` list using values from `abc`.

In [None]:
vowels = [ "a", "i", "o", "u" ]

In [None]:
vowels.insert( 1, abc.pop(3) )

In [None]:
vowels

##### The `remove()` method removes an element of specified value from a list


`remove()` deletes the first occurrence of a specified value from the list, modifying it without returning any values.

In [None]:
abc

In [None]:
abc.remove("f")

In [None]:
abc

What if we provide a value that is not on the list?

In [None]:
abc.remove("e")

##### The `reverse()` method reverses the order of an input list

In [378]:
abc

['g', 'f', 'e', 'd', 'c', 'b', 'a']

In [379]:
abc.reverse()

In [380]:
abc

['a', 'b', 'c', 'd', 'e', 'f', 'g']

##### The `sort()` method sorts an input list


`sort()` can take some extra options, such as the sorting order and a particular ordering function.

In [381]:
abc = ["a", "c", "e", "k" ,"b"]

In [382]:
abc.sort()

In [383]:
abc

['a', 'b', 'c', 'e', 'k']

In [384]:
abc.sort(reverse = True)

In [385]:
abc

['k', 'e', 'c', 'b', 'a']

A particular example of an ordering function would be:

In [386]:
sentence = "I am used to writing flamboyant sentences"
sentence = sentence.split()
sentence

['I', 'am', 'used', 'to', 'writing', 'flamboyant', 'sentences']

Define a function that returns the length of a string as an ordering function.

In [388]:
def string_length(s):
    
    
    return len(s)

Use that function as the `key` argument's input:

In [389]:
sentence.sort(  key = string_length  )

In [390]:
sentence

['I', 'am', 'to', 'used', 'writing', 'sentences', 'flamboyant']
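Because the built-in `len()` already takes a string and returns its length, it can be passed directly as the `key`. The built-in `sorted()` function accepts the same `key` argument but returns a new list instead of modifying the original. A sketch with a list named `words` (our name):

```python
words = ["I", "am", "used", "to", "writing", "flamboyant", "sentences"]

# sorted() returns a new list and leaves the original untouched
by_length = sorted(words, key=len)

print(by_length)
print(words)   # still in the original order
```

Words of equal length keep their original relative order, which is why `am` stays ahead of `to`.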

#### 0.2.2 Dictionary methods

Let's see some useful dictionary methods.

In [None]:
myDictionary

##### The `get()` method returns the value of a specified key

If the key is not found, `get()` will not raise an error; instead it will return `None` or a specified default value.

In [None]:
myDictionary.get('pear')

In [None]:
myDictionary.get('mango')

In [None]:
myDictionary.get('mango', 0)
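A common use of the default value is counting. The sketch below counts the bases in a short made-up sequence; `sequence` and `base_counts` are names chosen for this example.

```python
sequence = "GATTACA"
base_counts = {}

for base in sequence:
    # Use 0 as the starting count the first time a base is seen
    base_counts[base] = base_counts.get(base, 0) + 1

print(base_counts)
```

Without the default, the first lookup of each base would need a separate check for whether the key exists.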

##### The `pop()` method removes an item from the dictionary, returning its value.

This method works in a similar way to the list `pop()` method. If the key is not found, it will raise an error by default. If specified, it will return a value.

In [None]:
myDictionary.pop('apple')

In [None]:
myDictionary

In [None]:
myDictionary.pop('mango')

In [None]:
myDictionary.pop('mango', "fruit not found")

#### 0.2.3 Anonymous functions


Anonymous functions, or lambda expressions are functions that do not have a name, and are usually created on the go.

The syntax would be:

`lambda x1, x2, x3, ..., xn : (some action on x1, x2, x3, ..., xn)`

Lambda expressions can take any number of inputs (including zero).

Let's see some simple examples.

In [391]:
f = lambda x: 10*x + 1

In [392]:
f(3)

31

In [393]:
f(4)

41

In [394]:
c = lambda x, y: x**2 + y**2

In [395]:
c(1,1)

2

In one of the `sort()` method examples, we could have used a lambda expression to make our code more concise. 

Our previous code looked like:

In [396]:
sentence = "I am used to writing flamboyant sentences"
sentence = sentence.split()
sentence

['I', 'am', 'used', 'to', 'writing', 'flamboyant', 'sentences']

In [397]:
def string_length(s):
    return len(s)

In [398]:
sentence.sort(  key = string_length  )

In [399]:
sentence

['I', 'am', 'to', 'used', 'writing', 'sentences', 'flamboyant']

We want to sort the word list according to the length of the word. We can use a lambda expression as a key in the `sort()` method.

In [None]:
sentence = "I am used to writing flamboyant sentences"
sentence = sentence.split()
sentence

In [400]:
sentence.sort(  key = lambda s: len(s) ) 

In [401]:
sentence

['I', 'am', 'to', 'used', 'writing', 'sentences', 'flamboyant']
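Lambda expressions are most useful when the sort key has to be put together from several pieces of each item. The sketch below sorts hypothetical marker records by chromosome and then by position; the records and the name `markers` are invented for illustration.

```python
# Hypothetical (snp_id, chromosome, position) records
markers = [
    ("snp_c", 2, 500),
    ("snp_a", 1, 900),
    ("snp_b", 1, 150),
]

# Sort by chromosome first, then by position within each chromosome
markers.sort(key=lambda m: (m[1], m[2]))

ordered_ids = []
for marker in markers:
    ordered_ids.append(marker[0])

print(ordered_ids)   # ['snp_b', 'snp_a', 'snp_c']
```

Returning a tuple from the key function makes python compare the first element and fall back to the second only on ties.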

#### 0.2.4 Exceptions


Sometimes our code encounters an error that makes our program stop. Exceptions are a way to capture errors and perform actions accordingly.


Let's see a couple of examples:

In [402]:
myDictionary

{'apple': 42, 'pear': 3, 'grapes': 1}

In [404]:
myDictionary["mango"]

KeyError: 'mango'

When trying to access a value with a non-existent key, python raises a `KeyError`. We'll handle this error by adding a key to the dictionary with a value of 0.

In [405]:
try:
    myDictionary["mango"]
except KeyError:
    myDictionary["mango"] = 0

In [406]:
myDictionary

{'apple': 42, 'pear': 3, 'grapes': 1, 'mango': 0}

Earlier today, we used the `modify_list()` function to re-assign the third value of a list. But what if the list's length is less than 3?

In [407]:
def modify_list( l , v ):
    
    l[2] = v

In [408]:
short_list = [1,2]

In [409]:
modify_list( short_list , 3 )

IndexError: list assignment index out of range

The code raises an `IndexError`. We'll modify our function to take this possibility into account.

In [410]:
def modify_list( l , v ):
    
    try:
        
        l[2] = v
    
    except IndexError:
        
        print("ERROR: List length should be at least 3!")

In [411]:
modify_list( short_list , 3 )

ERROR: List length should be at least 3!


In [412]:
short_list

[1, 2]

* Modify the `modify_list()` function so that it adds a third item if the list length is equal to 2.

-----------------------------------

## 1 File I/O





### 1.0 Reading files


A particular syntax for reading files in python would be:

In [None]:
with open( "file.txt", 'r' ) as file:
    
    #some set of actions

The previous code is just a template, so it will return an error if you attempt to run it because it expects code after the colon. 

Let's break down the code:


* The `with` statement automatically closes the file once the indented block finishes.
* The `open()` function takes the filename and the mode as input. In this case `'r'` means **r**ead.
* `as file` assigns the opened file object to a variable, in this case called `file`.


If we want to print each line of the file:

In [414]:
with open( "data/file.txt", 'r' ) as file:
    
    
    for i in file:
    
    
        print(i)

product	quantity	est_price

apple	3	2

banana	1	0.5

boat	1	250000000



* The `next()` function lets you skip one line of the file at a time. The syntax would be `next(file)`. Skip the header implementing `next()`.
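The blank lines in the output above appear because every line read from a file keeps its trailing newline, and `print()` adds another one. The sketch below puts the pieces together on a small table: it skips the header with `next()`, strips the newline and splits on tabs. To stay self-contained it first writes a temporary stand-in for `data/file.txt`; the names `path`, `handle` and `quantities` are ours.

```python
import os
import tempfile

# Write a small tab-separated table to a temporary file (stand-in for data/file.txt)
table = "product\tquantity\test_price\napple\t3\t2\nbanana\t1\t0.5\n"
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as handle:
    handle.write(table)

quantities = {}
with open(path, "r") as handle:
    next(handle)                              # skip the header line
    for line in handle:
        fields = line.strip().split("\t")     # drop the newline, split on tabs
        quantities[fields[0]] = int(fields[1])

print(quantities)
```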

### 1.1 Writing into files

The syntax for writing into files is similar to reading files. We use the `write()` method to write into files.


Syntax:

`file.write(string)`

We want to write a famous haiku into a file called `haiku.txt`:

*old pond*


*frog leaps in*


*water's sound*


In [415]:
haiku = "old pond\nfrog leaps in\nwater's sound"

In [416]:
with open("haiku.txt", 'w' ) as file:
    
    file.write(haiku)

Success! But what if the string was given to us as a list?

In [417]:
haiku = "old pond\nfrog leaps in\nwater's sound".split("\n")

In [418]:
haiku

['old pond', 'frog leaps in', "water's sound"]

In [419]:
with open("haiku.txt", 'w' ) as file:
    
    file.write(haiku)

TypeError: write() argument must be str, not list

The `write()` method only takes strings as input, so we have to modify our code.

In [420]:
with open("haiku.txt", 'w' ) as file:
    
    for i in haiku:
        
        file.write(i)
        

* Check the resulting file. Something's wrong. It looks like each item on the list is concatenated without any space between them. This is because the `write()` method keeps writing in a line unless you make it write a newline character. Modify the code to write a correct `haiku.txt` file.

The `'w'` option tells the `open()` function to **overwrite** an existing file, so be careful! If we want to keep writing into the same file we would use the `'a'` option (as in **a**ppend). Let's append the original Japanese version to the `haiku.txt` file.

In [421]:
haiku_original = "furu ike ya \nkawazu tobikomu \nmizu no oto".split("\n")

In [422]:
haiku_original

['furu ike ya ', 'kawazu tobikomu ', 'mizu no oto']

In [423]:
with open("haiku.txt", 'a' ) as file:
    
    file.write("-----------\n")
    
    file.write("The original japanese: \n")
    
    for i in haiku_original:
    
        file.write( i + "\n" )
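An alternative to writing the items one by one is to join the list into a single string first, using the string method `join()`. The sketch below writes to a temporary location so that it does not touch your `haiku.txt`; `path`, `handle` and `content` are names chosen for this example.

```python
import os
import tempfile

haiku = ["old pond", "frog leaps in", "water's sound"]
path = os.path.join(tempfile.mkdtemp(), "haiku.txt")   # temporary location for this example

with open(path, "w") as handle:
    # join() places a newline between the items; a final newline is added by hand
    handle.write("\n".join(haiku) + "\n")

with open(path, "r") as handle:
    content = handle.read()

print(content)
```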
        

### 1.2 Combining reading and writing


Let's use a more realistic example to review what we know so far.

Say we have a genomic gtf file, and we would like to generate a gtf file whose entries only belong to chromosome 11.

We need to write a script:

**input:** genomic gtf file


**output:** chromosome 11 gtf file



In [None]:
inFile = "bos_taurus.gtf"
outFile = "bos_taurus_ch11.gtf"

In [None]:
with open( inFile, 'r' ) as genome:                                       # open genome gtf (read mode)
       
    with open(outFile, 'w' ) as chromosome_11:                            # open chr11 gtf (write mode)
        
        for line in genome:                                               # for each line in genome file
            
            if line.startswith("#"):                                      # if line begins with comment character #
            
                pass                                                      # do nothing (skip line)
            
            else:                                                         # if it is an entry line
                
                g_list = line.split()                                     # read contents into a list
                
                if g_list[0] == "chr11":                                  # if the zeroth element (chromosome) is chr11
                        
                    chromosome_11.write( line )                           # write it to the new file
                        
        

There are many ways to read and write files in python. The choice depends on your file formats and on memory efficiency.

-----------------------------

## 2 Useful modules


Python modules are imported in the following way:


`import [module]`

If we want to only load a particular function or class from a module, we can use:


`from [module] import [name]`

For example,

`from math import log`

### 2.0 `sys` module

When writing a script, we also want to make it possible to give arguments to the script. The `sys` module lets us do that.


We want to create a script that prints numbers from 1 to a specified value. Running this script in the terminal should look like:

`python3 print_integers.py <maximum>`

For example:

`python3 print_integers.py 11`


In [None]:
import sys  # First, we import the sys module

The `sys` module provides a list of all the argument values passed to the script: `sys.argv`.

The first element of the list `sys.argv[0]` contains the name of the script, and the following items contain the arguments that have been passed into the script.


Our script would look like:

In [None]:
import sys

maximum = int( sys.argv[1] )          # arguments arrive as strings, so convert to an integer


for i in range( 1, maximum  + 1 ):
    
    print(i)



### 2.1 `math` and `numpy` modules

These modules are necessary for a more mathematical use of python. 

* [`math`](https://docs.python.org/3/library/math.html) contains plenty of mathematical functions.

* [`numpy`](https://www.numpy.org/devdocs/user/quickstart.html) contains functions and classes for array computing, linear algebra and random number generation.

### 2.2 Biopython

[Biopython](https://biopython.org/) is a module that contains classes and methods for handling biological data. We will see an example usage down below.


Import it using:

`import Bio`


-----------------------------

## 3 Example scripts

In this section we will check some simple example scripts written in python for bioinformatics data wrangling. The code can be found on [GitHub](https://github.com/gaxyz/utilidad).

### 3.0 One-line fasta using Biopython

The next script uses Biopython for converting a multiline fasta into a one-line fasta.

The terminal usage would be:

`fasta1linea.py <multiline.fa> > singleLine.fa`

In [None]:
#!/usr/bin/env python3
import sys                                                                  # import sys module
from Bio import SeqIO                                                       # import SeqIO from Bio module (biopython)

inputfile = sys.argv[1]                                                     # inputfile: first argument passed to script

seqdict = {}                                                                # initialize sequence dictionary


for seq in SeqIO.parse( inputfile , "fasta" ):                              # read fasta file with biopython

    seqdict[ seq.id ] = seq.seq                                             # update: dict[sequence_id] = sequence

for item in seqdict:                                                        # for each item in the dictionary

    print(">" + item )                                                      # print the id preceded by ">"
    print( str( seqdict[item] ) )                                           # convert sequence into str object and print
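For comparison, the same conversion can be sketched with plain string methods, without Biopython. This is not the script from the repository, only an illustration under simple assumptions: headers start with `>`, and the function name `fasta_to_one_line` is ours. Unlike the Biopython version, it keeps the full header line rather than only the sequence id.

```python
def fasta_to_one_line(lines):
    """Return FASTA text with each sequence on a single line."""
    records = []                          # [header, sequence] pairs, in input order
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore empty lines
        if line.startswith(">"):
            records.append([line, ""])    # start a new record
        elif records:
            records[-1][1] += line        # extend the current sequence
    output = ""
    for header, seq in records:
        output += "{0}\n{1}\n".format(header, seq)
    return output


example = [">seq1 test", "ACGT", "TTGA", ">seq2", "GG", "CC"]
one_line = fasta_to_one_line(example)
print(one_line, end="")
```

Since an open file can be looped over line by line, the function could also be called with a file handle instead of a list.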

### 3.1 Update coordinates of a plink map file

The next script takes two map files as input. It uses the second map file's coordinates as a reference for creating a map file identical to the first map file with updated coordinates.

The terminal usage would be:

`modify_map.py map_to_update.map reference.map updated.map`

In [None]:
#!/usr/bin/env python3
# the line above is the shebang; keep it free of trailing comments so the interpreter is found

import sys                                                # import sys module

print( "\nRunning {0}...".format( sys.argv[0] ) )         # print running message

                                                          # -----------Assign input variables!----------

oldMap = sys.argv[1]                                      # old map file (old coordinates)

newMap = sys.argv[2]                                      # new map file (reference)

output = sys.argv[3]                                      # updated map file 


d = {}                                                    # initialize dictionary for storing reference data

                                                          # first , read reference map into dictionary
with open( newMap , 'r' ) as handle:                      # open file


    for line in handle:                                   # for each line in file
        
        chromosome, snpName, cM, pos = line.split()       # split the line into list (whitespace)
        
        d[snpName] = [ chromosome, cM, pos ]              # add snp id as key (should be unique), add info as list

        
        
                                                          # Read old and create updated map file!
            
        
with open( oldMap , 'r' ) as old:                             # open old file (read mode)
    
    with open( output, 'w' ) as out:                          # open updated file (write mode)

        modifiableChr = ["30", "31", "32", "33"]              # Chromosomes id to modify
        chrDict = {                                           # Create dict that maps old chr ids to new chr ids
                "30":"X",                                     # we could have done this earlier no problem
                "31":"Y",
                "32":"30",
                "33":"MT"
                }

        for line in old:                                     # for each line in old file
            
            old_chr, old_id, old_cm, old_pos = line.split()  # assign each column to corresponding var

            new_chr, new_cm, new_pos = d[old_id]             # get the reference info of a particular marker and assign
            
            new_cm = 0 # i dont need this for now            # set new_cm to 0 
            
            if new_chr in modifiableChr:                     # if chromosome number is in list of chr to be modified
                
                new_chr = chrDict[ new_chr  ]                # modify the chromosome id
                       


            out.write( 
                "{0} {1} {2} {3}\n".format(new_chr,
                                           old_id,
                                           new_cm,
                                           new_pos  )       # write a new line into the output file
            )

print( "--> Success...\n\n" )                               # report success
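The script above assumes that every marker in the old map is also present in the reference; otherwise `d[old_id]` raises a `KeyError`. One possible way to handle missing markers is sketched below with the dictionary `get()` method. The data and the names `reference`, `old_ids`, `missing` and `updated` are invented for this example, and whether to skip or keep such markers is a decision left to the user.

```python
# Reference built as in the script: snp id -> [chromosome, cM, position]
reference = {"snp1": ["1", "0", "1500"]}

old_ids = ["snp1", "snp2"]          # "snp2" is absent from the reference
missing = []
updated = []

for old_id in old_ids:
    entry = reference.get(old_id)   # None instead of a KeyError when absent
    if entry is None:
        missing.append(old_id)      # remember the marker and move on
        continue
    new_chr, new_cm, new_pos = entry
    updated.append("{0} {1} 0 {2}".format(new_chr, old_id, new_pos))

print(updated)
print(missing)
```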