## Data types (continued) and control structures 
### BIOINF 575 - Fall 2020

An Introduction to Programming for Bioscientists: A Python-Based Primer   
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004867

https://datacarpentry.org/semester-biology/exercises/#Python  
https://www.pythonforbiologists.org

### Sequence types: <b>String</b>, <b>List</b>, and <b>Tuple</b> - are iterable

________________

### String
Sequence of characters - immutable

<img src='https://media.geeksforgeeks.org/wp-content/cdn-uploads/20200204160843/strings.jpg' width="500"/>

https://www.geeksforgeeks.org/python-strings/



In [None]:
dir(str)

In [None]:
help(str.find)

In [None]:
help(str.index)
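
The difference between `find()` and `index()` shows up when the substring is absent. A minimal sketch, assuming a made-up sequence `seq` that is not one of the exercise sequences:

```python
# Hypothetical example sequence (an assumption, chosen only for illustration)
seq = "ATGCGTACGTTAG"

# find() returns the lowest index where the substring starts, or -1 if absent
pos_found = seq.find("ACG")
pos_missing = seq.find("GGG")

# index() returns the same position but raises ValueError if absent
try:
    seq.index("GGG")
    index_raised = False
except ValueError:
    index_raised = True

print(pos_found, pos_missing, index_raised)  # 6 -1 True
```

Testing the result of `find()` against -1, or catching `ValueError` from `index()`, are both reasonable ways to handle a missing site.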

#### <font color = "red">Exercise</font>

A restriction enzyme is an enzyme that cleaves DNA into fragments at or near specific recognition sites within molecules known as restriction sites. EcoRI is a restriction endonuclease enzyme isolated from species <i>E. coli</i>.  
EcoRI recognition site with cutting pattern indicated by a green line.

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/EcoRI_restriction_enzyme_recognition_site.svg/350px-EcoRI_restriction_enzyme_recognition_site.svg.png" width = "100" /> 

https://en.wikipedia.org/wiki/EcoRI  

- Find the position of the string "GAATTCT" in each of the following two DNA sequences: 
"AACGTCAAGGTTCCTA"  
"ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"
- Split the sequence in which you found it into the two sequences that result from the cut


In [None]:
help(str.replace)

In [None]:
help(str.translate)

In [None]:
help(str.maketrans)
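
As a rough illustration of how these three methods fit together, the sketch below uses a made-up sequence `dna`. The variable names and the lowercase-masking mapping are assumptions chosen for the example:

```python
dna = "ATGTTCGA"  # hypothetical example sequence

# replace() substitutes every occurrence of one substring with another
rna = dna.replace("T", "U")

# maketrans() builds a character-to-character table and translate()
# applies it in a single pass, so several substitutions happen at once
to_lower_at = str.maketrans("AT", "at")
masked = dna.translate(to_lower_at)

print(rna)     # AUGUUCGA
print(masked)  # atGttCGa
```

Chaining several `replace()` calls can overwrite earlier substitutions, which is one reason to prefer a translation table when swapping characters.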

#### <font color = "red">Exercise</font>
- Compute the GC content (percentage of C and G bases) in the following DNA sequence:  
"AACGTCAAGGTTCCTA"  
- Compute the complement of the sequence (A<=>T,C<=>G)
- Reverse the sequence complement (compute the reverse strand on the DNA)


#### <font color = "red">Exercise</font>
- Create a string that contains every third nucleotide from following sequence starting with the first nucleotide and stopping at the 11th nucleotide:  
"AACGTCAAGGTTCCTA"  



<b>String join()</b> - makes a string out of the elements of an iterable of strings, using the string it is called on as the separator/delimiter

In [None]:
" ".join(["Making","a","sentence", "given the words."])

<b>String split()</b> - always creates a list - splits a string by a given separator/delimiter (default: any whitespace)
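
`split()` and `join()` are often used as a pair. A small sketch, assuming a made-up comma-separated string of gene symbols:

```python
genes_csv = "TP53,TNF,EGFR,IL6"  # hypothetical input string

# split() with an explicit delimiter returns a list of the pieces
genes = genes_csv.split(",")

# join() is called on the separator and takes the list as its argument
joined = " | ".join(genes)

# With no argument, split() splits on runs of whitespace and drops empty pieces
words = "  two   spaced  words ".split()

# join() requires string elements, so other types are converted first
values = [90, 70, 70, 50]
values_line = "\t".join(str(v) for v in values)

print(genes)   # ['TP53', 'TNF', 'EGFR', 'IL6']
print(joined)  # TP53 | TNF | EGFR | IL6
print(words)   # ['two', 'spaced', 'words']
```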

In [None]:
"Split a sentence into words".split()

_______________
### Range - an immutable sequence of numbers

It is commonly used for looping a specific number of times.  
Its values are computed on demand rather than stored in memory (lazy evaluation); range is a sequence type, not a generator, so it also supports len() and indexing

range(stop)<br>
range(start, stop[, step])
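
The two call forms can be explored by converting a range to a list. A minimal sketch with arbitrarily chosen numbers:

```python
r = range(2, 20, 5)          # start=2, stop=20 (excluded), step=5

as_list = list(r)            # materialize the values
print(as_list)               # [2, 7, 12, 17]
print(len(r))                # 4
print(r[1])                  # 7
print(12 in r)               # True

# A single argument is the stop value; start defaults to 0 and step to 1
first_four = list(range(4))
print(first_four)            # [0, 1, 2, 3]

# A negative step counts down
countdown = list(range(10, 0, -3))
print(countdown)             # [10, 7, 4, 1]
```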

#### <font color = "red">Exercise</font>
- Use the function range() to compute the indices (in the original string) of the nucleotides retrieved in the previous exercise.



__________
### List - a collection of elements, allows duplicates, is ordered, and is mutable (changeable)

A list may be constructed in several ways:<br>

Using a pair of square brackets to denote the empty list: []   
Using square brackets, separating items with commas: [a], [a, b, c]   
Using a list comprehension: [x for x in iterable]  
Using the type constructor: list() or list(iterable)  
A list is a dynamic array of references, a contiguous allocation of references in memory  
https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython   
https://www.tutorialspoint.com/difference-between-std-vector-and-std-array-in-cplusplus

<img src="http://henry.precheur.org/python/list2.png" width=200 />
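
The construction forms listed above, and the fact that a list stores references, can be sketched as follows. All variable names are assumptions made for illustration:

```python
empty = []                              # empty list
bases = ["A", "C", "G", "T"]            # items separated by commas
squares = [x * x for x in range(5)]     # list comprehension
letters = list("ACGT")                  # type constructor with an iterable

# Two names can refer to the same list object
alias = bases
alias.append("N")
print(bases)            # ['A', 'C', 'G', 'T', 'N']

# list(iterable) builds a new list, so changes to it leave the original alone
bases_copy = list(bases)
bases_copy.append("U")
print(bases)            # ['A', 'C', 'G', 'T', 'N']
print(bases_copy)       # ['A', 'C', 'G', 'T', 'N', 'U']
```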

In [None]:
# list with different types of elements



#### <font color = "red">Exercise</font> 

Assign an empty list to a variable amino_acids.  
Add the following elements to the list: "arginine", "alanine"   
Add the elements of the following list to the amino_acids list all at once:
- \["asparagine", "aspartic acid", "cysteine", "glutamine", "glutamic acid"] 

Sort the amino_acids list.

<img src = "https://pixy.org/src/37/thumbs350/372053.jpg" width = "400"/>

https://pixy.org/372053/


In [None]:
"""insert an element at a certain position
insert(position, element)"""

help(list.insert)


In [None]:
#Change list elements through subsetting and assignment:  list_name[:] = 



In [None]:
#Check if an element is in our list - the in operator



In [None]:
#Retrieve the position of an element in a list - index function

help(list.index)

In [None]:
# List concatenation - Adding lists (+) 



In [None]:
#Count the occurrences of an element in a list - count function

help(list.count)

Remove list elements

<b>remove()</b> removes the first matching value, not a specific index.

In [None]:
help(list.remove)

<b>del</b> command removes the item at a specific index.

In [None]:
help("keywords")

In [None]:
help("del")

<b>pop()</b> removes the item at a specific index and returns it.

In [None]:
help(list.pop)

<b>clear()</b> removes all the elements of a list

In [None]:
help(list.clear)
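
The four ways of removing elements can be compared side by side. A minimal sketch, assuming a made-up list `organisms` with one duplicate entry:

```python
organisms = ["human", "mouse", "worm", "mouse", "yeast"]  # hypothetical list

organisms.remove("mouse")      # removes only the first "mouse"
after_remove = list(organisms)

del organisms[0]               # removes the item at index 0
after_del = list(organisms)

last = organisms.pop()         # removes and returns the last item
first = organisms.pop(0)       # removes and returns the item at index 0
after_pop = list(organisms)

organisms.clear()              # removes everything

print(after_remove)  # ['human', 'worm', 'mouse', 'yeast']
print(after_del)     # ['worm', 'mouse', 'yeast']
print(last, first)   # yeast worm
print(after_pop)     # ['mouse']
print(organisms)     # []
```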

#### <font color = "red">Exercise</font> 

Create a list with the following sublists: 
 - a sublist with 4 organisms: human, mouse, worm, yeast 
 - a sublist with 4 genes (symbols): TP53, TNF, EGFR, IL6 
 - a sublist with 4 values: 90, 70, 70, 50

Check if the organism "human" is in the organism list.<br>
Find the index of value 70 in the values list.<br>
Count how many times the value 70 occurs in the values sublist.<br><br>


____________

### Tuple - a collection of elements, allows duplicates, is ordered, and is <u>unchangeable</u>
Tuples are generally slightly faster to create and use less memory than lists.

Tuples may be constructed in a number of ways:<br>

Using a pair of parentheses to denote the empty tuple: ()<br>
Using a trailing comma for a singleton tuple: a, or (a,)<br>
Separating items with commas: a, b, c or (a, b, c)<br>
Using the tuple() built-in: tuple() or tuple(iterable)<br>

In [None]:
dir(tuple)

In [None]:
# create tuple 

(2,)

In [None]:
# create tuple
gene = ("EGFR", "Epidermal growth factor receptor", 10, 200.5)


In [None]:
# tuple subsetting - same as string and list


##### <b>Unpacking a tuple - assigning each element of a tuple to a variable</b>



In [None]:
gene_symbol, gene_description, exon_number, expression = gene

In [None]:
print(gene_symbol)
print(gene_description)
print(exon_number)
print(expression)

In [None]:
len(gene)

Tuple unpacking makes it easy to swap values.

In [None]:
a, b = 3, 4
print(a)
print(b)

In [None]:
a, b = b, a
print(a)
print(b)

#### <font color = "red">Exercise</font>
Check if value 10 is in gene.<br>
Find the index of value 10 in gene.<br>
Count how many times 10 occurs in gene.<br><br>


<b>Concatenating immutable sequences always results in a new object</b>

This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. <br>
To get a linear runtime cost, you must switch to one of the alternatives below:
* if concatenating str objects, you can build a list and use str.join() at the end or else write to an io.StringIO instance and retrieve its value when complete
* if concatenating tuple objects, extend a list instead

Some sequence types (such as range) only support item sequences that follow specific patterns, and hence don’t support sequence concatenation or repetition.
The index() function raises ValueError when the element is not found in the sequence.

https://docs.python.org/3/library/stdtypes.html
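
The alternatives to repeated concatenation can be sketched as below. The list `fragments` is a made-up input, and the sequences here are far too short for the runtime difference to matter; the point is only the pattern:

```python
import io

fragments = ["ACG", "TTA", "GGC", "ATT"]  # hypothetical sequence fragments

# Repeated concatenation: each + creates a new string object
seq_concat = ""
for frag in fragments:
    seq_concat = seq_concat + frag

# Alternative 1: collect the pieces and join once at the end
seq_join = "".join(fragments)

# Alternative 2: write to an in-memory text buffer and read it back
buffer = io.StringIO()
for frag in fragments:
    buffer.write(frag)
seq_buffer = buffer.getvalue()

# For tuples: extend a list and convert to a tuple at the end
parts = []
for frag in fragments:
    parts.extend((frag, len(frag)))
as_tuple = tuple(parts)

print(seq_concat == seq_join == seq_buffer)  # True
```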

__________________

### Dictionaries - a collection of key:value pairs, keeps insertion order (Python 3.7+), has no duplicate keys, is changeable, and is indexed by key.
### The mapping type

Dictionaries can be created by:
- placing a comma-separated list of key: value pairs within braces
    - {key1: value1, key2: value2, ..., key_n: value_n}
- by the dict constructor

dict(**kwarg)  
dict(mapping, **kwarg)<br>
dict(iterable, **kwarg)<br>
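
The constructor forms above can be illustrated with a small sketch. The variable names are assumptions, and the `zip` variant is an additional common idiom rather than a separate signature:

```python
# dict(**kwarg): keys are given as keyword arguments
from_kwargs = dict(Ser="Serine", Lys="Lysine")

# dict(mapping, **kwarg): start from a mapping and add keyword arguments
from_mapping = dict({"Ser": "Serine"}, Lys="Lysine")

# dict(iterable, **kwarg): the iterable yields (key, value) pairs
from_pairs = dict([("Ser", "Serine"), ("Lys", "Lysine")])

# zip() pairs up two sequences, producing such an iterable of pairs
from_zip = dict(zip(["Ser", "Lys"], ["Serine", "Lysine"]))

print(from_kwargs == from_mapping == from_pairs == from_zip)  # True
```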

Since Python 3.7, dictionaries preserve insertion order, but elements are accessed by key, not by position

In [None]:
dir(dict)

In [None]:
dict()

____
<img src = "https://pixy.org/src/37/thumbs350/372053.jpg" width = "400"/>

https://pixy.org/372053/

In [None]:
amino_acids_map = {"Ser": "Serine", "Lys": "Lysine"}


In [None]:
amino_acids_test = dict(Ser = "Serine", Lys= "Lysine")

In [None]:
amino_acids_map == amino_acids_test

In [None]:
# Check if key in dictionary
"Ser" in amino_acids_test

In [None]:
'''
Length of dictionary
'''
print(len(amino_acids_map))

Retrieving a dictionary value - subset by key or use the get function

In [None]:
# raises KeyError because "Leu" is not a key in the dictionary yet
amino_acids_map["Leu"]

In [None]:
# get() returns None instead of raising an error when the key is missing
amino_acids_map.get("Leu")

To add a new dictionary element - subset the dictionary using the new key and assign the value or use the update function

In [None]:
amino_acids_map["Ala"] = "Alanine"

In [None]:
amino_acids_map["Leu"] = "Leu cinne"

In [None]:
# update if key exists, add a new element if it does not
amino_acids_map.update({"Leu": "Leucine"})

In [None]:
amino_acids_map.update({"Val": "Valine", "Gln": "Glutamine"})

In [None]:
amino_acids_map.values()

In [None]:
amino_acids_map.keys()

In [None]:
amino_acids_map.items()

Elements are removed with del (subsetting by key), pop(), popitem() or clear().<br>
del and pop() remove the element with the given key; pop() also returns its value.<br>
popitem() removes and returns the last inserted (key, value) pair; clear() removes all elements.<br>

In [None]:
amino_acids_map.popitem()
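
The removal options behave slightly differently when a key is missing. A minimal sketch, assuming a made-up dictionary `codon_counts` that is unrelated to the exercise:

```python
codon_counts = {"ATG": 3, "TAA": 1, "GGC": 5, "TTT": 2}  # hypothetical data

del codon_counts["ATG"]                   # raises KeyError if the key is missing

removed = codon_counts.pop("TAA")         # removes the key and returns its value
missing = codon_counts.pop("CCC", None)   # a default avoids KeyError

last_pair = codon_counts.popitem()        # removes the last inserted pair
remaining = dict(codon_counts)            # snapshot before clearing

codon_counts.clear()                      # removes everything

print(removed, missing)   # 1 None
print(last_pair)          # ('TTT', 2)
print(remaining)          # {'GGC': 5}
print(codon_counts)       # {}
```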

#### <font color = "red">Exercise</font> 
Remove the element with key "Lys" from the dictionary<br><br>

_______________
### Sets - a collection of elements, is unordered, has no duplicates, can be changeable or unchangeable.

A set object is an unordered collection of distinct objects.<br>
A set is mutable, unless it is a frozenset.<br>
To create a set, place comma-separated elements within braces or use the set([iterable]) constructor; empty braces {} create a dictionary, so use set() for an empty set.<br>
Elements cannot be changed/updated, but can be added and removed.<br>
The update() function can be used to add multiple elements.<br>

In [None]:
dir(set)

In [None]:
set()
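
A short sketch of set creation and of the difference between `set` and `frozenset`. The variable names are assumptions made for illustration:

```python
empty_braces = {}             # this is an empty dictionary, not a set
empty_set = set()             # the way to create an empty set

bases = set("GATTACA")        # duplicates are dropped

frozen = frozenset(bases)     # unchangeable version of the same set
try:
    frozen.add("N")           # frozenset has no add() method
    add_failed = False
except AttributeError:
    add_failed = True

print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set
print(sorted(bases))          # ['A', 'C', 'G', 'T']
print(add_failed)             # True
```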

In [None]:
model_organisms = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}


In [None]:
"human" in model_organisms


In [None]:
model_organisms.add("yeast")


In [None]:
model_organisms.update(["zebrafish","frog"])


In [None]:
# remove element from set using the remove function
# raises exception if element does not exist

model_organisms.remove("rat")
model_organisms

In [None]:
# remove element from set using the discard function
# does not raise exception if element does not exist

model_organisms.discard("rat")
model_organisms

In [None]:
model_organisms.discard("frog")
model_organisms

In [None]:
study_transcriptomics = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}
study_proteomics = {"rat", "zebrafish", "frog", "yeast", "worm"}

In [None]:
study_overall = study_transcriptomics.union(study_proteomics)

https://docs.python.org/3/library/stdtypes.html#set

#### <font color = "red">Exercise</font> 

Given the previous 2 sets compute the intersection and difference.<br>
Use the dir function to find out the names of the functions to use for these operations.  
Update study_proteomics with the elements of study_transcriptomics.<br>
Check if study_proteomics is now equal to study_overall. <br>
Explain the result.<br>

____________

### Control Structures

**Control Structures** are constructs that **give a block of code context as well as help repeat or selectively run it**.

Control Structures control how the associated code works by "wrapping" it within its structure.

Types of control structures:
* Sequential: default mode - execution of statements line by line - like reading a book or following a recipe
* Selection: decisions, branching - distinguishes between two or more mutually exclusive situations in which different sets of statements should be executed <br>
Selection structures:
    * if &emsp;&emsp;&emsp;&emsp;&emsp;&emsp; - considers additional code on only one branch
    * if-else &emsp; &emsp;&emsp;  - considers two branches
    * if-elif-else &emsp;&nbsp;    - considers three or more branches
* Repetition: looping - allows the repetition of a set of statements multiple times <br>
Loop structures:
    * while &emsp;  &emsp;   &emsp; - number of repetitions **unknown**
    * for &emsp;&emsp;&emsp;&emsp;&emsp; - number of repetitions **known**

```python
if [not] <condition>:
    <statements>
```

Selection and repetition statements typically rely on conditions that are evaluated as either true or false.
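
Conditions are ordinary expressions that can be evaluated and printed on their own. A small sketch, assuming a made-up string `seq`:

```python
seq = "ACGT"  # hypothetical example sequence

is_len_four = len(seq) == 4                     # comparison
has_cg = "CG" in seq                            # membership test
both = has_cg and len(seq) > 10                 # True only if both are true
either = has_cg or len(seq) > 10                # True if at least one is true
not_starts_a = not seq.startswith("A")          # negation

# Empty strings, empty containers and zero count as false in a condition
truthiness = (bool(""), bool("A"), bool([]), bool(0))

print(is_len_four, has_cg, both, either, not_starts_a)  # True True False True False
print(truthiness)                                       # (False, True, False, False)
```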

### if: selection/decision control structure

An if statement executes its block of statements only when its condition evaluates to true. <br>
For example:

In [1]:
if 5 > 4:
    print('5 is greater than 4!')

5 is greater than 4!


Add more functionality by using the else keyword

```python
if [not] <condition>:
    <statements>
else:
    <statements>
```

In [2]:
cond_res = "CG" in "ACGT" and len("AACTGGATC") == 7
if cond_res:
    print('This is True')
else:
    print('This is False')

This is False


What if we need more branches? <br>
E.g.: Checking for a value range and doing different things depending on the range

```python
if [not] <condition>:
    <statements>
elif <condition>:
    <statements>
else:
    <statements>
```

In [3]:
# Comprehensive if-elif-else statement
int_var = 20
if int_var < 5:
    print("Less than 5")
elif int_var < 10:
    print("Between 5 and 10")
else:
    print("Equal to or greater than 10")

Equal to or greater than 10


#### <font color = "red">Exercise</font>
Write an if statement to check who is the youngest of three people with ages:
age1, age2, age3


In [None]:
age1 = 20
age2 = 50
age3 = 34

### for: the repetitive control structure with a known number of steps

To loop through a sequence of elements is to iterate

```python
for var in sequence:
    statements
```

In [None]:
# Print elements of a range
for i in range(3):
    print(i)

In [None]:
# nested loop
for i in range(3):
    for j in range(4):
        print(i,j)

In [None]:
list_var = ["elem1","elem2","elem3"]
for list_elem in list_var:
    print(list_elem)

In [None]:
for i in range(len(list_var)):
    print(list_var[i])

In [None]:
# use enumerate to create (index, value) pairs
for i, list_elem in enumerate(list_var):
    print(i, list_elem)
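
A for loop works on any iterable, not only lists and ranges. The sketch below iterates over a string and over the items of a dictionary; the sequence and variable names are assumptions for illustration:

```python
seq = "AACGT"  # hypothetical example sequence

# Iterating over a string yields one character at a time
counts = {}
for base in seq:
    counts[base] = counts.get(base, 0) + 1

# items() yields (key, value) pairs that can be unpacked in the loop header
for base, n in counts.items():
    print(base, n)

# enumerate() accepts a start value, useful for 1-based positions
positions = []
for position, base in enumerate(seq, start=1):
    positions.append((position, base))

print(counts)  # {'A': 2, 'C': 1, 'G': 1, 'T': 1}
```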

#### <font color = "red">Exercise</font> 
Codons are sequences of three nucleotides<br>
Nucleotides are: 'A', 'C', 'G', 'T'<br>
Write code that outputs all possible codons<br>
Then add a condition to display only the ones that contain the substring 'AC'

<b>Think of a repetitive case when you could not use a for loop readily</b>

### while: the repetitive control structure with an unknown number of steps

The loop repeats as long as its condition is true and stops when the condition becomes false

<b>while loops make it easy to cause an 'infinite loop'</b>

```python
while condition:
    statements
```

In [None]:
# while counting steps
step = 1
while step < 10:
    print('Step ' + str(step))
    step = step + 1

In [None]:
# while testing input value
cond = True
while cond:
    value = int(input()) 
    if value > 10:
        print(cond, value, 'Greater than 10')
    else:
        print(cond, value, 'Lower than or equal to 10')
    cond = (value <= 10)

```python
# Infinite loop
cond = True
while cond:
    print('Infinite loop')
```

#### <font color = "red">Exercise</font>
Assign a variable sequence_length the value 100 <br>
Subtract powers of 2 (2, 4, 8, 16,...) until the value drops below 0<br>
Print the power of 2 used in the last subtraction.

### More control statements

Control statements are keywords that alter the normal flow of execution.
The three covered here are:

```python
pass
break
continue
```

<b>pass: do nothing</b>

In [4]:
value = 20
if value == 20:
    pass

<b>break: stops the closest enclosing loop</b>

In [None]:
# while break - it already has a condition that stops it but ...
cond = True
i = 0
while cond:
    if i == 3:
        break
    else:
        print(i)
        i = i + 1

In [None]:
# for break - kind of makes it a while
for i in range(10):
    if i==3:
        break
    else:
        print(i)
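
A typical use of `break` is to stop searching once something has been found. A minimal sketch, assuming a made-up sequence read in frame from its first nucleotide:

```python
seq = "ATGGCCTGATTTAAA"                 # hypothetical sequence
stop_codons = {"TAA", "TAG", "TGA"}

stop_position = None
# Step through the sequence three nucleotides at a time
for start in range(0, len(seq) - 2, 3):
    codon = seq[start:start + 3]
    if codon in stop_codons:
        stop_position = start
        break                            # no need to look at later codons

print(stop_position)  # 6
```

Without the `break`, the loop would keep going and `stop_position` would end up holding the last stop codon instead of the first.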

<b>continue: skip to the next iteration</b>

Forces the loop to move on to the next item, skipping the remaining statements in the loop body for the current item 

In [None]:
# Continue example
for i in range(10):
    if i > 3 and i < 7:
        continue
    else:
        print(i)
    # reached only in iterations where continue was not executed
    print('This statement gets executed')
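
`continue` is handy for filtering out items that should not be processed. A small sketch, assuming a made-up sequence in which 'N' marks an unknown nucleotide:

```python
seq = "ACNGTNNA"   # hypothetical sequence with unknown bases marked as N

cleaned = []
for base in seq:
    if base == "N":
        continue               # skip the append below for this base
    cleaned.append(base)

result = "".join(cleaned)
print(result)  # ACGTA
```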