## Data types (continued) and control structures 
### BIOINF 575 - Fall 2020

An Introduction to Programming for Bioscientists: A Python-Based Primer   
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004867

https://datacarpentry.org/semester-biology/exercises/#Python  
https://www.pythonforbiologists.org

### Sequence types: <b>String</b>, <b>List</b>, and <b>Tuple</b> - are iterable

________________

### String
Sequence of characters - immutable

<img src='https://media.geeksforgeeks.org/wp-content/cdn-uploads/20200204160843/strings.jpg' width="250"/>

https://www.geeksforgeeks.org/python-strings/

https://docs.python.org/2.5/lib/string-methods.html


<b>capitalize(	) - </b>
Return a copy of the string with its first character capitalized and the rest lowercased.

<b>center(	width[, fillchar]) - </b>
Return the string centered in a string of length width. Padding is done using the specified fillchar (default is a space).

<b>count(	sub[, start[, end]]) - </b>
Return the number of occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

<b>endswith(	suffix[, start[, end]]) - </b>
Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

<b>find(	sub[, start[, end]]) - </b>
Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.

<b>index(	sub[, start[, end]]) - </b>
Like find(), but raise ValueError when the substring is not found.

<b>isalnum(	) - </b>
Return True if all characters in the string are alphanumeric and there is at least one character, False otherwise.

There are many similar Boolean methods, such as isalpha(), isdigit(), islower(), isupper(), and isspace().

<b>join(	seq) - </b>
Return a string which is the concatenation of the strings in the sequence seq. The separator between elements is the string providing this method.

<b>ljust(	width[, fillchar]) - </b>
Return the string left justified in a string of length width. Padding is done using the specified fillchar (default is a space). The original string is returned if width is less than or equal to len(s).

<b>lower(	) - </b>
Return a copy of the string converted to lowercase.

<b>maketrans(	x[, y[, z]]) - </b>
Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will then be converted to ordinals.
If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y.
If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
    
<b>partition(	sep) - </b>
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.

<b>replace(	old, new[, count]) - </b>
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

<b>split(	[sep [,maxsplit]]) - </b>
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is omitted or None, runs of whitespace act as a single separator, and splitting an empty string or a string consisting of just whitespace returns an empty list.

<b>strip(	[chars]) - </b>
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
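
A minimal sketch of how chars is treated as a set of characters rather than as a prefix or suffix. The strings below are made up for illustration only.

```python
# Hypothetical FASTA-like header line, used only for illustration
header = ">>gene_EGFR\n"

# With no argument, strip() removes leading and trailing whitespace
no_newline = header.strip()

# chars is a set of characters: any leading or trailing '>' or newline is removed
clean = header.strip(">\n")

# Every leading or trailing 'A' or 'T' is removed, in any combination
trimmed = "ATTAGCGCTATA".strip("AT")

print(repr(no_newline))
print(clean)
print(trimmed)
```

Characters inside the string are never touched; stripping stops at the first character that is not in chars.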

<b>title(	) - </b>
Return a titlecased version of the string: words start with uppercase characters, all remaining cased characters are lowercase.

<b>translate(	table) - </b>
Return a copy of the string in which each character has been mapped through the given translation table. The table maps Unicode ordinals (integers) to ordinals, strings, or None; characters mapped to None are deleted.
You can use the static method str.maketrans() to create a translation table.
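
As a hedged illustration, in Python 3 a table built with str.maketrans() is passed directly to translate(). The sketch below complements a short DNA string; the sequences are hypothetical and chosen only for the example.

```python
# Hypothetical DNA fragment, used only for illustration
dna = "ATGGCA"

# Two-argument form: each character of "ACGT" maps to the character
# at the same position in "TGCA"
complement_table = str.maketrans("ACGT", "TGCA")
complement = dna.translate(complement_table)

# Reversing the complemented string with a slice gives the reverse complement
reverse_complement = complement[::-1]

# Three-argument form: characters in the third string are deleted
no_gaps = "AT-GC-C".translate(str.maketrans("", "", "-"))

print(complement)
print(reverse_complement)
print(no_gaps)
```

Because all replacements happen in a single pass, a mapping such as A to T and T to A does not undo itself, which would happen with chained replace() calls.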

<b>upper(	) - </b>
Return a copy of the string converted to uppercase.

<b>zfill(	width) - </b>
Return the numeric string left filled with zeros in a string of length width. The original string is returned if width is less than or equal to len(s).




__________
### List - a collection of elements, allows duplicates, is ordered, and is mutable (changeable)

A list is a dynamic array of references, a contiguous allocation of references in memory  
https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython   
https://www.tutorialspoint.com/difference-between-std-vector-and-std-array-in-cplusplus

<img src="http://henry.precheur.org/python/list2.png" width=150 />

Common operations for mutable sequence types:

| Operation             | Result                                                                               |
|-----------------------|--------------------------------------------------------------------------------------|
| s[i] = x              | item i of s is replaced by x                                                         |
| s[i:j] = t            | slice of s from i to j is replaced by the contents of the iterable t                 |
| del s[i:j]            | same as s[i:j] = []                                                                  |
| s[i:j:k] = t          | the elements of s[i:j:k] are replaced by those of t                                  |
| del s[i:j:k]          | removes the elements of s[i:j:k] from the list                                       |
| s.append(x)           | appends x to the end of the sequence (same as s[len(s):len(s)] = [x])                |
| s.clear()             | removes all items from s (same as del s[:])                                          |
| s.copy()              | creates a shallow copy of s (same as s[:])                                           |
| s.extend(t) or s += t | extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t) |
| s *= n                | updates s with its contents repeated n times                                         |
| s.insert(i, x)        | inserts x into s at the index given by i (same as s[i:i] = [x])                      |
| s.pop([i])            | retrieves the item at i and also removes it from s                                   |
| s.remove(x)           | removes the first item from s where s[i] is equal to x                               |
| s.reverse()           | reverses the items of s in place                                                     |

In [1]:
x = [1,2,3,4,[5,6,7,8]]

In [3]:
y = x[:]
y[4].append(10)
y

[1, 2, 3, 4, [5, 6, 7, 8, 10]]

In [4]:
x

[1, 2, 3, 4, [5, 6, 7, 8, 10]]

In [5]:
y.append(20)

In [6]:
y


[1, 2, 3, 4, [5, 6, 7, 8, 10], 20]

In [7]:
x

[1, 2, 3, 4, [5, 6, 7, 8, 10]]
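
The cells above show a shallow copy: x[:] creates a new outer list, but the nested list is still shared between x and y. A minimal sketch of the difference, assuming copy.deepcopy from the standard library is an acceptable way to get a fully independent copy:

```python
import copy

x = [1, 2, 3, 4, [5, 6, 7, 8]]

shallow = x[:]            # new outer list, the inner list is shared with x
deep = copy.deepcopy(x)   # new outer list and a new inner list

shallow[4].append(10)     # also visible through x
deep[4].append(99)        # not visible through x

print(x)
print(shallow[4] is x[4])
print(deep[4] is x[4])
```

Appending to the outer list of either copy never affects x; only changes made inside shared nested objects do.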

#### <font color = "red">Exercise</font>

Break the following text into words and return the 5th word.  
Retrieve the 4th word and swap "e" with "o", "a" with "e" and "i" with "t".   
Reverse the newly formed word.  
Create the sentence "Virologists win medicine Nobel." from the initial text.  

```python
words_text = "Virologists who discovered hepatitis C win medicine Nobel." 
```


In [9]:
words_text = "Virologists who discovered hepatitis C win medicine Nobel." 
words_list = words_text.split()
words_list[4]

'C'

In [24]:
word4 = words_list[3]
word4_new = word4.translate(str.maketrans("it","ti"))

In [22]:
#dir(str)

In [25]:
word4_new[::-1]

'stitiapeh'

In [29]:
" ".join([words_list[0]] + words_list[-3:])

'Virologists win medicine Nobel.'
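
The translation step in the cells above swaps only "i" and "t". One possible reading of the exercise text replaces "e" with "o", "a" with "e", and "i" with "t" in a single pass; the sketch below follows that reading, which is an assumption and not necessarily the intended solution.

```python
words_text = "Virologists who discovered hepatitis C win medicine Nobel."
words_list = words_text.split()

# 4th word (index 3)
word4 = words_list[3]

# Assumed mapping: e -> o, a -> e, i -> t, applied in one pass
table = str.maketrans("eai", "oet")
word4_mapped = word4.translate(table)

# Reverse the newly formed word with a slice
word4_reversed = word4_mapped[::-1]

print(word4_mapped)
print(word4_reversed)
```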

___

Remove list elements

<b>remove()</b> removes the first matching value, not a specific index.

In [31]:
help(list.remove)

Help on method_descriptor:

remove(self, value, /)
    Remove first occurrence of value.
    
    Raises ValueError if the value is not present.



In [30]:
words_list

['Virologists',
 'who',
 'discovered',
 'hepatitis',
 'C',
 'win',
 'medicine',
 'Nobel.']

In [32]:
words_list.remove("win")

In [33]:
words_list

['Virologists', 'who', 'discovered', 'hepatitis', 'C', 'medicine', 'Nobel.']

The <b>del</b> statement removes the item at a specific index.

In [34]:
help("keywords")


Here is a list of the Python keywords.  Enter any keyword to get more help.

False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not                 



In [35]:
help("del")

The "del" statement
*******************

   del_stmt ::= "del" target_list

Deletion is recursively defined very similar to the way assignment is
defined. Rather than spelling it out in full details, here are some
hints.

Deletion of a target list recursively deletes each target, from left
to right.

Deletion of a name removes the binding of that name from the local or
global namespace, depending on whether the name occurs in a "global"
statement in the same code block.  If the name is unbound, a
"NameError" exception will be raised.

Deletion of attribute references, subscriptions and slicings is passed
to the primary object involved; deletion of a slicing is in general
equivalent to assignment of an empty slice of the right type (but even
this is determined by the sliced object).

Changed in version 3.2: Previously it was illegal to delete a name
from the local namespace if it occurs as a free variable in a nested
block.

Related help topics: BASICMETHODS



In [36]:
del words_list[1]

In [37]:
words_list

['Virologists', 'discovered', 'hepatitis', 'C', 'medicine', 'Nobel.']

<b>pop()</b> removes the item at a specific index and returns it.

In [None]:
help(list.pop)

In [38]:
word_element = words_list.pop(2)


In [42]:
word_element + " is bad"

'hepatitis is bad'

In [40]:
words_list

['Virologists', 'discovered', 'C', 'medicine', 'Nobel.']

<b>clear()</b> removes all the elements of a list

In [None]:
help(list.clear)

In [43]:
words_list.clear()

In [44]:
words_list

[]

#### <font color = "red">Exercise</font> 

Create a list with the following sublists: 
 - a sublist with 4 organisms: human, mouse, worm, yeast 
 - a sublist with 4 genes (symbols): TP53, TNF, EGFR, IL6 
 - a sublist with 4 values: 90, 70, 70, 50

Check if the organism "human" is in the organism list.<br>
Find the index of value 70 in the values list.<br>
Count how many times the value 70 occurs in the values sublist.<br><br>


In [45]:
data_list = [["human", "mouse", "worm", "yeast"], ["TP53", "TNF", "EGFR", "IL6"], [90, 70, 70, 50]]

In [46]:
data_list

[['human', 'mouse', 'worm', 'yeast'],
 ['TP53', 'TNF', 'EGFR', 'IL6'],
 [90, 70, 70, 50]]

In [52]:
"human" in data_list[0]

True

In [55]:
data_list[2].index(70)

1

In [56]:
data_list[2].count(70)

2

In [62]:
x = print("DNA")


DNA


In [64]:
type(x)

NoneType

____________

### Tuple - a collection of elements, allows duplicates, is ordered, and is <u>unchangeable</u>
Tuples typically use less memory than lists and are often slightly faster to create and iterate over.

Tuples may be constructed in a number of ways:<br>

Using a pair of parentheses to denote the empty tuple: ()<br>
Using a trailing comma for a singleton tuple: a, or (a,)<br>
Separating items with commas: a, b, c or (a, b, c)<br>
Using the tuple() built-in: tuple() or tuple(iterable)<br>
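
The construction forms listed above can be sketched as follows. The gene symbols are reused from the earlier exercise, and the variable names are illustrative only.

```python
empty = ()
singleton = "EGFR",                  # the trailing comma makes the tuple
not_a_tuple = ("EGFR")               # parentheses alone: still a string
from_list = tuple(["TP53", "TNF"])   # from any iterable
from_string = tuple("ACG")           # a string is iterated character by character

# Tuples are immutable: item assignment raises TypeError
try:
    from_list[0] = "IL6"
except TypeError as error:
    message = str(error)

print(type(singleton), type(not_a_tuple))
print(from_string)
print(message)
```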

In [66]:
#dir(tuple)

In [68]:
# create tuple 

(2,)[0]

2

In [76]:
[(1,"hi",2,3),[4,5,"test"]][0][1]

'hi'

In [72]:
# create tuple
gene = ("EGFR", "Epidermal growth factor receptor", 10, 200.5)


In [74]:
# tuple subsetting - same as string and list
gene[1]

'Epidermal growth factor receptor'

In [77]:
gene[:3]

('EGFR', 'Epidermal growth factor receptor', 10)

##### <b>Unpacking a tuple - assigning each element of a tuple to a variable</b>



In [80]:
gene

('EGFR', 'Epidermal growth factor receptor', 10, 200.5)

In [78]:
gene_symbol, gene_description, exon_number, expression = gene

In [79]:
print(gene_symbol)
print(gene_description)
print(exon_number)
print(expression)

EGFR
Epidermal growth factor receptor
10
200.5


In [81]:
len(gene)

4

Tuple packing and unpacking make it easy to swap the values of two variables.

In [85]:
a, b = 3, 4
print(a)
print(b)

3
4


In [82]:
x = 3, 4

In [84]:
type(x)

tuple

In [86]:
a, b = b, a
print(a)
print(b)

4
3


In [88]:
("hi","there") + (1,2)

('hi', 'there', 1, 2)

#### <font color = "red">Exercise</font>
Check if value 10 is in gene.<br>
Find the index of value 10 in gene.<br>
Count how many times 10 occurs in gene.<br><br>


In [89]:
gene

('EGFR', 'Epidermal growth factor receptor', 10, 200.5)

In [93]:
10 in gene

True

In [94]:
gene.index(10)

2

In [95]:
gene.count(10)

1

Strings - "...", '...', """...""",'''...'''   
Lists - [..., ..., ]  
Tuples - (..., ..., )  

<b>Concatenating immutable sequences always results in a new object</b>

This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. <br>
To get a linear runtime cost, you must switch to one of the alternatives below:
* if concatenating str objects, you can build a list and use str.join() at the end or else write to an io.StringIO instance and retrieve its value when complete
* if concatenating tuple objects, extend a list instead
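
The alternatives above can be sketched as follows. The fragment list is hypothetical, and with only a handful of items the runtime difference is not measurable; the point is the pattern.

```python
import io

# Hypothetical sequence fragments, used only for illustration
fragments = ["ATG", "GCC", "TTA", "TAA"]

# Repeated concatenation: each += may build a new string object
slow = ""
for fragment in fragments:
    slow += fragment

# Alternative 1: collect the pieces in a list and join once at the end
joined = "".join(fragments)

# Alternative 2: write to an io.StringIO buffer and read the value once
buffer = io.StringIO()
for fragment in fragments:
    buffer.write(fragment)
from_buffer = buffer.getvalue()

# For tuples: extend a list, then convert to a tuple once
items = []
for pair in [(1, 2), (3, 4)]:
    items.extend(pair)
combined = tuple(items)

print(joined, from_buffer, combined)
```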

Some sequence types (such as range) only support item sequences that follow specific patterns, and hence don’t support sequence concatenation or repetition.
The index() function raises ValueError when x is not found in s.

https://docs.python.org/3/library/stdtypes.html

In [96]:
list(gene)

['EGFR', 'Epidermal growth factor receptor', 10, 200.5]

__________________

### Dictionaries - a collection of key:value pairs, preserves insertion order (Python 3.7+), has no duplicate keys, is changeable, and is indexed by key.
### The mapping type

Dictionaries can be created by:
- placing a comma-separated list of key: value pairs within braces
    - ```{key1: value1, key2: value2, ..., key_n: value_n}```
    - item - ```key: value```
- by the dict constructor

dict(**kwarg)  
dict(mapping, **kwarg)<br>
dict(iterable, **kwarg)<br>
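
A short sketch of the constructor forms listed above. All four calls build the same dictionary; the variable names are illustrative only.

```python
# Keyword arguments: keys must be valid Python identifiers
from_kwargs = dict(Ser="Serine", Lys="Lysine")

# A mapping, optionally extended with keyword arguments
from_mapping = dict({"Ser": "Serine"}, Lys="Lysine")

# An iterable of (key, value) pairs
from_pairs = dict([("Ser", "Serine"), ("Lys", "Lysine")])

# zip() pairs up two parallel sequences of keys and values
from_zip = dict(zip(["Ser", "Lys"], ["Serine", "Lysine"]))

print(from_kwargs == from_mapping == from_pairs == from_zip)
```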

Dictionaries preserve insertion order since Python 3.7, but you access elements by key, not by position.

<img src = "https://upload.wikimedia.org/wikipedia/commons/5/5b/GooglePythonClass_Day1_Part3_Pic.jpg" width = 400/>

https://commons.wikimedia.org/wiki/File:GooglePythonClass_Day1_Part3_Pic.jpg



Dictionary methods:

| Method       | Description                                                                                                 |
|--------------|-------------------------------------------------------------------------------------------------------------|
| clear      | Removes all the elements from the dictionary                                                                |
| copy       | Returns a shallow copy of the dictionary                                                                    |
| fromkeys   | Returns a new dictionary with the specified keys, all set to the same value                                 |
| get        | Returns the value of the specified key, or a default (None unless specified) if the key is missing          |
| items      | Returns a view of the dictionary's (key, value) pairs                                                       |
| keys       | Returns a view of the dictionary's keys                                                                     |
| pop        | Removes the element with the specified key and returns its value                                            |
| popitem    | Removes and returns the last inserted key-value pair                                                        |
| setdefault | Returns the value of the specified key. If the key does not exist: insert the key, with the specified value |
| update     | Updates the dictionary with the specified key-value pairs                                                   |
| values     | Returns a view of the dictionary's values                                                                   |

In [None]:
dir(dict)

In [None]:
dict()

____
<img src = "https://pixy.org/src/37/thumbs350/372053.jpg" width = "350"/>

https://pixy.org/372053/

In [None]:
amino_acids_map = {"Ser": "Serine", "Lys": "Lysine"}


In [None]:
amino_acids_test = dict(Ser = "Serine", Lys= "Lysine")

In [None]:
amino_acids_map == amino_acids_test

In [None]:
# Check if key in dictionary
"Ser" in amino_acids_test

In [None]:
# Length of dictionary
print(len(amino_acids_map))

Retrieving a dictionary value - subset by key or use the get function

In [None]:
amino_acids_map["Leu"]

In [None]:
amino_acids_map.get("Leu")
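
The two lookups above behave differently when the key is missing: subsetting raises KeyError, while get() returns None or a supplied default. A minimal sketch, using a separate dictionary named amino_acids (a name introduced here for illustration) so the notebook's variables are left untouched:

```python
# Local dictionary for illustration; "Leu" is deliberately absent
amino_acids = {"Ser": "Serine", "Lys": "Lysine"}

# Subsetting by a missing key raises KeyError
try:
    amino_acids["Leu"]
except KeyError as error:
    missing_key = error.args[0]

# get() returns None for a missing key, or the default passed as second argument
with_get = amino_acids.get("Leu")
with_default = amino_acids.get("Leu", "unknown")

print(missing_key, with_get, with_default)
```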

To add a new dictionary element - subset the dictionary using the new key and assign the value or use the update function

In [None]:
amino_acids_map["Ala"] = "Alanine"

In [None]:
amino_acids_map["Leu"] = "Leu cinne"

In [None]:
# update if key exists, add a new element if it does not
amino_acids_map.update({"Leu": "Leucine"})

In [None]:
amino_acids_map.update({"Val": "Valine", "Gln": "Glutamine"})

In [None]:
amino_acids_map.values()

In [None]:
amino_acids_map.keys()

In [None]:
amino_acids_map.items()

Elements are removed by key using del (subset the dictionary by key) or pop().<br>
popitem() removes and returns the last inserted element.<br>
clear() removes all elements.<br>

In [None]:
amino_acids_map.popitem()
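
The removal options can be compared side by side. The sketch below uses a small hypothetical codon_table dictionary rather than amino_acids_map.

```python
# Hypothetical dictionary, used only for illustration
codon_table = {"ATG": "Met", "TGG": "Trp", "TAA": "Stop", "TAG": "Stop"}

# del removes the element with the given key and returns nothing
del codon_table["TAG"]

# pop() removes the element and returns its value
removed_value = codon_table.pop("TGG")

# pop() with a default does not raise KeyError for a missing key
fallback = codon_table.pop("CCC", None)

# popitem() removes and returns the last inserted (key, value) pair
last_item = codon_table.popitem()

# Snapshot of what is left before emptying the dictionary
remaining = dict(codon_table)

# clear() removes everything
codon_table.clear()

print(removed_value, fallback, last_item, remaining, codon_table)
```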

#### <font color = "red">Exercise</font> 
Remove the element with Key "Lys" from the dictionary<br><br>

_______________
### Sets - a collection of elements, is unordered, has no duplicates, can be changeable or unchangeable.

A set object is an unordered collection of distinct objects.<br>
A set is mutable, unless it is a frozenset.<br>
To create a set, enclose comma-separated elements in braces or use the set([iterable]) constructor.<br>
Elements cannot be modified in place, but they can be added and removed.<br>
The update() function can be used to add multiple elements.<br>
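
A brief sketch of these points. Note that empty braces create a dictionary, so an empty set needs set(). The variable names are illustrative only.

```python
empty_set = set()
empty_dict = {}                    # braces alone create a dict, not a set

# Building a set from an iterable drops duplicates
bases = set("GATTACA")

# A frozenset is the immutable variant
purines = frozenset({"A", "G"})

bases.add("N")                     # sets can grow

# frozenset has no add() method, so this raises AttributeError
try:
    purines.add("N")
except AttributeError:
    frozen_is_immutable = True

print(type(empty_set), type(empty_dict))
print(sorted(bases))
```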

In [None]:
dir(set)

In [None]:
set()

In [None]:
model_organisms = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}


In [None]:
"human" in model_organisms


In [None]:
model_organisms.add("yeast")


In [None]:
model_organisms.update(["zebrafish","frog"])


In [None]:
# remove element from set using the remove function
# raises exception if element does not exist

model_organisms.remove("rat")
model_organisms

In [None]:
# remove element from set using the discard function
# does not raise exception if element does not exist

model_organisms.discard("rat")
model_organisms

In [None]:
model_organisms.discard("frog")
model_organisms

In [None]:
study_transcriptomics = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}
study_proteomics = {"rat", "zebrafish", "frog", "yeast", "worm"}

In [None]:
study_overall = study_transcriptomics.union(study_proteomics)

https://docs.python.org/3/library/stdtypes.html#set

#### <font color = "red">Demo</font> 

Given the previous 2 sets, compute the intersection and difference.<br>
Use the dir function to find out the names of the functions to use for these operations.  
Update study_proteomics with the elements of study_transcriptomics.<br>
Check if study_proteomics is now equal to study_overall. <br>
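
One possible sketch of this demo, assuming "rat" and "zebrafish" are separate elements of study_proteomics:

```python
study_transcriptomics = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}
# Assumption: "rat" and "zebrafish" are two separate organisms
study_proteomics = {"rat", "zebrafish", "frog", "yeast", "worm"}
study_overall = study_transcriptomics.union(study_proteomics)

# Organisms present in both studies
shared = study_transcriptomics.intersection(study_proteomics)

# Organisms only in the transcriptomics study
only_transcriptomics = study_transcriptomics.difference(study_proteomics)

# update() adds the elements in place, turning study_proteomics into the union
study_proteomics.update(study_transcriptomics)

print(shared)
print(only_transcriptomics)
print(study_proteomics == study_overall)
```

The operators &, - and | are equivalent to intersection(), difference() and union() when both operands are sets.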


____________

### Control Structures

**Control Structures** are constructs that **give a block of code context as well as help repeat or selectively run it**.

Control Structures control how the associated code works by "wrapping" it within its structure.

Types of control structures:
* Sequential: default mode - execution of statements line by line - like reading a book or following a recipe
* Selection: decisions, branching - allows the distinction between two or more disjunct situations in which different sets of statements should be executed <br>
Selection structures:
    * if &emsp;&emsp;&emsp;&emsp;&emsp;&emsp; - considers additional code on only one branch
    * if-else &emsp; &emsp;&emsp;  - considers two branches
    * if-elif-else &emsp;&nbsp;    - considers three or more branches
* Repetition: looping - allows the repetition of a set of statements multiple times <br>
Loop structures:
    * while &emsp;  &emsp;   &emsp; - number of repetitions **unknown**
    * for &emsp;&emsp;&emsp;&emsp;&emsp; - number of repetitions **known**

```python
if [not] <condition>:
    <statements>
```

Selection and repetition statements typically rely on conditions that are evaluated as either true or false.

### if: selection/decision control structure

An if statement evaluates a condition and runs its block of code only when the condition is true. <br>
The syntax is as follows:

In [None]:
if 5 > 4:
    print('5 is greater than 4!')

Add more functionality by using the else keyword

```python
if [not] <condition>:
    <statements>
else:
    <statements>
```

In [None]:
cond_res = "CG" in "ACGT" and len("AACTGGATC") == 7
if cond_res:
    print('This is True')
else:
    print('This is False')

What if we need more branches? <br>
E.g.: Checking for a value range and doing different things depending on the range

```python
if [not] <condition>:
    <statements>
elif <condition>:
    <statements>
else:
    <statements>
```

In [None]:
# Comprehensive if-elif-else statement
int_var = 20
if int_var < 5:
    print("Less than 5")
elif int_var < 10:
    print("Between 5 and 10")
else:
    print("Equal to or greater than 10")

#### <font color = "red">Exercise</font>
Write an if statement to check who is the youngest of three people with ages:
age1, age2, age3


In [None]:
age1 = 20
age2 = 50
age3 = 34
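
One possible sketch for this exercise, using if-elif-else with combined conditions. The labels "person 1" to "person 3" are made up for illustration, and ties go to the first person checked.

```python
age1 = 20
age2 = 50
age3 = 34

if age1 <= age2 and age1 <= age3:
    youngest = "person 1"
elif age2 <= age3:
    # age1 is not the smallest, so compare the remaining two
    youngest = "person 2"
else:
    youngest = "person 3"

print(youngest)

# The built-in min() gives the smallest age directly, as a cross-check
print(min(age1, age2, age3))
```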

### for: the repetitive control structure with a known number of steps

To loop through a sequence of elements is to iterate

```python
for var in sequence:
    statements
```

In [None]:
# Print elements of a range
for i in range(3):
    print(i)

In [None]:
# nested loop
for i in range(3):
    for j in range(4):
        print(i,j)

In [None]:
list_var = ["elem1","elem2","elem3"]
for list_elem in list_var:
    print(list_elem)

In [None]:
for i in range(len(list_var)):
    print(list_var[i])

In [None]:
# use enumerate to create (index, value) pairs
for i, list_elem in enumerate(list_var):
    print(i, list_elem)

#### <font color = "red">Demo</font> 
Codons are sequences of three nucleotides<br>
Nucleotides are: 'A', 'C', 'G', 'T'<br>
Write code that outputs all possible combinations of codons<br>
Then add a condition to display only the ones that have the group 'AC'
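
A possible sketch for this demo using three nested for loops, followed by a second loop that keeps only the codons containing 'AC'. The variable names are illustrative.

```python
nucleotides = ["A", "C", "G", "T"]

# Build every combination of three nucleotides
codons = []
for first in nucleotides:
    for second in nucleotides:
        for third in nucleotides:
            codons.append(first + second + third)

# Keep only the codons that contain the group 'AC'
codons_with_ac = []
for codon in codons:
    if "AC" in codon:
        codons_with_ac.append(codon)

print(len(codons))
print(codons_with_ac)
```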

<b>Think of a repetitive case when you could not use a for loop readily</b>

### while: the repetitive control structure with an unknown number of steps

The loop runs until its condition becomes false.

<b>while loops make it easy to cause an 'infinite loop'</b>

```python
while condition:
    statements
```

In [None]:
# while counting steps
step = 1
while step < 10:
    print('Step ' + str(step))
    step = step + 1

In [None]:
# while testing input value
cond = True
while cond:
    value = int(input()) 
    if value > 10:
        print(cond, value, 'Greater than 10')
    else:
        print(cond, value, 'Lower than or equal to 10')
    cond = (value <= 10)

```python
# Infinite loop
cond = True
while cond:
    print('Infinite loop')
```

#### <font color = "red">Demo</font> 
Assign a variable sequence_length the value 100 <br>
Subtract powers of 2 (2, 4, 8, 16,...) until the value drops below 0<br>
Print the power of 2 used in the last subtraction.
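
A possible sketch for this demo. The number of iterations is not known in advance, so a while loop fits; the variable power is a name introduced here.

```python
sequence_length = 100
power = 1

# Keep subtracting the next power of 2 while the value is still non-negative
while sequence_length >= 0:
    power = power * 2
    sequence_length = sequence_length - power

# power now holds the value used in the last subtraction
print(power)
print(sequence_length)
```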

### More control statements

Python also has simple control statements that change how a block or loop runs.
Each one is written as a single keyword:

```python
pass
break
continue
```

<b>pass: do nothing</b>

In [None]:
value = 20
if value == 20:
    pass

<b>break: allows the user to stop the closest enclosing loop</b>

In [None]:
# while break - it already has a condition that stops it but ...
cond = True
i = 0
while cond:
    if i == 3:
        break
    else:
        print(i)
        i = i + 1

In [None]:
# for break - kind of makes it a while
for i in range(10):
    if i==3:
        break
    else:
        print(i)

<b>continue: skip to the next iteration</b>

Forces the loop to move on to the next item, skipping the statements that come after it inside the loop body.

In [None]:
# Continue example
for i in range(10):
    if i > 3 and i < 7:
        continue
    else:
        print(i)
    print('This statement gets executed')
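
break and continue can also be combined in one loop. The sketch below reads a hypothetical sequence codon by codon, skips codons with an ambiguous base "N", and stops at the first stop codon; the sequence and names are assumptions made for the example.

```python
# Hypothetical sequence, used only for illustration
sequence = "ATGNCCGGATGATTT"
stop_codons = ("TAA", "TAG", "TGA")

kept_codons = []
for start in range(0, len(sequence), 3):
    codon = sequence[start:start + 3]
    if "N" in codon:
        continue      # skip this codon, move on to the next one
    if codon in stop_codons:
        break         # leave the loop entirely
    kept_codons.append(codon)

print(kept_codons)
```

Codons after the stop codon are never examined, because break ends the loop rather than just the current iteration.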