# Lecture 2: Data Types and Flow Control


> ### __At the core of any language:__

- Control the flow of the program

- Construct and access data elements

- Operate on data elements

- Construct functions

- Construct classes

- Libraries and built-in classes

> "The Practice of Computing Using Python", 3rd Edition, Punch & Enbody, 2017


#### Today we will present a very brief overview of the basics of __data types__, __flow control__ and __functions__.

------
## Part 1: Data Types

### What is a Type?

A type in Python essentially defines two things:

<br/>

- the __internal structure__ of the type (what it contains)

    - For example, many languages (and NumPy's `int32`) store an integer in a fixed four bytes, or 32 bits, which covers values up to roughly ±2 billion. Python's built-in `int` is more flexible: it grows as needed, so it can hold arbitrarily large whole numbers.

<br/>
 
- the kinds of __operations__ you can perform:

    - `'abc'.capitalize()` is a method you can call on strings, but not integers.

    - Each data type has its own `operators`: special symbols that carry out arithmetic or logical computation. An operator takes one or more operands, computes a result, and makes that result available to Python for further use.

        - e.g., `+`, `<`, `-`, `*`, `**`, `//`, `!=`, `==`

<br/>

Some types hold multiple elements (__collections__); we'll see those later.

<br/>
 

------

### Fundamental Types

#### __Numbers__

__Before we start: help your future self! Always give variables meaningful names.__ 

In [1]:
# integers


In [2]:
# Floating point numbers
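
As a small illustration of the two cells above, integers and floats can be created and inspected with `type()`. The variable names and values here are hypothetical, chosen only for the example:

```python
# Hypothetical example values, chosen only for illustration
num_plates = 12            # an integer (int)
incubation_temp = 37.5     # a floating point number (float)

print(type(num_plates))        # <class 'int'>
print(type(incubation_temp))   # <class 'float'>

# Python's built-in integers are not limited to 32 bits
big_number = 2**100
print(big_number)
```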


In [1]:
# You can also do math on numbers

stock_conc = 50
final_vol = 100
final_conc = 10
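
One way to use these three numbers is the standard dilution relation C1·V1 = C2·V2. The sketch below assumes that both concentrations share one unit and that the volumes share another; the names `stock_vol` and `diluent_vol` are introduced here for illustration:

```python
stock_conc = 50
final_vol = 100
final_conc = 10

# C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
stock_vol = final_conc * final_vol / stock_conc

# Whatever is not stock is diluent
diluent_vol = final_vol - stock_vol

print('Volume of stock needed:', stock_vol)      # 20.0
print('Volume of diluent needed:', diluent_vol)  # 80.0
```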





#### __Bonus coverage: The function type__

A function is itself a data type in Python. You can think of the name of the function as a variable that contains the address of the function’s lines of code. The several lines are packaged together to be reused later.

> __Python format of defining a function:__

``` python
#---------------------------------

def my_function(argument1, argument2): 

    '''what this function does''' #docstring
    
    function body

    return values # return statement

#---------------------------------
```


In [4]:
# create a dilution calculator function:
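
A minimal sketch of what such a function could look like, assuming the C1·V1 = C2·V2 relation and consistent units. The function name `dilution_calculator` and its argument order are hypothetical choices:

```python
def dilution_calculator(stock_conc, final_conc, final_vol):
    '''Return (stock volume, diluent volume) needed to reach final_conc in final_vol.'''
    stock_vol = final_conc * final_vol / stock_conc
    diluent_vol = final_vol - stock_vol
    return stock_vol, diluent_vol

# Call the function and unpack the two returned values
stock_vol, diluent_vol = dilution_calculator(50, 10, 100)
print(stock_vol, diluent_vol)        # 20.0 80.0

# The function name is itself a variable that refers to a function object
print(type(dilution_calculator))     # <class 'function'>
```

Packaging the calculation this way means the same lines can be reused for any stock without retyping the arithmetic.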



Many useful, widely applicable functions have already been written by others and can be imported for use. A collection of related functions is usually called a `library`.

In [None]:
import math # e.g. the math library

# math.radians() converts an angle from degrees to radians
print("180 / pi degrees is equal to radians:", math.radians(180 / math.pi))

In the coming classes, the libraries we will almost always import are `numpy`, `pandas`, and `matplotlib.pyplot`.

#### __Booleans__

In [None]:
# Boolean variables represent quantities that are True or False

read_gene_1 = 56
threshold = 5
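
As a sketch of how these two variables could be used, a comparison produces a Boolean value that can be stored and combined with `and`, `or`, and `not`. The name `above_threshold` and the upper bound of 100 are assumptions made for the example:

```python
read_gene_1 = 56
threshold = 5

# A comparison evaluates to a Boolean value
above_threshold = read_gene_1 > threshold
print(above_threshold)          # True
print(type(above_threshold))    # <class 'bool'>

# Booleans can be combined with and, or, not
print(above_threshold and read_gene_1 < 100)   # True
print(not above_threshold)                     # False
```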



#### __Strings__

In [None]:
# Strings can be defined using either single or double quotes


In [None]:
# Multiline strings can be defined using triple quotes (single or double)

address = """
Cold Spring Harbor Laboratory
1 Bungtown Rd.
Cold Spring Harbor, NY 11724
"""
print(address)

> __small detour for Unicode__:

Today’s programs need to be able to handle a wide variety of characters. Applications are often internationalized to display messages and output in a variety of user-selectable languages; the same program might need to output an error message in English, French, Japanese, Hebrew, or Russian. Web content can be written in any of these languages and can also include a variety of emoji symbols. Python’s string type uses the Unicode Standard for representing characters, which lets Python programs work with all these different possible characters.

[Unicode](https://www.unicode.org/) is a specification that aims to list every character used by human languages and give each character its own unique code. The Unicode specifications are continually revised and updated to add new languages and symbols.

How to write strings using Unicode?
- use Unicode escapes! (`\u...`)

In [None]:
# Chinese character example:
print('\u4f60\u597d')

In [2]:
# The '+' sign concatenates strings

# EcoRI recognition site:

top_strand_5 = 'G'
top_strand_3 = 'AATTC' 
bottom_strand_3 = 'CTTAA'  
bottom_strand_5 = 'G'
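
For instance, the two top-strand fragments could be joined back into the full recognition site. The name `ecori_site` is introduced here for illustration:

```python
top_strand_5 = 'G'
top_strand_3 = 'AATTC'

# '+' joins the two fragments back into the full recognition site
ecori_site = top_strand_5 + top_strand_3
print(ecori_site)   # GAATTC
```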



Here is the DNA sequence of the multiple cloning site (MCS) on the plasmid [pcDNA5](https://www.addgene.org/vector-database/2132/), a popular vector for mammalian gene expression.

In [4]:
# It is simple to test if one string is contained within another

seq = 'GAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATCCACTA' \
      'GTCCAGTGTGGTGGAATTCTGCAGATATCCAGCACAGTGGCGGCCGCTCGAGTCTAG' \
      'AGGGCCCGTTTAAACCCGCTGATCAGCCT'

# Does this MCS contain a restriction site for NheI (GCTAGC)? 


In [5]:
# How about for MscI (TGGCCA)?
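
One way to answer both questions is the `in` operator, which returns a Boolean. The sketch repeats the sequence so it runs on its own; the names `has_nhei` and `has_msci` are hypothetical:

```python
seq = 'GAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATCCACTA' \
      'GTCCAGTGTGGTGGAATTCTGCAGATATCCAGCACAGTGGCGGCCGCTCGAGTCTAG' \
      'AGGGCCCGTTTAAACCCGCTGATCAGCCT'

# 'in' tests whether the left string occurs anywhere inside the right string
has_nhei = 'GCTAGC' in seq
has_msci = 'TGGCCA' in seq

print('NheI site present:', has_nhei)
print('MscI site present:', has_msci)
```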


In [7]:
# The len() function tells you the length of a string



In [6]:
# The contents in a string can be indexed using brackets
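
A short sketch of `len()` and bracket indexing, using the EcoRI site as a small example string. Indexing starts at 0, negative indices count from the end, and a slice `[start:stop]` excludes the `stop` position:

```python
site = 'GAATTC'

print(len(site))    # 6

print(site[0])      # G    (first character)
print(site[-1])     # C    (last character)
print(site[1:4])    # AAT  (characters at positions 1, 2, 3)
```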


There are many functions and methods for strings. You'll encounter the use of some of them for sequencing data in __Exercise_2.1__.

### Converting Types

A character `'1'` is not an integer `1`!

In [None]:
# int(some_var) returns an integer; it fails if the original value cannot be interpreted as an integer!


# my_int = int(5.2)
# my_int = int('five')
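
The two commented-out lines behave differently: `int(5.2)` succeeds, while `int('five')` raises a `ValueError`. A sketch that shows both, catching the error so that the cell keeps running (the string `'42'` is an added example value):

```python
my_int = int(5.2)      # a float is truncated toward zero
print(my_int)          # 5

my_int = int('42')     # a string of digits converts cleanly
print(my_int + 1)      # 43

try:
    my_int = int('five')   # not a valid integer literal
except ValueError as err:
    print('Conversion failed:', err)
```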

In [None]:
# float(some_var) returns a float


In [None]:
# str(some_var) returns a string

my_str = str(5)
print('my_str =', my_str, '; type is', type(my_str))

In [None]:
# The input() function always returns a string
min_copy = input('threshold transcript copy number:')
3 < min_copy  # raises TypeError: an int cannot be compared with a str; convert with int(min_copy) first

-----

### Collection Types

#### __Lists and tuples__




In [17]:
# Define a list using brackets and commas.

lb_ingredients = ['Tryptone', 'NaCl', 'Yeast extract', 'Distilled water']

# can contain variables of different types: 

mixed_dtype_list = [1, 'two', 3.0, 'four', 5]


In [18]:
# Lists can be indexed using brackets just like strings can.



First element: Tryptone
Last element: Distilled water
The list reversed: ['Distilled water', 'Yeast extract', 'NaCl', 'Tryptone']


In [19]:
# Use 'in' to test whether an element is contained in a list.


False
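
Output like that shown in the two cells above could be produced as follows. This is a sketch; the query `'Agar'` is a hypothetical choice of an element that is not in the list:

```python
lb_ingredients = ['Tryptone', 'NaCl', 'Yeast extract', 'Distilled water']

# Index 0 is the first element, index -1 is the last
print('First element:', lb_ingredients[0])
print('Last element:', lb_ingredients[-1])

# A slice with step -1 returns a reversed copy
print('The list reversed:', lb_ingredients[::-1])

# 'in' checks for membership
print('Agar' in lb_ingredients)   # False
```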

In [None]:
# Change an element in a list.
print('Before: ', lb_ingredients)
lb_ingredients[-1] = 'Micropore water'
print('After:', lb_ingredients) 

In [None]:
# Append an element to the end of a list.
print('Before:', lb_ingredients)
lb_ingredients.append('NaOH')
print('After:', lb_ingredients)

In [None]:
# You get an error if you try to access an index that doesn't exist.
lb_ingredients[10]

In [None]:
# You also get an error if you pass a non-integer as an index.
lb_ingredients[4.0]

In [24]:
# To create a list of numbers from 0 to n-1, use list(range(n))


In [None]:
# Sort a list of numbers
vals = [0,2,4,6,8,1,3,5,7,9]
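
A sketch of both cells: `list(range(n))` stops at `n - 1`, `sorted()` returns a new sorted list, and the `.sort()` method reorders the list in place. The names `numbers` and `sorted_vals` are introduced for illustration:

```python
numbers = list(range(10))
print(numbers)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

vals = [0,2,4,6,8,1,3,5,7,9]

# sorted() leaves vals untouched and returns a new list
sorted_vals = sorted(vals)
print(sorted_vals)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# .sort() changes vals itself; reverse=True sorts from high to low
vals.sort(reverse=True)
print(vals)           # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```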


In [None]:
# Tuples are like lists, though they are defined using parentheses instead of brackets.
# Functions often pass tuples (not lists) back to the user.

t = (0, 1, 2, 3, 4)
print(t)
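
Two properties of tuples, sketched below: they cannot be modified after creation, and a function that returns several values hands them back as a tuple that can be unpacked. The built-in `divmod()` is used here as an example of such a function:

```python
t = (0, 1, 2, 3, 4)
print(t[0], t[-1])    # 0 4  (indexing works just like for lists)

try:
    t[0] = 10         # tuples are immutable, so this fails
except TypeError as err:
    print('Tuples cannot be modified:', err)

# divmod() returns a (quotient, remainder) tuple, unpacked into two names
quotient, remainder = divmod(17, 5)
print(quotient, remainder)   # 3 2
```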

#### __Dictionaries__
Dictionaries are one of Python's most useful datatypes. 

They can be thought of as a collection of key-value pairs that allows values to be rapidly looked up via keys. 
- Keys can be any immutable (hashable) object, such as a string, a number, or a tuple. Values can be anything.

In [8]:
# Dictionaries are defined using braces, colons, and commas


In [9]:
# Access dictionary elements using a "key" enclosed in brackets


In [10]:
# You can replace and add elements to a dictionary after it is created.


In [11]:
# From a dictionary, you can get a list of both the keys and the values.
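
The four dictionary cells above could be filled in along these lines. This is a sketch: the dictionary `reads` and its gene names are hypothetical, with read counts borrowed from elsewhere in this lecture:

```python
# Define a dictionary with braces, colons, and commas
reads = {'gene_1': 56, 'gene_2': 3}

# Access a value using its key
print(reads['gene_1'])      # 56

# Replace an existing value, then add a new key-value pair
reads['gene_2'] = 4         # hypothetical corrected count
reads['gene_3'] = 300

# Get the keys and the values as lists
gene_names = list(reads.keys())
read_counts = list(reads.values())
print(gene_names)           # ['gene_1', 'gene_2', 'gene_3']
print(read_counts)          # [56, 4, 300]
```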


#### __Exercise 1__



In [31]:
# The sequence we've defined before; note how to define a long string over multiple lines
seq = 'GAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATCCACTA' \
      'GTCCAGTGTGGTGGAATTCTGCAGATATCCAGCACAGTGGCGGCCGCTCGAGTCTAG' \
      'AGGGCCCGTTTAAACCCGCTGATCAGCCT'

__E1.1__: Using the string method `.find()`, find the location(s) of the above restriction sites within the MCS.

In [32]:
# Answer here

__E1.2__: Using the string method `.replace()`, compute the RNA sequence of the DNA sequence above. 

In [33]:
# Answer here

**E1.3**: We have not yet discussed sets. Using Google, figure out what `set` objects are and explain what they represent. In particular, explain why Python evaluates `{2,3,3} < {1,2,3}` as `True`.

--------

## Part 2: Flow Control

__Selection__:

- Selection is how programs make choices, and it is the process of making choices that provides a lot of the power of computing. In Python we use conditional statements to evaluate a condition and select which statements to execute.


__Repetition__:

- Besides selecting which statements to execute, a fundamental need in a program is repetition: repeating a set of statements under some conditions.

<br/>


### Selection

#### __Python Selection: `if` statements:__


> __Python format:__

``` python
#---------------------------------

if boolean expression:
    code body # watch out for indentations! 
    
#---------------------------------
```


We evaluate the boolean expression (`True` or `False`); if it is `True`, all statements in the indented code body (the "suite") are executed.

- example comparison operators, which produce booleans: `<`, `>`, `<=`, `==`, `!=`


In [34]:
# review on boolean logic:
read_gene_1 = 56
threshold = 5

read_gene_1 > threshold 

True

In [35]:
# use if statement to output a statement if read is above a threshold.

read_gene_1 = 56
threshold = 5



Gene_1 will be included in the analysis.
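
A minimal sketch of an `if` statement that would print the line shown above:

```python
read_gene_1 = 56
threshold = 5

# The indented line runs only when the condition is True
if read_gene_1 > threshold:
    print('Gene_1 will be included in the analysis.')
```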


#### __Python Selection with two branches: `if/else`__

> __Python format:__

``` python
#---------------------------------

if boolean expression: # evaluate the boolean; if True, run code body 1
    code body 1
else:                  # if False, run code body 2
    code body 2

#---------------------------------

```


In [None]:
# use if/else statement to output statement for inclusion of a gene.

read_gene_1 = 56
read_gene_2 = 3

threshold = 5

if read_gene_2 > threshold:
    print('Gene_2 will be included in the analysis.')
else:
    print('We disregard gene_2 in the analysis.')

#### __Python Selection with multiple branches: `if/elif/else`__

> __Python format:__

``` python
#---------------------------------

if boolean expression A:   # evaluate A; if True, run code body 1
    code body 1
elif boolean expression B: # otherwise evaluate B; if True, run code body 2
    code body 2
else:                      # if both A and B are False, run code body 3
    code body 3

#---------------------------------

```


In [None]:
# use if/elif/else statement:

read_gene_1 = 56
read_gene_2 = 3
read_gene_3 = 300

threshold = 5
high_exp = 100

if read_gene_3 > high_exp:
    print('Gene_3 is considered as highly expressed.')

elif read_gene_3 > threshold:
    print('Gene_3 will be included in the analysis.')
    
else:
    print('We disregard gene_3 in the analysis.')

-----

### Repetition (loop)

#### __`while` and `for` Statements__

__while statement__:

– repeats a set of statements while some condition is `True`

– more general repetition construct

__for statement__:

– useful for iteration, moving through all the elements of data structure, one at a time

<br/>


> __Python format:__

``` python
#---------------------------------

flag = True # define a variable tracker

while flag:

    code body

# exit when flag is no longer true

#---------------------------------

```

- A `while` loop repeats the statements in the suite for as long as the boolean is `True` (or its Python equivalent)

- if the Boolean expression never changes during the course of the loop, the loop will continue __forever__


In [None]:
# find cells with intensity that passed the threshold:
fluorescent_intesnity = [10.1, 15,3, 78.5, 0.3, 46.9, 0.2, 0.7, 2.2, 12.5, 33.2]

passed_index = []
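
One possible `while` loop for this task is sketched below. The cutoff of 20 is an assumption (it is the value used in the NumPy one-liner later in this lecture), and the counter `i` is introduced for illustration:

```python
fluorescent_intesnity = [10.1, 15,3, 78.5, 0.3, 46.9, 0.2, 0.7, 2.2, 12.5, 33.2]
threshold = 20   # assumed cutoff

passed_index = []

i = 0                                    # initialize the loop variable outside the loop
while i < len(fluorescent_intesnity):    # condition becomes False after the last element
    if fluorescent_intesnity[i] > threshold:
        passed_index.append(i)           # remember the position of this cell
    i += 1                               # change the state so the loop can end

print(passed_index)   # [3, 5, 10]
```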


__General Approach to a `while` loop__:

__i)__ outside the loop, initialize the variable(s) used in the boolean expression

__ii)__ somewhere inside the loop, perform some operation which changes the state of the program, eventually leading to a False boolean and exiting the loop

Must have both!

<br/>


#### __`for` loop and iteration__

The `for` statement iterates through each element of a collection (list, etc.)

> __Python format:__

``` python
#---------------------------------

for element in collection:

    code body

#---------------------------------

```

We have actually already encountered collections and their elements: 
- a string and its characters;
- a list and the objects in the list;
- a dictionary and the key-value pairs in the dictionary (iterating over a dictionary directly yields its keys).

Those objects are __"iterable"__: capable of returning their members one at a time, permitting them to be iterated over in a `for` loop. 

In [12]:
# iterate over genes to decide whether to include them based on reads:

read_gene_1 = 56
read_gene_2 = 3
read_gene_3 = 300
read_gene_4 = 129

read_list = [56, 3, 300, 129]

threshold = 5
high_exp = 100
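
A sketch of how the loop could combine `for` with `if/elif/else`. The list `labels` and the use of `enumerate(..., start=1)` to number the genes are choices made for this example:

```python
read_list = [56, 3, 300, 129]

threshold = 5
high_exp = 100

labels = []

# enumerate(..., start=1) yields (1, 56), (2, 3), (3, 300), (4, 129)
for gene_number, reads in enumerate(read_list, start=1):
    if reads > high_exp:
        label = 'highly expressed'
    elif reads > threshold:
        label = 'included'
    else:
        label = 'disregarded'
    labels.append(label)
    print('Gene_%d: %s' % (gene_number, label))
```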



There are other objects that are iterable and can be unpacked into lists:

In [None]:
full_name = 'Barbara McClintock'

# example iterable object classes:

# range()
v_iter = range(10) 
print(type(v_iter)) 
print('iterable:', range(10)) 

# make the range() a list:
v_list = list(range(10))
print(type(v_list))
print('list:    ', v_list, '\n')

# enumerate():
e_iter = enumerate(full_name)
print(type(e_iter))
print('iterable:', e_iter)

# make the enumerate() a list:
e_list = list(enumerate(full_name))
print(type(e_list))
print('list:    ', e_list, '\n')

# dictionary:
d = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
print(type(d.keys()))
print('iterable:', d.keys())
print(type(list(d.keys())))
print('list:    ', list(d.keys()))

In [None]:
# find cells with intensity that passed the threshold, 2nd way:

fluorescent_intesnity = [10.1, 15,3, 78.5, 0.3, 46.9, 0.2, 0.7, 2.2, 12.5, 33.2]

passed_index = []



[3, 5, 10]
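
The output above is consistent with a `for` loop over `enumerate()`, sketched here under the assumption that the cutoff is 20 (the value used in the NumPy one-liner later in this lecture):

```python
fluorescent_intesnity = [10.1, 15,3, 78.5, 0.3, 46.9, 0.2, 0.7, 2.2, 12.5, 33.2]
threshold = 20   # assumed cutoff

passed_index = []

# enumerate() yields (index, value) pairs, so no manual counter is needed
for index, intensity in enumerate(fluorescent_intesnity):
    if intensity > threshold:
        passed_index.append(index)

print(passed_index)   # [3, 5, 10]
```

Compared with the `while` version, there is no counter to initialize or update, so there is no risk of an endless loop.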



#### __Exercise 2__
__Exercise 2.1__: Fill in the code to complete the `while` loop and find the day on which the activity of P32 drops below the threshold.

In [None]:
# Assume you have a vial of P32
half_life = 14.3 # days

# Initially, the vial is at 100% activity
current_activity = 100

# As long as it has ~10% activity, it's still good to use for radioactive gels
min_activity = 10

# Compute how many days the vial is good for before it needs to be thrown out
num_days = 0

while #### (1) fill in the code here ####

    # exponential decay 
    current_activity /= 2**(1/half_life) 

    # keep track of the days:
     #### (2) fill in the code here ####
    
print('P32 activity will be reduced to %.1f%% by day %d.'%(current_activity, num_days))

__Exercise 2.2__: Use a `for` loop and a Python dictionary to translate the sequence in Exercise 1 (there are many ways to achieve the same thing!)

In [None]:
# Answer here

<br/>

--------

You can do this in one line with a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html)! Try it on your own!

It can also be an easy task using __vectorized computations__ with `numpy`.

__NumPy__ stands for Numerical Python. NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.



In [43]:
import numpy as np

import matplotlib.pyplot as plt # library for plotting 

%matplotlib inline 
# plots will be displayed inline in the Jupyter notebook

What we previously needed to do in a loop becomes a one-liner:

In [None]:
np.argwhere(np.array(fluorescent_intesnity) > 20)

If we are dealing with multiple dimensions and very large data sets, computation with NumPy arrays is much faster than with lists.

We can apply math functions to all entries in an n-dimensional array (see numpy [math functions](https://numpy.org/doc/stable/reference/routines.math.html)):

In [45]:
# create a mesh of x values:

# find the sine of each x:
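
A sketch of these two steps using `np.linspace` and `np.sin`. The range (0 to 2π) and the number of points (100) are arbitrary choices for illustration:

```python
import numpy as np

# 100 evenly spaced x values from 0 to 2*pi (range and count are arbitrary choices)
x = np.linspace(0, 2 * np.pi, 100)

# np.sin is applied to every element of x at once; no loop is needed
y = np.sin(x)

print(x.shape, y.shape)   # (100,) (100,)
```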


In [None]:
# plot x, y:

In [None]:
# find correlation coefficient matrix:


mouse_day_1 = np.array([2, 1, 1, 4, 3])
mouse_day_2 = np.array([2, 2, 1, 5, 2])
mouse_day_3 = np.array([1, 2, 1, 4, 2])

mouse_3days = np.stack((mouse_day_1, mouse_day_2, mouse_day_3), axis=0)
  
# Correlation coefficient matrix (each row of mouse_3days is treated as one variable)
print("\nCorrelation coefficient matrix of the recordings of the three days:\n", 
      np.corrcoef(mouse_3days))