# Your First Jupyter Notebook

This cell renders Markdown code to text. You can change the cell's type with the drop-down menu in the menu bar.

You can format Python code (or other languages) within Markdown text using backticks. 

Double-click the cell to enter edit mode. 

Shift-Enter will "execute" the cell's code.

```python
greetings = "Hello, world!"
```

During the course, feel free to add to the existing cells and create new cells. The explanations and subsequent discussion are not presented exhaustively here. You should customize your notes as you wish, provided this results in a clear record of everything we worked on. 

You can always download this notebook again, but it's probably a good moment to **make a copy**.

In [None]:
greetings = "Hello, world!"
print(greetings)

# Configuring Jupyter - Working Directory

The easiest solution to starting Jupyter Notebook in a particular working directory:

* open a terminal (Anaconda Prompt in Windows, or Unix-type shell) and change to the directory of interest
* execute `jupyter notebook`

Alternatively you can create and use a config file. Open a shell or command prompt (cmd in Windows) and type:
    
    jupyter notebook --generate-config
    
This will create a configuration file. On Windows this normally lives at: 

    C:\Users\<username>\.jupyter\jupyter_notebook_config.py

And on Unix-type machines this normally lives at:

    ~/.jupyter/jupyter_notebook_config.py

Find the following line (usually 179):

    #c.NotebookApp.notebook_dir = ''

And change this to:

    c.NotebookApp.notebook_dir = '<your_path>'

Be sure to use forward slashes in your path.

Yet another approach is to use a Python module called `os` (more on modules later):

    import os
    os.chdir('<your_path>')

# Configuring Jupyter - Default Browser

If you prefer a specific browser for use with Jupyter Notebook, you can add the path to the browser executable/App in the config file as per above. 

The relevant line you are looking for is:

    c.NotebookApp.browser
    
On OSX, you could use the following to switch to Chrome:

    c.NotebookApp.browser = u'open -a /Applications/Google\ Chrome.app %s'
    
Otherwise just set the path to the executable of your favourite browser.

# Conda Notes - envs and packages

It should go without saying - *read the docs!*

https://conda.io/docs/user-guide

## Managing environments

Before creating any new environments, it is a good idea to install the [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels) module. This will automatically add any new Python conda environments (or other Jupyter-compatible kernels) to your Jupyter Notebook sessions:

    conda install nb_conda_kernels

Creating an environment with a specific version of Python:

    conda create --name myenv python=2.7.11
    
To avoid any potential conflicts between modules, it is good practice to build the environment with all the modules you will need at the same time. *eg:*

    conda create --name myenv python=3.4 numpy scipy matplotlib pandas scikit-learn

Which environments are installed?

    conda info --envs
    
Which packages are installed?

    conda list --name myenv
    
Activating and deactivating an environment (Windows):

    # from the Anaconda Prompt
    activate myenv
    deactivate
    
Activating and deactivating an environment (Unix-type):

    source activate myenv
    source deactivate

## Installing modules/packages

There are two package managers associated with Python - conda (bundled with Anaconda) and pip. They can be used together, but with care. A common recommendation is to install what you can with conda, which also manages non-Python library dependencies (helpful for packages that rely on compiled libraries), and to fall back on pip for packages that are not available through conda.

    conda install mypkg
    pip install mypkg

In [None]:
# https://github.com/conda/conda/issues/8197
conda config --set channel_priority strict

# Data and Container Types

Python uses essentially the same data and container types as other programming languages. The basic types include:

* `bool` - the boolean constants `True` and `False`
* `int` - integer numbers, 1, -8, etc
* `float` - double-precision floating point numbers, 3.14, -0.5, etc
* `str` - text, sequences of Unicode characters (in Python 3)
* `list` - a "mutable" list of Python "objects"
* `dict` - a dictionary of "key" and "value" pairs
* `set` - a "unique" set of elements
* `tuple` - an "immutable" list of Python objects

Lists, tuples, and dicts are data containers/structures for storing data of various types. These are essential in Python, and we'll spend quite some time working with them.

Everything in Python is an "object". We will continue to discuss what this means in the context of "object-oriented programming".
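
As a small illustration of the types listed above, the built-in `type()` function reports the class of any object. This is only a sketch; the variable names and example values are made up for the purpose.

```python
# One example value for each basic type (values are illustrative only)
examples = [
    True,                      # bool
    42,                        # int
    3.14,                      # float
    "ATGC",                    # str
    ["A", "T", "G", "C"],      # list
    {"EcoRI": "GAATTC"},       # dict
    {"A", "T", "G", "C"},      # set
    ("A", "T"),                # tuple
]

# type() returns the class of an object; __name__ is the short class name
type_names = []
for value in examples:
    type_names.append(type(value).__name__)

print(type_names)
```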

# Operators

There are several types of operators in Python, including arithmetic, comparison, assignment, logical, and a few other special cases.

## Arithmetic

* +
* -
* `*`
* /
* `**`
* % (modulus, or the remainder)


## Comparison

* ==
* !=
* >
* <
* `>=`
* `<=`


## Assignment

* =
* +=
* -=
* `*=`
* /=
* `**=`
* %=


## Logical and Others

* and
* or
* is
* is not
* in
* not in
* % (string formatting context)
* :
* #
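
The following sketch exercises a few operators from each of the groups above. The variables `a`, `b` and `total` are arbitrary names chosen for the example.

```python
a = 7
b = 2

# Arithmetic
print(a + b)    # 9
print(a / b)    # 3.5
print(a ** b)   # 49
print(a % b)    # 1

# Comparison
print(a > b)    # True
print(a == b)   # False

# Assignment
total = a
total += b      # same as total = total + b
print(total)    # 9

# Logical and membership
print(a > 5 and b > 5)     # False
print("G" in "ATGC")       # True
print("U" not in "ATGC")   # True
```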


In [None]:
0.1 + 0.2 == 0.3

In [None]:
0.1 + 0.2

In [None]:
"hello" == 'hello'

# Strings

Strings are sequences of characters (letters, digits, spaces, special characters). Strings should be surrounded with double or single quotation marks. Usually double quotation marks are used; more on this later when we discuss style and "Pythonic" code.

Strings are particularly relevant for biology - we often find ourselves trying to **interpret** and **parse** the output of a particular piece of software, and potentially modify the output to a format specific to our needs/questions.

```python
myGeneSeq = "ATGCGACTGATCGATCGATCGATCGATGATCGATCGATCGATGCTAGCTAC"

myString = "Hoi zaeme"

myPhrase = "He's dead, Jim"
```

Multi-line strings can be delimited by three quotes:

```python
MontyPy = """Let me tell you something, my lad. 
When you’re walking home tonight and some great 
homicidal maniac comes after you with a bunch 
of loganberries, don’t come crying to me!"""
```

Strings can be combined using the + operator:

```python
myPhrase = "This " + "is " + "really " + "happening"
```


Strings are often formatted with the special % operator:

```python
"This is day %d of %d of your first %s course" % (1, 2, "Python")
```
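
For comparison, the same sentence can also be built with the `str.format()` method or, from Python 3.6 onwards, with an f-string. This is a short sketch; the variable names are chosen only for illustration.

```python
day = 1
total_days = 2
topic = "Python"

# Three ways of building the same string
with_percent = "This is day %d of %d of your first %s course" % (day, total_days, topic)
with_format = "This is day {} of {} of your first {} course".format(day, total_days, topic)
with_fstring = f"This is day {day} of {total_days} of your first {topic} course"

print(with_percent)
print(with_percent == with_format == with_fstring)
```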



## String Literals

String literals can contain escape sequences, sometimes referred to as special characters, for example tabs and newlines. How these special characters are encoded depends on the operating system and a number of other factors. Sometimes this can cause problems, particularly with newline characters. To write these special characters, you typically use the backslash character to "escape" a character which would otherwise have a different meaning. 
```python
print("tab" + "\t" + "delimited")
print("Let's put a newline" + "\n" + "in between")
```

In [None]:
print("Let's put a newline" + "\n" + "in between")

In [None]:
print("tab" + "\t" + "delimited")

In [None]:
greeting = "I want to print some quotes \"\" "
print(greeting)

# Functions

Functions **encapsulate** a sequence of commands or operations. "Arguments" can be supplied to functions, usually in the form of variable objects which the function acts upon.

A function is executed by calling it by name, followed by round brackets wherein the arguments are supplied.

```python
print(111111111*111111111)
```

equivalent to:

```python
a = 111111111
b = 111111111
print(a*b)
```

The most important Python function:

```python
help(max)
```

## Python Built-in Functions (as of v3.6, https://docs.python.org/3/library/functions.html):

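| | | | | |
|---|---|---|---|---|
| `abs()` | `dict()` | `help()` | `min()` | `setattr()` |
| `all()` | `dir()` | `hex()` | `next()` | `slice()` |
| `any()` | `divmod()` | `id()` | `object()` | `sorted()` |
| `ascii()` | `enumerate()` | `input()` | `oct()` | `staticmethod()` |
| `bin()` | `eval()` | `int()` | `open()` | `str()` |
| `bool()` | `exec()` | `isinstance()` | `ord()` | `sum()` |
| `bytearray()` | `filter()` | `issubclass()` | `pow()` | `super()` |
| `bytes()` | `float()` | `iter()` | `print()` | `tuple()` |
| `callable()` | `format()` | `len()` | `property()` | `type()` |
| `chr()` | `frozenset()` | `list()` | `range()` | `vars()` |
| `classmethod()` | `getattr()` | `locals()` | `repr()` | `zip()` |
| `compile()` | `globals()` | `map()` | `reversed()` | `__import__()` |
| `complex()` | `hasattr()` | `max()` | `round()` | |
| `delattr()` | `hash()` | `memoryview()` | `set()` | |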

## Building your own functions

Of course, you will often want to define your own functions. 

```python
def greeting(name):

    print("Hello %s" % name)
```

The function starts with the `def` statement. The colon operator (`:`) is used to indicate the start of the **"block"** of code belonging to the function. Code blocks are **delimited by spaces or indentation** - this is essential in Python. Regardless of whether you use four spaces or a tab, it is important to be consistent!

After you've defined your function, you can call it like all other functions in Python:

```python
greeting("Jim Bob")
```

It is good practice to document your functions. You can do this by adding a "docstring" at the start of your function definition:

```python
def greeting(name):

    """
    This function greets you by name, because you not-so-secretly love hearing it.
    """
    
    print("Hello %s" % name)

help(greeting)
```

Results of a function can be returned as an object using the `return` statement:

```python
def greeting(name):

    """
    This function returns your greeting by name, because you not-so-secretly love hearing it.
    """
    
    return "Hello %s" % name

myGreeting = greeting("Jim Bob")

print(myGreeting)
```

In [None]:
def greeting(name):

    """
    This function greets you by name, because you not-so-secretly love hearing it.
    """

    print("Hello %s" % name)

help(greeting)

In [None]:
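# Placeholder functions: each body (...) still needs to be written
def readData(path):
    ...
    
def calculations(data):
    ...
    
def writeReport(results):
    ...
    
    
path2data = "<your_path>"
myData = readData(path2data)
myResults = calculations(myData)
writeReport(myResults)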


# Object Types - Strings and Numerical

What happens if we try to print the following?

```python
print("The sum of 1 + 2 equals " + (1+2))
```

How do we determine the type of a Python object?

How can we "coerce" object types?
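
As a starting point for both questions, the built-in `type()` function reports the type of an object, and the built-ins `str()`, `int()` and `float()` convert between types. The variable names in this sketch are illustrative only.

```python
total = 1 + 2
print(type(total))          # <class 'int'>
print(type("3"))            # <class 'str'>

# Coerce between types with the built-in str(), int() and float()
as_text = str(total)        # int -> str
as_int = int("42")          # str -> int
as_float = float("0.5")     # str -> float

print("The sum of 1 + 2 equals " + as_text)
print(as_int + 1)           # 43
print(as_float * 2)         # 1.0
```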

In [None]:
print("The sum of 1 + 2 equals " + str(1+2))

In [None]:
print(1 + "somestring")

# Exercise 1

Write a function that accepts two (short) DNA sequences and a name for the sequence, and prints the FASTA-formatted version of the two sequences concatenated together. Include the length of the concatenated DNA sequence in the FASTA header.

# Methods

Methods are a special case of functions - they can be applied to only certain "classes" of objects.

We can determine the methods available for a particular object class using `dir()`

```python
geneFrag1 = "ATGCTCGACTAGCTACGACTAGCATCCGCGAGCGATCAGCT"
dir(geneFrag1)
```

Hmm, count() seems like it could be useful! Let's apply this method to our DNA sequence.

```python
geneFrag1 = "ATGCTCGACTAGCTACGACTAGCATCCGCGAGCGATCAGCT"
geneFrag1.count("A")
```

The syntax for methods is slightly different - they are connected/attached to an object using the dot notation.
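
To make the difference in syntax concrete, the sketch below calls a built-in function and two string methods on the same sequence. The choice of `lower()` and `startswith()` is arbitrary; any string method is called the same way.

```python
geneFrag1 = "ATGCTCGACTAGCTACGACTAGCATCCGCGAGCGATCAGCT"

# A function takes the object as an argument ...
print(len(geneFrag1))

# ... whereas a method is attached to the object with a dot
lower_case = geneFrag1.lower()
has_start_codon = geneFrag1.startswith("ATG")

print(lower_case)
print(has_start_codon)      # True
```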

# Exercise 2

Write a function that accepts a DNA sequence as an argument and returns the %GC content of the sequence.

# Exercise 3

Use the find() method to determine the position(s) of the motif "CTCGA":
    
GTGCCCCTCGAGAGGAGGGCGCGCGCCGCGCGCTCGACGCGATCGGCGCTCAGCGAGCGAGCTCCTCGAAGCGATCCGCGCGCGCT

What is a limitation here?

# Exercise 4

Use the replace() method to create the complement of the above sequence.

# Lists

Lists can be created and modified in a few ways:

```python
plants = ["Solanum lycopersicum","Solanum melongena","Solanum tuberosum"]

print(plants[0])

print(plants[(len(plants)-1)])

plants.append("Capsicum chinense")

print(plants[(len(plants)-1)])
```

Strings can also be coerced to lists:

```python
word = "supercalifragilisticexpialidocious"

word_as_list = list(word)

print(word_as_list[0:5])
```



## List subsetting/slicing

You saw in the above example how to return a subset of a list using indices enclosed in square brackets `[]`. You can also reassign list elements in this way:

```python
word = "supercalifragilisticexpialidocious"

word_as_list = list(word)

word_as_list[0] = 'd'

print(word_as_list[0:5])

print(word_as_list[-7:])
```

Lists are referred to as "mutable" because we can modify or "mutate" them.




## Three-argument subsetting of lists

```python
seq = "ATGCTCAGCTGTACGATCGTAGCA"

print(seq[::3])
print(seq[::-1])

```
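
The three arguments inside the square brackets are `start:stop:step`, and any of them can be left out. The sketch below spells this out on the same sequence; the variable names are chosen for illustration.

```python
seq = "ATGCTCAGCTGTACGATCGTAGCA"

first_codon = seq[0:3]        # start at index 0, stop before index 3
every_third = seq[::3]        # whole sequence, in steps of 3
reversed_seq = seq[::-1]      # a negative step walks backwards

print(first_codon)            # ATG
print(every_third)
print(reversed_seq)

# The same syntax works on lists
bases = list(seq)
print(bases[0:6:2])           # ['A', 'G', 'T']
```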

## Other useful list-related methods

Another very handy way to generate lists is by "splitting" a string or other data.

```python
wordy = """Dans l'attente de votre réponse, je vous prie d'agréer, 
Madame, Monsieur, l'expression de nos sentiments distingués"""

wordyList = wordy.split()

print(wordyList)
```

By default (without arguments) the `split()` method will use any whitespace as a delimiter. You can also specify any arbitrary delimiter. To split a string into every character, simply coerce it with `list()`.
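
A brief sketch of splitting on a specific delimiter, here a comma. The `record` string is invented for the example.

```python
record = "EcoRI,GAATTC,Escherichia coli"

# Split on commas instead of whitespace
fields = record.split(",")
print(fields)            # ['EcoRI', 'GAATTC', 'Escherichia coli']

# Split a string into single characters with list()
characters = list("ATGC")
print(characters)        # ['A', 'T', 'G', 'C']
```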

Several methods work "in place", meaning they modify the object they are attached to instead of returning a modified copy. For example, `pop()` removes the last element from the list itself (and returns the removed element):

```python
plants = ["Solanum lycopersicum","Solanum melongena","Solanum tuberosum"]
plants.pop()
print(plants)
```

VS

```python
seq1_lower = "atcg"
seq1_upper = seq1_lower.upper()
print(seq1_upper)
```


### What other methods apply to lists?

# Exercise 5

Print a tab-delimited output of the species in the following list (hint, use the join method):
    
```python
plants = ["Solanum lycopersicum","Solanum melongena","Solanum tuberosum"]
```

# Exercise 6

Write a function that returns the reverse complement of the following sequence:
    
GTGCCCCTCGAGAGGAGGGCGCGCGCCGCGCGCTCGACGCGATCGGCGCTCAGCGAGCGAGCTCCTCGAAGCGATCCGCGCGCGCT

# Sets

Sets are similar to lists, however they store **unique** elements.

```python
nameSet = set(["Larry","Mo","Curly","Larry","Mo","Curly"])

print(nameSet)
```

Note the difference in syntax, with the square brackets within the round brackets. This reflects one of the common applications of sets, which is to derive the unique elements of a list.

```python
nameList = ["Larry","Mo","Curly","Larry","Mo","Curly"]

nameSet = set(nameList)

print(nameSet)
```

**How can we add an element to a set?**

# Dictionaries

This class of data container is very useful for storing **"paired data"**. Consider, for example:

* gene or protein names and the corresponding sequence(s)
* gene or protein names and the corresponding annotation(s)
* restriction enzymes and their motifs
* etc

These are examples of **"key-value pairs"**. Normally the key is something akin to a name, and the value is some data that is paired to the key/name.

```python
enzymes = { 'EcoRI':'GAATTC', 'AvaII':'GG(A|T)CC', 'BisI':'GC[ATGC]GC' }

enzymes = {
    'EcoRI' : 'GAATTC',
    'AvaII' : 'GG(A|T)CC',
    'BisI'  : 'GC[ATGC]GC'
}

print(enzymes['EcoRI'])
```

This is similar to how we access data in lists, although instead of using an integer index we reference by key.

What happens if we try to access a non-existent key-value pair?

```python
print(enzymes['XbaI'])
```

There are a few ways to deal with this situation:

```python
if 'XbaI' in enzymes:
    print(enzymes['XbaI'])
    
else:
    print("RE does not exist in dict")
```

```python
print(enzymes.get('EcoRI', "NA"))
print(enzymes.get('XbaI', "NA"))
```

**What other useful methods apply to dicts?**

# Control Flow

There are several ways to control the flow of your code using logical statements. Above you saw how to implement a simple `if`/`else` control flow. You can add additional conditions using `elif`:

```python
if 'XbaI' in enzymes:
    print(enzymes['XbaI'])
    
elif 'EcoRI' in enzymes:
    print(enzymes['EcoRI'])
    
else:
    print("Couldn't find your RE")
```

## What is returned from the following code snippets?

* ```python
  1000 is 10**3
  ```
* ```python
  1000==10**3
  ```
* ```python
  print(1000!=3)
  ```
* ```python
  1000==10**3 and 1000 < 1001
  ```

# Loops

Now that you have a sense of the common data types and containers at your disposal in Python, we will learn how to efficiently **populate** and work with these structures.

If you ever find yourself in the following situation, you're doing it wrong:

```python
enzymes = {
    'EcoRI' : 'GAATTC',
    'AvaII' : 'GG(A|T)CC',
    'BisI'  : 'GC[ATGC]GC'
}

print(enzymes['EcoRI'])
print(enzymes['AvaII'])
print(enzymes['BisI'])
```

## Iteration with for loops

```python
for name in enzymes:
    print(name + "\t" + enzymes[name])
```


```python
mySeq = 'GTGCCCCTCGAGAGGAGGGCGCGCGCCGCGCGCTCGACGCGATCGGCGCTCAGCGAGCGAGCTCCTCGAAGCGATCCGCGCGCGCT'

for char in mySeq:
    print(char)
```

**How can one iterate over the values of a dict?**

When iterating over lists, one often encounters "list comprehensions". These are just another syntax of for loops which generate a list. Consider the following equivalent approaches:

In [None]:
dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG']

# generate a list of lengths using procedural code
lengths = []
for dna in dna_list:
    lengths.append(len(dna))
print(lengths)

In [None]:
# do the same with a list comprehension
lengths = [len(dna) for dna in dna_list]
print(lengths)

In [None]:
# do the same with map()
lengths = map(len, dna_list)
print(type(lengths))
print(list(lengths))

## Iterating over sorted dictionaries

Dictionary key-value pairs are kept in the order in which they were inserted (guaranteed since Python 3.7), which is not necessarily meaningful for human interpretation. However, we can sort dictionaries by keys and then retrieve the associated values in a particular order:

```python
SortedKeys = sorted(enzymes.keys())
```

We can then loop over the sorted keys (`sorted(enzymes)` gives the same result, so we don't need to write the `.keys()`):

```python
for EnzymeName in SortedKeys:
    print(EnzymeName, enzymes[EnzymeName])
```

To get a list containing sorted values we do:

```python
SortedValues = sorted(enzymes.values())
```

If we need the key-value pairs sorted by values it is a bit more complicated (we'll come back to lambda functions later):

```python
SortedValuesAsPairs = sorted(enzymes.items(), key=lambda x: x[1])
```

Again we are getting back a list, this time of `(key, value)` tuples.

The sorting order can be reversed by passing the `reverse=True` argument to `sorted()`.
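
The sketch below applies `reverse=True` to the same `enzymes` dictionary, once when sorting by key and once when sorting the pairs by value. The name `pairs_by_value` is chosen for the example.

```python
enzymes = {
    'EcoRI' : 'GAATTC',
    'AvaII' : 'GG(A|T)CC',
    'BisI'  : 'GC[ATGC]GC'
}

# Keys in reverse alphabetical order
for name in sorted(enzymes, reverse=True):
    print(name + "\t" + enzymes[name])

# Key-value pairs sorted by value, in descending order
pairs_by_value = sorted(enzymes.items(), key=lambda x: x[1], reverse=True)
print(pairs_by_value)
```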

## Iteration with while loops

```python
x = 10
y = 1

while y < x:

    print(y)
    y += 1
```

**Be sure to include a condition that ends your while loop**. What happens if you don't?

```python
x = range(100)
y = 1

while y < max(x):

    y += 1

print(y)
```

Other ways to alter code flow through loops include `break` and `continue`:

```python
x = range(100)
y = 1

while y < max(x):

    y = y*2

    if y > 50:
        break
    
    else:
        continue

print(y)
```

**What other arguments can be supplied to `range()`?**

# Iterable Objects

This is an important concept for "functional programming". Formally, iterable objects in Python have the `__iter__` method, which returns an iterator, or the `__getitem__` method, which lets Python step through the elements by index. An iterator has the `__next__` method, which allows you to efficiently loop through the elements of the object. 

tl;dr: an iterable can be looped over using the following form:

```python
for element in iterable:
    # do something with element
```

Iterable objects include strings, lists, dicts, tuples, sets, ranges, etc. You can read more about this topic here: https://docs.python.org/dev/howto/functional.html#functional-howto-iterators
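
To see what a `for` loop does behind the scenes, you can request an iterator yourself with the built-in `iter()` and step through it with `next()`. This is a minimal sketch; the list `bases` is an arbitrary example.

```python
bases = ["A", "T", "G"]

# iter() asks the iterable for an iterator (it calls bases.__iter__())
base_iterator = iter(bases)

# next() fetches one element at a time (it calls base_iterator.__next__())
first = next(base_iterator)
second = next(base_iterator)
third = next(base_iterator)
print(first, second, third)

# When the elements run out, next() raises StopIteration;
# a for loop catches this for you and simply ends
try:
    next(base_iterator)
except StopIteration:
    print("No more elements")
```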

# Exercise 7

From the following DNA sequence, return a 5-mer sequence using a sliding window of 4 bases:

`ACGATCGATGCATGCTAGCTAGTTTATATGCGAGGCGATGCTAGTGATCGCGAGCGTACGCTAGCTAGTCGATGCCGGATCGAGCGTCGAT`

# Exercise 8

Write a function that iterates over a DNA sequence and counts and reports the frequency of each base.

# Modules

Almost everything you'll want to do with Python has already been implemented by someone else. Many workflows have been developed into **modules** which can be **imported** into your Python session.

Check out: https://pypi.org

There are quite a few modules which come bundled with basic Python installations (the standard library). Additional modules can be installed to your (environment-specific) library using Anaconda, or from the command line using `conda` or `pip`. **It is not advisable to mix `conda` and `pip` within one Python environment.**

```python
import this
```

In [None]:
import sys

In [None]:
import os

In [None]:
print(type(os))

In [None]:
print(os.getcwd())

In [None]:
0.1 + 0.2

In [None]:
varA = 0.1 + 0.2
varB = 0.3
varA == varB

In [None]:
import math
    
math.isclose(varA, varB, abs_tol=0.01)

In [None]:
import this

**Any Python code can be imported, so you can also write your own library of functions!**

https://docs.python.org/3/tutorial/modules.html

# Exercise 9

Write a module which includes a function to generate the reverse complement of a DNA sequence. Import the module into your Jupyter Notebook session and generate the reverse complement of the following sequence:

`GCCACCCGTAGCTGGGGCGTAGCTAGTGTCGAGGCGAGCGGCGGCAGTCGATGCTAGCCTAGCATGCTGCTAGTGATAAAAAAATTTGG`

# Nested Data Structures

Dictionaries can store lists, lists can store dictionaries, dictionaries can store dictionaries, and lists can store lists! Basically, you can flexibly "nest" your data structures. This requires some careful thought regarding design.

```python
nest1 = { "A":[0,1,2], "B":[3,4,5], "C":[6,7,8] }
```
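
Values in a nested structure are reached by chaining the square brackets, one pair per level. A short sketch using the same `nest1` dictionary:

```python
nest1 = { "A":[0,1,2], "B":[3,4,5], "C":[6,7,8] }

# First index by key to get the list, then by position within the list
print(nest1["B"])        # [3, 4, 5]
print(nest1["B"][1])     # 4

# Loop over the keys and print the sum of each list
for key in nest1:
    print(key + "\t" + str(sum(nest1[key])))
```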


Imagine we are interested in a large number of proteins which we are characterizing biochemically (using robots, or many pipetting thumbs ;). We have the following information:

* Protein ID
* Protein MW
* Protein annotation
* Three numerical measurements for an assay

How might you structure this data in Python?

We will return later to implementing nested data structures.

```python
persons = [
        {'name': 'Naomi', 'age': 32, 'sex': 'F', 'status': 'Single'},
        {'name': 'Jane', 'age': 29, 'sex': 'F', 'status': 'Married'},
        {'name': 'Brian', 'age': 23, 'sex': 'M', 'status': 'Single'}
    ]

for person in persons:
    print(person['name'] + "\t" + str(person['age']) + "\t"
    + person['sex'] + "\t" + person['status'])
```

# Recap of Concepts

* components of a "computer"
* abstraction
* encapsulation
* functions vs methods
* code blocks
* object class
* iteration
* modularity