# Table of Contents <a id='toc'></a>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Importing modules](#1)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[1. Import the module name, without adding all of its content to your namespace.](#2)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[2. Import the module name as an alias](#3)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3. Import specific objects from a module](#4)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Micro-Exercise](#5)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Importing your own module](#6)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Python native modules: `os`](#7)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Other python built-in modules of interest...](#8)

&nbsp;&nbsp;&nbsp;&nbsp;[Install modules needed for the upcoming notebooks](#9)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Exercises 4.1 and 4.2](#10)

&nbsp;&nbsp;&nbsp;&nbsp;[Additional Theory and Exercises](#11)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Python native modules: `time`](#12)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Bonus: the `%timeit` special function (only works in Jupyter Notebooks)](#13)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Building your own modules](#14)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[From a regular script](#15)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[From a Jupyter notebook](#16)


# Module 4 - working with modules <a id='0'></a>
---------------------------------------------------

Almost everything you'll want to do with Python has already been implemented by someone else. 
Many workflows have been developed into **modules** which can be **imported** into your Python session.

There are quite a few modules which come bundled with the basic Python installation (native modules), and even more if you installed Python via the **Anaconda distribution** (which you should have done for this course).

Additional modules can be installed to your (environment-specific) library using the <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html">`conda package manager`</a> or <a href="https://pypi.org">`pip`</a>, both of which are shipped with Anaconda. 

> **It is not advisable to mix installations via `conda` and via `pip` within a Conda environment.**  
  So it's best if you stick to using conda for the time being.

<br>

**Official python documentation**, including python native modules: https://docs.python.org/  

<br>


[back to the toc](#toc)

<br>

## Importing modules <a id='1'></a>

There are a number of ways to **import modules** into your code. Modules can be imported entirely, or partially.
Here are 3 different ways of importing a module (exemplified here with the `os` module):


[back to the toc](#toc)

<br>

### 1. Import the module name, without adding all of its content to your namespace. <a id='2'></a>
* This is the **simplest, and most frequently used**, way to import a module.
* Any object of the module (e.g. a function) must be called using the syntax: **`modulename.object`**
  as the name of the object is not directly available in the namespace - only the name of the module is.
  This is actually a good thing because:
    * It avoids adding a lot of names to the namespace (most of which we probably don't use).
    * It gives an indication of where the function/class/object was taken from.

In [None]:
import os 
print("Your operating system is:", os.name)
print("The current work dir is:\n", os.getcwd(), sep="")

In [None]:
import statistics
statistics.mean(range(101))

<br>

**Warning:** calling a function directly (in this case `mean()`) without prefixing it with its module name raises a `NameError`, because the names of individual functions are not imported into the namespace.

In [None]:
import statistics
mean(range(101))

<br>


[back to the toc](#toc)

<br>

### 2. Import the module name as an alias <a id='3'></a>
This is essentially the same as the first solution above, with the only difference that the module name is given an alias.  
This is generally used for modules with a long name.

In [None]:
import statistics as stats
stats.mean(range(11))

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), color='darkorange')   # np.sin() works directly on the whole array.
plt.show()

<br>


[back to the toc](#toc)

<br>

### 3. Import specific objects from a module <a id='4'></a>
This is useful when you only need a limited number of objects from a module.

In this example, we only import the function `getcwd()` and the attribute `name` from the `os` module:

In [None]:
from os import getcwd, name
print("The type of the operating system running this Jupyter instance is:\n ->", name)
print("The current working directory is:\n ->", getcwd())

At first, this third method may appear nicer as it leads to shorter code. However, it often **hampers code readability**: you now have a variable called `name`, but it is not obvious that it contains the type of operating system you are running on!

Therefore **the third method should be used sparingly**: only in specific cases, e.g. when you need a single function with an unambiguous name from a very large module.

Finally, it is also possible to import all the objects from a module at once, with something like 
`from os import *`. While it might again look convenient, it is in reality **bad practice**, and we only show it here so you know to **avoid it** when you see it! This is because it:
  1. Unnecessarily pollutes your namespace (i.e. creates many new names that you will not use)
  2. Can lead to unpredictable results, since the content of a module might change over time and
     you are simply importing it all without any check of what it actually is.
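
The first point can be checked without actually running a wildcard import. When a module defines an `__all__` attribute (as `os` does), it lists the names that `from module import *` would add. The short sketch below counts them and shows that one of them is `open`, so a wildcard import of `os` would silently replace Python's built-in `open()` with `os.open()`, which takes different arguments and returns a file descriptor rather than a file object. The variable names are only illustrative.

```python
import os

# Names that "from os import *" would add to the namespace.
wildcard_names = os.__all__
print("Number of names added by a wildcard import of os:", len(wildcard_names))

# 'open' is among them: the built-in open() would be replaced by os.open().
shadows_builtin_open = "open" in wildcard_names
print("Would the built-in open() be overwritten?", shadows_builtin_open)
```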


In [None]:
from pandas import *

<br>


[back to the toc](#toc)

<br>

### Micro-Exercise: <a id='5'></a>
* Compute the base 10 logarithm of the factorial of 101 (in mathematical notation: `log10(101!)`).
  **Hint:** have a look at the [math module](https://docs.python.org/3/library/math.html)

<br>


[back to the toc](#toc)

<br>

## Importing your own module <a id='6'></a>

Often it can be useful to import your **own module**, typically so you can:
* **Re-use elements** - e.g. a function that you wrote earlier.
* **Organise your code** into multiple files, e.g. your main workflow in one file, and functions 
  grouped by category in different files.

This is done exactly like with built-in and external modules:

In [None]:
import my_own_module
help(my_own_module)

In [None]:
import my_own_module
my_own_module.greeting()
my_own_module.greeting(my_own_module.DEFAULT_USER)

In [None]:
from my_own_module import greeting, DEFAULT_USER
greeting(name="Bob")
greeting(DEFAULT_USER)


In [None]:
import my_own_module as mom
mom.greeting(name="James")
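
The content of `my_own_module.py` is not shown in this notebook. As an illustration only, a file that supports the calls above could look like the sketch below. The value of `DEFAULT_USER` and the exact greeting text are assumptions, and the real file may differ.

```python
"""A small example module (hypothetical content, for illustration only)."""

# Module-level constant: can be imported like any other object.
DEFAULT_USER = "Alice"   # assumed value, not taken from the course file.


def greeting(name=DEFAULT_USER):
    """Print a greeting for the given name."""
    print("Hello " + name + "!")
```

Saved as `my_own_module.py` in the same directory as the notebook, this file could be imported with any of the three import styles shown earlier.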

<br>
<br>


[back to the toc](#toc)

<br>

## Python native modules: `os` <a id='7'></a>

The <a href="https://docs.python.org/3/library/os.html">`os`</a> module is a **native python module** (meaning it comes installed with base python) **designed to manage interactions with the operating system**.  
It greatly enhances code portability, as it allows you to run the same code on different platforms (Linux, Windows, MacOS).

Here we give an overview of a few useful functions from `os`, but there are plenty more.


**Get** and **set working directory** with:
* `os.getcwd()` - returns the current working directory.
* `os.chdir(path)` - sets the working directory to `path`.

#### Examples

In [None]:
import os

current_wd = os.getcwd()
print('Current working dir:', current_wd, '\n')

In [None]:
os.chdir('../solutions')
print('Working dir changed to:', os.getcwd(), '\n')

In [None]:
os.chdir(current_wd)
print('Working dir is now again:', os.getcwd(), '\n')

<br>

#### Manipulate files and directories:
* `os.mkdir(path)` - creates a new directory non-recursively. To create directories recursively use `os.makedirs(path)`.
* `os.rmdir(path)` - deletes `path` if it is an empty directory.
* `os.remove(path)` - deletes the file `path` (does not delete directories, even if empty).
* `os.listdir(path)` - lists the content (files and directories) of `path`.
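
As a quick illustration of the functions listed above, the following sketch creates nested directories and a file, lists them, and then removes them again. To avoid touching your own files, everything happens inside a temporary directory created with the standard `tempfile` module (not part of `os`). The directory and file names are arbitrary examples.

```python
import os
import tempfile

# Work inside a temporary directory that is deleted automatically at the end.
with tempfile.TemporaryDirectory() as tmp_root:
    # os.makedirs() creates "level_1" and "level_2" in one call.
    nested_dir = os.path.join(tmp_root, "level_1", "level_2")
    os.makedirs(nested_dir)

    # Create a small file so that there is something to list and remove.
    file_path = os.path.join(nested_dir, "notes.txt")
    with open(file_path, "w") as f:
        f.write("hello\n")

    content_before = os.listdir(nested_dir)
    print("Content of level_2:", content_before)

    os.remove(file_path)     # deletes the file.
    os.rmdir(nested_dir)     # works because the directory is now empty.

    remaining = os.listdir(os.path.join(tmp_root, "level_1"))
    print("Content of level_1 after clean-up:", remaining)
```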

#### Manipulate paths:
* `os.path.basename(path)` - returns the **basename** of a path, i.e. the last element (file or dir) of a path.
* `os.path.dirname(path)` - returns the parent directory of the last element of a path.
* `os.path.isfile(path)` - returns `True` if `path` is an existing regular file (note: follows symlinks
  -> returns `True` for symlinks).
* `os.path.isdir(path)` - returns `True` if `path` is an existing directory.
* `os.path.join(path1, path2, ...)` - returns a new path by appending all paths passed as arguments one after the other.
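
These path functions only manipulate strings, so they can be tried on a path that does not exist. The sketch below uses a made-up path (`project/data/reads.fastq` is purely hypothetical) to show how `join`, `dirname` and `basename` fit together. The separator in the printed path depends on your operating system.

```python
import os

# Build a path from its components (hypothetical names, nothing is created on disk).
file_path = os.path.join("project", "data", "reads.fastq")
print("Full path       :", file_path)

# Split it again into parent directory and last element.
parent = os.path.dirname(file_path)
base = os.path.basename(file_path)
print("Parent directory:", parent)
print("Basename        :", base)

# Joining both parts gives back the original path.
rebuilt = os.path.join(parent, base)
print("Rebuilt path identical to original:", rebuilt == file_path)
```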

<br>

#### Examples:
* Get the parent directory of a file or directory:

In [None]:
current_wd = os.getcwd()
print("Current working directory:", current_wd)

parent_dir = os.path.dirname(current_wd)
print("Its parent directory is  :", parent_dir)

* List content of a directory:

In [None]:
print(os.listdir(current_wd))

* Test if a path is a file or directory:

In [None]:
path_1 = os.path.join(current_wd, "00_jupyter_setup.ipynb")
path_2 = os.path.join(current_wd, "data")
path_3 = os.path.join(current_wd, "exam_solution.py")

for file_path in (path_1, path_2, path_3):
    
    print("Is '", os.path.basename(file_path), "' a file?  ", sep="", end="")
    
    if os.path.isfile(file_path):
        print("Yes, it is!")
    elif os.path.isdir(file_path):
        print("No, it's a directory.")
    else:
        print("Looks like this file does not exist!")

<br>

* Example of a function that lists the content of a directory.
  Can be used as **inspiration for exercise 4.1**

In [None]:
import os

def list_files_from_dir(path, show_hidden=False):
    """Prints files and directories found at a given path.
    Hidden files (names starting with '.') are skipped unless show_hidden is True.
    """
    
    # Print name of input directory:
    input_summary = "Content of directory: " + os.path.basename(path)
    if show_hidden:
        input_summary += " (including hidden files)"
    print(input_summary)
    
    # Print files in the directory.
    if not os.listdir(path):
        print(" - directory is empty")
        
    for f in os.listdir(path):
        if not f.startswith(".") or show_hidden:
            print(" -", f)
    
    # This simply adds an empty line at the end of the output.                        
    print("\n", end="")
    
    
# List files in the parent of the current working directory.
parent_dir = os.path.dirname(os.getcwd())
list_files_from_dir(parent_dir)
list_files_from_dir(parent_dir, show_hidden=True)

files_orig = os.listdir(path='.')


# Create a new directory:
new_dir = os.path.join(parent_dir, 'tmp_dir')
os.mkdir(new_dir)
list_files_from_dir(new_dir)
os.rmdir(new_dir)                # this removes the newly created directory


<br>


[back to the toc](#toc)

<br>

### Other python built-in modules of interest... <a id='8'></a>

There are [many more modules](https://docs.python.org/3/py-modindex.html) included in the basic python distribution. Here are a few of them:
* **time**: access and measure time (see Additional Theory section below).
* **argparse**: to manage Unix-style command-line options for your scripts.
* **random**: to generate random numbers with various common distributions.
* **collections**: contains some useful container classes.
* **itertools**: useful iterators. A must-have for combinatorics (e.g. permutations, combinations, ...).
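
To give a flavour of two of these modules, here is a small sketch that counts nucleotides with `collections.Counter` and lists all pairs of distinct nucleotides with `itertools.combinations`. The DNA sequence is just an arbitrary example.

```python
from collections import Counter
from itertools import combinations

sequence = "ATAGAGCGATCGATCCCTAG"

# Counter builds a dictionary-like object mapping each character to its count.
counts = Counter(sequence)
print("Nucleotide counts:", dict(counts))

# combinations() yields every unordered pair of distinct elements.
pairs = list(combinations("ACGT", 2))
print("Nucleotide pairs:", pairs)
```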

<br>
<br>


[back to the toc](#toc)

<br>

# Install modules needed for the upcoming notebooks <a id='9'></a>
----------------------------------------------------------------------------------

In the coming lessons, we will introduce you to several well-known Python libraries that are particularly useful when doing bioinformatics or (biological) data analysis.

* **matplotlib:** creating graphics and images with python.
* **pandas:** efficient and easy data tables (DataFrame) reading, writing and manipulation.
  Lets you manipulate tabular data in a similar way to R dataframes.
* **biopython:** for manipulating biological sequences, database records and phylogenetic trees.
* **numpy and scipy:** efficient matrix manipulation, and statistical functions.

Use the following code to ensure every library is properly installed:


In [None]:
# Note: you may comment-out any library you are not interested in.

import Bio            # biopython : bioinformatics in python.
import matplotlib     # create high-quality plots.
import numpy          # powerful array structure for fast numerical computation.
import scipy          # scientific computing package, with linear algebra and statistical tests.
import pandas         # powerful DataFrame structure that mimics R dataframe. A must for data analysis.
print('All libraries imported successfully')

If any of these fail, install them by typing the relevant command (see below) in your **terminal (Linux and MacOS)**, or the **conda console (Windows)** - not in the Jupyter Notebook:

* **biopython:** type `conda install -c anaconda biopython`
* **matplotlib:** follow instructions from [here](https://github.com/conda-forge/matplotlib-feedstock#installing-matplotlib-suite)
* **pandas:** type `conda install pandas`
* **scipy:** type `conda install scipy` 
* **numpy:** type `conda install numpy`
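
If you prefer to see all missing libraries at once rather than stopping at the first failed import, one possible approach is to ask Python whether each module can be found, without importing it. The sketch below uses `importlib.util.find_spec()` from the standard library. The list of names is the one used above (note that biopython is imported under the name `Bio`).

```python
import importlib.util

# Import names of the libraries used in the upcoming notebooks.
required_modules = ["Bio", "matplotlib", "numpy", "scipy", "pandas"]

# find_spec() returns None when a top-level module cannot be found.
missing_modules = []
for module_name in required_modules:
    if importlib.util.find_spec(module_name) is None:
        missing_modules.append(module_name)

if missing_modules:
    print("Missing modules:", ", ".join(missing_modules))
else:
    print("All libraries are installed")
```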

<br>
<br>


[back to the toc](#toc)

<br>

## Exercises 4.1 and 4.2 <a id='10'></a>


<br>
<br>
<br>



[back to the toc](#toc)

<br>

# Additional Theory and Exercises <a id='11'></a>
--------------------------------------------------


[back to the toc](#toc)

<br>

## Python native modules: `time` <a id='12'></a>

The <a href="https://docs.python.org/3/library/time.html">`time`</a> module is designed to measure and format time. It is very useful to monitor code execution times, e.g. when doing optimization.  
Here are a few interesting functions from the `time` module:
* `time.time()` - returns the **time in seconds since the epoch** as a floating point number.
  The epoch is the point from when the time starts (for your computer!), and is platform dependent.
  For Unix, the epoch is January 1, 1970, 00:00:00 (UTC - Coordinated Universal Time - the same as GMT).
* `time.gmtime()` - transforms the number of seconds given by `time.time()` into a human readable 
  UTC **`struct_time`** object.
* `time.localtime()` - same as `.gmtime()` but transforms to local time.
* `time.asctime(struct_time)` - formats `struct_time` objects into a nice string.

In [None]:
import time

current_time = time.time()
print("The current time is:", current_time)
print("Oh, sorry, I forgot you are a mere human... \nLet me convert that for you:", 
      time.asctime(time.localtime(current_time)), '\n')

Let's have a look at the `struct_time` object.

In [None]:
current_time_struct = time.localtime(current_time) 
print("This is the structure returned by 'localtime()' and 'gmtime()':\n", current_time_struct, "\n")

Let's look at what the epoch is for your system:

In [None]:
print("The current Epoch is:", time.asctime(time.gmtime(0)))
print("The current Epoch is:", time.asctime(time.localtime(0)), "(in local time)")

<br>

#### Example: use the `time` module to measure the execution time of some code

In [None]:
# Implementation version 1 of the reverse complement function: uses if... else...
def reverse_complement_v1(seq):
    """Returns the reverse complement of a DNA sequence"""
    
    reversed_seq = "" 
    for nucleotide in seq:
        if nucleotide == 'A':
            reversed_seq += 'T'
        elif nucleotide == 'T':
            reversed_seq += 'A'
        elif nucleotide == 'G':
            reversed_seq += 'C'
        elif nucleotide == 'C':
            reversed_seq += 'G'
        else:
            pass
    
    return reversed_seq[::-1]


# Implementation version 2 of the reverse complement function: uses a dictionary.
def reverse_complement_v2(seq):
    """Returns the reverse complement of a DNA sequence"""
    
    nucleotide_complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
    reversed_complement = ""
    for nucleotide in seq[::-1]:
        reversed_complement += nucleotide_complements[nucleotide]
    
    return reversed_complement


Let's benchmark our 2 implementations:

To compute the time needed to run a function, we proceed as follows:
1. Store the current time at start of execution in the "start_time" variable.
2. Run our function.
3. Compute the runtime by comparing the current time before and after the function is run.

In [None]:
import time 

test_sequence = "ATAGAGCGATCGATCCCTAG"

start_time = time.time()                          
revcomp_v1 = reverse_complement_v1(test_sequence)
time_v1 = time.time() - start_time
print(time_v1)

start_time = time.time()
revcomp_v2 = reverse_complement_v2(test_sequence)
time_v2 = time.time() - start_time
print(time_v2)
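
A single measurement on such a short sequence can be noisy, and `time.time()` may not have enough resolution for very short runtimes. As a possible refinement, the sketch below uses `time.perf_counter()`, a clock from the same module intended for measuring short durations, and keeps the best of several runs. The helper `best_runtime()` and the compact `reverse_complement()` stand-in are illustrative names, not part of the course code.

```python
import time


def reverse_complement(seq):
    """Stand-in function to time: returns the reverse complement of a DNA sequence."""
    complements = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return "".join(complements[nucleotide] for nucleotide in reversed(seq))


def best_runtime(function, argument, repeats=5):
    """Run function(argument) several times and return the shortest runtime in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        function(argument)
        timings.append(time.perf_counter() - start)
    return min(timings)


test_sequence = "ATAGAGCGATCGATCCCTAG" * 1000
best = best_runtime(reverse_complement, test_sequence)
print("Best of 5 runs:", round(best, 6), "seconds")
```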

In [None]:
def time_it(function, argument):
    """Compute runtime of a function. NB: Only works with functions that take a single argument"""
    start_time = time.time()
    function(argument)
    return time.time() - start_time


test_sequence_patterns = ["ATAGAGCGATCGATCCCTAG",
                          "AAAAAAAAAAAAAAAAAAAA",
                          "CCCCCCCCCCCCCCCCCCCC"]

for sequence_pattern in test_sequence_patterns:
    print("Starting benchmark for pattern:", sequence_pattern)
    
    for sequence_length in (1e3, 1e6, 1e8):
        test_sequence = sequence_pattern * int(sequence_length / 20)
        
        time_v1 = time_it(reverse_complement_v1, test_sequence)
        time_v2 = time_it(reverse_complement_v2, test_sequence)
        
        print("Benchmark sequence length:", len(test_sequence))
        print("Time method 1 (uses if else):", round(time_v1, 5))
        print("Time method 2 (uses dict)   :", round(time_v2, 5))
        print("Time ratio: dict method is", round(time_v1/time_v2, 2), "times faster.\n")
    

<br>


[back to the toc](#toc)

<br>

### Bonus: the `%timeit` special function (only works in Jupyter Notebooks) <a id='13'></a>

`%timeit` is a so-called ["magic function"](https://ipython.readthedocs.io/en/stable/interactive/tutorial.html?highlight=timeit#magic-functions) for the IPython shell (the shell in which Notebooks run). See [here for details](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).

* usage: `%timeit my_function(...)`

Let's compare the two implementations of the GC content computation from exercise 2.3:

In [None]:
from solutions.solution_23 import get_GC_percent, get_GC_percent_2
# get_GC_percent   => single "for nucleotide in seq" loop.
# get_GC_percent_2 => seq.count("G") + seq.count("C")

sequence_pattern = "ATAGAGCGATCGATCCCTAG"
for sequence_length in (1e3, 1e6):
    test_sequence = sequence_pattern * int(sequence_length / len(sequence_pattern))
    print("Sequence length:", len(test_sequence))    
    
    print("-> single 'for' loop:            ", end='')
    %timeit get_GC_percent(test_sequence)
    
    print("-> double call to 'str.count()': ", end='')
    %timeit get_GC_percent_2(test_sequence)
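
Outside of Jupyter, similar measurements can be made with the standard `timeit` module. The sketch below is self-contained: the two GC content functions are simplified stand-ins written for this example (the actual solutions of exercise 2.3 may differ), and the number of repetitions is an arbitrary choice.

```python
import timeit


def gc_percent_loop(seq):
    """GC percentage computed with a single 'for' loop (illustrative stand-in)."""
    gc_count = 0
    for nucleotide in seq:
        if nucleotide in "GC":
            gc_count += 1
    return 100 * gc_count / len(seq)


def gc_percent_count(seq):
    """GC percentage computed with two calls to str.count() (illustrative stand-in)."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


test_sequence = "ATAGAGCGATCGATCCCTAG" * 50

# timeit.timeit() calls the function 'number' times and returns the total time in seconds.
time_loop = timeit.timeit(lambda: gc_percent_loop(test_sequence), number=1000)
time_count = timeit.timeit(lambda: gc_percent_count(test_sequence), number=1000)

print("-> single 'for' loop           :", round(time_loop, 4), "seconds for 1000 calls")
print("-> double call to 'str.count()':", round(time_count, 4), "seconds for 1000 calls")
```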


<br>



[back to the toc](#toc)

<br>

## Building your own modules <a id='14'></a>

Building your own module in python is fairly easy.


[back to the toc](#toc)

<br>

### From a regular script <a id='15'></a>
Any python script - i.e. a plain text file with `.py` extension and some python code in it - can be imported as a module. The only restriction is that the imported module must satisfy one of the following conditions:
 * Be in the same directory as the code that imports it.
 * Have been installed with anaconda: [here's an idea on how to do this](https://stackoverflow.com/questions/49474575/how-to-install-my-own-python-module-package-via-conda-and-watch-its-changes)
 * Be in a directory listed in the environment variable `PYTHONPATH` : [windows](https://docs.python.org/3/using/windows.html#excursus-setting-environment-variables), [UNIX-like](https://stackoverflow.com/a/3402176)
 
You can learn more about creating modules in this [python3 module online tutorial](https://docs.python.org/3/tutorial/modules.html).
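
The role of the module search path can be seen in a small experiment. The sketch below writes a tiny `.py` file into a temporary directory, adds that directory to `sys.path` (the list of locations Python searches when importing), and then imports the file as a module. The module name `tmp_demo_module` and its content are made up for this example. In practice you would simply place your file next to your script or set `PYTHONPATH`.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as module_dir:
    # Write a minimal module file (hypothetical content).
    module_path = os.path.join(module_dir, "tmp_demo_module.py")
    with open(module_path, "w") as f:
        f.write("DEFAULT_USER = 'Alice'\n")
        f.write("def greeting(name=DEFAULT_USER):\n")
        f.write("    return 'Hello ' + name\n")

    # Make the directory visible to the import system, then import the module.
    sys.path.insert(0, module_dir)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module("tmp_demo_module")
        message = demo.greeting()
    finally:
        # Restore the search path to its previous state.
        sys.path.remove(module_dir)

print(message)
```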


[back to the toc](#toc)

<br>

### From a Jupyter notebook <a id='16'></a>
Although it is a bit tricky, you can import a Jupyter notebook as a module, so that you may re-use the functions you have coded in it.

E.g., to import a Jupyter Notebook named `MyOtherNotebook.ipynb`, you can use the following syntax that uses the `%run` "magic" command:
* `%run MyOtherNotebook.ipynb`

If you want to import a Notebook into a classical script, the [import-ipynb](https://pypi.org/project/import-ipynb/) module is what you are looking for.

In [None]:
%run 01_python_basics.ipynb