# Lab 1: Computational Frameworks for Evolutionary Genomics

## Learning objectives

* Become aware of the field of computational evolutionary genetics
* Learn to use the Unix or OS X terminal or the Windows Command Interpreter
* Learn to install and run Python programs on your computer

## Overview

In recent years, the field of evolutionary genetics has shifted towards requiring some knowledge of R, Python, Perl or C, as well as the use of high performance computers available at national computing centers (which often requires some fundamental Unix skills) to work with the large data sets typical of evolutionary genetics research. While there are many great software packages available for particular computational problems in evolutionary biology, many programs do not have a graphical user interface (e.g. drop-down menus) and are run from the command line. The lab sessions in this course have been designed to give students an introduction to working with Python code in addition to learning some standalone software packages. A complementary course, Advanced Genetics, introduces the R programming language.

### What is Bioinformatics?
Bioinformatics is the field of science in which biology, computer science, statistics and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics:

* The development of new algorithms and statistics with which to assess relationships among members of large data sets. 
* The development and implementation of tools that enable efficient access and management of different types of information. 
* The analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures.

(Paraphrased from an old NCBI web site.)

### Bioinformatics...
* is a term coined in response to the high demand for techniques and resources for handling the explosion of molecular data.
* is a buzzword to describe a growing field.
* benefits from the physicists, chemists and mathematicians crossing over into biology.
* is a collection of tools.
* is a way of thinking about a problem.

### The Development and Implementation of Tools
In order to make new algorithms and data sources available to biologists, someone needs to write applications that include these algorithms and create new databases. Often this is first done by academic research groups and later redone by private companies once the market is large and profitable enough. There is a large gap between what is done by research groups and by companies. Sometimes this gap is filled by large government-funded projects, but usually not in time for most researchers. This is why bioinformatics and programming skills have become very valuable.

### Computer Operating Systems
Nearly everything we do in this course can be done in Windows, Apple's OS X, Linux or Unix, on most desktop or laptop computers sold in the past few years. One of the goals of this course is for you to be able to set up an environment to program and run bioinformatics tools on your own computer or the computers in your research laboratory. Some of your course projects may involve using a high performance computing center.

### Python
Python is open source and multi-platform (e.g. GNU/Linux, Microsoft Windows, Mac OS X). It is a popular programming language in bioinformatics and is also widely used in other areas of biology and in engineering disciplines. Python is an interpreted language and comes with its own interpreter. It can be used interactively inside the Python shell, which is considered one of Python's strengths since it encourages "exploratory computing": the programmer can try out simple steps and algorithms before attempting to write functions and modules. Python has several mature third-party open source libraries, notably NumPy/SciPy for numerical operations, Cython for low-level optimization in C, IPython for interactive work, and Matplotlib for plotting. Here are a few tutorials and courses for learning Python.

* <a href="http://www.greenteapress.com/thinkpython/thinkpython.html" target="_blank">Think Python: How to Think Like a Computer Scientist</a>
* <a href="http://interactivepython.org/runestone/static/thinkcspy/index.html" target="_blank">Think Python: How to Think Like a Computer Scientist - Interactive Version</a>
* <a href="https://www.coursera.org/course/interactivepython" target="_blank">An Introduction to Interactive Programming in Python</a> A Coursera course
* <a href="http://www.learnpython.org/" target="_blank">LearnPython</a>
* <a href="https://software-carpentry.org/lessons/" target="_blank">Software Carpentry's Tutorials</a>
* <a href="http://intro-prog-bioinfo-2012.wikispaces.com/" target="_blank">QB3 Python Bioinformatics Course 2012</a>
* <a href="http://www.programmingforbiologists.org//" target="_blank">Ethan White's Programming for Biologists</a>
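As a small illustration of the "exploratory computing" style described above, the following sketch tries out a simple step on a short sequence and then wraps it in a function. The sequence and the function name `gc_content` are made up for this example and are not part of the course material.

```python
# A made-up DNA sequence, used only for illustration.
sequence = "ATGCGCGATTACGCTAGC"

# Step 1: explore with simple expressions, as you might in the Python shell.
print(len(sequence))        # how many bases are there?
print(sequence.count("G"))  # how many of them are G?


# Step 2: once the steps work, collect them into a reusable function.
def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


print(gc_content(sequence))
```

Typing the first few lines one at a time in the Python shell or a Jupyter cell shows each result immediately, which is what makes this way of working useful for checking an idea before writing a full program.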

### SciPy
<a href="http://www.scipy.org/" target="_blank">SciPy</a> (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:

* NumPy - Base N-dimensional array package
* SciPy library - Fundamental library for scientific computing
* Matplotlib - Comprehensive 2D Plotting
* IPython - Enhanced Interactive Console
* SymPy - Symbolic mathematics
* pandas - Data structures &amp; analysis
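To give a feel for what the NumPy array package in the list above provides, here is a minimal sketch that assumes NumPy is installed (it is part of Anaconda). The gene lengths are hypothetical numbers chosen only for the example.

```python
import numpy as np

# Hypothetical gene lengths in base pairs (illustrative values only).
lengths = np.array([1200, 850, 1530, 990, 2040])

# Whole-array operations replace explicit loops.
mean_length = lengths.mean()
longest = lengths.max()

# Boolean indexing selects the genes longer than 1000 bp.
long_genes = lengths[lengths > 1000]

print(mean_length)
print(longest)
print(long_genes)
```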

### Jupyter
Problem sets in this class will be turned in as <a href="https://jupyter.org/" target="_blank">Jupyter</a> notebooks.  Jupyter provides a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. In this way, notebook files can serve as a complete computational record of an analysis and/or workflow. Notebooks may be exported to a range of static formats, including HTML, PDF, and slide shows and shared using email, Dropbox and Moodle.  

### Open Source Software

The Open Source movement treats program source code in a similar manner to the way scientists publish their results: publicly and open to unfettered examination and discussion. Examples include:

* Linux operating system
* Apache web server
* Firefox web browser
* Python and IPython
* R
* BioPerl, BioJava, BioPython
* EMBOSS, Bioconductor, Cytoscape, and many of the programs we will use in bioinformatics.

We can also look at the code to see how the authors solved the problem and what algorithms they used, and we can even use the code in our own programs as long as we properly acknowledge the source.

### Free Software

Some researchers make their software available for free, but as executables where the code is hidden. Often this software is available free to academic users, but requires a fee for commercial use. It is similar to getting a bacterial strain for free, but not being able to modify it for your research. That can be ok if the strain works for you as is.

### GitHub

GitHub (https://github.com/) has become a popular way to manage, share and view code for open source projects. You can read more about version control at http://git-scm.com/book/en/Getting-Started-About-Version-Control. Once you sign up for an account you will be able to see the introductory guide, which includes (1) Setting up Git, (2) Creating repositories, (3) Forking repositories and (4) Working together.

The tutorials created for this course will be posted on GitHub. The files can be downloaded to your computer and you can modify them to create new examples, better exercises or simply correct my typos. Since these tutorials are written as Jupyter notebooks, the GitHub files can be viewed as web pages using the Jupyter notebook viewer http://nbviewer.ipython.org/ or on your computer using Jupyter Notebook.


## On the Computer
### Text Editors

A text editor is used to type computer programs and other documents and save the contents as files. We could just use Microsoft Word or Open Office, but word processing and text editing are very different things. Word processors, for all their other strengths, are actually surprisingly weak at text manipulation. Here are some popular free text editors. Please install one of these on your computer.

#### Ubuntu (Use Synaptic Package Manager to install)

* gedit
* Kate
* Vim
* Emacs

#### Mac OS X

* <a href="https://www.barebones.com/products/bbedit/download.html" target="_blank">BBEdit</a>

#### Windows

* <a href="http://notepad-plus.sourceforge.net/uk/site.htm" target="_blank">Notepad++</a>
* <a href="http://www.vim.org/download.php" target="_blank">GVim</a>

In order to open .tar.gz files on a Windows machine, you may need to install a program that can decompress these archive files, for example <a href="http://www.rarlab.com/rar/wrar371.exe" target="_blank">WinRAR</a>. You can also use <a href="http://www.sfsu.edu/ftp/win/utils/" target="_blank">PowerArchiver</a> (freeware), <a href="http://www.7-zip.org/" target="_blank">7-zip</a> (freeware) or <a href="http://www.winzip.com/index.htm" target="_blank">WinZip</a> (commercial).
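Once Python is installed, its standard library `tarfile` module offers another way to unpack .tar.gz archives on any operating system. The following is a minimal sketch; the helper name `extract_tar_gz` is hypothetical and chosen only for this example, and it should only be used on archives from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, destination):
    """Extract a .tar.gz archive into destination and return the member names."""
    destination = Path(destination)
    # Create the output folder if it does not exist yet.
    destination.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens the file for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(destination)
    return names
```

For example, `extract_tar_gz("example.tar.gz", "example_files")` would unpack a file named `example.tar.gz` (a placeholder name) into a folder called `example_files`.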

### Installing Python

1. Download and install <a href="https://www.anaconda.com/download/#linux" target="_blank">Anaconda</a>, which includes Python 3.6, the SciPy packages, Spyder and Jupyter.



#### If you already have Python installed on your computer

Some of you may already have Python installed on your computer. Two slightly different versions of Python, version 2 and version 3, are in common use. We will use version 3.6 in this class. You may need to install <a href="http://www.scipy.org/" target="_blank">SciPy</a> and <a href="https://jupyter.org/" target="_blank">Jupyter</a> on your computer.
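One way to see what an existing installation already provides is a short script like the sketch below. The function name `check_environment` is hypothetical, the list of import names is an assumption about how these packages are usually installed, and treating any version from 3.6 upward as acceptable is also an assumption rather than a course rule.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "scipy", "matplotlib", "pandas", "jupyter")):
    """Return a dict mapping each package name to True if Python can find it."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python version:", sys.version.split()[0])
if sys.version_info < (3, 6):
    # Assumption: 3.6 is treated as the minimum version here.
    print("This class uses Python 3.6; consider installing Anaconda.")

for name, found in check_environment().items():
    print(name, "found" if found else "MISSING")
```

Save the code in a file with your text editor and run it with `python`, or paste it into a Jupyter cell. Any package reported as missing needs to be installed before the later labs.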



### Learning Unix or the Command Interpreter
Many bioinformatics programs do not have a graphical user interface (e.g. drop-down menus) and are run from the command line. If you are working on a Linux or Apple computer, follow the Lab 1A tutorial on using the Shell/Terminal. If you are working on a Windows operating system, follow the Lab 1B tutorial on the Command Interpreter.

* <a href="https://github.com/jeffreyblanchard/EvoGenV5/blob/master/EvoGenV5_Lab1A.ipynb">Lab 1A - UNIX Commands for Linux and OSX </a>
* <a href="https://github.com/jeffreyblanchard/EvoGenV5/blob/master/EvoGenV5_Lab1B.ipynb">Lab 1B - MS-DOS Commands for Windows </a>

<a href="http://www.windowsreference.com/windows-xp/dos-commands-and-equivalent-linux-commands/" target="_blank">DOS commands and equivalent Linux commands</a>