Steven Roberts edited this page Nov 29, 2016 · 45 revisions

This course will teach core computing skills as well as project-specific approaches. Each student will develop and complete a research project, targeting a journal article submission by the end of the quarter. There will be an emphasis on developing habits that increase automation, which in turn facilitates reproducibility. The primary course platform will be GitHub, with each student creating their own repository.

  • Tuesday 3:00-4:20, FSH 213
  • Thursday 9:30-11:20, FSH 213

| Week | Description | Reading | Quiz |
|------|-------------|---------|------|
| zero | Biology, Course Framework, Getting set-up | Preface xiii-xxv; How to Learn Bioinformatics 1-18 | Questions |
| one | Bash, version control, Project Set-up | Setting Up and Managing a Bioinformatics Project 21-35; Remedial Unix Shell 37-54 | Questions |
| two | Jupyter, Annotation | Retrieving Bioinformatics Data 109-124; Unix Data Tools 125-168 | Questions |
| three | Projects | Working with Sequence Data 339-354 | Questions |
| four | RNA-seq | Git for Scientists 67-83 | Questions |
| five | lncRNA, miRNA | Working with Remote Machines 57-66 | Questions |
| six | DNA methylation | Gavery Slides | Questions |
| seven | Genome Browser | Working with Alignment Data 355-383; Working with Range Data 329-338 | Questions |
| eight | Holiday :turkey: :turkey: 🌽 🍰 💻 | | |
| nine | SNP | Bioinformatics Shell Scripting, Writing Pipelines 395-423 | Questions |
| ten | Projects | | |

🔺 Subject to change based on guest availability.


Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools
By Vince Buffalo
Publisher: O'Reilly Media
Final Release Date: July 2015
Pages: 538



  • Quizzes (10✖️3) = 30, due Friday at midnight
  • Project Progress (10✖️3) = 30, due Friday at midnight
  • Draft Product (Week 5) = 15
  • Final Product (Week 10) = 25
    ➡️ conversion
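The four components above add up to 100 points. A minimal Python sketch of tallying a score against these maxima; the component keys, the `total_points` helper, and the example scores are hypothetical and not part of the course materials:

```python
# Maximum points per grading component, taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
MAX_POINTS = {
    "quizzes": 10 * 3,           # 10 x 3 points
    "project_progress": 10 * 3,  # 10 x 3 points
    "draft_product": 15,         # week 5
    "final_product": 25,         # week 10
}


def total_points(earned):
    """Return (points earned, points possible), capping each component at its maximum.

    `earned` is a hypothetical dict mapping component keys to points earned.
    """
    total = 0
    for component, maximum in MAX_POINTS.items():
        # Missing components count as zero; over-reported scores are capped.
        total += min(earned.get(component, 0), maximum)
    return total, sum(MAX_POINTS.values())


if __name__ == "__main__":
    example = {"quizzes": 27, "project_progress": 30, "draft_product": 12, "final_product": 22}
    print(total_points(example))  # (91, 100)
```

Capping each component keeps a data-entry mistake in one category from inflating the overall total.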

## Getting Started

### Please get an account with the following online services:

### Please review these web pages:

## Computing Environments and Software

This course will be taught using students' personal computers. This approach (as opposed to using virtual machines in the cloud) has both advantages and disadvantages.

Any modern laptop should work fine. Because of time constraints, some analyses will not be completed during the course; however, students should be able to clearly understand how to carry them out. We will also introduce students to cloud-based options. Generally speaking, what we will be doing is more straightforward on Unix-based machines (Linux and Mac OS X), though we will also show students Windows-centric solutions.

### Text Editors

A good text editor will be very useful. Several editors are built in (Software Carpentry recommends nano), but for this course I suggest a stand-alone application.


#### Mac OS X


### Markdown Editors

We will use Markdown. Below are some recommended editors; the text editors above will also work.



  • Jupyter will work - see below.


#### Mac OS X

### The Bash Shell

We will be using the command line, specifically the Bash shell. Bash is a commonly used shell that gives you the power to do simple tasks more quickly. The setup information below for each operating system is taken from the Software Carpentry website.


#### Windows

Download the Git for Windows installer and run it. Important: on the 6th page of the installation wizard (the page titled Configuring the terminal emulator...), select "Use Windows' default console window". This will provide you with both Git and Bash in the Git Bash program.

#### Mac OS X

The default shell in Mac OS X (through macOS Mojave) is Bash, so there is no need to install anything; macOS Catalina and later default to zsh but still include Bash. You access Bash from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this course.


#### Linux

The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing bash.

Note that once Jupyter is installed, you should be able to run Bash commands on any platform from within a notebook.
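Which of the instructions above applies depends on your operating system. A minimal Python sketch that reports it using the standard library's `platform.system()` (which returns `'Linux'`, `'Darwin'` for Mac OS X, `'Windows'`, or an empty string if the system cannot be determined); the `shell_advice` helper and its messages are illustrative assumptions that summarize this page:

```python
import platform


def shell_advice(system=None):
    """Suggest how to reach a Bash shell on the given (or current) operating system.

    `shell_advice` is a hypothetical helper; its messages summarize the setup notes above.
    """
    # platform.system() returns 'Linux', 'Darwin' (Mac OS X), 'Windows', or '' if unknown.
    system = system if system is not None else platform.system()
    if system == "Darwin":
        return "Mac OS X: open Terminal (/Applications/Utilities)"
    if system == "Linux":
        return "Linux: open a terminal and type bash if it is not the default shell"
    if system == "Windows":
        return "Windows: install Git for Windows and use the Git Bash program"
    return "Unknown system: check that a Bash shell is available"


if __name__ == "__main__":
    print(platform.system(), "->", shell_advice())
```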

### GitHub Local Clients

We will be using GitHub, a web-based Git repository hosting service. It offers the distributed revision control of Git and adds its own features.
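Each student will work in their own repository. A minimal sketch of creating a local Git repository with a first committed README by calling the `git` command-line client through Python's `subprocess` module; the `init_project_repo` helper, the README text, and the placeholder author identity passed with `-c` are assumptions (normally you would set your own name and email once with `git config --global`):

```python
import subprocess
from pathlib import Path


def init_project_repo(path):
    """Create a Git repository at `path` containing one committed README.md.

    Hypothetical helper; returns the hash of the first commit and requires `git` on the PATH.
    """
    repo = Path(path)
    repo.mkdir(parents=True, exist_ok=True)

    def git(*args):
        # Run a git command inside the repository and return its trimmed stdout.
        result = subprocess.run(
            ["git", *args], cwd=repo, check=True, capture_output=True, text=True
        )
        return result.stdout.strip()

    git("init")
    (repo / "README.md").write_text("# My research project\n")
    git("add", "README.md")
    # Placeholder identity supplied with -c so the sketch works before `git config` is set.
    git(
        "-c", "user.name=Example Student",
        "-c", "user.email=student@example.com",
        "commit", "-m", "Initial commit: add README",
    )
    return git("rev-parse", "HEAD")
```

To publish it, you would create an empty repository on GitHub, then connect and upload the local one with `git remote add origin <repository URL>` followed by `git push`.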


### Jupyter

Jupyter was formerly known as the IPython Notebook.

Installation instructions are available here. If you are new to Python and Jupyter, we recommend installing it through Anaconda.

On a Mac, there is also a stand-alone version of the notebook called Pineapple.


### BLAST+

The newest version of BLAST+ for all operating systems is available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/


### R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis.

The installation instructions below for each operating system are taken from the Software Carpentry website.


#### Windows

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE.

#### Mac OS X

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.


#### Linux

You can download the binary files for your distribution from CRAN, or use your package manager (e.g., for Debian/Ubuntu run `sudo apt-get install r-base`, and for Fedora run `sudo yum install R`). Also, please install the RStudio IDE.

### bedtools
"Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks"

Available for download at https://github.com/arq5x/bedtools2/releases. It is likely only available for Linux and Mac OS X.
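bedtools works on genomic intervals such as BED records, which use 0-based, half-open coordinates (the start is included, the end is excluded). A minimal pure-Python sketch of the overlap test behind an interval intersection, for illustration only; it is not how bedtools is implemented, and the `(chrom, start, end)` tuple layout and the `overlaps`/`intersect` helpers are assumptions:

```python
def overlaps(a, b):
    """True if two BED-style intervals (chrom, start, end) share at least one base.

    Coordinates are 0-based and half-open, so [10, 20) and [20, 30) do not overlap.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(a_records, b_records):
    """Return the shared portion of every overlapping pair of intervals.

    A naive all-pairs loop, similar in spirit to what `bedtools intersect` reports by default.
    """
    hits = []
    for a in a_records:
        for b in b_records:
            if overlaps(a, b):
                hits.append((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return hits


if __name__ == "__main__":
    genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    peaks = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 40, 45)]
    print(intersect(genes, peaks))  # [('chr1', 150, 200), ('chr2', 40, 45)]
```

The all-pairs loop is quadratic; real tools sort intervals or use index structures so that large genomic files remain tractable.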

### IGV (Integrative Genomics Viewer)
"high-performance visualization tool for interactive exploration of large, integrated genomic datasets"

To download the software you will need to register. See https://www.broadinstitute.org/software/igv/log-in.

### More Software

We will use a number of programs that we might not run at full production scale during the course, given time and/or processor constraints. You are welcome to install these to become familiar with their parameters.
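Before class it can be useful to confirm which of the tools described on this page are reachable from your shell. A minimal sketch using Python's `shutil.which`; the executable names listed (`bash`, `git`, `R`, `Rscript`, `jupyter`, `blastn` from BLAST+, and `bedtools`) are assumptions about a typical install, and tools such as IGV or RStudio are usually launched as desktop applications rather than found on the PATH:

```python
import shutil

# Command-line executables for software described on this page (assumed names).
TOOLS = ["bash", "git", "R", "Rscript", "jupyter", "blastn", "bedtools"]


def check_tools(tools=TOOLS):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in check_tools().items():
        print(f"{name:10} {'found at ' + path if path else 'NOT FOUND'}")
```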

Here is a list of free web services we will likely use during the course: