Steven Roberts edited this page Dec 7, 2018 · 54 revisions

This course will teach core computing skills as well as project-specific approaches. Each student will develop and complete a research project targeting journal article submission by the end of the Quarter. There will be an emphasis on developing habits that increase automation, which in turn will facilitate reproducibility. The primary course platform will be GitHub, with each student creating their own repository.

T   3:00-4:20	FSH 213    
Th	9:30-11:20	FSH 213
| Week | Description | Reading | Quiz & PP | Recordings |
|------|-------------|---------|-----------|------------|
|  | Biology, Course Framework, Getting set-up | Preface xiii-xxv; How to Learn Bioinformatics 1-18; Roberts and Gavery (2018) Opportunities in Functional Genomics: A Primer on Lab and Computational Aspects | Questions | video-Th |
| one | Bash, version control, Project Set-up | Setting Up and Managing a Bioinformatics Project 21-35; Remedial Unix Shell 37-54 |  |  |
| two | Jupyter, Annotation | Retrieving Bioinformatics Data 109-124, Unix Data tools 125-168 | Questions | video-Th |
| three | Projects | Working with Sequence Data 339-354 | Questions | video-Tues |
| four | FastQC | Git for Scientists 67-83 | Questions | video-Tues |
| five | cloud resources | Working with Remote Machines 57-66 | Questions |  |
| six | R and find_xargs | Bioinformatics Shell Scripting, Writing Pipelines 395-423 | Questions |  |
| seven | Genome Browser | Working with Alignment Data 355-383, Working with Range Data 329-338 | Questions | video-Tues |
| eight | Holiday 🦃 🦃 🌽 🍰 💻 |  | No Questions | video-Tues |
| nine | Visualize | Considering best ways to summarize your effort | Questions | video-Tues |
| ten | Projects | Presentations | Questions | video-Tues |

🔺 subject to change based on guest availability


Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools
By Vince Buffalo
Publisher: O'Reilly Media
Final Release Date: July 2015
Pages: 538



  • Quizzes (10✖️3) = 30 DUE Friday Midnight Weekly
  • Project Progress (10✖️3) = 30 DUE Friday Midnight Weekly
  • Draft Product (Week 5) = 15
  • Final Product (Week 10) = 25
    ➡️ conversion

Getting Started

Please get an account with the following online services:

Please review these webpages:

Computing Environments and Software

This course will be taught using students' personal computers. This approach (as opposed to using virtual machines in the cloud) has its advantages and disadvantages.

Any modern laptop should work fine. There will be some analyses that we will not complete during the course (given time constraints); however, students should be able to clearly understand how to carry out the analysis. We will also introduce students to cloud-based options. Generally speaking, what we will be doing is more straightforward on Unix-based machines (Linux and Mac OS X), though we will also show students Windows-centric solutions.

Text Editors

A good text editor will be very useful. There are several built-in options, with nano recommended by Software Carpentry. For this course I suggest standalone applications.


Mac OS X


Markdown Editors

We will use Markdown. Below are some recommended editors. The text editors listed above would also work.



  • Jupyter will work - see below.


Mac OS X

We will be using the "command-line", specifically the Bash shell. Below are setup instructions for different operating systems, taken from the Software Carpentry website.

The Bash Shell

Bash is a commonly-used shell that gives you the power to do simple tasks more quickly.


Download the Git for Windows installer. Run the installer. This will provide you with both Git and Bash in the Git Bash program.

Detailed Instructions

Video Tutorial
Download the Git for Windows installer.
Run the installer and follow the steps below:

  1. Click on "Next".
  2. Click on "Next".
  3. Keep "Use Git from the Windows Command Prompt" selected and click on "Next". If you forget to do this, programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.
  4. Click on "Next".
  5. Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
  6. Keep "Use Windows' default console window" selected and click on "Next".
  7. Click on "Install".
  8. Click on "Finish".

If your "HOME" environment variable is not set (or you don't know what this is):

  1. Open command prompt (open the Start Menu, then type cmd and press [Enter]).
  2. Type the following line into the command prompt window exactly as shown: `setx HOME "%USERPROFILE%"`
  3. Press [Enter]. You should see `SUCCESS: Specified value was saved.`
  4. Quit command prompt by typing exit and then pressing [Enter].

This will provide you with both Git and Bash in the Git Bash program.
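To confirm that the HOME setting took effect, a quick check from a newly opened Git Bash window might look like the sketch below (a value saved with setx is only seen by windows opened afterwards). The function name `check_home` is a hypothetical helper made up for this example, not part of Git for Windows.

```shell
# check_home is a hypothetical helper name used only in this sketch.
# It reports whether the HOME environment variable is set and non-empty.
check_home() {
  if [ -n "${HOME:-}" ]; then
    echo "HOME is set to: $HOME"
  else
    echo "HOME is not set"
    return 1
  fi
}

# Run the check; "|| true" keeps the session going even if HOME is missing.
check_home || true
```

If the message says HOME is not set, repeating the setx steps above and opening a fresh Git Bash window is a reasonable next step.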

Mac OS X

The default shell in all versions of Mac OS X is bash, so no need to install anything. You access bash from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this workshop.


The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing bash.

Note that you should be able to run a Bash shell on any platform from within Jupyter, once it is installed.
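On any of the platforms above, a short check like the following can show which shell you are actually in. This is only a sketch: `shell_report` is a hypothetical function name, and the SHELL variable may be unset in some environments, which the code allows for.

```shell
# shell_report is a hypothetical helper name used only in this sketch.
shell_report() {
  # SHELL normally holds the login shell for the account; fall back to "unknown" if unset.
  echo "Login shell: ${SHELL:-unknown}"

  # BASH_VERSION is only set when the shell running this code is Bash.
  if [ -n "${BASH_VERSION:-}" ]; then
    echo "Running Bash version $BASH_VERSION"
  else
    echo "Not running Bash; type bash to start it"
  fi
}

shell_report
```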

GitHub Local Clients

We will be using GitHub, a Web-based Git repository hosting service. It offers the distributed revision control of Git and adds its own features.
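Whichever client you choose, it can help to confirm from the shell that Git itself is installed and knows who you are. The sketch below assumes Git is on your PATH; `git_report` is a hypothetical helper name, and the check only reads your configuration without changing it.

```shell
# git_report is a hypothetical helper name used only in this sketch.
git_report() {
  if command -v git >/dev/null 2>&1; then
    # Print the installed Git version.
    git --version
    # Print the configured global user name; git exits non-zero when none is set.
    git config --global user.name || echo "user.name is not set"
  else
    echo "git not found on PATH"
  fi
}

git_report
```

If the name is not set, `git config --global user.name "Your Name"` (with your own name substituted) is the usual way to record it.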


Formerly IPython Notebook

Installation instructions are available here. If you are new to Python and Jupyter, it is recommended you use Anaconda.


The newest version of BLAST+ for all operating systems is available @


R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis.

Below are installation instructions for different operating systems, taken from the Software Carpentry website.


Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE.

Mac OS X

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.


You can download the binary files for your distribution from CRAN, or you can use your package manager (e.g. for Debian/Ubuntu run `sudo apt-get install r-base`, and for Fedora run `sudo yum install R`). Also, please install the RStudio IDE.

Introductory R Material

If you are new to R or would like a refresher, have a look at the material for Data Science for SAFS, and check out R for Data Science by Garrett Grolemund and Hadley Wickham. A version of the textbook is available free online:


"Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks"

Available for download @

Likely only available for Linux and Mac OS.
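Once the command-line tools described so far are installed, a loop like the one below can report which of them your shell can find. Treat it as a sketch under assumptions: `check_tools` is a hypothetical helper, and the command names checked (`jupyter`, `blastn` from BLAST+, `Rscript` from R, and `bedtools`) are the usual executable names, which may differ on your system.

```shell
# check_tools is a hypothetical helper name used only in this sketch.
# It prints one line per command name, saying whether it was found on PATH.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
    fi
  done
}

# Assumed executable names: jupyter, blastn (BLAST+), Rscript (R), bedtools.
check_tools jupyter blastn Rscript bedtools
```

A tool reported as missing may simply not be on your PATH yet, so opening a new terminal after installing is worth trying before reinstalling.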


"high-performance visualization tool for interactive exploration of large, integrated genomic datasets"

To download the software you will need to register. See

Here is a list of free web services we will likely use during the course