This course will teach core computing skills as well as project-specific approaches. Each student will develop and complete a research project, targeting journal article submission by the end of the quarter. There will be an emphasis on developing habits that increase automation, which in turn facilitates reproducibility. The primary course platform will be GitHub, with each student creating their own repository.
Tuesday 3:00-4:20, FSH 213; Thursday 9:30-11:20, FSH 213
| Week | Description | Reading | Quiz & Project Progress (PP) | Recordings |
|---|---|---|---|---|
| | Biology, Course Framework, Getting Set Up | Preface xiii-xxv; How to Learn Bioinformatics 1-18; Roberts and Gavery (2018) Opportunities in Functional Genomics: A Primer on Lab and Computational Aspects | | |
| one | Bash, version control, Project Set-up | Setting Up and Managing a Bioinformatics Project 21-35; Remedial Unix Shell 37-54 | | |
| two | Jupyter, Annotation | Retrieving Bioinformatics Data 109-124; Unix Data Tools 125-168 | Questions | video-Th |
| three | Projects | Working with Sequence Data 339-354 | Questions | |
| four | FastQC | Git for Scientists 67-83 | Questions | |
| five | cloud resources | Working with Remote Machines 57-66 | Questions | |
| six | R and find_xargs | Bioinformatics Shell Scripting, Writing Pipelines 395-423 | Questions | |
| seven | Genome Browser | Working with Alignment Data 355-383; Working with Range Data 329-338 | Questions | video-Tues |
| eight | Holiday | 🌽 🍰 💻 | No Questions | video-Tues |
| nine | Visualize | Considering best ways to summarize your effort | Questions | |
🔺 subject to change based on guest availability
Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools
By Vince Buffalo
Publisher: O'Reilly Media
Final Release Date: July 2015
- Quizzes (10✖️3) = 30, due Friday at midnight each week
- Project Progress (10✖️3) = 30, due Friday at midnight each week
- Draft Product (Week 5) = 15
- Final Product (Week 10) = 25
Please create an account with the following online services:
Please review these webpages:
Computing Environments and Software
This course will be taught using students' personal computers. This approach (as opposed to using virtual machines in the cloud) has both advantages and disadvantages.
Any modern laptop should work fine. There will be some analyses that we will not complete during the course (given time constraints); however, students should clearly understand how to carry them out. We will also introduce students to cloud-based options. Generally speaking, what we will be doing is more straightforward on Unix-based machines (Linux and Mac OS X), though we will also show students Windows-centric solutions.
A good text editor will be very useful. There are several built-in options, with nano recommended by Software Carpentry. For this course I suggest stand-alone applications.
We will use Markdown. Below are some recommended Markdown editors; the text editors above would also work.
- Jupyter will also work (see below).
We will be using the command line, specifically the Bash shell. The setup information below for different operating systems is taken from the Software Carpentry website.
The Bash Shell
Bash is a commonly used shell that lets you carry out simple tasks more quickly.
Windows
Download the Git for Windows installer. Run the installer. This will provide you with both Git and Bash in the Git Bash program.
- Click on "Next".
- Click on "Next".
- Keep "Use Git from the Windows Command Prompt" selected and click on "Next". If you forget to do this, programs that you need for the course will not work properly. If this happens, rerun the installer and select the appropriate option.
- Click on "Next".
- Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
- Keep "Use Windows' default console window" selected and click on "Next".
- Click on "Install".
- Click on "Finish".
If your "HOME" environment variable is not set (or you don't know what this is):
- Open a command prompt (open the Start Menu, then type `cmd` and press [Enter]).
- Type the following line into the command prompt window exactly as shown: `setx HOME "%USERPROFILE%"`
- Press [Enter]; you should see `SUCCESS: Specified value was saved.`
- Quit the command prompt by typing `exit`, then pressing [Enter].
Mac OS X
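Bash is included with every version of Mac OS X, so there is no need to install anything. Since macOS Catalina (10.15), the default shell in new Terminal windows is zsh; type `bash` to start a Bash session. You access the shell from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this course.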
Linux
The default shell is usually Bash, but if your machine is set up differently you can run it by opening a terminal and typing `bash`.
Note: once Jupyter is installed, you should be able to run a Bash shell from within Jupyter on any platform.
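As a quick post-installation check, the minimal Bash sketch below reports whether `bash` and `git` can be found on your PATH. It relies on the standard `command -v` builtin; the function name `check_tools` and the tool list are illustrative assumptions, so adjust them to match what you have installed. On Windows, run it from Git Bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named command is on the PATH.
check_tools() {
  local tool
  for tool in "$@"; do
    # `command -v` succeeds (exit status 0) only if the command can be found
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: NOT found"
    fi
  done
}

# The tool list is an assumption; add others (e.g. R) as you install them
check_tools bash git
```

Anything reported as NOT found should be installed (or added to your PATH) before it is needed in class.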
GitHub Local Clients
We will be using GitHub, a web-based Git repository hosting service. It offers the distributed revision control of Git and adds its own features.
- GitHub Desktop is available for Mac and Windows
Jupyter (formerly IPython Notebook)
The newest version of BLAST+ for all operating systems is available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis.
The installation information below is taken from the Software Carpentry website.
Linux
You can download the binary files for your distribution from CRAN, or you can use your package manager (e.g., for Debian/Ubuntu run `sudo apt-get install r-base` and for Fedora run `sudo yum install R`). Also, please install the RStudio IDE.
Introductory R Material
If you are new to R or would like a refresher, have a look at the material for Data Science for SAFS, and check out R for Data Science by Garrett Grolemund and Hadley Wickham. A version of the textbook is available free online: http://r4ds.had.co.nz/
"Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks"
Available for download at https://github.com/arq5x/bedtools2/releases. It is likely only available for Linux and Mac OS X.
"high-performance visualization tool for interactive exploration of large, integrated genomic datasets"
To download the software you will need to register. See https://www.broadinstitute.org/software/igv/log-in.
Here is a list of free web services we will likely use during the course: