Permalink
Branch: master
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
146 lines (89 sloc) 6.49 KB

BioC 2015: Where Software and Biology Connect

The properly rendered version of this document can be found at Read The Docs.

If you are reading this on github, you should instead click here.

This workshop was presented at the annual Bioconductor Developer's Conference.

Google has some pretty amazing big data computational "hammers" that they have been applying to search and video data for a long time. In this workshop we take those same hammers and apply them to whole genome sequences.

We will work with both the :doc:`/use_cases/discover_public_data/1000_genomes` reads and variants and also the :doc:`/use_cases/discover_public_data/platinum_genomes` gVCF variants.

We do this all from the comfort of the R prompt using common packages including `VariantAnnotation`_, `ggbio`_, `ggplot2`_, `dplyr`_, `bigrquery`_, and the new Bioconductor package `GoogleGenomics`_ which provides an R interface to Google's implementation of the `Global Alliance for Genomics and Health API`_.

And we'll do this in a reproducible fashion running RMarkdown files via `Dockerized Bioconductor`_ running on `Google Compute Engine`_ VMs!

Get Started with Google Cloud Platform

Create a Google Cloud Platform project

Enable APIs

Enable all the Google Cloud Platform APIs we will use in this workshop by clicking on this link.

Install gcloud

Set up Bioconductor

To further the goals of reproducibility, ease of use, and convenience, you can run this codelab in a Bioconductor Docker container deployed to `Google Compute Engine`_. But this codelab can be run from anywhere since all the heavy lifting is happening in the cloud regardless of where R is running.

Bioconductor maintains Docker containers with R, Bioconductor packages, and RStudio Server all ready to go! Its a great way to set up your R environment quickly and start working. The instructions are below but if you want to learn more, see http://www.bioconductor.org/help/docker/.

  1. Click on `click-to-deploy Bioconductor`_ to navigate to the deployer page on the Cloud Platform Console.
  2. In field Docker Image choose item custom.
  3. Click on More to display the additional form fields.
  4. In field Custom docker image paste in value gcr.io/bioc_2015/devel_sequencing.
  5. Click on the Deploy Bioconductor button.
  6. Follow the post-deployment instructions to log into RStudioServer via your browser!
If you prefer to run this docker container locally, click here to Show/Hide Instructions
If you prefer to setup R manually instead, click here to Show/Hide Instructions

Run the Codelabs

  1. View the workshop documentation.
help(package="GoogleGenomicsBioc2015Workshop")
  1. Click on "User guides, package vignettes and other documentation."
  2. Early on in the workshop you will need an API_KEY. You can get this by clicking on this link: https://console.cloud.google.com/project/_/apiui/credential
  3. Click on vignette "Bioc2015Workshop" and follow the instructions there to run the vignettes line-by-line or chunk-by-chunk!
  • To run line-by-line, put your cursor on the desired line and click the "Run" button or use keyboard shortcuts for Windows/Linux: Ctrl+Enter and Mac: Command+Enter.
  • To run chunk-by-chunk, put your cursor in the desired chunk and click the "Chunks -> Run Current Chuck" button. or use keyboard shortcuts for Windows/Linux: Ctrl+Alt+C and Mac: Command+Option+C.
Run Rmarkdown

If you just want to read the rendered results of the four codelabs, here they are:

Looking for more to try?

Wrap up

"Stop" or "Delete" your VM

Stay Involved