BioC 2015: Where Software and Biology Connect
The properly rendered version of this document can be found at Read The Docs.
If you are reading this on GitHub, you should instead click here.
This workshop was presented at the annual Bioconductor Developer's Conference.
Google has some pretty amazing big data computational "hammers" that they have been applying to search and video data for a long time. In this workshop we take those same hammers and apply them to whole genome sequences.
We will work with both the :doc:`/use_cases/discover_public_data/1000_genomes` reads and variants and also the :doc:`/use_cases/discover_public_data/platinum_genomes` gVCF variants.
We do all of this from the comfort of the R prompt using common packages including `VariantAnnotation`_, `ggbio`_, `ggplot2`_, `dplyr`_, `bigrquery`_, and the new Bioconductor package `GoogleGenomics`_, which provides an R interface to Google's implementation of the `Global Alliance for Genomics and Health API`_.
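Before starting, it can help to confirm that the packages named above are available in your R session. The following is a minimal sketch using only base R; the variable names ``workshop_packages``, ``installed``, and ``missing_packages`` are illustrative and not part of the workshop materials.

```r
# Packages named in this workshop introduction.
workshop_packages <- c("VariantAnnotation", "ggbio", "ggplot2",
                       "dplyr", "bigrquery", "GoogleGenomics")

# TRUE for each package whose namespace can be loaded from the current
# library paths, FALSE otherwise.
installed <- vapply(workshop_packages, requireNamespace, logical(1),
                    quietly = TRUE)

# Report anything that still needs to be installed.
missing_packages <- workshop_packages[!installed]
if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
} else {
  message("All workshop packages are available.")
}
```

If you use the Bioconductor Docker container described below, these packages should already be present, so the check is mainly useful when running R elsewhere.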
Enable all the Google Cloud Platform APIs we will use in this workshop by clicking on this link.
To further the goals of reproducibility, ease of use, and convenience, you can run this codelab in a Bioconductor Docker container deployed to `Google Compute Engine`_. But this codelab can be run from anywhere since all the heavy lifting is happening in the cloud regardless of where R is running.
Bioconductor maintains Docker containers with R, Bioconductor packages, and RStudio Server all ready to go! It's a great way to set up your R environment quickly and start working. The instructions are below, but if you want to learn more, see http://www.bioconductor.org/help/docker/.
- Click on `click-to-deploy Bioconductor`_ to navigate to the deployer page on the Cloud Platform Console.
- In field Docker Image choose item
- Click on More to display the additional form fields.
- In field Custom docker image paste in value
- Click on the Deploy Bioconductor button.
- Follow the post-deployment instructions to log into RStudio Server via your browser!
- View the workshop documentation.
- Click on "User guides, package vignettes and other documentation."
- Early on in the workshop you will need an API_KEY. You can get this by clicking on this link: https://console.cloud.google.com/project/_/apiui/credential
- Click on vignette "Bioc2015Workshop" and follow the instructions there to run the vignettes line-by-line or chunk-by-chunk!
- To run line-by-line, put your cursor on the desired line and click the "Run" button or use keyboard shortcuts for Windows/Linux:
- To run chunk-by-chunk, put your cursor in the desired chunk and click the "Chunks -> Run Current Chunk" button or use keyboard shortcuts for Windows/Linux:
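The API_KEY mentioned in the steps above should not be pasted directly into scripts that you share. One possible approach, sketched here, is to read it from an environment variable. The helper ``get_api_key`` and the variable name ``GOOGLE_API_KEY`` are hypothetical, and the call to ``authenticate(apiKey = ...)`` assumes the interface of the GoogleGenomics package as used in the workshop vignette; check the vignette for the exact call.

```r
# Hypothetical helper: read the API key from an environment variable so it
# never appears in a script or in version control.
get_api_key <- function(var = "GOOGLE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable to your API key.",
         call. = FALSE)
  }
  key
}

# Authenticate only when a key is set and the package is installed.
# Assumption: GoogleGenomics::authenticate() accepts an apiKey argument.
if (nzchar(Sys.getenv("GOOGLE_API_KEY")) &&
    requireNamespace("GoogleGenomics", quietly = TRUE)) {
  GoogleGenomics::authenticate(apiKey = get_api_key())
}
```

You could set the variable in your ``.Renviron`` file or with ``Sys.setenv()`` at the start of a session.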
If you just want to read the rendered results of the four codelabs, here they are:
- Working with Reads
- Working with Variants
- Analyzing Variants with BigQuery
- Data Analysis using Google Genomics, also available on YouTube:
- Try these samples on different datasets :doc:`/use_cases/discover_public_data/index`.
- Find more example BigQuery queries in:
- Run a `Google Cloud Dataflow`_ pipeline: