This repository has been archived by the owner on Oct 29, 2023. It is now read-only.


bioc-2015.rst

Note: Google Genomics is now Cloud Life Sciences. The Google Genomics Cookbook on Read the Docs is not actively maintained and may contain incorrect or outdated information. The cookbook is only available for historical reference. For the most up to date documentation, view the official Cloud Life Sciences documentation at https://cloud.google.com/life-sciences.

Also note that much of the Genomics v1 API surface has been superseded by Variant Transforms and htsget.

The properly rendered version of this document can be found at Read The Docs.

If you are reading this on GitHub, you should instead click here.

This workshop was presented at the annual Bioconductor Developer's Conference.

Google has some pretty amazing big data computational "hammers" that they have been applying to search and video data for a long time. In this workshop we take those same hammers and apply them to whole genome sequences.

We will work with both the :doc:`/use_cases/discover_public_data/1000_genomes` reads and variants and also the :doc:`/use_cases/discover_public_data/platinum_genomes` gVCF variants.

We do this all from the comfort of the R prompt using common packages including `VariantAnnotation`_, `ggbio`_, `ggplot2`_, `dplyr`_, `bigrquery`_, and the new Bioconductor package `GoogleGenomics`_ which provides an R interface to Google's implementation of the `Global Alliance for Genomics and Health API`_.

And we'll do this in a reproducible fashion running RMarkdown files via `Dockerized Bioconductor`_ running on `Google Compute Engine`_ VMs!

Enable all the Google Cloud Platform APIs we will use in this workshop by clicking on this link.

To further the goals of reproducibility, ease of use, and convenience, you can run this codelab in a Bioconductor Docker container deployed to `Google Compute Engine`_. But this codelab can be run from anywhere since all the heavy lifting is happening in the cloud regardless of where R is running.

Bioconductor maintains Docker containers with R, Bioconductor packages, and RStudio Server all ready to go! It's a great way to set up your R environment quickly and start working. The instructions are below, but if you want to learn more, see http://www.bioconductor.org/help/docker/.

  1. Click on `click-to-deploy Bioconductor`_ to navigate to the deployer page on the Cloud Platform Console.
  2. In the Docker Image field, choose custom.
  3. Click on More to display the additional form fields.
  4. In the Custom docker image field, paste gcr.io/bioc_2015/devel_sequencing.
  5. Click on the Deploy Bioconductor button.
  6. Follow the post-deployment instructions to log into RStudio Server via your browser!
If you prefer to run this Docker container locally, click here to Show/Hide Instructions
If you prefer to set up R manually instead, click here to Show/Hide Instructions
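For a manual setup, it can help to check first which of the workshop packages are already installed. The following is a minimal sketch, not taken from the workshop materials: it uses only base R, the package list is copied from the introduction above, and `check_packages` is a hypothetical helper name introduced for illustration.

```r
# Packages named in the workshop introduction.
workshop_packages <- c("VariantAnnotation", "ggbio", "ggplot2",
                       "dplyr", "bigrquery", "GoogleGenomics")

# check_packages() is a hypothetical helper, not part of any package.
# It returns a named logical vector: TRUE where the package namespace
# can be loaded, FALSE where it cannot.
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_packages(workshop_packages)
missing_packages <- names(status)[!status]

if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
} else {
  message("All workshop packages are installed.")
}
```

The check only reports what is missing. How to install each package depends on your R and Bioconductor versions, so it is deliberately left out of the sketch.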
  1. View the workshop documentation.
     help(package="GoogleGenomicsBioc2015Workshop")
  2. Click on "User guides, package vignettes and other documentation."
  3. Early on in the workshop you will need an API_KEY. You can get this by clicking on this link: https://console.cloud.google.com/project/_/apiui/credential
  4. Click on vignette "Bioc2015Workshop" and follow the instructions there to run the vignettes line-by-line or chunk-by-chunk!
  • To run line-by-line, put your cursor on the desired line and click the "Run" button or use keyboard shortcuts for Windows/Linux: Ctrl+Enter and Mac: Command+Enter.
  • To run chunk-by-chunk, put your cursor in the desired chunk and click the "Chunks -> Run Current Chunk" button, or use keyboard shortcuts for Windows/Linux: Ctrl+Alt+C and Mac: Command+Option+C.
Run Rmarkdown
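Because the vignettes need an API_KEY, one option is to keep the key out of the RMarkdown source and read it from the environment instead. This is a sketch under stated assumptions, not the workshop's own method: the variable name `GOOGLE_API_KEY` and the helper `read_api_key` are hypothetical, and only base R functions are used.

```r
# read_api_key() is a hypothetical helper introduced for this sketch.
# GOOGLE_API_KEY is an assumed environment variable name, not one that
# the workshop materials define.
read_api_key <- function(var = "GOOGLE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    # Fail early with a clear message instead of sending an empty key.
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Usage, after setting the variable (for example in ~/.Renviron):
# API_KEY <- read_api_key()
```

The resulting value can then be supplied wherever the vignette asks for the API_KEY. Consult the vignette itself for the exact authentication call it expects.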

If you just want to read the rendered results of the four codelabs, here they are: