Commit 7934710 (parent a100798), committed by smrgit on Dec 22, 2015: 11 changed files with 211 additions and 9 deletions.

Programmatic Interfaces
=======================

Programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics
is based directly on the interfaces provided by the Google Cloud Platform, as
illustrated throughout the ISB-CGC code repositories on github_.

.. _github: https://github.com/isb-cgc

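As a rough illustration of using these Google Cloud Platform interfaces directly, here is a minimal Python sketch that runs a ``GROUP BY`` query through the ``google-cloud-bigquery`` client library. It assumes standard SQL (the library's default dialect); the function name ``count_by_column``, the table ID and the column name are hypothetical placeholders, not ISB-CGC conventions. The client is passed in as an argument so the query logic can be exercised without credentials.

```python
"""Minimal sketch: run a GROUP BY query on a BigQuery table (names are hypothetical)."""


def count_by_column(client, table_id, column):
    """Return [(value, count), ...] for `column` in `table_id`, most frequent first.

    `client` should behave like google.cloud.bigquery.Client. Passing it in keeps
    the query logic testable without network access or credentials.
    """
    # `column` and `table_id` are interpolated directly, so they must be trusted
    # identifiers, never user input. Backticks quote a standard-SQL table ID.
    sql = (
        f"SELECT {column} AS category, COUNT(*) AS n "
        f"FROM `{table_id}` "
        "GROUP BY category ORDER BY n DESC"
    )
    rows = client.query(sql).result()  # waits for the query job to finish
    return [(row["category"], row["n"]) for row in rows]
```

With the real library you would create the client as ``bigquery.Client(project="my-gcp-project")`` (after ``from google.cloud import bigquery``), where ``my-gcp-project`` is a placeholder for your own Google Cloud Project, which is billed for the query.
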
To query the ISB-CGC metadata, or to retrieve details such as a cohort that a user saved
during an interactive session, use the set of APIs built on Google Cloud Endpoints. Details
about these APIs, as well as instructions for using the helper scripts for the OAuth flow,
can be found on this page.

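Because these APIs are built on Google Cloud Endpoints, calling one generally amounts to an HTTPS request that carries an OAuth 2.0 access token. The sketch below uses only the Python standard library and assumes the default ``/_ah/api/<api>/<version>/<method>`` path of Cloud Endpoints Frameworks on App Engine and a token you have already obtained (for example, with the helper scripts). The host, API name, version, method and the helper names ``endpoints_request`` and ``call_endpoint`` are hypothetical, not the documented ISB-CGC API.

```python
"""Minimal sketch: call a Cloud Endpoints API with an OAuth 2.0 bearer token."""
import json
import urllib.parse
import urllib.request


def endpoints_request(app_host, api, version, method, access_token, params=None):
    """Build a GET request for https://<app_host>/_ah/api/<api>/<version>/<method>.

    /_ah/api is the default root path of Cloud Endpoints Frameworks on App Engine.
    All argument values are supplied by the caller; none are real ISB-CGC names.
    """
    url = f"https://{app_host}/_ah/api/{api}/{version}/{method}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # OAuth 2.0 access tokens are sent as a Bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers)


def call_endpoint(request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, ``call_endpoint(endpoints_request("example.appspot.com", "cohort_api", "v1", "cohorts_list", token))`` would issue ``GET https://example.appspot.com/_ah/api/cohort_api/v1/cohorts_list`` with the token attached; all of these names are illustrative.
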
Metadata API
------------
*Documentation currently under construction! Please email info@isb-cgc.org if you have questions.*

Cohort API
----------

User API
--------

Authorization Process
---------------------

About the ISB Cancer Genomics Cloud
===================================

The ISB-CGC provides interactive and programmatic access to the TCGA data, leveraging many
aspects of the Google Cloud Platform, including BigQuery, Compute Engine, App Engine, Cloud
Datalab, and Google Genomics. Open-access clinical and biospecimen information for all TCGA
patients and samples, the Level-3 TCGA data, and genomic reference and platform-annotation
sources are all stored in BigQuery, enabling fast SQL-like queries against the entire
dataset. Controlled-access DNA and RNA sequence data are available to dbGaP-authorized users
in the original BAM and FASTQ file formats.

The ISB-CGC aims to serve a broad range of cancer researchers: scientists or clinicians who
prefer an interactive web-based application for accessing and exploring the rich TCGA
dataset; computational scientists who want to write their own scripts in languages such as
R or Python and access the data through APIs; and algorithm developers who want to spin up
thousands of virtual machines to analyze hundreds of terabytes of sequence data. The ISB-CGC
allows scientists to interactively define and compare cohorts, examine the underlying
molecular data for specific genes or pathways of interest, and share insights with
collaborators around the globe.

Frequently Asked Questions
==========================

ISB-CGC Accounts and Cloud Projects
-----------------------------------
**Do I have to request an ISB-CGC account before I can try out the web interface?**
No, you can just sign in using your Google identity at isb-cgc.FIXME.appspot.com.

**Where can I find the TCGA data that ISB-CGC has made publicly available in BigQuery tables?**
The BigQuery web interface can be accessed at bigquery.cloud.google.com. If you have not
already added the ISB-CGC datasets to your BigQuery "view", click on the blue arrow
next to your username in the left side-bar, select "Switch to Project", then "Display Project...",
and enter "isb-cgc" (without quotes) in the text box labeled "Project ID". All ISB-CGC public BigQuery
datasets and tables will now be visible in the left side-bar of the BigQuery web interface.
Note that in order to use BigQuery, you need to be a member of a Google Cloud Project.

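The same public datasets can also be discovered programmatically. This minimal sketch assumes the ``google-cloud-bigquery`` client library and credentials that can see the ``isb-cgc`` datasets, as they can in the web interface. ``list_public_tables`` is a hypothetical helper name, and the client is passed in so the logic can be tested without network access.

```python
"""Minimal sketch: list every table in the public isb-cgc BigQuery project."""


def list_public_tables(client, data_project="isb-cgc"):
    """Return sorted "project.dataset.table" IDs for all tables in `data_project`.

    `client` should behave like google.cloud.bigquery.Client, created with the
    Google Cloud Project you belong to (BigQuery requires one).
    """
    table_ids = []
    # list_datasets/list_tables return iterators that page through results lazily.
    for dataset in client.list_datasets(project=data_project):
        for table in client.list_tables(dataset):
            table_ids.append(f"{table.project}.{table.dataset_id}.{table.table_id}")
    return sorted(table_ids)
```

In practice, pass ``bigquery.Client(project="my-gcp-project")`` (after ``from google.cloud import bigquery``), where ``my-gcp-project`` is a placeholder for the Google Cloud Project you belong to; the data itself stays in the ``isb-cgc`` project.
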
**I want to run large jobs on Google Compute Engine using the TCGA data hosted by the ISB-CGC. What should I do?**
You will need to request a Google Cloud Project. Please send a request to request-gcp@isb-cgc.org.

Data Access
-----------
**Does all TCGA data require dbGaP authorization prior to access?**
No. Generally, only the low-level sequence data (DNA and RNA) and SNP-array data (CEL files)
require dbGaP authorization. All of the "high-level" molecular data, as well as the clinical
data, are open-access, and much of this has been made available in a convenient set of
BigQuery tables.

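The rule of thumb above can be expressed as a small helper that flags file types which generally need dbGaP authorization. It is a heuristic sketch based only on the formats named in this documentation (BAM, FASTQ, CEL); the helper name ``likely_requires_dbgap`` is hypothetical, treating a trailing ``.gz`` as compression is an added assumption, and the helper is no substitute for the platform's actual access controls.

```python
"""Heuristic sketch: does a TCGA file type generally require dbGaP authorization?"""
from pathlib import PurePosixPath

# Formats named in this documentation as controlled-access: BAM and FASTQ
# (low-level DNA/RNA sequence) and CEL (SNP-array) files.
CONTROLLED_SUFFIXES = {".bam", ".fastq", ".cel"}


def likely_requires_dbgap(filename):
    """Return True if the file extension suggests controlled-access data.

    Assumption: a trailing ".gz" is treated as compression and ignored, so
    "reads.fastq.gz" counts as FASTQ. This is not an authoritative access check.
    """
    path = PurePosixPath(filename.lower())
    if path.suffix == ".gz":
        path = path.with_suffix("")
    return path.suffix in CONTROLLED_SUFFIXES
```

For example, ``likely_requires_dbgap("sample.bam")`` returns ``True``, while ``likely_requires_dbgap("clinical.txt")`` returns ``False``.
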
**How can I apply for access to the low-level DNA sequence data?**
In order to access the TCGA controlled-access data, you will need to apply to dbGaP_.

.. _dbGaP: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login

**I have dbGaP authorization. How do I provide this information to the ISB-CGC platform?**
In order for us to verify your dbGaP authorization, you first need to associate your Google identity
(used to sign in to the web-app) with a valid NIH login (*e.g.* your eRA Commons ID). After you have
signed in, click on your avatar (next to your name in the upper-right corner) to go to your account
details page, where you can verify your dbGaP authorization. You will be redirected to the NIH iTrust
login page, and after you successfully authenticate you will be brought back to the ISB-CGC web-app.
We will then verify that you also have dbGaP authorization for the TCGA controlled-access data.

**My professor has dbGaP authorization. Do I have to have my own authorization too?**
Yes. Your professor will need to add you as a "data downloader" to his/her dbGaP application so that
you have your own dbGaP authorization associated with your own eRA Commons ID.

**I already authenticated using my eRA Commons ID, but now I want to use a different Google identity
to access the ISB-CGC web-app. Can I re-authenticate using the same eRA Commons ID?**
Yes, but you will first need to sign in using your previous Google identity and "unlink" your eRA
Commons ID from it before you can link it with your new Google identity. An eRA Commons ID cannot be
associated with more than one Google identity within the ISB-CGC platform at any one time.

**Can I authenticate to NIH programmatically?**
No. The current NIH authentication flow requires web-based authentication and must therefore be
done from within the ISB-CGC web-app. Once you have authenticated to NIH via the web-app and your
dbGaP authorization has been verified, the Google identity associated with your account will have
access to the controlled-access data for 24 hours.

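As a worked example of the 24-hour window, the following sketch computes when controlled-data access ends. It assumes the window is counted from the moment your dbGaP authorization is verified (the exact starting point is an assumption), and the function names are hypothetical.

```python
"""Worked example: the 24-hour controlled-access window after dbGaP verification."""
from datetime import datetime, timedelta, timezone

# Duration stated in the FAQ; counting it from the verification time is an assumption.
ACCESS_WINDOW = timedelta(hours=24)


def access_expires_at(verified_at):
    """Return the moment controlled-data access ends for a timezone-aware `verified_at`."""
    return verified_at + ACCESS_WINDOW


def has_controlled_access(verified_at, now=None):
    """True while `now` (default: current UTC time) falls inside the 24-hour window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return verified_at <= now < access_expires_at(verified_at)
```

For instance, a verification at 09:00 UTC on Dec 22 would give access until 09:00 UTC on Dec 23, after which you would need to authenticate through the web-app again.
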
Python Users
------------
**I want to write Python scripts that access the TCGA data hosted by the ISB-CGC. Do you have some
examples that can get me started?**
Yes, of course! The best place to start is our examples-Python_ repository on GitHub. You can run
any of those examples yourself by signing in to your Google Cloud Project and deploying an instance
of Google Cloud Datalab_.

.. _examples-Python: https://github.com/isb-cgc/examples-Python
.. _Datalab: https://datalab.cloud.google.com/

R and Bioconductor Users
------------------------
**I want to use R and Bioconductor packages to work with the TCGA data. How can I do that?**
You can run RStudio locally or deploy a Dockerized version on a Google Compute Engine VM. You can
find some great examples to get you started in our examples-R_ repository on GitHub, and also in
the documentation from the Google Genomics workshop_ at Bioconductor 2015.

.. _examples-R: https://github.com/isb-cgc/examples-R
.. _workshop: http://googlegenomics.readthedocs.org/en/latest/workshops/bioc-2015.html

Other Useful Links
==================

The ISB-CGC platform is built on top of the Google Cloud Platform and has been designed to make
the TCGA data as accessible as possible to a wide range of users. For programmatic users, this
includes *complete* access to the tools that Google is pioneering to let users scale up their
analyses on Google's infrastructure in a variety of ways.

The ISB-CGC documentation and the example code on GitHub will continue to grow, providing
starting points and use cases designed to suit the needs of a variety of end users. If you
have a particular use case that has not yet been addressed, please contact us
(email info@isb-cgc.org) and we will work with you to determine the best approach to
run the analysis you have in mind.

**Cloud Datalab** is a powerful web-based interactive computational environment built on the
familiar IPython (now known as Jupyter) environment, running on a Google VM in your own Google
Cloud Project. Cloud Datalab_ allows you to combine SQL-like queries against the TCGA BigQuery
tables with all the power of Python packages such as Pandas and Matplotlib. See our
examples-Python_ repository on GitHub.

.. _Datalab: https://datalab.cloud.google.com/
.. _examples-Python: https://github.com/isb-cgc/examples-Python

**Google Genomics** provides tools for storing, processing, exploring, and sharing DNA sequence
reads, reference-based alignments, and variant calls using Google's infrastructure. An extensive
Cookbook_ here on Read the Docs, as well as an ever-growing set of examples on github_, showcases
some of the tools at your disposal.

.. _Cookbook: https://googlegenomics.readthedocs.org/en/latest/
.. _github: https://github.com/googlegenomics
