update
smrgit committed Dec 22, 2015
1 parent a100798 commit 7934710
Showing 11 changed files with 211 additions and 9 deletions.
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/sections/programmatic-api.doctree
Binary file not shown.
25 changes: 25 additions & 0 deletions docs/build/html/_sources/sections/programmatic-api.txt
@@ -1,2 +1,27 @@
Programmatic Interfaces
=======================

Programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics
is based directly on the interfaces provided by the Google Cloud Platform, as
illustrated throughout the ISB-CGC code repositories on GitHub_.

.. _github: https://github.com/isb-cgc
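As a minimal sketch of that approach (not code taken from the ISB-CGC repositories), the Python below runs a query through the ``google-cloud-bigquery`` client library and returns plain dictionaries. It assumes the library is installed and that application-default credentials are configured; ``my-gcp-project`` is a placeholder for your own Google Cloud Project ID, which is the project billed for the query.

```python
from typing import Any, Dict, List


def make_client(project_id: str) -> Any:
    """Create a BigQuery client whose queries are billed to `project_id`.

    Assumes `pip install google-cloud-bigquery` and application-default
    credentials (for example via `gcloud auth application-default login`).
    """
    # Imported here so run_query() can be used (and tested) without the library.
    from google.cloud import bigquery

    return bigquery.Client(project=project_id)


def run_query(client: Any, sql: str) -> List[Dict[str, Any]]:
    """Run `sql`, wait for the job to finish, and return each row as a dict."""
    job = client.query(sql)  # submits the query job
    rows = job.result()      # blocks until the job completes
    return [dict(row.items()) for row in rows]
```

A typical call is ``run_query(make_client("my-gcp-project"), sql)``, where ``sql`` refers to one of the ISB-CGC tables. Passing the client in as a parameter keeps the query logic independent of how credentials are set up and makes it easy to substitute a stub in tests.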

To query the ISB-CGC metadata, or to retrieve information such as the details of a
cohort that a user saved during an interactive session, ISB-CGC has defined a set of APIs
based on Google Cloud Endpoints. Details about these APIs, along with instructions for
using helper scripts to complete the OAuth flow, are described in the sections below.

Metadata API
------------
*Documentation currently under construction! Please email info@isb-cgc.org if you have questions.*

Cohort API
----------

User API
--------

Authorization Process
---------------------

16 changes: 14 additions & 2 deletions docs/build/html/index.html
@@ -60,10 +60,22 @@ <h1>The ISB Cancer Genomics Cloud<a class="headerlink" href="#the-isb-cancer-gen
<ul>
<li class="toctree-l1"><a class="reference internal" href="sections/About-ISB-CGC.html">About the ISB Cancer Genomics Cloud</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/Web-UI.html">Graphical Web Interface</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/Programmatic-API.html">Programmatic Interfaces</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/Programmatic-API.html">Programmatic Interfaces</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#metadata-api">Metadata API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#cohort-api">Cohort API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#user-api">User API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#authorization-process">Authorization Process</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/TCGA-Data.html">About the TCGA Data</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/Reference-Data.html">Reference Data</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/FAQ.html">Frequently Asked Questions</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/FAQ.html">Frequently Asked Questions</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#isb-cgc-accounts-and-cloud-projects">ISB-CGC Accounts and Cloud Projects</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#data-access">Data Access</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#python-users">Python Users</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#r-and-bioconductor-users">R and Bioconductor Users</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/Support.html">Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/Other-Useful-Links.html">Other Useful Links</a></li>
</ul>
2 changes: 1 addition & 1 deletion docs/build/html/searchindex.js


31 changes: 31 additions & 0 deletions docs/build/html/sections/programmatic-api.html
@@ -55,6 +55,26 @@ <h3>Navigation</h3>

<div class="section" id="programmatic-interfaces">
<h1>Programmatic Interfaces<a class="headerlink" href="#programmatic-interfaces" title="Permalink to this headline"></a></h1>
<p>Programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics
is based directly on the interfaces provided by the Google Cloud Platform, as
illustrated throughout the ISB-CGC code repositories on <a class="reference external" href="https://github.com/isb-cgc">GitHub</a>.</p>
<p>To query the ISB-CGC metadata, or to retrieve information such as the details of a
cohort that a user saved during an interactive session, ISB-CGC has defined a set of APIs
based on Google Cloud Endpoints. Details about these APIs, along with instructions for
using helper scripts to complete the OAuth flow, are described in the sections below.</p>
<div class="section" id="metadata-api">
<h2>Metadata API<a class="headerlink" href="#metadata-api" title="Permalink to this headline"></a></h2>
<p><em>Documentation currently under construction! Please email info&#64;isb-cgc.org if you have questions.</em></p>
</div>
<div class="section" id="cohort-api">
<h2>Cohort API<a class="headerlink" href="#cohort-api" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="user-api">
<h2>User API<a class="headerlink" href="#user-api" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="authorization-process">
<h2>Authorization Process<a class="headerlink" href="#authorization-process" title="Permalink to this headline"></a></h2>
</div>
</div>


@@ -63,6 +83,17 @@ <h1>Programmatic Interfaces<a class="headerlink" href="#programmatic-interfaces"
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="../index.html">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Programmatic Interfaces</a><ul>
<li><a class="reference internal" href="#metadata-api">Metadata API</a></li>
<li><a class="reference internal" href="#cohort-api">Cohort API</a></li>
<li><a class="reference internal" href="#user-api">User API</a></li>
<li><a class="reference internal" href="#authorization-process">Authorization Process</a></li>
</ul>
</li>
</ul>

<h4>Previous topic</h4>
<p class="topless"><a href="Web-UI.html"
title="previous chapter">Graphical Web Interface</a></p>
18 changes: 18 additions & 0 deletions docs/source/sections/About-ISB-CGC.rst
@@ -1,2 +1,20 @@
About the ISB Cancer Genomics Cloud
===================================

The ISB-CGC provides interactive and programmatic access to the TCGA data, leveraging many
aspects of the Google Cloud Platform, including BigQuery, Compute Engine, App Engine, Cloud
Datalab and Google Genomics. Open-access clinical and biospecimen information for all TCGA
patients and samples is stored in BigQuery, together with the Level-3 TCGA data and the genomic
reference and platform-annotation sources, enabling fast SQL-like queries against the entire
dataset. Controlled-access DNA and RNA sequence data is available to
dbGaP-authorized users in the original BAM and FASTQ file formats.

The ISB-CGC aims to serve a broad range of cancer researchers: from scientists and clinicians
who prefer an interactive web-based application for accessing and exploring the rich TCGA
dataset, to computational scientists who want to write their own custom scripts in languages
such as R or Python and access the data through APIs, to algorithm developers who want to spin
up thousands of virtual machines to analyze hundreds of terabytes of sequence data. The ISB-CGC
allows scientists to interactively define and compare cohorts, examine the underlying molecular
data for specific genes or pathways of interest, and share insights with collaborators around
the globe.

73 changes: 73 additions & 0 deletions docs/source/sections/FAQ.rst
@@ -1,2 +1,75 @@
Frequently Asked Questions
==========================

ISB-CGC Accounts and Cloud Projects
-----------------------------------
**Do I have to request an ISB-CGC account before I can try out the web interface?**
No, you can just "sign in" using your Google identity at isb-cgc.FIXME.appspot.com.

**Where can I find the TCGA data that ISB-CGC has made publicly available in BigQuery tables?**
The BigQuery web interface can be accessed at bigquery.cloud.google.com. If you have not already added the ISB-CGC datasets to your BigQuery "view", click on the blue arrow
next to your username in the left side-bar, select "Switch to Project", then "Display Project...",
and enter "isb-cgc" (without quotes) in the text box labeled "Project ID". All ISB-CGC public BigQuery
datasets and tables will now be visible in the left side-bar of the BigQuery web interface.
Note that in order to use BigQuery, you need to be a member of a Google Cloud Project.
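The same datasets can also be listed from code. The sketch below assumes the ``google-cloud-bigquery`` library, whose ``Client.list_datasets(project=...)`` yields items with a ``dataset_id`` attribute; the client is billed to your own Cloud Project while it lists the ``isb-cgc`` project. It also shows how a table in that project is written in BigQuery's legacy SQL (``[project:dataset.table]``) versus standard SQL (backquoted ``project.dataset.table``, where the backquotes are needed because ``isb-cgc`` contains a hyphen). The dataset and table names used below are placeholders.

```python
from typing import Any, List


def list_public_datasets(client: Any, project: str = "isb-cgc") -> List[str]:
    """Return the IDs of the datasets in `project` that the caller can see."""
    return [item.dataset_id for item in client.list_datasets(project=project)]


def table_ref(project: str, dataset: str, table: str, legacy_sql: bool = False) -> str:
    """Return a fully-qualified table reference for use in a FROM clause."""
    if legacy_sql:
        # Legacy SQL: colon between project and dataset, square brackets around the name.
        return f"[{project}:{dataset}.{table}]"
    # Standard SQL: dots throughout; backquotes protect the hyphen in "isb-cgc".
    return f"`{project}.{dataset}.{table}`"
```

For example, ``table_ref("isb-cgc", "some_dataset", "some_table")`` returns ```isb-cgc.some_dataset.some_table``` in backquotes, and passing ``legacy_sql=True`` returns ``[isb-cgc:some_dataset.some_table]``.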

**I want to be able to run big jobs using Google Compute Engine and the TCGA data hosted by the ISB-CGC. What should I do?**
You will need to request a Google Cloud Project. Please send a request to request-gcp@isb-cgc.org.


Data Access
-----------
**Does all TCGA data require dbGaP authorization prior to access?**
No, generally only the low-level sequence data (DNA and RNA) and the SNP-array data (CEL files) require
dbGaP authorization. All of the "high-level" molecular data, as well as the clinical data, are
open-access, and much of this has been made available in a convenient set of BigQuery tables.

**How can I apply for access to the low-level DNA sequence data?**
In order to access the TCGA controlled-access data, you will need to apply to dbGaP_.

.. _dbGaP: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login

**I have dbGaP authorization. How do I provide this information to the ISB-CGC platform?**
In order for us to verify your dbGaP authorization, you first need to associate your Google identity
(the one used to sign in to the web-app) with a valid NIH login (*e.g.* your eRA Commons id). After you
have signed in, click on your avatar (next to your name in the upper-right corner) to go to your
account details page, where you can start the dbGaP verification. You will be redirected to the NIH
iTrust login page; once you successfully authenticate there, you will be brought back to the ISB-CGC
web-app, and we will then verify that you also have dbGaP authorization for the TCGA controlled-access data.

**My professor has dbGaP authorization. Do I have to have my own authorization too?**
Yes, your professor will need to add you as a "data downloader" to his/her dbGaP application so that you
have your own dbGaP authorization associated with your own eRA Commons id.

**I already authenticated using my eRA Commons id but now I want to use a different Google identity to
access the ISB-CGC web-app. Can I re-authenticate using the same eRA Commons id?**
Yes, but you will first need to sign in using your previous Google identity and "unlink" your eRA Commons
id from it before you can link it to your new Google identity. An eRA Commons id cannot be
associated with more than one Google identity within the ISB-CGC platform at any one time.
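The linking rule can be modelled as a one-to-one lookup from eRA Commons id to Google identity. The Python below is a hypothetical illustration of that rule only; the ``IdentityLinks`` class and its method names are invented here and do not describe how the ISB-CGC platform actually stores these links.

```python
from typing import Dict, Optional


class IdentityLinks:
    """Hypothetical model: each eRA Commons id maps to at most one Google identity."""

    def __init__(self) -> None:
        self._google_by_era: Dict[str, str] = {}  # eRA Commons id -> Google identity

    def link(self, era_id: str, google_id: str) -> None:
        """Link `era_id` to `google_id`, refusing if another identity holds it."""
        current = self._google_by_era.get(era_id)
        if current is not None and current != google_id:
            raise ValueError(f"{era_id} is already linked to {current}; unlink it first")
        self._google_by_era[era_id] = google_id

    def unlink(self, era_id: str, google_id: str) -> None:
        """Remove the link; only the identity that currently holds it may do so."""
        if self._google_by_era.get(era_id) != google_id:
            raise ValueError(f"{era_id} is not linked to {google_id}")
        del self._google_by_era[era_id]

    def linked_google_id(self, era_id: str) -> Optional[str]:
        """Return the Google identity linked to `era_id`, or None."""
        return self._google_by_era.get(era_id)
```

Moving an id to a new Google identity therefore takes two steps, ``unlink`` from the old identity followed by ``link`` from the new one, which mirrors the sign-in-and-unlink sequence described in the answer above.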

**Can I authenticate to NIH programmatically?** No, the current NIH authentication flow requires
web-based authentication and must therefore be done from within the ISB-CGC web-app. Once you have
authenticated to NIH via the web-app and your dbGaP authorization has been verified, the Google
identity associated with your account will have access to the controlled-access data for 24 hours.
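The 24-hour window amounts to simple date arithmetic. The sketch below is a hypothetical illustration, not the platform's own check; it assumes access ends exactly 24 hours after verification and uses timezone-aware UTC timestamps.

```python
from datetime import datetime, timedelta, timezone

# Length of the controlled-access window described above.
ACCESS_WINDOW = timedelta(hours=24)


def access_expires_at(verified_at: datetime) -> datetime:
    """Return the moment controlled-data access lapses after NIH verification."""
    return verified_at + ACCESS_WINDOW


def has_controlled_access(verified_at: datetime, now: datetime) -> bool:
    """True while `now` falls inside the 24-hour window starting at `verified_at`."""
    return verified_at <= now < access_expires_at(verified_at)


verified = datetime(2015, 12, 22, 9, 0, tzinfo=timezone.utc)
print(access_expires_at(verified).isoformat())  # 2015-12-23T09:00:00+00:00
```

Whether access is still granted at exactly the 24-hour mark is not stated in the answer above; the sketch treats that instant as expired.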

Python Users
------------
**I want to write Python scripts that access the TCGA data hosted by the ISB-CGC. Do you have some
examples that can get me started?** Yes, of course! The best place to start is with our examples-Python_
repository on GitHub. You can run any of those examples yourself by signing in
to your Google Cloud Project and deploying an instance of Google Cloud Datalab_.

.. _examples-Python: https://github.com/isb-cgc/examples-Python
.. _Datalab: https://datalab.cloud.google.com/

R and Bioconductor Users
------------------------
**I want to use R and Bioconductor packages to work with the TCGA data. How can I do that?**
You can run RStudio locally or deploy a dockerized version on a Google Compute Engine VM. You can
find some great examples to get you started in our examples-R_ repository on GitHub, and also in
the documentation from the Google Genomics workshop_ at Bioconductor 2015.

.. _examples-R: https://github.com/isb-cgc/examples-R
.. _workshop: http://googlegenomics.readthedocs.org/en/latest/workshops/bioc-2015.html

29 changes: 29 additions & 0 deletions docs/source/sections/Other-Useful-Links.rst
@@ -1,2 +1,31 @@
Other Useful Links
==================

The ISB-CGC platform is built on top of the Google Cloud Platform and has been designed to make
the TCGA data as accessible as possible to a wide range of users. For programmatic users, this
includes *complete* access to the tools that Google is pioneering to let users scale up their
analyses on Google's infrastructure in a variety of ways.

The ISB-CGC documentation and the example code on GitHub will continue to grow, providing
starting points and use-cases designed to suit the needs of a variety of end-users. If you
have a particular use-case that has not yet been addressed, please contact us
(email info@isb-cgc.org) and we will work with you to determine the best approach to
run the analysis you have in mind.

**Cloud Datalab** is a powerful web-based interactive computational environment built on the
familiar IPython (now known as Jupyter) environment, running on a Google VM in your own Google Cloud Project.
Cloud Datalab_ allows you to combine
SQL-like queries against the TCGA BigQuery tables with all the power of Python packages like Pandas
and Matplotlib. See our examples-Python_ repository on GitHub.

.. _Datalab: https://datalab.cloud.google.com/
.. _examples-Python: https://github.com/isb-cgc/examples-Python

**Google Genomics** provides tools for storing, processing, exploring, and sharing DNA sequence
reads, reference-based alignments, and variant calls using Google's infrastructure. An extensive
Cookbook_ on Read the Docs and an ever-growing set of examples on GitHub_ showcase
some of the tools at your disposal.

.. _Cookbook: https://googlegenomics.readthedocs.org/en/latest/
.. _github: https://github.com/googlegenomics
