update

isb-cgc · Dec 22, 2015 · 7934710 · 7934710
1 parent a100798
commit 7934710
Showing 11 changed files with 211 additions and 9 deletions.
diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
diff --git a/docs/build/doctrees/sections/programmatic-api.doctree b/docs/build/doctrees/sections/programmatic-api.doctree
diff --git a/docs/build/html/_sources/sections/programmatic-api.txt b/docs/build/html/_sources/sections/programmatic-api.txt
@@ -1,2 +1,27 @@
 Programmatic Interfaces
 =======================
+
+Programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics
+is based directly on the interfaces provided by the Google Cloud Platform, as 
+illustrated throughout the ISB-CGC code repositories on github_.
+
+.. _github: https://github.com/isb-cgc
+
+In order to query the ISB-CGC metadata or to get information such as details regarding a
+cohort that a user may have saved during an interactive session, a series of APIs based 
+on Google Cloud Endpoints have been defined.  Details about these APIs as well as instructions
+on using helper scripts for the OAuth flow can be found here.
+
+Metadata API
+------------
+*Documentation currently under construction!  Please email info@isb-cgc.org if you have questions.*
+
+Cohort API
+----------
+
+User API
+--------
+
+Authorization Process
+---------------------
+
diff --git a/docs/build/html/index.html b/docs/build/html/index.html
@@ -60,10 +60,22 @@ <h1>The ISB Cancer Genomics Cloud<a class="headerlink" href="#the-isb-cancer-gen
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="sections/About-ISB-CGC.html">About the ISB Cancer Genomics Cloud</a></li>
 <li class="toctree-l1"><a class="reference internal" href="sections/Web-UI.html">Graphical Web Interface</a></li>
-<li class="toctree-l1"><a class="reference internal" href="sections/Programmatic-API.html">Programmatic Interfaces</a></li>
+<li class="toctree-l1"><a class="reference internal" href="sections/Programmatic-API.html">Programmatic Interfaces</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#metadata-api">Metadata API</a></li>
+<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#cohort-api">Cohort API</a></li>
+<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#user-api">User API</a></li>
+<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#authorization-process">Authorization Process</a></li>
+</ul>
+</li>
 <li class="toctree-l1"><a class="reference internal" href="sections/TCGA-Data.html">About the TCGA Data</a></li>
 <li class="toctree-l1"><a class="reference internal" href="sections/Reference-Data.html">Reference Data</a></li>
-<li class="toctree-l1"><a class="reference internal" href="sections/FAQ.html">Frequently Asked Questions</a></li>
+<li class="toctree-l1"><a class="reference internal" href="sections/FAQ.html">Frequently Asked Questions</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#isb-cgc-accounts-and-cloud-projects">ISB-CGC Accounts and Cloud Projects</a></li>
+<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#data-access">Data Access</a></li>
+<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#python-users">Python Users</a></li>
+<li class="toctree-l2"><a class="reference internal" href="sections/FAQ.html#r-and-bioconductor-users">R and Bioconductor Users</a></li>
+</ul>
+</li>
 <li class="toctree-l1"><a class="reference internal" href="sections/Support.html">Support</a></li>
 <li class="toctree-l1"><a class="reference internal" href="sections/Other-Useful-Links.html">Other Useful Links</a></li>
 </ul>

diff --git a/docs/build/html/searchindex.js b/docs/build/html/searchindex.js
diff --git a/docs/build/html/sections/programmatic-api.html b/docs/build/html/sections/programmatic-api.html
@@ -55,6 +55,26 @@ <h3>Navigation</h3>
 
   <div class="section" id="programmatic-interfaces">
 <h1>Programmatic Interfaces<a class="headerlink" href="#programmatic-interfaces" title="Permalink to this headline">¶</a></h1>
+<p>Programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics
+is based directly on the interfaces provided by the Google Cloud Platform, as
+illustrated throughout the ISB-CGC code repositories on <a class="reference external" href="https://github.com/isb-cgc">github</a>.</p>
+<p>In order to query the ISB-CGC metadata or to get information such as details regarding a
+cohort that a user may have saved during an interactive session, a series of APIs based
+on Google Cloud Endpoints have been defined.  Details about these APIs as well as instructions
+on using helper scripts for the OAuth flow can be found here.</p>
+<div class="section" id="metadata-api">
+<h2>Metadata API<a class="headerlink" href="#metadata-api" title="Permalink to this headline">¶</a></h2>
+<p><em>Documentation currently under construction!  Please email info&#64;isb-cgc.org if you have questions.</em></p>
+</div>
+<div class="section" id="cohort-api">
+<h2>Cohort API<a class="headerlink" href="#cohort-api" title="Permalink to this headline">¶</a></h2>
+</div>
+<div class="section" id="user-api">
+<h2>User API<a class="headerlink" href="#user-api" title="Permalink to this headline">¶</a></h2>
+</div>
+<div class="section" id="authorization-process">
+<h2>Authorization Process<a class="headerlink" href="#authorization-process" title="Permalink to this headline">¶</a></h2>
+</div>
 </div>
 
 
@@ -63,6 +83,17 @@ <h1>Programmatic Interfaces<a class="headerlink" href="#programmatic-interfaces"
       </div>
       <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
         <div class="sphinxsidebarwrapper">
+  <h3><a href="../index.html">Table Of Contents</a></h3>
+  <ul>
+<li><a class="reference internal" href="#">Programmatic Interfaces</a><ul>
+<li><a class="reference internal" href="#metadata-api">Metadata API</a></li>
+<li><a class="reference internal" href="#cohort-api">Cohort API</a></li>
+<li><a class="reference internal" href="#user-api">User API</a></li>
+<li><a class="reference internal" href="#authorization-process">Authorization Process</a></li>
+</ul>
+</li>
+</ul>
+
   <h4>Previous topic</h4>
   <p class="topless"><a href="Web-UI.html"
                         title="previous chapter">Graphical Web Interface</a></p>

diff --git a/docs/source/sections/About-ISB-CGC.rst b/docs/source/sections/About-ISB-CGC.rst
@@ -1,2 +1,20 @@
 About the ISB Cancer Genomics Cloud
 ===================================
+
+The ISB-CGC provides interactive and programmatic access to the TCGA data, leveraging many 
+aspects of the Google Cloud Platform including BigQuery, Compute Engine, App Engine, Cloud
+Datalab and Google Genomics.  Open-access clinical and biospecimen information for all TCGA 
+patients and samples, combined with the Level-3 TCGA data and genomic reference and 
+platform-annotation sources, are stored in BigQuery, enabling fast SQL-like queries against 
+the entire dataset.  Controlled-access DNA and RNA sequence data is available to 
+dbGaP-authorized users in the original BAM and FASTQ file formats.
+
+The ISB-CGC aims to serve the needs of a broad range of cancer researchers, from 
+scientists or clinicians who prefer to use an interactive web-based application to 
+access and explore the rich TCGA dataset, to computational scientists who want to write 
+their own custom scripts using languages such as R or Python, accessing the data through APIs, 
+to algorithm developers who want to spin up thousands of virtual machines to analyze hundreds 
+of terabytes of sequence data.  The ISB-CGC allows scientists to interactively define and 
+compare cohorts, examine the underlying molecular data for specific genes or pathways of 
+interest, and share insights with collaborators around the globe.  
+
diff --git a/docs/source/sections/FAQ.rst b/docs/source/sections/FAQ.rst
@@ -1,2 +1,75 @@
 Frequently Asked Questions
 ==========================
+
+ISB-CGC Accounts and Cloud Projects
+-----------------------------------
+**Do I have to request an ISB-CGC account before I can try out the web interface?**
+No, you can just "sign in" using your Google identity at isb-cgc.FIXME.appspot.com
+
+**Where can I find the TCGA data that ISB-CGC has made publicly available in BigQuery tables?**
+The BigQuery web interface can be accessed at bigquery.cloud.google.com.  If you have not already added the ISB-CGC datasets to your BigQuery "view", click on the blue arrow
+next to your username in the left side-bar, select "Switch to Project", then "Display Project...",
+and enter "isb-cgc" (without quotes) in the text box labeled "Project ID".  All ISB-CGC public BigQuery
+datasets and tables will now be visible in the left side-bar of the BigQuery web interface.
+Note that in order to use BigQuery, you need to be a member of a Google Cloud Project.
+
+**I want to be able to run big jobs using Google Compute Engine and the TCGA data hosted by the ISB-CGC.  What should I do?**
+You will need to request a Google Cloud Project.  Please send a request to request-gcp@isb-cgc.org
+
+
+Data Access
+-----------
+**Does all TCGA data require dbGaP authorization prior to access?**
+No, generally only the low-level sequence (DNA and RNA) and SNP-array data (CEL files) require
+dbGaP authorization.  All of the "high-level" molecular data, as well as the clinical data, are
+open-access and much of this has been made available in a convenient set of BigQuery tables. 
+
+**How can I apply for access to the low-level DNA sequence data?**
+In order to access the TCGA controlled-access data, you will need to apply to dbGaP_.
+
+.. _dbGaP: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login
+
+**I have dbGaP authorization.  How do I provide this information to the ISB-CGC platform?**
+In order for us to verify your dbGaP authorization, you first need to associate your Google identity
+(used to sign in to the web-app) with a valid NIH login (*e.g.* your eRA Commons id).  After you have
+signed in, click on your avatar (next to your name in the upper-right corner) 
+and you will be taken to your account details page where you can 
+verify your dbGaP authorization.  You will be redirected to the NIH iTrust login page and, after you
+successfully authenticate, you will be brought back to the ISB-CGC web-app.  We will then
+verify that you also have dbGaP authorization for the TCGA controlled-access data. 
+
+**My professor has dbGaP authorization.  Do I have to have my own authorization too?**
+Yes, your professor will need to add you as a "data downloader" to his/her dbGaP application so that you
+have your own dbGaP authorization associated with your own eRA Commons id.
+
+**I already authenticated using my eRA Commons id but now I want to use a different Google identity to
+access the ISB-CGC web-app.  Can I re-authenticate using the same eRA Commons id?**
+Yes, but you will first need to sign in using your previous Google identity and "unlink" your eRA Commons
+id from that one before you can link it with your new Google identity.  An eRA Commons id cannot be
+associated with more than one Google identity within the ISB-CGC platform at any one time.
+
+**Can I authenticate to NIH programmatically?**  No, the current NIH authentication flow requires
+web-based authentication and must therefore be done from within the ISB-CGC web-app.  Once you have
+authenticated to NIH via the web-app, and your dbGaP authorization has been verified, the Google 
+identity associated with your account will have access to the controlled-access data for 24 hours.
+
+Python Users
+------------
+**I want to write Python scripts that access the TCGA data hosted by the ISB-CGC.  Do you have some 
+examples that can get me started?**  Yes, of course!  The best place to start is with our examples-Python_
+repository on github.  You can run any of those examples yourself by signing in 
+to your Google Cloud Project and deploying an instance of Google Cloud Datalab_.
+
+.. _examples-Python: https://github.com/isb-cgc/examples-Python
+.. _Datalab: https://datalab.cloud.google.com/
+
+R and Bioconductor Users
+------------------------
+**I want to use R and Bioconductor packages to work with the TCGA data.  How can I do that?**
+You can run RStudio locally or deploy a dockerized version on a Google Compute Engine VM.  You can
+find some great examples to get you started in our examples-R_ repository on github, and also in
+the documentation from the Google Genomics workshop_ at Bioconductor 2015.
+
+.. _examples-R: https://github.com/isb-cgc/examples-R
+.. _workshop: http://googlegenomics.readthedocs.org/en/latest/workshops/bioc-2015.html
+
diff --git a/docs/source/sections/Other-Useful-Links.rst b/docs/source/sections/Other-Useful-Links.rst
@@ -1,2 +1,31 @@
 Other Useful Links
 ==================
+
+The ISB-CGC platform is built on top of the Google Cloud Platform and has been designed to make
+the TCGA data as accessible as possible to a wide
+range of users.  For the programmatic users, this includes *complete* access to the tools that Google
+is pioneering to allow users to scale up their analyses on the Google infrastructure using a variety of means.
+
+The ISB-CGC documentation and the example code on github will continue to grow to provide
+starting-points and use-cases designed to suit the needs of a variety of end-users.  If you 
+have a particular use-case that has not yet been addressed, please contact us 
+(email info@isb-cgc.org) and we will work with you to determine the best approach to 
+run the analysis you have in mind. 
+
+**Cloud Datalab** is a powerful web-based interactive computational environment built on the 
+familiar IPython (now known as Jupyter) environment, running on a Google VM in your own Google Cloud Project. 
+Cloud Datalab_ allows you to combine
+SQL-like queries into the TCGA BigQuery tables with all the power of Python packages like Pandas
+and Matplotlib.  See our examples-Python_ repository on github.
+
+.. _Datalab: https://datalab.cloud.google.com/
+.. _examples-Python: https://github.com/isb-cgc/examples-Python
+
+**Google Genomics** provides tools for storing, processing, exploring, and sharing DNA sequence
+reads, reference-based alignments, and variant calls, using Google's infrastructure.  An extensive
+Cookbook_ here on Read the Docs as well as an ever-growing set of examples on github_ showcase
+some of the tools at your disposal.
+
+.. _Cookbook: https://googlegenomics.readthedocs.org/en/latest/
+.. _github: https://github.com/googlegenomics
+