Commit

update
smrgit committed Jan 7, 2016
1 parent ee94da5 commit 705212e
Showing 39 changed files with 217 additions and 79 deletions.
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/About-ISB-CGC.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/FAQ.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/Other-Useful-Links.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/Reference-Data.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/Support.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/TCGA-Data.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/Web-UI.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/programmatic-api.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/Cohorts.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/General-Permissions.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/IGV-Browser.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/SeqPeek.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/Sharing.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/User-Dashboard.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/sections/webapp/Visualizations.doctree
Binary file not shown.
8 changes: 5 additions & 3 deletions docs/build/html/_sources/index.txt
@@ -14,7 +14,9 @@ tips on how to use it, and details about the data that we are hosting on the
Google Cloud Platform.

This documentation is a work-in-progress, please let us know how we can improve
it!
it! feedback@isb-cgc.org

-- the ISB-CGC team

Contents
########
@@ -27,9 +29,9 @@ Contents
sections/TCGA_on_ISBCGC
sections/Web-UI
sections/Programmatic-API
sections/Compute-Engine
sections/Reference-Data
sections/FAQ
sections/Support
sections/Other-Useful-Links


sections/BigQuerySummaries
14 changes: 9 additions & 5 deletions docs/build/html/_sources/sections/Support.txt
@@ -2,18 +2,18 @@
Contacts & Support
******************

Your Own GCP
############
Your Own GCP project
####################

To request a Google Cloud Project (GCP), please send a request to request-gcp@isb-cgc.org.
To request a Google Cloud Platform (GCP) project, please send a request to request-gcp@isb-cgc.org.

In your request, please describe your research goals in some detail, including information such as the type
of data that you plan to use (whether it is your own data that you plan to upload or
TCGA data currently hosted by the ISB-CGC), the algorithms and/or methods you plan to apply,
and an estimate of the storage and computing costs you expect to incur.
Please let us know if you have students or collaborators who will also be accessing the
same GCP. Note that if you are working as a team on a single project, you should all
use the same GCP -- if your group is large, we will take this into consideration when
same cloud project. Note that if you are working as a team on a single project, you should all
use the same cloud project -- if your group is large, we will take this into consideration when
determining your funding level.

If you have previous experience using the Google Cloud Platform, that would be
@@ -27,6 +27,10 @@ to become familiar with the platform. If you expect that you will need addition
to complete your planned research, this initial amount should be used to perform prototype
analyses and to better estimate your total costs. At that time, you may request additional funding.

Please be aware that we will be monitoring your cloud resource usage on a daily basis and will alert you as you begin
to approach your funding limit. If you exceed your allocation limit and we are not able to contact
you by email for several days, we may need to shut your project down, which could cause you to lose work and data.

Contact Us
##########

16 changes: 13 additions & 3 deletions docs/build/html/_sources/sections/TCGA-Data.txt
@@ -12,8 +12,8 @@ Storage (GCS_) and in BigQuery_.
The data being hosted by the ISB-CGC was obtained from the two main TCGA data
repositories:

* **TCGA DCC**: this is the TCGA Data Coordinating Center which provides a `Data Portal <https://tcga-data.nci.nih.gov/tcga/>`_ from which users may download open-access or controlled-access data. This portal provides access to all TCGA data *except* for the low-level sequence data.
* **CGHub**: this is NCI's current secure data repository for all TCGA BAM and FASTQ sequence data files.
* **TCGA DCC**: the TCGA Data Coordinating Center which provides a `Data Portal <https://tcga-data.nci.nih.gov/tcga/>`_ from which users may download open-access or controlled-access data. This portal provides access to all TCGA data *except* for the low-level sequence data.
* **CGHub**: the `Cancer Genomics Hub <https://cghub.ucsc.edu>`_ is NCI's current secure data repository for all TCGA BAM and FASTQ sequence data files.

The ISB-CGC platform is one of NCI's `Cancer Genomics Cloud Pilots <https://cbiit.nci.nih.gov/ncip/nci-cancer-genomics-cloud-pilots>`_
and our mission is to host the TCGA data in the cloud so that researchers around the world may work with the data without needing
@@ -65,10 +65,20 @@ samples should take this into consideration. Another example where multiple pla
expression data: most tumor samples were processed at UNC and the normalized gene-expression values are based on the RSEM method, while some tumor samples were
processed at BCGSC and the normalized gene-expression values are based on RPKM.

TCGA Data Reports
=================

A number of useful `Data Reports <https://tcga-data.nci.nih.gov/datareports/dataReportsHome.htm>`_
are available directly from TCGA. There are several different reports that you can access from that
page, including these nice dashboards:

* **Data Statistics**: this `dashboard <https://tcga-data.nci.nih.gov/datareports/statsDashboard.htm>`_ provides high-level statistics describing TCGA data content and usage.
* **Project Case Overview**: this `dashboard <https://tcga-data.nci.nih.gov/datareports/projectCaseDashboard.htm>`_ provides a high-level snapshot of TCGA project progress through the multiple phases of sample analysis.

Understanding Data Access
#########################

* **Public Data** Sometimes the word "public" is misinterpreted as meaning "open". All of the TCGA data is *public* data, but some of it is *open*, meaning that it is accessible and available to *all* users; while some TCGA data is *controlled* and restricted to authorized users.
* **Public Data** Sometimes the word "public" is misinterpreted as meaning "open". All of the TCGA data is *public* data, and much of it is *open*, meaning that it is accessible and available to *all* users; while some low-level TCGA data is *controlled* and restricted to authorized users.
* **Open-Access Data** Depending on how you categorize the data, *most* of the TCGA data is open-access data. This includes all de-identified clinical and biospecimen data, as well as all Level-3 molecular data including gene expression data, DNA methylation data, DNA copy-number data, protein expression data, somatic mutation calls, etc.
* **Controlled-Access Data** All low-level sequence data (both DNA-seq and RNA-seq), the raw SNP array data (CEL files), germline mutation calls, and a small amount of other data are treated as *controlled* data and require that a user be properly authenticated and have dbGaP authorization before accessing these data.

72 changes: 61 additions & 11 deletions docs/build/html/_sources/sections/programmatic-api.txt
@@ -4,28 +4,78 @@ Programmatic Interfaces

Programmatic access to molecular data in BigQuery, Google Cloud Storage, or Google Genomics
is based directly on the interfaces provided by the Google Cloud Platform, as
illustrated throughout the ISB-CGC code repositories on github_.

.. _github: https://github.com/isb-cgc
illustrated throughout the
`ISB-CGC code repositories on github <https://github.com/isb-cgc>`_.

In order to query the ISB-CGC metadata or to get information such as details regarding a
cohort that a user may have saved during an interactive session, a series of APIs based
on Google Cloud Endpoints have been defined. Details about these APIs as well as instructions
on using helper scripts for the OAuth flow can be found here.

Metadata API
############
*Documentation currently under construction! Please email info@isb-cgc.org if you have questions.*
The Google
`APIs Explorer <https://apis-explorer.appspot.com/apis-explorer/?base=https://isb-cgc.appspot.com/_ah/api#p/>`_
can be used to see each API and try it out through your web browser.

Each API may bundle several endpoints that are functionally related.

Cohort API
##########
*Documentation currently under construction! Please email info@isb-cgc.org if you have questions.*

Cohorts are the primary organizing principle for subsetting and working with the TCGA data.
A cohort is a list of samples (identified using the 16-character TCGA sample barcode). Users may
create and share cohorts using the ISB-CGC web-app and then programmatically access these cohorts
using this API.

This API currently bundles several different cohort-related endpoints:

* **cohorts_list**: returns a list of all cohorts that the user has OWNER or READER access to; each cohort is identified by a unique "id" and includes other information such as "name", "comments", "last_date_saved", *etc*;

* **cohorts_patients_samples_list**: given a cohort id (required), this endpoint returns the patient_count and sample_count, as well as two lists of barcodes: one for the patients and one for the samples; (note that the number of patients can be less than the number of samples)

* **patient_details**: given a patient barcode (of length 12, *eg* TCGA-B9-7268), this endpoint returns all available information about this patient, including a list of samples and aliquots derived from this patient;

* **sample_details**: given a sample barcode (of length 16, *eg* TCGA-B9-7268-01A), this endpoint returns all available "biospecimen" information about this sample, the associated patient barcode, a list of associated aliquots, and a list of "data_details" blocks describing each of the data files associated with this sample;

* **datafilenamekey_list**: given a sample barcode (of length 16) this endpoint returns a list of GCS objects containing data from that sample;

* **delete_cohort**: given a cohort id (required), the user may programmatically delete a cohort for which they are OWNER (this can also be done interactively via the web-app);

* **save_cohort**:
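
The two barcode lengths used by these endpoints can be related with a small helper. This is only a sketch: it assumes that a patient barcode is the first 12 characters of a sample barcode, which is consistent with the examples TCGA-B9-7268 and TCGA-B9-7268-01A. The regular expressions and function names are illustrative and are not part of the ISB-CGC API.

```python
import re
from collections import defaultdict

# Assumed barcode shapes, inferred from the examples in the text:
#   patient: TCGA-B9-7268      (12 characters)
#   sample:  TCGA-B9-7268-01A  (16 characters)
PATIENT_RE = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}$")
SAMPLE_RE = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-[0-9]{2}[A-Z]$")


def barcode_kind(barcode):
    """Return "patient", "sample", or None for an unrecognized barcode."""
    if PATIENT_RE.match(barcode):
        return "patient"
    if SAMPLE_RE.match(barcode):
        return "sample"
    return None


def patient_from_sample(sample_barcode):
    """Return the 12-character patient barcode for a 16-character sample barcode."""
    if barcode_kind(sample_barcode) != "sample":
        raise ValueError("not a 16-character sample barcode: %r" % sample_barcode)
    return sample_barcode[:12]


def group_samples_by_patient(sample_barcodes):
    """Map each patient barcode to the list of its sample barcodes."""
    groups = defaultdict(list)
    for sample in sample_barcodes:
        groups[patient_from_sample(sample)].append(sample)
    return dict(groups)
```

Grouping a list of sample barcodes this way also shows why the number of patients in a cohort can be less than the number of samples: several samples may share the same 12-character prefix.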


Metadata API
############

This API currently bundles several different metadata-related endpoints:

* **metadata_list**: returns all metadata about each patient (aka participant) in the specified cohort (if a cohort id is given); a list of "selectors" can also be passed in if only some of the metadata is requested;

* **cohort_files**: given a cohort id, this endpoint returns the total number of files associated with that cohort, the counts according to platform, and details about each file;

* **sample_files**: given a sample barcode (of length 16) this endpoint returns the total number of files associated with that sample, the counts according to platform, and details about each file;

* **metadata_attr_list**: returns a list of the metadata attributes; each item contained in the list includes:
- attribute: a string describing the attribute (*eg* "age_at_initial_pathologic_diagnosis");
- code: indicates whether the attribute is numeric (N), binary (B), or categorical (includes strings) (C); and
- spec: indicates whether the attribute is a clinical feature associated with a patient (CLIN) or a sample feature (SAMP).
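
As a rough sketch of how the items returned by **metadata_attr_list** might be used, the function below filters a list of attribute items by "code" and "spec". It assumes each item is available as a Python dict with the three keys described above; the example items, including their code and spec values, are hypothetical and are not taken from a real response.

```python
def select_attributes(attr_list, code=None, spec=None):
    """Return the names of attributes matching the given code and/or spec.

    code: "N" (numeric), "B" (binary), or "C" (categorical); None matches any.
    spec: "CLIN" (patient-level) or "SAMP" (sample-level); None matches any.
    """
    return [
        item["attribute"]
        for item in attr_list
        if (code is None or item["code"] == code)
        and (spec is None or item["spec"] == spec)
    ]


# Hypothetical items, for illustration only.
example_attrs = [
    {"attribute": "age_at_initial_pathologic_diagnosis", "code": "N", "spec": "CLIN"},
    {"attribute": "gender", "code": "C", "spec": "CLIN"},
    {"attribute": "sample_type", "code": "C", "spec": "SAMP"},
]

numeric_clinical = select_attributes(example_attrs, code="N", spec="CLIN")
```

A filtered list like this could then serve as the "selectors" passed to **metadata_list** when only part of the metadata is needed.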

User API
########
*Documentation currently under construction! Please email info@isb-cgc.org if you have questions.*

Authorization Process
#####################
*Documentation currently under construction! Please email info@isb-cgc.org if you have questions.*
This API currently contains a single endpoint:

* **am_i_dbgap_authorized**: accesses the user's Google identity and checks whether that identity is currently on the access control list (ACL) for controlled data, which requires not only that the user have dbGaP authorization but also that the user has authenticated within the past 24 hours; returns one of two messages:
- *"You are not on the controlled-access google group."* or
- *"<user's Google identity> has dbGaP authorization and is a member of the controlled-access google group."*
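
A script that calls this endpoint may want a simple yes/no answer. The sketch below turns the two messages quoted above into a boolean. It assumes the message text arrives as a plain string exactly as quoted; how the message is wrapped in the actual response is not shown here, and the function name is illustrative.

```python
NOT_AUTHORIZED = "You are not on the controlled-access google group."
AUTHORIZED_SUFFIX = (
    " has dbGaP authorization and is a member of the "
    "controlled-access google group."
)


def is_dbgap_authorized(message):
    """Interpret the message text returned by am_i_dbgap_authorized."""
    if message == NOT_AUTHORIZED:
        return False
    if message.endswith(AUTHORIZED_SUFFIX):
        return True
    # Any other text is unexpected; fail loudly instead of guessing.
    raise ValueError("unrecognized message: %r" % message)
```

Raising on unrecognized text is a deliberate choice: if the wording of the messages changes, the script stops instead of silently treating the user as authorized or unauthorized.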

Helper Scripts
##############

Two helper scripts,
`isb_auth.py <https://github.com/isb-cgc/ISB-CGC-Webapp/blob/master/scripts/isb_auth.py>`_
and
`isb_curl.py <https://github.com/isb-cgc/ISB-CGC-Webapp/blob/master/scripts/isb_curl.py>`_
are available for use from the command-line or from a python script. The first one is a wrapper
for the OAuth process, and the second can be used to send a GET or POST request with the
proper access token to the specified endpoint.
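
To give a feel for what "a GET or POST request with the proper access token" involves, here is a minimal sketch using only the Python standard library. It is not the implementation of isb_curl.py. It assumes the token is sent as an OAuth 2.0 bearer token in the ``Authorization`` header, and the function name is hypothetical.

```python
import json
import urllib.request


def build_request(url, access_token, payload=None):
    """Build (but do not send) an authorized request.

    With no payload the request is a GET; with a payload it becomes
    a POST carrying the payload as JSON.
    """
    headers = {"Authorization": "Bearer " + access_token}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers)
```

The returned object can be passed to ``urllib.request.urlopen`` to send it. The endpoint URL would start with the base shown in the APIs Explorer link above (``https://isb-cgc.appspot.com/_ah/api``); the remaining path segments depend on the API and endpoint and are not spelled out here.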

22 changes: 14 additions & 8 deletions docs/build/html/index.html
@@ -54,7 +54,8 @@ <h1>The ISB Cancer Genomics Cloud<a class="headerlink" href="#the-isb-cancer-gen
tips on how to use it, and details about the data that we are hosting on the
Google Cloud Platform.</p>
<p>This documentation is a work-in-progress, please let us know how we can improve
it!</p>
it! <a class="reference external" href="mailto:feedback&#37;&#52;&#48;isb-cgc&#46;org">feedback<span>&#64;</span>isb-cgc<span>&#46;</span>org</a></p>
<p>&#8211; the ISB-CGC team</p>
<div class="section" id="contents">
<h2>Contents<a class="headerlink" href="#contents" title="Permalink to this headline"></a></h2>
<div class="toctree-wrapper compound">
@@ -66,9 +67,8 @@ <h2>Contents<a class="headerlink" href="#contents" title="Permalink to this head
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/TCGA_on_ISBCGC.html">TCGA Data hosted on the ISB-CGC Platform</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/TCGA_on_ISBCGC.html#open-access-tcga-data">Open-Access TCGA Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/TCGA_on_ISBCGC.html#controlled-access-tcga-data">Controlled-Access TCGA Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/TCGA_on_ISBCGC.html#tcga-metadata">TCGA Metadata</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/TCGA_on_ISBCGC.html#tcga-data-by-access-class">TCGA Data by Access Class</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/TCGA_on_ISBCGC.html#tcga-data-by-source-repository">TCGA Data by Source Repository</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/Web-UI.html">Web Interface</a><ul>
@@ -83,12 +83,13 @@ <h2>Contents<a class="headerlink" href="#contents" title="Permalink to this head
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/Programmatic-API.html">Programmatic Interfaces</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#metadata-api">Metadata API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#cohort-api">Cohort API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#metadata-api">Metadata API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#user-api">User API</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#authorization-process">Authorization Process</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Programmatic-API.html#helper-scripts">Helper Scripts</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/Compute-Engine.html">Using Google Compute Engine</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/Reference-Data.html">Reference Data</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/Reference-Data.html#isb-cgc-hosted-reference-data">ISB-CGC Hosted Reference Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Reference-Data.html#other-reference-data-sources">Other Reference Data Sources</a></li>
@@ -102,16 +103,21 @@ <h2>Contents<a class="headerlink" href="#contents" title="Permalink to this head
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/Support.html">Contacts &amp; Support</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/Support.html#your-own-gcp">Your Own GCP</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Support.html#your-own-gcp-project">Your Own GCP project</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/Support.html#contact-us">Contact Us</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="sections/Other-Useful-Links.html">Other Useful Links</a></li>
<li class="toctree-l1"><a class="reference internal" href="sections/BigQuerySummaries.html">BigQuery Data ETL</a><ul>
<li class="toctree-l2"><a class="reference internal" href="sections/BigQuerySummaries.html#data-quality-and-general-formatting">Data Quality and General Formatting</a></li>
<li class="toctree-l2"><a class="reference internal" href="sections/BigQuerySummaries.html#data-types">Data Types</a></li>
</ul>
</li>
</ul>
</div>
<hr class="docutils" />
<div class="isbcgcfooter container">
Have feedback or corrections? All improvements to these docs are welcome! You can file an issue <a class="reference external" href="https://github.com/isb-cgc/readthedocs/issues">here</a> or email us at <a class="reference external" href="mailto:feedback&#37;&#52;&#48;isb-cgc&#46;org">feedback<span>&#64;</span>isb-cgc<span>&#46;</span>org</a>.</div>
Have feedback or corrections? You can file an issue <a class="reference external" href="https://github.com/isb-cgc/readthedocs/issues">here</a> or email us at <a class="reference external" href="mailto:feedback&#37;&#52;&#48;isb-cgc&#46;org">feedback<span>&#64;</span>isb-cgc<span>&#46;</span>org</a>.</div>
</div>
</div>
