Commit

Merge pull request #718 from isb-cgc/staging
Staging
DeenaBleich committed Apr 28, 2023
2 parents 48417f5 + 6ba274b commit e2a4679
Showing 3 changed files with 88 additions and 0 deletions.
14 changes: 14 additions & 0 deletions docs/source/sections/Hosted-Data.rst
@@ -43,6 +43,10 @@ Clinical, biospecimen and processed -omics data (such as RNASeq, etc.) are avail
- |checkmark|
- |checkmark|
- |checkmark|
* - `CDDP EAGLE <data/CDDP_EAGLE_about.html>`_
- |checkmark|
- |checkmark| *
-
* - `CGCI <data/CGCI_about.html>`_
- |checkmark|
- |checkmark|
@@ -75,6 +79,10 @@ Clinical, biospecimen and processed -omics data (such as RNASeq, etc.) are avail
- |checkmark|
- |checkmark|
-
* - `MATCH <data/MATCH_about.html>`_
- |checkmark|
- |checkmark| *
-
* - `MMRF <data/MMRF_about.html>`_
- |checkmark|
- |checkmark|
@@ -136,6 +144,7 @@ Clinical, biospecimen and processed -omics data (such as RNASeq, etc.) are avail

data/BEATAML_about
data/CCLE_top
data/CDDP_EAGLE_about
data/CGCI_about
data/CMI_about
data/CPTAC_about
@@ -144,6 +153,7 @@ Clinical, biospecimen and processed -omics data (such as RNASeq, etc.) are avail
data/FM_about
data/GENIE_about
data/HCMI_about
data/MATCH_about
data/MMRF_about
data/MP2PRT_about
data/NCICCR_about
@@ -176,6 +186,10 @@ PDC protein expression data are available in ISB-CGC BigQuery tables. The table
-
- |checkmark|
-
* - Broad Institute
-
- |checkmark|
-
* - `CBTN <data/CBTN_about.html>`_
-
- |checkmark|
37 changes: 37 additions & 0 deletions docs/source/sections/data/CDDP_EAGLE_about.rst
@@ -0,0 +1,37 @@
********************
CDDP EAGLE Data Set
********************

About the CDDP Environment And Genetics in Lung cancer Etiology (EAGLE) Program
--------------------------------------------------------------------------------
The `Environment And Genetics in Lung cancer Etiology <https://dceg.cancer.gov/research/who-we-study/cancer-cases-controls/eagle-study>`_ (EAGLE) program investigated the genetic and environmental determinants of lung cancer and smoking persistence. It integrated the analysis of genetic, environmental, clinical, and behavioral data.

About the CDDP EAGLE Data Set
---------------------------------------------------------------------
Data from the CDDP EAGLE Program are available at the `Genomics Data Commons (GDC) <https://portal.gdc.cancer.gov/>`_. The data set includes data from the Integrative Analysis of Lung Adenocarcinoma project.

Accessing the CDDP EAGLE Data on the Cloud
-------------------------------------------------------------------------------------------

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage bucket, so you don't need to download them before performing analysis. ISB-CGC stores the cloud file locations in tables in the ``isb-cgc-bq.GDC_case_file_metadata`` data set in BigQuery.

- To query these metadata tables, open the Google BigQuery console.
- Run an SQL query to find the CDDP EAGLE files. For example:

.. code-block:: sql

   -- Join each active GDC file record to its Google Cloud Storage location.
   SELECT active.*, GCSurl.file_gdc_url
   FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
   JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
     ON active.file_gdc_id = GCSurl.file_gdc_id
   WHERE active.program_name = 'CDDP_EAGLE'

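
Before selecting every metadata column, it can help to check how many files match. The query below is a sketch that reuses only the two tables and the columns from the example query. The alias ``n_files`` is illustrative, and the query assumes ``program_name`` is a column of ``fileData_active_current``.

.. code-block:: sql

   -- Count CDDP EAGLE files that have a Google Cloud Storage location.
   -- Assumption: program_name is a column of fileData_active_current.
   SELECT COUNT(DISTINCT active.file_gdc_id) AS n_files
   FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
   JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
     ON active.file_gdc_id = GCSurl.file_gdc_id
   WHERE active.program_name = 'CDDP_EAGLE';

If you replace the ``COUNT`` expression with ``GCSurl.file_gdc_url``, the query returns only the cloud file locations.
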
Accessing the CDDP EAGLE Data in Google BigQuery
------------------------------------------------

ISB-CGC stores CDDP EAGLE data, such as clinical data and metadata, in Google BigQuery tables. To find information about these tables, use the `ISB-CGC BigQuery Table Search <https://isb-cgc.appspot.com/bq_meta_search/>`_ with CDDP EAGLE selected in the PROGRAM filter. To learn more about this tool, see the `ISB-CGC BigQuery Table Search documentation <../BigQueryTableSearchUI.html>`_.

The CDDP EAGLE tables are in the ``isb-cgc-bq`` project. To learn more about how to view and query tables in the Google BigQuery console, see the `ISB-CGC BigQuery Tables documentation <../BigQuery.html>`_.

- Data set ``isb-cgc-bq.CDDP_EAGLE`` contains the latest tables for each data type.
- Data set ``isb-cgc-bq.CDDP_EAGLE_versioned`` contains previously released tables, as well as the current version of each table.
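
To see which tables a data set currently holds, you can query its ``INFORMATION_SCHEMA.TABLES`` view. This is a minimal sketch that assumes your account can read table metadata in the public ``isb-cgc-bq`` project. It uses a standard BigQuery metadata view, not an ISB-CGC-specific feature.

.. code-block:: sql

   -- List the tables in the current CDDP EAGLE data set, newest first.
   -- Replace CDDP_EAGLE with CDDP_EAGLE_versioned to list released versions too.
   SELECT table_name, creation_time
   FROM `isb-cgc-bq.CDDP_EAGLE.INFORMATION_SCHEMA.TABLES`
   ORDER BY creation_time DESC;
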
37 changes: 37 additions & 0 deletions docs/source/sections/data/MATCH_about.rst
@@ -0,0 +1,37 @@
*****************
MATCH Data Set
*****************

About the Molecular Analysis for Therapy Choice (MATCH) Program
----------------------------------------------------------------------
`Molecular Analysis for Therapy Choice <https://www.cancer.gov/about-cancer/treatment/clinical-trials/nci-supported/nci-match>`_ (MATCH) is a precision medicine cancer treatment clinical trial that investigated the effectiveness of treating cancer based on the specific genetic changes in a person's tumor.

About the Molecular Analysis for Therapy Choice Data Set
---------------------------------------------------------------------
Data from the MATCH Program are available at the `Genomics Data Commons (GDC) <https://portal.gdc.cancer.gov/>`_. The data set includes data for more than 15 primary sites and at least eight disease types.

Accessing the MATCH Data on the Cloud
-------------------------------------------------------------------------------------------

Besides accessing the files on the GDC Data Portal, you can also access them from the GDC Google Cloud Storage bucket, so you don't need to download them before performing analysis. ISB-CGC stores the cloud file locations in tables in the ``isb-cgc-bq.GDC_case_file_metadata`` data set in BigQuery.

- To query these metadata tables, open the Google BigQuery console.
- Run an SQL query to find the MATCH files. For example:

.. code-block:: sql

   -- Join each active GDC file record to its Google Cloud Storage location.
   SELECT active.*, GCSurl.file_gdc_url
   FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
   JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
     ON active.file_gdc_id = GCSurl.file_gdc_id
   WHERE active.program_name = 'MATCH'

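
Before selecting every metadata column, it can help to check how many files match. The query below is a sketch that reuses only the two tables and the columns from the example query. The alias ``n_files`` is illustrative, and the query assumes ``program_name`` is a column of ``fileData_active_current``.

.. code-block:: sql

   -- Count MATCH files that have a Google Cloud Storage location.
   -- Assumption: program_name is a column of fileData_active_current.
   SELECT COUNT(DISTINCT active.file_gdc_id) AS n_files
   FROM `isb-cgc-bq.GDC_case_file_metadata.fileData_active_current` AS active
   JOIN `isb-cgc-bq.GDC_case_file_metadata.GDCfileID_to_GCSurl_current` AS GCSurl
     ON active.file_gdc_id = GCSurl.file_gdc_id
   WHERE active.program_name = 'MATCH';

If you replace the ``COUNT`` expression with ``GCSurl.file_gdc_url``, the query returns only the cloud file locations.
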
Accessing the MATCH Data in Google BigQuery
------------------------------------------------

ISB-CGC stores MATCH data, such as clinical data and metadata, in Google BigQuery tables. To find information about these tables, use the `ISB-CGC BigQuery Table Search <https://isb-cgc.appspot.com/bq_meta_search/>`_ with MATCH selected in the PROGRAM filter. To learn more about this tool, see the `ISB-CGC BigQuery Table Search documentation <../BigQueryTableSearchUI.html>`_.

The MATCH tables are in the ``isb-cgc-bq`` project. To learn more about how to view and query tables in the Google BigQuery console, see the `ISB-CGC BigQuery Tables documentation <../BigQuery.html>`_.

- Data set ``isb-cgc-bq.MATCH`` contains the latest tables for each data type.
- Data set ``isb-cgc-bq.MATCH_versioned`` contains previously released tables, as well as the current version of each table.
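
To see which tables a data set currently holds, you can query its ``INFORMATION_SCHEMA.TABLES`` view. This is a minimal sketch that assumes your account can read table metadata in the public ``isb-cgc-bq`` project. It uses a standard BigQuery metadata view, not an ISB-CGC-specific feature.

.. code-block:: sql

   -- List the tables in the current MATCH data set, newest first.
   -- Replace MATCH with MATCH_versioned to list released versions too.
   SELECT table_name, creation_time
   FROM `isb-cgc-bq.MATCH.INFORMATION_SCHEMA.TABLES`
   ORDER BY creation_time DESC;
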
