Commit

Merge pull request #568 from isb-cgc/staging
Staging
DeenaBleich committed May 17, 2021
2 parents 75dc1b2 + b056de7 commit 0dc3e9b
Showing 8 changed files with 13 additions and 13 deletions.
2 changes: 1 addition & 1 deletion docs/source/sections/BigQuery/ControlledAccessVCF.rst
@@ -12,7 +12,7 @@ VCF BigQuery Table

Because VCF files at the GDC contain sensitive patient information which cannot be displayed to the public, they are deemed controlled-access, meaning only authorized users can access the data. For the purposes of demonstration, we have generated a random VCF file that emulates a typical TCGA VCF file. The BigQuery table in the image below was generated using the randomized VCF file and mimics a controlled access VCF BigQuery table.

-.. note:: The actual BiqQuery variant data tables are not randomized and are controlled access.
+.. note:: The actual BigQuery variant data tables are not randomized and are controlled access.

The first 11 columns, seen in the image, begin just as a VCF file does. In addition to keeping a similar structure, the new table splits VCF columns such as NORMAL and TUMOR into their own individual columns. The objective of the flattened file is to bring ease and understandability to our users who have worked with VCF files in the past or who are brand new to this area of research.

2 changes: 1 addition & 1 deletion docs/source/sections/BigQuery/ISBCGC-BQ-Projects.rst
@@ -15,7 +15,7 @@ ISB-CGC has two open-access Google BigQuery projects. To quickly access the ISB
isb-cgc project
===============

-The isb-cgc project contains all of the ISB-CGC BiqQuery tables created before July 2020.
+The isb-cgc project contains all of the ISB-CGC BigQuery tables created before July 2020.

Tables in isb-cgc will be retired and labeled as deprecated as we copy them over to the new project. Table descriptions will include the new table location. Eventually they will be turned into only views (with no preview ability) to ensure that existing references will continue to work correctly. Many older tables with light usage may remain in isb-cgc and not be copied over; tables with no logged recent usage may be deleted. When using the `BigQuery Table Search UI <https://isb-cgc.appspot.com/bq_meta_search/>`_ to find these retired tables, select Status of **Deprecated**.

4 changes: 2 additions & 2 deletions docs/source/sections/BigQueryTableSearchUI.rst
@@ -171,15 +171,15 @@ The following information is displayed:

* **Full ID** - This is the Project, Dataset ID, and Table ID concatenated with periods between them. The Full ID is used in SQL queries.
* **Dataset ID** - The BigQuery dataset of the table. A data set is a group of related tables.
-* **Table ID** - The BiqQuery table ID.
+* **Table ID** - The BigQuery table ID.
* **Description** - A description of the table, which includes information such as how the data was created, its source, data type, and contents.
* **Schema** - The schema displays the Field Name, Type, Mode and Field Description for each field in the table.
* **Labels** - Labels are table metadata describing the source, data type, reference genome build, status, and access of the table data.
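
As a minimal sketch of how the Full ID described above is put together and used, the snippet below joins the three parts with periods and places the result in a query string. The helper names ``full_table_id`` and ``preview_query`` and the dataset and table names are hypothetical, chosen only for illustration; they are not part of the ISB-CGC documentation or the UI.

```python
def full_table_id(project: str, dataset: str, table: str) -> str:
    """Join project, dataset ID, and table ID with periods to form the Full ID."""
    return ".".join([project, dataset, table])


def preview_query(full_id: str, limit: int = 10) -> str:
    """Build a small SELECT statement that references a table by its Full ID."""
    # Backticks let BigQuery standard SQL accept IDs that contain hyphens,
    # such as the project name "isb-cgc".
    return f"SELECT * FROM `{full_id}` LIMIT {limit}"


# Hypothetical dataset and table names, used only for illustration.
full_id = full_table_id("isb-cgc", "example_dataset", "example_table")
print(full_id)
print(preview_query(full_id))
```

The resulting string is the kind of text that could be pasted into the BigQuery query editor, with the Full ID taking the place of the table reference.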


**Copy button**

-Next to the Full ID is a **Copy** button. When the user clicks this, the Full ID is copied to the clipboard. The Full ID can then be pasted into an SQL query within the BiqQuery Query editor.
+Next to the Full ID is a **Copy** button. When the user clicks this, the Full ID is copied to the clipboard. The Full ID can then be pasted into an SQL query within the BigQuery Query editor.

**Open button**

4 changes: 2 additions & 2 deletions docs/source/sections/GDCTutorials/FromGDCtoISBCGC.rst
@@ -40,7 +40,7 @@ Creating a table from a GDC file manifest is remarkably easy:

**Find the file locations on the Google Cloud**

-Now that you have a table containing the GDC file identifiers, the next step is to find the locations for the Level 1 files on the Google Cloud. To help with that task, ISB-CGC maintains BiqQuery tables that contain the GDC file identifier and the Google bucket location for the file in data set GDC_metadata. Adding the Google bucket location to our GDC information can be done via a simple SQL query:
+Now that you have a table containing the GDC file identifiers, the next step is to find the locations for the Level 1 files on the Google Cloud. To help with that task, ISB-CGC maintains BigQuery tables that contain the GDC file identifier and the Google bucket location for the file in data set GDC_metadata. Adding the Google bucket location to our GDC information can be done via a simple SQL query:

.. code-block:: sql
@@ -51,7 +51,7 @@ Now that you have a table containing the GDC file identifiers, the next step is
Note that you'll need to replace "Your-project.GDC_Import.GDC_Kidney_File_manifest" with your project and the data set and table that you created above.

-This query will return the results shown below and, as with any BiqQuery result, you can either export it as a file or save it as a new table in BigQuery.
+This query will return the results shown below and, as with any BigQuery result, you can either export it as a file or save it as a new table in BigQuery.


.. image:: BQ-Results-KidneyManifestURLTable.png
@@ -110,7 +110,7 @@ To access controlled data programmatically, such as through Google Cloud or when
Controlled Access in the Google BigQuery Console
------------------------------------------------------

-The BigQuery project "isb-cgc-cbq" contains the ISB-CGC controlled access data which is stored in BigQuery tables. To obtain access to these ISB-CGC tables within the Google BigQuery Console, you must link to them within the BiqQuery Console. Before doing so, you must have followed all the prerequisites above, including `linking your Google identity to your NIH/eRA account <controlled-access/Controlled-data-Interactive.html>`_ via the ISB-CGC Web App.
+The BigQuery project "isb-cgc-cbq" contains the ISB-CGC controlled access data which is stored in BigQuery tables. To obtain access to these ISB-CGC tables within the Google BigQuery Console, you must link to them within the BigQuery Console. Before doing so, you must have followed all the prerequisites above, including `linking your Google identity to your NIH/eRA account <controlled-access/Controlled-data-Interactive.html>`_ via the ISB-CGC Web App.

When you access BigQuery from your Google Cloud Platform Console (see `here <progapi/bigqueryGUI/HowToAccessBigQueryFromTheGoogleCloudPlatform.html>`_ for more information on this), you will be presented with the following page:

6 changes: 3 additions & 3 deletions docs/source/sections/HowToGetStarted-Analysis.rst
@@ -34,9 +34,9 @@ filter data from one or more public data sets (such as TCGA, CCLE, and TARGET),

Cancer data analysis using Google BigQuery
##########################################################
-Processed data are consolidated by data type (ex. Clinical, DNA Methylation, RNAseq, Somatic Mutation, etc.) and transformed
-into ISB-CGC Google BigQuery tables for ease of access and analysis. This novel approach allows users to quickly analyze
-information from thousands of patients in our curated BigQuery tables.
+Processed data are consolidated by data type (ex. Clinical, DNA Methylation, RNAseq, Somatic Mutation, Protein Expression, etc.) from sources including
+the Genomics Data Commons (GDC) and Proteomics Data Commons (PDC) and transformed
+into ISB-CGC Google BigQuery tables. This allows users to quickly analyze information from thousands of patients in curated BigQuery tables using Structured Query Language (SQL). SQL can be used from the Google BigQuery Console but can also be embedded within Python, R and complex workflows, providing users with flexibility. The easy, yet cost effective, “burstability” of BigQuery allows you to, within minutes (as compared to days or weeks on a non-cloud based system), calculate statistical correlations across millions of combinations of data points.

.. list-table::
:widths: 60, 40
2 changes: 1 addition & 1 deletion docs/source/sections/PanCancer-Atlas-Mirror.rst
@@ -42,7 +42,7 @@ google.com:biggene.

You can also search for and learn about Pan-Cancer Atlas tables through the `ISB-CGC BigQuery Table Search UI <https://isb-cgc.appspot.com/bq_meta_search/>`_. Type 'pancancer' in the **Search** box in the upper right-hand corner to filter for them.

-Pan-Cancer Atlas BiqQuery Query Example
+Pan-Cancer Atlas BigQuery Query Example
#######################################

Ready to query? Follow the steps below to run a query in the Google BigQuery Console. More details are `here <https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui>`_.
@@ -56,7 +56,7 @@ The ISB-CGC BigQuery Table Search UI is a discovery tool that allows users to ex

Major features in the initial release include:

-- The ability to search for BiqQuery tables by multiple filters:
+- The ability to search for BigQuery tables by multiple filters:
- Status
- Categories
- Reference Genome Build
@@ -67,7 +67,7 @@ Major features in the initial release include:
- Table Description
- Labels
- Field Name
-- Display of search results in a tabular format, with the following information about BiqQuery tables:
+- Display of search results in a tabular format, with the following information about BigQuery tables:
- Dataset ID
- Table ID
- Status