Merge pull request #629 from isb-cgc/staging
Staging
DeenaBleich committed Sep 14, 2021
2 parents 194c142 + 6dba1cb commit 81ebea1
Showing 11 changed files with 78 additions and 219 deletions.
35 changes: 35 additions & 0 deletions docs/source/sections/BigQueryTableSearchUI.rst
@@ -15,6 +15,7 @@ Currently, ISB-CGC hosts open access BigQuery tables containing data for over 25


.. image:: BigQuery/BigQueryTableSearch-UI-homepage.png
   :scale: 50
   :align: center


@@ -165,6 +166,7 @@ Schema Description
For detailed table information, click on the blue plus sign (+) on the left-hand side.

.. image:: BigQuery/BigQueryTableSearchUI-descriptions.png
   :scale: 50
   :align: center

The following information is displayed:
@@ -192,8 +194,41 @@ A few rows of the data in a BigQuery table can be viewed by clicking on the **Pr


.. image:: BigQuery/BigQueryTableSearch-PreviewTableOption.png
   :scale: 50
   :align: center

Example Joins
++++++++++++++

The **Example Joins** column shows the number of example SQL join queries that the BigQuery Table Search provides for the table in that row. Clicking on the number displays a list of the examples.

.. image:: BigQuery/BigQueryTableSearch-ExampleJoinList.png
   :scale: 50
   :align: center

The following information is displayed:

* **Join Subject** - This is the topic of the query.
* **Joined Tables** - The tables that are joined with the table in that row.
* **View** - The **View Details** button opens a screen that displays the SQL statement and a more detailed description of the query.

**Join Details**

Clicking on the **View** button displays the **Join Details** screen.

.. image:: BigQuery/BigQueryTableSearch-JoinDetails.png
   :align: center

The following information is displayed:

* **Table identification** - Both the table name and the table Full ID are displayed.
* **Join Subject** - This is the topic of the query.
* **Description** - Here, the query is described in more detail. For instance, it will describe what kind of data is extracted.
* **Joined Tables** - The tables that are joined with the main table of interest are listed here. Each table name is also a link, so you can easily learn more about the joined table: clicking on it opens the ISB-CGC BigQuery Table Search in another tab, with that table's information in the query results.
* **SQL Statement** - This is the SQL statement for the joined tables.
* **COPY** - Clicking this button copies the SQL statement to your clipboard. You can then paste the query into the Google Cloud Platform BigQuery Console, a Jupyter notebook, or anywhere else you like. These queries can be run as they are, or you can tailor them to your needs.
* **Joined Condition** - These are the fields being joined between the tables.


Table Access in Google BigQuery
-------------------------------
@@ -6,6 +6,18 @@ To learn about this discovery tool created by the ISB-CGC, please visit `ISB-CGC

For more detailed information about the data stored in ISB-CGC BigQuery tables please visit `ISB-CGC BigQuery Tables <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/BigQuery.html>`_.

*September 8, 2021*

**New Features**

Added an “Example Joins” column to the search results. This column shows the number of example join queries that the BigQuery Table Search provides for the table in that row.

Functionality includes:

- Click the number in the “Example Joins” column to see a list of examples.
- From there, click on View Details for a particular example to see the SQL query and a longer description.
- On the View Details screen, click on COPY to copy the query to your clipboard.

*December 8, 2020* v1.04

**New Features**
11 changes: 11 additions & 0 deletions docs/source/sections/ReleaseNotes/ISB-CGCDataReleases.rst
@@ -27,6 +27,17 @@ New Reactome datasets and tables added to isb-cgc-bq.
- isb-cgc-bq.reactome_versioned.pe_to_pathway_v77
- isb-cgc-bq.reactome_versioned.pathway_hierarchy_v77

Added release 28 miRNAseq isoform and RNAseq tables for TCGA.

**BigQuery tables created**

- isb-cgc-bq.TCGA_versioned.miRNAseq_isoform_hg38_gdc_r28
- isb-cgc-bq.TCGA_versioned.RNAseq_hg38_gdc_r28

**BigQuery tables updated**

- isb-cgc-bq.TCGA.miRNAseq_isoform_hg38_gdc_current
- isb-cgc-bq.TCGA.RNAseq_hg38_gdc_current

*August 2, 2021*

17 changes: 17 additions & 0 deletions docs/source/sections/ReleaseNotes/WebAppReleaseNotes.rst
@@ -2,6 +2,23 @@
ISB-CGC Web App Release Notes
#############################

*September 8, 2021*

- On the ISB-CGC home page, the banner with the link to the survey has been removed.
- On the Tutorials for Workflow on Google Cloud page, a GeneFlow RNA-seq workflow has been added. (From the ISB-CGC home page, on the Pipelines and APIs box, click Launch. Then on the displayed Pipelines and APIs page, click the Tutorials for Workflow on Google Cloud box.)
- On the Web App Programs page, removed the ability for users to upload their own programs.
- On the Create Cohort – Filters page:

- Added a Clear All option to the Cohort Filters panel.
- Previously, if all the filters under a program name were removed, the program name remained in the Cohort Filters box. It has been changed so that if the last filter for the program is removed, the program name is also removed from the Cohort Filters panel.

- On the ISB-CGC BigQuery Table Search, added an “Example Joins” column to the search results. This column shows the number of example join queries that the BigQuery Table Search provides for the table in that row. Functionality includes:

- Click the number in the “Example Joins” column to see a list of examples.
  - From there, click on View Details for a particular example to see the SQL query and a longer description.
- On the View Details screen, click on COPY to copy the query to your clipboard.


*July 19, 2021*

- The Warning Notice about accessing a government website that should pop up when ISB-CGC is accessed was missing. It has been reinstated.
3 changes: 2 additions & 1 deletion docs/source/sections/webapp/Saved-Cohorts.rst
@@ -68,7 +68,8 @@ Cohort Filters Panel

This panel displays the selected filters for the cohort. Filters are listed under the program name. If you click on the program name, the screen will change to display the information for that program.

Selecting an X beside a single filter will remove that filter. Note that you cannot remove filters once the cohort has been saved. (See Set Operations below for more ways to add or remove filters from your cohorts.)
Selecting an X beside a single filter will remove that filter. Selecting **Clear All** in the top right of the panel will remove all the filters.
Note that you cannot remove filters once the cohort has been saved. (See Set Operations below for more ways to add or remove filters from your cohorts.)
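
The panel behavior described above can be modeled with a small Python sketch. It is a hypothetical illustration, not the Web App's implementation: the class name `CohortFiltersPanel` and its methods are invented here, and blocking **Clear All** on a saved cohort is an assumption drawn from the note that filters cannot be removed after saving.

```python
class CohortFiltersPanel:
    """Hypothetical model of the Cohort Filters panel (not ISB-CGC code).

    Filters are grouped under their program name, as the panel displays them.
    """

    def __init__(self) -> None:
        self.filters: dict[str, set[str]] = {}
        self.saved = False

    def add(self, program: str, name: str) -> None:
        self.filters.setdefault(program, set()).add(name)

    def remove(self, program: str, name: str) -> None:
        """Equivalent of clicking the X beside a single filter."""
        self._check_editable()
        remaining = self.filters[program]
        remaining.discard(name)
        if not remaining:
            # The last filter for this program is gone, so the program
            # name is dropped from the panel as well.
            del self.filters[program]

    def clear_all(self) -> None:
        """Equivalent of clicking Clear All: every filter is removed."""
        self._check_editable()
        self.filters.clear()

    def _check_editable(self) -> None:
        # Filters cannot be removed once the cohort has been saved
        # (assumed here to apply to Clear All too).
        if self.saved:
            raise RuntimeError("filters cannot be removed from a saved cohort")
```

Keying the filters by program makes the "remove the program name with its last filter" rule a single emptiness check after each removal.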

Data Set Details Panel
^^^^^^^^^^^^^
219 changes: 1 addition & 218 deletions docs/source/sections/webapp/program_data_upload.rst
@@ -1,225 +1,8 @@
*********
Programs
*********
Uploading your own data lets you create custom groupings of the samples and/or cases you want to analyze further, alongside the data already in our system and with the tools available on the system. You can reuse uploaded data in multiple analyses; creating a Program allows you to do this. If you have any existing Programs with data uploaded, they will appear here for you to view, edit and share.

Upload Program Data
####################

Selecting **Upload Program Data** from the **PROGRAMS** menu dropdown displays the **Register a Google Cloud Project** screen or, if you already have a registered Google Cloud Project, the **Data Upload** screen.

Or, from **Your Dashboard**, click on the **Upload Program Data** link in the **Saved Programs** panel at the bottom of the page.

If you already have Programs created, they will be listed in the **Saved Programs** panel of your dashboard. Click on the **Saved Programs** link in that panel and this will take you to a page that displays the details of your existing Programs. Alternatively, to go directly to a given Program, click on its name and you will be taken to the program details page of that program.


Registering Cloud Storage Buckets and BigQuery Datasets
=======================================================


.. _registered:

Registering a Google Cloud Storage Bucket and a BigQuery Data Set is a prerequisite for using your own data in ISB-CGC. (Please note: The names of the buckets and data sets are case sensitive.)

**How To Register Buckets and Data sets**

Once you have created a bucket and a dataset in the Google Cloud Console of your Google Cloud Project, you will need to register them with your project name using the Web App.

**Step 1**: Click on your user icon in the upper right, or select **Account Details** from the drop-down menu under your name.


.. image:: Register_Step_1.png

**Step 2**: Click on the **View** button under **Google Cloud Projects**.


.. image:: Register_Step_2.png

**Step 3**: Click on the project you wish to use. If you have not registered a project, follow the instructions `here`_.

.. _here: ../controlled-access/Controlled-data-GCP.html

.. image:: Register_Step_3.png

**Step 4**: Use the "Register Cloud Storage Bucket" or "Register BigQuery Dataset" links to add buckets and datasets as needed.


.. image:: Register_Step_4.png


Data Upload Page
================

A New Program
-------------
To start an entirely new program, users should click on the **A New Program** tab on the Data Upload screen. This will bring up a form where a new program can be defined. Users should fill out the required fields (Program Name, Project Name) and any optional fields (Program Description, Project Description) that would be helpful. Clicking on the **Select File(s)** button will bring up a dialog to select the data file for upload.

**NOTE:** You can upload multiple files in a single step. The **Type** drop-down should be used to indicate what data type the file represents. If the data type is one of the choices besides **Other**, the file will have to conform to the specifications below. For a more complete description of the options on this page, see the `Data Upload Page Components`_ section.

Files and File Formats
**********************

.. _page:

The **Upload Program Data** feature uses a number of predefined file formats to get data into the system and make it available for use. The **Other/Generic** file format is the most flexible: it assumes that the first row of the file contains the column headers and all subsequent rows contain data. The remaining file formats are all matrix formats, where the first column (or, for some data types, the first few columns) contains identifiers such as gene or miRNA names, the first row contains sample identifiers, and the "cells" contain the actual data values. Examples of the accepted matrix format files are shown below:

**NOTE:** For the matrix files, the text case matters for the required columns (lower case is different from upper case). In addition, the ISB-CGC system will not validate any identifiers such as barcodes or gene names. It is up to the user to make sure that uploaded data is correctly identified.
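
As a rough illustration of the matrix layout, the following Python sketch (not ISB-CGC code) reshapes a tab-delimited matrix into one record per identifier and sample barcode. The function name `melt_matrix` and the output keys `sample_barcode` and `value` are hypothetical; they are not the column names of the BigQuery tables the upload creates. Like the upload itself, the sketch does not validate barcodes or gene names.

```python
import csv
import io
from typing import Iterator


def melt_matrix(text: str, n_id_columns: int) -> Iterator[dict]:
    """Yield one record per (identifier row, sample barcode) cell of a matrix.

    Assumes tab-delimited text whose first row holds the identifier column
    names followed by sample barcodes, and whose first ``n_id_columns``
    columns hold identifiers such as probe, gene, or miRNA names.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    id_names = header[:n_id_columns]   # e.g. ["Probe_ID"]; case is kept as-is
    barcodes = header[n_id_columns:]   # remaining headers are sample barcodes
    for row in reader:
        ids = dict(zip(id_names, row[:n_id_columns]))
        for barcode, value in zip(barcodes, row[n_id_columns:]):
            # Values stay as strings; nothing here validates barcodes or names.
            yield {**ids, "sample_barcode": barcode, "value": value}


matrix = "Probe_ID\tBarcode 1\tBarcode 2\nProbe ID 1\t0.1\t0.2\n"
for record in melt_matrix(matrix, n_id_columns=1):
    print(record)
```

Passing `n_id_columns=2` would cover the Gene Expression layout (**Name**, **Description**), and `3` the Protein Expression layout.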


* DNA Methylation

This is a simple matrix file. The first column should have the header **Probe_ID**. Sample barcodes should be the headers for all remaining columns.

+------------+-----------+-----------+-----------+
| Probe_ID   | Barcode 1 | Barcode 2 | Barcode N |
+============+===========+===========+===========+
| Probe ID 1 | Value 1   | Value 2   | Value N   |
+------------+-----------+-----------+-----------+
| Probe ID 2 | Value 1   | Value 2   | Value N   |
+------------+-----------+-----------+-----------+
| Probe ID N | Value 1   | Value 2   | Value N   |
+------------+-----------+-----------+-----------+
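
As a minimal sketch, the case-sensitive header rule for DNA Methylation files could be checked like this in Python. The function `methylation_barcodes` is hypothetical and is not part of ISB-CGC; it only looks at the header row and, like the upload, does not check whether the barcodes are valid.

```python
def methylation_barcodes(header_line: str) -> list[str]:
    """Return the sample barcodes from a DNA Methylation header row.

    The first column must be exactly "Probe_ID"; the comparison is
    case-sensitive, so "probe_id" or "PROBE_ID" is rejected.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    if columns[0] != "Probe_ID":
        raise ValueError(f"first column must be 'Probe_ID', got {columns[0]!r}")
    if len(columns) < 2:
        raise ValueError("header has no sample barcode columns")
    return columns[1:]


print(methylation_barcodes("Probe_ID\tBarcode 1\tBarcode 2\n"))
# ['Barcode 1', 'Barcode 2']
```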


* Gene Expression

The Gene Expression matrix file has two required columns:

* **Name**: This is the accession number for the gene.
* **Description**: This is the gene symbol for the gene.

+-------------+-------------+-----------+-----------+-----------+
| Name        | Description | Barcode 1 | Barcode 2 | Barcode N |
+=============+=============+===========+===========+===========+
| Accession 1 | Gene name 1 | Value 1   | Value 2   | Value N   |
+-------------+-------------+-----------+-----------+-----------+
| Accession 2 | Gene name 2 | Value 1   | Value 2   | Value N   |
+-------------+-------------+-----------+-----------+-----------+
| Accession N | Gene name N | Value 1   | Value 2   | Value N   |
+-------------+-------------+-----------+-----------+-----------+
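
The following Python sketch reads a Gene Expression matrix into a dictionary keyed by accession number, keeping the gene symbol from the **Description** column. It assumes tab-delimited text and numeric data values; `read_gene_expression` and the dictionary layout are hypothetical, for illustration only.

```python
import csv
import io


def read_gene_expression(text: str) -> dict[str, dict]:
    """Parse a Gene Expression matrix into {accession: {"symbol", "values"}}.

    The first two headers must be exactly "Name" (accession number) and
    "Description" (gene symbol); every remaining header is a sample barcode.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header[:2] != ["Name", "Description"]:
        raise ValueError("first two columns must be 'Name' and 'Description'")
    barcodes = header[2:]
    genes = {}
    for row in reader:
        accession, symbol, *cells = row
        genes[accession] = {
            "symbol": symbol,
            # Assumes numeric cells; float() raises ValueError otherwise.
            "values": {bc: float(v) for bc, v in zip(barcodes, cells)},
        }
    return genes
```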


* microRNA

There is one required and one optional column for microRNA:

* **miRNA_ID**: The identifier for the miRNA; required.
* **miRNA_name**: This can be used to provide alternative names for the miRNA; optional. If not present, the BigQuery data table will have **null** in this column.

+------------+------------+-----------+-----------+-----------+
| miRNA_ID   | miRNA_name | Barcode 1 | Barcode 2 | Barcode N |
+============+============+===========+===========+===========+
| miRNA ID 1 | Alt name 1 | Value 1   | Value 2   | Value N   |
+------------+------------+-----------+-----------+-----------+
| miRNA ID 2 | Alt name 2 | Value 1   | Value 2   | Value N   |
+------------+------------+-----------+-----------+-----------+
| miRNA ID N | Alt name N | Value 1   | Value 2   | Value N   |
+------------+------------+-----------+-----------+-----------+
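
To illustrate how the optional **miRNA_name** column might be handled, this Python sketch emits `None` (the Python counterpart of **null**) when the column is absent. It assumes that, when present, **miRNA_name** is the second column as in the table above; `read_mirna` and its output keys are hypothetical.

```python
import csv
import io


def read_mirna(text: str) -> list[dict]:
    """Read a microRNA matrix into one record per (miRNA, sample barcode) cell."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header[0] != "miRNA_ID":  # required, case-sensitive
        raise ValueError("first column must be 'miRNA_ID'")
    has_name = len(header) > 1 and header[1] == "miRNA_name"
    n_ids = 2 if has_name else 1
    barcodes = header[n_ids:]
    records = []
    for row in reader:
        # Optional column: None stands in for the null stored in BigQuery.
        name = row[1] if has_name else None
        for barcode, value in zip(barcodes, row[n_ids:]):
            records.append({"miRNA_ID": row[0], "miRNA_name": name,
                            "sample_barcode": barcode, "value": value})
    return records


print(read_mirna("miRNA_ID\tBarcode 1\nmiRNA ID 1\t5\n"))
```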


* Protein Expression

Protein Expression has three required columns:

* **Protein_Name**: This is the name or symbol for the protein.
* **Gene_Name**: This is the name of the gene associated with the protein.
* **Gene_Id**: This is the accession number for the gene.

+--------------+-------------+-----------+-----------+-----------+-----------+
| Protein_name | Gene_Name   | Gene_Id   | Barcode 1 | Barcode 2 | Barcode N |
+==============+=============+===========+===========+===========+===========+
| Protein 1    | Gene Name 1 | Gene ID 1 | Value 1   | Value 2   | Value N   |
+--------------+-------------+-----------+-----------+-----------+-----------+
| Protein 2    | Gene Name 2 | Gene ID 2 | Value 1   | Value 2   | Value N   |
+--------------+-------------+-----------+-----------+-----------+-----------+
| Protein 3    | Gene Name 3 | Gene ID 3 | Value 1   | Value 2   | Value N   |
+--------------+-------------+-----------+-----------+-----------+-----------+


* Other/Generic

Files in Other/Generic format are not matrix files; instead, the data are arranged in columns. The order of the columns is very flexible, and the upload interface allows users to define what kind of data is in each column. One, and only one, of the columns must contain sample barcodes, and all rows must have the same number of columns. Any completely blank columns will be flagged and should be removed. Blank entries in any other column are stored as *null* in the BigQuery data table.

**NOTE:** Currently, each Sample Barcode can only be represented once in a file. Files with the same barcode on multiple rows will cause a failure. If you have multiple data values for a single barcode (like gene expression values for multiple genes) you will either have to create a matrix file or upload multiple files using Other/Generic.
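
The Other/Generic rules above can be expressed as a small validator. In this Python sketch, `column_types` is a hypothetical stand-in for the column types a user assigns in the upload interface, raising `ValueError` stands in for the system flagging a problem, and blank entries become `None` to mirror *null*. It is not ISB-CGC's actual validation code.

```python
import csv
import io


def validate_generic(text: str, column_types: list[str]) -> list[list]:
    """Check an Other/Generic file and return its data rows, blanks as None."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    if column_types.count("barcode") != 1:
        raise ValueError("exactly one column must hold sample barcodes")
    if len(column_types) != len(header):
        raise ValueError("one type is needed per column")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no} has {len(row)} columns, expected {len(header)}")
    for j, name in enumerate(header):
        if data and all(row[j] == "" for row in data):
            raise ValueError(f"column {name!r} is completely blank; remove it")
    b = column_types.index("barcode")
    seen = set()
    for row in data:
        # Each Sample Barcode may appear on only one row.
        if row[b] in seen:
            raise ValueError(f"barcode {row[b]!r} appears on more than one row")
        seen.add(row[b])
    return [[cell if cell != "" else None for cell in row] for row in data]
```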



.. image:: MouseProject.PNG

Project description and file selection
**************************************

Clicking on the **Next** button brings up a form where users will select which bucket and BigQuery data set the file upload should use. These buckets and data sets were registered_ according to the process above. The **Platform** and **Pipeline** fields can contain any useful description a user wishes to provide.

.. image:: Mouse_bucket_and_dataset.png

Lastly, the user should click on the **Upload Data** button to start the process. Users will first see a page with a message indicating their data is being processed. Refresh the screen occasionally until either the final page is displayed or an error is shown indicating a problem with loading the file. Your data is being loaded into the BigQuery table you specified earlier for this data set.

.. image:: Mouse_processing.PNG

Correcting Data Uploaded As Other
*********************************
If your data does not fit into any of the existing pre-defined matrix formats, the *Other* data type will allow users to upload data that is in a tabular format. In this format, the first row of the file is assumed to be the description of each of the columns and all subsequent rows are assumed to be data. The system will attempt to define what kind of data are in each column; however, this process may not always be correct, so users must review the column data type assignments before proceeding.

In the example shown below, the automated process has identified two columns as potentially containing Sample Barcodes and has further misidentified a column containing decimal data (numeric float values) as containing categorical (text) data. The user will need to correct both instances so there is only one Sample Barcode column and define the expression data as decimal.
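
Automatic type detection of this kind is inherently a guess. The Python sketch below uses a hypothetical heuristic (it is not the one ISB-CGC uses): a column is "integer" if every non-blank value parses as an int, "float" if every value parses as a float, and "text" otherwise. A single non-numeric marker such as `NA` is enough to push a decimal column to text, which shows how easily a numeric column can be misread as categorical.

```python
def infer_column_type(values: list[str]) -> str:
    """Guess "integer", "float", or "text" for a column (hypothetical heuristic)."""
    present = [v for v in values if v.strip() != ""]  # blanks become null; ignore them

    def all_parse(cast) -> bool:
        # True if every non-blank value can be converted with `cast`.
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if not present:
        return "text"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"


# A decimal column with one "NA" entry is guessed as text, so the user
# would need to correct it to decimal before proceeding.
print(infer_column_type(["0.12", "3.4"]))        # float
print(infer_column_type(["0.12", "NA", "3.4"]))  # text
```

If the guess is wrong, as in the example above, the user's choice should simply override it.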

.. image:: OtherExample.PNG

A New Project For An Existing Program
-------------------------------------
Adding a new project to an existing program follows the same steps as creating a new program. However, instead of filling out the new program information fields, users should click on the **A New Project For An Existing Program** tab and select an existing program from the drop-down menu. All other steps for describing and uploading the file will remain the same.

.. image:: MouseExisting.png



Data Upload Page Components
***************************
This section describes the features found on the Data Upload page.

**Sharing User Uploaded Programs**

This shares the web view of your uploaded program with the users you select by entering their email addresses. Each user receives an email message with a link to your shared uploaded program, explaining that you want to share a program with them and inviting them to join. If an email address you entered is not registered in the database, you are prompted with a message saying, "The following user emails could not be found; please ask them to log into the site first:(email entered)."


**System Data Dictionary Link**

This link goes to the System Data Dictionary which is a comprehensive list of all clinical data fields and possible values. This dictionary can be helpful in aligning metadata from the imported program to ISB-CGC data fields.


**High Level Data Files**

High level data files usually represent some level of data analysis as opposed to raw files. High level files can be used in Workbooks and visualized alongside ISB-CGC data.

**Low Level Files for API Access**

Files uploaded as low-level files for API access will not be usable in the Web App, but rather will appear in the user's Google Cloud Storage bucket. This feature is intended for files like BAM or VCF files that contain more raw data.

**File Type**

This is the data type of the uploaded file. Currently the allowed data types are:

* Gene Expression
* miRNA Expression
* Protein Expression
* Methylation
* Other

**File Format Requirements**

All files must be tab-delimited and meet the formatting requirements described in `Files and File Formats`_.

.. image:: MouseProjectAnnotated.png

Saved Programs
##############

Selecting **Saved Programs** from the **PROGRAMS** menu dropdown displays the **Programs** screen, **SAVED PROGRAMS** tab. This screen displays your saved programs and allows you to edit or delete them, as well as start a new workbook using one of them.

Clicking on the **Upload Data** button will take you to the **Register a Google Cloud Project** screen.

The Programs screen displays information about public programs available through the Web App.

Public Programs
###############