commit ea42706 (1 parent: 0dd9372)
Showing 1 changed file with 23 additions and 0 deletions.
@@ -0,0 +1,23 @@
GDC RNA Seq Workflow
==========================

The data in the RNAseq_hgXX_gdc_XX tables (for example, isb-cgc-bq.BEATAML1_0.RNAseq_hg38_gdc_current) comes from the Genomic Data Commons (GDC).
Please visit the GDC documentation for more information on the GDC mRNA Sequencing Pipeline.

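As a minimal sketch of the naming pattern above, the helper below builds a fully qualified table ID and a preview query string. The function names and the default arguments are hypothetical. Only the example table ID itself comes from this page. Treating the middle part of the ID as a per-program dataset name is an assumption drawn from that single example.

```python
def rnaseq_table_id(dataset: str, genome_build: str = "hg38",
                    version: str = "current",
                    project: str = "isb-cgc-bq") -> str:
    """Build a table ID following the RNAseq_hgXX_gdc_XX pattern.

    Assumption: the pattern expands to
    <project>.<dataset>.RNAseq_<genome build>_gdc_<version>,
    as suggested by the single example on this page.
    """
    return f"{project}.{dataset}.RNAseq_{genome_build}_gdc_{version}"


def preview_query(table_id: str, limit: int = 10) -> str:
    """Return a SQL string that previews a few rows of the table."""
    return f"SELECT *\nFROM `{table_id}`\nLIMIT {limit}"


table_id = rnaseq_table_id("BEATAML1_0")
print(table_id)
print(preview_query(table_id))
```

The query string can be pasted into the BigQuery console. No column names are used because this page does not list the full schema.
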
Overview of the ISB-CGC ETL steps:

- A list of RNA-seq files is created for the RNAseq tables from a table in the GDC_case_file_metadata_versioned dataset

  * More information on the metadata workflow here
  * The version of GDC metadata used typically matches the GDC data release in which the data was released. If a different metadata table was used, this is noted in the table description.

- The files are downloaded from the GDC Google Cloud Buckets using their Google Storage (GS) URLs, which are taken from the GDC metadata BigQuery tables
- Individual files are joined into one file for each gene expression count type
- The expression count data is transferred to BigQuery tables
- Extra data from GENCODE v22 and the GDC metadata tables is added to each expression count table

  * Added columns are: project, case barcode and id, sample barcode and id, aliquot barcode and id, primary site, gene name, gene type, and platform

- The expression count tables are merged into one table, with each sequencing count type in its own column alongside the added extra data
- The table schema is updated
- The table is then reviewed and published to the isb-cgc-bq project
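
The join and merge steps above can be sketched in plain Python. This is an illustration only, not the ISB-CGC implementation. The file IDs, gene IDs, count type names (`counts`, `fpkm`), barcodes, and values are all made up, and the two lookup dictionaries stand in for the GDC metadata and GENCODE tables. For brevity the extra columns are attached during the merge rather than to each count table beforehand.

```python
from collections import defaultdict

# Hypothetical per-file inputs: (file_id, count_type) -> {gene_id: value}.
files = {
    ("file_a", "counts"): {"ENSG01": 10, "ENSG02": 0},
    ("file_a", "fpkm"): {"ENSG01": 1.5, "ENSG02": 0.0},
    ("file_b", "counts"): {"ENSG01": 7, "ENSG02": 3},
    ("file_b", "fpkm"): {"ENSG01": 0.9, "ENSG02": 0.4},
}

# Hypothetical lookups standing in for the GDC metadata and GENCODE tables.
file_metadata = {
    "file_a": {"project": "PROJ-1", "aliquot_barcode": "ALQ-A"},
    "file_b": {"project": "PROJ-1", "aliquot_barcode": "ALQ-B"},
}
gene_annotation = {
    "ENSG01": {"gene_name": "GENE1", "gene_type": "protein_coding"},
    "ENSG02": {"gene_name": "GENE2", "gene_type": "lncRNA"},
}


def join_by_count_type(files):
    """Join individual files into one list of rows per count type."""
    joined = defaultdict(list)
    for (file_id, count_type), values in files.items():
        for gene_id, value in values.items():
            joined[count_type].append((file_id, gene_id, value))
    return joined


def merge_count_types(joined, file_metadata, gene_annotation):
    """Merge the per-type tables into one table, one column per count type."""
    merged = {}
    for count_type, rows in joined.items():
        for file_id, gene_id, value in rows:
            key = (file_id, gene_id)
            if key not in merged:
                # Start each row with the extra metadata and annotation columns.
                merged[key] = {
                    "gene_id": gene_id,
                    **file_metadata[file_id],
                    **gene_annotation[gene_id],
                }
            merged[key][count_type] = value
    # Sort by (file_id, gene_id) so the output order is deterministic.
    return [merged[key] for key in sorted(merged)]


table = merge_count_types(join_by_count_type(files), file_metadata, gene_annotation)
for row in table:
    print(row)
```

Each output row holds one file and gene pair, with the count types side by side as columns, which mirrors the shape of the final merged table described in the list.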