Commit

Update ETL-GDC_RNA_Seq.rst
DeenaBleich committed Jan 20, 2021
1 parent 9e6067b commit e61c8f1
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions docs/source/sections/BigQuery/ETL/ETL-GDC_RNA_Seq.rst
@@ -6,18 +6,18 @@ Please visit the GDC documentation for more information on the GDC mRNA Sequenci

 Overview of the ISB-CGC ETL steps:

-- A list of RNA seq files is created for the RNAseq tables from a table in the GDC_case_file_metadata_versioned data set
+1. A list of RNA seq files is created for the RNAseq tables from a table in the GDC_case_file_metadata_versioned data set

 * more information on metadata workflow here
 * The version of GDC metadata used is typically the same as the GDC data release the data was released in, if another metadata table was used, it is noted in the table description

-- The files are downloaded from the GDC Google Cloud Buckets using their Google Storage (GS) URL which is gathered from the GDC metadata BigQuery Tables
-- Individual files are joined into one file for each gene expression count type
-- The expression count data is transferred to BigQuery tables
-- Extra data from GENCODE v22 and the GDC metadata tables are added to each expression count table.
+2. The files are downloaded from the GDC Google Cloud Buckets using their Google Storage (GS) URL which is gathered from the GDC metadata BigQuery Tables
+3. Individual files are joined into one file for each gene expression count type
+4. The expression count data is transferred to BigQuery tables
+5. Extra data from GENCODE v22 and the GDC metadata tables are added to each expression count table.

 * Added columns are: project, case barcode and id, sample barcode and id, aliquot barcode and id, primary site, gene name, gene type, and platform

-- The expression count tables are merged into one table with each sequencing count type in its own column along with the added extra data
-- The table schema is updated
-- The table is then reviewed and published to the isb-cgc-bq project
+6. The expression count tables are merged into one table with each sequencing count type in its own column along with the added extra data
+7. The table schema is updated
+8. The table is then reviewed and published to the isb-cgc-bq project
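
The merge described in step 6 (one table with each count type in its own column) can be illustrated with a minimal in-memory sketch. This is not the ISB-CGC ETL code, which the commit does not show: the function name `merge_count_tables`, the count-type names, the row layout, and the sample barcodes and gene IDs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_count_tables(tables):
    """Merge per-count-type rows into one row per (aliquot, gene).

    `tables` maps a count-type name (hypothetical, e.g. "htseq_counts")
    to a list of (aliquot_barcode, gene_id, value) tuples.
    """
    merged = defaultdict(dict)
    for count_type, rows in tables.items():
        for aliquot_barcode, gene_id, value in rows:
            merged[(aliquot_barcode, gene_id)][count_type] = value

    count_types = list(tables)
    result = []
    # Sort keys so the output order is deterministic.
    for (aliquot_barcode, gene_id), values in sorted(merged.items()):
        row = {"aliquot_barcode": aliquot_barcode, "gene_id": gene_id}
        for count_type in count_types:
            # A count type with no value for this key becomes None.
            row[count_type] = values.get(count_type)
        result.append(row)
    return result


# Made-up example data; barcodes and gene IDs are placeholders.
tables = {
    "htseq_counts": [("ALQ-1", "ENSG_A", 10), ("ALQ-1", "ENSG_B", 0)],
    "fpkm": [("ALQ-1", "ENSG_A", 1.5)],
}
merged_rows = merge_count_tables(tables)
```

In the actual pipeline this step would presumably run as a join inside BigQuery rather than in Python; the sketch only shows the shape of the result, with one column per count type keyed on aliquot and gene.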
