1 parent 7bbe5ee, commit d12b039
Showing 1 changed file with 34 additions and 0 deletions.
@@ -0,0 +1,34 @@
How ISB-CGC BigQuery Tables are Created
=======================================

The ISB-CGC team extracts, transforms, and loads data from cancer data repositories into Google BigQuery tables, making the data easier to access for analysis.

Extract, Transform and Load (ETL) process overview
---------------------------------------------------

The process differs slightly depending on the data source (GDC, PDC, GENCODE, etc.) and the data type (RNA sequencing, Somatic Mutation, etc.).
In general, data are gathered either through an Application Programming Interface (API) that the source provides for this purpose or by accessing files that the source provides.

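The two ways of gathering data described above can be sketched roughly as follows. This is a minimal illustration, not the ISB-CGC implementation: ``fetch_page`` is a hypothetical stand-in for a single request to a source's API, the tab-separated layout is an assumed file format, and the field names (``case_id``, ``program``) are invented for the example.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, TextIO

Record = Dict[str, str]


def records_from_api(fetch_page: Callable[[int, int], List[Record]],
                     page_size: int = 2) -> Iterator[Record]:
    """Yield every record from a paged API.

    fetch_page(offset, size) is a hypothetical stand-in for one request;
    it returns at most `size` records starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break  # a short page means the source is exhausted
        offset += len(page)


def records_from_file(handle: TextIO) -> Iterator[Record]:
    """Yield one record per row of a tab-separated file with a header row."""
    yield from csv.DictReader(handle, delimiter="\t")


# Stand-in data so the sketch runs without network or disk access.
_FAKE_SOURCE = [{"case_id": f"C{i}", "program": "DEMO"} for i in range(5)]


def fake_fetch_page(offset: int, size: int) -> List[Record]:
    """Pretend API: slice the in-memory list instead of calling a server."""
    return _FAKE_SOURCE[offset:offset + size]


api_rows = list(records_from_api(fake_fetch_page))
file_rows = list(records_from_file(io.StringIO("case_id\tprogram\nC0\tDEMO\n")))
```

Both functions yield plain dictionaries, so later steps in a pipeline like this would not need to know whether a record came from an API or from a file.
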
Data for each program are consolidated by data type (e.g., Clinical, DNA Methylation, RNAseq, Somatic Mutation) and transformed into ISB-CGC Google BigQuery tables.
This approach allows our users to quickly analyze information from thousands of patients in our curated BigQuery tables.

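As a rough illustration of consolidating records by data type, the sketch below groups records and aligns their columns so that each group has a single table-like shape. The record fields and the ``data_type`` key are assumptions made for the example, and the output is newline-delimited JSON, one of the file formats BigQuery accepts for loading; the actual ISB-CGC transformation steps may differ.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def consolidate_by_data_type(records: Iterable[Row]) -> Dict[str, List[Row]]:
    """Group records by their 'data_type' field and align their columns."""
    grouped: Dict[str, List[Row]] = defaultdict(list)
    for record in records:
        row = {k: v for k, v in record.items() if k != "data_type"}
        grouped[record["data_type"]].append(row)

    tables: Dict[str, List[Row]] = {}
    for data_type, rows in grouped.items():
        # Union of all columns seen for this data type; gaps become None.
        columns = sorted({key for row in rows for key in row})
        tables[data_type] = [{c: row.get(c) for c in columns} for row in rows]
    return tables


def to_ndjson(rows: Iterable[Row]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Hypothetical records, as they might look after extraction.
records = [
    {"data_type": "clinical", "case_id": "C1", "age": 61},
    {"data_type": "clinical", "case_id": "C2", "stage": "II"},
    {"data_type": "rnaseq", "case_id": "C1", "gene": "TP53", "fpkm": 3.2},
]

tables = consolidate_by_data_type(records)
clinical_ndjson = to_ndjson(tables["clinical"])
```

Filling missing columns with ``None`` keeps every row in a group on the same schema, which is what a columnar table needs; JSON serializes those gaps as ``null``.
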
ISB-CGC Workflow Components
+++++++++++++++++++++++++++

Each workflow is made up of the following files:

- YAML file (configures the workflow)
- Python files (extract, transform, and load the data)
- Shell script file (runs the workflow)

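One way these three components could fit together is sketched below: a configuration names the steps to run, Python functions implement them, and a launcher (a shell script in the real workflows) invokes the entry point. Everything here is hypothetical, including the ``params`` and ``steps`` keys, the step names, and the function signatures. The configuration is written as a Python dictionary standing in for a parsed YAML file so that the example has no third-party dependencies.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[Dict[str, Any], List[Row]], List[Row]]

# Stand-in for the parsed YAML configuration file (hypothetical keys).
CONFIG: Dict[str, Any] = {
    "params": {"program": "DEMO", "data_type": "clinical"},
    "steps": ["extract", "transform", "load"],
}


def extract(params: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Pretend to pull one record from a source."""
    return [{"case_id": "C1", "program": params["program"]}]


def transform(params: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Tag each row with the configured data type."""
    return [{**row, "data_type": params["data_type"]} for row in rows]


def load(params: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Report what would be loaded; a real step would write to a table."""
    print(f"would load {len(rows)} row(s) for {params['data_type']}")
    return rows


STEP_FUNCTIONS: Dict[str, Step] = {
    "extract": extract,
    "transform": transform,
    "load": load,
}


def run_workflow(config: Dict[str, Any]) -> List[Row]:
    """Run the configured steps in order, passing rows from one to the next."""
    rows: List[Row] = []
    for name in config["steps"]:
        if name not in STEP_FUNCTIONS:
            raise ValueError(f"unknown step in configuration: {name}")
        rows = STEP_FUNCTIONS[name](config["params"], rows)
    return rows


result = run_workflow(CONFIG)
```

Keeping the step list in the configuration means a partial run (for example, only ``transform`` and ``load``) would need a configuration change rather than a code change.
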
GDC Workflows
-------------

PDC Workflows
-------------

Other Workflows
---------------