Skip to content

Commit

Permalink
Update ETL-GDC_Clinical.rst
Browse files Browse the repository at this point in the history
  • Loading branch information
DeenaBleich committed Jan 20, 2021
1 parent f9eb569 commit 9e6067b
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions docs/source/sections/BigQuery/ETL/ETL-GDC_Clinical.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ The data in the clinical_gdc_XX tables (example: isb-cgc-bq:BEATAML1_0.clinical_

Overview of the ISB-CGC ETL steps:

#. For every case available from the cases endpoint, the following field groups are requested and added to a master BigQuery table:
1. For every case available from the cases endpoint, the following field groups are requested and added to a master BigQuery table:

* demographic
* diagnoses, diagnoses.treatments, diagnoses.annotations
Expand All @@ -14,10 +14,10 @@ Overview of the ISB-CGC ETL steps:
* follow_ups, follow_ups.molecular_tests
* project

#. Field descriptions are retrieved from the cases mapping endpoint, for inclusion in the final table schemas. (Some fields have null descriptions or require more detail. In this case, the column descriptions are modified after BQ table generation using a JSON definition file.)
2. Field descriptions are retrieved from the cases mapping endpoint, for inclusion in the final table schemas. (Some fields have null descriptions or require more detail. In this case, the column descriptions are modified after BQ table generation using a JSON definition file.)

#. The data contained in each field is evaluated to determine its schema type definition. (Example: not every sequence of numeric digits should be considered an INTEGER type--if the sequence contains a leading zero, as is sometimes the case for id fields, INTEGER typing would result in data loss.)
3. The data contained in each field is evaluated to determine its schema type definition. (Example: not every sequence of numeric digits should be considered an INTEGER type--if the sequence contains a leading zero, as is sometimes the case for id fields, INTEGER typing would result in data loss.)

#. Each case is mapped to its corresponding program using the GDC case metadata table. The following steps occur on a per-program basis.
4. Each case is mapped to its corresponding program using the GDC case metadata table. The following steps occur on a per-program basis.

#. Lauren’s note to self: to be continued -- flattening/deriving tables
5. Lauren’s note to self: to be continued -- flattening/deriving tables

0 comments on commit 9e6067b

Please sign in to comment.