This Notebook shows how to consolidate the Gen3 graph data into a single Terra 
table using functions from the `terra_data_util` Notebook.

Ensure that a recent version of `firecloud` is installed.  
The version must be 0.16.23 or later for flexible entity support.
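
The version requirement can also be checked programmatically. The following is a minimal sketch, not part of `terra_data_util`: the helper names `parse_version`, `version_at_least`, and `firecloud_is_recent_enough` are hypothetical, and the parsing only handles plain dotted numeric versions such as `0.16.23`.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.16.23' into (0, 16, 23). Non-numeric suffixes in a piece are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def version_at_least(installed, required="0.16.23"):
    """Return True if the installed version is the required version or later."""
    return parse_version(installed) >= parse_version(required)


def firecloud_is_recent_enough(required="0.16.23"):
    """Look up the installed firecloud package; return False if it is missing."""
    try:
        installed = metadata.version("firecloud")
    except metadata.PackageNotFoundError:
        return False
    return version_at_least(installed, required)
```

If `firecloud_is_recent_enough()` returns `False`, uncomment and run the `pip install --upgrade` cell below.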

In [1]:
# ! pip install --upgrade firecloud
# ! pip show firecloud

Run the `terra_data_util` Notebook to make its functions available in this Notebook.  
Note: The warning message shown below is new with the recent Notebook Runtime upgrade; it is not caused by these Notebooks.

In [2]:
%run terra_data_util.ipynb

To test with data in a workspace other than the one that contains this Notebook, specify the remote workspace information below. This makes it convenient to test data for multiple projects or cohorts with this same Notebook from the current workspace.

In [3]:
# os.environ['GOOGLE_PROJECT'] = os.environ['WORKSPACE_NAMESPACE'] = "anvil-stage-demo"
# os.environ['WORKSPACE_NAME']="mbaumann terra_data_util test Amish"
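
The override above sets three environment variables by hand. As a sketch, the same assignments can be wrapped in a small helper so that switching between workspaces is a single call. The function name `use_remote_workspace` and its `environ` parameter are hypothetical and are not part of `terra_data_util`.

```python
import os


def use_remote_workspace(namespace, workspace_name, environ=os.environ):
    """Point the variables read by this Notebook at another workspace.

    Mirrors the commented-out cell: GOOGLE_PROJECT and WORKSPACE_NAMESPACE
    receive the same value, and WORKSPACE_NAME receives the workspace name.
    """
    environ["GOOGLE_PROJECT"] = namespace
    environ["WORKSPACE_NAMESPACE"] = namespace
    environ["WORKSPACE_NAME"] = workspace_name
    names = ("GOOGLE_PROJECT", "WORKSPACE_NAMESPACE", "WORKSPACE_NAME")
    return {name: environ[name] for name in names}


# Example call against a plain dict, so the real environment is left untouched.
preview = use_remote_workspace("anvil-stage-demo", "mbaumann terra_data_util test Amish", environ={})
```

Passing a plain dictionary as `environ` allows the values to be previewed without changing the real environment.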

In [4]:
# Set and verify the Google billing project environment variable
BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
BILLING_PROJECT_ID

'anvil-stage-demo'

In [5]:
# Set and verify the Workspace name
WORKSPACE = os.environ['WORKSPACE_NAME']
WORKSPACE

'mbaumann terra_data_util test Amish'

Define the name of the consolidated data table, then create the consolidated table.
If the consolidated data table already exists, it will be updated with any new or modified data from the Gen3 data tables. Any additions that have been made to the consolidated table, such as workflow outputs, will be preserved.

In [6]:
consolidated_table_name = "consolidated_metadata"

In [7]:
consolidate_gen3_geno_pheno_tables(consolidated_table_name)

The consolidated data frame size is: 930 rows x 234 columns
The consolidated data table consolidated_metadata size is: 930 rows x 234 columns
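
The update behavior described above (new or modified Gen3 values are applied, while additions such as workflow outputs are preserved) can be illustrated with a small Pandas sketch. This is only an illustration under stated assumptions, not the implementation used by `consolidate_gen3_geno_pheno_tables`: the function `refresh_consolidated`, the `id` key, and the sample values are all hypothetical.

```python
import pandas as pd


def refresh_consolidated(existing, fresh, key):
    """Overlay fresh rows on an existing table, keeping extra columns and rows.

    Non-null values in `fresh` take priority. Columns or rows that appear only
    in `existing` (for example, workflow outputs) are carried over unchanged.
    """
    merged = fresh.set_index(key).combine_first(existing.set_index(key))
    return merged.rename_axis(key).reset_index()


# Hypothetical existing table with an added workflow output column.
existing = pd.DataFrame({
    "id": ["A", "B"],
    "file_state": ["registered", "registered"],
    "workflow_output": ["gs://bucket/a.vcf", None],
})

# Hypothetical fresh data: row A was modified, row C is new.
fresh = pd.DataFrame({
    "id": ["A", "C"],
    "file_state": ["validated", "registered"],
})

result = refresh_consolidated(existing, fresh, key="id")
```

One consequence of using `combine_first` in this sketch is that a null value in the fresh data does not overwrite an existing non-null value, and rows missing from the fresh data are retained.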


Load the consolidated table into a Pandas dataframe, then review what it contains.  

In [8]:
consolidated_df = get_terra_table_to_df(BILLING_PROJECT_ID, WORKSPACE, consolidated_table_name)
consolidated_df

Unnamed: 0,entity:consolidated_metadata_id,aligned_reads_index_created_datetime,aligned_reads_index_data_category,aligned_reads_index_data_format,aligned_reads_index_data_type,aligned_reads_index_eid,aligned_reads_index_file_name,aligned_reads_index_file_size,aligned_reads_index_file_state,aligned_reads_index_md5sum,...,submitted_aligned_reads_file_state,submitted_aligned_reads_md5sum,submitted_aligned_reads_object_id,submitted_aligned_reads_project_id,submitted_aligned_reads_state,submitted_aligned_reads_submitter_id,submitted_aligned_reads_updated_datetime,medical_history_age_at_cac_score,medical_history_cac_score,medical_history_unit_cac_score
0,DBG00001,2019-10-30T16:23:28.116537+00:00,Sequencing Data,CRAI,Aligned Reads Index,237fb2f1-25f5-4279-a811-1d2020caa2bf,NWD256836.b38.irc.v1.cram.crai,1362986,registered,56fe7506dc8f61ec871a081ee2079f45,...,registered,c013542e112077a031b851eb4afa5ce0,drs://dg.4503/31c651b6-7317-4540-bd58-187573cd...,topmed-Amish_HMB-IRB-MDS,validated,NWD256836-cram,2019-10-30T16:21:09.589184+00:00,,,
1,DBG00002,2019-10-30T16:23:09.390143+00:00,Sequencing Data,CRAI,Aligned Reads Index,64644ab9-558c-4e6f-9695-fa7c2b09ea2d,NWD129114.b38.irc.v1.cram.crai,1762913,registered,c061f0512d06b3b1f1cb572d6e119a86,...,registered,43db97f081a21c026da325a768ad0a9f,drs://dg.4503/044e9b96-ebec-4b7e-9141-ad2f4819...,topmed-Amish_HMB-IRB-MDS,validated,NWD129114-cram,2019-10-30T16:20:51.021292+00:00,,,
2,DBG00003,2019-10-30T16:24:28.452069+00:00,Sequencing Data,CRAI,Aligned Reads Index,f6946616-1f1c-42e7-8c9a-9f6d0f1734b3,NWD652420.b38.irc.v1.cram.crai,1313396,registered,765d099358b2d87f7c870e0669e5d269,...,registered,a48cd816c8c6b1ba4faf54a352c2ab3c,drs://dg.4503/3ccd50c8-dd0c-4bb3-971f-0ea05d94...,topmed-Amish_HMB-IRB-MDS,validated,NWD652420-cram,2019-10-30T16:22:08.795401+00:00,,,
3,DBG00004,2019-10-30T16:23:31.786935+00:00,Sequencing Data,CRAI,Aligned Reads Index,6d4bc1f3-d39e-432d-a978-ee75d7d8b7f1,NWD265950.b38.irc.v1.cram.crai,1260058,registered,be2e97fdf97baaf95dfb0bd5f5a07dc6,...,registered,ee53827c53276789083cd60d8df55b5a,drs://dg.4503/9b7528a2-e2c8-410c-9a7c-60f4b113...,topmed-Amish_HMB-IRB-MDS,validated,NWD265950-cram,2019-10-30T16:21:13.005889+00:00,,,
4,DBG00009,2019-10-30T16:24:39.598492+00:00,Sequencing Data,CRAI,Aligned Reads Index,b3318301-0c0a-4a97-81c5-b561c98d169a,NWD726988.b38.irc.v1.cram.crai,1158901,registered,0a797ff8f5c44377eb4b71e12838f086,...,registered,2438f39abd010e733409e797840733d6,drs://dg.4503/758bd559-6cbf-4820-bb5b-f17d752e...,topmed-Amish_HMB-IRB-MDS,validated,NWD726988-cram,2019-10-30T16:22:20.480628+00:00,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
925,DBG01123,2019-10-30T16:24:47.435205+00:00,Sequencing Data,CRAI,Aligned Reads Index,ec86b39a-82c1-4da6-a598-5e61e9855ac7,NWD781301.b38.irc.v1.cram.crai,1584280,registered,d7d58d40810119560bd9a1f5cb7c7081,...,registered,17c8519271fd77ec8e4f73633a406333,drs://dg.4503/0b7ff5ef-8be4-4084-809c-618ee41a...,topmed-Amish_HMB-IRB-MDS,validated,NWD781301-cram,2019-10-30T16:22:27.870544+00:00,,,
926,DBG01125,2019-10-30T16:23:39.540538+00:00,Sequencing Data,CRAI,Aligned Reads Index,151eba7a-7150-425a-a140-c3d72b61a02b,NWD309240.b38.irc.v1.cram.crai,1645845,registered,b260159fc401100069f4ea6f686cb6be,...,registered,0f2b6aa8b0511d49a2c27aff74ea5f1c,drs://dg.4503/51a12b06-5a99-455b-be6f-1fec3d88...,topmed-Amish_HMB-IRB-MDS,validated,NWD309240-cram,2019-10-30T16:21:20.423206+00:00,61.0,32.950001,Amish
927,DBG01127,2019-10-30T16:24:05.832668+00:00,Sequencing Data,CRAI,Aligned Reads Index,c382a264-f1fa-44ae-9541-441a93386e40,NWD506693.b38.irc.v1.cram.crai,1287353,registered,3a747cc15a1fadb2ae28bdcb340a98ea,...,registered,8fe7548ae32593c3e3e317fb8bbdbbf6,drs://dg.4503/d3e451d8-f21e-4ce4-a159-1fb84954...,topmed-Amish_HMB-IRB-MDS,validated,NWD506693-cram,2019-10-30T16:21:45.897775+00:00,61.0,4.120000,Amish
928,DBG01128,2019-10-30T16:24:59.654956+00:00,Sequencing Data,CRAI,Aligned Reads Index,e9f45d3b-bc60-47cc-b2dc-9afc841f3040,NWD844528.b38.irc.v1.cram.crai,1567704,registered,2c4ebebb4a30b8360ca9b25e1f855597,...,registered,4c55813b8d790694668ffa1e8062ba33,drs://dg.4503/76a52da8-48b0-4176-8f6d-3d68b9e6...,topmed-Amish_HMB-IRB-MDS,validated,NWD844528-cram,2019-10-30T16:22:39.000033+00:00,,,


Names and number of columns:

In [9]:
consolidated_df.columns

Index(['entity:consolidated_metadata_id',
       'aligned_reads_index_created_datetime',
       'aligned_reads_index_data_category', 'aligned_reads_index_data_format',
       'aligned_reads_index_data_type', 'aligned_reads_index_eid',
       'aligned_reads_index_file_name', 'aligned_reads_index_file_size',
       'aligned_reads_index_file_state', 'aligned_reads_index_md5sum',
       ...
       'submitted_aligned_reads_file_state', 'submitted_aligned_reads_md5sum',
       'submitted_aligned_reads_object_id',
       'submitted_aligned_reads_project_id', 'submitted_aligned_reads_state',
       'submitted_aligned_reads_submitter_id',
       'submitted_aligned_reads_updated_datetime',
       'medical_history_age_at_cac_score', 'medical_history_cac_score',
       'medical_history_unit_cac_score'],
      dtype='object', length=234)

In [10]:
len(consolidated_df.columns)

234
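
With 234 columns, it can help to look at one group of columns at a time. The column names shown above share prefixes such as `medical_history_`, so a sketch like the following selects the columns with a given prefix and counts the non-null values in each. The helper name `summarize_prefix` and the small demonstration frame are hypothetical.

```python
import pandas as pd


def summarize_prefix(df, prefix):
    """Return the number of non-null values in each column starting with `prefix`."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[cols].notna().sum()


# Small stand-in frame; in this Notebook, `consolidated_df` would be passed instead.
demo_df = pd.DataFrame({
    "aligned_reads_index_file_size": [1362986, 1645845, 1287353],
    "medical_history_cac_score": [None, 32.95, 4.12],
    "medical_history_unit_cac_score": [None, "Amish", "Amish"],
})

counts = summarize_prefix(demo_df, "medical_history_")
```

In this Notebook the equivalent call would be `summarize_prefix(consolidated_df, "medical_history_")`.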

Number of rows:

In [11]:
len(consolidated_df.index)

930

Optionally, delete the consolidated data table that was created above.
This can take a few minutes.

In [None]:
delete_terra_table(BILLING_PROJECT_ID, WORKSPACE, consolidated_table_name)