This notebook explores the different DRS ids that the same file acquires as it is imported into different Seven Bridges platforms.

In [2]:
import json

from fasp.loc import sbcgcDRSClient
drsClient = sbcgcDRSClient('/Users/forei/.keys/sevenbridges_keys.json', 's3')
drs_id ='62765a94f26c9351737fa8cc'
print(json.dumps(drsClient.getObject(drs_id),indent=3))

{
   "id": "62765a94f26c9351737fa8cc",
   "name": "HG00445.test.cram",
   "size": 3971394,
   "checksums": [
      {
         "type": "etag",
         "checksum": "6d5a0c75eae2b7ff704e61a8d07b931e-1"
      }
   ],
   "self_uri": "drs://cgc-ga4gh-api.sbgenomics.com/62765a94f26c9351737fa8cc",
   "created_time": "2022-05-07T11:40:04Z",
   "updated_time": "2022-05-07T11:40:04Z",
   "mime_type": "application/json",
   "access_methods": [
      {
         "type": "s3",
         "region": "us-east-1",
         "access_id": "aws-us-east-1"
      }
   ]
}
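The `self_uri` above is a hostname-based DRS URI. Under the GA4GH DRS convention, a client resolves `drs://<host>/<id>` to `https://<host>/ga4gh/drs/v1/objects/<id>`, which is the same endpoint pattern that appears in the error messages further down. A minimal sketch of that mapping (the function name is ours, not part of `fasp`):

```python
from urllib.parse import urlsplit


def drs_uri_to_https(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://host/id) to its DRS v1 object endpoint."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri}")
    object_id = parts.path.lstrip("/")
    # Compact-identifier URIs (drs://prefix:accession) have no path, so they are rejected here
    if not parts.netloc or not object_id:
        raise ValueError(f"expected drs://<host>/<id>, got: {drs_uri}")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{object_id}"


print(drs_uri_to_https("drs://cgc-ga4gh-api.sbgenomics.com/62765a94f26c9351737fa8cc"))
```

This only handles the hostname form; the compact form `drs://dg.4DFC:<uuid>` seen in the CRDC objects later needs a prefix resolver and is deliberately rejected.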


This is the same file, imported to the CGC via the DRS import capability. Note that it has acquired a different DRS id; that id is the one used in computes initiated on the file.

In [4]:
sb_imported_drs_drs_id = '626c1775f26c93517368c97c'
# Fetch the object once and reuse the response instead of calling getObject twice
resp = drsClient.getObject(sb_imported_drs_drs_id)
print(json.dumps(resp, indent=3))

HTTPError: 401 Client Error: Unauthorized for url: https://cgc-ga4gh-api.sbgenomics.com/ga4gh/drs/v1/objects/626c1775f26c93517368c97c

In [8]:
# Same id as in the first cell with its last character dropped (23 characters instead of 24)
drs_id = '62765a94f26c9351737fa8c'
print(json.dumps(drsClient.getObject(drs_id), indent=3))

HTTPError: 421 Client Error:  for url: https://cgc-ga4gh-api.sbgenomics.com/ga4gh/drs/v1/objects/62765a94f26c9351737fa8c
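The truncated id produces a 421 rather than a 404. If malformed ids should be caught before any request is made, a cheap client-side shape check is possible. This is a sketch under an assumption: every Seven Bridges DRS id in this notebook is 24 lowercase hexadecimal characters, but that is an observation, not a documented rule, and the names below are hypothetical.

```python
import re

# Assumption: inferred only from the ids seen in this notebook, not from SB documentation
SB_ID_PATTERN = re.compile(r"[0-9a-f]{24}")


def looks_like_sb_drs_id(drs_id: str) -> bool:
    """Return True if drs_id has the observed 24-hex-character shape."""
    return SB_ID_PATTERN.fullmatch(drs_id) is not None


for candidate in ["62765a94f26c9351737fa8cc", "62765a94f26c9351737fa8c"]:
    print(candidate, len(candidate), looks_like_sb_drs_id(candidate))
```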

Show the URL obtained to access the imported file:

In [19]:
url = drsClient.getAccessURL(sb_imported_drs_drs_id)
#print the url without any of the token information
print(url.split('?')[0])

https://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage.s3.amazonaws.com/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram
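`url.split('?')[0]` is enough for these signed URLs. An equivalent using the standard library's URL parser also drops any `#fragment`; note it is only for display, since the signed query string is what grants access. The example URL and its `X-Amz-*` values below are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment (e.g. signed-URL token parameters)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical signed URL, for illustration only
example = "https://example-bucket.s3.amazonaws.com/path/file.cram?X-Amz-Expires=3600&X-Amz-Signature=abc"
print(strip_query(example))
```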


### The same file on the source platform (BioData Catalyst)
What does this file look like on the original platform, BioData Catalyst (the TOPMed platform)?
The BioData Catalyst DRS id for the 'imported' file is 626c079e645ccb7324c671d1.

We can access it directly from BioData Catalyst as follows:

In [5]:
from fasp.loc import sbbdcDRSClient
bdc_drs_id ='626c079e645ccb7324c671d1'
bdc_drsClient = sbbdcDRSClient('/Users/forei/.keys/sevenbridges_keys.json', 's3')
bdc_drsClient.getObject(bdc_drs_id)

HTTPError: 403 Client Error: Forbidden for url: https://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/ga4gh/drs/v1/objects/626c079e645ccb7324c671d1

In [15]:
url = bdc_drsClient.getAccessURL(bdc_drs_id)
#print the url without any of the token information
print(url.split('?')[0])

https://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage.s3.amazonaws.com/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram
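Both platforms hand back a URL to the same S3 object, so the CGC id and the BDC id resolve to the same underlying bytes. To compare such URLs programmatically rather than by eye, one option is to split a virtual-hosted-style S3 URL into bucket and key. A minimal sketch (hypothetical helper; it assumes the `<bucket>.s3.amazonaws.com` host form seen here and does not handle regional or path-style endpoints):

```python
from typing import Tuple
from urllib.parse import urlsplit

S3_SUFFIX = ".s3.amazonaws.com"


def s3_bucket_and_key(https_url: str) -> Tuple[str, str]:
    """Split https://<bucket>.s3.amazonaws.com/<key> into (bucket, key)."""
    parts = urlsplit(https_url)
    if not parts.netloc.endswith(S3_SUFFIX):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {https_url}")
    bucket = parts.netloc[: -len(S3_SUFFIX)]
    key = parts.path.lstrip("/")
    return bucket, key


# The two token-stripped URLs printed above (CGC import, then BDC original)
cgc_url = ("https://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage.s3.amazonaws.com/"
           "CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram")
bdc_url = ("https://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage.s3.amazonaws.com/"
           "CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram")

print(s3_bucket_and_key(cgc_url) == s3_bucket_and_key(bdc_url))
print(s3_bucket_and_key(cgc_url))
```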


Can we run the compute from the CGC using WES and pass the BDC DRS id instead?
Worth checking.
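As a starting point for that check, here is a sketch of the form fields a GA4GH WES 1.0 `POST /runs` request carries (`workflow_params`, `workflow_type`, `workflow_type_version`, `workflow_url`). Everything specific is an assumption: the input name `input_file`, the workflow URL, and CWL v1.0 are placeholders, and the `drs://` URI is built by analogy with the CGC `self_uri` (the BDC server did not return one here). Whether a given WES engine resolves a DRS URI from another platform is exactly what remains untested.

```python
import json

# Assumption: hostname-based DRS URI constructed from the BDC DRS host seen in the 403 above
BDC_DRS_URI = "drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671d1"


def build_wes_run_fields(drs_uri: str, workflow_url: str) -> dict:
    """Assemble the string-valued form fields of a WES 1.0 RunWorkflow request."""
    workflow_params = {
        # CWL File input pointing at the DRS object; "input_file" is a hypothetical input name
        "input_file": {"class": "File", "location": drs_uri},
    }
    return {
        "workflow_params": json.dumps(workflow_params),
        "workflow_type": "CWL",            # placeholder workflow language
        "workflow_type_version": "v1.0",   # placeholder language version
        "workflow_url": workflow_url,
    }


fields = build_wes_run_fields(BDC_DRS_URI, "https://example.org/workflows/hypothetical.cwl")
print(json.dumps(fields, indent=3))
```

These fields would be sent as `multipart/form-data` to a WES service's `/ga4gh/wes/v1/runs` endpoint together with the platform's auth header; no request is made here.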

In [24]:
outpath='/Users/forei/Desktop/txt/test_bam'
with open(outpath, 'w') as f:
    f.write('Hello World')

In [6]:
bdc_drs_id = '617c7ce1e6261a31b6d12f64'
bdc_drsClient.getObject(bdc_drs_id)

HTTPError: 403 Client Error: Forbidden for url: https://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/ga4gh/drs/v1/objects/617c7ce1e6261a31b6d12f64

### DRS objects without a name


#### Without the prefix

In [27]:
from fasp.loc import crdcDRSClient

gen3_drs = crdcDRSClient(api_key_path='~/.keys/crdc_credentials.json', debug=True)
# no_name_drs_id = '00e02bd9-4135-4086-a868-98e6744570e4'  # another id tried earlier (overwritten below)
no_name_drs_id = '5ecf79b7-9951-4204-8665-12b02e8e1bf9'

print(json.dumps(gen3_drs.getObject(no_name_drs_id),indent=3))

https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/5ecf79b7-9951-4204-8665-12b02e8e1bf9
{
   "access_methods": [
      {
         "access_id": "s3",
         "access_url": {
            "url": "s3://pdcdatastore/studies/234/PSM/tsv/09CPTAC_LSCC_W_BI_20190722_KL_f23.psm"
         },
         "region": "",
         "type": "s3"
      }
   ],
   "aliases": [],
   "checksums": [
      {
         "checksum": "8fce2d87d44b70bde656092fb150ee8e",
         "type": "md5"
      }
   ],
   "created_time": "2020-09-24T12:57:37.475408",
   "description": null,
   "form": "object",
   "id": "dg.4DFC/5ecf79b7-9951-4204-8665-12b02e8e1bf9",
   "mime_type": "application/json",
   "name": null,
   "self_uri": "drs://dg.4DFC:5ecf79b7-9951-4204-8665-12b02e8e1bf9",
   "size": 4647227,
   "updated_time": "2020-09-24T12:57:37.475414",
   "version": "0daf90ef"
}
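The object was requested with a bare UUID, yet the returned `id` carries a `dg.4DFC/` prefix and `self_uri` uses the compact form `drs://dg.4DFC:<uuid>`. A small sketch for moving between these forms (helper names are hypothetical, and the rule is inferred only from the objects in this notebook):

```python
from typing import Optional, Tuple


def split_compact_id(drs_id: str) -> Tuple[Optional[str], str]:
    """Split 'prefix/accession' (as in the Gen3 'id' field) into (prefix, accession).
    Ids without a prefix come back as (None, id)."""
    if "/" in drs_id:
        prefix, accession = drs_id.split("/", 1)
        return prefix, accession
    return None, drs_id


def compact_self_uri(prefix: str, accession: str) -> str:
    """Rebuild the compact-identifier DRS URI form seen in 'self_uri' (drs://prefix:accession)."""
    return f"drs://{prefix}:{accession}"


prefix, accession = split_compact_id("dg.4DFC/5ecf79b7-9951-4204-8665-12b02e8e1bf9")
print(prefix, accession)
print(compact_self_uri(prefix, accession))
```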


In [28]:
test3 = '9ced10ef-ea70-49bd-ad55-a2be0bbd1258'
print(json.dumps(gen3_drs.getObject(test3),indent=3))

https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/9ced10ef-ea70-49bd-ad55-a2be0bbd1258
{
   "access_methods": [
      {
         "access_id": "gs",
         "access_url": {
            "url": "gs://public-datasets-idc/9ced10ef-ea70-49bd-ad55-a2be0bbd1258.dcm"
         },
         "region": "",
         "type": "gs"
      }
   ],
   "aliases": [],
   "checksums": [
      {
         "checksum": "df20524dd1f99ac47f00f9758b6d3096",
         "type": "md5"
      }
   ],
   "created_time": "2022-04-28T00:11:33.556879",
   "description": null,
   "form": "{}",
   "id": "dg.4DFC/9ced10ef-ea70-49bd-ad55-a2be0bbd1258",
   "mime_type": "application/json",
   "name": null,
   "self_uri": "drs://dg.4DFC:9ced10ef-ea70-49bd-ad55-a2be0bbd1258",
   "size": 526458,
   "updated_time": "2022-04-28T00:11:33.556880",
   "version": "c24ce872"
}
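Both objects come back with `"name": null`. When a human-readable label is still needed, one heuristic is to fall back to the last path segment of an access URL, which here gives the `.psm` and `.dcm` file names. This is a convenience sketch with a hypothetical helper name, not something the DRS spec guarantees:

```python
import posixpath
from urllib.parse import urlsplit


def display_name(drs_object: dict) -> str:
    """Return the DRS 'name', else the last path segment of the first access URL, else the id."""
    if drs_object.get("name"):
        return drs_object["name"]
    for method in drs_object.get("access_methods", []):
        url = (method.get("access_url") or {}).get("url")
        if url:
            basename = posixpath.basename(urlsplit(url).path)
            if basename:
                return basename
    return drs_object["id"]


# Trimmed copy of the object printed above
obj = {
    "id": "dg.4DFC/9ced10ef-ea70-49bd-ad55-a2be0bbd1258",
    "name": None,
    "access_methods": [
        {"access_id": "gs",
         "access_url": {"url": "gs://public-datasets-idc/9ced10ef-ea70-49bd-ad55-a2be0bbd1258.dcm"},
         "type": "gs"},
    ],
}
print(display_name(obj))
```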


In [None]:
print(gen3_drs.getAccessURL(test3, access_id='gs'))

#### With the prefix

In [31]:
test4 = 'dg.4DFC/9ced10ef-ea70-49bd-ad55-a2be0bbd1258'
print(gen3_drs.getAccessURL(test4, access_id='gs'))

https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/dg.4DFC/9ced10ef-ea70-49bd-ad55-a2be0bbd1258/access/gs
<Response [401]>
Unauthorized for that DRS id
None
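In the request above the prefixed id went into the path verbatim, so its `/` became an extra path segment. Whether that is what caused the 401 is not established (the bare-UUID access cell above was never run, and a 401 may simply reflect authorization on the access endpoint). One variation worth trying, untested here, is to percent-encode the id so the separator is sent as `%2F`. A sketch of building that URL (hypothetical helper; the base URL is the one printed above):

```python
from urllib.parse import quote


def drs_access_url(base: str, drs_id: str, access_id: str) -> str:
    """Build {base}/ga4gh/drs/v1/objects/{id}/access/{access_id} with both ids percent-encoded."""
    return (f"{base}/ga4gh/drs/v1/objects/{quote(drs_id, safe='')}"
            f"/access/{quote(access_id, safe='')}")


print(drs_access_url("https://nci-crdc.datacommons.io",
                     "dg.4DFC/9ced10ef-ea70-49bd-ad55-a2be0bbd1258", "gs"))
```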
