This notebook demonstrates bulk DRS requests using the Starter Kit DRS implementation. These features implement the changes proposed in pull request [#365](https://github.com/ga4gh/data-repository-service-schemas/pull/365). The changes in that pull request are not yet an approved part of the DRS specification.
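Instead of one GET per id, a bulk lookup sends a single POST that carries a list of ids. The `drsClient.get_objects` call used below handles this for you. The following is a minimal sketch of what such a request might look like, built with the standard library. It assumes the endpoint is `POST {base}/ga4gh/drs/v1/objects` and that the JSON body lists ids under `bulk_object_ids`. Both are assumptions, so check PR #365 and the Starter Kit for the exact shape. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_bulk_request(base_url, drs_ids):
    """Build (but do not send) a bulk DRS lookup request.

    Assumption: the bulk endpoint is POST <base>/ga4gh/drs/v1/objects and
    the JSON body lists the ids under "bulk_object_ids".
    """
    url = base_url.rstrip("/") + "/ga4gh/drs/v1/objects"
    body = json.dumps({"bulk_object_ids": list(drs_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_bulk_request("http://localhost:5000", ["8e18bfb64168994489bc9e7fda0acd4f"])
print(req.get_method(), req.full_url)

# Sending it requires the server to be running:
# with urllib.request.urlopen(req) as r:
#     resp = json.load(r)
```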

In [1]:
from fasp.loc import DRSClient
import json

# Requires the Starter Kit DRS server to be running at http://localhost:5000

drsClient = DRSClient("http://localhost:5000")



In [2]:
drsClient.get_object("8e18bfb64168994489bc9e7fda0acd4f")

{'id': '8e18bfb64168994489bc9e7fda0acd4f',
 'description': 'High coverage, downsampled CRAM file for sample HG00449',
 'created_time': '2022-06-24T07:41:44Z',
 'mime_type': 'application/cram',
 'name': 'HG00449 1000 Genomes Downsampled High Coverage CRAM file',
 'size': 18977144,
 'updated_time': '2022-06-24T07:41:44Z',
 'version': '1.0.0',
 'aliases': ['HG00449 high coverage downsampled CRAM'],
 'checksums': [{'checksum': '232a8379bf238fe0c2b646c03a4b8bd2d83917f3',
   'type': 'sha1'},
  {'checksum': '44ee4289015c892c442b504ed681532f032de5c09e846be021624815859f82e8',
   'type': 'sha256'},
  {'checksum': '8e18bfb64168994489bc9e7fda0acd4f', 'type': 'md5'}],
 'self_uri': 'drs://localhost:5000/8e18bfb64168994489bc9e7fda0acd4f',
 'access_methods': [{'access_url': {'url': 's3://ga4gh-ismb-tutorial-2022/data/1000genomes/cram/highcov/HG00449.final.2400kb.cram'},
   'type': 's3',
   'region': 'us-east-2'}]}
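The response above includes checksums and an S3 access method. In this response, the `md5` checksum is identical to the object id. The following is a minimal sketch of extracting those fields from the returned dict. `checksum_of` and `s3_location` are hypothetical helpers and are not part of `fasp`. The object is a trimmed copy of the output above.

```python
from urllib.parse import urlparse

# Trimmed copy of the object returned by drsClient.get_object(...) above
drs_object = {
    "id": "8e18bfb64168994489bc9e7fda0acd4f",
    "checksums": [
        {"checksum": "232a8379bf238fe0c2b646c03a4b8bd2d83917f3", "type": "sha1"},
        {"checksum": "8e18bfb64168994489bc9e7fda0acd4f", "type": "md5"},
    ],
    "access_methods": [
        {
            "access_url": {"url": "s3://ga4gh-ismb-tutorial-2022/data/1000genomes/cram/highcov/HG00449.final.2400kb.cram"},
            "type": "s3",
            "region": "us-east-2",
        }
    ],
}


def checksum_of(obj, checksum_type):
    """Return the checksum of the given type, or None if the object has none."""
    for c in obj.get("checksums", []):
        if c["type"] == checksum_type:
            return c["checksum"]
    return None


def s3_location(obj):
    """Return (bucket, key, region) for the first s3 access method, or None."""
    for m in obj.get("access_methods", []):
        if m["type"] == "s3" and "access_url" in m:
            parsed = urlparse(m["access_url"]["url"])
            return parsed.netloc, parsed.path.lstrip("/"), m.get("region")
    return None


print(checksum_of(drs_object, "md5") == drs_object["id"])
print(s3_location(drs_object))
```

In the notebook, you can apply the same helpers directly to the dict that `drsClient.get_object(...)` returns.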

In [3]:
drs_ids = [
    "8e18bfb64168994489bc9e7fda0acd4f",
    "ba094cae0da59f27ea82a8a802be34cd",
    "01b0fe13b5c4de28a4ff5a7ee3c15773",
    "156f8e135472a6bc7f481c11da6a9372",
    "336854e9e2cd32476efed80508e522ab",
    "4db2e371cf5f5b4257120f26736f6a1d",
    "77b0f3d65271c4a0064ff7760828dd92",
    "07d36706f15c3af1f1ad1dd595eca188",
    "b60e59cc6b46ed04a3ede78d8c75a6ce",
    "e2d03ee77bc4a7786bf6855da96dcb86",
    "2405a382375763292ea903a6a658ce95",
    "00be9e467ed3986cb2b2b1e2d157a2df",
    "ba094cae0da59f27ea82a8a802be34cd",
    "d5d4dc9bc29d993e5cc057c6c5a05939",
    "9c6ad5209da53a3eeab831445b3c7dc2",
    "f4e33a5535b43f8d3c3baf9ce05893ad",
    "90dc98385d4523b6967299d0b3d0d1e2",
    "f684f723102fc3b20a70ce132ec51ab7",
    "c2ddf71411a1afa4e68a132258d70be7",
]

print(f"Sending {len(drs_ids)} ids to server")
resp = drsClient.get_objects(drs_ids)
print("Response summary")
print(json.dumps(resp['summary'], indent=3))

Sending 19 ids to server
Response summary
{
   "requested": 19,
   "resolved": 19,
   "unresolved": 0
}
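The summary gives counts only. The following is a minimal sketch of cross-checking a bulk response against the ids that were sent. It confirms that `requested == resolved + unresolved` and lists any requested ids that did not come back among the resolved objects. `check_bulk_response` is a hypothetical helper. The response keys (`summary`, `resolved_drs_object`, and each object's `id`) are the ones seen above. The example response is made up for illustration.

```python
def check_bulk_response(requested_ids, resp):
    """Cross-check a bulk response against the ids that were sent.

    Returns (summary_consistent, missing_ids), where missing_ids are the
    requested ids that do not appear among the resolved objects.
    """
    summary = resp["summary"]
    consistent = (
        summary["requested"] == len(requested_ids)
        and summary["requested"] == summary["resolved"] + summary["unresolved"]
    )
    resolved_ids = {obj["id"] for obj in resp["resolved_drs_object"]}
    missing = sorted(set(requested_ids) - resolved_ids)
    return consistent, missing


# Made-up response: one real id resolved, one hypothetical id not found
example = {
    "summary": {"requested": 2, "resolved": 1, "unresolved": 1},
    "resolved_drs_object": [{"id": "8e18bfb64168994489bc9e7fda0acd4f"}],
}
print(check_bulk_response(["8e18bfb64168994489bc9e7fda0acd4f", "not-a-real-id"], example))
```

In the notebook, calling `check_bulk_response(drs_ids, resp)` right after the bulk request cell performs the same check on the real response.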


In [4]:
for r in resp['resolved_drs_object']:
    print(r['mime_type'])
    for a in r['access_methods']:
        print(a['type'], a['region'])

application/cram
s3 us-east-2
application/crai
s3 us-east-2
application/cram
s3 us-east-2
application/cram
s3 us-east-2
application/cram
s3 us-east-2
application/cram
s3 us-east-2
application/cram
s3 us-east-2
application/cram
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/cram
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
application/crai
s3 us-east-2
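For larger bulk responses, a tally is easier to read than one line per object. The following is a minimal sketch using `collections.Counter`. `tally` is a hypothetical helper, and the sample list only mimics the structure of `resp['resolved_drs_object']`. The listing above contains 8 `application/cram` and 11 `application/crai` objects, all with `s3` access in `us-east-2`.

```python
from collections import Counter


def tally(resolved_objects):
    """Count objects by mime type and access methods by (type, region)."""
    mime_types = Counter(obj["mime_type"] for obj in resolved_objects)
    locations = Counter(
        (m["type"], m.get("region"))
        for obj in resolved_objects
        for m in obj["access_methods"]
    )
    return mime_types, locations


# Illustrative sample with the same shape as resp['resolved_drs_object']
sample = [
    {"mime_type": "application/cram", "access_methods": [{"type": "s3", "region": "us-east-2"}]},
    {"mime_type": "application/crai", "access_methods": [{"type": "s3", "region": "us-east-2"}]},
    {"mime_type": "application/cram", "access_methods": [{"type": "s3", "region": "us-east-2"}]},
]
mime_types, locations = tally(sample)
print(mime_types)
print(locations)
```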


### Testing repeat ids in a bulk request

The following cell explores what happens when the same id is repeated in a bulk request.

The preferred behavior in this situation may warrant some discussion.

The server counts the ids in the bulk request and reports how many were resolved in a summary. Given that it does so, is it reasonable to expect it to recognize duplicates as well? Note that the 19-id list above already repeats one id (`ba094cae0da59f27ea82a8a802be34cd`), and the server counted it twice: 19 requested, 19 resolved.
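Until the expected server behavior is settled, a client can sidestep the question by removing duplicates before sending the request and then mapping the results back onto the original list. The following is a minimal sketch. `dedupe_ids` and `expand_results` are hypothetical helpers and are not part of `fasp`.

```python
def dedupe_ids(drs_ids):
    """Drop repeated ids, keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(drs_ids))


def expand_results(requested_ids, resolved_objects):
    """Map each requested id, duplicates included, back to its resolved object.

    Ids with no resolved object map to None.
    """
    by_id = {obj["id"]: obj for obj in resolved_objects}
    return [by_id.get(i) for i in requested_ids]


ids = [
    "8e18bfb64168994489bc9e7fda0acd4f",
    "8e18bfb64168994489bc9e7fda0acd4f",
    "ba094cae0da59f27ea82a8a802be34cd",
]
print(dedupe_ids(ids))
```

In the notebook, you would send `dedupe_ids(drs_ids)` to `drsClient.get_objects` and pass `drs_ids` together with the returned `resolved_drs_object` list to `expand_results`.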

In [12]:
triplicate_drs_ids = [
    "8e18bfb64168994489bc9e7fda0acd4f",
    "8e18bfb64168994489bc9e7fda0acd4f",
    "8e18bfb64168994489bc9e7fda0acd4f",
]

print("\nTesting same drs id multiple times in same request")
print(f"Sending {len(triplicate_drs_ids)} ids to server")
resp = drsClient.get_objects(triplicate_drs_ids)
print("Response summary")
print(json.dumps(resp['summary'], indent=3))

# Compare each resolved object with the one returned just before it
last_item = None
for n, item in enumerate(resp['resolved_drs_object'], start=1):
    if last_item is None:
        print("Item 1")
    elif item == last_item:
        print(f"The DRS response for item {n} is the same as for item {n-1}")
    else:
        print(f"The DRS response for item {n} differs from item {n-1}")
    last_item = item



Testing same drs id multiple times in same request
Sending 3 ids to server
Response summary
{
   "requested": 3,
   "resolved": 3,
   "unresolved": 0
}
Item 1
The DRS response for item 2 is the same as for item 1
The DRS response for item 3 is the same as for item 2


Would the following summary, which counts the repeated id as a single resolved object, be a more helpful or accurate response in this case?
~~~
{
   "requested": 3,
   "resolved": 1,
   "unresolved": 0
}
~~~
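One way a server could produce the summary proposed above is to report `requested` as the number of ids sent, while counting `resolved` and `unresolved` over distinct ids. The following is a minimal sketch. `bulk_summary` is a hypothetical function and is not the Starter Kit's code.

```python
def bulk_summary(requested_ids, resolved_ids):
    """Summarize a bulk request, counting each distinct id once.

    requested: number of ids as sent, duplicates included.
    resolved / unresolved: number of distinct ids that were / were not found.
    """
    unique = set(requested_ids)
    found = unique & set(resolved_ids)
    return {
        "requested": len(requested_ids),
        "resolved": len(found),
        "unresolved": len(unique - found),
    }


triplicate = ["8e18bfb64168994489bc9e7fda0acd4f"] * 3
print(bulk_summary(triplicate, ["8e18bfb64168994489bc9e7fda0acd4f"]))
# {'requested': 3, 'resolved': 1, 'unresolved': 0}
```

With this counting, `requested` no longer equals `resolved + unresolved` when duplicates are present, so clients could no longer rely on that identity to detect missing objects.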