# Using DOS to download protected data

This example shows how DOS can act as an interoperability layer for data registered in `indexd`. As we will see, `indexd` works with `fence` to provide the credentials needed for URL signing.

## Accessing metadata from indexd

A lambda has been set up to point at `dev.bionimbus.org`. Let's get some Data Objects from it.

In [1]:
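from ga4gh.dos.client import Client

# Point the client at the lambda endpoint; response validation is switched off.
client = Client("https://mkc9oddwq0.execute-api.us-west-2.amazonaws.com/api", config={'validate_responses': False})
local_client = client.client  # object used to call the API operations
models = client.models        # used to look up request models by name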

Now that the client is set up, we can list the available Data Objects using `ListDataObjects`.

In [2]:
ListDataObjectsRequest = models.get_model('ListDataObjectsRequest')
data_objects = local_client.ListDataObjects(body=ListDataObjectsRequest(page_size=100)).result().data_objects
print("Returned {} data objects.".format(len(data_objects)))

Returned 3 data objects.
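
The call above asks for a single page of up to 100 objects. If a service held more than one page, the results could be gathered with a loop like the sketch below. It assumes the response carries a `next_page_token` and the request accepts a `page_token`, as described in the DOS schema. `iter_data_objects` and the `fetch_page` callable are hypothetical names introduced here so the loop can be tried without a live server.

```python
def iter_data_objects(fetch_page, page_size=100):
    """Yield data objects from every page returned by fetch_page.

    fetch_page(page_size, page_token) is a hypothetical callable that returns
    an object with `data_objects` and `next_page_token` attributes.
    """
    page_token = None
    while True:
        response = fetch_page(page_size, page_token)
        for data_object in response.data_objects or []:
            yield data_object
        page_token = getattr(response, 'next_page_token', None)
        # An empty or missing token is treated as the end of the listing.
        if not page_token:
            break
```

With the client from this notebook, `fetch_page` might wrap `local_client.ListDataObjects(...)` and build a `ListDataObjectsRequest` that sets `page_token` only when one is available; that wiring is untested here.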


## Downloading data

These Data Objects point to S3 addresses. Fetching one by its ID returns the full record, including its `urls`.

In [3]:
data_object = local_client.GetDataObject(data_object_id=data_objects[2].id).result().data_object
print(data_object)

Data Object: a file, API or other resource(aliases=None, checksums=[Checksum(checksum=u'73d643ec3f4beb9020eef0beed440ad0', type=u'md5')], created=datetime.datetime(2018, 2, 26, 23, 36, 34, 899360), description=None, id=u'd8581e97-bc68-49b0-b2ea-6008950fdb36', mime_type=None, name=u'testdata1', size=9L, updated=datetime.datetime(2018, 2, 26, 23, 36, 34, 899371), urls=[URL(system_metadata=SystemMetadata(baseid=u'c917ab25-b773-4ffc-8b8c-cc1edafca1f7', created_date=u'2018-02-26T23:36:34.899360', did=u'd8581e97-bc68-49b0-b2ea-6008950fdb36', file_name=u'testdata1', form=u'object', hashes={u'md5': u'73d643ec3f4beb9020eef0beed440ad0'}, metadata={u'acls': u'acct'}, rev=u'b21b1b26', size=9, updated_date=u'2018-02-26T23:36:34.899371', urls=[u's3://cdis-presigned-url-test/testdata'], version=None), url=u's3://cdis-presigned-url-test/testdata', user_metadata=UserMetadata(acls=u'acct'))], version=u'b21b1b26')


Ordinarily, data at an `s3://` address are only accessible with a third-party client. If the data are in public buckets with requester pays, specially formatted URLs may be available.

In [4]:
print(data_object.urls[0].url)

s3://cdis-presigned-url-test/testdata
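
A third-party client typically wants the bucket and the key as separate values rather than a single `s3://` string. The sketch below splits the URL printed above using only the standard library. `split_s3_url` is a hypothetical helper name, not part of the DOS client.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, which this notebook appears to use


def split_s3_url(url):
    """Split an s3://bucket/key URL into a (bucket, key) tuple."""
    parsed = urlparse(url)
    if parsed.scheme != 's3':
        raise ValueError('not an s3:// URL: {}'.format(url))
    # The bucket is the network location; the key is the path minus its leading slash.
    return parsed.netloc, parsed.path.lstrip('/')


bucket, key = split_s3_url('s3://cdis-presigned-url-test/testdata')
print('{} {}'.format(bucket, key))  # cdis-presigned-url-test testdata
```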


## Logging in to sign a URL

In `fence`, my email `davidcs@ucsc.edu` has been granted access to one of the files for demonstration purposes. To get the signed URL, we first need a `fence_session` token. Please consider this a preliminary demonstration of crossing auth domains.

First, we must go through the Google login for `bionimbus.org`.

Opening the following URL starts the bionimbus login process.

https://dev.planx-pla.net/

On successful authentication we are redirected to bionimbus, where we can generate an API key and download the resulting `credentials.json`.

### Load the API Key

In [5]:
import json

# Load the API key downloaded from bionimbus.
with open("credentials.json", "r") as json_file:
    credentials = json.load(json_file)

### Get a token from fence

In [6]:
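import requests
# Exchange the API key for an access token.
token = requests.post('https://dev.planx-pla.net/user/credentials/cdis/access_token', json=credentials).json()
# Print only a prefix so the full token stays out of the notebook output.
print(token['access_token'][0:10])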

eyJhbGciOi
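
The `eyJhbGciOi` prefix is what a base64url-encoded `{"alg":` looks like, which suggests the access token is a JSON Web Token. Under that assumption, its claims can be inspected with the standard library alone, as sketched below. `decode_jwt_payload` is a hypothetical helper; it does not verify the signature, so it is suitable only for looking at a token, never for trusting one.

```python
import base64
import json


def decode_jwt_payload(jwt):
    """Return the claims of a JWT as a dict, WITHOUT verifying its signature."""
    # A JWT has three dot-separated parts: header, payload, signature.
    payload_b64 = jwt.split('.')[1]
    # base64url segments in JWTs omit padding; restore it before decoding.
    payload_b64 += '=' * (-len(payload_b64) % 4)
    raw = base64.urlsafe_b64decode(payload_b64.encode('ascii'))
    return json.loads(raw.decode('utf-8'))
```

Calling `decode_jwt_payload(token['access_token'])` would then return whatever claims `fence` chose to include; which claims those are is not shown in this notebook.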


We can now use this token to get signed URLs in addition to the cloud-native URLs.

### Using auth with the DOS client

In [10]:
# Send the access token as a bearer token with this request.
auth_headers = {'Authorization': 'bearer {}'.format(token['access_token'])}
response = local_client.GetDataObject(
    data_object_id="d8581e97-bc68-49b0-b2ea-6008950fdb36",
    _request_options={"headers": auth_headers}).result()
data_object = response.data_object
print(data_object.urls[0].url)  # cloud-native s3:// URL
print(data_object.urls[1].url)  # signed https:// URL
print(data_object.id)

s3://cdis-presigned-url-test/testdata
https://cdis-presigned-url-test.s3.amazonaws.com/testdata?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIEXU2UGJYLJ3ASDA%2F20180228%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180228T021206Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&user_id=32&username=davidcs%40ucsc.edu&X-Amz-Signature=94dba66fa3bfa4259a176f850e675c707d48fa7a6d25a4c571a9cb11b2d57f2c
d8581e97-bc68-49b0-b2ea-6008950fdb36
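
The signed URL is time-limited: its query string carries the signing time in `X-Amz-Date` and a lifetime in seconds in `X-Amz-Expires`. The sketch below reads those two parameters to work out when a URL of this form stops being valid. `signed_url_expiry` is a hypothetical helper, and the example URL is a trimmed copy of the one printed above with the credential and signature left out.

```python
from datetime import datetime, timedelta
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2


def signed_url_expiry(url):
    """Return the UTC datetime at which a presigned URL of this form expires."""
    query = parse_qs(urlparse(url).query)
    # X-Amz-Date is a compact UTC timestamp such as 20180228T021206Z.
    signed_at = datetime.strptime(query['X-Amz-Date'][0], '%Y%m%dT%H%M%SZ')
    return signed_at + timedelta(seconds=int(query['X-Amz-Expires'][0]))


example = ('https://cdis-presigned-url-test.s3.amazonaws.com/testdata'
           '?X-Amz-Algorithm=AWS4-HMAC-SHA256'
           '&X-Amz-Date=20180228T021206Z&X-Amz-Expires=3600')
print(signed_url_expiry(example))  # 2018-02-28 03:12:06
```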


We can then download the file from the second, signed URL using `wget`.

In [21]:
signed_url = data_object.urls[1].url
!wget '$signed_url' -O out
!cat out

--2018-02-27 18:14:47--  https://cdis-presigned-url-test.s3.amazonaws.com/testdata?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIEXU2UGJYLJ3ASDA%2F20180228%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180228T021206Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&user_id=32&username=davidcs%40ucsc.edu&X-Amz-Signature=94dba66fa3bfa4259a176f850e675c707d48fa7a6d25a4c571a9cb11b2d57f2c
Resolving cdis-presigned-url-test.s3.amazonaws.com (cdis-presigned-url-test.s3.amazonaws.com)... 54.231.40.11
Connecting to cdis-presigned-url-test.s3.amazonaws.com (cdis-presigned-url-test.s3.amazonaws.com)|54.231.40.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40 [binary/octet-stream]
Saving to: ‘out’


2018-02-27 18:14:47 (1.07 MB/s) - ‘out’ saved [40/40]

Hi Zac!
cdis-data-client uploaded this!
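
The Data Object record includes an `md5` checksum, so a download can in principle be checked against it. The sketch below computes the MD5 of a local file in chunks and compares it with an expected hex digest. `md5_of_file` and `matches_checksum` are hypothetical helper names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 with an expected hex digest, ignoring case."""
    return md5_of_file(path) == expected_md5.lower()
```

For the file saved above, this could be called as `matches_checksum('out', data_object.checksums[0].checksum)`, assuming the first checksum entry is the `md5` one. Note that the record lists `size=9` while `wget` reports 40 bytes, so the check may well fail for this demonstration object.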
