# Data Distribution

This tutorial demonstrates how to implement a simple data distribution flow. The general use case is as follows:

1. A user has logged into the science gateway and wishes to access a dataset
1. Once the user selects the dataset, the science gateway will stage the data in a temporary directory, and grant the user read-only access to the directory
1. Once the user has successfully downloaded the data, the directory is removed (this can also be done based on a timeframe, after which the user's permission is revoked).

Note: Because you launched this notebook from the Globus-enabled JupyterHub environment, the following have already happened:
1. You have established your identity by authenticating, with an institutional credential, ORCID, or similar
1. You have granted consent to the issuance of tokens with certain scopes
1. A notebook has been created, with access to those tokens

## Get tokens from the Jupyter environment

The Globus-enabled JupyterHub passes the tokens into the notebook as a pickled Python dictionary, `base64`-encoded and stored in the `GLOBUS_DATA` environment variable. We'll read the variable and unpack it.

In [None]:
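# We will need a few utility packages
import os, pickle, base64, json

# Get Globus Auth token data from the environment
globus_token_data = os.getenv('GLOBUS_DATA')
if globus_token_data is None:
    raise RuntimeError("GLOBUS_DATA is not set; was this notebook launched from the Globus-enabled JupyterHub?")

# Decode the base64 string to recover the pickled bytes
pickled_tokens = base64.b64decode(globus_token_data)

# Unpickle and get the dictionary
tokens = pickle.loads(pickled_tokens)

# Minimal sanity check: did we get the data type we expected?
if isinstance(tokens, dict):
    print(json.dumps(tokens, indent=4))
else:
    print(f"Unexpected token data type: {type(tokens)}")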
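The decoding above reverses two steps: pickling, then base64 encoding. A minimal round-trip sketch, assuming only that the hub encodes the dictionary in that order, makes this concrete. The `sample_tokens` contents and the `EXAMPLE_GLOBUS_DATA` variable name are made up for illustration, so the real `GLOBUS_DATA` is left untouched.

```python
import base64
import os
import pickle

# Hypothetical token dictionary; the real one comes from the hub
sample_tokens = {"tokens": {"transfer.api.globus.org": {"access_token": "EXAMPLE"}}}

# Encode the way GLOBUS_DATA is described: pickle first, then base64
encoded = base64.b64encode(pickle.dumps(sample_tokens)).decode("ascii")
os.environ["EXAMPLE_GLOBUS_DATA"] = encoded  # hypothetical variable name

# Decode exactly as the notebook cell does
decoded = pickle.loads(base64.b64decode(os.environ["EXAMPLE_GLOBUS_DATA"]))
print(decoded == sample_tokens)  # True
```

Because unpickling can execute arbitrary code, `pickle.loads` should only ever be applied to data from a trusted source such as the hub itself.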

## Get the authenticated user's primary identity

We will grant the user access to the data via their primary identity. This may be retrieved in a number of ways; here we just extract it from the tokens object.

In [None]:
# Get the authenticated user's primary identity from the tokens dictionary
identity_id = tokens['id_token']['sub']
primary_identity = next(identity for identity in tokens['id_token']['identity_set'] if identity['sub'] == identity_id)

print(f"Setting permissions for user: {primary_identity['username']}")
print(f"Notifications will be sent to: {primary_identity['email']}")
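The `next(...)` expression above raises a bare `StopIteration` if the primary identity is missing from `identity_set`. A small hypothetical helper, `find_primary_identity`, performs the same lookup but fails with a clearer error. The example ID token is made up and only contains the fields the cell above reads.

```python
def find_primary_identity(id_token):
    """Return the identity_set entry whose 'sub' matches the token's own 'sub'."""
    identity_id = id_token["sub"]
    for identity in id_token.get("identity_set", []):
        if identity.get("sub") == identity_id:
            return identity
    raise LookupError(f"Primary identity {identity_id} not found in identity_set")

# Made-up example shaped like the fields used above
example_id_token = {
    "sub": "aaaa-1111",
    "identity_set": [
        {"sub": "bbbb-2222", "username": "alt@example.org", "email": "alt@example.org"},
        {"sub": "aaaa-1111", "username": "user@example.org", "email": "user@example.org"},
    ],
}
primary = find_primary_identity(example_id_token)
print(primary["username"])  # user@example.org
```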

## Create data distribution directory

Using a Globus Transfer client, we create a directory on the shared endpoint that will contain the data we are distributing to the user. The shared endpoint must already exist (it cannot be created on the fly); the data distribution directory will be named using the identity ID of the user.

In [None]:
import globus_sdk

# Shared endpoint - must already exist
shared_endpoint_id = "e56c36e4-1063-11e6-a747-22000bf2d559"  # legacy name petrel#testbed

# This is the directory where we will place files for distribution
data_distribution_root = "/disthome/"

# Create a TransferClient object using the Transfer service token
transfer_access_token = tokens['tokens']['transfer.api.globus.org']['access_token']
transfer_authorizer = globus_sdk.AccessTokenAuthorizer(transfer_access_token)
tc = globus_sdk.TransferClient(authorizer=transfer_authorizer)

# Create a directory for the files we're distributing (name it using the user's identity ID)
user_path = data_distribution_root + identity_id + '/'
try:
    mkdir_result = tc.operation_mkdir(shared_endpoint_id, path=user_path)
    print(mkdir_result['message'])
except globus_sdk.GlobusAPIError as error:
    print(f"Error code: {error.code}\nError message: {error.message}")
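The cell above builds `user_path` by concatenating strings, which only works if `data_distribution_root` ends in `/`. A hypothetical helper, `user_distribution_path`, sketches a more forgiving version. It uses `posixpath` rather than `os.path` because Globus paths use forward slashes regardless of the notebook host's operating system, and it assumes identity IDs never begin with `/` (a leading slash would make `posixpath.join` discard the root).

```python
import posixpath

def user_distribution_path(root, identity_id):
    """Join root and identity ID into a directory path with exactly one trailing slash."""
    # posixpath.join inserts a separator only when root does not already end with one
    return posixpath.join(root, identity_id) + "/"

print(user_distribution_path("/disthome/", "aaaa-1111"))  # /disthome/aaaa-1111/
print(user_distribution_path("/disthome", "aaaa-1111"))   # /disthome/aaaa-1111/
```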

## Approach A: Add permissions to data distribution directory

The most straightforward approach is to grant the user access directly. To do this, we add their identity to the access control list (ACL) for the specified directory, granting read-only access.

In [None]:
# Compose the permission rule
rule_data = {
    'DATA_TYPE': 'access',
    'permissions': 'r',  # read-only access
    'principal': identity_id,  # the user's identity ID
    'principal_type': 'identity',
    'path': user_path  # the directory to which we're granting access
}

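try:
    # Add the rule to the access control list for the shared endpoint
    response = tc.add_endpoint_acl_rule(shared_endpoint_id, rule_data)
    access_rule_id = response['access_id']
    print(response)
except globus_sdk.GlobusAPIError as error:
    if "Exists" in error.code:
        print("ACL already exists, ignoring error")
        # Recover the existing rule's ID so that it can still be revoked later
        access_rule_id = next((rule['id'] for rule in tc.endpoint_acl_list(shared_endpoint_id)
                               if rule['principal'] == identity_id and rule['path'] == user_path), None)
    else:
        raise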
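The `rule_data` dictionary can also be produced by a small helper that validates its inputs. This `build_access_rule` function is hypothetical. It assumes that `"r"` and `"rw"` are the only permission values a gateway needs and that rules on directories use a path ending in `/`. The `principal_type` parameter is included because, under Approach B below, the sharing group rather than an individual identity would be the rule's principal.

```python
def build_access_rule(principal, path, permissions="r", principal_type="identity"):
    """Compose an ACL 'access' rule dict in the same shape as rule_data above."""
    # Assumption: "r" (read-only) and "rw" (read/write) are the only values needed here
    if permissions not in ("r", "rw"):
        raise ValueError(f"Unsupported permissions: {permissions!r}")
    # "identity" grants one user (Approach A); "group" grants a Globus Group (Approach B)
    if principal_type not in ("identity", "group"):
        raise ValueError(f"Unsupported principal_type: {principal_type!r}")
    # Assumption: rules on directories use a path ending in "/"
    if not path.endswith("/"):
        path += "/"
    return {
        "DATA_TYPE": "access",
        "permissions": permissions,
        "principal": principal,
        "principal_type": principal_type,
        "path": path,
    }

print(build_access_rule("aaaa-1111", "/disthome/aaaa-1111"))
```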

## Approach B: Sharing via Groups membership
An alternative approach can be used when sharing data with a larger community. This entails using a Globus Group; the group is granted access to the data directory and then users can be granted access simply by adding them to that group.

In [None]:
import requests  # we'll use this to make low-level calls to the Groups API
import globus_sdk

# Prepare the authorization header; get access token for the Groups service
headers = {'Authorization': 'Bearer ' + tokens['tokens']['groups.api.globus.org']['access_token']}

# This is the group (Directed Omics Translation) that already has access to the shared endpoint
sharing_group_id = "b4cada3a-af7c-11e3-8b90-1231391ccf32"

# Create an AuthClient for identity lookups (assumes an Auth token is present in the tokens dictionary)
auth_authorizer = globus_sdk.AccessTokenAuthorizer(tokens['tokens']['auth.globus.org']['access_token'])
ac = globus_sdk.AuthClient(authorizer=auth_authorizer)

# Assuming we have the username, we can get the user's identity ID via the Auth service
username = "globus.demodoc@gmail.com"
identities = ac.get_identities(usernames=username).data['identities']
if not identities:
    raise ValueError(f"No Globus identity found for {username}")

# Now we add that identity to the group
group_add = requests.post(f"https://groups.api.globus.org/v2/groups/{sharing_group_id}",
                          json={"add": [{"identity_id": identities[0]['id']}]},
                          headers=headers).json()
if not group_add['add']:
    print(f"Failed to add user to sharing group: {group_add['errors']['add'][0]['detail']}")
else:
    print(json.dumps(group_add, indent=4))
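The cell above reports only the first error when the add fails. A hypothetical helper, `summarize_group_add`, collects every added identity ID and every error detail instead. It assumes each entry in the response's `add` list carries an `identity_id` field; the sample responses below are made up and mirror only the fields the cell above reads.

```python
def summarize_group_add(response):
    """Split a Groups batch 'add' response into added identity IDs and error details."""
    added = [member.get("identity_id") for member in response.get("add", [])]
    errors = [err.get("detail") for err in response.get("errors", {}).get("add", [])]
    return added, errors

# Made-up responses, shaped like the fields used above
ok = {"add": [{"identity_id": "aaaa-1111"}]}
failed = {"add": [], "errors": {"add": [{"detail": "example error detail"}]}}
print(summarize_group_add(ok))      # (['aaaa-1111'], [])
print(summarize_group_add(failed))  # ([], ['example error detail'])
```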

## Revoke user permissions
We assume here that the science gateway either (a) tracks file transfers from the data distribution directory and can tell when data are successfully downloaded, or (b) sets a time limit for the user to download their data. After either (a) or (b), the science gateway will revoke the user's permissions and remove the data distribution directory.
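For option (b), the gateway needs some rule for deciding when a grant has expired. A minimal sketch follows, assuming the gateway records the UTC time at which it granted access. The `access_expired` helper and the 48-hour window are illustrative choices, not part of Globus.

```python
from datetime import datetime, timedelta, timezone

def access_expired(granted_at, ttl, now=None):
    """Return True once `ttl` has elapsed since `granted_at` (timezone-aware UTC datetimes)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= granted_at + ttl

granted = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(access_expired(granted, timedelta(hours=48), now=granted + timedelta(hours=47)))  # False
print(access_expired(granted, timedelta(hours=48), now=granted + timedelta(hours=49)))  # True
```

Passing `now` explicitly keeps the check deterministic and easy to test. In production it defaults to the current time.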

We revoke permission by either (a) removing the rule from the ACL for the shared endpoint, or (b) removing the user from the sharing group.

### Remove rule from shared endpoint ACL

In [None]:
response = tc.delete_endpoint_acl_rule(shared_endpoint_id, access_rule_id)
print(response)
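The notebook revokes the ACL rule but does not show how to remove the data distribution directory itself. A minimal sketch using a recursive Globus delete task follows. It assumes globus-sdk v3, where `DeleteData` takes the `TransferClient` as its first argument, and that the token in use has permission to delete on the shared endpoint. `remove_distribution_directory` is a hypothetical helper name.

```python
def remove_distribution_directory(tc, endpoint_id, path):
    """Submit a recursive Globus delete task for `path` and return its task ID."""
    import globus_sdk  # imported here so the helper can be defined without the SDK installed

    # Assumes globus-sdk v3: DeleteData(transfer_client, endpoint, ...)
    ddata = globus_sdk.DeleteData(tc, endpoint_id, recursive=True,
                                  label="Remove data distribution directory")
    ddata.add_item(path)
    result = tc.submit_delete(ddata)
    return result["task_id"]
```

With the objects defined earlier in this notebook, a call would look like `task_id = remove_distribution_directory(tc, shared_endpoint_id, user_path)`. Because the delete runs asynchronously, the gateway can poll it with `tc.task_wait(task_id)` before reporting success.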

### Remove user from sharing group
When this notebook was written, there was no public API for removing group members; however, members can easily be removed [via the Globus web app](https://app.globus.org/groups/b4cada3a-af7c-11e3-8b90-1231391ccf32/members).
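Newer releases of the Groups v2 API may accept a `remove` action on the same batch endpoint used for `add` above. This is an assumption to check against the current Groups API documentation before relying on it. The sketch below only builds the request; the hypothetical `build_remove_member_request` helper returns the URL, headers, and payload, and the token and identity ID in the example are placeholders.

```python
GROUPS_API = "https://groups.api.globus.org/v2/groups"

def build_remove_member_request(group_id, identity_id, groups_token):
    """Return (url, headers, payload) for a batch 'remove' call on a group."""
    url = f"{GROUPS_API}/{group_id}"
    headers = {"Authorization": f"Bearer {groups_token}"}
    # Assumption: the batch endpoint accepts "remove" alongside the "add" action used earlier
    payload = {"remove": [{"identity_id": identity_id}]}
    return url, headers, payload

url, headers, payload = build_remove_member_request(
    "b4cada3a-af7c-11e3-8b90-1231391ccf32", "aaaa-1111", "EXAMPLE_TOKEN")
print(url)
print(payload)
```

If the API supports it, the request would be sent the same way as the add call, for example `requests.post(url, json=payload, headers=headers)`. Otherwise, fall back to the web app.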