<img src="img/automation_using_flows_header.png">

In this notebook we demonstrate how the Globus Flows service can be used to automate data management at scale. We demonstrate a flow that automates a common design pattern: transferring data from one system to another and making the data accessible to collaborators. This pattern often arises when managing data produced by instruments. For example, image files may be transferred from local storage attached to a microscope to a high-performance storage system, where all members of the research project can access them (in our example, we'll grant access to the [Tutorial Users group](https://app.globus.org/groups/50b6a29c-63ac-11e4-8062-22000ab68755/about)).

We will walk through the following tasks:
1. Authenticate with Globus and get tokens for accessing various services.
1. Define and register a flow with Globus.
1. Execute a flow using configurable inputs for the collections and the access permissions.

The Globus flow is illustrated below.

<img src="img/transfer_set_permissions_flow.png" alt="Transfer and set permissions flow" align="CENTER" style="width: 90%;"/>

In [None]:
import sys
import os
import time
import json
import uuid
import pickle
import base64

import globus_sdk
import globus_sdk.scopes

# ID of this tutorial notebook as registered with Globus Auth
CLIENT_ID = 'f794186b-f330-4595-b6c6-9c9d3e903e47'

# Feel free to replace the collection UUIDs below with those of your own collections
source_collection = "6c54cade-bde5-45c1-bdea-f4bd71dba2cc"  # "Globus Tutorial Collection 1"
destination_collection = "a6f165fa-aee2-4fe5-95f3-97429c28bf82"  # "Globus Tutorials on ALCF Eagle"
my_collaborators = "50b6a29c-63ac-11e4-8062-22000ab68755"  # "Tutorial Users" group

# A. Authentication and Authorization

All interactions between users and services on the Globus automation platform are governed by the Globus Auth service. In particular, the user must consent to every interaction that a service or application performs on their behalf, including the interactions in this notebook.

The first time you interact with a service such as the Flows service, or with a specific flow, you will be given a link to start the consent flow. Click the link; the consent flow opens in a new tab. When it is complete, copy the authorization code, return to the notebook, and paste it into the input box shown below the link.

We will encounter authorization steps in a couple of places:
1. When deploying a new flow on the Globus Flows service. Deploying a flow requires (a) an identity that is associated with a Globus subscription, and (b) access to the Flows service scope.
1. When executing a flow.

Access to the Flows service is already granted to you by virtue of authenticating to the JupyterHub running this notebook. Note: If you're running this notebook in your own environment, there are no JupyterHub tokens to reuse, so the next cell falls back to a native app authorization flow and prompts you to log into Globus Auth (see the `Platform_Introduction` notebook for a more detailed walkthrough of this flow).

In [None]:
# Create transfer scope with data_access scope dependency from the source mapped collection.
transfer_scope = globus_sdk.scopes.TransferScopes.make_mutable("all")
data_access_scope = globus_sdk.scopes.GCSCollectionScopeBuilder(source_collection).data_access
transfer_scope.add_dependency(data_access_scope)

# Get Globus Auth token data from the JupyterHub environment. If tokens already exist from logging into
# jupyter.demo.globus.org, tokens from the environment can be used instead. Otherwise, do a Native App flow.
globus_data_raw = os.getenv("GLOBUS_DATA")
if globus_data_raw:
    tokens = pickle.loads(base64.b64decode(globus_data_raw))['tokens']
else:
    # Do a native app authentication flow to get tokens that allow us to interact with the Globus Flows service
    scopes = [
        "openid",
        "profile",
        "email",
        transfer_scope,
        globus_sdk.FlowsClient.scopes.manage_flows,
        globus_sdk.FlowsClient.scopes.run_manage,
    ]
    native_auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    native_auth_client.oauth2_start_flow(requested_scopes=scopes)
    print(f"Login Here:\n\n{native_auth_client.oauth2_get_authorize_url()}")
    
    # Authenticate and come back with your authorization code; paste it into the prompt below.
    auth_code = input('Authorization Code: ')
    response = native_auth_client.oauth2_exchange_code_for_tokens(auth_code)
    
    # Token data is keyed by resource server (e.g. 'flows.globus.org', 'transfer.api.globus.org').
    # The tokens are not printed here; use the optional line below to inspect them.
    tokens = response.by_resource_server

# Uncomment the line below to introspect tokens
#print(json.dumps(tokens, indent=2))

# Each newly created flow has its own scope, which must be consented to separately and yields its own
# access token. Those tokens are stored in `tokens`, keyed by flow ID (see "Authorize the flow" below).
def get_flow_authorizer(flow_id):
    """Return an authorizer for a specific flow, using the token saved under its flow ID."""
    return globus_sdk.AccessTokenAuthorizer(access_token=tokens[flow_id]['access_token'])

# Set up the Flows client, using tokens from our JupyterHub login (or from the native app flow above)
# to access the Globus Flows service.
flows_authorizer = globus_sdk.AccessTokenAuthorizer(access_token=tokens['flows.globus.org']['access_token'])
flows_client = globus_sdk.FlowsClient(authorizer=flows_authorizer)
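The `tokens` dictionary used above maps each resource server (such as `flows.globus.org` or `auth.globus.org`) to its token data. When the tokens come from the native app flow, each entry is one item of the SDK's `by_resource_server` mapping, which carries an `expires_at_seconds` Unix timestamp alongside `access_token`. Access tokens are short-lived, so a notebook left idle for a long time may start failing with authorization errors. The sketch below uses a hypothetical helper, `token_is_expired`, and made-up token data to show one way of checking expiry before building an authorizer; it does not refresh tokens.

```python
import time
from typing import Optional


def token_is_expired(token_data: dict, now: Optional[float] = None, margin: int = 60) -> bool:
    """Return True if the token expires within `margin` seconds (hypothetical helper).

    Assumes `token_data` looks like one entry of `by_resource_server`,
    i.e. it has an `expires_at_seconds` Unix timestamp.
    """
    if now is None:
        now = time.time()
    return token_data["expires_at_seconds"] - margin <= now


# Made-up token data for illustration only (these are not real tokens).
example_tokens = {
    "flows.globus.org": {"access_token": "EXAMPLE", "expires_at_seconds": 1_700_000_000},
    "auth.globus.org": {"access_token": "EXAMPLE", "expires_at_seconds": 1_700_003_600},
}

now = 1_700_001_000
for resource_server, data in example_tokens.items():
    status = "expired" if token_is_expired(data, now=now) else "valid"
    print(f"{resource_server}: {status}")
```

If a token has expired, the simplest remedy in this notebook is to re-run the authentication cell.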

### Fetch User Identity

When transferring files to the guest collection, we will put them in a directory named `<identity_id>-shared-files` to distinguish it from other users' directories. Let's fetch our identity ID for this purpose.

In [None]:
# Create an Auth client so we can look up identities
auth_authorizer = globus_sdk.AccessTokenAuthorizer(access_token=tokens['auth.globus.org']['access_token'])
ac = globus_sdk.AuthClient(authorizer=auth_authorizer)

# Get the user's primary identity
primary_identity = ac.oauth2_userinfo()
identity_id = primary_identity['sub']

print(f"Username: {primary_identity['preferred_username']} (ID: {identity_id})")
print(f"Notifications will be sent to: {primary_identity['email']}")

# B. Flow Authoring

## Define a flow

* Flows are composed of *action* invocations.
* Each action reads its input from the flow *state* (via `Parameters` entries whose keys end in `.$`, or via `InputPath`) and writes its result back into the state at the location given by its `ResultPath` property.
* Actions are specified with an `ActionUrl` property. The `ActionUrl` is the address of an *action provider*: An API that provides actions you can invoke with your flow. Globus provides a number of action providers for Globus services, and you can also create your own using the [Action Provider Tools package](https://action-provider-tools.readthedocs.io/en/latest/).
* Each action provider defines its accepted input schema specifying the permitted input format. For example, the Globus Transfer action provider requires source and destination collection IDs as well as source and destination paths.
* Actions are linked via their `Next` property. The last action in a flow sets the `End` property to `true`.
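To make the `.$` convention concrete: a parameter key ending in `.$` takes its value from the flow state at the given path, while keys without the suffix are passed through literally. The sketch below is a simplified, hypothetical resolver that only understands dotted paths such as `$.input.source.id` (the Flows service supports full JSONPath). It illustrates the idea and is not the service's implementation.

```python
from typing import Any


def lookup(state: dict, path: str) -> Any:
    """Follow a simple dotted path such as '$.input.source.id' into `state`."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    value: Any = state
    for key in path[2:].split("."):
        value = value[key]
    return value


def resolve_parameters(template: Any, state: dict) -> Any:
    """Recursively replace keys ending in '.$' with values looked up in the flow state."""
    if isinstance(template, dict):
        resolved = {}
        for key, value in template.items():
            if key.endswith(".$"):
                # 'source_endpoint.$': '$.input.source.id' becomes 'source_endpoint': <state value>
                resolved[key[:-2]] = lookup(state, value)
            else:
                resolved[key] = resolve_parameters(value, state)
        return resolved
    if isinstance(template, list):
        return [resolve_parameters(item, state) for item in template]
    return template  # literal value, passed through unchanged


# Hypothetical flow state and a Parameters template shaped like the TransferFiles step.
state = {"input": {"source": {"id": "SRC-ID", "path": "/data/"}, "recursive_tx": True}}
params = {
    "source_endpoint.$": "$.input.source.id",
    "DATA": [{"source_path.$": "$.input.source.path", "recursive.$": "$.input.recursive_tx"}],
    "label": "literal value",
}
print(resolve_parameters(params, state))
```

After an action completes, its result is written into the state at its `ResultPath` (for example `$.TransferFiles`), where later steps can read it the same way.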

Our simple flow includes just two actions, `TransferFiles` and `SetPermission`.

In [None]:
# Define flow
flow_definition = {
    "Comment": "Transfer files to a guest collection and set access permissions",
    "StartAt": "TransferFiles",
    "States": {
        "TransferFiles": {
            "Comment": "Transfer to a guest collection",
            "Type": "Action",
            "ActionUrl": "https://transfer.actions.globus.org/transfer",
            # https://docs.globus.org/api/transfer/action-providers/transfer/
            "Parameters": {
                "source_endpoint.$": "$.input.source.id",
                "destination_endpoint.$": "$.input.destination.id",
                "DATA": [
                    {
                        "source_path.$": "$.input.source.path",
                        "destination_path.$": "$.input.destination.path",
                        "recursive.$": "$.input.recursive_tx"
                    }
                ]
            },
            "ResultPath": "$.TransferFiles",
            "WaitTime": 60,
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Comment": "Grant read permission on the data to a Globus user or group",
            "Type": "Action",
            "ActionUrl": "https://transfer.actions.globus.org/manage_permission",
            # https://docs.globus.org/api/transfer/action-providers/manage-permission/
            "Parameters": {
                "endpoint_id.$": "$.input.destination.id",
                "path.$": "$.input.destination.path",
                "operation": "CREATE",
                "permissions": "r",  # read-only access
                "principal_type.$": "$.input.principal_type",  # 'group' or 'identity'
                "principal.$": "$.input.principal_identifier"
            },
            "ResultPath": "$.SetPermission",
            "End": True
        }
    }
}
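Because states are linked only through `StartAt`, `Next`, and `End`, it can be handy to sanity-check a definition locally before deploying it. The sketch below is a hypothetical helper (not part of the Globus SDK) that follows the chain from `StartAt` and returns the order in which the states would run. It raises an error for an undefined state, a loop, or a state with neither `Next` nor `End`. It only handles linear flows like ours; branching states such as `Choice` are out of scope.

```python
def state_order(definition: dict) -> list[str]:
    """Return state names in execution order for a linear flow definition."""
    states = definition["States"]
    order: list[str] = []
    current = definition["StartAt"]
    while True:
        if current not in states:
            raise KeyError(f"state {current!r} is referenced but not defined")
        if current in order:
            raise ValueError(f"loop detected at state {current!r}")
        order.append(current)
        state = states[current]
        if state.get("End"):
            return order
        if "Next" not in state:
            raise ValueError(f"state {current!r} has neither 'Next' nor 'End'")
        current = state["Next"]


# A stripped-down copy of the two-state structure defined above.
minimal_definition = {
    "StartAt": "TransferFiles",
    "States": {
        "TransferFiles": {"Type": "Action", "Next": "SetPermission"},
        "SetPermission": {"Type": "Action", "End": True},
    },
}
print(state_order(minimal_definition))  # ['TransferFiles', 'SetPermission']
```

In the notebook, calling `state_order(flow_definition)` should report the same two states.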

## Define a schema

* Every flow is deployed with an input schema, which the Flows service uses to validate the input provided when the flow is run.
* Flow input schemas are written in JSON Schema.
* Input schemas are deployed together with the ``flow_definition`` and are checked whenever any user tries to run the flow.

The schema below describes the input required by the two states above:

In [None]:
# Define input schema
input_schema = {
    "required": [
        "input"
    ],
    "properties": {
        "input": {
            "type": "object",
            "required": [
                "source",
                "destination",
                "recursive_tx",
                "principal_identifier",
                "principal_type"
            ],
            "properties": {
                "source": {
                    "type": "object",
                    "title": "Select source collection and path",
                    "description": "The source collection and path (path MUST end with a slash)",
                    "format": "globus-collection",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "format": "uuid",
                            "default": source_collection
                        },
                        "path": {
                            "type": "string"
                        }
                    },
                    "additionalProperties": False
                },
                "destination": {
                    "type": "object",
                    "title": "Select destination collection and path",
                    "description": "The destination collection and path (path MUST end with a slash); default collection is 'Globus Tutorials on ALCF Eagle'",
                    "format": "globus-collection",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "format": "uuid",
                            "default": destination_collection
                        },
                        "path": {
                            "type": "string",
                            "default": f"/automation-tutorial/{identity_id}-shared-files/"
                        }
                    },
                    "additionalProperties": False
                },
                "recursive_tx": {
                    "type": "boolean",
                    "title": "Recursive transfer",
                    "description": "Whether or not to transfer recursively, must be true when transferring a directory.",
                    "default": True,
                },
                "principal_type": {
                    "type": "string",
                    "title": "Type of principal to share with",
                    "description": "Specifies whether files are being shared with a user ('identity') or a group ('group'); default is 'group'",
                    "enum": [
                        "identity",
                        "group"
                    ],
                    "default": "group"
                },
                "principal_identifier": {
                    "type": "string",
                    "title": "UUID of user identity or group",
                    "description": "The user or group id to share with; default is 'Tutorial Users' group.",
                    "format": "uuid",
                    "default": my_collaborators
                }
            },
            "additionalProperties": False
        }
    },
    "additionalProperties": False
}
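The Flows service performs this validation for us when a run starts. To illustrate what the `type`, `required`, `additionalProperties`, and `enum` keywords enforce, here is a deliberately tiny, hypothetical validator that supports only those keywords (it ignores `format`, `default`, and everything else). For real local validation, a full JSON Schema implementation such as the third-party `jsonschema` package is the better choice.

```python
from typing import Any

# Only the JSON Schema types used in this notebook's schema are supported.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
}


def check(instance: Any, schema: dict, where: str = "$") -> list[str]:
    """Return a list of error messages; an empty list means the instance passed."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](instance):
        return [f"{where}: expected {expected}"]
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{where}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{where}: missing required property {name!r}")
        if schema.get("additionalProperties") is False:
            for name in instance:
                if name not in props:
                    errors.append(f"{where}: unexpected property {name!r}")
        # Recurse into the properties that are present.
        for name, sub_schema in props.items():
            if name in instance:
                errors.extend(check(instance[name], sub_schema, f"{where}.{name}"))
    return errors


mini_schema = {
    "type": "object",
    "required": ["principal_type"],
    "properties": {"principal_type": {"type": "string", "enum": ["identity", "group"]}},
    "additionalProperties": False,
}
print(check({"principal_type": "group"}, mini_schema))  # []
print(check({"principal_type": "team", "extra": 1}, mini_schema))
```

The second call reports both the unexpected `extra` property and the invalid `principal_type` value.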

## Create the flow
Before we can run the flow, we must create it in the Globus Flows service from the definition and input schema above. We also give the flow a title with a short random suffix so it is easy to identify. If creation succeeds, the Flows service returns the ID of your new flow.

In [None]:
# Create the flow
# Set the flow's title so you can easily identify it
flow_title = f"Tutorial-Transfer-Share-{str(uuid.uuid4())[:4]}"
flow = flows_client.create_flow(
    title=flow_title,
    definition=flow_definition,
    input_schema=input_schema,
)
flow_id = flow['id']
flow_scope = globus_sdk.SpecificFlowClient(flow_id).scopes.make_mutable("user")
flow_scope.add_dependency(transfer_scope)


# If you change the flow, you will need to update it.
# For example, to make this flow visible to another user, uncomment the lines below and
# replace OTHER_USER_IDENTITY_ID with that user's Globus identity ID:
# flow = flows_client.update_flow(
#     flow_id=flow_id,
#     flow_viewers=["urn:globus:auth:identity:OTHER_USER_IDENTITY_ID"],
# )

print(f"Successfully created flow: '{flow_title}'")
print(f"(ID: {flow_id})")
print(f"Flow scope: {flow_scope}\n\n")
print(f"View the flow in the Web App: https://app.globus.org/flows/{flow_id}")
print("Note: You can start your flow directly from the Web App")

# C. Flow Execution

## Authorize the flow

Once your flow has been created, you must authorize it to interact with other services on your behalf before you can run it. The Globus Flows service generates a dedicated scope for each flow (see `flow_scope` above). To consent to this flow, we request an access token for that scope and then use the token to run the flow. Note that you will be asked to consent again, this time for the new flow.
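For reference, here is roughly what the `flow_scope` object printed earlier serializes to. Each flow's scope string is derived from its ID, and the dependency on the Transfer scope (which in turn depends on the source collection's `data_access` scope) is written in square brackets. The helpers below are hypothetical, hand-written illustrations of that string format, assumed to mirror what the SDK's scope builders produce; in the notebook itself, rely on `SpecificFlowClient(...).scopes` and `MutableScope` rather than building strings by hand. The flow ID is a placeholder.

```python
def with_dependencies(scope: str, *deps: str) -> str:
    """Serialize a scope and its dependent scopes as 'scope[dep1 dep2]'."""
    return f"{scope}[{' '.join(deps)}]" if deps else scope


def flow_user_scope(flow_id: str) -> str:
    """Scope string for running one specific flow, derived from the flow ID."""
    return f"https://auth.globus.org/scopes/{flow_id}/flow_{flow_id.replace('-', '_')}_user"


def collection_data_access_scope(collection_id: str) -> str:
    """data_access scope of a GCS mapped collection."""
    return f"https://auth.globus.org/scopes/{collection_id}/data_access"


TRANSFER_ALL = "urn:globus:auth:scope:transfer.api.globus.org:all"

example_flow_id = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real flow
example_collection = "6c54cade-bde5-45c1-bdea-f4bd71dba2cc"  # "Globus Tutorial Collection 1"

example_scope = with_dependencies(
    flow_user_scope(example_flow_id),
    with_dependencies(TRANSFER_ALL, collection_data_access_scope(example_collection)),
)
print(example_scope)
```

The nesting expresses that consenting to the flow also lets it use Transfer, and lets Transfer access the source collection, on your behalf.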

In [None]:
# If the flow scope is already saved, we don't need a new one.
if flow_id not in tokens:
    # Do a native app authentication flow and get tokens that include the newly deployed flow scope
    native_auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    native_auth_client.oauth2_start_flow(requested_scopes=flow_scope)
    print(f"Login Here:\n\n{native_auth_client.oauth2_get_authorize_url()}")
    
    # Authenticate and come back with your authorization code; paste it into the prompt below.
    auth_code = input('Authorization Code: ')
    token_response = native_auth_client.oauth2_exchange_code_for_tokens(auth_code)
    
    # Save the new token in a place where the flows client can retrieve it.
    tokens[flow_id] = token_response.by_resource_server[flow_id]
    
    # Show which scope the new token grants (avoid printing the token itself)
    print(tokens[flow_id]['scope'])

## Define flow input

If your flow includes parameterized input, you must provide values for those properties when running the flow. Like the flow definition, flow input is defined as a JSON document. You must provide a value for each input property in your flow. (Input properties are part of the flow's "state" and can be accessed in a flow definition by prefixing values with `$.` and providing the path to the property, as seen in the flow definition above).

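For the `TransferFiles` action, we must specify the source and destination collection IDs, the source and destination paths, and whether to transfer recursively. For the `SetPermission` action, the flow reuses the destination collection ID and path, and we must specify the type of principal to which we're granting permission (`group` or `identity`) and that principal's ID. The permission itself is fixed to read-only (`"r"`) in the flow definition.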
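Since the same input shape is needed every time the flow runs, it can help to build it with a small function. The sketch below uses a hypothetical `make_flow_input` helper that fills in the five properties required by the input schema and appends the trailing slash that the schema descriptions call for on directory paths. The IDs in the usage line are placeholders.

```python
def _as_dir(path: str) -> str:
    """Ensure a directory path ends with '/', as the schema descriptions require."""
    return path if path.endswith("/") else path + "/"


def make_flow_input(source_id: str, source_path: str, dest_id: str, dest_path: str,
                    principal_id: str, principal_type: str = "group",
                    recursive: bool = True) -> dict:
    """Build a flow input document matching this notebook's input schema (hypothetical helper)."""
    if principal_type not in ("identity", "group"):
        raise ValueError("principal_type must be 'identity' or 'group'")
    return {
        "input": {
            "source": {"id": source_id, "path": _as_dir(source_path)},
            "destination": {"id": dest_id, "path": _as_dir(dest_path)},
            "recursive_tx": recursive,
            "principal_type": principal_type,
            "principal_identifier": principal_id,
        }
    }


# Share with a single user instead of a group (placeholder IDs).
example = make_flow_input("SRC-ID", "/home/share/godata", "DST-ID",
                          "/automation-tutorial/x-shared-files", "USER-ID",
                          principal_type="identity")
print(example)
```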

In [None]:
# Define flow inputs
destination_path = f"/automation-tutorial/{identity_id}-shared-files/"
flow_input = {
    "input": {
        # Transfer input
        "source": {
            "id": source_collection,
            "path": "/home/share/godata/"
        },
        "destination": {
            "id": destination_collection,
            "path": destination_path
        },
        
        "recursive_tx": True,
        # Grant access to the Tutorial Users group
        "principal_type": "group",
        "principal_identifier": my_collaborators

        # We could also grant access to a specific user, using their Globus identity ID
        #"principal_type": "identity",
        #"principal_identifier": identity_id
    }
}

## Run the flow

We're finally ready to run the flow. You can monitor and manage your flow runs from the Globus Web App (https://app.globus.org/runs).

Note: If you run the flow more than once, later runs will fail at the `SetPermission` step, because an access rule for that principal and path already exists on the destination collection and cannot be created a second time. Run the code in the "Remove Access Rule" cell below to clean up the destination collection before running the flow again.

In [None]:
# Get a client for the flow
specific_flow_authorizer = globus_sdk.AccessTokenAuthorizer(
    access_token=tokens[flow_id]['access_token'],
)
specific_flow_client = globus_sdk.SpecificFlowClient(
    flow_id=flow_id,
    authorizer=specific_flow_authorizer,
)

# Run the flow
# Set a descriptive label for this flow run
run_label = f"Transfer/Share tutorial run for {primary_identity['preferred_username']}"
run = specific_flow_client.run_flow(
    body=flow_input,
    label=run_label,
    tags=['tutorial', 'transfer-share-flow'],
)

# Get run details
run_id = run['run_id']
run_status = run['status']
print("This flow can be monitored in the Web App:")
print(f"https://app.globus.org/runs/{run_id}")
print(f"Flow run started with ID: {run_id} - Status: {run_status}")

# Poll the Flows service to check on the status of the run
while run_status == 'ACTIVE':
    time.sleep(5)
    run = flows_client.get_run(run_id)
    run_status = run['status']
    print(f'Run status: {run_status}')
    
# Run completed
print(json.dumps(run.data, indent=2))
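The polling loop above keeps waiting for as long as the run stays `ACTIVE`, with no upper bound. If you adapt it for unattended use, you may want a timeout and a gentler polling rate. The sketch below is a hypothetical, service-agnostic helper that accepts any zero-argument function returning a status string (in this notebook, something like `lambda: flows_client.get_run(run_id)['status']`) and polls it with a capped exponential backoff. The usage example simulates statuses instead of calling the service.

```python
import time
from typing import Callable, Iterable


def wait_for_status(get_status: Callable[[], str],
                    active: Iterable[str] = ("ACTIVE",),
                    timeout: float = 600.0,
                    initial_delay: float = 1.0,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `get_status` until it leaves the `active` states; raise after `timeout` seconds."""
    active_states = set(active)
    deadline = time.monotonic() + timeout
    delay = initial_delay
    status = get_status()
    while status in active_states:
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
        status = get_status()
    return status


# Simulated status sequence (no network calls); sleeping is skipped for the demo.
statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
final = wait_for_status(lambda: next(statuses), sleep=lambda seconds: None)
print(final)  # SUCCEEDED
```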

## View the files on the guest collection
Our files were transferred to the guest collection and read access was granted to the Tutorial Users group. Members of the group can now access the files via the Web App, the CLI, or the Globus APIs. Here we generate a link that opens the Globus Web App file manager at the destination path.

In [None]:
from urllib.parse import urlencode, urlunsplit
query_params = {
    "origin_id": destination_collection,
    "origin_path": destination_path
}
url = urlunsplit(("https", "app.globus.org", "file-manager", urlencode(query_params), ''))
print(f"View your files in the Globus web app here:\n{url}\n\n")


## Remove Access Rule
You can remove the access rule directly using the Globus SDK. While we're at it, we also delete the transferred directory from the guest collection.

In [None]:
# Get the ID of the access rule from the flow action's output
access_rule_id = run['details']['output']['SetPermission']['details']['access_id']

transfer_authorizer = globus_sdk.AccessTokenAuthorizer(tokens['transfer.api.globus.org']['access_token'])
tc = globus_sdk.TransferClient(authorizer=transfer_authorizer)

# Remove the access rule
response = tc.delete_endpoint_acl_rule(destination_collection, access_rule_id)
print(response)

# Delete the directory on the guest collection
# DeleteData() automatically gets a submission_id for once-and-only-once submission
label = "Automation tutorial cleanup"
delete_data = globus_sdk.DeleteData(tc, destination_collection, label=label, recursive=True)

# Recursively delete the destination path and its contents (recursive=True is set above)
delete_data.add_item(destination_path)
tc.endpoint_autoactivate(destination_collection)
submit_result = tc.submit_delete(delete_data) 
print(f"DELETE Task ID: {submit_result['task_id']}")

## Review Permissions
Verify that the access rule no longer exists:

In [None]:
print(f"View sharing permissions: \nhttps://app.globus.org/file-manager/collections/{destination_collection}/sharing")