<img src="img/automation_using_flows.png">

In this notebook we demonstrate how the Globus Flow service can be used to automate data management at scale. Our example flow automates a common design pattern: moving data from one system to another and making the data accessible to collaborators. This pattern often arises when managing data from instruments; e.g., image files can be moved from local storage attached to a microscope to a high-performance storage system, where they may be accessed by all members of the research project (in our example, we'll grant access to the [Tutorial Users group](https://app.globus.org/groups/50b6a29c-63ac-11e4-8062-22000ab68755/about)).

We will walk through the following tasks:
1. Authenticate with Globus and get tokens for accessing various services.
1. Define and register a flow with Globus.
1. Execute a flow using configurable inputs for the endpoint and the access permissions.

The Globus flow is illustrated below.

<img src="img/transfer_set_permissions_flow.png" alt="Transfer and set permissions flow" align="CENTER" style="width: 90%;"/>

In [None]:
import sys
import os
import time
import json
import uuid
import pickle
import base64

import globus_sdk
from globus_automate_client import FlowsClient

client_id = 'f794186b-f330-4595-b6c6-9c9d3e903e47'  # native app client ID for notebook

# Feel free to replace the endpoint UUIDs below with those of your own endpoints
source_endpoint = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"  # endpoint "Globus Tutorial Endpoint 1"
destination_endpoint = "a6f165fa-aee2-4fe5-95f3-97429c28bf82"  # endpoint "Globus Tutorials on ALCF Eagle"
my_collaborators = "50b6a29c-63ac-11e4-8062-22000ab68755"  # group "Tutorial Users"

# A. Authentication and Authorization

All interactions between users and services on the Globus automation platform are governed by the Globus Auth service. In particular, the user must consent to each service acting on their behalf, including the interactions in this notebook.

The first time you interact with each service, such as the Flow service or even a specific flow, you will be given a link to start the consent flow. Click the link to complete the consent flow, which will open in a new tab. When it's complete, copy the code string, return to the notebook, and paste the code into the input box presented below the link to continue.

We will encounter authorization steps in a couple of places:
1. When deploying a new flow on the Globus Flow service. Deploying a flow requires (a) an identity that is associated with a Globus subscription, and (b) access to the Flow service scope.
1. When executing a flow.

Access to the Flow service is already granted to you by virtue of authenticating to the JupyterHub running this notebook. Note: if you're running this notebook in your own environment, you will need to log into Globus Auth manually and get tokens using a native app authorization flow (see the `Platform_Introduction_Native_App_Auth` notebook for an example of how to initiate this flow).

In [None]:
# Get Globus Auth token data from the JupyterHub environment
tokens = pickle.loads(base64.b64decode(os.getenv('GLOBUS_DATA')))['tokens']

# Introspect tokens
print(json.dumps(tokens, indent=2))

# Create a variable for storing flow scope tokens. Each newly deployed scope needs to be authorized separately,
# and will have its own set of tokens. Save each of these tokens by scope.
saved_flow_scopes = {}

# Callback the Flows client uses to get an authorizer for a specific flow's scope; it reads the token from `saved_flow_scopes`
def get_flow_authorizer(flow_url, flow_scope, client_id):
    return globus_sdk.AccessTokenAuthorizer(access_token=saved_flow_scopes[flow_scope]['access_token'])

# Set up the Flows client, using tokens from our JupyterHub login to access the Globus Flow service, and
# setting the `get_flow_authorizer` callback for any new flows we authorize.
flows_authorizer = globus_sdk.AccessTokenAuthorizer(access_token=tokens['flows.globus.org']['access_token'])
flows_client = FlowsClient.new_client(client_id, get_flow_authorizer, flows_authorizer)
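Before building more clients, it can help to fail fast if the JupyterHub token data is missing a resource server this notebook relies on. The sketch below assumes `tokens` is a dict keyed by resource server name (as the cell above indexes it) and that each entry may carry an `expires_at_seconds` field, the layout `globus_sdk` uses for token responses; `check_tokens` is a hypothetical helper, not part of any Globus library.

```python
import time

# Resource servers whose tokens this notebook reads (assumed from the cells in this notebook).
REQUIRED_RESOURCE_SERVERS = ("auth.globus.org", "flows.globus.org", "transfer.api.globus.org")


def check_tokens(tokens, required=REQUIRED_RESOURCE_SERVERS, now=None):
    """Return a list of problems found in a resource-server-keyed token dict.

    Hypothetical helper: assumes each entry is a dict with an 'access_token'
    key and, optionally, 'expires_at_seconds' (a Unix timestamp).
    """
    now = time.time() if now is None else now
    problems = []
    for server in required:
        entry = tokens.get(server)
        if entry is None:
            problems.append(f"missing tokens for {server}")
            continue
        if not entry.get("access_token"):
            problems.append(f"no access token for {server}")
        expires = entry.get("expires_at_seconds")
        if expires is not None and expires <= now:
            problems.append(f"token for {server} has expired")
    return problems
```

In the notebook, `check_tokens(tokens)` is expected to return an empty list; a non-empty result suggests logging in to the JupyterHub again.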

### Fetch User Identity

When transferring files to the guest collection, we will put them in a directory named `<identity_id>-shared-files` to distinguish it from other users' directories. Let's fetch our identity ID for this purpose.

In [None]:
# Create an Auth client so we can look up identities
auth_authorizer = globus_sdk.AccessTokenAuthorizer(access_token=tokens['auth.globus.org']['access_token'])
ac = globus_sdk.AuthClient(authorizer=auth_authorizer)

# Get the user's primary identity
primary_identity = ac.oauth2_userinfo()
identity_id = primary_identity['sub']

print(f"Username: {primary_identity['preferred_username']} (ID: {identity_id})")
print(f"Notifications will be sent to: {primary_identity['email']}")

# B. Flow Deployment

## Define a flow

* Flows are composed of *Action* invocations.
* Each Action invocation reads from and contributes back to the *Flow State*, which Flow steps reference using the `InputPath` and `ResultPath` properties of an Action (or, as in the flow below, `Parameters` entries whose names end in `.$`).
* Actions specify the service endpoint that will be called using the `ActionUrl` property, and the Globus Auth scope that's required for the specified Action using the `ActionScope` property.
* The `ActionUrl` is an endpoint for an *Action Provider*; a number of Action Providers are predefined, and you can also define your own using the [Action Provider tools](https://action-provider-tools.readthedocs.io/en/latest/).
* Each Action Provider (optionally) defines its own set of properties/inputs. For example, the Globus Transfer Action Provider requires source and destination endpoints, as well as source and destination files/paths.
* Actions are linked via their `Next` property; the last action in a flow sets the `End` property to true.
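The `StartAt`/`Next`/`End` linkage described above can be checked locally before deployment. Here is a minimal sketch that follows `Next` from `StartAt` and returns the visited state names; it handles only linear chains like the flow in this notebook (no `Choice` states or other branching), and `state_order` is a hypothetical helper, not part of `globus_automate_client`.

```python
def state_order(definition):
    """Follow StartAt -> Next -> ... -> End and return state names in order.

    Handles linear flows only: each state needs either "Next" or "End": True.
    Raises ValueError for unknown states, cycles, or dead ends.
    """
    states = definition["States"]
    name = definition["StartAt"]
    order = []
    while True:
        if name not in states:
            raise ValueError(f"unknown state: {name!r}")
        if name in order:
            raise ValueError(f"cycle detected at state: {name!r}")
        order.append(name)
        state = states[name]
        if state.get("End") is True:
            return order
        if "Next" not in state:
            raise ValueError(f"state {name!r} has neither Next nor End")
        name = state["Next"]


# A stripped-down copy of the two-state flow used in this notebook.
example_flow = {
    "StartAt": "MoveFiles",
    "States": {
        "MoveFiles": {"Type": "Action", "Next": "SetPermission"},
        "SetPermission": {"Type": "Action", "End": True},
    },
}
print(state_order(example_flow))  # ['MoveFiles', 'SetPermission']
```

Once `flow_definition` has been defined, `state_order(flow_definition)` should report the same order.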

Our simple flow includes just two Actions, `MoveFiles` and `SetPermission`.

In [None]:
# Define flow
flow_definition = {
  "Comment": "Move files to guest collection and set access permissions",
  "StartAt": "MoveFiles",
  "States": {
    "MoveFiles": {
      "Comment": "Transfer from Globus Tutorial Endpoint 1 to a guest collection on Eagle",
      # https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html#globus-transfer-transfer-data
      "Type": "Action",
      "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
      "Parameters": {
        "source_endpoint_id.$": "$.input.source.id", 
        "destination_endpoint_id.$": "$.input.destination.id",
        "transfer_items": [
              {
                "source_path.$": "$.input.source.path",
                "destination_path.$": "$.input.destination.path",
                "recursive.$": "$.input.recursive"
              }
        ],
      },
      "ResultPath": "$.MoveFiles",
      "WaitTime": 60,
      "Next": "SetPermission"
    }, 
    "SetPermission": {
      "Comment": "Grant read permission on the data to the Tutorial users group",
      "Type": "Action",
      # https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html#globus-transfer-set-manage-permissions
      "ActionUrl": "https://actions.automate.globus.org/transfer/set_permission",
      "Parameters": {
        "endpoint_id.$": "$.input.destination.id",
        "path.$": "$.input.destination.path",
        "permissions": "r",  # read-only access
        "principal.$": "$.input.principal",  # group or identity ID (see principal_type)
        "principal_type.$": "$.input.principal_type",
        "operation": "CREATE",
      },
      "ResultPath": "$.SetPermission",
      "End": True
    }
  }
}
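Parameters whose names end in `.$` are not passed literally: their values are JSONPath expressions evaluated against the flow state. To illustrate what `MoveFiles` receives, here is a deliberately simplified sketch that supports only plain dotted paths such as `$.input.source.id` (no array indexing, filters, or other JSONPath features). `lookup_path` and `resolve_parameters` are hypothetical helpers; this is not how the Flow service itself is implemented.

```python
def lookup_path(state, path):
    """Resolve a plain dotted JSONPath such as '$.input.source.id' against a dict."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    value = state
    for key in path[2:].split("."):
        value = value[key]
    return value


def resolve_parameters(params, state):
    """Copy params, replacing each 'name.$' entry with 'name' set to the value
    its JSONPath selects from state. Recurses into nested dicts and lists."""
    if isinstance(params, dict):
        resolved = {}
        for key, value in params.items():
            if key.endswith(".$"):
                resolved[key[:-2]] = lookup_path(state, value)
            else:
                resolved[key] = resolve_parameters(value, state)
        return resolved
    if isinstance(params, list):
        return [resolve_parameters(item, state) for item in params]
    return params


# Toy flow state (made-up IDs) and the MoveFiles parameters from the definition above.
example_state = {"input": {
    "source": {"id": "SRC-ID", "path": "/share/godata"},
    "destination": {"id": "DST-ID", "path": "/automate-tutorial/out/"},
    "recursive": True,
}}
move_files_params = {
    "source_endpoint_id.$": "$.input.source.id",
    "destination_endpoint_id.$": "$.input.destination.id",
    "transfer_items": [{
        "source_path.$": "$.input.source.path",
        "destination_path.$": "$.input.destination.path",
        "recursive.$": "$.input.recursive",
    }],
}
print(resolve_parameters(move_files_params, example_state))
```

Literal parameters such as `"permissions": "r"` pass through unchanged, which is why the permission level is fixed by the definition rather than the input.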

## Define a schema

* All Flows require schemas to validate that user input is correct.
* Flow Input Schemas are written in JSON Schema.
* Input Schemas are deployed with the `flow_definition` and are checked whenever a user tries to run a flow.

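One JSON Schema detail is worth knowing when reading the schema below: `pattern` succeeds if the regular expression matches *anywhere* in the string, because it is not implicitly anchored. The UUID fields in this notebook's schema use an unanchored pattern, so the `minLength`/`maxLength` of 36 are what rule out extra characters (the fields also set `"format": "uuid"`, but validators are not required to enforce `format`). The sketch mimics this with Python's `re.search`; Python's regex dialect is close to, but not identical to, the ECMA-262 dialect JSON Schema specifies, and `matches_uuid_field` is a hypothetical helper.

```python
import re

# The UUID pattern used in this notebook's input schema (note: no ^ or $ anchors).
UUID_PATTERN = "[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"


def matches_pattern(value, pattern=UUID_PATTERN):
    # JSON Schema "pattern" behaves like re.search (match anywhere), not re.fullmatch.
    return re.search(pattern, value) is not None


def matches_uuid_field(value):
    # Mirrors the schema's uuid fields: the pattern plus minLength == maxLength == 36.
    return matches_pattern(value) and len(value) == 36


good = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"
padded = "x" + good + "x"
print(matches_pattern(padded))     # True: the pattern alone accepts extra characters
print(matches_uuid_field(padded))  # False: the length limits reject them
print(matches_uuid_field(good))    # True
```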
Define a schema for the inputs used by our two states above:

In [None]:
input_schema = {
    "additionalProperties": False,
    "required": [
        "input"
    ],
    "properties": {
        "input": {
            "type": "object",
            "required": [
                "source", "destination", "recursive", "principal", "principal_type", 
            ],
            "properties": {
                "source": {
                    "type": "object",
                    "format": "globus-collection",
                    "title": "Find source collection ID and path",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "title": "Source Collection ID",
                            "format": "uuid",
                            "pattern": "[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}",
                            "maxLength": 36,
                            "minLength": 36,
                            "description": "The UUID for the collection which serves as the source of the Move",
                            "default": source_endpoint,
                        },
                        "path": {
                            "type": "string",
                            "title": "Source Collection Path",
                            "description": "The path on the source collection for the data",
                            "default": "/share/godata",
                        }
                    },
                    "additionalProperties": False
                },
                "destination": {
                    "type": "object",
                    "format": "globus-collection",
                    "title": "Find destination collection ID and path",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "title": "Destination Collection ID",
                            "format": "uuid",
                            "pattern": "[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}",
                            "maxLength": 36,
                            "minLength": 36,
                            "description": "The UUID for the collection which serves as the destination for the Move",
                            "default": destination_endpoint,
                        },
                        "path": {
                            "type": "string",
                            "title": "Destination Collection Path",
                            "description": "The path on the destination collection where the data will be stored",
                            "default": f"/automate-tutorial/{identity_id}-shared-files/",
                        }
                    },
                    "description": "The path to transfer and share the source files (Make sure the path ends with a slash!)",
                    "additionalProperties": False
                },
                "recursive": {
                    "type": "boolean",
                    "title": "Recursive transfer",
                    "description": "Whether or not to transfer a directory recursively, must be true when transferring a directory.",
                    "default": True
                },
                "principal": {
                    "type": "string",
                    "title": "UUID of the principal that is granted permission",
                    "pattern": "^[a-zA-Z0-9-_, ]+$",
                    "maxLength": 128,
                    "description": "UUID of the principal that is granted permission",
                    "default": my_collaborators,
                },
                "principal_type": {
                    "type": "string",
                    "enum": ["identity", "group"],
                    "default": "group",
                    "description": "Whether the principal is an 'identity' (a single user) or a 'group'"
                },
            },
            "additionalProperties": False
        }
    }
}
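Most properties in the schema above carry a `default`. JSON Schema validation does not fill in defaults by itself; they are annotations that tools can use, for example to pre-fill a form such as the one you see when starting the flow from the web app (an assumption about the web app's behavior). To check at a glance that the defaults line up with the flow input, the hypothetical helper below collects them into an input document. It only walks nested `properties` (no `$ref`, `allOf`, and so on).

```python
def defaults_from_schema(schema):
    """Build a document from the 'default' values in a JSON Schema.

    Walks nested 'properties' only. An object property without its own
    default is built from its children's defaults (if any).
    """
    result = {}
    for name, prop in schema.get("properties", {}).items():
        if "default" in prop:
            result[name] = prop["default"]
        elif "properties" in prop:
            child = defaults_from_schema(prop)
            if child:
                result[name] = child
    return result


# A trimmed-down schema in the same shape as input_schema (made-up default values).
example_schema = {
    "properties": {
        "input": {
            "type": "object",
            "properties": {
                "source": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "string", "default": "SRC-ID"},
                        "path": {"type": "string", "default": "/share/godata"},
                    },
                },
                "recursive": {"type": "boolean", "default": True},
                "principal_type": {"type": "string", "enum": ["identity", "group"], "default": "group"},
            },
        }
    }
}
print(defaults_from_schema(example_schema))
```

Applied to this notebook's `input_schema`, `defaults_from_schema(input_schema)` should produce a document with the same contents as the `flow_input` defined in the Flow Execution section.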

## Deploy a flow

Before running a flow, it must be deployed on the Globus Flow service. In addition to the flow definition and input schema we created above, you must provide a unique title for your flow when you deploy it. If deployment succeeds, Globus returns an ID as a handle to the flow resource, along with a new Globus Auth scope for the flow (`globus_auth_scope`).

In [None]:
# Deploy the flow
flow_title = f"Tutorial-Flow-{str(uuid.uuid4())}"   # generate a unique title
flow = flows_client.deploy_flow(
  flow_definition, 
  title=flow_title,
  input_schema=input_schema,
)
flow_id = flow['id']
flow_scope = flow['globus_auth_scope']

# If you change the flow, you will need to update it; here we change the flow's visibility.
# By default, flows are visible only to their creator. Uncomment to run:
# flow = flows_client.update_flow(
#     flow_id,
#     flow_definition,
#     visible_to=[f"urn:globus:auth:identity:{identity_id}"],
# )

print(f"Successfully deployed flow (ID: {flow_id})")
print(f"Flow scope: {flow_scope}\n\n")
print(f"View the flow in the Webapp: https://app.globus.org/flows/{flow_id}")
print(f"Note: You can start your flow directly from the Webapp!")

# C. Flow Execution

## Define flow input(s)

If your flow includes parameterized inputs, you must provide values for them when running the flow. Like the flow definition, flow inputs are defined as a JSON document. You must provide a value for each input property your flow uses (in the flow definition above, parameters whose names end in `.$` take their values from paths such as `$.input.source.id`).

For the `MoveFiles` action we must specify the source and destination collection IDs and paths, and whether the transfer is recursive. For the `SetPermission` action we must specify the collection ID and path, the type of entity to which we're granting permission, and the entity's ID; the permission itself (read-only, `"r"`) is fixed in the flow definition.

In [None]:
# Define flow inputs
destination_path = f"/automate-tutorial/{identity_id}-shared-files/"
flow_input = {
    "input": {
        # Transfer input
        "source": {
            "id": source_endpoint,
            "path": "/share/godata",
        },
        "destination": {
            "id": destination_endpoint,
            "path": destination_path,
        },
        
        "recursive": True,
        # Grant access to the Tutorial Users group
        "principal": my_collaborators,
        "principal_type": "group",

        # We could also grant access to a specific user, using their Globus identity ID
        #"principal": identity_id,
        #"principal_type": "identity",
    }
}
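The Flow service validates the input against the schema when the flow is started, but a quick local check can catch a typo before you spend a run on it. This sketch checks only `required` and `"additionalProperties": False`, recursing into nested objects; it is not a full JSON Schema validator (a library such as `jsonschema` would be the usual choice), and `missing_or_unexpected` is a hypothetical name.

```python
def missing_or_unexpected(doc, schema, where="$"):
    """Report required keys that are missing and, when the schema sets
    additionalProperties to False, keys that the schema does not declare."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in doc:
            problems.append(f"{where}.{key} is required")
    if schema.get("additionalProperties") is False:
        for key in doc:
            if key not in props:
                problems.append(f"{where}.{key} is not allowed")
    # Recurse into nested objects that are present in the document.
    for key, sub in props.items():
        if isinstance(doc.get(key), dict) and sub.get("type") == "object":
            problems.extend(missing_or_unexpected(doc[key], sub, f"{where}.{key}"))
    return problems


toy_schema = {
    "additionalProperties": False,
    "required": ["input"],
    "properties": {
        "input": {
            "type": "object",
            "required": ["source", "recursive"],
            "additionalProperties": False,
            "properties": {"source": {"type": "object"}, "recursive": {"type": "boolean"}},
        }
    },
}
# A misspelled key shows up twice: once as missing, once as unexpected.
print(missing_or_unexpected({"input": {"source": {}, "recurisve": True}}, toy_schema))
```

For the notebook's own input, `missing_or_unexpected(flow_input, input_schema)` is expected to return an empty list.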

## Authorize the newly deployed flow

The new flow has been deployed, but it still needs to be authorized. When deploying the flow, the Globus Flow service generates a new scope specifically for this flow. We need to get an access token scoped to the newly deployed flow (see `flow_scope` above); we'll use this token to execute the flow. Note that you will be required to consent again.

In [None]:
# If the flow scope is already saved, we don't need a new one.
if flow_scope not in saved_flow_scopes:
    # Do a native app authentication flow and get tokens that include the newly deployed flow scope
    native_auth_client = globus_sdk.NativeAppAuthClient(client_id)
    native_auth_client.oauth2_start_flow(requested_scopes=flow_scope)
    print(f"Login Here:\n\n{native_auth_client.oauth2_get_authorize_url()}")
    
    # Authenticate and come back with your authorization code; paste it into the prompt below.
    auth_code = input('Authorization Code: ')
    token_response = native_auth_client.oauth2_exchange_code_for_tokens(auth_code)
    
    # Save the new token in a place where the flows client can retrieve it.
    saved_flow_scopes[flow_scope] = token_response.by_scopes[flow_scope]
    
    # These are the saved scopes for the flow
    print(json.dumps(saved_flow_scopes, indent=2))
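The cell above reuses any token already in `saved_flow_scopes`, even one that has since expired. A sketch of a stricter check, assuming each saved entry is the per-scope dict from `token_response.by_scopes`, which in `globus_sdk` includes `access_token` and `expires_at_seconds`; `has_usable_token` and the 60-second safety margin are illustrative choices.

```python
import time


def has_usable_token(saved, scope, now=None, margin=60):
    """True if saved[scope] holds an access token that won't expire within
    `margin` seconds. Assumes entries shaped like globus_sdk's by_scopes data."""
    entry = saved.get(scope)
    if not entry or not entry.get("access_token"):
        return False
    now = time.time() if now is None else now
    return entry.get("expires_at_seconds", 0) > now + margin


example_saved = {"scope-a": {"access_token": "t", "expires_at_seconds": 1_000}}
print(has_usable_token(example_saved, "scope-a", now=500))  # True
print(has_usable_token(example_saved, "scope-a", now=990))  # False: inside the margin
print(has_usable_token(example_saved, "scope-b", now=500))  # False: never authorized
```

To use it, the cell's first line could become `if not has_usable_token(saved_flow_scopes, flow_scope):`.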

## Run the flow

We're finally ready to run the flow. Note: if you run the flow more than once, later runs will fail at the `SetPermission` step, because an access rule for that path and group already exists on the collection. Run the code in the "Remove Access Rule" cell below to clean up the destination collection before running the flow again.

In [None]:
# Run the flow
run_label = f"Tutorial run for {primary_identity['preferred_username']}"
flow_action = flows_client.run_flow(
  flow_id=flow_id,
  flow_scope=flow_scope,
  flow_input=flow_input,
  label=run_label,
  tags=['tutorial', 'my-first-flow', 'globusworld2022']
)

# Get flow execution parameters
flow_action_id = flow_action['action_id']
flow_status = flow_action['status']
print(f"Flow can be monitored in the webapp below: \nhttps://app.globus.org/runs/{flow_action_id}")
print(f"Flow action started with ID: {flow_action_id} - Status: {flow_status}")

# Poll the Flow service to check on the status of the flow
while flow_status == 'ACTIVE':
    time.sleep(5)
    flow_action = flows_client.flow_action_status(flow_id, flow_scope, flow_action_id)
    flow_status = flow_action['status']
    print(f'Flow status: {flow_status}')
    
# Flow completed (hopefully successfully!)
print(json.dumps(flow_action.data, indent=2))
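The loop above polls every five seconds for as long as the run stays `ACTIVE`, with no upper bound. Below is a sketch of a bounded variant, written against a status callable so it can be tried without the service. The timeout, the doubling interval, and the name `wait_for_run` are illustrative choices, not Flow service requirements. Since a run can also leave `ACTIVE` for a status such as `INACTIVE` (for example, while it waits on something it needs), the loop stops on any non-`ACTIVE` status and leaves the decision to the caller.

```python
import time


def wait_for_run(get_status, timeout=600, interval=5, max_interval=60,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns something other than 'ACTIVE'.

    The wait between polls doubles after each check, capped at max_interval.
    Returns the final status, or raises TimeoutError if the run is still
    ACTIVE once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    status = get_status()
    while status == "ACTIVE":
        if clock() >= deadline:
            raise TimeoutError(f"run still ACTIVE after {timeout} seconds")
        sleep(interval)
        interval = min(interval * 2, max_interval)
        status = get_status()
    return status


# Offline demo: a fake status source and a "sleep" that only records the waits.
fake_statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
waits = []
print(wait_for_run(lambda: next(fake_statuses), sleep=waits.append))  # SUCCEEDED
print(waits)  # [5, 10]
```

In the notebook, the polling loop could become `flow_status = wait_for_run(lambda: flows_client.flow_action_status(flow_id, flow_scope, flow_action_id)['status'])`, followed by one more `flow_action_status` call to fetch the run's full output.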

### View the files on the guest collection
Our files were moved to the guest collection and read access was granted to the Tutorial Users group. Members of the group can now access the files via the web app, the CLI, or the Globus APIs. Here we generate a link that opens the Globus web app file manager to view the collection.

In [None]:
from urllib.parse import urlencode, urlunsplit
query_params = {
    "origin_id": destination_endpoint,
    "origin_path": destination_path
}
url = urlunsplit(("https", "app.globus.org", "file-manager", urlencode(query_params), ''))
print(f"View your files in the Globus web app here:\n{url}\n\n")


### Remove Access Rule

You can remove the access permission directly, using the Globus SDK. And we may as well clean up the directory while we're at it.

In [None]:
# Get the ID of the access rule from the flow action's output
access_rule_id = flow_action['details']['output']['SetPermission']['details']['access_id']

transfer_authorizer = globus_sdk.AccessTokenAuthorizer(tokens['transfer.api.globus.org']['access_token'])
tc = globus_sdk.TransferClient(authorizer=transfer_authorizer)

# Remove the access rule
response = tc.delete_endpoint_acl_rule(destination_endpoint, access_rule_id)
print(response)

# Delete the directory on the guest collection
# DeleteData() automatically gets a submission_id for once-and-only-once submission
label = "Automation tutorial cleanup"
ddata = globus_sdk.DeleteData(tc, destination_endpoint, label=label, recursive=True)

## Recursively delete the destination path contents (given recursive flag set above)
ddata.add_item(destination_path)
tc.endpoint_autoactivate(destination_endpoint)
submit_result = tc.submit_delete(ddata) 
print(f"DELETE Task ID: {submit_result['task_id']}")
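The cell above takes the access rule ID from the run's output. If that output is no longer available (say, after a kernel restart), one option is to list the collection's access rules, for instance with `tc.endpoint_acl_list(destination_endpoint)`, and pick out the rule for our path and principal. The matching logic is sketched below on plain dicts, assuming each rule carries the `id`, `path`, `principal`, and `principal_type` fields of a Transfer access document; `find_access_rule_id` is a hypothetical helper.

```python
def find_access_rule_id(rules, path, principal, principal_type="group"):
    """Return the id of the first access rule matching path and principal, or None.

    Paths are compared after adding a trailing slash if one is missing,
    so '/dir' and '/dir/' are treated as the same directory.
    """
    def as_dir(p):
        return p if p.endswith("/") else p + "/"

    for rule in rules:
        if (as_dir(rule.get("path", "")) == as_dir(path)
                and rule.get("principal") == principal
                and rule.get("principal_type") == principal_type):
            return rule.get("id")
    return None


# Made-up rules in the shape of Transfer access documents.
example_rules = [
    {"id": "rule-1", "path": "/other/", "principal": "g1", "principal_type": "group"},
    {"id": "rule-2", "path": "/automate-tutorial/u-shared-files/", "principal": "g1", "principal_type": "group"},
]
print(find_access_rule_id(example_rules, "/automate-tutorial/u-shared-files", "g1"))  # rule-2
```

The recovered ID could then be passed to `tc.delete_endpoint_acl_rule(destination_endpoint, rule_id)` exactly as in the cell above.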

### Review Permissions

Check that the permissions no longer exist on the collection, using the link below.

In [None]:
print(f"Sharing Permissions Endpoint: \nhttps://app.globus.org/file-manager/collections/{destination_endpoint}/sharing")