<img src="img/automation_using_flows_header.png">


# Tomocupy Reconstruction

This notebook shows how [Globus Flows](https://www.globus.org/globus-flows-service) can be used to perform tomography reconstructions using [Tomocupy](https://tomocupy.readthedocs.io/en/latest/).

Globus Flows is a reliable and secure platform for orchestrating and performing research data management and analysis tasks. A flow is often needed to manage data coming from instruments, e.g., image files can be moved from local storage attached to a microscope to a high-performance storage system where they may be accessed by all members of the research project.

In this notebook we show how the Flows web app can be used to launch tomocupy tasks at ALCF. We then walk through the process of creating a simple flow and describe potential extensions to make it applicable in real use cases.

More examples of creating and running flows can be found on our [demo instance](https://jupyter.demo.globus.org/hub/).


We have created an example flow to run Tomocupy on-demand at Polaris. This flow is protected by a Globus Group, requiring membership before you can run it. You can request to join the group here: https://app.globus.org/groups/7a86a971-062b-11ee-ac1b-51c1fdd25192/about

**The flow is available here: https://app.globus.org/flows/337bc825-a81e-477e-aa6d-1d0f75e4928d**

This example simply includes two steps:
1. Transfer input data with Globus Transfer
2. Run Tomocupy via Globus Compute

This can easily be extended to include further steps to return data, perform postprocessing, or publish and catalog results.


### Auth
The flow manages authentication by passing tokens between the services. When the flow is started, it first acquires tokens for each service used within the flow so that it can perform actions on the user's behalf. Provided the user has access to the Globus Transfer and Compute endpoints, the flow will be able to move data and perform analysis as the user.

While this example notebook employs user tokens to run, production deployments can be automated using an ALCF Service account and Globus Client credential. A service account provides access to ALCF for a specific resource (e.g., a beamline). Following this model, a Globus Compute endpoint can be configured on behalf of the service account and the flow can be granted access. This allows flows to be launched without a human in the loop, making it ideal for automated analysis and publication pipelines.

## Using the flow via GUI

The flow has been created with an input schema, written in JSON Schema, that defines the required and optional parameters. This allows the Flows web app to automatically generate an input page to start the flow via a GUI.

Here we walk through starting and monitoring the Tomocupy flow via the web app.

### Step 1: Provide input.
The flow requires source and destination information to transfer an input to the machine for analysis. The input page recognizes the input type as an endpoint and provides interactive search and browse capabilities to select input files.
<img src="img/input1.png">

### Step 2: Specify reconstruction parameters
Here we specify various inputs to pass to the Tomocupy step of the flow. These include the command, an enum of `recon` or `recon_steps`, reconstruction type, rotation, and nsino per chunk.

The input schema restricts the values of these fields to specific values and types, only allowing the flow to be started when appropriate values are specified.

Filename is used to reference the location of the file at ALCF. We note that this can be automatically determined when the ALCF project path is known ahead of time, e.g., when using a service account and allocation.


<img src="img/input2.png">
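As a rough illustration of determining the filename automatically, the sketch below joins a known project root with the transfer destination and the source file name. The function name `remote_filename`, the project root `/eagle/MyProject`, and the example paths are hypothetical, and the sketch assumes the destination collection path maps directly onto a directory under that root.

```python
import posixpath


def remote_filename(project_root, destination_path, source_path):
    """Join a known project root, the transfer destination, and the source file name."""
    # Keep only the final component of the source path (the file being transferred).
    name = posixpath.basename(source_path.rstrip("/"))
    # Strip slashes so the destination is treated as relative to the project root.
    return posixpath.join(project_root, destination_path.strip("/"), name)


# Hypothetical project root and paths, for illustration only.
print(remote_filename("/eagle/MyProject", "/tomo/input/", "/data/scan_001.h5"))
# /eagle/MyProject/tomo/input/scan_001.h5
```

With a helper like this, the `filename` field could be filled in before the flow is started rather than typed by hand.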

### Step 3: Start and monitor the flow
Once started, the flow can be monitored through the web app. The events tab shows each step of the flow and provides details regarding the input and output of each action.

Here you can see the input to the Tomocupy step of the flow, showing the input values that are passed into the function to execute.

<img src="img/running.png">

### Step 4: Review the result
The result of the flow can be retrieved from the final step. Here we see a raw stdout dump from the execution of Tomocupy. The result could be processed and better formatted for use as input to subsequent flow steps.

<img src="img/results.png">

## Creating the flow

Here we explain how the flow is defined. Running these steps will register a flow of your own that you can then run.

To run these steps you will need to install:

`pip install -U globus_sdk`

### Registering the flow.

We first create a Globus FlowsClient to securely interact with the Flows service. This will prompt you to log in and paste a token into the notebook.

In [None]:
import os
import time
import globus_sdk

from utils import get_flows_client, get_specific_flow_client

# Tutorial client ID
# We recommend replacing this with your own client for any production use-cases
# Create your own at developers.globus.org
CLIENT_ID = '418ad188-a040-48b3-ab78-d7a64c4507f9'
CLIENT_SECRET = 'blah'  # Placeholder value; replace with your own client secret
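Rather than hard-coding a secret in the notebook, one option is to read the credentials from the environment. The sketch below is only an illustration: the helper name `load_client_credentials` is hypothetical, and it assumes the `GLOBUS_COMPUTE_CLIENT_ID` and `GLOBUS_COMPUTE_CLIENT_SECRET` variables have been exported.

```python
import os


def load_client_credentials(env=os.environ):
    """Return (client_id, client_secret); the secret is None when unset or empty."""
    client_id = env.get("GLOBUS_COMPUTE_CLIENT_ID")
    if client_id is None:
        raise RuntimeError("GLOBUS_COMPUTE_CLIENT_ID is not set")
    # Treat an empty string the same as a missing secret.
    client_secret = env.get("GLOBUS_COMPUTE_CLIENT_SECRET") or None
    return client_id, client_secret


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"GLOBUS_COMPUTE_CLIENT_ID": "my-client-id", "GLOBUS_COMPUTE_CLIENT_SECRET": ""}
print(load_client_credentials(demo_env))  # ('my-client-id', None)
```

In the notebook this could replace the two assignments with `CLIENT_ID, CLIENT_SECRET = load_client_credentials()`.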

In [None]:
fc = get_flows_client(CLIENT_ID, CLIENT_SECRET)

Specify the flow definition. This JSON definition is derived from the Amazon States Language used by AWS Step Functions. States of the flow are chained together by specifying the `Next` field to construct a pipeline of operations. This flow consists of two steps:

1. TransferFiles
2. Tomocupy

The first step, TransferFiles, uses the Globus Transfer action provider. The step is given a 300-second `WaitTime`, and its parameters are drawn from the flow input. Static values can be used here instead to simplify user input.

The second step, Tomocupy, uses the Globus Compute action provider. Input is dynamically passed in as `kwargs`, which are then passed to the function to be executed. The step is given a 600-second `WaitTime` and is the conclusion of the flow.

In [None]:
flow_definition = {
    "Comment": "Transfer and run Tomocupy",
    "StartAt": "TransferFiles",
    "States": {
        "TransferFiles": {
            "Comment": "Transfer files",
            "Type": "Action",
            "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source.id",
                "destination_endpoint_id.$": "$.input.destination.id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source.path",
                        "destination_path.$": "$.input.destination.path",
                        "recursive.$": "$.input.recursive_tx"
                    }
                ]
            },
            "ResultPath": "$.TransferFiles",
            "WaitTime": 300,
            "Next": "Tomocupy"
        },
        "Tomocupy": {
            "Comment": "Tomocupy",
            "Type": "Action",
            "ActionUrl": "https://compute.actions.globus.org/fxap",
            "Parameters": {
                "endpoint.$": "$.input.compute_endpoint_id",
                "function.$": "$.input.compute_function_id",
                "kwargs.$": "$.input.compute_function_kwargs"
            },
            "ResultPath": "$.TomocupyOutput",
            "WaitTime": 600,
            "End": True
        }
    }
}
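To make the `Next` chaining concrete, and to show how a further step might be added, here is a minimal sketch. The helpers `state_order` and `append_state`, the `ReturnResults` state, and the `$.input.results.*` input fields are all hypothetical. The demo uses a trimmed two-state definition rather than the full one above.

```python
import copy


def state_order(definition):
    """Follow `Next` links from `StartAt` and return the state names in order."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        order.append(name)
        state = definition["States"][name]
        # Stop at a state marked End, or one without a Next link.
        name = None if state.get("End") else state.get("Next")
    return order


def append_state(definition, name, state):
    """Return a copy of the definition with `state` added after the final state."""
    new_def = copy.deepcopy(definition)
    last_state = new_def["States"][state_order(new_def)[-1]]
    last_state.pop("End", None)
    last_state["Next"] = name
    new_def["States"][name] = {**state, "End": True}
    return new_def


# Trimmed stand-in for the two-state flow definition.
demo = {
    "StartAt": "TransferFiles",
    "States": {
        "TransferFiles": {"Type": "Action", "Next": "Tomocupy"},
        "Tomocupy": {"Type": "Action", "End": True},
    },
}

# Hypothetical extra step that transfers results back to the source collection.
return_state = {
    "Type": "Action",
    "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
    "Parameters": {
        "source_endpoint_id.$": "$.input.destination.id",
        "destination_endpoint_id.$": "$.input.source.id",
        "transfer_items": [
            {
                "source_path.$": "$.input.results.source_path",  # hypothetical input field
                "destination_path.$": "$.input.results.destination_path",  # hypothetical input field
            }
        ],
    },
    "ResultPath": "$.ReturnResults",
    "WaitTime": 300,
}

extended = append_state(demo, "ReturnResults", return_state)
print(state_order(extended))  # ['TransferFiles', 'Tomocupy', 'ReturnResults']
```

Because `append_state` works on a deep copy, the original definition is left unchanged and both versions can be registered side by side.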

Register the flow. We leave the input schema blank and will later update it to support the web interface.

In [None]:
flow = fc.create_flow(definition=flow_definition, title="Tomocupy flow", input_schema={})
flow_id = flow['id']
print(flow)
flow_scope = flow['globus_auth_scope']
print(f'Newly created flow with id:\n{flow_id}\nand scope:\n{flow_scope}')

## Defining a Tomocupy function

To run tomocupy via the flow we need to register a function with Globus Compute. Here we define and register this function.

In [None]:
import globus_compute_sdk
from globus_compute_sdk.sdk.login_manager import AuthorizerLoginManager
from globus_compute_sdk.sdk.login_manager.manager import ComputeScopeBuilder
from globus_sdk.scopes import AuthScopes

In [None]:
if CLIENT_SECRET:
    # If using client credentials, you can either export GLOBUS_COMPUTE_CLIENT_ID and GLOBUS_COMPUTE_CLIENT_SECRET environment variables
    # or create a LoginManager using the creds.
    client = globus_sdk.ConfidentialAppAuthClient(
               client_id=CLIENT_ID, client_secret=CLIENT_SECRET
    )
    ComputeScopes = ComputeScopeBuilder()
    # Make the authorizers
    compute_authorizer = globus_sdk.ClientCredentialsAuthorizer(client, ComputeScopes.all)
    auth_authorizer = globus_sdk.ClientCredentialsAuthorizer(client, AuthScopes.openid)
    # Make a login manager from the authorizers
    compute_login_manager = AuthorizerLoginManager(
            authorizers={ComputeScopes.resource_server: compute_authorizer,
                         AuthScopes.resource_server: auth_authorizer})
    # Make a client with the login manager
    gc = globus_compute_sdk.Client(login_manager=compute_login_manager)
else:
    gc = globus_compute_sdk.Client()

Define a function to call tomocupy.

This is a simple wrapper function that uses subprocess to invoke a templated bash command to run tomocupy.

Further postprocessing can be applied within this function. Any JSON returned here will be usable within the flow in subsequent steps.

In [None]:
def tomocupy_wrapper(filename, command="recon", reconstruction_type="full", 
                     rotation_axis=782.5, nsino_per_chunk=4):
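    import shlex
    import subprocess

    # Build the tomocupy command line from the function arguments
    cmd = f"tomocupy {command} --cores 8 --file-name {filename} --reconstruction-type {reconstruction_type} --rotation-axis {rotation_axis} --nsino-per-chunk {nsino_per_chunk}"
    # shlex.split avoids the empty arguments that str.split(" ") produces for stray spaces
    res = subprocess.run(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # Return the exit code together with the decoded stdout and stderr
    return res.returncode, res.stdout.decode("utf-8"), res.stderr.decode("utf-8")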
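As one possible form of the postprocessing mentioned above, the sketch below builds the command as an argument list and packages the process result into a small JSON-serializable dictionary. The helper names `build_tomocupy_argv` and `summarize` are hypothetical; the flags are the same ones used in the wrapper. For Globus Compute, such helpers would likely need to be defined inside the registered function, in the same way the wrapper imports `subprocess` inside its body.

```python
import json
import shlex


def build_tomocupy_argv(filename, command="recon", reconstruction_type="full",
                        rotation_axis=782.5, nsino_per_chunk=4):
    """Build the argument list for the tomocupy call used by the wrapper."""
    # A list keeps a filename containing spaces as a single argument.
    return [
        "tomocupy", command,
        "--cores", "8",
        "--file-name", str(filename),
        "--reconstruction-type", str(reconstruction_type),
        "--rotation-axis", str(rotation_axis),
        "--nsino-per-chunk", str(nsino_per_chunk),
    ]


def summarize(returncode, stdout, stderr, tail_lines=5):
    """Package a finished process into a small JSON-serializable dict."""
    return {
        "ok": returncode == 0,
        "returncode": returncode,
        "stdout_tail": stdout.splitlines()[-tail_lines:],
        "stderr_tail": stderr.splitlines()[-tail_lines:],
    }


argv = build_tomocupy_argv("/path/to/my scan.h5")
print(shlex.join(argv))  # shell-quoted form, useful for logging
print(json.dumps(summarize(0, "line 1\nline 2\n", "")))
```

A dictionary with named fields is easier to reference from later flow states than a raw tuple of return code and output strings.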

Register the function with Compute. 

In [None]:
tomo_func = gc.register_function(tomocupy_wrapper)

In [None]:
tomo_func

### Testing the function

Test the function by running it via Globus Compute. 

**Note, this uses my personal compute endpoint, not a service account endpoint. Access to this endpoint is restricted and will not work for others.**

In [None]:
polaris_ep = '4b116d3c-1703-4f8f-9f6f-39921e5864df'
gce = globus_compute_sdk.Executor(endpoint_id=polaris_ep, client=gc)

In [None]:
fn = "/home/rchard/src/APS/tomocupy/tests/data/test_data.h5"

In [None]:
future = gce.submit_to_registered_function(args=[fn], function_id=tomo_func)

In [None]:
future.result()

## Running the flow

We can now specify input and start the flow.

In [None]:
transfer_endpoint_uuid = '6c54cade-bde5-45c1-bdea-f4bd71dba2cc'

In [None]:
flow_input = {
    "input": {
      "source": {
        "id": transfer_endpoint_uuid,
        "path": "/home/share/godata/"
      },
      "destination": {
        "id": transfer_endpoint_uuid,
        "path": "/~/"
      },
      "recursive_tx": True,
      "compute_endpoint_id": polaris_ep,
      "compute_function_id": tomo_func,
      "compute_function_kwargs": {
        "command": "recon",
        "reconstruction_type": "full",
        "rotation_axis": "782.5",
        "nsino_per_chunk": 4,
        "filename": "/home/rchard/src/APS/tomocupy/tests/data/test_data.h5"
      }
    }
}

In [None]:
run_client = get_specific_flow_client(flow_id, client_id=CLIENT_ID, client_secret=CLIENT_SECRET, collection_ids=[transfer_endpoint_uuid])

In [None]:
flow_action = run_client.run_flow(flow_input, label="Tomocupy run", tags=["demo", "tomocupy"])
flow_run_id = flow_action['action_id']

print(f'Flow action started with id: {flow_run_id}')

print(f"Monitor your flow here: https://app.globus.org/runs/{flow_run_id}")

flow_status = flow_action['status']
while flow_status == 'ACTIVE':
    time.sleep(10)
    flow_action = fc.get_run(flow_run_id)
    flow_status = flow_action['status']
    print(f'Flow status: {flow_status}')
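The loop above polls until the run leaves the `ACTIVE` state, with no upper bound on how long it waits. A variant with a timeout could look like the sketch below. The helper `wait_for_run` and its parameters are hypothetical, and the status source is passed in as a callable so the logic can be exercised without contacting the Flows service.

```python
import time


def wait_for_run(get_status, poll_interval=10.0, timeout=900.0, sleep=time.sleep):
    """Poll `get_status()` until it stops returning 'ACTIVE' or the timeout passes."""
    waited = 0.0
    status = get_status()
    while status == "ACTIVE":
        if waited >= timeout:
            raise TimeoutError(f"run still ACTIVE after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
        status = get_status()
    return status


# Demonstration with canned statuses and a no-op sleep.
statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
print(wait_for_run(lambda: next(statuses), sleep=lambda seconds: None))  # SUCCEEDED

# In the notebook, the callable might be:
#   lambda: fc.get_run(flow_run_id)['status']
```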

In [None]:
flow_action['details']['output']['TomocupyOutput']['details']['result']

## Attaching an input schema

We can use a JSON input schema both to generate the web interface and to provide additional guardrails when starting the flow. Here we define the schema and update the flow to include it.

Example schemas can be found here: https://github.com/globus/globus-flows-trigger-examples

In [None]:
schema = {
    "required": [
        "input"
    ],
    "properties": {
        "input": {
            "type": "object",
            "required": [
                "source",
                "destination",
                "recursive_tx",
                "compute_endpoint_id",
                "compute_function_id",
                "compute_function_kwargs"
            ],
            "properties": {
                "source": {
                    "type": "object",
                    "title": "Select source collection and path",
                    "description": "The source collection and path (path MUST end with a slash)",
                    "format": "globus-collection",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "format": "uuid"
                        },
                        "path": {
                            "type": "string"
                        }
                    },
                    "additionalProperties": False
                },
                "destination": {
                    "type": "object",
                    "title": "Select destination collection and path",
                    "description": "The destination collection and path (path MUST end with a slash); default collection is 'Globus Tutorials on ALCF Eagle'",
                    "format": "globus-collection",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "format": "uuid"
                        },
                        "path": {
                            "type": "string"
                        }
                    },
                    "additionalProperties": False
                },
                "recursive_tx": {
                    "type": "boolean",
                    "title": "Recursive transfer",
                    "description": "Whether or not to transfer recursively, must be true when transferring a directory.",
                    "default": True,
                    "additionalProperties": False
                },
                "compute_endpoint_id": {
                    "type": "string",
                    "format": "uuid",                        
                    "title": "Globus Compute Endpoint ID",
                    "default": polaris_ep,
                    "description": "The UUID of the Globus Compute endpoint where Tomocupy will run",
                    "additionalProperties": False
                },
                "compute_function_id": {
                    "type": "string",
                    "format": "uuid",                        
                    "title": "Globus Compute Function ID",
                    "default": tomo_func,
                    "description": "The UUID of the function to invoke; must be registered with the Globus Compute service",
                    "additionalProperties": False
                },
                "compute_function_kwargs": {
                    "type": "object",
                    "title": "Function Inputs",
                    "description": "Inputs to pass to the function",
                    "required": [
                        "filename",
                        "command",
                        "reconstruction_type",
                        "rotation_axis",
                        "nsino_per_chunk"
                    ],
                    "properties": {
                        "filename": {
                            "type": "string",
                        },
                        "command": {
                            "type" : "string",
                            "description": "Reconstruction command: recon, recon_steps",
                            "default": "recon",
                            "enum" : [
                                "recon",
                                "recon_steps"
                            ]
                        },
                        "reconstruction_type": {
                            "type": "string",
                            "description": "Reconstruction type: full, try",
                            "default": "full",
                            "enum": [
                                "full", 
                                "try"
                            ]
                        },
                        "rotation_axis": {
                            "type": "string",
                            "default": "782.5"
                        },
                        "nsino_per_chunk": {
                            "type": "integer",
                            "default": 4
                        },
                    },
                    "additionalProperties": False
                }
            },
            "additionalProperties": False
        },    
    },
    "additionalProperties": False
}

In [None]:
fc.update_flow(flow_id, definition=flow_definition, input_schema=schema)
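To illustrate the kind of check the schema enables, here is a deliberately small sketch that inspects only the `required`, `type`, `enum`, and `additionalProperties` keywords for a flat object. It is not a JSON Schema implementation, and the helper name `check_kwargs` is hypothetical. The demo uses a trimmed version of the `compute_function_kwargs` schema.

```python
def check_kwargs(kwargs, kwargs_schema):
    """Return a list of problems found using a few JSON Schema keywords."""
    type_map = {"string": str, "integer": int, "boolean": bool, "object": dict}
    problems = []
    for key in kwargs_schema.get("required", []):
        if key not in kwargs:
            problems.append(f"missing required field: {key}")
    for key, value in kwargs.items():
        spec = kwargs_schema.get("properties", {}).get(key)
        if spec is None:
            if kwargs_schema.get("additionalProperties") is False:
                problems.append(f"unexpected field: {key}")
            continue
        json_type = spec.get("type")
        expected = type_map.get(json_type)
        # bool is a subclass of int in Python, so reject it for "integer" explicitly.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (json_type == "integer" and isinstance(value, bool))
        )
        if wrong_type:
            problems.append(f"{key}: expected {json_type}, got {type(value).__name__}")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: {value!r} is not one of {spec['enum']}")
    return problems


# Trimmed stand-in for the compute_function_kwargs part of the schema.
kwargs_schema = {
    "type": "object",
    "required": ["filename", "command", "nsino_per_chunk"],
    "properties": {
        "filename": {"type": "string"},
        "command": {"type": "string", "enum": ["recon", "recon_steps"]},
        "nsino_per_chunk": {"type": "integer"},
    },
    "additionalProperties": False,
}

good = {"filename": "/path/to/data.h5", "command": "recon", "nsino_per_chunk": 4}
bad = {"filename": "/path/to/data.h5", "command": "reconstruct", "nsino_per_chunk": "4"}
print(check_kwargs(good, kwargs_schema))  # []
for problem in check_kwargs(bad, kwargs_schema):
    print(problem)
```

The Flows service performs its own validation against the registered schema; a local check like this only gives earlier feedback before a run is submitted.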

## Try it!

In [None]:
print(f'https://app.globus.org/flows/{flow_id}')

## Gladier toolkit

We have developed a toolkit to streamline and support the development of complex flows. The Gladier toolkit (https://gladier.readthedocs.io/) provides helper functions to automatically re-register compute functions, cache flows, manage inputs, and generate JSON flow definitions.

The upshot of the Gladier toolkit is that your flow definition would look something like:

```python
@generate_flow_definition
class TomocupyFlow(GladierBaseClient):
    gladier_tools = [
        'gladier_tools.globus.Transfer',
        'gladier_tools.posix.ShellCMDTool',
        'gladier_tools.publish.PublishV2',
    ]
```