<img src="img/automation_using_flows_header.png">

# Cross-Facility Tomography Reconstructions
<img src="img/lightsources.png">

This notebook shows how [Tomopy](https://tomopy.readthedocs.io/en/latest/) reconstructions can be performed at different facilities using [Globus Flows](https://www.globus.org/globus-flows-service). 

# Using the Flows GUI

The Globus Flows web app can generate a `start` page that makes it easy to invoke a flow. To create this page, attach an input schema to the flow, as shown at the end of this notebook.

<img src="img/input1.png">

### Monitoring the flow
Once the flow has been started you can monitor the run. The events tab shows each step of the flow and provides details regarding the input and output of each action.

Here you can see the input to the Tomopy step of the flow, showing the input values that are passed into the function to execute.

<img src="img/running.png">

# A Tomopy Flow

We have defined a Globus Flow to run Tomopy. The flow definition can be [viewed in the Flows IDE here.](https://globus.github.io/flows-ide?d=N4Igwg9gtlCmB2AXEAuEAVATgQ3gZwDNZMACXAExMwFd4T1oIAHATxABoQBlRbTRAILI0AMWwBjAJYAbSYhZgAFhEnjYHbr0Sw8qUGKmz5SlWr0YWTdWhOr1nW2t0oA2qABqfSdgBG06yAAJAB0kvBM1IjB4tAR2gD6eHL2mphhAOYAogCO1NjSziACADJgIhoAcrAAHsJFpSKQ8ASS6SAAvgC6nAAisATY1NJ1FZkASlxgTS1t7ZwlZdOt5uiWAQAK2Hi6nJs4cNqYzqBJ2sGBqEGh4ZHRsZGw8THNrXjBC+VzIGM6Q4ibiEUlxCYTidygcRSVVqlywuEIxAAkjdkF9RhMphAXm0UKBVlZLpttho9tgDsRjiBTrBzsDrmCYhCHk8sTM3ujJh1OD88H8AUC0CCUeDIZUanU4fgiJhkXEuRgcFKkSjzJAYAgJYqEaRQZESC1-Bp8QEBOJEJIsRpTeasQBVTDSS6KRCIJh4FAAeg9Eht+GCg0Q0GwZ3S0ggPmobwgmHSHsQWulcYTxBJfDJsEOlLwEGomDU8QQ5CYKiQ8Uk5Fpgvpt2zubUoXIGnIOnN8GDFvgBfgRZLiDLFYuVd1UUZkOC1OC8fh0q7PbCyE4U6VmDL2igzjcVJzeceTGDimCAF5LgADIVg2s74J7wEnkgAahIZ+rUWH8QNCHTJ6bLbC7ax8Q3gex5oM+w4ig847JOOvDpBkgH7nej5gcKb4fm2cDfpwmCwOIuZJAAbtYAwFLAXRfDyfL7nSkrarKkQaAA6tgcjoJIcCoAAzAADNxnDQhKjCsPKDBQMwbC4uA0BwEghLEAQ0ZQPQQksGQbbSCwSQ7BYBJoNaHZWmaHb2o6aDOq67peqOkE+h2byhuGkbBNGsYENU2BMKm+wZhS5iFsW86Vlc4HWWcE6hY8-m9hoBC0EZWJBeetwRe+cW+v2GgANYAO58OkWbYER74QNIzaYIlL4QWF0E5q6kQIYCGgxNI-jxZ2QEVSF9zVWczWtelQExTIn5wEep5JSO3U0uFJX9R2DWKEhT4TVV03QXgsHwUBS0oWCaHDRhsBYSAjJQBQnXCilsXwG18Q5XlbynRQGg4c8G00Ld8hWBdDJTalN3pfdMZvK9WLvdQn1rB0FG-MM-I0SpADykRypwzGsex1gAGy8fx4qwsmmCUcMuhfLR0rE4glJqjJmrTsQVCw1TRpQ3pbWGb6JlOi6bqet6bVvAGQYhmGEZRjGSb05gkvLl56aZuYl75lF879j9yVTVBZxLtqs4BbJnDNhtf7pSrpblurURKzS5YaDrM7JOuqCbtbC0VaJ4nI3VUTNrwMgg0zLjcZ0LgAIydD+xttgN+5jaBK04bywwLTtIQe6wXtgr7LEFMEid-EHIfB8dr34ZIRGoCReBkZ0MNJ-81FVuTxAiMN2no4gbEcSgPF8SAmTdqg8bUGR7TtEAA)

This flow starts by selecting the facility to perform the tasks. The decision is based on an input to the flow (either `ALCF` or `NERSC`). The appropriate state is then injected into the subsequent steps to direct where data are transferred and specify the Compute endpoint to use.

<img src="img/flow_definition.png" style="width: 300px;"/>
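The selection step can be pictured in plain Python. The sketch below only illustrates the logic and is not the flow definition itself: `select_site_config` is a hypothetical helper, and the endpoint values are placeholders rather than real IDs.

```python
def select_site_config(flow_input: dict) -> dict:
    """Return the config block for the requested site (hypothetical helper)."""
    site = flow_input["compute_site"]
    configs = flow_input["compute_configs"]
    if site not in configs:
        raise ValueError(f"Unknown compute site: {site!r}")
    return configs[site]


# Placeholder values, for illustration only
example_input = {
    "compute_site": "NERSC",
    "compute_configs": {
        "ALCF": {"compute_endpoint": "alcf-endpoint-id"},
        "NERSC": {"compute_endpoint": "nersc-endpoint-id"},
    },
}

selected = select_site_config(example_input)
print(selected["compute_endpoint"])
```

In the flow, the equivalent of `selected` is what later steps read to decide where to transfer data and which Compute endpoint to call.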

Here we create a Globus `FlowsClient` to securely interact with the Flows service. This will prompt you to log in and paste a token into the notebook.

In [None]:
import os
import json
import time
import globus_sdk

from globus_sdk.experimental.globus_app import UserApp
from globus_sdk.scopes import GCSCollectionScopeBuilder

# Tutorial client ID. We recommend replacing this with your own client ID for any production use.
# Create your own at developers.globus.org
CLIENT_ID = "7ca21f4a-11de-4d97-8f84-cb66f7459981"

In [None]:
my_app = UserApp("tomography-user-app", client_id=CLIENT_ID)

flows_client = globus_sdk.FlowsClient(app=my_app)

Specify the ID of the previously registered flow.

In [None]:
flow_id = '081f12ba-cda1-48ce-89b5-ae3dd68a5689'

## Creating a Tomopy Function

To run Tomopy in the flow we need to define a Tomopy function and register it with Globus Compute.

**Note: this uses my personal Compute endpoint, not a service account endpoint. Access to this endpoint is restricted.**

In [None]:
import globus_compute_sdk

compute_ep = '70f76338-f3a2-4992-9c44-f38af073bb50'  # NERSC; the ALCF endpoint is '0f44e1f7-baa5-4dfd-a2f6-73051004cfaa'
gce = globus_compute_sdk.Executor(endpoint_id=compute_ep)

Define a function to call Tomopy.

This is a simple wrapper function that uses `subprocess` to invoke a templated command that runs the Tomopy CLI.

In [None]:
def tomopy_wrapper(filename, command="recon", reconstruction_type="slice",
                   save_folder="_rec", collection_path="/eagle"):
    # Imports live inside the function so they are available on the remote endpoint
    import subprocess
    import glob
    import os

    # Build the tomopy CLI call as an argument list so paths containing spaces stay intact
    cmd = [
        "tomopy", command,
        "--file-name", filename,
        "--reconstruction-type", reconstruction_type,
        "--save-folder", save_folder,
    ]
    res = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # Get the most recent reconstruction file
    list_of_files = glob.glob(f'{save_folder}/slice_rec/*')
    recon_file = max(list_of_files, key=os.path.getctime)
    recon_filename = os.path.basename(recon_file)

    # Path as seen from the Globus collection: the output path without the collection prefix
    transfer_path = recon_file.replace(collection_path, "")

    return recon_filename, transfer_path, res.returncode, res.stdout.decode("utf-8"), res.stderr.decode("utf-8")
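The wrapper removes the collection prefix with `str.replace`, which would also remove any later occurrence of the same string in the path. A minimal sketch of a stricter, prefix-only variant is shown below. `to_collection_path` is a hypothetical helper and the file name `recon_00001.tiff` is made up for illustration; neither comes from Tomopy.

```python
def to_collection_path(local_path: str, collection_path: str) -> str:
    """Strip the collection mount prefix from a local path (hypothetical helper)."""
    if collection_path and local_path.startswith(collection_path):
        return local_path[len(collection_path):]
    # An empty prefix (or a path outside the collection) is returned unchanged
    return local_path


# Illustrative paths only; the file name is an assumption
alcf_path = to_collection_path(
    "/eagle/APSDataAnalysis/rchard/IRI/outputs/slice_rec/recon_00001.tiff", "/eagle"
)
nersc_path = to_collection_path(
    "/global/homes/r/rchard/IRI/outputs/slice_rec/recon_00001.tiff", ""
)
print(alcf_path)
print(nersc_path)
```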

Register the function with Compute. This returns a function ID that can be passed to the flow.

In [None]:
tomo_function_id = gce.client.register_function(tomopy_wrapper)

### Test the function

Test the function by running directly with Globus Compute. 

In [None]:
filename = "/global/homes/r/rchard/IRI/test_data_1.h5"

In [None]:
future = gce.submit(tomopy_wrapper, filename, save_folder="/global/homes/r/rchard/IRI/outputs")
future.result()[1]
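The wrapper returns a five-element tuple, so indexing with `[1]` picks out the transfer path. A small sketch of a more readable way to handle the result follows. `TomopyResult` and `check_tomopy_result` are hypothetical names, and the tuple passed in is a stand-in for what `future.result()` would return.

```python
from typing import NamedTuple


class TomopyResult(NamedTuple):
    """Names for the fields returned by tomopy_wrapper (hypothetical helper type)."""
    recon_filename: str
    transfer_path: str
    returncode: int
    stdout: str
    stderr: str


def check_tomopy_result(raw) -> TomopyResult:
    """Wrap the raw tuple and fail loudly if the CLI reported an error."""
    result = TomopyResult(*raw)
    if result.returncode != 0:
        raise RuntimeError(
            f"tomopy exited with code {result.returncode}: {result.stderr}"
        )
    return result


# Stand-in for future.result(); the values are made up for illustration
example = check_tomopy_result(
    ("recon_00001.tiff", "/outputs/slice_rec/recon_00001.tiff", 0, "", "")
)
print(example.transfer_path)
```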

# Define Facility information

Switching between facilities requires the following inputs:
- Globus Transfer endpoints for source and destination
- The Globus Compute endpoint deployed at the target facility
- Path information relating to the facility, such as project space and the mapping to the Globus Collection

Here we define configs to capture this information.

In [None]:
# Configs for Compute resources
compute_configs = {
    "ALCF": {
        "compute_endpoint": "0f44e1f7-baa5-4dfd-a2f6-73051004cfaa",
        "transfer_endpoint": "05d2c76a-e867-4f67-aa57-76edeb0beda0", # dtn#eagle
        "collection_path": "/eagle",
        "staging_path": "/APSDataAnalysis/rchard/IRI/",
        "output_path": "/eagle/APSDataAnalysis/rchard/IRI/outputs/",
    },
    "NERSC": {
        "compute_endpoint": "70f76338-f3a2-4992-9c44-f38af073bb50",
        "transfer_endpoint": "6bdc7956-fc0f-4ad2-989c-7aa5ee643a79",
        "collection_path": "",
        "staging_path": "/global/homes/r/rchard/IRI/",
        "output_path": "/global/homes/r/rchard/IRI/outputs/",
    },
}
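Because every site must provide the same keys, a quick check before starting a run can catch a typo early. The sketch below assumes the five keys used in `compute_configs` are all required; `missing_config_keys` is a hypothetical helper and the example sites are placeholders.

```python
# Keys assumed to be required for every site, taken from the configs in this notebook
REQUIRED_KEYS = {
    "compute_endpoint",
    "transfer_endpoint",
    "collection_path",
    "staging_path",
    "output_path",
}


def missing_config_keys(configs: dict) -> dict:
    """Map each site name to the sorted list of required keys it lacks."""
    problems = {}
    for site, cfg in configs.items():
        missing = sorted(REQUIRED_KEYS - set(cfg))
        if missing:
            problems[site] = missing
    return problems


# Placeholder configs: SITE_A is complete, SITE_B lacks two keys
example_configs = {
    "SITE_A": {key: "placeholder" for key in REQUIRED_KEYS},
    "SITE_B": {
        "compute_endpoint": "placeholder",
        "transfer_endpoint": "placeholder",
        "collection_path": "",
    },
}
problems = missing_config_keys(example_configs)
print(problems)
```

An empty dictionary means every site defines all the expected keys.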

## Running the flow

Define the flow input. This describes the file to act on, which site to use, site configs, and the reconstruction parameters. The flow will use the `compute_site` to inject the associated site configuration (e.g., endpoints) into the transfer and compute steps.

In [None]:
input_filename = "test_data_1.h5"
compute_site = "NERSC"

flow_input = {
    "input": {
        "input_filename": input_filename,
        "compute_site": compute_site,
        "compute_configs": compute_configs,
        "source": {
            "id": "a17d7fac-ce06-4ede-8318-ad8dc98edd69",
            "path": "/Tomography/inputs/"
        },
        "result_path": "/Tomography/outputs/",
        "compute_function_id": tomo_function_id,
        "compute_function_kwargs": {
            "command": "recon",
            "reconstruction_type": "slice",
        }
    }
}

Create a SpecificFlowClient to interact with the flow. This is used to start and monitor the run.

In [None]:
specific_flow_client = globus_sdk.SpecificFlowClient(
    flow_id=flow_id,
    app=my_app,
)

If Guest Collections are not available, add the `data_access` scope to the flow so that it can use Mapped Collections. Without this step, the flow pauses before performing the transfer so that you can review consents.

In [None]:
# Get the data_access scope for the destination mapped collection
dest_transfer_endpoint = flow_input['input']['compute_configs'][compute_site]['transfer_endpoint']
dest_access_scope = GCSCollectionScopeBuilder(dest_transfer_endpoint).make_mutable("data_access", optional=True)

transfer_scope = globus_sdk.scopes.TransferScopes.make_mutable("all")
transfer_scope.add_dependency(dest_access_scope)

# Add the data access scopes as dependencies to the flow
flow_scope = specific_flow_client.scopes.make_mutable("user")
flow_scope.add_dependency(transfer_scope)

my_app.add_scope_requirements({'flow': [flow_scope]})

Start the flow. Include a label and tags to better manage and filter runs.

In [None]:
run = specific_flow_client.run_flow(
  body=flow_input,
  label="Tomopy run",
  tags=['Tomopy', 'IRI', 'example', compute_site]
)

In [None]:
run_id = run['run_id']
run_status = run['status']
print("This flow can be monitored in the Web App:")
print(f"https://app.globus.org/runs/{run_id}")
print(f"Flow run started with ID: {run_id} - Status: {run_status}")

# Poll the Flows service to check on the status of the run
while run_status == 'ACTIVE':
    time.sleep(5)
    run = flows_client.get_run(run_id)
    run_status = run['status']
    print(f'Run status: {run_status}')
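The loop above polls until the run leaves the `ACTIVE` state, with no upper bound on how long it waits. A minimal sketch of the same idea with a timeout is shown below. `wait_for_run` is a hypothetical helper that accepts any callable returning a mapping with a `status` key, and `fake_get_run` stands in for the Flows service so the example runs without credentials.

```python
import time


def wait_for_run(get_run, run_id, poll_interval=5.0, timeout=600.0):
    """Poll get_run(run_id) until the status is no longer ACTIVE or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        run = get_run(run_id)
        if run["status"] != "ACTIVE":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} still ACTIVE after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for the Flows service: reports ACTIVE twice, then SUCCEEDED
_statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])


def fake_get_run(run_id):
    return {"run_id": run_id, "status": next(_statuses)}


final_run = wait_for_run(fake_get_run, "example-run-id", poll_interval=0.01)
print(final_run["status"])
```

With the real client, the call would look like `wait_for_run(flows_client.get_run, run_id)`.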

In [None]:
run['details']['output']['TomopyOutput']['details']['result'][0][0]

## Attaching an input schema

We can use a JSON input schema both to generate the web interface and to provide additional guardrails when starting the flow. Here we define the schema and update the flow to include it.

Example schemas can be found here: https://github.com/globus/globus-flows-trigger-examples

In [None]:
schema = {
    "required": [
        "input"
    ],
    "properties": {
        "input": {
            "type": "object",
            "required": [
                "source",
                "input_filename",
                "result_path",
                "compute_site",
                "compute_configs",
                "compute_function_id",
                "compute_function_kwargs"
            ],
            "properties": {
                "source": {
                    "type": "object",
                    "title": "Select source collection and path",
                    "description": "The source collection and path (path MUST end with a slash)",
                    "format": "globus-collection",
                    "required": [
                        "id",
                        "path"
                    ],
                    "properties": {
                        "id": {
                            "type": "string",
                            "format": "uuid"
                        },
                        "path": {
                            "type": "string"
                        }
                    },
                    "additionalProperties": False
                },
                "input_filename": {
                    "type" : "string",
                    "description": "The file to process",
                    "default": "test_data_1.h5",
                },
                "result_path": {
                    "type": "string",
                    "description": "Path to return results at source endpoint",
                    "default": "/Tomography/outputs/"
                },
                "compute_site": {
                    "type" : "string",
                    "description": "Facility: ALCF, NERSC",
                    "default": "ALCF",
                    "enum" : [
                        "ALCF",
                        "NERSC"
                    ]
                },
                "compute_configs": {
                    "type": "object",
                    "description": "Configs for various Compute sites",
                    "default": compute_configs
                },
                "compute_function_id": {
                    "type": "string",
                    "format": "uuid",
                    "description": "Tomopy function UUID",
                    "default": tomo_function_id
                },
                "compute_function_kwargs": {
                    "type": "object",
                    "title": "Function Inputs",
                    "description": "Inputs to pass to Tomopy",
                    "required": [
                        "command",
                        "reconstruction_type",
                    ],
                    "properties": {
                        "command": {
                            "type" : "string",
                            "description": "Reconstruction command: recon, recon_steps",
                            "default": "recon",
                            "enum" : [
                                "recon",
                                "recon_steps"
                            ]
                        },
                        "reconstruction_type": {
                            "type": "string",
                            "description": "Reconstruction type: slice, full",
                            "default": "slice",
                            "enum": [
                                "slice", 
                                "full"
                            ]
                        },
                    },
                    "additionalProperties": False
                }
            },
            "additionalProperties": False
        },    
    },
    "additionalProperties": False
}
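Before attaching a schema, it can help to check locally that a flow input supplies every `required` key. The sketch below is a deliberately small check and not a full JSON Schema validator: it looks only at `required` and nested `properties`. `missing_required` is a hypothetical helper, and the schema and input are cut-down stand-ins for the ones in this notebook.

```python
def missing_required(schema: dict, document: dict, prefix: str = "") -> list:
    """List dotted paths of required keys that are absent from the document."""
    missing = []
    for key in schema.get("required", []):
        if key not in document:
            missing.append(prefix + key)
    # Recurse into nested objects that are present in the document
    for key, subschema in schema.get("properties", {}).items():
        value = document.get(key)
        if isinstance(value, dict):
            missing.extend(missing_required(subschema, value, prefix + key + "."))
    return missing


# Cut-down stand-ins, for illustration only
example_schema = {
    "required": ["input"],
    "properties": {
        "input": {
            "type": "object",
            "required": ["input_filename", "compute_site"],
            "properties": {},
        }
    },
}
example_document = {"input": {"input_filename": "test_data_1.h5"}}

gaps = missing_required(example_schema, example_document)
print(gaps)
```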

In [None]:
flows_client.update_flow(flow_id, input_schema=schema)

## Try it!

In [None]:
print(f'https://app.globus.org/flows/{flow_id}')