# Gladier Tutorial
### Gladier: The Globus Architecture for Data-Intensive Experimental Research.

Gladier is a programmable data capture, storage, and analysis architecture for experimental facilities. The architecture leverages a data and computing substrate based on agents deployed across compute and storage systems at the Advanced Photon Source (APS), the Argonne Leadership Computing Facility (ALCF), and elsewhere, all managed by cloud-hosted Globus services. In particular, we use [Globus Connect](https://www.globus.org/globus-connect) and [funcX](https://funcx.org) agents to provide secure, reliable remote data transfer and computation, and we employ the [Globus Flows](https://www.globus.org/platform/services/flows) platform to orchestrate distributed data management tasks into reliable pipelines.

## Gladier Toolkit
The Gladier toolkit provides tools and capabilities to simplify and accelerate the development of such automated pipelines. The toolkit manages the dynamic creation of flows, automatically registers funcX functions, and assists in validating inputs.

Here we demonstrate how the Gladier toolkit lets anyone create a simple yet powerful client to automate data management tasks.

Although you do not need to install anything to use this notebook, the Gladier toolkit is available on PyPI and can be installed with:

    $ pip install gladier

Documentation is available [here](https://gladier.readthedocs.io/en/latest/index.html).


In [None]:
# General Imports
from pprint import pprint
import json
import os

# Gladier Imports
from gladier import GladierBaseClient, GladierBaseTool, generate_flow_definition

# Mark the session as remote so that Globus login on Binder prints a URL instead of opening a browser
os.environ['SSH_TTY'] = 'BINDER_REMOTE'

## Gladier Tools

Gladier Tools are the glue that holds together Globus Flows and funcX functions. Tools bundle everything the funcX function needs to run, so the Gladier Client can register the function, check the requirements, and run it inside the flow.

Here we create a Gladier tool called `FileSize`. To do this we first define a function called `file_size`. The function is then listed in the `funcx_functions` attribute of the `FileSize` tool. The `FileSize` tool extends the `GladierBaseTool` class, which provides the ability to re-register the function whenever it changes and to validate inputs when the tool is used within a flow.

In [None]:
def file_size(**data):
    """Return the size of a file"""
    # Import inside the function: funcX serializes the function and runs it remotely
    import os
    return os.path.getsize(data['pathname'])

@generate_flow_definition
class FileSize(GladierBaseTool):
    funcx_functions = [file_size]
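
Because `file_size` is an ordinary Python function, it can be checked locally before it is ever registered or run remotely. The sketch below is only a sanity check and is not part of the Gladier toolkit: it writes a temporary file of known size and calls the function with keyword arguments, on the assumption that the flow input reaches the function the same way (as the `**data` signature suggests). The `'unused-locally'` endpoint value is a placeholder.

```python
import os
import tempfile


def file_size(**data):
    """Return the size of a file"""
    import os
    return os.path.getsize(data['pathname'])


# Write a file of known size (128 bytes) and close it before measuring.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 128)
    tmp_path = tmp.name

try:
    # Extra keys are ignored by the function, just as unrelated flow inputs would be.
    local_size = file_size(pathname=tmp_path, funcx_endpoint_compute='unused-locally')
finally:
    os.remove(tmp_path)

print(local_size)
```

Testing the function this way catches simple mistakes, such as a misspelled input key, without waiting for a remote run to fail.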

## Gladier Clients

Gladier Clients manage a collection of Gladier Tools and a Globus Flow that links them together into a pipeline. Clients handle both registering the funcX functions for each tool and registering the flow that orchestrates each tool's execution. Checksums of the flow and the funcX functions are checked prior to each invocation to ensure they are always up to date. Further, the client checks that the inputs each tool needs are present before the flow is invoked.
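
The checksum idea can be illustrated with a few lines of standard-library Python. This is a minimal sketch of the general technique, not the toolkit's actual implementation: the helper names `source_checksum` and `needs_registration` are hypothetical, and hashing the source text with SHA-256 is an assumption made for the example.

```python
import hashlib


def source_checksum(source: str) -> str:
    """Return a SHA-256 hex digest of a function's source text."""
    return hashlib.sha256(source.encode('utf-8')).hexdigest()


def needs_registration(source: str, stored_checksum) -> bool:
    """True if nothing is stored yet or the source differs from what was registered."""
    return stored_checksum is None or source_checksum(source) != stored_checksum


original = "def file_size(**data):\n    import os\n    return os.path.getsize(data['pathname'])\n"
edited = original.replace("getsize", "getmtime")

# Digest remembered after a first (hypothetical) registration.
stored = source_checksum(original)

unchanged = needs_registration(original, stored)   # same source, nothing to do
changed = needs_registration(edited, stored)       # source was edited
first_time = needs_registration(original, None)    # never registered before
print(unchanged, changed, first_time)
```

Comparing digests rather than full source keeps the stored state small and makes the "has anything changed?" test a single string comparison.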

Once a tool has been created it can be imported and used by a client. The client can then dynamically create a flow using the list of tools.

Here we define an `ExampleClient` and specify the `FileSize` tool. 

In [None]:
@generate_flow_definition
class ExampleClient(GladierBaseClient):
    gladier_tools = [
        FileSize,
    ]

exampleClient = ExampleClient()

The `@generate_flow_definition` decorator prompts the client to dynamically create a flow that runs each of the client's tools in series. The resulting flow definition is then saved and can be inspected.

More information on flow generation can be found [here](https://gladier.readthedocs.io/en/latest/flow_generation.html).
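
To make "serially combine" concrete, the following sketch chains one state per tool so that each state points to the next and the last one ends the flow. It is an illustration only: `sketch_serial_flow` is a hypothetical helper, the second tool name `Checksum` is invented for the example, and the key names (`StartAt`, `States`, `Next`, `End`, `ResultPath`) are assumptions about a state-machine style definition. A definition generated by the toolkit contains more fields than shown here.

```python
import json


def sketch_serial_flow(tool_names):
    """Chain one hypothetical state per tool; the last state ends the flow."""
    if not tool_names:
        raise ValueError('at least one tool is required')
    states = {}
    for index, name in enumerate(tool_names):
        # Each state stores its result under its own name.
        state = {'Type': 'Action', 'ResultPath': f'$.{name}'}
        if index + 1 < len(tool_names):
            state['Next'] = tool_names[index + 1]
        else:
            state['End'] = True
        states[name] = state
    return {'StartAt': tool_names[0], 'States': states}


flow_sketch = sketch_serial_flow(['FileSize', 'Checksum'])
print(json.dumps(flow_sketch, indent=2))
```

The order of the tool list alone determines the order of execution in this sketch, which is why adding a tool to the client's list is enough to extend the pipeline.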

## Flow Input

In the generated flow definition, the input arguments for each tool are defined dynamically. In this case, the `FileSize` tool requires `funcx_endpoint_compute` and `file_size_funcx_id`, and the entire `input` document is passed as the function payload. These values can be overridden in the flow or defined in the tool definition.

It is important to note that the funcX function ID, `file_size_funcx_id`, is automatically populated by the client at runtime. This allows the client to check whether the function definition has changed and re-register the function with funcX if necessary. As such, you do not need to specify the function ID as input to the flow.

Here we define the input to include a pathname for the tool to act on and a public funcX endpoint to perform the execution.

In [None]:
# Remote endpoints
fx_endpoint = '4b116d3c-1703-4f8f-9f6f-39921e5864df'

# File to check size at the remote machine
filepath = '/etc/hostname'

# Flow input in the Globus Automate format: all values go under the "input" key
flow_input = {
    "input": {
        "pathname": filepath,
        "funcx_endpoint_compute": fx_endpoint,
    }
}

print(json.dumps(flow_input, indent=2))
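
Before starting a run, it can help to confirm that the input document contains every key the tool reads. The sketch below shows the idea with a hypothetical helper, `missing_inputs`; it is not the validation that the Gladier client itself performs, and the list of required keys is written by hand from the description above.

```python
def missing_inputs(flow_input, required_keys):
    """Return the required keys that are absent from flow_input['input'], in order."""
    provided = flow_input.get('input', {})
    return [key for key in required_keys if key not in provided]


sample_input = {
    'input': {
        'pathname': '/etc/hostname',
        'funcx_endpoint_compute': '4b116d3c-1703-4f8f-9f6f-39921e5864df',
    }
}

# The function ID is left out on purpose: the client fills it in at runtime.
required = ['pathname', 'funcx_endpoint_compute']

complete = missing_inputs(sample_input, required)
incomplete = missing_inputs({'input': {'pathname': '/etc/hostname'}}, required)
print(complete, incomplete)
```

Returning the list of missing keys, rather than a plain true/false answer, makes it easy to report exactly what needs to be added.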

## Running The Flow

Now that the input has been created, we can use the client to start and monitor the flow.

This will prompt you to authenticate and grant permission to the flow to perform a funcX invocation on your behalf.

In [None]:
run_label = 'GladierTest_v1'

example_flow = exampleClient.run_flow(flow_input=flow_input, label=run_label)
print("  File : " + flow_input["input"]["pathname"])
print("  UUID : " + example_flow['action_id'])

## Accessing Flow Logs

In [None]:
print('https://app.globus.org/flows/%s/runs/%s' % (example_flow['flow_id'], example_flow['action_id']))

## Monitoring Flow Progress

In [None]:
exampleClient.progress(example_flow['action_id'])

## Getting Flow Results 

We can access the results of each step of the run by passing the run's action ID and the name of the step, here `FileSize`.

In [None]:
exampleClient.get_details(example_flow['action_id'], 'FileSize')