# Globus Compute Tutorial

Globus Compute is a Function-as-a-Service (FaaS) platform that enables fire-and-forget execution of Python functions on one or more remote Globus Compute endpoints.

This tutorial is configured to use a tutorial endpoint hosted by the Globus Compute team.  You can set up your own endpoint on resources to which you have access by following the [Globus Compute documentation](https://globus-compute.readthedocs.io/en/latest/endpoints/endpoints.html).  Globus Compute endpoints can be deployed on many cloud platforms, clusters with batch schedulers (e.g., Slurm, PBS), Kubernetes, or a local PC.  After configuring an endpoint, you can use it in this tutorial by setting `tutorial_endpoint` below to its UUID.

## Globus Compute Python SDK

The Globus Compute Python SDK provides programming abstractions for interacting with the Globus Compute service. Before running this tutorial you should first install the Globus Compute SDK as follows:

```console
$ pip install globus-compute-sdk
```

The Globus Compute SDK exposes a `Client` and an `Executor` for interacting with the Globus Compute service. To use Globus Compute, you must first authenticate using one of hundreds of supported identity providers (e.g., your institution, ORCID, or Google).  As part of the authentication process, you must grant permission for Globus Compute to access your identity information (to retrieve your email address).

In [None]:
import os
import pickle
import base64
from globus_sdk import AccessTokenAuthorizer
from globus_sdk.scopes import AuthScopes

from globus_compute_sdk import Client, Executor
from globus_compute_sdk.serialize import CombinedCode
from globus_compute_sdk.sdk.login_manager import AuthorizerLoginManager
from globus_compute_sdk.sdk.login_manager.manager import ComputeScopeBuilder

# For use within this Jupyter Notebook, these SDK imports are more complex
# than usual.  In other contexts (e.g., the command line, a stand alone
# script), it is often enough to import only the Executor:
#
# from globus_compute_sdk import Executor

> **Note**: Here we use the public Globus Compute tutorial endpoint, which is shared with all Globus Compute users. You can also set `tutorial_endpoint` to the UUID of any endpoint on which you have permission to execute functions.

In [None]:
tutorial_endpoint = '4b116d3c-1703-4f8f-9f6f-39921e5864df' # Public tutorial endpoint

Create an Executor to submit tasks.

This will attempt to use tokens from the JupyterHub environment. If no tokens are available, an authentication flow will be used to verify your identity.

In [None]:
# Collect Globus Auth token data from the JupyterHub environment, if available
globus_data_raw = os.getenv('GLOBUS_DATA')
login_manager = None
if globus_data_raw:
    # Environment has tokens we need; avoid a login flow by setting up the
    # requisite authorizers and data structures
    tokens = pickle.loads(base64.b64decode(globus_data_raw))['tokens']
    
    ComputeScopes = ComputeScopeBuilder()
    
    # Create Authorizers from the Compute and Auth tokens
    compute_auth = AccessTokenAuthorizer(tokens[ComputeScopes.resource_server]['access_token'])
    openid_auth = AccessTokenAuthorizer(tokens['auth.globus.org']['access_token'])
    
    # Create a login manager from these authorizers; the Client is built from it below
    login_manager = AuthorizerLoginManager(
        authorizers={
            ComputeScopes.resource_server: compute_auth,
            AuthScopes.resource_server: openid_auth,
        }
    )
    login_manager.ensure_logged_in()

# Only necessary because reusing the provided Jupyter environment; other use-cases
# can simply use the Executor directly, with no need to customize the Client:
#     gce = Executor(endpoint_id=your_endpoint_id)
gc = Client(login_manager=login_manager, code_serialization_strategy=CombinedCode())
gce = Executor(endpoint_id=tutorial_endpoint, client=gc)
print('Executor : ', gce)
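
The `GLOBUS_DATA` decoding in the cell above can be tried offline. The following is a minimal sketch with a fabricated payload: the token strings and the `example.resource.server` key are placeholders rather than real Globus values, and the nested layout is inferred only from the lookups the cell performs. Note that `pickle.loads` can execute arbitrary code while decoding, so this approach is only appropriate for data supplied by a trusted environment.

```python
import base64
import pickle

# Fabricated stand-in for the JupyterHub-provided value (assumption: the real
# payload has at least this shape, as implied by the lookups in the tutorial).
fake_payload = {
    "tokens": {
        "auth.globus.org": {"access_token": "fake-auth-token"},
        "example.resource.server": {"access_token": "fake-compute-token"},
    }
}
encoded = base64.b64encode(pickle.dumps(fake_payload)).decode("ascii")

# Same decoding steps as the tutorial cell: base64 -> pickle -> dict lookup.
tokens = pickle.loads(base64.b64decode(encoded))["tokens"]
print(sorted(tokens))
print(tokens["auth.globus.org"]["access_token"])
```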

# Globus Compute 101

The following example demonstrates how you can execute a function with the `Executor` interface.


### Submitting a function

To execute a function, you simply call `submit` and pass a reference to the function. 
Optionally, you may also specify any input arguments to the function. 

In [None]:
# Define the function for remote execution
def hello_name(name: str):
    return f'Hello, {name}!'

future = gce.submit(hello_name, 'wonderful person')

print('Submit returned immediately (no result yet): ', future)

### Getting results

When you `submit()` a function for execution (called a `task`), the executor will return an instance of `ComputeFuture` in lieu of the result from the function.  Futures are a common way to reference asynchronous tasks, enabling you to interrogate the future to find the status, results, exceptions, etc. without blocking to wait for results.

`ComputeFuture`s returned from the `Executor` can be used in the following ways:
* `future.done()` is a non-blocking call that returns a boolean that indicates whether the task is finished.
* `future.result()` is a blocking call that returns the result from the task execution or raises an exception if task execution failed. 
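
The difference between the two calls can be seen without an endpoint. This is a minimal local sketch that uses the standard library's `ThreadPoolExecutor` as a stand-in, assuming `ComputeFuture` behaves like `concurrent.futures.Future` for `done()` and `result()`, as the descriptions above suggest. The `slow_hello` function and its delay are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_hello(name):
    time.sleep(0.2)  # Simulate the latency of a remote task
    return f"Hello, {name}!"

with ThreadPoolExecutor(max_workers=1) as pool:  # Local stand-in, not Globus Compute
    fut = pool.submit(slow_hello, "world")
    finished_early = fut.done()   # Non-blocking; usually False right after submit
    greeting = fut.result()       # Blocks until the task finishes
    finished_late = fut.done()    # True once a result is available

print(finished_early, greeting, finished_late)
```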

In [None]:
# Returns a boolean that indicates task completion
future.done()

> **Note**: The first task submitted to an endpoint may take a few seconds to execute while the endpoint provisions resources.

In [None]:
# Waits for the function to complete and returns the task result or exception on failure
future.result()

### Catching exceptions

When a task fails and you try to get its result, the `future` will raise an exception. In the following example, the `ZeroDivisionError` exception is raised when `future.result()` is called:

In [None]:
def division_by_zero():
    return 42 / 0 # This will raise a ZeroDivisionError

future = gce.submit(division_by_zero)

try:
    future.result()
except Exception as exc:
    print('Globus Compute returned an exception: ', exc)
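
Standard-library futures also offer `exception()`, which returns the failure instead of raising it. The sketch below runs locally with a thread pool as a stand-in; it assumes the same method is available on the futures returned by the `Executor`, and it does not show whether a remote failure surfaces as the original exception class or as an SDK-specific wrapper.

```python
from concurrent.futures import ThreadPoolExecutor

def division_by_zero():
    return 42 / 0  # Raises ZeroDivisionError inside the worker

with ThreadPoolExecutor(max_workers=1) as pool:  # Local stand-in
    fut = pool.submit(division_by_zero)
    error = fut.exception()  # Blocks until done; returns the exception, does not raise

if error is not None:
    print(f"Task failed with {type(error).__name__}: {error}")
```

Catching the broad `Exception` class, as in the cell above, works regardless of which class is raised.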

## Functions with arguments

Globus Compute supports registration and execution of functions with arbitrary arguments and return values. Globus Compute serializes any `*args` and `**kwargs` passed to a function, as well as any return values or exceptions.

> **Note**: Globus Compute uses standard Python serialization libraries (i.e., [`dill`](https://dill.readthedocs.io/en/latest/index.html)).  It also limits the size of input arguments and return values to 10 MB.  For larger input or output data, we suggest using [Globus Connect](https://www.globus.org/globus-connect-personal).
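
A rough size check before submitting can catch oversized arguments early. The sketch below is an approximation under stated assumptions: it uses the standard library's `pickle` as a proxy for the real serializer, it reads "10 MB" as 10 MiB, and `payload_size` and `fits_limit` are hypothetical helpers, not part of the SDK.

```python
import pickle

LIMIT_BYTES = 10 * 1024 * 1024  # Assumption: "10 MB" interpreted as 10 MiB

def payload_size(obj):
    """Return the pickled size of obj in bytes (a rough proxy for the real payload)."""
    return len(pickle.dumps(obj))

def fits_limit(obj, limit=LIMIT_BYTES):
    """Return True if the pickled object is no larger than the limit."""
    return payload_size(obj) <= limit

small = list(range(1000))
large = bytes(11 * 1024 * 1024)  # 11 MiB of zero bytes

print(fits_limit(small), fits_limit(large))
```

The size actually sent by the SDK may differ, so treat a result close to the limit as a warning rather than a guarantee.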

The following example shows a function that computes the sum of two input arguments:

In [None]:
def get_sum(a, b):
    return a + b

future = gce.submit(get_sum, 40, 2)
print(f'40 + 2 = {future.result()}')
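
Keyword arguments follow the same pattern as positional ones. This local sketch uses a thread pool as a stand-in and assumes `gce.submit` forwards `**kwargs` in the same way as the standard library's `Executor.submit(fn, *args, **kwargs)`. The `scale_sum` function is illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_sum(a, b, factor=1):
    return (a + b) * factor

with ThreadPoolExecutor(max_workers=1) as pool:  # Local stand-in
    positional = pool.submit(scale_sum, 40, 2).result()
    with_keyword = pool.submit(scale_sum, 40, 2, factor=10).result()

print(positional, with_keyword)
```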

## Functions with dependencies

To execute a function on a remote endpoint, Globus Compute requires that the function explicitly state all of its dependencies within the function body.  It also requires that any dependencies (e.g., libraries, modules) be available on the endpoint on which the function will execute.  For example, in the following function, we explicitly import `date` from the `datetime` module:

In [None]:
def get_date():
    from datetime import date
    return date.today()

future = gce.submit(get_date)

print('Date fetched from endpoint: ', future.result())
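
Why the import belongs inside the function can be illustrated locally. The following is a rough simulation, not the mechanism Globus Compute actually uses: the hypothetical helper `strip_globals` rebuilds a function with an empty global namespace, mimicking a function that arrives somewhere without the module-level names it was defined alongside.

```python
import builtins
import math
import types

def area_global_import(r):
    return math.pi * r ** 2      # Relies on the module-level `import math`

def area_local_import(r):
    import math                  # Dependency stated inside the function body
    return math.pi * r ** 2

def strip_globals(func):
    """Rebuild func with an empty global namespace (only builtins remain)."""
    return types.FunctionType(func.__code__, {"__builtins__": builtins}, func.__name__)

try:
    strip_globals(area_global_import)(1.0)
    outcome = "ok"
except NameError as exc:
    outcome = f"NameError: {exc}"

local_result = strip_globals(area_local_import)(1.0)
print(outcome)
print(local_result)
```

The first function fails because `math` is no longer reachable, while the second carries its own import and still works.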

The result of a function need not be a string or a number.  The endpoint ("remote-side") serializes whatever the function returns, and the SDK ("local-side") performs the opposite operation.  Strings, integers, and floats transfer from remote to local, and so do any other objects that can be serialized.  Note, for example, the type of the last result:

In [None]:
# Note the data type of the result; it's not a raw string or number:
print('Result data type:', type(future.result()))
print('The week number of the year:', future.result().isocalendar().week)
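
The round trip described above can be reproduced locally. This sketch uses the standard library's `pickle` as a stand-in for the serializer the SDK uses, and a fixed date chosen only for illustration.

```python
import pickle
from datetime import date

original = date(2024, 2, 29)
wire_bytes = pickle.dumps(original)      # "Remote side": object -> bytes
restored = pickle.loads(wire_bytes)      # "Local side": bytes -> object

print(type(wire_bytes).__name__, type(restored).__name__)
print(restored == original, restored.isocalendar()[1])
```

The restored value is a full `date` object again, so its methods are available just as they are on the result fetched from the endpoint.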

## Calling external applications

While Globus Compute is designed to execute Python functions, you can easily invoke external applications that are accessible on the remote endpoint.  For example, the following function runs anything given as an argument in a shell (e.g., `sh`).

> **Note**: This is only an example to show running a remote shell script.  We recommend using [ShellFunction](https://globus-compute.readthedocs.io/en/latest/sdk/executor_user_guide.html#shell-functions) for most applications.

In [None]:
from subprocess import CompletedProcess

def runsh(cmd):
    # This function is run on the *remote* host
    from subprocess import run, PIPE, STDOUT
    return run(cmd, shell=True, stdout=PIPE, stderr=STDOUT)

def run_and_print(cmd):
    # A formatting function; runs locally, but it sends the `cmd` upstream
    # to runsh via gce.submit()
    future = gce.submit(runsh, cmd)
    proc_result = future.result()

    assert isinstance(proc_result, CompletedProcess), '*complex, remote* objects serialized; recreated locally by SDK'

    print('\n' + '=' * 80)
    print(f'Return code: {proc_result.returncode}')
    print(f'\n$ {proc_result.args}\n{proc_result.stdout.decode(errors="backslashreplace")}')

In [None]:
# First, a simple command to "prove the point"
cmd = '''/bin/echo "The \\$USER environment variable is '$USER'"'''
run_and_print(cmd)

# Now for a longer, compound statement script
cmd = '''
/bin/echo -e "\\n---------- Environment variables ----------"
set
/bin/echo -e "\\n---------- Host kernel info ---------------"
uname -a
/bin/echo -e "\\n---------- Host memory --------------------"
free -m
'''.strip().replace('\n', '; ')
run_and_print(cmd)

# What time does the remote-side think it is?
cmd = '/bin/date'
run_and_print(cmd)
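
Because `runsh` passes `shell=True`, the whole command string is interpreted by a shell. When a command is assembled from input you do not control, passing a list of arguments avoids that interpretation. The sketch below shows this variant locally; `run_args` is a hypothetical helper, and for remote use its imports would need to move inside the function body as described earlier.

```python
import subprocess
import sys

def run_args(args):
    """Run a command given as a list of arguments, without a shell."""
    return subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

# Use the current Python interpreter as a portable example command.
proc = run_args([sys.executable, "-c", "print('hello from a subprocess')"])

print(proc.returncode)
print(proc.stdout.decode(errors="backslashreplace"))
```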

## Running functions many times

One of the strengths of Globus Compute is the ease with which you can run functions many times, perhaps with different input arguments.  The following example shows how you can use the Monte Carlo method to estimate pi.
Specifically, if a circle with radius $r$ is inscribed inside a square with side length $2r$, the area of the circle is $\pi r^2$ and the area of the square is $(2r)^2 = 4r^2$. Thus, if $N$ uniformly distributed points are dropped at random locations within the square, approximately $N\pi/4$ will fall inside the circle, and $\pi$ can be estimated as four times the fraction of points that land inside. The function below samples a single quadrant (a quarter circle inside the unit square), which has the same area ratio of $\pi/4$.
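
The accuracy of such an estimate can be gauged before running anything. Each point lands inside the circle with probability $\pi/4$, so the fraction of points inside has a binomial standard error, and multiplying by four gives the standard error of the estimate. The sketch below works this out for a few sample sizes; `pi_standard_error` is a hypothetical helper introduced for illustration.

```python
import math

def pi_standard_error(num_points):
    """Standard error of 4 * (fraction of points inside the circle)."""
    p = math.pi / 4                       # Probability that a point lands inside
    return 4 * math.sqrt(p * (1 - p) / num_points)

for n in (10**3, 10**5, 10**7):
    print(f"{n:>10,d} points -> standard error ~ {pi_standard_error(n):.5f}")
```

The error shrinks with the square root of the number of points, so each additional digit of accuracy costs roughly 100 times more samples.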


In [None]:
import statistics

# function that estimates pi by placing points in a box
def pi(num_points):
    from random import random
    inside = 0   
    
    for i in range(num_points):
        x, y = random(), random()  # Drop a point randomly within the box.
        if x**2 + y**2 < 1:        # Count points within the circle.
            inside += 1  
    return (inside*4 / num_points)


# execute the function N times 
N = 100
estimates = [
    gce.submit(pi, 10**5)
    for _ in range(N)
]

# wait for every task and collect the results
results = [future.result() for future in estimates]

# print the results
print('Estimates: [{:.3f}, {:.3f}, {:.3f}, ..., {:.3f}]'.format(*results[:3], results[-1]))
print(f'   Mean: {sum(results)/len(results):.5f}')
print(f'Std Dev: {statistics.stdev(results):.5f}')
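
The list comprehension above waits on the futures in submission order. When tasks finish at different times, the standard library's `concurrent.futures.as_completed` yields futures as they complete instead. The sketch below demonstrates this locally with a thread pool; using it with futures returned by the Globus Compute `Executor` is an assumption not verified here, and `slow_square` is illustrative only.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(x, delay):
    time.sleep(delay)  # Simulate tasks that take different amounts of time
    return x * x

rng = random.Random(0)
with ThreadPoolExecutor(max_workers=4) as pool:  # Local stand-in
    futures = [pool.submit(slow_square, x, rng.uniform(0.01, 0.05)) for x in range(8)]
    completed = [fut.result() for fut in as_completed(futures)]  # Completion order

print(sorted(completed))
```

Results arrive in completion order, so sort them or keep a mapping from future to input if the original order matters.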

# Endpoint operations

You can retrieve information about an endpoint, including its status and details of how it is configured.

In [None]:
gc.get_endpoint_status(tutorial_endpoint)