# Server-side Compute with Globus Compute + Flows
<img src="images/globus-logo.png" width=250 alt="Globus logo" style="display:inline-block">
<img src="images/esgf.png" width=250 alt="ESGF logo" style="float:left">

## The Use Case: Custom Computations
What if we have a computation other than the typical averaging/subsetting/regridding workflows?

An example: The El Niño Southern Oscillation (ENSO) Index:
![ENSO Index](https://www.ncdc.noaa.gov/monitoring-content/teleconnections/nino-regions.gif)

## The Solution: Globus Compute

Thankfully, there is an existing solution for packaging custom computations behind a common API, so that pre-defined functions can run close to the datasets: Globus Compute. According to its documentation (https://www.globus.org/compute), it takes care of the tasks we need:

✅ …figuring out credentials and different authentication mechanisms

✅ …configuring and managing batch jobs and schedulers

✅ …interacting with resource managers, waiting in queues and scaling nodes

✅ …configuring the execution environment for different compute systems

✅ **…retrieving and sharing computation results**

## So I have a function I would like to share - where do I start?

<img src="images/esgf-compute-diagram.png" 
     width="400" />

### Step 1. Write, register, and test your function
As someone with access to the ESGF data holdings in a data center, you would:
1. Write a function that locally accesses the data using `intake-esgf`
2. Register the function with `globus-compute`
3. Test the function from your local machine, using the unique ID returned when you registered it.
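
The register-and-test steps above might look roughly like the sketch below. It assumes `globus-compute-sdk` is installed and that you can authenticate with Globus. `describe_request`, `register_and_test`, and the endpoint placeholder are hypothetical names used only for illustration; a real function would call `intake-esgf` as described in step 1.

```python
def describe_request(model):
    """Hypothetical stand-in for a real analysis function."""
    return f"ENSO index requested for {model}"


def register_and_test(func, endpoint_id, *args):
    """Register func, run it once on endpoint_id, and return (function_id, result)."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from globus_compute_sdk import Client, Executor

    # Registration returns the unique ID (a UUID string) of the function.
    function_id = Client().register_function(func)

    # Submit by ID, which is how anyone else would call the function later.
    with Executor(endpoint_id=endpoint_id) as executor:
        future = executor.submit_to_registered_function(function_id, args=args)
        return function_id, future.result()


# A plain local call needs no Globus services and catches most mistakes early.
print(describe_request("NCAR"))

# With a real endpoint UUID (the placeholder below is an assumption):
# function_id, result = register_and_test(describe_request, "<your-endpoint-uuid>", "NCAR")
```

Keeping the function ID returned by registration is what makes the later sharing step possible, since collaborators refer to the function by that ID rather than by its source code.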

### Step 2. Share your function with the community.
Now that you have a registered function, you can share it with a user group by:
1. Creating a shared user group
2. Adding that group as collaborators on your function using the web interface at globus.org
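
The steps above use the web interface. As a possible alternative, `Client.register_function` in the Globus Compute SDK accepts a `group` argument (a Globus group UUID) at registration time. The sketch below assumes that route; `register_for_group` is a hypothetical helper, and the group UUID would come from the group created in step 1.

```python
import uuid


def register_for_group(func, group_id, description=None):
    """Register func and share it with one Globus group; returns the function UUID."""
    # Fail early on a malformed group UUID, before contacting any service.
    uuid.UUID(group_id)

    # Imported lazily so the sketch can be loaded without the SDK installed.
    from globus_compute_sdk import Client

    return Client().register_function(
        func, group=group_id, description=description
    )


# Example call (placeholder UUID, shown as a comment because it needs Globus auth):
# function_id = register_for_group(my_function, "<your-group-uuid>", "ENSO index")
```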

## An Example of Calculating ENSO with Globus Compute

### Imports and Pre-Requirements
These imports and associated code would be run **within the data center, which has access to petabytes of earth system model output**.

In [1]:
import hvplot.xarray
import holoviews as hv
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from intake_esgf import ESGFCatalog
import xarray as xr
import cf_xarray
import warnings
import os
from globus_compute_sdk import Executor, Client
warnings.filterwarnings("ignore")

hv.extension("bokeh")

### Writing, Registering, and Testing our Function
As mentioned in the introduction, we are utilizing functions from the previous ENSO notebooks. To run these with Globus Compute, we need to comply with the following requirements:
- All libraries/packages used in the function need to be installed on the Globus Compute endpoint
- All functions/libraries/packages need to be imported and defined within the function that will be executed
- The output from the function needs to be serializable (e.g. `xarray.Dataset`, `numpy.ndarray`)
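
The requirements above can be illustrated with a small stand-in function. This is a sketch rather than Globus Compute's own validation: `box_mean` is hypothetical, it uses only the standard library (so the "installed on the endpoint" requirement is trivially met), and the `pickle` round-trip is only a rough local proxy for the serialization the SDK performs.

```python
import pickle


def box_mean(values):
    """Hypothetical remote function whose imports all live inside its body."""
    # Import inside the function: only the function itself is shipped to the
    # endpoint, so module-level imports on the submitting side are not available.
    import statistics

    # Return plain, serializable data (here a dict of numbers).
    return {"mean": statistics.fmean(values), "n": len(values)}


result = box_mean([26.5, 27.0, 27.5])

# Rough local check that the result survives a serialization round-trip.
round_tripped = pickle.loads(pickle.dumps(result))
print(round_tripped)
```

Running a check like this before registering a function is cheap, and it catches return values such as open file handles or plot objects that cannot be sent back to the caller.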

Using these constraints, we set up the following function, whose key parameter (`model`) selects which modeling center to compare. Two examples here are the National Center for Atmospheric Research (NCAR) and the Model for Interdisciplinary Research on Climate (MIROC).

In [None]:
def run_plot_enso(model, return_path=False):
    import numpy as np
    import matplotlib.pyplot as plt
    from intake_esgf import ESGFCatalog
    import xarray as xr
    import cf_xarray
    import warnings
    warnings.filterwarnings("ignore")

    def search_esgf(institution_id, grid='gn'):

        # Search and load the ocean surface temperature (tos)
        cat = ESGFCatalog()
        cat.search(
            activity_id="CMIP",
            experiment_id="historical",
            institution_id=institution_id,
            variable_id=["tos"],
            member_id='r11i1p1f1',
            table_id="Omon",
        )
        try:
            tos_ds = cat.to_datatree()[grid].to_dataset()
        except ValueError:
            tos_ds = cat.to_dataset_dict()[""]

        # Search and load the ocean grid cell area
        cat = ESGFCatalog()
        cat.search(
            activity_id="CMIP",
            experiment_id="historical",
            institution_id=institution_id,
            variable_id=["areacello"],
            member_id='r11i1p1f1',
        )
        try:
            area_ds = cat.to_datatree()[grid].to_dataset()
        except ValueError:
            area_ds = cat.to_dataset_dict()[""]
        return xr.merge([tos_ds, area_ds])

    def calculate_enso(ds):

        # Subset the El Nino 3.4 index region
        dso = ds.where(
            (ds.cf["latitude"] < 5)
            & (ds.cf["latitude"] > -5)
            & (ds.cf["longitude"] > 190)
            & (ds.cf["longitude"] < 240),
            drop=True,
        )

        # Group the sea surface temperature by calendar month
        gb = dso.tos.groupby('time.month')

        # Subtract the monthly averages, returning the anomalies
        tos_nino34_anom = gb - gb.mean(dim='time')

        # Determine the non-time dimensions and average using these
        non_time_dims = set(tos_nino34_anom.dims)
        non_time_dims.remove(ds.tos.cf["T"].name)
        weighted_average = tos_nino34_anom.weighted(ds["areacello"]).mean(dim=list(non_time_dims))

        # Calculate the rolling average
        rolling_average = weighted_average.rolling(time=5, center=True).mean()
        std_dev = weighted_average.std()
        return rolling_average / std_dev

    def add_enso_thresholds(da, threshold=0.4):

        # Convert the xr.DataArray into an xr.Dataset
        ds = da.to_dataset()

        # Clean up the time axis and apply the thresholds
        try:
            ds["time"] = ds.indexes["time"].to_datetimeindex()
        except Exception:
            # Leave the time axis unchanged if it cannot be converted
            pass
        ds["tos_gt_04"] = ("time", ds.tos.where(ds.tos >= threshold, threshold).data)
        ds["tos_lt_04"] = ("time", ds.tos.where(ds.tos <= -threshold, -threshold).data)

        # Add fields for the thresholds
        ds["el_nino_threshold"] = ("time", np.zeros_like(ds.tos) + threshold)
        ds["la_nina_threshold"] = ("time", np.zeros_like(ds.tos) - threshold)

        return ds
    
    ds = search_esgf(model)
    enso_index = add_enso_thresholds(calculate_enso(ds).compute())
    enso_index.attrs = ds.attrs
    enso_index.attrs["model"] = model

    return enso_index
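
The arithmetic inside `calculate_enso` can be followed on a toy series without xarray. The sketch below is an illustration in plain Python, not the notebook's implementation: it skips the spatial subsetting and area weighting, and `toy_enso_index` and the input numbers are made up. It uses the population standard deviation and leaves incomplete windows empty, which is an assumption meant to mirror the xarray defaults used above.

```python
import statistics


def toy_enso_index(monthly_sst, window=5):
    """Monthly anomaly, then centered rolling mean, then divide by the std dev."""
    # Monthly climatology: the mean of all Januaries, all Februaries, and so on.
    climatology = [statistics.fmean(monthly_sst[m::12]) for m in range(12)]
    anomalies = [sst - climatology[i % 12] for i, sst in enumerate(monthly_sst)]

    # Centered rolling mean; positions without a full window are left as None.
    half = window // 2
    rolling = [
        statistics.fmean(anomalies[i - half : i + half + 1])
        if half <= i < len(anomalies) - half
        else None
        for i in range(len(anomalies))
    ]

    # Normalize by the standard deviation of the unsmoothed anomalies.
    std_dev = statistics.pstdev(anomalies)
    return [None if r is None else r / std_dev for r in rolling]


# Two made-up years: a cool year followed by a warm year.
index = toy_enso_index([27.0] * 12 + [28.0] * 12)
print(index[2], index[11], index[21])
```

In this toy series the index is negative through the cool year, passes through zero around the transition, and is positive in the warm year, which is the behavior the thresholds in `add_enso_thresholds` are designed to pick out.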

## Key Points

- Globus Compute is a good fit when a user needs custom computation next to the data
- Minimizes data transfer by operating on the data where it is stored
- `intake-esgf` detects the file system and accesses the data locally