# Configuring a Harmony Service

**Note:** If you are developing a new service, it is highly recommended that you start by reading the instructions at `docs/guides/adapting-new-services.md` and then running the `bin/generate-new-service` script in the [NASA Harmony repository](https://github.com/nasa/harmony). This will take care of much of the boilerplate configuration/code covered in this notebook.

This notebook will show the steps required to configure a [Harmony](https://harmony.earthdata.nasa.gov/) service, covering the following points:

* Initial configuration of a new service in Harmony.
* Associating a collection with an existing service.
* Enabling service discovery via Earthdata Search Client ([EDSC](https://search.earthdata.nasa.gov/search)).
* Associating variables with an existing collection.


The following requirements are assumed to be already fulfilled:

* A Docker image, containing a service that is wrapped in a `HarmonyAdapter` instance, exists in a place that can be accessed by Harmony.
* A collection, containing granules, has been ingested and has associated UMM-C and UMM-G records.
* Write access to the CMR provider containing the collection to be associated with the new Harmony service.
* Access to the [NASA harmony repository](https://github.com/nasa/harmony), including the ability to push branches to the remote repository, and to open pull requests (PRs).

The cell below will import the packages required for this notebook - it will need to be run ahead of most of the cells below that make requests against the CMR and the UMM-Var Generator (UVG) APIs.

In [None]:
# Install prerequisite packages
import sys

!{sys.executable} -m pip install requests

In [None]:
import json
import xml.etree.ElementTree as ET
import requests

### Collection and granule record requirements:

Harmony uses records in the Common Metadata Repository ([CMR](https://cmr.earthdata.nasa.gov/search)) to determine which services are to be used for which collection. There are several Unified Metadata Model (UMM) record types that are used:

* [UMM-C](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/collection): Collection records, which contain information describing the collection itself, such as native file information and the instrumentation or observing campaign used to collect the data.
* [UMM-G](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/granule): Granule records, which contain information specific to an individual file within a collection, for example its spatial or temporal extent.
* [UMM-Var](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/variable): Variable records, detailing individual variables that are common to granules within the same collection. For example, sea surface temperature, or longitude.
* [UMM-S](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/service): Service records, which provide information on a back-end service that can be called to retrieve or transform hosted data. These records also help EDSC configure the options presented to its users so that they can make valid requests to these services.

Before proceeding further, you should have granules ingested by a [Cumulus](https://github.com/nasa/cumulus) instance. These granules should each have a UMM-G record and belong to a collection with a UMM-C record, hosted in a CMR provider. Harmony does not require cloud-hosted data ingested via Cumulus, but using such data is strongly recommended to reduce egress costs.

First make a note of the collection concept ID for the cloud-hosted collection you want associated with a Harmony service. It has the format "C1234567890-PROVIDER", where "PROVIDER" corresponds to your CMR provider.

In [None]:
base_cmr_url = 'https://cmr.uat.earthdata.nasa.gov'  # Update this value to the correct environment
base_uvg_url = 'https://uvg.uat.earthdata.nasa.gov'  # Update this value to the correct environment (to use UVG)
collection_concept_id = 'C1234567890-PROVIDER'  # Update this value to that of your collection
provider = 'PROVIDER'  # Update this value to your provider
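Because a mistyped concept ID usually only surfaces later as a failed request, it can help to sanity-check the values above before continuing. The sketch below assumes the format described earlier ("C", digits, a hyphen, then the provider name) and additionally assumes provider names consist of upper-case letters, digits and underscores; relax the pattern if your provider differs.

```python
import re

# Assumed pattern: 'C' + digits + '-' + provider (upper-case letters, digits, underscores).
COLLECTION_ID_PATTERN = re.compile(r'^C\d+-(?P<provider>[A-Z0-9_]+)$')


def check_collection_concept_id(concept_id, expected_provider):
    """Return a list of problems with a collection concept ID (empty if none found)."""
    match = COLLECTION_ID_PATTERN.match(concept_id)
    if match is None:
        return [f'{concept_id!r} does not look like a collection concept ID']
    if match.group('provider') != expected_provider:
        return [f'{concept_id!r} belongs to provider {match.group("provider")!r}, '
                f'not {expected_provider!r}']
    return []
```

For example, `check_collection_concept_id(collection_concept_id, provider)` returns an empty list for the placeholder values in the cell above.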

Next, ensure that the UMM-G records in your collection contain the required entry in the `RelatedUrls` field. This might look as follows:

```json
{
    ...,
    "RelatedUrls": [
        {
            "URL": "https://www.cloud-provider.com/path/to/granule/file.nc4",
            "Type": "GET DATA"
        },
        {
            "URL": "https://opendap.earthdata.nasa.gov/collections/C1234567890-PROVIDER/granules/granuleUR",
            "Type": "USE SERVICE API",
            "Subtype": "OPENDAP DATA"
        }
    ],
    ...
}
```

Alternatively, the Atom JSON format of the same granule record would look like:

```json
{
    ...,
    "links": [
        {
            "rel": "http://esipfed.org/ns/fedsearch/1.1/data#",
            "title": "Files may be downloaded directly to your workstation from this link",
            "hreflang": "en-US",
            "href": "https://www.cloud-provider.com/path/to/granule/file.nc4"
        },
        {
            "rel": "http://esipfed.org/ns/fedsearch/1.1/service#",
            "title": "OPeNDAP request URL (GET DATA : OPENDAP DATA)",
            "hreflang": "en-US",
            "href": "https://opendap.earthdata.nasa.gov/collections/C1234567890-PROVIDER/granules/granuleUR"
        }
    ],
    ...
}
```
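To spot-check the UMM-G form of a record (the first example above), a short helper can list the `RelatedUrls` entries that carry the `USE SERVICE API` / `OPENDAP DATA` Type and Subtype pair. This is a minimal sketch that operates on an already-parsed UMM-G dictionary (for example the `umm` field of an item returned by a CMR `granules.umm_json` search); it checks only that one Type/Subtype combination.

```python
def opendap_related_urls(umm_g):
    """Return the URLs in a UMM-G record's RelatedUrls marked as OPeNDAP service URLs.

    Matches the Type/Subtype pair used in the UMM-G example above.
    """
    return [related_url['URL']
            for related_url in umm_g.get('RelatedUrls', [])
            if related_url.get('Type') == 'USE SERVICE API'
            and related_url.get('Subtype') == 'OPENDAP DATA']
```

An empty list suggests the granule is missing the required entry.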

Harmony retrieves the Atom JSON response for each granule record and currently uses the URL of the first link with the correct `rel` type. A user can also specify a literal string that must be present in that URL, for example "opendap", to ensure that a specific URL is retrieved.
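The selection rule described above can be illustrated with a few lines of Python. This is not Harmony's implementation: it is a sketch that walks the `links` array of an Atom JSON granule entry, assumes that both the `data#` and `service#` rel values from the example count as "correct" (Harmony's actual list may differ), and applies an optional literal substring pattern.

```python
DATA_REL = 'http://esipfed.org/ns/fedsearch/1.1/data#'
SERVICE_REL = 'http://esipfed.org/ns/fedsearch/1.1/service#'


def select_granule_url(links, accepted_rels=(DATA_REL, SERVICE_REL), pattern=None):
    """Return the href of the first link with an accepted rel that also contains
    `pattern` as a literal substring (if a pattern is given), or None.

    Illustrative only: the default accepted rel values are an assumption.
    """
    for link in links:
        if link.get('rel') not in accepted_rels:
            continue
        href = link.get('href', '')
        if pattern is None or pattern in href:
            return href
    return None
```

With the two links from the Atom example, no pattern selects the cloud-provider URL, while `pattern='opendap'` selects the OPeNDAP URL.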

If you intend for Harmony job results that include this collection to be shareable, make sure that guests have `read` permission on the collection (via the [CMR ACLs endpoints](https://cmr.earthdata.nasa.gov/access-control/site/docs/access-control/api.html)). In addition, if the collection has no EULAs, the `harmony.has-eula` tag must be associated with the collection and set to `false` via the CMR `/search/tags/harmony.has-eula/associations` endpoint. Example request body: `[{"concept_id": "C1233860183-EEDTEST", "data": false}]`. All collections used in a Harmony job must meet both of these requirements for the job to be shareable.
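As a sketch, the tag association request described above could be sent from this notebook as follows, once `echo_token` has been obtained in the authentication cell of the UMM-Var section. It reuses the `Echo-Token` header convention used elsewhere in this notebook and accepts any object with a `requests`-style `post` method (for example `requests.Session()`), which keeps the payload builder testable on its own.

```python
import json


def has_eula_tag_payload(collection_concept_ids, has_eula=False):
    """Build the JSON body for the `harmony.has-eula` tag association request."""
    return json.dumps([{'concept_id': concept_id, 'data': has_eula}
                       for concept_id in collection_concept_ids])


def associate_has_eula_tag(session, base_cmr_url, echo_token, collection_concept_ids):
    """POST the tag association; `session` is e.g. requests.Session()."""
    return session.post(f'{base_cmr_url}/search/tags/harmony.has-eula/associations',
                        headers={'Content-Type': 'application/json',
                                 'Echo-Token': echo_token},
                        data=has_eula_tag_payload(collection_concept_ids))
```

`has_eula_tag_payload(['C1233860183-EEDTEST'])` produces exactly the example request body shown above.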

### Activating a service:

At this point you should have a collection of granules with UMM-C and UMM-G records. Additionally, a Docker image of your service should be hosted in a repository that is accessible to Harmony. This could be the Elastic Container Registry (ECR) of the Harmony AWS instance, or a public Docker Hub account, for example.

The following steps describe a pull request (PR) that should be made against the [NASA Harmony repository](https://github.com/nasa/harmony) in order to activate your service. Once this PR has been merged into the repository and deployed to the relevant environment, it should be possible to make Harmony requests using your service against the configured collections, either by constructing a Harmony URL manually or by using the [harmony-py](https://pypi.org/project/harmony-py/) Python package.

To activate a new service, you will need to include two things in the PR:

* Environment variables for the service in [env-defaults](https://github.com/nasa/harmony/blob/main/env-defaults).
* An entry in the [services.yml](https://github.com/nasa/harmony/blob/main/config/services.yml) configuration file.

#### Environment variables for the service:

The `env-defaults` entries for the example service are:

HARMONY_SERVICE_EXAMPLE_IMAGE=harmonyservices/service-example:latest
HARMONY_SERVICE_EXAMPLE_REQUESTS_CPU=128m
HARMONY_SERVICE_EXAMPLE_REQUESTS_MEMORY=128Mi
HARMONY_SERVICE_EXAMPLE_LIMITS_CPU=128m
HARMONY_SERVICE_EXAMPLE_LIMITS_MEMORY=512Mi
HARMONY_SERVICE_EXAMPLE_INVOCATION_ARGS='python -m harmony_service_example'

The `REQUESTS_CPU`, `REQUESTS_MEMORY`, `LIMITS_CPU`, and `LIMITS_MEMORY` parameters configure the resources allocated to the service's Docker container when it runs in a Kubernetes pod. See the [Kubernetes documentation on managing container resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) for details.
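Kubernetes rejects a container whose resource request is larger than its limit, so it can be worth checking these values before opening a PR. The sketch below is a convenience check written for this notebook, not a Harmony tool. It handles only the common CPU formats (an `m` suffix for millicores, or plain cores) and memory suffixes (`Ki`/`Mi`/`Gi`/`Ti` and `k`/`M`/`G`/`T`), and it takes the variables as a plain dictionary keyed as in the example above.

```python
# Common Kubernetes memory suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...).
MEMORY_SUFFIXES = {'Ki': 2**10, 'Mi': 2**20, 'Gi': 2**30, 'Ti': 2**40,
                   'k': 10**3, 'M': 10**6, 'G': 10**9, 'T': 10**12}


def cpu_millicores(quantity):
    """Convert a CPU quantity such as '128m' or '0.5' to millicores."""
    if quantity.endswith('m'):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity):
    """Convert a memory quantity such as '128Mi' or '1G' to bytes."""
    # Try two-character suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * MEMORY_SUFFIXES[suffix])
    return int(quantity)


def resource_problems(env, prefix):
    """Report requests that exceed their limits for the entries starting with `prefix`."""
    problems = []
    if cpu_millicores(env[f'{prefix}_REQUESTS_CPU']) > cpu_millicores(env[f'{prefix}_LIMITS_CPU']):
        problems.append('REQUESTS_CPU exceeds LIMITS_CPU')
    if memory_bytes(env[f'{prefix}_REQUESTS_MEMORY']) > memory_bytes(env[f'{prefix}_LIMITS_MEMORY']):
        problems.append('REQUESTS_MEMORY exceeds LIMITS_MEMORY')
    return problems
```

For the example values, `resource_problems(env, 'HARMONY_SERVICE_EXAMPLE')` returns an empty list, since both requests are within their limits.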

**A chained Harmony request:**

A request to Harmony can involve multiple services. For example, the first service could extract a subset of variables from a granule hosted in OPeNDAP, while the second service could mask the retrieved variables using a GeoJSON polygon. There are examples of chained workflows in [config/services.yml](https://github.com/nasa/harmony/blob/main/config/services.yml) (any service with multiple `steps:`).



#### A `services.yml` entry:

There is additional documentation for adding a new entry to the [config/services.yml](https://github.com/nasa/harmony/blob/main/config/services.yml) file available [here](https://github.com/nasa/harmony/blob/main/docs/guides/adapting-new-services.md#5-registering-services-in-servicesyml).

You will need to add a unique entry to `services.yml` for each service in each environment. Each service (or service chain) must be represented by exactly one unique UMM-S record, and any collections that support the service (or service chain) need to be associated with that UMM-S record. Here is an example service template from the documentation:

```yaml
- name: harmony/service-example    # A unique identifier string for the service, conventionally <team>/<service>
  data_operation_version: '0.17.0' # The version of the data-operation messaging schema to use
  has_granule_limit: true          # Optional flag indicating whether we will impose granule limits for the request. Defaults to true.
  default_sync: false              # Optional flag indicating whether we will force the request to run synchronously. Defaults to false.
  type:                            # Configuration for service invocation
      <<: *default-turbo-config    # To reduce boilerplate, services.yml includes default configuration suitable for all Docker based services.
      params:
        <<: *default-turbo-params  # Always include the default parameters for docker services
        env:
          <<: *default-turbo-env   # Always include the default docker environment variables and then add service specific env
          STAGING_PATH: public/harmony/service-example # The S3 prefix where artifacts generated by the service will be stored
  umm_s: S1234-EXAMPLE            # Service concept id for the service. It is a required field and must be a string.
  collections:                    # Optional, should not exist in most cases. It is only used when a granule_limit or variables are applied to collections of the service.
    - id: C1234-EXAMPLE
      granule_limit: 1000         # A limit on the number of granules that can be processed for the collection (OPTIONAL - defaults to no limit)
      variables:                  # A list of variables provided by the collection (OPTIONAL)
        - v1
        - v2
  maximum_sync_granules: 1        # Optional limit for the maximum number of granules for a request to be handled synchronously. Defaults to 1. Set to 0 to only allow async requests.
  capabilities:                   # Service capabilities
    subsetting:
      bbox: true                  # Can subset by spatial bounding box
      temporal: true              # Can subset by a time range
      variable: true              # Can subset by UMM-Var variable
      multiple_variable: true     # Can subset multiple variables at once
    output_formats:               # A list of output mime types the service can produce
      - image/tiff
      - image/png
      - image/gif
    reprojection: true            # The service supports reprojection
  steps:
      - image: !Env ${QUERY_CMR_IMAGE} # The image to use for the first step in the chain
      - image: !Env ${HARMONY_SERVICE_EXAMPLE_IMAGE}     # The image to use for the second step in the chain
      
- name: harmony/http-example  # An example of configuring the HTTP backend
  type:
    name: http                # This is an HTTP endpoint
    params:
      url: http://www.example.com/harmony  # URL for the backend service
  # ... And other config (collections / capabilities) as in the above docker example
```

In a chained workflow, multiple `steps` are listed in the template, in the order in which they should be invoked.
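Since each service or service chain needs its own UMM-S record, and `umm_s` must be a string, a quick consistency check over the entries for one environment can catch copy-and-paste mistakes. This sketch assumes the YAML has already been parsed into a list of dictionaries. Parsing `services.yml` itself needs a YAML loader that understands its anchors and the custom `!Env` tag, which is out of scope here.

```python
from collections import Counter


def umm_s_problems(service_entries):
    """Check parsed services.yml entries for a string `umm_s` that is unique per entry.

    `service_entries` is a list of dictionaries, e.g. the entries for one CMR
    environment after loading services.yml.
    """
    problems = []
    umm_s_ids = []
    for entry in service_entries:
        umm_s = entry.get('umm_s')
        if not isinstance(umm_s, str):
            problems.append(f"{entry.get('name')}: umm_s must be a string")
        else:
            umm_s_ids.append(umm_s)
    # Each UMM-S concept should back exactly one service (or service chain).
    for umm_s, count in Counter(umm_s_ids).items():
        if count > 1:
            problems.append(f'{umm_s} is used by {count} entries')
    return problems
```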

#### Making a PR:

Once you have a git branch with the necessary entries in `env-defaults` and `config/services.yml`, open a pull request to merge those changes into the NASA Harmony repository. Once merged, the changes will need to be deployed to the specified environments to activate your service. At that point you can begin making HTTP requests to retrieve output from your service via `harmony-py`, a browser, the Python `requests` package, cURL, or another client.
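For a quick manual test, a Harmony request URL can be built by hand. This sketch assumes the OGC API Coverages route that Harmony exposes (`/{collection}/ogc-api-coverages/1.0.0/collections/{variable}/coverage/rangeset`, with `all` for every variable) and only sets the `granuleId` and `format` query parameters; consult the Harmony API documentation for the full set of parameters. Requests to the resulting URL need Earthdata Login authentication.

```python
from urllib.parse import quote, urlencode


def harmony_coverage_url(harmony_root, collection_concept_id, variable=None,
                         granule_id=None, output_format=None):
    """Build a Harmony OGC API Coverages 'rangeset' request URL.

    `variable` is a full variable path, URL-encoded as a single path segment;
    when it is omitted, 'all' is used to request every variable.
    """
    variable_segment = quote(variable, safe='') if variable else 'all'
    url = (f'{harmony_root}/{collection_concept_id}/ogc-api-coverages/1.0.0/'
           f'collections/{variable_segment}/coverage/rangeset')
    query = []
    if granule_id is not None:
        query.append(('granuleId', granule_id))
    if output_format is not None:
        query.append(('format', output_format))
    return f'{url}?{urlencode(query)}' if query else url
```

For example, `harmony_coverage_url('https://harmony.uat.earthdata.nasa.gov', collection_concept_id, variable='/group/sst')` encodes the variable path as `%2Fgroup%2Fsst`.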

### UMM-S record:
As mentioned above, a unique UMM-S record needs to be created for each service (or service chain), and its concept ID is specified as the value of the `umm_s` key in the service's `services.yml` entry.

When creating a UMM-S record, make sure you select the Harmony service type, and that the URL the service points to is the base Harmony URL for the environment your service record relates to. See [UMM-S Guidance for Harmony Services](https://wiki.earthdata.nasa.gov/display/HARMONY/UMM-S+Guidance+for+Harmony+Services) for additional details on UMM-S curation for Earthdata Search discovery. Note that you will need to replicate this service record in each environment in which the service will operate.
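The two requirements above (a Harmony service type and the environment's base Harmony URL) can be checked on a UMM-S dictionary before ingesting it. This is a sketch: the mapping below covers only the production and UAT environments and is an assumption to extend for any other environment you use.

```python
# Assumed mapping from CMR root URL to the matching Harmony root URL.
HARMONY_ROOTS = {
    'https://cmr.earthdata.nasa.gov': 'https://harmony.earthdata.nasa.gov',
    'https://cmr.uat.earthdata.nasa.gov': 'https://harmony.uat.earthdata.nasa.gov',
}


def harmony_umm_s_problems(umm_s_record, base_cmr_url):
    """Check that a UMM-S record has Type 'Harmony' and points at the right Harmony root."""
    problems = []
    if umm_s_record.get('Type') != 'Harmony':
        problems.append("Type should be 'Harmony'")
    expected_url = HARMONY_ROOTS.get(base_cmr_url)
    actual_url = umm_s_record.get('URL', {}).get('URLValue')
    if expected_url is None:
        problems.append(f'No Harmony root URL known for {base_cmr_url}')
    elif actual_url != expected_url:
        problems.append(f'URLValue is {actual_url!r}, expected {expected_url!r}')
    return problems
```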

#### Creating a UMM-S record in MMT

Once signed in to MMT, click the "Manage Services" button near the centre of the page, followed by "Create New Record".

![Create a new UMM-S record](../images/mmt_new_service_page.png "Create a new UMM-S record")

As with the MMT interface for new UMM-Var records, this will take you to a multi-page form where you can specify the capabilities and requirements of your service. These options tell EDSC what information to request from users as input to the service. For example, if the service performs spatial subsetting, the user may need to provide the coordinates of a bounding box.

#### Creating a UMM-S record via the CMR API:

This process is documented in the [CMR ingest API documentation](https://cmr.earthdata.nasa.gov/ingest/site/docs/ingest/api.html#create-update-service). As an alternative to MMT, you can create a UMM-S record via an HTTP PUT request, as in the cell below. Note that the service metadata is only an example; consult the full [schema](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/service) for more options:

In [None]:
# This cell uses `echo_token`, which is set by the authentication cell in the
# UMM-Var section below; run that cell first.
headers = {'Content-Type': 'application/vnd.nasa.cmr.umm+json',
           'Echo-Token': echo_token,
           'Accept': 'application/json'}
service_native_id = 'sample_native_id_value'  # This must be unique to the provider.

service_metadata = {'Name': 'harmony-service-name',
                    'Version': '0.9.0',
                    'Description': 'A sentence describing your amazing service.',
                    'ServiceOptions': {'Subset': {'VariableSubset': {'AllowMultipleValues': True}}},
                    'SupportedProjections': ['Geographic'],
                    'SupportedFormats': ['netCDF-4'],
                    'Type': 'Harmony',
                    'URL': {'Description': 'This is the harmony root endpoint',
                            # Update this to the Harmony root URL for your environment
                            'URLValue': 'https://harmony.uat.earthdata.nasa.gov'}}

# Send the metadata as a JSON string (a dictionary passed to `data` would be form-encoded).
service_response = requests.put(f'{base_cmr_url}/ingest/providers/{provider}/services/{service_native_id}',
                                headers=headers,
                                data=json.dumps(service_metadata))
service_response.raise_for_status()

service_concept_id = service_response.json().get('concept-id')

### UMM-Var records:

Some services operate on an entire granule, and do not need information regarding variables. Other services may only return a user-defined selection of variables from the native granule. For this latter type of service, a Harmony request URL will include a URL-encoded full variable path. Harmony requires that UMM-Var records exist for each variable users can specify, and that these UMM-Var records are associated with the relevant collection (UMM-C) record. Harmony makes requests against the CMR API to retrieve the required UMM-Var records before sending this information to the requested back-end service.

There are several ways to create UMM-Var records:

* Manually via the Metadata Management Tool ([MMT](https://mmt.earthdata.nasa.gov)). This is a Graphical User Interface (GUI) that is handy for creating a small number of variables and associating them with the correct collection.
* Making HTTP requests directly against the Common Metadata Repository (CMR) API (see documentation [here](https://wiki.earthdata.nasa.gov/display/CMR/CMR+Data+Partner+User+Guide) and [here](https://cmr.earthdata.nasa.gov/ingest/site/docs/ingest/api.html#create-update-variable)).
* Using the UMM-Var Generator ([UVG](https://uvg.earthdata.nasa.gov/)).

The following subsections cover each of these methods in turn.

#### Using MMT to associate variables with a collection:

First, navigate to the MMT instance associated with your environment (e.g. [mmt.earthdata.nasa.gov](https://mmt.earthdata.nasa.gov), [mmt.uat.earthdata.nasa.gov](https://mmt.uat.earthdata.nasa.gov)). Find the collection record for the collection that requires variables. On the summary page for that collection, click the "Create Associated Variable" button, indicated in the figure below:

![Create Associated Variable](../images/mmt_collection_page.png "Create Associated Variable")

Clicking this link should take you to a multi-part form that allows you to fully define a UMM-Var record. Initially, the only required fields are on the first page of the form, including the variable name, long name, and definition. For variables in a hierarchical file, the variable name should be the full path, beginning with a leading "/" character. For variables in a flat file the leading slash is not required.
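The naming convention above is easy to apply when generating many names. This is a minimal sketch, with a hypothetical helper name, that adds the leading "/" for variables in hierarchical files and leaves flat-file names as they are (the slash is not required there, so the sketch does not add or remove one).

```python
def umm_var_name(variable_path, hierarchical):
    """Return a UMM-Var Name following the convention described above.

    Hierarchical files: full path with a single leading '/'.
    Flat files: the name is returned unchanged.
    """
    if hierarchical:
        return '/' + variable_path.lstrip('/')
    return variable_path
```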

After you have completed your UMM-Var record draft, you can save and publish it. The new UMM-Var record should automatically be linked to your collection.

#### Making HTTP requests against CMR:

This option may be preferable for a collection with a large number of variables that do not require information beyond the most basic fields (e.g., name, long name and description). API documentation is available [here](https://cmr.earthdata.nasa.gov/ingest/site/docs/ingest/api.html#create-update-variable).

Requests can be made against CMR using any standard client (for example cURL). In this notebook the examples will use the Python `requests` package. This package will need to be present in your environment, and can be installed via Pip:

```bash
pip install requests
```

First you must authenticate with CMR. In the example below you will need to update the content of the XML token string with your EDL credentials, a name of your choosing for your client, and your IP address. Note, this request assumes that you are interacting with the UAT environment. If you are trying to create or update variables in another environment, you will need to update the base CMR URL near the top of this notebook:

In [None]:
# The echo_token variable retrieved in this cell is used in most CMR and UVG requests below.

xml_token_string = ('<token><username>your_username</username>'
                    '<password>y0ur_p4ssw0rd</password>'
                    '<client_id>a_name_for_your_client</client_id>'
                    '<user_ip_address>127.0.0.1</user_ip_address>'
                    '</token>')

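headers = {'Content-Type': 'application/xml'}

token_response = requests.post(f'{base_cmr_url}/legacy-services/rest/tokens',
                               headers=headers,
                               data=xml_token_string)

if token_response.status_code == 201:
    token_response_tree = ET.fromstring(token_response.content)
    # The token is the text of the <id> child element of the <token> root element.
    echo_token = token_response_tree.find('id').text
    print(f'Successfully extracted token, ending in: ...{echo_token[-5:]}')
else:
    print(f'Token request failed with status {token_response.status_code}')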

The Echo token is a UUID contained in the "id" element of the response. This will need to be incorporated into the request headers of requests to create or update records in CMR.

With this token in hand, you can create a new variable or update an existing one. You must provide a native ID that is unique within the collection's provider. This native ID will be required every time an existing variable record is updated via requests to the CMR API.

The cell below will create (or update) the metadata for a variable. It uses the collection concept ID defined earlier in this notebook. The content of the variable metadata dictionary is minimal. For richer examples see either the [API documentation](https://cmr.earthdata.nasa.gov/ingest/site/docs/ingest/api.html#create-update-variable) or [UMM-Var schema](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/variable).

In [None]:
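headers = {'Content-Type': 'application/vnd.nasa.cmr.umm+json',
           'Echo-Token': echo_token,
           'Accept': 'application/json'}
variable_native_id = 'sample_native_id_value'  # This must be unique to the provider.

variable_metadata = {'Name': 'variable_name',
                     'LongName': 'A long UMM-Var name',
                     'Definition': 'A description of the variable.',
                     'VariableType': 'SCIENCE_VARIABLE'}

# The `1` in this URL is the revision ID of the collection the variable is associated with.
var_response = requests.put(f'{base_cmr_url}/ingest/collections/{collection_concept_id}/1/variables/{variable_native_id}',
                            headers=headers,
                            data=json.dumps(variable_metadata))
print(var_response.status_code, var_response.text)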

The request above could be implemented as part of a script that iterates through a list of variables, if the metadata is either already known or very minimal (e.g., you already have a list of science variable names and do not need dimension information).
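A minimal sketch of such a script is shown below. It repeats the request above for each `(name, long_name, definition)` tuple. The native-ID scheme is hypothetical (pick whatever scheme keeps IDs unique within your provider), and the HTTP client is passed in as any object with a `requests`-style `put` method, such as `requests.Session()`.

```python
import json


def minimal_variable_metadata(name, long_name, definition):
    """Build a minimal UMM-Var dictionary (name, long name, definition, variable type)."""
    return {'Name': name,
            'LongName': long_name,
            'Definition': definition,
            'VariableType': 'SCIENCE_VARIABLE'}


def hypothetical_native_id(collection_concept_id, variable_name):
    """Hypothetical native-ID scheme: collection concept ID plus the variable path,
    with '/' replaced by '_' so the ID is a single URL path segment."""
    return f"{collection_concept_id}-{variable_name.strip('/').replace('/', '_')}"


def ingest_variables(session, base_cmr_url, collection_concept_id, echo_token, variables):
    """PUT one UMM-Var record per (name, long_name, definition) tuple.

    Returns a dictionary mapping each variable name to the HTTP status code received.
    """
    headers = {'Content-Type': 'application/vnd.nasa.cmr.umm+json',
               'Echo-Token': echo_token,
               'Accept': 'application/json'}
    statuses = {}
    for name, long_name, definition in variables:
        native_id = hypothetical_native_id(collection_concept_id, name)
        # As in the cell above, `1` is the collection revision ID.
        url = (f'{base_cmr_url}/ingest/collections/{collection_concept_id}/1/'
               f'variables/{native_id}')
        response = session.put(url, headers=headers,
                               data=json.dumps(minimal_variable_metadata(name, long_name, definition)))
        statuses[name] = response.status_code
    return statuses
```

Collecting status codes, rather than stopping at the first failure, makes it easy to see which variables need to be retried.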

#### Using the UMM-Var Generator (UVG):

The [UVG](https://uvg.earthdata.nasa.gov/) is a powerful tool for creating UMM-Var records for collections with a large number of variables that have complex relationships with one another.

Documentation for UVG is available [here](https://wiki.earthdata.nasa.gov/display/UVG/UMM-Var+Generator+%28UVG%29+User%27s+Guide). First, you must use the UVG `/generate` endpoint to generate a set of valid UMM-Var records for a collection with granules in OPeNDAP. It will parse the `.dmr` for a granule (randomly selected if not specified) and return a response with valid UMM-Var records:

In [None]:
headers = {'Echo-Token': echo_token}

uvg_generate_response = requests.post(f'{base_uvg_url}/generate',
                                      data={'collection_concept_id': collection_concept_id, 'provider': provider},
                                      headers=headers)

if uvg_generate_response.ok:
    uvg_generate_json = uvg_generate_response.json()
else:
    print(f'UVG /generate request failed with status {uvg_generate_response.status_code}')

Once valid UMM-Var records have been generated via UVG, a request can be made to publish new UMM-Var records for this collection using the `/publish` endpoint of UVG. This requires the collection concept ID, its provider, and a list of variable records, as returned in the UVG `/generate` response:

In [None]:
headers = {'Echo-Token': echo_token}

# The variable records are nested structures, so send the body as JSON rather than form data.
uvg_publish_response = requests.put(f'{base_uvg_url}/publish',
                                    json={'collection_concept_id': collection_concept_id,
                                          'provider': provider,
                                          'variables': uvg_generate_json.get('variables', [])},
                                    headers=headers)

### Enabling service discovery in EDSC:

After a PR that configures the service for use with Harmony has been merged to the Harmony repository, and this version of Harmony has been deployed to the necessary environment, your service will be active. At this point, however, users will not be able to discover or use this service for the relevant collections via Earthdata Search Client (EDSC). To enable this functionality, the UMM-S record configured for the service via the `umm_s` key in `services.yml` must be associated with the collections that can be used with the service.

### Associating a UMM-S record with a collection:

This can be performed either via the MMT or via the [CMR API](https://wiki.earthdata.nasa.gov/display/CMR/CMR+Data+Partner+User+Guide#CMRDataPartnerUserGuide-Services). If you are trying to add an association between a service in one provider, and a collection in another, you will have to use the CMR API.

#### Associating a UMM-S record with a collection in MMT:

First, navigate to your UMM-S record by searching for it in MMT. Then click on the "Manage collection associations" link near the top of the page:

![Managing collection associations](../images/mmt_manage_associations.png "Manage collection associations")

Within this next page you will see a list of current associations. You can then click "Add Collection Associations". The page you are brought to allows you to search via fields such as collection concept ID or collection title.

#### Associating a UMM-S record with a collection via the CMR API:

This method is useful either for associating several collections with a service in a single operation, or for associating collections from other providers with a service.

In [None]:
# If you used MMT to create a UMM-S record, uncomment the following line, and set it to the new
# service concept ID:
# service_concept_id = 'S1234567890-PROVIDER'

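headers = {'Content-Type': 'application/json', 'Echo-Token': echo_token}
collections_list = [{'concept_id': collection_concept_id}]

# Serialise the list to JSON to match the Content-Type header.
association_response = requests.post(f'{base_cmr_url}/search/services/{service_concept_id}/associations',
                                     headers=headers, data=json.dumps(collections_list))
print(association_response.status_code, association_response.text)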
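To associate several collections in one operation, the same request can carry a longer list. The sketch below builds that body from a list of concept IDs, dropping duplicates while preserving order (a choice made here, not a CMR requirement), and accepts any object with a `requests`-style `post` method, such as `requests.Session()`.

```python
import json


def association_payload(collection_concept_ids):
    """Build the JSON body associating several collections with one UMM-S record.

    Duplicate concept IDs are dropped, keeping the first occurrence.
    """
    unique_ids = list(dict.fromkeys(collection_concept_ids))
    return json.dumps([{'concept_id': concept_id} for concept_id in unique_ids])


def associate_collections(session, base_cmr_url, echo_token, service_concept_id,
                          collection_concept_ids):
    """POST a single association request covering all of the given collections."""
    return session.post(f'{base_cmr_url}/search/services/{service_concept_id}/associations',
                        headers={'Content-Type': 'application/json',
                                 'Echo-Token': echo_token},
                        data=association_payload(collection_concept_ids))
```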

### Using your service via EDSC:

The UMM-S record and association with a collection should take immediate effect. To test it, navigate to Earthdata Search Client (EDSC). Search for your collection and select a granule:

![EDSC select a collection](../images/edsc_collection_select.png "EDSC select a collection")

After clicking on the "Download" button, you'll see the download form. Near the top are the options for customizing the output. One or more of these options should be "Harmony". Select your service. If multiple Harmony services are configured for a single collection, you can choose between them by clicking on "More Info" to see the service name and description. Note that when a Harmony request is received for a collection with multiple services, Harmony will try to route the request to the service chain that can best fulfill the request, accounting for the input request parameters.
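Harmony's actual routing logic is more involved than can be shown here. Purely to illustrate the idea of matching a request against service capabilities, the sketch below picks the first service whose `capabilities` (in the shape used in `services.yml`) can satisfy every parameter of a simplified request. The request keys and the first-match rule are assumptions for illustration, not a description of Harmony's algorithm.

```python
def choose_service(services, request):
    """Return the name of the first service whose capabilities cover the request.

    `services`: dicts with a 'name' and a services.yml-style 'capabilities' section.
    `request`: dict with optional keys 'bbox', 'variables', 'output_format', 'reproject'.
    """
    for service in services:
        capabilities = service.get('capabilities', {})
        subsetting = capabilities.get('subsetting', {})
        variables = request.get('variables', [])
        output_format = request.get('output_format')
        if request.get('bbox') and not subsetting.get('bbox'):
            continue
        if variables and not subsetting.get('variable'):
            continue
        if len(variables) > 1 and not subsetting.get('multiple_variable'):
            continue
        if output_format and output_format not in capabilities.get('output_formats', []):
            continue
        if request.get('reproject') and not capabilities.get('reprojection'):
            continue
        return service['name']
    return None
```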

![EDSC customise download](../images/edsc_download_form_one.png "EDSC customise download")

After selecting your Harmony service, the download form should include all the options to provide data that your service needs. In the example below, a user can request only a subset of the variables to be returned from the original input granule. Once the form is complete, you can then click "Download Data" and you will be redirected to the standard status page for an EDSC download request.

![EDSC customise download](../images/edsc_download_form_two.png "EDSC customise download")