# How to generate UMM-Var records via the CMR GraphQL API

As of v2.0.0, `earthdata-varinfo` can generate and publish UMM-Var records to the Earthdata Common Metadata Repository (CMR). Functionality to generate and return UMM-Var record JSON has also been embedded within the [CMR GraphQL specification](https://graphql.earthdata.nasa.gov/api). This notebook provides an example of how to programmatically request UMM-Var records in JSON format from the CMR GraphQL API.

*Note: This feature of the CMR GraphQL API is currently under development and not deployed as of 2023-09-23.*

## Environment setup:

First, create and activate a Python environment using either `pyenv` or conda. Then use pip to install the following requirements:

* [requests](https://pypi.org/project/requests/) - A package to make HTTP requests.
* [notebook](https://pypi.org/project/notebook/) - A package to run the web-based Jupyter notebook environment.

## Import necessary packages:

In [None]:
import json

import requests

## Set the environment being used:

This notebook can be used against any environment in which the `generateVariableDrafts` field is available in the CMR-GraphQL interface. The `environment_name` variable in the cell below should be set to one of the following values:

* `sit`
* `uat`
* `production`

The `environment_name` variable is used both to identify the CMR-GraphQL URL against which the query should be performed, and to retrieve a valid Earthdata Login (EDL) Bearer token.

In [None]:
environment_name = 'uat'

## Retrieve a token for authorization:

The CMR GraphQL API supports the same authentication methods as the CMR. Currently, these include:

### NASA LaunchPad tokens:

[NASA LaunchPad tokens](https://wiki.earthdata.nasa.gov/display/CMR/Launchpad+Authentication+User%27s+Guide) do not specify an authentication scheme in the `Authorization` header:

```
Authorization: <LaunchPad-token>
```

LaunchPad tokens can currently be used for both CMR search and ingest.

### EDL bearer tokens:

The `Authorization` header for these tokens has the following format, where the "Bearer" authentication scheme is specified before the token:

```
Authorization: Bearer <EDL-bearer-token>
```

EDL Bearer tokens can currently be used with CMR only when searching for records, not when ingesting them.
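
The two header formats above differ only in whether the "Bearer" scheme is included. As a minimal sketch, a hypothetical helper `build_authorization_header` (not part of `earthdata-varinfo` or CMR) could build either one, assuming the caller knows which kind of token it holds:

```python
def build_authorization_header(token: str, token_type: str) -> dict:
    """Return an Authorization header for a CMR or CMR GraphQL request.

    Hypothetical helper: ``token_type`` is either 'launchpad' (no
    authentication scheme) or 'edl' (the "Bearer" scheme).
    """
    if token_type == 'launchpad':
        # LaunchPad tokens are sent without an authentication scheme.
        return {'Authorization': token}
    if token_type == 'edl':
        # EDL tokens are prefixed with the "Bearer" authentication scheme.
        return {'Authorization': f'Bearer {token}'}
    raise ValueError(f'Unknown token type: {token_type!r}')


print(build_authorization_header('abc123', 'edl'))
# {'Authorization': 'Bearer abc123'}
```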

There are several ways to retrieve an EDL Bearer token:

* Using the `https://urs.earthdata.nasa.gov/api/users/tokens` endpoint to retrieve an existing token.
* Using the `https://urs.earthdata.nasa.gov/api/users/token` endpoint to generate a new token.
* Using the [Earthdata Login](https://urs.earthdata.nasa.gov) GUI to generate and/or copy a token into a local text file or string variable.
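
For the last option, where a token has been copied from the Earthdata Login GUI into a local text file, a sketch of reading it back might look like the following. The helper name `read_token_from_file` and the file location are hypothetical, and the file is assumed to contain only the token (optionally followed by a newline):

```python
from pathlib import Path


def read_token_from_file(token_path: str) -> str:
    """Read an EDL Bearer token previously saved to a local text file.

    Hypothetical helper: assumes the file holds only the token, possibly
    surrounded by whitespace or a trailing newline.
    """
    token = Path(token_path).expanduser().read_text(encoding='utf-8').strip()
    if not token:
        # An empty file would otherwise produce an unusable header.
        raise ValueError(f'No token found in {token_path}')
    return token


# Example usage, with a hypothetical file location:
# edl_token = read_token_from_file('~/.edl_token')
```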

This notebook uses an EDL Bearer token. The cell below defines a helper function, used later in the notebook, to retrieve one. This function assumes that there is a `.netrc` file on the local machine containing EDL credentials, which `requests` will use to authenticate with EDL.

The function first uses the `/api/users/tokens` endpoint to retrieve the first existing token for a user. If there is no existing token, the `/api/users/token` endpoint is used to generate a new one.

In [None]:
urs_urls = {
    'sit': 'https://sit.urs.earthdata.nasa.gov',
    'uat': 'https://uat.urs.earthdata.nasa.gov',
    'production': 'https://urs.earthdata.nasa.gov'
}


def get_edl_token(environment_name: str) -> str:
    """ Retrieve an EDL token for use in requests to CMR GraphQL. If
        the user identified by a local .netrc file does not have a
        token then a new one will be generated.

    """
    # Index the dictionary directly, so an environment without an EDL URL
    # (e.g. 'local') raises a KeyError instead of producing 'None/api/...'.
    urs_url = urs_urls[environment_name]

    # No credentials are passed explicitly: `requests` falls back to the
    # matching entry in the local .netrc file for HTTP Basic authentication.
    existing_tokens_response = requests.get(
        f'{urs_url}/api/users/tokens',
        headers={'Content-type': 'application/json'}
    )
    existing_tokens_response.raise_for_status()
    existing_tokens_json = existing_tokens_response.json()

    if len(existing_tokens_json) == 0:
        # The user has no existing tokens, so generate a new one.
        new_token_response = requests.post(
            f'{urs_url}/api/users/token',
            headers={'Content-type': 'application/json'}
        )
        new_token_response.raise_for_status()
        new_token_json = new_token_response.json()
        edl_token = new_token_json['access_token']
    else:
        # Reuse the first existing token.
        edl_token = existing_tokens_json[0]['access_token']

    return edl_token
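
Because `get_edl_token` depends on `.netrc` credentials, a failed token request is often caused by a missing entry for the EDL host. As a minimal sketch, a hypothetical helper `has_netrc_credentials` can check for that entry up front using the standard library `netrc` module, which `requests` also relies on for its own `.netrc` lookup:

```python
import netrc
from typing import Optional
from urllib.parse import urlparse


def has_netrc_credentials(url: str, netrc_path: Optional[str] = None) -> bool:
    """Check whether a .netrc file has a login entry for the host in ``url``.

    Hypothetical helper: ``netrc_path`` defaults to ~/.netrc when None.
    A 'default' entry in the file also counts as a match.
    """
    host = urlparse(url).hostname
    try:
        credentials = netrc.netrc(netrc_path).authenticators(host)
    except FileNotFoundError:
        # No .netrc file exists at the requested location.
        return False
    return credentials is not None


# Example usage, with the dictionary defined above:
# has_netrc_credentials(urs_urls[environment_name])
```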

## Select the correct CMR GraphQL environment:

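The CMR GraphQL endpoint is available in the SIT, UAT and production environments, as well as from a local development instance. The cell below maps each environment key ('local', 'sit', 'uat' or 'production') to its URL, and selects one using the `environment_name` variable set at the start of this notebook (UAT by default). Note that there is no EDL environment for 'local', so the `get_edl_token` helper cannot retrieve a token for it.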

In [None]:
graphql_environments = {
    'local': 'http://localhost:3013/dev/api',
    'sit': 'https://graphql.sit.earthdata.nasa.gov/api',
    'uat': 'https://graphql.uat.earthdata.nasa.gov/api',
    'production': 'https://graphql.earthdata.nasa.gov/api'
}

graphql_url = graphql_environments[environment_name]
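
Since the GraphQL and EDL URL dictionaries do not share the same keys, one option is to resolve both URLs together and fail early with a readable message. This is a sketch with a hypothetical helper, `resolve_environment_urls`, which takes the two mappings as arguments rather than relying on notebook globals:

```python
from typing import Dict, Tuple


def resolve_environment_urls(environment_name: str,
                             graphql_urls: Dict[str, str],
                             edl_urls: Dict[str, str]) -> Tuple[str, str]:
    """Return the (CMR GraphQL URL, EDL URL) pair for an environment.

    Hypothetical helper: raises ValueError if the environment name is
    missing from either mapping.
    """
    missing = [
        label
        for label, mapping in (('CMR GraphQL', graphql_urls), ('EDL', edl_urls))
        if environment_name not in mapping
    ]
    if missing:
        joined = ' or '.join(missing)
        raise ValueError(
            f'Environment {environment_name!r} has no {joined} URL configured.'
        )
    return graphql_urls[environment_name], edl_urls[environment_name]


# Example usage, with the dictionaries defined in this notebook:
# graphql_url, urs_url = resolve_environment_urls(
#     environment_name, graphql_environments, urs_urls
# )
```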

## Define the GraphQL query:

Requests to a GraphQL API require a [query or mutation](https://graphql.org/learn/queries/) to be defined. In this case, the `Collection` query is used. This query requires a client to specify the fields of the object to be returned. The query below requests a single field, `generateVariableDrafts`, which in turn triggers an AWS Lambda function that uses `earthdata-varinfo` to generate UMM-Var JSON (schema version 1.8.2) for all variables identified within a sample granule from that collection.

The query defines the fields of the generated UMM-Var records that the response will contain:

* `dataType`
* `definition`
* `dimensions`
* `longName`
* `name`
* `standardName`
* `units`
* `metadataSpecification`

For more information on each of these fields, please see the [UMM-Var JSON schema](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/variable/v1.8.2/umm-var-json-schema.json).

In [None]:
graphql_query = '''
    query Collection($params: CollectionInput) {
      collection(params: $params) {
        generateVariableDrafts {
          count
          items {
            dataType
            definition
            dimensions
            longName
            name
            standardName
            units
            metadataSpecification
          }
        }
      }
    }
'''
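
If only some of the UMM-Var fields listed above are needed, the query string can also be assembled from a list of field names. The following is a sketch using a hypothetical helper, `build_generate_drafts_query`; it assumes every supplied name is a valid field of the `generateVariableDrafts` items, since the helper does not check this against the GraphQL schema:

```python
from typing import Iterable


def build_generate_drafts_query(fields: Iterable[str]) -> str:
    """Build a ``Collection`` query requesting selected UMM-Var fields.

    Hypothetical helper: field names are inserted verbatim into the
    ``items`` selection set.
    """
    fields = list(fields)
    if not fields:
        # GraphQL selection sets must contain at least one field.
        raise ValueError('At least one field must be requested.')
    field_lines = '\n'.join(f'        {field}' for field in fields)
    return (
        'query Collection($params: CollectionInput) {\n'
        '  collection(params: $params) {\n'
        '    generateVariableDrafts {\n'
        '      count\n'
        '      items {\n'
        f'{field_lines}\n'
        '      }\n'
        '    }\n'
        '  }\n'
        '}\n'
    )


# Example usage, requesting a subset of fields:
# graphql_query = build_generate_drafts_query(['name', 'dataType', 'units'])
```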

## Set GraphQL query parameters:

The `Collection` query in CMR-GraphQL accepts a number of parameters that can be used to identify matching collections. In this example, the collection concept ID is specified. The example uses [a testing copy of a GPM IMERG precipitation product](https://cmr.uat.earthdata.nasa.gov/search/concepts/C1245618475-EEDTEST.html).

In [None]:
variables = {
    'params': {
        'conceptId': 'C1245618475-EEDTEST'
    }
}

## Create full JSON payload:

With the query and the variables defined, the JSON payload for the HTTP request to the CMR-GraphQL interface can be formed:

In [None]:
payload = {
    'query': graphql_query,
    'variables': variables
}

## Configure HTTP request headers:

The HTTP request made in this notebook requires two headers:

* `Authorization` - this will include an [Earthdata Login](https://urs.earthdata.nasa.gov/) (EDL) Bearer token or a NASA LaunchPad token (see the descriptions of these token types above).
* `Content-Type` - this header tells the server the media type of the request body. In this case the body contains JSON, as defined in the payload above.

This notebook will retrieve an EDL Bearer token for authentication with CMR. This requires a `.netrc` file to be present on the local machine, and for that file to contain credentials for the necessary EDL environment (SIT, UAT or production).

In [None]:
headers = {
    'Authorization': f'Bearer {get_edl_token(environment_name)}',
    'Content-Type': 'application/json',
}

## Make a request to CMR GraphQL:

The following cell submits the HTTP request to the CMR GraphQL API, using the URL of the environment chosen at the start of this notebook. The request combines the payload (`Collection` query and parameters) with the necessary headers to retrieve an HTTP response.

In [None]:
cmr_graphql_response = requests.post(graphql_url, json=payload, headers=headers)

## CMR GraphQL response:

The CMR GraphQL API returns an HTTP response with a JSON body. A successful response will return the requested information within the `data` attribute of the response body, while any errors will be reported under the `errors` attribute ([see the GraphQL documentation for a full description of the response body](https://graphql.org/learn/serving-over-http/#response)). The final cell in this notebook prints the full response body, formatted as indented JSON.
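
Because GraphQL errors are reported in the response body, checking only the HTTP status code may not be enough. A minimal sketch of surfacing them, using a hypothetical helper `get_graphql_data`, might look like the following. It treats any entry under `errors` as a failure, which is stricter than the GraphQL specification (a response may contain both partial `data` and `errors`):

```python
from typing import Any, Dict


def get_graphql_data(response_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the ``data`` member of a GraphQL response body.

    Hypothetical helper: raises RuntimeError if the body has any
    ``errors``, joining each error's ``message`` for readability.
    """
    errors = response_body.get('errors')
    if errors:
        messages = '; '.join(error.get('message', str(error)) for error in errors)
        raise RuntimeError(f'CMR GraphQL request failed: {messages}')
    # A missing or null ``data`` member is returned as an empty dictionary.
    return response_body.get('data') or {}


# Example usage, with the response from the request above:
# graphql_data = get_graphql_data(cmr_graphql_response.json())
```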

The expected output for a successful request will look as follows, containing the total count of generated UMM-Var records and the UMM-Var JSON for each variable detected in a sample granule from the requested collection. These are contained under the `generateVariableDrafts` field:

```
{
  "data": {
    "collection": {
      "generateVariableDrafts": {
        "count": 16,
        "items": [
          {
            "dataType": "int32",
            "definition": "Grid/time",
            "dimensions": [
              {
                "Name": "Grid/time",
                "Size": 1,
                "Type": "TIME_DIMENSION"
              }
            ],
            "longName": "Grid/time",
            "name": "Grid/time",
            "standardName": "time",
            "units": "seconds since 1970-01-01 00:00:00 UTC",
            "metadataSpecification": {
              "URL": "https://cdn.earthdata.nasa.gov/umm/variable/v1.8.2",
              "Name": "UMM-Var",
              "Version": "1.8.2"
            }
          },
          ...
        ]
      }
    }
  }
}
```
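
The generated records sit several levels deep in the response body. As a sketch, a hypothetical helper `get_variable_drafts` can unpack the structure shown above. It assumes that `collection` is returned as `null` when no collection matches the query parameters:

```python
from typing import Any, Dict, List


def get_variable_drafts(response_body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the generated UMM-Var items from a CMR GraphQL response body.

    Hypothetical helper: assumes the structure
    data -> collection -> generateVariableDrafts -> items.
    """
    collection = (response_body.get('data') or {}).get('collection')
    if collection is None:
        # No collection matched the query parameters.
        return []
    return collection['generateVariableDrafts']['items']


# Example usage, with the `data` dictionary from the cell below:
# for draft in get_variable_drafts(data):
#     print(draft['name'], draft['dataType'], draft['units'])
```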

In [None]:
data = cmr_graphql_response.json()
print(json.dumps(data, indent=2))
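
To review the drafts outside the notebook, each one could be written to its own JSON file. This is a sketch with a hypothetical helper, `save_variable_drafts`, and a hypothetical output directory. Variable names such as `Grid/time` contain `/`, so the helper replaces it with `_` to avoid creating subdirectories (two names differing only in those characters would collide). The files hold the fields exactly as returned by GraphQL, whose key names may differ from those in a published UMM-Var record.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def save_variable_drafts(drafts: List[Dict[str, Any]], output_dir: str) -> List[Path]:
    """Write each generated UMM-Var draft to its own JSON file.

    Hypothetical helper: file names are derived from each draft's ``name``.
    """
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for draft in drafts:
        # 'Grid/time' becomes 'Grid_time.json'.
        file_name = draft['name'].strip('/').replace('/', '_') + '.json'
        path = directory / file_name
        path.write_text(json.dumps(draft, indent=2), encoding='utf-8')
        written.append(path)
    return written


# Example usage, with a hypothetical output directory:
# drafts = data['data']['collection']['generateVariableDrafts']['items']
# save_variable_drafts(drafts, 'umm_var_drafts')
```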