## How to use `earthdata-varinfo` to publish UMM-Var records to CMR
### Overview:

This notebook demonstrates how to create and publish Unified Metadata Model-Variable (UMM-Var) records to NASA's Common Metadata Repository (CMR) with `earthdata-varinfo` >= 2.0.0. `earthdata-varinfo` uses [`python-cmr`](https://github.com/nasa/python_cmr) to query CMR for collection granules, which are then downloaded locally. The `VarInfoFromNetCDF4` class in `earthdata-varinfo` is used to create CMR-compliant UMM-Var entries. Lastly, the `requests` library is used to publish the UMM-Var records to a given CMR environment (`OPS`, `UAT`, or `SIT`).

### Setting up your environment to run this notebook

Create and activate your `pyenv` or conda environment, then:

```
pip install earthdata-varinfo
```

If this doesn't work, you can instead clone the git repository and install the package in editable mode:

```
git clone https://github.com/nasa/earthdata-varinfo
cd earthdata-varinfo
pip install -e .
```
### Other notebook requirements:

When installing `earthdata-varinfo` from PyPI, the required packages should be installed automatically as dependencies.
For local development without a standard pip installation, third-party requirements can be installed from the following files:

```
pip install -r requirements.txt -r dev-requirements.txt
pip install notebook
```
### Example usage:

* [GLDAS_NOAH10_3H](https://cmr.uat.earthdata.nasa.gov/search/collections.umm_json?concept-id=C1256543837-EEDTEST)
* [M2I1NXASM](https://cmr.uat.earthdata.nasa.gov/search/collections.umm_json?concept-id=C1256535511-EEDTEST)

### Authorization:

* Launchpad or EDL tokens must be used in order to query and publish to CMR.
* Authorization headers for EDL tokens contain the header prefix `Bearer` before the token
    * For example: `Bearer <EDL token>`
* Authorization headers for Launchpad tokens do **NOT** contain any prefixes in the header
    * For example: `<Launchpad token>`

To request a Launchpad token, visit:
* [Launchpad Authentication User's Guide](https://wiki.earthdata.nasa.gov/display/CMR/Launchpad+Authentication+User%27s+Guide)
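
The two header formats described above can be captured in a small helper. This is a minimal sketch: `build_auth_header` and its `token_type` argument are hypothetical names used for illustration and are not part of `earthdata-varinfo`.

```python
def build_auth_header(token: str, token_type: str) -> str:
    """Return the Authorization header value for an EDL or Launchpad token.

    Hypothetical helper for illustration; not part of earthdata-varinfo.
    """
    if token_type == 'edl':
        # EDL tokens are prefixed with "Bearer"
        return f'Bearer {token}'
    if token_type == 'launchpad':
        # Launchpad tokens are passed without any prefix
        return token
    raise ValueError(f'Unknown token type: {token_type!r}')


# Placeholder tokens, not real credentials
edl_header = build_auth_header('my-edl-token', 'edl')
launchpad_header = build_auth_header('my-launchpad-token', 'launchpad')
```

Either resulting string could then be used as the `auth_header` value in the cells that follow.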

### Publish UMM-Var records for **GLDAS_NOAH10_3H** with `generate_collection_umm_var`

`generate_collection_umm_var` is a wrapper function that combines the functionality of `varinfo.cmr_search`, the `VarInfoFromNetCDF4` class and `varinfo.umm_var` to create and publish UMM-Var entries to CMR.

Update `auth_header` to include your EDL token (e.g. `Bearer <EDL token>`) or Launchpad token (e.g. `<Launchpad token>`)

In [None]:
auth_header = 'Bearer <EDL token>'  # or '<Launchpad token>' with no prefix

Set `collection_concept_id_gldas` to the **GLDAS_NOAH10_3H** concept-id for the EEDTEST provider.
* This can be updated to any concept-id for any provider

In [None]:
collection_concept_id_gldas = 'C1256543837-EEDTEST'

Import `generate_collection_umm_var` from `varinfo.generate_umm_var`

In [None]:
from varinfo.generate_umm_var import generate_collection_umm_var

`generate_collection_umm_var` will:

* Download the most recent granule for **GLDAS_NOAH10_3H**
* Generate the UMM-Var records for this granule
* Publish these records to CMR if `publish=True`. 
* If `publish=True`, a list is returned containing the ingested variable concept-ids or the error(s) from any unsuccessful ingest:
    * `['V1259971755-EEDTEST', 'V1259971757-EEDTEST', ...]` 
    * `['V1259971755-EEDTEST', '#: CMR error 1\n  #: CMR error 2', ...]`
* If `publish=False` (default) a list of UMM-Var entries is returned:
    * `[...{'Name': 'lat', 'LongName': 'lat', ...}, {'Name': 'time', 'LongName': 'time', ...}...]`

In [None]:
generate_collection_umm_var(collection_concept_id=collection_concept_id_gldas,
                            auth_header=auth_header, publish=True)
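
Because the list returned with `publish=True` can mix concept-ids and error messages, it may help to separate the two before reporting results. The sketch below assumes variable concept-ids follow the `V<digits>-<provider>` form seen in the examples above; `split_publish_results` is a hypothetical helper, not a function provided by `earthdata-varinfo`.

```python
import re

# Assumed form of a variable concept-id: "V", digits, "-", provider name
VARIABLE_CONCEPT_ID = re.compile(r'V\d+-\w+')


def split_publish_results(results):
    """Split a list of publish results into concept-ids and error messages."""
    concept_ids = [item for item in results
                   if VARIABLE_CONCEPT_ID.fullmatch(item)]
    errors = [item for item in results
              if not VARIABLE_CONCEPT_ID.fullmatch(item)]
    return concept_ids, errors


# Example values copied from the return formats listed above
example_results = ['V1259971755-EEDTEST', '#: CMR error 1\n  #: CMR error 2']
concept_ids, errors = split_publish_results(example_results)
print(f'{len(concept_ids)} published, {len(errors)} failed')
```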

### Publishing and creating UMM-Var entries for **M2I1NXASM**:
This example is an alternative to using `generate_collection_umm_var`. It demonstrates the individual components of `generate_collection_umm_var` with:
* `varinfo.cmr_search`: queries CMR for a granule download link and downloads granules locally
* `VarInfoFromNetCDF4`: the `varinfo` class that represents the contents of a NetCDF-4 granule
* `varinfo.umm_var`: contains functions for creating and publishing UMM-Var records to CMR
* `CMR_UAT`: a string constant from `python-cmr` holding the URL of a CMR environment (e.g. https://cmr.uat.earthdata.nasa.gov/search/)

In [None]:
from cmr import CMR_UAT

from varinfo.cmr_search import (get_granules, get_granule_link, 
                                download_granule)

from varinfo import VarInfoFromNetCDF4

from varinfo.umm_var import (get_all_umm_var, get_umm_var, publish_all_umm_var,
                             publish_umm_var)

Set `collection_concept_id_merra` to the **M2I1NXASM** concept-id for the EEDTEST provider
* This can be updated to any concept-id for any provider

In [None]:
collection_concept_id_merra = 'C1256535511-EEDTEST'

Get the granule record and granule download URL with `get_granules` and `get_granule_link`

* `get_granules`: queries `CMR_UAT` (default is `CMR_OPS`) for a UMM-G record (granule record) given a collection or granule concept-id
    * you can query any CMR environment by adding `cmr_env=CMR_UAT` or `cmr_env=CMR_SIT`
* `get_granule_link`: parses the UMM-G record from `get_granules` for a data download URL

In [None]:
granule_response = get_granules(concept_id=collection_concept_id_merra,
                                cmr_env=CMR_UAT,
                                auth_header=auth_header)
url = get_granule_link(granule_response)
url
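
To give a rough idea of what parsing a UMM-G record for a download URL involves, here is a standalone sketch. It is not the implementation of `get_granule_link`: `find_download_url` is a hypothetical function, and the response shape (a list of items, each with a `umm` dictionary containing `RelatedUrls` entries that have `URL` and `Type` fields) is an assumption made for illustration.

```python
def find_download_url(granule_response):
    """Return the first 'GET DATA' URL from UMM-G style records, or None.

    Illustrative only; the structure of `granule_response` is an assumption.
    """
    for item in granule_response:
        for related_url in item.get('umm', {}).get('RelatedUrls', []):
            if related_url.get('Type') == 'GET DATA':
                return related_url.get('URL')
    return None


# Hand-written stand-in for a CMR response; real records have many more fields
example_response = [
    {
        'umm': {
            'RelatedUrls': [
                {'URL': 'https://example.com/about.html',
                 'Type': 'VIEW RELATED INFORMATION'},
                {'URL': 'https://example.com/granule.nc4',
                 'Type': 'GET DATA'},
            ]
        }
    }
]
example_url = find_download_url(example_response)
```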

Download the granule locally with `download_granule`
* Defaults to current directory
* Add optional argument `out_directory=/path/to/save/granule` to save to specified path
* Returns the path the granule was downloaded to (e.g. `/path/granule/was/saved/to`)

In [None]:
download_granule(url, auth_header=auth_header)

Instantiate a `VarInfoFromNetCDF4` object for a local NetCDF-4 file.

In [None]:
var_info = VarInfoFromNetCDF4('MERRA2_400.inst1_2d_asm_Nx.20220130.nc4',
                              short_name='M2I1NXASM')

Retrieve a dictionary of UMM-Var JSON records
* Returns a nested dictionary of UMM-Var records with full variable paths as keys and their UMM-Var records as values
* e.g. `{'/lon': {'Name': 'lon', 'LongName': 'lon', ...}, '/lat': {'Name': 'lat', 'LongName': 'lat', ...}...}`

In [None]:
umm_var_dict = get_all_umm_var(var_info)
umm_var_dict
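
Before publishing, it can be useful to review what the nested dictionary contains. The sketch below uses a hand-written stand-in shaped like the example output above (`example_umm_var_dict` is a hypothetical name, and real records contain more fields), then lists the variable paths and pretty-prints one record.

```python
import json

# Hand-written stand-in shaped like the example output above
example_umm_var_dict = {
    '/lon': {'Name': 'lon', 'LongName': 'lon'},
    '/lat': {'Name': 'lat', 'LongName': 'lat'},
}

# List the variable paths that would be published
variable_paths = sorted(example_umm_var_dict)

# Pretty-print one record to review it before publishing
lat_record_json = json.dumps(example_umm_var_dict['/lat'], indent=2)

print(variable_paths)
print(lat_record_json)
```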

Publish all UMM-Var records for **M2I1NXASM** to CMR_UAT with `publish_all_umm_var`
* Returns a dictionary with full variable paths as keys and variable concept-ids as values.
* Example output: `{'/lon': 'V1259972387-EEDTEST', '/lat': 'V1259972389-EEDTEST'...}`

In [None]:
publish_all_umm_var(collection_concept_id_merra,
                    umm_var_dict,
                    auth_header=auth_header,
                    cmr_env=CMR_UAT)

### Publish one UMM-Var record using `var_info.get_variable()`
This example is another alternative to using `generate_collection_umm_var`. Here we use the granule we have already downloaded locally (**M2I1NXASM**) to create and ingest a single UMM-Var record.
* Use `var_info.get_variable()` to retrieve the variable object from `var_info`
* Variables are identified by their full paths (e.g. `'/TROPPV'`)

In [None]:
variable = var_info.get_variable('/TROPPV')

Check if the variable exists and get a dictionary of the variable's UMM-Var JSON record

In [None]:
if variable is not None:
    umm_var_entry = get_umm_var(var_info, variable)
else:
    # Define the name either way so the final line does not raise a NameError
    umm_var_entry = None
    print('Selected variable was not found in granule')

umm_var_entry

Publish the UMM-Var record for `TROPPV` (from **M2I1NXASM**) to CMR_UAT with `publish_umm_var`
* This will return a variable concept-id (e.g. `'V1259972421-EEDTEST'`)

In [None]:
publish_umm_var(collection_concept_id_merra,
                umm_var_entry,
                auth_header=auth_header,
                cmr_env=CMR_UAT)