# NASA CMR Recipe: GPM IMERG Late Precipitation

This tutorial shows how to use the [pangeo-forge-cmr](https://github.com/yuvipanda/pangeo-forge-cmr) plugin to create a recipe from files cataloged within [NASA's Common Metadata Repository](https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/cmr) (CMR). Using this library allows us to create recipes from a large catalog of archival NASA data.

We will also use the [pangeo-forge-earthdatalogin](https://github.com/yuvipanda/pangeo-forge-earthdatalogin) plugin to handle NASA Earthdata Login credentials.

This tutorial complements the documentation in [pangeo-forge-cmr](https://github.com/yuvipanda/pangeo-forge-cmr) and [pangeo-forge-earthdatalogin](https://github.com/yuvipanda/pangeo-forge-earthdatalogin). Both of these utilities were created by [@yuvipanda](https://github.com/yuvipanda).



## Background

This dataset is stored as NetCDF files and will be written to `Zarr`. The main difference from other recipes is how the file pattern is created. We will use `pangeo-forge-cmr` to generate the file pattern and the `pangeo-forge-earthdatalogin` utility to authenticate. From there on, this tutorial should look similar to other `Xarray-Zarr` tutorials.

The dataset we are looking at is a NASA satellite product of global surface precipitation. 

## Setup NASA Earthdata Credentials

`pangeo-forge-earthdatalogin` is a small utility that helps with authentication against NASA Earthdata. To use it, you will need a NASA Earthdata account and will need to accept the EULA for whichever dataset you plan to access.

[Example EULA information](https://disc.gsfc.nasa.gov/earthdata-eula)

Once you have set up your Earthdata account and accepted any relevant EULAs, the next step is to generate an Earthdata token. There is an excellent guide [here](https://disc.gsfc.nasa.gov/earthdata-eula). One option is then to store this token as an environment variable. Once you have added it to your `bash`/`zsh`/etc. profile, `pangeo-forge-earthdatalogin` should be able to access it and use it in your recipe.
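
As a quick check before running the recipe, the sketch below confirms that a token is visible to Python. The variable name `EARTHDATA_LOGIN_TOKEN` is an assumption made for illustration; confirm the exact name that `pangeo-forge-earthdatalogin` reads in its README.

```python
import os

# Assumed variable name; check the pangeo-forge-earthdatalogin README
# for the name the plugin actually reads.
TOKEN_ENV_VAR = "EARTHDATA_LOGIN_TOKEN"


def earthdata_token_is_set(env_var=TOKEN_ENV_VAR):
    """Return True if the environment variable holds a non-empty token."""
    return bool(os.environ.get(env_var, "").strip())


if earthdata_token_is_set():
    print(f"{TOKEN_ENV_VAR} is set.")
else:
    print(f"{TOKEN_ENV_VAR} is not set; export it in your shell profile first.")
```

The check never prints the token itself, so it is safe to leave in a shared notebook.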


## Examine a Single File
Since we are interested in the GPM IMERG dataset, we can find some information about it on the [NASA GSFC DAAC website](https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDL_06/summary).

![CMR](../../../images/cmr-screenshot.png)


Here we can see the `short_name` for the dataset is `GPM_3IMERGDL` and the current `version` is `06`. We will need this information for `pangeo-forge-cmr` to construct a valid file pattern.
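
To make the role of these two values concrete, here is a minimal sketch of the kind of CMR granule search they parameterize. It only builds the query URL and makes no network request. The helper name `cmr_granule_search_url` and the `page_size` value are illustrative, and this is not necessarily how `pangeo-forge-cmr` issues its queries internally.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def cmr_granule_search_url(short_name, version, page_size=10):
    """Build a CMR granule search URL for one dataset short name and version."""
    params = {
        "short_name": short_name,
        "version": version,
        "page_size": page_size,  # illustrative value, not a recommendation
    }
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


example_url = cmr_granule_search_url("GPM_3IMERGDL", "06")
print(example_url)
```

Opening the printed URL in a browser is a simple way to confirm that the `short_name` and `version` pair matches granules before building the pattern.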




In [None]:
from pangeo_forge_cmr import files_from_cmr, get_cmr_granule_links
from pangeo_forge_earthdatalogin import OpenURLWithEarthDataLogin
import xarray as xr

shortname = 'GPM_3IMERGDL'
version = '06'

In [None]:
urls = get_cmr_granule_links(shortname, version)

### Optional
Uncomment the cell below to open and inspect a single file with `pydap`. This requires your Earthdata username and password.

In [None]:
# import xarray as xr
# from pydap.client import open_url
# from pydap.cas.urs import setup_session

# username = "<earthdata_username>"
# password= "<earthdata_password>"

# url = urls[0]
# session = setup_session(username, password, check_url=url)
# pydap_ds = open_url(url, session=session)

# store = xr.backends.PydapDataStore(pydap_ds)
# ds = xr.open_dataset(store)

## Define File Pattern

Now that we have looked at a single file from the dataset, we can use `pangeo-forge-cmr` to create the file pattern.



In [None]:
pattern = files_from_cmr(
    shortname,
    version,
    nitems_per_file=1,  # each daily granule holds a single time step
    concat_dim='time',  # granules are concatenated along time
)

In [None]:
pattern = pattern.prune()
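
Pruning shortens the pattern so that the recipe can be tested quickly on a handful of files instead of the full archive. As far as we are aware, `FilePattern.prune` in `pangeo-forge-recipes` keeps the first two items along each dimension by default. The sketch below mimics that idea on a plain list of made-up file names; it does not call the library, and `prune_urls` is a hypothetical helper.

```python
def prune_urls(urls, nkeep=2):
    """Keep only the first `nkeep` entries, mimicking a pruned pattern."""
    return list(urls[:nkeep])


# Made-up file names standing in for granule URLs.
example_urls = [f"granule_{i:03d}.nc4" for i in range(5)]
pruned = prune_urls(example_urls)
print(pruned)
```

To process the whole dataset, skip the `prune()` cell.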

## Define the Pipeline
Now that we have the file pattern defined, we can start piecing together the processing pipeline.

In [None]:
import apache_beam as beam
# OpenURLWithEarthDataLogin (imported earlier) takes the place of OpenURLWithFSSpec here
from pangeo_forge_recipes.transforms import OpenWithXarray, StoreToZarr

For this example, let's create a temporary location for the data.

In [None]:
import os
from tempfile import TemporaryDirectory
td = TemporaryDirectory()
target_root = td.name
store_name = "NASA_CMR.zarr"
target_store = os.path.join(target_root, store_name)
target_store

### Assemble the Pipeline

Now we will use the `pattern` created by `pangeo-forge-cmr` as the input to our Beam pipeline. Apart from the authenticated open step, this should be the same as other `Xarray-Zarr` based example pipelines.

In [None]:
transforms = (
    beam.Create(pattern.items())
    | OpenURLWithEarthDataLogin()
    | OpenWithXarray(file_type=pattern.file_type)
    | StoreToZarr(
        store_name=store_name,
        target_root=target_root,
        combine_dims=pattern.combine_dim_keys,
        target_chunks={"time": 1}
    )
)
transforms
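
A `target_chunks` value of `{"time": 1}` stores each daily time step as its own chunk per variable. As a rough size check, the sketch below assumes a 0.1 degree global grid (3600 x 1800 cells) and 32-bit floats. Both are assumptions about this product rather than values read from the files, and `chunk_nbytes` is a hypothetical helper.

```python
# Assumed grid for a 0.1 degree global product: 3600 longitudes x 1800 latitudes.
N_LON = 3600
N_LAT = 1800
BYTES_PER_VALUE = 4  # assumed float32 storage


def chunk_nbytes(time_steps_per_chunk=1):
    """Uncompressed size in bytes of one chunk of a single variable."""
    return time_steps_per_chunk * N_LON * N_LAT * BYTES_PER_VALUE


size = chunk_nbytes(1)
print(f"{size / 2**20:.1f} MiB per chunk before compression")
```

Under these assumptions a single time step is a few tens of MiB uncompressed, so increasing the number of time steps per chunk scales the chunk size linearly.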


In [None]:
with beam.Pipeline() as p:
    p | transforms

## Check the Outputs

In [None]:
ds_target = xr.open_dataset(target_store, engine="zarr", chunks={})
ds_target

In [None]:
ds_target['HQprecipitation'].isel(time=0).plot()