# NASA CMR Recipe: GPM IMERG Late Precipitation

This tutorial shows how to use the library [pangeo-forge-cmr](https://github.com/yuvipanda/pangeo-forge-cmr) to create a recipe from files cataloged within [NASA's Common Metadata Repository](https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/cmr) (CMR). Using this library allows us to create recipes from a large catalog of archival NASA data.



## Background

This dataset is stored as NetCDF files and will be written to Zarr. The only part of this recipe that differs from other `Xarray-Zarr` recipes is the pattern creation: we use the `pangeo-forge-cmr` library to generate the file pattern and to handle the credentials needed to access data across the NASA CMR. From there on, this tutorial should look similar to other `Xarray-Zarr` tutorials.

The dataset we are looking at is a NASA satellite product of global surface precipitation. 

## Examine a Single File
Since we are interested in the GPM IMERG dataset, we can find some basic information about it on the [NASA GSFC DAAC website](https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDL_06/summary).

Here we can see the `short_name` for the dataset is `GPM_3IMERGDL` and the current `version` is `06`. We will need this information for `pangeo-forge-cmr` to construct a valid file pattern.
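For orientation, the sketch below shows how these two identifiers map onto a query against the public CMR granule search endpoint. It only builds the URL and makes no network request. The helper name `build_granule_query` and the `page_size` value are illustrative assumptions, and this is not necessarily how `pangeo-forge-cmr` queries CMR internally.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_query(short_name, version, page_size=10):
    """Build a CMR granule search URL (hypothetical helper, illustration only)."""
    params = {
        "short_name": short_name,
        "version": version,
        "page_size": page_size,  # assumed value; limits results per page
    }
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


query_url = build_granule_query("GPM_3IMERGDL", "06")
print(query_url)
```

Opening the printed URL in a browser is a quick way to confirm that the `short_name` and `version` pair matches granules before building a pattern.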

In [4]:
from pangeo_forge_cmr import files_from_cmr, get_cmr_granule_links

shortname = 'GPM_3IMERGDL'
version = '06'

In [5]:
urls = get_cmr_granule_links(shortname, version)

In [26]:
import xarray as xr
from pydap.client import open_url
from pydap.cas.urs import setup_session

username = "<earthdata_username>"
password = "<earthdata_password>"

url = urls[0]
session = setup_session(username, password, check_url=url)
pydap_ds = open_url(url, session=session)

store = xr.backends.PydapDataStore(pydap_ds)
ds = xr.open_dataset(store)
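Hardcoding Earthdata credentials in a notebook makes them easy to leak. A minimal sketch of one alternative is to read them from environment variables. The variable names `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` and the helper `earthdata_credentials` are assumptions chosen for this example, not names required by `pydap` or `pangeo-forge-cmr`.

```python
import os


def earthdata_credentials(env=None):
    """Return (username, password) from environment variables.

    The variable names are assumptions for this sketch; use whatever
    convention your deployment already follows.
    """
    env = os.environ if env is None else env
    username = env.get("EARTHDATA_USERNAME")
    password = env.get("EARTHDATA_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "Set EARTHDATA_USERNAME and EARTHDATA_PASSWORD before running."
        )
    return username, password


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {"EARTHDATA_USERNAME": "demo_user", "EARTHDATA_PASSWORD": "demo_pass"}
username, password = earthdata_credentials(demo_env)
```

In the notebook, calling `earthdata_credentials()` with no argument would replace the two hardcoded assignments before `setup_session`.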

## Define File Pattern

Now that we have looked at a single file from the dataset, we can use `pangeo-forge-cmr` to create the file pattern.



In [27]:
pattern = files_from_cmr( 
    shortname,
    version, 
    nitems_per_file=1,
    concat_dim='time',  
)

In [None]:
pattern

<FilePattern {'time': 8376}>
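Conceptually, a pattern with a single concat dimension maps each position along that dimension to one file, so `{'time': 8376}` indicates 8376 files along `time`. The plain-Python sketch below illustrates that idea with made-up URLs. It is not the `FilePattern` implementation, and the `example.com` links are placeholders.

```python
# Hypothetical granule URLs; real ones come from get_cmr_granule_links().
urls = [f"https://example.com/granule_{day:03d}.nc4" for day in range(1, 4)]

concat_dim = "time"
nitems_per_file = 1  # each file holds one step along the concat dimension

# Map each position along the concat dimension to the file that provides it.
index_to_url = {(concat_dim, i): url for i, url in enumerate(urls)}

# Total length of the concatenated dimension once all files are combined.
total_time_steps = len(urls) * nitems_per_file

for key, url in index_to_url.items():
    print(key, "->", url)
```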

## Define the Pipeline
Now that we have the file pattern defined, we can start piecing together the processing pipeline.

In [None]:
import apache_beam as beam
from pangeo_forge_recipes.transforms import OpenURLWithFSSpec, OpenWithXarray, StoreToZarr

For this example, let's create a temporary location for the data.

In [None]:
import os
from tempfile import TemporaryDirectory
td = TemporaryDirectory()
target_root = td.name
store_name = "NASA_CMR.zarr"
target_store = os.path.join(target_root, store_name)
target_store

### Assemble the Pipeline

Now we will use the `pattern` created by `pangeo-forge-cmr` as the input to our Beam pipeline. The rest of the pipeline should be the same as in other `Xarray-Zarr` example pipelines.

In [None]:
transforms = (
    beam.Create(pattern.items())
    | OpenWithXarray(file_type=pattern.file_type)
    | StoreToZarr(
        store_name=store_name,
        target_root=target_root,
        combine_dims=pattern.combine_dim_keys,
        target_chunks={"time": 1}
    )
)
transforms
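Setting `target_chunks={"time": 1}` stores one time step per Zarr chunk. The back-of-the-envelope estimate below gives the uncompressed size of one such chunk for a single variable. The grid shape (3600 x 1800, a 0.1° global grid) and the 4-byte `float32` values are assumptions that are not stated in this tutorial; check them against `ds` before relying on the number.

```python
# Assumed grid: 0.1 degree global, 3600 longitudes x 1800 latitudes.
n_lon, n_lat = 3600, 1800
bytes_per_value = 4        # assumed float32 storage
time_steps_per_chunk = 1   # from target_chunks={"time": 1}

# Uncompressed bytes in one chunk of one variable.
chunk_bytes = time_steps_per_chunk * n_lon * n_lat * bytes_per_value
chunk_mb = chunk_bytes / 1e6

print(f"{chunk_mb:.1f} MB per chunk")  # prints "25.9 MB per chunk"
```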

In [None]:
with beam.Pipeline() as p:
    p | transforms

## Check the Outputs

In [None]:
ds_target = xr.open_dataset(target_store, engine="zarr", chunks={})
ds_target

In [None]:
ds_target['HQprecipitation'].isel(time=0).plot()