## Accessing zarr-formatted Daymet data on Azure

The Daymet dataset contains daily minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length at 1 km resolution for North America. The dataset covers the period from January 1, 1980 to December 31, 2019.

The Daymet dataset is maintained at [daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1328](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1328) and mirrored on Azure Open Datasets at [aka.ms/ai4edata-daymet](https://aka.ms/ai4edata-daymet). Azure also provides a cloud-optimized version of the data in [Zarr](https://zarr.readthedocs.io/en/stable/) format, which can be read into an [xarray](http://xarray.pydata.org/en/stable/) [Dataset](http://xarray.pydata.org/en/stable/data-structures.html#dataset). If you just need a subset of the data, we recommend using xarray and Zarr to avoid downloading the full dataset unnecessarily.

The datasets are available in the `daymeteuwest` storage account, in the `daymet-zarr` container. Stores are named according to `daymet-zarr/{frequency}/{region}.zarr`, where frequency is one of `{daily, monthly, annual}` and region is one of `{hi, na, pr}` (for Hawaii, North America, and Puerto Rico, respectively). For example, `daymet-zarr/daily/hi.zarr`.
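The naming scheme above can be captured in a small helper. This is only a sketch: `daymet_zarr_url` is a hypothetical function written for illustration, not part of fsspec, adlfs, or xarray, and it assumes the `az://` URL form used with fsspec in the cells below.

```python
# Hypothetical helper (not part of any library): builds a store URL from the
# naming scheme daymet-zarr/{frequency}/{region}.zarr described above.
FREQUENCIES = ("daily", "monthly", "annual")
REGIONS = ("hi", "na", "pr")


def daymet_zarr_url(frequency, region, container="daymet-zarr"):
    """Return the az:// URL for one Daymet Zarr store."""
    if frequency not in FREQUENCIES:
        raise ValueError(f"frequency must be one of {FREQUENCIES}, got {frequency!r}")
    if region not in REGIONS:
        raise ValueError(f"region must be one of {REGIONS}, got {region!r}")
    return f"az://{container}/{frequency}/{region}.zarr"


print(daymet_zarr_url("daily", "hi"))
```

Validating the two arguments up front turns a typo in the frequency or region into an immediate `ValueError`, rather than a less obvious "not found" error when the store is opened.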

In [None]:
# Standard or standard-ish imports
import warnings
import matplotlib.pyplot as plt

# Less standard, but still pip- or conda-installable
import xarray as xr
import fsspec

# Neither of these is accessed directly, but both need to be installed; they're used
# via fsspec
import adlfs
import zarr

account_name = 'daymeteuwest'
container_name = 'daymet-zarr'

### Load data into an xarray Dataset

We can lazily load the data into an `xarray.Dataset` by creating a zarr store with [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) and then reading it in with xarray. This only reads the metadata, so it's safe to call on a dataset that's larger than memory.

In [None]:
store = fsspec.get_mapper('az://' + container_name + '/monthly/na.zarr', account_name=account_name)
# consolidated=True speeds up reading the metadata
ds = xr.open_zarr(store, consolidated=True)
ds

In [None]:
ds = ds.sel(time=slice('1990-01','2019-12'))

In [None]:
ds.prcp.nbytes/1e9

### Working with the data

Using xarray, we can quickly select subsets of the data, perform an aggregation, and plot the result. For example, we'll plot the average of the maximum temperature for the year 2009.

In [None]:
warnings.simplefilter("ignore", RuntimeWarning)
fig, ax = plt.subplots(figsize=(12, 12))
ds.sel(time="2009")["tmax"].mean(dim="time").plot.imshow(ax=ax, cmap="inferno");

Or we can visualize the time series of the spatially averaged minimum temperature from 2010 through 2019.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
ds.sel(time=slice("2010", "2019"))['tmin'].mean(dim=["x", "y"]).plot(ax=ax);

### Chunking

Each of the datasets is chunked to allow for parallel and out-of-core or distributed processing with [Dask](https://dask.org/). The different frequencies (daily, monthly, annual) are chunked so that each year is in a single chunk. The different regions are also chunked along the `x` and `y` coordinates so that no single chunk is larger than about 250 MB, which is primarily important for the `na` region.

In [None]:
ds['prcp']

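The output shows the array's shape alongside its chunk shape. For a daily store such as `daymet-zarr/daily/hi.zarr`, the `prcp` array has a shape of `(14600, 584, 284)` (40 years of 365 days each), where each chunk is `(365, 584, 284)`, so one year per chunk. Examining the store for monthly North America, we see that each chunk has a size of `(12, 1250, 1250)`.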
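As a rough check on the chunk-size target, the chunk shapes quoted above can be converted to approximate in-memory sizes. This sketch assumes 4-byte (`float32`) values, which is an assumption here; check `ds['prcp'].dtype` for the actual type. The helper names are hypothetical.

```python
import math

BYTES_PER_VALUE = 4  # assumption: values are stored as float32


def chunk_megabytes(chunk_shape, bytes_per_value=BYTES_PER_VALUE):
    """Approximate uncompressed size of one chunk, in MB (1 MB = 1e6 bytes)."""
    return math.prod(chunk_shape) * bytes_per_value / 1e6


def chunk_count(array_shape, chunk_shape):
    """Number of chunks needed to tile an array of the given shape."""
    return math.prod(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))


# Chunk shapes taken from the text above.
print(f"(365, 584, 284):  {chunk_megabytes((365, 584, 284)):.0f} MB per chunk")
print(f"(12, 1250, 1250): {chunk_megabytes((12, 1250, 1250)):.0f} MB per chunk")

# A (14600, 584, 284) array with one year per chunk splits into 40 chunks.
print(chunk_count((14600, 584, 284), (365, 584, 284)))
```

Under the `float32` assumption, the `(365, 584, 284)` chunk comes to roughly 242 MB and the `(12, 1250, 1250)` chunk to 75 MB, both within the limit of about 250 MB.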

In [None]:
na_store = fsspec.get_mapper("az://" + container_name + "/monthly/na.zarr",
                             account_name=account_name)
na = xr.open_zarr(na_store, consolidated=True)
na['prcp']

See the [xarray documentation on Dask](http://xarray.pydata.org/en/stable/dask.html) for more on how xarray uses Dask for parallel computing.