## Accessing ECMWF Open Data – Real Time

The Planetary Computer includes data from the ECMWF Open Data (real time) program. See the [dataset](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) page and the [ECMWF User Guide](https://confluence.ecmwf.int/display/UDOC/ECMWF+Open+Data+-+Real+Time) for more information.

A separate collection exists for each combination of *stream* (or forecasting system) and *type*: `enfo-ep`, `oper-fc`, `scda-fc`, `scwv-fc`, `waef-ef`, `waef-ep`, and `wave-fc`. This notebook focuses on the `wave-fc` collection.

Each item in this collection includes metadata about the forecast that produced that particular dataset. Filter on these values to select the items of interest. For example, we can select data whose forecast step is `0h`, that is, zero hours out.

In [1]:
import cartopy.crs as ccrs
import fsspec
import matplotlib.pyplot as plt
import pystac_client
import planetary_computer
import urllib.request
import xarray as xr

# Register the custom Range codec with numcodecs
from stactools.ecmwf_forecast.range_codec import Range
from numcodecs.registry import register_codec
register_codec(Range)



In [None]:
catalog = pystac_client.Client.open(
    "https://pct-apis-staging.westeurope.cloudapp.azure.com/stac/",
    modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
    collections=["ecmwf-forecast-wave-fc"],
    query={
        "ecmwf:step": {"eq": "0h"},
    },
)
items = search.item_collection()  # get_all_items() is deprecated in recent pystac-client releases
len(items)
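
The `query` argument above is evaluated by the STAC API, not locally. As a rough illustration of what the `eq` filter on `ecmwf:step` does, the sketch below applies the same comparison to plain dictionaries. The sample items and the helper name `filter_by_property` are hypothetical; only the `ecmwf:step` property name comes from the search above.

```python
def filter_by_property(items, key, value):
    """Keep items whose properties[key] equals value.

    A local stand-in for the server-side {"eq": value} query operator.
    """
    return [
        item for item in items
        if item.get("properties", {}).get(key) == value
    ]


# Hypothetical, hand-written items; real items come from the STAC API.
sample_items = [
    {"id": "a", "properties": {"ecmwf:step": "0h"}},
    {"id": "b", "properties": {"ecmwf:step": "3h"}},
    {"id": "c", "properties": {"ecmwf:step": "0h"}},
]

step_zero = filter_by_property(sample_items, "ecmwf:step", "0h")
step_zero_ids = [item["id"] for item in step_zero]
print(step_zero_ids)
```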



We'll select the most recent item, using the item's datetime.

In [None]:
item = max(items, key=lambda item: item.datetime)
item

This STAC item has two assets. One asset is the actual GRIB2 file with the data. The second asset is the "index" file, which contains information about the messages within the GRIB2 file.

In [None]:
url = item.assets["data"].href
url
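
The index asset makes it possible to locate a single message inside the GRIB2 file without reading the whole file. ECMWF documents the index as one JSON object per line, with `_offset` and `_length` giving the byte range of each message. The sketch below parses two hand-written lines in that layout (the values are made up, not taken from a real file) and builds the value of an HTTP `Range` header. `find_byte_range` is a hypothetical helper, not part of any library.

```python
import json

# Two hand-written lines in the documented index layout.
# The offsets and lengths are illustrative only.
sample_index = """\
{"param": "swh", "step": "0", "levtype": "sfc", "_offset": 0, "_length": 1000}
{"param": "mwp", "step": "0", "levtype": "sfc", "_offset": 1000, "_length": 2500}
"""


def find_byte_range(index_text, param):
    """Return (first_byte, last_byte) of the first message matching param."""
    for line in index_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("param") == param:
            start = record["_offset"]
            end = start + record["_length"] - 1  # HTTP byte ranges are inclusive
            return start, end
    raise KeyError(f"no message with param={param!r}")


start, end = find_byte_range(sample_index, "mwp")
range_header = {"Range": f"bytes={start}-{end}"}
print(range_header)
```

A header like this could then be sent with a request for the data asset's URL to fetch only the matching message.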

To open the file with xarray, we can download it locally and open it with `cfgrib`.

In [None]:
filename, _ = urllib.request.urlretrieve(url)
ds = xr.open_dataset(filename, engine="cfgrib")
ds

Instead of downloading the data locally just to take a look at it, we can open it more efficiently using kerchunk indices. [Kerchunk](https://pypi.org/project/kerchunk/) is a Python library that provides a unified way to represent a variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB) through metadata, allowing efficient access to the data from traditional file systems or cloud object storage.

We used the Kerchunk library to generate metadata indices for this STAC collection and stored them in the properties section for each item. Let's take a look.

In [None]:
item.properties['kerchunk_indices']

Kerchunk indices are similar to a Zarr store. We can use fsspec and xarray to open them as a virtual dataset. You'll see it matches the dataset loaded above from the downloaded `.grib2` file.

In [None]:
m = fsspec.get_mapper('reference://', fo=item.properties['kerchunk_indices'])
ds_kerchunk = xr.open_dataset(m, engine='zarr', consolidated=False)
ds_kerchunk
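
Conceptually, a kerchunk reference set maps Zarr keys either to inline content or to a `[url, offset, length]` triple pointing into the original file. The sketch below mimics that lookup using only the standard library, with a temporary local file standing in for the GRIB2 asset. The `resolve` helper and the sample reference set are hypothetical; real references are resolved by fsspec's `reference://` filesystem, as in the cell above.

```python
import os
import tempfile


def resolve(refs, key):
    """Return the bytes that a reference entry points to."""
    ref = refs[key]
    if isinstance(ref, str):
        # Inline content is stored directly in the reference set.
        return ref.encode()
    path, offset, length = ref
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A temporary file stands in for the remote data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERchunk-0chunk-1")
    data_path = tmp.name

# Hypothetical reference set: one inline entry and two byte-range entries.
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "swh/0": [data_path, 6, 7],
    "swh/1": [data_path, 13, 7],
}

group_metadata = resolve(refs, ".zgroup")
first_chunk = resolve(refs, "swh/0")
second_chunk = resolve(refs, "swh/1")
os.remove(data_path)
print(first_chunk, second_chunk)
```

Because each chunk is just a byte range, a reader only needs to fetch the parts of the file that a computation actually touches.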

We can plot the various data variables, for example the significant height of combined wind waves and swell.

In [None]:
projection = ccrs.Robinson()
fig, ax = plt.subplots(figsize=(16, 9), subplot_kw=dict(projection=projection))

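# The data are on a regular latitude/longitude grid, so tell cartopy how to
# transform them into the Robinson projection of the axes.
ds.swh.plot(ax=ax, transform=ccrs.PlateCarree())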
plt.show()

Now let's compare the loading and analysis time when downloading the data against accessing it via kerchunk.

Using the original `.grib2` file, the analysis takes 3 seconds (2.2 seconds to load, under 1 second to calculate the mean of `swh`). Using the kerchunk indices, it takes 1 second (under 1 second to load, under 1 second to calculate the mean of `swh`). Exact timings vary between runs and machines. For more complex analyses, using the kerchunk indices can save considerably more time.

In [None]:
%%time
filename, _ = urllib.request.urlretrieve(url)
ds = xr.open_dataset(filename, engine="cfgrib")

In [None]:
%%time
ds['swh'].mean()

In [None]:
%%time
m = fsspec.get_mapper('reference://', fo=item.properties['kerchunk_indices'])
ds = xr.open_dataset(m, engine='zarr', consolidated=False)

In [None]:
%%time
ds['swh'].mean()
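
`%%time` is an IPython cell magic, so it only works inside a notebook. In a plain script, a small helper built on `time.perf_counter` gives comparable wall-clock numbers. This is a minimal sketch: the helper name `timed` and the stand-in workload are hypothetical, and in practice you would pass in the loading and `mean()` calls from the cells above.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


# Stand-in workload; replace with the dataset loading or mean() call.
total, seconds = timed("sum of 0..999999", sum, range(1_000_000))
```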