## Accessing ECMWF Open Data – Real Time

The Planetary Computer includes data from the ECMWF Open Data (real time) program. See the [dataset](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) page and the [ECMWF User Guide](https://confluence.ecmwf.int/display/UDOC/ECMWF+Open+Data+-+Real+Time) for more details.

A separate collection exists for each combination of *stream* (forecasting system) and *type*: `enfo-ep`, `oper-fc`, `scda-fc`, `scwv-fc`, `waef-ef`, `waef-ep` and `wave-fc`. This notebook focuses on the `wave-fc` collection.
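The collection queried below is `ecmwf-forecast-wave-fc`. Assuming the other collections follow the same `ecmwf-forecast-<stream>-<type>` naming (an inference from that single ID, not something this notebook confirms), their IDs could be built like this:

```python
# Assumption: collection IDs follow "ecmwf-forecast-<stream>-<type>", inferred
# from "ecmwf-forecast-wave-fc". Check the catalog before relying on them.
STREAM_TYPE_PAIRS = [
    ("enfo", "ep"), ("oper", "fc"), ("scda", "fc"), ("scwv", "fc"),
    ("waef", "ef"), ("waef", "ep"), ("wave", "fc"),
]


def collection_id(stream: str, type_: str) -> str:
    """Build a collection ID from a stream (forecasting system) and a type."""
    return f"ecmwf-forecast-{stream}-{type_}"


collection_ids = [collection_id(s, t) for s, t in STREAM_TYPE_PAIRS]
print(collection_ids)
```

With pystac-client, a list like this could be passed as `collections=` to `catalog.search` to query several collections at once.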

Each item in this collection includes metadata describing how that particular dataset was produced. Filter on these properties to select the items of interest. For example, we can select forecasts with a step of `0h`, i.e. zero hours after the forecast's reference time.
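The `eq` query is evaluated server-side by the STAC API. To make its effect concrete, here is a dependency-free sketch that applies the same equality test client-side to STAC item dictionaries; the items and their values are made up for illustration.

```python
# Hypothetical, heavily trimmed STAC item dicts; real items carry many more fields.
items = [
    {"id": "a", "properties": {"datetime": "2023-10-17T12:00:00Z", "ecmwf:step": "0h"}},
    {"id": "b", "properties": {"datetime": "2023-10-18T00:00:00Z", "ecmwf:step": "0h"}},
    {"id": "c", "properties": {"datetime": "2023-10-18T00:00:00Z", "ecmwf:step": "6h"}},
]


def filter_eq(items, prop, value):
    """Keep items whose properties[prop] equals value, like {prop: {"eq": value}}."""
    return [it for it in items if it["properties"].get(prop) == value]


step0 = filter_eq(items, "ecmwf:step", "0h")
print([it["id"] for it in step0])  # ['a', 'b']
```

Filtering on the server is usually preferable, since only matching items are transferred.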

In [1]:
import pystac_client
import planetary_computer

catalog = pystac_client.Client.open(
    "https://pct-apis-staging.westeurope.cloudapp.azure.com/stac/",
    modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
    collections=["ecmwf-forecast-wave-fc"],
    query={
        "ecmwf:step": {"eq": "0h"},
    },
)
items = search.item_collection()
len(items)



1270

We'll select the most recent item, using the item's datetime.

In [2]:
item = max(items, key=lambda item: item.datetime)
item

This STAC item has two assets. One asset is the actual GRIB2 file with the data. The second asset is the "index" file, which contains information about the messages within the GRIB2 file.
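The index file makes it possible to fetch a single GRIB2 message instead of the whole file. A minimal sketch, assuming the index uses ECMWF's JSON-lines layout (one object per message, with `param`, `_offset` and `_length` keys); the sample lines below are made up, and the index asset's key is not shown in this notebook, so check `item.assets` for it.

```python
import json

# Hypothetical sample of an index file: one JSON object per GRIB2 message.
# Assumption: each line has "param", "_offset" and "_length" keys.
SAMPLE_INDEX = """\
{"param": "swh", "step": "0", "_offset": 0, "_length": 1000}
{"param": "mwp", "step": "0", "_offset": 1000, "_length": 200}
"""


def byte_ranges(index_text, param):
    """Return HTTP Range header values for every message whose 'param' matches."""
    ranges = []
    for line in index_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("param") == param:
            start = int(entry["_offset"])
            end = start + int(entry["_length"]) - 1  # Range end is inclusive
            ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(SAMPLE_INDEX, "swh"))  # ['bytes=0-999']
```

One of these values could then be sent as a `Range` header, e.g. `urllib.request.Request(url, headers={"Range": ranges[0]})`, to download just that message.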

In [3]:
url = item.assets["data"].href
url

'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20231018/00z/0p4-beta/wave/20231018000000-0h-wave-fc.grib2?st=2023-10-17T16%3A04%3A46Z&se=2023-10-18T16%3A49%3A46Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-10-18T14%3A28%3A23Z&ske=2023-10-25T14%3A28%3A23Z&sks=b&skv=2021-06-08&sig=1A4WApH0KP9EqDz%2BrNIqMVS%2BSkzVdfS6k/JzepaB7Tk%3D'

To open the file with xarray, we can download it locally and read it with the `cfgrib` engine.

In [4]:
import urllib.request
import xarray as xr

filename, _ = urllib.request.urlretrieve(url)
ds = xr.open_dataset(filename, engine="cfgrib")
ds

We can plot the various data variables, for example the significant height of combined wind waves and swell (`swh`).

In [6]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

projection = ccrs.Robinson()
fig, ax = plt.subplots(figsize=(16, 9), subplot_kw=dict(projection=projection))

ds.swh.plot(ax=ax, transform=ccrs.PlateCarree());


Rather than downloading the whole file to inspect it, we can open it more efficiently using Kerchunk references. [Kerchunk](https://pypi.org/project/kerchunk/) is a Python library that provides a unified way to represent a variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB) through metadata, allowing efficient access to the data from traditional file systems or cloud object storage.
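A Kerchunk reference set maps Zarr keys either to inline bytes or to byte ranges inside the original file. fsspec's `reference://` filesystem does this lookup for you; the sketch below resolves a key by hand to show the idea, assuming the version 1 JSON layout (`{"version": 1, "refs": {...}}`). The reference set and URL are hypothetical.

```python
import base64
import json

# Hypothetical, tiny reference set in the Kerchunk "version 1" layout.
# Real sets are generated by Kerchunk and hold many more keys.
REFS = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "swh/.zarray": json.dumps({"shape": [2, 2], "chunks": [2, 2], "dtype": "<f4"}),
        "swh/0.0": ["https://example.com/file.grib2", 1000, 200],
        "latitude/0": "base64:" + base64.b64encode(b"\x00\x01").decode(),
    },
}


def resolve(refs, key):
    """Describe where the bytes for a Zarr key live.

    Returns ("inline", bytes) for data embedded in the reference set, or
    ("remote", (url, offset, length)) for a byte range in another file.
    """
    table = refs.get("refs", refs)  # version 0 sets are a flat dict
    value = table[key]
    if isinstance(value, str):
        if value.startswith("base64:"):
            return "inline", base64.b64decode(value[len("base64:"):])
        return "inline", value.encode()
    if len(value) == 1:  # [url] means the whole file
        return "remote", (value[0], 0, None)
    url, offset, length = value
    return "remote", (url, offset, length)


variables = sorted({k.split("/")[0] for k in REFS["refs"] if "/" in k})
print(variables)  # ['latitude', 'swh']
print(resolve(REFS, "swh/0.0"))  # ('remote', ('https://example.com/file.grib2', 1000, 200))
```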

We used the Kerchunk library to generate reference indices for this STAC collection and stored them in each item's properties under the `kerchunk_indices` key. Let's take a look.

In [8]:
item

Kerchunk indices behave like a Zarr store whose chunks point into the original file. With fsspec's `reference://` filesystem and xarray's `zarr` engine, we can open them as a virtual dataset. It should match the dataset loaded above from the downloaded .grib2 file.

In [17]:
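# Temporary workaround: some values in the reference set are placeholder
# strings of the form b"RANGE(start,stop,step)". Replace each one with the
# raw bytes of the equivalent np.arange so xarray can decode it.
# Note: this reads every value in the mapping, including data chunks.
import numpy as np

def decode_range(m):
    for key in list(m):
        value = m[key]  # fetch once per key
        if b'RANGE' in value:
            args = [float(i) for i in value.decode()[len('RANGE('):-1].split(',')]
            m[key] = np.arange(*args).tobytes()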
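The `decode_range` helper above rewrites placeholder values in place. The per-value step can be tried on its own, without fsspec, as a standalone sketch; the placeholder format is inferred from the helper's `[6:-1]` slicing, and float64 output (what `np.arange` produces for float arguments) is assumed to match the dtype the Zarr metadata expects.

```python
import numpy as np


def decode_range_value(raw: bytes) -> bytes:
    """Turn a placeholder like b"RANGE(0,10,2.5)" into the bytes of np.arange(0, 10, 2.5)."""
    text = raw.decode()
    if not (text.startswith("RANGE(") and text.endswith(")")):
        raise ValueError(f"not a RANGE placeholder: {text!r}")
    args = [float(v) for v in text[len("RANGE("):-1].split(",")]
    return np.arange(*args).tobytes()


# Hypothetical placeholder, e.g. for a coordinate array.
decoded = np.frombuffer(decode_range_value(b"RANGE(0,10,2.5)"), dtype=np.float64)
print(decoded)
```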

In [20]:
import fsspec
import xarray as xr

m = fsspec.get_mapper('reference://', fo=item.properties['kerchunk_indices'])
decode_range(m)
ds = xr.open_dataset(m, engine='zarr', consolidated=False)
ds

Now let's compare load and analysis times when downloading the GRIB2 file versus accessing it through the Kerchunk indices.

In the run below, the analysis using the original .grib2 file took about 3 seconds of wall time (2.2 seconds to download and open the file, 0.83 seconds to compute the mean of `swh`). The same analysis using the Kerchunk indices took about 1 second (0.86 seconds to open, 0.12 seconds to compute the mean). Timings depend on network conditions, so treat them as indicative; for more complex analyses that touch only part of a large file, reading just the referenced byte ranges can save considerably more time.
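`%%time` is an IPython cell magic, so it is not available in a plain Python script. A minimal sketch of an equivalent using `time.perf_counter`; the two timed blocks are stand-ins for opening the dataset and computing `ds["swh"].mean()`, and the `timer` helper is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results):
    """Record the wall-clock seconds spent in the enclosed block as results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timer("load", timings):
    data = list(range(100_000))  # stand-in for opening the dataset
with timer("mean", timings):
    mean = sum(data) / len(data)  # stand-in for ds["swh"].mean()

total = sum(timings.values())
print(f"total {total:.3f} s", timings)
```

Using a context manager keeps the timing code out of the analysis itself, and the `finally` clause still records a duration if the timed block raises.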

In [28]:
%%time

filename, _ = urllib.request.urlretrieve(url)
ds = xr.open_dataset(filename, engine="cfgrib")

CPU times: user 774 ms, sys: 67.9 ms, total: 842 ms
Wall time: 2.2 s


In [32]:
%%time

ds['swh'].mean()

CPU times: user 19.5 ms, sys: 10.7 ms, total: 30.1 ms
Wall time: 827 ms


In [33]:
%%time

m = fsspec.get_mapper('reference://', fo=item.properties['kerchunk_indices'])
decode_range(m)
ds = xr.open_dataset(m, engine='zarr', consolidated=False)

CPU times: user 45.6 ms, sys: 0 ns, total: 45.6 ms
Wall time: 856 ms


In [34]:
%%time

ds['swh'].mean()

CPU times: user 13.1 ms, sys: 10.3 ms, total: 23.4 ms
Wall time: 116 ms
