# Using Microsoft Planetary Computer for Large-scale Geospatial and Weather Data Analytics 

### Prerequisites

You will need to have access to Microsoft Planetary Computer. You can apply for an account [here](https://planetarycomputer.microsoft.com/account/request).


Useful links:
* [Guide: Connecting to Planetary Computer using VSCode](https://planetarycomputer.microsoft.com/docs/overview/ui-vscode/)
* [Token generator on Planetary Computer JupyterHub](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/token)


## A quick check on the JupyterHub server instance

Note that even the single server instance is a relatively powerful machine.

In [3]:
# get the number of cores in the server
!cat /proc/cpuinfo | grep 'model name' | uniq
!cat /proc/cpuinfo | grep processor | wc -l
# get the amount of memory (GB) in the server
!free -h | awk 'NR==2{print $2}'
# get the GPU type
!nvidia-smi -L

model name	: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
8
62Gi
/bin/bash: line 1: nvidia-smi: command not found


## Task 1: Calculate the climatological monthly mean for surface winds (1990-2020) using ERA-5 data

Traditionally, we would need to download this data from the [ECMWF MARS archive](https://www.ecmwf.int/en/forecasts/access-forecasts/access-archive-datasets) and then process it locally. The download alone can take a long time, typically on the order of weeks.

With Planetary Computer, we use the cloud compute cluster to process the data in place and get the results in a matter of minutes!

### Step 1: Search the STAC data catalogue

* We search the STAC catalogue for [ERA-5 PDS](https://planetarycomputer.microsoft.com/dataset/era5-pds#overview) data hosted on Planetary Computer.
* If you are not familiar with STAC catalogues, you may refer to this [tutorial by Planet Labs](https://developers.planet.com/docs/planetschool/introduction-to-stac-part-1-an-overview-of-the-specification/).

In [63]:
import pystac_client
import planetary_computer
import xarray as xr

In [64]:
# Open a STAC catalog

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

search = catalog.search(collections=["era5-pds"],
                        bbox=[90, -12, 130, 25], # ASEAN region
                        datetime="1990-01-01/2020-12-31", # look for climatological period 1990-2020
                        query={"era5:kind": {"eq": "an"}} # use only analysis data
                    )
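
For readers new to STAC, it may help to see roughly what the search above amounts to on the wire. The sketch below builds the JSON body of a STAC API item search by hand, assuming the standard `collections`, `bbox`, `datetime` and `query` fields. `build_search_body` is a hypothetical helper written for this illustration, and `pystac_client` may add further fields (such as paging limits) to the real request.

```python
import json


def build_search_body(collections, bbox, datetime_range, query=None):
    """Assemble a STAC item-search body (hypothetical helper, for illustration only)."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be ordered as [west, south, east, north]")
    body = {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": datetime_range,  # closed interval written as "start/end"
    }
    if query:
        body["query"] = query  # property filters from the STAC query extension
    return body


body = build_search_body(
    collections=["era5-pds"],
    bbox=[90, -12, 130, 25],
    datetime_range="1990-01-01/2020-12-31",
    query={"era5:kind": {"eq": "an"}},
)
print(json.dumps(body, indent=2))
```

A common pitfall is the bounding-box order: it is `[min lon, min lat, max lon, max lat]`, not latitude first.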

In [None]:
items = search.get_all_items()
print (f"Found {len(items)} items")

### Step 2: Inspect the content of the item 

Here we can see the metadata of the item. 

We can also see that the data is stored in the [Zarr](https://zarr.readthedocs.io/en/stable/) format.

In [None]:
items[0].assets

In [None]:
items[0].to_dict()

### Step 3: "Load" the data

We will load the `u` and `v` data into a Dask-backed xarray dataset. I will use the winds at a height of 100 m for this example.

We will see that this is a lazy operation, meaning that the data is not loaded into memory yet.

In [None]:
vars = [
    'northward_wind_at_100_metres',
    'eastward_wind_at_100_metres',
]

In [None]:
datasets = [xr.open_dataset(item.assets[var].href, 
                            **item.assets[var].extra_fields["xarray:open_kwargs"]
                            ) for item in items for var in vars]

In [None]:
datasets[0], datasets[-1]

This step only builds a Python list of lazily opened xarray datasets. We can now combine the individual datasets into a single dataset.

In [None]:
era_ds = xr.combine_by_coords(datasets)

In [None]:
# Perform a geographical bbox selection in xarray
era_ds = era_ds.sel(lon=slice(90, 130), lat=slice(25, -12))

In [None]:
# check total size of dataset and round to nearest 1 decimal
print(f"Total size of dataset: {era_ds.nbytes / 1e9:.1f} GB")

You can see that we need to load about 52 GB of data for our task. If we were to download this data, it would take at least a few hours. Moreover, the data may not be sliceable on the server side, so in some cases you would have to download the entire dataset.
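
The 52 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes hourly data on a 0.25° grid stored as 4-byte floats, and a hypothetical 50 Mbit/s download link. None of these values are read from the dataset metadata here, so treat the result as an estimate only.

```python
from datetime import date

# Assumptions (not read from the dataset): hourly steps, 0.25 degree grid, float32 values.
GRID_STEP_DEG = 0.25
BYTES_PER_VALUE = 4
N_VARIABLES = 2  # eastward and northward wind at 100 m

# Hours from 1990-01-01 through 2020-12-31 inclusive.
n_hours = (date(2021, 1, 1) - date(1990, 1, 1)).days * 24

# Grid points inside the bounding box, counting both edges.
n_lon = int((130 - 90) / GRID_STEP_DEG) + 1
n_lat = int((25 - (-12)) / GRID_STEP_DEG) + 1

size_bytes = n_hours * n_lat * n_lon * BYTES_PER_VALUE * N_VARIABLES
size_gb = size_bytes / 1e9

# Transfer time over a hypothetical 50 Mbit/s connection.
LINK_MBIT_PER_S = 50
download_hours = size_bytes * 8 / (LINK_MBIT_PER_S * 1e6) / 3600

print(f"{n_hours} hourly steps on a {n_lat} x {n_lon} grid")
print(f"Estimated size: {size_gb:.1f} GB")
print(f"Estimated download time: {download_hours:.1f} h at {LINK_MBIT_PER_S} Mbit/s")
```

Under these assumptions the estimate lands close to the size reported above, which suggests the bulk of the volume comes from the hourly time axis rather than the spatial extent.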

In the next step, we will see how to set up a Dask cluster to process this data on Planetary Computer.


### Cluster Limits
There are a few restrictions on the size of the Dask Clusters you can create.

* The maximum number of cores per worker is 8, and the maximum amount of memory per worker is 64 GiB. This ensures that the workers fit in the Standard_E8_v3 Virtual Machines used for workers.

* The maximum number of cores per cluster is 400

* The maximum amount of memory per cluster is 3200 GiB

* The maximum number of workers per cluster is 400

With the default settings of 1 core and 8 GiB per worker, this means a limit of 400 workers on 50 physical nodes (each with 8 cores and 64 GiB of memory).

You can read more about the maximum size of the Dask cluster [here](https://planetarycomputer.microsoft.com/docs/overview/environment/).
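
The limits above interact, so it can be handy to work out the largest cluster a given worker size allows. The following is a minimal sketch that encodes only the numbers quoted in this section. `WorkerSpec` and `max_workers` are hypothetical names for this example and are not part of `dask_gateway`.

```python
from dataclasses import dataclass

# Limits quoted in this section.
MAX_CORES_PER_WORKER = 8
MAX_MEMORY_PER_WORKER_GIB = 64
MAX_CLUSTER_CORES = 400
MAX_CLUSTER_MEMORY_GIB = 3200
MAX_WORKERS = 400


@dataclass(frozen=True)
class WorkerSpec:
    cores: int
    memory_gib: int


def max_workers(spec: WorkerSpec) -> int:
    """Largest worker count that respects every limit listed above."""
    if spec.cores > MAX_CORES_PER_WORKER or spec.memory_gib > MAX_MEMORY_PER_WORKER_GIB:
        raise ValueError("worker does not fit on a single node")
    return min(
        MAX_WORKERS,
        MAX_CLUSTER_CORES // spec.cores,
        MAX_CLUSTER_MEMORY_GIB // spec.memory_gib,
    )


default = WorkerSpec(cores=1, memory_gib=8)
workers_default = max_workers(default)
nodes_default = workers_default // MAX_CORES_PER_WORKER  # 8 one-core workers fit on a node
workers_4core = max_workers(WorkerSpec(cores=4, memory_gib=8))

print(f"default workers: {workers_default} on {nodes_default} nodes")
print(f"4-core, 8 GiB workers: {workers_4core}")
```

With larger workers the per-cluster core limit is usually reached before the worker-count limit, so asking for more cores per worker reduces the number of workers you can get.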

### Step 4: Setup Dask cluster

We will now spin up a Dask cluster. Dask is a distributed computing framework that allows us to process large datasets in parallel.

To learn more about Dask, you can refer to the [Dask documentation](https://docs.dask.org/en/latest/).

In [None]:
import dask_gateway

gateway = dask_gateway.Gateway()
cluster_options = gateway.cluster_options()

In [None]:
cluster_options['worker_cores'] = 4 # number of cores per worker
cluster_options['worker_memory'] = 8 # memory per worker (GiB)
cluster_options['gpu'] = False # GPU workers seem hard to obtain, so I advise leaving this as False in most cases
# cluster_options["environment"] = {
#    "DASK_DISTRIBUTED__WORKERS__RESOURCES__GPU": "1",
# }

cluster = gateway.new_cluster(cluster_options)
client = cluster.get_client()
cluster.adapt(minimum=20, maximum=50)

In [None]:
client

In [None]:
#client
#cluster.shutdown()
#client.shutdown()

### Step 5: Compute the climatological monthly mean

Here we will do a lazy evaluation of the monthly mean of the winds.

In [None]:
res = era_ds.groupby('time.month').mean('time')
res
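
To make the grouping concrete, the following standard-library sketch computes a monthly climatology for a synthetic daily series. It only illustrates what `groupby('time.month').mean('time')` produces: one value per calendar month, pooled across all years. The data and the `monthly_climatology` helper are made up for this example; xarray does the equivalent for every grid point.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import fmean


def monthly_climatology(samples):
    """Average (date, value) pairs by calendar month, pooling all years together."""
    by_month = defaultdict(list)
    for day, value in samples:
        by_month[day.month].append(value)
    return {month: fmean(values) for month, values in sorted(by_month.items())}


# Synthetic daily series covering 1990-01-01 to 1992-12-31 (1096 days).
start = date(1990, 1, 1)
days = [start + timedelta(days=i) for i in range(1096)]
# The made-up value depends only on the month, so each monthly mean is easy to check.
samples = [(day, 10.0 + day.month) for day in days]

climatology = monthly_climatology(samples)
for month, mean_value in climatology.items():
    print(month, mean_value)
```

Note that the result has 12 entries regardless of how many years go in, which is why the output of this step is so much smaller than the input.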

At this point nothing has actually been computed. The computation is triggered in the next step by calling the `compute()` method.

While this is executing, you can take a look at the Dask dashboard to see the progress of the computation.

In my case, I managed to get a cluster with 200 GB of RAM and 75 cores.

![Dask dashboard](dask-dashboard.png)

In [None]:
res = res.compute()  # keep the result so that later steps do not repeat the computation
res

Now we will rechunk the data into chunks of size `12, 20, 20` along the month, latitude and longitude dimensions respectively.

We will save this `.zarr` file to the Azure blob storage.

In [None]:
res_rechunk = res.chunk({'lat': 20, 'lon': 20, 'month': 12})
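
To get a feel for what this chunking implies, the sketch below counts the resulting chunks and their size. It assumes a 149 x 161 latitude/longitude grid (the bounding box at 0.25° spacing) and 4-byte floats. Both are assumptions, so check them against `res.sizes` before relying on the numbers.

```python
from math import ceil, prod

# Assumed dimension sizes (check res.sizes for the real ones).
dim_sizes = {"month": 12, "lat": 149, "lon": 161}
chunk_sizes = {"month": 12, "lat": 20, "lon": 20}
BYTES_PER_VALUE = 4  # assumes float32 values

# Number of chunks along each dimension, rounding up for the partial chunk at the end.
chunks_per_dim = {dim: ceil(dim_sizes[dim] / chunk_sizes[dim]) for dim in dim_sizes}
n_chunks = prod(chunks_per_dim.values())
full_chunk_bytes = prod(chunk_sizes.values()) * BYTES_PER_VALUE

# The last chunk along a dimension is smaller when the size is not a multiple of the chunk size.
last_chunk = {
    dim: dim_sizes[dim] - (chunks_per_dim[dim] - 1) * chunk_sizes[dim]
    for dim in dim_sizes
}

print(f"chunks per dimension: {chunks_per_dim}, total per variable: {n_chunks}")
print(f"full chunk size: {full_chunk_bytes / 1e3:.1f} kB")
print(f"size of the last chunk along each dimension: {last_chunk}")
```

Each variable would then be written as a few dozen small objects, which should be manageable for a result of this size.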

In [6]:
import xarray as xr
from adlfs import AzureBlobFileSystem

# Create an Azure Blob Storage file system
fs = AzureBlobFileSystem(account_name='songhanplanetarycomputer', 
                         account_key=None, 
                         sas_token="""sp=r&st=2023-12-16T07:11:00Z&se=2023-12-16T15:11:00Z&spr=https&sv=2022-11-02&sr=c&sig=mYN6EMf6eJpiwe1BVbNWiMpM8M34bMfnJ1UzoBcCPV4%3D""")

store = fs.get_mapper('scratch/mydata.zarr')
res_rechunk.to_zarr(store, mode='w-')

NameError: name 'res_rechunk' is not defined
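
Before writing to blob storage, it is worth checking what the SAS token actually permits. The sketch below parses the token fields with the standard library, with the signature left out on purpose. In Azure's SAS scheme `sp` lists the permissions and `r` means read, so a token with `sp=r` would most likely be rejected when `to_zarr` tries to write. The variable names here are illustrative only.

```python
from datetime import datetime
from urllib.parse import parse_qs

# Same fields as the token used above, without the signature.
sas_token = "sp=r&st=2023-12-16T07:11:00Z&se=2023-12-16T15:11:00Z&spr=https&sv=2022-11-02&sr=c"

fields = {key: values[0] for key, values in parse_qs(sas_token).items()}

permissions = set(fields["sp"])  # one letter per permission, e.g. r=read, w=write, l=list
starts = datetime.strptime(fields["st"], "%Y-%m-%dT%H:%M:%SZ")
expires = datetime.strptime(fields["se"], "%Y-%m-%dT%H:%M:%SZ")
lifetime = expires - starts

can_write = "w" in permissions  # 'w' is the write permission in Azure SAS tokens

print("permissions:", sorted(permissions))
print("valid for:", lifetime)
print("can write:", can_write)
```

For the write step, a token generated with write (and typically create and list) permissions would be needed. It is also safer to read the token from an environment variable than to paste it into the notebook.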

### Step 6: Visualise the results

In our final step, we will visualise the results using the `cartopy` library.

In [48]:
# read data from azure blob storage
import xarray as xr
from adlfs import AzureBlobFileSystem

# Create an Azure Blob Storage file system
fs = AzureBlobFileSystem(account_name='songhanplanetarycomputer', 
                         account_key=None, 
                         sas_token="""sp=r&st=2023-12-16T07:11:00Z&se=2023-12-16T15:11:00Z&spr=https&sv=2022-11-02&sr=c&sig=mYN6EMf6eJpiwe1BVbNWiMpM8M34bMfnJ1UzoBcCPV4%3D""")

store = fs.get_mapper('scratch/mydata.zarr')  # same path that was written in the previous step
ds = xr.open_zarr(store, consolidated=True)

In [49]:
ds.load()

In [50]:
# convert units m/s to knots
ds = ds * 1.94384

In [55]:
import numpy as np

# wind speed is the magnitude of the (u, v) wind vector; xarray keeps the dimensions and coordinates
ds['wind_speed'] = np.sqrt(ds.eastward_wind_at_100_metres**2 + ds.northward_wind_at_100_metres**2)
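
As a quick check of the arithmetic, here is a scalar version of the wind speed calculation and the unit conversion. The component values are made up, and `wind_speed` is a hypothetical helper for this example.

```python
import math

MS_TO_KNOTS = 1.94384  # same conversion factor as used above


def wind_speed(u, v):
    """Magnitude of the horizontal wind vector from its eastward (u) and northward (v) parts."""
    return math.hypot(u, v)


u_ms, v_ms = 3.0, 4.0  # made-up wind components in m/s
speed_ms = wind_speed(u_ms, v_ms)
speed_kt = speed_ms * MS_TO_KNOTS

# Converting the components first and taking the magnitude afterwards gives the same result,
# which is why the whole dataset can be converted to knots before computing the speed.
speed_kt_alt = wind_speed(u_ms * MS_TO_KNOTS, v_ms * MS_TO_KNOTS)

print(f"{speed_ms:.1f} m/s = {speed_kt:.2f} kt")
```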

In [56]:
ds

In [62]:
import numpy as np
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import ipywidgets as widgets

n = 5  # Change this to space out your wind barbs more or less
ds_subset = ds.isel(lat=slice(None, None, n), lon=slice(None, None, n))

def plot_time(time):
    fig = plt.figure(figsize=(10, 10))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    ax.set_extent([90, 130, -12, 25])
    _ = ax.contourf(ds.lon, 
                 ds.lat, 
                 ds.wind_speed.sel(month=time),
                 transform=ccrs.PlateCarree())
    _ = ax.barbs(ds_subset.lon, 
                 ds_subset.lat, 
                 ds_subset.eastward_wind_at_100_metres.sel(month=time), 
                 ds_subset.northward_wind_at_100_metres.sel(month=time), 
                 length = 4,
                 transform=ccrs.PlateCarree())
    

    plt.show()

# Create a time slider and use it to interactively plot the data
_ = widgets.interact(plot_time, time=widgets.SelectionSlider(options=ds.month.values))


## Task 2: Training an ML model to predict daily rainfall data in Singapore using operational ECMWF data

In the next task, we will try something harder: training an ML model on ECMWF data to predict daily rainfall in Singapore.

Well, after a quick search through the Microsoft Planetary Computer catalogue, the closest data we can find is the [ECMWF Real-Time Data](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast).

Unfortunately, the data is stored in the `.grib` format. This is not a cloud-native storage format, and you can read more about the challenges [here](https://tomaugspurger.net/noaa-nwm/02-problems.html).

I'll do a quick demonstration to show why.

In [81]:
# Open a STAC catalog

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

search = catalog.search(collections=["ecmwf-forecast"],
                        bbox=[90, -12, 130, 25], # ASEAN region
                        datetime="2022-01-01/2022-12-31", 
                        query={
                            "ecmwf:type": {"eq": "fc"},
                            "ecmwf:stream": {"eq": "oper"},
                            "ecmwf:step": {"eq": "24h"},
                        }
                    )

In [83]:
items = search.get_all_items()
items

0
id: ecmwf-2022-12-30T12-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-31T12:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-31T12:00:00Z
ecmwf:reference_datetime: 2022-12-30T12:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-30T12-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-30T12-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-30T12-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-30T00-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-31T00:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-31T00:00:00Z
ecmwf:reference_datetime: 2022-12-30T00:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/00z/0p4-beta/oper/20221230000000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-30T00-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/00z/0p4-beta/oper/20221230000000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-30T00-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-30T00-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-29T12-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-30T12:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-30T12:00:00Z
ecmwf:reference_datetime: 2022-12-29T12:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221229/12z/0p4-beta/oper/20221229120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-29T12-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221229/12z/0p4-beta/oper/20221229120000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-29T12-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-29T12-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-29T00-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-30T00:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-30T00:00:00Z
ecmwf:reference_datetime: 2022-12-29T00:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221229/00z/0p4-beta/oper/20221229000000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-29T00-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221229/00z/0p4-beta/oper/20221229000000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-29T00-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-29T00-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-28T12-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-29T12:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-29T12:00:00Z
ecmwf:reference_datetime: 2022-12-28T12:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221228/12z/0p4-beta/oper/20221228120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-28T12-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221228/12z/0p4-beta/oper/20221228120000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-28T12-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-28T12-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-28T00-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-29T00:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-29T00:00:00Z
ecmwf:reference_datetime: 2022-12-28T00:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221228/00z/0p4-beta/oper/20221228000000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-28T00-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221228/00z/0p4-beta/oper/20221228000000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-28T00-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-28T00-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-27T12-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-28T12:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-28T12:00:00Z
ecmwf:reference_datetime: 2022-12-27T12:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221227/12z/0p4-beta/oper/20221227120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-27T12-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221227/12z/0p4-beta/oper/20221227120000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-27T12-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-27T12-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-27T00-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-28T00:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-28T00:00:00Z
ecmwf:reference_datetime: 2022-12-27T00:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221227/00z/0p4-beta/oper/20221227000000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-27T00-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221227/00z/0p4-beta/oper/20221227000000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-27T00-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-27T00-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-26T12-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-27T12:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-27T12:00:00Z
ecmwf:reference_datetime: 2022-12-26T12:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221226/12z/0p4-beta/oper/20221226120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-26T12-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221226/12z/0p4-beta/oper/20221226120000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-26T12-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-26T12-oper-fc-24h
type: application/geo+json

0
id: ecmwf-2022-12-26T00-oper-fc-24h
"bbox: [-180.0, -90.0, 180.0, 90.0]"
datetime: 2022-12-27T00:00:00Z
ecmwf:step: 24h
ecmwf:type: fc
ecmwf:stream: oper
ecmwf:forecast_datetime: 2022-12-27T00:00:00Z
ecmwf:reference_datetime: 2022-12-26T00:00:00Z

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221226/00z/0p4-beta/oper/20221226000000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/wmo-GRIB2
roles: ['data']
owner: ecmwf-2022-12-26T00-oper-fc-24h

0
href: https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221226/00z/0p4-beta/oper/20221226000000-24h-oper-fc.index?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D
type: application/x-ndjson
roles: ['index']
owner: ecmwf-2022-12-26T00-oper-fc-24h

0
rel: collection
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: parent
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast
type: application/json

0
rel: root
href: https://planetarycomputer.microsoft.com/api/stac/v1
type: application/json
title: Microsoft Planetary Computer STAC API

0
rel: self
href: https://planetarycomputer.microsoft.com/api/stac/v1/collections/ecmwf-forecast/items/ecmwf-2022-12-26T00-oper-fc-24h
type: application/geo+json


In [95]:
# Reading GRIB data directly from blob storage does not work, as the error output shows
try:
    xr.open_dataset(items[0].assets['data'].href, engine='cfgrib')
except Exception as e:
    print(e)

Can't create file 'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D.923a8.idx'
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.11/site-packages/cfgrib/messages.py", line 534, in from_indexpath_or_filestream
    with compat_create_exclusive(indexpath) as new_index_file:
  File "/srv/conda/envs/notebook/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.11/site-packages/cfgrib/messages.py", line 500, in compat_create_exclusive
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
         ^^^^^^^^

[Errno 2] No such file or directory: 'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.grib2?st=2023-12-15T08%3A10%3A09Z&se=2023-12-23T08%3A10%3A09Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-12-16T08%3A10%3A08Z&ske=2023-12-23T08%3A10%3A08Z&sks=b&skv=2021-06-08&sig=qZMWbo8JkcV9imP2NffDZth3xz4F8rx%2B7NEJ6oQmAuY%3D'
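
The error arises because `cfgrib` treats the URL as a local path and tries to create an index file next to it. Each item also ships an `index` asset (`application/x-ndjson`), which hints at a way around this: fetch only the bytes of the wanted field with an HTTP range request. The sketch below parses such an index and builds the `Range` header. It assumes each line is a JSON object with `param`, `_offset` and `_length` keys (the layout used for ECMWF open data, as far as I know), and the sample lines are invented for illustration.

```python
import json

# Invented index lines; a real index would be downloaded from the item's "index" asset.
SAMPLE_INDEX = """\
{"param": "2t", "levtype": "sfc", "step": "24", "_offset": 0, "_length": 609046}
{"param": "tp", "levtype": "sfc", "step": "24", "_offset": 609046, "_length": 524288}
{"param": "10u", "levtype": "sfc", "step": "24", "_offset": 1133334, "_length": 598120}
"""


def parse_index(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def range_header(entry):
    """HTTP Range header covering exactly one GRIB message (byte ranges are inclusive)."""
    start = entry["_offset"]
    end = start + entry["_length"] - 1
    return {"Range": f"bytes={start}-{end}"}


entries = parse_index(SAMPLE_INDEX)
precip = next(entry for entry in entries if entry["param"] == "tp")
header = range_header(precip)
print(header)
```

The bytes returned for such a request could then be written to a local file and opened with `cfgrib`, which would avoid pulling the whole GRIB file for a single variable.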


### Step 1: Use ECMWF data from Google Cloud Bucket

Thankfully, Google is currently working on [Analysis-Ready, Cloud Optimized ERA-5 and ECMWF Operational Data](https://github.com/google-research/arco-era5).

In [96]:
import fsspec

fs = fsspec.filesystem('gs')
fs.ls('gs://gcp-public-data-arco-era5/ar/')

['gcp-public-data-arco-era5/ar/1959-2022-1h-240x121_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-128x64_equiangular_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-128x64_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-1440x721.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-512x256_equiangular_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-64x32_equiangular_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-64x32_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2',
 'gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg-chunk-1.zarr-v2',
 'gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_der

In [None]:
import xarray as xr

ec_ds = xr.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/', 
    chunks={'time': 100, 'latitude': 1000, 'longitude': 1000},
    consolidated=True,
)


In [None]:
# select the time slice between 2019-01-01 and 2019-02-01
# (no spatial subsetting yet, so this is still the global field)
u10 = ec_ds['10m_u_component_of_wind'].sel(
    time=slice('2019-01-01', '2019-02-01'))

In [None]:
print (u10)

In [None]:
ds = u10.mean(dim='time').compute()
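
The mean above has to read every hourly field in the selected window, so it helps to estimate how much data that is. The sketch below assumes a global 0.25° grid of 721 x 1440 points and 4-byte floats, and that the label-based time slice includes the whole of the end day. These are assumptions for a rough estimate, not values read from the store.

```python
from datetime import date

# Assumptions: global 0.25 degree grid, float32 values, hourly steps.
N_LAT, N_LON = 721, 1440
BYTES_PER_VALUE = 4

# slice('2019-01-01', '2019-02-01') is assumed to include all 24 hours of the end day.
n_days = (date(2019, 2, 1) - date(2019, 1, 1)).days + 1
n_steps = n_days * 24

size_bytes = n_steps * N_LAT * N_LON * BYTES_PER_VALUE
size_gb = size_bytes / 1e9

print(f"{n_steps} hourly fields, about {size_gb:.2f} GB uncompressed for one variable")
```

Selecting a region around Singapore before calling `compute()` would cut this down considerably.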

In [None]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

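plt.figure(figsize=(12, 6))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
# longitude and latitude have different lengths, so draw the 2D field as a mesh rather than a scatter
ds.plot(ax=ax, transform=ccrs.PlateCarree())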

In [None]:
cluster.close()

In [None]:
# plot a 2d contour map using cartopy

import cartopy.crs as ccrs
import matplotlib.pyplot as plt

ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.contourf(ds.longitude, ds.latitude, ds.values)
plt.show()


In [None]:
print(f'size: {ec_ds.nbytes / (1024 ** 4)} TiB')


In [None]:
cluster.shutdown()
client.close()

In [None]:
cluster.close()

In [None]:
import dask_gateway

gateway = dask_gateway.Gateway()
cluster_options = gateway.cluster_options()

In [None]:
cluster_options['worker_cores'] = 2
cluster_options['worker_memory'] = 16
cluster_options['gpu'] = True

cluster = gateway.new_cluster(cluster_options)
client = cluster.get_client()
cluster.adapt(minimum=2, maximum=4)

In [None]:
client.shutdown()
cluster.shutdown()

In [None]:
cluster.scale(2)

In [None]:
client.close()
cluster.close()

In [None]:
cluster.shutdown()

In [None]:
time_range = "2000-01-01/2023-12-31"
bbox = [90, -12, 130, 25]

search = catalog.search(
    collections=["era5-pds"],
    bbox=bbox,
    datetime=time_range,
    query={"era5:kind": 
        {"eq": "an"}
    }
)

planetary_computer.sign_inplace(search)


items = search.get_all_items()
len(items)

In [None]:
items

In [None]:
# Use Dask to connect to Dask Gateway in Microsoft Planetary Computer 
# Read the data from STAC items into a Dask array

import dask_gateway
import dask.array as da
import numpy as np

gateway = dask_gateway.Gateway()
cluster = gateway.new_cluster()
cluster.adapt(minimum=1, maximum=20)
cluster

In [None]:
print(cluster.dashboard_link)

In [None]:
import xarray as xr
import stackstac

In [None]:
import fsspec

fs = fsspec.filesystem('gs')
fs.ls('gs://gcp-public-data-arco-era5/co/')

In [None]:
# Open a single ERA5 asset lazily, reusing the open kwargs stored in the STAC item
asset = items[0].assets['eastward_wind_at_100_metres']
ds = xr.open_dataset(asset.href, **asset.extra_fields["xarray:open_kwargs"])
ds

# stackstac.stack(planetary_computer.sign(items), epsg=4326)

In [None]:
items

In [None]:
items[0]

In [None]:
# Load all the items from STAC into dask xarray datasets
#items[0].assets['data'].href
xr.load_dataset(items[0].assets['data'].href, engine='cfgrib')