# Using Microsoft Planetary Computer for Large-scale Geospatial and Weather Data Analytics 

For an internal tech talk at the Meteorological Service Singapore.

## What is Microsoft Planetary Computer?

[https://planetarycomputer.microsoft.com/](https://planetarycomputer.microsoft.com/)

### Prerequisite

You will need to have access to Microsoft Planetary Computer. You can apply for an account [here](https://planetarycomputer.microsoft.com/account/request).


Useful links:
* [Guide: Connecting to Planetary Computer using VSCode](https://planetarycomputer.microsoft.com/docs/overview/ui-vscode/)
* [Token generator on Planetary Computer JupyterHub](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/token)


## A quick check on the JupyterHub server instance

Note that even the single server instance is a relatively powerful machine.

In [1]:
# get the CPU model and the number of logical cores in the server
!cat /proc/cpuinfo | grep 'model name' | uniq
!cat /proc/cpuinfo | grep processor | wc -l
# get the total amount of memory in the server
!free -h | awk 'NR==2{print $2}'
# get the GPU type (fails with "command not found" if there is no NVIDIA GPU/driver)
!nvidia-smi -L

model name	: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
8
62Gi
/bin/bash: line 1: nvidia-smi: command not found
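You can run the same checks from Python without shelling out. This also avoids the `command not found` error on servers that have no NVIDIA GPU. The sketch below assumes a Linux server, because the memory figure comes from `/proc/meminfo`. `server_summary` is a hypothetical helper name.

```python
import os
import shutil


def server_summary():
    """Return logical core count, total memory (GiB) and whether nvidia-smi is on PATH."""
    cores = os.cpu_count()  # logical cores; normally matches `grep processor /proc/cpuinfo | wc -l`
    mem_gib = None
    try:
        with open("/proc/meminfo") as f:  # Linux only
            for line in f:
                if line.startswith("MemTotal:"):
                    mem_gib = int(line.split()[1]) / 1024**2  # MemTotal is reported in kB
                    break
    except FileNotFoundError:
        pass  # not a Linux machine, leave mem_gib as None
    has_nvidia_smi = shutil.which("nvidia-smi") is not None
    return {"cores": cores, "mem_gib": mem_gib, "has_nvidia_smi": has_nvidia_smi}


print(server_summary())
```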


## Task 1: Calculate the climatological monthly mean for surface winds (1990-2020) using ERA-5 data

Traditionally, we would need to download this data from the [ECMWF MARS archive](https://www.ecmwf.int/en/forecasts/access-forecasts/access-archive-datasets) and then process it locally. The download alone can take a long time, typically on the order of days to weeks.

With Planetary Computer, we can instead use a cloud compute cluster to process the data in situ, next to where it is stored, and get the results in a matter of minutes!

### Step 1: Search the STAC data catalogue

* We search the STAC catalogue for [ERA-5 PDS](https://planetarycomputer.microsoft.com/dataset/era5-pds#overview) data hosted on Planetary Computer.
* If you are not familiar with STAC catalogues, you may refer to this [tutorial by Planet Labs](https://developers.planet.com/docs/planetschool/introduction-to-stac-part-1-an-overview-of-the-specification/).

In [2]:
import pystac_client
import planetary_computer
import xarray as xr

In [3]:
# Open a STAC catalog

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

search = catalog.search(collections=["era5-pds"],
                        bbox=[90, -12, 130, 25], # ASEAN region
                        datetime="1990-01-01/2020-12-31", # look for climatological period 1990-2020
                        query={"era5:kind": {"eq": "an"}} # use only analysis data
                    )
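Under the hood, `catalog.search` sends a STAC API Item Search request to the `/search` endpoint, typically as a POST. The sketch below is a rough standard-library equivalent of the search above, meant to show what is sent rather than to replace `pystac_client`. It has several limitations:

* It only builds the request and does not send it.
* It does not follow pagination links.
* It does not sign assets the way `planetary_computer.sign_inplace` does.

`build_search_request` is a hypothetical helper, and the `limit` page size is an arbitrary choice.

```python
import json
import urllib.request

STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_search_request(collections, bbox, datetime_range, query, limit=100):
    """Build (but do not send) a STAC API Item Search POST request."""
    body = {
        "collections": collections,
        "bbox": bbox,                # [min_lon, min_lat, max_lon, max_lat]
        "datetime": datetime_range,  # interval "start/end"
        "query": query,              # STAC API "query" extension, e.g. {"prop": {"eq": value}}
        "limit": limit,              # page size; further pages are reached via "next" links
    }
    return urllib.request.Request(
        STAC_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(["era5-pds"], [90, -12, 130, 25],
                           "1990-01-01/2020-12-31", {"era5:kind": {"eq": "an"}})
# Sending it would return one page of results as GeoJSON:
# with urllib.request.urlopen(req) as resp:
#     page = json.load(resp)
```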

In [4]:
items = search.item_collection()  # called get_all_items() in older pystac-client versions
print(f"Found {len(items)} items")



Found 372 items
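The count of 372 matches one analysis item per month over 31 years (31 × 12 = 372). The sketch below checks that no month is missing. It assumes item IDs follow the pattern shown in the item metadata (`era5-pds-2020-12-an`), and `missing_months` is a hypothetical helper.

```python
def missing_months(item_ids, start_year=1990, end_year=2020):
    """Return the (year, month) pairs in [start_year, end_year] without a matching item ID."""
    found = set()
    for item_id in item_ids:
        # IDs look like "era5-pds-2020-12-an": the 3rd and 4th fields are year and month
        year, month = item_id.split("-")[2:4]
        found.add((int(year), int(month)))
    expected = {(y, m) for y in range(start_year, end_year + 1) for m in range(1, 13)}
    return sorted(expected - found)


# In the notebook: missing_months([item.id for item in items]) should return []
```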


### Step 2: Inspect the content of the item 

Here we can see the assets and metadata of the first item.

We can also see that each asset is stored in the [Zarr](https://zarr.readthedocs.io/en/stable/) format. I recommend reading the following articles:

* [Explanation about Zarr by NASA](https://wiki.earthdata.nasa.gov/display/ESO/Zarr+Format).
* [Decrease geospatial query latency using zarr by Amazon](https://aws.amazon.com/blogs/publicsector/decrease-geospatial-query-latency-minutes-seconds-using-zarr-amazon-s3/)

Zarr is considered an [Analysis-Ready, Cloud-Optimized (ARCO)](https://www.frontiersin.org/articles/10.3389/fclim.2021.782909/full) format, while `.grib` and `.nc` files are not (at least not without [kerchunk](https://fsspec.github.io/kerchunk/), which is a vast topic for another day).

In [6]:
items[0].assets

{'surface_air_pressure': <Asset href=abfs://era5/ERA5/2020/12/surface_air_pressure.zarr>,
 'sea_surface_temperature': <Asset href=abfs://era5/ERA5/2020/12/sea_surface_temperature.zarr>,
 'eastward_wind_at_10_metres': <Asset href=abfs://era5/ERA5/2020/12/eastward_wind_at_10_metres.zarr>,
 'air_temperature_at_2_metres': <Asset href=abfs://era5/ERA5/2020/12/air_temperature_at_2_metres.zarr>,
 'eastward_wind_at_100_metres': <Asset href=abfs://era5/ERA5/2020/12/eastward_wind_at_100_metres.zarr>,
 'northward_wind_at_10_metres': <Asset href=abfs://era5/ERA5/2020/12/northward_wind_at_10_metres.zarr>,
 'northward_wind_at_100_metres': <Asset href=abfs://era5/ERA5/2020/12/northward_wind_at_100_metres.zarr>,
 'air_pressure_at_mean_sea_level': <Asset href=abfs://era5/ERA5/2020/12/air_pressure_at_mean_sea_level.zarr>,
 'dew_point_temperature_at_2_metres': <Asset href=abfs://era5/ERA5/2020/12/dew_point_temperature_at_2_metres.zarr>}

In [7]:
items[0].to_dict()

{'type': 'Feature',
 'stac_version': '1.0.0',
 'id': 'era5-pds-2020-12-an',
 'properties': {'datetime': None,
  'era5:kind': 'an',
  'end_datetime': '2020-12-31T23:00:00Z',
  'cube:variables': {'surface_air_pressure': {'type': 'data',
    'unit': 'Pa',
    'attrs': {'units': 'Pa',
     'nameCDM': 'Surface_pressure_surface',
     'long_name': 'Surface pressure',
     'nameECMWF': 'Surface pressure',
     'product_type': 'analysis',
     'standard_name': 'surface_air_pressure',
     'shortNameECMWF': 'sp'},
    'shape': [744, 721, 1440],
    'dimensions': ['time', 'lat', 'lon'],
    'description': 'Surface pressure'},
   'sea_surface_temperature': {'type': 'data',
    'unit': 'K',
    'attrs': {'units': 'K',
     'nameCDM': 'Sea_surface_temperature_surface',
     'long_name': 'Sea surface temperature',
     'nameECMWF': 'Sea surface temperature',
     'product_type': 'analysis',
     'standard_name': 'sea_surface_temperature',
     'shortNameECMWF': 'sst'},
    'shape': [744, 721, 1440],

### Step 3: "Load" the data

We will load the `u` (eastward) and `v` (northward) wind components into Dask-backed xarray datasets. I will use the winds at 100 m above the surface for this example.

We will see that this is a lazy operation, meaning that the data is not loaded into memory yet.

This operation takes around 1 minute because it has to read the metadata of 744 separate `.zarr` stores (372 items × 2 variables).

In [8]:
vars = [
    'northward_wind_at_100_metres',
    'eastward_wind_at_100_metres',
]

In [9]:
datasets = [xr.open_dataset(item.assets[var].href, 
                            **item.assets[var].extra_fields["xarray:open_kwargs"]
                            ) for item in items for var in vars]

We can do a quick preview of the first and last datasets in the list.

In [10]:
datasets[0], datasets[-1]

(<xarray.Dataset>
 Dimensions:                       (lat: 721, lon: 1440, time: 744)
 Coordinates:
   * lat                           (lat) float32 90.0 89.75 89.5 ... -89.75 -90.0
   * lon                           (lon) float32 0.0 0.25 0.5 ... 359.5 359.8
   * time                          (time) datetime64[ns] 2020-12-01 ... 2020-1...
 Data variables:
     northward_wind_at_100_metres  (time, lat, lon) float32 dask.array<chunksize=(372, 150, 150), meta=np.ndarray>
 Attributes:
     institution:  ECMWF
     source:       Reanalysis
     title:        ERA5 forecasts,
 <xarray.Dataset>
 Dimensions:                      (time: 744, lat: 721, lon: 1440)
 Coordinates:
   * lat                          (lat) float32 90.0 89.75 89.5 ... -89.75 -90.0
   * lon                          (lon) float32 0.0 0.25 0.5 ... 359.5 359.8
   * time                         (time) datetime64[ns] 1990-01-01 ... 1990-01...
 Data variables:
     eastward_wind_at_100_metres  (time, lat, lon) float32 dask.arr

This step only produces a Python list of xarray datasets, one per item and variable. We do not want to deal with a list of datasets but rather with a single unified dataset.

Next, we combine all the individual datasets into a single dataset and select our geographical area.

In [11]:
era_ds = xr.combine_by_coords(datasets)

In [13]:
# Perform a geographical bbox selection in xarray
era_ds = era_ds.sel(lon=slice(90, 130), lat=slice(25, -12))
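Each Zarr store is split into chunks, so this bbox selection only needs to read the chunks that overlap the ASEAN box. The back-of-the-envelope sketch below makes three assumptions:

* The grid is 0.25°, with latitude running from 90 to -90 (721 points) and longitude from 0 to 359.75 (1440 points).
* The spatial chunk size is 150, as shown in the dataset preview (`chunksize=(372, 150, 150)`).
* `touched_chunks` and `piece_sizes` are hypothetical helpers.

```python
import math

RES = 0.25          # grid spacing in degrees
CHUNK = 150         # spatial chunk size along lat and lon
N_LAT, N_LON = 721, 1440


def touched_chunks(start, stop, chunk):
    """Chunk indices overlapping the inclusive index range [start, stop]."""
    return list(range(start // chunk, stop // chunk + 1))


def piece_sizes(start, stop, chunk):
    """Number of selected points that fall inside each touched chunk."""
    return [min(stop, (c + 1) * chunk - 1) - max(start, c * chunk) + 1
            for c in touched_chunks(start, stop, chunk)]


# Latitude is stored north to south, so 25N has a smaller index than 12S
lat_range = (round((90 - 25) / RES), round((90 + 12) / RES))  # (260, 408)
lon_range = (round(90 / RES), round(130 / RES))               # (360, 520)

n_read = len(touched_chunks(*lat_range, CHUNK)) * len(touched_chunks(*lon_range, CHUNK))
n_total = math.ceil(N_LAT / CHUNK) * math.ceil(N_LON / CHUNK)
print(f"spatial chunks read per time chunk: {n_read} of {n_total}")  # 4 of 50
print(piece_sizes(*lat_range, CHUNK), piece_sizes(*lon_range, CHUNK))  # [40, 109] [90, 71]
```

The largest pieces (109 × 90) are consistent with the `(1, 109, 90)` chunk shape that Dask reports for the monthly means later on. Likewise, 12 months × 4 spatial chunks gives the 48 chunks listed there.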

In [14]:
# check the total size of the dataset, in GB rounded to 1 decimal place
print(f"Total size of dataset: {era_ds.nbytes / 1e9:.1f} GB")

Total size of dataset: 52.2 GB


You can see that we need to read about 52 GB of data for our task. Downloading this data would take at least a few hours. In some cases, the data may not be sliceable at the source, and you may have to download the entire dataset.
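The 52.2 GB figure can be reproduced by hand. This shows where the size comes from: the number of hourly time steps dominates. The calculation assumes hourly data with no gaps, float32 values and the 0.25° grid.

```python
from datetime import datetime

# Hourly steps from 1990-01-01 00:00 up to (not including) 2021-01-01 00:00
n_time = int((datetime(2021, 1, 1) - datetime(1990, 1, 1)).total_seconds() // 3600)
n_lat = round((25 - (-12)) / 0.25) + 1  # 25N to 12S inclusive on the 0.25 degree grid
n_lon = round((130 - 90) / 0.25) + 1    # 90E to 130E inclusive
n_vars = 2                              # eastward and northward wind at 100 m
bytes_per_value = 4                     # float32

total_bytes = n_vars * n_time * n_lat * n_lon * bytes_per_value
print(n_time, n_lat, n_lon)                                  # 271752 149 161
print(f"Total size of dataset: {total_bytes / 1e9:.1f} GB")  # 52.2 GB
```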

In the next step, we will see how to set up a Dask cluster to process this data on Planetary Computer.


### Cluster Limits
There are a few restrictions on the size of the Dask clusters you can create.

* The maximum number of cores per worker is 8, and the maximum amount of memory per worker is 64 GiB. This ensures that the workers fit in the Standard_E8_v3 Virtual Machines used for workers.

* The maximum number of cores per cluster is 400

* The maximum amount of memory per cluster is 3200 GiB

* The maximum number of workers per cluster is 400

With the default settings of 1 core and 8 GiB per worker, this means a limit of 400 workers on 50 physical nodes (each with 8 cores and 64 GiB of memory).

You can read more about the maximum size of the Dask cluster [here](https://planetarycomputer.microsoft.com/docs/overview/environment/).
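The limits above determine how many workers of a given size you can get. The sketch below hard-codes the numbers quoted above, which may change over time. `max_workers` and `min_nodes` are hypothetical helpers, and the node count assumes workers pack perfectly onto the 8-core / 64 GiB VMs.

```python
import math

# Limits quoted above (check the Planetary Computer docs for current values)
MAX_WORKER_CORES, MAX_WORKER_MEMORY_GIB = 8, 64
MAX_CLUSTER_CORES, MAX_CLUSTER_MEMORY_GIB, MAX_CLUSTER_WORKERS = 400, 3200, 400
NODE_CORES, NODE_MEMORY_GIB = 8, 64  # Standard_E8_v3 worker VM


def max_workers(worker_cores, worker_memory_gib):
    """Largest worker count allowed by the per-cluster limits for a given worker size."""
    if worker_cores > MAX_WORKER_CORES or worker_memory_gib > MAX_WORKER_MEMORY_GIB:
        raise ValueError("worker is larger than a single node allows")
    return min(MAX_CLUSTER_CORES // worker_cores,
               int(MAX_CLUSTER_MEMORY_GIB // worker_memory_gib),
               MAX_CLUSTER_WORKERS)


def min_nodes(n_workers, worker_cores, worker_memory_gib):
    """Lower bound on nodes, assuming workers pack perfectly by cores and by memory."""
    return max(math.ceil(n_workers * worker_cores / NODE_CORES),
               math.ceil(n_workers * worker_memory_gib / NODE_MEMORY_GIB))


print(max_workers(1, 8), min_nodes(400, 1, 8))  # 400 50 (default settings)
print(max_workers(4, 8))                        # 100 (4 cores, 8 GiB per worker)
```

With 4 cores per worker, the cluster is capped at 100 workers by the 400-core limit, not by memory. The `maximum=50` passed to `cluster.adapt` in Step 4 therefore stays within the limits.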

### Step 4: Set up a Dask cluster

We will now spin up a Dask cluster. Dask is a distributed computing framework that allows us to process large datasets in parallel.

To learn more about Dask, you can refer to the [Dask documentation](https://docs.dask.org/en/latest/).

In [15]:
import dask_gateway

gateway = dask_gateway.Gateway()
cluster_options = gateway.cluster_options()

In [16]:
cluster_options['worker_cores'] = 4  # number of cores per worker
cluster_options['worker_memory'] = 8  # memory per worker (GiB)
cluster_options['gpu'] = False  # request GPU workers or not; I advise False for most cases, as GPU resources seem hard to get
# cluster_options["environment"] = {
#    "DASK_DISTRIBUTED__WORKERS__RESOURCES__GPU": "1",
# }

cluster = gateway.new_cluster(cluster_options)
client = cluster.get_client()
cluster.adapt(minimum=20, maximum=50)  # scale between 20 and 50 workers depending on the workload



In [17]:
client

Connection method: Cluster object
Cluster type: dask_gateway.GatewayCluster
Dashboard: https://pccompute.westeurope.cloudapp.azure.com/compute/services/dask-gateway/clusters/prod.9cc2de2868624985a5139a6ab8221771/status


In [20]:
# When you are done, shut down the cluster to release the workers:
# cluster.shutdown()

### Step 5: Compute the climatological monthly mean

Here we lazily define the monthly mean of the winds: all hourly values that fall in the same calendar month are averaged together across the 31 years.

In [18]:
res = era_ds.groupby('time.month').mean('time')
res

Each of the two data variables (`eastward_wind_at_100_metres` and `northward_wind_at_100_metres`) is shown as a lazy Dask array:

| | Array | Chunk |
|---|---|---|
| Bytes | 1.10 MiB | 38.32 kiB |
| Shape | (12, 149, 161) | (1, 109, 90) |
| Dask graph | 48 chunks in 812 graph layers | |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


However, nothing has actually been computed yet. We do that in the next step by calling the `compute()` method.

While this is executing, you can take a look at the Dask dashboard to see the progress of the computation.

In my case, I was allocated a cluster with about 200 GB of RAM and 75 cores.

![Dask dashboard](dask-dashboard.png)

In [19]:
res = res.compute()  # run the computation on the cluster and keep the in-memory result
res
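To make explicit what `groupby('time.month').mean('time')` computes, here is a small NumPy-only sketch on synthetic data (not ERA5). Each time step is assigned its calendar month, and all steps sharing that month are averaged together, pooling all years. `monthly_climatology` is a hypothetical helper.

```python
import numpy as np


def monthly_climatology(times, data):
    """Mean over axis 0 of all samples in each calendar month (result has 12 rows)."""
    # datetime64[M] counts months since 1970-01, so modulo 12 gives the calendar month
    months = times.astype("datetime64[M]").astype(int) % 12 + 1
    return np.stack([data[months == m].mean(axis=0) for m in range(1, 13)])


# Synthetic hourly series for two years on a 2 x 3 grid; each value equals its month number
times = np.arange("1990-01-01T00", "1992-01-01T00", dtype="datetime64[h]")
months = times.astype("datetime64[M]").astype(int) % 12 + 1
data = np.broadcast_to(months[:, None, None], (times.size, 2, 3)).astype(np.float32)

clim = monthly_climatology(times, data)
print(clim.shape)     # (12, 2, 3)
print(clim[:, 0, 0])  # the month numbers 1 to 12
```

Because hourly samples are pooled, every January contributes the same 744 hours. Leap-year Februaries, however, contribute slightly more samples than the others.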

Next, we rechunk the result into chunks of size `(12, 20, 20)` along the month, latitude and longitude dimensions respectively.

We then save it as a `.zarr` store in Azure Blob Storage.

In [20]:
res_rechunk = res.chunk({'lat': 20, 'lon': 20, 'month': 12})
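Here is a quick check of what this chunking implies for each variable, assuming float32 values. Because the whole month axis sits in one chunk, reading the full annual cycle at one grid point touches a single chunk.

```python
import math

shape = {"month": 12, "lat": 149, "lon": 161}  # dimensions of the monthly climatology
chunks = {"month": 12, "lat": 20, "lon": 20}   # chunk sizes passed to .chunk()

n_chunks = math.prod(math.ceil(shape[d] / chunks[d]) for d in shape)
chunk_bytes = math.prod(chunks.values()) * 4   # full-size chunk of float32 values
print(n_chunks, chunk_bytes)                   # 72 19200
```

The trade-off is that these chunks are small (about 19 kB each). For larger outputs, fewer and bigger chunks usually reduce the per-object overhead of cloud storage.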

In [21]:
from adlfs import AzureBlobFileSystem

# Create an Azure Blob Storage file system (replace the placeholder with your own SAS token)
fs = AzureBlobFileSystem(account_name='songhanplanetarycomputer', 
                         account_key=None, 
                         sas_token="""<PRIVATE SAS TOKEN>""")

store = fs.get_mapper('scratch/mydata.zarr')
res_rechunk.to_zarr(store, mode='w-')  # 'w-': create the store, fail if it already exists

<xarray.backends.zarr.ZarrStore at 0x71eafc23f0d0>
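Rather than pasting the SAS token into the notebook, you can read it from an environment variable so that the notebook can be shared safely. This is a minimal sketch: the variable name `AZURE_STORAGE_SAS_TOKEN` and the helper `blob_fs_kwargs` are hypothetical choices.

```python
import os


def blob_fs_kwargs(account_name, token_env="AZURE_STORAGE_SAS_TOKEN"):
    """Keyword arguments for AzureBlobFileSystem, with the SAS token taken from the environment."""
    token = os.environ.get(token_env)  # hypothetical variable name; use whatever you export
    if not token:
        raise RuntimeError(f"environment variable {token_env} is not set")
    return {"account_name": account_name, "sas_token": token}


# Usage (requires adlfs):
# from adlfs import AzureBlobFileSystem
# fs = AzureBlobFileSystem(**blob_fs_kwargs("songhanplanetarycomputer"))
```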

### Step 6: Visualise the results

In our final step, we will visualise the results using the `cartopy` library.

In [22]:
# read the data back from Azure Blob Storage
import xarray as xr
from adlfs import AzureBlobFileSystem

# Create an Azure Blob Storage file system
fs = AzureBlobFileSystem(account_name='songhanplanetarycomputer', 
                         account_key=None, 
                         sas_token="""<PRIVATE SAS TOKEN>""")

# open the same Zarr store written above, so that this cell also works in a fresh session
store = fs.get_mapper('scratch/mydata.zarr')
ds = xr.open_zarr(store, consolidated=True)

In [23]:
ds.load()

In [24]:
# convert units m/s to knots
ds = ds * 1.94384

In [26]:
import numpy as np
# wind speed (knots) from the u and v components; np.hypot keeps the xarray coordinates
ds['wind_speed'] = np.hypot(ds.eastward_wind_at_100_metres, ds.northward_wind_at_100_metres)
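Computing the speed with `np.linalg.norm` over the stacked components (axis 0) gives the same result as `np.hypot`. Converting u and v to knots before taking the speed is also equivalent to converting the speed afterwards, since |k·(u, v)| = k·|(u, v)| for k > 0. The small NumPy check below uses made-up values.

```python
import numpy as np

MS_TO_KNOTS = 3600 / 1852  # 1 knot = 1852 m per hour, so 1 m/s is about 1.94384 kn

u = np.array([[3.0, -6.0], [0.0, 1.0]])  # eastward component (m/s), made-up values
v = np.array([[4.0, 8.0], [-2.0, 0.0]])  # northward component (m/s), made-up values

speed_norm = np.linalg.norm([u, v], axis=0)  # norm over the stacked (u, v) axis
speed_hypot = np.hypot(u, v)                 # element-wise sqrt(u**2 + v**2)

# Scale first, then take the speed, versus take the speed, then scale
speed_kn_first = np.hypot(u * MS_TO_KNOTS, v * MS_TO_KNOTS)
speed_kn_after = speed_hypot * MS_TO_KNOTS

print(speed_hypot)  # speeds of 5, 10, 2 and 1 m/s
```

Converting to knots before plotting matters for the barbs. By default, matplotlib's `barbs` draws a half barb, a full barb and a flag for 5, 10 and 50 units of the input data, which matches the usual knot convention.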

In [27]:
ds

In [29]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import ipywidgets as widgets

n = 5  # plot a barb at every n-th grid point; increase n to space the wind barbs out more
ds_subset = ds.isel(lat=slice(None, None, n), lon=slice(None, None, n))

def plot_month(month):
    fig = plt.figure(figsize=(10, 10))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    ax.set_extent([90, 130, -12, 25], crs=ccrs.PlateCarree())
    # filled contours of the climatological wind speed (knots)
    cf = ax.contourf(ds.lon,
                     ds.lat,
                     ds.wind_speed.sel(month=month),
                     transform=ccrs.PlateCarree())
    fig.colorbar(cf, ax=ax, shrink=0.6, label='Wind speed (knots)')
    # wind barbs (knots) on the thinned grid
    ax.barbs(ds_subset.lon,
             ds_subset.lat,
             ds_subset.eastward_wind_at_100_metres.sel(month=month),
             ds_subset.northward_wind_at_100_metres.sel(month=month),
             length=4,
             transform=ccrs.PlateCarree())
    plt.show()

# Create a month slider and use it to plot the data interactively
_ = widgets.interact(plot_month, month=widgets.SelectionSlider(options=ds.month.values))

## Accessing ECMWF Isobaric Level Model Data

Unfortunately, Microsoft has not yet converted this dataset to `Zarr` format, although this effort may be underway. The forecasts are provided as GRIB2 files.

First, let's try to access it via Planetary Computer. [ECMWF Open Data real-time forecast](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) will be the dataset that we use.

In [31]:
# Open a STAC catalog

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

search = catalog.search(collections=["ecmwf-forecast"],
                        bbox=[90, -12, 130, 25], # ASEAN region
                        datetime="2022-01-01/2022-12-31", 
                        query={
                            "ecmwf:type": {"eq": "fc"},
                            "ecmwf:stream": {"eq": "oper"},
                            "ecmwf:step": {"eq": "24h"},
                        }
                    )

In [32]:
items = search.item_collection()  # called get_all_items() in older pystac-client versions
items



The output lists the matching items, newest first:

| `id` | `ecmwf:reference_datetime` | `datetime` (= `ecmwf:forecast_datetime`) |
|---|---|---|
| ecmwf-2022-12-30T12-oper-fc-24h | 2022-12-30T12:00:00Z | 2022-12-31T12:00:00Z |
| ecmwf-2022-12-30T00-oper-fc-24h | 2022-12-30T00:00:00Z | 2022-12-31T00:00:00Z |
| ecmwf-2022-12-29T12-oper-fc-24h | 2022-12-29T12:00:00Z | 2022-12-30T12:00:00Z |
| ecmwf-2022-12-29T00-oper-fc-24h | 2022-12-29T00:00:00Z | 2022-12-30T00:00:00Z |
| ecmwf-2022-12-28T12-oper-fc-24h | 2022-12-28T12:00:00Z | 2022-12-29T12:00:00Z |
| ecmwf-2022-12-28T00-oper-fc-24h | 2022-12-28T00:00:00Z | 2022-12-29T00:00:00Z |
| ecmwf-2022-12-27T12-oper-fc-24h | 2022-12-27T12:00:00Z | 2022-12-28T12:00:00Z |
| ecmwf-2022-12-27T00-oper-fc-24h | 2022-12-27T00:00:00Z | 2022-12-28T00:00:00Z |
| ecmwf-2022-12-26T12-oper-fc-24h | 2022-12-26T12:00:00Z | 2022-12-27T12:00:00Z |
| ecmwf-2022-12-26T00-oper-fc-24h | 2022-12-26T00:00:00Z | 2022-12-27T00:00:00Z |

All items share the following properties:

* `ecmwf:type` is `fc`, `ecmwf:stream` is `oper` and `ecmwf:step` is `24h`.
* Each item covers the whole globe (`bbox` [-180.0, -90.0, 180.0, 90.0]), so the `bbox` in the search does not subset the data itself.
* Each item has two assets. One is the forecast data as GRIB2 (`application/wmo-GRIB2`, role `data`), for example `https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.grib2?<SAS token>`. The other is an index file (`application/x-ndjson`, role `index`) at the same path with the `.index` extension.
* Each item also carries the usual STAC `collection`, `parent`, `root` and `self` links.


In [33]:
# Reading GRIB directly from a blob storage URL fails: cfgrib expects a local file path
import xarray as xr

try:
    xr.open_dataset(items[0].assets['data'].href, engine='cfgrib')
except Exception as e:
    print(e)

Can't create file 'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.grib2?st=2024-01-17T06%3A00%3A44Z&se=2024-01-25T06%3A00%3A44Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2024-01-18T06%3A00%3A43Z&ske=2024-01-25T06%3A00%3A43Z&sks=b&skv=2021-06-08&sig=7sSgqZkS6fTAwuJldeqNYDHIM/BE5o3HincM%2B980YkY%3D.923a8.idx'
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.11/site-packages/cfgrib/messages.py", line 534, in from_indexpath_or_filestream
    with compat_create_exclusive(indexpath) as new_index_file:
  File "/srv/conda/envs/notebook/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.11/site-packages/cfgrib/messages.py", line 500, in compat_create_exclusive
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
         ^^^^^^^^

[Errno 2] No such file or directory: 'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/oper/20221230120000-24h-oper-fc.grib2?st=2024-01-17T06%3A00%3A44Z&se=2024-01-25T06%3A00%3A44Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2024-01-18T06%3A00%3A43Z&ske=2024-01-25T06%3A00%3A43Z&sks=b&skv=2021-06-08&sig=7sSgqZkS6fTAwuJldeqNYDHIM/BE5o3HincM%2B980YkY%3D'
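The traceback shows that cfgrib tries to create an index file (`....923a8.idx`) next to the path it is given, which cannot work when the "path" is a signed URL. The file therefore has to be local. The signed hrefs also carry a short-lived SAS token (`st=`, `se=`, `sig=`, ...) in the query string, so the full URL changes every time an item is re-signed, while the blob path stays the same. A minimal sketch, assuming you want a stable local file name per blob (the helper `local_name_for` is hypothetical, not part of any library):

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_name_for(signed_url: str) -> str:
    """Return the blob's file name, ignoring the SAS query string."""
    # urlsplit separates the path from the ?st=...&sig=... token
    path = urlsplit(signed_url).path
    return PurePosixPath(path).name


# The query string here is a shortened placeholder, not a real token
url = (
    "https://ai4edataeuwest.blob.core.windows.net/ecmwf/20221230/12z/0p4-beta/"
    "oper/20221230120000-24h-oper-fc.grib2?st=2024-01-17T06%3A00%3A44Z&sig=placeholder"
)
print(local_name_for(url))
```

Passing this name as the second argument of `urllib.request.urlretrieve(url, filename)` lets a re-run detect files that were already downloaded.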


In [None]:
import urllib.request

import xarray as xr

# Download each GRIB file to a local temporary file first, then convert it to Zarr
for item in items:
    url = item.assets["data"].href
    filename, _ = urllib.request.urlretrieve(url)  # saved to a temporary file
    try:
        # Keep only the pressure-level fields (cfgrib needs a filter when a file mixes level types)
        ds = xr.open_dataset(filename, engine="cfgrib",
                             filter_by_keys={"typeOfLevel": "isobaricInhPa"})
        ds.to_zarr(f"./grib/{item.id}")
        print(f"{item.id} saved.")
    except Exception as e:
        print(e)
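The loop above downloads one file at a time. Since downloading is I/O-bound, a thread pool can overlap the network waits. This is a sketch under the assumption that the bandwidth of the hub, not the CPU, is the bottleneck; `download_all` and its `fetch` callable are hypothetical names introduced here, and the cfgrib-to-Zarr conversion would still run afterwards on the returned paths:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(
    jobs: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Call fetch(url) for every (item_id, url) pair on a thread pool.

    Returns (paths, errors), both keyed by item_id.
    """
    paths: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the STAC item id it belongs to
        futures = {pool.submit(fetch, url): item_id for item_id, url in jobs}
        for future in as_completed(futures):
            item_id = futures[future]
            try:
                paths[item_id] = future.result()
            except Exception as exc:  # one failed file should not stop the rest
                errors[item_id] = exc
    return paths, errors
```

In the notebook this could be called as `download_all(((it.id, it.assets["data"].href) for it in items), fetch=lambda u: urllib.request.urlretrieve(u)[0])`. Keeping `fetch` as a parameter makes the function easy to test without network access.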

In [78]:
xr_ds = xr.open_dataset('./grib/ecmwf-2022-12-20T00-oper-fc-24h', engine='zarr')

In [86]:
#open multiple dataset
import glob
xr_list = [f for f in glob.glob('./grib/ecmwf*')]
ds = xr.open_mfdataset(xr_list, combine='nested', concat_dim='time', engine='zarr')
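With `combine='nested'`, xarray concatenates the datasets in the order of the list it is given, and `glob.glob` makes no ordering promise. A small sketch that orders the stores by the reference time embedded in the STAC item ID (for example `ecmwf-2022-12-26T12-oper-fc-24h`); the ID format is taken from the items above, and `reference_time` is a hypothetical helper:

```python
import re
from datetime import datetime

# Matches the reference date and hour in IDs like "ecmwf-2022-12-26T12-oper-fc-24h"
ID_PATTERN = re.compile(r"ecmwf-(\d{4}-\d{2}-\d{2}T\d{2})-")


def reference_time(path: str) -> datetime:
    match = ID_PATTERN.search(path)
    if match is None:
        raise ValueError(f"no reference time in {path!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H")


paths = [
    "./grib/ecmwf-2022-12-26T12-oper-fc-24h",
    "./grib/ecmwf-2022-12-20T00-oper-fc-24h",
    "./grib/ecmwf-2022-12-26T00-oper-fc-24h",
]
ordered = sorted(paths, key=reference_time)
print(ordered)
```

Sorting the names lexicographically happens to give the same order for this naming scheme; parsing the time just makes the intent explicit and fails loudly on unexpected names.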

In [89]:
# Subset to a box over Southeast Asia (90-130°E, 12°S-25°N); latitude runs north to south
ds = ds.sel(longitude=slice(90, 130), latitude=slice(25, -12))
print(f"Total size of dataset: {ds.nbytes / 1e9:.1f} GB")

Total size of dataset: 0.1 GB


This shows the drawback of datasets that are not ARCO-ready (Analysis-Ready, Cloud-Optimized), such as GRIB: every file has to be downloaded to local disk before it can be processed. After all that waiting, we only managed to load a small (0.1 GB) dataset.

References:
* [Problems with data access](https://tomaugspurger.net/noaa-nwm/02-problems.html)

For now, we will have to wait for Microsoft to convert the GRIB data into an ARCO format.

### ARCO alternative: Use ECMWF data from Google Cloud Bucket

Thankfully, Google is building [Analysis-Ready, Cloud Optimized ERA-5 and ECMWF Operational Data](https://github.com/google-research/arco-era5), and a large collection of ARCO-ready datasets is already available.

In [90]:
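import fsspec

# List the ARCO ERA5 Zarr stores in Google's public bucket (the 'gs' protocol needs the gcsfs package)
fs = fsspec.filesystem('gs')
fs.ls('gs://gcp-public-data-arco-era5/ar/')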

['gcp-public-data-arco-era5/ar/1959-2022-1h-240x121_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-128x64_equiangular_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-128x64_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-1440x721.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-512x256_equiangular_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-64x32_equiangular_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-6h-64x32_equiangular_with_poles_conservative.zarr',
 'gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2',
 'gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg-chunk-1.zarr-v2',
 'gcp-public-data-arco-era5/ar/1959-2022-full_37-6h-0p25deg_der
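The store names appear to encode the period, the time step and the grid (either `<n_lon>x<n_lat>` points or a resolution such as `0p25deg`). This is an inference from the listing above, not a documented schema, so treat the sketch below (the helper `describe_store` is hypothetical) as a convenience for picking a store rather than a reliable parser:

```python
import re
from typing import Optional

# Assumed pattern: <start>-<end>-[full_<levels>-]<freq>h-(<nlon>x<nlat> | <a>p<b>deg)
STORE_PATTERN = re.compile(
    r"(?P<start>\d{4})-(?P<end>\d{4})-(?:full_(?P<levels>\d+)-)?"
    r"(?P<freq>\d+)h-(?:(?P<nlon>\d+)x(?P<nlat>\d+)|(?P<res>\d+p\d+)deg)"
)


def describe_store(name: str) -> Optional[dict]:
    match = STORE_PATTERN.search(name)
    if match is None:
        return None
    info = {
        "years": (int(match["start"]), int(match["end"])),
        "freq_hours": int(match["freq"]),
    }
    if match["nlon"]:
        info["grid"] = (int(match["nlon"]), int(match["nlat"]))
    else:
        # "0p25" -> 0.25 degrees
        info["resolution_deg"] = float(match["res"].replace("p", "."))
    if match["levels"]:
        info["levels"] = int(match["levels"])
    return info


print(describe_store("gcp-public-data-arco-era5/ar/1959-2022-6h-1440x721.zarr"))
print(describe_store("gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2"))
```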

In [92]:
import xarray as xr

# Lazily open the hourly 0.25° ERA5 store (37 pressure levels); only metadata is read here
ec_ds = xr.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/',
    chunks={'time': 100, 'latitude': 1000, 'longitude': 1000},
    consolidated=True,
)
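The `chunks` argument sets the Dask chunk sizes used for later computation. On the 721 x 1440 grid, `latitude: 1000` is larger than the dimension, so each chunk spans all latitudes, while longitude is split into 1000 + 440. A rough sketch of the resulting layout for one hourly single-level variable, assuming float32 values and one year (8760 hours) of data; `chunk_layout` is a hypothetical helper, and it treats dimensions not listed in `chunks` as a single chunk, which is a simplification:

```python
import math
from typing import Dict, Tuple


def chunk_layout(
    dim_sizes: Dict[str, int], chunks: Dict[str, int], itemsize: int = 4
) -> Tuple[int, int]:
    """Return (number of chunks, bytes in the largest chunk) for one variable."""
    n_chunks = 1
    max_chunk_elems = 1
    for dim, size in dim_sizes.items():
        chunk = min(chunks.get(dim, size), size)  # a chunk cannot exceed the dimension
        n_chunks *= math.ceil(size / chunk)
        max_chunk_elems *= chunk
    return n_chunks, max_chunk_elems * itemsize


n, nbytes = chunk_layout(
    {"time": 8760, "latitude": 721, "longitude": 1440},
    {"time": 100, "latitude": 1000, "longitude": 1000},
)
print(f"{n} chunks, largest chunk {nbytes / 1e6:.1f} MB")
```

Larger chunks mean fewer tasks for the scheduler but more memory per task, so these numbers are worth checking before choosing `chunks`.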



In [95]:
print(f"Total size of dataset: {ec_ds.nbytes / 1e12:.1f} TB")

Total size of dataset: 534.4 TB
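`nbytes` is computed from each variable's shape and dtype, so it reports the uncompressed in-memory size; the compressed Zarr store on disk is likely smaller. A worked estimate of how quickly this adds up, assuming float32 values on the 721 x 1440 (0.25°) grid and an hourly series covering 1959-01-01 to 2022-12-31, as the store name suggests:

```python
from datetime import datetime, timedelta

N_LAT, N_LON = 721, 1440
BYTES_PER_VALUE = 4  # assumption: float32
N_LEVELS = 37        # pressure levels in the "full_37" store

# One global field at one time step
field_bytes = N_LAT * N_LON * BYTES_PER_VALUE
# Number of hourly steps from 1959-01-01T00 to 2022-12-31T23
hours = (datetime(2023, 1, 1) - datetime(1959, 1, 1)) // timedelta(hours=1)

single_level_bytes = field_bytes * hours
pressure_level_bytes = single_level_bytes * N_LEVELS

print(f"One global field: {field_bytes / 1e6:.2f} MB")
print(f"Hourly steps: {hours}")
print(f"One single-level variable: {single_level_bytes / 1e12:.2f} TB")
print(f"One 37-level variable: {pressure_level_bytes / 1e12:.1f} TB")
```

A single variable on 37 levels is already tens of terabytes, which is why downloading the store is not an option and the computation has to go to the data.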


## Wow, so I can start using this from Planetary Computer?

Well, yes and no. The Google Cloud Storage bucket hosting the ECMWF ARCO dataset is located in the US, while Planetary Computer runs in Azure's West Europe region, so the data has to travel over the public internet with limited bandwidth. It is still possible to do some analysis by querying the ARCO dataset directly, but GCP appears to throttle the transfer speed, presumably because of egress charges.

So the only practical option is to spin up a Dask cluster on GCP, in the same US region as the data. We will not cover that here.
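The argument for moving the compute rather than the data is simple arithmetic: transfer time is data volume divided by sustained throughput. The throughputs below are hypothetical round numbers chosen for illustration, not measurements between Planetary Computer and GCP:

```python
def transfer_hours(n_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to move n_bytes at a sustained throughput."""
    return n_bytes / throughput_bytes_per_s / 3600


ONE_TB = 1e12
# Hypothetical throughputs, for illustration only
for label, rate in [("50 MB/s over the internet", 50e6), ("1 GB/s in-region", 1e9)]:
    print(f"1 TB at {label}: {transfer_hours(ONE_TB, rate):.1f} h")
```

Even at the optimistic rate, reading a meaningful fraction of a 534 TB store from another continent is impractical, whereas a cluster in the same region avoids both the slow link and the egress.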

In [107]:
# Example: subset the ARCO dataset hosted on Google Cloud Storage to one time step over Southeast Asia

ds_v2 = ec_ds.sel(time='2021-01-01', method='nearest').sel(longitude=slice(90, 130), latitude=slice(25, -12))
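With `method='nearest'`, the date string selects the time step closest to 2021-01-01T00:00, and the latitude/longitude slices cut out a box. Assuming the store's 0.25° grid has points on multiples of 0.25° (so both slice ends fall on the grid) and values are float32, the amount of data `ds_v2.load()` pulls per variable can be estimated as follows (`n_points` is a hypothetical helper):

```python
def n_points(start: float, stop: float, step: float) -> int:
    """Grid points from start to stop inclusive (both assumed to lie on the grid)."""
    return round(abs(stop - start) / step) + 1


RES = 0.25           # ERA5 grid spacing in degrees
N_LEVELS = 37        # pressure levels in the "full_37" store
BYTES_PER_VALUE = 4  # assumption: float32

n_lat = n_points(25, -12, RES)
n_lon = n_points(90, 130, RES)
per_variable = n_lat * n_lon * N_LEVELS * BYTES_PER_VALUE
print(f"{n_lat} x {n_lon} points; one 37-level variable at one time: {per_variable / 1e6:.2f} MB")
```

Multiplying by the number of variables in the store gives the total volume of the `load()` call. Note that with chunks of 100 time steps and up to 1000 longitudes, more data than this may be read from the bucket, since whole chunks are fetched.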

In [109]:
ds_v2.load()

In [110]:
ds_v2

## Where can I learn more about Planetary Computer?

* [Introduction to Planetary Computer by Tom Augspurger at AMS 2024](https://github.com/TomAugspurger/pc-ams/tree/1cf4a4349cac7d574daa8a95db1f8adc8b04d82d)