# Access to data in the cloud (GCS)

## Import modules and libraries

*First, let's make sure the Python environment is set up correctly to run this notebook*:

In [1]:
import os, sys, urllib.request, tempfile

# Download the course helper module into a temporary directory and import it
with tempfile.TemporaryDirectory() as tmpdirname:
    sys.path.append(tmpdirname)
    repo = "https://raw.githubusercontent.com/obidam/ds2-2023/main/"
    # Build the URL by string concatenation: os.path.join would insert "\" on Windows
    urllib.request.urlretrieve(repo + "utils.py",
                               os.path.join(tmpdirname, "utils.py"))
    from utils import check_up_env
    check_up_env()

Running on your own environment
Make sure to have all necessary packages installed
See: https://github.com/obidam/ds2-2023/blob/main/binder/environment.yml
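If `utils.py` cannot be downloaded (for instance without network access), a rough substitute is to check that the required packages are installed. The sketch below is not the `check_up_env` function from the course repository. The package list `REQUIRED` is an assumption based on the imports used in this notebook.

```python
from importlib import metadata

# Hypothetical list of distributions this notebook relies on (an assumption,
# not the list actually checked by `check_up_env`).
REQUIRED = ("xarray", "gcsfs", "intake", "intake-xarray", "zarr", "pandas")


def report_versions(packages=REQUIRED):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in report_versions().items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```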


*Then, import the usual suspects*:

In [2]:
import sys

import gcsfs
import intake
import pandas as pd
import xarray as xr
from intake import open_catalog

## Read data from Google Cloud Storage (gcsfs)

### Access and listing

In [3]:
# Define cloud file system access point:
fs = gcsfs.GCSFileSystem(project='alert-ground-261008', token='anon', access='read_only')

# And list content of a bucket:
fs.ls('opendata_bdo2020')

['opendata_bdo2020/EN.4.2.1.f.analysis.g10.zarr',
 'opendata_bdo2020/GLOBAL_ARGO_SDL2000',
 'opendata_bdo2020/GLOB_HOMOGENEOUS_variables.zarr',
 'opendata_bdo2020/Global_Argo_VerticalMean_Temperature.zarr',
 'opendata_bdo2020/dt_global_allsat_phy_l4_mm']

However, data access with ``gcsfs`` critically depends on how the GCS bucket is set up. For instance, the following bucket does not allow anonymous users to list its content:

In [4]:
fs2 = gcsfs.GCSFileSystem(project='alert-ground-261008', token='anon', access='read_only')
try:
    fs2.ls('data_bdo2020')
except Exception:
    # Print only the class of the exception that was raised
    print(sys.exc_info()[0])

_request non-retriable exception: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/gmaze/miniconda3/envs/ds2/lib/python3.10/site-packages/gcsfs/retry.py", line 115, in retry_request
    return await func(*args, **kwargs)
  File "/Users/gmaze/miniconda3/envs/ds2/lib/python3.10/site-packages/gcsfs/core.py", line 415, in _request
    validate_response(status, contents, path, args)
  File "/Users/gmaze/miniconda3/envs/ds2/lib/python3.10/site-packages/gcsfs/retry.py", line 102, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)., 401


<class 'gcsfs.retry.HttpError'>
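When you do not know in advance whether a bucket can be listed, you can wrap the call. Below is a minimal sketch of a hypothetical helper `try_ls`. It works with any object exposing an `ls(path)` method, such as the `GCSFileSystem` instances created above. Listing (`storage.objects.list`) and reading objects are separate GCS permissions, so objects in a non-listable bucket may still be readable if you know their exact paths.

```python
def try_ls(fs, path):
    """List `path` on an fsspec-like filesystem `fs`, or return None on failure.

    `fs` can be any object with an `ls(path)` method, e.g. a gcsfs.GCSFileSystem.
    """
    try:
        return fs.ls(path)
    except Exception as err:  # e.g. gcsfs.retry.HttpError on a 401
        print(f"Cannot list {path!r}: {type(err).__name__}: {err}")
        return None


# Usage with the filesystems defined above:
# try_ls(fs, 'opendata_bdo2020')   # returns the list of stores
# try_ls(fs2, 'data_bdo2020')      # prints the error and returns None
```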


On the other hand, some datasets are not free and use a requester-pays model, where whoever makes the requests is billed for the data access.
In this case, you have to properly manage authentication:

In [5]:
fs3 = gcsfs.GCSFileSystem(project='poised-honor-358', token='anon')
try:
    fs3.ls('sonific01')
except ValueError as e:
    print(str(e))

Bucket is requester pays. Set `requester_pays=True` when creating the GCSFileSystem.
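gcsfs exposes a `requester_pays` option for such buckets, and billing requires real credentials instead of `token='anon'`. Below is a minimal sketch of the keyword arguments. It assumes your application default credentials are set up (`token='google_default'`) and that you own a Google Cloud project to bill. The project id in the usage comment is hypothetical.

```python
def requester_pays_storage_options(project, token="google_default"):
    """Build gcsfs keyword arguments for reading a requester-pays bucket.

    `project` is the Google Cloud project billed for the requests, and
    token="google_default" makes gcsfs use your application default
    credentials (e.g. from `gcloud auth application-default login`).
    """
    return {"project": project, "token": token, "requester_pays": True}


# Usage (needs valid credentials and a billing-enabled project):
# opts = requester_pays_storage_options("my-billing-project")  # hypothetical id
# fs3 = gcsfs.GCSFileSystem(**opts)
# fs3.ls('sonific01')
```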


### Load data

In [6]:
gcsmap = fs.get_mapper("opendata_bdo2020/EN.4.2.1.f.analysis.g10.zarr")
ds = xr.open_zarr(gcsmap)

# ds = xr.open_dataset("gcs://opendata_bdo2020/EN.4.2.1.f.analysis.g10.zarr",
#                      backend_kwargs={"storage_options": {"project": "alert-ground-261008", "token": 'anon', 'access':'read_only'}},
#                     engine="zarr")

print("Size of the dataset:", ds.nbytes/1e9,"Gb")
print(ds)

Size of the dataset: 52.2317975 Gb
<xarray.Dataset>
Dimensions:                          (depth: 42, time: 832, bnds: 2, lat: 173,
                                      lon: 360)
Coordinates:
  * depth                            (depth) float32 5.022 15.08 ... 5.35e+03
  * lat                              (lat) float32 -83.0 -82.0 ... 88.0 89.0
  * lon                              (lon) float32 1.0 2.0 3.0 ... 359.0 360.0
  * time                             (time) datetime64[ns] 1950-01-16T12:00:0...
Dimensions without coordinates: bnds
Data variables:
    depth_bnds                       (time, depth, bnds) float32 dask.array<chunksize=(1, 42, 2), meta=np.ndarray>
    salinity                         (time, depth, lat, lon) float32 dask.array<chunksize=(1, 42, 173, 360), meta=np.ndarray>
    salinity_observation_weights     (time, depth, lat, lon) float32 dask.array<chunksize=(1, 42, 173, 360), meta=np.ndarray>
    salinity_uncertainty             (time, depth, lat, lon) float32 dask
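The size printed above is `ds.nbytes/1e9`, i.e. the uncompressed in-memory size in decimal gigabytes (GB), while dask reprs report binary units (GiB, MiB). It can be recovered from the shapes and dtypes alone. The sketch below assumes, as the printed total suggests, six 4-D `float32` fields, a `depth_bnds` variable and a `datetime64` `time_bnds` variable. Only part of this list is visible in the truncated output above. Each chunk holds a single time step (`chunksize=(1, 42, 173, 360)`). Zarr stores are usually compressed, so the amount of data actually transferred from GCS may be smaller.

```python
from math import prod

import numpy as np


def nbytes(shape, dtype):
    """Uncompressed size in bytes of an array (what xarray's `.nbytes` sums up)."""
    return prod(shape) * np.dtype(dtype).itemsize


# Shapes and dtypes of the EN4 variables; the six 4-D fields and
# `time_bnds` are an assumption inferred from the sizes printed above.
T, Z, Y, X = 832, 42, 173, 360
variables = [((T, Z, Y, X), "float32")] * 6 + [
    ((T, Z, 2), "float32"),       # depth_bnds
    ((T, 2), "datetime64[ns]"),   # time_bnds (assumed)
    ((Z,), "float32"),            # depth
    ((Y,), "float32"),            # lat
    ((X,), "float32"),            # lon
    ((T,), "datetime64[ns]"),     # time
]

total = sum(nbytes(shape, dtype) for shape, dtype in variables)
one_field = nbytes((T, Z, Y, X), "float32")
one_chunk = nbytes((1, Z, Y, X), "float32")  # one time step per chunk

print(f"Total: {total / 1e9} GB (decimal), {total / 2**30:.2f} GiB (binary)")
print(f"One 4-D field: {one_field / 2**30:.2f} GiB, one chunk: {one_chunk / 2**20:.2f} MiB")
```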

In [7]:
# Load another dataset:
gcsmap = fs.get_mapper('opendata_bdo2020/GLOBAL_ARGO_SDL2000')
ds = xr.open_zarr(gcsmap, consolidated=False)
print("Size of the dataset:", ds.nbytes/1e9,"Gb")
print(ds)

Size of the dataset: 5.974301444 Gb
<xarray.Dataset>
Dimensions:    (depth: 381, samples: 976831)
Coordinates:
  * depth      (depth) float64 0.0 -5.0 -10.0 ... -1.89e+03 -1.895e+03 -1.9e+03
  * samples    (samples) int64 0 1 2 3 4 ... 976826 976827 976828 976829 976830
Data variables:
    julianday  (samples) float32 dask.array<chunksize=(6000,), meta=np.ndarray>
    latitude   (samples) float32 dask.array<chunksize=(6000,), meta=np.ndarray>
    longitude  (samples) float32 dask.array<chunksize=(6000,), meta=np.ndarray>
    so         (samples, depth) float64 dask.array<chunksize=(6000, 381), meta=np.ndarray>
    thetao     (samples, depth) float64 dask.array<chunksize=(6000, 381), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.6
    institution:  Argo-France
    source:       Argo float
    title:        Argo float profiles interpolated onto Standard Depth Levels
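Because this dataset is a flat list of profiles along `samples`, a regional subset boils down to a boolean mask on the 1-D `latitude` and `longitude` variables. Here is a minimal numpy sketch with a hypothetical helper `box_indices`. It assumes the box and the dataset use the same longitude convention, which the output above does not show. Loading `latitude` and `longitude` costs about 3.9 MB each (976831 float32 values), and the profiles themselves stay lazy.

```python
import numpy as np


def box_indices(lat, lon, lat_range, lon_range):
    """Return the indices of samples inside a lat/lon box (bounds inclusive).

    Assumes `lon` and `lon_range` use the same longitude convention
    (e.g. both in -180..180); boxes crossing the dateline are not handled.
    """
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    inside = (
        (lat >= lat_range[0]) & (lat <= lat_range[1])
        & (lon >= lon_range[0]) & (lon <= lon_range[1])
    )
    return np.flatnonzero(inside)


# Usage on the Argo dataset opened above (box bounds are illustrative):
# idx = box_indices(ds["latitude"].values, ds["longitude"].values, (20, 60), (-80, 0))
# ds_box = ds.isel(samples=idx)  # `so` and `thetao` remain lazy dask arrays
```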


## Use intake catalog of data

The catalog also accesses the data through gcsfs, but intake makes this transparent to the user:

### Access and listing of the catalog

In [8]:
from intake import open_catalog

In [9]:
catalog_url = 'https://raw.githubusercontent.com/obidam/ds2-2023/main/ds2_data_catalog.yml'
cat = open_catalog(catalog_url)
list(cat)

['argo_global_sdl',
 'argo_global_sdl_homogeneous',
 'argo_global_vertical_mean',
 'en4',
 'sea_surface_height']
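Opening an entry with intake's `to_dask()` is lazy, so you can compare the sizes of all catalog entries before loading any data. Below is a sketch with a hypothetical helper `catalog_sizes`. It assumes every entry can be opened with `to_dask()` into an xarray Dataset.

```python
def catalog_sizes(cat, names=None):
    """Return {entry name: uncompressed size in GB} for intake catalog entries.

    `to_dask()` opens each store lazily, so only metadata is read here.
    """
    names = list(cat) if names is None else names
    return {name: cat[name].to_dask().nbytes / 1e9 for name in names}


# Usage with the course catalog opened above:
# catalog_sizes(cat)
# catalog_sizes(cat, names=["en4", "sea_surface_height"])
```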

### Load data

In [10]:
ds = cat['en4'].read_chunked()
print("Size of the dataset:", ds.nbytes/1e9,"Gb")
ds

Size of the dataset: 52.2317975 Gb


In [11]:
ds = cat["sea_surface_height"].to_dask()
print("Size of the dataset:", ds.nbytes/1e9,"Gb")
ds

Size of the dataset: 18.1203746 Gb


# Pangeo data

https://github.com/pangeo-data/pangeo-datastore

https://catalog.pangeo.io/

## Explore catalog

In [None]:
from intake import open_catalog

pangeo_cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml")
list(pangeo_cat)

In [None]:
list(pangeo_cat.ocean)
# print(list(pangeo_cat.atmosphere))
# print(list(pangeo_cat.hydro))
# pangeo_cat.walk(depth=5)

# CMIP6 data

In [None]:
# this only needs to be created once
gcs = gcsfs.GCSFileSystem(token='anon')

In [None]:
df_full = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df_full.sample(10)

In [None]:
# df = df_full.query("activity_id=='CMIP' & table_id == 'Omon' & variable_id == 'thetao' & experiment_id == 'historical' & member_id == 'r1i1p1f1'")
df = df_full.query("activity_id=='CMIP' & table_id == 'Omon' & institution_id == 'CNRM-CERFACS' & experiment_id == 'historical'")
# df = df_full.query('institution_id == "CNRM-CERFACS" & member_id=="r1i1p1f2" & source_id=="CNRM-CM6-1"')

# df = df_full.query("activity_id=='CMIP' & table_id == 'Omon' & variable_id == 'thetao' & experiment_id == 'abrupt-4xCO2'")

# df = df.query("source_id=='CNRM-CM6-1-HR' & variable_id=='thetao'") # Horizontal resolution up to 1/4 deg
# df = df.query("source_id=='CNRM-ESM2-1' & variable_id=='thetao'") # Horizontal resolution up to 1deg
df = df.query("source_id=='CNRM-ESM2-1' & (variable_id=='thetao' | variable_id=='so')") # Horizontal resolution up to 1deg

# df = df.sort_values('version')
df = df.sort_values('member_id')
df
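The query strings above combine conditions with `&` and `|`. With the default pandas query parser, `in` against a list literal is an equivalent and shorter way to write the `variable_id` test. Below is a self-contained illustration on a few made-up rows that only mimic the column names of the CMIP6 listing.

```python
import pandas as pd

# Made-up rows mimicking some columns of the CMIP6 store listing
# (illustrative values only, not real catalog entries).
df_demo = pd.DataFrame({
    "activity_id": ["CMIP", "CMIP", "CMIP", "ScenarioMIP"],
    "institution_id": ["CNRM-CERFACS"] * 4,
    "source_id": ["CNRM-ESM2-1"] * 4,
    "experiment_id": ["historical", "historical", "historical", "ssp585"],
    "table_id": ["Omon", "Omon", "Amon", "Omon"],
    "variable_id": ["thetao", "so", "tas", "thetao"],
    "member_id": ["r1i1p1f2"] * 4,
})

# `variable_id in [...]` selects the same rows as
# `(variable_id == 'thetao' | variable_id == 'so')`
sel = df_demo.query(
    "activity_id == 'CMIP' & table_id == 'Omon' & experiment_id == 'historical'"
    " & (variable_id in ['thetao', 'so'])"
)
print(sel[["table_id", "variable_id"]])
```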

In [None]:
# get the path to a specific zarr store (here the last one from the dataframe above)
zstore = df.zstore.values[-1]
print(zstore)

# create a mutable-mapping-style interface to the store
mapper = gcs.get_mapper(zstore)

# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
print("Size of the dataset:", ds.nbytes/1e9,"Gb")

ds

In [None]:
sst = ds['thetao'].sel(lev=0, method='nearest')
sst
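`method='nearest'` is needed because ocean model levels typically sit at cell centres, so there is generally no level exactly at 0 and an exact `sel(lev=0)` would raise a `KeyError`. Conceptually, nearest-neighbour selection picks the index that minimises the distance to the target, as in this numpy sketch. xarray itself delegates the lookup to a pandas index, so tie-breaking may differ.

```python
import numpy as np


def nearest_index(coord, target):
    """Index of the coordinate value closest to `target` (first one on ties)."""
    coord = np.asarray(coord)
    return int(np.abs(coord - target).argmin())


# With the CMIP6 dataset above, this should point at the same level as
# ds['thetao'].sel(lev=0, method='nearest'):
# k = nearest_index(ds['lev'].values, 0)
# sst_k = ds['thetao'].isel(lev=k)
```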

In [None]:
def open_cmip6(df_row):
    """Open the zarr store referenced by one row of the CMIP6 listing."""
    # get the path to the zarr store of this row
    zstore = df_row['zstore']

    # create a mutable-mapping-style interface to the store
    mapper = gcs.get_mapper(zstore)

    # open it lazily using xarray and zarr
    return xr.open_zarr(mapper, consolidated=True)

ds = open_cmip6(df.iloc[0])
print("Size of the dataset:", ds.nbytes/1e9,"Gb")
ds

In [None]:
# Compute size of the df selection:
total_size = 0 # Gb
for index, row in df.iterrows():
    ds = open_cmip6(row)
    total_size += ds.nbytes/1e9
print("Size of the selection of datasets:", total_size, "Gb")    
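Keeping the size of each store, rather than only the total, helps spot which variables or members dominate the selection. Below is a sketch with a hypothetical helper `selection_sizes` that accepts any function mapping a DataFrame row to an xarray Dataset, such as `open_cmip6`.

```python
import pandas as pd


def selection_sizes(df, open_func):
    """Return a copy of `df` with a `size_gb` column (one opened store per row).

    `open_func(row)` must return an object with an `nbytes` attribute,
    e.g. the lazily opened xarray Dataset returned by `open_cmip6`.
    """
    sizes = [open_func(row).nbytes / 1e9 for _, row in df.iterrows()]
    return df.assign(size_gb=sizes)


# Usage with the CMIP6 selection above:
# sized = selection_sizes(df, open_cmip6)
# print(sized.groupby("variable_id")["size_gb"].sum())
```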