# Accessing the cloud with fsspec

> Filesystem Spec ([fsspec](https://github.com/fsspec)) is a project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage. fsspec provides two main concepts: a set of filesystem classes with uniform APIs (i.e., functions such as cp, rm, cat, mkdir, …) supplying operations on a range of storage systems; and top-level convenience functions like fsspec.open(), to allow you to quickly get from a URL to a file-like object that you can use with a third-party library or your own code.

[swiftspec](https://github.com/fsspec/swiftspec) is a plugin for fsspec so that fsspec understands the Swift API. You can install it with `pip install swiftspec`. It is also available in the Python unstable kernel at DKRZ.

This notebook introduces

- the fsspec file system object
- the fsspec mapper
- url chaining

to create, copy and delete objects in DKRZ's institutional cloud and other storage systems, and to use them with xarray.

## Preparation

### 1. Create a container for your work in the project space.

Open [swiftbrowser.dkrz.de](https://swiftbrowser.dkrz.de), log in and create a container.

### 2. Create an access token and load it for use in a script

In [None]:
%%bash
module load py-python-swiftclient/3.12.0-gcc-11.2.0
swift-token new
#Account: bk1377
#Username: YOUR-USER-NAME
module switch py-python-swiftclient
#cat ~/.swiftenv
# Write the variables from ~/.swiftenv into a mytoken.py file that can be imported from another Python script

In [1]:
# Import only the two variables that the following cells use
from mytoken import OS_STORAGE_URL, OS_AUTH_TOKEN
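
One way to generate `mytoken.py` from `~/.swiftenv` is sketched below. It assumes that the file holds shell-style assignments such as `OS_AUTH_TOKEN=...`, optionally prefixed with `export` or written as `setenv NAME VALUE`. That layout is an assumption, so check your own file first. `parse_swiftenv` and `render_token_module` are hypothetical helper names, not part of swiftspec.

```python
import re

# Matches `NAME=value`, `export NAME=value` and `setenv NAME value` (assumed layouts).
_ASSIGNMENT = re.compile(r"^(?:export\s+|setenv\s+)?(OS_[A-Z_]+)(?:=|\s+)(.*)$")


def parse_swiftenv(text):
    """Return the OS_* variables found in the text of a swiftenv-style file."""
    variables = {}
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line.strip())
        if match:
            name, value = match.groups()
            # Drop surrounding whitespace and quotes from the value.
            variables[name] = value.strip().strip("\"'")
    return variables


def render_token_module(variables):
    """Render the variables as Python source for a mytoken.py file."""
    return "".join(
        f"{name} = {value!r}\n" for name, value in sorted(variables.items())
    )
```

The rendered text can then be written with `pathlib.Path("mytoken.py").write_text(...)`. The token is a secret, so keep the resulting file readable only by yourself.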

### 3. Set up environment variables in order to work with fsspec

In [2]:
import os
import fsspec
os.environ["OS_STORAGE_URL"]=OS_STORAGE_URL
os.environ["OS_AUTH_TOKEN"]=OS_AUTH_TOKEN
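
Before creating the file system object, it can help to confirm that both variables are really set. The guard below is a minimal sketch, assuming that these two variables are all that token authentication needs in this notebook. `missing_swift_variables` is a hypothetical helper name.

```python
import os

# The two variables that the cell above exports.
REQUIRED_VARIABLES = ("OS_STORAGE_URL", "OS_AUTH_TOKEN")


def missing_swift_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

An empty list from `missing_swift_variables()` means that both variables are present in the current process.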

## The File system object

The file system object lets you use basic shell-like commands on different storage back ends.

In [3]:
fsswift=fsspec.filesystem("swift")

What is stored under the account?

In [4]:
fsswift.ls(OS_STORAGE_URL)

[{'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/dkrz_scratch',
  'size': 0,
  'type': 'directory'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/hackathon2023-results',
  'size': 25961183,
  'type': 'directory'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210',
  'size': 0,
  'type': 'directory'}]

Note that a swift path starts with *swift://*. The `OS_STORAGE_URL` is translated internally; we could also do this ourselves:

In [5]:
swift_account_url=OS_STORAGE_URL.replace("https://","swift://").replace("/v1/","/")
fsswift.ls(swift_account_url)

[{'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/dkrz_scratch',
  'size': 0,
  'type': 'directory'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/hackathon2023-results',
  'size': 25961183,
  'type': 'directory'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210',
  'size': 0,
  'type': 'directory'}]
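
The translation from the cell above can be wrapped in a small function. This is a sketch under the assumption that the storage URL has the form `https://HOST/v1/ACCOUNT`, as the outputs above suggest. `to_swift_url` is a hypothetical helper name, and the host and account in the usage note are placeholders.

```python
def to_swift_url(storage_url):
    """Translate an https storage URL into the swift:// form used by fsspec."""
    if storage_url.startswith("swift://"):
        # Already translated, leave it untouched.
        return storage_url
    # Replace only the first occurrence so that object names which happen
    # to contain "/v1/" further down the path are not altered.
    return storage_url.replace("https://", "swift://", 1).replace("/v1/", "/", 1)
```

For example, `to_swift_url("https://swift.example.org/v1/account_id/container")` yields `swift://swift.example.org/account_id/container`.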

How big is a container inside the storage?

In [6]:
fsswift.du(OS_STORAGE_URL+"/hackathon2023-results")

25961183
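
`du` returns a raw byte count. The sketch below sums the `size` entries of a listing like the one returned by `ls` and formats a byte count with binary units. Both function names are hypothetical helpers.

```python
def total_size(listing):
    """Sum the 'size' entries of a listing as returned by ls()."""
    return sum(entry["size"] for entry in listing)


def format_bytes(size):
    """Format a byte count with binary units (KiB, MiB, ...)."""
    value = float(size)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that keeps the number below 1024.
        if value < 1024 or unit == "TiB":
            return f"{value:.2f} {unit}"
        value /= 1024
```

With the value from above, `format_bytes(25961183)` gives `'24.76 MiB'`.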

Note: Containers cannot be created with fsspec. Use the GUI for that.

Note: There are no directories inside a container, but fsspec lets you work with paths as if there were:

In [7]:
fsswift.mkdir(OS_STORAGE_URL+"/k204210/VIRTUAL_DIRECTORY")
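
To illustrate what "no directories" means: an object store keeps a flat set of object names, and a directory view can be derived from shared name prefixes. The function below is a conceptual sketch in plain Python, not the way swiftspec implements its listing. `list_virtual_directory` is a hypothetical name.

```python
def list_virtual_directory(object_names, prefix):
    """List the direct children of `prefix` in a flat namespace of object names."""
    prefix = prefix.rstrip("/") + "/"
    children = set()
    for name in object_names:
        if name.startswith(prefix):
            first_part, separator, _ = name[len(prefix):].partition("/")
            if first_part:
                # A remaining separator marks the child as a virtual directory.
                children.add(first_part + separator)
    return sorted(children)
```

Objects named `VIRTUAL_DIRECTORY/newtestfile` and `VIRTUAL_DIRECTORY/sub/a` would appear as the file `newtestfile` and the virtual directory `sub/`, although only the two flat object names exist.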

Create objects/files:

In [8]:
with fsswift.open(OS_STORAGE_URL+"/k204210/VIRTUAL_DIRECTORY/newtestfile","w") as f:
    f.write("My first file in the cloud")

Copy files:

(Note that `fsswift` also has `cp` and `get`, but they do not seem to work. However, we have `pipe`:)


In [9]:
help(fsswift.pipe)

Help on function _pipe in module fsspec.asyn:

_pipe(path, value=None, batch_size=None, **kwargs)
    Put value into path
    
    (counterpart to ``cat``)
    
    Parameters
    ----------
    path: string or dict(str, bytes)
        If a string, a single remote location to put ``value`` bytes; if a dict,
        a mapping of {path: bytesvalue}.
    value: bytes, optional
        If using a single path, these are the bytes to put there. Ignored if
        ``path`` is a dict



In [10]:
fsswift.pipe(
    swift_account_url+"/k204210/VIRTUAL_DIRECTORY/newtestfile2",
    fsswift.cat(swift_account_url+"/k204210/VIRTUAL_DIRECTORY/newtestfile")
)

[None]

In [11]:
fsswift.ls(swift_account_url+"/k204210/VIRTUAL_DIRECTORY")

[{'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210/VIRTUAL_DIRECTORY/newtestfile',
  'size': 26,
  'type': 'file',
  'last_modified': '2023-12-06T09:06:35.477220',
  'hash': '35006976bfd3ddf5ae82a2f6b1f21079',
  'content_type': 'application/octet-stream'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210/VIRTUAL_DIRECTORY/newtestfile2',
  'size': 26,
  'type': 'file',
  'last_modified': '2023-12-06T09:06:36.271410',
  'hash': '35006976bfd3ddf5ae82a2f6b1f21079',
  'content_type': 'application/octet-stream'}]
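
The `cat` plus `pipe` pattern from the cells above can be wrapped in a helper. This is a minimal sketch that works with any object offering `cat(path)` and `pipe(path, value)`, such as `fsswift`. `copy_object` is a hypothetical name.

```python
def copy_object(fs, source, target):
    """Copy one object by reading all of its bytes and writing them to `target`."""
    data = fs.cat(source)   # downloads the whole object into memory
    fs.pipe(target, data)   # uploads the same bytes under the new name
    return len(data)        # number of bytes copied
```

Because the entire object passes through the memory of the client, this approach suits small objects best.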

Delete files:

In [12]:
fsswift.rm(OS_STORAGE_URL+"/k204210/VIRTUAL_DIRECTORY/newtestfile")
to_delete=[
    a["name"]
    for a in fsswift.ls(swift_account_url+"/k204210/")
]
fsswift.rm(
    to_delete,
    recursive=True
)

[None]

In [13]:
fsswift.ls(swift_account_url+"/k204210")

[]

Note that `mv` does not exist in a cloud object store. It is an expensive copy followed by a remove. So if you put data in the cloud,

- the location should be fixed (otherwise it is a `mv` again)
- the data should be fixed (an update also means delete and recreate)
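
The copy-and-remove behind a move can be sketched as follows. The helper uses the generic fsspec methods `cat`, `pipe`, `info` and `rm`; whether `info` reports the size immediately after an upload on a given Swift back end is an assumption here. `move_object` is a hypothetical name.

```python
def move_object(fs, source, target):
    """Emulate `mv` on an object store: copy the bytes, verify, then delete."""
    data = fs.cat(source)
    fs.pipe(target, data)
    # Only remove the source once the target reports the expected size.
    if fs.info(target)["size"] != len(data):
        raise IOError(f"copy of {source} to {target} is incomplete")
    fs.rm(source)
```

Every byte is downloaded and uploaded once, which is why moving large datasets in the cloud is costly.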

## The mapper

A mapper is a dict-like object that can be created for any URL:

In [14]:
cloudmapper=fsswift.get_mapper(swift_account_url+"/k204210")

It has an underlying file system again:

In [15]:
cloudmapper.fs

<swiftspec.core.SWIFTFileSystem at 0x7fffe73dc2b0>

`fsspec.get_mapper` detects the protocol by parsing the first characters of the path:

In [16]:
diskmapper=fsspec.get_mapper("/work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/")

In [17]:
diskmapper.fs

<fsspec.implementations.local.LocalFileSystem at 0x7fffe75396f0>

It is a key-value store: each key of the dict corresponds to a file or object ID, and the value of each key is the byte content of that file or object.

In [18]:
print(list(diskmapper.keys()))

['tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202001-202412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202501-202912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_203001-203412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_203501-203912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_204001-204412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_204501-204912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_205001-205412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_205501-205912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_206001-206412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_206501-206912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_207001-207412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_207501-207912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_208001-208412.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_208501-208912.nc', 'tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_209001-209412.nc', 'tas_Am
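
To make the key-value idea concrete, here is a simplified stand-in for a mapper over a local directory, written with the standard library only. It is a conceptual sketch and not fsspec's implementation. `DirectoryMapper` is a hypothetical class name.

```python
import os
from collections.abc import MutableMapping


class DirectoryMapper(MutableMapping):
    """Simplified stand-in for an fsspec mapper rooted at a local directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, *key.split("/"))

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()  # the value is the byte content of the file
        except (FileNotFoundError, IsADirectoryError):
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # Keys cover the files in all subdirectories, relative to the root.
        for directory, _, files in os.walk(self.root):
            for name in files:
                full = os.path.join(directory, name)
                yield os.path.relpath(full, self.root).replace(os.sep, "/")

    def __len__(self):
        return sum(1 for _ in self)
```

Because it implements the `MutableMapping` interface, methods such as `keys()`, `update()` and `pop()` come for free, which mirrors how the mappers in this notebook are used.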

It can be used to copy files **across file systems**. Be careful: the mapper's keys include the files in all subdirectories!

In [19]:
cloudmapper.update(diskmapper)
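
Since `update` copies every key, including those from subdirectories, a filtered copy can be safer. The sketch below works on any pair of dict-like objects, so it applies to mappers as well as plain dicts. `copy_top_level` and its `suffix` parameter are hypothetical.

```python
def copy_top_level(source, target, suffix=""):
    """Copy only keys without a '/' (and with the given suffix) between mappings."""
    copied = []
    for key in source:
        if "/" not in key and key.endswith(suffix):
            target[key] = source[key]  # reads the bytes and writes them to the target
            copied.append(key)
    return copied
```

A call such as `copy_top_level(diskmapper, cloudmapper, suffix=".nc")` would then transfer only the netCDF files that sit directly under the root of `diskmapper`.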

It saves the underlying root path:

In [20]:
fsswift.ls(cloudmapper.root)

[{'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc',
  'size': 7094228,
  'type': 'file',
  'last_modified': '2023-12-06T09:06:44.279250',
  'hash': '4093945c24f7e07fe99a63b6de534d14',
  'content_type': 'application/octet-stream'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202001-202412.nc',
  'size': 7092506,
  'type': 'file',
  'last_modified': '2023-12-06T09:06:46.131150',
  'hash': '1988ff02cc430b6046255993e4b1e6b0',
  'content_type': 'application/octet-stream'},
 {'name': 'swift://swift.dkrz.de/dkrz_5cfad75f-8778-40d2-bdc0-eec8ae27ad1f/k204210/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202501-202912.nc',
  'size': 7092539,
  'type': 'file',
  'last_modified': '2023-12-06T09:06:46.894810',
  'hash': '6469e165a86f07645b933e613de99e32',
  'content_type': 'application/octet-stream'},
 {'name': 'swift://swift.dkrz.d

Files opened with fsspec can be used in xarray:

In [21]:
files_for_xarray=[
    fsspec.open(a["name"]).open()
    for a in fsswift.ls(cloudmapper.root)
]
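
The cell above opens every entry under the root. If the root also holds other objects or virtual directories, it can help to select the netCDF files first. The filter below is a sketch that relies only on the `name` and `type` fields visible in the listings above. `netcdf_objects` is a hypothetical helper name.

```python
def netcdf_objects(listing):
    """Return the sorted names of file entries in an ls() listing that end in .nc."""
    return sorted(
        entry["name"]
        for entry in listing
        if entry["type"] == "file" and entry["name"].endswith(".nc")
    )
```

The list comprehension above could then iterate over `netcdf_objects(fsswift.ls(cloudmapper.root))` instead of the raw listing.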

In [26]:
import xarray as xr
ds=xr.open_mfdataset(
    files_for_xarray,
    compat="override",
    coords="minimal"
)
ds=ds.set_coords(["time_bnds","lat_bnds","lon_bnds"])
ds

Dask array summary from the dataset output (one row per array):

| Bytes (array) | Bytes (chunk) | Shape (array) | Shape (chunk) | Dask graph | Data type |
|---|---|---|---|---|---|
| 16.12 kiB | 0.94 kiB | (1032, 2) | (60, 2) | 18 chunks in 37 graph layers | datetime64[ns] numpy.ndarray |
| 3.02 MiB | 180.00 kiB | (1032, 192, 2) | (60, 192, 2) | 18 chunks in 55 graph layers | float64 numpy.ndarray |
| 6.05 MiB | 360.00 kiB | (1032, 384, 2) | (60, 384, 2) | 18 chunks in 55 graph layers | float64 numpy.ndarray |
| 290.25 MiB | 16.88 MiB | (1032, 192, 384) | (60, 192, 384) | 18 chunks in 37 graph layers | float32 numpy.ndarray |

We can also write to the swift cloud with zarr. Note that *zarr* uses fsspec internally so that we can directly pass the url to `to_zarr` and `open_zarr`.

In [27]:
# The zarr store is written below the root of the cloud mapper
target_location=cloudmapper.root+"/testzarr"
ds.to_zarr(
    target_location
)

<xarray.backends.zarr.ZarrStore at 0x7fff8481aab0>

Clean up:

In [28]:
to_be_removed=list(cloudmapper.keys())
for k in to_be_removed:
    cloudmapper.pop(k, None)

In [29]:
fsswift.ls(cloudmapper.root)

[]

## URL chains

URLs can be chained with the separator `::`. In this example, we access a specific file in a zip container.

In [30]:
!zip /home/k/k204210/eerie-io/analysis_support/temp.zip /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/*

updating: work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc (deflated 1%)
updating: work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202001-202412.nc (deflated 1%)
updating: work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202501-202912.nc (deflated 1%)
updating: work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_203001-203412.nc (deflated 1%)
updating: work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_203501-203912.nc (deflated 1%)
updating: work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/

In [31]:
xr.open_dataset(
    fsspec.open(
        'zip://work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc'+
        "::"+
        "/home/k/k204210/eerie-io/analysis_support/temp.zip"
    ).open()
)

This allows us, for example, to access individual files inside an archive such as a zip or tar file without unpacking it first.
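
The chained URL used above follows the pattern `zip://MEMBER::ARCHIVE`. A small helper can assemble it. This is a sketch, `zip_member_url` is a hypothetical name, and the paths in the usage note are placeholders.

```python
def zip_member_url(archive_path, member):
    """Build a chained fsspec URL that addresses `member` inside a zip archive."""
    # zip stores member names without a leading slash, as the "updating:"
    # lines in the output above show.
    return f"zip://{member.lstrip('/')}::{archive_path}"
```

For example, `zip_member_url("/home/user/temp.zip", "/work/data/file.nc")` returns `zip://work/data/file.nc::/home/user/temp.zip`, which can be passed to `fsspec.open` as in the cell above.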

There are many [implementations](https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations) for fsspec.