In [16]:
import intake
from pathlib import Path

# The Data Catalog


## Connecting to the Catalog with `sshfs`

We can use `sshfs` to mount the catalog file system on CSD3 directly onto our local machine. The commands below create a new folder called `mast-data` in your home directory and then mount the data catalog in it. You will need to modify the second command to use your CSD3 login name.

```{note}
This assumes you already have an account on CSD3. Please follow the guide [here](https://docs.hpc.cam.ac.uk/hpc/user-guide/quickstart.html) to set up your login.
```

```bash
mkdir ~/mast-data
sshfs -o allow_other,auto_cache,reconnect <your-csd3-username>@login.hpc.cam.ac.uk:/rds/project/rds-mOlK9qn0PlQ/ir-jack5 ~/mast-data
```
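Before opening the catalog it can help to confirm that the mount is actually in place. The following is a minimal sketch, not part of the original workflow: the helper name `catalog_is_mounted` is hypothetical, and it assumes the catalog file lives at `mast/catalog.yml` under the mount point, as in the path used when opening the catalog in this notebook.

```python
from pathlib import Path


def catalog_is_mounted(mount_point="~/mast-data", relative_catalog="mast/catalog.yml"):
    """Return True if the catalog file is visible under the mount point.

    Both default arguments are assumptions taken from the paths used in
    this notebook; adjust them if your mount point differs.
    """
    catalog_path = Path(mount_point).expanduser() / relative_catalog
    return catalog_path.is_file()


if catalog_is_mounted():
    print("Catalog file found; the mount looks usable.")
else:
    print("Catalog file not found; check that sshfs is still connected.")
```

If the check fails after the mount was previously working, the `sshfs` connection may have dropped and the mount command may need to be run again.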


## Opening the Catalog


In [17]:
catalog = intake.open_catalog(Path('~/mast-data/mast/catalog.yml').expanduser())
catalog

mast:
  args:
    path: /home/lhs18285/mast-data/mast/catalog.yml
  description: The MAST Data Archive Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata:
    version: 1

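To see which signals the catalog offers, one option is to iterate over it and filter by name. The sketch below assumes that iterating over the catalog yields entry names as strings, which is how mapping-like objects behave; the helper `find_entries` is hypothetical and is demonstrated on a plain dictionary standing in for the catalog, with `OTHER_SIGNAL` as a made-up entry name.

```python
def find_entries(catalog, keyword):
    """Return the sorted entry names that contain `keyword` (case-insensitive).

    `catalog` can be any object that yields entry names when iterated;
    treating the intake catalog this way is an assumption.
    """
    keyword = keyword.lower()
    return sorted(name for name in catalog if keyword in name.lower())


# Stand-in for the real catalog; only the keys matter for this example.
example_catalog = {"ACT_C_PLA_TEMPERATURE": None, "OTHER_SIGNAL": None}
print(find_entries(example_catalog, "temperature"))
```

With the mounted catalog, the same call would be `find_entries(catalog, "temperature")`.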

In [18]:
intake.interface.gui.GUI([catalog])

Accessing a single record from the catalog

In [19]:
catalog.ACT_C_PLA_TEMPERATURE

ACT_C_PLA_TEMPERATURE:
  args:
    fastzarr: true
    urlpath: /home/lhs18285/mast-data/mast/ACT_C_PLA_TEMPERATURE.zarr
  description: Carbon temperature
  driver: intake_xarray_datatree.intake_xarray_datatree.DataTreeSource
  metadata:
    catalog_dir: /home/lhs18285/mast-data/mast/
    description: ''
    label: Carbon temperature
    rank: 2
    shape:
    - 2
    - 32
    time_index: 0
    units: eV


Load data from the data catalog directly

In [29]:
dataset = catalog.ACT_C_PLA_TEMPERATURE.read()
print("Number of shots in dataset:", len(dataset))

Number of shots in dataset: 712
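Since each shot is stored under a string key such as `'28977'`, the available shot numbers can be listed by iterating over the dataset. This is a sketch under the assumption that iterating over the loaded object yields its keys, as a mapping does; the helper `shot_ids` is hypothetical, and the dictionary below is a stand-in with made-up shot numbers other than `28977`.

```python
def shot_ids(dataset):
    """Return the shot keys sorted numerically.

    Assumes iterating over `dataset` yields its keys and that shot keys
    are strings of digits; any other keys are skipped.
    """
    numeric_keys = (key for key in dataset if str(key).isdigit())
    return sorted(numeric_keys, key=int)


# Stand-in for the loaded dataset; the values are irrelevant here.
example_dataset = {"28977": None, "9000": None, "30001": None}
print(shot_ids(example_dataset))
```

Sorting with `key=int` avoids the lexicographic ordering that plain string sorting would give, where `'9000'` would come after `'30001'`.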


Look at the first shot in the dataset

In [30]:
dataset['28977']

Each array displayed for this shot has the same layout:

|            | Array                      | Chunk                      |
|------------|----------------------------|----------------------------|
| Bytes      | 256 B                      | 256 B                      |
| Shape      | (2, 32)                    | (2, 32)                    |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type  | float32 numpy.ndarray      | float32 numpy.ndarray      |
