# Use SpatialData with your data

The `spatialdata` framework offers three ways to construct `SpatialData` objects:

1. you can read a `SpatialData` object that has already been saved to `.zarr` in the SpatialData Zarr format:
    1. from disk;
    2. from the cloud;
2. you can use the reader functions from `spatialdata-io`;
3. you can construct a `SpatialData` object from scratch using our Python `spatialdata` APIs.

This notebook will discuss all of them.

## Reading SpatialData `.zarr` data

### The distinction between Zarr, OME-NGFF and the SpatialData format

Let's start with a clarification on the storage format.

[Zarr](https://zarr.dev/) is a storage format for saving data on disk or in the cloud in a performant and interoperable way. A Zarr object saved on disk or in the cloud is referred to as a *Zarr store*. Effectively, a *Zarr store* is not a file but a folder containing data and metadata. Zarr is optimized for storing tensor data (such as large images).
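
Because an on-disk Zarr store is a folder, ordinary filesystem tools can inspect it. The following is a minimal sketch using only the Python standard library; `summarize_store` is a hypothetical helper written for this illustration, not part of `zarr` or `spatialdata`, and it assumes a directory-based store.

```python
from pathlib import Path


def summarize_store(store: Path) -> dict:
    """Count the files inside a directory-based Zarr store and sum their sizes.

    Hypothetical helper for illustration; it only walks the folder and does
    not interpret any Zarr metadata.
    """
    if not store.is_dir():
        raise NotADirectoryError(f"{store} is not a folder")
    # Every chunk and every metadata document is a regular file in the tree.
    files = [p for p in store.rglob("*") if p.is_file()]
    return {
        "n_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }
```

For example, `summarize_store(Path("data.zarr"))` (with `data.zarr` standing in for any store you have on disk) reports how many files make up that single logical dataset.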

[OME-NGFF](https://ngff.openmicroscopy.org/latest/) is a specification that describes how to structure the storage of bioimaging data and metadata. For instance, it defines a community-agreed system for storing large images at multiple resolutions and for dividing them into smaller chunks. It also defines how to specify axes, coordinate systems and coordinate transformations to describe the spatial context of the data. OME-NGFF does not require the data to be saved in Zarr, but the most widely used implementation of the specification is based on Zarr and is called [OME-Zarr](https://link.springer.com/article/10.1007/s00418-023-02209-1).
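
To make the multiscale and coordinate-transformation ideas concrete, here is a small sketch that builds a `multiscales` metadata block as a plain Python dictionary. It is modelled on version 0.4 of the OME-NGFF specification; the function name, the factor-of-two pyramid and the pixel size are assumptions made for this illustration, so check the field names against the specification version you target.

```python
import json


def multiscales_metadata(n_levels: int, pixel_size_um: float) -> dict:
    """Sketch of OME-NGFF 0.4-style metadata for a (c, y, x) image pyramid."""
    axes = [
        {"name": "c", "type": "channel"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ]
    datasets = []
    for level in range(n_levels):
        # Assumption: each level halves the resolution of the previous one,
        # so one pixel covers twice the physical distance along y and x.
        factor = 2**level
        datasets.append(
            {
                "path": str(level),
                "coordinateTransformations": [
                    {
                        "type": "scale",
                        "scale": [1.0, pixel_size_um * factor, pixel_size_um * factor],
                    }
                ],
            }
        )
    return {"multiscales": [{"version": "0.4", "axes": axes, "datasets": datasets}]}


metadata = multiscales_metadata(n_levels=3, pixel_size_um=0.5)
print(json.dumps(metadata, indent=2))
```

Each entry in `datasets` points to one resolution level, and its `scale` transformation maps pixel indices of that level to the shared physical coordinate system.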

The SpatialData Zarr format, described in our [design doc](https://spatialdata.scverse.org/en/latest/design_doc.html), is an extension of the OME-NGFF specification that makes use of the OME-Zarr, [AnnData Zarr](https://anndata.readthedocs.io/en/latest/fileformat-prose.html) and [Parquet](https://parquet.apache.org/) file formats. We need this combination of technologies because OME-NGFF does not currently provide all the fundamentals required for storing spatial omics datasets; nevertheless, we try to stay as close to OME-NGFF as possible, and we are contributing to ultimately make spatial omics support available in pure OME-NGFF.

### Compatible `.zarr` stores
`spatialdata` can read SpatialData Zarr data. In practice, this is data that has previously been saved using the `spatialdata` APIs in Python. Outside Python, there are preliminary efforts to support saving SpatialData Zarr objects, for instance in R: https://github.com/HelenaLC/SpatialData (not yet ready!).

### Non-compatible `.zarr` stores
`spatialdata` cannot read arbitrary Zarr files. For instance, the `feature_slice.zarr` file in Visium HD data is not a SpatialData Zarr file (we will see how to read Visium HD data later). Nor can `spatialdata` read arbitrary OME-Zarr files, although our eventual aim is to make every OME-Zarr file compatible.

### Example datasets

You can download example SpatialData Zarr files [from our documentation](https://spatialdata.scverse.org/en/latest/tutorials/notebooks/datasets/README.html); some examples are listed below.

| Technology | Sample               | File size | Filename (spatialdata-sandbox) | Download                                                                                        | License |
| :--------- | :------------------- | --------: | :----------------------------- | :---------------------------------------------------------------------------------------------- | :------ |
| Visium HD  | Mouse intestine [^1] |      1 GB | visium_hd_3.0.0_io             | [.zarr.zip](https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_hd_3.0.0_io.zip)          | CCA     |
| Visium     | Breast cancer [^2]   |    1.5 GB | visium_associated_xenium_io    | [.zarr.zip](https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zip) | CCA     |
| Xenium     | Breast cancer [^2]   |    2.8 GB | xenium_rep1_io                 | [.zarr.zip](https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep1_io.zip)              | CCA     |

Sources.
1. From https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine
2. Janesick, A. et al. High resolution mapping of the breast cancer tumor microenvironment using integrated single cell, spatial and in situ analysis of FFPE tissue. bioRxiv 2022.10.06.510405 (2022) doi:10.1101/2022.10.06.510405.
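
The download links above point to zipped Zarr stores, which must be unpacked before they can be read. The following sketch uses only the standard library to fetch and extract an archive; `fetch_and_extract` is a hypothetical helper written for this illustration, and it makes no assumption about the name of the folder inside the archive, returning whatever `.zarr` folders it finds.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch_and_extract(url: str, dest: Path) -> list[Path]:
    """Download a zipped Zarr store and return the extracted `.zarr` folders.

    Hypothetical helper for illustration, not part of `spatialdata`.
    """
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "download.zip"
    urlretrieve(url, archive)  # the example files are large, so this can take a while
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    archive.unlink()  # keep only the extracted store
    return sorted(p for p in dest.rglob("*.zarr") if p.is_dir())


# Usage sketch (downloads about 1.5 GB, so it is left commented out):
# url = "https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zip"
# stores = fetch_and_extract(url, Path("datasets"))
# print(stores)
```

Each returned path can then be passed to `read_zarr`, which is introduced in the next section.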

### APIs to read SpatialData `.zarr` data from disk

Here is an example of writing an in-memory example `SpatialData` object to disk in the SpatialData Zarr format and then reading it back.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

from spatialdata import SpatialData, read_zarr
from spatialdata.datasets import blobs

sdata = blobs()
print(sdata)
print()

with TemporaryDirectory() as tmpdir:
    f = Path(tmpdir) / "data.zarr"
    sdata.write(f)
    # 2 equivalent alternatives:
    from_disk = read_zarr(f)
    from_disk = SpatialData.read(f)
    print(from_disk)
```


```
SpatialData object with:
├── Images
│     ├── 'blobs_image': SpatialImage[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': MultiscaleSpatialImage[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': SpatialImage[yx] (512, 512)
│     └── 'blobs_multiscale_labels': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)
```

### APIs to read SpatialData `.zarr` data from the cloud
Remote access to `.zarr` data is currently only partially supported; see https://github.com/scverse/spatialdata/discussions/526 for details.

## Reader functions from `spatialdata-io`