Merged
29 commits
4dd68e6
added readme for dev res (file format)
LucaMarconato Aug 23, 2023
e0b05ac
updated readme
LucaMarconato Aug 23, 2023
d8c5abb
update readme
LucaMarconato Aug 23, 2023
1850c77
improved readme
LucaMarconato Aug 23, 2023
b75745c
added multiple elems and some transformations
LucaMarconato Aug 23, 2023
0d0db06
added zarr
LucaMarconato Aug 23, 2023
38ac15f
added sequence
LucaMarconato Aug 23, 2023
85bf0bc
added tech notes to readme
LucaMarconato Aug 23, 2023
0dc43af
Update Readme.md: fixed links
LucaMarconato Aug 23, 2023
2079e59
autorun: storage format
LucaMarconato Aug 23, 2023
edbdafe
Merge branch 'dev_notebooks' of https://github.com/scverse/spatialdat…
LucaMarconato Aug 23, 2023
f797d96
autorun: storage format; spatialdata from fa096da (v0.0.12)
LucaMarconato Aug 23, 2023
1b39433
fix readme
LucaMarconato Aug 23, 2023
9e2adc6
index.html
LucaMarconato Aug 23, 2023
50bc846
fix readmes
LucaMarconato Aug 23, 2023
c0977a6
fix readme
LucaMarconato Aug 23, 2023
aed6644
preview html
LucaMarconato Aug 23, 2023
c31df08
fix
LucaMarconato Aug 23, 2023
a9c9c4f
fix readme
LucaMarconato Aug 23, 2023
db5240e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 23, 2023
7fed5c8
fix pre-commit
LucaMarconato Aug 23, 2023
df862b9
Merge branch 'dev_notebooks' of https://github.com/scverse/spatialdat…
LucaMarconato Aug 23, 2023
ff574ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 23, 2023
283598d
fix pre-commit
LucaMarconato Aug 23, 2023
83dc5e2
Merge branch 'dev_notebooks' of https://github.com/scverse/spatialdat…
LucaMarconato Aug 23, 2023
c082a78
table data was missing (gitignore)
LucaMarconato Sep 19, 2023
f267de1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 19, 2023
8bad376
fix docs
LucaMarconato Sep 24, 2023
0612bf0
Merge branch 'main' into dev_notebooks
LucaMarconato Sep 24, 2023
2 changes: 2 additions & 0 deletions conf.py
Original file line number Diff line number Diff line change
@@ -107,6 +107,8 @@
"notebooks/paper_reproducibility",
"notebooks/examples/*.zarr", "references.md",
"Readme.md",  # hack because git is acting up
"notebooks/developers_resources/storage_format/*.ipynb",
"notebooks/developers_resources/storage_format/Readme.md",
]
# Ignore warnings.
nitpicky = False # TODO: solve upstream.
29 changes: 15 additions & 14 deletions datasets/README.md
@@ -4,20 +4,21 @@ The example notebooks operate on a set of spatial omics datasets that can be dow

Here you can find the dataset hosted in S3 object storage.

| Dataset | .zarr.zip | S3 (see note below!) |
| :-------------------------: | :----------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: |
| cosmx_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/cosmx_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/cosmx_io.zarr/> |
| mcmicro_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mcmicro_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mcmicro_io.zarr/> |
| merfish | <https://s3.embl.de/spatialdata/spatialdata-sandbox/merfish.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/merfish.zarr/> |
| mibitof | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mibitof.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mibitof.zarr/> |
| steinbock_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/steinbock_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/steinbock_io.zarr/> |
| toy | <https://s3.embl.de/spatialdata/spatialdata-sandbox/toy.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/toy.zarr/> |
| visium | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium.zarr/> |
| visium_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_io.zarr/> |
| visium_associated_xenium_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zarr/> |
| xenium_rep1_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep1_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep1_io.zarr/> |
| xenium_rep2_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep2_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep2_io.zarr/> |
| Dataset | .zarr.zip | S3 (see note below!) |
| :----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------: |
| cosmx_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/cosmx_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/cosmx_io.zarr> |
| mcmicro_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mcmicro_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mcmicro_io.zarr> |
| merfish | <https://s3.embl.de/spatialdata/spatialdata-sandbox/merfish.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/merfish.zarr> |
| mibitof | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mibitof.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/mibitof.zarr> |
| steinbock_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/steinbock_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/steinbock_io.zarr> |
| toy | <https://s3.embl.de/spatialdata/spatialdata-sandbox/toy.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/toy.zarr> |
| visium | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium.zarr> |
| visium_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_io.zarr> |
| visium_associated_xenium_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/visium_associated_xenium_io.zarr> |
| xenium_rep1_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep1_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep1_io.zarr> |
| xenium_rep2_io | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep2_io.zip> | <https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep2_io.zarr> |
| [additional resources for methods developers](https://github.com/scverse/spatialdata-notebooks/blob/main/notebooks/notebooks/developers_resources/storage_format/) | - | - |

## Note

Opening the above URLs in a web browser would not work, you need to treat the URLs as Zarr stores. For example if you append `.zgroup` to any of the URLs above you will be able to see that file.
Opening the above URLs in a web browser will not work; you need to treat the URLs as Zarr stores. For example, if you append `/.zgroup` to any of the `.zarr` URLs above, you will be able to see that file.
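
As a minimal sketch of the note above, the snippet below builds the `/.zgroup` URL for a store and, when network access is available, downloads and parses it. The helper names `zgroup_url` and `fetch_zgroup` are hypothetical and not part of `spatialdata`; only the Python standard library is used.

```python
import json
from urllib.request import urlopen


def zgroup_url(store_url: str) -> str:
    """Return the URL of the root `.zgroup` file of a Zarr store."""
    # Tolerate a trailing slash so both URL styles give the same result.
    return store_url.rstrip("/") + "/.zgroup"


def fetch_zgroup(store_url: str, timeout: float = 10.0) -> dict:
    """Download and parse the root `.zgroup` (hypothetical helper, needs network)."""
    with urlopen(zgroup_url(store_url), timeout=timeout) as response:
        return json.load(response)


# Building the URL itself needs no network access.
toy_store = "https://s3.embl.de/spatialdata/spatialdata-sandbox/toy.zarr"
print(zgroup_url(toy_store))
```

Calling `fetch_zgroup(toy_store)` should return the parsed JSON of that file, provided the store is reachable from your machine.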
1 change: 1 addition & 0 deletions notebooks/developers_resources/.gitignore
@@ -0,0 +1 @@
!data/
1 change: 1 addition & 0 deletions notebooks/developers_resources/storage_format/.gitignore
@@ -0,0 +1 @@
!*.zarr
37 changes: 37 additions & 0 deletions notebooks/developers_resources/storage_format/Readme.md
@@ -0,0 +1,37 @@
# Examples covering the whole storage specification

This directory offers comprehensive resources for developers who want to interface their methods with the SpatialData format in a robust way.

## Why this repository

The file storage format adopted by SpatialData is built on top of the latest version of the well-documented [OME-NGFF specification](https://ngff.openmicroscopy.org/latest/index.html), but it also uses _some_ less-documented features of the OME-NGFF specification that are still [under review](https://github.com/ome/ngff/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc), as well as experimental storage strategies that will eventually be discussed with the NGFF community.
This repository addresses the need for communicating the storage specification to other developers in a complete and robust way.

## What this repository contains

This directory contains notebooks that operate on lightweight datasets.

- Each notebook covers a particular aspect of the storage specification and ~~all the~~ _the main (work in progress)_ edge cases of the specification are covered in at least one of the notebooks.
- All the notebooks are run every 24h (work in progress, automatic run temporarily disabled) against the `main` branch of the `spatialdata` repository. Each notebook creates a dataset, writes it to disk, reloads it in memory, rewrites it to disk to check for consistency, reloads it again in memory and plots it.
- The disk storage is committed to GitHub so that the output of each daily run is associated with a commit. The commit message is "autorun: storage format; spatialdata from <commit hash> <optional (commit tag)>". Examples of commit messages are:
    - `autorun: storage format; spatialdata from al29fak`
    - `autorun: storage format; spatialdata from fa096da (v0.0.12)`
- The `.zarr` data produced by every run is available in the current directory, in the commit corresponding to the run.
- The data is also [uploaded to S3](https://refined-github-html-preview.kidonng.workers.dev/scverse/spatialdata-notebooks/raw/dev_notebooks/notebooks/developers_resources/storage_format/index.html), both as Zarr directories and as zipped files.
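
The write, reload and rewrite cycle described above can be approximated by comparing two on-disk stores file by file. The sketch below is only an illustration under that assumption: `tree_digest` and `stores_differ` are hypothetical helpers, not the implementation used by the notebooks, and byte-level equality may be stricter than the consistency check the notebooks actually perform.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def stores_differ(first: Path, second: Path) -> list[str]:
    """Return the sorted relative paths that are missing or differ between two stores."""
    a, b = tree_digest(first), tree_digest(second)
    # A path counts as different if it is absent on one side or its hash changed.
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `stores_differ(first_write, second_write)` would mean that the two writes produced identical files.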

## How to use this repository

Practically, a third-party tool (e.g. R reader, format converter, JavaScript data visualizer, etc.) that runs correctly on the lightweight datasets from this repository should be guaranteed to run correctly on any SpatialData dataset.

We recommend the following.

- Implement your readers on the data from the latest run available (look for the latest commit with message `autorun: storage format; ...`).
- Set up an automated test (e.g. daily) that gets the latest converted data (you can use a `git pull` or download the data from S3) and runs your code on it.
- If your reader fails, you can inspect the corresponding commit in this repository to see what has changed in the storage specification; in particular, you may find it useful to compare different commits using the GitHub compare function, accessible with the following syntax: https://github.com/scverse/spatialdata-notebooks/compare/267adb1..5847084
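
To locate the latest run automatically, an automated test could filter commit messages by the `autorun: storage format; ...` pattern. The following is a minimal sketch; the regular expression and the names `AutorunInfo` and `parse_autorun_message` are assumptions made for illustration, derived only from the message format documented in this Readme.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "autorun: storage format; spatialdata from <hash> (<tag>)",
# where the parenthesized tag is optional.
AUTORUN_RE = re.compile(
    r"^autorun: storage format; spatialdata from (?P<commit>\S+)(?: \((?P<tag>[^)]+)\))?$"
)


class AutorunInfo(NamedTuple):
    commit: str
    tag: Optional[str]


def parse_autorun_message(message: str) -> Optional[AutorunInfo]:
    """Return the spatialdata commit and optional tag, or None for other commits."""
    match = AUTORUN_RE.match(message.strip())
    if match is None:
        return None
    return AutorunInfo(match.group("commit"), match.group("tag"))
```

For example, `parse_autorun_message("autorun: storage format; spatialdata from fa096da (v0.0.12)")` yields the commit `fa096da` and the tag `v0.0.12`, while an unrelated message such as `fix readme` yields `None`.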

## Important technical notes

- The most crucial part of the metadata is stored, for each spatial element, in the `.zattrs` file. [Example](transformation_identity.zarr/images/blobs_image/.zattrs).
- The `zmetadata` in the root folder stores redundant information and is used for storage systems that do not support `ls` operations (e.g. S3). [Example](transformation_identity.zarr/zmetadata).
- Please keep in mind that the data we generate daily are produced against the latest `main`, not the latest release. A format change (which should happen less and less frequently as the frameworks become more mature) therefore does not immediately translate into a bug for the user: the user will still be using the latest release for a while, giving developers time to update their tools before users are affected.
- When the format becomes more mature, we will provide converters between previous versions of the format. Luckily, heavy data like images and labels have been stable since NGFF v0.4, so the converters will mostly perform lightweight conversions of the metadata and relatively small conversions of the geometries.
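
As a rough illustration of the first two notes, the sketch below reads the `.zattrs` of one element from a local copy of a store and lists the groups that have a `.zattrs` entry in a consolidated metadata document. It assumes the consolidated file follows the Zarr v2 layout, with a top-level `"metadata"` object keyed by relative paths; the function names are hypothetical.

```python
import json
from pathlib import Path


def read_element_attrs(store: Path, element: str) -> dict:
    """Load the `.zattrs` JSON of one element, e.g. "images/blobs_image"."""
    return json.loads((store / element / ".zattrs").read_text())


def elements_in_consolidated(consolidated: dict) -> list[str]:
    """List the group paths that have a `.zattrs` entry in consolidated metadata."""
    suffix = "/.zattrs"
    # The root-level ".zattrs" key has no leading path and is skipped on purpose.
    return sorted(
        key[: -len(suffix)] for key in consolidated["metadata"] if key.endswith(suffix)
    )
```

A reader for a storage system without `ls` support could start from the consolidated listing and then fetch the individual `.zattrs` files it needs.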
121 changes: 121 additions & 0 deletions notebooks/developers_resources/storage_format/__template__.ipynb
@@ -0,0 +1,121 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7bf33a06-009a-487a-991b-715a28b1069b",
"metadata": {},
"source": [
"# Scope and description"
]
},
{
"cell_type": "markdown",
"id": "d96031b8-ee40-43ca-9001-011f4751d4c1",
"metadata": {},
"source": [
"<One line description>\n",
"\n",
"Elements contained:\n",
"- <Element>\n",
"- <Another element>\n",
"\n",
"Annotations contained:\n",
"- <table annotating...>\n",
"\n",
"<Additional notes>"
]
},
{
"cell_type": "markdown",
"id": "5a4c6566-882a-48d4-a4cd-0777332a5a99",
"metadata": {},
"source": [
"# Prepare the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "094509a1-86fb-4aa6-b920-11e34ed43e1c",
"metadata": {},
"outputs": [],
"source": [
"NAME = \"name_of_the_notebook\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "332c9c62-7451-4d8e-ba9e-13c12131c312",
"metadata": {},
"outputs": [],
"source": [
"import spatialdata as sd\n",
"import spatialdata_plot\n",
"from spatialdata.datasets import blobs\n",
"from io_utils import delete_old_data, write_sdata_and_check_consistency\n",
"\n",
"delete_old_data(name=NAME)\n",
"sdata = <create sdata here>\n",
"sdata"
]
},
{
"cell_type": "markdown",
"id": "800a540b-2906-46c9-8245-99def219692f",
"metadata": {},
"source": [
"# Read-write and IO validation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6c01345-27de-42f3-af07-75a43e84808a",
"metadata": {},
"outputs": [],
"source": [
"write_sdata_and_check_consistency(sdata=sdata, name=NAME)"
]
},
{
"cell_type": "markdown",
"id": "06806f13-f346-410f-ab04-fdd61fd77a10",
"metadata": {},
"source": [
"# Plot the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af8d7108-8b7c-461e-871e-77d82de82af6",
"metadata": {},
"outputs": [],
"source": [
"<plot the data>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
108 changes: 108 additions & 0 deletions notebooks/developers_resources/storage_format/index.html
@@ -0,0 +1,108 @@
<html>
<head>
<title>Index</title>
</head>
<body>
<table border="1">
<tr>
<th>Element</th>
<th>Link</th>
<th>Zip</th>
</tr>
<tr>
<td>transformation_scale</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_scale.zarr"
>transformation_scale.zarr</a
>
</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_scale.zarr.zip"
>transformation_scale.zarr.zip</a
>
</td>
</tr>
<tr>
<td>transformation_identity</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_identity.zarr"
>transformation_identity.zarr</a
>
</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_identity.zarr.zip"
>transformation_identity.zarr.zip</a
>
</td>
</tr>
<tr>
<td>multiple_elements</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/multiple_elements.zarr"
>multiple_elements.zarr</a
>
</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/multiple_elements.zarr.zip"
>multiple_elements.zarr.zip</a
>
</td>
</tr>
<tr>
<td>transformation_translation</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_translation.zarr"
>transformation_translation.zarr</a
>
</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_translation.zarr.zip"
>transformation_translation.zarr.zip</a
>
</td>
</tr>
<tr>
<td>transformation_sequence</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_sequence.zarr"
>transformation_sequence.zarr</a
>
</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_sequence.zarr.zip"
>transformation_sequence.zarr.zip</a
>
</td>
</tr>
<tr>
<td>transformation_affine</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_affine.zarr"
>transformation_affine.zarr</a
>
</td>
<td>
<a
href="https://s3.embl.de/spatialdata/developers_resources/storage_format/transformation_affine.zarr.zip"
>transformation_affine.zarr.zip</a
>
</td>
</tr>
</table>
Note: opening the above .zarr URLs in a web browser will not work; you
need to treat the URLs as Zarr stores. For example, if you append
`/.zgroup` to any of the .zarr URLs above, you will be able to see that
file.
</body>
</html>