@nilchia
Contributor

@nilchia nilchia commented Oct 3, 2025

I'm trying to add spatialdata as a new data type in Galaxy.
spatialdata is a data framework that comprises a FAIR storage format and a collection of Python libraries for performant access, alignment, and processing of uni- and multi-modal spatial omics datasets.

The metadata should look something like this:

SpatialData object, with associated Zarr store: /Users/macbook/embl/projects/basel/spatialdata-sandbox/mouse_liver/data.zarr
├── Images
│     └── 'raw_image': DataTree[cyx] (1, 6432, 6432), (1, 1608, 1608)
├── Labels
│     └── 'segmentation_mask': DataArray[yx] (6432, 6432)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│     └── 'nucleus_boundaries': GeoDataFrame shape: (3375, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (3375, 99)
with coordinate systems:
    ▸ 'global', with elements:
        raw_image (Images), segmentation_mask (Labels), transcripts (Points), nucleus_boundaries (Shapes)

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@nilchia nilchia marked this pull request as draft October 3, 2025 13:30
@github-actions github-actions bot added this to the 25.1 milestone Oct 3, 2025
@nilchia
Contributor Author

nilchia commented Oct 15, 2025

Currently, when I upload the file, it is detected as zarr.zip and not as spatialdata:
[image]
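
A minimal sketch of one way a check could tell a SpatialData archive apart from a generic Zarr zip. It assumes that at least one attribute file in the archive carries a `spatialdata_attrs` key, either at the top level of a `.zattrs` file or under `"attributes"` in a `zarr.json` file (the key appears in the zarr.json example later in this thread). The function names are hypothetical and this is not the sniffer implemented in this PR.

```python
import json
import zipfile

# Assumption: attribute files are named ".zattrs" (Zarr v2) or "zarr.json" (Zarr v3).
ATTR_FILES = (".zattrs", "zarr.json")


def has_spatialdata_attrs(doc: dict) -> bool:
    """Return True if a parsed attribute document carries `spatialdata_attrs`."""
    if "spatialdata_attrs" in doc:  # top-level attributes (.zattrs layout)
        return True
    attrs = doc.get("attributes")  # nested attributes (zarr.json layout)
    return isinstance(attrs, dict) and "spatialdata_attrs" in attrs


def looks_like_spatialdata(zip_path: str) -> bool:
    """Heuristic: does any attribute file in the archive mention SpatialData?"""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Only inspect attribute files, wherever they sit in the archive.
            if name.rsplit("/", 1)[-1] not in ATTR_FILES:
                continue
            try:
                doc = json.loads(zf.read(name))
            except ValueError:
                # Not valid JSON (or not valid text); skip this member.
                continue
            if isinstance(doc, dict) and has_spatialdata_attrs(doc):
                return True
    return False
```

Because a SpatialData archive is also a valid Zarr archive, a check like this only helps if it runs before the generic Zarr check.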

@nilchia
Contributor Author

nilchia commented Oct 15, 2025

[image]

The sniffer works now.
I'm trying to see if it is possible to also show the metadata like this:

SpatialData object, with associated Zarr store: /Users/macbook/embl/projects/basel/spatialdata-sandbox/mouse_liver/data.zarr
├── Images
│     └── 'raw_image': DataTree[cyx] (1, 6432, 6432), (1, 1608, 1608)
├── Labels
│     └── 'segmentation_mask': DataArray[yx] (6432, 6432)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 3) (2D points)
├── Shapes
│     └── 'nucleus_boundaries': GeoDataFrame shape: (3375, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (3375, 99)
with coordinate systems:
    ▸ 'global', with elements:
        raw_image (Images), segmentation_mask (Labels), transcripts (Points), nucleus_boundaries (Shapes)

@nilchia
Contributor Author

nilchia commented Oct 15, 2025

It now shows simple metadata like this:
[image]

I don't think the metadata should go any deeper.
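
A minimal sketch of how a shallow summary (element names per type, nothing deeper) could be collected from the archive listing alone. It assumes the store keeps elements under folders named `images`, `labels`, `points`, `shapes`, and `tables`, matching the tree printed earlier in this thread. The function name and the returned structure are hypothetical, not the metadata fields added in this PR.

```python
import zipfile
from collections import defaultdict

# Assumption: element groups are stored under these folder names.
ELEMENT_TYPES = ("images", "labels", "points", "shapes", "tables")


def summarize_elements(zip_path: str) -> dict:
    """Map each element type to the sorted names of the elements found under it."""
    found = defaultdict(set)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = [p for p in name.split("/") if p]
            # Match ".../<element_type>/<element_name>/<something>" anywhere in
            # the path, since the store may or may not sit inside a root folder.
            # Stopping two components early skips files such as "images/.zattrs".
            for i, part in enumerate(parts[:-2]):
                if part in ELEMENT_TYPES:
                    found[part].add(parts[i + 1])
                    break
    return {etype: sorted(found[etype]) for etype in ELEMENT_TYPES if found[etype]}
```

Only member names are read, so no array data is decompressed and the result stays small enough to store per dataset.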

@nilchia
Contributor Author

nilchia commented Oct 15, 2025

[image] [image]

@nilchia nilchia marked this pull request as ready for review October 15, 2025 11:43
@nilchia
Contributor Author

nilchia commented Oct 15, 2025

FAILED test/integration/test_datatype_upload.py::test_upload_datatype_auto[OMEzarrImages.ome_zarr.zip] - AssertionError: assert 'zarr.zip' == 'ome_zarr.zip'
  - ome_zarr.zip
  ? ----
  + zarr.zip

The ome_zarr.zip test file is somehow detected as zarr.zip.
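
The thread does not record what caused this failure. As a general illustration only, the toy sketch below shows how a first-match dispatch returns the generic type whenever the generic check runs before the more specific one. The checks, the `"multiscales"` marker, and the `first_match` helper are hypothetical stand-ins, not Galaxy's sniffing code.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[dict], bool]


def is_zarr(attrs: dict) -> bool:
    """Toy generic check: accepts any attribute dictionary."""
    return isinstance(attrs, dict)


def is_ome_zarr(attrs: dict) -> bool:
    """Toy specific check: a Zarr store that also has a "multiscales" entry."""
    return is_zarr(attrs) and "multiscales" in attrs


def first_match(sniffers: List[Tuple[str, Check]], attrs: dict) -> Optional[str]:
    """Return the extension of the first check that accepts the input."""
    for ext, check in sniffers:
        if check(attrs):
            return ext
    return None


ome_attrs = {"multiscales": []}
generic_first = [("zarr.zip", is_zarr), ("ome_zarr.zip", is_ome_zarr)]
specific_first = [("ome_zarr.zip", is_ome_zarr), ("zarr.zip", is_zarr)]

print(first_match(generic_first, ome_attrs))   # generic check wins
print(first_match(specific_first, ome_attrs))  # specific check wins
```

With this kind of dispatch, every subtype has to be tried before the type it specializes.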

@arash77 arash77 requested a review from a team October 30, 2025 11:10
Contributor

@davelopez davelopez left a comment

Looks good, just a question about metadata usage below:


file_ext = "spatialdata.zip"

MetadataElement(
Contributor

@davelopez davelopez Oct 30, 2025

Question: Is all this metadata used by any existing or future tool?
I think we want to keep metadata as minimal as possible since this is information that goes into the database for each dataset. So it is important to keep only the essential metadata.

Contributor Author

The metadata is needed so that users can inspect the contents of the dataset.
I tried to minimize it here:
[image]
Is it good enough?

Contributor

If it is needed, then it is needed :)
I just wanted to make sure.

@nilchia nilchia requested a review from davelopez October 30, 2025 19:58
@davelopez davelopez merged commit 92bafaf into galaxyproject:release_25.0 Nov 5, 2025
49 of 52 checks passed
@github-actions

github-actions bot commented Nov 5, 2025

This PR was merged without a "kind/" label, please correct.

@nilchia nilchia deleted the spataildata_dt branch November 6, 2025 08:09
@LucaMarconato

LucaMarconato commented Nov 9, 2025

Thanks @nilchia and @davelopez!

One comment: since last week's release, spatialdata supports the Zarr v3 format. The main difference is that Zarr v3 no longer has .zattrs; zarr.json is used instead. Also, the OME-Zarr metadata is slightly different (see the new "ome" key nested under "attributes" in the example below):

Example zarr.json for the rasterized image for this merfish dataset:

{
  "attributes": {
    "ome": {
      "omero": {
        "channels": [
          {
            "label": 0
          }
        ]
      },
      "version": "0.5-dev-spatialdata",
      "multiscales": [
        {
          "datasets": [
            {
              "path": "0",
              "coordinateTransformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    1.0,
                    1.0
                  ]
                }
              ]
            }
          ],
          "name": "/images/rasterized",
          "axes": [
            {
              "name": "c",
              "type": "channel"
            },
            {
              "name": "y",
              "type": "space"
            },
            {
              "name": "x",
              "type": "space"
            }
          ],
          "coordinateTransformations": [
            {
              "type": "sequence",
              "transformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    3.863113519548272,
                    3.5077751335606355
                  ],
                  "input": {
                    "name": "cyx",
                    "axes": [
                      {
                        "name": "c",
                        "type": "channel"
                      },
                      {
                        "name": "y",
                        "type": "space",
                        "unit": "unit"
                      },
                      {
                        "name": "x",
                        "type": "space",
                        "unit": "unit"
                      }
                    ]
                  },
                  "output": {
                    "name": "global",
                    "axes": [
                      {
                        "name": "c",
                        "type": "channel"
                      },
                      {
                        "name": "y",
                        "type": "space",
                        "unit": "unit"
                      },
                      {
                        "name": "x",
                        "type": "space",
                        "unit": "unit"
                      }
                    ]
                  }
                },
                {
                  "type": "translation",
                  "translation": [
                    0.0,
                    4548.0,
                    1154.0
                  ],
                  "input": {
                    "name": "cyx",
                    "axes": [
                      {
                        "name": "c",
                        "type": "channel"
                      },
                      {
                        "name": "y",
                        "type": "space",
                        "unit": "unit"
                      },
                      {
                        "name": "x",
                        "type": "space",
                        "unit": "unit"
                      }
                    ]
                  },
                  "output": {
                    "name": "global",
                    "axes": [
                      {
                        "name": "c",
                        "type": "channel"
                      },
                      {
                        "name": "y",
                        "type": "space",
                        "unit": "unit"
                      },
                      {
                        "name": "x",
                        "type": "space",
                        "unit": "unit"
                      }
                    ]
                  }
                }
              ],
              "input": {
                "name": "cyx",
                "axes": [
                  {
                    "name": "c",
                    "type": "channel"
                  },
                  {
                    "name": "y",
                    "type": "space",
                    "unit": "unit"
                  },
                  {
                    "name": "x",
                    "type": "space",
                    "unit": "unit"
                  }
                ]
              },
              "output": {
                "name": "global",
                "axes": [
                  {
                    "name": "c",
                    "type": "channel"
                  },
                  {
                    "name": "y",
                    "type": "space",
                    "unit": "unit"
                  },
                  {
                    "name": "x",
                    "type": "space",
                    "unit": "unit"
                  }
                ]
              }
            }
          ]
        }
      ]
    },
    "spatialdata_attrs": {
      "version": "0.3"
    }
  },
  "zarr_format": 3,
  "consolidated_metadata": null,
  "node_type": "group"
}
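
A minimal sketch of reading the OME metadata from either layout with one code path. It takes the nesting from the example above (`"attributes"`, then `"ome"`, then `"multiscales"`) and assumes that older stores keep `"multiscales"` at the top level of `.zattrs`. The helper names are hypothetical.

```python
import json


def ome_multiscales(doc: dict) -> list:
    """Return the `multiscales` list from either attribute layout."""
    # zarr.json nests user attributes under "attributes"; .zattrs does not.
    attrs = doc.get("attributes", doc)
    # The example above nests OME metadata under "ome"; assume older files do not.
    ome = attrs.get("ome", attrs)
    return ome.get("multiscales", [])


def axis_names(doc: dict) -> list:
    """Return the axis names of the first multiscale entry, or [] if absent."""
    scales = ome_multiscales(doc)
    return [axis["name"] for axis in scales[0]["axes"]] if scales else []


# Abbreviated version of the zarr.json shown above.
v3_doc = json.loads("""
{
  "attributes": {
    "ome": {
      "multiscales": [
        {"axes": [{"name": "c", "type": "channel"},
                  {"name": "y", "type": "space"},
                  {"name": "x", "type": "space"}]}
      ]
    },
    "spatialdata_attrs": {"version": "0.3"}
  },
  "zarr_format": 3,
  "node_type": "group"
}
""")

# Hypothetical older-style attributes with `multiscales` at the top level.
v2_doc = {"multiscales": [{"axes": [{"name": "y"}, {"name": "x"}]}]}

print(axis_names(v3_doc))  # ['c', 'y', 'x']
print(axis_names(v2_doc))  # ['y', 'x']
```

Falling back to the document itself at each level keeps the lookup tolerant of both layouts without a separate version switch.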

@nilchia
Contributor Author

nilchia commented Nov 10, 2025

Thanks @LucaMarconato for the info, I made a new PR here
