Composability of the format #10

Closed
ivirshup opened this issue Jan 24, 2022 · 5 comments

@ivirshup

As discussed with @ambrosejcarr

The AnnData schema is being used internally in new formats for multimodal and spatial data. Examples of this are:

  • MuData files containing multiple AnnData elements
  • PathML's hdf5 format contains an AnnData element
  • OME is looking into using AnnData elements internally for the OME-Zarr format (PR, hackathon)

Having a schema which can be implemented on multiple formats makes this possible.
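To make this concrete, here is a minimal sketch of why a container-independent schema can be nested inside other formats: if the schema is defined relative to whatever group it is rooted at, the same element list can be mounted at several prefixes in one container. The element list is a heavily simplified stand-in for the AnnData on-disk layout, the `/mod/<modality>` prefix is an assumed MuData-style layout, and `ANNDATA_ELEMENTS`, `mount`, and `compose` are hypothetical names, not part of any AnnData or MuData API.

```python
from typing import Dict, List

# Hypothetical, heavily simplified AnnData-style schema: element names are
# relative to whatever group the AnnData element is rooted at. The real
# on-disk spec has more elements plus encoding metadata.
ANNDATA_ELEMENTS: List[str] = ["X", "obs", "var", "obsm", "layers", "uns"]


def mount(schema: List[str], prefix: str) -> List[str]:
    """Return the absolute paths a schema occupies when rooted at `prefix`."""
    root = prefix.rstrip("/")
    return [f"{root}/{name}" for name in schema]


def compose(mounts: Dict[str, List[str]]) -> List[str]:
    """Merge several mounted schemas into one container layout.

    Raises ValueError if two schemas would claim the same path, i.e. the
    case in which these schemas could not share a container.
    """
    layout: List[str] = []
    seen = set()
    for prefix, schema in mounts.items():
        for path in mount(schema, prefix):
            if path in seen:
                raise ValueError(f"path collision: {path}")
            seen.add(path)
            layout.append(path)
    return layout


# A MuData-style file (assumed layout): one AnnData element per modality.
layout = compose({"/mod/rna": ANNDATA_ELEMENTS, "/mod/prot": ANNDATA_ELEMENTS})
print(layout[0])    # /mod/rna/X
print(len(layout))  # 12
```

Nothing in `mount` depends on which container (HDF5, Zarr, TileDB) eventually stores the paths, which is the property that lets a schema be reused inside higher-level formats.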

This was brought up by the idea that this project might result in a C-API which others would wrap. I think this could make these higher-level compositions of schemas difficult to implement.

@ambrosejcarr
Member

ambrosejcarr commented Jan 29, 2022

Thanks for posting this @ivirshup

This project should satisfy this requirement:

> Having a schema which can be implemented on multiple formats makes this possible.

We're interested in aligning at the API level, but everything will be open source, including the specifications/schema for the data layouts, which will be published in single-cell-data/TileDB-SingleCell. Having reviewed the OME-Zarr example, I think that's the main requirement, but I'm not 100% confident I'm following.

@kevinyamauchi

> We're interested in aligning at the API level, but everything will be open source including the specifications/schema for the data layouts, which will be published in single-cell-data/TileDB-SingleCell. Reviewing the OME-Zarr example, I think that's the main requirement, but I'm not 100% confident I'm following.

Hey @ambrosejcarr! I hope all is well. I'm not sure about the context here, but I'm going to be working on the table spec for OME-NGFF/OME-Zarr this week. Happy to have a chat if you have questions or would like to discuss. We are definitely interested in maintaining compatibility with outputs from spatial omics analyses.

@ambrosejcarr
Member

ambrosejcarr commented Jan 31, 2022

Hi @kevinyamauchi, great to hear from you! If you want to snag some time on my calendar when it's convenient for you, I'd be happy to discuss.

@ambrosejcarr
Member

@ivirshup I want to return to your original question. I've synced with @bkmartinjr and @ihnorton, and we think the answer is that the format we're proposing is composable. Here's our logic; hopefully it can help tease apart any incorrect assumptions we're making about composability.

I'll start with some definitions. I know these words are overloaded; I'm just hoping to create enough clarity for this conversation, not to define these terms globally:

  1. API: the interfaces that will be supported
  2. Container: hdf5, zarr, tiledb
  3. Schema: How data are laid out in the container
  4. Format: Container + Schema

Here are our assumptions:

  1. If a container and schema permit additional data to be stored in the same container, a format is composable.
  2. Composability is independent of APIs.
  3. For specific formats to be composed in a container, the container must support the appropriate data types for all schema and all formats must be composable.
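The definitions and assumptions above can be encoded as a small sketch. The classes and the `can_compose` helper are hypothetical, and the data-type labels are illustrative placeholders rather than the actual type systems of HDF5, Zarr, or TileDB. No API appears anywhere in the check, mirroring assumption 2.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Container:
    """Storage engine, e.g. hdf5, zarr, tiledb."""
    name: str
    dtypes: FrozenSet[str]       # illustrative labels for storable types
    allows_extra_data: bool      # can unrelated data live alongside?


@dataclass(frozen=True)
class Schema:
    """How data are laid out in a container."""
    name: str
    required_dtypes: FrozenSet[str]
    allows_extra_data: bool


@dataclass(frozen=True)
class Format:
    """Format = Container + Schema."""
    container: Container
    schema: Schema

    def is_composable(self) -> bool:
        # Assumption 1: composable if both the container and the schema
        # permit additional data to be stored in the same container.
        return self.container.allows_extra_data and self.schema.allows_extra_data


def can_compose(container: Container, schemas: Sequence[Schema]) -> bool:
    """Assumption 3: every resulting format is composable and the container
    supports every data type each schema needs. No API is consulted
    (assumption 2)."""
    for schema in schemas:
        if not Format(container, schema).is_composable():
            return False
        if not schema.required_dtypes <= container.dtypes:
            return False
    return True


# Hypothetical example values.
store = Container("zarr", frozenset({"float32", "uint16", "utf8"}), True)
tables = Schema("anndata-like", frozenset({"float32", "utf8"}), True)
images = Schema("image-like", frozenset({"uint16"}), True)
print(can_compose(store, [tables, images]))  # True
```

Dropping `"uint16"` from `store.dtypes`, or marking either schema as not allowing extra data, makes the same check return `False`, which corresponds to the two halves of assumption 3.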

In this project, we're proposing to create an API standard, and to demonstrate the value of that API, we will implement a format that is based on the TileDB container.

@johnkerl
Member

johnkerl commented Feb 2, 2023

Circling back -- this seminal conversation was one of the main inputs to last year's SOMA design:
https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md

@johnkerl johnkerl closed this as completed Feb 2, 2023