Metadata format for replacing skimage.io with imageio #5229

FirefoxMetzger · 2021-02-11T13:20:05Z

A while back we had a discussion about deprecating most of skimage.io and instead wrapping imageio (#5036). As a result of this, @almarklein started rethinking the current API for imageio and we have come up with a draft, which will - eventually - become the new imageio API (imageio/imageio#574).

The PR is mostly ready now and I am starting to get excited by how flexible it is; in particular with the new imiter:

import imageio as iio

# new_api.imiter will become imiter once the new API becomes the default
for image in iio.new_api.imiter("path/to/image.tiff"):
    image.shape  # an ndimage

I digress.

One design decision we still need to make for the API - and for which I'd like input from people here - is how to report metadata. The current decision is to have a get_meta(...) function, which returns metadata (plugin and file-format dependent). The question I have is: What kind of metadata should it return, and what would be an ideal format? What information are people over here using/expecting?

My current idea is to return a dict populated with whatever the plugin considers to be the metadata for the current image, but that might not be sufficient. For example, when opening, say, a GIF, should we give access to the color palette (which isn't considered metadata by the pillow plugin) via the metadata?

almarklein · 2021-02-12T13:44:30Z

There has been some discussion about this in the past, see e.g. imageio/imageio#382 and imageio/imageio#502

There are some use-cases (imageio/imageio#501) to consider that we want to eventually support. I think it comes down to:

Having standardized fields (we can start with a few fields and add more over time).
Allowing the format to specify format-specific data.
Providing some way to report raw metadata and/or register what metadata has changed (e.g. when reading a jpg image, we may rotate the image into the correct orientation based on the exif flag, but upon writing, the format should drop that exif flag).

My inclination is that the metadata is a dict with fields "standard" and "raw", each of which is again a dict. We can then add more fields to the root dict as needed, maybe even just let the format add data. But since "flat is better than nested", I could also be happy with placing all standardized metadata fields in the root dict, and allowing a "raw" field for the format-specific stuff.
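To make the two options concrete, here is a minimal sketch of both layouts. The field names ("orientation", "resolution") and the raw keys are hypothetical and chosen only for illustration; neither imageio nor any plugin is confirmed to use them.

```python
# Hypothetical raw metadata as some plugin might report it.
raw = {"Orientation": 6, "dpi": (72, 72), "Software": "example"}

# Layout A: nested, with "standard" and "raw" sub-dicts.
nested = {
    "standard": {"orientation": 6, "resolution": (72, 72)},
    "raw": dict(raw),
}

# Layout B: flat, with standardized fields in the root dict
# and a single "raw" field for the format-specific data.
flat = {"orientation": 6, "resolution": (72, 72), "raw": dict(raw)}


def to_flat(meta):
    """Convert layout A into layout B (copies, does not mutate)."""
    out = dict(meta["standard"])
    out["raw"] = dict(meta["raw"])
    return out
```

One consequence of the flat layout is that "raw" becomes a reserved name: no standardized field could be called "raw" without clashing. The nested layout avoids this at the cost of one extra lookup.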

FirefoxMetzger · 2021-02-12T13:47:13Z

One question for my understanding: Will all plugins provide the "standard" fields? I.e., should/will imageio enforce/guarantee this?

almarklein · 2021-02-12T13:53:43Z

Good question. I think the policy should be that all formats produce the standard fields that are applicable, which would probably be most of them in most cases. Enforcing that in a test makes a lot of sense. The boilerplate for such a test could be generalized, so that it's just a few lines in each format's test script.

We can go at this one step at a time. Let's first wrap up the PR you're currently working on, then move on to standardized metadata etc.
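A generalized check of this kind could look roughly like the sketch below. The set of standard fields and the helper name `missing_standard_fields` are assumptions made for illustration; no such list or helper has been defined in imageio.

```python
# Hypothetical set of standardized fields.
STANDARD_FIELDS = frozenset({"shape", "dtype"})


def missing_standard_fields(meta, not_applicable=()):
    """Return the sorted names of standard fields absent from `meta`.

    `not_applicable` lists standard fields that a given format
    cannot provide and is therefore excused from reporting.
    """
    expected = STANDARD_FIELDS - set(not_applicable)
    return sorted(expected - set(meta))
```

In a format's test script this would reduce to a line such as `assert missing_standard_fields(meta) == []`, where `meta` is whatever that format returned for a test image.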

FirefoxMetzger · 2021-02-12T14:36:47Z

Okay. The main reason I brought this up is that I was implementing a wrapper for Pillow in the new API so that I can add some actual unit tests. (The wrapper is now super short; with extensive comments, ~300 lines compared to the 800+ lines for the old wrapper.) So far, I was just forwarding the raw metadata from pillow, but I realized that this may not be a good idea, because different plugins may choose different formats (and datatypes).

I've read through the references (and some of the references within the references). The one consensus is that it is a dict; which is pretty darn basic, but at least it is something. My (naive) approach in response to my reading would have been

def get_meta(self, standardized=False):
    """
    Return metadata associated with the image.

    The exact fields depend on the plugin (support) used
    and file format (presence). If `standardized=True`, return
    only those fields recognized as standard by imageio
    and drop others.
    """
    # placeholder: load raw metadata and convert to dict
    metadata_dict = {}
    if standardized:
        pass  # prune everything except standard fields
    return metadata_dict

This essentially always returns the "raw" metadata, but allows restricting it to the standard fields if returning more is undesirable (I'm not sure this actually happens). It leaves it to the plugin to decide what constitutes valid metadata; that is format dependent, so a plugin can make the decision rather easily, whereas it is hard to say for the general case.
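The pruning step left as `pass` in the get_meta sketch above could be filled in roughly as follows. `ExamplePlugin`, its constructor, and the `STANDARD_FIELDS` set are hypothetical stand-ins, not part of imageio.

```python
# Hypothetical set of fields that imageio would recognize as standard.
STANDARD_FIELDS = frozenset({"shape", "dtype"})


class ExamplePlugin:
    """Toy plugin that holds already-loaded raw metadata."""

    def __init__(self, raw_metadata):
        self._raw_metadata = raw_metadata

    def get_meta(self, standardized=False):
        # Copy so callers cannot mutate the plugin's internal state.
        metadata = dict(self._raw_metadata)
        if standardized:
            # Keep only the fields recognized as standard.
            metadata = {
                key: value
                for key, value in metadata.items()
                if key in STANDARD_FIELDS
            }
        return metadata
```

With raw metadata `{"shape": (4, 4), "dtype": "uint8", "palette": [0, 1]}`, calling `get_meta()` would return all three entries, while `get_meta(standardized=True)` would drop "palette".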

Writing metadata is a different story, but we haven't really discussed that when redefining the API.

grlee77 mentioned this issue Feb 15, 2021

2021's calendar of community management #5169

Closed

cgohlke mentioned this issue Feb 23, 2021

imread loads some images with wrong shape #5242

Open

scikit-image locked and limited conversation to collaborators Oct 18, 2021

rfezzani closed this as completed Oct 18, 2021

grlee77 reopened this Feb 20, 2022
