Metadata format for replacing skimage.io with imageio #5229

Open
FirefoxMetzger opened this issue Feb 11, 2021 · 4 comments

Comments

@FirefoxMetzger
Contributor

A while back we had a discussion about deprecating most of skimage.io and instead wrapping imageio (#5036). As a result of this, @almarklein started rethinking the current API for imageio and we have come up with a draft, which will - eventually - become the new imageio API (imageio/imageio#574).

The PR is mostly ready now and I am starting to get excited by how flexible it is; in particular with the new imiter:

import imageio as iio

# new_api.imiter will become imiter once the new API becomes the default
for image in iio.new_api.imiter("path/to/image.tiff"):
    image.shape  # a ndimage

I digress.

One design decision we still need to make for the API - and for which I'd like input from people here - is how to report metadata. The current decision is to have a get_meta(...) function, which returns metadata (plugin and file-format dependent). The question I have is: What kind of metadata should it return, and what would be an ideal format? What information are people over here using/expecting?

My current idea is to return a dict populated with whatever the plugin considers to be the metadata for the current image, but that might not be sufficient. For example, when opening, say, a GIF should we give access to the color palette (which isn't considered metadata by the pillow plugin) via the metadata?

@almarklein
Contributor

There has been some discussion about this in the past, see e.g. imageio/imageio#382 and imageio/imageio#502

There are some use-cases (imageio/imageio#501) to consider that we want to eventually support. I think it comes down to:

  • Having standardized fields (we can start with a few fields and add more over time).
  • Allowing the format to specify format-specific data.
  • Providing some way to report raw metadata and/or register what metadata has changed (e.g. when reading a jpg image, we may rotate the image into the correct orientation based on the exif flag, but upon writing the format should drop that exif flag).

My inclination is that the metadata is a dict with fields "standard" and "raw", each of which is again a dict. We can then add more fields to the root dict as needed, maybe even just let the format add data. But since "flat is better than nested", I could also be happy with placing all standardized metadata fields in the root dict, and allowing a "raw" field for the format-specific stuff.
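To make the two layouts described above concrete, here is a minimal sketch. The field names `shape` and `dtype` and all example values are assumptions for illustration only; the thread has not settled on any standard fields.

```python
# Hypothetical raw metadata as a plugin might report it for a GIF.
raw_from_plugin = {"version": b"GIF89a", "duration": 100, "loop": 0}

# Layout A: nested, with "standard" and "raw" as sub-dicts.
nested_meta = {
    "standard": {"shape": (64, 64, 3), "dtype": "uint8"},  # assumed field names
    "raw": raw_from_plugin,
}

# Layout B: flat, standardized fields in the root plus one "raw" field.
flat_meta = {"shape": (64, 64, 3), "dtype": "uint8", "raw": raw_from_plugin}


def flatten(meta):
    """Convert layout A into layout B."""
    flat = dict(meta["standard"])
    flat["raw"] = meta["raw"]
    return flat
```

One consequence of the flat layout is that `raw` becomes a reserved key: no standardized field could use that name.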

@FirefoxMetzger
Contributor Author

FirefoxMetzger commented Feb 12, 2021

One question for my understanding: Will all plugins provide the "standard" fields? I.e., should/will imageio enforce/guarantee this?

@almarklein
Contributor

almarklein commented Feb 12, 2021

Good question. I think the policy should be that all formats should produce the standard fields that are applicable, which would probably be most of them in most cases. Enforcing that in a test makes a lot of sense. The boilerplate for such a test could be generalized, so that it's just a few lines in each format's test script.
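A generalized check of this kind might look roughly like the sketch below. `STANDARD_FIELDS`, `check_standard_fields`, and the `not_applicable` parameter are hypothetical names, and the metadata dict in the test is a stand-in rather than output from a real plugin.

```python
# Hypothetical: the set of standard fields has not been decided in this thread.
STANDARD_FIELDS = {"shape", "dtype"}


def check_standard_fields(meta, not_applicable=()):
    """Raise AssertionError if an applicable standard field is missing."""
    expected = STANDARD_FIELDS - set(not_applicable)
    missing = expected - set(meta)
    assert not missing, f"missing standard fields: {sorted(missing)}"


# In a format's test script this would then be only a few lines.
def test_fake_format_metadata():
    # Stand-in for the dict a plugin's get_meta() would return.
    meta = {"shape": (2, 2), "dtype": "uint8", "raw": {}}
    check_standard_fields(meta)
```

The `not_applicable` argument is one way to express "the standard fields that are applicable": a format that cannot provide a field lists it explicitly instead of silently omitting it.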

We can go at this one step at a time. Let's first wrap up the PR you're currently working on, then move on to standardized metadata etc.

@FirefoxMetzger
Contributor Author

FirefoxMetzger commented Feb 12, 2021

Okay. The main reason I brought this up is that I was implementing a wrapper for Pillow in the new API so that I can add some actual unit tests. (The wrapper is now super short; with extensive comments, ~300 lines compared to the 800+ lines for the old wrapper.) Currently, I was just forwarding the raw metadata from pillow, but realized that this may not be a good idea, because different plugins may choose different formats (and datatypes).

I've read through the references (and some of the references within the references). The one consensus is that it is a dict; which is pretty darn basic, but at least it is something. My (naive) approach in response to my reading would have been

def get_meta(self, standardized=False):
    """
    Return metadata associated with the image.

    The exact fields depend on the plugin used (support)
    and the file format (presence). If `standardized=True`, return
    only those fields recognized as standard by imageio
    and drop the others.
    """
    # placeholder: load raw metadata and convert to dict
    metadata_dict = {}
    if standardized:
        pass  # prune everything except standard fields
    return metadata_dict

This essentially always returns the "raw" metadata, but lets the caller restrict it if returning more is undesirable (not sure this actually happens). It would leave it to the plugin to decide what constitutes valid metadata; that is format dependent, so a plugin can make the decision rather easily, whereas it is hard to say for the general case.
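For illustration, the pruning step could be as small as a dict comprehension. This is only a sketch: `STANDARD_FIELDS` and `prune_to_standard` are hypothetical names, and which fields count as standard is still undecided.

```python
# Hypothetical list of fields that imageio would recognize as standard.
STANDARD_FIELDS = ("shape", "dtype")


def prune_to_standard(raw_metadata, standardized=False):
    """Return a copy of the metadata, optionally keeping only standard fields."""
    metadata = dict(raw_metadata)
    if standardized:
        metadata = {k: v for k, v in metadata.items() if k in STANDARD_FIELDS}
    return metadata
```

Returning a copy in both branches means callers can modify the result without touching whatever the plugin holds internally.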

Writing metadata is a different story, but we haven't really discussed that when redefining the API.
