Revisit how imageio provides and handles meta data #362

Open
almarklein opened this issue Jul 31, 2018 · 17 comments

Comments

@almarklein
Member

commented Jul 31, 2018

Related issues

Current status

  • The read functions return an imageio.Image object (subclass of ndarray) which has a .meta dict attached.
  • We have Reader.get_meta_data() but it does not have standard elements, so is usually not very helpful (#263).
  • The (generic) Writer class allows giving meta data explicitly, and also checks for the .meta attribute.
  • The high level public API (e.g. imwrite()) does not have a way to provide meta data explicitly.
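
To make the first bullet concrete, here is a minimal sketch of an ndarray subclass that carries a `meta` dict. It is a hypothetical stand-in written for illustration, not imageio's actual class; the name `ArrayWithMeta` is borrowed from later in this thread and the copying behaviour in `__array_finalize__` is an assumption.

```python
import numpy as np


class ArrayWithMeta(np.ndarray):
    """Hypothetical stand-in for an ndarray subclass that carries a meta dict."""

    def __new__(cls, data, meta=None):
        obj = np.asarray(data).view(cls)
        obj.meta = dict(meta or {})
        return obj

    def __array_finalize__(self, source):
        # Called for views and derived arrays; inherit a copy of the source's meta.
        if source is None:
            return
        self.meta = dict(getattr(source, "meta", {}))


im = ArrayWithMeta(np.zeros((4, 6), dtype=np.uint8), meta={"description": "demo"})
half = im[::2, ::2]      # still an ArrayWithMeta, meta is carried along
plain = np.asarray(im)   # plain ndarray, the meta attribute is gone
meta = dict(im.meta)     # unpacking into an (array, dict) pair
```

Converting with `np.asarray` is the cheap way for downstream code to strip the subclass when it only wants the pixel data.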

Tasks

  • Rename and better docs for the Image class (done in #375, solving #333)
  • Document that code that processes images should probably not change its behavior based on the meta data itself, but only based on args/kwargs that the user may set using the meta data.
  • Define and implement standardized field names: #382
  • Better support for getting and setting meta data: #383
  • Upon reading, some meta data fields must be cleared: #371

Feel free to comment below. I will adjust this top comment to show the latest status.

@lschr

Contributor

commented Aug 6, 2018

One could also use an approach similar to https://github.com/soft-matter/slicerator:

  • Add an attribute to Image that contains a list of attributes that should be copied after image processing, e.g.
class Image:
    _propagate_attrs = ["meta"]
    ...
  • Provide a function decorator that can be used on processing functions to copy all attributes listed in _propagate_attrs to the result array.
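import functools
import numpy

def copy_attrs(func):
    # Copy every attribute named in the input's _propagate_attrs onto the result.
    @functools.wraps(func)
    def wrapped(input, *args, **kwargs):
        res = func(input, *args, **kwargs)
        attrs = getattr(input, "_propagate_attrs", [])
        if attrs and not isinstance(res, type(input)):
            # A plain ndarray rejects new attributes, so view it as the input's class.
            res = numpy.asarray(res).view(type(input))
        for a in attrs:
            setattr(res, a, getattr(input, a, None))
        return res
    return wrapped

@copy_attrs
def do_something(arr):
    return numpy.zeros_like(arr)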

(not sure if above code actually works, but I think you get the idea). This is a very general approach; one could also limit it to checking whether a meta attribute exists and if so, copy it.

@almarklein

Member Author

commented Aug 7, 2018

I can see how such an approach would make it easier to work with a custom Image class. The thing is that imageio does not have any processing functions itself, and skimage got rid of using a custom class some years ago.

The purpose of this issue is to find a way to make meta data more (generically) useful (e.g. by standardizing certain info), let it work well with whatever skimage will decide on in scikit-image/scikit-image#2605, and along the way, if possible, drop the Image class too.

@jakirkham

commented Aug 9, 2018

What sort of metadata can the builtin plugins currently provide (including stuff that might not be nicely exposed currently)?

@almarklein

Member Author

commented Aug 10, 2018

@jakirkham That's a good question. I think that it should be possible to inspect shape, dtype and nchannels for most plugins, though many plugins will need work to do this before the loading of the data. There may also be plugins for which it is not possible to read metadata before reading the image data. What other kinds of generic metadata could we think of?

@lschr

Contributor

commented Aug 10, 2018

I think most formats support some kind of description or comment tag, which would be very helpful to have access to. Also date and time are very common.

@lschr

Contributor

commented Aug 13, 2018

Additionally, colorspace (what each channel represents; RGB or CMYK …) information may be useful.

@jni

Contributor

commented Aug 21, 2018

I totally missed this! Hi @almarklein! =) (And hi @jakirkham! =) (@lschr we haven't met but now it feels rude not to say hi. =P 👋 )

Regarding what metadata you could include, for microscopy an absolutely essential one is the pixel spacing. Some microscopes also do tiling and include e.g. stage positioning information, which then aids in mosaicing/registration to get the full image. Sometimes the units of the image data are also useful, e.g. "photons".

Regarding what to do with it, it goes far beyond copying and every function does something different. For example, downsampling requires one to increase the pixel spacing by a corresponding factor. Scaling the image requires adjusting the units. Colorspace conversion requires changing the colorspace. etc.
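
The downsampling case can be sketched as follows. This is only an illustration under assumptions: `pixel_spacing` and `units` are hypothetical field names, and the function is not part of imageio or scikit-image.

```python
import numpy as np


def downsample(image, meta, factor):
    """Keep every `factor`-th pixel and scale the pixel spacing to match.

    meta["pixel_spacing"] is a hypothetical field holding the (row, column)
    spacing in physical units per pixel.
    """
    smaller = image[::factor, ::factor]
    new_meta = dict(meta)  # leave the caller's dict untouched
    if "pixel_spacing" in meta:
        new_meta["pixel_spacing"] = tuple(s * factor for s in meta["pixel_spacing"])
    return smaller, new_meta


image = np.arange(64, dtype=np.float64).reshape(8, 8)
meta = {"pixel_spacing": (0.5, 0.5), "units": "photons"}
small, small_meta = downsample(image, meta, factor=2)
```

A plain attribute copy would have left the spacing at 0.5, which is wrong for the smaller image; each processing function needs its own rule.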

I think I just talked myself out of handling all this in scikit-image. =P

As to what imageio does with the Image class, from scikit-image's perspective I am agnostic. You can unpack into (array, dict) using return_meta, or you can keep returning Image and we can unpack it on our end easily enough.

I do like the idea of standard metadata fields and having imageio provide them (rather than doing the conversion on skimage's end). I can do a bit of work here both in discussing standard field names, and in providing an appropriate conversion for some formats (notably FEI, for which I wrote the reader).

One question is how to deal with the existing metadata tags that may or may not map to our standard ones. I think we would want to keep them, while still doing the conversion. For example, the FEI format has two layers of annotations, and pixel spacing is under ['Scan']['PixelHeight'] and ['Scan']['PixelWidth']. (And these are in meters, which is simultaneously sensible and silly. =P) How do we prevent namespace clashes in the metadata hierarchy? I see two options:

  • return two metadata dictionaries, one "raw" and one standardised
  • use a special key, e.g. ['imageio'], and if that key exists, check whether it's there from saving a previous imageio standard set, in which case we just keep it, or whether it's someone else's 'imageio' tag, in which case...? This gets messy quickly.
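
A rough sketch of the first option (two dictionaries) for the FEI case. Only the `['Scan']['PixelHeight']` and `['Scan']['PixelWidth']` keys come from the description above; the function name and the standardized field names are invented for illustration.

```python
def standardize_fei(raw):
    """Derive a small standardized dict from FEI-style raw metadata.

    The output field names are hypothetical; the raw dict is not modified.
    """
    standard = {}
    scan = raw.get("Scan", {})
    if "PixelHeight" in scan and "PixelWidth" in scan:
        standard["pixel_spacing"] = (float(scan["PixelHeight"]), float(scan["PixelWidth"]))
        standard["pixel_spacing_unit"] = "m"  # FEI stores these values in meters
    return standard


raw = {"Scan": {"PixelHeight": 2.5e-9, "PixelWidth": 2.5e-9}}
standard = standardize_fei(raw)
```

Because the two dicts are separate objects, no key in the raw hierarchy can clash with a standardized name.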

Anyway, they're my thoughts for now. Happy to have some friends in this adventure. =D

@lschr

Contributor

commented Aug 31, 2018

Hey @jni ;-)

All in all, it seems best to have an extra dict for metadata. Instead of having something like return_meta=True and changing the return signature accordingly, simply introducing a meta parameter may also be an option. This can either be None, in which case nothing happens, or a user-supplied dictionary to be updated, like

md = {}
img = imread(..., meta=md)

Not sure if this has any advantage; just an idea.

@jni

Contributor

commented Sep 1, 2018

@lschr Being a functional programming proponent, I'm quite opposed to modifying input arguments, especially in public-facing functions. =)
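
The difference between the two styles can be shown with toy functions. Both functions below are hypothetical and operate on in-memory data; neither is an imageio API.

```python
def read_with_meta_arg(pixels, file_meta, meta=None):
    """In-place style: fill the caller's dict if one is passed."""
    if meta is not None:
        meta.update(file_meta)
    return list(pixels)


def read_returning_meta(pixels, file_meta):
    """Functional style: return a new (data, meta) pair and touch nothing else."""
    return list(pixels), dict(file_meta)


file_meta = {"description": "demo"}

md = {"description": "set by caller", "note": "mine"}
data = read_with_meta_arg([1, 2, 3], file_meta, meta=md)
# md["description"] was overwritten as a side effect of the call

data2, md2 = read_returning_meta([1, 2, 3], file_meta)
```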

@almarklein

Member Author

commented Sep 3, 2018

Things I am starting to converge on (will update the top post accordingly):

Since we have to keep backward compatibility, I feel that any attempt to get rid of the Image class will add complexity. Also, as @jni points out, skimage can easily strip and convert. Therefore, I think it's best to keep it, but rename it to ArrayWithMeta, which (I hope) sounds more like an ndarray subclass than a wrapper. Image is kept as an alias. And we should add docs (in the class's docstring and in the higher level docs) on how to convert.

We should add an imageio.metaread() function that will (where possible) return the meta data without reading the image data. We should also add a meta argument to the write() functions.

Things that need more thought and discussion:

It seems evident that we should standardize the meta information. This means that every image comes with its "raw" meta data, as well as "standardized" meta data. Some information may be duplicated, and upon writing it should be merged into the raw meta data again. I think it makes sense to have any standardized fields overwrite the raw fields here. An alternative is to consider the standardized information as read-only and ignore it in the writers (people that actively write meta data probably know how to use the raw data).
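
A minimal sketch of both write-time policies, under stated assumptions: the `FIELD_PATHS` mapping, the standardized field names and the `User` entry are all made up for illustration, and a real writer would need a per-format mapping.

```python
import copy

# Hypothetical mapping from standardized field names to key paths in the raw metadata.
FIELD_PATHS = {
    "pixel_height": ("Scan", "PixelHeight"),
    "pixel_width": ("Scan", "PixelWidth"),
}


def merge_for_writing(raw, standard, standard_wins=True):
    """Return the raw metadata a writer would receive.

    With standard_wins=False the standardized dict is treated as read-only
    and a copy of the raw metadata is passed through unchanged.
    """
    merged = copy.deepcopy(raw)
    if not standard_wins:
        return merged
    for name, path in FIELD_PATHS.items():
        if name not in standard:
            continue
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = standard[name]  # standardized value overwrites the raw one
    return merged


raw = {"Scan": {"PixelHeight": 1e-9, "PixelWidth": 1e-9}, "User": {"Name": "x"}}
standard = {"pixel_height": 2e-9, "pixel_width": 2e-9}
merged = merge_for_writing(raw, standard)
untouched = merge_for_writing(raw, standard, standard_wins=False)
```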

It's not trivial how to provide these two forms of meta data. As @jni mentioned:

  • They could be two dictionaries (e.g. the standardized dict could be an info attribute on the ArrayWithMeta class). Probably easiest for imageio but awkward in skimage.
  • The standardized dict could be a subdict, but under what key?
  • The standardized dict could also have a subdict for the raw data. Seems nice and clean, so maybe good for skimage? But harder to do in imageio because of backward compatibility.

@jni

Contributor

commented Sep 3, 2018

@almarklein

have a subdict for the raw data.

Like, as a monkey-patched attribute? That doesn't seem like something I would describe as "nice and clean"???

@jni

Contributor

commented Sep 3, 2018

(Overall though, I'm glad we are converging on vaguely the same space, I think!)

@almarklein

Member Author

commented Sep 3, 2018

@jni, haha no, I mean a dict with the standardized data, which has one field called raw that contains the raw meta data. That is, the other way around from what you proposed.
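
As a sketch of that layout (the helper name and the `pixel_spacing` field are assumptions, only the `raw` key comes from the description):

```python
def build_meta(raw, standard):
    """Standardized fields at the top level, untouched raw metadata under 'raw'."""
    meta = dict(standard)
    meta["raw"] = raw
    return meta


meta = build_meta(
    raw={"Scan": {"PixelHeight": 2.5e-9, "PixelWidth": 2.5e-9}},
    standard={"pixel_spacing": (2.5e-9, 2.5e-9)},
)
```

The only reserved name is `raw`, so format-specific keys cannot collide with standardized ones.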

@jni

Contributor

commented Sep 3, 2018

aaaaaah! 🤦‍♂ Got it. =) Yes that's nice and clean, I agree!

@almarklein

Member Author

commented Sep 3, 2018

Another thing to fix is #371: the meta data can become invalid when it's "used" during reading, e.g. when exif rotation is applied. Upon saving, when the original meta data is provided (which happens by default), the image is already rotated but also has the exif rotation flag set.

The solution would be for the reader to modify the meta data (e.g. remove exif rotation flag). In which case, should the dict obtained via imageio.metaread() return the actual meta data as in the file, or as it is when the image is read?
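
A minimal sketch of a reader that applies the rotation and clears the flag, assuming a hypothetical `exif_orientation` key in the meta dict. Only the unmirrored EXIF orientation values (1, 3, 6, 8) are handled.

```python
import numpy as np

# EXIF Orientation values mapped to counterclockwise quarter turns for np.rot90.
# Mirrored orientations (2, 4, 5, 7) are left out of this sketch.
QUARTER_TURNS = {1: 0, 3: 2, 6: 3, 8: 1}


def apply_orientation(image, meta):
    """Rotate the pixels upright and return meta without the orientation flag.

    meta["exif_orientation"] is a hypothetical key name.
    """
    orientation = meta.get("exif_orientation", 1)
    if orientation not in QUARTER_TURNS:
        return image, dict(meta)  # unsupported value: leave data and flag alone
    new_meta = {k: v for k, v in meta.items() if k != "exif_orientation"}
    return np.rot90(image, k=QUARTER_TURNS[orientation]), new_meta


stored = np.array([[1, 2], [3, 4]])
upright, cleaned = apply_orientation(stored, {"exif_orientation": 6, "description": "x"})
```

Writing `upright` together with `cleaned` no longer produces a file that is rotated twice by viewers.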

@almarklein

Member Author

commented Sep 11, 2018

I have edited the top post to list the "plan of action", and made issues for more specific tasks.

@xkortex

commented Jun 10, 2019

I would highly recommend leaving some indication that the original file had some metadata, even if (especially if) it is transformed during load.

Real life use case: I have a dataset of images of assorted datatypes, among them jpgs with orientation flag. I have a corresponding set of png images which are masks showing key regions in the original images. Here's the kicker, the png masks are always the same orientation of the raw image array (i.e. ignoring the orientation tag). But I also need to turn upright the portrait images, to feed them to a face detector, which is fairly rotation-variant.
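
One way to support both needs is to drop the live flag but keep a record of it. The sketch below assumes hypothetical key names (`exif_orientation`, `original_exif_orientation`) and handles only unmirrored orientations; other values are treated as "no rotation".

```python
import numpy as np

QUARTER_TURNS = {1: 0, 3: 2, 6: 3, 8: 1}  # EXIF orientation -> CCW quarter turns


def read_upright(image, meta):
    """Rotate upright, drop the live flag, but remember what the file said."""
    orientation = meta.get("exif_orientation", 1)
    new_meta = {k: v for k, v in meta.items() if k != "exif_orientation"}
    new_meta["original_exif_orientation"] = orientation
    return np.rot90(image, k=QUARTER_TURNS.get(orientation, 0)), new_meta


def to_stored_orientation(image, meta):
    """Undo the rotation so the array lines up with masks drawn on the stored pixels."""
    k = QUARTER_TURNS.get(meta.get("original_exif_orientation", 1), 0)
    return np.rot90(image, k=-k)


stored = np.arange(6).reshape(2, 3)
upright, meta = read_upright(stored, {"exif_orientation": 6})
back = to_stored_orientation(upright, meta)
```

The upright array can go to the face detector, while `back` matches the PNG masks that ignore the orientation tag.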
