Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible collections with custom load_func. #6276

Merged
merged 17 commits into from May 5, 2022

Conversation

thvoigtmann
Copy link
Contributor

Small change to ImageCollection to make it accept non-file-name
sequences as the load_pattern, as long as a custom load_func is
given (to which the items in the sequence will be passed in lieu of
the image numbers).

Description

This tiny patch allows ImageCollection to handle the loading of images with a custom load_func more flexibly. If the latter argument is set, load_pattern is allowed to be any python sequence (rather than just a list of file names or file-name patterns), the elements of which are to be interpreted by the user-supplied loader.

This allows for a modification of the code given in the documentation (reading images from a video file):

class vidread_random:
    def __init__ (self, f):
        self.vid = imageio.get_reader(f)
    def __call__ (self, frameno):
        return self.vid.get_data(frameno)
ic = ImageCollection (range(500), load_func=vidread_random('movie.mp4'))
len(ic) # 500

(In difference to the documentation code, this will not need to read the entire video into memory at once.)

Another use case is (random) access to images stored sequentially in one big binary file, for example in applications using high-speed cameras.

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • There is a bot to help automate backporting a PR to an older branch. For
    example, to backport to v0.19.x after merging, add the following in a PR
    comment: @meeseeksdev backport to v0.19.x
  • To run benchmarks on a PR, add the run-benchmark label. To rerun, the label
    can be removed and then added again. The benchmark output can be checked in
    the "Actions" tab.

Small change to `ImageCollection` to make it accept non-file-name
sequences as the `load_pattern`, as long as a custom `load_func` is
given (to which the items in the sequence will be passed in lieu of
the image numbers).
Copy link
Member

@mkcor mkcor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much: This is a nice addition (extension)! Can you please update the docstring accordingly?

Included an example for using ImageCollection objects with custom
`load_func` and a Sequence object as the `load_pattern`.
@thvoigtmann
Copy link
Contributor Author

thvoigtmann commented Mar 11, 2022

@mkcor I updated the docstring accordingly. Happy that you like the small change.

Copy link
Member

@mkcor mkcor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @thvoigtmann! Could you please add a test (or a doctest, i.e., an example under Examples in the docstring), so CI gets to run this new functionality? We have the following image sequences available in the current data registry:

skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
thvoigtmann and others added 6 commits March 13, 2022 14:42
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
The added docstring example tests the loading of multiple images
from one file using a custom `load_func`.
@pep8speaks
Copy link

pep8speaks commented Mar 13, 2022

Hello @thvoigtmann! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 205:80: E501 line too long (80 > 79 characters)

Comment last updated at 2022-04-30 18:57:12 UTC

@thvoigtmann
Copy link
Contributor Author

thvoigtmann commented Mar 13, 2022

Thanks for the suggestion @mkcor! I found it easier to use the multi-frame GIF file from the test data. It's the first time for me working with CI and docstring examples, so I hope it's looking good.

Having looked looked at the code again, I am now confused by MultiImage. I think that the code under Examples given there does not work as advertised. I guess that why it is marked as SKIP for doctest. I also think that MultiImage does not implement the functionality that is claimed by the docstring. Do you have an opinion on this?

Specifically I think that the desired effect of the code in the example there (load a multipage TIFF file, return an object derived from ImageCollection whose length is equal to the number of pages in the TIFF file) is not at all achieved by MultiImage. But it is already achieved by ImageCollection without further modification (also without the patch that I suggest). Just replace (skimage/io/collection.py line 470):

img = MultiImage(data_dir + '/multipage.tif') # doctest: +SKIP

with

img = ImageCollection(data_dir + '/multipage.tif')

This works because ImageCollection has specific code to check for multipage TIFF files if the given file name ends with .tif or .tiff.

To me this looks like one could remove MultiImage without loosing functionality, and for a deprecation cycle simply make MultiImage = ImageCollection.

I'd be happy to fix this if you have some guidance. (Fix it? Something I'm overlooking? New pull request or keep adding here? Bug report first?)

@mkcor
Copy link
Member

mkcor commented Mar 16, 2022

Dear @thvoigtmann,

Thank you very much for your responsiveness! 🙂

The MultiImage class was written 8 years ago and 'fixed' (just so it wasn't actually broken) 15 months ago. We won't fix it any further, since the plan moving forward is to either remove the skimage.io module altogether or (more likely) to keep it only as a thin wrapper around imageio (please see #5439 (comment) under "Remove the skimage.io module?").

@mkcor
Copy link
Member

mkcor commented Mar 16, 2022

I found it easier to use the multi-frame GIF file from the test data.

That's great (even better for a toy example actually).

PS: You can apply (code review) suggestions as a batch: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request

skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
thvoigtmann and others added 2 commits March 16, 2022 11:40
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
@mkcor
Copy link
Member

mkcor commented Mar 16, 2022

For now at least, skimage.io.ImageCollection has a specific added value wrt imageio functions (see #5036 (comment) and #5036 (comment)). And I think your contribution here is very nice (why not enjoy more flexibility?). But, since we are encouraging users to move towards using imageio directly, I was looking into:

collection = imageio.mvolread(filename, format='pillow')

@thvoigtmann
Copy link
Contributor Author

Yes @mkcor, I fully agree with your points. I was just wondering whether it's still worth fixing the discrepancy between current MultiImage and its docstring.

@mkcor
Copy link
Member

mkcor commented Mar 16, 2022

Yes @mkcor, I fully agree with your points. I was just wondering whether it's still worth fixing the discrepancy between current MultiImage and its docstring.

Oh, sure, that wouldn't hurt. 🙏

Continuing on

collection = imageio.mvolread(filename, format='pillow')

I noticed that collection is a list of length 24 just like your ic, but each element of this list is an array of shape (25, 14, 3), i.e., a 2D RGB image, whereas each element of ic has shape (25, 14, 4): RGB... + some kind of alpha, perhaps? Anyway, the fourth channel seems to be trivial (all white/maximum, with all 255 values).

skimage/io/collection.py Outdated Show resolved Hide resolved
@mkcor
Copy link
Member

mkcor commented Mar 16, 2022

@scikit-image/core the CI failure is unrelated to this PR (docs building timed out because doc/examples/applications/plot_fluorescence_nuclear_envelope.py is too resource-demanding...) 😬

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
@thvoigtmann
Copy link
Contributor Author

Thanks for approving!

Continuing on

collection = imageio.mvolread(filename, format='pillow')

I noticed that collections is a list of length 24 just like your ic, but each element of this list is an array of shape (25, 14, 3), i.e., a 2D RGB image, whereas each element of ic has shape (25, 14, 4): RGB... + some kind of alpha, perhaps? Anyway, the fourth channel seems to be trivial (all white/maximum, with all 255 values).

I can reproduce that. But I noted that

collection = imageio.mimread(filename)

gives again list of length 24 whose elements have shape (25, 14, 4). RGBA I would have guessed indeed.

The real case for ImageCollection for me is the lazy loading of a huge stack of images with random access. As mentioned in #5036 (comment), this seems to be currently not possible using imageio directly? (Although arguably it looks like functionality that should be implemented there and not in skimage.)

For example, I have a 51GB binary file from a high-speed camera that consists of raw sensor data (8bpp rows x cols per frame), and with this PR here merged I can use the following rough piece of code (not optimized for filehandle issues etc):

class BinLoader:
    def __init__(self,filename,rows,cols,dtype=np.uint8):
        self.filename = filename
        self.rows = rows
        self.cols = cols
        self.dtype = dtype
        self.fd = open(filename,'rb')
        self.fd.seek(0,os.SEEK_END)
        self.numframes = int(self.fd.tell()/rows/cols/dtype(0).nbytes)
        assert self.numframes*rows*cols*dtype(0).nbytes == self.fd.tell()
        self.fd.seek(0)
    def __call__(self,frame):
        cnt = self.rows*self.cols
        fpos = cnt*int(frame)
        self.fd.seek(0)
        return np.fromfile(self.fd,dtype=self.dtype,count=cnt,offset=fpos).reshape((self.rows,self.cols))
    def range(self):
        return range(0,self.numframes)

and then do

loader = BinLoader(filename, rows, cols)
ic = skimage.io.ImageCollection(loader.range(), load_func=loader)

I could look into how this can be done using imageio, but I'm not sure yet where to start.

@stefanv
Copy link
Member

stefanv commented Mar 16, 2022

FWIW, if things like MultiImage and friends are useful, we can keep them and maintain them; no one is forcing us to offload to imageio. I like the ImageCollection example in your last comment @thvoigtmann—exactly what it was designed for!

@thvoigtmann
Copy link
Contributor Author

Thanks, @stefanv for the comment.

FWIW, if things like MultiImage and friends are useful, we can keep them and maintain them; no one is forcing us to offload to imageio.

Based on my example, I would argue that ImageCollection is worth keeping, but MultiImage isn't, because its behavior can be reproduced with ImageCollection and imageio.mimread easily. The current implementation of MultiImage relies on an internal imread, and I guess that is something that you would want to remove in the long run.

As I discussed with @mkcor, I have a small patch ready that updates the docstring for MultiImage, just so that it is consistent with the current implementation behavior. I guess I should make a new branch and associated PR with it, referring to the discussion here?

@stefanv
Copy link
Member

stefanv commented Mar 17, 2022

That sounds like a good plan, thanks @thvoigtmann. And, yes, a new branch for that doc fix will make it easy to merge.

This adds a test for `ImageCollection` called with a custom `load_func`
and a sequence as the `load_pattern`.
Copy link
Member

@alexdesiqueira alexdesiqueira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @thvoigtmann! Just some small suggestions around, and then I'll merge it.

skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
sequence, an ImageCollection of corresponding length will be created,
and the individual images will be loaded by calling `load_func` with the
matching element of the `load_pattern` as its first argument. In this
case, the elements of the sequence do not need to be resolvable file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand "resolvable" here. Could we use a nice, simpler synonym? 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alexdesiqueira, sorry for the late response. "resolvable" here means that it is a file name that exists on the local file system. I'll propose a changed wording in the next commit.

skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
skimage/io/collection.py Outdated Show resolved Hide resolved
Copy link
Member

@alexdesiqueira alexdesiqueira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @thvoigtmann! Looks good to me. Merging 🙂

@alexdesiqueira alexdesiqueira merged commit 6d8c3e3 into scikit-image:main May 5, 2022
alexdesiqueira pushed a commit to alexdesiqueira/scikit-image that referenced this pull request May 24, 2022
Small change to `ImageCollection` to make it accept non-file-name sequences as the `load_pattern`, as long as a custom `load_func` is given (to which the items in the sequence will be passed in lieu of the image numbers).
Included an example for using ImageCollection objects with custom `load_func` and a Sequence object as the `load_pattern`.
@lagru lagru added the ⏩ type: Enhancement Improve existing features label Oct 11, 2022
@lagru lagru added this to the 0.20 milestone Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⏩ type: Enhancement Improve existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants