More flexible collections with custom load_func. #6276

thvoigtmann · 2022-03-07T23:33:48Z

Small change to ImageCollection to make it accept non-file-name
sequences as the load_pattern, as long as a custom load_func is
given (to which the items in the sequence will be passed in lieu of
the image numbers).

Description

This tiny patch allows ImageCollection to handle the loading of images with a custom load_func more flexibly. If the latter argument is set, load_pattern is allowed to be any python sequence (rather than just a list of file names or file-name patterns), the elements of which are to be interpreted by the user-supplied loader.

This allows for a modification of the code given in the documentation (reading images from a video file):

class vidread_random:
    def __init__ (self, f):
        self.vid = imageio.get_reader(f)
    def __call__ (self, frameno):
        return self.vid.get_data(frameno)
ic = ImageCollection (range(500), load_func=vidread_random('movie.mp4'))
len(ic) # 500

(In difference to the documentation code, this will not need to read the entire video into memory at once.)

Another use case is (random) access to images stored sequentially in one big binary file, for example in applications using high-speed cameras.

For reviewers

Check that the PR title is short, concise, and will make sense 1 year
later.
Check that new functions are imported in corresponding __init__.py.
Check that new features, API changes, and deprecations are mentioned in
doc/release/release_dev.rst.
There is a bot to help automate backporting a PR to an older branch. For
example, to backport to v0.19.x after merging, add the following in a PR
comment: @meeseeksdev backport to v0.19.x
To run benchmarks on a PR, add the run-benchmark label. To rerun, the label
can be removed and then added again. The benchmark output can be checked in
the "Actions" tab.

Small change to `ImageCollection` to make it accept non-file-name sequences as the `load_pattern`, as long as a custom `load_func` is given (to which the items in the sequence will be passed in lieu of the image numbers).

mkcor

Thank you very much: This is a nice addition (extension)! Can you please update the docstring accordingly?

Included an example for using ImageCollection objects with custom `load_func` and a Sequence object as the `load_pattern`.

thvoigtmann · 2022-03-11T18:35:42Z

@mkcor I updated the docstring accordingly. Happy that you like the small change.

mkcor

Thank you so much, @thvoigtmann! Could you please add a test (or a doctest, i.e., an example under Examples in the docstring), so CI gets to run this new functionality? We have the following image sequences available in the current data registry:

skimage/io/collection.py

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

The added docstring example tests the loading of multiple images from one file using a custom `load_func`.

pep8speaks · 2022-03-13T14:37:19Z

Hello @thvoigtmann! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file skimage/io/collection.py:

Line 205:80: E501 line too long (80 > 79 characters)

Comment last updated at 2022-04-30 18:57:12 UTC

thvoigtmann · 2022-03-13T15:20:45Z

Thanks for the suggestion @mkcor! I found it easier to use the multi-frame GIF file from the test data. It's the first time for me working with CI and docstring examples, so I hope it's looking good.

Having looked looked at the code again, I am now confused by MultiImage. I think that the code under Examples given there does not work as advertised. I guess that why it is marked as SKIP for doctest. I also think that MultiImage does not implement the functionality that is claimed by the docstring. Do you have an opinion on this?

Specifically I think that the desired effect of the code in the example there (load a multipage TIFF file, return an object derived from ImageCollection whose length is equal to the number of pages in the TIFF file) is not at all achieved by MultiImage. But it is already achieved by ImageCollection without further modification (also without the patch that I suggest). Just replace (skimage/io/collection.py line 470):

img = MultiImage(data_dir + '/multipage.tif') # doctest: +SKIP

with

img = ImageCollection(data_dir + '/multipage.tif')

This works because ImageCollection has specific code to check for multipage TIFF files if the given file name ends with .tif or .tiff.

To me this looks like one could remove MultiImage without loosing functionality, and for a deprecation cycle simply make MultiImage = ImageCollection.

I'd be happy to fix this if you have some guidance. (Fix it? Something I'm overlooking? New pull request or keep adding here? Bug report first?)

mkcor · 2022-03-16T09:58:50Z

Dear @thvoigtmann,

Thank you very much for your responsiveness! 🙂

The MultiImage class was written 8 years ago and 'fixed' (just so it wasn't actually broken) 15 months ago. We won't fix it any further, since the plan moving forward is to either remove the skimage.io module altogether or (more likely) to keep it only as a thin wrapper around imageio (please see #5439 (comment) under "Remove the skimage.io module?").

mkcor · 2022-03-16T10:11:47Z

I found it easier to use the multi-frame GIF file from the test data.

That's great (even better for a toy example actually).

PS: You can apply (code review) suggestions as a batch: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request

skimage/io/collection.py

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

mkcor · 2022-03-16T11:04:31Z

For now at least, skimage.io.ImageCollection has a specific added value wrt imageio functions (see #5036 (comment) and #5036 (comment)). And I think your contribution here is very nice (why not enjoy more flexibility?). But, since we are encouraging users to move towards using imageio directly, I was looking into:

collection = imageio.mvolread(filename, format='pillow')

thvoigtmann · 2022-03-16T11:40:18Z

Yes @mkcor, I fully agree with your points. I was just wondering whether it's still worth fixing the discrepancy between current MultiImage and its docstring.

mkcor · 2022-03-16T11:50:43Z

Yes @mkcor, I fully agree with your points. I was just wondering whether it's still worth fixing the discrepancy between current MultiImage and its docstring.

Oh, sure, that wouldn't hurt. 🙏

Continuing on

collection = imageio.mvolread(filename, format='pillow')

I noticed that collection is a list of length 24 just like your ic, but each element of this list is an array of shape (25, 14, 3), i.e., a 2D RGB image, whereas each element of ic has shape (25, 14, 4): RGB... + some kind of alpha, perhaps? Anyway, the fourth channel seems to be trivial (all white/maximum, with all 255 values).

skimage/io/collection.py

mkcor · 2022-03-16T13:30:01Z

@scikit-image/core the CI failure is unrelated to this PR (docs building timed out because doc/examples/applications/plot_fluorescence_nuclear_envelope.py is too resource-demanding...) 😬

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

thvoigtmann · 2022-03-16T17:16:51Z

Thanks for approving!

Continuing on
collection = imageio.mvolread(filename, format='pillow')
I noticed that collections is a list of length 24 just like your ic, but each element of this list is an array of shape (25, 14, 3), i.e., a 2D RGB image, whereas each element of ic has shape (25, 14, 4): RGB... + some kind of alpha, perhaps? Anyway, the fourth channel seems to be trivial (all white/maximum, with all 255 values).

I can reproduce that. But I noted that

collection = imageio.mimread(filename)

gives again list of length 24 whose elements have shape (25, 14, 4). RGBA I would have guessed indeed.

The real case for ImageCollection for me is the lazy loading of a huge stack of images with random access. As mentioned in #5036 (comment), this seems to be currently not possible using imageio directly? (Although arguably it looks like functionality that should be implemented there and not in skimage.)

For example, I have a 51GB binary file from a high-speed camera that consists of raw sensor data (8bpp rows x cols per frame), and with this PR here merged I can use the following rough piece of code (not optimized for filehandle issues etc):

class BinLoader:
    def __init__(self,filename,rows,cols,dtype=np.uint8):
        self.filename = filename
        self.rows = rows
        self.cols = cols
        self.dtype = dtype
        self.fd = open(filename,'rb')
        self.fd.seek(0,os.SEEK_END)
        self.numframes = int(self.fd.tell()/rows/cols/dtype(0).nbytes)
        assert self.numframes*rows*cols*dtype(0).nbytes == self.fd.tell()
        self.fd.seek(0)
    def __call__(self,frame):
        cnt = self.rows*self.cols
        fpos = cnt*int(frame)
        self.fd.seek(0)
        return np.fromfile(self.fd,dtype=self.dtype,count=cnt,offset=fpos).reshape((self.rows,self.cols))
    def range(self):
        return range(0,self.numframes)

and then do

loader = BinLoader(filename, rows, cols)
ic = skimage.io.ImageCollection(loader.range(), load_func=loader)

I could look into how this can be done using imageio, but I'm not sure yet where to start.

stefanv · 2022-03-16T19:56:56Z

FWIW, if things like MultiImage and friends are useful, we can keep them and maintain them; no one is forcing us to offload to imageio. I like the ImageCollection example in your last comment @thvoigtmann—exactly what it was designed for!

thvoigtmann · 2022-03-17T10:29:50Z

Thanks, @stefanv for the comment.

FWIW, if things like MultiImage and friends are useful, we can keep them and maintain them; no one is forcing us to offload to imageio.

Based on my example, I would argue that ImageCollection is worth keeping, but MultiImage isn't, because its behavior can be reproduced with ImageCollection and imageio.mimread easily. The current implementation of MultiImage relies on an internal imread, and I guess that is something that you would want to remove in the long run.

As I discussed with @mkcor, I have a small patch ready that updates the docstring for MultiImage, just so that it is consistent with the current implementation behavior. I guess I should make a new branch and associated PR with it, referring to the discussion here?

stefanv · 2022-03-17T18:09:28Z

That sounds like a good plan, thanks @thvoigtmann. And, yes, a new branch for that doc fix will make it easy to merge.

This adds a test for `ImageCollection` called with a custom `load_func` and a sequence as the `load_pattern`.

alexdesiqueira

Thank you @thvoigtmann! Just some small suggestions around, and then I'll merge it.

skimage/io/collection.py

alexdesiqueira · 2022-04-05T18:01:54Z

skimage/io/collection.py

+    sequence, an ImageCollection of corresponding length will be created,
+    and the individual images will be loaded by calling `load_func` with the
+    matching element of the `load_pattern` as its first argument. In this
+    case, the elements of the sequence do not need to be resolvable file


I didn't understand "resolvable" here. Could we use a nice, simpler synonym? 🙂

Hi @alexdesiqueira, sorry for the late response. "resolvable" here means that it is a file name that exists on the local file system. I'll propose a changed wording in the next commit.

skimage/io/collection.py

Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>

…-collection

Changed wording of docstring to be simpler.

alexdesiqueira

Thank you, @thvoigtmann! Looks good to me. Merging 🙂

Small change to `ImageCollection` to make it accept non-file-name sequences as the `load_pattern`, as long as a custom `load_func` is given (to which the items in the sequence will be passed in lieu of the image numbers). Included an example for using ImageCollection objects with custom `load_func` and a Sequence object as the `load_pattern`.

More flexible collections with custom load_func.

1109e49

Small change to `ImageCollection` to make it accept non-file-name sequences as the `load_pattern`, as long as a custom `load_func` is given (to which the items in the sequence will be passed in lieu of the image numbers).

mkcor reviewed Mar 11, 2022

View reviewed changes

Updated docstring for ImageCollection.

b5efe26

Included an example for using ImageCollection objects with custom `load_func` and a Sequence object as the `load_pattern`.

mkcor reviewed Mar 12, 2022

View reviewed changes

skimage/io/collection.py Outdated Show resolved Hide resolved

skimage/io/collection.py Outdated Show resolved Hide resolved

skimage/io/collection.py Outdated Show resolved Hide resolved

skimage/io/collection.py Outdated Show resolved Hide resolved

thvoigtmann and others added 6 commits March 13, 2022 14:42

Update skimage/io/collection.py

8c12c82

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

Update skimage/io/collection.py

6e49b67

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

Update skimage/io/collection.py

f214ee5

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

Update skimage/io/collection.py

402730f

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

Merge branch 'scikit-image:main' into binfile-collection

faa7f6c

Added docstring example for ImageCollection

4f8212d

The added docstring example tests the loading of multiple images from one file using a custom `load_func`.

Removed trailing whitespace.

6b8e3ce

mkcor reviewed Mar 16, 2022

View reviewed changes

skimage/io/collection.py Outdated Show resolved Hide resolved

skimage/io/collection.py Outdated Show resolved Hide resolved

thvoigtmann and others added 2 commits March 16, 2022 11:40

Removed redundant line

0c252d0

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

Moved import line in docstring example.

c873e87

mkcor approved these changes Mar 16, 2022

View reviewed changes

skimage/io/collection.py Outdated Show resolved Hide resolved

mkcor added the action: mrg+1 label Mar 16, 2022

Small grammatical fix to docstring

1c9322e

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

This was referenced Mar 18, 2022

Update for the MultiImage docstring. #6290

Merged

Remove dependence on _io in collection.py. #6291

Draft

Updated test suite for new functionality.

268e1dd

This adds a test for `ImageCollection` called with a custom `load_func` and a sequence as the `load_pattern`.

alexdesiqueira suggested changes Apr 5, 2022

View reviewed changes

thvoigtmann and others added 4 commits April 30, 2022 20:51

Apply suggestions from code review

98d0ba6

Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>

Merge remote-tracking branch 'upstream/main' into binfile-collection

82c4788

Merge remote-tracking branch 'origin/binfile-collection' into binfile…

29dbebc

…-collection

Implement suggestion from code review.

816b7d9

Changed wording of docstring to be simpler.

thvoigtmann requested a review from alexdesiqueira April 30, 2022 19:17

alexdesiqueira approved these changes May 5, 2022

View reviewed changes

alexdesiqueira merged commit 6d8c3e3 into scikit-image:main May 5, 2022

lagru added the ⏩ type: Enhancement Improve existing features label Oct 11, 2022

lagru added this to the 0.20 milestone Oct 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More flexible collections with custom load_func. #6276

More flexible collections with custom load_func. #6276

thvoigtmann commented Mar 7, 2022

mkcor left a comment

thvoigtmann commented Mar 11, 2022 •

edited

mkcor left a comment •

edited

pep8speaks commented Mar 13, 2022 •

edited

thvoigtmann commented Mar 13, 2022 •

edited

mkcor commented Mar 16, 2022

mkcor commented Mar 16, 2022

mkcor commented Mar 16, 2022

thvoigtmann commented Mar 16, 2022

mkcor commented Mar 16, 2022 •

edited

mkcor commented Mar 16, 2022

thvoigtmann commented Mar 16, 2022

stefanv commented Mar 16, 2022

thvoigtmann commented Mar 17, 2022

stefanv commented Mar 17, 2022

alexdesiqueira left a comment

alexdesiqueira Apr 5, 2022

thvoigtmann Apr 30, 2022

alexdesiqueira left a comment

More flexible collections with custom load_func. #6276

More flexible collections with custom load_func. #6276

Conversation

thvoigtmann commented Mar 7, 2022

Description

For reviewers

mkcor left a comment

Choose a reason for hiding this comment

thvoigtmann commented Mar 11, 2022 • edited

mkcor left a comment • edited

Choose a reason for hiding this comment

pep8speaks commented Mar 13, 2022 • edited

Comment last updated at 2022-04-30 18:57:12 UTC

thvoigtmann commented Mar 13, 2022 • edited

mkcor commented Mar 16, 2022

mkcor commented Mar 16, 2022

mkcor commented Mar 16, 2022

thvoigtmann commented Mar 16, 2022

mkcor commented Mar 16, 2022 • edited

mkcor commented Mar 16, 2022

thvoigtmann commented Mar 16, 2022

stefanv commented Mar 16, 2022

thvoigtmann commented Mar 17, 2022

stefanv commented Mar 17, 2022

alexdesiqueira left a comment

Choose a reason for hiding this comment

alexdesiqueira Apr 5, 2022

Choose a reason for hiding this comment

thvoigtmann Apr 30, 2022

Choose a reason for hiding this comment

alexdesiqueira left a comment

Choose a reason for hiding this comment

thvoigtmann commented Mar 11, 2022 •

edited

mkcor left a comment •

edited

pep8speaks commented Mar 13, 2022 •

edited

thvoigtmann commented Mar 13, 2022 •

edited

mkcor commented Mar 16, 2022 •

edited