Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the tif reading testing and implementation #170

Merged
merged 24 commits into from
Jun 13, 2021

Conversation

swkeemink
Copy link
Member

In this pull request I have updated our previous tif reading test, and added two new ones.

One reproduces exactly the way Suite2P saves their tifs (which is breaking our current tif loading, see #148), and one which is just a more general way of saving tif files. All three can have insconsistent behaviour with different ways of loading tifs (to do with being loaded as a single 3D frame or seeing every frame as a separate series), and our tif reading should be updated to pass these tests.

@scottclowe will update our current tif-reading method, and add that to this PR.

@scottclowe
Copy link
Member

Great, thanks Sander! Also, thanks for linking to a stable copy of suite2p's save_tiff function. :)

@scottclowe
Copy link
Member

scottclowe commented Jun 7, 2021

@swkeemink

On Python 3.5, the test_tiffwriter_tiff fails while trying to write the tif file. It is not clear that there would be an issue reading the contents of that file after it was saved. On Python 3.5, this is the only unit test which fails. That means test_suite2p_tiff passes on Python 3.5.

>               tif.write(self.expected[i, :, :], contiguous=True)
E               AttributeError: 'TiffWriter' object has no attribute 'write'

On Python 3.9, test_tiffwriter_tiff passes. Meanwhile test_suite2p_tiff fails, which is currently expected.

        # assert equality
>       self.assert_equal(actual, self.expected)

Clearly at some point the write method was added to tifffile.TiffWriter, and I guess the behaviour of save was changed; and they dropped py3.5 support so we get the old version when we unit test on py3.5 and new version of tifffile on py3.9.

I think the two different tiff files need to be committed to the test resources directory so we aren't relying on being able to write the tiffs at test time. Could you make this change - generate the tiff files and commit them, instead of writing and tearing down during the unit tests?

@swkeemink
Copy link
Member Author

@swkeemink

On Python 3.5, the test_tiffwriter_tiff fails while trying to write the tif file. It is not clear that there would be an issue reading the contents of that file after it was saved. On Python 3.5, this is the only unit test which fails. That means test_suite2p_tiff passes on Python 3.5.

>               tif.write(self.expected[i, :, :], contiguous=True)
E               AttributeError: 'TiffWriter' object has no attribute 'write'

On Python 3.9, test_tiffwriter_tiff passes. Meanwhile test_suite2p_tiff fails, which is currently expected.

        # assert equality
>       self.assert_equal(actual, self.expected)

Clearly at some point the write method was added to tifffile.TiffWriter, and I guess the behaviour of save was changed; and they dropped py3.5 support so we get the old version when we unit test on py3.5 and new version of tifffile on py3.9.

I think the two different tiff files need to be committed to the test resources directory so we aren't relying on being able to write the tiffs at test time. Could you make this change - generate the tiff files and commit them, instead of writing and tearing down during the unit tests?

Now added to the PR!

@scottclowe
Copy link
Member

Great thanks. I'll move the files to the right place.

Not sure what is meant by

using tifffile.__version__ = tifffile.imsave('test_imsave.tif', data).

etc in the doc strings. Did you copy and paste the wrong thing? It looks like you meant to write the tifffile version numbers but copied code instead.

@scottclowe
Copy link
Member

@swkeemink I'll hold off on implementing this until #156 is done, since they are both touching the same bits of the code base and otherwise it'll make difficulties with regards to merge conflicts for whichever PR is finished second.

swkeemink and others added 8 commits June 9, 2021 12:57
- Add test tiffs files to tests/resources/tiffs
- Use these pre-made tiffs for tif-loading tests, instead of
  building them on the fly.
- This is important because the tiff writing functions have
  changed their behaviour across tifffile versions, and we
  want stable tif files to test on.
Not just methods attached to the BaseTestCase class.
@scottclowe scottclowe force-pushed the tiff_read_unit_test branch 3 times, most recently from 5c530de to c3184c1 Compare June 10, 2021 08:22
@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #170 (4283130) into master (73cee3b) will decrease coverage by 0.37%.
The diff coverage is 76.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #170      +/-   ##
==========================================
- Coverage   93.03%   92.65%   -0.38%     
==========================================
  Files           8        8              
  Lines         818      831      +13     
  Branches      161      165       +4     
==========================================
+ Hits          761      770       +9     
- Misses         29       31       +2     
- Partials       28       30       +2     
Flag Coverage Δ
unittests 92.53% <76.47%> (-0.38%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
fissa/extraction.py 95.87% <76.47%> (-4.13%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73cee3b...4283130. Read the comment docs.

Copy link
Member

@scottclowe scottclowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded on the unit tests by programmatically generating tiffs in the following ways:

  • dtype varied over uint8, uint16, uint64, int16, int64, float16, float32, float64
  • writer parameters varied over tifffile.imsave with bigtiff=True/False, imagej=True/False; tifffile.TiffWriter().save; tifffile.TiffWriter().write with contiguous=False, contiguous=True, and a mixture of contiguous=False/True; tifffile.TiffWriter() with a mixture of write and save
  • shaped (6, 3, 2), (3, 2, 3, 2), (2, 1, 3, 3, 2), and a series of pages which do not stack [(4, 3, 2), (2, 3, 2)]
    The code to generate the tiffs is added as fissa/tests/generate_tiffs.py, and was executed using imageio 2.9.0 and tifffile 2021.6.6 on Python 3.8.

Since the tifffile documentation describes ImageJ hyperstacks as being able to store up to 6-dimensional image data, I have made our DataHandlerTifffile loading method able to handle arbitrarily sized TIFFs of at least 2 dimensions.

Note that the way that we handle higher dimensional image data is to flatten all dimensions beyond the 2 inner most dimensions (height and width) together. This assumes all higher dimensions are time-like, and do not include spatial changes. This means that (for instance) FISSA will try to run on 4d TIFF stacks which have both time and depth dimensions in addition to height and width (TDHW) without raising an error, and will obviously do a really bad job with them, because the origin of signals is not shaped like a cylinders across the depth dimension.

Since we can't really inspect the DataHandlerPillow data variable and see if it is correct (the return is a PIL.Image handle with lazy loading, and not an ndarray), I have added tests which check that the mean over all frames is accurate using both image2array and getmean as part of the same test.

We find that DataHandlerTifffile (with the new changes) can handle all of the formats tested against, but DataHandlerPillow can not. DataHandlerPillow does not support:

  • uint64
  • int64
  • float16
  • float64
  • bigtiff
  • unstackable shapes [(4, 3, 2), (2, 3, 2)]

@swkeemink, could you review the changes I have made?

@swkeemink
Copy link
Member Author

Question: do the expanded tests catch the bug thrown by using the suite2p tests? (only a single frame read)
Their way of saving tiffs is now not included explicitly but I'm guessing one of the tifffile.imread's functionally does the same.

Copy link
Member Author

@swkeemink swkeemink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall great changes and additions. See above for my question about Suite2p output.

About Pillow not passing all the tests, that might just be what it is. It does make the use-case a bit confusing 'low-memory mode'. I would actually consider taking it out given that it apparently has such poor support for different tiff files.



if __name__ == "__main__":
print("Using imageio {}".format(imageio.__version__))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should print the used package versions into a text-file so it's clear which version was used for the actual tiffs stored on the repo now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the commit (120784d) that introduced/last modified these test resources (which is an obvious thing to do if you are trying to check the provenance of the files), it says the version numbers in the commit message. I don't think we need it in a text file. To me at least, it's not obvious to expect that there should be a text file with this content to look for, and it will easily be lost in amongst a directory of 92 tiff files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, happy to not have that.

image, page.ndim, page.shape
)
)
shp = [-1] + list(page.shape[-2:])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should perhaps throw a warning at least when the dimensionality is higher than 3, and state that we concatenate all dimensions > 3.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the warnings, and tests that make sure the warnings are warned.

However, it turns out most higher-dimensional TIFF stack really are loaded frame-by-frame when loaded page-by-page with TiffReader, so we can't always tell that the structure it was saved as had more than 3 dimensions. How I've done it is the best we can do though.

return tifffile.imread(image)
return np.array(image)
if not isinstance(image, basestring):
return np.array(image)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could throw a TypeError() if the input is neither an array or basestring, perhaps even with the suggestion to write a new datahandling method. We ask for array-like inputs so it can also handle lists I guess, so perhaps we can have:

Suggested change
return np.array(image)
if isinstance(image, (np.ndarray, list)):
return np.array(image)
else:
raise TypeError('Wrong image input format: unfamiliar shape. Expected array_like or string.')

Although lists of things can be lots of things and probably not good to expect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to do duck typing - anything which is array-like should be handled in this way. It could be a list of lists. It could be a tuple instead of a list. Or it could be a pandas.DataFrame. Or any arbitrary array-like data structure.

Either way, I've not altered this bit of the code (the behaviour w.r.t. non-strings is the same as it was before), so changing it is off-topic for this PR.

Incidentally, I think it would be better to use np.asarray instead of np.array, since np.array will make a copy of the memory if the input is already an ndarary and np.asarray doesn't make a copy.

@scottclowe
Copy link
Member

Question: do the expanded tests catch the bug thrown by using the suite2p tests? (only a single frame read)
Their way of saving tiffs is now not included explicitly but I'm guessing one of the tifffile.imread's functionally does the same.

Yes, of course. It is included, it's just not called "suite2p" any more. It is the "TiffWriter.save_int16.tif" test resource. I went for a description of what the save method is instead to fit in with the programatically generated suite. Also, suite2p might change their save behaviour in the future.

TIFF test resources, in fissa/tests/resources/tiffs, were built
using imageio 2.9.0 and tifffile 2021.6.6.
@scottclowe scottclowe force-pushed the tiff_read_unit_test branch 4 times, most recently from 3629d45 to fc81ebc Compare June 13, 2021 07:39
@scottclowe scottclowe merged commit 3b1ba55 into rochefort-lab:master Jun 13, 2021
@swkeemink swkeemink deleted the tiff_read_unit_test branch June 29, 2021 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants