-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the tif reading testing and implementation #170
Update the tif reading testing and implementation #170
Conversation
Great, thanks Sander! Also, thanks for linking to a stable copy of suite2p's save_tiff function. :) |
On Python 3.5, the
On Python 3.9,
Clearly at some point the write method was added to I think the two different tiff files need to be committed to the test resources directory so we aren't relying on being able to write the tiffs at test time. Could you make this change - generate the tiff files and commit them, instead of writing and tearing down during the unit tests? |
Now added to the PR! |
Great thanks. I'll move the files to the right place. Not sure what is meant by
etc in the doc strings. Did you copy and paste the wrong thing? It looks like you meant to write the tifffile version numbers but copied code instead. |
@swkeemink I'll hold off on implementing this until #156 is done, since they are both touching the same bits of the code base and otherwise it'll make difficulties with regards to merge conflicts for whichever PR is finished second. |
- Add test tiffs files to tests/resources/tiffs - Use these pre-made tiffs for tif-loading tests, instead of building them on the fly. - This is important because the tiff writing functions have changed their behaviour across tifffile versions, and we want stable tif files to test on.
Not just methods attached to the BaseTestCase class.
5c530de
to
c3184c1
Compare
Codecov Report
@@ Coverage Diff @@
## master #170 +/- ##
==========================================
- Coverage 93.03% 92.65% -0.38%
==========================================
Files 8 8
Lines 818 831 +13
Branches 161 165 +4
==========================================
+ Hits 761 770 +9
- Misses 29 31 +2
- Partials 28 30 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
c3184c1
to
819bd66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded on the unit tests by programmatically generating tiffs in the following ways:
- dtype varied over uint8, uint16, uint64, int16, int64, float16, float32, float64
- writer parameters varied over tifffile.imsave with bigtiff=True/False, imagej=True/False; tifffile.TiffWriter().save; tifffile.TiffWriter().write with contiguous=False, contiguous=True, and a mixture of contiguous=False/True; tifffile.TiffWriter() with a mixture of write and save
- shaped
(6, 3, 2)
,(3, 2, 3, 2)
,(2, 1, 3, 3, 2)
, and a series of pages which do not stack[(4, 3, 2), (2, 3, 2)]
The code to generate the tiffs is added asfissa/tests/generate_tiffs.py
, and was executed using imageio 2.9.0 and tifffile 2021.6.6 on Python 3.8.
Since the tifffile documentation describes ImageJ hyperstacks as being able to store up to 6-dimensional image data, I have made our DataHandlerTifffile loading method able to handle arbitrarily sized TIFFs of at least 2 dimensions.
Note that the way that we handle higher dimensional image data is to flatten all dimensions beyond the 2 inner most dimensions (height and width) together. This assumes all higher dimensions are time-like, and do not include spatial changes. This means that (for instance) FISSA will try to run on 4d TIFF stacks which have both time and depth dimensions in addition to height and width (TDHW) without raising an error, and will obviously do a really bad job with them, because the origin of signals is not shaped like a cylinders across the depth dimension.
Since we can't really inspect the DataHandlerPillow data variable and see if it is correct (the return is a PIL.Image handle with lazy loading, and not an ndarray), I have added tests which check that the mean over all frames is accurate using both image2array
and getmean
as part of the same test.
We find that DataHandlerTifffile (with the new changes) can handle all of the formats tested against, but DataHandlerPillow can not. DataHandlerPillow does not support:
- uint64
- int64
- float16
- float64
- bigtiff
- unstackable shapes
[(4, 3, 2), (2, 3, 2)]
@swkeemink, could you review the changes I have made?
819bd66
to
67c9634
Compare
Question: do the expanded tests catch the bug thrown by using the suite2p tests? (only a single frame read) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall great changes and additions. See above for my question about Suite2p output.
About Pillow not passing all the tests, that might just be what it is. It does make the use-case a bit confusing 'low-memory mode'. I would actually consider taking it out given that it apparently has such poor support for different tiff files.
|
||
|
||
if __name__ == "__main__": | ||
print("Using imageio {}".format(imageio.__version__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should print the used package versions into a text-file so it's clear which version was used for the actual tiffs stored on the repo now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the commit (120784d) that introduced/last modified these test resources (which is an obvious thing to do if you are trying to check the provenance of the files), it says the version numbers in the commit message. I don't think we need it in a text file. To me at least, it's not obvious to expect that there should be a text file with this content to look for, and it will easily be lost in amongst a directory of 92 tiff files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, happy to not have that.
image, page.ndim, page.shape | ||
) | ||
) | ||
shp = [-1] + list(page.shape[-2:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should perhaps throw a warning at least when the dimensionality is higher than 3, and state that we concatenate all dimensions > 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good idea, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the warnings, and tests that make sure the warnings are warned.
However, it turns out most higher-dimensional TIFF stack really are loaded frame-by-frame when loaded page-by-page with TiffReader, so we can't always tell that the structure it was saved as had more than 3 dimensions. How I've done it is the best we can do though.
fissa/extraction.py
Outdated
return tifffile.imread(image) | ||
return np.array(image) | ||
if not isinstance(image, basestring): | ||
return np.array(image) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could throw a TypeError() if the input is neither an array or basestring, perhaps even with the suggestion to write a new datahandling method. We ask for array-like inputs so it can also handle lists I guess, so perhaps we can have:
return np.array(image) | |
if isinstance(image, (np.ndarray, list)): | |
return np.array(image) | |
else: | |
raise TypeError('Wrong image input format: unfamiliar shape. Expected array_like or string.') |
Although lists of things can be lots of things and probably not good to expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to do duck typing - anything which is array-like should be handled in this way. It could be a list of lists. It could be a tuple instead of a list. Or it could be a pandas.DataFrame. Or any arbitrary array-like data structure.
Either way, I've not altered this bit of the code (the behaviour w.r.t. non-strings is the same as it was before), so changing it is off-topic for this PR.
Incidentally, I think it would be better to use np.asarray instead of np.array, since np.array will make a copy of the memory if the input is already an ndarary and np.asarray doesn't make a copy.
Yes, of course. It is included, it's just not called "suite2p" any more. It is the |
TIFF test resources, in fissa/tests/resources/tiffs, were built using imageio 2.9.0 and tifffile 2021.6.6.
3629d45
to
fc81ebc
Compare
fc81ebc
to
1752e7e
Compare
In this pull request I have updated our previous tif reading test, and added two new ones.
One reproduces exactly the way Suite2P saves their tifs (which is breaking our current tif loading, see #148), and one which is just a more general way of saving tif files. All three can have insconsistent behaviour with different ways of loading tifs (to do with being loaded as a single 3D frame or seeing every frame as a separate series), and our tif reading should be updated to pass these tests.
@scottclowe will update our current tif-reading method, and add that to this PR.