Fixes for data types and other common sources of errors #3009

Open
jni opened this Issue Apr 9, 2018 · 17 comments

@jni
Contributor

jni commented Apr 9, 2018

Description

I recently mentioned on the mailing list that a large proportion of our errors come from new (and old!) users surprised by the behavior of our data type conversions.

Although they are documented, this documentation is not prominent enough. Indeed, we barely even cover these in our scikit-image tutorials! Additionally, our own code base really is not careful enough to guide users along.

I want this issue to compile both common errors due to data types and proposed solutions, including a to-do list of consensus solutions.

Common errors

(Please add links here if you find them.)

  • User has a 12-bit image that is encoded in a uint16 array. When this is converted to uint8 or float, the result is an image that has much lower contrast (16x) than expected.
  • User has a float image with values falling outside the range [0, 1] or [-1, 1]. Some scikit-image functions fail altogether with this kind of input, while others warn. In almost all cases, these warnings are unnecessary.
  • User wants to apply scikit-image functions to an integer array. This array, by default, is of type int32 or int64. scikit-image converts the values in the array to float by dividing by the range of the dtype (roughly 2^31 for int32 and 2^63 for int64), resulting in tiny values (which, for example, are impossible to encode in common image formats).
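
The first and third errors above can be reproduced with plain NumPy. This is a minimal sketch that mimics dtype-range scaling by dividing by the dtype maximum; it does not call scikit-image, and the array contents are made up for illustration.

```python
import numpy as np

# Hypothetical 12-bit acquisition stored in a uint16 container.
rng = np.random.default_rng(0)
raw12 = rng.integers(0, 4096, size=(4, 4), dtype=np.uint16)
raw12[0, 0] = 4095  # make sure the brightest 12-bit value is present

# Dividing by the container's maximum uses only about 1/16 of [0, 1].
by_container = raw12 / np.iinfo(np.uint16).max
# Dividing by the true bit depth uses the full range.
by_bit_depth = raw12 / (2**12 - 1)

# A default-dtype integer array, e.g. counts or labels.
counts = np.array([[0, 10], [200, 1000]], dtype=np.int32)
# Scaling by the int32 range yields values far below 1e-6.
counts_scaled = counts / np.iinfo(np.int32).max

print(by_container.max(), by_bit_depth.max(), counts_scaled.max())
```

The 12-bit image ends up with a maximum near 0.06 instead of 1, which is the 16x contrast loss described above.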

Proposed solutions

  • Figure out image metadata format and handling. Having correct metadata (bit_depth=12, for example) would alleviate many issues.
  • Clearer separation of data viewing and data processing. See email by @ds604. Specifically, we could tweak io.imshow to clearly display characteristics of the input data, including text info in the figure title/subtitle, colorbar, dtype, etc. Perhaps revisiting skimage.viewer as an integral part of skimage is warranted.

Rejected proposals

  • Magically infer the range of input images (objections here and here.)

To-do list

  • Add an FAQ section to the website homepage.
  • Audit the code base to make sure that float images work unimpeded wherever it's possible. (Search for img_as_float calls.) (Detailed suggestion by @grlee77 here.)
  • Remove automatic range conversion for int32 and int64 images.
  • Warn on conversion of uint16 images where no value exceeds 4095 (possible 12-bit image).
  • Add bit-depth option to in_range and out_range parameters for exposure.rescale_intensity.
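
The 12-bit warning and the bit-depth option from the list above could look roughly like the following. This is only a sketch: `looks_like_12bit` and `uint16_to_float` are hypothetical names, not scikit-image functions, and the heuristic (no value above 4095) is the one proposed in the to-do item.

```python
import warnings

import numpy as np


def looks_like_12bit(image):
    """Hypothetical heuristic: a uint16 image whose values all fit in 12 bits."""
    return (
        image.dtype == np.uint16
        and image.size > 0
        and int(image.max()) <= 4095
    )


def uint16_to_float(image, bit_depth=16):
    """Hypothetical helper: map [0, 2**bit_depth - 1] to [0, 1] floats."""
    if bit_depth == 16 and looks_like_12bit(image):
        warnings.warn(
            "uint16 image has no value above 4095; it may be a 12-bit "
            "image. Pass bit_depth=12 to rescale accordingly.",
            UserWarning,
            stacklevel=2,
        )
    return image.astype(np.float64) / (2**bit_depth - 1)


img = np.array([[0, 2048], [4095, 100]], dtype=np.uint16)
full_range = uint16_to_float(img, bit_depth=12)  # no warning, max is 1.0
```

Note that the heuristic needs a full pass over the image (`image.max()`), which is the kind of per-call cost raised as an objection later in this thread.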
@soupault

Member

soupault commented May 7, 2018

xref #2605

@soupault soupault added this to the 1.0 milestone May 7, 2018

@stefanv

Member

stefanv commented May 9, 2018

Remove automatic range conversion for int32 and int64 images.

How would we do this, given that we typically convert to floating point images before processing?

@stefanv

Member

stefanv commented May 9, 2018

For metadata, perhaps switching to a single I/O backend per format would help.

@jni

Contributor Author

jni commented May 9, 2018

@stefanv

Remove automatic range conversion for int32 and int64 images.
How would we do this, given that we typically convert to floating point images before processing?

Something along the lines of implicitly applying preserve_range=True if the dtype is an integer type wider than 16 bits. And before you complain about the magic of this, we are already magically treating uint32 images differently from uint16 ones!
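
A minimal sketch of that suggestion, assuming a hypothetical `to_float` helper (not an existing scikit-image function): integer types of 16 bits or fewer are rescaled by their dtype maximum, while wider integer types are cast to float with their values preserved. Dividing by the dtype maximum is a simplification of the real conversion rules, especially for signed types.

```python
import numpy as np


def to_float(image):
    """Hypothetical conversion: rescale narrow ints, preserve wide ints."""
    image = np.asarray(image)
    if image.dtype.kind == "f":
        return image  # floats pass through untouched
    if image.dtype.kind not in "iu":
        raise TypeError(f"unsupported dtype: {image.dtype}")
    as_float = image.astype(np.float64)
    if image.dtype.itemsize > 2:
        # int32, uint32, int64, uint64: behave like preserve_range=True
        return as_float
    # uint8, int8, uint16, int16: rescale by the dtype maximum
    return as_float / np.iinfo(image.dtype).max


labels = np.array([0, 10, 1000], dtype=np.int32)
pixels = np.array([0, 128, 255], dtype=np.uint8)
print(to_float(labels))  # values preserved
print(to_float(pixels))  # values rescaled to [0, 1]
```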

@hmaarrfk

Member

hmaarrfk commented May 18, 2018

Personally, I'm against constantly checking bounds. If I want to perform convolutions and scaling on my images, that should be my choice. Most algorithms are scale invariant, or require some kind of normalization anyway.

The speed of the algorithms shouldn't be impeded by range checks on my giant matrices multiple times per call.

What is the point of imshow, really? What is deficient in matplotlib.pyplot.imshow? Performance, yes, but it autoscales and lets you do things like add colorbars when you want to. Using it also encourages users to learn another useful tool so that they can expand their skill set later.

I think visualization should be left to the user and not to the processing library.

@hmaarrfk

Member

hmaarrfk commented May 18, 2018

This kinda relates to my PR #3062, where I don't think we should impose 64-bit floats on users.

I would suggest that if you want to make it new-user-proof, you change the image load functions. Loading happens once at the beginning, and the images have probably had little processing applied to them at that point.

Many of the functions first call img_as_float. Loading images should arguably just convert them to float by default, alleviating many issues with subsequent conversions.

@emmanuelle

Member

emmanuelle commented May 28, 2018

Could we consider dropping range conversion altogether (like for 1.0 version)?

@jni

Contributor Author

jni commented May 28, 2018

To answer that, we need to figure out all the places where the assumed range is useful. There is at least one case: RGB float images being displayed by matplotlib are clipped to [0, 1]:

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: image = np.random.random((10, 10, 3)) * 255

In [4]: plt.imshow(image)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Out[4]: <matplotlib.image.AxesImage at 0x7f204a2e0a90>

I don't want to coordinate a deprecation with matplotlib.

It also matters when downcasting: what do you do when you have out-of-range values? I don't want to trade rescaling error reports for overflow error reports.
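
The downcasting concern can be illustrated with plain NumPy. This is a sketch with made-up values: it shows one explicit policy (clip, then scale) for float to uint8, and the silent wraparound that an unchecked integer cast produces instead.

```python
import numpy as np

# Float data with values outside the assumed [0, 1] range.
values = np.array([-0.5, 0.0, 0.5, 1.0, 1.5])

# Explicit policy: clip to [0, 1], then scale to the uint8 range.
clipped = np.clip(values, 0.0, 1.0)
as_uint8 = np.round(clipped * 255).astype(np.uint8)

# Without a policy, an integer downcast wraps around modulo 256.
wide = np.array([0, 255, 300], dtype=np.int64)
wrapped = wide.astype(np.uint8)

print(as_uint8)
print(wrapped)
```

Clipping discards the out-of-range information and wrapping corrupts it, so either way the user gets a surprising result unless the policy is chosen explicitly.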

@emmanuelle

Member

emmanuelle commented May 28, 2018

OK thanks. Do we do a lot of downcasting?

@jni

Contributor Author

jni commented May 28, 2018

float -> uint8 is very very very common. I presume that anyone that does uint16 (e.g. 12-bit image) -> float will end up saving their image as uint8.

@jni

Contributor Author

jni commented May 28, 2018

You are probably right that rescale_ utility functions are the right answer for this.

@emmanuelle

Member

emmanuelle commented May 28, 2018

I agree but we don't do that in scikit-image code, right? So utility functions should do the job

@emmanuelle

Member

emmanuelle commented May 28, 2018

Still needs to be checked though.

@jni

Contributor Author

jni commented Jun 1, 2018

It could be that lab2rgb code is also suffering from data range coercion, see #3078

@jni jni referenced this issue Nov 28, 2018

Open

Affine Image Registration #3544

4 of 8 tasks complete

jni added a commit to jni/scikit-image that referenced this issue Dec 3, 2018

Add option to suppress warnings in dtype.convert
See discussions:
scikit-image#543 (comment)
scikit-image#2602

Specifically, @stefanv said [1]_:

> Why don't we just remove this warning entirely? It seemed like the
> proper thing to do at the time, but it's never been useful.

Therefore, I set the default to False, but leave the option for warning
there since it's potentially useful to some. When and if we stop doing
automatic dtype conversions (see scikit-image#3009), we might be able to remove
warnings altogether.

..[1]: scikit-image#2602 (comment)

@jni jni referenced this issue Dec 3, 2018

Open

Remove precision loss warnings by default #3575

5 of 9 tasks complete

jni added a commit to jni/scikit-image that referenced this issue Feb 9, 2019

jni added a commit to jni/scikit-image that referenced this issue Feb 15, 2019

jni added a commit to jni/scikit-image that referenced this issue Feb 24, 2019

@jni

Contributor Author

jni commented Mar 13, 2019

A small update about this issue. I hope to start tackling this more systematically in the coming months. I asked @batson, author of noise2self, for any friction he might have encountered using skimage in the paper. His response (after being very complimentary, I should note! =):

one thing which killed was that the denoising functions (median filter especially, but maybe the others) were not type consistent. it would take in an image as float and return uint8, etc
so i would try to compare PSNR with the output and get something crazy
also NL-means could take in uint8 and return floats, but floats on the 0-255 scale
so not images according to skimage

The last bit is particularly concerning, and I can indeed see that we just do .astype(float64) for input images in the NL means code:

cdef IMGDTYPE [:, :, ::1] padded = np.ascontiguousarray(np.pad(image,
                     ((offset, offset), (offset, offset), (0, 0)),
                      mode='reflect').astype(np.float64))
cdef IMGDTYPE [:, :, ::1] result = padded.copy()

@stefanv

Member

stefanv commented Mar 13, 2019

An algorithm should never return a type of lower precision than the input, unless it implicitly throws away information (thresholding, e.g.).

I don't think we can solve the problem of int types being mapped to float. That is often the only way to produce an accurate answer.

@grlee77

Contributor

grlee77 commented Mar 13, 2019

The last bit is particularly concerning, and I can indeed see that we just do .astype(float64) for input images in the NL means code

You have to use floats internally for the denoising methods, but I think the inconsistency is that denoise_nl_means is not doing the conversion via img_as_float while all other denoising functions are. So integer (e.g. uint8) input to most denoising functions results in float output in the range [0, 1], while for nl-means there is no rescaling. My vote is to add img_as_float to denoise_nl_means, although that will result in a change to the output amplitude for non-float inputs, so it may need a deprecation cycle?
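
The effect of that inconsistency on a metric such as PSNR can be sketched with plain NumPy. The `psnr` helper below is a hypothetical stand-in written for this example, and the two "denoised" outputs are simulated with a constant offset rather than produced by any real denoising function.

```python
import numpy as np


def psnr(reference, test, data_range=1.0):
    """Hypothetical helper: peak signal-to-noise ratio in dB."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    return 10 * np.log10(data_range**2 / mse)


rng = np.random.default_rng(0)
image_u8 = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

# Reference on the [0, 1] scale.
reference = image_u8 / 255.0

# Simulated output that was rescaled to [0, 1] (small constant error).
rescaled = image_u8 / 255.0 + 0.01
# Simulated output left on the 0-255 scale, as with a bare astype(float64).
unscaled = image_u8.astype(np.float64) + 2.55

psnr_consistent = psnr(reference, rescaled)
psnr_mismatched = psnr(reference, unscaled)
print(psnr_consistent, psnr_mismatched)
```

With matching scales the PSNR is about 40 dB; with the 0-255 output compared against a [0, 1] reference it becomes negative, which is the "something crazy" reported above.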
