
change the max value of poisson noise input from 2 ** n to 2 ** n - 1 #6531

Open · wants to merge 2 commits into main

Conversation

@newcomertzc commented Sep 17, 2022

Description

This PR changes the implementation of poisson noise in skimage.util.random_noise.

As the max value of an unsigned n-bit integer is 2 ** n - 1 and the min value is 0, the scaling operation should be image * (2 ** n - 1). For example, when we take an image of type uint8 as input, the current code effectively computes image / 255 * 256, which is weird.

```python
elif mode == 'poisson':
    # Determine unique values in image & calculate the next power of two
    vals = len(np.unique(image))
    vals = 2 ** np.ceil(np.log2(vals))

    # Ensure image is exclusively positive
    if low_clip == -1.:
        old_max = image.max()
        image = (image + 1.) / (old_max + 1.)

    # Generating noise for each unique value in image.
    out = rng.poisson(image * vals) / float(vals)

    # Return image to original range if input was signed
    if low_clip == -1.:
        out = out * (old_max + 1.) - 1.
```
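To make the difference between the two scaling factors concrete, here is a standalone sketch (not part of the PR; the array `levels` stands in for a uint8 image converted to float):

```python
import numpy as np

# Sketch: `levels` stands in for an 8-bit image converted to float,
# i.e. the values k/255 for k = 0..255.
levels = np.arange(256) / 255.0

# Same scaling factor the current code derives: next power of two.
vals = 2 ** np.ceil(np.log2(len(np.unique(levels))))  # 256.0

scaled_current = levels * vals         # 0, 256/255, 512/255, ... -- mostly non-integers
scaled_proposed = levels * (vals - 1)  # 0, 1, 2, ..., 255 -- exact integer levels

print(np.allclose(scaled_current, np.round(scaled_current)))    # False
print(np.allclose(scaled_proposed, np.round(scaled_proposed)))  # True
```

Only the `vals - 1` variant maps the float levels back onto exact integers before they are handed to the Poisson sampler.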

@lagru (Member) commented Sep 18, 2022

Hey @newcomertzc, thank you for the suggested improvement! From looking at the code and reading the docstring, I'm not sure that the scaling operation with a power of two has a meaningful relation to the bit representation of an integer. The scaling is applied to a float image, and the result is returned as a float as well.

Does someone know why the scaling happens with a power of two?

@newcomertzc (Author) commented Sep 18, 2022

> Hey @newcomertzc, thank you for the suggested improvement! From looking at the code and reading the docstring, I'm not sure that the scaling operation with a power of two has a meaningful relation to the bit representation of an integer. The scaling is applied to a float image, and the result is returned as a float as well.
>
> Does someone know why the scaling happens with a power of two?

Sorry for not expressing this clearly. "Unsigned n-bit integer" is not accurate enough; what I actually mean is n-bit color, such as 8-bit color (256 colors) and 16-bit color (65536 colors). Since the Poisson distribution is a discrete probability distribution over non-negative integers (although numpy.random.poisson accepts float input), we need to convert the different colors represented by floats to integer representations. Taking 256 colors as an example, the float representation [0.0, 1/255, 2/255, ..., 254/255, 1.0] should be converted to the integer representation [0, 1, 2, ..., 254, 255], which is then passed to numpy.random.poisson as input. However, if we multiply the former by 256 (2 ** 8) rather than 255 (2 ** 8 - 1), we get a weird float representation: [0.0, 256/255, 512/255, ..., 254*256/255, 256]. So I think multiplying the float image by 2 ** 8 - 1 is more reasonable than multiplying it by 2 ** 8.

Here is Matlab's documentation about Poisson Noise, where the processing method for uint8 and uint16 images is the same.

@newcomertzc (Author) commented

@lagru In conclusion, I think the scaling with a power of two is intended to convert the float image to an integer image. However, multiplying the float image by 2 ** n does not scale it properly, as the following code shows:

```python
from skimage import data, img_as_float

img = img_as_float(data.camera())
print('-' * 32)
print(f"img * 255:\n{img * 255}")
print('-' * 32)
print(f"img * 256:\n{img * 256}")
```

Output: [screenshot Snipaste_2022-09-18_21-59-08]

@lagru (Member) commented Sep 20, 2022

Ah, thank you @newcomertzc for the detailed explanation. The Poisson distribution is discrete, so we need to scale lambda up to get a reasonable resolution of the distribution. Currently the function does so by assuming that the next power of two above the number of unique values in the image gives a reasonable noise resolution.

You are arguing that it is "weird". What precisely do you mean by that? The only difference I see between using -1 or not is a slight difference in resolution. Concerning your example: how does it apply? random_noise scales back down to a [0, 1] range. Either value, 256 or 255, will lead to small errors due to the finite floating point resolution.

@newcomertzc (Author) commented Sep 20, 2022

@lagru I think not using -1 leads to bigger errors. Consider an image with 256 colors. The unique integer values appearing in the image are 0, 1, 2, ..., 254, 255 (256 numbers), and the corresponding unique float values are 0.0, 1/255, 2/255, ..., 254/255, 1.0 (also 256 numbers). I refer to these numbers as "valid values".

Since the output of numpy.random.poisson is of integer type, dividing it by 256 gives a float image whose unique values are 0.0, 1/256, 2/256, ..., 255/256, 1.0 (257 numbers; those greater than 1.0 are clipped in random_noise). I think it's "weird" for a 256-color image to have 257 unique values. That means you need to convert these 257 unique values to the "valid values" of a 256-color image mentioned above, or even to the "valid values" of 65536 colors. I think the errors caused by this are significantly larger than the errors caused by the finite floating point resolution.
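The 257-value grid can be sketched directly (a standalone illustration, not library code):

```python
import numpy as np

# Standalone illustration: with the current scaling, the clipped outputs
# of rng.poisson(image * 256) / 256 lie on the grid k/256 for k = 0..256.
levels = np.arange(256) / 255.0   # the 256 "valid" float values of the image
grid = np.arange(257) / 256.0     # the 257 attainable noisy outputs

print(len(grid))                          # 257
# No interior grid point coincides with a valid k/255 level:
print(np.isin(grid[1:-1], levels).any())  # False
```

So every noisy pixel except exact 0.0 and 1.0 falls between the image's original intensity levels.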

@lagru (Member) commented Sep 20, 2022

> I think it's "weird" for a 256-color image to have 257 unique values.

But it is not a 256-color (or 8-bit) image, I think? It is the result of sampling Poisson distributions with lambdas of up to 256. The noisy result is then clipped to a maximum of 257 unique intensity values within a [0., 1.] float range. When this is converted back to an actual 8-bit representation, e.g. with img_as_ubyte, we get some loss due to rounding, but only to the extent that we round the "noise resolution" down to 256 unique values.
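That rounding loss can be bounded with a quick sketch (assuming the 8-bit conversion behaves like round(x * 255), which is roughly what img_as_ubyte does):

```python
import numpy as np

# Sketch: the noisy outputs lie on k/256 (k = 0..256, clipped to [0, 1]);
# converting back to 8 bits (modeled here as round(x * 255)) snaps each
# value to the nearest j/255 level.
k = np.arange(257)
float_vals = np.clip(k / 256.0, 0.0, 1.0)
as_ubyte = np.round(float_vals * 255).astype(np.uint8)
err = np.abs(as_ubyte / 255.0 - float_vals)

# The loss is at most half an 8-bit level:
print(err.max() <= 0.5 / 255 + 1e-12)  # True
```

In other words, the round trip back to uint8 loses less than one intensity level per pixel, which is the "noise resolution" rounding described above.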

I totally get why this might be unintuitive and surprising. If we were to add this as new functionality, I would be happy to use 2 ** n - 1. However, I think this surprising behavior is largely hidden from the user. And if we patch this, we will create small changes in user code somewhere, even where it's been pinned with a seed value. I'm not sure that we would gain anything from this change that would offset this cost. And I still would not call this a bug.

@newcomertzc, I'm sorry about being a bit pedantic on this. Please feel welcome to continue the conversation. I may still be missing your point completely, and I'd be happy to reach a common understanding here. And other contributors and maintainers might view this differently than me anyway. 😉

@lagru added the 🔧 type: Maintenance and 💬 Discussion labels on Sep 20, 2022
@grlee77 (Contributor) commented Sep 21, 2022

Thanks @newcomertzc. I am also not sure that the fix here outweighs the backwards-incompatible change. Given that we have always returned normalized floats, we are already inconsistent with the corresponding Matlab behavior whether we make the change or not.

@stefanv (Member) commented Oct 17, 2022

Our implementation looks incorrect to me.

First, we calculate:

```python
vals = len(np.unique(image))
```

on a floating point image. That makes no sense! I think we should think more carefully about what the scaling factor should be.
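A toy example makes this concrete (a sketch; any genuinely floating point image behaves this way):

```python
import numpy as np

# Sketch: for a genuinely floating point image, the number of unique
# values tracks the pixel count, not any notion of bit depth.
rng = np.random.default_rng(0)
img = rng.random((64, 64))           # stand-in for, e.g., a filtered image
vals = len(np.unique(img))
print(vals)                          # 4096: every pixel is unique
print(2 ** np.ceil(np.log2(vals)))   # 4096.0 -- an essentially arbitrary scaling factor
```

For such an image the derived scaling factor is just the pixel count rounded up to a power of two, with no connection to the image's intensity quantization.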

Also, having clip=True as the default here is regrettable; it leads to non-linear behavior near the upper end of the value range.
