BUG: misc: allow both cmin/cmax and low/high params in bytescale #6578
Conversation
Looks like the TravisCI errors are unrelated to this change.
Other than the backwards-compat issue, I'm +1 to merge.
@@ -94,10 +94,10 @@ def bytescale(data, cmin=None, cmax=None, high=255, low=0):
         cscale = 1

     scale = float(high - low) / cscale
-    bytedata = (data * 1.0 - cmin) * scale + 0.4999
+    bytedata = (data * 1.0 - cmin) * scale + low
I know it's not part of your changes, but it seems like the `* 1.0` is unnecessary here.
Good point. I'll remove it.
-    bytedata[bytedata < 0] = 0
-    return cast[uint8](bytedata) + cast[uint8](low)
+    bytedata[bytedata < low] = low
+    return cast[uint8](bytedata.round())
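For context, the compatibility difference can be shown with a plain-NumPy illustration (not the scipy source): the old code added 0.4999 and let the uint8 cast truncate, while `round()` rounds half to even, so the two disagree at exact midpoints.

```python
import numpy as np

vals = np.array([0.5, 1.5, 2.5, 3.4999])

# Old behaviour: add 0.4999, then let the uint8 cast truncate toward zero.
old = (vals + 0.4999).astype(np.uint8)

# New behaviour: NumPy's round() rounds half to even ("bankers' rounding").
new = vals.round().astype(np.uint8)

print(old.tolist())  # [0, 1, 2, 3]
print(new.tolist())  # [0, 2, 2, 3]
```

So an input that scales to exactly 1.5 moves from 1 to 2 under the new code, which is the backwards-compatibility concern raised here.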
Seems like this rounding is breaking backwards compatibility.
Adding 0.4999 instead of rounding would maintain that backwards compatibility, but to me using the `round()` method seems more correct. Which is more important, correctness or backwards compatibility?
Ahh, I overlooked that, sorry! I'm on the side of correctness here.
That's a tricky one. I'd tend to say that the 0.4999 is incorrect. However, @WarrenWeckesser wanted a backwards-compatible change (#5305 (comment)).
(This started out as a reply to Ralf's inline comment, but it got a little too long for an inline comment reply.) If we ignore error-checking for the moment, it looks like this function could almost be written as:
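The code block that followed was not preserved in this thread. Purely as an illustration (my own sketch, not the original snippet), the core transform minus error checking could look like:

```python
import numpy as np

def bytescale_sketch(data, cmin=None, cmax=None, high=255, low=0):
    # Hypothetical compact version: linearly map [cmin, cmax] onto
    # [low, high], round, clip, and cast to uint8.
    data = np.asarray(data, dtype=float)
    cmin = data.min() if cmin is None else cmin
    cmax = data.max() if cmax is None else cmax
    cscale = (cmax - cmin) or 1          # avoid division by zero
    scaled = (data - cmin) * (high - low) / cscale + low
    return np.clip(scaled.round(), low, high).astype(np.uint8)

print(bytescale_sketch([0.0, 0.5, 1.0], cmin=0, cmax=1, low=10, high=20).tolist())
# [10, 15, 20]
```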
In the existing code (not this pull request), …
Under any reasonable convention for handling points exactly on a quantization boundary, the above results should both be 1. I suspect the intent of adding 0.4999 was to always round down. In the boundary cases, …
Suppose we do want to always round down. The numpy … Or use …
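The function names were dropped from this comment in the scrape. Assuming the elided suggestion was along the lines of `np.floor` (an assumption on my part), always rounding down could be sketched as:

```python
import numpy as np

scaled = np.array([0.9, 1.0, 1.5, 1.9999])

# np.floor always rounds toward negative infinity, i.e. "always round down";
# for the non-negative values bytescale produces, this matches truncation.
down = np.floor(scaled).astype(np.uint8)
print(down.tolist())  # [0, 1, 1, 1]
```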
I like the idea of using …

Regarding rounding, what's the rationale for wanting to round down? If we don't want to use numpy's round() function because it implements 'bankers' rounding, why not add 0.5 and truncate to round halves up (rather than adding 0.4999, or some other number closer to but still less than 0.5, to round down)?
I don't know; my interpretation of the intent of adding 0.4999 instead of 0.5 is just a guess. We're now dealing with the fallout of an undocumented magic constant. Grrr.
If we were starting from scratch, I think that would be a reasonable thing to do. Perhaps it is a reasonable thing to do now, too. It might cause different results in some cases, but I think we can argue that many of those cases (e.g. the example I gave earlier) were actually incorrect, so we're fixing a bug. But in an example such as …
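The concrete example was lost from this comment, but the trade-off under discussion can be reproduced: "add 0.5 and truncate" implements round-half-up, which differs from NumPy's round-half-to-even exactly at the .5 midpoints (values chosen here for illustration):

```python
import numpy as np

vals = np.array([0.5, 1.5, 2.5])

half_up = (vals + 0.5).astype(np.uint8)  # add 0.5, truncate: rounds .5 up
bankers = vals.round().astype(np.uint8)  # round half to even

print(half_up.tolist())  # [1, 2, 3]
print(bankers.tolist())  # [0, 2, 2]
```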
@WarrenWeckesser You also suggested clipping to (0, 255), rather than clipping to (low, high):
The initial implementation was clipping to (0, high), which I assumed was a mistake. In my mind, clipping to (low, high) seemed like the correct approach, and I wrote a unit test to support that output. But obviously clipping to (0, 255) yields different results. Which is preferred?

Clipping output to (low, high): …

Clipping output to (0, 255): …
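The two output listings did not survive the scrape, but the difference can be reconstructed with a small example (my own numbers, chosen so some scaled values fall outside [low, high]):

```python
import numpy as np

data = np.array([-1.0, 0.0, 5.0, 10.0, 11.0])
cmin, cmax, low, high = 0, 10, 100, 200

# Values outside [cmin, cmax] scale to values outside [low, high].
scaled = (data - cmin) * (high - low) / (cmax - cmin) + low

print(np.clip(scaled, low, high).astype(np.uint8).tolist())  # [100, 100, 150, 200, 200]
print(np.clip(scaled, 0, 255).astype(np.uint8).tolist())     # [90, 100, 150, 200, 210]
```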
I didn't think too much about that aspect when I wrote the code. Now I think clipping to (low, high) makes more sense. By the way, there is currently a check for …
Ensure that 0 <= low <= high <= 255
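That check could be expressed as follows (a hypothetical helper mirroring the commit message, not scipy's actual code):

```python
def check_byte_range(low, high):
    # Hypothetical validation helper: reject parameter combinations that
    # cannot map into an 8-bit output range.
    if not 0 <= low <= high <= 255:
        raise ValueError(
            "expected 0 <= low <= high <= 255, got low=%r, high=%r" % (low, high)
        )

check_byte_range(0, 255)       # fine
try:
    check_byte_range(100, 50)  # low > high
except ValueError as exc:
    print(exc)
```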
I've added a few more changes, based on the comments provided above. Thanks for your feedback!
This looks good to me, and all the tests still pass. There are some edge cases where this returns a different array; in particular, this is evident in the modified test of …
So I looked into #4458. The changes made in this PR have very little effect on `imresize`; the only effect would be due to the rounding changes made. Some background on #4458: if `mode` is not specified when calling `imresize`, float input ends up converted to uint8 along the way (which is what #4458 reports). This PR doesn't change any of that logic. The only effect this PR should have on `imresize` is the rounding difference shown below.

imresize on scipy 0.18.1

In [1]: import scipy
In [2]: scipy.__version__
Out[2]: '0.18.1'
In [3]: from skimage import data
In [4]: clock = data.clock()
In [5]: def stats(img):
...: print 'min', img.min()
...: print 'mean', img.mean()
...: print 'max', img.max()
...: print 'shape', img.shape
...: print 'dtype', img.dtype
...:
In [6]: clock2 = scipy.misc.imresize(clock.astype('uint8'), [1000, 1000])
In [7]: clock3 = scipy.misc.imresize(clock.astype('float32'), [1000, 1000])
In [8]: stats(clock)
min 99
mean 146.331533333
max 247
shape (300, 400)
dtype uint8
In [9]: stats(clock2)
min 99
mean 146.398597
max 247
shape (1000, 1000)
dtype uint8
In [10]: stats(clock3)
min 0
mean 81.600154
max 254
shape (1000, 1000)
dtype uint8
In [11]:

imresize on misc.bytescale-lowhigh branch

In [1]: import scipy
In [2]: scipy.__version__
Out[2]: '0.19.0.dev0+09d22ca'
In [3]: from skimage import data
In [4]: clock = data.clock()
In [5]: def stats(img):
...: print 'min', img.min()
...: print 'mean', img.mean()
...: print 'max', img.max()
...: print 'shape', img.shape
...: print 'dtype', img.dtype
...:
In [6]: clock2 = scipy.misc.imresize(clock.astype('uint8'), [1000, 1000])
In [7]: clock3 = scipy.misc.imresize(clock.astype('float32'), [1000, 1000])
In [8]: stats(clock)
min 99
mean 146.331533333
max 247
shape (300, 400)
dtype uint8
In [9]: stats(clock2)
min 99
mean 146.398597
max 247
shape (1000, 1000)
dtype uint8
In [10]: stats(clock3)
min 0
mean 81.601141
max 254
shape (1000, 1000)
dtype uint8
In [11]:

Also note that the issue identified in #4458 can be easily overcome by simply providing an appropriate `mode`:

In [20]: stats(scipy.misc.imresize(clock.astype('float32'), [1000, 1000], mode='F'))
min 99.345
mean 146.331
max 246.36
shape (1000, 1000)
dtype float32
We could probably make a change to …
Regarding #5305, it's pretty much the same situation as #4458. As far as I can tell, this PR should have very little impact on `imsave`; the only difference would be a slight difference in values due to the changes in rounding. The main issue with `imsave` is the same float-to-uint8 conversion:

imsave using scipy 0.18.1

In [35]: stats(clock.astype('float32'))
min 99.0
mean 146.332
max 247.0
shape (300, 400)
dtype float32
In [36]: scipy.misc.imsave('/tmp/clock_float32.tif', clock.astype('float32'))
In [37]: clock_float32 = scipy.misc.imread('/tmp/clock_float32.tif')
In [38]: stats(clock_float32)
min 0
mean 81.542275
max 255
shape (300, 400)
dtype uint8
In [39]:

imsave using misc.bytescale-lowhigh

In [13]: stats(clock.astype('float32'))
min 99.0
mean 146.332
max 247.0
shape (300, 400)
dtype float32
In [14]: scipy.misc.imsave('/tmp/clock_float32.tif', clock.astype('float32'))
In [15]: clock_float32 = scipy.misc.imread('/tmp/clock_float32.tif')
In [16]: stats(clock_float32)
min 0
mean 81.5432666667
max 255
shape (300, 400)
dtype uint8
In [17]:

So again, this PR doesn't change the behavior of `imsave`.
Agreed. Thanks for the detailed investigation @gdooper! You've pretty much figured out what's needed for those two issues, so if you'd be interested in sending follow-up PRs for those, that would be great. This is good to go, so merging. Thanks all!
Resolves issue #6552 where bytescaling an array using both cmin/cmax and low/high parameters resulted in an unexpected array.
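To illustrate the fixed mapping (a NumPy re-implementation of the transform for demonstration, not a call into scipy): with cmin/cmax and low/high given together, [cmin, cmax] should map linearly onto [low, high].

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0])
cmin, cmax, low, high = 10, 30, 100, 200

# cmin -> low, cmax -> high, values in between interpolated linearly.
scaled = (data - cmin) * (high - low) / (cmax - cmin) + low
out = np.clip(scaled.round(), low, high).astype(np.uint8)
print(out.tolist())  # [100, 150, 200]
```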