
BUG: misc: allow both cmin/cmax and low/high params in bytescale #6578

Merged: 7 commits merged into scipy:master on Oct 23, 2016

Conversation

@gdooper commented Sep 15, 2016

Resolves issue #6552 where bytescaling an array using both cmin/cmax and low/high parameters resulted in an unexpected array.

@perimosocordiae (Member) left a comment

Looks like the TravisCI errors are unrelated to this change.

Other than the backwards-compat issue, I'm +1 to merge.

@@ -94,10 +94,10 @@ def bytescale(data, cmin=None, cmax=None, high=255, low=0):
         cscale = 1

     scale = float(high - low) / cscale
-    bytedata = (data * 1.0 - cmin) * scale + 0.4999
+    bytedata = (data * 1.0 - cmin) * scale + low
Member:

I know it's not part of your changes, but it seems like the * 1.0 is unnecessary here.

Author:

Good point. I'll remove it.

-    bytedata[bytedata < 0] = 0
-    return cast[uint8](bytedata) + cast[uint8](low)
+    bytedata[bytedata < low] = low
+    return cast[uint8](bytedata.round())
Member:

Seems like this rounding is breaking backwards compatibility.

Author:

Adding 0.4999 instead of rounding would maintain that backwards compatibility, but to me using the round() method seems more correct. Which is more important, correctness or backwards compatibility?
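For concreteness, the two behaviors differ at the half-way boundaries (a minimal sketch, not code from this PR):

import numpy as np

x = np.array([0.5, 1.5, 2.5])
(x + 0.4999).astype(np.uint8)   # truncate after adding 0.4999 -> [0, 1, 2]
np.round(x).astype(np.uint8)    # round half to even           -> [0, 2, 2]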

Member:

Ahh, I overlooked that, sorry! I'm on the side of correctness here.

Member:

That's a tricky one. I'd tend to say that the 0.4999 is incorrect. However @WarrenWeckesser wanted a backwards compatible change (#5305 (comment)).

@perimosocordiae changed the title from "Misc.bytescale lowhigh" to "BUG: misc: allow both cmin/cmax and low/high params in bytescale" on Sep 17, 2016
@perimosocordiae added the scipy.misc and needs-decision labels on Sep 17, 2016
@rgommers (Member) commented:

Would be nice to close gh-6552, gh-4458 and gh-5305 all in one go here.

@WarrenWeckesser commented Sep 26, 2016

(This started out as a reply to Ralf's inline comment, but it got a little too long for an inline comment reply.)

If we ignore error-checking for the moment, it looks like this function could almost be written as:

import numpy as np

def bytescale(data, cmin=None, cmax=None, high=255, low=0):
    if cmin is None:
        cmin = data.min()
    if cmax is None:
        cmax = data.max()

    # Linear map from [cmin, cmax] to [low, high]
    scale = float(high - low) / (cmax - cmin)
    scaled_data = (scale * (data - cmin)) + low

    # Quantize the scaled data by rounding.
    scaled8bit = scaled_data.round().clip(0, 255).astype(np.uint8)  # XXX Probably not what we want!

    return scaled8bit
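For instance, with this sketch (values worked out by hand, so treat them as illustrative):

bytescale(np.array([0.0, 5.0, 10.0]), cmin=0.0, cmax=10.0)
# -> array([  0, 128, 255], dtype=uint8); 127.5 rounds to the even value 128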

In the existing code (not this pull request), round is not used. Instead, 0.4999 is added before the scaled data is truncated to integers. That leads to behavior that I consider a bug:

In [48]: bytescale(np.array([5.0001]), cmax=10.0, cmin=0.0, high=1, low=0)
Out[48]: array([0], dtype=uint8)

In [49]: bytescale(np.array([5.001]), cmax=10.0, cmin=0.0, high=1, low=0)
Out[49]: array([1], dtype=uint8)

Under any reasonable convention for handling the points exactly on a quantization boundary, the above results should both be 1.

I suspect the intent of adding 0.4999 was to always round down. In the boundary cases, np.round rounds to the nearest even value, which is probably not what we want:

In [67]: r = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5])

In [68]: r.round()
Out[68]: array([0.,  2.,  2.,  4.,  4.,  6.])

Suppose we do want to always round down. The numpy round/around functions don't provide an option for that behavior. We could change the constant 0.4999 to something even closer to 0.5, e.g.:

In [130]: d = 0.5 - 65*np.finfo(np.float64).eps

In [131]: r = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 253.5, 254.5])

In [132]: (r + d).astype(np.uint8)
Out[132]: array([  0,   1,   2,   3,   4, 253, 254], dtype=uint8)

Or use np.where (or something equivalent, e.g. lazy_where):

In [156]: set_printoptions(formatter=dict(float=lambda t: "%6.2f" % t))

In [157]: x
Out[157]: 
array([  0.00,   0.25,   0.50,   0.75,   1.00,   1.25,   1.50,   1.75,
         2.00, 254.00, 254.25, 254.50, 254.75, 255.00])

In [158]: np.where((x % 1) == 0.5, np.floor(x), np.round(x))
Out[158]: 
array([  0.00,   0.00,   0.00,   1.00,   1.00,   1.00,   1.00,   2.00,
         2.00, 254.00, 254.00, 254.00, 255.00, 255.00])

@gdooper commented Sep 29, 2016

I like the idea of using astype(uint8) instead of cast[uint8] since this would preserve ndarray subclasses (i.e. bytescale on a masked array would return a masked array). Right now, this isn't the case.

Regarding rounding, what's the rationale for wanting to round down? If we don't want to use numpy's round() function because it implements a 'bankers' rounding, why not add 0.5 and truncate to round up (rather than 0.4999 or some number closer to but still less than 0.5 to round down)?
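In other words (a minimal sketch of the add-0.5-and-truncate idea):

x = np.array([0.5, 1.5, 2.5])
(x + 0.5).astype(np.uint8)   # -> [1, 2, 3]: every half-way value rounds up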

@WarrenWeckesser (Member) commented:

> Regarding rounding, what's the rationale for wanting to round down?

I don't know; my interpretation of the intent of adding 0.4999 instead of 0.5 is just a guess. We're now dealing with the fallout of an undocumented magic constant. Grrr.

> ... why not add 0.5 and truncate to round up ...

If we were starting from scratch, I think that would be a reasonable thing to do. Perhaps it is a reasonable thing to do now, too. It might cause some different results in some cases, but I think we can argue that many of those cases (e.g. the example I gave earlier) were actually incorrect, so we're fixing a bug. But in an example such as bytescale(np.array([0, 0.5, 1]), low=0, high=1), the result changes from array([0, 0, 1]) to array([0, 1, 1]). Can we justify that as the natural consequence of a bug fix? As I write this, I'm leaning towards "Yes, we can", but it would be good to get the opinion of some other devs.

@gdooper commented Sep 29, 2016

@WarrenWeckesser You also suggested clipping to (0, 255), rather than clipping to (low, high):

scaled8bit = scaled_data.round().clip(0, 255).astype(np.uint8)  # XXX Probably not what we want!

The initial implementation clipped to (0, high), which I assumed was a mistake. In my mind, clipping to (low, high) seemed like the correct approach, so I wrote a unit test to enforce that output.

But obviously clipping to (0, 255) yields different results. Which is preferred?

Clipping output to (low, high):

In [4]: bytescale(np.arange(10), cmin=3, cmax=6, low=100, high=200)
Out[4]: array([100, 100, 100, 100, 133, 167, 200, 200, 200, 200], dtype=uint8)

Clipping output to (0, 255):

In [4]: bytescale(np.arange(10), cmin=3, cmax=6, low=100, high=200)
Out[4]: array([  0,  33,  67, 100, 133, 167, 200, 233, 255, 255], dtype=uint8)
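In code, the two options differ only in the clip bounds (a sketch based on the snippet above):

# Option 1: clip to the requested output range
scaled8bit = scaled_data.round().clip(low, high).astype(np.uint8)

# Option 2: clip only to the representable uint8 range
scaled8bit = scaled_data.round().clip(0, 255).astype(np.uint8)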

@WarrenWeckesser (Member) commented:

> You also suggested clipping to (0, 255), rather than clipping to (low, high) ...

I didn't think too much about that aspect when I wrote the code. Now I think clipping to (low, high) makes more sense.

By the way, there is currently a check for low <= high. That could be extended to check for 0 <= low <= high <= 255.
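A minimal sketch of that extended check (the exact wording and placement of the error messages is up to the implementation):

if high > 255:
    raise ValueError("`high` should be less than or equal to 255.")
if low < 0:
    raise ValueError("`low` should be greater than or equal to 0.")
if high < low:
    raise ValueError("`high` should be greater than or equal to `low`.")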

Commit pushed: Ensure that 0 <= low <= high <= 255
@gdooper commented Sep 30, 2016

I've added a few more changes, based on the comments provided above. Thanks for your feedback!

@WarrenWeckesser (Member) commented:

This looks good to me, and all the tests still pass.

There are some edge cases where this returns a different array. In particular, this is evident in the modified test of bytescale(np.array([0, 1, 2])), which now returns [0, 128, 255] instead of [0, 127, 255]. I see this as a consequence of a bug fix, and not a backwards incompatibility (see my earlier comments). There are probably one or two devs that are more conservative than me concerning backwards compatibility, so I'll hold off on merging this.

@rgommers commented Oct 8, 2016

Changes look reasonable to me. Before merging, it would be good to check how this affects imsave (gh-4458 and gh-5305).

@gdooper commented Oct 8, 2016

So I looked into #4458. The changes made in this PR have very little effect on imresize. The only effect is due to the rounding changes made here.

Some background on #4458: if mode is not specified when calling misc.imresize, then misc.imresize calls misc.toimage with mode=None. If misc.toimage is called with mode=None, it then calls misc.bytescale, which outputs an 8-bit image scaled to (0, 255).
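Schematically, the chain looks like this:

# Simplified call chain (as described above):
# misc.imresize(arr, size)            # no mode given
#   -> misc.toimage(arr, mode=None)   # mode=None triggers conversion
#     -> misc.bytescale(arr)          # 8-bit output scaled to (0, 255)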

This PR doesn't change any of that logic. The only effect this PR should have on misc.imresize is slightly different pixel values due to the rounding change introduced here. This is evident in the different mean values of the output image when the input image is not 8-bit (e.g. the clock3 images below). I've replicated the example given in issue #4458 on both scipy 0.18.1 and this branch, using skimage.data.clock() instead of scipy.misc.lena():

imresize on scipy 0.18.1

In [1]: import scipy

In [2]: scipy.__version__
Out[2]: '0.18.1'

In [3]: from skimage import data

In [4]: clock = data.clock()

In [5]: def stats(img):
    ...:     print 'min', img.min()
    ...:     print 'mean', img.mean()
    ...:     print 'max', img.max()
    ...:     print 'shape', img.shape
    ...:     print 'dtype', img.dtype
    ...:     

In [6]: clock2 = scipy.misc.imresize(clock.astype('uint8'), [1000, 1000])

In [7]: clock3 = scipy.misc.imresize(clock.astype('float32'), [1000, 1000])

In [8]: stats(clock)
min 99
mean 146.331533333
max 247
shape (300, 400)
dtype uint8

In [9]: stats(clock2)
min 99
mean 146.398597
max 247
shape (1000, 1000)
dtype uint8

In [10]: stats(clock3)
min 0
mean 81.600154
max 254
shape (1000, 1000)
dtype uint8


imresize on misc.bytescale-lowhigh branch

In [1]: import scipy

In [2]: scipy.__version__
Out[2]: '0.19.0.dev0+09d22ca'

In [3]: from skimage import data

In [4]: clock = data.clock()

In [5]: def stats(img):
    ...:     print 'min', img.min()
    ...:     print 'mean', img.mean()
    ...:     print 'max', img.max()
    ...:     print 'shape', img.shape
    ...:     print 'dtype', img.dtype
    ...:     

In [6]: clock2 = scipy.misc.imresize(clock.astype('uint8'), [1000, 1000])

In [7]: clock3 = scipy.misc.imresize(clock.astype('float32'), [1000, 1000])

In [8]: stats(clock)
min 99
mean 146.331533333
max 247
shape (300, 400)
dtype uint8

In [9]: stats(clock2)
min 99
mean 146.398597
max 247
shape (1000, 1000)
dtype uint8

In [10]: stats(clock3)
min 0
mean 81.601141
max 254
shape (1000, 1000)
dtype uint8


Also note that the issue identified in #4458 can easily be worked around by providing an appropriate mode parameter when calling imresize:

In [20]: stats(scipy.misc.imresize(clock.astype('float32'), [1000, 1000], mode='F'))
min 99.345
mean 146.331
max 246.36
shape (1000, 1000)
dtype float32

@gdooper commented Oct 8, 2016

We could probably make a change to misc.imresize to check the datatype of the input image and call toimage with the appropriate mode parameter set, but my preference would be to address that in a separate PR.
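For illustration only, such a follow-up might look roughly like this (nothing below is in this PR; _infer_mode is an invented name):

import numpy as np

def _infer_mode(arr):
    # Hypothetical helper: choose a PIL mode from the input dtype,
    # so float input is not silently bytescaled down to uint8.
    if arr.dtype.kind == 'f':
        return 'F'   # PIL's 32-bit floating point mode
    return None      # keep the current default behavior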

@gdooper commented Oct 8, 2016

Regarding #5305, it's pretty much the same situation as #4458. As far as I can tell, this PR should have very little impact on imsave. The only difference would be a slight difference in values due to the changes in rounding.

The main issue with misc.imsave is that it calls misc.toimage with default parameters. So saving a float32 array to an image with misc.imsave will convert the input array down to an 8-bit image and scale the data to (0, 255). For PNG images, this may be appropriate, but there are many applications where a user would want their data saved as a float32 TIF image. This isn't possible with misc.imsave:

imsave using scipy 0.18.1

In [35]: stats(clock.astype('float32'))
min 99.0
mean 146.332
max 247.0
shape (300, 400)
dtype float32

In [36]: scipy.misc.imsave('/tmp/clock_float32.tif', clock.astype('float32'))

In [37]: clock_float32 = scipy.misc.imread('/tmp/clock_float32.tif')

In [38]: stats(clock_float32)
min 0
mean 81.542275
max 255
shape (300, 400)
dtype uint8


imsave using misc.bytescale-lowhigh

In [13]: stats(clock.astype('float32'))
min 99.0
mean 146.332
max 247.0
shape (300, 400)
dtype float32

In [14]: scipy.misc.imsave('/tmp/clock_float32.tif', clock.astype('float32'))

In [15]: clock_float32 = scipy.misc.imread('/tmp/clock_float32.tif')

In [16]: stats(clock_float32)
min 0
mean 81.5432666667
max 255
shape (300, 400)
dtype uint8


So again, this PR doesn't change the behavior of misc.imsave; it only changes the values of some pixels very slightly due to the change in rounding. My recommendation would be to address the imsave issues brought up in #5305 in a separate PR. It should be a fairly easy change: provide an appropriate mode parameter to toimage.
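Until then, the float32 case from #5305 can be handled by calling toimage directly with an explicit mode and saving the result (a sketch; mode='F' is PIL's 32-bit float mode):

from scipy import misc

# Build the PIL image ourselves instead of relying on imsave's defaults:
im = misc.toimage(clock.astype('float32'), mode='F')
im.save('/tmp/clock_float32.tif')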

@rgommers (Member) commented:

> We could probably make a change to misc.imresize to check the datatype of the input image and call toimage with the appropriate mode parameter set, but my preference would be to address that in a separate PR.

Agreed. Thanks for the detailed investigation @gdooper! You've pretty much figured out what's needed for those two issues, so if you'd be interested in sending follow-up PRs for those, that would be great.

This is good to go, so merging. Thanks all!

@rgommers merged commit b762903 into scipy:master on Oct 23, 2016
@rgommers added this to the 0.19.0 milestone on Oct 23, 2016
@rgommers added the maintenance label and removed the needs-decision label on Oct 23, 2016