Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster convex_hull_image polygon drawing for 2D images #2928

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ehusby
Copy link

@ehusby ehusby commented Dec 20, 2017

Description

Replaced grid_points_in_poly with calls to skimage.draw.polygon_perimeter and scipy.ndimage.morphology.binary_fill_holes in convex polygon drawing step for a 2D image.
For large 2D images (~10,000 x ~10,000 pixels), this substitution can result in a function-call-to-return speedup of more than 5x (from 23.2 sec to 4.6 sec for the tested image that has about 150 convex hull edges) while producing a convex hull image that is nearly identical to the image created by the current drawing routine. What follows is a comparison of the cProfile results and convex hull images created by the two routines.

Testing script

To test the differences between the two drawing routines, I ran the following script to do a cProfile on the skimage.morphology.convex_hull_image function with and without changes to the "convex_hull.py" file in my skimage package library. Results using the function in its current state (without my changes) are referred to as the latest method, while results using the function in its faster state (with my changes) are referred to as the faster method. Additionally, I tested both the offset_coordinates=True and offset_coordinates=False options to convex_hull_image.

import cProfile
import numpy as np
import tifffile
from PIL.Image import frombytes

from skimage.morphology import convex_hull_image


test_dir = 'C:/Users/husby036/Documents/Cprojects/test_s2s/testFiles/'


def saveImage(array, path):
    if array.dtype == np.bool:
        img = frombytes(mode='1', size=array.shape[::-1], data=np.packbits(array, axis=1))
        img.save(path)
    else:
        tifffile.imsave(path, array)
    print "'{}' saved".format(path)


function = 'convex_hull_image(image, offset_coordinates=True)'

chull_type = 'latest'

image = tifffile.imread(test_dir+'image.tif')
print("{} [{}]:\n".format(function, chull_type))
cProfile.run('image_chull = {}'.format(function))
saveImage(image_chull, test_dir+'chull_{}.tif'.format(chull_type))

debug_mask = np.zeros(image.shape, dtype=np.int8)
debug_mask[image_chull] = 1
debug_mask[image] += 2
saveImage(debug_mask, test_dir+'chull_{}_inspection.tif'.format(chull_type))

For my primary test case, I chose the following 10341 x 11441 px source image:

image_large

In this figure, yellow pixels are one/True/data pixels while purple pixels are zero/False/nodata pixels.

offset_coordinates=True

cProfile results

latest: offset_latest_cprof.txt

201 function calls in 23.216 seconds

faster: offset_faster_cprof.txt

908 function calls (907 primitive calls) in 4.576 seconds

We see a speedup of about 5x for this large input array.

Result image comparison

offset_diff

Histograms show counts of pixel values with the pixel value corresponding to each histogram bar taken to be the x-axis value at the left end of the bar.

In the upper set of images, yellow pixels (value=1) are part of the convex hull while purple pixels (value=0) are not.
The lower left image titled "Difference" is an image obtained by subtracting the upper left image from the upper right image (faster minus latest). Therefore, yellow pixels (value=1) are hull pixels in faster but not hull pixels in latest, and vice-versa for purple pixels (value=-1).
The lower right image titled "Boolean Difference" is to help spot where differences in the images are located; yellow pixels (value=1) are where differences occur.

The "Difference" figure zoomed to red box:

offset_diff_zoom

In the "Difference" figure, yellow and purple rectangles show one-pixel differences in the border width of the drawn hull. For the offset_coordinates=True option, we assume latest is most correct and want to minimize the total number of blue pixels. The total number of blue pixels is is approximately 10, which is good. Since there are approximately 10,000 yellow pixels in the "Difference" figure, we see that faster almost always draws a thicker polygon border than latest.

For closer inspection of the drawing routines in relation to the source image, the following "_inspection" images were created as shown in the last lines of the above testing script.
In "*_inspection" images, yellow pixels (value=3) are pixels that are both one/True/data in the source image and are part of the drawn convex hull, light blue pixels (value=1) are pixels that are zero/False/nodata in the source image and are part of the drawn convex hull, purple pixels (value=0) are neither of these things, and green pixels (value=2) are errors -- pixels that are one/True/data in the source image but are not part of the drawn convex hull.

offset_latest_inspect
offset_faster_inspect

Both latest and faster inspection images zoomed to the red box (only shown in the former):

offset_latest_inspect_zoom
offset_faster_inspect_zoom

We see that neither drawing routine (for offset_coordinates=True, but not so much for the opposite option, as seen later) has any green error pixels, which is good. We also see that, in general, faster is a bit more loose on drawing the convex hull border than latest, accounting for the significant number of pixels that are hull pixels in faster but not hull pixels in latest, as seen earlier.

offset_coordinates=False

cProfile results

latest: nooffset_latest_cprof.txt

197 function calls in 22.974 seconds

faster: nooffset_faster_cprof.txt

896 function calls (895 primitive calls) in 4.403 seconds

Result image comparison

nooffset_diff

The "Difference" figure zoomed to red box:

nooffset_diff_zoom

nooffset_latest_inspect
nooffset_faster_inspect

Both latest and faster inspection images zoomed to the red box (only shown in the former):

nooffset_latest_inspect_zoom
nooffset_faster_inspect_zoom

For offset_coordinates=False, we see that latest has a small but significant number of "green error pixels" (as mentioned above). While it's not obvious to me why exactly faster does not have any green error pixels, this is not unexpected as the method has been shown to draw a looser convex hull border than latest, allowing it to retain all edge one/True/data pixels in the source image as part of the convex hull.

Update with smaller secondary test image

In response to @soupault's comment, I've done what I should've done at first and compared the two drawing routines on a small image that's also a test case referenced in the documentation for scipy.morphology.convex_hull_image.

Here's the source image:

triangle_image

offset_coordinates=True

cProfile results

latest: tri_offset_latest_cprof.txt

201 function calls in 0.007 seconds

faster: tri_offset_faster_cprof.txt

428 function calls (427 primitive calls) in 0.008 seconds

We see that for a small input array (for which there tend to be less convex hull edges), there is little difference in drawing time.

Result image comparison

tri_offset_diff

This test case makes clear that the main difference in the result images is that faster (for which skimage.draw.polygon_perimeter is responsible) draws a thicker border of the convex hull than latest (for which grid_points_in_poly is responsible).

offset_coordinates=False

cProfile results

latest: tri_nooffset_latest_cprof.txt

197 function calls in 0.006 seconds

faster: tri_nooffset_faster_cprof.txt

408 function calls (407 primitive calls) in 0.007 seconds

Result image comparison

tri_nooffset_diff

I was actually only planning on running the offset_coordinates=True option for this small image test case, but I had a hunch that I would see something interesting if I tried offset_coordinates=False as well! With this option, we see that faster gets the correct result! What logically followed was to do a comparison of (latest, offset_coordinates=True) with (faster, offset_coordinates=False) for the large primary test image:

mix_diff

Only 27 pixels across the two images differ!

Conclusion

With these new results, the fact that the (faster, offset_coordinates=False) combination has been shown to have no "green error pixels" for both image size extremes, and that this combination creates the correct convex hull image for the small image test case, I see a great benefit of integrating the drawing routine from faster into convex_hull_image without the need for offsetting coordinates in 2D.

Checklist

  • Consider making the faster drawing routine an optional argument to convex_hull_image.

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

Replaced `grid_points_in_poly` with calls to `skimage.draw.polygon_perimeter` and `scipy.ndimage.morphology.binary_fill_holes` in convex polygon drawing step for a 2D image.

For large 2D images (~10,000 x ~10,000 pixels), this substitution can result in a function-call-to-return speedup of more than 5x (from 23.0 sec to 4.4 sec for a particular image with about 150 convex hull edges) while producing a convex hull image that is nearly identical to the image created by the current drawing routine.  In following comments, I will compare the two results of these two routines.
@pep8speaks
Copy link

pep8speaks commented Dec 20, 2017

Hello @husby036! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 01, 2018 at 04:11 Hours UTC

@ehusby ehusby changed the title Faster convex polygon drawing for 2D images Faster convex_hull_image polygon drawing for 2D images Dec 20, 2017
@@ -2,7 +2,8 @@
from itertools import product
import numpy as np
from scipy.spatial import ConvexHull
from ..measure.pnpoly import grid_points_in_poly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make changes to grid_points_in_poly instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which functions use that method, but if they could benefit from this speedup for drawing polygons with many edges (though I can't say at how many edges you start seeing the speedup), applying the changes there instead could work well.

@soupault
Copy link
Member

This PR description is just perfect 🚒 .
From my standpoint, the proposed modifications should be considered as an enhancement, and, therefore, scheduled for the integration.

@husby036 do you see any downsides of the faster method? Maybe some cases where it fails or underperforms in comparison with the latest?

@ehusby
Copy link
Author

ehusby commented Dec 21, 2017

@soupault Thank you! It's my first PR for a public repo, so I tried to put in my best effort.

I've updated the description with a test of a smaller image for comparison. The results are quite interesting -- I encourage you to take a look!

In short, I found (faster, offset_coordinates=False) actually creates a better result image than (faster, offset_coordinates=True).
(faster, offset_coordinates=False) produces the same result as (latest, offset_coordinates=True) (the ideal setting for latest) in the small image test case, while differing by 27 pixels in the large image test case.

Without further testing, I believe the improved drawing routine should produce an image close enough to the ideal image in most cases to make the speedup benefit worth the slight chance of drawing a convex hull border that is 1-px too wide (from the ideal setting for latest) in some cases.

@jni
Copy link
Member

jni commented Dec 25, 2017

Well, my mind is suitably boggled. I unfortunately won't have time to do a detailed review until late January (Merry Christmas, everyone! ;) but:

a) the performance benefit is substantial enough that I agree with @soupault that this should definitely be included one way or another very soon, but
b) we need to be really careful about the deprecation path. These single pixel changes may seem inconsequential in the overall scheme of things, but various downstream packages may depend on these things for their testing. In particular, we need to check that this new method gives correct results other than some floating point approximation errors. See these comments in #2847 for what kind of differences are acceptable or not. In those comments, I downplay the effect of single pixel changes, but I do that in favour of not changing things. When we do want to change the output, we must be much more careful.

@ehusby
Copy link
Author

ehusby commented Dec 25, 2017

@jni Thank you for your input, and Merry Christmas to you!

I agree 100% that the deprecation path must be analyzed in a stronger fashion than simply comparing the result images pixel-by-pixel. When I have more time, I would take a look at the differences between the grid_points_in_poly and polygon_perimeter methods to better understand the fine differences between the two drawing routines and the implications of making this substitution.

Pardon me for wanting to get the word out there that a speedup is possible, as it's greatly benefited me already. :) I'm actually using this faster polygon drawing routine for the processing of satellite imagery in a con**cave**_hull_image algorithm I recently developed! I hope to add this function to the scikit-image package in the near future. :D

@jni
Copy link
Member

jni commented Dec 25, 2017

Pardon me for wanting to get the word out there that a speedup is possible, as it's greatly benefited me already. :)

Oh we are very happy to see this!

I hope to add this function to the scikit-image package in the near future. :D

Sounds great!

@stefanv
Copy link
Member

stefanv commented Dec 25, 2017

Also see #931

@soupault soupault modified the milestones: 0.14.1, 0.14 Apr 27, 2018
@jni
Copy link
Member

jni commented May 1, 2018

Hi @husby036 et al!

Sorry for the long delay in getting back to this.

For a start, some of the tests are failing with index-out-of-bounds errors:
https://travis-ci.org/scikit-image/scikit-image/jobs/319439828#L4341

Others are failing due to the pixel-wise difference between this PR and the expected output, as we expected.

So, my suggestion:

  • fix the errors. ;)
  • add a keyword argument to convex_hull_image. Naming suggestion: fast_drawing=None, which gets converted to False by default in 0.14, and True from 0.16 onwards. See the deprecation cycle guidelines from the contributor guide. After 0.16 we can deprecate the argument altogether.
  • add tests that show the correct (pixel-wise) behaviour for fast_drawing=True, and pass fast_drawing=False for existing tests.

@husby036 are you up for this? Do you know how to update a PR? Let us know if you need guidance.

@ehusby
Copy link
Author

ehusby commented May 1, 2018

Hi again, @jni!

No worries about the delay. I've been plowing through MATLAB code at my work since then... (has it really been 4 months? oh no) This is a good excuse for a break. ;)

I am up for making the necessary changes. Thank you for providing the link to the Travis CI build log -- I didn't know I could see that!

My hunch is that the current offset_coordinates keyword argument is causing most (if not all) of the index-out-of-bounds errors, given that the skimage.draw.polygon_perimeter method in faster that replaces the polygon-drawing functionality of grid_points_in_poly in latest essentially produces the desired coordinate offset in 2D; Giving offset_coordinates=True to faster essentially dilates the resulting convex polygon from what is desired, which will surely cause the array bounds error if the polygon is to share an edge with the border of the image.

I'll do more testing soon. Still learning the whole GitHub thing, but now that I've found the nice documentation on how to run the automated testing for my modifications to skimage locally, I think I can figure it out. :)

@soupault soupault modified the milestones: 0.14, 0.15 May 23, 2018
@soupault soupault modified the milestones: 0.15, 0.16 Apr 20, 2019
@lmmx
Copy link

lmmx commented Nov 29, 2020

I picked up this issue on the SciPy sprint earlier in the year and figured out that the source of the IndexError (line 4322 of the logs) at:

mask[hull_perim_r, hull_perim_c] = True

...is occurring due to the coords being modified indiscriminately by the offsets assigned from _offsets_diamond on line 123 of skimage.morphology.convex_hull:

offsets = _offsets_diamond(image.ndim)
coords = (coords[:, np.newaxis, :] + offsets).reshape(-1, ndim)

To explain:

  • hull_perim_r, hull_perim_c come from polygon_perimeter(vertices[:, 0], vertices[:, 1])
  • vertices comes from hull.points[hull.vertices] (which is just a fancy indexing selecting a subset of hull.points)
  • hull.points comes from ConvexHull(coords)
  • coords were, as I said above, modified from integer to half-pixel floats by addition with the half-pixel offsets
    • this results in some becoming invalid (i.e. outside the image and hence also outside the mask, hence the IndexError)

I propose that we replace line 124 with the following:

coords = apply_partial_offsets(img, coords, offsets)

Using the following function:

def apply_partial_offsets(img, coords, offsets, retain_original_points=True):
    """
    Apply the offsets only to the non-edge pixels, along with the trivial zero-offset
    if `retain_original_points` is True (default: True, recommended due to corner loss).
    """
    if retain_original_points:
        # Insert the trivial offset of [0., 0.] into `offsets`
        offsets = np.insert(offsets, 0, 0., axis=0)
    row_max, col_max = np.subtract(img.shape, 1)
    # bool masks for the subsets of `coords` including each edge (one edge at a time)
    edge_t, edge_b = [coords[:,0] == lim for lim in (0, row_max)]
    edge_l, edge_r = [coords[:,1] == lim for lim in (0, col_max)]
    edge_includers = [edge_t, edge_b, edge_l, edge_r]
    if retain_original_points:
        dummy_edge = np.zeros_like(edge_t, dtype=bool) # False so offset always applied
        edge_includers.insert(0, dummy_edge)
    offset_mask = np.invert(edge_includers).T
    offset_idx = np.argwhere(offset_mask.ravel()).ravel()
    coords = (coords[:, np.newaxis, :] + offsets).reshape(-1, img.ndim)[offset_idx]
    return coords

Notice that the penultimate line of the function essentially just repeats the original line but then filters out the indexes at which the offsets should be masked because the coordinates at that position in coords were at an edge (so the top edge pixels should not be offset by the positive row offset [0.5, 0.], etc. as this would increment the row coordinate from 9 to 9.5 and that rounds to 10 hence the IndexError on an image with just 10 rows as in the test case).

The dummy_edge is just an all-False pseudo-edge, which corresponds to True in the offset_mask at the positions corresponding to the 'trivial' zero offset, i.e. it's used to ensure the input coords are also in the output coords. I found that without this the corner pixel is missing in the hull (as shown by plotting the mask returned by bugfix in this demo program):

It should probably only be used in the case of the 2D convex hull, so maybe we need to add an if statement like that at line 110?

Can anyone review these suggestions before I submit a PR? 😃

This is my first contribution to scikit-image so please let me know what else I can do for the PR.

I rewrote the above function for clarity a few times, please let me know if it isn't the proper skimage coding style (e.g. .ravel() could equally be replaced by .reshape(-1))

@ehusby
Copy link
Author

ehusby commented Nov 29, 2020

Awesome, awesome work @lmmx! Thank you for taking up the mantle on this upgrade after I let it get swept under the rug. I don't know if I can be of any assistance as I think you have a good handle on the situation now, but please let me know if I can!

As I alluded to in an earlier comment, my main project of 2017 was writing an algorithm to produce a 2D concave hull image for processing of large satellite images. I might need to ask for your help to test that when the time comes to submit it to scikit-image! ;)

@lmmx lmmx mentioned this pull request Dec 23, 2020
8 tasks
Base automatically changed from master to main February 18, 2021 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants