Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling really large polygons with draw.polygon #5451

Closed
c-arthurs opened this issue Jul 2, 2021 · 5 comments
Closed

Handling really large polygons with draw.polygon #5451

c-arthurs opened this issue Jul 2, 2021 · 5 comments
Labels

Comments

@c-arthurs
Copy link

c-arthurs commented Jul 2, 2021

As I understand it, the main use case for draw.polygon is for use in indexing arrays, as mentioned in the doctoring - May be used to directly index into an array, e.g. img[rr, cc] = 1

I have been doing a project with whole slide images where the polygons are so large that the loop that generates the # output coordinate arrays results in the os killing the script due to memory issues.

For my own use case, I have made an almost duplicate function (clunky solution shown here) where the loop just directly updates the input array image, and rather than returning the lists of coordinates it returns the image. This obviously solved my memory issues and I was wondering if it would be a good idea to add the feature?

There is also the point that other than solving my issue it would surely improve performance, as updating the polygon coordinates directly on the image/array is faster than generating a list of coordinates and then indexing the image to assign a value to the area?

Please let me know if this is at all useful and if so whether it is better placed as a set of optional arguments to _polygon (such as image=None and replace_value=None) or a separate function? If not useful then let me know and I will continue to use my dev branch. Cheers

@c-arthurs c-arthurs changed the title Memory issues with really large polygons when using draw.polygon Handling really large polygons with draw.polygon Jul 2, 2021
@grlee77
Copy link
Contributor

grlee77 commented Jul 3, 2021

Hi @c-arthurs,

It seems reasonable to me to have an function to draw the coordinates into an existing array.

If one wants to populate a dense array as in your use case, then the implementation here seems like it would always be preferable. However, if one only wants only the coordinates and shape is much larger than the extent of the polygon, the original _polygon should use less memory since the size of rr and cc would be much less than the mask size.

p.s.

Your proposed function can be accelerated substantially by changing the bottom part of the function to:

    # make contiguous arrays for r, c coordinates
    cdef cnp.float64_t[::1] rptr = np.ascontiguousarray(r, np.float64)
    cdef cnp.float64_t[::1] cptr = np.ascontiguousarray(c, np.float64)
    cdef cnp.uint8_t[:, ::1] mask_ptr = np.ascontiguousarray(mask, np.uint8)
    cdef Py_ssize_t r_i, c_i

    for r_i in range(minr, maxr+1):
        for c_i in range(minc, maxc+1):
            if point_in_polygon(cptr, rptr, <double>c_i, <double>r_i):
                mask_ptr[c_i, r_i] = True

    return mask

Indexing into the mask_ptr memoryview is much faster than indexing into a Python array. Also, it is helpful to make the range/index variables r_i, c_i as Py_ssize_t and just cast the inputs to point_in_polygon to double.

The change to the type of r_i and c_i can also be done for the original _polygon function, but I think having to expand the lists dynamically still hurts performance in that case.

@c-arthurs
Copy link
Author

Hi @grlee77 , thank you for looking at it and the speedup from your edits is really useful.
I will see if I can get a good enough pull request together

@rfezzani
Copy link
Member

rfezzani commented Jul 8, 2021

First of all, thank you @c-arthurs for your contribution and your interest in skimage!
I am just jumping in the discussion here to notify that I am not really happy with adding draw functions to skimage 😕
I am more on the side of removing/deprecating modules that are not image processing (io, draw, viewer...)
Sorry for my late answer 🙏

@c-arthurs
Copy link
Author

Hi @rfezzani, thanks for the comments, I actually agree with you.

After running a few benchmarks it seems that my implementation of the polygon filling is pretty slow compared to using PIL to do the same thing (PIL - 3.46 ms ± 473 µs per loop, skimage - 2.34 s ± 163 ms). Therefore it is probably best not to needlessly inflate the codebase when it can be done faster elsewhere - especially as adding np_anyint may also increase packaging times.

I will close the issue and am more than happy to reopen it if anyone feels that the function would be a useful addition. I'll leave the pull request open for now until someone decides whether it is worth including or dropping

@grlee77
Copy link
Contributor

grlee77 commented Jul 12, 2021

Thanks @c-arthurs for measuring against Pillow/PIL and reporting back here. Perhaps we should update draw module documentation to point users with large data to other packages for the drawing functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants