Type inference failing when using a heapq of namedtuples #7408

Closed · jni opened this issue Sep 16, 2021 · 4 comments
Labels: needtriage, question

jni commented Sep 16, 2021

Apologies that this question falls inside a reasonably complex function. I'm hoping the solution is obvious to more experienced Numba users. Also, apologies that I am not posting this on Discourse: I actually get an error when trying to post it there, "Sorry you cannot post a link to that host". 🤷

[Screenshot: Screen Shot 2021-09-16 at 4 59 49 pm]

I am getting a type inference failure within this function:

https://github.com/jni/platelet-unet-watershed/blob/46b167a035e196abc33a8d5888c8afe077d448a1/plateseg/watershed.py#L91-L151

Briefly, this is a variant of the watershed algorithm in which the ability to propagate a label to a pixel depends on the direction of propagation. As the watershed fronts propagate, each pixel gets added to a heapq priority queue. Each pixel is represented as a namedtuple containing: the value, i.e. the directional propagation "resistance" (float32); the age, i.e. how long ago the pixel was added to the queue (int); the index (np.intp); and the source watershed basin (np.intp).
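
For concreteness, here is a minimal sketch of such an element type. The field names follow the description above, but the exact declaration in watershed.py may differ:

from collections import namedtuple

import numpy as np

# Sketch of the queue element described above (field names assumed):
# value  -- directional propagation "resistance" (float32)
# age    -- how long ago the pixel was added to the queue (int)
# index  -- raveled pixel index (np.intp)
# source -- the watershed basin the pixel came from (np.intp)
Element = namedtuple('Element', ['value', 'age', 'index', 'source'])

# Example instance; in the real function the values come from the
# affinity image and the marker coordinates.
elem = Element(np.float32(0.5), 0, np.intp(42), np.intp(7))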

All the heap elements have the same structure and, as far as I can tell, well-defined data sources, but Numba is unhappy and raises the following warning:

/Users/jni/projects/platelet-unet-watershed/plateseg/watershed.py:91: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "raveled_affinity_watershed" failed type inference due to: Type of variable 'elem.1' cannot be determined, operation: call $176load_global.0(heap, func=$176load_global.0, args=[Var(heap, watershed.py:111)], kws=(), vararg=None, target=None), location: /Users/jni/projects/platelet-unet-watershed/plateseg/watershed.py (129)

File "plateseg/watershed.py", line 129:
def raveled_affinity_watershed(
    <source elided>
    while len(heap) > 0:
        elem = heappop(heap)
        ^

Things I've tried include:

  • removing the .age element of the namedtuple
  • using a regular tuple instead of a named tuple
  • wrapping age in np.int32()

In all cases the failure looks equivalent: Numba doesn't seem to know how to sort my (named) tuples.

Does anyone have any hints on how I could help Numba out here? From my reading, it should these days be possible to compile a heapq of namedtuples with Numba, but maybe I'm missing some key limitations.

Thank you!

jni commented Sep 16, 2021

OK, I've managed to reproduce the error using a simpler, plain watershed. Code here; the symptom is the same:

/Users/jni/projects/play/numba-ws.py:14: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "raveled_watershed" failed type inference due to: Type of variable 'elem.1' cannot be determined, operation: call $102load_global.0(heap, func=$102load_global.0, args=[Var(heap, numba-ws.py:34)], kws=(), vararg=None, target=None), location: /Users/jni/projects/play/numba-ws.py (46)

File "numba-ws.py", line 46:
def raveled_watershed(
    <source elided>
    while len(heap) > 0:
        elem = heappop(heap)
        ^

jni commented Sep 16, 2021

Ah! A closer look at the heapq example in the notebooks pointed me to adding these lines:

    # Seed the heap with one concrete Element and pop it straight away,
    # so type inference sees a concrete element type for the heap.
    heap = [Element(image_raveled[0], age, marker_coords[0], marker_coords[0])]
    _ = heappop(heap)

to the top of the function, and the warning goes away! 🎉
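
For reference, here is a minimal, self-contained sketch of the pattern. It is not the actual function from the gist, and the loop body is reduced to a stub, but it shows the idea: seeding the heap with one concrete Element and popping it before the main loop gives type inference a fully concrete element type to work with (assuming a Numba version whose heapq support handles these namedtuples, as it did here):

from collections import namedtuple
from heapq import heappop, heappush

import numpy as np
from numba import njit

Element = namedtuple('Element', ['value', 'age', 'index', 'source'])


@njit
def watershed_sketch(image_raveled, marker_coords, output):
    age = 0
    # Seed the heap with one concrete Element and pop it immediately,
    # so the heap's element type is known before the main loop runs.
    heap = [Element(image_raveled[0], age, marker_coords[0], marker_coords[0])]
    _ = heappop(heap)

    # Push the real starting elements (the marker pixels).
    for coord in marker_coords:
        heappush(heap, Element(image_raveled[coord], age, coord, coord))

    # Main loop: pop the cheapest element; a real watershed would examine
    # the pixel's neighbours here and push them onto the heap.
    while len(heap) > 0:
        elem = heappop(heap)
        output[elem.index] = output[elem.source]
    return output


# Toy usage:
image = np.random.random(100).astype(np.float32)
markers = np.array([0, 50], dtype=np.intp)
labels = np.zeros(100, dtype=np.intp)
labels[markers] = np.arange(1, len(markers) + 1)
watershed_sketch(image, markers, labels)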

... however, the function is still 120x slower than the equivalent Cython function from scikit-image, which looks more or less identical except that it uses a C/C++ queue. New gist:

https://gist.github.com/jni/e0f3c8d057c13dc6456a53196e6301ea

So this is now a performance question/issue... 😬

gmarkall commented Sep 16, 2021

Your latest gist appears to measure the compilation time as well as the execution time. If you do a call to raveled_watershed prior to the timing section, what times do you then see?

See also: https://numba.readthedocs.io/en/stable/user/5minguide.html?highlight=measure#how-to-measure-the-performance-of-numba
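
In other words: call the jitted function once before starting the clock, so the one-off JIT compilation cost is excluded. A minimal sketch of that timing pattern, using a trivial stand-in rather than the watershed from the gist:

import time

import numpy as np
from numba import njit


@njit
def work(x):
    # Trivial stand-in for raveled_watershed; the real body is in the gist.
    total = 0.0
    for v in x:
        total += v
    return total


x = np.random.random(10_000_000)

work(x)  # warm-up call: triggers JIT compilation, excluded from the timing

start = time.time()
work(x)
print('execution time:', time.time() - start)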

gmarkall added the question and needtriage labels Sep 16, 2021

jni commented Sep 16, 2021

omg, thanks @gmarkall, I forgot that caching is not on by default! 🤦 😅 When I duplicate that code block I get:

 $ python numba-ws.py 
0.845205545425415
0.08204936981201172
0.09456133842468262

🚀 🚀 🚀

After turning on caching, well, there's a bigger overhead than I'd naively expect to use the cache, but definitely nothing dramatic:

 $ python numba-ws.py 
0.17975640296936035
0.0802457332611084
0.04548001289367676
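
For reference, enabling the cache is a one-flag change on the decorator. A minimal sketch, again with a stand-in function rather than the real watershed:

from numba import njit


# cache=True writes the compiled machine code to a __pycache__ directory
# next to the source file, so subsequent runs of the script reuse it
# instead of recompiling; the remaining first-call overhead is the cost
# of loading and checking the cached code.
@njit(cache=True)
def work(x):
    total = 0.0
    for v in x:
        total += v
    return total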

Now to see whether I can port these improvements to the original problem! 🎉 I'll close this for now, thank you for pointing that out, and I'll circle back if I run into more issues. 🙏

jni closed this as completed Sep 16, 2021
jni added a commit to jni/platelet-unet-watershed that referenced this issue Sep 16, 2021
jni added a commit to jni/platelet-unet-watershed that referenced this issue Sep 17, 2021