Type inference failing when using a heapq of namedtuples #7408

Closed · jni opened this issue Sep 16, 2021 · 4 comments
Labels: needtriage, question

jni commented Sep 16, 2021

Apologies that this question falls inside a reasonably complex function. I'm hoping the solution is obvious to more experienced Numba users. Also, apologies that I am not posting this on Discourse: I actually get an error when trying to post it there, "Sorry you cannot post a link to that host". 🤷

[Screenshot: Screen Shot 2021-09-16 at 4 59 49 pm]

I am getting a type inference failure within this function:

https://github.com/jni/platelet-unet-watershed/blob/46b167a035e196abc33a8d5888c8afe077d448a1/plateseg/watershed.py#L91-L151

Briefly, this is a variant of the watershed algorithm in which the ability to propagate a label to a pixel depends on the direction of propagation. As the watershed fronts propagate, each pixel gets added to a heapq priority queue. Each pixel is represented as a namedtuple containing: the value, i.e. the directional propagation "resistance" (float32); the age, i.e. how long ago the pixel was added to the queue (int); the index (np.intp); and the source watershed basin (np.intp).
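
For concreteness, here is a minimal sketch of such an element type. The field names follow the description above, but the exact declaration in watershed.py may differ:

from collections import namedtuple

import numpy as np

# Sketch of the queue element described above (field names assumed):
# value  -- directional propagation "resistance" (float32)
# age    -- how long ago the pixel was added to the queue (int)
# index  -- raveled pixel index (np.intp)
# source -- the watershed basin the pixel came from (np.intp)
Element = namedtuple('Element', ['value', 'age', 'index', 'source'])

# Example instance; in the real function the values come from the
# affinity image and the marker coordinates.
elem = Element(np.float32(0.5), 0, np.intp(42), np.intp(7))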

All the heap elements have the same structure and, as far as I can tell, well-defined data sources, but Numba is unhappy and raises the following warning:

/Users/jni/projects/platelet-unet-watershed/plateseg/watershed.py:91: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "raveled_affinity_watershed" failed type inference due to: Type of variable 'elem.1' cannot be determined, operation: call $176load_global.0(heap, func=$176load_global.0, args=[Var(heap, watershed.py:111)], kws=(), vararg=None, target=None), location: /Users/jni/projects/platelet-unet-watershed/plateseg/watershed.py (129)

File "plateseg/watershed.py", line 129:
def raveled_affinity_watershed(
    <source elided>
    while len(heap) > 0:
        elem = heappop(heap)
        ^

Things I've tried include:

  • removing the .age element of the namedtuple
  • using a regular tuple instead of a named tuple
  • wrapping age in np.int32()

In all cases the failure looks equivalent: Numba doesn't seem to know how to sort my (named) tuples.

Does anyone have any hints on how I could help Numba out here? From my reading, it should these days be possible to compile a heapq of namedtuples with Numba, but maybe I'm missing some key limitations.

Thank you!

jni commented Sep 16, 2021

OK, I've managed to reproduce the error using a simpler, plain watershed. Code here; the symptom is the same:

/Users/jni/projects/play/numba-ws.py:14: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "raveled_watershed" failed type inference due to: Type of variable 'elem.1' cannot be determined, operation: call $102load_global.0(heap, func=$102load_global.0, args=[Var(heap, numba-ws.py:34)], kws=(), vararg=None, target=None), location: /Users/jni/projects/play/numba-ws.py (46)

File "numba-ws.py", line 46:
def raveled_watershed(
    <source elided>
    while len(heap) > 0:
        elem = heappop(heap)
        ^

jni commented Sep 16, 2021

Ah! A closer look at the heapq example in the notebooks pointed me to adding these lines:

    # Seed the heap with one concrete Element and pop it straight away,
    # so type inference sees a concrete element type for the heap.
    heap = [Element(image_raveled[0], age, marker_coords[0], marker_coords[0])]
    _ = heappop(heap)

to the top of the function, and the warning goes away! 🎉
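
For reference, here is a minimal, self-contained sketch of the pattern. It is not the actual function from the gist, and the loop body is reduced to a stub, but it shows the idea: seeding the heap with one concrete Element and popping it before the main loop gives type inference a fully concrete element type to work with (assuming a Numba version whose heapq support handles these namedtuples, as it did here):

from collections import namedtuple
from heapq import heappop, heappush

import numpy as np
from numba import njit

Element = namedtuple('Element', ['value', 'age', 'index', 'source'])


@njit
def watershed_sketch(image_raveled, marker_coords, output):
    age = 0
    # Seed the heap with one concrete Element and pop it immediately,
    # so the heap's element type is known before the main loop runs.
    heap = [Element(image_raveled[0], age, marker_coords[0], marker_coords[0])]
    _ = heappop(heap)

    # Push the real starting elements (the marker pixels).
    for coord in marker_coords:
        heappush(heap, Element(image_raveled[coord], age, coord, coord))

    # Main loop: pop the cheapest element; a real watershed would examine
    # the pixel's neighbours here and push them onto the heap.
    while len(heap) > 0:
        elem = heappop(heap)
        output[elem.index] = output[elem.source]
    return output


# Toy usage:
image = np.random.random(100).astype(np.float32)
markers = np.array([0, 50], dtype=np.intp)
labels = np.zeros(100, dtype=np.intp)
labels[markers] = np.arange(1, len(markers) + 1)
watershed_sketch(image, markers, labels)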

... however, the function is still 120x slower than the equivalent Cython function from scikit-image, which looks more or less identical except that it uses a C/C++ queue. New gist:

https://gist.github.com/jni/e0f3c8d057c13dc6456a53196e6301ea

So this is now a performance question/issue... 😬

gmarkall commented Sep 16, 2021

Your latest gist appears to measure the compilation time as well as the execution time. If you do a call to raveled_watershed prior to the timing section, what times do you then see?

See also: https://numba.readthedocs.io/en/stable/user/5minguide.html?highlight=measure#how-to-measure-the-performance-of-numba
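
In other words: call the jitted function once before starting the clock, so the one-off JIT compilation cost is excluded. A minimal sketch of that timing pattern, using a trivial stand-in rather than the watershed from the gist:

import time

import numpy as np
from numba import njit


@njit
def work(x):
    # Trivial stand-in for raveled_watershed; the real body is in the gist.
    total = 0.0
    for v in x:
        total += v
    return total


x = np.random.random(10_000_000)

work(x)  # warm-up call: triggers JIT compilation, excluded from the timing

start = time.time()
work(x)
print('execution time:', time.time() - start)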

gmarkall added the question and needtriage labels Sep 16, 2021

jni commented Sep 16, 2021

omg, thanks @gmarkall, I forgot that caching is not on by default! 🤦 😅 When I duplicate that code block I get:

 $ python numba-ws.py 
0.845205545425415
0.08204936981201172
0.09456133842468262

🚀 🚀 🚀

After turning on caching, well, there's a bigger overhead than I'd naively expect to use the cache, but definitely nothing dramatic:

 $ python numba-ws.py 
0.17975640296936035
0.0802457332611084
0.04548001289367676
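
For reference, enabling the cache is a one-flag change on the decorator. A minimal sketch, again with a stand-in function rather than the real watershed:

from numba import njit


# cache=True writes the compiled machine code to a __pycache__ directory
# next to the source file, so subsequent runs of the script reuse it
# instead of recompiling; the remaining first-call overhead is the cost
# of loading and checking the cached code.
@njit(cache=True)
def work(x):
    total = 0.0
    for v in x:
        total += v
    return total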

Now to see whether I can port these improvements to the original problem! 🎉 I'll close this for now, thank you for pointing that out, and I'll circle back if I run into more issues. 🙏

jni closed this as completed Sep 16, 2021
jni added a commit to jni/platelet-unet-watershed that referenced this issue Sep 16, 2021
jni added a commit to jni/platelet-unet-watershed that referenced this issue Sep 17, 2021