Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convolution.convolve() Cython code hoist optimizations #7

Open
jamienoss opened this issue Jan 31, 2018 · 0 comments
Open

convolution.convolve() Cython code hoist optimizations #7

jamienoss opened this issue Jan 31, 2018 · 0 comments

Comments

@jamienoss
Copy link
Owner

jamienoss commented Jan 31, 2018

Hoist index computions

astropy/convolution/boundary_none.pyx::convolve2d_boundary_none()

...
 # Now run the proper convolution
        for i in range(wkx, nx - wkx):
            for j in range(wky, ny - wky):
                top = 0.
                bot = 0.
                for ii in range(i - wkx, i + wkx + 1):
                    for jj in range(j - wky, j + wky + 1):
                        val = f[ii, jj]
                        ker = g[<unsigned int>(nkx - 1 - (wkx + ii - i)),
                                <unsigned int>(nky - 1 - (wky + jj - j))]
                        if not npy_isnan(val):
                            top += val * ker
                            bot += ker
                if normalize_by_kernel:
                    if bot == 0:
                        conv[i, j] = f[i, j]
                    else:
                        conv[i, j] = top / bot
                else:
                    conv[i, j] = top
...

Is (at least) more performant as the foloowing (though this is probably (hopefully) optimized as such by the compiler anyhow)

...
   cdef unsigned int i, j, ii, jj
    cdef unsigned int nkx_minus_1 = nkx-1, nky_minus_1 = nky-1
    cdef unsigned int wkx_minus_i, wky_minus_j
    cdef unsigned int ker_i, ker_j
    cdef unsigned int ny_minus_wky = ny - wky
    cdef unsigned int i_minus_wkx, wkx_plus_1 = wkx + 1
    cdef unsigned int j_minus_wky, wky_plus_1 = wky + 1
    cdef unsigned int i_plus_wkx_plus_1, j_plus_wky_plus_1
    cdef unsigned int nkx_minus_1_minus_wkx_plus_i, nky_minus_1_minus_wky_plus_j

    cdef int iimin, iimax, jjmin, jjmax
    cdef DTYPE_t top, bot, ker, val

    # release the GIL
    with nogil:

        # Now run the proper convolution
        for i in range(wkx, nx - wkx):
             # wkx - 1
            wkx_minus_i = wkx - i
            # i - wkx
            i_minus_wkx = i - wkx
            # i + wkx + 1
            i_plus_wkx_plus_1 = i + wkx_plus_1
            # nkx - 1 - (wkx - i)
            nkx_minus_1_minus_wkx_plus_i = nkx_minus_1 - wkx_minus_i
            for j in range(wky, ny_minus_wky):
                wky_minus_j = wkx - j
                j_minus_wky = j - wky
                j_plus_wky_plus_1 = j + wky_plus_1
                nky_minus_1_minus_wky_plus_j = nky_minus_1 - wky_minus_j

                top = 0.
                bot = 0.
                for ii in range(i_minus_wkx, i_plus_wkx_plus_1):
                    # nkx - 1 - (wkx + ii - i)
                    ker_i = nkx_minus_1_minus_wkx_plus_i - ii
                    for jj in range(j_minus_wky, j_plus_wky_plus_1):
                        ker_j = nky_minus_1_minus_wky_plus_j - jj
                        val = f[ii, jj]
                        ker = g[ker_i, ker_j]
                        if not npy_isnan(val):
                            top += val * ker
                            bot += ker
                if normalize_by_kernel:
                    if bot == 0:
                        conv[i, j] = f[i, j]
                    else:
                        conv[i, j] = top / bot
                else:
                    conv[i, j] = top
    # GIL acquired again here
    return conv
...

Hoist kernel.sum()

c.f. #4

bot += ker (sum of the kernel) is being re-computed for every element in the image. This is unnecessary. Since bot is being reset to 0 for each element, it is highly unlikely that the C compiler is clever enough to optimize this out by hoisting it above the outer most loop.

Hmmm, maybe not. The interpolation scheme seems to prevent this hoist and is the reason that the kernel summation exists in the Cython code. If the image element is NaN then the kernel element is not incorporated in the sum. (Only an issue for NaN interpolation.)

Shortcircuit normalize_by_kernel

c.f. #4

If this was straight C, I would make this function inline and then wrap it with two funcs or by a single with a condition on normalize_by_kernel so as to shortcircuit the internal conditional pre-call. Since the function is now inline, the C compiler would (should) remove the conditional from within the loop altogether. This way the performance increase can be obtained whilst keeping the code in a single place to remove duplication and aid readability plus and maintenance.

@jamienoss jamienoss mentioned this issue Jan 31, 2018
10 tasks
@jamienoss jamienoss changed the title convolution.convolve() core code general perf rewrite convolution.convolve() hoist optimizations Feb 1, 2018
@jamienoss jamienoss changed the title convolution.convolve() hoist optimizations convolution.convolve() Cython code hoist optimizations Feb 1, 2018
Repository owner locked and limited conversation to collaborators Mar 13, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant