Shortcut deforestation and loop fusion for array expressions. #1110
This pull request builds upon #1092 (and its dependencies) to rewrite individual array operations into a single array expression. The result uses fewer broadcast loops (loop fusion, since two or more operations move into the same kernel) and fewer temporary arrays (shortcut deforestation).
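To make the two terms concrete, here is a plain-Python sketch of the idea. It is not Numba's implementation, and the function names `unfused_axpb` and `fused_axpb` are hypothetical. The unfused version evaluates `a * x + b` one operation at a time, the way separate NumPy ufunc calls would, so it runs two loops and materializes `a * x` as a temporary. The fused version does everything in one loop and keeps the intermediate in a scalar local.

```python
from typing import List, Sequence, Tuple


def unfused_axpb(a: Sequence[float], x: Sequence[float],
                 b: Sequence[float]) -> Tuple[List[float], int]:
    """Evaluate a * x + b one operation at a time (like separate ufunc calls).

    Returns the result and the number of arrays allocated.
    """
    allocations = 0
    tmp = [ai * xi for ai, xi in zip(a, x)]  # loop 1: temporary array for a * x
    allocations += 1
    out = [ti + bi for ti, bi in zip(tmp, b)]  # loop 2: result array
    allocations += 1
    return out, allocations


def fused_axpb(a: Sequence[float], x: Sequence[float],
               b: Sequence[float]) -> Tuple[List[float], int]:
    """Evaluate a * x + b in a single loop (loop fusion).

    The intermediate a[i] * x[i] only ever lives in a scalar, so no temporary
    array is built (shortcut deforestation); only the output is allocated.
    """
    allocations = 0
    out = [0.0] * len(a)
    allocations += 1
    for i in range(len(a)):
        out[i] = a[i] * x[i] + b[i]
    return out, allocations


a, x, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]
print(unfused_axpb(a, x, b))  # ([4.5, 10.5, 18.5], 2)
print(fused_axpb(a, x, b))    # ([4.5, 10.5, 18.5], 1)
```

Both produce the same values; the difference is one pass over memory and one allocation instead of two, and the gap grows with the number of operations in the expression.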
…array expression rewrites.
…break anything. Added a fallback context, and an initial docstring.
…t, modifying the rewrite pass interface a bit.
…ted working on array expression lowering.
… into a new method, BaseContext.call_internal().
…t.parse() instead of worrying over fine-grained differences in the AST across minor versions.
…dified rewrites and lowering to properly remove child expressions and deal with constants.
…roots in TestArrayExpressions.test_complex_expr().
…variables won't be destroyed, and that work isn't duplicated inside the kernels.
… test for the added functionality.
… and added a unit test.
…rs per Oscar's recommendation.
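The commits above mention making sure variables aren't destroyed and work isn't duplicated inside the kernels. One rule that satisfies both constraints is to inline an intermediate into its consumer only when it has exactly one use and is not needed after the expression. The sketch below shows that rule on a hypothetical toy three-address IR (`fuse`, `live_out`, and the IR layout are all assumptions for illustration, not Numba's rewrite pass).

```python
from collections import Counter
from typing import Any, Dict, List, Set, Tuple

# Hypothetical toy IR: each statement is (target, (op, lhs, rhs)); operands are
# variable names (str) or constants (float). Fused operands become nested tuples.
Stmt = Tuple[str, Tuple[str, Any, Any]]


def fuse(stmts: List[Stmt], live_out: Set[str]) -> List[Tuple[str, Any]]:
    """Inline single-use temporaries into their consumers.

    A value is folded into its consumer only if it is used exactly once and is
    not live after the block. Values used more than once stay as separate
    statements, so their work is not duplicated inside the fused expression.
    """
    uses: Counter = Counter()
    for _, (_, lhs, rhs) in stmts:
        for operand in (lhs, rhs):
            if isinstance(operand, str):
                uses[operand] += 1

    inlinable = {t for t, _ in stmts if uses[t] == 1 and t not in live_out}
    defs: Dict[str, Any] = {}  # inlined target -> its expression tree

    def subst(operand: Any) -> Any:
        # Replace a reference to an inlined temporary with its expression tree.
        if isinstance(operand, str) and operand in defs:
            return defs[operand]
        return operand

    result: List[Tuple[str, Any]] = []
    for target, (op, lhs, rhs) in stmts:
        expr = (op, subst(lhs), subst(rhs))
        if target in inlinable:
            defs[target] = expr  # absorbed into its single consumer
        else:
            result.append((target, expr))  # kept: multi-use or live-out
    return result


stmts = [
    ("t1", ("*", "A", "C")),
    ("t2", ("*", 4.0, "t1")),
    ("t3", ("*", "B", "B")),
    ("t4", ("-", "t3", "t2")),
    ("t5", ("*", "t4", "t4")),  # t4 is used twice, so it stays materialized
    ("r", ("+", "t5", 1.0)),
]
for target, expr in fuse(stmts, live_out={"r"}):
    print(target, "=", expr)
```

Six statements collapse into two: `t4` keeps its own statement because inlining it twice would compute the discriminant twice, and `r` survives because it is live after the block.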
Just to make sure we are going faster:
```
In : %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import numpy as np; from numba import *
:
:A, B, C = np.random.random(1000), np.random.random(1000) + 1., np.random.random(1000)
:
:def pos_root(As, Bs, Cs):
:    return (-Bs + (((Bs ** 2.) - (4. * As * Cs)) ** 0.5)) / (2. * As)
:
:pos_root_1 = njit(no_rewrites=True)(pos_root)
:pos_root_2 = njit(pos_root)
:
:%timeit pos_root(A, B, C)
:%timeit pos_root_1(A, B, C)
:%timeit pos_root_2(A, B, C)
:--
/home/jriehl/.virtualenvs/numba3/bin/ipython:6: RuntimeWarning: invalid value encountered in sqrt
The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 114 µs per loop
The slowest run took 4513.62 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 260 µs per loop
The slowest run took 46370.71 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 10.8 µs per loop
```
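Conceptually, with rewrites enabled the whole right-hand side of `pos_root` becomes one array expression evaluated in a single loop. A plain-Python sketch of such a single-loop kernel follows; the name `pos_root_fused` is hypothetical and this is not the code Numba generates. Negative discriminants are mapped to `nan` to mirror NumPy, whose `** 0.5` on a negative float yields `nan` (hence the `RuntimeWarning` above), whereas plain Python would return a complex number.

```python
import math
from typing import List, Sequence


def pos_root_fused(As: Sequence[float], Bs: Sequence[float],
                   Cs: Sequence[float]) -> List[float]:
    """Single-loop version of pos_root: one pass, one output array.

    Every intermediate (Bs ** 2, 4 * As * Cs, the discriminant, its square
    root) is a scalar local, so no temporary arrays are allocated.
    """
    n = len(As)
    if len(Bs) != n or len(Cs) != n:
        raise ValueError("inputs must have the same length")
    out = [0.0] * n
    for i in range(n):
        a, b, c = As[i], Bs[i], Cs[i]
        disc = b ** 2.0 - 4.0 * a * c
        # Mirror NumPy: the square root of a negative float becomes nan.
        root = math.sqrt(disc) if disc >= 0.0 else math.nan
        out[i] = (-b + root) / (2.0 * a)
    return out


print(pos_root_fused([1.0, 1.0, 1.0], [3.0, -3.0, 0.0], [2.0, 2.0, 1.0]))
# [-1.0, 2.0, nan]
```

The unfused NumPy version instead allocates a fresh array for each of `Bs ** 2.`, `4. * As`, `... * Cs`, the subtraction, `** 0.5`, `-Bs`, the addition, `2. * As`, and the division.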
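Just to make sure things stabilize (the first call to each `njit` function includes compilation time, hence the huge slowest-run ratios above):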
```
In : %timeit pos_root_1(A, B, C)
1000 loops, best of 3: 253 µs per loop

In : %timeit pos_root_2(A, B, C)
100000 loops, best of 3: 10.2 µs per loop
```
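From the stabilized timings, the speedup of the rewritten kernel over the same function compiled with `no_rewrites=True`, and over the plain NumPy version (114 µs, from the first session), works out to roughly:

$$
\frac{253\ \mu\text{s}}{10.2\ \mu\text{s}} \approx 24.8, \qquad \frac{114\ \mu\text{s}}{10.2\ \mu\text{s}} \approx 11.2
$$

Note that without the rewrites the jitted function (253 µs) is slower than plain NumPy (114 µs) on this input.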