New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortcut deforestation and loop fusion for array expressions. #1110

Merged
merged 24 commits into from May 14, 2015

Conversation

Projects
None yet
4 participants
@jriehl
Contributor

jriehl commented May 1, 2015

This pull request builds upon #1092 (and its dependencies) to rewrite individual array operations into a single array expression. The result uses fewer broadcast loops (loop fusion, by virtue of moving two or more operations into the same kernel), and fewer temporaries (shortcut deforestation).

Show outdated Hide outdated numba/npyufunc/array_exprs.py
rewriting those expressions to a single operation that will expand
into something similar to a ufunc call.
'''
_operators = set(npydecl.NumpyRulesArrayOperator._op_map.keys()).union(

This comment has been minimized.

@ovillellas

ovillellas May 13, 2015

Would it be worth it to have this set already built and exported by npydecl? This would prevent relying on underscore members of classes from another submodule and document that it is actually part of the exported interface.

@ovillellas

ovillellas May 13, 2015

Would it be worth it to have this set already built and exported by npydecl? This would prevent relying on underscore members of classes from another submodule and document that it is actually part of the exported interface.

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

I don't know if it is "worth it", but I think moving that set into npydecl is a good idea.

@jriehl

jriehl May 14, 2015

Contributor

I don't know if it is "worth it", but I think moving that set into npydecl is a good idea.

work_list = list(blocks.items())
while work_list:
key, block = work_list.pop()
matches = rewrite.match(block, pipeline.typemap,

This comment has been minimized.

@ovillellas

ovillellas May 13, 2015

The match/apply split seems a bit artificial to me, in the sense that rewrite will need information already obtained by match (as seen as its sole implementation right now, where instance variables are used to hold that state). Is there any case where it would be useful to call "match" without a matching "apply" which follows immediately after the "if true" check?. This would also reduce a rewrite to be a single function. Note that a "match_and_apply" could notify of a "no rewrite" by returning None as the resulting block.

@ovillellas

ovillellas May 13, 2015

The match/apply split seems a bit artificial to me, in the sense that rewrite will need information already obtained by match (as seen as its sole implementation right now, where instance variables are used to hold that state). Is there any case where it would be useful to call "match" without a matching "apply" which follows immediately after the "if true" check?. This would also reduce a rewrite to be a single function. Note that a "match_and_apply" could notify of a "no rewrite" by returning None as the resulting block.

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

This division between match/apply is an artifact of how I think about rewriting languages. When using a rewriting DSL, you commonly have a pattern match half, and a replacement term construction half. In the rewriting DSL, there is a namespace that gets built by pattern matching, and this namespace gets passed to the constructor. Here, internal state replaces that namespace, but the pattern still "works". Keeping these separate may also give us more flexibility in the future when composing rewrites.

@jriehl

jriehl May 14, 2015

Contributor

This division between match/apply is an artifact of how I think about rewriting languages. When using a rewriting DSL, you commonly have a pattern match half, and a replacement term construction half. In the rewriting DSL, there is a namespace that gets built by pattern matching, and this namespace gets passed to the constructor. Here, internal state replaces that namespace, but the pattern still "works". Keeping these separate may also give us more flexibility in the future when composing rewrites.

@@ -0,0 +1,349 @@
from __future__ import print_function, division, absolute_import

This comment has been minimized.

@ovillellas

ovillellas May 13, 2015

I wonder if it would make sense a directory for rewrites. I am not sure if I would expect a rewrite in a file named "array_exprs.py" inside npyufunc. It may be early though, but if we start adding rewrites it would make sense having a place for them.

@ovillellas

ovillellas May 13, 2015

I wonder if it would make sense a directory for rewrites. I am not sure if I would expect a rewrite in a file named "array_exprs.py" inside npyufunc. It may be early though, but if we start adding rewrites it would make sense having a place for them.

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

I do think if we start having a lot of rewrites in fundamental Numba, a separate directory (package) would be appropriate. I was also thinking, however, of array_exprs being an example of a plug-in module that contains rewrites and lowering code all in one place (similar to how numba.npyufunc.dufunc includes its own typing and lowering code). The idea is to improve the user extension story so they can see how they can write their own modules without going into numba.typing, numba.targets, or (in the future) numba.rewrites.

@jriehl

jriehl May 14, 2015

Contributor

I do think if we start having a lot of rewrites in fundamental Numba, a separate directory (package) would be appropriate. I was also thinking, however, of array_exprs being an example of a plug-in module that contains rewrites and lowering code all in one place (similar to how numba.npyufunc.dufunc includes its own typing and lowering code). The idea is to improve the user extension story so they can see how they can write their own modules without going into numba.typing, numba.targets, or (in the future) numba.rewrites.

from .dufunc import DUFunc
@rewrites.register_rewrite

This comment has been minimized.

@pitrou

pitrou May 14, 2015

Member

Since the order of rewrites will probably be significant, I wonder if it's a good idea to register them by decorator. Generated code could easily change without noticing when reordering imports or shuffling code around. What do you think?

@pitrou

pitrou May 14, 2015

Member

Since the order of rewrites will probably be significant, I wonder if it's a good idea to register them by decorator. Generated code could easily change without noticing when reordering imports or shuffling code around. What do you think?

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

I was going for some symmetry with typing and lowering, but this is a good point that I considered while writing this. My current opinion is that rewrites should be composed into a singular class if they are going to have any interactions. Otherwise, they should not interact with each other. Another possibility is adding optional composition rules to the decorator, when needed.

@jriehl

jriehl May 14, 2015

Contributor

I was going for some symmetry with typing and lowering, but this is a good point that I considered while writing this. My current opinion is that rewrites should be composed into a singular class if they are going to have any interactions. Otherwise, they should not interact with each other. Another possibility is adding optional composition rules to the decorator, when needed.

nb_axy_1 = cres_1.entry_point
control_pipeline2 = RewritesTester.mk_no_rw_pipeline(arg_tys)
cres_2 = control_pipeline2.compile_extra(ax2)

This comment has been minimized.

@pitrou

pitrou May 14, 2015

Member

ax2 has the exact same code as axy. Why is it important to test it?

@pitrou

pitrou May 14, 2015

Member

ax2 has the exact same code as axy. Why is it important to test it?

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

Just being super paranoid; one of the pipelines might have altered axy, and I wanted a "second opinion" at hand.

@jriehl

jriehl May 14, 2015

Contributor

Just being super paranoid; one of the pipelines might have altered axy, and I wanted a "second opinion" at hand.

Show outdated Hide outdated numba/npyufunc/array_exprs.py
'''When we've found array expressions in a basic block, rewrite that
block, returning a new, transformed block.
'''
if config.DUMP_IR:

This comment has been minimized.

@pitrou

pitrou May 14, 2015

Member

Instead of duplicating this in each rewrite pass, we probably want to put it in the RewriteRegistry, no?

@pitrou

pitrou May 14, 2015

Member

Instead of duplicating this in each rewrite pass, we probably want to put it in the RewriteRegistry, no?

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

Agreed.

@jriehl

jriehl May 14, 2015

Contributor

Agreed.

target_name = instr.target.name
expr = instr.value
if isinstance(expr, ir.Expr) and isinstance(
typemap.get(target_name, None), types.Array):

This comment has been minimized.

@pitrou

pitrou May 14, 2015

Member

Is there a reason for the typemap lookup to fail? Just wondering.

@pitrou

pitrou May 14, 2015

Member

Is there a reason for the typemap lookup to fail? Just wondering.

This comment has been minimized.

@jriehl

jriehl May 14, 2015

Contributor

Nope. Just more of my seemingly arbitrary defensive coding. (* smirk *)

@jriehl

jriehl May 14, 2015

Contributor

Nope. Just more of my seemingly arbitrary defensive coding. (* smirk *)

@pitrou

This comment has been minimized.

Show comment
Hide comment
@pitrou

pitrou May 14, 2015

Member

This looks good (module comments above). Did you run any benchmarks?

Member

pitrou commented May 14, 2015

This looks good (module comments above). Did you run any benchmarks?

@seibert

This comment has been minimized.

Show comment
Hide comment
@seibert

seibert May 14, 2015

Contributor

I think this is ready to go once a few of the questions above have been answered.

Contributor

seibert commented May 14, 2015

I think this is ready to go once a few of the questions above have been answered.

jriehl added some commits Apr 23, 2015

Fixed some input checks in Pipeline.stage_nopython_rewrites() to not …
…break anything. Added a fallback context, and an initial docstring.
Created lowering stub for array expressions, and added plumbing for i…
…t, modifying the rewrite pass interface a bit.
Moved actual compilation code from BaseContext.compile_internal() to …
…new method, BaseContext.compile_only_no_cache().
Broke the call code generation part of BaseContext.compile_internal()…
… into a new method, BaseContext.call_internal().
Rewrote some of the AST construction in _lower_array_expr() to use as…
…t.parse() instead of worrying over fine-grain differences in the AST across minor version.
Added test for fusing many operators that also included constants. Mo…
…dified rewrites and lowering to properly remove child expressions and deal with constants.
Fixed array expression counting, and decreased likelihood of complex …
…roots in TestArrayExpressions.test_complex_expr().
Fleshed out two common subexpression tests that verify subexpression …
…variables won't be destroyed, and that work isn't duplicated inside the kernels.
Moved configuration-based IR dump out of specific rewrite pass and in…
…to rewrites.RewriteRegistry.apply() per code review comment.
@jriehl

This comment has been minimized.

Show comment
Hide comment
@jriehl

jriehl May 14, 2015

Contributor

Just to make sure we are going faster:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import numpy as np; from numba import *
:
:A, B, C = np.random.random(1000), np.random.random(1000) + 1., np.random.random(1000)
:
:def pos_root(As, Bs, Cs):
:    return (-Bs + (((Bs ** 2.) - (4. * As * Cs)) ** 0.5)) / (2. * As)
:
:pos_root_1 = njit(no_rewrites=True)(pos_root)
:pos_root_2 = njit(pos_root)
:
:%timeit pos_root(A, B, C)
:%timeit pos_root_1(A, B, C)
:%timeit pos_root_2(A, B, C)
:--
/home/jriehl/.virtualenvs/numba3/bin/ipython:6: RuntimeWarning: invalid value encountered in sqrt

The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 114 µs per loop
The slowest run took 4513.62 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 260 µs per loop
The slowest run took 46370.71 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 10.8 µs per loop

Just to make sure things stabilize:

In [2]: %timeit pos_root_1(A, B, C)
1000 loops, best of 3: 253 µs per loop

In [3]: %timeit pos_root_2(A, B, C)
100000 loops, best of 3: 10.2 µs per loop
Contributor

jriehl commented May 14, 2015

Just to make sure we are going faster:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import numpy as np; from numba import *
:
:A, B, C = np.random.random(1000), np.random.random(1000) + 1., np.random.random(1000)
:
:def pos_root(As, Bs, Cs):
:    return (-Bs + (((Bs ** 2.) - (4. * As * Cs)) ** 0.5)) / (2. * As)
:
:pos_root_1 = njit(no_rewrites=True)(pos_root)
:pos_root_2 = njit(pos_root)
:
:%timeit pos_root(A, B, C)
:%timeit pos_root_1(A, B, C)
:%timeit pos_root_2(A, B, C)
:--
/home/jriehl/.virtualenvs/numba3/bin/ipython:6: RuntimeWarning: invalid value encountered in sqrt

The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 114 µs per loop
The slowest run took 4513.62 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 260 µs per loop
The slowest run took 46370.71 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 10.8 µs per loop

Just to make sure things stabilize:

In [2]: %timeit pos_root_1(A, B, C)
1000 loops, best of 3: 253 µs per loop

In [3]: %timeit pos_root_2(A, B, C)
100000 loops, best of 3: 10.2 µs per loop
@seibert

This comment has been minimized.

Show comment
Hide comment
@seibert

seibert May 14, 2015

Contributor

Nice :)

Contributor

seibert commented May 14, 2015

Nice :)

seibert added a commit that referenced this pull request May 14, 2015

Merge pull request #1110 from jriehl/array_exprs
Shortcut deforestation and loop fusion for array expressions.

@seibert seibert merged commit 1f5aec7 into numba:master May 14, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jriehl jriehl deleted the jriehl:array_exprs branch May 14, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment