Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortcut deforestation and loop fusion for array expressions. #1110

Merged
merged 24 commits into from May 14, 2015

Conversation

@jriehl
Copy link
Contributor

@jriehl jriehl commented May 1, 2015

This pull request builds upon #1092 (and its dependencies) to rewrite individual array operations into a single array expression. The result uses fewer broadcast loops (loop fusion, by virtue of moving two or more operations into the same kernel), and fewer temporaries (shortcut deforestation).

@jriehl jriehl force-pushed the jriehl:array_exprs branch 6 times, most recently from ef761c2 to f771fc2 May 4, 2015
@jriehl jriehl force-pushed the jriehl:array_exprs branch from eba9ca0 to 4052391 May 12, 2015
rewriting those expressions to a single operation that will expand
into something similar to a ufunc call.
'''
_operators = set(npydecl.NumpyRulesArrayOperator._op_map.keys()).union(

This comment has been minimized.

@ovillellas

ovillellas May 13, 2015

Would it be worth it to have this set already built and exported by npydecl? This would prevent relying on underscore members of classes from another submodule and document that it is actually part of the exported interface.

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

I don't know if it is "worth it", but I think moving that set into npydecl is a good idea.

work_list = list(blocks.items())
while work_list:
key, block = work_list.pop()
matches = rewrite.match(block, pipeline.typemap,

This comment has been minimized.

@ovillellas

ovillellas May 13, 2015

The match/apply split seems a bit artificial to me, in the sense that rewrite will need information already obtained by match (as seen as its sole implementation right now, where instance variables are used to hold that state). Is there any case where it would be useful to call "match" without a matching "apply" which follows immediately after the "if true" check?. This would also reduce a rewrite to be a single function. Note that a "match_and_apply" could notify of a "no rewrite" by returning None as the resulting block.

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

This division between match/apply is an artifact of how I think about rewriting languages. When using a rewriting DSL, you commonly have a pattern match half, and a replacement term construction half. In the rewriting DSL, there is a namespace that gets built by pattern matching, and this namespace gets passed to the constructor. Here, internal state replaces that namespace, but the pattern still "works". Keeping these separate may also give us more flexibility in the future when composing rewrites.

@@ -0,0 +1,349 @@
from __future__ import print_function, division, absolute_import

This comment has been minimized.

@ovillellas

ovillellas May 13, 2015

I wonder if it would make sense a directory for rewrites. I am not sure if I would expect a rewrite in a file named "array_exprs.py" inside npyufunc. It may be early though, but if we start adding rewrites it would make sense having a place for them.

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

I do think if we start having a lot of rewrites in fundamental Numba, a separate directory (package) would be appropriate. I was also thinking, however, of array_exprs being an example of a plug-in module that contains rewrites and lowering code all in one place (similar to how numba.npyufunc.dufunc includes its own typing and lowering code). The idea is to improve the user extension story so they can see how they can write their own modules without going into numba.typing, numba.targets, or (in the future) numba.rewrites.

@jriehl jriehl force-pushed the jriehl:array_exprs branch from af0706b to d8cf84f May 13, 2015
from .dufunc import DUFunc


@rewrites.register_rewrite

This comment has been minimized.

@pitrou

pitrou May 14, 2015
Contributor

Since the order of rewrites will probably be significant, I wonder if it's a good idea to register them by decorator. Generated code could easily change without noticing when reordering imports or shuffling code around. What do you think?

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

I was going for some symmetry with typing and lowering, but this is a good point that I considered while writing this. My current opinion is that rewrites should be composed into a singular class if they are going to have any interactions. Otherwise, they should not interact with each other. Another possibility is adding optional composition rules to the decorator, when needed.

nb_axy_1 = cres_1.entry_point

control_pipeline2 = RewritesTester.mk_no_rw_pipeline(arg_tys)
cres_2 = control_pipeline2.compile_extra(ax2)

This comment has been minimized.

@pitrou

pitrou May 14, 2015
Contributor

ax2 has the exact same code as axy. Why is it important to test it?

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

Just being super paranoid; one of the pipelines might have altered axy, and I wanted a "second opinion" at hand.

'''When we've found array expressions in a basic block, rewrite that
block, returning a new, transformed block.
'''
if config.DUMP_IR:

This comment has been minimized.

@pitrou

pitrou May 14, 2015
Contributor

Instead of duplicating this in each rewrite pass, we probably want to put it in the RewriteRegistry, no?

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

Agreed.

target_name = instr.target.name
expr = instr.value
if isinstance(expr, ir.Expr) and isinstance(
typemap.get(target_name, None), types.Array):

This comment has been minimized.

@pitrou

pitrou May 14, 2015
Contributor

Is there a reason for the typemap lookup to fail? Just wondering.

This comment has been minimized.

@jriehl

jriehl May 14, 2015
Author Contributor

Nope. Just more of my seemingly arbitrary defensive coding. (* smirk *)

@pitrou
Copy link
Contributor

@pitrou pitrou commented May 14, 2015

This looks good (module comments above). Did you run any benchmarks?

@seibert
Copy link
Contributor

@seibert seibert commented May 14, 2015

I think this is ready to go once a few of the questions above have been answered.

Jon Riehl added 17 commits Apr 29, 2015
… into a new method, BaseContext.call_internal().
…t.parse() instead of worrying over fine-grain differences in the AST across minor version.
…dified rewrites and lowering to properly remove child expressions and deal with constants.
…roots in TestArrayExpressions.test_complex_expr().
…variables won't be destroyed, and that work isn't duplicated inside the kernels.
…to rewrites.RewriteRegistry.apply() per code review comment.
@jriehl jriehl force-pushed the jriehl:array_exprs branch from d8cf84f to 9587f80 May 14, 2015
@jriehl
Copy link
Contributor Author

@jriehl jriehl commented May 14, 2015

Just to make sure we are going faster:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import numpy as np; from numba import *
:
:A, B, C = np.random.random(1000), np.random.random(1000) + 1., np.random.random(1000)
:
:def pos_root(As, Bs, Cs):
:    return (-Bs + (((Bs ** 2.) - (4. * As * Cs)) ** 0.5)) / (2. * As)
:
:pos_root_1 = njit(no_rewrites=True)(pos_root)
:pos_root_2 = njit(pos_root)
:
:%timeit pos_root(A, B, C)
:%timeit pos_root_1(A, B, C)
:%timeit pos_root_2(A, B, C)
:--
/home/jriehl/.virtualenvs/numba3/bin/ipython:6: RuntimeWarning: invalid value encountered in sqrt

The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 114 µs per loop
The slowest run took 4513.62 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 260 µs per loop
The slowest run took 46370.71 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 10.8 µs per loop

Just to make sure things stabilize:

In [2]: %timeit pos_root_1(A, B, C)
1000 loops, best of 3: 253 µs per loop

In [3]: %timeit pos_root_2(A, B, C)
100000 loops, best of 3: 10.2 µs per loop
@seibert
Copy link
Contributor

@seibert seibert commented May 14, 2015

Nice :)

seibert added a commit that referenced this pull request May 14, 2015
Shortcut deforestation and loop fusion for array expressions.
@seibert seibert merged commit 1f5aec7 into numba:master May 14, 2015
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jriehl jriehl deleted the jriehl:array_exprs branch May 14, 2015
@tlienart tlienart mentioned this pull request Mar 27, 2020
0 of 113 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants