Avoid temp variable assignments #6575

Merged
merged 25 commits into from Feb 2, 2021

Collaborator

@ehsantn ehsantn commented Dec 13, 2020

Avoids temporary variable assignments in bytecode processing where possible. This works because the bytecode is stack-based, so a temporary cannot be reused after it has been popped (@sklam @stuartarchibald are there corner cases to the contrary?).
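
The idea can be illustrated with a toy translator from stack operations to three-address statements. This is only a sketch under stated assumptions: the `LOAD`/`ADD`/`STORE` opcodes, the `$t` naming, and the `translate` function are hypothetical and are neither real CPython opcodes nor Numba's interpreter.

```python
# Toy stack-machine-to-IR translator (hypothetical; not Numba's interpreter).
def translate(ops, avoid_temps):
    """Turn (opcode, arg) pairs into a list of (target, expression) statements."""
    stack, stmts, counter = [], [], 0
    for op, arg in ops:
        if op == "LOAD":
            stack.append(arg)
        elif op == "ADD":
            rhs = stack.pop()
            lhs = stack.pop()
            tmp = f"$t{counter}"  # assumed naming scheme for temporaries
            counter += 1
            stmts.append((tmp, f"{lhs} + {rhs}"))
            stack.append(tmp)
        elif op == "STORE":
            src = stack.pop()
            defined_last = bool(stmts) and stmts[-1][0] == src
            # Once popped and no longer on the stack, the temporary cannot be
            # read again, so its defining statement can target `arg` directly.
            if (avoid_temps and src.startswith("$") and defined_last
                    and src not in stack):
                stmts[-1] = (arg, stmts[-1][1])
            else:
                stmts.append((arg, src))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stmts


# Stack program for: a = b + c
ops = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
with_temps = translate(ops, avoid_temps=False)
without_temps = translate(ops, avoid_temps=True)
print(with_temps)     # two statements: a temporary, then a copy into `a`
print(without_temps)  # one statement assigning directly to `a`
```

The toy omits opcodes that duplicate the top of the stack, which is where a real implementation needs extra care.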

This has several benefits:

  1. Faster compilation time.
  2. Easier to read IR for compiler developers.
  3. Easier to understand diagnostics info for users (less variable renaming in copy propagation).

Here is a benchmark that demonstrates the compile time benefit:

import numpy as np
import numba

n = 3000
func_text = "def f(A):\n"
func_text += "  a = 1\n"
for i in range(n):
    func_text += "  a += 2\n"
func_text += "  return a\n"

loc_vars = {}
exec(func_text, {"np": np}, loc_vars)
f = loc_vars["f"]
g = numba.njit(parallel=True)(f)
g(1.1)
ss = list(g.get_metadata()[(numba.core.types.float64,)]['pipeline_times'].values())[0]
t = 0
for c, v in sorted(ss.items(), key=lambda a: a[1].run, reverse=True):
    print(c, v.run)
    t += v.run

print("total time:", t)

Output without the change (on a 2019 MacBook Pro):

20_parfor_pass 5.905982993
22_nopython_backend 1.9330777959999992
14_nopython_type_inference 0.9144962719999998
0_translate_bytecode 0.2972127149999999
19_nopython_rewrites 0.19224473500000006
13_reconstruct_ssa 0.17365723700000002
15_annotate_types 0.12848974599999963
7_inline_closure_likes 0.12716022299999996
21_ir_legalization 0.11417557299999892
16_strip_phis 0.056172681999999696
17_inline_overloads 0.0500376229999997
6_generic_rewrites 0.04966603999999997
3_with_lifting 0.04004002899999981
2_ir_processing 0.03994495599999981
18_pre_parfor_pass 0.006993542000000019
4_rewrite_semantic_constants 0.0028603620000000607
12_literal_unroll 0.0023769529999999595
8_make_function_op_code_to_jit_function 0.0022147299999999426
9_inline_inlinables 0.0022072989999999404
11_find_literally 0.0013742650000001522
10_dead_branch_prune 4.756300000008373e-05
5_dead_branch_prune 4.534399999989169e-05
23_dump_parfor_diagnostics 4.564999999345787e-06
1_fixup_args 3.013000000162691e-06
total time: 10.040486256

Output with the change:

20_parfor_pass 4.1957661260000005
22_nopython_backend 2.241625401000001
14_nopython_type_inference 0.8676065420000001
0_translate_bytecode 0.2924158010000002
19_nopython_rewrites 0.14321445799999966
7_inline_closure_likes 0.09993198800000003
13_reconstruct_ssa 0.08409876099999991
21_ir_legalization 0.07250176500000016
15_annotate_types 0.06791784199999995
17_inline_overloads 0.0478961830000002
6_generic_rewrites 0.03699844800000007
16_strip_phis 0.036633089000000396
2_ir_processing 0.03256095300000017
3_with_lifting 0.03200921099999987
18_pre_parfor_pass 0.005553696000000219
4_rewrite_semantic_constants 0.002378401999999946
8_make_function_op_code_to_jit_function 0.0015986740000000221
12_literal_unroll 0.001518659999999894
9_inline_inlinables 0.0014611500000001332
11_find_literally 0.0009793890000000527
10_dead_branch_prune 4.392100000005783e-05
5_dead_branch_prune 3.926100000017918e-05
23_dump_parfor_diagnostics 4.813000000325474e-06
1_fixup_args 2.575999999976375e-06
total time: 8.264757110000003

I believe the parfor pass is significantly affected because these extra copies fill up the copy propagation data structures and slow them down.
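
For a quick sense of scale, the relative reductions can be computed from the two outputs above. This is a minimal sketch using only the numbers already quoted; the `reduction_pct` helper is a name introduced here for illustration, and the figures come from a single run each, so small differences should not be over-interpreted.

```python
# Timings (seconds) copied from the two benchmark outputs above.
before = {"total": 10.040486256, "20_parfor_pass": 5.905982993}
after = {"total": 8.264757110000003, "20_parfor_pass": 4.1957661260000005}


def reduction_pct(old, new):
    """Percentage of time saved going from `old` to `new`."""
    return 100.0 * (old - new) / old


for key in before:
    print(f"{key}: {reduction_pct(before[key], after[key]):.1f}% less time")
```

On these numbers the total drops by about 17.7% and the parfor pass by about 29.0%.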

@esc
Member

esc commented Dec 14, 2020

@ehsantn thanks for submitting this. I have marked it as ready for review, although it looks like all the public CI tests failed.

@stuartarchibald stuartarchibald added this to the PR Backlog milestone Dec 14, 2020
@sklam
Member

sklam commented Dec 14, 2020

Looking at the CI failures, I am guessing that much of the IR analysis/transformation code assumes the temp variables exist.

@stuartarchibald
Contributor

With the large amount of statement rewriting needed in the interpreter for Python 3.9, it might be a good idea to wait until those changes are in before looking at a change along these lines.

@ehsantn
Collaborator Author

ehsantn commented Dec 14, 2020

@sklam The issues in CI seem to be expected and straightforward to work through.
@stuartarchibald sure, I'll just push what I have so far and pause for now.

DrTodd13
DrTodd13 previously approved these changes Dec 15, 2020
Collaborator

@DrTodd13 DrTodd13 left a comment

I'm curious about the motivation for these changes. Assuming this passes the tests, the concept behind the changes seems fine.

@ehsantn
Collaborator Author

ehsantn commented Dec 15, 2020

The motivation is compilation time and IR readability, as in the description. The rest of the changes just adapt to the new IR.

@stuartarchibald stuartarchibald added the "Effort - long" label Dec 17, 2020
@luk-f-a
Contributor

luk-f-a commented Dec 18, 2020

stupid question: the nopython_backend time seems to have gone up. Could this increase compilation time when not using parfor?

@ehsantn
Collaborator Author

ehsantn commented Dec 18, 2020

@luk-f-a This can only decrease compilation time even without parfors. The small difference here is probably just random variation in runtime (need to average several runs to see small differences).
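
One way to average several runs, sketched with the standard library only. The `summarize` helper and the `time_workload` stand-in are hypothetical names introduced here; in practice `measure` would be a callable that performs one fresh compilation and returns the pass time of interest, which this sketch does not attempt.

```python
import statistics
import time


def summarize(measure, repeats=5):
    """Call measure() `repeats` times; return (mean, sample standard deviation)."""
    samples = [measure() for _ in range(repeats)]
    return statistics.mean(samples), statistics.stdev(samples)


def time_workload():
    # Stand-in measurement (assumption): times a small pure-Python workload
    # instead of a real compilation pass.
    start = time.perf_counter()
    sorted(range(100_000), reverse=True)
    return time.perf_counter() - start


mean, stdev = summarize(time_workload)
print(f"mean={mean:.6f}s stdev={stdev:.6f}s")
```

A difference between two configurations that is smaller than the reported standard deviation is hard to distinguish from noise.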

Member

@sklam sklam left a comment

I have verified the changes. I was also hoping to see some time reduction in the LLVM SROA pass, but that didn't happen. Nonetheless, I can confirm the overall compile-time improvements.

There are just two minor fixes needed.

  • Please remove the unrelated file algebra.log.
  • I also suggested additional comments to help readers find the reason for a change in typeinfer.

numba/core/typeinfer.py (outdated, resolved)
@sklam sklam added the "4 - Waiting on author" label and removed the "3 - Ready for Review" label Jan 29, 2021
@sklam sklam modified the milestones: PR Backlog, Numba 0.53 RC Jan 29, 2021
@sklam
Member

sklam commented Jan 29, 2021

Regarding the CI failure:

FAIL: test_common_subexpressions (numba.tests.test_array_exprs.TestArrayExpressions)

it is caused by arrayexpr fusing the subexpression into a monolithic one.

@sklam
Member

sklam commented Jan 30, 2021

The arrayexpr test can be fixed by:

diff --git a/numba/np/ufunc/array_exprs.py b/numba/np/ufunc/array_exprs.py
index 8cf5779b3..0b747616d 100644
--- a/numba/np/ufunc/array_exprs.py
+++ b/numba/np/ufunc/array_exprs.py
@@ -158,7 +158,7 @@ class RewriteArrayExprs(rewrites.Rewrite):
             self.array_assigns[instr.target.name] = new_instr
             for operand in self._get_operands(expr):
                 operand_name = operand.name
-                if operand_name in self.array_assigns:
+                if operand.is_temp and operand_name in self.array_assigns:
                     child_assign = self.array_assigns[operand_name]
                     child_expr = child_assign.value
                     child_operands = child_expr.list_vars()

This would retain the original behavior and not fuse array expressions that are stored in a user variable.
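
The effect of the extra `is_temp` check can be sketched in isolation. The `Var` class and `fusable_operands` function below are hypothetical stand-ins, not Numba's classes, and the sketch assumes that temporaries are recognized by a `$` name prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for an IR variable."""
    name: str

    @property
    def is_temp(self):
        # Assumption: temporaries are named with a "$" prefix.
        return self.name.startswith("$")


def fusable_operands(operands, array_assigns):
    """Names of operands whose defining array expression may be fused."""
    return [v.name for v in operands
            if v.is_temp and v.name in array_assigns]


# "$0.3" is a temporary and "x" is a user variable; both hold array
# expressions, while "y" holds none.
array_assigns = {"$0.3": "a + b", "x": "a * b"}
operands = [Var("$0.3"), Var("x"), Var("y")]
fused = fusable_operands(operands, array_assigns)
print(fused)
```

Without the `is_temp` condition, `x` would also be selected, which corresponds to the fusing into one monolithic expression described above.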

Contributor

@stuartarchibald stuartarchibald left a comment

Thanks for implementing this improvement to code generation. I've manually looked at the impact of the changes and it does appear to make the IR clearer and more concise. My main concern with this patch as-is is that there are no tests validating that the transforms being made produce the expected result.

# the same temporary is assigned to multiple variables in cases
# like a = b[i] = 1, so need to handle replaced temporaries in
# later setitem/setattr nodes
if (isinstance(inst, (ir.SetItem, ir.SetItem))
Contributor

Both items in the class_or_tuple arg are the same.

Collaborator Author

Good catch. Fixed.

@ehsantn
Collaborator Author

ehsantn commented Feb 1, 2021

@sklam @stuartarchibald thanks for the detailed review and suggested changes. I think I incorporated all of them. Let me know if anything else is needed.

@stuartarchibald I added a test in ca58e53. This PR also changes some other tests that check the IR structure so that they expect the new format.

sklam
sklam previously approved these changes Feb 1, 2021
@sklam
Member

sklam commented Feb 1, 2021

@stuartarchibald, do you want to verify this?

Contributor

@stuartarchibald stuartarchibald left a comment

Thanks for the patch and all the fixes. The change-set looks good.

@stuartarchibald stuartarchibald added the "4 - Waiting on CI" and "5 - Ready to merge" labels and removed the "4 - Waiting on author" and "4 - Waiting on CI" labels Feb 2, 2021
@sklam sklam merged commit d938288 into numba:master Feb 2, 2021
Labels
5 - Ready to merge, Effort - long

6 participants