Avoid recompiling inlined overload impl #6856
Conversation
@stuartarchibald, tagged you for review since you are more familiar with this part of the code base |
@ehsantn @farahhariri maybe it would be good to add a reproducer or a benchmark to make it easier for reviewers to verify? |
Below is a benchmark that requires installing bodo 2021.2:

```python
import numpy as np
import pandas as pd
import bodo
import numba

# Generate a function with n call sites that each trigger the overload
n = 10
func_text = "def f(A):\n"
func_text += "  a = 1\n"
for i in range(n):
    func_text += "  a += A.sum()\n"
func_text += "  return a\n"
loc_vars = {}
exec(func_text, {"np": np, "pd": pd}, loc_vars)
f = loc_vars["f"]

g = bodo.jit(distributed=False)(f)
A = pd.Series([1.1])
print(g(A))

# Sum per-pass run times from the dispatcher metadata
ss = list(list(g.get_metadata().values())[0]['pipeline_times'].values())[0]
t = 0
for c, v in sorted(ss.items(), key=lambda a: a[1].run, reverse=True):
    print(c, v.run)
    t += v.run
print(t)
```

Without this patch it takes about 3.5s for me (MacBook Pro 2019), but with this patch it takes 3s. |
I think it'd be good to come up with a simple reproducer that is stand alone and uses just Numba, it'll be easier to validate and debug. |
I've tried this:

```python
from numba import njit
from numba.extending import overload
import numpy as np

_GLOBAL = 1234

def _gen_involved():
    _FREEVAR = 0xCAFE

    def foo(a, b, c=12, d=1j, e=None):
        f = a + b
        a += _FREEVAR
        g = np.zeros(c, dtype=np.complex64)
        h = f + g
        i = 1j / d
        # For SSA, zero init, n and t
        n = 0
        t = 0
        if np.abs(i) > 0:
            k = h / i
            l = np.arange(1, c + 1)
            m = np.sqrt(l - g) + e * k
            if np.abs(m[0]) < 1:
                for o in range(a):
                    n += 0
                    if np.abs(n) < 3:
                        break
                n += m[2]
            p = g / l
            q = []
            for r in range(len(p)):
                q.append(p[r])
                if r > 4 + 1:
                    s = 123
                    t = 5
                    if s > 122 - c:
                        t += s
        t += q[0] + _GLOBAL
        return f + o + r + t + r + a + n
    return foo

def fortran(a, b, c=12, d=1j, e=None):
    pass

impl = _gen_involved()
overload(fortran, inline='always')(lambda a, b, c=12, d=1j, e=None: impl)

# Generate a caller with many call sites for the inlined overload
tmp = "def call_many_involved():\n\ta = 0\n"
buf = ['\ta += fortran(1, 1, 1, 1, 1)' for _ in range(10)]
tmp += '\n'.join(buf)
tmp += '\n\treturn a'
l = {}
exec(tmp, {'np': np, 'fortran': fortran}, l)
fn = njit(l['call_many_involved'])
fn()

# Sum per-pass run times from the dispatcher metadata
ss = list(list(fn.get_metadata().values())[0]['pipeline_times'].values())[0]
t = 0
for c, v in sorted(ss.items(), key=lambda a: a[1].run, reverse=True):
    print(c, v.run)
    t += v.run
print(t)
```

but it doesn't seem to demonstrate the effect described. |
This pull request is marked as stale as it has had no activity in the past 3 months. Please respond to this comment if you're still interested in working on this. Many thanks! |
We use `inline="always"` for most of our overloads, and found that implementations being compiled over and over again during typing is a major compilation-time bottleneck. Avoiding that recompilation with this PR saves around 40% of total compilation time in my testing.
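To illustrate the caching idea behind the patch, here is a minimal stand-alone sketch in plain Python. It is not Numba's actual internals: `get_compiled`, `expensive_compile`, and the `("int64",)` type key are all hypothetical stand-ins. The point is only that memoizing the compiled result of an overload implementation by argument types means repeated typing of the same call signature pays the compilation cost once.

```python
from functools import lru_cache

compile_count = 0  # counts how many times we actually "compile"

def expensive_compile(impl, arg_types):
    # Stand-in for compiling an overload impl for a given signature
    global compile_count
    compile_count += 1
    return (impl.__name__, arg_types)

@lru_cache(maxsize=None)
def get_compiled(impl, arg_types):
    # Keyed by (implementation function, argument types): the same
    # signature is compiled once and reused at every later call site.
    return expensive_compile(impl, arg_types)

def impl(a):
    return a + 1

# Ten call sites with the same signature: compiled once, reused nine times.
for _ in range(10):
    get_compiled(impl, ("int64",))

print(compile_count)  # 1
```

Without the cache, each of the ten call sites would recompile `impl` during typing, which is the behavior this PR aims to avoid.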