fill 2darray or recarray by fields is 30 times faster than by row. #9547
I found an interesting way:
def build_func(dtype):
    # one "points[i%2].<name> = first.<name>" statement per dtype field
    namepair = [(i, i) for i in dtype.names]
    names = ['points[i%%2].%s = first.%s' % i for i in namepair]
    local = {}
    func = '''
    def fill_by_attr(points):
        first = points[0]
        for i in range(102400000):
            %s
    '''%('\n'+' '*12).join(names)
    print(func)
    # strip the template's leading indentation before compiling it
    exec(func.replace('\n    ', '\n'), local)
    return nb.njit(local['fill_by_attr'])

fill_by_attr = build_func(t_point)
fill_by_attr(points)  # first call triggers compilation
start = time()
fill_by_attr(points)
print(time()-start)
It could be written as a decorator to generate customized functions from a template.
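The template idea above can be sketched as a small factory that builds the per-field copy function from any structured dtype. This is only a sketch under stated assumptions: `make_field_copy_source`, `build_field_copy`, and the `jit` parameter are hypothetical names, every field name is assumed to be a valid Python identifier, and the loop writes to `points[i]` over `len(points)` rather than to `points[i%2]` as in the snippet above.

```python
import numpy as np


def make_field_copy_source(dtype, func_name="fill_by_attr"):
    """Return source for a function that copies points[0] into every
    element, one field at a time (func_name is a hypothetical default)."""
    lines = [
        f"def {func_name}(points):",
        "    first = points[0]",
        "    for i in range(len(points)):",
    ]
    # One assignment per field; assumes field names are valid identifiers.
    lines += [f"        points[i].{name} = first.{name}" for name in dtype.names]
    return "\n".join(lines) + "\n"


def build_field_copy(dtype, jit=None):
    """Compile the generated source; optionally wrap it with a JIT decorator
    (for example nb.njit, as in the snippet above)."""
    namespace = {}
    exec(make_field_copy_source(dtype), namespace)
    func = namespace["fill_by_attr"]
    return jit(func) if jit is not None else func
```

Calling `build_field_copy(t_point, jit=nb.njit)` would mirror the `nb.njit(local['fill_by_attr'])` call above. Without `jit`, the generated function runs as plain Python, in which case attribute access on elements needs a record array such as `points.view(np.recarray)`.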
@yxdragon thank you for reporting this, I seem to get different results here, where fill-by-row is faster? Or did I miss something?
My Numba version is 0.59.0, on Windows 11.
@yxdragon do you have access to a different system to try this on, such as Linux or macOS?
I tried it on macOS.
If I change the benchmark so it does more work (not just operating on the same element all the time):
import numba as nb
import numpy as np
from time import time

t_point = np.dtype([('x', np.float32), ('y', np.float32)])
N = 102400000
points = np.zeros(N, dtype=t_point)

@nb.njit
def fill_by_row(points):
    first = points[0]
    for i in range(N):
        points[i] = first

@nb.njit
def fill_by_attr(points):
    first = points[0]
    for i in range(N):
        points[i].x = first.x
        points[i].y = first.y

fill_by_row(points)
start = time()
fill_by_row(points)
print(time()-start)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)
Do you get similar performance with this example?
@gmarkall yes, I get similar performance. So is that because the CPU cache helps when operating on the same block?
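To put numbers on the difference between the two benchmarks, the arithmetic below compares how much memory each loop writes to. It is a rough sketch only: the variable names are hypothetical, and the sizes do not by themselves establish that caching explains the timings.

```python
import numpy as np

t_point = np.dtype([('x', np.float32), ('y', np.float32)])
N = 102400000

bytes_per_point = t_point.itemsize       # two float32 fields
touched_original = 2 * bytes_per_point   # i % 2 only ever hits points[0] and points[1]
touched_revised = N * bytes_per_point    # every element is written once

print(bytes_per_point, touched_original, touched_revised)
# -> 8 16 819200000
print(f"{touched_revised / 2**20:.2f} MiB")
# -> 781.25 MiB
```

The `i%2` version therefore rewrites the same 16 bytes on every iteration, while the revised version streams through roughly 781 MiB.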
My intuition here is that your original microbenchmark is doing too little to permit measurement of the execution speed of the workload, and other confounding factors will make up the majority of the measured time instead.
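One common way to reduce such noise is to exclude the first (compiling) call and keep the best of several timed runs. The helper below is a minimal sketch of that idea; `best_time` and `repeats` are hypothetical names, not part of Numba.

```python
from time import perf_counter


def best_time(func, *args, repeats=5):
    """Return the fastest of `repeats` timed calls, after one untimed
    warm-up call (which absorbs JIT compilation for an njit function)."""
    func(*args)  # warm-up, not timed
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        func(*args)
        timings.append(perf_counter() - start)
    return min(timings)
```

With the functions from the benchmark above, this could be used as `best_time(fill_by_row, points)` and `best_time(fill_by_attr, points)`.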
Thanks, another question:
@nb.njit
def fill_by_row(points):
    points[0] = np.void((1,1), dtype=t_point)
I tried (x, y), np.array((x,y)), and np.void((x,y)); none of them works.
I also found that cur.x, cur.y = first.x, first.y is as fast as points[i] = first.
@nb.njit
fill by row cost: 0.07s
fill by x, y cost: 0.0025s
What's the matter? Is there some way to improve the performance of fill by row?