Filling a 2D array or recarray by fields is 30 times faster than by row. #9547

yxdragon · 2024-04-29T04:00:00Z

import numba as nb
import numpy as np
from time import time

t_point = np.dtype([('x', np.float32), ('y', np.float32)])

points = np.zeros(2, dtype=t_point)

@nb.njit
def fill_by_row(points):
    # copy the whole first record into row 0 or 1, alternating
    first = points[0]
    for i in range(102400000):
        points[i%2] = first

@nb.njit
def fill_by_attr(points):
    # same copy, but one field at a time
    first = points[0]
    for i in range(102400000):
        points[i%2].x = first.x
        points[i%2].y = first.y

fill_by_row(points)
start = time()
fill_by_row(points)
print(time()-start)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)

fill by row: 0.07 s
fill by x, y: 0.0025 s

What's the matter? Is there a way to improve the performance of filling by row?

yxdragon · 2024-04-29T04:15:45Z

I found an interesting workaround:

1. Generate the function source as a string in Python.
2. Use exec to define it in a locals dict.
3. Compile the resulting function with njit.

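def build_func(dtype):
    # one "points[i%2].<field> = first.<field>" statement per field
    namepair = [(i, i) for i in dtype.names]
    names = ['points[i%%2].%s = first.%s' % i for i in namepair]
    local = {}
    # source template; %s is replaced by the per-field statements
    func = '''
    def fill_by_attr(points):
        first = points[0]
        for i in range(102400000):
            %s
    '''%('\n'+' '*12).join(names)
    print(func)
    # strip the 4-space template indent, then define the function in `local`
    exec(func.replace('\n    ', '\n'), local)
    return nb.njit(local['fill_by_attr'])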

fill_by_attr = build_func(t_point)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)

This could be written as a decorator that generates customized functions from a template.
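A minimal sketch of that template idea, written as a factory function rather than a decorator. `make_row_filler` and `fill_rows` are hypothetical names. The generated code indexes fields as `row['x']` instead of `row.x`, an assumption made only so the same function also runs as plain Python on NumPy structured scalars when numba is not installed.

```python
import numpy as np

try:
    import numba as nb
    jit = nb.njit
except ImportError:
    # Assumption: fall back to plain (slow) Python if numba is not installed.
    def jit(func):
        return func


def make_row_filler(dtype):
    """Generate a function that copies row 0 into every row, field by field.

    Hypothetical helper: one assignment statement is emitted per field name,
    mirroring the exec-based build_func approach above.
    """
    lines = [
        "def fill_rows(points):",
        "    first = points[0]",
        "    for i in range(points.shape[0]):",
    ]
    for name in dtype.names:
        # repr() quotes the field name, e.g. points[i]['x'] = first['x']
        lines.append(f"        points[i][{name!r}] = first[{name!r}]")
    namespace = {}
    exec("\n".join(lines), namespace)
    return jit(namespace["fill_rows"])


t_point = np.dtype([('x', np.float32), ('y', np.float32)])
fill_rows = make_row_filler(t_point)

points = np.zeros(5, dtype=t_point)
points[0] = (1.5, -2.0)
fill_rows(points)
```

Because the statements are generated from `dtype.names`, the same factory works for any flat structured dtype without hand-writing one assignment per field.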

esc · 2024-04-29T09:37:54Z

@yxdragon thank you for reporting this. I seem to get different results here, where fill-by-row is faster. Or did I miss something?

 💣 zsh» python issue_9547.py
0.14577174186706543
0.34764719009399414

yxdragon · 2024-04-29T15:29:26Z

numba.__version__ is 0.59.0, on Windows 11.

esc · 2024-04-29T20:39:52Z

@yxdragon do you have access to try on a different system, like Linux or OSX?

yxdragon · 2024-04-30T09:11:25Z

I tried it on macOS:
fill by row: 0.179s
fill by x, y: 0.014s

gmarkall · 2024-04-30T14:36:27Z

If I change the benchmark so it does more work (not just operating on the same element all the time):

import numba as nb
import numpy as np
from time import time

t_point = np.dtype([('x', np.float32), ('y', np.float32)])

N = 102400000

points = np.zeros(N, dtype=t_point)

@nb.njit
def fill_by_row(points):
    first = points[0]
    for i in range(N):
        points[i] = first

@nb.njit
def fill_by_attr(points):
    first = points[0]
    for i in range(N):
        points[i].x = first.x
        points[i].y = first.y
        
fill_by_row(points)
start = time()
fill_by_row(points)
print(time()-start)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)

$ python repro.py 
0.05387234687805176
0.05723166465759277

Do you get similar performance with this example?

yxdragon · 2024-04-30T15:40:18Z

@gmarkall Yes, I get similar performance. So is that because of the CPU cache when operating on the same block?

gmarkall · 2024-04-30T15:45:29Z

My intuition here is that your original microbenchmark is doing too little to permit measurement of the execution speed of the workload, and other confounding factors will make up the majority of the measured time instead.
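One common way to make such measurements less noisy is to exclude compilation with a warm-up call, repeat the timed call several times, and report the median. The helper below is a hypothetical, stdlib-only sketch (`time_call` is not part of numba). It does not by itself fix a benchmark whose loop does too little work; the larger `N`-element version above addresses that.

```python
import time
from statistics import median


def time_call(func, *args, repeat=5, warmup=1):
    """Median wall-clock time, in seconds, of func(*args).

    The warm-up calls are not timed, so for a numba @njit function the
    one-off JIT compilation does not end up in the result.
    """
    for _ in range(warmup):
        func(*args)
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return median(samples)


def busy(n):
    # Stand-in workload for the example; replace with fill_by_row etc.
    return sum(i * i for i in range(n))


print(f"{time_call(busy, 100_000):.6f} s")
```

With the reproducer above, `time_call(fill_by_row, points)` and `time_call(fill_by_attr, points)` would replace the manual `time()` bookkeeping.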

yxdragon · 2024-04-30T16:04:17Z

Thanks. Another question: how do I fill one row inside a jitted function?

@nb.njit
def fill_by_row(points):
    points[0] = np.void((1,1), dtype=t_point)

I tried (x, y), np.array((x, y)) and np.void((x, y)); none of them work.
I also found another issue about filling a row with a tuple: #9476.
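If building a whole record value inside the jitted function does not work (the attempts above, and #9476), one workaround is to write each field of the target row in place. `set_row` is a hypothetical helper. It indexes fields as `row['x']` rather than `row.x`, an assumption made so the same function also runs as plain Python on NumPy structured arrays when numba is missing.

```python
import numpy as np

try:
    import numba as nb
    jit = nb.njit
except ImportError:
    # Assumption: run as plain Python when numba is not installed.
    def jit(func):
        return func

t_point = np.dtype([('x', np.float32), ('y', np.float32)])


@jit
def set_row(points, i, x, y):
    # Write each field of record i in place instead of assigning a
    # tuple / np.void value to the whole row.
    row = points[i]
    row['x'] = x
    row['y'] = y


points = np.zeros(3, dtype=t_point)
set_row(points, 0, 1.0, 1.0)
```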

yxdragon · 2024-04-30T16:10:37Z

> My intuition here is that your original microbenchmark is doing too little to permit measurement of the execution speed of the workload, and other confounding factors will make up the majority of the measured time instead.

I also found that cur.x, cur.y = first.x, first.y is as fast as points[i] = first:

@nb.njit
def fill_by_attr(points):
    first = points[0]
    for i in range(N):
        cur = points[i]
        cur.x, cur.y = first.x, first.y
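The `cur` version relies on `points[i]` referring to the row stored in the array rather than to a copy, so writing `cur`'s fields writes the array. Plain NumPy structured scalars have the same view semantics, which the sketch below also exercises without numba. `fill_by_alias` is a hypothetical name, and `['x']` indexing is used so that the fallback path (no numba installed, an assumption) runs too.

```python
import numpy as np

try:
    import numba as nb
    jit = nb.njit
except ImportError:
    # Assumption: plain Python fallback when numba is not installed.
    def jit(func):
        return func

t_point = np.dtype([('x', np.float32), ('y', np.float32)])


@jit
def fill_by_alias(points):
    first = points[0]
    for i in range(points.shape[0]):
        cur = points[i]  # refers to row i in place, not a copy
        cur['x'], cur['y'] = first['x'], first['y']


# In plain NumPy, indexing one element also yields a structured scalar
# that acts as a view: writing through it modifies the array.
points = np.zeros(4, dtype=t_point)
view = points[0]
view['x'] = 2.0
view['y'] = 3.0
fill_by_alias(points)
```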

esc added the needtriage label Apr 29, 2024
