Filling a 2D array or recarray by fields is 30 times faster than by row #9547

Open
yxdragon opened this issue Apr 29, 2024 · 10 comments

Comments

@yxdragon

import numba as nb
import numpy as np
from time import time

# Structured dtype with two float32 fields
t_point = np.dtype([('x', np.float32), ('y', np.float32)])

points = np.zeros(2, dtype=t_point)

@nb.njit
def fill_by_row(points):
    first = points[0]
    for i in range(102400000):
        points[i%2] = first          # copy the whole record at once

@nb.njit
def fill_by_attr(points):
    first = points[0]
    for i in range(102400000):
        points[i%2].x = first.x      # copy field by field
        points[i%2].y = first.y

fill_by_row(points)                  # first call compiles; only the second call is timed
start = time()
fill_by_row(points)
print(time()-start)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)

Fill by row: 0.07 s
Fill by x, y: 0.0025 s

What's the matter? Is there a way to improve the performance of filling by row?
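For comparison, outside of Numba the idiomatic way to copy one record into every other row is a single broadcast assignment in NumPy, with no Python-level loop at all. This is a minimal sketch, not something proposed in the thread; the name fill_vectorized is hypothetical.

```python
import numpy as np

t_point = np.dtype([('x', np.float32), ('y', np.float32)])

def fill_vectorized(points):
    """Copy points[0] into every other row with one NumPy assignment (hypothetical helper)."""
    # points[0] is a structured scalar (np.void); assigning it to a slice
    # broadcasts it to every row. Writing to points[1:] keeps the source row
    # out of the destination range.
    points[1:] = points[0]

points = np.zeros(5, dtype=t_point)
points[0] = (1.0, 2.0)
fill_vectorized(points)
```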

@yxdragon
Author

I found an interesting workaround:

  1. Generate the function source as a string in Python.
  2. Use exec to define the function in a local namespace.
  3. Compile the resulting function with nb.njit.
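
def build_func(dtype):
    # One assignment per field, e.g. "points[i%2].x = first.x"
    namepair = [(i, i) for i in dtype.names]
    names = ['points[i%%2].%s = first.%s' % i for i in namepair]
    local = {}
    # Source template; the field assignments are joined using the
    # 12-space indentation of the loop body
    func = '''
    def fill_by_attr(points):
        first = points[0]
        for i in range(102400000):
            %s
    '''%('\n'+' '*12).join(names)
    print(func)
    # Strip the 4-space template indent, then define the function in `local`
    exec(func.replace('\n    ', '\n'), local)
    # Compile the generated function
    return nb.njit(local['fill_by_attr'])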

fill_by_attr = build_func(t_point)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)

This could be wrapped in a decorator that generates customized functions from a template.
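The exec-based approach above can be generalized into a factory that works for any structured dtype and any array length. This is a minimal sketch under stated assumptions: make_field_filler is a hypothetical name, it uses string indexing (dst['x']) instead of attribute access so that the same generated code also runs in plain NumPy, and it falls back to an identity decorator when Numba is not installed (an assumption made only so the sketch runs anywhere).

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: plain-Python fallback so the sketch runs without Numba
    def njit(func):
        return func

def make_field_filler(dtype):
    """Generate fill(points) that copies points[0] into points[1:] field by field."""
    if dtype.names is None:
        raise ValueError("expected a structured dtype")
    # One assignment line per field, e.g. "        dst['x'] = first['x']"
    lines = "\n".join(
        f"        dst[{name!r}] = first[{name!r}]" for name in dtype.names
    )
    src = (
        "def fill(points):\n"
        "    first = points[0]\n"
        "    for i in range(1, points.shape[0]):\n"
        "        dst = points[i]\n"
        f"{lines}\n"
    )
    namespace = {}
    exec(src, namespace)            # define fill() in a private namespace
    return njit(namespace["fill"])  # compile it (or return it as-is without Numba)

t_point = np.dtype([('x', np.float32), ('y', np.float32)])
fill = make_field_filler(t_point)
points = np.zeros(4, dtype=t_point)
points[0] = (3.0, 4.0)
fill(points)
```

Generating the source per dtype means the field names are compile-time constants in the generated function, which is what the hand-written fill_by_attr relies on.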

@esc esc added the needtriage label Apr 29, 2024
@esc
Member

esc commented Apr 29, 2024

@yxdragon thank you for reporting this. I seem to get different results: here fill-by-row is faster. Or did I miss something?

 💣 zsh» python issue_9547.py
0.14577174186706543
0.34764719009399414

@yxdragon
Author

numba.__version__ is 0.59.0, on Windows 11.

@esc
Member

esc commented Apr 29, 2024

@yxdragon do you have access to a different system to try this on, like Linux or OSX?

@yxdragon
Author

I tried it on macOS:
fill by row: 0.179 s
fill by x, y: 0.014 s

@gmarkall
Member

If I change the benchmark so it does more work (not just operating on the same element all the time):

import numba as nb
import numpy as np
from time import time

t_point = np.dtype([('x', np.float32), ('y', np.float32)])

N = 102400000

points = np.zeros(N, dtype=t_point)

@nb.njit
def fill_by_row(points):
    first = points[0]
    for i in range(N):
        points[i] = first

@nb.njit
def fill_by_attr(points):
    first = points[0]
    for i in range(N):
        points[i].x = first.x
        points[i].y = first.y
        
fill_by_row(points)
start = time()
fill_by_row(points)
print(time()-start)

fill_by_attr(points)
start = time()
fill_by_attr(points)
print(time()-start)
$ python repro.py 
0.05387234687805176
0.05723166465759277

Do you get similar performance with this example?

@yxdragon
Author

@gmarkall Yes, I get similar performance. So is that because the CPU cache comes into play when operating on the same block?

@gmarkall
Member

My intuition here is that your original microbenchmark is doing too little to permit measurement of the execution speed of the workload, and other confounding factors will make up the majority of the measured time instead.
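One way to reduce such confounding factors is to warm up the function first (so JIT compilation is excluded), size the workload so each call does measurable work, and report the best of several repeats. This is a minimal sketch using only the standard library and NumPy; best_time and fill_slice are hypothetical names, and taking the minimum of several runs is one common convention (the same idea as timeit.repeat), not the method used in this thread.

```python
import time

import numpy as np

def best_time(func, *args, repeat=5):
    """Return the fastest of `repeat` timings of func(*args), in seconds."""
    func(*args)  # warm-up: for an @njit function this call triggers compilation
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine
    return min(timings)

def fill_slice(points):
    # Example workload: copy row 0 into every other row
    points[1:] = points[0]

t_point = np.dtype([('x', np.float32), ('y', np.float32)])
small = np.zeros(2, dtype=t_point)
large = np.zeros(1_000_000, dtype=t_point)
t_small = best_time(fill_slice, small)
t_large = best_time(fill_slice, large)
print(f"2 rows: {t_small:.6f} s, 1e6 rows: {t_large:.6f} s")
```

With only two rows, the measured time is dominated by call overhead and timer resolution rather than by the copy itself, which is the effect described above.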

@yxdragon
Author

Thanks. Another question: how do I fill one row inside a jit function?

@nb.njit
def fill_by_row(points):
    points[0] = np.void((1,1), dtype=t_point)

I tried (x, y), np.array((x, y)) and np.void((x, y)); none of them work.
I also found a related issue about filling a row with a tuple: #9476.
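A workaround that avoids constructing a record value inside the jitted function is to take a reference to the row and assign its fields one at a time, the same pattern as the field-by-field fill earlier in the thread. This is a minimal sketch: set_row is a hypothetical helper, it uses string indexing (row['x']) so the same code also runs in plain NumPy, and the fallback to an undecorated function when Numba is missing is an assumption added only so the sketch runs anywhere. Outside a jitted function, plain NumPy also accepts points[0] = (1, 1) directly.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: plain-Python fallback so the sketch runs without Numba
    def njit(func):
        return func

t_point = np.dtype([('x', np.float32), ('y', np.float32)])

@njit
def set_row(points, i, x, y):
    """Write one row field by field (hypothetical helper)."""
    row = points[i]  # a reference into the array, not a copy
    row['x'] = x     # values are cast to the float32 field type
    row['y'] = y

points = np.zeros(3, dtype=t_point)
set_row(points, 0, 1.0, 1.0)
```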

@yxdragon
Author

> My intuition here is that your original microbenchmark is doing too little to permit measurement of the execution speed of the workload, and other confounding factors will make up the majority of the measured time instead.

I also found that cur.x, cur.y = first.x, first.y is as fast as points[i] = first:

@nb.njit
def fill_by_attr(points):
    first = points[0]
    for i in range(N):
        cur = points[i]   # record reference into points; writes go back to the array
        cur.x, cur.y = first.x, first.y


3 participants