strange behavior of @guvectorize #5948

pyrot23 · 2020-07-02T02:36:56Z

Reporting a bug

[x] I have tried using the latest released version of Numba (most recent is
visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
[x] I have included below a minimal working reproducer (if you are unsure how
to write one see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

I am not sure where the bug is, but as the following example shows, there is some very strange behavior:

import numpy as np
import numba as nb
from numba import guvectorize, float32


# a simple matrix-vector multiplication
@guvectorize([(float32[:, :], float32[:], float32[:])], '(n,m),(m)->(n)', nopython=True)  #
def g(x, y, res):
    n, m = x.shape
    for ii in range(n):
        for jj in range(m):
            res[ii] += x[ii, jj] * y[jj]


if __name__ == "__main__":
    # 10 by 10 matrix times 10 by 1 vector, all entries are 1, should give a vector of 10s
    sz = 10
    x = np.ones((sz, sz), dtype=np.float32)
    y = np.ones(sz, dtype=np.float32)
    res = np.zeros(sz, dtype=np.float32)
    # ------------------------------------------------
    print("Before calculation :")
    print("# -----------------------------------")
    print("x = ")
    print(x)
    print("y = ")
    print(y)
    # print("res = ")
    # print(res)  # <<<  I have to print out the res first, otherwise the result is wrong !!!
    # ------------------------------------------------
    res = g(x, y)
    print("# -----------------------------------")
    print("After calculation :")
    print("# -----------------------------------")
    print("res = ")
    print(res)

Output when res is NOT printed first; the results are 1 greater than they should be (see the last row of the output):

Before calculation :
# -----------------------------------
x = 
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
y = 
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
# -----------------------------------
After calculation :
# -----------------------------------
res = 
[11. 11. 11. 11. 11. 11. 11. 11. 11. 11.]

Output when res IS printed first; everything is correct:

Before calculation :
# -----------------------------------
x = 
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
y = 
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
res = 
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# -----------------------------------
After calculation :
# -----------------------------------
res = 
[10. 10. 10. 10. 10. 10. 10. 10. 10. 10.]


esc · 2020-07-02T09:06:46Z

@pyrot23 thanks for reporting this, I can reproduce! It seems somewhat odd, indeed.

esc · 2020-07-03T12:04:51Z

I took another look at this today, and I believe the issue is caused by the assignment res = g(x, y). I checked the documentation, and it states that:

Contrary to vectorize() functions, guvectorize() functions don’t return their result value: they take it as an array argument,

https://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator
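NumPy's own generalized ufuncs follow the same calling convention, which may help make the quoted sentence concrete. The sketch below uses np.matmul, a NumPy gufunc, purely as a stand-in for the @guvectorize function in this issue (it is not Numba's API). The output can either be allocated by the call and returned, or supplied explicitly through out=, in which case the result is written into that array. Note that np.matmul overwrites out rather than accumulating into it, so the initial contents of res do not matter here.

```python
import numpy as np

sz = 10
x = np.ones((sz, sz), dtype=np.float32)
y = np.ones(sz, dtype=np.float32)

# np.matmul is a NumPy generalized ufunc; its core signature plays the
# same role as '(n,m),(m)->(n)' in the @guvectorize example above.

# 1) Let the gufunc allocate the output array and return it.
allocated = np.matmul(x, y)

# 2) Supply a pre-allocated output array; the result is written into it
#    and the very same array object is returned.
res = np.zeros(sz, dtype=np.float32)
returned = np.matmul(x, y, out=res)

print(allocated)        # [10. 10. 10. 10. 10. 10. 10. 10. 10. 10.]
print(returned is res)  # True
```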

If I remove the assignment, I see the following in both cases (with and without printing res first):

After calculation :
# -----------------------------------
res =
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

@pyrot23, can you try that and report back whether you see the same?

pyrot23 · 2020-07-06T05:04:43Z

But the result should be a vector of 10s, not 0s.

I think the documentation states that a function decorated with @guvectorize doesn't use a return statement; instead, one passes a pointer into the function to receive the return value.

Let's forget about the variable res for a moment. I tested printing the result directly:

print( g(x,y) )

so no res variable or assignment is involved in this case. The output is:

[11. 11. 11. 11. 11. 11. 11. 11. 11. 11.]

I only get the correct result, a vector of 10s, by pre-creating a res variable and printing it out.

esc · 2020-07-06T08:00:44Z

@pyrot23 thanks for reporting back. I can confirm what you observe. While I am not yet 100% certain of the root cause, I suspect it is related to how the res array is initialized. Looking at the documentation (https://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator), we see:

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y

And then:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> g(a, 2)
array([2, 3, 4, 5, 6])

So, in this case, the initial contents of the res array are irrelevant, because the function overwrites every element with plain assignment and never reads it.
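To illustrate why the initial contents do not matter with plain assignment, here is a pure-Python sketch of the documented kernel's loop body, called directly without Numba (the name add_scalar_kernel is hypothetical). Even when the output array comes from np.empty, every element is overwritten before anything could read it:

```python
import numpy as np

def add_scalar_kernel(x, y, res):
    # Same body as the documented example: plain assignment,
    # so whatever res contained beforehand is simply discarded.
    for i in range(x.shape[0]):
        res[i] = x[i] + y

a = np.arange(5)
res = np.empty(5, dtype=np.int64)  # uninitialized: contents are arbitrary
add_scalar_kernel(a, 2, res)
print(res)  # [2 3 4 5 6]
```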

If we now change that code to:

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] += x[i] + y  # <--- using the sum assignment

So instead of plain assignment, we use the augmented (sum) assignment operator +=. If we now run the function and look at the return value, we get different values on every run. Here are some samples:

💥 zsh» python foo099.py
[ 4611686018427387906 -4611677244501484145                    7
                  444           4457379328]
💥 zsh» python foo099.py
[-5764607523034234878 -6917520248495808817      140466324045831
                    5      140466344739879]
💥 zsh» python foo099.py
[-2305843009213693950 -2305834232957708325      140420095803399
                    5      140420106056231]

I speculate that this happens because res is initialized internally using np.empty, which simply allocates an array on the heap without initializing it: its values are whatever was previously stored at that memory location, i.e. heap trash. Since += reads each element before writing it, that trash ends up in the result.
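The dependence on the initial contents can be shown deterministically, without relying on real heap trash. In this plain NumPy/Python sketch (the kernel name add_scalar_inplace_kernel is hypothetical, and no Numba is involved), np.full stands in for an uninitialized buffer whose leftover contents happen to be 7. The += version just adds the intended result on top of whatever was there, while a zero-initialized buffer gives the intended answer:

```python
import numpy as np

def add_scalar_inplace_kernel(x, y, res):
    # Augmented assignment: reads res[i] before writing it,
    # so the result depends on what res already contained.
    for i in range(x.shape[0]):
        res[i] += x[i] + y

a = np.arange(5)

# Stand-in for np.empty: pretend the leftover memory held 7s.
stale = np.full(5, 7, dtype=np.int64)
add_scalar_inplace_kernel(a, 2, stale)
print(stale)   # [ 9 10 11 12 13]

# Explicit zero initialization gives the intended result.
zeroed = np.zeros(5, dtype=np.int64)
add_scalar_inplace_kernel(a, 2, zeroed)
print(zeroed)  # [2 3 4 5 6]
```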

So, what we need to do instead is initialize the res array explicitly and pass it in as an argument:

a = np.arange(5)
r = np.zeros(5, dtype="int64")
g(a, 2, r)

And then we get the correct result again.

How does this relate to your problem and the strange behavior you are seeing? First of all, your code also uses sum-assignment (+=), so correctly initializing the res array is critical. If you change the example to:

import numpy as np
from numba import guvectorize, float32


# a simple matrix-vector multiplication
@guvectorize([(float32[:, :], float32[:], float32[:])], '(n,m),(m)->(n)', nopython=True)  #
def g(x, y, res):
    n, m = x.shape
    for ii in range(n):
        for jj in range(m):
            res[ii] += x[ii, jj] * y[jj]


if __name__ == "__main__":
    # 10 by 10 matrix times 10 by 1 vector, all entries are 1, should give a vector of 10s
    sz = 10
    x = np.ones((sz, sz), dtype=np.float32)
    y = np.ones(sz, dtype=np.float32)
    res = np.zeros(sz, dtype=np.float32)
    # ------------------------------------------------
    print("Before calculation :")
    print("# -----------------------------------")
    print("x = ")
    print(x)
    print("y = ")
    print(y)
    print("res = ")
    print(res)  # printing res is optional now; it is explicitly zero-initialized
    # ------------------------------------------------
    g(x, y, res)
    print("# -----------------------------------")
    print("After calculation :")
    print("# -----------------------------------")
    print("res = ")
    print(res)

It prints the following, correct, result:

Before calculation :
# -----------------------------------
x =
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
y =
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
res =
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# -----------------------------------
After calculation :
# -----------------------------------
res =
[10. 10. 10. 10. 10. 10. 10. 10. 10. 10.]

The question remains: why are you seeing an array of 11s instead of garbled heap-trash values? The answer is pure luck. Without the print, the freshly allocated output buffer evidently held leftover values of 1.0, so each element ends up as 1 + 10 = 11. Printing the values beforehand creates a kind of side effect: the region of the heap that the res array is allocated from (inside the @guvectorize-decorated function) then happens to contain the "right" values, in this case a series of 0s, so the output appears correct. Note, however, that this works by coincidence, not by design, and must never be relied upon. The correct approach is to initialize the res array explicitly and pass it in as an argument.
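A different way to sidestep the problem, not suggested in this thread but common practice for reductions, is to make the kernel independent of the output's initial contents: accumulate into a local variable and assign each output element exactly once. The sketch below runs the loop body as plain Python (the name matvec_kernel is hypothetical). It deliberately starts from a buffer of 1s, mimicking the 11s case, and still produces 10s. With @guvectorize applied, the same body is expected to behave correctly whether or not res is pre-initialized, but that is an assumption not tested here.

```python
import numpy as np

def matvec_kernel(x, y, res):
    # Accumulate in a local variable, then assign once per output element.
    # res is only written, never read, so its initial contents are irrelevant.
    n, m = x.shape
    for ii in range(n):
        acc = 0.0
        for jj in range(m):
            acc += x[ii, jj] * y[jj]
        res[ii] = acc

sz = 10
x = np.ones((sz, sz), dtype=np.float32)
y = np.ones(sz, dtype=np.float32)

# Deliberately start from a "dirty" buffer of 1s.
res = np.ones(sz, dtype=np.float32)
matvec_kernel(x, y, res)
print(res)  # [10. 10. 10. 10. 10. 10. 10. 10. 10. 10.]
```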

pyrot23 · 2020-07-07T07:07:39Z

Thanks for the extensive explanation.

I think that was just my misunderstanding: the result argument is not a reference/pointer, but rather a declaration.

esc added the needtriage label Jul 2, 2020

esc added the no action required label and removed the needtriage label Jul 6, 2020

pyrot23 closed this as completed Jul 7, 2020
