strange behavior of @guvectorize #5948

Closed
pyrot23 opened this issue Jul 2, 2020 · 5 comments
Labels
no action required No action was needed to resolve.

Comments

pyrot23 commented Jul 2, 2020

Reporting a bug

I am not sure where the bug is, but as the following example shows, there is some very strange behavior:

import numpy as np
import numba as nb
from numba import  guvectorize, float32 


# a simple matrix-vector multiplication
@guvectorize([(float32[:, :], float32[:], float32[:])], '(n,m),(m)->(n)', nopython=True)  #
def g(x, y, res):
    n, m = x.shape
    for ii in range(n):
        for jj in range(m):
            res[ii] += x[ii, jj] * y[jj]


if __name__ == "__main__":
    # 10 by 10 matrix times 10 by 1 vector, all entries are 1, should give a vector of 10s
    sz = 10
    x = np.ones((sz, sz), dtype=np.float32)
    y = np.ones(sz, dtype=np.float32)
    res = np.zeros(sz, dtype=np.float32)
    # ------------------------------------------------
    print("Before calculation :")
    print("# -----------------------------------")
    print("x = ")
    print(x)
    print("y = ")
    print(y)
    # print("res = ")
    # print(res)  # <<<  I have to print out the res first, otherwise the result is wrong !!!
    # ------------------------------------------------
    res = g(x, y)
    print("# -----------------------------------")
    print("After calculation :")
    print("# -----------------------------------")
    print("res = ")
    print(res)

Output when res is NOT printed beforehand: every entry is 1 greater than it should be (see the last row of the output below).

Before calculation :
# -----------------------------------
x = 
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
y = 
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
# -----------------------------------
After calculation :
# -----------------------------------
res = 
[11. 11. 11. 11. 11. 11. 11. 11. 11. 11.]

Output when res is printed beforehand: everything is correct.

Before calculation :
# -----------------------------------
x = 
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
y = 
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
res = 
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# -----------------------------------
After calculation :
# -----------------------------------
res = 
[10. 10. 10. 10. 10. 10. 10. 10. 10. 10.]
Member

esc commented Jul 2, 2020

@pyrot23 thanks for reporting this, I can reproduce! It seems somewhat odd, indeed.

esc added the needtriage label Jul 2, 2020
Member

esc commented Jul 3, 2020

I took another look at this today, and I believe the issue is caused by the assignment res = g(x, y). I checked the documentation, which states:

Contrary to vectorize() functions, guvectorize() functions don’t return their result value: they take it as an array argument, 

https://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator

If I remove the assignment I see:

After calculation :
# -----------------------------------
res =
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

In both cases. @pyrot23 can you try that and report back if you see the same?

Author

pyrot23 commented Jul 6, 2020

But the result should be a vector of 10s, not 0s.

I think the documentation states that a function decorated with @guvectorize doesn't use a return statement; instead, one should pass a pointer into the function to get the return value.

Let's forget about the variable res for a moment. I tested printing the result directly:

print( g(x,y) )

So no res variable or assignment is involved in this case. The output is:

[11. 11. 11. 11. 11. 11. 11. 11. 11. 11.]

I only got the correct result, a vector of 10s, by pre-creating a res variable and printing it out.

Member

esc commented Jul 6, 2020

@pyrot23 thanks for reporting back. I can confirm what you observe. While I am not yet 100% certain as to the root cause of the issue, I am tending towards looking at how the res array is initialized. Looking at the documentation (https://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator) we see:

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y

And then:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> g(a, 2)
array([2, 3, 4, 5, 6])

So, in this case, the initialization of the res array is irrelevant because we use assignment in the function.

If we now change that code to:

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] += x[i] + y  # <--- using the sum assignment

So instead of plain assignment we use the augmented assignment operator (+=). If we now run the function and look at the return value, we get different values each time. Here are some samples:

💥 zsh» python foo099.py
[ 4611686018427387906 -4611677244501484145                    7
                  444           4457379328]
💥 zsh» python foo099.py
[-5764607523034234878 -6917520248495808817      140466324045831
                    5      140466344739879]
💥 zsh» python foo099.py
[-2305843009213693950 -2305834232957708325      140420095803399
                    5      140420106056231]

I speculate that this happens because res is internally allocated using np.empty, which simply allocates an array on the heap without initializing it. The values are whatever was previously at that memory location, i.e. the array starts out filled with heap trash.
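
The effect described above can be imitated without Numba. The following is only a sketch, under the assumption that the output buffer arrives uninitialized: it runs the same accumulation loop as plain Python over a NumPy buffer that is deliberately pre-filled with a stale value. The names matvec_accumulate, stale and clean are made up for this illustration, and the value 1.0 is chosen arbitrarily to stand in for leftover heap contents.

```python
import numpy as np


def matvec_accumulate(x, y, res):
    """Same loop body as the kernel in this issue, run as plain Python."""
    n, m = x.shape
    for ii in range(n):
        for jj in range(m):
            res[ii] += x[ii, jj] * y[jj]  # adds to whatever res already holds
    return res


sz = 10
x = np.ones((sz, sz), dtype=np.float32)
y = np.ones(sz, dtype=np.float32)

# Assumption for illustration: the buffer still holds 1.0 from an earlier use.
stale = np.full(sz, 1.0, dtype=np.float32)
clean = np.zeros(sz, dtype=np.float32)

print(matvec_accumulate(x, y, stale))  # leftover 1.0 is carried into each sum
print(matvec_accumulate(x, y, clean))  # starts from 0, so the sums are exact
```

With a leftover 1.0 in every slot, each entry comes out as 11 rather than 10, which resembles the symptom reported at the top of this issue. Real uninitialized memory can hold anything, so this models only one possible outcome.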

So, what we need to do now is initialize the res array correctly and pass it in as an argument:

a = np.arange(5)
r = np.zeros(5, dtype="int64")
g(a, 2, r)

And then we get the correct result again.

How does this relate to your problem and the strange behavior you are seeing? First of all, your code also uses augmented assignment (+=), so correct initialization of the res array is critical. If you change the example to:

import numpy as np
from numba import guvectorize, float32


# a simple matrix-vector multiplication
@guvectorize([(float32[:, :], float32[:], float32[:])], '(n,m),(m)->(n)', nopython=True)  #
def g(x, y, res):
    n, m = x.shape
    for ii in range(n):
        for jj in range(m):
            res[ii] += x[ii, jj] * y[jj]


if __name__ == "__main__":
    # 10 by 10 matrix times 10 by 1 vector, all entries are 1, should give a vector of 10s
    sz = 10
    x = np.ones((sz, sz), dtype=np.float32)
    y = np.ones(sz, dtype=np.float32)
    res = np.zeros(sz, dtype=np.float32)
    # ------------------------------------------------
    print("Before calculation :")
    print("# -----------------------------------")
    print("x = ")
    print(x)
    print("y = ")
    print(y)
    print("res = ")
    print(res)  # printing is now optional: res is zero-initialized and passed in explicitly below
    # ------------------------------------------------
    g(x, y, res)
    print("# -----------------------------------")
    print("After calculation :")
    print("# -----------------------------------")
    print("res = ")
    print(res)

It prints the following, correct, result:

Before calculation :
# -----------------------------------
x =
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
y =
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
res =
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# -----------------------------------
After calculation :
# -----------------------------------
res =
[10. 10. 10. 10. 10. 10. 10. 10. 10. 10.]

The question remains: why are you seeing an array of 11s instead of garbled heap-trash values? The answer is that this is pure luck! Printing res beforehand creates a kind of side effect: the region of the heap from which the res array is allocated (within the @guvectorize-decorated function) happens to contain the right values, in this case a series of 0s, so the output appears to be correct. Note, however, that this works by coincidence and not by design, and should never be relied upon. The correct way to implement this is to initialize the res array properly and pass it in as an argument.

esc added the no action required label and removed the needtriage label Jul 6, 2020
Author

pyrot23 commented Jul 7, 2020

Thanks for the extensive explanation.

I think that was just my misunderstanding: the result argument is not a reference/pointer, but rather a declaration.

pyrot23 closed this as completed Jul 7, 2020