-
-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: core: in dot(), make copies if out has memory overlap with input #8539
Conversation
BLAS does not allow aliased inputs. If user-provided out= argument may overlap in memory with one of the inputs to dot(), put the output to a temporary work array and copy back after the operation.
else { | ||
Py_INCREF(out); | ||
ret = out; | ||
} | ||
Py_INCREF(out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? Looks like it has already been done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, a note would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the variable naming, it should be more clear what the difference between ret and result is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charris: yes it's needed, since both variables are owned references.
@juliantaylor: agreed, would be best to change.
@@ -742,11 +767,13 @@ cblas_matrixproduct(int typenum, PyArrayObject *ap1, PyArrayObject *ap2, | |||
|
|||
Py_DECREF(ap1); | |||
Py_DECREF(ap2); | |||
return PyArray_Return(ret); | |||
Py_DECREF(ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might note that this triggers copyback if needed.
def test_dot_out_mem_overlap(self): | ||
np.random.seed(1) | ||
|
||
for dtype in [np.object_, np.float32, np.complex128, np.int64]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also test float64
and complex64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need multiple dtype tests at all? none of the new code has any relations to type. Doesn't matter much here as its a fast test, but I like to avoid unnecessary test bloat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two code paths --- blas and non-blas, so you need at least two dtypes.
x = np.dot(a, b, out=b) | ||
assert_equal(x, y, err_msg=repr(dtype)) | ||
# Test BLAS and non-BLAS code paths | ||
for dtype in [np.float32, np.int64]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I prefer the maximal tests because the code may change in the future. The tests not only check current behavior, they also define correct behavior for the future.
Thanks Pauli. |
Thank you @pv |
BLAS does not allow aliased inputs. If user-provided out= argument may
overlap in memory with one of the inputs to dot(), put the output to a
temporary work array and copy back after the operation.
Fixes gh-8440