make CudaNdarray_sger work with neg strides.
nouiz committed Jan 23, 2012
1 parent e12965a commit 6174405
Showing 1 changed file with 17 additions and 8 deletions.
25 changes: 17 additions & 8 deletions theano/sandbox/cuda/cuda_ndarray.cu
@@ -2999,29 +2999,38 @@ int CudaNdarray_sger(float alpha, CudaNdarray * x, CudaNdarray * y, CudaNdarray
PyErr_SetString(PyExc_NotImplementedError, "non-c continugous A in sger");
return -1;
}

// Same for this, be safe
assert (CudaNdarray_HOST_STRIDES(x)[0] >= 0);
assert (CudaNdarray_HOST_STRIDES(y)[0] >= 0);

// Since Sger expects A in col-major, we invert x and y to fake this.
int x_strides = CudaNdarray_HOST_STRIDES(x)[0];
CudaNdarray * x_ = x;
if(x_strides == 0){
assert(CudaNdarray_HOST_DIMS(x)[0] == 1);
x_strides = 4;
} else if(x_strides < 0){
x_ = (CudaNdarray*)CudaNdarray_Copy(x);
x_strides = CudaNdarray_HOST_STRIDES(x_)[0];
}

int y_strides = CudaNdarray_HOST_STRIDES(y)[0];
CudaNdarray * y_ = y;
if(y_strides == 0){
assert(CudaNdarray_HOST_DIMS(y)[0] == 1);
y_strides = 4;
} else if(y_strides < 0){
y_ = (CudaNdarray*)CudaNdarray_Copy(y);
y_strides = CudaNdarray_HOST_STRIDES(y_)[0];
}

if(CudaNdarray_SIZE(A))
if(CudaNdarray_SIZE(A)){
cublasSger(CudaNdarray_HOST_DIMS(y)[0], CudaNdarray_HOST_DIMS(x)[0], alpha,
CudaNdarray_DEV_DATA(y), y_strides,
CudaNdarray_DEV_DATA(x), x_strides,
CudaNdarray_DEV_DATA(y_), y_strides,
CudaNdarray_DEV_DATA(x_), x_strides,
CudaNdarray_DEV_DATA(A), CudaNdarray_HOST_DIMS(A)[1]);
}
CNDA_THREAD_SYNC;
if(x_ != x)
Py_DECREF(x_);
if(y_ != y)
Py_DECREF(y_);

cudaError_t err = cudaGetLastError();
if (CUBLAS_STATUS_SUCCESS != err)

2 comments on commit 6174405

@jaberg jaberg commented on 6174405 Jan 25, 2012

Interesting. Did this actually come up in usage? How are negative strides handled on the CPU?

Anyway, this brings to mind a special case: if both x.stride < 0 and A.stride[0] < 0, then no copy is needed, because you can do the whole outer-product update backward. The same holds for y.stride and A.stride[1], and the same thinking applies to the CPU implementation. It is tricky to write and test, though (an hour or two at least), so how about adding this text as a comment, or making a ticket for it? Someone trying to optimize a backward GER / GEMM may implement it some day.
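
The special case described above can be sketched in plain Python. This is only an illustration under stated assumptions, not code from Theano: `ger_strided` and `flip_axis0` are hypothetical helpers, buffers are flat lists, and offsets and strides are counted in elements with the NumPy convention that the offset points at the logical first element. Reversing x and the rows of A together gives an equivalent call with positive strides, so no copy is needed.

```python
def ger_strided(alpha, x, x_off, x_stride, y, y_off, y_stride,
                A, a_off, a_strides, m, n):
    """In-place rank-1 update A[i, j] += alpha * x[i] * y[j] on flat buffers.

    Hypothetical reference helper: strides are in elements and may be negative.
    """
    for i in range(m):
        for j in range(n):
            A[a_off + i * a_strides[0] + j * a_strides[1]] += (
                alpha * x[x_off + i * x_stride] * y[y_off + j * y_stride])


def flip_axis0(x_off, x_stride, a_off, a_strides, m):
    """Walk x and the rows of A from the other end, negating both strides."""
    return (x_off + (m - 1) * x_stride, -x_stride,
            a_off + (m - 1) * a_strides[0], (-a_strides[0], a_strides[1]))


m, n = 3, 2
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0]

# x viewed as x[::-1] and A viewed as A[::-1, :]: both have a negative stride.
x_off, x_stride = 2, -1
a_off, a_strides = 4, (-2, 1)

# Direct evaluation with the negative strides.
direct = [0.0] * (m * n)
ger_strided(1.0, x, x_off, x_stride, y, 0, 1, direct, a_off, a_strides, m, n)

# Same update after flipping both: all strides are positive, nothing is copied.
fx_off, fx_stride, fa_off, fa_strides = flip_axis0(x_off, x_stride,
                                                   a_off, a_strides, m)
flipped = [0.0] * (m * n)
ger_strided(1.0, x, fx_off, fx_stride, y, 0, 1, flipped, fa_off, fa_strides, m, n)
```

Both calls write the same values into the underlying buffer. The same flip applies to y together with A.stride[1].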

@nouiz nouiz (owner, author) commented on 6174405 Jan 25, 2012

This came up in already existing tests for outer on the GPU. They call it like this:
outer(a[::-1], b)
outer(a, b[::-1])

I don't know what is done on the CPU. I made a ticket for the 0.5.1 release.
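
For readers wondering why `a[::-1]` triggers the negative-stride path: a reversed view reuses the same memory with a negated stride. A minimal sketch using only the standard library follows. It uses `array` and `memoryview` as stand-ins for the arrays in the Theano tests, which is an assumption made for illustration. Copying the reversed view into a fresh buffer restores a positive stride, which is the same idea as the copy made in this commit.

```python
from array import array

a = array("f", [1.0, 2.0, 3.0])
view = memoryview(a)

# Reversing creates a view on the same memory with a negated stride.
rev = view[::-1]
forward_strides = view.strides
reversed_strides = rev.strides
rev_values = rev.tolist()

# Copying the reversed view into a new buffer gives positive strides again.
copied = array("f", rev.tolist())
copied_strides = memoryview(copied).strides
```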