A new CUDA kernel for CuMatrixBase<Real>::FindRowMaxId; #780
Thanks!
value[threadIdx.x] = mat[i+j*d.stride];
index[threadIdx.x] = threadIdx.x;
__syncthreads();
for (int32_cuda j = tid; j < d.cols; j += CU1DBLOCK) {
Here we could put a comment:
// Loop over blocks (coalesced access to memory),
Hi, thank you for the code! It's very nice; the loop over blocks with coalesced memory access did a very good job. The code looks very good. Maybe a few explanatory comments could be added (as already mentioned above), so that people who are not familiar with CUDA tricks can learn from them.
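For readers who don't know the pattern being praised here, the following is a minimal CPU-side sketch in plain C++, not the kernel from this PR. It emulates one thread block finding the index of the maximum in one row: first a strided loop in which "thread" tid visits columns tid, tid + block size, and so on (on a GPU, neighbouring threads then read neighbouring addresses, which is what makes the access coalesced), then a tree reduction over the per-thread results. The names `kBlock` and `RowMaxIdEmulated` and the block size of 256 are assumptions made for illustration; tie-breaking between equal maxima is left unspecified.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical block size standing in for CU1DBLOCK (the value is an assumption).
constexpr int kBlock = 256;

// Emulates one thread block computing the argmax of a single row.
// The arrays `value` and `index` play the role of shared memory.
int32_t RowMaxIdEmulated(const float* row, int32_t cols) {
  assert(cols > 0);
  float value[kBlock];
  int32_t index[kBlock];

  // Phase 1: strided loop. "Thread" tid visits columns tid, tid + kBlock, ...
  // so that at each step consecutive threads touch consecutive columns.
  for (int tid = 0; tid < kBlock; ++tid) {
    float best = -std::numeric_limits<float>::infinity();
    int32_t best_id = -1;  // stays -1 if this thread saw no column
    for (int32_t j = tid; j < cols; j += kBlock) {
      if (row[j] > best) {
        best = row[j];
        best_id = j;
      }
    }
    value[tid] = best;
    index[tid] = best_id;
  }

  // Phase 2: tree reduction. On a GPU every step of the outer loop would be
  // separated by __syncthreads(); here the inner loop runs sequentially,
  // which is equivalent because reads and writes in a step never overlap.
  for (int stride = kBlock / 2; stride > 0; stride /= 2) {
    for (int tid = 0; tid < stride; ++tid) {
      if (value[tid + stride] > value[tid]) {
        value[tid] = value[tid + stride];
        index[tid] = index[tid + stride];
      }
    }
  }
  return index[0];  // column index of the row maximum
}
```

The point of the design is that the row length no longer has to match the block size: phase 1 folds an arbitrarily long row into `kBlock` partial results, and phase 2 needs only log2(`kBlock`) steps to combine them.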
Comments added. Very glad I can help.
OK- Karel, merge it if you think it's ready. On Mon, May 16, 2016 at 10:31 AM, Shiyin Kang notifications@github.com
@kangshiyin, would you mind rebasing this against 'master'? I don't like to have merges in feature branches.
... command should be just [or git rebase upstream/master or whatever you call it.]
Does this look right? There's an extra commit 'f8af246' after rebasing and pushing to my
Also found a bug in the timing. The improvement is not that big. New:
It looks to me like instead of rebasing and pushing (which would have required --force), you probably rebased, merged with what you had previously committed, and pushed. You should be able to fix it by rebasing again and then pushing with --force.
Old:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 3.99218 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 3.46283 gigaflops.
New:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 66.2965 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 58.442 gigaflops.
f8af246 to 24b886a
"--force" works. Thx.
I'd like to merge it in. But it says that the branch is out-of-date and the 'automatic merge' button is not available. So it cannot be synchronized through the web-app (maybe a side effect of 'rebase'?).
Just merged.