Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A new CUDA kernel for CuMatrixBase<Real>::FindRowMaxId; #780

Merged
merged 4 commits into from
May 18, 2016

Conversation

kangshiyin
Copy link
Contributor

Old:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId, for dim = 1024, speed was 3.99218 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId, for dim = 1024, speed was 3.46283 gigaflops.

New:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId, for dim = 1024, speed was 66.2965 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId, for dim = 1024, speed was 58.442 gigaflops.

@danpovey
Copy link
Contributor

Thanks!
It looks plausible but I'll hold off merging while I give @vesis84 a chance to say something.
Dan

value[threadIdx.x] = mat[i+j*d.stride];
index[threadIdx.x] = threadIdx.x;
__syncthreads();
for (int32_cuda j = tid; j < d.cols; j += CU1DBLOCK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could put a comment:
// Loop over blocks (coalesced access to memory),

@KarelVesely84
Copy link
Contributor

Hi, thank you for the code! It's very nice, the loop over blocks with coalesced access to the memory did a very good job. The code looks very good, maybe few explanatory comments can be added for people who are might not be familiar with CUDA tricks can learn from them (already mentioned above).
Thanks!
K.

@kangshiyin
Copy link
Contributor Author

Comments added. Very glad I can help.

@danpovey
Copy link
Contributor

OK- Karel, merge it if you think it's ready.

On Mon, May 16, 2016 at 10:31 AM, Shiyin Kang notifications@github.com
wrote:

Comments added. Very glad I can help.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#780 (comment)

@danpovey
Copy link
Contributor

@kangshiyin, would you mind rebasing this against 'master'? I don't like to have merges in feature branches.

@danpovey
Copy link
Contributor

... command should be just
git checkout faster-FindRowMaxId
git rebase master

[or git rebase upstream/master or whatever you call it.]

@kangshiyin
Copy link
Contributor Author

Looks right ? There's an extra commit 'f8af246' after rebasing and pushing to my origin/faster-FindRowMaxId. Should I use a new branch name and a new pull request after rebase?

@kangshiyin
Copy link
Contributor Author

Also found a bug on timing. The improvement is not that big.

New:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId, for dim = 1024, speed was 14.6525 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId, for dim = 1024, speed was 12.0564 gigaflops.

@danpovey
Copy link
Contributor

It looks to me like instead of rebasing and pushing (which would have required --force) you probably rebased, merged with what you previously committed, and pushed. You should be able to fix it by rebasing again and then pushing with --force.

sykang@sepc83 and others added 4 commits May 18, 2016 09:57
Old:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 3.99218 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 3.46283 gigaflops.

New:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 66.2965 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 58.442 gigaflops.
@kangshiyin
Copy link
Contributor Author

"--force" works. Thx.

@KarelVesely84
Copy link
Contributor

I'd like to merge it in. But, it says that the branch is out-of-date and the 'automatic merge' button is not available. So it cannot be synchronized through the web-app (maybe a side effect of 'rebase'?).
There are instructions for merging by command line... Dan have you ever done that?

@danpovey danpovey merged commit 26ef88f into kaldi-asr:master May 18, 2016
@danpovey
Copy link
Contributor

Just merged.
I think you have to press the 'merge' button and then checkbox appears.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants