Would you share the performance data? #2

oscarriddle · 2019-06-06T08:48:23Z

I tried to implement matrix multiplication of a batch of 2D input matrices A with a single, shared 2D matrix B.

I simply extended gridDim.z, which seems to be the same approach you took, but I find the computation becomes quite slow.

salehjg · 2019-06-07T19:17:04Z

Hi :D

I'm assuming you have [matrix A of batch size b] and [matrix B of batch size b].
How exactly can I help?
For the default matrix sizes, this is the nvvp timeline:
batch-matmul-cuda(gtx1070).nvvp.zip

Just keep in mind that this is not the fastest possible kernel for batched tiled matrix multiplication.

salehjg · 2019-06-07T19:22:16Z

btw, it might be the tile size or the block size that hinders fast multiplication.
It should be easier to track down performance issues for a given kernel and input configuration.

oscarriddle · 2019-06-12T03:22:23Z

Hi, thanks for your reply.

I'm trying to design a fast batched tiled-MM kernel. In my case, the batch means that multiple matrices A are each multiplied by a single matrix B, not multiple As multiplied by their corresponding Bs. I will take a deeper look into your idea :)

Thanks,

Here is my idea:

First, implement the tiled MM with an efficient shared-memory tile size, then launch the kernel with:

gridDim(DIVUP(CC, Tile_size), DIVUP(CH*CW, Tile_size), 1), 
blockDim(Tile_size, Tile_size)

The above method gives a 2D MM with pretty good performance. Then I tried to handle the batched As with a single B by using gridDim.z, like below:

gridDim(DIVUP(CC, Tile_size), DIVUP(CH*CW, Tile_size), batch_size), 
blockDim(Tile_size, Tile_size)

The performance is not as good, so I wonder whether there are any good ideas for this case.
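For reference, the batched launch described above could look roughly like the sketch below. This is only an illustration under assumptions, not the kernel from this repository or my actual code: the names `batched_matmul_shared_b`, `run_batched_matmul`, `TILE`, `M`, `K` and `N` are made up here, all matrices are assumed to be row-major `float`, and I assume `N` plays the role of `CC` and `M` the role of `CH*CW`. Each block in `gridDim.z` handles one A, and every block reads the same B.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

#define TILE 16
#define DIVUP(a, b) (((a) + (b) - 1) / (b))

// Hypothetical kernel: C[b] = A[b] * B for each batch index b.
// A is [batch, M, K], B is [K, N] (shared by all batches), C is [batch, M, N].
__global__ void batched_matmul_shared_b(const float* A, const float* B, float* C,
                                        int M, int K, int N) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int batch = blockIdx.z;                       // one z-slice of the grid per matrix A
    int row = blockIdx.y * TILE + threadIdx.y;    // row in A[b] and C[b]
    int col = blockIdx.x * TILE + threadIdx.x;    // column in B and C[b]
    const float* Ab = A + (size_t)batch * M * K;  // only A and C are offset by batch
    float* Cb = C + (size_t)batch * M * N;

    float acc = 0.0f;
    for (int t = 0; t < DIVUP(K, TILE); ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as zero so partial tiles stay correct.
        tileA[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? Ab[row * K + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();
    }
    if (row < M && col < N) Cb[row * N + col] = acc;
}

// Hypothetical host wrapper: copies inputs, launches with batch in gridDim.z,
// and returns C. Error checking is omitted to keep the sketch short.
std::vector<float> run_batched_matmul(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      int batch, int M, int K, int N) {
    float *dA, *dB, *dC;
    size_t bytesC = (size_t)batch * M * N * sizeof(float);
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(DIVUP(N, TILE), DIVUP(M, TILE), batch);
    batched_matmul_shared_b<<<grid, block>>>(dA, dB, dC, M, K, N);

    std::vector<float> C((size_t)batch * M * N);
    cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

With this layout every z-slice reloads the same tiles of B from global memory, so B is read once per batch entry. Whether that is what actually hurts performance in my case would need to be confirmed with a profiler.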
