
Depthwise Conv1d performance (a naive CUDA kernel is 10x faster) #75747

Open
BlinkDL opened this issue Apr 13, 2022 · 4 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@BlinkDL

BlinkDL commented Apr 13, 2022

🚀 The feature, motivation and pitch

Please improve the CUDA performance of Depthwise Conv1d :)

FYI, I wrote a naive CUDA kernel and it's already 10x faster than PyTorch:

https://github.com/BlinkDL/RWKV-CUDA

RTX3090:
pytorch = fwd 14ms bwd 65ms
CUDA kernel v3 = fwd 0.8ms bwd 5.5ms
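For readers unfamiliar with the operation being benchmarked: a depthwise Conv1d applies one independent filter per channel (groups == channels), so channels never mix. A minimal NumPy sketch of the forward pass; shapes and names here are illustrative, not taken from the linked repo:

```python
import numpy as np

def depthwise_conv1d(x, w):
    """Depthwise 1-D cross-correlation, no padding.

    x: (C, L) input, w: (C, K) per-channel filters -> (C, L - K + 1) output.
    """
    C, L = x.shape
    Cw, K = w.shape
    assert C == Cw, "one filter per channel"
    out = np.empty((C, L - K + 1), dtype=x.dtype)
    for c in range(C):  # channels are fully independent: the "depthwise" part
        for i in range(L - K + 1):
            out[c, i] = np.dot(x[c, i:i + K], w[c])
    return out
```

Up to batch and weight layout, this computes the same thing as `torch.nn.Conv1d(C, C, K, groups=C, bias=False)` (PyTorch convolutions are cross-correlations). The per-channel independence is also what makes a simple one-thread-per-output CUDA kernel so effective here.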

Alternatives

No response

Additional context

No response

cc @ngimel

@gchanan gchanan added the module: cuda (Related to torch.cuda, and CUDA support in general) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Apr 14, 2022
@ipostr08

ipostr08 commented Apr 18, 2022

Yes, and more generally, group convolutions are very slow. Nothing has been done to fix this despite many people asking over the years. E.g. #73764, #18631, #70954, https://discuss.pytorch.org/t/group-convolution-takes-much-longer-than-normal-convolution/92214, https://twitter.com/wightmanr/status/1486146507132661760.
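For context on the terminology: a grouped convolution splits the input channels into G groups and convolves each group with its own filters, and depthwise convolution is the extreme case G == C. A hedged NumPy sketch (function and variable names are illustrative, not from any of the linked threads):

```python
import numpy as np

def grouped_conv1d(x, w, groups):
    """Grouped 1-D cross-correlation, no padding.

    x: (C_in, L) input, w: (C_out, C_in // groups, K) filters
    -> (C_out, L - K + 1) output.
    groups == 1 is a dense conv; groups == C_in with C_out == C_in
    is a depthwise conv (each filter sees exactly one channel).
    """
    C_in, L = x.shape
    C_out, cpg, K = w.shape
    assert C_in % groups == 0 and C_out % groups == 0
    assert cpg == C_in // groups
    out_per_group = C_out // groups
    out = np.zeros((C_out, L - K + 1), dtype=x.dtype)
    for g in range(groups):
        xg = x[g * cpg:(g + 1) * cpg]  # this group's slice of input channels
        for j in range(out_per_group):
            o = g * out_per_group + j
            for i in range(L - K + 1):
                out[o, i] = np.sum(xg[:, i:i + K] * w[o])
    return out
```

The performance complaint in this thread is that cuDNN/PyTorch kernels are tuned for the dense (groups == 1) case, while the small per-group arithmetic of large-G cases is served poorly.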

@songyuc

songyuc commented Sep 19, 2022

> 🚀 The feature, motivation and pitch
>
> Please improve the CUDA performance of Depthwise Conv1d :)
>
> FYI, I wrote a naive CUDA kernel and it's already 10x faster than PyTorch:
>
> https://github.com/BlinkDL/RWKV-CUDA
>
> RTX3090:
> pytorch = fwd 14ms bwd 65ms
> CUDA kernel v3 = fwd 0.8ms bwd 5.5ms

Would you consider making a PR about it? 😄

@JulienSiems

@BlinkDL Same here! Would be very interested in a PR and ready to help!

@BlinkDL
Author

BlinkDL commented Oct 12, 2022

> @BlinkDL Same here! Would be very interested in a PR and ready to help!

Hi, you can work on my code :)
